WO2020248780A1 - Living body testing method and apparatus, electronic device and readable storage medium - Google Patents


Info

Publication number
WO2020248780A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
living body
probability
frame
sample
Prior art date
Application number
PCT/CN2020/091047
Other languages
French (fr)
Chinese (zh)
Inventor
王鹏
姚聪
卢江虎
李念
Original Assignee
北京迈格威科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京迈格威科技有限公司 filed Critical 北京迈格威科技有限公司
Publication of WO2020248780A1 publication Critical patent/WO2020248780A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components, by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters, with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive

Definitions

  • The embodiments of the present application relate to the field of data processing technology, and in particular to a living body detection method and apparatus, an electronic device, and a readable storage medium.
  • Identification technology in the field of data processing is widely used in security, finance, and other fields, for example in access control unlocking, mobile phone unlocking, remote payment, and remote account opening based on face recognition, palmprint recognition, or fingerprint recognition.
  • The security of identification technology is receiving more and more attention. For example, when a recognition object is recognized through a device, it must be determined that the recognition object comes from a real person. For this reason, related technologies have proposed living body detection methods.
  • The detection method proposed by related technologies is as follows: the object to be detected is first required to complete specified facial actions, such as opening the mouth and blinking, in front of the camera; the camera captures face images of these actions, and based on the face images the processor determines whether the object to be detected is a living body.
  • However, facial actions such as opening the mouth and blinking affect the accuracy of face recognition and reduce the user experience.
  • In addition, this living body detection is performed on a single image, so its accuracy is low.
  • The embodiments of the present application provide a living body detection method and apparatus, an electronic device, and a readable storage medium, aiming to improve the accuracy of living body detection.
  • A first aspect of the embodiments of the present application provides a living body detection method, the method including: extracting multiple frames of video images from a video collected for an object to be detected; for each frame of video image in the multiple frames of video images, determining, according to the characteristics of the frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body; and determining, according to the first probability corresponding to each of the multiple frames of video images, whether the object to be detected is a living body.
  • Optionally, the method further includes: determining, according to the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation represents whether the object to be detected is a living body.
  • Correspondingly, determining whether the object to be detected is a living body includes: determining whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images.
  • Optionally, the method further includes: splicing the respective characteristics of the multiple frames of video images to obtain the characteristics of the video, where the video characteristics are used to characterize the inter-frame correlation.
  • Optionally, determining whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images includes: assigning weights to the second probability and the first probabilities corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities; and determining whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images.
  • Optionally, the method further includes: obtaining a sample video set, where the sample video set includes a plurality of sample videos carrying tags, and the tag carried by a sample video represents whether the sample video is a video collected for a living body; for each tagged sample video included in the sample video set, extracting multiple frames of sample video images from the tagged sample video; inputting each frame of sample video image into the convolutional layer of a model to be trained to obtain the characteristics of the frame of sample video image; inputting the characteristics of the frame of sample video image into the first fully connected layer of the model to be trained to obtain a third probability corresponding to the frame of sample video image, where the third probability characterizes whether the frame of sample video image is derived from a video collected for a living body; inputting the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video is a video collected for a living body; and establishing a loss function according to the estimated probability and the third probability corresponding to each of the multiple frames of sample video images, so as to update the model to be trained and obtain the living body detection model.
  • Optionally, determining, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body includes: inputting the characteristics of the frame of video image into the first fully connected layer of the living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • Optionally, the method further includes: splicing the respective characteristics of the multiple frames of sample video images to obtain the characteristics of the sample video; and inputting the characteristics of the sample video into the third fully connected layer of the model to be trained to obtain a fourth probability of whether the sample video is a video collected for a living body.
  • Correspondingly, inputting the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body includes: inputting the fourth probability and the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • Optionally, determining whether the object to be detected is a living body according to the first probability corresponding to each of the multiple frames of video images includes: inputting the first probability corresponding to each of the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  • Optionally, the method further includes: obtaining the video collected by a video collection device when the object to be detected is in a silent state.
  • a second aspect of the embodiments of the present application provides a living body detection device, which includes:
  • the first extraction module is configured to extract multiple frames of video images from the video collected for the object to be detected
  • the first determining module is configured to, for each frame of video image in the multiple frames of video images, determine the first probability that the frame of video image represents whether the object to be detected is a living body according to the characteristics of the frame of video image;
  • the second determining module is configured to determine whether the object to be detected is a living body according to the first probability corresponding to each of the multiple frames of video images.
  • the device further includes:
  • a third determining module configured to determine, according to the inter-frame correlation of the multiple frames of video images, the second probability that the inter-frame correlation characterizes whether the object to be detected is a living body
  • the second determining module includes:
  • the first determination submodule is configured to determine whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images.
  • the device further includes:
  • the first splicing module is used to splice the respective characteristics of the multiple frames of video images to obtain the characteristics of the video, and the video characteristics are used to characterize the inter-frame correlation.
  • the first determining submodule includes:
  • an allocation subunit, configured to assign weights to the second probability and the first probabilities corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities;
  • a determining subunit, configured to determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images.
  • the device further includes:
  • the first obtaining module is configured to obtain a sample video set, where the sample video set includes a plurality of sample videos carrying tags, and the tag carried by a sample video indicates whether the sample video is a video collected for a living body;
  • the second extraction module is configured to extract a multi-frame sample video image from the sample video with a mark for each sample video with a mark included in the sample video set;
  • the first input module is configured to input each frame of the sample video image in the multi-frame sample video image into the convolutional layer of the model to be trained to obtain the characteristics of the frame sample video image;
  • the second input module is used to input the characteristics of the frame of sample video image into the first fully connected layer of the model to be trained to obtain a third probability corresponding to the frame of sample video image, where the third probability represents whether the frame of sample video image is derived from a video collected for a living body;
  • the third input module is configured to input the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video is a video collected for a living body;
  • the second obtaining module is configured to establish a loss function according to the estimated probability and the third probability corresponding to each of the multi-frame sample video images, so as to update the model to be trained and obtain a live detection model;
  • the first determining module includes:
  • the first input submodule is configured to input each frame of the video image in the multi-frame video image into the convolutional layer of the living body detection model to obtain the characteristics of the frame of video image;
  • the second input submodule is configured to input the characteristics of the frame of video image into the first fully connected layer of the living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • the device further includes:
  • the second splicing module is used to splice the respective characteristics of the multiple frames of sample video images to obtain the characteristics of the sample video;
  • the fourth input module is configured to input the characteristics of the sample video into the third fully connected layer of the model to be trained to obtain the fourth probability of whether the sample video is a video collected for a living body;
  • the third input module includes:
  • the third input sub-module is used to input the fourth probability and the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • the third determining module includes:
  • the fourth input sub-module is configured to input the first probability corresponding to each of the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  • the device further includes:
  • the third obtaining module is used to obtain the video captured by the video capturing device when the object to be detected is in a silent state.
  • A third aspect of the embodiments of the present application provides a readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method described in the first aspect of the present application are implemented.
  • A fourth aspect of the embodiments of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the method described in the first aspect of the present application are implemented.
  • In the living body detection method provided in this application, multiple frames of video images are extracted from the video collected for the object to be detected; for each frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the characteristics of the frame of video image; finally, whether the object to be detected is a living body is comprehensively determined according to the determined multiple first probabilities.
  • The living body detection method provided in this application performs living body detection based on a video collected for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize the video; then, for each frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the characteristics of the frame of video image; finally, whether the object to be detected is a living body is comprehensively determined according to the determined multiple first probabilities. Compared with the prior art, which performs living body detection on a single image, this application performs living body detection on the basis of video, and the detection result is more accurate.
  • In addition, because the living body detection method provided by the present application extracts multiple frames of video images from the video collected for the object to be detected, the redundant information of the video can be reduced, thereby reducing the amount of calculation and improving detection efficiency.
  • Moreover, the living body detection method provided by the present application does not require the object to be detected to complete specified facial actions such as opening the mouth and blinking in front of the camera. This not only avoids the impact of facial actions on the accuracy of face recognition, but also enables users to complete living body detection without making specified facial actions, thereby improving the user experience.
  • FIG. 1 is a schematic diagram of a training process of a living body detection model in an embodiment of the present application.
  • FIG. 2 is a flowchart of a living body detection method proposed in an embodiment of the present application.
  • FIG. 3 is another flowchart of the living body detection method proposed by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a living body detection device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • In these application scenarios, the device needs to collect the user's fingerprint or palm print, or needs to photograph identification objects such as the user's face. Taking the face or palm print of the photographed object as an example, an attacker might show another person's face photo or palm print photo to the camera in order to pass verification without that person's permission and gain unauthorized access to that person's account. It is therefore necessary to perform living body detection on the face or palm print in the image captured by the camera, that is, to determine whether it comes from a living body.
  • A living body judgment method provided by related technologies works as follows: first, the object to be detected is required to complete specified facial actions, such as opening the mouth and blinking, in front of the lens; the lens collects face images of the specified facial actions, and based on these face images the processor judges whether the object to be detected is a living body.
  • However, facial actions such as opening the mouth and blinking affect the accuracy of face recognition and reduce the user experience.
  • In addition, this living body detection is performed on a single image, so its accuracy is low.
  • For this reason, the applicant proposes performing living body detection based on a video collected for the object to be detected.
  • Specifically, this application extracts multiple frames of video images from the video; then, for each frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the characteristics of the frame of video image; finally, according to the determined multiple first probabilities, whether the object to be detected is a living body is comprehensively determined.
  • Compared with performing living body detection on a single image, this application performs living body detection on the basis of video, and the detection result is more accurate.
  • In order to implement the above method, the applicant first constructed a model to be trained and trained it based on a sample video set to obtain a living body detection model (for example, the first living body detection model or the second living body detection model described below); the applicant then uses the living body detection model to perform some or all of the steps of the above method.
  • FIG. 1 is a schematic diagram of a training process of a living body detection model in an embodiment of the present application.
  • the living body detection model includes: a convolutional layer, a first fully connected layer, and a second fully connected layer.
  • the convolutional layer can specifically adopt a convolutional neural network.
  • Correspondingly, the model to be trained also includes a convolutional layer, a first fully connected layer, and a second fully connected layer; after training, the model parameters of the model to be trained are updated and adjusted, and the living body detection model is finally obtained.
  • In the following, each step is introduced taking a sample video set about human faces as an example. It should be understood that the type of sample video set is not limited to human faces; for example, it can also be a sample video set about palm prints. If the model to be trained is trained based on a sample video set about palm prints, the finally obtained living body detection model can be used for living body detection on palmprint videos.
  • S110: Obtain a sample video set, where the sample video set includes a plurality of sample videos carrying tags, and the tag carried by a sample video indicates whether the sample video is a video collected for a living body.
  • part or all of the sample videos in the sample video set may be videos collected by the video collection device when the training participant is in a silent state.
  • When collecting videos of the training participants, the training participants only need to look at the video collection device; they are not required to complete designated facial actions such as opening their mouths, blinking, or reading aloud in front of the camera.
  • For example, a silent video can be taken of the face of each of multiple training participants (real people). The duration of each video can be controlled within 1 to 3 seconds, and each such video taken of a real person is labeled so that it carries a tag indicating that it is a video collected from a living body.
  • Likewise, a video can be shot for each of multiple non-living bodies such as printed photos, screen photos, and masks; the length of each video can again be controlled within 1 to 3 seconds. Each such video shot of a non-living body is marked so that it carries a tag indicating that it is not a video collected from a living body.
  • S120: For each tagged sample video included in the sample video set, extract multiple frames of sample video images from the tagged sample video.
  • Specifically, each tagged sample video can first be divided into N sub-segments, and one frame of RGB video image is then extracted from each sub-segment as a sample video image; in total, N frames of sample video images are thus extracted from each tagged sample video.
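  • As a concrete illustration of this sub-segment sampling, the sketch below decodes a video with OpenCV and takes the middle RGB frame of each of N equal sub-segments. The use of OpenCV, the function name, and the choice of the middle frame are assumptions made for illustration; the embodiment only requires one frame per sub-segment.

```python
import cv2  # OpenCV is an assumption; the embodiment does not name a decoding library

def sample_frames(video_path, n_segments=8):
    """Split a video into N equal sub-segments and take the middle RGB frame
    of each (a sketch of step S120, not the patent's exact procedure)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    seg_len = total // n_segments
    frames = []
    for i in range(n_segments):
        # middle frame index of the i-th sub-segment
        idx = i * seg_len + seg_len // 2
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame_bgr = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames  # N RGB frames, one per sub-segment
```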
  • S130: Input each frame of sample video image in the multiple frames of sample video images into the convolutional layer of the model to be trained to obtain the characteristics of the frame of sample video image.
  • Specifically, the N frames of sample video images can be sequentially input into the convolutional neural network of the model to be trained, and the convolutional neural network outputs a three-dimensional convolution feature for each frame of sample video image, that is, the feature of the frame of sample video image.
  • It should be noted that the multiple frames of sample video images can share one convolutional neural network, or each frame of sample video image can correspond to its own convolutional neural network; therefore, the number of convolutional neural networks included in the model to be trained may be one or more.
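  • The following minimal PyTorch sketch illustrates one shared convolutional network producing a three-dimensional convolution feature per frame. The backbone architecture, input resolution, and channel count are assumptions chosen to match the 36*36*25 feature size used in the example of step S142 below; the embodiment does not fix a particular architecture.

```python
import torch
import torch.nn as nn

# A minimal stand-in for the convolutional layer; the real backbone is not specified.
backbone = nn.Sequential(
    nn.Conv2d(3, 25, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((36, 36)),  # yields a 25x36x36 feature per frame
)

frames = torch.randn(8, 3, 144, 144)  # N=8 sample video images (dummy data)
features = backbone(frames)           # one shared network applied to all frames
print(features.shape)                 # torch.Size([8, 25, 36, 36])
```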
  • S140: Input the features of the frame of sample video image into the first fully connected layer of the model to be trained to obtain a third probability corresponding to the frame of sample video image, where the third probability represents whether the frame of sample video image is derived from a video collected for a living body.
  • Specifically, the features of the N frames of sample video images can be sequentially input into the first fully connected layer, and for the features of each frame of sample video image the first fully connected layer outputs a probability vector of the form (x, y), namely the third probability, where x represents the probability that the frame of sample video image is derived from a video collected for a living body, and y represents the probability that it is derived from a video collected for a non-living body.
  • S150: Input the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • Specifically, the N third probabilities corresponding to the N frames of sample video images can be input into the second fully connected layer, and the second fully connected layer outputs a probability vector of the form (X, Y), namely the estimated probability, where X represents the probability that the sample video is a video collected for a living body, and Y represents the probability that it is a video collected for a non-living body.
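  • Steps S130 to S150 can be summarized by the following sketch of the forward pass: the first fully connected layer maps each per-frame feature to a two-element vector (x, y), and the second fully connected layer maps the N concatenated third probabilities to the video-level vector (X, Y). The layer sizes and the use of raw linear outputs rather than softmax-normalized probabilities are assumptions.

```python
import torch
import torch.nn as nn

N = 8                       # frames extracted per sample video
feat_dim = 25 * 36 * 36     # flattened per-frame convolution feature (assumed size)

fc1 = nn.Linear(feat_dim, 2)   # first fully connected layer: per-frame (x, y)
fc2 = nn.Linear(N * 2, 2)      # second fully connected layer: video-level (X, Y)

features = torch.randn(N, feat_dim)        # per-frame features from the convolutional layer
third_probs = fc1(features)                # N third probabilities, shape (N, 2)
estimated = fc2(third_probs.flatten())     # estimated probability (X, Y) for the whole video
print(third_probs.shape, estimated.shape)  # torch.Size([8, 2]) torch.Size([2])
```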
  • S160: Establish a loss function according to the estimated probability and the third probability corresponding to each of the multiple frames of sample video images, so as to update the model to be trained and obtain the living body detection model.
  • Specifically, the loss function is established based on the estimated probability, such as the probability vector (X, Y), and the third probabilities corresponding to the multiple frames of sample video images, such as the probability vectors (x, y). Using the gradient descent method, the parameters of the model to be trained are updated, and the updated model is put into the next round of training; after multiple rounds of training, the living body detection model is obtained. For example, training may end after a fixed number M of rounds, such as 1000 rounds. For another example, training may end when the loss function over multiple consecutive rounds reflects that the model to be trained can accurately predict whether a sample video is collected for a living body.
  • One way of establishing the loss function is as follows: the N third probabilities are each compared with the tag carried by the sample video, where each third probability is a prediction result and the tag represents the ground truth, yielding N first comparison results; the N first comparison results represent the accuracy of the model's per-frame predictions in this round of training. Similarly, the estimated probability is compared with the tag to obtain a second comparison result, which represents the accuracy of the model's video-level prediction in this round of training.
  • Based on the loss function, the parameters of the model to be trained are adjusted to update the model to be trained; the updated model is put into the next round of training, and after multiple rounds of training the living body detection model is obtained.
  • In this way, on the one hand, the convergence speed of the model to be trained can be accelerated; on the other hand, the parameters of the model to be trained are updated based not only on the model's prediction accuracy for the sample video but also on its prediction accuracy for each frame of sample video image, so that the final living body detection model can output more accurate prediction results.
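  • A sketch of the loss described in step S160, under the assumption that both the video-level estimated probability and the N per-frame third probabilities are penalized with cross entropy against the sample video's tag and summed with equal weight; the embodiment does not specify the exact loss form or weighting.

```python
import torch
import torch.nn.functional as F

def combined_loss(estimated, third_probs, label):
    """Loss over the video-level estimated probability and the N per-frame
    third probabilities (the equal weighting is an assumption)."""
    video_loss = F.cross_entropy(estimated.unsqueeze(0), label)
    # every frame of a tagged-live video is treated as live, and vice versa
    frame_labels = label.repeat(third_probs.size(0))
    frame_loss = F.cross_entropy(third_probs, frame_labels)
    return video_loss + frame_loss

estimated = torch.randn(2, requires_grad=True)        # video-level logits (dummy)
third_probs = torch.randn(8, 2, requires_grad=True)   # per-frame logits (dummy)
label = torch.tensor([0])  # 0 = collected for a living body (assumed encoding)

loss = combined_loss(estimated, third_probs, label)
loss.backward()  # a gradient descent update (e.g. torch.optim.SGD) would follow
```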
  • After the above training process, the first living body detection model is obtained.
  • Using the first living body detection model, some or all of the following steps can be performed: extract multiple frames of video images from the video collected for the object to be detected; then, for each frame of video image, determine, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body; finally, according to the determined multiple first probabilities, comprehensively determine whether the object to be detected is a living body.
  • In addition, the applicant found that besides the multiple frames of video images extracted from a video, the inter-frame correlation of those video images can also characterize the video. If the video is characterized by both the multiple frames of video images and their inter-frame correlation, the inter-frame correlation can additionally be introduced when performing living body detection, which can further improve the accuracy of living body detection.
  • For this reason, the applicant further proposes introducing the inter-frame correlation into the living body detection method: first, the second probability that the inter-frame correlation represents whether the object to be detected is a living body is determined; then, whether the object to be detected is a living body is determined according to the second probability and the first probability corresponding to each of the multiple frames of video images, thereby further improving the accuracy of living body detection.
  • To implement this improved method, the applicant likewise constructed a model to be trained and trained it based on a sample video set to obtain a living body detection model, which is then used to perform some or all of the steps of the method further proposed above.
  • In addition to the structure described above, the living body detection model may also include a feature combination module and a third fully connected layer. It should be understood that the model structure of the model to be trained constructed in advance by the applicant is the same as the model structure of the living body detection model shown in FIG. 1; in other words, the model to be trained may also include a feature combination module and a third fully connected layer, and after training, the model parameters of the model to be trained are updated and adjusted to finally obtain the living body detection model.
  • To train this model, an embodiment of the present application further proposes steps S142, S144, and S150' on the basis of steps S110, S120, S130, S140, and S160. Among them, steps S130, S140, S142, S144, S150', and S160 constitute each round of the multi-round training:
  • S142: Splice the respective features of the multiple frames of sample video images to obtain the feature of the sample video.
  • Specifically, the feature combination module can be used to stack the N three-dimensional convolution features to obtain a new three-dimensional convolution feature as the feature of the sample video.
  • The feature of the sample video can characterize the inter-frame correlation of the multiple frames of sample video images. For example, after step S130, eight 36*36*25 convolution features are obtained; after these eight 36*36*25 convolution features are stacked, a 36*36*200 convolution feature is obtained, and this 36*36*200 convolution feature is used as the feature of the sample video.
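  • The stacking in step S142 amounts to concatenating the per-frame features along the channel axis, as in the sketch below (the channels-first layout is an assumption):

```python
import torch

# eight per-frame convolution features of size 25x36x36 (channels first)
frame_feats = [torch.randn(25, 36, 36) for _ in range(8)]

# stacking along the channel axis yields one 200x36x36 video feature,
# matching the 36*36*200 example above
video_feat = torch.cat(frame_feats, dim=0)
print(video_feat.shape)  # torch.Size([200, 36, 36])
```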
  • S144: Input the feature of the sample video into the third fully connected layer of the model to be trained to obtain a fourth probability of whether the sample video is a video collected for a living body.
  • Specifically, the feature of the sample video can be input into the third fully connected layer, and the third fully connected layer outputs a probability vector of the form (x', y') for the feature of the sample video, namely the fourth probability, where x' represents the probability that the sample video is a video collected for a living body, and y' represents the probability that it is a video collected for a non-living body.
  • S150': Input the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • This step specifically includes: inputting the fourth probability and the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • During specific implementation, the fourth probability corresponding to the sample video and the N third probabilities corresponding to the N frames of sample video images can be input into the second fully connected layer, and the second fully connected layer outputs a probability vector of the form (X, Y), namely the estimated probability, where X represents the probability that the sample video is a video collected for a living body, and Y represents the probability that it is a video collected for a non-living body.
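  • Compared with the first model, the second fully connected layer now also receives the fourth probability. A minimal sketch of this extended forward pass follows, with layer sizes assumed to match the running example:

```python
import torch
import torch.nn as nn

N = 8
fc3 = nn.Linear(200 * 36 * 36, 2)  # third fully connected layer (input size assumed)
fc2 = nn.Linear(N * 2 + 2, 2)      # second fully connected layer, now fed the fourth probability too

video_feat = torch.randn(200 * 36 * 36)  # flattened spliced video feature
third_probs = torch.randn(N, 2)          # per-frame third probabilities
fourth_prob = fc3(video_feat)            # (x', y') from the inter-frame correlation
estimated = fc2(torch.cat([fourth_prob, third_probs.flatten()]))  # video-level (X, Y)
print(estimated.shape)  # torch.Size([2])
```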
  • After the above training process, the second living body detection model is obtained.
  • Using the second living body detection model, some or all of the following steps can be executed: extract multiple frames of video images from the video collected for the object to be detected; for each frame of video image, determine, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body; determine, according to the inter-frame correlation of the multiple frames of video images, the second probability that the inter-frame correlation represents whether the object to be detected is a living body; and finally determine, according to the second probability and the first probability corresponding to each of the multiple frames of video images, whether the object to be detected is a living body.
  • the foregoing embodiments of the present application mainly propose two training processes for the model to be trained based on a sample video set, and finally obtain the first living body detection model and the second living body detection model respectively.
  • In the following, this application focuses on the living body detection method itself and schematically introduces how to apply the first living body detection model or the second living body detection model to it.
  • FIG. 2 is a flowchart of a living body detection method proposed in an embodiment of the present application. As shown in Figure 2, the method includes the following steps:
  • S22: Extract multiple frames of video images from the video collected for the object to be detected.
  • The object to be detected refers to an object that needs to be detected as to whether it is a living body.
  • In some embodiments, the object to be detected is not limited to a face to be detected; for example, it may also be a palm print or fingerprint to be detected. If the object to be detected is a palm print, the video collected for the object to be detected is a video shot of the palm print to be detected.
  • the method further includes: obtaining a video collected by the video collection device when the object to be detected is in a silent state.
  • the video collected for the object to be detected may be a silent video collected for the object to be detected.
  • During specific implementation, a video is collected for the object to be detected, for example a short video of 1 to 3 seconds.
  • In other words, when collecting a video from a user, the user only needs to look at the video capture device and is not required to complete specified facial actions such as opening the mouth, blinking, or reading aloud in front of the camera. This not only avoids the impact of facial actions on the accuracy of face recognition, but also enables users to complete living body detection without making specified facial actions, thereby improving the user experience.
  • Optionally, when multiple frames of video images are extracted from a video, they may be extracted at equal inter-frame intervals, and the extracted video images may be RGB images. For example, one frame of video image is extracted at an interval of every 5 frames. Taking a video including 48 frames of video images as an example, the extracted video images are: frame 6, frame 12, frame 18, frame 24, frame 30, frame 36, frame 42, and frame 48.
  • the video when extracting multiple frames of video images from a video, the video may be divided into multiple sub-segments first, and then one frame of video image is extracted from each sub-segment.
  • the video is equally divided into N sub-segments, and for each sub-segment, one frame of video image is randomly extracted therefrom, or one frame of video image is extracted from the middle of the sub-segment.
  • In this way, the multiple frames of video images are extracted at equal inter-frame intervals, or by dividing the video into multiple sub-segments and extracting one frame of video image from each sub-segment, so that the extracted frames are evenly distributed among the video images of the video. The multiple frames of video images can thus more accurately characterize the content of the video, thereby further improving the accuracy of living body detection.
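  • Both extraction schemes reduce to choosing frame indices, as the sketch below shows; the step size, segment count, and the choice of the middle frame of each sub-segment are assumptions (the embodiment also allows random selection within a sub-segment).

```python
def equal_interval_indices(total_frames, step=6):
    """Indices for equal-interval extraction; with 48 frames and a step of 6
    this yields frames 6, 12, ..., 48 as in the example (1-based)."""
    return list(range(step, total_frames + 1, step))

def sub_segment_indices(total_frames, n_segments=8):
    """One middle frame per equal sub-segment (the alternative scheme)."""
    seg_len = total_frames // n_segments
    return [i * seg_len + seg_len // 2 + 1 for i in range(n_segments)]

print(equal_interval_indices(48))  # [6, 12, 18, 24, 30, 36, 42, 48]
print(sub_segment_indices(48))     # [4, 10, 16, 22, 28, 34, 40, 46]
```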
  • In this way, the multiple frames of video images extracted from the video collected for the object to be detected are used to characterize the video, so that the living body detection method proposed in this application performs detection on the basis of the video.
  • Compared with the prior art, which performs living body detection on a single image, this application performs living body detection on the basis of video, and the detection result is more accurate.
  • In addition, because this application extracts multiple frames of video images from the video collected for the object to be detected, the redundant information of the video can be reduced, thereby reducing the amount of calculation and improving detection efficiency.
  • S24: For each frame of video image in the multiple frames of video images, determine, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body.
  • the feature of the video image may be a convolution feature.
  • During specific implementation, the first living body detection model obtained through training may be used. Specifically, each frame of video image in the multiple frames of video images is input into the convolutional layer of the living body detection model to obtain the characteristics of the frame of video image; the characteristics of the frame of video image are then input into the first fully connected layer of the living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • In some embodiments, the first probability corresponding to each frame of video image may be a probability vector of the form (x, y), where x represents the probability that the object to be detected is a living body, and y represents the probability that the object to be detected is a non-living body.
  • It should be noted that the feature of each frame of video image can be obtained through a convolutional neural network, or extracted using other image feature extraction methods. The feature of each frame of video image is then input into the first fully connected layer of the first living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • S26: Determine whether the object to be detected is a living body according to the first probability corresponding to each of the multiple frames of video images.
  • the first living body detection model obtained through training may be used to determine whether the object to be detected is a living body.
  • the first probability corresponding to each of the multiple frames of video images is input into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  • Specifically, each frame of video image in the multiple frames of video images is input into the convolutional layer of the living body detection model, and the convolutional layer outputs the characteristics of each frame of video image; the characteristics of each frame of video image are then input into the first fully connected layer of the living body detection model, which outputs the first probability corresponding to each frame of video image; the first probabilities corresponding to the frames of video images are then input into the second fully connected layer of the living body detection model, which outputs an estimated probability, a comprehensive probability that characterizes whether the object to be detected is a living body.
  • the estimated probability may be a probability vector in the form of (X, Y), where X represents the comprehensive probability that the object to be detected is a living body, and Y represents the comprehensive probability that the object to be detected is a non-living body.
  • Alternatively, the average value of the multiple first probabilities may be calculated to determine whether the object to be detected is a living body.
  • For example, the first probability corresponding to each frame of video image is a probability vector of the form (x, y), where x represents the probability that the object to be detected is a living body, and y represents the probability that the object to be detected is a non-living body. Assuming the probability vectors corresponding to the 8 frames of video images extracted from the video are (35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4), (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), and (21.4, 22.8), the averaged probability vector computed from these 8 probability vectors is (38.7, 11.0); since the probability that the object to be detected is a living body is greater than the probability that it is a non-living body, the object to be detected is determined to be a living body.
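  • The averaging in this example can be reproduced directly (the values are taken from the text above):

```python
# the eight per-frame probability vectors from the example
first_probs = [(35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4),
               (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), (21.4, 22.8)]

avg_live = sum(p[0] for p in first_probs) / len(first_probs)     # 38.7
avg_nonlive = sum(p[1] for p in first_probs) / len(first_probs)  # 11.0

# the object is judged a living body because 38.7 > 11.0
is_live = avg_live > avg_nonlive
print(round(avg_live, 1), round(avg_nonlive, 1), is_live)  # 38.7 11.0 True
```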
  • In the living body detection method provided by this embodiment, living body detection is performed based on a video collected for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize the video; then, for each frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the characteristics of the frame of video image; finally, whether the object to be detected is a living body is comprehensively determined according to the determined multiple first probabilities. Compared with the prior art, which performs living body detection on a single image, this application performs living body detection on the basis of video, and the detection result is more accurate.
  • In addition, the inter-frame correlation of the multiple frames of video images can also be used to characterize the video. If the video is characterized by both the multiple frames of video images and their inter-frame correlation, further introducing the inter-frame correlation when performing living body detection can further improve the accuracy of living body detection.
  • FIG. 3 is another flowchart of the living body detection method proposed in an embodiment of the present application. As shown in Figure 3, the method includes the following steps:
  • S24: For each frame of video image in the multiple frames of video images, determine, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body.
  • The inter-frame correlation refers to information between the frames of the multiple frames of video images. Specifically, for each frame of video image in the multiple frames of video images, the feature of the frame of video image can be extracted, and the respective features of the multiple frames of video images can be spliced to obtain the feature of the video; the video feature is used to characterize the inter-frame correlation.
  • During specific implementation, the multiple frames of video images can be input into the convolutional layer of the second living body detection model obtained through training, and the convolutional layer outputs the three-dimensional convolution feature of each frame of video image, that is, the feature of the video image. Multiple three-dimensional convolution features are then stacked to obtain a new three-dimensional convolution feature as the feature of the video, and the video feature is used to characterize the inter-frame correlation.
  • For example, the convolutional layer of the living body detection model outputs eight 36*36*25 convolution features; the feature combination module of the living body detection model stacks these eight 36*36*25 convolution features to obtain a 36*36*200 convolution feature, and this 36*36*200 convolution feature is used as the video feature.
  • During specific implementation, the video feature characterizing the inter-frame correlation may be input into the third fully connected layer of the second living body detection model obtained through training, and the third fully connected layer outputs, according to the video feature, a second probability that characterizes whether the object to be detected is a living body.
  • In some embodiments, the second probability may be a probability vector of the form (x', y'), where x' represents the probability that the object to be detected is a living body, and y' represents the probability that the object to be detected is a non-living body.
  • S26': Determine whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images.
  • During specific implementation, the second living body detection model obtained through training can be used to determine whether the object to be detected is a living body.
  • Specifically, the second probability that the inter-frame correlation characterizes whether the object to be detected is a living body, and the first probability corresponding to each of the multiple frames of video images, may be input into the second fully connected layer of the second living body detection model; the second fully connected layer outputs an estimated probability, which is a comprehensive probability that characterizes whether the object to be detected is a living body.
  • the estimated probability may be a probability vector in the form of (X, Y), where X represents the comprehensive probability that the object to be detected is a living body, and Y represents the comprehensive probability that the object to be detected is a non-living body.
  • determining whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images may specifically include:
  • S26'-1: Assign weights to the second probability and the first probabilities corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities.
  • S26'-2: Determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images.
  • For example, after 8 frames of video images are input into the living body detection model, they pass through the convolutional layer and the first fully connected layer of the living body detection model, which output the first probabilities corresponding to the 8 frames of video images. Assume the 8 first probabilities are: (35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4), (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), and (21.4, 22.8).
  • Meanwhile, the convolutional layer outputs a 36*36*25 convolution feature for each frame of video image; the feature combination module stacks these eight 36*36*25 convolution features to obtain a 36*36*200 convolution feature, and this 36*36*200 convolution feature is used as the feature of the video, representing the inter-frame correlation of the multiple frames of video images. After the video feature passes through the third fully connected layer, the second probability is output; assume the second probability is (50.1, 3.5).
  • Weights are then assigned to the second probability and the first probabilities corresponding to the multiple frames of video images; for example, the weight assigned to the second probability is 1/2, and the weight assigned to each first probability is 1/16.
  • Finally, the weighted average probability is calculated according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images, and whether the object to be detected is a living body is determined according to the weighted average probability. In this example, the weighted average probability is (44.4, 7.3); since the probability that the object to be detected is a living body is greater than the probability that it is a non-living body, the object to be detected is determined to be a living body.
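  • Reproducing this weighted combination with the values from the example:

```python
second_prob = (50.1, 3.5)  # from the inter-frame correlation
first_probs = [(35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4),
               (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), (21.4, 22.8)]

w_second, w_first = 1 / 2, 1 / 16  # weights from the example
weighted_live = w_second * second_prob[0] + w_first * sum(p[0] for p in first_probs)
weighted_nonlive = w_second * second_prob[1] + w_first * sum(p[1] for p in first_probs)

# 44.4 7.2 (the text rounds the second component to 7.3)
print(round(weighted_live, 1), round(weighted_nonlive, 1))
# 44.4 > 7.2, so the object to be detected is judged to be a living body
```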
  • By performing steps S26'-1 and S26'-2 to assign a larger weight to the second probability, the contribution of the inter-frame correlation of the multiple frames of video images, both in characterizing the video and in improving detection accuracy during living body detection, can be emphasized, thereby further improving the accuracy of living body detection.
  • FIG. 4 is a schematic diagram of a living body detection device provided by an embodiment of the present application. As shown in Figure 4, the device includes:
  • the first extraction module 41 is configured to extract multiple frames of video images from the video collected for the object to be detected;
  • the first determining module 42 is configured to, for each frame of video image in the multi-frame video image, determine the first probability that the frame of video image represents whether the object to be detected is a living body according to the characteristics of the frame of video image;
  • the second determining module 43 is configured to determine whether the object to be detected is a living body according to the first probability corresponding to each of the multiple frames of video images.
  • the device further includes:
  • a third determining module configured to determine, according to the inter-frame correlation of the multiple frames of video images, the second probability that the inter-frame correlation characterizes whether the object to be detected is a living body
  • the second determining module includes:
  • the first determination submodule is configured to determine whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images.
  • the device further includes:
  • the first splicing module is used to splice the respective characteristics of the multiple frames of video images to obtain the characteristics of the video, and the video characteristics are used to characterize the inter-frame correlation.
  • the first determining submodule includes:
  • an allocation subunit, configured to assign weights to the second probability and the first probabilities corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities;
  • a determining subunit, configured to determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images.
  • the device further includes:
  • the first obtaining module is configured to obtain a sample video set, where the sample video set includes a plurality of sample videos carrying tags, and the tag carried by a sample video indicates whether the sample video is a video collected for a living body;
  • the second extraction module is configured to extract a multi-frame sample video image from the sample video with a mark for each sample video with a mark included in the sample video set;
  • the first input module is configured to input each frame of the sample video image in the multi-frame sample video image into the convolutional layer of the model to be trained to obtain the characteristics of the frame sample video image;
  • the second input module is used to input the characteristics of the frame of sample video image into the first fully connected layer of the model to be trained to obtain a third probability corresponding to the frame of sample video image, where the third probability represents whether the frame of sample video image is derived from a video collected for a living body;
  • the third input module is configured to input the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video is a video collected for a living body;
  • the second obtaining module is configured to establish a loss function according to the estimated probability and the third probability corresponding to each of the multi-frame sample video images, so as to update the model to be trained and obtain a live detection model;
  • the first determining module includes:
  • the first input submodule is configured to input each frame of the video image in the multi-frame video image into the convolutional layer of the living body detection model to obtain the characteristics of the frame of video image;
  • the second input submodule is configured to input the characteristics of the frame of video image into the first fully connected layer of the living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • the device further includes:
  • the second splicing module is used to splice the respective characteristics of the multiple frames of sample video images to obtain the characteristics of the sample video;
  • the fourth input module is configured to input the characteristics of the sample video into the third fully connected layer of the model to be trained to obtain the fourth probability of whether the sample video is a video collected for a living body;
  • the third input module includes:
  • the third input sub-module is used to input the fourth probability and the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • the third determining module includes:
  • the fourth input sub-module is configured to input the first probability corresponding to each of the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  • the device further includes:
  • the third obtaining module is used to obtain the video captured by the video capturing device when the object to be detected is in a silent state.
  • an embodiment of the present application also provides an electronic device.
  • A schematic structural diagram of the electronic device is shown in FIG. 5.
  • The electronic device 7000 includes at least one processor 7001, a memory 7002, and a bus 7003; the processor 7001 is electrically connected to the memory 7002. The memory 7002 is configured to store at least one computer-executable instruction, and the processor 7001 is configured to execute the at least one computer-executable instruction, so as to perform the steps of any living body detection method provided by any embodiment or any optional implementation of the present application.
  • the processor 7001 may be an FPGA (Field-Programmable Gate Array) or other devices with logic processing capabilities, such as MCU (Microcontroller Unit), CPU (Central Process Unit, Central Processing Unit) ).
  • the living body detection method provided by the present application performs living body detection on the basis of a video collected for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize the video; then, for each frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the features of that frame; finally, whether the object to be detected is a living body is determined comprehensively according to the determined first probabilities. Compared with the prior art, which performs living body detection on a single image, the present application performs living body detection on the basis of a video, so the detection result is more accurate.
  • because the living body detection method provided by the present application extracts multiple frames of video images from the video collected for the object to be detected, the redundant information of the video can be reduced, thereby reducing the amount of computation and improving detection efficiency.
  • the living body detection method provided by the present application does not require the object to be detected to complete specified facial actions such as opening the mouth or blinking in front of the camera, which not only avoids the impact of facial actions on the accuracy of face recognition, but also allows users to complete living body detection without making specified facial actions, thereby improving the user experience.
  • the embodiment of the present application also provides a computer-readable storage medium, such as the memory 7002 in FIG. 5, in which a computer program 7002a is stored; when executed by a processor, the computer program implements the steps of any living body detection method provided by any embodiment of the present application.
  • the computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).

Abstract

The embodiment of the present application relates to the technical field of data processing. Provided are a living body testing method and apparatus, an electronic device, and a readable storage medium. The living body testing method comprises: extracting multiple frames of video images from a video collected for an object to be tested; for each frame of video image among the multiple frames of video images, determining, according to features of that frame of video image, a first probability that the frame of video image represents whether the object to be tested is a living body; and determining whether the object to be tested is a living body according to the first probabilities respectively corresponding to the multiple frames of video images. The living body testing method provided in the present application can improve the accuracy of living body testing.

Description

Living body detection method, device, electronic device and readable storage medium
This application claims priority to the Chinese patent application with application number 201910512041.1, entitled "Living body detection method, device, electronic device and readable storage medium", filed with the Chinese Patent Office on June 13, 2019, the entire content of which is incorporated into this application by reference.
Technical field
The embodiments of the present application relate to the field of data processing technology, and in particular to a living body detection method, device, electronic device, and readable storage medium.
Background
As identification technologies in the field of data processing are widely applied in security, finance, and other fields, for example access control unlocking, mobile phone unlocking, remote payment, and remote account opening based on face recognition, palmprint recognition, or fingerprint recognition, the security of identification technologies has drawn increasing attention. For example, people are concerned with how to determine, when a recognition object is recognized by a device, that the recognition object comes from a real person. To this end, the related art has proposed living body detection methods.
Taking face recognition technology as an example, when performing living body detection on face images, the detection method proposed by the related art is as follows: the object to be detected is first required to complete specified facial actions such as opening the mouth and blinking in front of the camera; the camera collects one face image of the specified facial action, and the processor determines, based on that face image, whether the object to be detected in the face image is a living body. However, facial actions such as opening the mouth and blinking affect the accuracy of face recognition and also degrade the user experience. Moreover, whether for face recognition or palmprint recognition, living body detection is performed on the basis of a single image, so the accuracy of living body detection is low.
Summary of the invention
The embodiments of the present application provide a living body detection method, device, electronic device, and readable storage medium, aiming to improve the accuracy of living body detection.
A first aspect of the embodiments of the present application provides a living body detection method, the method including:
extracting multiple frames of video images from a video collected for an object to be detected;
for each frame of video image among the multiple frames of video images, determining, according to the features of that frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body;
determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the method further includes:
determining, according to the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation represents whether the object to be detected is a living body;
determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images includes:
determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the method further includes:
splicing the respective features of the multiple frames of video images to obtain the features of the video, the video features being used to characterize the inter-frame correlation.
Optionally, determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images includes:
assigning weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities;
determining whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
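For illustration only, the following is a minimal sketch of one possible weighted combination of this kind, in Python; the concrete weight values and the threshold decision are assumptions rather than limitations of the present application:

```python
def fuse(first_probs, second_prob, w_second=0.5):
    """Weighted combination in which the second probability's weight exceeds
    each first probability's weight (the weight values are assumptions)."""
    # first_probs: N per-frame probabilities that the object is a living body
    w_first = (1.0 - w_second) / len(first_probs)   # each < w_second for N >= 2
    score = w_second * second_prob + w_first * sum(first_probs)
    return score  # compared against a threshold to decide living / non-living
```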
Optionally, the method further includes:
obtaining a sample video set, the sample video set including a plurality of sample videos carrying labels, where the label carried by a sample video represents whether the sample video is a video collected from a living body;
for each label-carrying sample video included in the sample video set, performing the following steps:
extracting multiple frames of sample video images from the label-carrying sample video;
inputting each frame of sample video image among the multiple frames of sample video images into the convolutional layer of a model to be trained, to obtain the features of that frame of sample video image;
inputting the features of that frame of sample video image into the first fully connected layer of the model to be trained, to obtain a third probability corresponding to that frame of sample video image, the third probability representing whether that frame of sample video image originates from a video collected from a living body;
inputting the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain an estimated probability of whether the sample video is a video collected from a living body;
establishing a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain a living body detection model;
for each frame of video image among the multiple frames of video images, determining, according to the features of that frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body includes:
inputting each frame of video image among the multiple frames of video images into the convolutional layer of the living body detection model, to obtain the features of that frame of video image;
inputting the features of that frame of video image into the first fully connected layer of the living body detection model, to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
Optionally, the method further includes:
splicing the respective features of the multiple frames of sample video images to obtain the features of the sample video;
inputting the features of the sample video into a third fully connected layer of the model to be trained, to obtain a fourth probability of whether the sample video is a video collected from a living body;
inputting the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body includes:
inputting the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body.
Optionally, determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images includes:
inputting the first probabilities respectively corresponding to the multiple frames of video images into the second fully connected layer of the living body detection model, to determine whether the object to be detected is a living body.
Optionally, the method further includes:
obtaining a video collected by a video collection device while the object to be detected is in a silent state.
A second aspect of the embodiments of the present application provides a living body detection device, the device including:
a first extraction module, configured to extract multiple frames of video images from a video collected for an object to be detected;
a first determining module, configured to, for each frame of video image among the multiple frames of video images, determine, according to the features of that frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body;
a second determining module, configured to determine whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the device further includes:
a third determining module, configured to determine, according to the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation represents whether the object to be detected is a living body;
the second determining module includes:
a first determining submodule, configured to determine whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the device further includes:
a first splicing module, configured to splice the respective features of the multiple frames of video images to obtain the features of the video, the video features being used to characterize the inter-frame correlation.
Optionally, the first determining submodule includes:
an assignment subunit, configured to assign weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities;
a determining subunit, configured to determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
Optionally, the device further includes:
a first obtaining module, configured to obtain a sample video set, the sample video set including a plurality of sample videos carrying labels, where the label carried by a sample video represents whether the sample video is a video collected from a living body;
a second extraction module, configured to, for each label-carrying sample video included in the sample video set, extract multiple frames of sample video images from that label-carrying sample video;
a first input module, configured to input each frame of sample video image among the multiple frames of sample video images into the convolutional layer of a model to be trained, to obtain the features of that frame of sample video image;
a second input module, configured to input the features of that frame of sample video image into the first fully connected layer of the model to be trained, to obtain a third probability corresponding to that frame of sample video image, the third probability representing whether that frame of sample video image originates from a video collected from a living body;
a third input module, configured to input the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain an estimated probability of whether the sample video is a video collected from a living body;
a second obtaining module, configured to establish a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain a living body detection model;
the first determining module includes:
a first input submodule, configured to input each frame of video image among the multiple frames of video images into the convolutional layer of the living body detection model, to obtain the features of that frame of video image;
a second input submodule, configured to input the features of that frame of video image into the first fully connected layer of the living body detection model, to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
Optionally, the device further includes:
a second splicing module, configured to splice the respective features of the multiple frames of sample video images to obtain the features of the sample video;
a fourth input module, configured to input the features of the sample video into a third fully connected layer of the model to be trained, to obtain a fourth probability of whether the sample video is a video collected from a living body;
the third input module includes:
a third input submodule, configured to input the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body.
Optionally, the third determining module includes:
a fourth input submodule, configured to input the first probabilities respectively corresponding to the multiple frames of video images into the second fully connected layer of the living body detection model, to determine whether the object to be detected is a living body.
Optionally, the device further includes:
a third obtaining module, configured to obtain the video collected by a video collection device while the object to be detected is in a silent state.
A third aspect of the embodiments of the present application provides a readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method described in the first aspect of the present application are implemented.
A fourth aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, the steps of the method described in the first aspect of the present application are implemented.
With the living body detection method provided by the present application, multiple frames of video images are extracted from the video collected for the object to be detected; for each frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the features of that frame; and finally, whether the object to be detected is a living body is determined comprehensively according to the determined first probabilities.
In one aspect, in the living body detection method provided by the present application, living body detection is performed on the basis of a video collected for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize the video; then, for each frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the features of that frame; finally, whether the object to be detected is a living body is determined comprehensively according to the determined first probabilities. Compared with the prior art, which performs living body detection on a single image, the present application performs living body detection on the basis of a video, so the detection result is more accurate.
In another aspect, because the living body detection method provided by the present application extracts multiple frames of video images from the video collected for the object to be detected, the redundant information of the video can be reduced, thereby reducing the amount of computation and improving detection efficiency.
In yet another aspect, the living body detection method provided by the present application does not require the object to be detected to complete specified facial actions such as opening the mouth or blinking in front of the camera, which not only avoids the impact of facial actions on the accuracy of face recognition, but also allows users to complete living body detection without making specified facial actions, thereby improving the user experience.
The above description is merely an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly so that it can be implemented in accordance with the content of the description, and to make the above and other objectives, features, and advantages of the present invention more apparent and comprehensible, specific embodiments of the present invention are set forth below.
Description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.
FIG. 1 is a schematic diagram of the training process of a living body detection model in an embodiment of the present application;
FIG. 2 is a flowchart of the living body detection method proposed by an embodiment of the present application;
FIG. 3 is another flowchart of the living body detection method proposed by an embodiment of the present application;
FIG. 4 is a schematic diagram of the living body detection device provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
Specific embodiments
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are a part of the embodiments of the present invention, rather than all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
In application scenarios based on identification technology, such as access control unlocking, mobile phone unlocking, remote payment, and remote account opening, the device needs to collect the user's fingerprint or palmprint, or needs to photograph a recognition object such as the user's face or palmprint. Taking a photographed face or palmprint as an example of the recognition object: to prevent an attacker from showing a photo of another person's face or palmprint to the photographing device, thereby passing verification without that person's permission and accessing another person's account without authorization, it is necessary to perform living body detection on the face or palmprint in the photo taken by the photographing device, to determine whether it comes from a real person, that is, whether it comes from a living body.
A living body determination method provided by the related art works as follows: the object to be detected is first required to complete specified facial actions such as opening the mouth and blinking in front of the camera; the camera collects one face image of the specified facial action, and the processor determines, based on that face image, whether the object to be detected in the face image is a living body. However, facial actions such as opening the mouth and blinking affect the accuracy of face recognition and also degrade the user experience. Moreover, whether for face recognition or palmprint recognition, living body detection is performed on the basis of a single image, so the accuracy of living body detection is low.
In order to improve the accuracy of living body detection, the applicant proposes performing living body detection on the basis of a video collected for the object to be detected. In order to characterize the video, the present application extracts multiple frames of video images from the video, and then, for each frame of video image, determines, according to the features of that frame, a first probability that the frame of video image represents whether the object to be detected is a living body; finally, whether the object to be detected is a living body is determined comprehensively according to the determined first probabilities. Compared with the prior art, which performs living body detection on a single image, the present application performs living body detection on the basis of a video, so the detection result is more accurate.
In order to implement the above method proposed by the applicant more intelligently and broaden its range of application, the applicant first constructed a model to be trained, and trained the model to be trained based on a sample video set to obtain a living body detection model (for example, the first living body detection model or the second living body detection model described below); the applicant uses the living body detection model to perform some or all of the steps in the above method.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the training process of the living body detection model in an embodiment of the present application. In FIG. 1, the living body detection model includes a convolutional layer, a first fully connected layer, and a second fully connected layer, where the convolutional layer may specifically adopt a convolutional neural network. It should be understood that the model structure of the model to be trained pre-built by the applicant is the same as that of the living body detection model shown in FIG. 1; the model to be trained also includes a convolutional layer, a first fully connected layer, and a second fully connected layer, and after training, the model parameters of the model to be trained are updated and adjusted to finally obtain the living body detection model.
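For illustration only, the following is a minimal sketch of a model structure consistent with FIG. 1, assuming PyTorch; the backbone layers, feature sizes, and frame count N = 8 are illustrative assumptions rather than the structure prescribed by the present application:

```python
import torch
import torch.nn as nn

class LivenessModel(nn.Module):
    """Convolutional layer + first fully connected layer + second fully
    connected layer, as in FIG. 1; all sizes are illustrative."""

    def __init__(self, num_frames: int = 8):
        super().__init__()
        self.num_frames = num_frames
        # "Convolutional layer": a per-frame CNN backbone shared by all frames
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 25, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((36, 36)),
        )
        # First fully connected layer: per-frame feature -> (x, y) scores
        self.fc1 = nn.Linear(25 * 36 * 36, 2)
        # Second fully connected layer: N per-frame scores -> video-level scores
        self.fc2 = nn.Linear(num_frames * 2, 2)

    def forward(self, frames: torch.Tensor):
        # frames: (batch, N, 3, H, W), the N extracted RGB video images
        b, n = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))           # (b*n, 25, 36, 36)
        per_frame = self.fc1(feats.flatten(1)).view(b, n, 2)  # frame-level logits
        video = self.fc2(per_frame.flatten(1))                # (b, 2) video-level logits
        return per_frame, video
```

A softmax over each output would yield the (x, y) and (X, Y) probability vectors described in the steps below.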
In order to train the model to be trained to obtain the living body detection model, an embodiment of the present application proposes the following steps. It should be noted in advance that, in the following steps, the sample video set is described by taking a sample video set of human faces as an example. It should be understood that the type of the sample video set is not limited to human faces; it may also be, for example, a sample video set of palmprints. If the model to be trained is trained based on a sample video set of palmprints, the resulting living body detection model can be used to perform living body detection on palmprint videos.
S110: obtain a sample video set, the sample video set including a plurality of sample videos carrying labels, where the label carried by a sample video represents whether the sample video is a video collected from a living body.
In this embodiment, some or all of the sample videos in the sample video set may be videos collected by a video collection device while the training participant is in a silent state. When a video is collected from a training participant, the training participant only needs to look at the video collection device and is not required to complete specified facial actions such as opening the mouth, blinking, or reading aloud in front of the camera.
For example, a silent video may be shot of the face of each of a plurality of training participants (real people), with the duration of each video controlled within 1 to 3 seconds; such videos shot of real people are annotated so that they carry a label indicating that the video is collected from a living body. A video may also be shot of each of a number of non-living bodies, such as printed photos, photos displayed on screens, and masks, with the duration of each video controlled within 1 to 3 seconds; such videos shot of non-living bodies are annotated so that they carry a label indicating that the video is not collected from a living body.
S120: for each label-carrying sample video included in the sample video set, extract multiple frames of sample video images from that label-carrying sample video.
For example, each label-carrying sample video may first be divided into N sub-segments, and then one frame of RGB video image is extracted from each sub-segment as a sample video image, so that a total of N frames of sample video images are ultimately extracted from each label-carrying sample video.
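For illustration only, a minimal sketch of this sub-segment sampling, assuming OpenCV (cv2); the function name and segment count are illustrative:

```python
import random
import cv2  # OpenCV is assumed

def sample_frames(video_path: str, n_segments: int = 8):
    """Divide the video into N equal sub-segments and draw one random RGB
    frame from each (the video is assumed longer than N frames)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    seg_len = total // n_segments
    frames = []
    for i in range(n_segments):
        idx = i * seg_len + random.randrange(seg_len)  # index within sub-segment i
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, bgr = cap.read()
        if ok:
            frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames  # N frames of sample video images representing the video
```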
S130: input each frame of sample video image among the multiple frames of sample video images into the convolutional layer of the model to be trained, to obtain the features of that frame of sample video image.
For example, the N frames of sample video images may be sequentially input into the convolutional neural network of the model to be trained, and for each frame of sample video image, the convolutional neural network outputs a three-dimensional convolutional feature, that is, the features of that frame of sample video image. It should be understood that the multiple frames of sample video images may share one convolutional neural network, or each frame of sample video image may correspond to its own convolutional neural network; therefore, the model to be trained may include one or more convolutional neural networks.
S140: input the features of that frame of sample video image into the first fully connected layer of the model to be trained, to obtain a third probability corresponding to that frame of sample video image, the third probability representing whether that frame of sample video image originates from a video collected from a living body.
For example, the respective features of the N frames of sample video images may be sequentially input into the first fully connected layer, and for the features of each frame of sample video image, the first fully connected layer outputs a probability vector of the form (x, y), that is, the third probability, where x represents the probability that the frame of sample video image originates from a video collected from a living body, and y represents the probability that it originates from a video collected from a non-living body.
S150: input the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain an estimated probability of whether the sample video is a video collected from a living body.
For example, the N third probabilities corresponding to the N frames of sample video images may be input into the second fully connected layer, and for the N third probabilities, the second fully connected layer outputs a probability vector of the form (X, Y), that is, the estimated probability, where X represents the probability that the sample video is a video collected from a living body, and Y represents the probability that it is a video collected from a non-living body.
S160: establish a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain the living body detection model.
For example, a loss function is established according to the estimated probability, for example a probability vector of the form (X, Y), and according to the third probabilities respectively corresponding to the multiple frames of sample video images, for example probability vectors of the form (x, y); the parameters of the model to be trained are updated using gradient descent, and the updated model to be trained is put into the next round of training. After multiple rounds of training, the living body detection model is obtained. For example, the training may end after a fixed M rounds, such as 1000 rounds, to obtain the living body detection model. For another example, the training may end when the loss functions of multiple consecutive rounds reflect that the model to be trained can already accurately predict whether the sample video is from a living body, to obtain the living body detection model.
For example, one implementation of establishing the loss function may be as follows: the N third probabilities are each compared with the label carried by the sample video, where each third probability is a prediction result and the label carried by the sample video represents the ground truth, yielding N first comparison results; the N first comparison results can represent the accuracy of the model to be trained in predicting the sample video in this round of training. The estimated probability is then compared with the label carried by the sample video, where the estimated probability is a prediction result and the label represents the ground truth, yielding a second comparison result, which can likewise represent the accuracy of the model to be trained in predicting the sample video in this round of training.
Finally, the parameters of the model to be trained are adjusted according to the second comparison result and the N first comparison results, so as to update the model to be trained. The updated model to be trained is put into the next round of training, and after multiple rounds of training, the living body detection model is obtained.
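For illustration only, one possible form of such a loss, continuing the PyTorch sketch above; summing a video-level cross-entropy term with per-frame cross-entropy terms is an assumption, as the present application does not fix a concrete loss formula:

```python
import torch
import torch.nn.functional as F

def liveness_loss(per_frame_logits: torch.Tensor,
                  video_logits: torch.Tensor,
                  label: torch.Tensor) -> torch.Tensor:
    """per_frame_logits: (batch, N, 2); video_logits: (batch, 2);
    label: (batch,) with 1 = collected from a living body, 0 = not."""
    b, n, _ = per_frame_logits.shape
    # N first comparison results: every frame prediction vs. the video's label
    frame_loss = F.cross_entropy(per_frame_logits.reshape(b * n, 2),
                                 label.repeat_interleave(n))
    # second comparison result: the video-level prediction vs. the label
    video_loss = F.cross_entropy(video_logits, label)
    return video_loss + frame_loss
```

A gradient-descent step (for example with torch.optim.SGD) on this loss would then update the parameters in each round of training.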
By simultaneously considering the estimated probability corresponding to the sample video and the third probability corresponding to each frame of sample video image when establishing the loss function, on the one hand the convergence speed of the model to be trained can be accelerated; on the other hand, during training the parameters of the model to be trained are updated based not only on the accuracy of the model's prediction for the sample video but also on the accuracy of its prediction for each frame of sample video image, so that the resulting living body detection model can output more accurate prediction results.
By performing steps S110 to S160, the first living body detection model is obtained. During application, the first living body detection model can perform some or all of the following steps: extracting multiple frames of video images from the video collected for the object to be detected; then, for each frame of video image, determining, according to the features of that frame, a first probability that the frame of video image represents whether the object to be detected is a living body; and finally, determining comprehensively whether the object to be detected is a living body according to the determined first probabilities.
In order to further improve the accuracy of living body detection, the applicant of the present application found that, in addition to the multiple frames of video images extracted from a video being able to characterize that video, the inter-frame correlation of the multiple frames of video images can also be used to characterize it. If a video is characterized by both the multiple frames of video images and their inter-frame correlation, then further introducing the inter-frame correlation into living body detection can further improve its accuracy.
Based on the above findings, the applicant further proposes introducing the inter-frame correlation into the living body detection method: first determining a second probability that the inter-frame correlation represents whether the object to be detected is a living body, and then determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images, thereby further improving the accuracy of living body detection.
In order to implement the method further proposed by the applicant more intelligently and broaden its range of application, the applicant first constructed a model to be trained and trained it based on a sample video set to obtain a living body detection model; the applicant uses that living body detection model to perform some or all of the steps in the method further proposed above.
Please continue to refer to FIG. 1. In FIG. 1, the living body detection model may further include a feature combination module and a third fully connected layer. It should be understood that the model structure of the model to be trained pre-built by the applicant is the same as that of the living body detection model shown in FIG. 1; the model to be trained may likewise further include a feature combination module and a third fully connected layer, and after training, the model parameters of the model to be trained are updated and adjusted to finally obtain the living body detection model.
In order to train the model to be trained to obtain the living body detection model, an embodiment of the present application further proposes steps S142, S144, and S150' on the basis of steps S110, S120, S130, S140, and S160. It should be noted in advance that steps S130, S140, S142, S144, S150', and S160 are the steps of each round in multiple rounds of training:
S142: splice the respective features of the multiple frames of sample video images to obtain the features of the sample video.
For example, after the N three-dimensional convolutional features are obtained in step S130, the feature combination module may stack these N three-dimensional convolutional features into a new three-dimensional convolutional feature, which serves as the features of the sample video; the features of the sample video can characterize the inter-frame correlation of the multiple frames of sample video images. For example, if eight 36*36*25 convolutional features are obtained in step S130, stacking these eight 36*36*25 convolutional features yields a 36*36*200 convolutional feature, which serves as the features of the sample video.
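For illustration only, the stacking can be sketched as a concatenation along the channel axis, assuming PyTorch tensors in the 36*36*25 layout of the example:

```python
import torch

# Eight per-frame 36*36*25 convolutional features (as in the example above)
frame_feats = [torch.randn(36, 36, 25) for _ in range(8)]
# Stacking along the channel axis yields one 36*36*200 video feature
video_feat = torch.cat(frame_feats, dim=2)   # shape: (36, 36, 200)
```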
S144: input the features of the sample video into the third fully connected layer of the model to be trained, to obtain a fourth probability of whether the sample video is a video collected from a living body.
For example, the features of the sample video may be input into the third fully connected layer, and for the features of the sample video, the third fully connected layer outputs a probability vector of the form (x', y'), that is, the fourth probability, where x' represents the probability that the sample video is a video collected from a living body, and y' represents the probability that it is a video collected from a non-living body.
S150': input the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body. This step specifically includes: inputting the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body.
For example, the fourth probability corresponding to the sample video and the N third probabilities corresponding to the N frames of sample video images may be input into the second fully connected layer, and for the N+1 probabilities, the second fully connected layer outputs a probability vector of the form (X, Y), that is, the estimated probability, where X represents the probability that the sample video is a video collected from a living body, and Y represents the probability that it is a video collected from a non-living body.
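For illustration only, a sketch extending the earlier LivenessModel with the feature combination module and the third fully connected layer; the sizes remain illustrative assumptions:

```python
import torch
import torch.nn as nn

class LivenessModelV2(LivenessModel):
    """Adds the feature combination module (stacking) and the third fully
    connected layer; the second fully connected layer now receives N third
    probabilities plus one fourth probability."""

    def __init__(self, num_frames: int = 8):
        super().__init__(num_frames)
        # Third fully connected layer: stacked video feature -> fourth probability
        self.fc3 = nn.Linear(num_frames * 25 * 36 * 36, 2)
        # Second fully connected layer: (N + 1) probability vectors -> estimate
        self.fc2 = nn.Linear((num_frames + 1) * 2, 2)

    def forward(self, frames: torch.Tensor):
        b, n = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, n, -1)
        per_frame = self.fc1(feats)              # (b, n, 2) third probabilities
        fourth = self.fc3(feats.flatten(1))      # (b, 2) from the stacked feature
        combined = torch.cat([per_frame.flatten(1), fourth], dim=1)  # N+1 vectors
        video = self.fc2(combined)               # (b, 2) estimated probability
        return per_frame, video
```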
By performing steps S110, S120, S130, S140, S142, S144, S150', and S160, the second living body detection model is obtained. During application, the second living body detection model can perform some or all of the following steps: extracting multiple frames of video images from the video collected for the object to be detected; for each frame of video image, determining, according to the features of that frame, a first probability that the frame of video image represents whether the object to be detected is a living body; determining, for the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation represents whether the object to be detected is a living body; and finally determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
The above embodiments of the present application mainly propose two training processes for the model to be trained based on a sample video set, which ultimately yield the first living body detection model and the second living body detection model, respectively. Below, the present application focuses on the living body detection method and schematically describes how to apply the first living body detection model or the second living body detection model in the living body detection method.
Referring to FIG. 2, FIG. 2 is a flowchart of the living body detection method proposed by an embodiment of the present application. As shown in FIG. 2, the method includes the following steps:
S22: extract multiple frames of video images from a video collected for an object to be detected.
In this embodiment, the object to be detected refers to an object for which it needs to be detected whether it is a living body. For example, the object to be detected is not limited to a face to be detected; it may also be, for example, a palmprint or fingerprint to be detected. If the object to be detected is a palmprint, the video collected for the object to be detected is a video shot of the palmprint to be detected.
In this embodiment, the method further includes: obtaining a video collected by a video collection device while the object to be detected is in a silent state.
In other words, the video collected for the object to be detected may be a silent video collected for that object. For example, while the object to be detected is in a silent state, a video is collected for it, for example a short video of 1 to 3 seconds. In this embodiment, when collecting a video from a user, the user only needs to look at the video collection device and is not required to complete specified facial actions such as opening the mouth, blinking, or reading aloud in front of the camera; this not only avoids the impact of facial actions on the accuracy of face recognition, but also allows the user to complete living body detection without making specified facial actions, thereby improving the user experience.
In this embodiment, when extracting multiple frames of video images from the video, they may be extracted at equal inter-frame intervals, and the extracted video images may be RGB images. For example, for a video, one frame of video image is extracted every 5 frames. Taking a video including 48 frames of video images as an example, the extracted video images are frame 6, frame 12, frame 18, frame 24, frame 30, frame 36, frame 42, and frame 48.
Alternatively, in this embodiment, when extracting multiple frames of video images from the video, the video may first be divided into multiple sub-segments, and then one frame of video image is extracted from each sub-segment. For example, a video may be divided into N equal sub-segments, and for each sub-segment, one frame of video image is extracted at random, or one frame is extracted from the middle of the sub-segment.
In the above embodiments, by extracting multiple frames of video images at equal inter-frame intervals, or by dividing the video into multiple sub-segments and extracting one frame of video image from each sub-segment, the extracted multiple frames of video images are evenly distributed over the video, so the multiple frames of video images can characterize the content of the video more accurately, thereby further improving the accuracy of living body detection.
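For illustration only, the equal-interval example above reduces to selecting every sixth frame index:

```python
# Frame indices for the 48-frame example: one frame every 5 frames,
# i.e. every sixth frame (1-based indices 6, 12, ..., 48)
indices = list(range(6, 48 + 1, 6))   # [6, 12, 18, 24, 30, 36, 42, 48]
```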
In this embodiment, multiple frames of video images are extracted from the video captured for the object to be detected, and these frames are used to characterize the video, so that the living body detection method proposed in this application performs detection on the basis of video. Compared with the prior art, which performs living body detection on a single image, detection on the basis of video yields more accurate results. Moreover, because only multiple frames are extracted from the captured video, redundant information in the video is reduced, which lowers the computational cost and improves detection efficiency.

S24: For each frame of the multiple frames of video images, determine, according to the features of that frame, a first probability that the frame indicates whether the object to be detected is a living body.

In this embodiment, the features of a video image may be convolutional features. By way of example, to determine the first probability corresponding to each frame from its features, the first living body detection model obtained through the training described above may be used. Specifically, each frame of the multiple frames of video images is input into the convolutional layer of the living body detection model to obtain the features of that frame; the features of the frame are then input into the first fully connected layer of the model to determine the first probability that the frame indicates whether the object to be detected is a living body.

The first probability corresponding to each frame may be a probability vector of the form (x, y), where x represents the probability that the object to be detected is a living body and y represents the probability that it is not.

In practical applications, the features of each frame may be obtained through a convolutional neural network, or extracted using other image feature extraction methods. The features of each frame are then input into the first fully connected layer of the first living body detection model to determine the first probability that the frame indicates whether the object to be detected is a living body.
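As a rough illustration of this per-frame branch, the following PyTorch sketch pairs a toy convolutional backbone with a first fully connected layer. The input resolution and layer sizes are assumptions of the sketch, not of the embodiments, and the outputs are unnormalized scores, consistent with the example probability vectors given later:

    import torch
    import torch.nn as nn

    class FrameScorer(nn.Module):
        # Per-frame branch: convolutional layers followed by the first fully
        # connected layer, producing the (x, y) first probability vector.
        def __init__(self):
            super().__init__()
            # Toy backbone: maps a 3x144x144 RGB frame to a 25x36x36 feature map.
            self.conv = nn.Sequential(
                nn.Conv2d(3, 25, kernel_size=3, stride=2, padding=1),   # -> 25x72x72
                nn.ReLU(),
                nn.Conv2d(25, 25, kernel_size=3, stride=2, padding=1),  # -> 25x36x36
                nn.ReLU(),
            )
            self.fc1 = nn.Linear(25 * 36 * 36, 2)  # first fully connected layer

        def forward(self, frames):
            feats = self.conv(frames)              # per-frame convolutional features
            scores = self.fc1(feats.flatten(1))    # one (x, y) score row per frame
            return feats, scores

    feats, scores = FrameScorer()(torch.randn(8, 3, 144, 144))  # 8 sampled frames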
S26: Determine whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.

By way of example, the first living body detection model obtained through training may be used to determine whether the object to be detected is a living body. Specifically, the first probabilities respectively corresponding to the multiple frames of video images are input into the second fully connected layer of the living body detection model to determine whether the object is a living body. For instance, each frame of the multiple frames is input into the convolutional layer of the model, which outputs the features of that frame; the features of each frame are then input into the first fully connected layer, which outputs the first probability corresponding to that frame; the first probabilities corresponding to all frames are in turn input into the second fully connected layer, which outputs an estimated probability, a comprehensive probability indicating whether the object to be detected is a living body.

The estimated probability may be a probability vector of the form (X, Y), where X represents the comprehensive probability that the object to be detected is a living body and Y the comprehensive probability that it is not. By comparing X and Y, the object is determined to be a living body when X is greater than Y.
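One plausible reading of this second fully connected layer is sketched below; the assumption that the per-frame first probabilities are flattened into its input is made here for illustration and is not fixed by the embodiments:

    import torch
    import torch.nn as nn

    T = 8                                  # number of sampled frames, as in the examples
    fc2 = nn.Linear(T * 2, 2)              # second fully connected layer -> (X, Y)

    first_probs = torch.randn(T, 2)        # stand-ins for the 8 first probability vectors
    X, Y = fc2(first_probs.flatten())      # estimated probability vector
    is_live = (X > Y).item()               # living body if X exceeds Y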
Alternatively, after the first probabilities respectively corresponding to the multiple frames are obtained in step S24, the average of these first probabilities may be computed to determine whether the object to be detected is a living body. For example, each first probability is a probability vector of the form (x, y), where x represents the probability that the object is a living body and y the probability that it is not. Suppose the probability vectors corresponding to the 8 frames extracted from the video are: (35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4), (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), and (21.4, 22.8). From these 8 vectors the comprehensive average probability vector works out to (38.7, 11.0); since the component indicating that the object is a living body is greater than the component indicating that it is not, the object to be detected is determined to be a living body.
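The averaging alternative can be checked directly against the example numbers (illustration only):

    import numpy as np

    # The eight first probability vectors (living, non-living) from the example above.
    first_probs = np.array([
        [35.9, 13.0], [43.2, 5.6], [34.7, 14.3], [44.6, 5.4],
        [58.6, 2.1],  [41.8, 6.7], [29.2, 17.8], [21.4, 22.8],
    ])
    avg = first_probs.mean(axis=0)   # about (38.7, 11.0) after rounding
    print(avg, avg[0] > avg[1])      # the living component dominates -> living body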
By performing steps S22, S24, and S26, living body detection is carried out on the basis of a video captured for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize it; for each frame, a first probability that the frame indicates whether the object is a living body is determined from the features of that frame; finally, whether the object is a living body is determined comprehensively from the multiple first probabilities so obtained. Compared with the prior art, which performs living body detection on a single image, detection on the basis of video yields more accurate results.

The applicant of this application found that, in addition to the multiple frames extracted from a video, the inter-frame correlation of those frames can also be used to characterize the video. If both the frames and their inter-frame correlation are used to characterize the video, then introducing inter-frame correlation into living body detection can further improve its accuracy.

To further improve the accuracy of living body detection, refer to FIG. 3, which is another flowchart of the living body detection method proposed in an embodiment of this application. As shown in FIG. 3, the method includes the following steps:
S22: Extract multiple frames of video images from the video captured for the object to be detected.

S24: For each frame of the multiple frames of video images, determine, according to the features of that frame, a first probability that the frame indicates whether the object to be detected is a living body.

S25: According to the inter-frame correlation of the multiple frames of video images, determine a second probability that the inter-frame correlation indicates whether the object to be detected is a living body.

In this embodiment, inter-frame correlation refers to the information between the frames of the multiple frames of video images. Specifically, for each frame of the multiple frames, the features of that frame may be extracted, and the respective features of the frames may then be concatenated to obtain the features of the video; these video features are used to characterize the inter-frame correlation.
By way of example, the multiple frames of video images may be input into the convolutional layer of the second living body detection model obtained through training, which outputs a three-dimensional convolutional feature for each frame; this three-dimensional convolutional feature is the feature of the video image. The multiple three-dimensional convolutional features are then stacked into a new three-dimensional convolutional feature that serves as the video feature characterizing the inter-frame correlation. For example, after 8 frames are input into the living body detection model, its convolutional layer outputs eight 36*36*25 convolutional features, and the feature combination module of the model stacks these eight features into a single 36*36*200 convolutional feature, which serves as the feature of the video.

In this embodiment, to determine the second probability that the inter-frame correlation indicates whether the object to be detected is a living body, the video feature characterizing the inter-frame correlation may be input into the third fully connected layer of the second living body detection model, which outputs, based on the video feature, the second probability. The second probability may be a probability vector of the form (x', y'), where x' represents the probability that the object to be detected is a living body and y' the probability that it is not.
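A minimal sketch of the stacking step and the third fully connected layer, using the example dimensions; the channel-last layout mirrors the 36*36*25 notation above, and the layer itself is an assumption of the sketch:

    import torch
    import torch.nn as nn

    # Eight per-frame convolutional features of 36x36 spatial size and 25 channels.
    frame_feats = [torch.randn(36, 36, 25) for _ in range(8)]

    # Feature combination module: stack along the channel axis into one video feature.
    video_feat = torch.cat(frame_feats, dim=-1)
    assert video_feat.shape == (36, 36, 200)

    # Third fully connected layer mapping the video feature to the second probability.
    fc3 = nn.Linear(36 * 36 * 200, 2)
    second_prob = fc3(video_feat.flatten())  # (x', y')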
S26': Determine whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.

By way of example, when determining whether the object to be detected is a living body from the second probability and the first probabilities respectively corresponding to the multiple frames, the second living body detection model obtained through training may be used. Specifically, the second probability, by which the inter-frame correlation indicates whether the object is a living body, and the first probabilities respectively corresponding to the multiple frames may be input into the second fully connected layer of the second living body detection model, which outputs an estimated probability, a comprehensive probability indicating whether the object to be detected is a living body.

The estimated probability may be a probability vector of the form (X, Y), where X represents the comprehensive probability that the object to be detected is a living body and Y the comprehensive probability that it is not. By comparing X and Y, the object is determined to be a living body when X is greater than Y.

Or, by way of example, determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images may specifically include:

S26'-1: Assign weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each first probability;

S26'-2: Determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
For example, after 8 frames of video images are input into the living body detection model, they pass through its convolutional layer and first fully connected layer, which output the first probability corresponding to each of the 8 frames. Suppose the 8 first probabilities are: (35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4), (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), and (21.4, 22.8). The convolutional layer outputs a 36*36*25 convolutional feature for each frame; the feature combination module stacks these eight features into one 36*36*200 convolutional feature, which serves as the video feature characterizing the inter-frame correlation of the multiple frames. After passing through the third fully connected layer of the model, the video feature yields the second probability; suppose the second probability is (50.1, 3.5).

Weights are then assigned to the second probability and to the first probabilities respectively corresponding to the multiple frames; for example, the second probability is assigned a weight of 1/2 and each first probability a weight of 1/16. A weighted average probability is computed from the second probability and its corresponding weight and from the first probabilities and their corresponding weights, and whether the object to be detected is a living body is determined from this weighted average. Specifically, the weighted average probability works out to (44.4, 7.3); since the component indicating that the object is a living body is greater than the component indicating that it is not, the object to be detected is determined to be a living body.
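The weighted combination can likewise be checked against the example numbers (illustration only):

    import numpy as np

    first_probs = np.array([
        [35.9, 13.0], [43.2, 5.6], [34.7, 14.3], [44.6, 5.4],
        [58.6, 2.1],  [41.8, 6.7], [29.2, 17.8], [21.4, 22.8],
    ])
    second_prob = np.array([50.1, 3.5])

    # Weight 1/2 on the second probability and 1/16 on each first probability.
    weighted = 0.5 * second_prob + (1.0 / 16.0) * first_probs.sum(axis=0)
    # about (44.4, 7.2), matching the example's (44.4, 7.3) up to rounding;
    # the living component dominates, so the object is judged to be a living body.
    print(weighted, weighted[0] > weighted[1])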
By performing steps S26'-1 and S26'-2 and assigning the larger weight to the second probability, the share of the inter-frame correlation in characterizing the video information, and its contribution to improving detection accuracy, are given prominence, thereby further improving the accuracy of living body detection.

It should be understood that the numerical values listed in the above embodiments of this application, such as the specific values of the first and second probabilities and the dimensions of the convolutional features, are illustrative values used to explain the steps of the embodiments.
Based on the same inventive concept, an embodiment of this application provides a living body detection apparatus. Refer to FIG. 4, which is a schematic diagram of the living body detection apparatus provided by an embodiment of this application. As shown in FIG. 4, the apparatus includes:

a first extraction module 41, configured to extract multiple frames of video images from the video captured for the object to be detected;

a first determination module 42, configured to determine, for each frame of the multiple frames of video images and according to the features of that frame, a first probability that the frame indicates whether the object to be detected is a living body;

a second determination module 43, configured to determine whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the apparatus further includes:

a third determination module, configured to determine, according to the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation indicates whether the object to be detected is a living body.

The second determination module includes:

a first determination sub-module, configured to determine whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the apparatus further includes:

a first concatenation module, configured to concatenate the respective features of the multiple frames of video images to obtain the features of the video, the video features being used to characterize the inter-frame correlation.
Optionally, the first determination sub-module includes:

an assignment sub-unit, configured to assign weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each first probability;

a determination sub-unit, configured to determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
Optionally, the apparatus further includes:

a first obtaining module, configured to obtain a sample video set, the sample video set including multiple labeled sample videos, where the label carried by a sample video indicates whether that sample video was captured for a living body;

a second extraction module, configured to extract, for each labeled sample video in the sample video set, multiple frames of sample video images from that labeled sample video;

a first input module, configured to input each frame of the multiple frames of sample video images into the convolutional layer of the model to be trained to obtain the features of that sample frame;

a second input module, configured to input the features of the sample frame into the first fully connected layer of the model to be trained to obtain a third probability corresponding to that sample frame, the third probability indicating whether the sample frame comes from a video captured for a living body;

a third input module, configured to input the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video was captured for a living body;

a second obtaining module, configured to establish a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain the living body detection model (one possible form of this loss is sketched after the module list below).
The first determination module includes:

a first input sub-module, configured to input each frame of the multiple frames of video images into the convolutional layer of the living body detection model to obtain the features of that frame;

a second input sub-module, configured to input the features of the frame into the first fully connected layer of the living body detection model to determine the first probability that the frame indicates whether the object to be detected is a living body.
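The embodiments do not spell out the form of the loss established by the second obtaining module; one plausible sketch, assuming a cross-entropy term on both the per-frame third probabilities and the video-level estimated probability, is:

    import torch
    import torch.nn.functional as F

    def training_loss(frame_logits, video_logits, label):
        # frame_logits: (T, 2) third-probability scores for T sample frames.
        # video_logits: (2,) estimated-probability score for the whole sample video.
        # label: class index from the sample video's tag, e.g. 0 = living, 1 = not living.
        frame_targets = torch.full((frame_logits.size(0),), label, dtype=torch.long)
        frame_loss = F.cross_entropy(frame_logits, frame_targets)   # per-frame term
        video_loss = F.cross_entropy(video_logits.unsqueeze(0),
                                     torch.tensor([label]))         # video-level term
        return frame_loss + video_loss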
Optionally, the apparatus further includes:

a second concatenation module, configured to concatenate the respective features of the multiple frames of sample video images to obtain the features of the sample video;

a fourth input module, configured to input the features of the sample video into the third fully connected layer of the model to be trained to obtain a fourth probability of whether the sample video was captured for a living body.

The third input module includes:

a third input sub-module, configured to input the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video was captured for a living body.
Optionally, the third determination module includes:

a fourth input sub-module, configured to input the first probabilities respectively corresponding to the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
Optionally, the apparatus further includes:

a third obtaining module, configured to obtain a video captured by a video capture device while the object to be detected is in a silent state.
Based on the same inventive concept, an embodiment of this application further provides an electronic device, a schematic structural diagram of which is shown in FIG. 4. The electronic device 7000 includes at least one processor 7001, a memory 7002, and a bus 7003, and the at least one processor 7001 is electrically connected to the memory 7002. The memory 7002 is configured to store at least one computer-executable instruction, and the processor 7001 is configured to execute the at least one computer-executable instruction, so as to perform the steps of any living body detection method provided by any embodiment or any optional implementation of Embodiment 1 of this application.

Further, the processor 7001 may be an FPGA (Field-Programmable Gate Array) or another device with logic processing capability, such as an MCU (Microcontroller Unit) or a CPU (Central Processing Unit).
Applying the embodiments of this application yields at least the following beneficial effects:

In one aspect, the living body detection method provided by this application performs detection on the basis of a video captured for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize it; for each frame, a first probability that the frame indicates whether the object is a living body is determined from the features of that frame; finally, whether the object is a living body is determined comprehensively from the multiple first probabilities so obtained. Compared with the prior art, which performs living body detection on a single image, detection on the basis of video yields more accurate results. In another aspect, because the method extracts multiple frames from the video captured for the object to be detected, redundant information in the video is reduced, which lowers the computational cost and improves detection efficiency. In yet another aspect, the method does not require the object to be detected to perform specified facial actions such as opening the mouth or blinking in front of the camera, which not only avoids the impact of facial actions on face recognition accuracy but also allows the user to complete living body detection without performing specified facial actions, thereby improving the user experience.
Based on the same inventive concept, an embodiment of this application further provides a computer-readable storage medium, for example the memory 7002 in FIG. 4, in which a computer program 7002a is stored; when executed by a processor, the program implements the steps of any embodiment of Embodiment 1 of this application or of any living body detection method.

The computer-readable storage medium provided by the embodiments of this application includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).
Applying the embodiments of this application yields at least the following beneficial effects:

In one aspect, the living body detection method provided by this application performs detection on the basis of a video captured for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize it; for each frame, a first probability that the frame indicates whether the object is a living body is determined from the features of that frame; finally, whether the object is a living body is determined comprehensively from the multiple first probabilities so obtained. Compared with the prior art, which performs living body detection on a single image, detection on the basis of video yields more accurate results. In another aspect, because the method extracts multiple frames from the video captured for the object to be detected, redundant information in the video is reduced, which lowers the computational cost and improves detection efficiency. In yet another aspect, the method does not require the object to be detected to perform specified facial actions such as opening the mouth or blinking in front of the camera, which not only avoids the impact of facial actions on face recognition accuracy but also allows the user to complete living body detection without performing specified facial actions, thereby improving the user experience.
Those skilled in the art will understand that computer program instructions can be used to implement each block in these structural diagrams and/or block diagrams and/or flow diagrams, as well as combinations of such blocks. They will also understand that these computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing method for execution, so that the schemes specified in one or more blocks of the structural diagrams and/or block diagrams and/or flow diagrams disclosed in this application are carried out by that processor.

Those skilled in the art will understand that the steps, measures, and schemes in the various operations, methods, and flows already discussed in this application can be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows already discussed in this application can also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art corresponding to the various operations, methods, and flows disclosed in this application can also be alternated, changed, rearranged, decomposed, combined, or deleted.

The above are only some of the implementations of this application. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of this application, and such improvements and refinements should also be regarded as falling within the protection scope of this application.

Claims (12)

  1. A living body detection method, characterized in that the method comprises:
    extracting multiple frames of video images from a video captured for an object to be detected;
    for each frame of the multiple frames of video images, determining, according to features of that frame, a first probability that the frame indicates whether the object to be detected is a living body;
    determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
  2. The method according to claim 1, characterized in that the method further comprises:
    determining, according to an inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation indicates whether the object to be detected is a living body;
    wherein determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images comprises:
    determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
  3. The method according to claim 2, characterized in that the method further comprises:
    concatenating the respective features of the multiple frames of video images to obtain features of the video, the video features being used to characterize the inter-frame correlation.
  4. The method according to claim 2, characterized in that determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images comprises:
    assigning weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, wherein the weight corresponding to the second probability is greater than the weight corresponding to each first probability;
    determining whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
  5. The method according to claim 1, characterized in that the method further comprises:
    obtaining a sample video set, the sample video set comprising multiple labeled sample videos, wherein the label carried by a sample video indicates whether that sample video was captured for a living body;
    for each labeled sample video in the sample video set, performing the following steps:
    extracting multiple frames of sample video images from the labeled sample video;
    inputting each frame of the multiple frames of sample video images into a convolutional layer of a model to be trained to obtain features of that sample frame;
    inputting the features of the sample frame into a first fully connected layer of the model to be trained to obtain a third probability corresponding to that sample frame, the third probability indicating whether the sample frame comes from a video captured for a living body;
    inputting the third probabilities respectively corresponding to the multiple frames of sample video images into a second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video was captured for a living body;
    establishing a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain a living body detection model;
    wherein, for each frame of the multiple frames of video images, determining, according to the features of that frame, the first probability that the frame indicates whether the object to be detected is a living body comprises:
    inputting each frame of the multiple frames of video images into the convolutional layer of the living body detection model to obtain the features of that frame;
    inputting the features of the frame into the first fully connected layer of the living body detection model to determine the first probability that the frame indicates whether the object to be detected is a living body.
  6. The method according to claim 5, characterized in that the method further comprises:
    concatenating the respective features of the multiple frames of sample video images to obtain features of the sample video;
    inputting the features of the sample video into a third fully connected layer of the model to be trained to obtain a fourth probability of whether the sample video was captured for a living body;
    wherein inputting the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video was captured for a living body comprises:
    inputting the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video was captured for a living body.
  7. The method according to claim 5, characterized in that determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images comprises:
    inputting the first probabilities respectively corresponding to the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  8. The method according to any one of claims 1-7, characterized in that the method further comprises:
    obtaining a video captured by a video capture device while the object to be detected is in a silent state.
  9. A living body detection apparatus, characterized in that the apparatus comprises:
    a first extraction module, configured to extract multiple frames of video images from a video captured for an object to be detected;
    a first determination module, configured to determine, for each frame of the multiple frames of video images and according to features of that frame, a first probability that the frame indicates whether the object to be detected is a living body;
    a second determination module, configured to determine whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
  10. A readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the steps of the method according to any one of claims 1-8.
  11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1-8.
  12. A computer program comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to execute the living body detection method according to any one of claims 1-8.
PCT/CN2020/091047 2019-06-13 2020-05-19 Living body testing method and apparatus, electronic device and readable storage medium WO2020248780A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910512041.1A CN110378219B (en) 2019-06-13 2019-06-13 Living body detection method, living body detection device, electronic equipment and readable storage medium
CN201910512041.1 2019-06-13

Publications (1)

Publication Number Publication Date
WO2020248780A1 true WO2020248780A1 (en) 2020-12-17

Family

ID=68250296

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/091047 WO2020248780A1 (en) 2019-06-13 2020-05-19 Living body testing method and apparatus, electronic device and readable storage medium

Country Status (2)

Country Link
CN (1) CN110378219B (en)
WO (1) WO2020248780A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792701A (en) * 2021-09-24 2021-12-14 北京市商汤科技开发有限公司 Living body detection method and device, computer equipment and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378219B (en) * 2019-06-13 2021-11-19 北京迈格威科技有限公司 Living body detection method, living body detection device, electronic equipment and readable storage medium
CN111091047B (en) * 2019-10-28 2021-08-27 支付宝(杭州)信息技术有限公司 Living body detection method and device, server and face recognition equipment
CN112749603A (en) * 2019-10-31 2021-05-04 上海商汤智能科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN111209863B (en) * 2020-01-07 2023-12-15 北京旷视科技有限公司 Living model training and human face living body detection method and device and electronic equipment
CN111680624A (en) * 2020-06-08 2020-09-18 上海眼控科技股份有限公司 Behavior detection method, electronic device, and storage medium
CN111814567A (en) * 2020-06-11 2020-10-23 上海果通通信科技股份有限公司 Method, device and equipment for detecting living human face and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030133599A1 (en) * 2002-01-17 2003-07-17 International Business Machines Corporation System method for automatically detecting neutral expressionless faces in digital images
CN105956572A (en) * 2016-05-15 2016-09-21 北京工业大学 In vivo face detection method based on convolutional neural network
CN107480586A (en) * 2017-07-06 2017-12-15 天津科技大学 Bio-identification photo bogus attack detection method based on human face characteristic point displacement
CN108009493A (en) * 2017-11-30 2018-05-08 电子科技大学 Face anti-fraud recognition methods based on action enhancing
CN108805047A (en) * 2018-05-25 2018-11-13 北京旷视科技有限公司 A kind of biopsy method, device, electronic equipment and computer-readable medium
CN109670413A (en) * 2018-11-30 2019-04-23 腾讯科技(深圳)有限公司 Face living body verification method and device
CN110378219A (en) * 2019-06-13 2019-10-25 北京迈格威科技有限公司 Biopsy method, device, electronic equipment and readable storage medium storing program for executing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915649B (en) * 2015-06-04 2018-12-14 南京理工大学 A kind of biopsy method applied to recognition of face
CN106874857B (en) * 2017-01-19 2020-12-01 腾讯科技(上海)有限公司 Living body distinguishing method and system based on video analysis
CN109389002A (en) * 2017-08-02 2019-02-26 阿里巴巴集团控股有限公司 Biopsy method and device
CN107818313B (en) * 2017-11-20 2019-05-14 腾讯科技(深圳)有限公司 Vivo identification method, device and storage medium
CN107992842B (en) * 2017-12-13 2020-08-11 深圳励飞科技有限公司 Living body detection method, computer device, and computer-readable storage medium
CN108596041B (en) * 2018-03-28 2019-05-14 中科博宏(北京)科技有限公司 A kind of human face in-vivo detection method based on video


Also Published As

Publication number Publication date
CN110378219B (en) 2021-11-19
CN110378219A (en) 2019-10-25

Similar Documents

Publication Publication Date Title
WO2020248780A1 (en) Living body testing method and apparatus, electronic device and readable storage medium
WO2021114931A1 (en) Method and apparatus for training encoding model capable of preventing private data leakage
US10275672B2 (en) Method and apparatus for authenticating liveness face, and computer program product thereof
JP2018508875A (en) Method and apparatus for biological face detection
CN109858371A (en) The method and device of recognition of face
CN112215043A (en) Human face living body detection method
CN111178249A (en) Face comparison method and device, computer equipment and storage medium
CN111611873A (en) Face replacement detection method and device, electronic equipment and computer storage medium
CN108108711B (en) Face control method, electronic device and storage medium
KR102145132B1 (en) Surrogate Interview Prevention Method Using Deep Learning
JP7188446B2 (en) Authentication device, authentication method, authentication program and recording medium
CN107704813A (en) A kind of face vivo identification method and system
CN109359689B (en) Data identification method and device
CN113591603A (en) Certificate verification method and device, electronic equipment and storage medium
CN114241588B (en) Self-adaptive face comparison method and system
CN109886084A (en) Face authentication method, electronic equipment and storage medium based on gyroscope
Ohki et al. Efficient spoofing attack detection against unknown sample using end-to-end anomaly detection
CN114299569A (en) Safe face authentication method based on eyeball motion
JP7353825B2 (en) Image processing device and method, image input device, image processing system, program
CN110414347B (en) Face verification method, device, equipment and storage medium
CN112001285A (en) Method, device, terminal and medium for processing beautifying image
Kaur et al. Improved Facial Biometric Authentication Using MobileNetV2
CN114463799A (en) Living body detection method and device and computer readable storage medium
CN111860563A (en) Vehicle verification method and device, electronic equipment and medium
CN112241674A (en) Face recognition method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20821770; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20821770; Country of ref document: EP; Kind code of ref document: A1)