CN112200057A - Face living body detection method and device, electronic equipment and storage medium - Google Patents

Face living body detection method and device, electronic equipment and storage medium

Info

Publication number
CN112200057A
Authority
CN
China
Prior art keywords
face
network
image
sample
living body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011063444.1A
Other languages
Chinese (zh)
Other versions
CN112200057B (en)
Inventor
冯思博
陈莹
黄磊
彭菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanwang Technology Co Ltd
Original Assignee
Hanwang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanwang Technology Co Ltd filed Critical Hanwang Technology Co Ltd
Priority to CN202011063444.1A priority Critical patent/CN112200057B/en
Publication of CN112200057A publication Critical patent/CN112200057A/en
Application granted granted Critical
Publication of CN112200057B publication Critical patent/CN112200057B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation
    • G06V 40/166 - Detection; Localisation; Normalisation using acquisition arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/64 - Three-dimensional objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/40 - Spoof detection, e.g. liveness detection
    • G06V 40/45 - Detection of the body part being alive

Abstract

The application discloses a face living body detection method, which belongs to the technical field of face detection and is beneficial to improving the speed and accuracy of face living body detection. The method comprises the following steps: acquiring a first face image and a second face image which are synchronously acquired by a first image acquisition device and a second image acquisition device for a target face; respectively carrying out face positioning on the first face image and the second face image, and cutting a first face image to be detected from the first face image and a second face image to be detected from the second face image according to the face positioning results; inputting the cut first face image to be detected and the cut second face image to be detected in parallel to a pre-trained living body detection model, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the two input face images; and determining whether the target face is a living body face according to the classification mapping result.

Description

Face living body detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of human face detection technologies, and in particular, to a human face in-vivo detection method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
In order to improve the security of face recognition technology in practical applications, performing living body detection on the face image to be recognized, so as to resist attacks on face recognition applications using photos or videos, has become increasingly important. In the prior art, to improve the accuracy of face recognition, face recognition technology based on binocular cameras has been more and more widely applied, and face living body detection technology based on binocular cameras has also been continuously improved. A common face living body detection technology at present is based on binocular visible light, and specifically comprises the following steps: face key point detection is performed respectively on two images of the target face acquired by a binocular visible light camera, a three-dimensional sparse point cloud is then constructed from the face key point data, the three-dimensional sparse point cloud is interpolated to generate a dense point cloud, and classification is performed based on the dense point cloud. This scheme suffers from long calculation time, high complexity, large calculation error and limited applicable scenarios.
Therefore, the method for detecting the living human face in the prior art needs to be improved.
Disclosure of Invention
The application provides a face in-vivo detection method which is beneficial to improving the speed and accuracy of face in-vivo detection.
In order to solve the above problem, in a first aspect, an embodiment of the present application provides a face live detection method, including:
acquiring a first face image and a second face image which are synchronously acquired by a first image acquisition device and a second image acquisition device aiming at a target face;
respectively carrying out face positioning on the first face image and the second face image to obtain corresponding face positioning results;
respectively cutting a first face image to be detected from the first face image and cutting a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image;
inputting the cut first face image to be detected and the cut second face image to be detected into a pre-trained living body detection model in parallel, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the first face image to be detected and the second face image to be detected; the living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample;
and determining whether the target face is a living face according to the classification mapping result.
In a second aspect, an embodiment of the present application provides a human face living body detection apparatus, including:
the face image acquisition module is used for acquiring a first face image and a second face image which are synchronously acquired by a first image acquisition device and a second image acquisition device aiming at a target face;
the face positioning module is used for respectively carrying out face positioning on the first face image and the second face image to obtain corresponding face positioning results;
the face image cutting module is used for cutting a first face image to be detected from the first face image and cutting a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image respectively;
the image classification module is used for inputting the cut first face image to be detected and the cut second face image to be detected into a pre-trained living body detection model in parallel, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the first face image to be detected and the second face image to be detected; the living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample;
and the face living body detection result determining module is used for determining whether the target face is a living body face according to the classification mapping result.
In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the living human face detection method according to the embodiment of the present application is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the living human face detection method disclosed in the present application.
The method for detecting the living human face comprises the steps of acquiring a first human face image and a second human face image which are synchronously acquired by a first image acquisition device and a second image acquisition device aiming at a target human face; respectively carrying out face positioning on the first face image and the second face image to obtain corresponding face positioning results; then, respectively cutting a first face image to be detected from the first face image and cutting a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image; inputting the cut first face image to be detected and the cut second face image to be detected into a pre-trained living body detection model in parallel, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the first face image to be detected and the second face image to be detected; the living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample; and determining whether the target face is a living face according to the classification mapping result, which is beneficial to improving the speed of face living body detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a flowchart of a human face living body detection method according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a multi-task model according to a first embodiment of the present application;
fig. 3 is a schematic structural diagram of a living human face detection model according to a first embodiment of the present application;
fig. 4 is a schematic structural diagram of a living human face detection device according to a second embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example one
As shown in fig. 1, the method for detecting a living human face includes steps 110 to 150.
And 110, acquiring a first face image and a second face image which are synchronously acquired by a first image acquisition device and a second image acquisition device aiming at the target face.
In the embodiment of the present application, the first image acquisition device and the second image acquisition device are two synchronized image acquisition devices disposed on the same electronic device, for example, a binocular synchronous face recognition device. The first image acquisition device and the second image acquisition device synchronously acquire images of a target object (such as a human face) under the control of the electronic device. In some embodiments of the present application, the relative positions of the first image acquisition device and the second image acquisition device in the vertical and horizontal directions are kept constant, and a certain distance is kept between them in the horizontal direction (for example, a distance larger than 60 mm is common). The imaging light sources of the first image acquisition device and the second image acquisition device can be the same or different. For example, the first image acquisition device and the second image acquisition device may both be visible light image acquisition devices, or both be infrared image acquisition devices, or one may be an infrared image acquisition device and the other a visible light image acquisition device, which is not limited in this application.
In some embodiments of the present application, the first image capturing device and the second image capturing device need to be calibrated and calibrated in advance to obtain calibration matrices of the first image capturing device and the second image capturing device.
For specific implementations of calibrating the two image acquisition devices, reference may be made to the prior art; for example, the intrinsic and extrinsic matrices of each camera may be obtained by the Zhang Zhengyou checkerboard calibration method, and details are not repeated in the embodiments of the present application.
Taking the first image acquisition device and the second image acquisition device as binocular cameras of the electronic equipment as an example, the calibration matrix of the binocular cameras is determined by calibration when the cameras leave a factory. In some embodiments of the present application, after two face images of a target face are simultaneously and respectively acquired by a binocular synchronous camera of an electronic device, for example, the two face images are respectively represented as a first face image a and a second face image B, the first face image a and the second face image B are further rectified by using a calibration matrix of the binocular synchronous camera, and a first face image a 'and a second face image B' are respectively obtained.
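For illustration, a minimal sketch of this rectification step using OpenCV follows; the calibration quantities (intrinsic matrices K1/K2, distortion coefficients D1/D2, rotation R and translation T between the two cameras) are assumed to come from the factory calibration described above, and the function and variable names are illustrative rather than part of the embodiment:

```python
import cv2

def rectify_pair(img_a, img_b, K1, D1, K2, D2, R, T):
    """Rectify a synchronously captured image pair (A, B) into (A', B')
    using the binocular calibration matrices."""
    size = (img_a.shape[1], img_a.shape[0])  # (width, height)
    # Compute rectification transforms for both cameras.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    # Remap the raw images A and B to the rectified images A' and B'.
    img_a_rect = cv2.remap(img_a, map1x, map1y, cv2.INTER_LINEAR)
    img_b_rect = cv2.remap(img_b, map2x, map2y, cv2.INTER_LINEAR)
    return img_a_rect, img_b_rect
```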
And 120, respectively carrying out face positioning on the first face image and the second face image to obtain corresponding face positioning results.
In some embodiments of the present application, the face positioning result includes: and a human face positioning frame. In specific implementation, a face positioning method in the prior art may be adopted to perform face positioning on the first face image a 'and the second face image B' respectively, so as to obtain a face positioning frame in the first face image a 'and a face positioning frame in the second face image B'. The method and the device do not limit the specific implementation mode of respectively carrying out face positioning on the first face image and the second face image and respectively determining the face positioning frames in the first face image and the second face image.
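The embodiment does not fix a particular face positioning method; as one hedged example, a prior-art detector such as OpenCV's Haar cascade could be used to produce the face positioning frame (the helper name and parameters below are assumptions, not the embodiment's prescribed detector):

```python
import cv2

_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def locate_face(image):
    """Return one face positioning frame (x, y, w, h), or None if no face is found."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = _detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    # Keep the largest detection as the target face.
    return max(boxes, key=lambda b: b[2] * b[3])
```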
Step 130, respectively cutting a first face image to be detected from the first face image and cutting a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image.
In some embodiments of the present application, the face positioning result includes: and a human face positioning frame. After the face positioning frame in the first face image and the face positioning frame in the second face image are respectively determined, respectively cutting out a first face image to be detected from the first face image and cutting out a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image, further comprising: cutting a first face image to be detected from the first face image according to the face positioning frame in the first face image; and cutting a second face image to be detected from the second face image according to the face positioning frame in the second face image.
In order to acquire richer information, it is necessary to determine, according to the face positioning frame, an image of a larger area containing the face positioning frame for face living body detection. In some embodiments of the present application, cutting a first face image to be detected from the first face image according to the face positioning frame in the first face image, and cutting a second face image to be detected from the second face image according to the face positioning frame in the second face image, includes: expanding the face positioning frame in the first face image to a preset size, and cutting the first face image to be detected from the first face image according to the expanded face positioning frame; and expanding the face positioning frame in the second face image to a preset size, and cutting the second face image to be detected from the second face image according to the expanded face positioning frame. For example, the face positioning frame S_A of the first face image A' is expanded outward by 2 times to obtain a face positioning frame S_A'; then, the image of the area covered by the face positioning frame S_A' is cut out from the first face image A' as the first face image to be detected. In the same way, the face positioning frame S_B of the second face image B' is expanded outward by 2 times to obtain a face positioning frame S_B'; then, the image of the area covered by S_B' is cut out from the second face image B' as the second face image to be detected.
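The following is a minimal sketch of the 2-times expansion of the face positioning frame and the cropping of the covered area; clamping the expanded frame to the image borders is an added assumption, since the embodiment does not specify how boundary overflow is handled:

```python
def crop_expanded_face(image, box, scale=2.0):
    """Expand a face box (x, y, w, h) about its center by `scale` and crop the covered area."""
    img_h, img_w = image.shape[:2]
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    # Clamp the expanded frame to the image boundaries (assumed behaviour).
    x0 = max(int(cx - new_w / 2), 0)
    y0 = max(int(cy - new_h / 2), 0)
    x1 = min(int(cx + new_w / 2), img_w)
    y1 = min(int(cy + new_h / 2), img_h)
    return image[y0:y1, x0:x1]

# face_to_detect_a = crop_expanded_face(img_a_rect, box_a)  # first face image to be detected
# face_to_detect_b = crop_expanded_face(img_b_rect, box_b)  # second face image to be detected
```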
Step 140, inputting the cut first to-be-detected face image and the cut second to-be-detected face image in parallel to a pre-trained living body detection model, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the first to-be-detected face image and the second to-be-detected face image.
The living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample;
and then, inputting the cut first face image to be detected and the cut second face image to be detected into a pre-trained living body detection model in parallel, and carrying out living body detection on the target face through the living body detection model based on the two input images. In specific implementation, a living body detection model needs to be trained firstly.
In some embodiments of the present application, before inputting the cut first to-be-detected face image and the cut second to-be-detected face image in parallel to a pre-trained living body detection model, and performing classification mapping on the target face according to a plane feature and a depth feature in the first to-be-detected face image and the second to-be-detected face image through the living body detection model, the method further includes: and training a living body detection model.
In some embodiments of the present application, the living body detection model is obtained by cutting a preset multitask model, as shown in fig. 2, the multitask model includes: a first task network consisting of a first convolutional network 210 and a first fully connected network 220, the first task network being used for learning face key point features in images input to the first convolutional network; a second task network consisting of a second convolutional network 230 and a second fully-connected network 240, the second task network being configured to learn the face keypoint features in the image input to the second convolutional network; a third task network consisting of the first convolutional network 210, the second convolutional network 230, a residual network 250, and a depth regression network 260, the third task network for learning depth features in the images input to the first convolutional network and the second convolutional network; and a fourth task network composed of the first convolution network 210, the second convolution network 230, the residual network 250, and a classification network 270, the fourth task network being configured to learn living body and non-living body information in the image input to the first convolution network and the second convolution network, wherein the first convolution network and the second convolution network are arranged in parallel, and the residual network is connected to outputs of the first convolution network and the second convolution network, respectively.
In the embodiment of the present application, as shown in fig. 3, the living body detection model includes: the first convolution network 210, the second convolution network 230, the residual network 250, and the classification network 270. Accordingly, training the living body detection model includes: training the multi-task model; the network parameters of the living body detection model composed of the first convolution network 210, the second convolution network 230, the residual network 250 and the classification network 270 are obtained by training the multi-task model.
Wherein the first convolutional network 210 and the second convolutional network 230 are arranged in parallel, the residual network 250 is connected with the outputs of the first convolutional network 210 and the second convolutional network 230, and the classification network 270 is connected with the output of the residual network 250.
In the specific implementation of the application, the multitask network is trained first. The multitask network comprises four learning tasks, namely two network tasks for respectively learning key point features of human faces in input images of two channels, one network task for learning depth features in the input images and one network task for learning living and non-living features in the input images. Each network task is realized through different task networks, the four task networks share the first convolution network and the second convolution network, and the learning of the depth features, the living body features and the non-living body features is based on the learning of the key point features of the face.
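Since the embodiment does not specify layer configurations, the following PyTorch skeleton is only a sketch of the four-task topology described above (two shared convolution branches with fully connected key point heads, a residual trunk over the concatenated features, a depth regression head and a live/non-live classification head); all channel sizes, the 81-key-point count, and module names are assumptions, and the splicing of depth values into the classification branch mentioned later in this description is omitted for brevity:

```python
import torch
import torch.nn as nn

class ConvBranch(nn.Module):
    """First/second convolution network: a small stack of convolution layers."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.features(x)

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

class MultiTaskModel(nn.Module):
    def __init__(self, num_keypoints=81, feat_ch=64, img_size=64):
        super().__init__()
        fmap = img_size // 4                       # spatial size after the two stride-2 convs
        flat = feat_ch * fmap * fmap
        self.conv_left = ConvBranch()              # first convolution network (210)
        self.conv_right = ConvBranch()             # second convolution network (230)
        self.fc_left = nn.Linear(flat, num_keypoints * 2)    # first fully connected network (220)
        self.fc_right = nn.Linear(flat, num_keypoints * 2)   # second fully connected network (240)
        self.residual = nn.Sequential(             # residual network (250) over concatenated features
            nn.Conv2d(feat_ch * 2, feat_ch, 1), ResidualBlock(feat_ch), ResidualBlock(feat_ch),
        )
        self.depth_head = nn.Linear(flat, num_keypoints)      # depth regression network (260)
        self.cls_head = nn.Sequential(             # classification network (270): live / non-live
            nn.Flatten(), nn.Linear(flat, 2),
        )

    def forward(self, img_left, img_right):
        e_l = self.conv_left(img_left)             # first vector
        e_r = self.conv_right(img_right)           # second vector
        kpt_l = self.fc_left(e_l.flatten(1))       # key point prediction, first sample image
        kpt_r = self.fc_right(e_r.flatten(1))      # key point prediction, second sample image
        fused = self.residual(torch.cat([e_l, e_r], dim=1))   # third vector
        depth = self.depth_head(fused.flatten(1))  # depth values of the key points
        logits = self.cls_head(fused)              # live / non-live logits
        return kpt_l, kpt_r, depth, logits
```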
Before training the multitask model, a training sample set including several training samples needs to be obtained first. Wherein the sample data of each training sample comprises: a first sample image and a second sample image. The sample label of each training sample comprises: the first sample image and the second sample image each correspond to: the real value of the face key point, the real value of the depth value and the real value of the living body category.
The first sample image and the second sample image are a pair of images determined in the same manner as the first face image to be detected and the second face image to be detected; the real values of the face key points corresponding to the first sample image and the second sample image are obtained by respectively performing face detection on the first sample image and the second sample image using a prior-art face detection technology. In specific implementation, the number of face key points obtained by different face detection technologies may be different. The real value of the depth value is the depth value of each face key point, calculated according to the face key point coordinates in the first sample image and the face key point coordinates in the second sample image together with the calibration matrices of the first image acquisition device and the second image acquisition device.
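One possible way to compute the depth-value ground truth from the matched key point coordinates and the calibration (projection) matrices of the two image acquisition devices is classical triangulation; the sketch below uses OpenCV's triangulatePoints and assumes 3x4 projection matrices P1 and P2 for the rectified cameras are available, which is an illustrative assumption rather than the embodiment's prescribed formula:

```python
import cv2
import numpy as np

def keypoint_depths(kpts_a, kpts_b, P1, P2):
    """Triangulate matched face key points from the two views and return
    their depth values (z coordinates in the first camera's frame).

    kpts_a, kpts_b: arrays of shape (K, 2) with matched (x, y) coordinates.
    P1, P2: 3x4 projection matrices of the first/second image acquisition devices.
    """
    pts_a = np.asarray(kpts_a, dtype=np.float64).T         # shape (2, K)
    pts_b = np.asarray(kpts_b, dtype=np.float64).T
    pts_4d = cv2.triangulatePoints(P1, P2, pts_a, pts_b)   # homogeneous, shape (4, K)
    pts_3d = pts_4d[:3] / pts_4d[3]                        # normalise homogeneous coordinates
    return pts_3d[2]                                       # depth (z) per key point
```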
In some embodiments of the present application, the sample data for training each training sample of the multitask model comprises: a first sample image and a second sample image, the sample label of each of the training samples comprising: the first sample image and the second sample image each correspond to: the real values of the face key points, the real values of the depth values and the real values of the face living body categories.
The multitask model is trained by the following method: for each training sample in the sample set, performing the following encoding mapping operations: inputting a first sample image comprised by the training sample to the first convolutional network of the multitask model, while inputting a second sample image comprised by the training sample to the second convolutional network of the multitask model; performing operation processing on the first sample image through the first task network to obtain a human face key point prediction value of the first sample image in the training sample; performing operation processing on the second sample image through the second task network to obtain a human face key point prediction value of the second sample image in the training sample; performing operation processing on the first sample image and the second sample image through the third task network to obtain a depth value predicted value of the training sample; performing operation processing on the first sample image and the second sample image through the fourth task network to obtain a human face living body category predicted value of the training sample; determining prediction loss values of the first task network, the second task network, the third task network and the fourth task network according to prediction values (namely the face key point prediction value of the first sample image, the face key point prediction value of the second sample image, the depth value prediction value and the face living body category prediction value) obtained by executing the coding mapping operation; carrying out weighted summation on the predicted loss values of the first task network, the second task network, the third task network and the fourth task network to determine a model predicted total loss value of the multitask model; optimizing the network parameters of the multitask model, and skipping to executing the coding mapping operation until the model prediction total loss value converges to meet the preset condition.
Firstly, for each training sample, inputting a first sample image included in sample data of the training sample to the first convolution network of the multitask model, simultaneously inputting a second sample image included in the sample data of the training sample to the second convolution network of the multitask model, and then starting to execute codes corresponding to each network in the multitask model by computing and processing equipment to learn face key point features, depth features, living body features and non-living body face features of all the training samples in the training sample set. Specifically, in the task model structure shown in fig. 3, the first convolutional network 210 and the second convolutional network 230 are respectively used for learning the face key point features in the input image; the residual error network 250 is used for simultaneously learning the depth features and the living body and non-living body features of the input image based on the face key point feature learning.
In some embodiments of the present application, the first sample image is subjected to operation processing by the first task network, so as to obtain a face keypoint prediction value of the first sample image in the training sample; and performing operation processing on the second sample image through the second task network to obtain a face key point prediction value of the second sample image in the training sample, including: performing convolution processing on the first sample image in the training sample through the first convolution network to obtain a first vector; then, coding and mapping the first vector through the first full-connection network to obtain a human face key point predicted value corresponding to the first sample image; performing convolution processing on the second sample image in the training sample through the second convolution network to obtain a second vector; and then, coding and mapping the second vector through the second fully-connected network to obtain a human face key point prediction value corresponding to the second sample image.
In the implementation of the present application, the processing of the first sample image and the processing of the second sample image are performed synchronously through two network tasks. The following describes the encoding and mapping process for the first sample image and the second sample image in the training sample, respectively.
As shown in FIG. 2, the first convolution network 210 includes a plurality of convolution layers, which convolve the input image (denoted as P_L_i) and extract a feature vector, e.g. denoted as a first vector e_L_i; thereafter, the first fully connected network 220 flattens and maps the first vector e_L_i to obtain the face key point prediction value corresponding to the input image P_L_i, for example, the prediction values of 81 face key points, expressed as (x_L_i_0, x_L_i_1, …, x_L_i_80), (y_L_i_0, y_L_i_1, …, y_L_i_80).
Similarly, the second convolution network 230 includes a plurality of convolution layers, which convolve the input image (denoted as P_R_i) and extract a feature vector, e.g. denoted as a second vector e_R_i; thereafter, the second fully connected network 240 flattens and maps the second vector e_R_i to obtain the face key point prediction value corresponding to the input image P_R_i, for example, the prediction values of 81 face key points, expressed as (x_R_i_0, x_R_i_1, …, x_R_i_80), (y_R_i_0, y_R_i_1, …, y_R_i_80).
In some embodiments of the present application, the real value of the depth value is determined according to the face key points in the first sample image and the second sample image in the sample data of the training sample, and performing operation processing on the first sample image and the second sample image through the third task network to obtain the depth value predicted value of the training sample includes: performing convolution processing on the first vector and the second vector through the residual network to obtain a third vector; and encoding and mapping the third vector through the depth regression network to obtain the depth value prediction values of the face key points corresponding to the training sample.
For specific implementation of determining the face key points in the first sample image and the second sample image, reference is made to the prior art, and details are not described in the embodiment of the present application. Furthermore, by adopting the method in the prior art, the real value of the depth value of the training sample can be determined according to the face key points in the first sample image and the second sample image and the calibration matrixes of the first image acquisition device and the second image acquisition device for acquiring the first sample image and the second sample image.
As shown in fig. 2, the first vector e_L_i and the second vector e_R_i are subjected to convolution processing through the residual network 250 to obtain a third vector; the third vector is then encoded and mapped through the depth regression network 260 to obtain the depth value prediction values of the face key points corresponding to the training sample, for example, the predicted depth values of 81 face key points, expressed as (z_i_0, z_i_1, …, z_i_80).
In some embodiments of the present application, the obtaining a human face living body category prediction value of the training sample by performing operation processing on the first sample image and the second sample image through the fourth task network includes: performing convolution processing on the first vector and the second vector through the residual error network to obtain a third vector; and coding and mapping the third vector through the classification network to obtain a human face living body category predicted value corresponding to the training sample.
As shown in fig. 2, the first vector e_L_i and the second vector e_R_i are subjected to convolution processing through the residual network 250 to obtain a third vector; the third vector is then encoded and mapped through the classification network 270 to obtain the face living body category prediction value corresponding to the training sample. In some embodiments of the present application, after encoding and mapping the third vector, the classification network 270 may output a two-dimensional vector, where each dimension of the two-dimensional vector is used to represent the probability that the training sample belongs to a different face living body category.
In some embodiments of the present application, determining the predicted loss values of the first task network, the second task network, the third task network, and the fourth task network according to the predicted values obtained by performing the encoding mapping operation includes: determining a prediction loss value of the first task network according to a difference value between the human face key point prediction value and a human face key point true value of a first sample image in all the training samples in the sample set; determining a prediction loss value of the second task network according to a difference value between the predicted value of the face key point and the true value of the face key point of a second sample image in all the training samples in the sample set; determining a prediction loss value of the third task network according to the difference value between the depth value prediction value and the depth value true value of all the training samples in the sample set; and determining a prediction loss value of the fourth task network according to the difference value between the human face living body type prediction value and the human face living body type true value of all the training samples in the sample set.
For example, according to the encoding mapping result of the first sample image in all training samples in the sample set, the prediction errors of the first convolution network 210 and the first fully-connected network 220 are calculated, that is, the prediction loss value of the first task network is also the first face keypoint prediction loss value. In some embodiments of the present application, the first face keypoint predicted loss value of the first task network may be calculated by the following formula:
L_landmark_left = (1/N) * Σ_{i=1}^{N} ||f(x_L_i) - y_L_i||^2 + λ * Σ_{j=1}^{n} ||w_j||^2
wherein L_landmark_left represents the prediction loss value of the first task network, x_L_i is the first sample image in the i-th training sample, f(x_L_i) represents the face key point prediction value of the first sample image in the i-th training sample, y_L_i represents the face key point real value of the first sample image in the i-th training sample, N represents the number of samples in the training sample set, λ represents the regularization weight applied to each weighted network layer, w_j is the network parameter of the j-th layer, and n represents the number of network layers with weights.
For another example, according to the encoding mapping result of the second sample image in all the training samples in the sample set, the prediction error of the second convolutional network 230 and the second fully-connected network 240 is calculated, that is, the prediction loss value of the second task network is also the second face key point prediction loss value. In some embodiments of the present application, the predicted loss value for the second task network may be calculated by the following formula:
L_landmark_right = (1/N) * Σ_{i=1}^{N} ||f(x_R_i) - y_R_i||^2 + λ * Σ_{j=1}^{n} ||w_j||^2
wherein L_landmark_right represents the prediction loss value of the second task network, x_R_i is the second sample image in the i-th training sample, f(x_R_i) represents the face key point prediction value of the second sample image in the i-th training sample, y_R_i represents the face key point real value of the second sample image in the i-th training sample, N represents the number of samples in the training sample set, λ represents the regularization weight applied to each weighted network layer, w_j is the network parameter of the j-th layer, and n represents the number of network layers with weights.
For another example, the prediction errors of the first convolution network 210, the second convolution network 230, the residual network 250, and the depth regression network 260, i.e., the prediction loss values of the third task network, are also depth value prediction loss values, are calculated according to the encoding mapping results of all the training samples in the sample set. In some embodiments of the present application, the predicted loss value of the third task network may be calculated by the following formula:
L_depth = (1/N) * Σ_{i=1}^{N} ||f(x_i) - y_i||^2 + λ * Σ_{j=1}^{n} ||w_j||^2
wherein L_depth represents the prediction loss value of the third task network, f(x_i) represents the depth value prediction values of the face key points in the i-th training sample, y_i represents the depth value real values of the face key points in the i-th training sample, N represents the number of samples in the training sample set, λ represents the regularization weight applied to each weighted network layer, w_j is the network parameter of the j-th layer, and n represents the number of network layers with weights.
For another example, the prediction errors of the first convolutional network 210, the second convolutional network 230, the residual network 250, and the classification network 270, that is, the prediction loss value of the fourth task network, are also the face living body class prediction loss values, are calculated according to the coding mapping results of all the training samples in the sample set. In some embodiments of the present application, the predicted loss value of the fourth task network may be calculated by the following formula:
L_face_liveness = -(1/N) * Σ_{i=1}^{N} [ y_i * log f(x_i) + (1 - y_i) * log(1 - f(x_i)) ] + λ * Σ_{j=1}^{n} ||w_j||^2
wherein L_face_liveness represents the prediction loss value of the fourth task network, f(x_i) represents the predicted face living body class of the i-th training sample, y_i represents the real face living body class of the i-th training sample, N represents the number of samples in the training sample set, λ represents the regularization weight applied to each weighted network layer, w_j is the network parameter of the j-th layer, and n represents the number of network layers with weights.
After the prediction loss value of each branch network is determined, the model prediction total loss value of the multi-task model is further calculated according to the prediction loss values of the branch networks. In some embodiments of the present application, the model prediction total loss value L_total of the multi-task model may be determined as follows:
L_total = λ1 * L_landmark_left + λ2 * L_landmark_right + λ3 * L_depth + λ4 * L_face_liveness; wherein the values of λ1, λ2, λ3 and λ4 can be set according to practical experience.
In the training process, the model prediction total loss value L_total of the multi-task model is adjusted by continuously optimizing the network parameters of each network included in the multi-task model, until the model prediction total loss value L_total satisfies a preset condition (e.g., the loss value L_total converges to be less than a preset value), at which point the training of the multi-task model is completed.
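Putting the four branch losses and the weighted total loss together, one optimization iteration over a mini-batch could look like the sketch below (it reuses the MultiTaskModel sketch given earlier; the mean-squared-error and cross-entropy choices, the optimizer, the λ1-λ4 values, and delegating the weight regularization term to the optimizer's weight_decay are all assumptions):

```python
import torch
import torch.nn.functional as F

model = MultiTaskModel()                        # sketch defined earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
lambdas = (1.0, 1.0, 1.0, 1.0)                  # λ1..λ4, set from practical experience

def train_step(img_l, img_r, kpt_l_gt, kpt_r_gt, depth_gt, label_gt):
    """One encoding-mapping + optimization step over a mini-batch.
    label_gt: class indices (0 = non-live, 1 = live, assumed ordering)."""
    kpt_l, kpt_r, depth, logits = model(img_l, img_r)
    loss_landmark_left = F.mse_loss(kpt_l, kpt_l_gt)     # first task network
    loss_landmark_right = F.mse_loss(kpt_r, kpt_r_gt)    # second task network
    loss_depth = F.mse_loss(depth, depth_gt)             # third task network
    loss_liveness = F.cross_entropy(logits, label_gt)    # fourth task network
    total = (lambdas[0] * loss_landmark_left + lambdas[1] * loss_landmark_right
             + lambdas[2] * loss_depth + lambdas[3] * loss_liveness)
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```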
In the prediction stage, the cut first face image to be detected and the cut second face image to be detected are input to a pre-trained living body detection model in parallel, and the target face is classified and mapped through the living body detection model according to the plane features and the depth features in the first face image to be detected and the second face image to be detected, including: carrying out convolution processing on the first face image to be detected through the first convolution network to obtain a fourth vector; carrying out convolution processing on the second face image to be detected through the second convolution network to obtain a fifth vector; performing convolution processing on the fourth vector and the fifth vector through the residual error network to obtain a sixth vector; and coding and mapping the sixth vector through the classification network to obtain the living human face category corresponding to the target human face.
Specifically, for the implementation of performing convolution processing on the first face image to be detected through the first convolution network to obtain the fourth vector, reference may be made to the training-stage processing in which the first sample image of the training sample is convolved by the first convolution network to obtain the first vector, which is not repeated here. For the implementation of performing convolution processing on the second face image to be detected through the second convolution network to obtain the fifth vector, reference may be made to the training-stage processing in which the second sample image of the training sample is convolved by the second convolution network to obtain the second vector, which is not repeated here. For the implementation of performing convolution processing on the fourth vector and the fifth vector through the residual network to obtain the sixth vector, reference may be made to the training-stage processing in which the first vector and the second vector are convolved by the residual network to obtain the third vector, which is not repeated here. For the implementation of encoding and mapping the sixth vector through the classification network to obtain the face living body category corresponding to the target face, reference may be made to the training-stage processing in which the third vector is encoded and mapped by the classification network to obtain the face living body category prediction value corresponding to the training sample, which is not repeated here.
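A sketch of this pruned prediction-stage forward pass, reusing the MultiTaskModel sketch given earlier (the module and function names are illustrative), is as follows:

```python
import torch

@torch.no_grad()
def liveness_forward(model, face_a, face_b):
    """Forward pass of the pruned liveness detection model:
    convolution branches -> residual network -> classification network only."""
    model.eval()
    e4 = model.conv_left(face_a)                           # fourth vector
    e5 = model.conv_right(face_b)                          # fifth vector
    e6 = model.residual(torch.cat([e4, e5], dim=1))        # sixth vector
    probs = torch.softmax(model.cls_head(e6), dim=1)       # per-class probabilities
    return probs  # probs[:, 1] taken as the live-face probability (assumed index)

# Step 150 decision: live if the live-face probability exceeds a preset threshold.
# is_live = liveness_forward(model, face_a, face_b)[0, 1] > 0.5
```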
And 150, determining whether the target face is a living face according to the classification mapping result.
The classification mapping result in the embodiment of the application comprises the probability that the input image is recognized as different human face living body classes. Further, when the probability that the input image is recognized as the living body face type is larger than a preset probability threshold value, the target face can be determined to be the living body face, and otherwise, the target face can be determined to be the non-living body face.
The method for detecting the living human face comprises the steps of acquiring a first human face image and a second human face image which are synchronously acquired by a first image acquisition device and a second image acquisition device aiming at a target human face; respectively carrying out face positioning on the first face image and the second face image to obtain corresponding face positioning results; then, respectively cutting a first face image to be detected from the first face image and cutting a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image; inputting the cut first face image to be detected and the cut second face image to be detected into a pre-trained living body detection model in parallel, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the first face image to be detected and the second face image to be detected; the living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample; and determining whether the target face is a living face according to the classification mapping result, which is beneficial to improving the speed of face living body detection.
According to the face living body detection method disclosed in the embodiment of the present application, in the training process of the living body detection model, the two face images acquired by the binocular image acquisition device are learned under the constraints of the face key points and of the depth information learning results, so that living body detection of the target face can be performed directly from the plane information and the depth information of the image pair acquired by the binocular image acquisition device, without generating a three-dimensional space point cloud; the calculation complexity is low, the operation speed is high, and the face living body detection efficiency is high.
In the model training process, the depth information of the image is considered, so that planar non-living faces such as photos and videos can be accurately classified in the prediction stage, and attacks with planar images can be rapidly detected. Because the face key point information is fully considered by the respective networks (see the training processes of the first task network and the second task network in fig. 2, i.e., of the first convolution network and the second convolution network), in the prediction stage the living body detection model can also detect attack faces such as a photo whose nose region has been cut out and bent to form a three-dimensional nose, a three-dimensional head model, a simulation mask, or a mask worn by a real person, further improving the accuracy of face living body detection.
Specifically, a first sample image and a second sample image (i.e., the images acquired by the two cameras of the binocular image acquisition device) are respectively input into the multi-task model shown in fig. 2. First, the first convolution network of the first task network and the second convolution network of the second task network respectively learn the face features of the first sample image and the second sample image; a fully connected layer is added behind each convolution network to regress the features, and the face key point constraint is imposed, so that each convolution network branch learns the face features in the input image. Then, the features extracted by the two convolution network branches are combined, and the depth features and the living body features are learned through several residual modules of the residual network. After that, two network branches are led out: one branch is connected to a fully connected layer and regresses the depth values of the feature points from the fused features; in the other branch, a convolution network is added, the convolution features are then stretched into a one-dimensional vector, the depth values are spliced onto this one-dimensional vector, and the living body classification result is obtained through two fully connected layers. The living body detection model trained with this structure and method carries the face key point constraint and the depth constraint corresponding to the key points, can learn face plane information (such as two-dimensional texture information) and depth information, and is favorable for improving the accuracy and reliability of living body detection.
On the other hand, the living body detection model in the embodiment of the present application is obtained by cutting a multi-task model: in the training stage the four branch networks are trained synchronously by combining plane information and depth information, while in the prediction stage only one branch network is used, so the network structure used in the prediction stage is simple and the operation efficiency is higher.
Example two
Corresponding to the method embodiment, another embodiment of the present application discloses a human face live detection device, as shown in fig. 4, the device includes:
a face image obtaining module 410, configured to obtain a first face image and a second face image that are synchronously collected by a first image collecting device and a second image collecting device for a target face;
a face positioning module 420, configured to perform face positioning on the first face image and the second face image respectively to obtain corresponding face positioning results;
a face image clipping module 430, configured to clip a first to-be-detected face image from the first face image and clip a second to-be-detected face image from the second face image according to the face positioning results in the first face image and the second face image, respectively;
the image classification module 440 is configured to input the cut first to-be-detected face image and the cut second to-be-detected face image to a pre-trained living body detection model in parallel, and perform classification mapping on the target face through the living body detection model according to a plane feature and a depth feature in the first to-be-detected face image and the second to-be-detected face image; the living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample;
and a face living body detection result determining module 450, configured to determine whether the target face is a living body face according to the classification mapping result.
In some embodiments of the present application, the living body detection model is obtained by cutting a preset multitask model, and the multitask model includes:
the system comprises a first task network, a second task network and a third task network, wherein the first task network is composed of a first convolution network and a first fully-connected network and is used for learning face key point features in images input to the first convolution network;
the second task network is composed of a second convolutional network and a second fully-connected network and is used for learning the human face key point characteristics in the image input to the second convolutional network;
a third task network composed of the first convolutional network, the second convolutional network, a residual network, and a depth regression network, the third task network being configured to learn depth features in images input to the first convolutional network and the second convolutional network; and the number of the first and second groups,
a fourth task network composed of the first convolutional network, the second convolutional network, the residual network, and a classification network, the fourth task network for learning living body and non-living body information in the image input to the first convolutional network and the second convolutional network;
obtaining network parameters of the living body detection model consisting of the first convolution network, the second convolution network, the residual error network and the classification network by training the multitask model;
the first convolution network and the second convolution network are arranged in parallel, and the residual error network is connected with the outputs of the first convolution network and the second convolution network respectively.
In some embodiments of the present application, the sample data for training each training sample of the multitask model comprises: a first sample image and a second sample image, the sample label of each of the training samples comprising: the first sample image and the second sample image each correspond to: the real values of the key points of the human face, the real values of the depth values and the real values of the living body classes of the human face;
the multitask model is trained by the following method:
for each training sample in the sample set, performing the following encoding mapping operations:
inputting a first sample image comprised by the training sample to the first convolutional network of the multitask model, while inputting a second sample image comprised by the training sample to the second convolutional network of the multitask model;
performing operation processing on the first sample image through the first task network to obtain a human face key point prediction value of the first sample image in the training sample; performing operation processing on the second sample image through the second task network to obtain a human face key point prediction value of the second sample image in the training sample;
performing operation processing on the first sample image and the second sample image through the third task network to obtain a depth value predicted value of the training sample;
performing operation processing on the first sample image and the second sample image through the fourth task network to obtain a human face living body category predicted value of the training sample;
determining prediction loss values of the first task network, the second task network, the third task network and the fourth task network according to prediction values obtained by executing the coding mapping operation;
carrying out weighted summation on the predicted loss values of the first task network, the second task network, the third task network and the fourth task network to determine a model predicted total loss value of the multitask model;
optimizing the network parameters of the multitask model, and skipping to executing the coding mapping operation until the model prediction total loss value converges to meet the preset condition.
In some embodiments of the present application, determining the predicted loss values of the first task network, the second task network, the third task network, and the fourth task network according to the predicted values obtained by performing the encoding mapping operation includes:
determining a prediction loss value of the first task network according to a difference value between the human face key point prediction value and a human face key point true value of a first sample image in all the training samples in the sample set;
determining a prediction loss value of the second task network according to a difference value between the predicted value of the face key point and the true value of the face key point of a second sample image in all the training samples in the sample set;
determining a prediction loss value of the third task network according to the difference value between the depth value prediction value and the depth value true value of all the training samples in the sample set;
and determining a prediction loss value of the fourth task network according to the difference value between the human face living body type prediction value and the human face living body type true value of all the training samples in the sample set.
In some embodiments of the present application, the step of performing operation processing on the first sample image through the first task network to obtain a face key point prediction value of the first sample image in the training sample, and performing operation processing on the second sample image through the second task network to obtain a face key point prediction value of the second sample image in the training sample includes:
performing convolution processing on the first sample image in the training sample through the first convolution network to obtain a first vector; then, coding and mapping the first vector through the first fully-connected network to obtain a face key point prediction value corresponding to the first sample image; and
performing convolution processing on the second sample image in the training sample through the second convolution network to obtain a second vector; and then, coding and mapping the second vector through the second fully-connected network to obtain a human face key point prediction value corresponding to the second sample image.
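A minimal sketch of one key point branch (the first convolution network plus the first fully-connected network; the second task network has the same structure) is given below. The channel widths, the pooled feature size and the number of key points are illustrative assumptions.

```python
# Sketch of one key point branch; layer sizes and key point count are assumptions.
import torch
from torch import nn

class KeypointBranch(nn.Module):
    def __init__(self, num_keypoints=68):
        super().__init__()
        # "Convolution network": convolves the sample image into a feature map (the "vector").
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # "Fully-connected network": encodes and maps the vector to the key points.
        self.fc = nn.Linear(128 * 4 * 4, num_keypoints * 2)

    def forward(self, image):
        feature = self.conv(image)                      # first (or second) vector
        keypoints = self.fc(torch.flatten(feature, 1))  # (x, y) prediction per key point
        return feature, keypoints
```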
In some embodiments of the present application, the depth value true value is determined according to the face key points in the first sample image and the second sample image in the sample data of the training sample, and the step of performing operation processing on the first sample image and the second sample image through the third task network to obtain the depth value predicted value of the training sample includes:
performing convolution processing on the first vector and the second vector through the residual error network to obtain a third vector;
and coding and mapping the third vector through the depth regression network to obtain a depth value predicted value of the face key point corresponding to the training sample.
In some embodiments of the application, the step of obtaining the human face living body category prediction value of the training sample by performing operation processing on the first sample image and the second sample image through the fourth task network includes:
performing convolution processing on the first vector and the second vector through the residual error network to obtain a third vector;
and coding and mapping the third vector through the classification network to obtain a human face living body category predicted value corresponding to the training sample.
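The following sketch illustrates the shared part of the third and fourth task networks: the residual network fuses the first vector and the second vector into a third vector, the depth regression network maps it to depth values at the face key points, and the classification network maps it to a living body class. The depth splicing used by the classification branch is sketched further below, after the architecture summary; all layer sizes here are assumptions.

```python
# Sketch of the fused residual trunk with depth and liveness heads; sizes are assumptions.
import torch
from torch import nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class FusionTrunk(nn.Module):
    def __init__(self, branch_channels=128, spatial=4, num_keypoints=68, num_classes=2):
        super().__init__()
        fused = 2 * branch_channels
        self.residual = nn.Sequential(ResidualBlock(fused), ResidualBlock(fused))  # residual network
        flat_dim = fused * spatial * spatial
        self.depth_head = nn.Linear(flat_dim, num_keypoints)  # depth regression network
        self.cls_head = nn.Sequential(                         # classification network
            nn.Linear(flat_dim, 128), nn.ReLU(inplace=True), nn.Linear(128, num_classes))

    def forward(self, feature1, feature2):
        third = self.residual(torch.cat([feature1, feature2], dim=1))  # third vector
        flat = torch.flatten(third, 1)
        depth_pred = self.depth_head(flat)  # depth value predicted at the key points
        cls_logits = self.cls_head(flat)    # living body class prediction
        return depth_pred, cls_logits
```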
The face living body detection device disclosed in the embodiment of the present application is used for implementing the face living body detection method described in the first embodiment of the present application, and specific implementation manners of each module of the device are not described again, and reference may be made to specific implementation manners of corresponding steps in the method embodiments.
The face living body detection device disclosed by the embodiment of the application acquires a first face image and a second face image which are synchronously acquired by a first image acquisition device and a second image acquisition device aiming at a target face; respectively carrying out face positioning on the first face image and the second face image to obtain corresponding face positioning results; then, respectively cutting a first face image to be detected from the first face image and cutting a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image; inputting the cut first face image to be detected and the cut second face image to be detected into a pre-trained living body detection model in parallel, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the first face image to be detected and the second face image to be detected; the living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample; and determining whether the target face is a living face according to the classification mapping result, which is beneficial to improving the speed of face living body detection.
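A minimal sketch of this prediction pipeline is given below. The face detector, the input size, the decision threshold and the trained liveness_model are placeholders assumed only to illustrate the flow, not components specified by this disclosure.

```python
# Sketch of the binocular liveness prediction flow; detect_face and liveness_model are placeholders.
import cv2
import numpy as np
import torch

def is_live_face(frame1, frame2, liveness_model, detect_face, size=112, threshold=0.5):
    # Face positioning on each synchronously captured image.
    box1 = detect_face(frame1)
    box2 = detect_face(frame2)
    if box1 is None or box2 is None:
        return False

    # Crop the face regions to be detected and resize them to the model input.
    def crop(frame, box):
        x, y, w, h = box
        face = frame[y:y + h, x:x + w]
        face = cv2.resize(face, (size, size)).astype(np.float32) / 255.0
        return torch.from_numpy(face).permute(2, 0, 1).unsqueeze(0)  # 1 x C x H x W

    face1, face2 = crop(frame1, box1), crop(frame2, box2)

    # Feed the two cropped face images into the liveness detection model in parallel.
    with torch.no_grad():
        logits = liveness_model(face1, face2)
        live_prob = torch.softmax(logits, dim=1)[0, 1].item()

    # Classification mapping result -> living / non-living decision.
    return live_prob >= threshold
```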
In the training process of the living body detection model, the face living body detection device disclosed in the embodiment of the application further learns living body and non-living body face features for the two face images acquired by the binocular image acquisition device, under the constraint of the face key points and the depth information learning results; therefore, living body detection of the target face can be performed directly from the plane information and the depth information of the image pair acquired by the binocular image acquisition device, without generating a three-dimensional space point cloud, so that the calculation complexity is low, the operation speed is high, and the face living body detection efficiency is high.
In the model training process, the depth information of the images is taken into account, so that planar non-living faces such as photos and videos can be accurately classified in the prediction stage, and attacks using planar images can be rapidly detected. Because the face key point information is fully considered by the respective networks (for example, in the training process of the first task network and the second task network in fig. 2, and the training process of the first convolution network and the second convolution network), in the prediction stage the living body detection model can also detect attack faces such as a photo bent so that the nose region protrudes, a three-dimensional head model, a simulated mask, or a photo mask with the nose region cut out worn by a real person, which further improves the accuracy of face living body detection.
Specifically, a first sample image and a second sample image (i.e. the images captured by the two cameras of the binocular image acquisition device) are respectively input into the multitask model shown in fig. 2. First, the first convolution network of the first task network and the second convolution network of the second task network respectively learn the face features of the first sample image and the second sample image; a fully-connected layer is added behind each convolution network to regress the features, and face key point constraints are imposed, so that each convolutional branch learns the face features in its input image. Then, the features extracted by the two convolutional branches are combined, and the depth features and the living body features are learned through several residual modules of the residual network. After that, two network branches are led out: one branch is connected to a fully-connected layer and regresses the depth values of the feature points from the fused features; the other branch adds a convolution network, stretches the convolution features into a one-dimensional vector, splices the regressed depth values onto this vector, and obtains the living body classification result through two fully-connected layers. The living body detection model trained with this structure and method has face key point constraints and depth constraints corresponding to the key points, can learn face plane information (such as two-dimensional texture information) as well as depth information, and is therefore favorable for improving the accuracy and reliability of living body detection.
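The classification branch detail described above (flattening the convolution features, splicing the regressed depth values onto the one-dimensional vector, and classifying with two fully-connected layers) can be sketched as follows; the layer sizes and the number of key points are assumptions.

```python
# Sketch of the classification branch with depth splicing; sizes are assumptions.
import torch
from torch import nn

class LivenessHead(nn.Module):
    def __init__(self, fused_channels=256, spatial=4, num_keypoints=68, num_classes=2):
        super().__init__()
        self.conv = nn.Conv2d(fused_channels, 64, 3, padding=1)  # extra convolution on the fused features
        self.fc = nn.Sequential(
            nn.Linear(64 * spatial * spatial + num_keypoints, 128), nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),                          # two fully-connected layers
        )

    def forward(self, fused_feature, keypoint_depths):
        flat = torch.flatten(self.conv(fused_feature), 1)  # stretch into a one-dimensional vector
        joint = torch.cat([flat, keypoint_depths], dim=1)  # splice the regressed depth values onto it
        return self.fc(joint)                              # living body classification logits
```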
On the other hand, the living body detection model in the embodiment of the application is obtained by pruning a multitask model: the four branch networks are trained jointly in the training stage by combining plane information and depth information, while only one branch network is used in the prediction stage, so the network structure used in the prediction stage is simple and the operation efficiency is higher.
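A sketch of the prediction-time path after such pruning is given below: only the two branch convolution networks, the residual network and the classification path are executed, while the key point fully-connected heads used for training supervision are dropped. The module arguments are hypothetical placeholders, not names from this disclosure.

```python
# Sketch of the pruned prediction path; all module arguments are hypothetical.
import torch

@torch.no_grad()
def liveness_forward(conv1, conv2, residual, cls_head, image1, image2):
    # conv1 / conv2: the retained first and second convolution networks.
    feat1 = conv1(image1)
    feat2 = conv2(image2)
    # The residual network fuses the two feature maps.
    fused = residual(torch.cat([feat1, feat2], dim=1))
    # cls_head stands for the entire classification path built on the fused
    # features (including any depth splicing); it returns class logits.
    return cls_head(fused)
```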
Correspondingly, the application also discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the face living body detection method according to the first embodiment of the application when executing the computer program. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer or the like.
The present application also discloses a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the face living body detection method according to the first embodiment of the present application are implemented.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The face living body detection method and device provided by the application are described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the above embodiments is only intended to help understand the method and the core idea of the application; meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation to the present application.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Claims (10)

1. A face living body detection method is characterized by comprising the following steps:
acquiring a first face image and a second face image which are synchronously acquired by a first image acquisition device and a second image acquisition device aiming at a target face;
respectively carrying out face positioning on the first face image and the second face image to obtain corresponding face positioning results;
respectively cutting a first face image to be detected from the first face image and cutting a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image;
inputting the cut first face image to be detected and the cut second face image to be detected into a pre-trained living body detection model in parallel, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the first face image to be detected and the second face image to be detected; the living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample;
and determining whether the target face is a living face according to the classification mapping result.
2. The method of claim 1, wherein the in-vivo detection model is tailored from a pre-defined multitask model, the multitask model comprising:
the system comprises a first task network, a second task network and a third task network, wherein the first task network is composed of a first convolution network and a first fully-connected network and is used for learning face key point features in images input to the first convolution network;
the second task network is composed of a second convolutional network and a second fully-connected network and is used for learning the human face key point characteristics in the image input to the second convolutional network;
a third task network composed of the first convolutional network, the second convolutional network, a residual network, and a depth regression network, the third task network being configured to learn depth features in images input to the first convolutional network and the second convolutional network; and the number of the first and second groups,
a fourth task network composed of the first convolutional network, the second convolutional network, the residual network, and a classification network, the fourth task network for learning living body and non-living body information in the image input to the first convolutional network and the second convolutional network;
obtaining network parameters of the living body detection model consisting of the first convolution network, the second convolution network, the residual error network and the classification network by training the multitask model;
the first convolution network and the second convolution network are arranged in parallel, and the residual error network is connected with the outputs of the first convolution network and the second convolution network respectively.
3. The method of claim 2, wherein the sample data of each training sample used for training the multitask model comprises a first sample image and a second sample image, and the sample label of each training sample comprises the face key point true values, the depth value true values and the face living body class true values corresponding to each of the first sample image and the second sample image;
the multitask model is trained by the following method:
for each training sample in the sample set, performing the following encoding mapping operations:
inputting a first sample image comprised by the training sample to the first convolutional network of the multitask model, while inputting a second sample image comprised by the training sample to the second convolutional network of the multitask model;
performing operation processing on the first sample image through the first task network to obtain a human face key point prediction value of the first sample image in the training sample; performing operation processing on the second sample image through the second task network to obtain a human face key point prediction value of the second sample image in the training sample;
performing operation processing on the first sample image and the second sample image through the third task network to obtain a depth value predicted value of the training sample;
performing operation processing on the first sample image and the second sample image through the fourth task network to obtain a human face living body category predicted value of the training sample;
determining prediction loss values of the first task network, the second task network, the third task network and the fourth task network according to prediction values obtained by executing the coding mapping operation;
carrying out weighted summation on the predicted loss values of the first task network, the second task network, the third task network and the fourth task network to determine a model predicted total loss value of the multitask model;
optimizing the network parameters of the multitask model, and returning to perform the coding mapping operation until the model prediction total loss value converges to meet a preset condition.
4. The method of claim 3, wherein determining predicted loss values for the first task network, the second task network, the third task network, and the fourth task network based on predicted values from performing the code mapping operation comprises:
determining a prediction loss value of the first task network according to a difference value between the human face key point prediction value and a human face key point true value of a first sample image in all the training samples in the sample set;
determining a prediction loss value of the second task network according to a difference value between the predicted value of the face key point and the true value of the face key point of a second sample image in all the training samples in the sample set;
determining a prediction loss value of the third task network according to the difference value between the depth value prediction value and the depth value true value of all the training samples in the sample set;
and determining a prediction loss value of the fourth task network according to the difference value between the human face living body type prediction value and the human face living body type true value of all the training samples in the sample set.
5. The method according to claim 3, wherein the step of performing operation processing on the first sample image through the first task network to obtain a face key point prediction value of the first sample image in the training sample, and performing operation processing on the second sample image through the second task network to obtain a face key point prediction value of the second sample image in the training sample comprises:
performing convolution processing on the first sample image in the training sample through the first convolution network to obtain a first vector; then, coding and mapping the first vector through the first fully-connected network to obtain a face key point predicted value corresponding to the first sample image; and
performing convolution processing on the second sample image in the training sample through the second convolution network to obtain a second vector; and then, coding and mapping the second vector through the second fully-connected network to obtain a human face key point prediction value corresponding to the second sample image.
6. The method according to claim 5, wherein the depth value true value is determined according to the face key points in the first sample image and the second sample image in the sample data of the training sample, and the step of performing operation processing on the first sample image and the second sample image through the third task network to obtain the depth value predicted value of the training sample comprises:
performing convolution processing on the first vector and the second vector through the residual error network to obtain a third vector;
and coding and mapping the third vector through the depth regression network to obtain a depth value predicted value of the face key point corresponding to the training sample.
7. The method according to claim 5, wherein the step of performing operation processing on the first sample image and the second sample image through the fourth task network to obtain the face living body class prediction value of the training sample comprises:
performing convolution processing on the first vector and the second vector through the residual error network to obtain a third vector;
and coding and mapping the third vector through the classification network to obtain a human face living body category predicted value corresponding to the training sample.
8. A face living body detection device, characterized in that the device comprises:
the face image acquisition module is used for acquiring a first face image and a second face image which are synchronously acquired by a first image acquisition device and a second image acquisition device aiming at a target face;
the face positioning module is used for respectively carrying out face positioning on the first face image and the second face image to obtain corresponding face positioning results;
the face image cutting module is used for cutting a first face image to be detected from the first face image and cutting a second face image to be detected from the second face image according to the face positioning results in the first face image and the second face image respectively;
the image classification module is used for inputting the cut first face image to be detected and the cut second face image to be detected into a pre-trained living body detection model in parallel, and performing classification mapping on the target face through the living body detection model according to the plane features and the depth features in the first face image to be detected and the second face image to be detected; the living body detection model is a classification model trained on face key point constraint and depth feature constraint of a training sample;
and the face living body detection result determining module is used for determining whether the target face is a living body face according to the classification mapping result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the face living body detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the face living body detection method according to any one of claims 1 to 7.
CN202011063444.1A 2020-09-30 2020-09-30 Face living body detection method and device, electronic equipment and storage medium Active CN112200057B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011063444.1A CN112200057B (en) 2020-09-30 2020-09-30 Face living body detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112200057A true CN112200057A (en) 2021-01-08
CN112200057B CN112200057B (en) 2023-10-31

Family

ID=74012933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011063444.1A Active CN112200057B (en) 2020-09-30 2020-09-30 Face living body detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112200057B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764091A (en) * 2018-05-18 2018-11-06 北京市商汤科技开发有限公司 Biopsy method and device, electronic equipment and storage medium
WO2020125623A1 (en) * 2018-12-20 2020-06-25 上海瑾盛通信科技有限公司 Method and device for live body detection, storage medium, and electronic device
CN111444744A (en) * 2018-12-29 2020-07-24 北京市商汤科技开发有限公司 Living body detection method, living body detection device, and storage medium
CN110942032A (en) * 2019-11-27 2020-03-31 深圳市商汤科技有限公司 Living body detection method and device, and storage medium
CN111046845A (en) * 2019-12-25 2020-04-21 上海骏聿数码科技有限公司 Living body detection method, device and system
CN111680588A (en) * 2020-05-26 2020-09-18 广州多益网络股份有限公司 Human face gate living body detection method based on visible light and infrared light

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MANPREET BAGGA et al.: "Spoofing detection in face recognition: A review", 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) *
LI Rui et al.: "Face recognition based on three-dimensional face depth images reconstructed from two-dimensional texture", Modern Computer (Professional Edition), no. 10 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052034A (en) * 2021-03-15 2021-06-29 上海商汤智能科技有限公司 Living body detection method based on binocular camera and related device
CN113052035A (en) * 2021-03-15 2021-06-29 上海商汤智能科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112926489A (en) * 2021-03-17 2021-06-08 北京市商汤科技开发有限公司 Living body detection method, living body detection device, living body detection equipment, living body detection medium, living body detection system and transportation means
CN113128428A (en) * 2021-04-24 2021-07-16 新疆爱华盈通信息技术有限公司 Depth map prediction-based in vivo detection method and related equipment
CN113128429A (en) * 2021-04-24 2021-07-16 新疆爱华盈通信息技术有限公司 Stereo vision based living body detection method and related equipment
WO2023098128A1 (en) * 2021-12-01 2023-06-08 马上消费金融股份有限公司 Living body detection method and apparatus, and training method and apparatus for living body detection system
CN116844198A (en) * 2023-05-24 2023-10-03 北京优创新港科技股份有限公司 Method and system for detecting face attack
CN116844198B (en) * 2023-05-24 2024-03-19 北京优创新港科技股份有限公司 Method and system for detecting face attack

Also Published As

Publication number Publication date
CN112200057B (en) 2023-10-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant