WO2020000908A1

WO2020000908A1 - Method and device for face liveness detection

Info

Publication number: WO2020000908A1
Application number: PCT/CN2018/119758
Authority: WO
Inventors: 彭菲; 黄磊; 刘昌平
Original assignee: 汉王科技股份有限公司
Priority date: 2018-06-29
Filing date: 2018-12-07
Publication date: 2020-01-02
Also published as: CN108549886A

Abstract

The present invention relates to the technical field of facial recognition and addresses the problem in the prior art of low efficiency and accuracy in face liveness detection. A method face liveness detection, comprising: acquiring a color image and a depth image of a target to be identified (11), determining a normalized facial image corresponding to the color image and the depth image respectively (12), determining a correlation feature between the color image and the depth image by performing a correlation analysis between the normalized facial image corresponding to the color image and the normalized facial image corresponding to the depth image, and determining a depth consistency feature of the depth image by performing a depth consistency analysis on the normalized facial image corresponding to the depth image (13), and according to the correlation feature and the depth consistency feature, performing face liveness detection on the target to be identified (14). The method determines face liveness by using color information in combination with spatial information in an image of a target to be identified, improving the accuracy of face liveness detection.

Description

Human face live body detection method and device

Technical field

The present invention relates to the technical field of face recognition, and in particular, to a method and a device for detecting the living body of a face.

Background technique

Face recognition technology is more and more widely used in biometric identification devices such as time and attendance machines, access control systems, and electronic payment systems, which greatly facilitates people's daily lives.

However, with the widespread application of face recognition technology, the importance of face attack detection has become increasingly prominent. Common face attack methods include face recognition by using fake face images, face videos, or face molds to impersonate real faces.

Normally, face attack detection can be performed by performing face live detection on the image to be recognized. In the prior art, commonly used facial live detection methods include facial live detection based on motion information, facial feature detection based on texture feature analysis in photos collected under natural light conditions of the face, and combined voice information and facial image features. Perform face live detection.

The applicant found in the research on the prior art that face live body detection based on motion information and other information combined with speech takes a long time to collect features, and the detection efficiency is low; face live body detection based on texture features, in Poor results on HD face images.

In summary, the face live body detection method in the prior art needs to be improved.

Summary of the invention

The embodiment of the present invention aims to provide a face live detection method, which can efficiently and accurately perform face live detection.

According to an aspect of the present invention, an embodiment of the present invention provides a method for detecting a living body of a face, including:

Obtaining color and depth images of the target to be detected;

Respectively determining a normalized face image corresponding to the color image and the depth image;

Determining correlation characteristics between the color image and the depth image by performing correlation analysis on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image; and,

Determining a depth consistency feature of the depth image by performing a depth consistency analysis on a normalized face image corresponding to the depth image;

Performing face live detection on the target to be detected according to the correlation feature and the depth consistency feature.

Optionally, by performing correlation analysis on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image, determining the association between the color image and the depth image Sexual characteristics, including:

Denoise processing is performed on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image by using a skin color model, and to determine the values in the normalized face image corresponding to the color image. A trusted pixel point and a trusted pixel point in a normalized face image corresponding to the depth image;

Determining a grayscale face image of a normalized face image corresponding to the color image;

Determining a first gray histogram of the grayed face image based on trusted pixels in the normalized face image corresponding to the color image; and based on the normalized person corresponding to the depth image A trusted pixel in a face image to determine a second grayscale histogram of the depth image;

A correlation feature of the color image and the depth image is determined by performing correlation analysis on the first gray histogram and the second gray histogram.

Optionally, the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image are denoised by using a skin color model to determine the normalization corresponding to the color image, respectively. The trusted pixel points in the normalized face image and the normalized pixel points in the normalized face image corresponding to the depth image include:

Determining every two pixel points with the same pixel coordinates in the normalized face image corresponding to the color image and the depth image as a pair of pixel points;

For each pair of pixels, it is determined that the pixel value of the pixel corresponding to the color image belongs to the skin color range defined by the skin color model, and when the pixel value of the pixel corresponding to the depth image satisfies a preset effective depth value condition , Each pixel in the pair of pixels is marked as a trusted pixel.

Optionally, determining depth consistency characteristics of the depth image by performing depth consistency analysis on a normalized face image corresponding to the depth image includes:

Divide the normalized face image corresponding to the depth image into N * M sub-regions, where N and M are integers greater than or equal to 3;

Determining a histogram of each of the sub-regions according to pixels whose pixel values in each of the sub-regions of the depth image satisfy a predefined effective depth value condition;

By calculating the cross entropy or divergence of any two of the histograms, a depth consistency feature of the depth image is determined.

Optionally, performing face live detection on the target to be detected according to the correlation feature and the depth consistency feature includes:

Classify and identify the correlation feature through a first kernel function, determine a first recognition result, and classify and identify the deep consistency feature through a second kernel function, and determine a second recognition result;

By performing weighted fusion on the first recognition result and the second recognition result, a result of performing face live detection on the target to be detected is determined.

Optionally, determining the normalized face image corresponding to the color image and the depth image separately includes:

Extracting a face region image in the color image and the depth image respectively through an oval template;

Normalizing the face region image in the color image and the face region image in the depth image, respectively, to obtain a normalized face image corresponding to the color image and a normalization corresponding to the depth image A face image.

Optionally, before the step of separately determining a normalized face image corresponding to the color image and the depth image, the method includes:

Pixel-aligning the color image and the depth image.

According to another aspect of the present invention, an embodiment of the present invention further provides a face live detection device, including:

An image acquisition module, configured to acquire a color image and a depth image of a target to be detected;

A normalization module, configured to respectively determine normalized face images corresponding to the color image and the depth image;

A first feature determination module, configured to determine the color image and the depth image by performing correlation analysis on a normalized face image corresponding to the color image and a normalized face image corresponding to the depth image Related characteristics; and

A second feature determination module, configured to determine a depth consistency feature of the depth image by performing a depth consistency analysis on a normalized face image corresponding to the depth image;

The living body detection module is configured to perform face live detection on the target to be detected according to the correlation features determined by the first feature determination module and the depth consistency features determined by the second feature determination module.

Optionally, by performing correlation analysis on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image, the correlation feature of the color image and the depth image is determined When the first feature determining module is configured to:

Optionally, the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image are denoised by using a skin color model to determine the normalized person corresponding to the color image, respectively. When the trusted pixels in the face image and the trusted pixels in the normalized face image corresponding to the depth image, the first feature determination module is configured to:

For each pair of pixels, it is determined that the pixel value of the pixel corresponding to the color image belongs to the skin color range defined by the skin color model, and when the pixel value of the pixel corresponding to the depth image meets the preset effective depth value condition , Each pixel in the pair of pixels is marked as a trusted pixel.

Optionally, when a depth consistency analysis is performed on a normalized face image corresponding to the depth image to determine a depth consistency feature of the depth image, the second feature determination module is configured to:

Optionally, when performing face live detection on the target to be detected according to the correlation feature and the depth consistency feature, the live detection module is configured to:

Optionally, when determining a normalized face image corresponding to the color image and the depth image, the normalization module is configured to:

According to another aspect of the present invention, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the processor. The computer program implements the face live body detection method according to the embodiment of the present invention.

According to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the live face detection according to the embodiment of the present invention step.

In this way, the face living body detection method disclosed in the embodiment of the present invention determines a normalized face image corresponding to the color image and the depth image by acquiring a color image and a depth image of a target to be detected; Performing a correlation analysis on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image to determine the correlation feature between the color image and the depth image; and Depth consistency analysis is performed on the normalized face image corresponding to the image to determine the depth consistency feature of the depth image; according to the correlation feature and the depth consistency feature, perform face live on the target to be detected The detection solves the problems of low efficiency and low accuracy of face living body detection in the prior art. Regardless of whether the color image and the depth image required by the method for detecting a living body of a face disclosed in the embodiments of the present invention are acquired at the same time or not, the time of image acquisition can be reduced, and the efficiency of detecting a living body of a face is improved. At the same time, by combining color information and spatial information in the image of the target to be detected, human body live detection is performed on the target to be detected, thereby improving the accuracy of the living body detection.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments of the present invention will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without paying creative labor.

FIG. 1 is a flowchart of a face live detection method according to the first embodiment of the present invention; FIG.

2a and 2b are schematic diagrams of a color image and a depth image obtained in Embodiment 1 of the present invention;

3a and 3b are schematic diagrams of a normalized face image determined in Embodiment 1 of the present invention;

4 is a schematic diagram of pixels at the same position in two normalized face images in Embodiment 1 of the present invention;

FIG. 5 is a schematic diagram of subregion division of a normalized face image corresponding to a depth image in Embodiment 1 of the present invention; FIG.

FIG. 6 is one of the schematic structural diagrams of a living body detection device for a face according to Embodiment 2 of the present invention.

detailed description

In the following, the technical solutions in the embodiments of the present invention will be clearly and completely described with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are part of the present invention, but not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

Embodiment one:

This embodiment provides a method for detecting a living body of a human face. As shown in FIG. 1, the method includes steps 11 to 14.

Step 11: Obtain a color image and a depth image of the target to be detected.

In some embodiments of the present invention, two images of a target to be detected are collected simultaneously by an image acquisition device provided with a natural light camera and a depth camera, or the face information such as a face posture is maintained through the image acquisition device. The natural light camera and the depth camera successively acquire two images of a target to be detected.

For example, a color image of a target to be detected is collected by a natural light camera, and a depth image of a target to be detected is collected by a depth camera. Wherein, the placement positions of the natural light camera and the depth camera on the image acquisition device are close, so as to collect images of the target to be detected from similar positions and angles, respectively.

In some embodiments of the present invention, a pair of RGB-D images (color-depth images) can be taken with a Kinect device to be detected, which contains a color image (as shown in Figure 2a) and a "2.5D depth Image (as shown in Figure 2b) "or" pseudo-depth image ".

In some embodiments of the present invention, before the determining the normalized face image corresponding to the color image and the depth image, respectively, the method includes: pixel-aligning the color image and the depth image.

In devices such as Kinect, there is a certain physical position difference between the two sensors that capture color images and pseudo-depth images, so we need to perform binocular image calibration on the original RGB-D picture using camera-related parameters. Real depth images require special hardware equipment (such as laser equipment) or depth reconstruction algorithms for calculation, where the pixel value of each pixel is the specific depth information. The “false depth image” or “2.5D depth image” in the embodiment of the present invention refers to a depth image obtained by a structured light camera. The depth image described in the embodiment of the present invention contains fewer image details, and the pixel value of each pixel does not refer to specific depth information, but is only a representation of the depth relationship between pixels. In this embodiment, the acquired depth image is a converted gray image.

In other embodiments of the present invention, if a set of image depth information is collected by a depth image acquisition device, the depth information needs to be mapped to a gray value to obtain a depth image in a gray image format.

Step 12: Determine a normalized face image corresponding to the color image and the depth image, respectively.

For the obtained color image and depth image, it is necessary to further perform face area image extraction and normalization in order to perform face live detection subsequently.

For example, the position of the human eye can be determined first by using a face detection algorithm; then, the facial area image is extracted from the color image and the depth image by using a geometric template such as an oval template, a circular template, or a rectangular template; The face region image extracted from the image and the face region image extracted from the depth image are normalized and normalized to a uniform size to obtain a normalized face image and the depth corresponding to the color image. The normalized face image corresponding to the image.

In some preferred embodiments of the present invention, respectively determining a normalized face image corresponding to the color image and the depth image includes: extracting a face region image in the color image and the depth image respectively through an oval template, and Normalize the face region image in the color image and the face region image in the depth image to obtain the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image.

For example, in the corrected RGB-D image, the Viola-Jones cascade face detector provided by OpenCV, or other face detection algorithms are used to locate the face area of the collected color image and depth image.

Further, in order to avoid the potential influence of the area around the face on the texture correlation analysis, according to the position information of the face and eyes determined during the face area positioning, an ellipse template is used to process the input color image and depth image. Cropping, extracting an image of a face area in the color image (as shown in FIG. 3a) and an image of a face area in the depth image (as shown in FIG. 3b).

Because the equipment of the obtained color image and depth image is different, in order to ensure the consistency of image processing, further, the image of the face area in the extracted color image and the image of the face area in the depth image are separately A normalization process is performed to obtain a normalized face image corresponding to the color image of a uniform size and a normalized face image corresponding to the depth image. For the technical solution of normalizing the oval face image, reference may be made to the technical solution of normalizing the rectangular face image in the prior art, which is not described in this embodiment.

Step 13: Perform correlation analysis on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image to determine the correlation characteristics of the color image and the depth image, and The normalized face image is analyzed for depth consistency to determine the depth consistency characteristics of the depth image.

In the actual application process, the applicant found that most face forgery attacks use photos or screens as the attack vector. Although the color image texture information of the forged face is close to the real face, the depth image is similar to the real user depth map. The obvious difference is, therefore, an effective living body detection clue can be obtained by exploring the correlation characteristics between the color image and the depth image of the face area.

However, in addition to common screens or photos, attack vectors such as face masks or head models are also one of the challenges that live detection systems will face. The depth images of masks forged faces are similar to real faces, so you cannot simply apply Detection of fake faces in photos or screens.

After further research, the applicant found that although the face mask can simulate real users from both color and depth images, the size of the mask is fixed at the time of production and has nothing to do with the face size of the wearer. The fixed size of the face mask will make the correlation between the color map and the depth map in some areas of the fake face appear significantly different, especially at the edge where the mask fits with the real face. This phenomenon will be more obvious.

Therefore, in the embodiment of the present invention, based on the imaging characteristics of the human face skin in the color image and the depth image, the potential correlation between color information and spatial information is analyzed.

In some embodiments of the present invention, the correlation characteristics of the color image and the depth image may be determined by performing correlation analysis on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image. A depth consistency analysis is performed on the normalized face image corresponding to the depth image to determine the depth consistency feature of the depth image, and then the living body detection of the face is performed by combining the determined correlation feature and the depth consistency feature.

In some embodiments of the present invention, the consistency characteristics of the color image and the depth image are determined by performing consistency analysis on the normalized face images corresponding to the color image and the depth image, including: sub-steps S1 to S5.

Sub-step S1: performing a denoising process on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image through the skin color model, and determining the normalized face image corresponding to the color image respectively. The trusted pixels and the trusted pixels in the normalized face image corresponding to the depth image.

The sizes of real faces are different. The normalized face image corresponding to the color image and the normalized face image corresponding to the depth image obtained through the foregoing steps may include many non-face skins. Some areas, such as background areas and hair, are significantly different from human skin in terms of imaging characteristics, which will directly affect subsequent correlation analysis.

Therefore, in some embodiments of the present invention, a predefined skin color model is used to consider removing these non-skin pixels that may cause interference. The skin color model uses YCbCr color space to cluster skin colors in a light-independent chromaticity plane, so that the skin color model can be applied to various environments such as different light and different skin colors. For the modeling method of the skin color model, refer to the prior art, which is not repeated in the embodiment of the present invention.

In the normalized face image, not only the interference of non-skin pixels in the color image, but the structured light depth camera is limited by its own imaging principle. There may also be some defects or blind spots in the captured depth image, that is, some pixels Corresponding depth information cannot be recovered smoothly through structured light, and some pixels in which depth values do not exist are formed in the depth image. In order to improve the reliability and stability of the correlation analysis between the color image and the depth image, before the subsequent analysis, the interference of these non-skin pixels and pixels where the depth value does not exist needs to be excluded.

In some embodiments of the present invention, the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image are denoised by using a skin color model to determine the normalized face corresponding to the color image, respectively. The trusted pixels in the normalized face image corresponding to the trusted pixels in the image and the depth image include: every two normalized face images in which the color image and the depth image correspond to each other have the same pixel coordinates Pixels are determined as a pair of pixels. For each pair of pixels, it is determined that the pixel value of the pixel corresponding to the color image belongs to the skin color range defined by the skin color model, and the pixel value of the pixel corresponding to the depth image When the preset effective depth value condition is satisfied, each pixel in the pair of pixels is marked as a trusted pixel respectively.

When determining each pair of pixel points with the same pixel coordinates in the normalized face image corresponding to the color image and the depth image respectively, the normalized face image corresponding to the color image may be determined first. The first pixel point of the selected pixel coordinate position in the image, and then determine the second pixel point of the selected pixel coordinate position in the normalized face image corresponding to the depth image, and finally, the first pixel point and the second pixel point are The pixels are determined as a pair of pixels.

For example, for the pixel point D1 in the normalized face image corresponding to the color image and the pixel point D2 in the normalized face image corresponding to the depth image, as shown in FIG. 4, the pixel point D1 and the pixel point D2 correspond to to be detected The same imaging position of the target, that is, the pixel position of pixel point D1 in the normalized face image corresponding to the color image and the pixel position of pixel point D2 in the normalized face image corresponding to the depth image, if and only Pixels D1 and D2 can be regarded as trusted pixels when the following two conditions are met: First, the pixel value of pixel D1 belongs to the skin color range defined by the skin color model; The second condition, the pixel value of the pixel point D2 satisfies a pre-defined effective depth value condition. The predefined effective depth value condition may be that the pixel value is not equal to 255. Due to the defects of the structured light camera, when collecting depth information, some pixels cannot obtain depth information, which may appear as NaN or 255 in the data, and after mapping to the depth image, it corresponds to the brightest white in the depth image. If the pixel value of a pixel in the depth image is not white, the depth value is considered valid, that is, the pixel is a trusted pixel.

Sub-step S2, determining a grayscale face image of the normalized face image corresponding to the color image.

In specific implementation, the normalized face image corresponding to the color image may be subjected to graying processing to obtain the grayed face image of the normalized face image corresponding to the color image. Alternatively, after obtaining the above-mentioned color image, first perform grayscale processing on the obtained color image, and then perform face area image extraction and normalization processing on the grayscale processed color image through an ellipse template to obtain A normalized face image corresponding to a color image.

Sub-step S3: determining a first gray-scale histogram of the gray-scaled face image based on the trusted pixel points in the normalized face image corresponding to the color image.

Because the depth image is less affected by light, in the case of correlation analysis in combination with the depth image, simple texture information can be extracted from the color image. When the present invention is specifically implemented, the grayscale of the color face image can be extracted Histograms are used for correlation analysis to improve computing efficiency and are highly versatile.

In specific implementation, when extracting a gray histogram of a color face image, only the gray distribution of trusted pixels in the normalized face image corresponding to the color image is counted to obtain the gray face. The first grayscale histogram of the image. In this embodiment, the histogram generated from the normalized face image corresponding to the grayscaled color image is denoted as C _i .

Sub-step S4: Determine a second gray histogram of the depth image based on the trusted pixel points in the normalized face image corresponding to the depth image.

In specific implementation, in order to improve the accuracy of the correlation analysis, in the embodiment of the present invention, the correlation analysis is performed based on the trusted pixels. Therefore, first, the trusted pixel points in the normalized face image corresponding to the depth image are determined. Wherein, the trusted pixel points in the normalized face image corresponding to the depth image are that the pixel value satisfies a pre-defined effective depth value condition. For the definition method of the effective depth value condition, refer to the description in the previous paragraph. Then, based on the trusted pixel points in the normalized face image corresponding to the depth image, a second gray histogram of the depth image is determined. In this embodiment, the histogram generated from the depth image is referred to as D _i .

In sub-step S5, a correlation analysis is performed on the first gray histogram and the second gray histogram to determine the correlation characteristics of the color image and the depth image.

In some embodiments of the present invention, the canonical correlation analysis (canonical correlation analysis, CCA) may be employed for the first histogram and the second C _i D _i histogram correlation analysis. First, define the projection direction of the first gray histogram C _i

And a projection direction of the second histogram of D _i

Then to maximize the two projection vectors

with

The correlation coefficient ρ _i as the target, and solve the optimal projection direction

with

The correlation coefficient ρ _i is expressed by the following function:

In the above function, the subscript T is the transpose of the vector, and E [g] represents the expectation of g.

In order to further simplify this equation, the intra-class covariance matrices C _CC and C _DD and the inter-class covariance matrices C _CD and C _DC are introduced during implementation. Since all feature vectors are on smaller subregion pictures Extraction, by introducing a regularization parameter λ for the intra-class covariance matrix to avoid situations such as overfitting, the above objective function can be rewritten as:

The above-mentioned optimized objective function can be solved by a typical correlation algorithm (Regularized Canonical Correlation Analysis) with a regular term. For a specific solution process, refer to the prior art, which will not be described in detail in the embodiment of the present invention.

By solving the above optimized objective function, two optimal projection directions can be obtained

with

Further, the projection direction of the first gray histogram may be determined.

Feature vector and the second gray histogram in the projection direction

Feature vector.

Then, according to feature vectors of the first grayscale histogram and the second grayscale histogram in respective optimal projection directions, an association feature of the color image and the depth image is constructed.

For example, placing the first gray histogram in a projection direction

Feature vector and the second gray histogram in the projection direction

The feature vectors of are connected in series, and the feature vectors obtained after the series are used as the correlation features of the color image and the depth image.

Further, by performing a depth consistency analysis on a normalized face image corresponding to the depth image, determining a depth consistency feature of the depth image includes: normalizing a face corresponding to the depth image The image is divided into N * M sub-regions, where N and M are integers greater than or equal to 3, and determined according to the pixel points in each of the sub-regions of the depth image whose pixel values meet a pre-defined effective depth value condition. A histogram of each of the sub-regions; by calculating the cross-entropy or divergence of any two of the histograms, a depth consistency feature of the depth image is determined.

Preferably, the normalized face image corresponding to the depth image is uniformly divided into N * M sub-regions, where N is equal to M.

In the actual living body detection process, the applicant found that only considering from the perspective of inaccurate depth information, fake faces such as photos, screens, and masks also have some differences from real faces: screen fake face images are displayed in an inflexible or The folded display has quite obvious flat characteristics. Although the photo-forged face image can be rotated, bent, or folded, it often maintains a more regular depth mode, such as a curved surface similar to a cylinder or gradient depth information; a mask. Although fake face images can achieve a relatively realistic depth effect, it is difficult for the mask to imitate some special areas with very complicated depth changes, such as the nose wings, nasolabial folds, etc. Therefore, in some embodiments of the present invention, the normalized face image corresponding to the depth image is equally divided into 3 * 3 sub-regions along the horizontal and vertical directions, as shown in FIG. 5. And in the order from left to right and from top to bottom, these regions are denoted as p ₁ , p ₂ , ..., p _{9, respectively} .

Then, the normalized face image p _i each subregion depth image corresponding to the further statistical having an effective pixel depth information, the pixels to the letter, and histogram h _i of the sub-region to measure substantially The depth distribution can be used to conduct live test from the spatial information dimension.

In some embodiments of the present invention, the depth distribution of the sub-regions can be measured by the divergence between the sub-regions. In specific implementation, the divergence can be calculated by the following formula:

Wherein, h _i (k) refers to the histogram h _i k-th element, h _j (k) refers to the histogram h _j k-th element.

In some preferred embodiments of the present invention, the depth distribution of a sub-region is measured by the cross-entropy between the sub-regions. In specific implementation, for any given two subregions corresponding to the histograms h _i and h _j (1≤i≤9, 1≤j≤9, i <j), the cross-entropy is used to measure the consistency of the depth distribution between them. The cross entropy of the histograms h _i and h _j is calculated as:

among them,

H (h _i) is the entropy of the histogram h _{_{_i, D KL (h i || h}} j) from KL divergence to h _i h _j, i.e., h _i h _j with respect to the relative entropy. The value of cross-entropy H (h _i , h _j ) can be understood from the perspective of information theory as the average number of bits required to finally identify the event distribution h _i when coding is performed based on the probability distribution h _j . In the specific living body detection process, if the two regions corresponding to _hi and h _j have similar depth distributions, for example, they come from the same side of a crease in a bent photo, or belong to the screen or mask of the same depth, The value of this cross entropy will be relatively small; for real faces, due to the complex depth changes and occlusions in the face region, the cross entropy between different subregions may be relatively large. Therefore, the cross Entropy can represent features of real or attacking faces.

In this embodiment, after the normalized face image corresponding to the depth image is divided into nine sub-regions in a certain order, a total of

Cross-entropy values, and finally concatenating these values as the depth consistency feature corresponding to the depth image.

In specific implementation, the value of N is determined according to the size of the face image in the data set. For example, N may also be set to an odd number such as 5 or 7. In view of the unique symmetrical characteristics of the three-by-three grid, that is, for three-by-three grids, such as attack screens such as rotating screens, photos that are bent horizontally or vertically, and masks with weak depth and detail, There may be some sub-regions with similar depth characteristics. Preferably, N is set to 3.

In specific implementation, the order of obtaining correlation features and obtaining depth consistency features can be reversed, which does not affect solving the technical problem of the present invention and achieving the same technical effect.

Step 14: Perform face live detection on the target to be detected according to the correlation feature and the depth consistency feature.

In some embodiments of the present invention, the correlation feature and the depth consistency feature can be directly combined into features to be identified, and input to a pre-trained recognition model to detect whether the target to be detected is an attacking face.

In some other preferred embodiments of the present invention, the performing a face live detection on the target to be detected according to the correlation feature and the depth consistency feature includes: associating the association with a first kernel function Classify and identify sexual characteristics, determine a first recognition result, and classify and identify the deep consistency feature through a second kernel function to determine a second recognition result; and determine the first recognition result and the second recognition by using a second kernel function Results are subjected to weighted fusion to determine a result of performing face live detection on the target to be detected.

The aforementioned correlation features between the color image and the depth image constructed by the projection direction vector of the color feature and the spatial feature, and the depth consistency feature constructed by the cross-entropy are very different in terms of physical meaning and mathematical dimensions. It may not be appropriate to use a unified classifier for live discrimination.

Therefore, according to different features of the extracted features, in some embodiments of the present invention, two classifiers with different kernel functions are used to perform live detection respectively, and then the detection results of different classifiers are fused.

For example, for the correlation features constructed based on the projection direction vector, a support vector machine with a radial basis kernel function is used for classification and recognition to determine the first recognition result; and for the depth consistency features constructed based on cross entropy, The support vector machine of the chi-square kernel function performs classification recognition and determines a second recognition result. The final classifier performs weighted fusion at the scoring level, and the corresponding weights of each classifier are determined by the verification process, and the sum of the two weights is 1. For example, weighted fusion is performed on the first recognition result and the second recognition result, and then classification recognition is performed based on the fusion result to determine whether the target to be detected is a real face. The fusion weight of the first recognition result and the second recognition result is determined according to the above test result.

The face live body detection method disclosed in the embodiment of the present invention obtains a color image and a depth image of a target to be detected; determines a normalized face image corresponding to the color image and the depth image, respectively; Performing a correlation analysis on the normalized face image corresponding to the normalized face image and the depth image to determine the correlation feature of the color image and the depth image; and Normalize the face image to perform a depth consistency analysis to determine the depth consistency characteristics of the depth image. Based on the correlation characteristics and the depth characteristics, perform face live detection on the object to be detected, which solves the current problem. There are problems with low efficiency and accuracy of face live body detection in the technology. The color image and depth image required by the face live detection method disclosed in the embodiment of the present invention can be acquired at the same time, thereby reducing the image acquisition time, improving the face live detection efficiency, and because the color information contains rich texture information, By combining the color information and spatial information in the image of the target to be detected, the living body detection of the target to be detected is carried out. Since the complementary features are used, the information is more comprehensive, which helps to improve the accuracy of the living body detection.

Embodiment two:

Correspondingly, the present invention also discloses a face live detection device. As shown in FIG. 6, the above-mentioned face live detection device includes:

An image acquisition module 610, configured to acquire a color image and a depth image of a target to be detected;

A normalization module 620, configured to determine normalized face images corresponding to the color image and the depth image, respectively;

A first feature determination module 630, configured to determine a correlation feature between the color image and the depth image by performing correlation analysis on the normalized face image corresponding to the normalized face image and the depth image of the color image; and,

A second feature determination module 640, configured to determine a depth consistency feature of the depth image by performing a depth consistency analysis on the normalized face image corresponding to the depth image;

The living body detection module 650 is configured to perform face living body detection on the target to be detected according to the correlation features determined by the first feature determination module 630 and the depth consistency features determined by the second feature determination module 640.

Optionally, by performing correlation analysis on the normalized face image corresponding to the normalized face image and the depth image of the color image to determine the correlation feature between the color image and the depth image, the first feature determination module 630 uses to:

Denoise the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image through the skin color model to determine the trusted pixel points and depth in the normalized face image corresponding to the color image. Trusted pixels in the normalized face image corresponding to the image;

Determining a grayscale face image of a normalized face image corresponding to a color image;

Determine the first gray histogram of the grayed face image based on the trusted pixels in the normalized face image corresponding to the color image; and, based on the trusted pixels in the normalized face image corresponding to the depth image Pixels, determine the second gray histogram of the depth image;

The correlation characteristics of the color image and the depth image are determined by performing correlation analysis on the first gray histogram and the second gray histogram.

Most face forgery attacks use photos or screens as the attack vector. Although the color image texture information of the forged face is close to the real face, the depth image is significantly different from the real user depth map. Correlation characteristics between the color image and depth image of the human face area to obtain effective live detection clues.

Optionally, the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image are denoised by using a skin color model to determine the credibility in the normalized face image corresponding to the color image. When the trusted pixels in the normalized face image corresponding to the pixels and the depth image are normalized, the first feature determination module 630 is configured to:

Determine each pair of pixel points with the same pixel coordinates in the normalized face image corresponding to the color image and the depth image as a pair of pixel points;

For each pair of pixels, it is determined that the pixel value of the pixel corresponding to the color image belongs to the skin color range defined by the skin color model, and when the pixel value of the pixel corresponding to the depth image meets the preset effective depth value condition, the Each pixel in the pair of pixels is labeled as a trusted pixel.

Optionally, when depth consistency analysis is performed on the normalized face image corresponding to the depth image to determine the depth consistency feature of the depth image, the second feature determination module 640 is configured to:

Determine the histogram of each sub-region based on the pixel points in each sub-region of the depth image that satisfy the pre-defined effective depth value condition;

By calculating the cross entropy or divergence of any two of the above histograms, the depth consistency characteristics of the depth image are determined.

Screen fake face images are displayed on a non-bendable or foldable display screen, which has fairly obvious flat characteristics; although photo fake face images can be rotated, bent, or folded, they often maintain a more regular depth mode, such as a cylinder Curved surface or gradient depth information; although the fake face image of the mask can achieve a relatively real depth effect, it is difficult for the mask to imitate some special areas with very complicated depth changes, such as the nose wings, nasolabial folds, etc.

In some embodiments of the present invention, the normalized face image corresponding to the depth image is equally divided into 3 * 3 sub-regions in the horizontal and vertical directions, as shown in FIG. 5. And in the order from left to right and from top to bottom, these regions are denoted as p ₁ , p ₂ , ..., p _{9, respectively} . Then, the normalized face image p _i each subregion depth image corresponding to the further statistical having an effective pixel depth information, the pixels to the letter, and histogram h _i of the sub-region to measure substantially The depth distribution can be used to conduct live test from the spatial information dimension.

In the normalized face image, not only the interference of non-skin pixels in the color image, but the structured light depth camera is limited by its own imaging principle. There may also be some defects or blind spots in the captured depth image, that is, some pixels Corresponding depth information cannot be recovered smoothly through structured light, and some pixels in which depth values do not exist are formed in the depth image. Before the subsequent analysis, it is necessary to eliminate the interference between these non-skin pixels and pixels that have no depth value, which can improve the reliability and stability of the correlation analysis between the color image and the depth image.

Optionally, according to the correlation feature determined by the first feature determination module 630 and the depth consistency feature determined by the second feature determination module 640, when performing face live detection on the target to be detected, the live detection module 650 is configured to:

Classify and identify the above-mentioned correlation feature through a first kernel function, determine a first recognition result, and classify and recognize a depth consistency feature through a second kernel function, and determine a second recognition result;

Optionally, when a normalized face image corresponding to the color image and the depth image is determined, the normalization module 620 is configured to:

Normalize the face area image in the color image and the face area image in the depth image, respectively, to obtain a normalized face image corresponding to the color image and a normalized face image corresponding to the depth image.

Optionally, the above device further includes:

A pixel alignment module (not shown in the figure) is configured to perform pixel alignment on the color image and the depth image.

The human face living body detection device disclosed in the embodiment of the present invention obtains a color image and a depth image of a target to be detected; determines a normalized face image corresponding to the color image and the depth image, respectively; Performing a correlation analysis on the normalized face image corresponding to the normalized face image and the depth image to determine the correlation feature of the color image and the depth image; and Normalize the face image to perform a depth consistency analysis to determine the depth consistency characteristics of the depth image. Based on the correlation characteristics and the depth characteristics, perform face live detection on the object to be detected, which solves the current problem. There are problems with low efficiency and accuracy of face live body detection in the technology. The color image and depth image required by the face living body detection device disclosed in the embodiment of the present invention can be collected simultaneously, thereby reducing the image acquisition time, improving the face living body detection efficiency, and because the color information contains rich texture information, By combining the color information and spatial information in the image of the target to be detected, the living body detection of the target to be detected is carried out. Since the complementary features are used, the information is more comprehensive, which helps to improve the accuracy of the living body detection.

Accordingly, an embodiment of the present invention also discloses an electronic device. The electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor. The processor executes the processor. The computer program implements the face live body detection method according to the first embodiment of the present invention. The electronic device may be a mobile phone, a PAD, a tablet computer, a face recognition machine, or the like.

Correspondingly, an embodiment of the present invention further provides a computer-readable storage medium having stored thereon a computer program, which is executed by a processor to implement the steps of the face live body detection method according to the first embodiment of the present invention.

The device embodiment of the present invention corresponds to a method. For specific implementation manners of each module and unit in the device embodiment, refer to the method is an embodiment, and details are not described herein again.

Those of ordinary skill in the art may realize that the units and algorithm steps of each example described in combination with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present invention.

Those of ordinary skill in the art can understand that, in the embodiments provided by the present invention, the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple Network unit. In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.

When the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for making a computer device (which can be a personal computer, a server, or a network). Equipment, etc.) perform all or part of the steps of the method described in each embodiment of the present invention. The foregoing storage medium includes various media that can store program codes, such as a U disk, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The foregoing is only a specific implementation of the present invention, but the scope of protection of the present invention is not limited to this. Those of ordinary skill in the art can realize that the units and algorithms of each example described in combination with the embodiments disclosed herein The steps can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present invention.

Claims

A face live body detection method, characterized in that it includes:

Obtaining color and depth images of the target to be detected;

Respectively determining a normalized face image corresponding to the color image and the depth image;

Determining correlation characteristics between the color image and the depth image by performing correlation analysis on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image; and,

Determining a depth consistency feature of the depth image by performing a depth consistency analysis on a normalized face image corresponding to the depth image;

Performing face live detection on the target to be detected according to the correlation feature and the depth consistency feature.
The method according to claim 1, wherein the determining is performed by performing correlation analysis on a normalized face image corresponding to the color image and a normalized face image corresponding to the depth image, The step of associating features between the color image and the depth image includes:

Denoise processing is performed on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image by using a skin color model, and to determine the values in the normalized face image corresponding to the color image. A trusted pixel point and a trusted pixel point in a normalized face image corresponding to the depth image;

Determining a grayscale face image of a normalized face image corresponding to the color image;

Determining a first gray histogram of the grayed face image based on trusted pixels in the normalized face image corresponding to the color image; and based on the normalized person corresponding to the depth image A trusted pixel in a face image to determine a second grayscale histogram of the depth image;

A correlation feature of the color image and the depth image is determined by performing correlation analysis on the first gray histogram and the second gray histogram.
The method according to claim 2, wherein the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image are denoised by using a skin color model, respectively, The step of determining a trusted pixel point in the normalized face image corresponding to the color image and a trusted pixel point in the normalized face image corresponding to the depth image includes:

Determining every two pixel points with the same pixel coordinates in the normalized face image corresponding to the color image and the depth image as a pair of pixel points;

For each pair of pixels, it is determined that the pixel value of the pixel corresponding to the color image belongs to the skin color range defined by the skin color model, and when the pixel value of the pixel corresponding to the depth image meets the preset effective depth value condition , Each pixel in the pair of pixels is marked as a trusted pixel.
The method according to claim 2, wherein the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image are denoised by using a skin color model, respectively, The step of determining the trusted pixels in the normalized face image corresponding to the color image and the trusted pixels in the normalized face image corresponding to the depth image further includes:

A pre-defined skin color model is used to consider non-skin pixels that may cause interference. The skin color model uses YCbCr color space to cluster skin colors in a light-independent chromaticity plane, so that the skin color model is suitable for different lights. And a variety of different skin tones.
The method according to claim 3, wherein the step of determining a grayscale face image of a normalized face image corresponding to a color image comprises:

The grayscale processing is performed on the normalized face image corresponding to the color image to obtain the grayscale face image of the normalized face image corresponding to the color image, or after obtaining the color image, the obtained first The grayscale image of the color image is processed, and then the grayscale processed color image is subjected to face area image extraction and normalization processing through an oval template to obtain the grayscale of the normalized face image corresponding to the color image. Face image.
The method according to claim 3, wherein the correlation between the color image and the depth image is determined by performing a correlation analysis on the first gray histogram and the second gray histogram. Features steps, including:

The first canonical correlation analysis of the histogram C i and D i the second histogram correlation analysis;

According to feature vectors of the first grayscale histogram and the second grayscale histogram in respective optimal projection directions, a correlation feature of the color image and the depth image is constructed.
The method according to claim 1, wherein the step of determining a depth consistency feature of the depth image by performing a depth consistency analysis on a normalized face image corresponding to the depth image comprises:

Divide the normalized face image corresponding to the depth image into N * M sub-regions, where N and M are integers greater than or equal to 3;

Determining a histogram of each of the sub-regions according to pixels whose pixel values in each of the sub-regions of the depth image satisfy a predefined effective depth value condition;

By calculating the cross entropy or divergence of any two of the histograms, a depth consistency feature of the depth image is determined.
The method according to claim 7, wherein the step of determining the depth consistency feature of the depth image by calculating the cross entropy or divergence of any two of the histograms comprises:

The depth distribution of sub-regions is measured by the divergence between sub-regions. The divergence is calculated by the following formula:

Wherein, h i (k) refers to the histogram h i k-th element, h j (k) refers to the histogram h j k th element;

Alternatively, the cross-entropy between subregions is used to measure the depth distribution of subregions. For any given two subregions, the histograms h i and h j (1≤i≤9, 1≤j≤9, i <j ), The cross-entropy is used to measure the consistency of the depth distribution between them. The cross-entropy of the histograms h i and h j is calculated as:

among them,
H (h i) is the entropy of the histogram h i, D KL (h i || h j) from KL divergence to h i h j, i.e., h i h j with respect to the relative entropy.
The method according to claim 1, wherein the step of performing face live detection on the target to be detected according to the correlation feature and the depth consistency feature comprises:

Classify and identify the correlation feature through a first kernel function, determine a first recognition result, and classify and identify the deep consistency feature through a second kernel function, and determine a second recognition result;

By performing weighted fusion on the first recognition result and the second recognition result, a result of performing face live detection on the target to be detected is determined.
The method according to any one of claims 1 to 5, wherein the step of separately determining a normalized face image corresponding to the color image and the depth image comprises:

Extracting a face region image in the color image and the depth image respectively through a template;

Normalizing the face region image in the color image and the face region image in the depth image, respectively, to obtain a normalized face image corresponding to the color image and a normalization corresponding to the depth image A face image.
The method according to claim 10, wherein the step of extracting the face image in the color image and the depth image by using a template comprises:

Determine the position of the human eye through a face detection algorithm;

The facial shape image is extracted from the color image and the depth image respectively through the geometric shape template.
The method according to claim 10, wherein before the step of separately determining a normalized face image corresponding to the color image and the depth image, the method includes:

Pixel-aligning the color image and the depth image.
The method according to claim 1, wherein the acquiring a color image and a depth image of an object to be detected comprises:

Two images of the target to be detected are collected simultaneously by an image acquisition device provided with a natural light camera and a depth camera, or, while the face information is maintained, the natural light camera and the depth camera are used to sequentially capture the target to be detected Two images.
A human face living body detection device, comprising:

An image acquisition module, configured to acquire a color image and a depth image of a target to be detected;

A normalization module, configured to respectively determine normalized face images corresponding to the color image and the depth image;

A first feature determination module, configured to determine the color image and the depth image by performing correlation analysis on a normalized face image corresponding to the color image and a normalized face image corresponding to the depth image Related characteristics; and

A second feature determination module, configured to determine a depth consistency feature of the depth image by performing a depth consistency analysis on a normalized face image corresponding to the depth image;

The living body detection module is configured to perform face live detection on the target to be detected according to the correlation features determined by the first feature determination module and the depth consistency features determined by the second feature determination module.
The device according to claim 14, wherein the color image is determined by performing correlation analysis on a normalized face image corresponding to the color image and a normalized face image corresponding to the depth image. When it is associated with the depth image, the first feature determining module is configured to:

Denoise processing is performed on the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image by using a skin color model, and to determine the values in the normalized face image corresponding to the color image. A trusted pixel point and a trusted pixel point in a normalized face image corresponding to the depth image;

Determining a grayscale face image of a normalized face image corresponding to the color image;

Determining a first gray histogram of the grayed face image based on trusted pixels in the normalized face image corresponding to the color image; and based on the normalized person corresponding to the depth image A trusted pixel in a face image to determine a second grayscale histogram of the depth image;

A correlation feature of the color image and the depth image is determined by performing correlation analysis on the first gray histogram and the second gray histogram.
The device according to claim 15, wherein the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image are denoised by using a skin color model to determine the When the trusted pixels in the normalized face image corresponding to the color image and the trusted pixels in the normalized face image corresponding to the depth image are described, the first feature determination module is configured to:

Determining every two pixel points with the same pixel coordinates in the normalized face image corresponding to the color image and the depth image as a pair of pixel points;

For each pair of pixels, it is determined that the pixel value of the pixel corresponding to the color image belongs to the skin color range defined by the skin color model, and when the pixel value of the pixel corresponding to the depth image satisfies a preset effective depth value condition , Each pixel in the pair of pixels is marked as a trusted pixel.
The device according to claim 15, wherein the normalized face image corresponding to the color image and the normalized face image corresponding to the depth image are denoised by a skin color model, respectively, When determining the trusted pixels in the normalized face image corresponding to the color image and the trusted pixels in the normalized face image corresponding to the depth image, the first feature determination module is configured to:

A pre-defined skin color model is used to consider non-skin pixels that may cause interference. The skin color model uses YCbCr color space to cluster skin colors in a light-independent chromaticity plane, so that the skin color model is suitable for different lights. And a variety of different skin tones.
The device according to claim 16, wherein when the grayscale face image of the normalized face image corresponding to the color image is determined, the first feature determination module is configured to:

The grayscale processing is performed on the normalized face image corresponding to the color image to obtain the grayscale face image of the normalized face image corresponding to the color image, or after obtaining the color image, the obtained first The grayscale image of the color image is processed, and then the grayscale processed color image is subjected to face area image extraction and normalization processing through an oval template to obtain the grayscale of the normalized face image corresponding to the color image. Face image.
The apparatus according to claim 16, wherein the correlation between the color image and the depth image is determined by performing a correlation analysis on the first gray histogram and the second gray histogram. When it is characterized, the first feature determination module is configured to:

The first canonical correlation analysis of the histogram C i and D i the second histogram correlation analysis:

According to feature vectors of the first grayscale histogram and the second grayscale histogram in respective optimal projection directions, a correlation feature of the color image and the depth image is constructed.
The device according to claim 14, characterized in that, when a depth consistency analysis is performed on a normalized face image corresponding to the depth image to determine a depth consistency feature of the depth image, the second feature The determination module is used to:

Divide the normalized face image corresponding to the depth image into N * M sub-regions, where N and M are integers greater than or equal to 3;

Determining a histogram of each of the sub-regions according to pixels whose pixel values in each of the sub-regions of the depth image satisfy a predefined valid depth value condition;

By calculating the cross entropy or divergence of any two of the histograms, a depth consistency feature of the depth image is determined.
The apparatus according to claim 20, wherein the second feature determining module is configured to determine a depth consistency feature of the depth image by calculating the cross entropy or divergence of any two of the histograms. Used for:

The depth distribution of sub-regions is measured by the divergence between sub-regions. The divergence is calculated by the following formula:

Wherein, h i (k) refers to the histogram h i k-th element, h j (k) refers to the histogram h j k th element;

Alternatively, the cross-entropy between subregions is used to measure the depth distribution of subregions. For any given two subregions, the histograms h i and h j (1≤i≤9, 1≤j≤9, i <j ), The cross-entropy is used to measure the consistency of the depth distribution between them. The cross-entropy of the histograms h i and h j is calculated as:
The device according to claim 14, characterized in that, when performing face live detection on the target to be detected according to the correlation feature and the depth consistency feature, the living body detection module is configured to:

Classify and identify the correlation feature through a first kernel function, determine a first recognition result, and classify and identify the deep consistency feature through a second kernel function, and determine a second recognition result;

By performing weighted fusion on the first recognition result and the second recognition result, a result of performing face live detection on the target to be detected is determined.
The device according to any one of claims 14 to 22, wherein when determining a normalized face image corresponding to the color image and the depth image, the normalization module is configured to:

Extracting a face region image in the color image and the depth image respectively through a template;

Normalizing the face region image in the color image and the face region image in the depth image, respectively, to obtain a normalized face image corresponding to the color image and a normalization corresponding to the depth image A face image.
The device according to claim 23, wherein the normalization module is configured to: when extracting the face image in the color image and the depth image through a template, respectively:

Determine the position of the human eye through a face detection algorithm;

The facial shape image is extracted from the color image and the depth image respectively through the geometric shape template.
An electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that when the processor executes the computer program, any one of claims 1 to 13 is implemented Face live body detection method as described in item 3.
A computer-readable storage medium having stored thereon a computer program, characterized in that, when the program is executed by a processor, the steps of the face live body detection method according to any one of claims 1 to 13 are implemented.