CN112487922B - Multi-mode human face living body detection method and system


Info

Publication number
CN112487922B
Authority
CN
China
Prior art keywords
image
face
effective
infrared
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011339311.2A
Other languages
Chinese (zh)
Other versions
CN112487922A (en)
Inventor
辛冠希
高通
钱贝贝
黄源浩
肖振中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orbbec Inc
Original Assignee
Orbbec Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orbbec Inc filed Critical Orbbec Inc
Priority to CN202011339311.2A
Publication of CN112487922A
Application granted
Publication of CN112487922B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/40 Spoof detection, e.g. liveness detection
    • G06V 40/45 Detection of the body part being alive
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The invention discloses a multi-modal face living body detection method and system, comprising the following steps: S1, collecting and registering a color image, an infrared image and a depth image of a target area; S2, performing face key point detection on the color image to obtain an initial face frame and face key point information; S3, preprocessing the collected images based on the initial face frame and the face key point information, and cropping the preprocessed images to obtain effective face images; S4, converting and/or normalizing the effective face images, and superimposing the converted and/or normalized images; S5, conveying the superimposed image to neural networks of different architectures to extract different face features, combining the different face features, and performing living body judgment. The invention improves the accuracy of face living body detection and the robustness and generalization capability of the face detection algorithm, and, since the user is not required to perform cooperative actions during detection, also improves the user experience.

Description

Multi-mode human face living body detection method and system
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a multi-modal face living body detection method and system.
Background
With the development of e-commerce and related technologies, face-based identity authentication, realized mainly through face recognition, has become widely used. While face recognition greatly improves the convenience of daily life, its security problems have gradually been exposed; in particular, living body detection based on a single modality offers limited algorithm robustness and generalization capability. Face living body anti-counterfeiting technology has therefore attracted wide attention.
Existing living body detection technology generally adopts a cooperative method: actions such as nodding, shaking the head, blinking and opening the mouth are detected, and the user is required to cooperate by performing the corresponding action. However, if an attacker plays back a pre-recorded video of the corresponding action (nodding, waving, blinking, mouth opening, etc.) or wears a realistic fake head cover, the check can still be passed, so such schemes are easily spoofed. In summary, the existing living body detection technology offers a poor user experience and poor security, and is easily broken through by illegitimate users.
The foregoing background is provided only to aid understanding of the principles and concepts of the present application. It is not necessarily prior art to the present application, and it is not intended as an admission that any of the above constitutes prior art against the present application.
Disclosure of Invention
The invention aims to provide a multi-modal face living body detection method and system to solve at least one of the problems described in the background above.
In order to achieve the above object, the technical solution of the embodiment of the present invention is as follows:
a multi-mode human face living body detection method comprises the following steps:
S1, collecting a color image, an infrared image and a depth image of a target area, and registering;
S2, performing face key point detection on the color image to obtain an initial face frame and face key point information;
S3, preprocessing an acquired image based on the initial face frame and the face key point information, and cutting the preprocessed image to obtain an effective face image;
S4, converting and/or normalizing the effective face image, and superposing the converted and/or normalized effective face image;
and S5, respectively conveying the superimposed images to at least three neural networks with different architectures to extract different face features, combining the different face features to obtain a combined feature map, and performing living body judgment.
In some embodiments, step S2 comprises:
s20, conveying the color image to a trunk feature extraction network, and outputting three first effective feature layers;
s21, processing the three first effective feature layers to obtain an effective feature fusion layer;
S22, extracting the enhanced features of the effective feature fusion layer, and outputting a second effective feature layer;
S23, carrying out face prediction according to the second effective feature layer to obtain the initial face frame and the face key point information.
In some embodiments, step S3 comprises:
s30, calculating the distance between two key points according to the coordinate information of the key points of the face, judging whether the calculated distance is within the preset distance range of the two key points, filtering out face images which do not accord with the preset size, and obtaining color effective face images;
S31, carrying out face contour detection and average brightness calculation on the corresponding infrared image of the color effective face image, and judging whether the infrared image contains a complete face contour and a living body or not so as to acquire the infrared effective face image;
S32, based on the face key point information, acquiring depth values of corresponding face key points in a depth image corresponding to the infrared effective face image, and judging whether the face key points are in a preset effective range or not and whether the relative distribution relation of the depth values accords with preset depth distribution or not so as to acquire a final effective processing image;
S33, cutting the color image, the infrared image and the depth image based on the coordinate information of the face key points according to the effective processing image to respectively obtain a first effective face image, a second effective face image and a third effective face image.
In some embodiments, step S5 comprises:
S50, respectively inputting the superimposed images into at least three neural networks with different architectures, and extracting features of different feature parts of the effective face image;
S51, combining the feature maps of the different feature parts to obtain a combined feature map, and connecting the combined feature map through a fully connected layer to generate a feature vector with the same dimension as the number of neurons;
S52, the feature vector obtained through the fully connected layer is fed into a soft-max layer for logistic regression, and living body judgment is carried out.
In some embodiments, in step S4, the first effective face image is converted into a YCrCb face image, normalization processing is performed on the second effective face image and the third effective face image, and the second effective face image, the third effective face image and the YCrCb face image after normalization processing are superimposed, so as to obtain the superimposed image.
The technical scheme of another embodiment of the invention is as follows:
A multi-mode human face living body detection system comprises an acquisition camera, an image registration module, an image detection module, an image clipping module, an image fusion module and a living body judgment module; wherein,
The acquisition camera is used for acquiring color images, infrared images and depth images;
the image registration module is used for acquiring the color image, the infrared image and the depth image acquired by the acquisition camera and registering the color image, the infrared image and the depth image;
the image detection module comprises a color face detection unit and an image preprocessing unit; the color face detection unit is used for detecting a face and acquiring initial face frame and face key point information; the image preprocessing unit is used for processing the initial face frame and the face key point information to obtain an effective processing image;
The image clipping module is used for clipping the color image, the infrared image and the depth image corresponding to the effective processing image to obtain different effective face images;
The image fusion module is used for converting and/or normalizing the effective face images and superposing the converted and/or normalized effective face images to obtain a fusion image;
The living body judging module comprises at least three different neural network architectures; different feature extraction is performed on the fused image through the different architectures to obtain different feature maps, the different feature maps are combined to obtain a combined feature map, and living body judgment is performed according to the combined feature map.
In some embodiments, the image preprocessing unit calculates the distance between two key points according to the coordinate information of the key points of the face, and determines whether the calculated distance is within a preset distance range of the two key points, so as to obtain a color effective face image.
In some embodiments, the image preprocessing unit is further configured to perform face contour detection and average brightness calculation on an infrared image corresponding to the color effective face image, and determine whether the infrared image corresponding to the color effective face image includes a complete face contour and a living body, so as to obtain an infrared effective face image.
In some embodiments, the image preprocessing unit obtains depth values of corresponding face key points in the depth image corresponding to the infrared effective face image based on the face key point information, and judges whether the face key points are in a preset effective range or not and whether a relative distribution relation of the depth values accords with preset depth distribution or not, so as to obtain an effective processing image.
In some embodiments, the image clipping module clips a color image, an infrared image, and a depth image corresponding to the effective processing image to obtain a first effective face image, a second effective face image, and a third effective face image; the image fusion module converts the first effective face image into a YCrCb face image, performs normalization processing on the second effective face image and the third effective face image, and superimposes the normalized second effective face image, the normalized third effective face image and the YCrCb face image to obtain the fusion image.
The technical scheme of the invention has the beneficial effects that:
Compared with the prior art, the method and the device have the advantages that the color image, the infrared image and the depth image are subjected to information fusion, the interference of similar human face information is eliminated through multi-mode human face living body detection, the accuracy of human face living body detection is improved, the robustness and generalization capability of a human face detection algorithm are improved, meanwhile, a user does not need to make corresponding cooperation actions during detection, and the experience effect of the user is optimized.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a multi-modality face in-vivo detection method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a multi-modal face living body detection system according to another embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects to be solved by the embodiments of the present invention more clear, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of the present invention, "plurality" means two or more, unless explicitly defined otherwise.
Fig. 1 is a flow chart of a multi-mode face living body detection method according to an embodiment of the invention, the method includes the following steps:
S1, collecting a color image, an infrared image and a depth image of a target area, and registering;
In one embodiment, a color image, an infrared image and a depth image of a target area are acquired by an acquisition device. The acquisition device may be a depth camera based on structured light, binocular stereo, or TOF (time of flight) technology. Preferably, the acquisition device comprises a structured light depth camera and a color camera for acquiring the depth image, the infrared image and the color image. The acquisition frequencies of the depth, infrared and color images may be the same or different and are set according to specific functional requirements; for example, the three images may be acquired in an interleaved manner at 60 FPS, or each acquired separately at 30 FPS.
In one embodiment, the color image, the infrared image and the depth image acquired by the acquisition device are registered; that is, the correspondence among the pixels of the three images is found through a registration algorithm, eliminating the parallax caused by the different spatial positions of the cameras. It should be noted that registration may be performed by a dedicated processor in the acquisition device or by an external processor. The registered depth, infrared and color images enable various functions, such as speeding up face living body detection and recognition.
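Conceptually, this registration can be pictured as back-projecting each depth pixel into 3D and reprojecting it into the color camera. The following is a minimal sketch, assuming pre-calibrated intrinsics and extrinsics and identically sized images; all function and variable names are illustrative, not from the patent.

```python
import numpy as np

def register_depth_to_color(depth, K_depth, K_color, R, t):
    """Resample a depth map into the color camera's pixel grid.

    depth: (H, W) uint16 depth in millimeters; 0 means no measurement.
    K_depth, K_color: 3x3 intrinsic matrices; R, t: depth-to-color extrinsics.
    Assumes both cameras have the same resolution (an assumption for brevity).
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    valid = z > 0

    # Back-project valid depth pixels to 3D points in the depth camera frame.
    x = (us - K_depth[0, 2]) * z / K_depth[0, 0]
    y = (vs - K_depth[1, 2]) * z / K_depth[1, 1]
    pts = np.stack([x[valid], y[valid], z[valid]])          # (3, N)

    # Rigid transform into the color camera frame, then project to pixels.
    pts_c = R @ pts + t.reshape(3, 1)
    u_c = np.round(K_color[0, 0] * pts_c[0] / pts_c[2] + K_color[0, 2]).astype(int)
    v_c = np.round(K_color[1, 1] * pts_c[1] / pts_c[2] + K_color[1, 2]).astype(int)

    registered = np.zeros_like(depth)
    inside = (u_c >= 0) & (u_c < w) & (v_c >= 0) & (v_c < h)
    registered[v_c[inside], u_c[inside]] = pts_c[2][inside].astype(depth.dtype)
    return registered
```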
In one embodiment, face detection can be performed on the color image first, and the face region in the depth image or the infrared image can then be located directly through the pixel correspondence, saving one pass of a face detection algorithm on the depth or infrared image. In another embodiment, face detection can be performed on the color image of the previous frame, and when the depth image or infrared image is acquired in the next frame, only the depth values or reflected infrared intensities at the face position are read out, i.e., only the face part of the depth or infrared image is output. This reduces the computation of the depth or infrared extraction algorithm, reduces the data transmission bandwidth, and improves processing speed and detection and recognition efficiency. Conversely, the pixel correspondence can likewise be used to accelerate face living body detection or recognition in the color or infrared image. The embodiments of the present application are not particularly limited in this respect; any mode may be adopted as long as it does not deviate from the gist of the present application.
S2, performing face key point detection on the color image to obtain an initial face frame and face key point information;
Specifically, the color image is conveyed to a color image face detection model for face key point detection. In one embodiment, the color image face detection model is built on the RetinaFace face detection algorithm, and the steps include:
S20, conveying the color image to a backbone feature extraction network, and outputting the last three feature layers as three first effective feature layers;
In one embodiment, the backbone feature extraction network is a depthwise separable convolution (MobileNet) model or a deep residual network (ResNet) model; the MobileNet model is preferred because it reduces the number of model parameters.
S21, processing the three first effective feature layers to obtain an effective feature fusion layer;
In one embodiment, a feature pyramid network (FPN) structure is constructed from the three first effective feature layers to obtain effective feature fusion layers. More specifically, convolution kernels adjust the channel numbers of the three first effective feature layers; the adjusted layers are then up-sampled and fused to realize feature fusion across the three layers, yielding three effective feature fusion layers of different sizes and completing the construction of the FPN structure, as sketched below. It should be understood that the convolution kernel size of each convolution layer may be designed according to the actual situation and is not limited here.
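For illustration only, a minimal FPN of the kind described above might look as follows in PyTorch; the channel counts, the use of 1x1 lateral convolutions, and the layer names are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, in_channels=(64, 128, 256), out_channels=64):
        super().__init__()
        # 1x1 convolutions adjust every input layer to a common channel count.
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])

    def forward(self, c3, c4, c5):
        # Assumes each deeper layer is half the spatial size of the previous one.
        p5 = self.lateral[2](c5)
        # Up-sample the deeper map and fuse it with the shallower one by addition.
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        return p3, p4, p5   # three effective feature fusion layers of different sizes
```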
S22, performing enhanced feature extraction on the obtained effective feature fusion layers, and outputting second effective feature layers;
In one embodiment, enhanced feature extraction is applied to the three effective feature fusion layers of different sizes using an SSH (Single Stage Headless face detector) structure. The SSH structure comprises three parallel convolution branches: one 3×3 convolution layer, two stacked 3×3 convolution layers, and three stacked 3×3 convolution layers. The stacked branches enlarge the receptive field while reducing parameter computation (two stacked 3×3 convolutions approximate a 5×5 convolution and three approximate a 7×7 convolution). After passing through the three parallel branches, the outputs are combined with a concat operation to obtain a new effective feature layer; thus the three effective feature fusion layers of different sizes yield three new second effective feature layers of different sizes with the SSH structure, as sketched below.
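As a sketch of the SSH context module described above, assuming PyTorch and the usual SSH channel split (one half plus two quarters), which the patent does not specify:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSH(nn.Module):
    def __init__(self, in_ch, out_ch):   # out_ch assumed divisible by 4
        super().__init__()
        self.conv3x3 = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1)
        self.conv5x5_1 = nn.Conv2d(in_ch, out_ch // 4, 3, padding=1)
        self.conv5x5_2 = nn.Conv2d(out_ch // 4, out_ch // 4, 3, padding=1)
        self.conv7x7_2 = nn.Conv2d(out_ch // 4, out_ch // 4, 3, padding=1)
        self.conv7x7_3 = nn.Conv2d(out_ch // 4, out_ch // 4, 3, padding=1)

    def forward(self, x):
        b1 = self.conv3x3(x)                             # branch 1: one 3x3 conv
        t = F.relu(self.conv5x5_1(x))
        b2 = self.conv5x5_2(t)                           # branch 2: two stacked 3x3 convs
        b3 = self.conv7x7_3(F.relu(self.conv7x7_2(t)))   # branch 3: three stacked 3x3 convs
        # Concatenation yields the new (second) effective feature layer.
        return F.relu(torch.cat([b1, b2, b3], dim=1))
```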
S23, face prediction is carried out according to the second effective feature layer, and initial face frames and face key point information are obtained.
In one embodiment, the three second effective feature layers of different sizes (with the SSH structure) are equivalent to dividing the whole color image into grids of different sizes, each grid cell containing two prior boxes, each prior box representing a certain region of the color image. Face detection is performed on each prior box: the probability that the box contains a face is predicted and compared with a confidence threshold, set here to 0.5; if the predicted probability exceeds the threshold, the prior box contains a face and is taken as the initial face frame. It should be understood that the confidence threshold may be set according to the actual situation and is not limited here.
Further, the prior boxes are adjusted to obtain the face key points. It should be understood that there may be, for example, 98 or 106 face key points, designed according to the actual situation. Each face key point needs two adjustment parameters, which shift the x and y coordinates of the prior box center to give the key point coordinates, as sketched below.
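A hedged sketch of this key-point adjustment, written in the prior-box decoding style RetinaFace uses; the variance factor of 0.1 is an assumption borrowed from common SSD/RetinaFace configurations, not a value stated in the patent:

```python
import numpy as np

def decode_landmarks(prior, offsets, variance=0.1):
    """prior: (cx, cy, w, h) of one prior box, normalized coordinates.
    offsets: (K, 2) predicted (dx, dy) per key point, two parameters each.
    Returns (K, 2) key point coordinates."""
    cx, cy, w, h = prior
    pts = np.empty_like(offsets)
    pts[:, 0] = cx + offsets[:, 0] * variance * w   # adjusted x of each key point
    pts[:, 1] = cy + offsets[:, 1] * variance * h   # adjusted y of each key point
    return pts
```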
It should be appreciated that the color image face detection model is not limited to the RetinaFace face detection algorithm; MTCNN and other algorithms may also be used, without limitation here.
S3, preprocessing the images acquired in step S1 based on the initial face frame and the face key point information acquired in step S2, and cropping the preprocessed images to obtain effective face images;
In one embodiment, the preprocessed color image, infrared image and depth image are cut to obtain a first effective face image, a second effective face image and a third effective face image, respectively.
In one embodiment, the preprocessing includes the steps of:
S30, calculating the distance between two key points according to the coordinate information of the face key points obtained in step S2, and judging whether the calculated distance is within a preset distance range for those two key points, thereby filtering out face images that do not conform to the preset size. In the embodiment of the invention, the key points at the two pupil positions are selected and the interpupillary distance is calculated; if the interpupillary distance is within the preset range, a color effective face image is obtained and the next step is carried out; otherwise the image is filtered out;
S31, performing face contour detection and average brightness calculation on the infrared image corresponding to the color effective face image obtained in step S30, and judging whether that infrared image contains a complete face contour and a living body, i.e., whether the image shows an attack such as a fake head cover, a head model, or a screen replay. If not, the infrared effective face image is obtained and the next step is carried out; otherwise the image is filtered out;
S32, based on the face key point information obtained in step S2, obtaining the depth values of the corresponding face key points in the depth image corresponding to the infrared effective face image obtained in step S31, and judging whether the key-point depth values are within a preset effective range and whether their relative distribution conforms to the preset depth distribution. If so, the final effective processing image is obtained and the next step is carried out; otherwise the image is filtered out. It should be understood that the preprocessing steps above may be performed in series or in parallel, and are not limited in this regard.
S33, cutting the color image, the infrared image and the depth image based on the coordinate information of the face key points obtained in the step S2 according to the effective processing image obtained in the preprocessing step, and respectively obtaining a first effective face image, a second effective face image and a third effective face image.
More specifically, the preprocessed color image is cropped to obtain the face image in the color image, which is the first effective face image. Since the color image, the infrared image and the depth image are registered and aligned, the face regions of the infrared image and the depth image can be cropped using the same face key point coordinates obtained from the color image, yielding the second effective face image and the third effective face image respectively. A sketch of the S30-S32 filters follows.
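The sketch below illustrates the three S30-S32 filters in NumPy. Every threshold (pupil-distance bounds, brightness bounds, depth range) is a hypothetical placeholder, since the patent leaves the preset ranges open, and the key point layout is an assumption.

```python
import numpy as np

def pupil_distance_ok(left_pupil, right_pupil, lo=40.0, hi=200.0):
    """S30: keep faces whose inter-pupil pixel distance lies within [lo, hi]."""
    return lo <= np.linalg.norm(np.asarray(left_pupil) - np.asarray(right_pupil)) <= hi

def infrared_ok(ir_face, min_brightness=30.0, max_brightness=220.0):
    """S31 (crude stand-in for contour/liveness screening): reject crops whose
    average infrared reflectance is implausibly dark or bright, as screens and
    prints often are. The real check also requires a complete face contour."""
    return min_brightness <= float(ir_face.mean()) <= max_brightness

def depth_ok(depth_face, keypoints_xy, near=300, far=1500):
    """S32: every key point must carry a depth value inside the valid range, and
    the relative layout must be plausible, e.g. the nose closer than the eyes.
    keypoints_xy order assumed: left eye, right eye, nose (an assumption)."""
    zs = np.array([depth_face[int(y), int(x)] for x, y in keypoints_xy])
    in_range = np.all((zs >= near) & (zs <= far))
    nose_forward = zs[2] < min(zs[0], zs[1])
    return bool(in_range and nose_forward)
```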
S4, converting and/or normalizing the effective face images, and superposing the converted and/or normalized effective face images;
Specifically, the first effective face image is converted into a YCrCb face image, normalization processing is carried out on the second effective face image and the third effective face image, and the normalized second effective face image, the normalized third effective face image and the normalized YCrCb face image are overlapped to obtain a fourth effective face image.
Because the skin color of a face in a color image is strongly affected by brightness, confusion easily arises during face living body detection; in a YCrCb image, however, the textures of living and non-living bodies are clearly distinguishable. In one embodiment, the first effective face image cropped from the color image is therefore converted into a YCrCb face image, where the Y channel represents the brightness of the color image, the Cr channel reflects the difference between the red component and the brightness, and the Cb channel represents the difference between the blue component and the brightness. The second effective face image is normalized to an 8-bit bitmap with 1 channel, exhibiting 256 levels from black to white. The third effective face image is normalized into a 16-bit bitmap stored as 2 eight-bit channels, representing values between 0 and 65535; at millimeter resolution this corresponds to a maximum range of about 65 meters. The converted or normalized effective face images are superimposed to obtain a six-channel image, namely the fourth effective face image, as sketched below.
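A minimal sketch of this S4 fusion with OpenCV and NumPy, assuming 8-bit color and infrared crops, a 16-bit depth crop in millimeters, and an assumed 112x112 crop size (the patent does not fix the resolution):

```python
import cv2
import numpy as np

def fuse_faces(color_bgr, ir_face, depth_face, size=(112, 112)):
    """Build the six-channel fourth effective face image: YCrCb (3) + IR (1) +
    16-bit depth split into high/low bytes (2)."""
    ycrcb = cv2.cvtColor(cv2.resize(color_bgr, size), cv2.COLOR_BGR2YCrCb)
    ir8 = cv2.resize(ir_face, size).astype(np.uint8)          # 8-bit, 1 channel
    d16 = cv2.resize(depth_face, size, interpolation=cv2.INTER_NEAREST)
    d_hi = (d16 >> 8).astype(np.uint8)    # high byte of the 16-bit depth
    d_lo = (d16 & 0xFF).astype(np.uint8)  # low byte of the 16-bit depth
    return np.dstack([ycrcb, ir8, d_hi, d_lo])   # (H, W, 6)
```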
S5, respectively conveying the superimposed images to at least three neural networks with different architectures to extract different face features, combining the different face features, acquiring a combined feature map, performing living body judgment, and outputting a living body judgment result.
In one embodiment, the fourth effective face image obtained in step S4 is input into at least three neural networks of different architectures to extract different face features; three feature maps of different features are obtained and combined, living body judgment is performed on the combined feature map, and the judgment result is output.
More specifically, step S5 includes:
S50, respectively inputting the superimposed images into at least three neural networks with different architectures, and extracting features of different feature parts of the effective face image to obtain feature maps of the different feature parts;
In one embodiment, the different feature parts include face key points such as the eyes, nose, mouth corners and ears. Preferably, according to the face key point information acquired in step S2, a VGG network performs feature extraction on the eyes of the fourth effective face image to obtain an eye feature map, GoogLeNet performs feature extraction on the nose to obtain a nose feature map, and a residual network (ResNet) performs feature extraction on the mouth corners to obtain a mouth-corner feature map. It should be understood that the neural networks of different architectures are not limited to these three; SqueezeNet or ResNeXt networks may also be used, and feature extraction may be performed for any face key point, without limitation here.
S51, combining the feature graphs of different feature parts to obtain a combined feature graph; the combined feature map is connected through the full connection layer to generate a feature vector with the same dimension as the number of neurons of the full connection layer;
Specifically, the eye feature map, the nose feature map and the mouth-corner feature map are obtained and combined, and the combined feature map is input into a fully connected layer. The fully connected layer consists of a number of neurons and is connected to the last convolution layer of each neural network; it connects the combined feature map to generate a feature vector whose dimension equals the number of neurons, as sketched below.
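A compact sketch of the S50-S51 structure in PyTorch. Tiny stand-in CNNs replace VGG, GoogLeNet and ResNet so the example stays short; all channel and neuron counts are assumptions, and only the structure (three parallel extractors, feature concatenation, one fully connected layer) follows the text.

```python
import torch
import torch.nn as nn

def tiny_cnn(in_ch=6, feat=32):
    """Stand-in feature extractor over a 6-channel part crop -> (N, feat)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class MultiStreamLiveness(nn.Module):
    def __init__(self, feat=32, num_neurons=128):
        super().__init__()
        self.eye_net = tiny_cnn()     # stands in for the VGG eye branch
        self.nose_net = tiny_cnn()    # stands in for the GoogLeNet nose branch
        self.mouth_net = tiny_cnn()   # stands in for the ResNet mouth-corner branch
        self.fc = nn.Linear(3 * feat, num_neurons)   # combined map -> feature vector
        self.head = nn.Linear(num_neurons, 2)        # two neurons: live / not live

    def forward(self, eye, nose, mouth):
        fused = torch.cat([self.eye_net(eye), self.nose_net(nose),
                           self.mouth_net(mouth)], dim=1)   # combine feature maps
        return self.head(torch.relu(self.fc(fused)))        # logits for soft-max
```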
In one embodiment, in order to improve the accuracy of face living body detection, random erasing (Random Erasing) may be applied to the combined feature map obtained in step S51: a rectangular patch of fixed size is selected at an arbitrary position in the spatial dimensions of the combined feature map, and the pixel values inside it are all set to zero, achieving data augmentation and improving the robustness of the model, as sketched below.
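A sketch of this random erasing step on a combined feature map; the patch size is a placeholder, since the text only states that it is fixed.

```python
import numpy as np

def random_erase(feat, patch=4, rng=None):
    """feat: (C, H, W) combined feature map; returns a copy with one
    randomly placed patch x patch region zeroed across all channels."""
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = feat.shape
    y = rng.integers(0, h - patch + 1)
    x = rng.integers(0, w - patch + 1)
    out = feat.copy()
    out[:, y:y + patch, x:x + patch] = 0.0
    return out
```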
S52, the feature vector obtained through the fully connected layer is fed into the soft-max layer for logistic regression, and living body judgment is carried out;
In one embodiment, the soft-max layer contains two neurons, corresponding to the probability distribution of the face image over the two classes, living and non-living. The living probability threshold is preset to 0.6; if the probability output by the detection model is greater than the threshold, the face in the image is a living body and the living judgment result is output; otherwise the operation ends. It should be understood that the preset living probability threshold may be set according to actual conditions and is not limited here.
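The final decision rule can be written in a few lines; the ordering of the two classes (live first) is an assumption for illustration.

```python
import numpy as np

def is_live(logits, threshold=0.6):
    """logits: array of two values from the network's output neurons."""
    exp = np.exp(logits - logits.max())     # numerically stable soft-max
    p_live = (exp / exp.sum())[0]           # index 0 assumed to be "live"
    return bool(p_live > threshold)
```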
Fig. 2 is a schematic diagram of a multi-modal face living body detection system according to an embodiment of the present invention. The system 200 includes an acquisition camera 201, an image registration module 202, an image detection module 203, an image cropping module 204, an image fusion module 205 and a living body judgment module 206. The acquisition camera 201 is used to acquire color images, infrared images and depth images; the image registration module 202 is configured to acquire the color, infrared and depth images from the acquisition camera 201 and register them; the image detection module 203 includes a color face detection unit 2031 and an image preprocessing unit 2032, where the color face detection unit 2031 detects a face and obtains the initial face frame and face key point information, and the image preprocessing unit 2032 processes the initial face frame and face key point information to obtain an effectively processed image, rejecting images that do not meet the preset requirements; the image cropping module 204 crops the color, infrared and depth images corresponding to the effectively processed image to obtain the different effective face images; the image fusion module 205 converts and/or normalizes the effective face images and superimposes them to obtain a fused image; the living body judgment module 206 includes at least three different neural network architectures, extracts different features from the fused image to obtain different feature maps, combines them into a combined feature map, performs living body judgment according to the combined feature map, and outputs the judgment result.
In some embodiments, the image preprocessing unit 2032 calculates the distance between two key points according to the coordinate information of the key points of the face, and determines whether the calculated distance is within the preset distance range of the two key points, so as to filter out some face images which do not conform to the preset size, so as to obtain a color valid face image.
In some embodiments, the image preprocessing unit 2032 is further configured to perform face contour detection and average brightness calculation on an infrared image corresponding to the color effective face image, and determine whether the infrared image corresponding to the color effective face image includes a complete face contour and a living body, so as to obtain the infrared effective face image.
In some embodiments, the image preprocessing unit 2032 obtains depth values of corresponding face key points in the depth image corresponding to the infrared effective face image based on the face key point information, and determines whether the face key points are within a preset effective range and whether a relative distribution relationship of the depth values conforms to a preset depth distribution, so as to obtain an effective processing image.
In some embodiments, the image cropping module 204 crops the color image, the infrared image, and the depth image corresponding to the effective processing image to obtain a first effective face image, a second effective face image, and a third effective face image.
In some embodiments, the image fusion module 205 converts the first effective face image into a YCrCb face image, normalizes the second effective face image and the third effective face image, and superimposes the normalized second effective face image, third effective face image and YCrCb face image to obtain a fused image.
According to the invention, the color image, the infrared image and the depth image are subjected to information fusion, and the multi-mode human face living body detection is adopted, so that the interference of human face-like information is eliminated, the accuracy of human face living body detection is improved, the robustness and generalization capability of a human face detection algorithm are improved, meanwhile, the user does not need to make corresponding cooperation actions during detection, and the experience effect of the user is optimized.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the multi-modal face living body detection method of the above embodiments. The storage medium may be implemented by any type of volatile or non-volatile storage device, or a combination thereof.
Embodiments of the invention may include or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. The computer-readable medium storing the computer-executable instructions is a physical storage medium. The computer-readable medium carrying computer-executable instructions is a transmission medium. Thus, by way of example, and not limitation, embodiments of the invention may comprise at least two distinct computer-readable media: physical computer readable storage media and transmission computer readable media.
An embodiment of the application also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements at least the multi-modal face living body detection method of the above embodiments.
It is to be understood that the foregoing is a further detailed description of the invention in connection with specific/preferred embodiments, and that the invention is not to be considered as limited to such description. It will be apparent to those skilled in the art that several alternatives or modifications can be made to the described embodiments without departing from the spirit of the invention, and these alternatives or modifications should be considered to be within the scope of the invention. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "preferred embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention.
In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope as defined by the appended claims.
Furthermore, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. Those of ordinary skill in the art will readily appreciate that the above-described disclosures, procedures, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

Claims (4)

1. The multi-mode human face living body detection method is characterized by comprising the following steps of:
S1, collecting a color image, an infrared image and a depth image of a target area, and registering;
S2, performing face key point detection on the color image to obtain an initial face frame and face key point information;
S3, preprocessing an acquired image based on the initial face frame and the face key point information, and cutting the preprocessed image to obtain an effective face image;
s30, calculating the distance between two key points according to the coordinate information of the key points of the face, judging whether the calculated distance is within the preset distance range of the two key points, filtering out face images which do not accord with the preset size, and obtaining color effective face images;
S31, carrying out face contour detection and average brightness calculation on the corresponding infrared image of the color effective face image, and judging whether the infrared image contains a complete face contour and a living body or not so as to acquire the infrared effective face image;
S32, based on the face key point information, acquiring depth values of corresponding face key points in a depth image corresponding to the infrared effective face image, and judging whether the face key points are in a preset effective range or not and whether the relative distribution relation of the depth values accords with preset depth distribution or not so as to acquire a final effective processing image;
s33, cutting the color image, the infrared image and the depth image based on the coordinate information of the face key points according to the effective processing image to respectively obtain a first effective face image, a second effective face image and a third effective face image;
S4, converting and/or normalizing the effective face image, and superposing the converted and/or normalized effective face image;
converting the first effective face image into a YCrCb face image, carrying out normalization processing on the second effective face image and the third effective face image, and superposing the normalized second effective face image, the normalized third effective face image and the YCrCb face image to obtain a superposed image;
and S5, respectively conveying the superimposed images to at least three neural networks with different architectures to extract different face features, combining the different face features to obtain a combined feature map, and performing living body judgment.
2. The multi-modality face in-vivo detection method of claim 1, wherein step S2 includes:
s20, conveying the color image to a trunk feature extraction network, and outputting three first effective feature layers;
s21, processing the three first effective feature layers to obtain an effective feature fusion layer;
S22, extracting the enhanced features of the effective feature fusion layer, and outputting a second effective feature layer;
S23, carrying out face prediction according to the second effective feature layer to obtain the initial face frame and the face key point information.
3. The multi-modality face in-vivo detection method of claim 1, wherein step S5 includes:
S50, respectively inputting the superimposed images into at least three neural networks with different architectures, and extracting features of different feature parts of the effective face image;
S51, combining the feature maps of the different feature parts to obtain a combined feature map, and connecting the combined feature map through a fully connected layer to generate a feature vector with the same dimension as the number of neurons;
S52, the feature vector obtained through the fully connected layer is fed into a soft-max layer for logistic regression, and living body judgment is carried out.
4. A multi-modality face in-vivo detection system, comprising: the system comprises an acquisition camera, an image registration module, an image detection module, an image clipping module, an image fusion module and a living body judgment module; wherein,
The acquisition camera is used for acquiring color images, infrared images and depth images;
the image registration module is used for acquiring the color image, the infrared image and the depth image acquired by the acquisition camera and registering the color image, the infrared image and the depth image;
the image detection module comprises a color face detection unit and an image preprocessing unit; the color face detection unit is used for detecting a face and acquiring initial face frame and face key point information; the image preprocessing unit is used for processing the initial face frame and the face key point information to obtain an effective processing image;
The image clipping module is used for clipping the color image, the infrared image and the depth image corresponding to the effective processing image to obtain different effective face images;
The image fusion module is used for converting and/or normalizing the effective face images and superposing the converted and/or normalized effective face images to obtain a fusion image;
The living body judging module comprises at least three different neural network architectures; different feature extraction is carried out on the fused image by the different architectures to obtain different feature maps, the different feature maps are combined to obtain a combined feature map, and living body judgment is carried out according to the combined feature map;
The image preprocessing unit calculates the distance between two key points according to the coordinate information of the key points of the human face, and judges whether the calculated distance is within the preset distance range of the two key points so as to obtain a color effective human face image;
The image preprocessing unit is also used for carrying out face contour detection and average brightness calculation on the corresponding infrared image of the colorful effective face image, and judging whether the infrared image corresponding to the colorful effective face image contains a complete face contour and a living body or not so as to obtain the infrared effective face image;
The image preprocessing unit acquires depth values of corresponding face key points in a depth image corresponding to the infrared effective face image based on the face key point information, and judges whether the face key points are in a preset effective range or not and whether the relative distribution relation of the depth values accords with preset depth distribution or not so as to acquire an effective processing image;
The image clipping module clips the color image, the infrared image and the depth image corresponding to the effective processing image to obtain a first effective face image, a second effective face image and a third effective face image;
the image fusion module converts the first effective face image into a YCrCb face image, performs normalization processing on the second effective face image and the third effective face image, and superimposes the normalized second effective face image, the normalized third effective face image and the YCrCb face image to obtain the fusion image.
Application CN202011339311.2A, filed 2020-11-25 (priority date 2020-11-25): Multi-mode human face living body detection method and system. Granted as CN112487922B; status: Active.

Priority Applications (1)

CN202011339311.2A (priority and filing date 2020-11-25): Multi-mode human face living body detection method and system


Publications (2)

CN112487922A (en), published 2021-03-12
CN112487922B (en), published 2024-05-07

Family

Family ID: 74934205

Family Applications (1)

CN202011339311.2A, filed 2020-11-25, status Active: Multi-mode human face living body detection method and system

Country Status (1)

Country Link
CN (1) CN112487922B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861818A (en) * 2021-03-31 2021-05-28 中国工商银行股份有限公司 Living body detection method and device and full-automatic safe deposit box
CN113052142A (en) * 2021-04-26 2021-06-29 的卢技术有限公司 Silence in-vivo detection method based on multi-modal data
CN113505682B (en) * 2021-07-02 2024-07-02 杭州萤石软件有限公司 Living body detection method and living body detection device
CN113989870A (en) * 2021-07-28 2022-01-28 奥比中光科技集团股份有限公司 Living body detection method, door lock system and electronic equipment
CN113705400B (en) * 2021-08-18 2023-08-15 中山大学 Single-mode face living body detection method based on multi-mode face training
CN114022953A (en) * 2021-10-22 2022-02-08 中国科学院苏州生物医学工程技术研究所 Animal behavioristics real-time detection tracking system based on deep learning
CN114185430A (en) * 2021-11-12 2022-03-15 中原动力智能机器人有限公司 Human-computer interaction system and method and intelligent robot
CN115760128B (en) * 2022-10-21 2023-07-21 深圳市盛思达通讯技术有限公司 Face recognition-based safe payment method and system
CN116110111B (en) * 2023-03-23 2023-09-08 平安银行股份有限公司 Face recognition method, electronic equipment and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697688B (en) * 2017-10-20 2023-08-04 虹软科技股份有限公司 Method and device for image processing

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130103150A (en) * 2012-03-09 2013-09-23 한국과학기술연구원 Three dimensional montage generation system and method based on two dimensinal single image
CN107851192A (en) * 2015-05-13 2018-03-27 北京市商汤科技开发有限公司 For detecting the apparatus and method of face part and face
CN109711243A (en) * 2018-11-01 2019-05-03 长沙小钴科技有限公司 A kind of static three-dimensional human face in-vivo detection method based on deep learning
CN109684924A (en) * 2018-11-21 2019-04-26 深圳奥比中光科技有限公司 Human face in-vivo detection method and equipment
CN110909680A (en) * 2019-11-22 2020-03-24 咪咕动漫有限公司 Facial expression recognition method and device, electronic equipment and storage medium
CN111160216A (en) * 2019-12-25 2020-05-15 开放智能机器(上海)有限公司 Multi-feature multi-model living human face recognition method
CN111274947A (en) * 2020-01-19 2020-06-12 广州广电卓识智能科技有限公司 Multi-task multi-thread face recognition method, system and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Marcus de Assis Angeloni et al., "Age Estimation From Facial Parts Using Compact Multi-Stream Convolutional Neural Networks," 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3039-3045. *
Guoqing Wang et al., "Multi-modal Face Presentation Attack Detection via Spatial and Channel Attentions," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1584-1590. *

Also Published As

Publication number Publication date
CN112487922A (en) 2021-03-12


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant