CN112487922A - Multi-modal face liveness detection method and system - Google Patents

Multi-modal face liveness detection method and system

Info

Publication number
CN112487922A
CN112487922A (application CN202011339311.2A)
Authority
CN
China
Prior art keywords
image
face
effective
feature
infrared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011339311.2A
Other languages
Chinese (zh)
Inventor
辛冠希
高通
钱贝贝
黄源浩
肖振中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orbbec Inc
Original Assignee
Orbbec Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orbbec Inc
Priority to CN202011339311.2A
Publication of CN112487922A
Legal status: Pending

Classifications

    • G06V 40/161 Human faces: Detection; Localisation; Normalisation
    • G06V 40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/45 Spoof detection, e.g. liveness detection: Detection of the body part being alive
    • G06F 18/253 Pattern recognition: Fusion techniques of extracted features
    • G06N 3/045 Neural networks: Combinations of networks
    • G06N 3/08 Neural networks: Learning methods

Abstract

The invention discloses a multi-modal face liveness detection method and system comprising the following steps: S1, collecting a color image, an infrared image and a depth image of a target area, and registering them; S2, performing face key point detection on the color image to obtain an initial face frame and face key point information; S3, preprocessing the images based on the initial face frame and the face key point information, and cropping the preprocessed images to obtain effective face images; S4, converting and/or normalizing the effective face images, and superimposing the converted and/or normalized effective face images; S5, feeding the superimposed image to neural networks of different architectures to extract different face features, merging the different face features, and making the liveness judgment. The method and system improve the accuracy of face liveness detection and the robustness and generalization ability of the face detection algorithm; at the same time, the user need not perform cooperating actions during detection, which improves the user experience.

Description

Multi-modal face liveness detection method and system
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a multi-modal face liveness detection method and system.
Background
With the development of technologies such as electronic commerce, face-based identity verification, realized mainly through face recognition, has been widely applied. While face recognition greatly improves the convenience of daily life, its security problems have gradually been exposed: liveness detection based on a single modality offers poor algorithm robustness and generalization ability. Face liveness anti-spoofing technology has therefore drawn wide attention.
Existing liveness detection generally adopts a cooperative method: actions of the face such as nodding, head shaking, blinking and mouth opening are detected, and the user must perform the corresponding action. If a video of the corresponding action (nodding, head shaking, blinking, mouth opening, etc.) is recorded in advance, or a head cover or mask is worn, it is likewise judged to be a live body, so a counterfeiter can easily pass the check. In summary, existing liveness detection not only gives a poor user experience but is also insecure and easily broken by illegitimate users.
The above background disclosure is only intended to assist understanding of the inventive concept and technical solutions of the present invention; it does not necessarily belong to the prior art of the present application, and, absent clear evidence that the above content was disclosed before the filing date of this application, it should not be used to evaluate the novelty and inventive step of the present application.
Disclosure of Invention
The present invention is directed to a multi-modal face liveness detection method and system that solve at least one of the above problems of the related art.
In order to achieve the above purpose, the technical solution of the embodiment of the present invention is realized as follows:
A multi-modal face liveness detection method comprises the following steps:
s1, collecting a color image, an infrared image and a depth image of the target area, and registering;
s2, carrying out face key point detection on the color image to obtain an initial face frame and face key point information;
s3, preprocessing the collected image based on the initial face frame and the face key point information, and cutting the preprocessed image to obtain an effective face image;
s4, converting and/or normalizing the effective facial images, and superposing the converted and/or normalized effective facial images;
and S5, respectively transmitting the superposed images to at least three neural networks with different architectures to extract different face features, combining the different face features, acquiring a combined feature image and performing living body judgment.
In some embodiments, step S2 includes:
S20, transmitting the color image to a backbone feature extraction network, and outputting three first effective feature layers;
s21, processing the three first effective feature layers to obtain effective feature fusion layers;
S22, performing enhanced feature extraction on the effective feature fusion layers, and outputting second effective feature layers;
and S23, performing face prediction according to the second effective feature layers to obtain the initial face frame and the face key point information.
In some embodiments, step S3 includes:
S30, calculating the distance between two key points according to the coordinate information of the face key points, judging whether the calculated distance is within a preset distance range for those two key points, and filtering out face images that do not conform to the preset size, so as to obtain an effective color face image;
S31, performing face contour detection and average brightness calculation on the infrared image corresponding to the effective color face image, and judging whether the infrared image contains a complete face contour and a live body, so as to obtain an effective infrared face image;
S32, acquiring, based on the face key point information, the depth values of the corresponding face key points in the depth image corresponding to the effective infrared face image, and judging whether the face key points are within a preset effective range and whether the relative distribution of the depth values conforms to a preset depth distribution, so as to obtain a final effective processed image;
and S33, cropping the color image, the infrared image and the depth image of the effective processed image based on the coordinate information of the face key points, to obtain a first, a second and a third effective face image respectively.
In some embodiments, step S5 includes:
S50, inputting the superimposed image into at least three neural networks of different architectures, and performing feature extraction on different feature parts of the effective face image;
S51, merging the feature maps of the different feature parts to obtain a merged feature map, and connecting the merged feature map through a fully connected layer to generate a feature vector with the same dimension as the number of neurons;
and S52, inputting the feature vector obtained through the fully connected layer into a soft-max layer for logistic regression, and making the liveness judgment.
In some embodiments, in step S4, the first effective face image is converted into a YCrCb face image, the second effective face image and the third effective face image are normalized, and the normalized second effective face image, third effective face image and YCrCb face image are superimposed to obtain the superimposed image.
The technical solution of another embodiment of the invention is as follows:
A multi-modal face liveness detection system comprises an acquisition camera, an image registration module, an image detection module, an image cropping module, an image fusion module and a liveness judgment module; wherein:
the acquisition camera is used for acquiring a color image, an infrared image and a depth image;
the image registration module is used for acquiring the color image, the infrared image and the depth image acquired by the acquisition camera and registering the color image, the infrared image and the depth image;
the image detection module comprises a color face detection unit and an image preprocessing unit; the color face detection unit is used for detecting a face and acquiring an initial face frame and face key point information; the image preprocessing unit is used for processing the initial face frame and the face key point information to obtain an effective processed image;
the image cropping module is used for cropping the color image, the infrared image and the depth image corresponding to the effective processed image to obtain different effective face images;
the image fusion module is used for converting and/or normalizing the effective face images and superimposing the converted and/or normalized effective face images to obtain a fused image;
the liveness judgment module comprises at least three different neural network architectures; different features are extracted from the fused image through the different neural network architectures to obtain different feature maps, the different feature maps are merged to obtain a merged feature map, and the liveness judgment is made according to the merged feature map.
In some embodiments, the image preprocessing unit calculates the distance between two key points according to the coordinate information of the face key points, and determines whether the calculated distance is within a preset distance range for the two key points, so as to obtain an effective color face image.
In some embodiments, the image preprocessing unit is further configured to perform face contour detection and average brightness calculation on the infrared image corresponding to the effective color face image, and determine whether that infrared image contains a complete face contour and a live body, so as to obtain an effective infrared face image.
In some embodiments, the image preprocessing unit obtains the depth values of the corresponding face key points in the depth image corresponding to the effective infrared face image based on the face key point information, and determines whether the face key points are within a preset effective range and whether the relative distribution of the depth values conforms to a preset depth distribution, so as to obtain an effective processed image.
In some embodiments, the image cropping module crops the color image, the infrared image and the depth image corresponding to the effective processed image to obtain a first effective face image, a second effective face image and a third effective face image; the image fusion module converts the first effective face image into a YCrCb face image, normalizes the second and third effective face images, and superimposes the normalized second effective face image, the normalized third effective face image and the YCrCb face image to obtain the fused image.
The technical solutions of the invention have the following beneficial effects:
Compared with the prior art, the invention fuses the information of the color image, the infrared image and the depth image; multi-modal face liveness detection eliminates the interference of face-like information, improving the accuracy of face liveness detection and the robustness and generalization ability of the face detection algorithm. Meanwhile, the user need not perform cooperating actions during detection, which improves the user experience.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart illustration of a multi-modal face liveness detection method according to an embodiment of the invention.
FIG. 2 is a schematic diagram of a multi-modal face liveness detection system according to another embodiment of the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Fig. 1 is a schematic flow chart of a multi-modal face liveness detection method according to an embodiment of the present invention. The method comprises the following steps:
s1, collecting a color image, an infrared image and a depth image of the target area, and registering;
in one embodiment, a color image, an infrared image, and a depth image of the target area are acquired by an acquisition device. Wherein the acquisition device may be a depth camera based on structured light, binocular, TOF (time of flight algorithm) technology. Preferably, the acquisition device comprises a structured light depth camera and a color camera for acquiring depth images, infrared images and color images. The acquisition frequencies of the depth image, the infrared image and the color image may be the same or different, and corresponding settings are performed according to specific functional requirements, for example, the depth image, the infrared image and the color image are acquired at a frequency of 60FPS in a crossed manner, or the depth image, the infrared image and the color image of 30FPS are acquired respectively.
In one embodiment, the color image, the infrared image and the depth image collected by the acquisition device are registered; that is, the correspondence among the pixels of the depth image, the infrared image and the color image is found through a registration algorithm, eliminating the parallax caused by the different spatial positions of the cameras. The registration may be performed by a dedicated processor in the acquisition device or by an external processor. The registered depth, infrared and color images enable multiple functions, such as accelerating face liveness detection and recognition. A minimal sketch of such a registration is given below.
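The registration algorithm itself is not spelled out in the text; the following is a minimal sketch under common pinhole-camera assumptions, where K_depth, K_color and the rigid transform (R, t) between the two cameras are hypothetical calibration inputs, not values from the patent.

```python
# Minimal depth-to-color registration sketch (NumPy), assuming pinhole
# camera models. K_depth/K_color are 3x3 intrinsic matrices and (R, t) is
# the depth-to-color rigid transform; all are hypothetical calibration
# inputs, not values from the patent.
import numpy as np

def register_depth_to_color(depth, K_depth, K_color, R, t):
    """Re-project every depth pixel into the color camera so that depth,
    infrared and color pixels correspond, eliminating the parallax."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32)
    # Back-project depth pixels to 3D points in the depth-camera frame.
    x = (us - K_depth[0, 2]) * z / K_depth[0, 0]
    y = (vs - K_depth[1, 2]) * z / K_depth[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Transform into the color-camera frame and project with its intrinsics.
    pts_c = pts @ R.T + t
    u = K_color[0, 0] * pts_c[:, 0] / pts_c[:, 2] + K_color[0, 2]
    v = K_color[1, 1] * pts_c[:, 1] / pts_c[:, 2] + K_color[1, 2]
    registered = np.zeros_like(z)
    ok = (pts[:, 2] > 0) & (pts_c[:, 2] > 0) & \
         (u >= 0) & (u < w) & (v >= 0) & (v < h)
    registered[v[ok].astype(int), u[ok].astype(int)] = pts_c[ok, 2]
    return registered
```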
In one embodiment, face detection can be performed on the color image first, and the face region in the depth or infrared image can then be located directly through the pixel correspondence, saving one face detection pass over the depth or infrared image. In another embodiment, face detection may be performed on the color image of the previous frame; when the depth or infrared image is acquired in the next frame, only the depth values or reflected infrared intensities at the face position are read out, i.e. only the face portion of the depth or infrared image is output. This reduces the computation of the depth/infrared extraction algorithm and the data transmission bandwidth, speeding up processing and improving detection and recognition efficiency. Conversely, face liveness detection or recognition can be performed on the depth image first, and the pixel correspondence then used to accelerate detection or recognition in the color or infrared image. The embodiments of the present invention impose no particular limitation here; any form that does not depart from the gist of the present invention falls within the scope of protection of the present application.
S2, carrying out face key point detection on the color image to obtain an initial face frame and face key point information;
Specifically, the color image is passed to a color-image face detection model for face key point detection. In one embodiment, the model is built on the RetinaFace face detection algorithm, and the steps include:
s20, transmitting the color image to a main feature extraction network, and outputting the last three first effective feature layers;
in one embodiment, the stem feature extraction network comprises a depth separable convolution (mobilene) model or a depth residual error network (Resnet) model, preferably a mobilene model, with which parameters of the model can be reduced.
S21, processing the three first effective feature layers to obtain effective feature fusion layers;
in one embodiment, three first effective feature layers are used for constructing a feature map pyramid network (FPN) structure, and an effective feature fusion layer is obtained; more specifically, the number of channels of the three first effective feature layers is adjusted by using a convolution kernel which is a convolution layer of 1 × 1, and the adjusted effective feature layers are used for performing up-sampling and image fusion to realize feature fusion of the three first effective feature layers, so that three effective feature fusion layers with different sizes are obtained, and further the construction of the FPN structure is completed. It should be understood that the convolution kernel size of the convolutional layer can be designed according to practical situations, and is not limited herein.
S22, performing enhanced feature extraction on the obtained effective feature fusion layers, and outputting second effective feature layers;
in one embodiment, a Single Stage assisted Face Detector (SSH) structure is used to perform enhanced feature extraction on three different sizes of valid feature fusion layers. The SSH structure comprises three parallel convolutional layers, wherein the three convolutional layers can be respectively 1 3 × 3 convolutional layer, 23 × 3 convolutional layers and 3 × 3 convolutional layers which are connected in parallel (namely, one convolutional layer is formed by 1 3 × 3 convolutional layer, one convolutional layer is formed by 23 × 3 layers, and one convolutional layer is formed by 3 × 3 layers), the sensing field (reliable field) of the convolutional layers is increased, and the calculation of parameters is reduced. After passing through the three parallel convolutional layers, the effective feature fusion layers are merged through a concat function to obtain new effective feature layers, that is, three effective feature fusion layers with different sizes can obtain three new second effective feature layers with SSH structures and different sizes through the three parallel convolutional layers.
And S23, performing face prediction according to the second effective characteristic layer to obtain an initial face frame and face key point information.
In one embodiment, the three second effective feature layers of different sizes with the SSH structure are equivalent to dividing the whole color image into grids of different sizes, each grid cell containing two prior boxes, each prior box representing a certain region of the color image. Face detection is performed on every prior box: with the confidence threshold set to 0.5, the predicted probability that a prior box contains a face is compared with the threshold, and if the probability is greater than the threshold, the prior box contains a face and constitutes the initial face frame. It should be understood that the confidence threshold may be set according to the actual situation and is not limited here.
Further, the prior boxes are adjusted to obtain the face key points. The number of face key points may be, for example, 98 or 106 and can be designed according to the actual situation; each face key point requires two adjustment parameters, which shift the x and y coordinates of the prior-box center to give the key point coordinates, as in the sketch below.
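The exact box and landmark encoding is not given in the text; the following sketch only illustrates the thresholding and center-offset adjustment described above, with hypothetical tensor layouts.

```python
# Illustrative sketch of the prediction step: keep prior boxes whose face
# probability exceeds the 0.5 confidence threshold, and shift each prior-box
# center by its two per-keypoint offsets. Tensor layouts are assumptions.
import numpy as np

def select_faces(prior_centers, face_probs, kp_offsets, threshold=0.5):
    """prior_centers: (N, 2) box centers; face_probs: (N,) probabilities;
    kp_offsets: (N, K, 2) predicted (dx, dy) per key point."""
    keep = face_probs > threshold  # probability-vs-threshold comparison
    # Each key point = prior-box center shifted by its two parameters.
    keypoints = prior_centers[keep][:, None, :] + kp_offsets[keep]
    return np.flatnonzero(keep), keypoints
```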
It should be understood that the color-image face detection model is not limited to the RetinaFace algorithm; it may also be MTCNN or the like, which is not limited here.
S3, preprocessing the images acquired in step S1 based on the initial face frame and face key point information obtained in step S2, and cropping the preprocessed images to obtain effective face images;
in one embodiment, the preprocessed color image, infrared image and depth image are cut to obtain a first effective face image, a second effective face image and a third effective face image respectively.
In one embodiment, the pre-processing comprises the steps of:
s30, calculating the distance between two key points according to the coordinate information of the key points of the human face obtained in the step S2, and judging whether the calculated distance is within the preset distance range of the two key points, so as to filter out human face images which do not accord with the preset size; in the embodiment of the invention, key points at two pupil positions are selected, the distance between two pupils is calculated, whether the interpupillary distance is within the range of the preset interpupillary distance is judged, so as to filter some face images which do not accord with the preset size, if the face images are obtained within the preset interpupillary distance, the color effective face images are obtained, the next step is carried out, otherwise, the images are filtered;
s31, carrying out face contour detection and average brightness calculation on the infrared image corresponding to the effective color face image obtained in the step S30, and judging whether the infrared image corresponding to the effective color face image contains a complete face contour and a living body, so that whether the image has a false head cover or a head mold and screen attack can be judged, if not, obtaining the effective infrared face image, and carrying out the next step; otherwise, filtering the image;
s32, acquiring depth values of corresponding face key points in the depth image corresponding to the infrared effective face image acquired in the step S31 based on the face key point information acquired in the step S2, judging whether the face key points are in a preset effective range and whether the relative distribution relation of the depth values accords with preset depth distribution, and if the face key points accord with the preset depth distribution, acquiring a final effective processing image and carrying out the next step; otherwise, the image is filtered. It should be understood that the above pre-treatment steps may be performed in series or in parallel, and are not limited thereto.
And S33, cropping the color image, the infrared image and the depth image according to the effective processed image obtained in the preprocessing step, based on the coordinate information of the face key points obtained in step S2, to obtain a first, a second and a third effective face image respectively.
More specifically, the preprocessed color image is cropped to obtain the face region of the color image, which is the first effective face image. Since the color, infrared and depth images are registered, the face regions of the infrared and depth images can be cropped by the same principle using the face key points obtained from the color image, giving the second and third effective face images respectively.
S4, converting and/or normalizing the effective facial images, and superposing the converted and/or normalized effective facial images;
Specifically, the first effective face image is converted into a YCrCb face image, the second and third effective face images are normalized, and the normalized second effective face image, the normalized third effective face image and the YCrCb face image are superimposed to obtain a fourth effective face image.
The skin color of a face in a color image is strongly affected by brightness, which easily causes confusion during face liveness detection, whereas the textures of live and non-live faces differ markedly in a YCrCb image. In one embodiment, therefore, the first effective face image cropped from the color image is converted into a YCrCb face image, in which the Y channel represents the luminance of the color image, the Cr channel the difference between the red component and the luminance, and the Cb channel the difference between the blue component and the luminance. The second effective face image is normalized to an 8-bit bitmap with 1 channel, representing 256 levels of tone from black to white. The third effective face image is normalized to a 16-bit bitmap with 2 channels, representing values between 0 and 65535 (a maximum depth measurement of about 65 m). The converted or normalized effective face images are superimposed into a six-channel image, the fourth effective face image, as sketched below.
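A minimal sketch of this fusion step, assuming OpenCV-style arrays; splitting the 16-bit depth into two 8-bit channels is one plausible reading of the "2-channel 16-bit bitmap" described above.

```python
# Channel-fusion sketch assuming OpenCV-style arrays: BGR face -> YCrCb
# (3 channels), IR face -> one 8-bit channel, 16-bit depth face -> two
# 8-bit channels (low/high byte), stacked into the six-channel image.
import cv2
import numpy as np

def fuse_faces(color_face, ir_face, depth_face):
    ycrcb = cv2.cvtColor(color_face, cv2.COLOR_BGR2YCrCb)  # 3 channels
    ir8 = cv2.normalize(ir_face, None, 0, 255,
                        cv2.NORM_MINMAX).astype(np.uint8)  # 1 channel
    d16 = depth_face.astype(np.uint16)                     # 0..65535
    d_lo = (d16 & 0xFF).astype(np.uint8)                   # low byte
    d_hi = (d16 >> 8).astype(np.uint8)                     # high byte
    # Six-channel fourth effective face image for the downstream networks.
    return np.dstack([ycrcb, ir8, d_lo, d_hi])
```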
And S5, feeding the superimposed image to at least three neural networks of different architectures to extract different face features, merging the different face features to obtain a merged feature map, making the liveness judgment, and outputting the liveness judgment result.
In one embodiment, the fourth effective face image obtained in step S4 is input to at least three neural networks of different architectures to extract different face features; three feature maps of different features are obtained and merged, the merged feature map is output, the liveness judgment is made, and the judgment result is output.
More specifically, step S5 includes:
S50, inputting the superimposed image into at least three neural networks of different architectures, and performing feature extraction on different feature parts of the effective face image to obtain feature maps of the different feature parts;
in one embodiment, the different feature portions include key points of the face, such as eyes, nose, mouth corners, ears, etc.; preferably, according to the face key point information obtained in step S2, the VGG network may be used to perform feature extraction on the eyes of the fourth effective face image to obtain an eye feature map, the Geoglenet may be used to perform feature extraction on the nose to obtain a nose feature map, and the residual error network may be used to perform feature extraction on the mouth corner to obtain a mouth corner feature map. It should be understood that the neural networks with different architectures are not limited to the above three networks, and may also be SqueezeNet or resenext networks, and may perform feature extraction for any key point of a human face, and is not limited herein.
S51, merging the feature maps of different feature parts to obtain a merged feature map; connecting the combined characteristic graphs through the full-connection layer to generate a characteristic vector with the same dimensionality as the neuron quantity of the full-connection layer;
Specifically, the eye, nose and mouth-corner feature maps are merged to obtain the merged feature map, which is input into a fully connected layer. The fully connected layer consists of a number of neurons and is connected to the last convolutional layer of each neural network; it connects the merged feature map into a feature vector with the same dimension as the number of neurons.
In an embodiment, to improve the accuracy of face liveness detection, Random Erasing may be applied to the merged feature map obtained in step S51: a rectangular patch of fixed size is chosen at an arbitrary position in the spatial dimensions of the merged feature map, and all pixel values inside the patch are set to zero. This augments the data and improves the robustness of the model; a minimal sketch follows.
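A minimal Random Erasing sketch over the merged feature map; the patch size is an illustrative assumption.

```python
# Random Erasing sketch on the merged feature map: zero one fixed-size
# patch at a random spatial position per sample.
import torch

def random_erase(feat, patch=4):
    """feat: (N, C, H, W) merged feature map."""
    n, _, h, w = feat.shape
    out = feat.clone()
    for i in range(n):
        y = torch.randint(0, h - patch + 1, (1,)).item()
        x = torch.randint(0, w - patch + 1, (1,)).item()
        out[i, :, y:y + patch, x:x + patch] = 0  # set pixel values to zero
    return out
```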
S52, inputting the feature vector obtained through the fully connected layer into a soft-max layer for logistic regression, and making the liveness judgment;
in one embodiment, the soft-max layer comprises two neurons, the two neurons correspond to the probability distribution of the face image on two categories of a living body and a non-living body respectively, the probability threshold value preset on the living body is 0.6, if the probability detected by the face detection model is greater than the threshold value, the face in the face image is the living body, and a living body judgment result is output; if the threshold value is less than the threshold value, the operation is ended. It should be understood that the preset living body probability can be set according to practical situations, and is not limited herein.
Fig. 2 is a schematic diagram of a multi-modal face liveness detection system according to an embodiment of the present invention. The system 200 includes an acquisition camera 201, an image registration module 202, an image detection module 203, an image cropping module 204, an image fusion module 205 and a liveness judgment module 206. The acquisition camera 201 collects the color image, infrared image and depth image; the image registration module 202 acquires the images collected by the acquisition camera 201 and registers them. The image detection module 203 comprises a color face detection unit 2031 and an image preprocessing unit 2032: the color face detection unit 2031 detects the face and obtains the initial face frame and face key point information, and the image preprocessing unit 2032 processes the initial face frame and face key point information to obtain the effective processed image, rejecting images that do not meet the preset requirements. The image cropping module 204 crops the color, infrared and depth images corresponding to the effective processed image to obtain the different effective face images; the image fusion module 205 converts and/or normalizes the effective face images and superimposes them to obtain the fused image; the liveness judgment module 206 includes at least three different neural network architectures, extracts different features of the fused image to obtain different feature maps, merges them into a merged feature map, makes the liveness judgment according to the merged feature map, and outputs the liveness judgment result.
In some embodiments, the image preprocessing unit 2032 calculates the distance between two key points according to the coordinate information of the face key points, and determines whether the calculated distance is within the preset distance range for the two key points, so as to filter out face images that do not conform to the preset size and obtain an effective color face image.
In some embodiments, the image preprocessing unit 2032 is further configured to perform face contour detection and average brightness calculation on the infrared image corresponding to the effective color face image, and determine whether that infrared image contains a complete face contour and a live body, so as to obtain an effective infrared face image.
In some embodiments, the image preprocessing unit 2032 obtains the depth values of the corresponding face key points in the depth image corresponding to the effective infrared face image based on the face key point information, and determines whether the face key points are within the preset effective range and whether the relative distribution of the depth values conforms to the preset depth distribution, so as to obtain the effective processed image.
In some embodiments, the image cropping module 204 crops the color image, the infrared image and the depth image corresponding to the effective processed image to obtain the first, second and third effective face images.
In some embodiments, the image fusion module 205 converts the first effective face image into a YCrCb face image, normalizes the second and third effective face images, and superimposes the normalized second effective face image, the normalized third effective face image and the YCrCb face image to obtain the fused image.
According to the invention, the color image, the infrared image and the depth image are fused, and multi-modal face liveness detection eliminates the interference of face-like information, improving the accuracy of face liveness detection and the robustness and generalization ability of the face detection algorithm. Meanwhile, the user need not perform cooperating actions during detection, which improves the user experience.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the multi-modal face liveness detection method of the above embodiments. The storage medium may be implemented by any type of volatile or non-volatile storage device, or a combination thereof.
Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. The computer-readable medium storing the computer-executable instructions is a physical storage medium. Computer-readable media carrying computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can include at least two distinct computer-readable media: physical computer-readable storage media and transmission computer-readable media.
The embodiment of the present application further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements at least the multi-modal face liveness detection method of the foregoing embodiments.
It is to be understood that the foregoing is a more detailed description of the invention, and that specific embodiments are not to be considered as limiting the invention. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the description of the term "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention.
In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the invention as defined by the appended claims.
Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. One of ordinary skill in the art will readily appreciate that the above-disclosed, presently existing or later to be developed, processes, machines, manufacture, compositions of matter, means, methods, or steps, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

Claims (10)

1. A multi-modal face liveness detection method, characterized by comprising the following steps:
s1, collecting a color image, an infrared image and a depth image of the target area, and registering;
s2, carrying out face key point detection on the color image to obtain an initial face frame and face key point information;
s3, preprocessing the collected image based on the initial face frame and the face key point information, and cutting the preprocessed image to obtain an effective face image;
s4, converting and/or normalizing the effective facial images, and superposing the converted and/or normalized effective facial images;
and S5, respectively transmitting the superposed images to at least three neural networks with different architectures to extract different face features, combining the different face features, acquiring a combined feature image and performing living body judgment.
2. The multi-modal face liveness detection method according to claim 1, wherein step S2 comprises:
S20, transmitting the color image to a backbone feature extraction network, and outputting three first effective feature layers;
S21, processing the three first effective feature layers to obtain effective feature fusion layers;
S22, performing enhanced feature extraction on the effective feature fusion layers, and outputting second effective feature layers;
and S23, performing face prediction according to the second effective feature layers to obtain the initial face frame and the face key point information.
3. The multi-modal face liveness detection method according to claim 1, wherein step S3 comprises:
S30, calculating the distance between two key points according to the coordinate information of the face key points, judging whether the calculated distance is within a preset distance range for the two key points, and filtering out face images that do not conform to the preset size, so as to obtain an effective color face image;
S31, performing face contour detection and average brightness calculation on the infrared image corresponding to the effective color face image, and judging whether the infrared image contains a complete face contour and a live body, so as to obtain an effective infrared face image;
S32, acquiring, based on the face key point information, the depth values of the corresponding face key points in the depth image corresponding to the effective infrared face image, and judging whether the face key points are within a preset effective range and whether the relative distribution of the depth values conforms to a preset depth distribution, so as to obtain a final effective processed image;
and S33, cropping the color image, the infrared image and the depth image of the effective processed image based on the coordinate information of the face key points, to obtain a first, a second and a third effective face image respectively.
4. The multi-modal face liveness detection method according to claim 1, wherein step S5 comprises:
S50, inputting the superimposed image into at least three neural networks of different architectures, and performing feature extraction on different feature parts of the effective face image;
S51, merging the feature maps of the different feature parts to obtain a merged feature map, and connecting the merged feature map through a fully connected layer to generate a feature vector with the same dimension as the number of neurons;
and S52, inputting the feature vector obtained through the fully connected layer into a soft-max layer for logistic regression, and making the liveness judgment.
5. The multi-modal face liveness detection method according to claim 3, wherein in step S4 the first effective face image is converted into a YCrCb face image, the second and third effective face images are normalized, and the normalized second effective face image, the normalized third effective face image and the YCrCb face image are superimposed to obtain the superimposed image.
6. A multi-modal face liveness detection system, characterized by comprising an acquisition camera, an image registration module, an image detection module, an image cropping module, an image fusion module and a liveness judgment module; wherein:
the acquisition camera is used for collecting a color image, an infrared image and a depth image;
the image registration module is used for acquiring the color image, the infrared image and the depth image collected by the acquisition camera and registering them;
the image detection module comprises a color face detection unit and an image preprocessing unit; the color face detection unit is used for detecting a face and acquiring an initial face frame and face key point information; the image preprocessing unit is used for processing the initial face frame and the face key point information to obtain an effective processed image;
the image cropping module is used for cropping the color image, the infrared image and the depth image corresponding to the effective processed image to obtain different effective face images;
the image fusion module is used for converting and/or normalizing the effective face images and superimposing the converted and/or normalized effective face images to obtain a fused image;
the liveness judgment module comprises at least three different neural network architectures, extracts different features of the fused image through the different neural network architectures to obtain different feature maps, merges the different feature maps to obtain a merged feature map, and makes the liveness judgment according to the merged feature map.
7. The multi-modal face liveness detection system according to claim 6, wherein the image preprocessing unit calculates the distance between two key points according to the coordinate information of the face key points and judges whether the calculated distance is within a preset distance range for the two key points, so as to obtain an effective color face image.
8. The multi-modal face liveness detection system according to claim 7, wherein the image preprocessing unit is further used for performing face contour detection and average brightness calculation on the infrared image corresponding to the effective color face image and judging whether that infrared image contains a complete face contour and a live body, so as to obtain an effective infrared face image.
9. The multi-modal face liveness detection system according to claim 8, wherein the image preprocessing unit acquires the depth values of the corresponding face key points in the depth image corresponding to the effective infrared face image based on the face key point information, and judges whether the face key points are within a preset effective range and whether the relative distribution of the depth values conforms to a preset depth distribution, so as to obtain an effective processed image.
10. The multi-modal face liveness detection system according to claim 9, wherein the image cropping module crops the color image, the infrared image and the depth image corresponding to the effective processed image to obtain a first, a second and a third effective face image; and the image fusion module converts the first effective face image into a YCrCb face image, normalizes the second and third effective face images, and superimposes the normalized second effective face image, the normalized third effective face image and the YCrCb face image to obtain the fused image.
CN202011339311.2A (filed 2020-11-25, priority 2020-11-25): Multi-modal face liveness detection method and system, pending, published as CN112487922A

Priority Applications (1)

CN202011339311.2A, priority date 2020-11-25, filing date 2020-11-25: Multi-modal face liveness detection method and system

Publications (1)

CN112487922A, published 2021-03-12

Family ID: 74934205; Country: CN


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130103150A (en) * 2012-03-09 2013-09-23 한국과학기술연구원 Three dimensional montage generation system and method based on two dimensinal single image
CN107851192A (en) * 2015-05-13 2018-03-27 北京市商汤科技开发有限公司 For detecting the apparatus and method of face part and face
US20190122376A1 (en) * 2017-10-20 2019-04-25 Arcsoft (Hangzhou) Multimedia Technology Co., Ltd. Method and device for image processing
CN109711243A (en) * 2018-11-01 2019-05-03 长沙小钴科技有限公司 A kind of static three-dimensional human face in-vivo detection method based on deep learning
CN109684924A (en) * 2018-11-21 2019-04-26 深圳奥比中光科技有限公司 Human face in-vivo detection method and equipment
CN110909680A (en) * 2019-11-22 2020-03-24 咪咕动漫有限公司 Facial expression recognition method and device, electronic equipment and storage medium
CN111160216A (en) * 2019-12-25 2020-05-15 开放智能机器(上海)有限公司 Multi-feature multi-model living human face recognition method
CN111274947A (en) * 2020-01-19 2020-06-12 广州广电卓识智能科技有限公司 Multi-task multi-thread face recognition method, system and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUOQING WANG ET AL.: "Multi-modal Face Presentation Attack Detection via Spatial and Channel Attentions", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1584-1590 *
MARCUS DE ASSIS ANGELONI ET AL.: "Age Estimation From Facial Parts Using Compact Multi-Stream Convolutional Neural Networks", 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 3039-3045 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052142A (en) * 2021-04-26 2021-06-29 的卢技术有限公司 Silence in-vivo detection method based on multi-modal data
CN113505682A (en) * 2021-07-02 2021-10-15 杭州萤石软件有限公司 Living body detection method and device
CN113705400A (en) * 2021-08-18 2021-11-26 中山大学 Single-mode face living body detection method based on multi-mode face training
CN113705400B (en) * 2021-08-18 2023-08-15 中山大学 Single-mode face living body detection method based on multi-mode face training
CN114185430A (en) * 2021-11-12 2022-03-15 中原动力智能机器人有限公司 Human-computer interaction system and method and intelligent robot
CN115760128A (en) * 2022-10-21 2023-03-07 深圳市盛思达通讯技术有限公司 Safety payment method and system based on face recognition
CN116110111A (en) * 2023-03-23 2023-05-12 平安银行股份有限公司 Face recognition method, electronic equipment and storage medium
CN116110111B (en) * 2023-03-23 2023-09-08 平安银行股份有限公司 Face recognition method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112487922A (en) Multi-modal face liveness detection method and system
CN108520219B (en) Multi-scale rapid face detection method based on convolutional neural network feature fusion
CN110462633B (en) Face recognition method and device and electronic equipment
US10095927B2 (en) Quality metrics for biometric authentication
CN107945135B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN112487921B (en) Face image preprocessing method and system for living body detection
CN108664843B (en) Living object recognition method, living object recognition apparatus, and computer-readable storage medium
KR101901591B1 (en) Face recognition apparatus and control method for the same
CN105917353A (en) Feature extraction and matching and template update for biometric authentication
CN108629305A (en) A kind of face recognition method
CN104811684B (en) A kind of three-dimensional U.S. face method and device of image
KR20190097640A (en) Device and method for matching image
CN109190522B (en) Living body detection method based on infrared camera
CN107479801A (en) Displaying method of terminal, device and terminal based on user's expression
CN105574525A (en) Method and device for obtaining complex scene multi-mode biology characteristic image
CN111652082B (en) Face living body detection method and device
CN112818722A (en) Modular dynamically configurable living body face recognition system
CN110047059B (en) Image processing method and device, electronic equipment and readable storage medium
CN110532746B (en) Face checking method, device, server and readable storage medium
CN114894337B (en) Temperature measurement method and device for outdoor face recognition
CN111967319A (en) Infrared and visible light based in-vivo detection method, device, equipment and storage medium
CN108052813A (en) Unlocking method, device and the mobile terminal of terminal device
CN112434647A (en) Human face living body detection method
CN113657195A (en) Face image recognition method, face image recognition equipment, electronic device and storage medium
CN107862298B (en) Winking living body detection method based on infrared camera device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination