CN113496215A - Method and apparatus for living body face detection, and electronic device - Google Patents

Method and apparatus for living body face detection, and electronic device

Info

Publication number
CN113496215A
CN113496215A
Authority
CN
China
Prior art keywords
image
face
confidence score
feature
living body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110766966.6A
Other languages
Chinese (zh)
Inventor
徐佳文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202110766966.6A
Publication of CN113496215A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method, an apparatus and an electronic device for living body face detection. The method includes: first acquiring an input image collected by a monocular camera and obtaining a first image and a second image according to face coordinates in the input image; then obtaining a first confidence score of the first image through a first model and a second confidence score of the second image through a second model; and finally comparing a final confidence score, calculated from the first confidence score and the second confidence score, with a preset threshold value and determining whether the face in the input image is a living face according to the comparison result. Based on this method, the accuracy of living body detection under different false face intrusion scenes can be effectively improved.

Description

Method and apparatus for living body face detection, and electronic device
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method and an apparatus for living body face detection, and an electronic device.
Background
With the wide application of face recognition technology in various authorization systems, a failure to accurately distinguish a real face from a false face poses a significant threat to users' personal interests. The accuracy of face recognition under false face intrusion is therefore receiving increasing attention.
Living body detection technology can effectively help to identify false face intrusion. Existing living body detection mainly relies on a monocular camera: texture features are extracted from the face region of a collected image, and a deep learning model is trained to produce a living body detection result.
However, the above scheme suffers from low living body detection accuracy across different false face intrusion scenes.
Disclosure of Invention
The application provides a method, an apparatus and an electronic device for living body face detection, which are used to improve the accuracy of living body face detection under different false face intrusion scenes.
In a first aspect, the present application provides a method for living body face detection, the method including:
obtaining a first image and a second image according to face coordinates in an input image, wherein the first image is used for representing a face image, and the second image is used for representing the face image and a part of background image;
obtaining a first confidence score of the first image according to a first model, wherein the first model is used for representing whether an object in the image is a living body or not according to image texture features, and the first confidence score is used for representing the probability that the object in the first image is the living body;
obtaining a second confidence score of the second image according to a second model, wherein the second model is used for representing whether the object in the image is a living body or not according to the shape characteristics of the image, and the second confidence score is used for representing the probability that the object in the second image is the living body;
and comparing a final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living face according to a comparison result.
By the method, the accuracy and the universality of the living body detection of the face image in different false face intrusion scenes can be effectively improved.
In one possible design, obtaining the first image and the second image according to the face coordinates in the input image includes:
acquiring face coordinates in the input image according to a face detection operator, wherein the input image is a two-dimensional image acquired by a monocular camera;
extracting a face frame of a face image in the input image according to the face coordinates to obtain the first image;
and carrying out external expansion on the face frame on the input image according to a preset external expansion ratio to obtain the second image.
Two images generated by different characteristics are extracted aiming at the input image and are used as different input data of different false human face intrusion models, so that the accuracy of living body detection and the robustness of an algorithm are effectively improved.
In one possible design, the obtaining a first confidence score of the first image according to the first model includes:
inputting the first image into the first model;
performing Fourier transform operation on the first image to generate a spectrogram;
extracting a first feature of the first image according to the spectrogram, wherein the first feature is used for representing a texture feature of a human face in the first image;
according to the first feature, a first confidence score that an object in the first image is living is determined.
By extracting the first features, the texture features are extracted only for the face region, noise interference of background information is effectively eliminated, and a Fourier spectrogram is introduced to serve as auxiliary supervision, so that the accuracy of the first model for live body detection of 2D face attack is effectively improved.
In one possible design, the obtaining a second confidence score of the second image according to the second model includes:
inputting the second image into the second model;
adding noise to the second image to obtain a third image, wherein the third image is used for representing an image which has the same shape characteristic and different texture characteristics with the second image, and the shape characteristic is used for representing the edge characteristic of a paper image or a screen image in an image background;
extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information and different texture information;
calculating a contrast loss value of the second feature and the third feature, and adjusting model parameters in the convolutional neural network so that the texture feature ratio contained in the second feature extracted by the convolutional neural network is reduced;
determining a second confidence score that the object in the second image is living according to the second feature.
By taking the second feature as the detection basis on which the second model judges living or non-living, the second model focuses more on the shape features in the image during living body detection; that is, the second model takes the edge break of a paper attack or screen attack as a judgment index of living body detection, thereby improving the accuracy of living body detection against paper attacks or screen attacks.
In one possible design, the comparing a final confidence score, calculated from the first confidence score and the second confidence score, with a preset threshold value and determining whether the face in the input image is a living face according to the comparison result includes:
averaging the first confidence score and the second confidence score to obtain a final confidence score;
judging whether the final confidence score is larger than the preset threshold value or not;
if so, the face in the input image is a living face;
and if not, the face in the input image is a non-living face.
By the living body detection method for constructing the two living body detection models and fusing the multi-scale living body detection results of the two living body models, the accuracy of the living body detection of the false human face intrusion detection aiming at three aspects of 2D face attack, paper attack or screen attack is effectively improved, and the generalization of the applicable scene of the living body detection models and the robustness of the model algorithm are further enhanced by the method.
In a second aspect, the present application provides an apparatus for living body face detection, the apparatus comprising:
the image acquisition module is used for obtaining a first image and a second image according to face coordinates in an input image, wherein the first image is used for representing a face image, and the second image is used for representing the face image and a part of background image;
the first detection module is used for obtaining a first confidence score of the first image according to a first model, wherein the first model is used for representing whether an object in the image is a living body or not according to image texture features, and the first confidence score is used for representing the probability that the object in the first image is the living body;
the second detection module is used for obtaining a second confidence score of the second image according to a second model, wherein the second model is used for representing whether the object in the image is a living body or not according to the shape feature of the image, and the second confidence score is used for representing the probability that the object in the second image is the living body;
and the living body judgment module is used for comparing a final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value and determining whether the face in the input image is a living body face according to a comparison result.
In one possible design, the image obtaining module is specifically configured to obtain face coordinates in the input image according to a face detection operator, where the input image is a two-dimensional image acquired by a monocular camera; extracting a face frame of a face image in the input image according to the face coordinates to obtain the first image; and carrying out external expansion on the face frame on the input image according to a preset external expansion ratio to obtain the second image.
In one possible design, the first detection module is specifically configured to input the first image into the first model; performing Fourier transform operation on the first image to generate a spectrogram; extracting a first feature of the first image according to the spectrogram, wherein the first feature is used for representing a texture feature of a human face in the first image; according to the first feature, a first confidence score that an object in the first image is living is determined.
In one possible design, the second detection module is specifically configured to input the second image into the second model; adding noise to the second image to obtain a third image, wherein the third image is used for representing an image which has the same shape characteristic and different texture characteristics with the second image, and the shape characteristic is used for representing the edge characteristic of a paper image or a screen image in an image background; extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information and different texture information; calculating a contrast loss value of the second feature and the third feature, and adjusting model parameters in the convolutional neural network so that the texture feature ratio contained in the second feature extracted by the convolutional neural network is reduced; determining a second confidence score that the object in the second image is living according to the second feature.
In one possible design, the living body judgment module is specifically configured to average the first confidence score and the second confidence score to obtain a final confidence score; judging whether the final confidence score is larger than the preset threshold value or not; if so, the face in the input image is a living face; and if not, the face in the input image is a non-living face.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the above-described living body face detection method steps when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described living body face detection method steps.
For the possible technical effects of the second to fourth aspects, refer to the above description of the technical effects of the first aspect and its various possible designs; repeated description is omitted here.
Drawings
Fig. 1 is a flowchart of a living body face detection method provided by the present application;
fig. 2 is a schematic diagram of an input image containing a face image provided by the present application;
fig. 3 is a schematic diagram of a first image with a non-expanded face frame provided by the present application;
fig. 4 is a schematic diagram of a second image with an expanded face frame provided by the present application;
fig. 5 is a flowchart of a method of obtaining a second confidence score based on the second model provided by the present application;
fig. 6 is a schematic diagram of a living body face detection apparatus provided by the present application;
fig. 7 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
The embodiment of the application provides a method for detecting a living body face, which comprises the steps of firstly obtaining face coordinates in an input image according to a face detection operator to obtain a first image containing face information and a second image containing the face information and partial background information. Then, according to a first model for detecting 2D mask attack, obtaining a first confidence score of the first image; and obtaining a second confidence score of the second image according to a second model for detecting paper attack or screen attack. And finally, comparing the final confidence score obtained by averaging the first confidence score and the second confidence score with a preset threshold value to obtain a result of the living body detection.
According to the method provided by the embodiment of the application, false face intrusion in different scenes can be effectively identified, the current problem of low accuracy in living body face detection based on images acquired by a monocular camera is solved, and the accuracy of living body detection of face images in different scenes is effectively improved.
The method provided by the embodiment of the application is further described in detail with reference to the attached drawings.
Referring to fig. 1, an embodiment of the present application provides a method for living body face detection, the specific process of which is as follows:
step 101: acquiring an input image acquired by a monocular camera;
the input image may be an RGB image including a face image in natural light acquired by a monocular RGB (Red Green Blue ) camera, and the input image may also be a single frame RGB image including a face image in a video stream acquired by the monocular RGB camera. Here, the input image may be as shown in fig. 2, where the white area is an area of the face image, and the other areas are areas of the background image.
Here, an input image acquired in monocular mode presents several problems that affect living body detection: the face image in the input image is easily affected by lighting, causing inconsistent imaging quality and inaccurate living body detection; and the faces in input images appear in diverse scenes, for example at different angles or with different occlusion states, which lowers the generalization and accuracy of living body detection.
Therefore, to ensure the accuracy of living body detection on the face image in the input image under different scenes, the false face intrusion problem, i.e., the case where the face image in the input image is not a living body, is divided into three detection sub-problems: paper attack, screen attack and 2D mask attack. A paper attack counterfeits a real face using paper bearing a face image, a screen attack counterfeits a real face using an electronic device displaying a face image, and a 2D mask attack counterfeits a real face using paper cut to the contour of a face.
These sub-problems are grouped into two specific problems according to their characteristics: first, paper attacks and screen attacks share edge breaks (shape features) as their distinctive discriminative feature; second, a 2D mask attack is characterized by texture features as its distinctive discriminative feature.
According to the specific problems in these two aspects, the input image is subjected to the following processing in steps 102 to 105, which improves the scene generalization and recognition accuracy of living body detection for the input image and enhances the robustness of the living body detection algorithm.
Step 102: obtaining a first image and a second image according to the face coordinates in the input image;
in order to address, in a targeted way, the two characteristic commonalities formed by 2D mask attacks on the one hand and paper or screen attacks on the other, face coordinates of the input image are acquired according to a face detection operator, the region of the face image is framed, and this region is extracted as the first image. Then, according to the coordinate position of the first image in the input image, the face frame is expanded and enlarged by the expansion ratio, and the processed area, which contains the first image and part of the background of the input image, is extracted as the second image. Specifically, the face detection operator may detect the face region of the face image in the input image and locate the coordinates of the face image in the input image; the face region may be a rectangular region containing only the face image; and the expansion ratio may be any value greater than 0. The enlargement expands the face frame by the expansion ratio in the positive and negative directions of both the x and y dimensions of the input image coordinates, based on the located face coordinates; by enlarging the area of the face frame, the image within the expanded frame (i.e., the second image) contains the face image as well as a partial background image of the input image.
Thereby, a first image containing only face information is obtained, which may be as shown in fig. 3, together with a second image that further includes a background image (the background image mainly carries the shape features, i.e. the edge information, of the paper image or screen image), which may be as shown in fig. 4, where the white area is the area of the face image and the other areas are areas of the background image.
Two images generated by different characteristics are extracted aiming at the input image and are used as different input data of different false human face intrusion models, so that the accuracy of living body detection and the robustness of an algorithm are effectively improved.
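As an illustration of this step, the following minimal Python sketch crops the two images from an input frame. The function name, the (x, y, w, h) box format assumed for the face detection operator's output, and the default expansion ratio are illustrative assumptions; the patent only requires the ratio to be greater than 0.

```python
import numpy as np

def crop_face_images(input_image, face_box, expand_ratio=0.4):
    """Crop the first image (face only) and the second image (face plus
    part of the background) from a monocular RGB frame.

    face_box: (x, y, w, h) face frame from a face detection operator
    (assumed format). expand_ratio: preset external expansion ratio,
    applied in the positive and negative x/y directions.
    """
    img_h, img_w = input_image.shape[:2]
    x, y, w, h = face_box

    # First image: the face frame located by the face coordinates.
    first_image = input_image[y:y + h, x:x + w]

    # Second image: expand the face frame by expand_ratio on every side,
    # clipped to the image border, so part of the background is included.
    dx, dy = int(w * expand_ratio), int(h * expand_ratio)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    second_image = input_image[y0:y1, x0:x1]

    return first_image, second_image

# Usage on a dummy 640x480 frame with an assumed face box.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
face_img, face_bg_img = crop_face_images(frame, (200, 100, 120, 150))
```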
Step 103: obtaining a first confidence score of the first image according to a first model;
in the embodiment of the application, in order to improve the living body detection accuracy for 2D mask attacks, a first model is constructed, where the first model is used for representing whether an object in an image is a living body or not according to image texture features.
Specifically, the first model is a model trained according to image texture features, wherein the first model inputs a first image only including a face image as a model, extracts the texture features of the face in the first image as first features, and determines a first confidence score that an object in the first image is a living body according to the first features.
Wherein the first confidence score is used to characterize the probability that the object in the first image is a living body, and here also represents the living body detection result of the first image in the first model. The first confidence score may be a value between 0 and 1: if it approaches 0, the living body detection result of the first model on the first image approaches non-living, and the object in the first image is closer to a non-living object; if it approaches 1, the result approaches living, and the object in the first image is closer to a living object.
For example, the first model may be a deep learning model with Fourier-spectrogram-based auxiliary supervision. The first image is input into the first model, a Fourier transform operation is performed on the first image to generate its spectrogram, a first feature of the first image is extracted according to the spectrogram, and a first confidence score that the object in the first image is a living body is determined.
A Fourier spectrogram can, to a certain extent, reflect the difference between a living body and a non-living body in the frequency domain. In a 2D mask attack scene, if a living face image and a non-living face image produced by a 2D mask attack are both converted into Fourier spectrograms, yielding a first (living) frequency domain map and a second (non-living) frequency domain map, it can be found that the high frequency information of the first frequency domain map diverges outward from the image center, whereas the high frequency information of the second frequency domain map is relatively limited in distribution and extends only along the horizontal and vertical directions. Applying the Fourier spectrogram therefore reflects the living/non-living difference of the object in an image more effectively.
Because a non-living face image generated by a 2D mask is a secondary imaging of the face, it differs considerably from a living face image both in the frequency domain and in the color distribution of the RGB image; the Fourier spectrogram is therefore introduced into the first model as auxiliary supervision to improve the living body detection accuracy of the first model.
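The spectrogram generation described above can be sketched as follows in plain NumPy, under the assumption that the "spectrogram" is the shifted log-amplitude Fourier spectrum of the grayscale face crop; the function name and grayscale conversion are illustrative choices.

```python
import numpy as np

def fourier_spectrogram(face_image):
    """Shifted log-amplitude Fourier spectrum of a face crop, used as
    auxiliary supervision for the first (texture) model.

    For a living face the high-frequency energy tends to diverge from the
    image center; for a 2D mask recapture it concentrates along the
    horizontal and vertical directions, as discussed above.
    """
    # Average the channels if the crop is RGB (grayscale assumption).
    gray = face_image.mean(axis=2) if face_image.ndim == 3 else face_image
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # center the DC component
    return np.log1p(np.abs(spectrum))              # compress dynamic range
```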
By extracting the first features, the texture features are extracted only for the face region, noise interference of background information is effectively eliminated, and a Fourier spectrogram is introduced to serve as auxiliary supervision, so that the accuracy of the first model for live body detection of 2D face attack is effectively improved.
Step 104: obtaining a second confidence score of the second image according to a second model;
in order to improve the accuracy of living body detection against paper attacks or screen attacks, a second model is constructed, where the second model may be a deep learning model. The second model takes the shape feature extracted from the second image as the second feature, where the shape feature can be expressed as the edge feature of the paper image or screen image contained in the background of the second image; model training is performed on the second feature, which contains only shape information, to obtain a second confidence score of the second image.
Since a paper attack or screen attack has the distinctive characteristic of an edge break, improving the accuracy of the second model's living body detection on the second image requires: strengthening the second model's attention to the edge features of the paper image or screen image contained in the second image; and weakening the interference of the face image and the irrelevant background image in the second image on the second model.
Therefore, in the embodiment of the present application, a third image is generated by adding a certain amount of noise to the second image; the second feature of the second image and the third feature of the third image are extracted with a convolutional neural network; the contrast loss value of the second feature and the third feature is calculated; and the model parameters of the convolutional neural network are adjusted so that the proportion of texture information contained in the extracted second feature is reduced, thereby suppressing the texture features of the second image.
Here, to make living body detection more accurate, an ideal optimization target for restricting the texture features in the second feature is proposed: all texture information in the second feature is removed, so that the shape information in the second feature becomes the only valid information. In most cases this ideal cannot be fully reached, so the calculated contrast loss value is instead required to fall within a preset effective range: if it does, the optimization target is considered reached and the shape information in the second feature is treated as the only valid information; if it does not, the target is considered not reached, the shape information cannot yet be treated as the only valid information, and the operation of suppressing the texture features in the second feature is repeated. The preset effective range lies between 0 and 2.
Because noise is introduced into the third image, its texture information is randomly changed relative to the second image. Constraining the similarity between the second feature and the third feature through the contrast loss value therefore suppresses the texture information, rendering it uninformative and leaving the shape information as the only effective information.
Specifically, the second image is acquired in the second model, and a third image containing a certain amount of noise is generated by adding noise to the second image. A convolutional neural network extracts the second feature, containing the information of the paper image or screen image in the background of the second image, and the third feature, containing the corresponding information in the noise-added third image. The contrast loss value of the second feature and the third feature is calculated, and the model parameters of the second model are adjusted by reducing the contrast loss value, so that the proportion of texture information in the second feature decreases: texture information is suppressed, and only shape information is attended to. Since the training process of the second model is based on the shape features of the image, a second confidence score of the second image is obtained in the second model according to the adjusted second feature.
Wherein the second confidence score is used to characterize the probability that the object in the second image is a living body, and here also represents the living body detection result of the second image in the second model. The second confidence score may be a value between 0 and 1: if it approaches 0, the living body detection result of the second model on the second image approaches non-living, and the object in the second image is closer to a non-living object; if it approaches 1, the result approaches living, and the object in the second image is closer to a living object.
For example, referring to fig. 5, an embodiment of the present application provides a method for obtaining a second confidence score of a second image according to a second model, where the specific flow is as follows:
step 501: adding noise to the second image to obtain a third image;
in the embodiment of the application, since the second model is the living body detection model for paper attacks or screen attacks, the salient feature distinguishing a paper attack or screen attack from a real living body is determined to be the edge break of the paper image or screen image in the image background. The screen image may be the screen image of a mobile device.
To improve the accuracy of the second model's living body detection, the recognition of edge breaks is improved in two ways: the second model's attention to the edge features of the second image is strengthened, and the interference from the face image and irrelevant background information on the second model is weakened. In particular, the shape features in the second image are made the only valid information, while the texture features in the second image no longer provide information.
In the embodiment of the application, a method of adding noise is adopted to randomly change the texture information in the second image, and the second image after the texture information is randomly changed is used as the third image.
The third image, obtained by adding noise to the second image, and the second image thus have the following characteristics: they share the same shape features but have different texture features.
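A minimal sketch of this noise-adding step is given below. The patent does not specify the noise model; additive Gaussian noise with an assumed standard deviation is used here purely for illustration, since it perturbs texture while leaving edges (shape features) largely intact.

```python
import numpy as np

def make_third_image(second_image, sigma=12.0, seed=None):
    """Produce the third image: same shape features as the second image
    (paper or screen edges survive), randomly changed texture.

    sigma is an assumed noise strength; the patent only says "adding noise".
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=second_image.shape)
    noisy = second_image.astype(np.float32) + noise.astype(np.float32)
    return np.clip(noisy, 0, 255).astype(second_image.dtype)
```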
Step 502: extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network;
and inputting the second image and the third image into a convolution neural network, and performing feature extraction on the second image and the third image through convolution operation to obtain a second feature extracted according to the second image and a third feature extracted according to the third image.
According to the characteristics of the second image and the third image, the second characteristic and the third characteristic have the following characteristics: the second feature and the third feature contain the same shape information and different texture information.
Step 503: calculating a contrast loss value of the second feature and the third feature, and adjusting model parameters in the convolutional neural network to reduce the texture information ratio contained in the second feature extracted by the convolutional neural network;
in order to ensure that the only effective information in the second image is the shape information, the texture information in the second image is suppressed: a contrast loss value between the second feature and the third feature is calculated with a loss function, and the model parameters of the convolutional neural network mentioned in step 502 are then adjusted according to the contrast loss value, so that the proportion of texture information contained in the second feature extracted by the network is reduced.
Here, the comparison loss value may be limited to a preset effective range: if the contrast loss value is within the preset effective range, the adjusted second feature can be considered to achieve the optimization effect, that is, the second feature does not contain texture information and only contains shape information; if the contrast loss value is not within the preset effective range, it may be determined that the adjusted second characteristic does not achieve the optimal optimization effect, and the adjustment operation needs to be repeated until the calculated contrast loss value is within the preset effective range. Wherein the value range of the preset effective range is between 0 and 2.
Specifically, the method of calculating the contrast loss value according to the loss function is as shown in equation 1 below:
$$\mathrm{LOSS} = 1 - \mathrm{sim}\left(F_{D}, F_{D'}\right)$$

wherein D denotes the second image, D' denotes the third image, F_D denotes the second feature, F_{D'} denotes the third feature, and sim(F_D, F_{D'}) denotes the similarity between the second feature and the third feature. LOSS denotes the contrast loss value between the second feature and the third feature, a number between 0 and 2: the closer the contrast loss value approaches 0, the higher the similarity between the second feature and the third feature; the lower the similarity, the closer the contrast loss value approaches 2.
The mathematical calculation of the similarity in Equation 1 above is specifically shown in Equation 2 below:

$$\mathrm{sim}\left(F_{D}, F_{D'}\right) = \frac{F_{D} \cdot F_{D'}}{\left\lVert F_{D} \right\rVert \left\lVert F_{D'} \right\rVert}$$

wherein ||F_D|| denotes the length of the vector of the second feature, ||F_{D'}|| denotes the length of the vector of the third feature, and sim(F_D, F_{D'}) denotes the similarity between the second feature and the third feature, a number between -1 and 1: the closer the similarity approaches 1, the higher the similarity between the second feature and the third feature; the lower the similarity, the closer it approaches -1.
The second feature is adjusted by calculating the contrast loss value between the second feature and the third feature and adjusting the parameters of the convolutional neural network to reduce this loss, so that the proportion of texture information contained in the second feature extracted by the network decreases; the adjusted second feature retains only shape information and contains no texture information.
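Equations 1 and 2 together amount to one minus the cosine similarity of the two feature vectors. The sketch below implements that computation in NumPy; the function name and the epsilon guard against zero-length vectors are illustrative additions. Minimizing this value during training pulls the two features together, which suppresses texture information, since texture is the only thing that differs between the second and third images.

```python
import numpy as np

def contrast_loss(feat_second, feat_third, eps=1e-8):
    """Equations 1 and 2: LOSS = 1 - cosine similarity, a value in [0, 2].

    feat_second / feat_third: 1-D feature vectors extracted by the shared
    convolutional network from the second and third images. LOSS -> 0 as
    the features agree (texture suppressed), -> 2 as they oppose.
    """
    sim = np.dot(feat_second, feat_third) / (
        np.linalg.norm(feat_second) * np.linalg.norm(feat_third) + eps)
    return 1.0 - sim
```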
This method of limiting the texture information in the second feature remedies the inaccuracy of living body detection against paper attacks or screen attacks that arises because current living body detection technology focuses only on texture information and ignores shape information, and thereby effectively improves the accuracy of living body detection against paper attacks or screen attacks.
Step 504: determining a second confidence score that the object in the second image is living according to the second feature.
For example, in order to make the living body detection result of the second model for detecting whether the human face in the second image is a living body or a non-living body more accurate in the case of paper attack or screen attack, the second model determines the second confidence score that the object in the second image is a living body according to the second feature. The second model is a model trained according to the shape features of the images, and the second features only include the shape features of the second images, so that the second features are used as detection bases of the second model to judge whether the human faces in the second images are living bodies or non-living bodies, and the living body detection results of the second model are obtained, namely the second confidence scores of the second images.
The second model may be a deep learning model, and may also be a deep neural network model. The second confidence score is used to characterize the probability that the object in the second image is a living body, and here also represents the living body detection result of the second image in the second model. The second confidence score may be a value between 0 and 1: if it approaches 0, the living body detection result of the second model on the second image approaches non-living, and the object in the second image is closer to a non-living object; if it approaches 1, the result approaches living, and the object in the second image is closer to a living object.
Since the second feature attends only to shape information, using it as the detection basis on which the second model judges living or non-living makes the second model focus more on the shape features in the image during living body detection; that is, the second model takes the edge break of a paper attack or screen attack as a judgment index of living body detection, thereby improving the accuracy of living body detection against paper attacks or screen attacks.
Step 105: comparing a final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living face according to a comparison result;
in the embodiment of the application, to improve the accuracy of living body detection, the living body face detection model is composed of the first model and the second model. The first confidence score of the first image is calculated by the first model, the second confidence score of the second image is calculated by the second model, the final confidence score is obtained as the average of the two, and the living body detection result is then obtained by comparing the final confidence score with the preset threshold: if the final confidence score is greater than the preset threshold, the detection result is living and the object in the input image is a living object; if not, the detection result is non-living and the object in the input image is a non-living object.
Specifically, the final confidence score is calculated from the first confidence score and the second confidence score as shown in Equation 3 below:

$$S = \frac{1}{n} \sum_{i=1}^{n} s_{i}$$

where n denotes the number of models in the living body face detection model (here the first model and the second model are counted, so n = 2), and s_i denotes the confidence score of the i-th model: for example, s_1 is the score of the first model and s_2 is the score of the second model. S denotes the final confidence score, which may be a number between 0 and 1: the closer the final confidence score approaches 0, the closer the living body detection result on the input image is to non-living, and the closer the face in the input image is to a non-living face; the closer it approaches 1, the closer the result is to living, and the closer the face in the input image is to a living face.
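The fusion and thresholding of step 105 reduce to the small sketch below; the threshold value of 0.5 is an assumed default, since the patent leaves the preset threshold configurable.

```python
def is_living_face(first_score, second_score, threshold=0.5):
    """Equation 3 with n = 2: average the two model scores and compare the
    final confidence score with the preset threshold.
    Returns True for a living face, False for a non-living face.
    """
    final_score = (first_score + second_score) / 2.0
    return final_score > threshold
```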
By the living body detection method for constructing the two living body detection models and fusing the multi-scale living body detection results of the two living body models, the accuracy of the living body detection of the false human face intrusion detection aiming at three aspects of 2D face attack, paper attack or screen attack is effectively improved, and the generalization of the applicable scene of the living body detection models and the robustness of the model algorithm are further enhanced by the method.
Based on the same inventive concept, the application also provides a living body face detection apparatus, which realizes living body detection under different false face intrusion scenes, solves the problem of low accuracy of living body face detection based on images acquired by a monocular camera, and effectively improves the accuracy of living body detection of face images under different false face intrusion scenes. Referring to fig. 6, the apparatus comprises:
the image obtaining module 601 is configured to obtain a first image and a second image according to face coordinates in an input image, where the first image is used to represent a face image, and the second image is used to represent the face image and a part of a background image.
The first detection module 602 is configured to obtain a first confidence score of the first image according to a first model, where the first model is used to characterize whether an object in the image is a living body according to an image texture feature, and the first confidence score is used to characterize a probability that the object in the first image is a living body.
The second detecting module 603 is configured to obtain a second confidence score of the second image according to a second model, where the second model is used to characterize whether the object in the image is a living body according to the image shape feature, and the second confidence score is used to characterize a probability that the object in the second image is a living body.
And the living body judgment module 604 is configured to compare a final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold, and determine whether a face in the input image is a living body face according to a comparison result.
In a possible design, the image obtaining module 601 is specifically configured to obtain face coordinates in the input image according to a face detection operator, where the input image is a two-dimensional image acquired by a monocular camera; extracting a face frame of a face image in the input image according to the face coordinates to obtain the first image; and carrying out external expansion on the face frame on the input image according to a preset external expansion ratio to obtain the second image.
In one possible design, the first detection module 602 is specifically configured to input the first image into the first model; performing Fourier transform operation on the first image to generate a spectrogram; extracting a first feature of the first image according to the spectrogram, wherein the first feature is used for representing a texture feature of a human face in the first image; according to the first feature, a first confidence score that an object in the first image is living is determined.
In one possible design, the second detection module 603 is specifically configured to input the second image into the second model; adding noise to the second image to obtain a third image, wherein the third image is used for representing an image which has the same shape characteristic and different texture characteristics with the second image, and the shape characteristic is used for representing the edge characteristic of a paper image or a screen image in an image background; extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information and different texture information; calculating a contrast loss value of the second feature and the third feature, and adjusting model parameters in the convolutional neural network so that the texture feature ratio contained in the second feature extracted by the convolutional neural network is reduced; determining a second confidence score that the object in the second image is living according to the second feature.
In one possible design, the living body judgment module 604 is specifically configured to average the first confidence score and the second confidence score to obtain a final confidence score; judging whether the final confidence score is larger than the preset threshold value or not; if so, the face in the input image is a living face; and if not, the face in the input image is a non-living face.
Based on the device, the living body detection of the acquired image based on the monocular camera in different false face intrusion scenes is effectively realized, and the accuracy and the universality of the living body detection of the face image in different false face intrusion scenes are effectively improved.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device that can implement the functions of the above living body face detection apparatus. Referring to fig. 7, the electronic device includes:
at least one processor 701 and a memory 702 connected to the at least one processor 701. In this embodiment, the specific connection medium between the processor 701 and the memory 702 is not limited; fig. 7 takes connection through a bus 700 as an example. The bus 700 is shown as a thick line in fig. 7, and the connections between other components are merely illustrative and not limiting. The bus 700 may be divided into an address bus, a data bus, a control bus, and the like; for ease of illustration it is drawn as a single thick line in fig. 7, which does not mean there is only one bus or only one type of bus. Alternatively, the processor 701 may also be referred to as a controller, and the name is not limited.
In the embodiment of the present application, the memory 702 stores instructions executable by the at least one processor 701, and the at least one processor 701 may execute the method for detecting a living human face as discussed above by executing the instructions stored in the memory 702. The processor 701 may implement the functions of the various modules in the apparatus shown in fig. 6.
The processor 701 is a control center of the apparatus, and may connect various parts of the entire control device by using various interfaces and lines, and perform various functions and process data of the apparatus by operating or executing instructions stored in the memory 702 and calling data stored in the memory 702, thereby performing overall monitoring of the apparatus.
In one possible design, processor 701 may include one or more processing units, and processor 701 may integrate an application processor, which handles primarily the operating system, user interfaces, and applications, among others, and a modem processor, which handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 701. In some embodiments, processor 701 and memory 702 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 701 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method for detecting a living human face disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 702, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 702 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disc, and so on. The memory 702 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 702 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 701, the code corresponding to the method for detecting a living human face described in the foregoing embodiment may be solidified into a chip, so that the chip can execute the steps of the method for detecting a living human face of the embodiment shown in fig. 1 when running. How to program the processor 701 is well known to those skilled in the art and will not be described herein.
Based on the same inventive concept, the present application further provides a storage medium storing computer instructions, which when executed on a computer, cause the computer to perform the method for detecting a living human face discussed above.
In some possible embodiments, the aspects of the method for living body face detection provided by the present application may also be implemented in the form of a program product comprising program code for causing a control apparatus to perform the steps in the method for living body face detection according to various exemplary embodiments of the present application described above in this specification when the program product is run on a device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for living body face detection, the method comprising:
obtaining a first image and a second image according to face coordinates in an input image, wherein the first image is used for representing a face image, and the second image is used for representing the face image and a part of background image;
obtaining a first confidence score of the first image according to a first model, wherein the first model is used for representing whether an object in the image is a living body or not according to image texture features, and the first confidence score is used for representing the probability that the object in the first image is the living body;
obtaining a second confidence score of the second image according to a second model, wherein the second model is used for representing whether the object in the image is a living body or not according to the shape characteristics of the image, and the second confidence score is used for representing the probability that the object in the second image is the living body;
and comparing a final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living face according to a comparison result.
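As a minimal sketch of the decision flow in claim 1 (in Python; the model objects, the score range, and the 0.5 threshold are illustrative assumptions, not part of the claim):

    def is_live_face(first_image, second_image, model1, model2, threshold=0.5):
        # model1 scores texture cues on the face crop; model2 scores
        # shape cues (e.g. paper or screen edges) on the expanded crop.
        s1 = model1(first_image)    # first confidence score in [0, 1]
        s2 = model2(second_image)   # second confidence score in [0, 1]
        final_score = (s1 + s2) / 2.0   # claim 5 averages the two scores
        return final_score > threshold  # True -> living face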
2. The method of claim 1, wherein obtaining the first image and the second image according to the face coordinates in the input image comprises:
acquiring the face coordinates in the input image according to a face detection operator, wherein the input image is a two-dimensional image captured by a monocular camera;
cropping the face frame of the face image from the input image according to the face coordinates to obtain the first image;
and expanding the face frame outward on the input image according to a preset expansion ratio to obtain the second image.
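A minimal sketch of the cropping and outward expansion in claim 2, assuming a NumPy image array and integer face coordinates from a detector (the function name and the 0.4 expansion ratio are hypothetical):

    import numpy as np

    def crop_face_and_context(image, box, expand_ratio=0.4):
        # box = (x1, y1, x2, y2): face coordinates from a face detection operator.
        x1, y1, x2, y2 = box
        first_image = image[y1:y2, x1:x2]  # face-only crop
        h, w = y2 - y1, x2 - x1
        dy, dx = int(h * expand_ratio), int(w * expand_ratio)
        H, W = image.shape[:2]
        # Expand the face frame outward, clamped to the image borders,
        # so the second image keeps part of the background.
        second_image = image[max(0, y1 - dy):min(H, y2 + dy),
                             max(0, x1 - dx):min(W, x2 + dx)]
        return first_image, second_image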
3. The method of claim 1, wherein obtaining the first confidence score of the first image according to the first model comprises:
inputting the first image into the first model;
performing a Fourier transform on the first image to generate a frequency spectrum;
extracting a first feature of the first image according to the frequency spectrum, wherein the first feature represents a texture feature of the human face in the first image;
and determining, according to the first feature, the first confidence score that the object in the first image is a living body.
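One plausible form of the Fourier-transform step in claim 3, sketched with NumPy; the log-magnitude encoding of the spectrum is an assumption, since the claim does not fix the representation:

    import numpy as np

    def spectrum_of(gray_face):
        # 2-D FFT of the grayscale face crop; shift the zero-frequency
        # component to the center and keep the log-magnitude, a common
        # texture descriptor for separating real skin from reproductions.
        freq = np.fft.fftshift(np.fft.fft2(gray_face))
        return np.log1p(np.abs(freq))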
4. The method of claim 1, wherein obtaining the second confidence score of the second image according to the second model comprises:
inputting the second image into the second model;
adding noise to the second image to obtain a third image, wherein the third image has the same shape features as the second image but different texture features, and the shape features represent the edge features of a paper image or a screen image in the image background;
extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information but different texture information;
calculating a contrastive loss between the second feature and the third feature, and adjusting the parameters of the convolutional neural network so that the proportion of texture features contained in the second feature extracted by the network is reduced;
and determining, according to the second feature, the second confidence score that the object in the second image is a living body.
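One way the contrastive loss in claim 4 could be realized, sketched in PyTorch: because the second image and its noise-perturbed copy share shape but not texture, pulling their features together penalizes texture-sensitive responses (the cosine-distance formulation is an assumption; the claim does not fix the loss form):

    import torch
    import torch.nn.functional as F

    def texture_suppression_loss(feat2, feat3):
        # feat2: features of the second image; feat3: features of the
        # noisy third image. Minimizing their cosine distance drives the
        # extractor toward the shared shape information.
        f2 = F.normalize(feat2, dim=1)
        f3 = F.normalize(feat3, dim=1)
        return (1.0 - (f2 * f3).sum(dim=1)).mean()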
5. The method of claim 1, wherein comparing the final confidence score, calculated from the first confidence score and the second confidence score, with the preset threshold and determining whether the face in the input image is a living face according to the comparison result comprises:
averaging the first confidence score and the second confidence score to obtain the final confidence score;
determining whether the final confidence score is greater than the preset threshold;
if so, determining that the face in the input image is a living face;
and if not, determining that the face in the input image is a non-living face.
6. An apparatus for living body face detection, the apparatus comprising:
an image acquisition module, configured to obtain a first image and a second image according to face coordinates in an input image, wherein the first image represents a face image, and the second image represents the face image together with part of the background image;
a first detection module, configured to obtain a first confidence score of the first image according to a first model, wherein the first model determines, according to image texture features, whether an object in an image is a living body, and the first confidence score represents the probability that the object in the first image is a living body;
a second detection module, configured to obtain a second confidence score of the second image according to a second model, wherein the second model determines, according to image shape features, whether an object in an image is a living body, and the second confidence score represents the probability that the object in the second image is a living body;
and a living body judgment module, configured to compare a final confidence score, calculated from the first confidence score and the second confidence score, with a preset threshold and determine whether the face in the input image is a living face according to the comparison result.
7. The apparatus of claim 6, wherein the second detection module is specifically configured to: input the second image into the second model; add noise to the second image to obtain a third image, wherein the third image has the same shape features as the second image but different texture features, and the shape features represent the edge features of a paper image or a screen image in the image background; extract a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information but different texture information; calculate a contrastive loss between the second feature and the third feature, and adjust the parameters of the convolutional neural network so that the proportion of texture features contained in the second feature extracted by the network is reduced; and determine, according to the second feature, the second confidence score that the object in the second image is a living body.
8. The apparatus of claim 6, wherein the living body judgment module is specifically configured to: average the first confidence score and the second confidence score to obtain the final confidence score; determine whether the final confidence score is greater than the preset threshold; if so, determine that the face in the input image is a living face; and if not, determine that the face in the input image is a non-living face.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-5 when executing the computer program stored in the memory.
10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-5.
CN202110766966.6A 2021-07-07 2021-07-07 Method and device for detecting human face of living body and electronic equipment Pending CN113496215A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110766966.6A CN113496215A (en) 2021-07-07 2021-07-07 Method and device for detecting human face of living body and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110766966.6A CN113496215A (en) 2021-07-07 2021-07-07 Method and device for detecting human face of living body and electronic equipment

Publications (1)

Publication Number Publication Date
CN113496215A (en) 2021-10-12

Family

ID=77995836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110766966.6A Pending CN113496215A (en) 2021-07-07 2021-07-07 Method and device for detecting human face of living body and electronic equipment

Country Status (1)

Country Link
CN (1) CN113496215A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627534A (en) * 2022-03-15 2022-06-14 平安科技(深圳)有限公司 Living body discrimination method, electronic device, and storage medium
CN115601818A (en) * 2022-11-29 2023-01-13 海豚乐智科技(成都)有限责任公司(Cn) Lightweight visible light living body detection method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818313A (en) * 2017-11-20 2018-03-20 腾讯科技(深圳)有限公司 Vivo identification method, device, storage medium and computer equipment
CN109522883A (en) * 2018-12-28 2019-03-26 广州海昇计算机科技有限公司 A kind of method for detecting human face, system, device and storage medium
CN110569808A (en) * 2019-09-11 2019-12-13 腾讯科技(深圳)有限公司 Living body detection method and device and computer equipment
CN112613470A (en) * 2020-12-30 2021-04-06 山东山大鸥玛软件股份有限公司 Face silence living body detection method, device, terminal and storage medium

Similar Documents

Publication Publication Date Title
CN110852160B (en) Image-based biometric identification system and computer-implemented method
US20220092882A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
JP6719457B2 (en) Method and system for extracting main subject of image
US5715325A (en) Apparatus and method for detecting a face in a video image
CN108875534B (en) Face recognition method, device, system and computer storage medium
CN111640089B (en) Defect detection method and device based on feature map center point
CN110851835A (en) Image model detection method and device, electronic equipment and storage medium
CN111199230B (en) Method, device, electronic equipment and computer readable storage medium for target detection
CN108805016B (en) Head and shoulder area detection method and device
CN110826418B (en) Facial feature extraction method and device
CN113496215A (en) Method and device for detecting human face of living body and electronic equipment
US20240013572A1 (en) Method for face detection, terminal device and non-transitory computer-readable storage medium
US10922535B2 (en) Method and device for identifying wrist, method for identifying gesture, electronic equipment and computer-readable storage medium
CN111681256A (en) Image edge detection method and device, computer equipment and readable storage medium
CN110502977B (en) Building change classification detection method, system, device and storage medium
CN109255802B (en) Pedestrian tracking method, device, computer equipment and storage medium
CN111079816A (en) Image auditing method and device and server
CN107545581B (en) Target tracking method and target tracking device
CN111639653A (en) False detection image determining method, device, equipment and medium
CN113011385A (en) Face silence living body detection method and device, computer equipment and storage medium
CN111444817B (en) Character image recognition method and device, electronic equipment and storage medium
CN111027534A (en) Compact double-license-plate detection method and device
CN109858464B (en) Bottom database data processing method, face recognition device and electronic equipment
CN109871779B (en) Palm print identification method and electronic equipment
Guo et al. Iris extraction based on intensity gradient and texture difference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination