CN113496215B - Method and device for detecting living human face and electronic equipment - Google Patents


Publication number
CN113496215B
Authority
CN
China
Prior art keywords: image, face, confidence score, living body, feature
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110766966.6A
Other languages: Chinese (zh)
Other versions: CN113496215A
Inventor: 徐佳文
Current Assignee: Zhejiang Dahua Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Zhejiang Dahua Technology Co Ltd
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202110766966.6A
Publication of CN113496215A
Application granted
Publication of CN113496215B


Abstract

The application discloses a method and a device for detecting a living body face, and an electronic device. The method comprises: first acquiring an input image collected by a monocular camera, and obtaining a first image and a second image according to face coordinates in the input image; then obtaining a first confidence score of the first image through a first model, and obtaining a second confidence score of the second image through a second model; and finally comparing a final confidence score calculated from the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living body face according to the comparison result. Based on the method, the accuracy of living body detection under different false face intrusion scenes can be effectively improved.

Description

Method and device for detecting living human face and electronic equipment
Technical Field
The present application relates to the field of computer vision, and in particular, to a method and apparatus for detecting a living face, and an electronic device.
Background
With the wide application of face recognition technology in various permission systems, failure to accurately distinguish a real face from a forged false face poses a great threat to the personal interests of users. The accuracy of face recognition under false face intrusion scenes is therefore becoming increasingly important.
False face intrusion can be effectively identified through living body detection technology. The existing living body detection technology mainly relies on a monocular camera to collect images, extracts texture features from the face region, and trains a deep learning model to generate the living body detection result.
However, the above scheme suffers from low living body detection accuracy under different false face intrusion scenes.
Disclosure of Invention
The application provides a method and a device for detecting a living body face, and an electronic device, which are used for improving the accuracy of living body detection under different false face intrusion scenes.
In a first aspect, the present application provides a method for detecting a living face, the method comprising:
obtaining a first image and a second image according to face coordinates in an input image, wherein the first image is used for representing the face image, and the second image is used for representing the face image and part of the background image;
obtaining a first confidence score of the first image according to a first model, wherein the first model is used for representing whether an object in the image is a living body according to image texture features, and the first confidence score is used for representing the probability that the object in the first image is a living body;
obtaining a second confidence score of the second image according to a second model, wherein the second model is used for representing whether the object in the image is a living body according to image shape features, and the second confidence score is used for representing the probability that the object in the second image is a living body;
and comparing a final confidence score calculated from the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living face according to the comparison result.
By this method, the accuracy and universality of living body detection of face images under different false face intrusion scenes can be effectively improved.
In one possible design, the obtaining the first image and the second image according to the face coordinates in the input image includes:
acquiring face coordinates in the input image according to a face detection operator, wherein the input image is a two-dimensional image acquired by a monocular camera;
extracting a face frame of a face image in the input image according to the face coordinates to obtain the first image;
and expanding the face frame on the input image according to a preset expansion ratio to obtain the second image.
Two images generated by extracting different features from the input image are used as different input data of different false face intrusion models, so that the accuracy of living body detection and the robustness of an algorithm are effectively improved.
In one possible design, the obtaining a first confidence score for the first image according to a first model includes:
inputting the first image into the first model;
performing a Fourier transform operation on the first image to generate a spectrogram;
extracting a first feature of the first image according to the spectrogram, wherein the first feature is used for representing texture features of the face in the first image;
and determining, according to the first feature, a first confidence score that the object in the first image is a living body.
By extracting the first feature, texture features are extracted only from the face region, which effectively eliminates the noise interference of background information; and by introducing the Fourier spectrogram as auxiliary supervision, the accuracy of living body detection of the first model against 2D mask attacks is effectively improved.
In one possible design, the obtaining a second confidence score for the second image according to a second model includes:
inputting the second image into the second model;
adding noise to the second image to obtain a third image, wherein the third image is used for representing an image having the same shape features as and different texture features from the second image, and the shape features are used for representing edge features of a paper image or a screen image in the image background;
extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information and different texture information;
calculating a contrast loss value of the second feature and the third feature, and adjusting model parameters in the convolutional neural network so that the proportion of texture features contained in the second feature extracted by the convolutional neural network is reduced;
and determining, according to the second feature, a second confidence score that the object in the second image is a living body.
Using the second model as a detection basis for judging living body or non-living body makes the second model pay more attention to the shape features in the image during living body detection; that is, the second model takes the edge breakage of a paper attack or a screen attack as a judgment index of living body detection, which improves the accuracy of living body detection against paper attacks or screen attacks.
In one possible design, the comparing the final confidence score calculated from the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living face according to the comparison result, includes:
averaging the first confidence score and the second confidence score to obtain the final confidence score;
judging whether the final confidence score is larger than the preset threshold value;
if yes, the face in the input image is a living face;
if not, the face in the input image is a non-living face.
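The fusion and comparison above amount to a simple average-and-threshold rule; a minimal sketch (the 0.5 threshold is an illustrative assumption, as the design only specifies a preset threshold):

```python
def is_live_face(first_score, second_score, threshold=0.5):
    """Average the two model confidences and compare with a preset
    threshold; returns True for a living face, False otherwise."""
    final_score = (first_score + second_score) / 2.0
    return final_score > threshold
```

Note that a score exactly equal to the threshold is judged non-living, matching the strict "larger than" comparison in the design.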
By constructing two living body detection models and fusing the multi-scale detection results of the two models, the living body detection accuracy against false face intrusion in three aspects (2D mask attack, paper attack, and screen attack) is effectively improved, and the generalization of applicable scenes of the living body detection models and the robustness of the model algorithm are further enhanced.
In a second aspect, the present application provides an apparatus for detecting a living face, the apparatus comprising:
The image acquisition module is used for obtaining a first image and a second image according to the face coordinates in the input image, wherein the first image is used for representing the face image, and the second image is used for representing the face image and part of the background image;
The first detection module is used for obtaining a first confidence score of the first image according to a first model, wherein the first model is used for representing whether an object in the image is a living body according to image texture characteristics, and the first confidence score is used for representing the probability that the object in the first image is the living body;
the second detection module is used for obtaining a second confidence score of the second image according to a second model, wherein the second model is used for representing whether the object in the image is a living body according to the image shape characteristics, and the second confidence score is used for representing the probability that the object in the second image is the living body;
the living body judging module is used for comparing the final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value and determining whether the face in the input image is a living body face or not according to a comparison result.
In one possible design, the image acquisition module is specifically configured to acquire face coordinates in the input image according to a face detection operator, where the input image is a two-dimensional image acquired by a monocular camera; extracting a face frame of a face image in the input image according to the face coordinates to obtain the first image; and expanding the face frame on the input image according to a preset expansion ratio to obtain the second image.
In one possible design, the first detection module is specifically configured to input the first image into the first model; performing Fourier transform operation on the first image to generate a spectrogram; extracting first features of the first image according to the spectrogram, wherein the first features are used for representing texture features of faces in the first image; and determining a first confidence score of the object in the first image as a living body according to the first characteristic.
In one possible design, the second detection module is specifically configured to input the second image into the second model; add noise to the second image to obtain a third image, wherein the third image is used for representing an image having the same shape features as and different texture features from the second image, and the shape features are used for representing edge features of a paper image or a screen image in the image background; extract a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information and different texture information; calculate a contrast loss value of the second feature and the third feature, and adjust model parameters in the convolutional neural network so that the proportion of texture features contained in the second feature extracted by the convolutional neural network is reduced; and determine, according to the second feature, a second confidence score that the object in the second image is a living body.
In one possible design, the living body judging module is specifically configured to average the first confidence score and the second confidence score to obtain a final confidence score; judging whether the final confidence score is larger than the preset threshold value or not; if yes, the face in the input image is a living face; if not, the face in the input image is a non-living face.
In a third aspect, the present application provides an electronic device, including:
a memory for storing a computer program;
and a processor, configured to implement the steps of the above method for detecting a living face when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored therein, which, when executed by a processor, implements the steps of the above method for detecting a living face.
For the technical effects of the second to fourth aspects and of each possible design therein, reference is made to the technical effects achievable by the first aspect or by each possible design of the first aspect described above, and details are not repeated here.
Drawings
FIG. 1 is a flow chart of a method for detecting a living human face according to the present application;
FIG. 2 is a schematic diagram of an input image including a face image according to the present application;
FIG. 3 is a schematic diagram of a first image without face frame expansion provided by the present application;
FIG. 4 is a schematic diagram of a second image with face frame expansion provided by the present application;
FIG. 5 is a flow chart of a method for deriving a first confidence score based on a first model according to the present application;
FIG. 6 is a schematic diagram of an apparatus for detecting a living human face according to the present application;
FIG. 7 is a schematic diagram of a structure of an electronic device according to the present application.
Detailed Description
The embodiment of the application provides a living body face detection method. First, face coordinates in an input image are obtained according to a face detection operator, yielding a first image containing face information and a second image containing face information and partial background information. Then, a first confidence score of the first image is obtained according to a first model for detecting 2D mask attacks, and a second confidence score of the second image is obtained according to a second model for detecting paper attacks or screen attacks. Finally, a final confidence score obtained by averaging the first confidence score and the second confidence score is compared with a preset threshold value to obtain the living body detection result.
According to the method provided by the embodiment of the application, false face intrusion in different scenes can be effectively identified, the current problem of low accuracy of living body face detection based on images collected by monocular cameras is solved, and the accuracy of living body detection of face images in different scenes is effectively improved.
The method provided by the embodiment of the application is further described in detail below with reference to the accompanying drawings.
Referring to fig. 1, the embodiment of the application provides a method for detecting a living human face, which specifically comprises the following steps:
Step 101: acquiring an input image acquired by a monocular camera;
the input image may be an RGB image containing a face image under natural light collected by a monocular RGB (Red Green Blue) camera, and the input image may also be a single frame RGB image containing a face image in a video stream collected by a monocular RGB camera. Here, the input image may be shown in fig. 2, in which the white area is an area of the face image, and the other areas are areas of the background image.
An input image acquired in monocular mode presents some problems that affect living body detection: the face image in the input image is easily affected by light, causing inconsistent imaging quality and inaccurate living body detection; and the faces in input images exhibit scene diversity (for example, faces at different angles, or faces in different occlusion states), which lowers the generalization of living body detection and makes detection inaccurate.
Therefore, in order to ensure the accuracy of living body detection of the face image in the input image in different scenes, for the case where the face image in the input image is non-living, the virtual false face intrusion problem is divided into three detection sub-problems: paper attack, screen attack, and 2D mask attack. A paper attack forges a real face by using paper containing a face image; a screen attack forges a real face by using an electronic device displaying a face image; and a 2D mask attack forges a real face by using paper with a cut-out face outline.
According to the features of the sub-problems, the specific problems are divided into two aspects. In the first aspect, paper attacks and screen attacks share edge breakage (a shape feature) as a significant distinguishing feature; in the second aspect, a 2D mask attack has texture as a distinct distinguishing feature.
According to the specific problems of the two aspects, the input image is subjected to the following specific processing of step 102-step 106, so that the scene generalization and the recognition accuracy of the living body detection of the input image are improved, and the robustness of a living body detection algorithm is enhanced.
Step 102: obtaining a first image and a second image according to face coordinates in an input image;
To address in a targeted manner the feature commonality of the two aspects consisting of 2D mask attack, paper attack, and screen attack, face coordinates of the input image are acquired according to a face detection operator, the region of the face image is framed, and that region is extracted as the first image. Then, the face frame is expanded according to the coordinate position of the first image in the input image and enlarged according to the expansion ratio, and the processed area containing the first image and part of the background of the input image is extracted as the second image. Specifically, the face detection operator detects the face region of the face image in the input image and locates the coordinates of the face image in the input image; the face image region may be a rectangular region containing only the face image; the expansion ratio may be any value greater than 0; and the expansion process enlarges the face frame in the positive and negative directions of both the x and y dimensions of the input image coordinates according to the located face coordinates. By expanding the area of the face frame, the image selected in the expanded face frame (i.e., the second image) contains the face image as well as part of the background image in the input image.
This yields a first image containing only face information (see fig. 3) and a second image further containing a background image, the background mainly comprising the shape features of the edge information of a paper image or a screen image (see fig. 4), in which the white area is the area of the face image and the other areas are the background image.
Two images generated by extracting different features from the input image are used as different input data of different false face intrusion models, so that the accuracy of living body detection and the robustness of an algorithm are effectively improved.
Step 103: obtaining a first confidence score of the first image according to a first model;
In the embodiment of the application, in order to improve living body detection accuracy aiming at 2D mask attack, a first model is constructed, wherein the first model is used for representing whether an object in an image is a living body or not according to image texture characteristics.
Specifically, the first model is a model trained on the texture features of images. The first model takes the first image, containing only the face image, as model input, extracts the texture features of the face in the first image as the first feature, and determines, according to the first feature, a first confidence score that the object in the first image is a living body.
The first confidence score is used to characterize the probability that the object in the first image is a living body; here it also represents the living body detection result of the first image in the first model. The first confidence score may be a value between 0 and 1: the closer the first confidence score is to 0, the more the detection result of the first model on the first image tends toward non-living, i.e., the object in the first image tends to be a non-living object; the closer it is to 1, the more the result tends toward living, i.e., the object in the first image tends to be a living object.
For example, the first model may be a deep learning model based on auxiliary supervision by Fourier spectrograms. The first image is input into the first model, a Fourier transform operation is performed on the first image to generate its spectrogram, the first feature of the first image is extracted according to the spectrogram, and the first confidence score that the object in the first image is a living body is determined.
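The spectrogram generation can be sketched with a discrete Fourier transform (a minimal sketch; the log-magnitude, center-shifted layout is a common convention assumed here, not a detail taken from the application):

```python
import numpy as np

def fourier_spectrogram(gray_face):
    """Log-magnitude Fourier spectrum of a grayscale face crop, shifted
    so that low frequencies sit at the center of the image."""
    freq = np.fft.fft2(gray_face.astype(np.float64))
    freq = np.fft.fftshift(freq)       # move the DC component to the center
    return np.log1p(np.abs(freq))      # log scale compresses dynamic range
```

In such a spectrum, the high-frequency energy of a living face tends to diverge outward from the center, while a 2D-mask recapture concentrates it along the horizontal and vertical axes, which is the distinction exploited as auxiliary supervision.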
The Fourier spectrogram differs to a certain extent between living bodies and non-living bodies in the frequency domain. For a 2D mask attack scene, if a living body face image and a non-living body face image from a 2D mask attack are both converted into Fourier spectrograms, generating a living body frequency domain image and a non-living body frequency domain image respectively, the following can be found: the high-frequency information of the living body frequency domain image diverges outward from the center of the image, whereas the non-living body frequency domain image has a relatively single distribution of high-frequency information, extending only in the horizontal and vertical directions. By applying the Fourier spectrogram, the difference between living and non-living objects in the image can be reflected more effectively.
Because the non-living body face image generated by a 2D mask is a secondary imaging of the face and differs greatly from a living body face image in the frequency domain and in the color distribution of the RGB image, the Fourier spectrogram is introduced into the first model as auxiliary supervision to improve the living body detection accuracy of the first model.
By extracting the first features, texture features are extracted only for the face region, noise interference of background information is effectively eliminated, a Fourier spectrogram is introduced as auxiliary supervision, and accuracy of living body detection of the first model for 2D mask attack is effectively improved.
Step 104: obtaining a second confidence score of the second image according to a second model;
In order to improve the accuracy of living body detection against paper attacks or screen attacks, a second model is constructed, which here may be a deep learning model. The second model extracts shape features from the second image as the second feature, where the shape features may be expressed as edge features of a paper image or a screen image contained in the background of the second image. Model training is performed according to the second feature containing only shape information, and a second confidence score of the second image is obtained.
Wherein, since the paper attack or the screen attack has the significant distinguishable characteristic of edge breaking, in order to improve the accuracy of the second model for performing the living detection on the second image, it is required to: enhancing the second model's attention to edge features of the paper image or the screen image contained in the second image; and weakening the interference of the second model on the face image and the irrelevant background image in the second image.
Therefore, in the embodiment of the application, a third image is generated by introducing a certain amount of noise into the second image. The second feature of the second image and the third feature of the third image are extracted by a convolutional neural network, the contrast loss value of the second feature and the third feature is calculated, and the model parameters in the convolutional neural network are adjusted so that the proportion of texture information contained in the extracted second feature is reduced, thereby suppressing the texture features of the second image.
Here, to make living body detection more accurate, an optimization target for restricting texture features in the second feature is proposed: reduce all texture information in the second feature so that shape information becomes the only valid information. In most cases, however, this target cannot be fully reached, so the calculated contrast loss value is constrained to a preset valid range: if the value falls within the preset valid range, the optimization target is considered achieved and the shape information in the second feature is the only valid information; if not, the target is considered not achieved, and the operation of suppressing texture features in the second feature is repeated. The preset valid range lies between 0 and 2.
Because a certain amount of noise is introduced into the third image, the texture information related to the second image is randomly changed. By calculating the contrast loss value, the similarity between the second feature and the third feature is constrained and texture information is suppressed, so that texture information becomes useless information and shape information becomes the only valid information.
Specifically, the second image is acquired in the second model, and a third image containing a certain amount of noise is generated by adding noise to the second image. Through the convolutional neural network, the second feature, containing the information of the paper image or screen image in the background of the second image, and the third feature, containing the noise-added version of that information in the third image, are extracted. The contrast loss value of the second feature and the third feature is calculated, and the model parameters in the second model are adjusted by reducing the contrast loss value, so that the second feature is adjusted to reduce the proportion of texture information: texture information is suppressed, and only shape information is attended to. Because the second model is trained on the shape features of the image, the second confidence score of the second image is obtained in the second model based on the adjusted second feature.
The second confidence score is used to characterize the probability that the object in the second image is a living body; here it also represents the living body detection result of the second image in the second model. The second confidence score may be a value between 0 and 1: the closer the second confidence score is to 0, the more the detection result of the second model on the second image tends toward non-living, i.e., the object in the second image tends to be a non-living object; the closer it is to 1, the more the result tends toward living, i.e., the object in the second image tends to be a living object.
For example, referring to fig. 5, in an embodiment of the present application, a method for obtaining a second confidence score of a second image according to a second model is provided, which specifically includes the following steps:
step 501: adding noise to the second image to obtain a third image;
In the embodiment of the application, since the second model represents the living body detection model for paper attacks or screen attacks, the significant basis for distinguishing a paper attack or screen attack from a real living body is the edge breakage of the paper image or screen image in the image background. The screen image may be the screen image of a mobile device.
In order to improve the accuracy of living body detection by the second model, the accuracy of identifying edge breakage is improved through the following: enhancing the attention of the second model to the edge features of the second image, and weakening the interference of the face image and irrelevant background information in the second image on the second model. In particular, the shape features in the second image are made the only valid information, while the texture features in the second image no longer provide information.
In the embodiment of the application, a noise adding method is adopted to randomly change the texture information in the second image, and the second image after the texture information is randomly changed is used as the third image.
The third image and the second image obtained by adding noise to the second image have the following characteristics: having the same shape characteristics, and having different texture characteristics.
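Step 501 can be sketched as follows (Gaussian noise and a sigma of 15 are illustrative assumptions; the application only requires noise that randomly changes texture while preserving shape):

```python
import numpy as np

def make_third_image(second_image, sigma=15.0, seed=0):
    """Add random noise to the second image to obtain the third image:
    texture is randomly perturbed while edges (shape) are largely kept."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=second_image.shape)
    noisy = second_image.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

The perturbation is pixel-wise and zero-mean, so edge locations, the shape information shared by the two images, are statistically unchanged.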
Step 502: extracting second features of the second image and third features of the third image by using a convolutional neural network;
The second image and the third image are input into a convolutional neural network, and features are extracted from both by convolution operations, yielding the second feature extracted from the second image and the third feature extracted from the third image.
By construction of the second and third images, the second feature and the third feature contain the same shape information but different texture information.
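A toy stand-in for this shared feature extractor, a single hand-written valid-mode convolution plus flattening rather than a trained CNN, can make the "same network, two images" idea concrete. Everything below (the Sobel kernel, image sizes, noise level) is an illustrative sketch, not the patent's actual network:

```python
import numpy as np

def extract_features(img, kernel):
    """Valid-mode 2D convolution + flatten: a toy stand-in for the shared CNN."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out.ravel()

# An edge-sensitive kernel (Sobel-x) as an example of a shape-oriented filter.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

second_img = np.zeros((8, 8))
second_img[:, 4:] = 1.0                       # a vertical edge ("shape")
third_img = second_img + 0.05 * np.random.default_rng(0).standard_normal((8, 8))

second_feature = extract_features(second_img, sobel_x)  # from the clean crop
third_feature = extract_features(third_img, sobel_x)    # from the noised copy
```

Because the filter responds to edges rather than fine texture, the two feature vectors come out nearly identical, which is exactly the property the contrastive training below tries to enforce for the real network.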
Step 503: calculating a contrast loss value of the second feature and the third feature, and adjusting the model parameters in the convolutional neural network so that the proportion of texture information contained in the second feature extracted by the convolutional neural network is reduced;
To ensure that the only effective information in the second image is its shape information, the texture information in the second image is suppressed: a contrast loss value between the second feature and the third feature is calculated according to the loss function, and the model parameters of the convolutional neural network mentioned in step 502 are adjusted according to this loss value, so that the proportion of texture information contained in the second feature extracted by the network is reduced.
Here, the contrast loss value may be further constrained by a preset valid range: if the contrast loss value falls within the preset valid range, the adjusted second feature can be considered optimal, that is, the second feature contains only shape information and no texture information; if it does not, the adjustment is repeated until the calculated contrast loss value falls within the preset valid range. The preset valid range lies between 0 and 2.
Specifically, the contrast loss value is calculated from the loss function as shown in the following equation 1:

LOSS = 1 − sim(F_D, F_D′)    (Equation 1)

where D denotes the second image, D′ denotes the third image, F_D denotes the second feature, F_D′ denotes the third feature, sim(F_D, F_D′) denotes the similarity between the second feature and the third feature, and LOSS denotes the contrast loss value between the two features. The contrast loss value is a number between 0 and 2: the closer it is to 0, the higher the similarity between the second feature and the third feature; the closer it is to 2, the lower the similarity.
The similarity in equation 1 above is computed as the cosine similarity shown in the following equation 2:

sim(F_D, F_D′) = (F_D · F_D′) / (‖F_D‖ · ‖F_D′‖)    (Equation 2)

where ‖F_D‖ denotes the length (norm) of the second feature vector, ‖F_D′‖ denotes the length of the third feature vector, and sim(F_D, F_D′) denotes the similarity between the second feature and the third feature. The similarity is a number between −1 and 1: the closer it is to 1, the higher the similarity between the two features; the closer it is to −1, the lower the similarity.
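Equations 1 and 2 can be sketched in a few lines of code. Note that the `LOSS = 1 − sim` form is inferred from the stated ranges (loss in [0, 2], similarity in [−1, 1]), so it should be treated as a reconstruction, not the patent's verbatim formula:

```python
import numpy as np

def cosine_similarity(f_a, f_b):
    # Equation 2: sim = (f_a . f_b) / (|f_a| * |f_b|), a value in [-1, 1].
    return float(np.dot(f_a, f_b) / (np.linalg.norm(f_a) * np.linalg.norm(f_b)))

def contrast_loss(f_second, f_third):
    # Equation 1 (reconstructed): LOSS = 1 - sim, a value in [0, 2];
    # loss near 0 -> features very similar, near 2 -> very dissimilar.
    return 1.0 - cosine_similarity(f_second, f_third)

f2 = np.array([1.0, 2.0, 3.0])
loss_identical = contrast_loss(f2, f2)    # identical features -> loss 0.0
loss_opposite = contrast_loss(f2, -f2)    # opposite features  -> loss 2.0
```

In training, this loss would be minimized so the network's second feature stays invariant to the texture perturbation of the third image.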
The parameters of the convolutional neural network are adjusted according to the reduced contrast loss value so as to adjust the second feature, such that the proportion of texture information contained in the second feature extracted by the network is reduced; the adjusted second feature retains only shape information and contains no texture information.
This method of suppressing texture information in the second feature remedies the situation in which current living body detection techniques attend only to texture information and ignore shape information, which makes living body detection against paper attacks or screen attacks inaccurate, and thereby effectively improves the accuracy of living body detection against paper attacks or screen attacks.
Step 504: and determining a second confidence score of the object in the second image as a living body according to the second characteristic.
For example, to make the second model's living body detection result (whether the face in the second image is a living body or a non-living body) more accurate under a paper attack or a screen attack, the second model determines a second confidence score that the object in the second image is a living body according to the second feature. Because the second model is trained on the shape features of images, and the second feature contains only the shape features of the second image, the second feature serves as the detection basis for the second model to judge whether the face in the second image is a living body or a non-living body, yielding the living body detection result of the second model, namely the second confidence score of the second image.
The second model may be a deep learning model, for example a deep neural network model. The second confidence score characterizes the probability that the object in the second image is a living body, and also represents the living body detection result of the second model on the second image. The second confidence score may be a value between 0 and 1: the closer it is to 0, the more the detection result tends toward non-living and the object in the second image tends toward a non-living object; the closer it is to 1, the more the result tends toward living and the object tends toward a living object.
Because the second feature attends only to shape information, using it as the second model's detection basis makes the second model focus on the shape features in the image during living body detection; that is, the second model uses the edge breakup of a paper attack or a screen attack as the judgment index for living body detection, improving the accuracy of living body detection against paper attacks or screen attacks.
Step 105: comparing the final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living face according to a comparison result;
In the embodiment of the application, to improve the accuracy of living body detection, the living body face detection model consists of the first model and the second model. The first confidence score of the first image is calculated by the first model, the second confidence score of the second image is calculated by the second model, the final confidence score is obtained as the average of the two, and the living body detection result is then obtained by comparing the final confidence score with the preset threshold. If the final confidence score is greater than the preset threshold, the living body detection result is a living body, and the object in the input image is a living object; if it is not greater than the preset threshold, the result is a non-living body, and the object in the input image is a non-living object.
Specifically, the final confidence score is calculated from the first confidence score and the second confidence score as shown in the following equation 3:

S = (1/n) · Σ_{i=1}^{n} s_i    (Equation 3)

where n denotes the number of models in the living body face detection model (here n = 2, counting the first model and the second model), s_i denotes the confidence score of the i-th model (s_1 is the first confidence score and s_2 is the second confidence score), and S denotes the final confidence score. The final confidence score may be a number between 0 and 1: the closer it is to 0, the more the living body detection result on the input image tends toward non-living and the face in the input image tends toward a non-living face; the closer it is to 1, the more the result tends toward living and the face tends toward a living face.
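The score fusion and thresholding of step 105 reduces to a few lines. The default threshold of 0.5 is an illustrative assumption; the patent leaves the preset threshold unspecified:

```python
def fuse_and_decide(scores, threshold=0.5):
    """Equation 3: S = (1/n) * sum(s_i); the face is 'living' iff S > threshold.

    `threshold=0.5` is an assumed example value, not fixed by the patent.
    """
    final_score = sum(scores) / len(scores)
    return final_score, final_score > threshold

# First-model and second-model confidence scores for one input image:
score, is_live = fuse_and_decide([0.9, 0.7])   # -> (0.8, True)
```

Averaging makes the decision robust to one model being fooled: a paper attack that defeats the texture-based first model still has to push the shape-based second model's score high enough to lift the mean past the threshold.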
By constructing two living body detection models and fusing their multi-scale living body detection results, the accuracy of false-face intrusion detection against 2D mask attacks, paper attacks, and screen attacks is effectively improved, and the generalization of the model's applicable scenes and the robustness of the model algorithm are further enhanced.
Based on the same inventive concept, the application also provides a device for living body face detection, which implements living body detection under different false-face intrusion scenes, solves the problem of low accuracy of living body face detection based on images collected by a monocular camera, and effectively improves the accuracy of living body detection of face images under different false-face intrusion scenes. Referring to fig. 6, the device comprises:
The image obtaining module 601 is configured to obtain a first image and a second image according to face coordinates in an input image, where the first image is used for representing a face image, and the second image is used for representing the face image and a part of a background image.
The first detection module 602 is configured to obtain a first confidence score of the first image according to a first model, where the first model is used for characterizing whether an object in the image is a living body according to image texture features, and the first confidence score is used for characterizing a probability that the object in the first image is a living body.
The second detection module 603 is configured to obtain a second confidence score of the second image according to a second model, where the second model is used for characterizing whether the object in the image is a living body according to the image shape feature, and the second confidence score is used for characterizing a probability that the object in the second image is a living body.
The living body judging module 604 is configured to compare a final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value, and determine whether the face in the input image is a living body face according to a comparison result.
In one possible design, the image obtaining module 601 is specifically configured to obtain face coordinates in the input image according to a face detection operator, where the input image is a two-dimensional image collected by a monocular camera; extracting a face frame of a face image in the input image according to the face coordinates to obtain the first image; and expanding the face frame on the input image according to a preset expansion ratio to obtain the second image.
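The crop-then-expand behavior of the image obtaining module can be sketched as follows. The box format `(x1, y1, x2, y2)` and the expansion ratio of 0.5 are illustrative assumptions; the patent specifies only that the face frame is expanded by a preset expansion ratio:

```python
import numpy as np

def crop_face_and_context(image, box, expand_ratio=0.5):
    """Return (first_image, second_image) from a face box (x1, y1, x2, y2).

    The second crop expands the box by `expand_ratio` of its width/height on
    each side, clamped to the image bounds, so it also covers part of the
    background. Both the box format and the ratio are assumed, not from the patent.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    first = image[y1:y2, x1:x2]                    # face only
    dx = int((x2 - x1) * expand_ratio)
    dy = int((y2 - y1) * expand_ratio)
    ex1, ey1 = max(0, x1 - dx), max(0, y1 - dy)
    ex2, ey2 = min(w, x2 + dx), min(h, y2 + dy)
    second = image[ey1:ey2, ex1:ex2]               # face + partial background
    return first, second

img = np.zeros((100, 100, 3), dtype=np.uint8)      # placeholder monocular frame
first, second = crop_face_and_context(img, (30, 30, 60, 60))
```

The clamping matters in practice: for faces near the frame border the expanded crop is simply truncated rather than padded.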
In one possible design, the first detection module 602 is specifically configured to input the first image into the first model; performing Fourier transform operation on the first image to generate a spectrogram; extracting first features of the first image according to the spectrogram, wherein the first features are used for representing texture features of faces in the first image; and determining a first confidence score of the object in the first image as a living body according to the first characteristic.
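The Fourier-transform step inside the first model can be sketched as a centered log-magnitude spectrum standing in for the patent's "spectrogram". The log scaling and the grayscale input are illustrative choices:

```python
import numpy as np

def spectrum_image(gray):
    """2D FFT -> shift DC to center -> log-magnitude spectrum.

    Recapture artifacts (moire, halftone printing) show up as periodic
    structure in this frequency-domain view, which is why a texture-oriented
    model can use it as input. log1p is an assumed normalization choice.
    """
    f = np.fft.fftshift(np.fft.fft2(gray))
    return np.log1p(np.abs(f))

face = np.random.default_rng(0).random((64, 64))   # placeholder grayscale face crop
spec = spectrum_image(face)
```

The resulting spectrum has the same spatial size as the input and can be fed to the texture-feature extractor in place of, or alongside, the raw crop.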
In one possible design, the second detection module 603 is specifically configured to input the second image into the second model; adding noise to the second image to obtain a third image, wherein the third image is used for representing an image with the same shape characteristics and different texture characteristics as the second image, and the shape characteristics are used for representing edge characteristics of a paper image or a screen image in an image background; extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information and different texture information; calculating a contrast loss value of the second feature and the third feature, and adjusting model parameters in the convolutional neural network so that the duty ratio of the texture features contained in the second feature extracted by the convolutional neural network is reduced; and determining a second confidence score of the object in the second image as a living body according to the second characteristic.
In one possible design, the living body determining module 604 is specifically configured to average the first confidence score and the second confidence score to obtain a final confidence score; judging whether the final confidence score is larger than the preset threshold value or not; if yes, the face in the input image is a living face; if not, the face in the input image is a non-living face.
Based on this device, living body detection on images acquired by a monocular camera under different false-face intrusion scenes is effectively realized, and the accuracy and universality of living body detection of face images under such scenes are effectively improved.
Based on the same inventive concept, the embodiment of the present application further provides an electronic device, where the electronic device may implement the function of the foregoing apparatus for detecting a living body face, and referring to fig. 7, the electronic device includes:
At least one processor 701, and a memory 702 connected to the at least one processor 701. The specific connection medium between the processor 701 and the memory 702 is not limited in the embodiment of the present application; in fig. 7, the connection between the processor 701 and the memory 702 through the bus 700 is taken as an example. The bus 700 may be divided into an address bus, a data bus, a control bus, and the like; it is drawn as a single thick line in fig. 7 only for convenience of representation, which does not mean there is only one bus or one type of bus, and the manner in which the other components are connected is illustrative and not limiting. Alternatively, the processor 701 may be referred to as a controller; the name is not limiting.
In an embodiment of the present application, the memory 702 stores instructions executable by the at least one processor 701, and the at least one processor 701 can perform the method for detecting a living face as described above by executing the instructions stored in the memory 702. The processor 701 may implement the functions of the various modules in the apparatus shown in fig. 6.
The processor 701 is the control center of the device; it may connect the various parts of the entire control device using various interfaces and lines, and performs the various functions of the device and processes its data by running or executing the instructions stored in the memory 702 and invoking the data stored in the memory 702, thereby monitoring the device as a whole.
In one possible design, the processor 701 may include one or more processing units, and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 701. In some embodiments, the processor 701 and the memory 702 may be implemented on the same chip or on separate chips.
The processor 701 may be a general purpose processor such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method for detecting the living body human face disclosed by the embodiment of the application can be directly embodied as the execution of a hardware processor or the execution of the combination of hardware and software modules in the processor.
The memory 702 is a non-volatile computer-readable storage medium that can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 702 may include at least one type of storage medium, for example flash memory, hard disk, multimedia card, card memory, Random Access Memory (RAM), Static Random Access Memory (SRAM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), magnetic memory, magnetic disk, or optical disc. The memory 702 may be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 702 in embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 701, the code corresponding to the method for detecting a living human face described in the foregoing embodiments may be burned into a chip, so that the chip can execute the steps of the method of the embodiment shown in fig. 1 at run time. How to design and program the processor 701 is well known to those skilled in the art and is not described in detail here.
Based on the same inventive concept, the embodiments of the present application also provide a storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the method of living face detection as previously discussed.
In some possible embodiments, aspects of the method of living face detection provided by the present application may also be implemented in the form of a program product comprising program code for causing the control apparatus to carry out the steps of the method of living face detection according to the various exemplary embodiments of the present application as described herein above when the program product is run on a device.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. A method of in-vivo face detection, the method comprising:
Obtaining a first image and a second image according to face coordinates in an input image, wherein the first image is used for representing the face image, and the second image is used for representing the face image and part of background image;
Obtaining a first confidence score of the first image according to a first model, wherein the first model is used for representing whether an object in the image is a living body according to image texture characteristics, and the first confidence score is used for representing the probability that the object in the first image is the living body;
Obtaining a second confidence score of the second image according to a second model, wherein the second model is used for representing whether an object in the image is a living body or not according to the shape characteristics of the image, and the second confidence score is obtained through the following modes: adding noise to the second image to obtain a third image, wherein the third image is an image with the same shape characteristics and different texture characteristics as the second image, and the shape characteristics are used for representing edge characteristics of a paper image or a screen image in an image background; extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information and different texture information; calculating a contrast loss value of the second feature and the third feature, and adjusting model parameters in the convolutional neural network so that the texture feature proportion contained in the second feature extracted by the convolutional neural network is reduced; determining the second confidence score of the object in the second image as a living body according to the second characteristic;
And comparing the final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living face according to a comparison result.
2. The method of claim 1, wherein the obtaining the first image and the second image based on face coordinates in the input image comprises:
Acquiring face coordinates in the input image according to a face detection operator, wherein the input image is a two-dimensional image acquired by a monocular camera;
extracting a face frame of a face image in the input image according to the face coordinates to obtain the first image;
and expanding the face frame on the input image according to a preset expansion ratio to obtain the second image.
3. The method of claim 1, wherein the deriving a first confidence score for the first image from the first model comprises:
Inputting the first image into the first model;
performing Fourier transform operation on the first image to generate a spectrogram;
Extracting first features of the first image according to the spectrogram, wherein the first features are used for representing texture features of faces in the first image;
and determining a first confidence score of the object in the first image as a living body according to the first characteristic.
4. The method of claim 1, wherein comparing the final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value, and determining whether the face in the input image is a living face according to the comparison result, comprises:
averaging the first confidence score and the second confidence score to obtain a final confidence score;
Judging whether the final confidence score is larger than the preset threshold value or not;
if yes, the face in the input image is a living face;
if not, the face in the input image is a non-living face.
5. An apparatus for in-vivo face detection, the apparatus comprising:
The image acquisition module is used for obtaining a first image and a second image according to the face coordinates in the input image, wherein the first image is used for representing the face image, and the second image is used for representing the face image and part of the background image;
The first detection module is used for obtaining a first confidence score of the first image according to a first model, wherein the first model is used for representing whether an object in the image is a living body according to image texture characteristics, and the first confidence score is used for representing the probability that the object in the first image is the living body;
The second detection module is used for obtaining a second confidence score of the second image according to a second model, wherein the second model is used for representing whether an object in the image is a living body or not according to the shape characteristics of the image, and the second confidence score is obtained through the following modes: adding noise to the second image to obtain a third image, wherein the third image is an image with the same shape characteristics and different texture characteristics as the second image, and the shape characteristics are used for representing edge characteristics of a paper image or a screen image in an image background; extracting a second feature of the second image and a third feature of the third image by using a convolutional neural network, wherein the second feature and the third feature contain the same shape information and different texture information; calculating a contrast loss value of the second feature and the third feature, and adjusting model parameters in the convolutional neural network so that the texture feature proportion contained in the second feature extracted by the convolutional neural network is reduced; determining the second confidence score of the object in the second image as a living body according to the second characteristic;
the living body judging module is used for comparing the final confidence score obtained by calculating the first confidence score and the second confidence score with a preset threshold value and determining whether the face in the input image is a living body face or not according to a comparison result.
6. The apparatus of claim 5, wherein the living judgment module is specifically configured to average the first confidence score and the second confidence score to obtain a final confidence score; judging whether the final confidence score is larger than the preset threshold value or not; if yes, the face in the input image is a living face; if not, the face in the input image is a non-living face.
7. An electronic device, comprising:
a memory for storing a computer program;
A processor for implementing the method of any of claims 1-4 when executing a computer program stored on the memory.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-4.
CN202110766966.6A 2021-07-07 Method and device for detecting living human face and electronic equipment Active CN113496215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110766966.6A CN113496215B (en) 2021-07-07 Method and device for detecting living human face and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110766966.6A CN113496215B (en) 2021-07-07 Method and device for detecting living human face and electronic equipment

Publications (2)

Publication Number Publication Date
CN113496215A CN113496215A (en) 2021-10-12
CN113496215B true CN113496215B (en) 2024-07-02


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818313A (en) * 2017-11-20 2018-03-20 腾讯科技(深圳)有限公司 Vivo identification method, device, storage medium and computer equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818313A (en) * 2017-11-20 2018-03-20 腾讯科技(深圳)有限公司 Vivo identification method, device, storage medium and computer equipment

Similar Documents

Publication Publication Date Title
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN105893920B (en) Face living body detection method and device
CN108550161B (en) Scale self-adaptive kernel-dependent filtering rapid target tracking method
CN110084135B (en) Face recognition method, device, computer equipment and storage medium
CN112419170B (en) Training method of shielding detection model and beautifying processing method of face image
CN112434578B (en) Mask wearing normalization detection method, mask wearing normalization detection device, computer equipment and storage medium
CN108875534B (en) Face recognition method, device, system and computer storage medium
CN112487921B (en) Face image preprocessing method and system for living body detection
CN110851835A (en) Image model detection method and device, electronic equipment and storage medium
CN111626163B (en) Human face living body detection method and device and computer equipment
WO2019014813A1 (en) Method and apparatus for quantitatively detecting skin type parameter of human face, and intelligent terminal
CN110866466A (en) Face recognition method, face recognition device, storage medium and server
CN109255802B (en) Pedestrian tracking method, device, computer equipment and storage medium
CN107545581B (en) Target tracking method and target tracking device
CN113837065A (en) Image processing method and device
JPWO2017061106A1 (en) Information processing apparatus, image processing system, image processing method, and program
CN113496215B (en) Method and device for detecting living human face and electronic equipment
CN109858464B (en) Bottom database data processing method, face recognition device and electronic equipment
CN116958880A (en) Video flame foreground segmentation preprocessing method, device, equipment and storage medium
CN115984978A (en) Face living body detection method and device and computer readable storage medium
CN113553928B (en) Human face living body detection method, system and computer equipment
CN110751163A (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN113496215A (en) Method and device for detecting human face of living body and electronic equipment
CN113870210A (en) Image quality evaluation method, device, equipment and storage medium
CN113204995A (en) Behavior password intelligent door lock identification method, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant