CN112597885A - Face living body detection method and device, electronic equipment and computer storage medium


Info

Publication number: CN112597885A
Application number: CN202011526858.3A
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Pending
Prior art keywords: face image, living body, face, training sample, detected
Inventors: 聂凤梅, 李骊
Assignee: Beijing HJIMI Technology Co Ltd
Application filed by Beijing HJIMI Technology Co Ltd
Priority to CN202011526858.3A
Publication of CN112597885A

Classifications

    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06V40/172 Human faces: Classification, e.g. identification

Abstract

The application provides a face living body detection method and device, an electronic device, and a computer storage medium. The face living body detection method comprises: first, acquiring a face image to be detected; then, inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image. The face living body detection model is obtained by training a deep learning model with a plurality of face image training samples, which comprise a plurality of living face images and a plurality of non-living face images; the deep learning model contains a classification model structure, in which each dense network layer is a network structure fusing original convolution and dilated convolution. Finally, if the living body prediction value of the face image to be detected is greater than a threshold value, the face in the image is determined to be a living body. In this way, both detection speed and detection accuracy are taken into account during face living body detection.

Description

Face living body detection method and device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting a living human face, an electronic device, and a computer storage medium.
Background
Since commercial face recognition systems entered the market, face recognition has been widely applied in fields such as identity authentication, electronic commerce, human-computer interaction, and information security. Face living body detection has therefore become increasingly important: in many scenarios requiring face recognition, living body detection must be performed first, otherwise the recognition result is meaningless. For example, in a face payment scenario, the face image must be guaranteed to come from a real person before recognition; otherwise, person A holding a photograph of person B might pass face recognition, creating a serious security risk.
At present, face living body detection generally uses either a texture-feature-based method or a temporal method. The texture-feature-based method performs living body detection on a single face image frame; although it is relatively fast, its accuracy cannot be guaranteed. The temporal method performs living body detection on video (consecutive frames); although its accuracy is high, it is relatively slow. Existing face living body detection methods therefore cannot balance detection speed and accuracy on mobile devices.
Disclosure of Invention
In view of the above, the present application provides a face living body detection method and apparatus, an electronic device, and a computer storage medium, which balance detection speed and accuracy during face living body detection.
The application provides a face living body detection method in a first aspect, which comprises the following steps:
acquiring a human face image to be detected;
inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image to be detected; the face living body detection model is obtained by training a deep learning model with a plurality of face image training samples; the training samples comprise a plurality of living face images and a plurality of non-living face images; the deep learning model contains a classification model structure; and each dense network layer in the classification model structure is a network structure fusing original convolution and dilated convolution;
and if the living body prediction value of the face image to be detected is greater than the threshold value, determining that the face in the image to be detected is a living body.
Optionally, the method for constructing the human face living body detection model includes:
constructing a training sample set of the face image; wherein the training sample set of the facial images comprises a plurality of training samples of the facial images; the training samples of the plurality of face images comprise face images of a plurality of living bodies and face images of a plurality of non-living bodies;
for each face image training sample, inputting the training sample into a plurality of consecutive dense network layers in the deep learning model to obtain a first prediction result of whether the training sample is a living body and a second prediction result of whether the training sample is a living body in a plurality of scenes;
calculating a final loss function value of the training sample according to the first prediction result of whether the training sample is a living body, the second prediction result of whether the training sample is a living body in a plurality of scenes, the output value of each dense network layer, the real result of whether the training sample is a living body, and the real result of whether the training sample is a living body in a plurality of scenes;
and if the final loss function value does not meet the preset convergence condition, adjusting parameters in the deep learning model until the final loss function value calculated by the adjusted deep learning model meets the preset convergence condition, and taking the deep learning model as a human face living body detection model.
Optionally, calculating the final loss function value of the face image training sample according to the first prediction result of whether the training sample is a living body, the second prediction result of whether the training sample is a living body in multiple scenes, the output value of each dense network layer, the real result of whether the training sample is a living body, and the real result of whether the training sample is a living body in multiple scenes includes:
concatenating the output values of each dense network layer to obtain a concatenated feature map;
inputting the concatenated feature map into a spatial attention output layer to obtain a third prediction result of whether the training sample of the face image is a living body;
determining a first loss function value of the training sample of the face image according to a third prediction result of whether the training sample of the face image is a living body and a real result of whether the training sample of the face image is a living body;
determining a second loss function value of the training sample of the face image according to a first prediction result of whether the training sample of the face image is a living body and a real result of whether the training sample of the face image is a living body;
determining a third loss function value of the training sample of the face image according to a second prediction result of whether the training sample of the face image is a living body in a plurality of scenes and a real result of whether the training sample of the face image is a living body in a plurality of scenes;
and determining a sum of the first loss function value, the second loss function value, and the third loss function value as the final loss function value.
Optionally, the deep learning model is a deep learning model combining a classification model structure and a spatial attention output layer, and the inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image to be detected includes:
inputting the face image to be detected into a plurality of consecutive dense network layers in the face living body detection model to obtain an output value of each dense network layer;
concatenating the output values of each dense network layer to obtain a concatenated feature map;
and inputting the concatenated feature map into the spatial attention output layer to obtain the living body prediction value of the face image to be detected.
Optionally, the deep learning model is a deep learning model combining a classification model structure and a spatial attention output layer, and the inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image to be detected includes:
inputting the face image to be detected into a plurality of consecutive dense network layers in the face living body detection model to obtain an output value of each dense network layer;
concatenating the output values of each dense network layer to obtain a concatenated feature map;
inputting the concatenated feature map into the spatial attention output layer to obtain a first living body prediction value of the face image to be detected;
taking the output value of the last of the plurality of consecutive dense network layers in the face living body detection model as a second living body prediction value of the face image to be detected;
and determining the living body prediction value of the face image to be detected according to the first living body prediction value and the second living body prediction value.
The present application provides in a second aspect a living human face detection apparatus, including:
the acquisition unit is used for acquiring a face image to be detected;
the first input unit is used for inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image to be detected; the face living body detection model is obtained by training a deep learning model with a plurality of face image training samples; the training samples comprise a plurality of living face images and a plurality of non-living face images; the deep learning model contains a classification model structure; and each dense network layer in the classification model structure is a network structure fusing original convolution and dilated convolution;
and the first determining unit is used for determining the human face in the image to be detected as the living body if the living body prediction value of the human face image to be detected is greater than a threshold value.
Optionally, the construction unit of the human face living body detection model includes:
the training sample set constructing unit is used for constructing a training sample set of the face image; wherein the training sample set of the facial images comprises a plurality of training samples of the facial images; the training samples of the plurality of face images comprise face images of a plurality of living bodies and face images of a plurality of non-living bodies;
the second input unit is used for inputting the training samples of the face images to a plurality of continuous dense network layers in a deep learning model aiming at the training samples of each face image to obtain a first prediction result of whether the training samples of the face images are living bodies and a second prediction result of whether the training samples of the face images are living bodies in a plurality of scenes;
a calculating unit, configured to calculate the final loss function value of the face image training sample according to the first prediction result of whether the training sample is a living body, the second prediction result of whether the training sample is a living body in multiple scenes, the output value of each dense network layer, the real result of whether the training sample is a living body, and the real result of whether the training sample is a living body in multiple scenes;
and the adjusting unit is used for adjusting parameters in the deep learning model if the final loss function value does not meet the preset convergence condition until the final loss function value calculated by the adjusted deep learning model meets the preset convergence condition, and taking the deep learning model as a human face living body detection model.
Optionally, the computing unit includes:
the first merging unit is used for concatenating the output values of each dense network layer to obtain a concatenated feature map;
the third input unit is used for inputting the concatenated feature map into a spatial attention output layer to obtain a third prediction result of whether the training sample of the face image is a living body;
a second determining unit, configured to determine a first loss function value of the training sample of the face image according to a third prediction result of whether the training sample of the face image is a living body and a real result of whether the training sample of the face image is a living body;
the second determining unit is further configured to determine a second loss function value of the training sample of the face image according to a first prediction result of whether the training sample of the face image is a living body and a real result of whether the training sample of the face image is a living body;
the second determining unit is further configured to determine a third loss function value of the training sample of the face image according to a second prediction result of whether the training sample of the face image is a living body in multiple scenes and a real result of whether the training sample of the face image is a living body in multiple scenes;
a third determining unit configured to determine a sum of the first loss function value, the second loss function value, and the third loss function value as the final loss function value.
Optionally, the deep learning model is a deep learning model combining a classification model structure and a spatial attention output layer, and the first input unit includes:
the first input subunit is used for inputting the face image to be detected to a plurality of continuous dense network layers in a face living body detection model to obtain an output value of each dense network layer;
a second merging unit, configured to concatenate the output values of each dense network layer to obtain a concatenated feature map;
the first input subunit is further configured to input the concatenated feature map into the spatial attention output layer to obtain the living body prediction value of the face image to be detected.
Optionally, the deep learning model is a deep learning model combining a classification model structure and a spatial attention output layer, and the first input unit includes:
the first input subunit is used for inputting the face image to be detected to a plurality of continuous dense network layers in a face living body detection model to obtain an output value of each dense network layer;
a second merging unit, configured to concatenate the output values of each dense network layer to obtain a concatenated feature map;
the first input subunit is further configured to input the concatenated feature map into the spatial attention output layer to obtain a first living body prediction value of the face image to be detected;
the first input subunit is further configured to take the output value of the last of the plurality of consecutive dense network layers in the face living body detection model as a second living body prediction value of the face image to be detected;
and the fourth determining unit is used for determining the living body prediction value of the face image to be detected according to the first living body prediction value and the second living body prediction value.
A third aspect of the present application provides an electronic device comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the first aspects.
A fourth aspect of the present application provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of the first aspect.
In view of the above, in the face living body detection method and apparatus, electronic device, and computer storage medium provided by the present application, the face living body detection method comprises: first, acquiring a face image to be detected; then, inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image, the model being obtained by training a deep learning model with a plurality of face image training samples comprising a plurality of living face images and a plurality of non-living face images, the deep learning model containing a classification model structure in which each dense network layer is a network structure fusing original convolution and dilated convolution; and finally, if the living body prediction value of the face image to be detected is greater than a threshold value, determining that the face in the image is a living body. Both detection speed and detection accuracy are thereby taken into account during face living body detection.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a specific flowchart of a human face live detection method according to an embodiment of the present application;
FIG. 2 is a diagram of a dense network layer in the prior art;
fig. 3 is a schematic diagram of a dense network layer according to an embodiment of the present application;
fig. 4 is a flowchart of a method for constructing a human face live detection model according to an embodiment of the present application;
FIG. 5 is a flow chart of a method for calculating a final loss function value according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a plurality of spatial attention output layers according to an embodiment of the present application;
fig. 7 is a detailed flowchart of a living human face detection method according to another embodiment of the present application;
fig. 8 is a detailed flowchart of a living human face detection method according to another embodiment of the present application;
fig. 9 is a schematic view of a living human face detection apparatus according to another embodiment of the present application;
fig. 10 is a schematic view of an electronic device for implementing a face live detection method according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in this application are only used to distinguish different devices, modules, or units, and do not limit the order or interdependence of the functions they perform. The terms "include", "comprise", or any variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
The embodiment of the application provides a face living body detection method, as shown in fig. 1, specifically comprising the following steps:
s101, obtaining a face image to be detected.
Specifically, in a scenario requiring face living body recognition, for example identity authentication, human-computer interaction, or video monitoring, an image containing a face is first captured by a camera or similar device, and the face image to be detected is then extracted from the captured image by an existing face detection system.
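Step S101 can be made concrete with a short sketch. The following is a minimal example, assuming OpenCV and its bundled Haar cascade stand in for the camera and the "existing face detection system"; the patent does not name a specific detector, so these choices are illustrative only.

```python
# Hypothetical sketch of step S101: capture a frame and crop the first face.
# OpenCV's Haar cascade is an assumption; any face detection system may be used.
import cv2

def acquire_face_image(camera_id: int = 0):
    """Return the cropped face image to be detected, or None."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(camera_id)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return frame[y:y + h, x:x + w]
```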
S102, inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image to be detected.
The face living body detection model is obtained by training a deep learning model with a plurality of face image training samples; the training samples comprise a plurality of living face images and a plurality of non-living face images; the deep learning model contains a classification model structure; and each dense network layer in the classification model structure is a network structure fusing original convolution and dilated convolution (also called atrous or hole convolution).
Dilated convolution can enlarge the receptive field of a convolution neuron without increasing the amount of calculation. Its principle is as follows:

$$y[i] = \sum_{k} x[i + r \cdot k]\, w[k]$$

where y denotes the output of the 1-dimensional convolution, y[i] denotes the i-th output neuron, x denotes the input feature, x[i + r·k] denotes the element with index i + r·k, w[k] denotes the k-th filter weight, and r denotes the dilation rate. Ordinary convolution, i.e., the original convolution, can be regarded as convolution with a dilation rate of 1. When r is not 1, the input feature is in effect sampled at intervals of r - 1 and then convolved, so dilated convolution expands the receptive field of the neuron through sampling, without increasing the amount of calculation.
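To make the formula tangible, the following numpy sketch implements the 1-dimensional dilated convolution exactly as written above; r = 1 reproduces ordinary convolution, while r = 2 spans five input positions with the same three weights. The function name and test values are illustrative.

```python
# y[i] = sum_k x[i + r*k] * w[k], valid positions only.
import numpy as np

def dilated_conv1d(x: np.ndarray, w: np.ndarray, r: int = 1) -> np.ndarray:
    n_out = len(x) - r * (len(w) - 1)
    return np.array([sum(x[i + r * k] * w[k] for k in range(len(w)))
                     for i in range(n_out)])

x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])
print(dilated_conv1d(x, w, r=1))  # ordinary convolution: 3-wide receptive field
print(dilated_conv1d(x, w, r=2))  # dilated, r = 2: 5-wide receptive field, same cost
```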
Each dense network layer in a prior-art classification model is implemented with several original convolutions, as shown in fig. 2, where "Filter concatenation" denotes feature concatenation, "Previous layer" denotes the preceding network layer, and "conv" denotes convolution. To obtain different receptive fields, the existing dense network layer needs one 3×3 original convolution and two consecutive 3×3 original convolutions, corresponding to the left and right parts of fig. 2 respectively. Fig. 3 is a schematic diagram of the dense network layer of the present application, whose network structure fuses original convolution and dilated convolution; "atrous conv, r = 2" denotes a dilated convolution with dilation rate 2. This layer needs only one 3×3 original convolution and one dilated convolution to achieve the effect of the dense network layer in fig. 2. That is, compared with the prior-art dense network layer, the fused structure obtains a receptive field of the same size while reducing the number of parameters and the amount of calculation of the model.
It should be noted that the number of dense network layers and the number of features in each dense network layer may be changed according to the actual application, and are not limited here. Similarly, the dilation rate of the dilated convolution may be changed according to the actual application and is not limited here. A possible rendering of such a layer is sketched below.
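A hedged PyTorch sketch of the fused dense network layer of fig. 3: one 3×3 original convolution in parallel with one 3×3 dilated convolution (r = 2, i.e., a 5×5 receptive field replacing the two stacked 3×3 convolutions of fig. 2), with both branch outputs densely concatenated to the layer input. The 1×1 bottlenecks, channel counts, and normalization are illustrative assumptions, not specified by the patent.

```python
import torch
import torch.nn as nn

class FusedDenseLayer(nn.Module):
    """Sketch of a dense layer fusing original and dilated convolution."""
    def __init__(self, in_ch: int, growth: int = 32):
        super().__init__()
        self.branch_orig = nn.Sequential(
            nn.Conv2d(in_ch, growth, kernel_size=1),              # bottleneck
            nn.Conv2d(growth, growth, kernel_size=3, padding=1))  # original conv
        self.branch_dilated = nn.Sequential(
            nn.Conv2d(in_ch, growth, kernel_size=1),
            nn.Conv2d(growth, growth, kernel_size=3,
                      padding=2, dilation=2))                     # dilated conv, r = 2
        self.bn = nn.BatchNorm2d(in_ch + 2 * growth)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # dense connectivity: concatenate the input with both branch outputs
        out = torch.cat([x, self.branch_orig(x), self.branch_dilated(x)], dim=1)
        return self.relu(self.bn(out))
```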
Optionally, in another embodiment of the present application, an implementation manner of the method for constructing a living human face detection model, as shown in fig. 4, includes:
s401, constructing a training sample set of the face image.
The training sample set of the face images comprises a plurality of training samples of the face images; the training samples of the plurality of face images comprise face images of a plurality of living bodies and face images of a plurality of non-living bodies.
Specifically, a data acquisition system may be used to capture images containing faces in various scenarios, for example under different lighting conditions, at different distances from the camera, with different attack modes, and in different postures and angles, which are not limited here. The face images are then extracted with a face detection system. Each face image training sample is labeled; for example, if the training sample is a living body, its label gt_label_1 is set to 1, and if it is a non-living body, gt_label_1 is set to 0.
In addition, to enable the deep learning model to learn the essential features of a living face, i.e., features that do not change with the scene, and to increase the generalization ability of the model across different scenes, an auxiliary classification task is added during training: living bodies have label 0 in all scenes, while non-living bodies have a different label per scene. Assuming three scenes, the non-living labels in the three scenes are 1, 2, and 3 respectively. For example, if a training sample is a non-living body in scene a, its label gt_label_2 is set to a; if it is a non-living body in scene b, gt_label_2 is set to b; if it is a non-living body in scene c, gt_label_2 is set to c; and whenever the training sample is a living body, gt_label_2 is set to 0.
Thus each face image training sample carries two labels: one indicating whether the sample is a living body or a non-living body, and one indicating that the sample is a living body (0) or a non-living body in a particular scene, as sketched below.
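The two-label scheme can be summarised in a few lines of Python; the scene identifiers and their non-living codes follow the example above (scenes a, b, c mapping to 1, 2, 3), and the helper name is hypothetical.

```python
# gt_label_1: 1 = living body, 0 = non-living body.
# gt_label_2: 0 for a living body in any scene; per-scene code otherwise.
SCENE_TO_NONLIVE_LABEL = {"a": 1, "b": 2, "c": 3}  # example scenes from the text

def make_labels(is_live: bool, scene: str):
    gt_label_1 = 1 if is_live else 0
    gt_label_2 = 0 if is_live else SCENE_TO_NONLIVE_LABEL[scene]
    return gt_label_1, gt_label_2

print(make_labels(True, "b"))   # (1, 0)
print(make_labels(False, "b"))  # (0, 2)
```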
S402, aiming at the training sample of each face image, inputting the training sample of the face image into a plurality of continuous dense network layers in the deep learning model, and obtaining a first prediction result whether the training sample of the face image is a living body or not and a second prediction result whether the training sample of the face image is a living body or not in a plurality of scenes.
S403, calculating to obtain a final loss function value of the training sample of the face image according to whether the training sample of the face image is a first prediction result of a living body, whether the training sample of the face image is a second prediction result of the living body in a plurality of scenes, an output value of each dense network layer, whether the training sample of the face image is a real result of the living body, and whether the training sample of the face image is a real result of the living body in a plurality of scenes.
It should be noted that, as the number of dense network layers in the deep learning model increases, the size of each layer's output value may change; dense network layers with the same output specification may then be treated as a single layer, and only one of them included in the calculation, to further reduce the amount of computation.
Optionally, in another embodiment of the present application, an implementation manner of step S403, as shown in fig. 5, includes:
and S501, combining the output values of each dense network layer to obtain a series characteristic diagram.
And S502, inputting the series-connected feature maps into a space attention output layer to obtain a third prediction result of whether the training sample of the face image is a living body.
The structure of the spatial attention output layer and the corresponding loss function are beneficial to improving the accuracy of the classification model without increasing the calculation amount.
The dotted line in fig. 6 is a schematic diagram of the spatial attention output layer, feature represents the output feature of the dense network layer in front of the spatial attention output layer, and after a plurality of features are connected, that is, the features are merged to obtain a serial feature map, which is input to the spatial attention output layer, it should be noted that when a plurality of features are connected and input to the spatial attention output layer, the features of which layers are used for connection may be selected as needed, and therefore, the number of feature channels participating in connection may be different; k represents the number of categories of the classification task; ConvBlock represents a spatial attention map; the Spatial attribute map represents a Spatial attention map, the size of the Spatial attention map is m multiplied by n, the Spatial attention map represents a matrix with m rows and n columns, the size of each value can reflect the importance degree of the corresponding position feature, and the sum of all pixels in the Spatial attention map is 1; spatial logs represent Spatial logic, the scale of which is a 3-dimensional matrix of m × n × K, and each position in the Spatial logic corresponds to K values, and it can be understood that there are m × n 1-dimensional vectors of length K in total, that is, m × n squares are directly opposite to one another when viewed from the front side of the image to the direction penetrating the paper surface, and each square represents a vector of length K.
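One plausible PyTorch reading of this layer is sketched below: a convolution block reduces the concatenated features to a single-channel map that is softmax-normalised over the m × n positions (so all pixels sum to 1), a 1×1 convolution produces the m × n × K spatial logits, and the prediction is the attention-weighted sum of the logits. The exact ConvBlock composition is not given in the text, so the layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttentionOutputLayer(nn.Module):
    """Sketch of the spatial attention output layer of fig. 6 (sizes assumed)."""
    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.attn_conv = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)  # ConvBlock
        self.logit_conv = nn.Conv2d(in_ch, num_classes, kernel_size=1)  # spatial logits

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, _, m, n = feats.shape
        attn = torch.softmax(self.attn_conv(feats).view(b, -1), dim=1)  # pixels sum to 1
        attn = attn.view(b, 1, m, n)                  # spatial attention map, m x n
        logits = self.logit_conv(feats)               # b x K x m x n spatial logits
        return (attn * logits).sum(dim=(2, 3))        # b x K aggregated class scores

# usage: feats = torch.cat([f1, f2, f3], dim=1) after resizing the selected
# dense-layer outputs to a common m x n, then score = layer(feats)
```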
S503, determining a first loss function value of the training sample of the face image according to whether the training sample of the face image is a third prediction result of the living body and whether the training sample of the face image is a real result of the living body.
Specifically, the first loss function value of the face image training sample may be calculated with the following preset formula:

$$L_{SAOL} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log y_i^{SAOL} + (1-y_i)\log\left(1-y_i^{SAOL}\right)\right]$$

where L_SAOL denotes the first loss function value, N denotes the number of face image training samples, y_i denotes the real result corresponding to the i-th face image training sample, and y_i^SAOL denotes the third prediction result of whether the i-th training sample is a living body.
S504, determining a second loss function value of the training sample of the face image according to the first prediction result of whether the training sample of the face image is a living body and the real result of whether the training sample of the face image is a living body.
Specifically, the second loss function value of the face image training sample may be calculated with the following preset formula:

$$L_{CLS} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log y_i^{P} + (1-y_i)\log\left(1-y_i^{P}\right)\right]$$

where L_CLS denotes the second loss function value, N denotes the number of face image training samples, y_i denotes the real result corresponding to the i-th face image training sample, and y_i^P denotes the first prediction result of whether the i-th training sample is a living body.
And S505, determining a third loss function value of the training sample of the face image according to a second prediction result of whether the training sample of the face image is a living body in a plurality of scenes and a real result of whether the training sample of the face image is a living body in a plurality of scenes.
Specifically, the third loss function value of the face image training sample may be calculated with the following preset formula:

$$L_{arc\_face} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$

$$\cos\theta_j=\frac{W_j^{T}x_i}{\|W_j\|\,\|x_i\|}$$

where L_arc_face denotes the third loss function value; N denotes the number of face image training samples; s and m are empirical parameters that can be set and changed according to the actual application; n is the total number of categories; y_i denotes the real result corresponding to the i-th face image training sample; and the second formula is the constraint condition of the first, in which || · || is the two-norm operation, W_j denotes the weight vector corresponding to the j-th category label, x_i denotes the input feature vector corresponding to the i-th face image training sample, and θ_j denotes the angle between x_i and the weight vector corresponding to the j-th category.
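A hedged PyTorch sketch of this third, ArcFace-style loss follows; s = 30 and m = 0.5 are common defaults standing in for the empirical parameters, and the clamp guards acos numerically.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """Sketch of L_arc_face; s, m and the class count are empirical parameters."""
    def __init__(self, feat_dim: int, num_classes: int, s: float = 30.0, m: float = 0.5):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # cos(theta_j) = W_j^T x_i / (||W_j|| ||x_i||) via L2 normalisation
        cos = F.linear(F.normalize(x), F.normalize(self.W)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        onehot = F.one_hot(y, cos.size(1)).bool()
        # add the angular margin m only at the target class y_i, then scale by s
        logits = torch.where(onehot, torch.cos(theta + self.m), cos) * self.s
        return F.cross_entropy(logits, y)  # softmax form of L_arc_face
```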
S506, the sum of the first loss function value, the second loss function value, and the third loss function value is used as a final loss function value.
It should be noted that, according to the actual situation, weight parameters may be set for the first loss function value, the second loss function value, and the third loss function value, respectively, and the initial weight parameter defaults to 1, which is not limited herein.
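In code, the final loss is then just the (optionally weighted) sum, e.g.:

```python
# All weights default to 1, as stated above; names are illustrative.
def final_loss(l_saol, l_cls, l_arc, w1=1.0, w2=1.0, w3=1.0):
    return w1 * l_saol + w2 * l_cls + w3 * l_arc
```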
And S404, judging whether the final loss function value meets a preset convergence condition.
The preset convergence condition is preset by a technician and can be adjusted according to an actual application situation, an application scenario, and the like, and is not limited herein.
Specifically, if it is determined that the final loss function value does not satisfy the preset convergence condition, step S405 is executed; if the final loss function value is determined to satisfy the predetermined convergence condition, step S406 is executed.
And S405, adjusting parameters in the deep learning model.
And S406, taking the deep learning model as a human face living body detection model.
It can be understood that, in this embodiment, a preset maximum number of training rounds may be used instead: the deep learning model is trained continuously until the maximum number of rounds is reached, and the model at that point is used as the face living body detection model. A sketch combining both stopping rules follows.
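The following sketch of the fig. 4 procedure assumes `model`, a `train_loader` yielding images with their two labels, and a `compute_final_loss` helper as above; the optimizer and learning rate are illustrative choices, not from the patent.

```python
import torch

def train(model, train_loader, compute_final_loss,
          eps: float = 1e-4, max_rounds: int = 100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(max_rounds):                     # preset maximum rounds
        epoch_loss = 0.0
        for images, gt_label_1, gt_label_2 in train_loader:
            loss = compute_final_loss(model, images, gt_label_1, gt_label_2)
            opt.zero_grad()
            loss.backward()                             # adjust model parameters
            opt.step()
            epoch_loss += loss.item()
        if epoch_loss / len(train_loader) < eps:        # preset convergence condition
            break
    return model                                        # face living body detection model
```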
S103, judging whether the living body prediction value of the face image to be detected is larger than a threshold value.
The threshold is obtained as follows: after training of the face living body detection model is completed, all data in a verification set are input into the trained model to obtain all outputs, and the outputs are normalized to between 0 and 1; starting from 0 and increasing in steps of 1/10000, the living body detection rate under each candidate threshold is counted, and the threshold at which the living body detection rate is largest is taken as the final living body detection threshold of the model. The verification set of face images may be constructed in the same way as the training sample set of face images, and the two sets do not overlap.
It should be understood that the method of determining the threshold is not limited to the above; for example, the threshold at which the equal error rate is reached on the verification set may be taken as the final threshold, or the threshold at which the FAR equals a specific value may be taken.
The step size and the start and end values of the threshold sweep may likewise be determined according to requirements when the threshold is determined.
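The threshold sweep can be sketched as follows; since the patent does not define the "living body detection rate" precisely, classification accuracy is used here as an assumed stand-in.

```python
import numpy as np

def find_threshold(scores: np.ndarray, labels: np.ndarray, step: float = 1e-4):
    """Sweep thresholds from 0 in steps of 1/10000 over verification outputs."""
    best_t, best_rate = 0.0, -1.0
    for t in np.arange(0.0, 1.0 + step, step):
        rate = np.mean((scores > t) == (labels == 1))  # assumed detection rate
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t
```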
Specifically, if the living body prediction value of the face image to be detected is judged to be greater than the threshold value, step S104 is executed; if it is judged to be not greater than the threshold value, step S105 is executed.
And S104, determining the human face in the image to be detected as a living body.
And S105, determining that the human face in the image to be detected is a non-living body.
According to the above scheme, the face living body detection method provided by the application first acquires a face image to be detected; then inputs it into a face living body detection model to obtain a living body prediction value, the model being obtained by training a deep learning model with a plurality of face image training samples comprising living and non-living face images, the deep learning model containing a classification model structure in which each dense network layer fuses original convolution and dilated convolution; and finally, if the living body prediction value is greater than the threshold value, determines that the face in the image is a living body. Both detection speed and detection accuracy are thereby taken into account during face living body detection.
Optionally, in another embodiment of the present application, an implementation manner of the face live detection method, as shown in fig. 7, includes:
and S701, acquiring a face image to be detected.
It should be noted that the specific implementation process of step S701 is the same as the specific implementation process of step S101, and reference may be made to this.
S702, inputting the face image to be detected into a plurality of continuous dense network layers in the face living body detection model to obtain an output value of each dense network layer.
The face living body detection model is obtained by training a deep learning model with a plurality of face image training samples; the training samples comprise a plurality of living face images and a plurality of non-living face images; the deep learning model combines a classification model structure and a spatial attention output layer; and each dense network layer in the classification model structure is a network structure fusing original convolution and dilated convolution.
And S703, concatenating the output values of each dense network layer to obtain a concatenated feature map.
It should be noted that the specific implementation process of step S703 is the same as the specific implementation process of step S501, and reference may be made to this.
And S704, inputting the concatenated feature map into the spatial attention output layer to obtain the living body prediction value of the face image to be detected.
It should be noted that, although step S704 is a step in the actual application process of the face live detection model, and step S502 is a step in the face live detection model construction process, the specific implementation process of step S704 is the same as that of step S502 described above, and therefore, reference may be made to each other.
S705, judging whether the living body prediction value of the face image to be detected is larger than a threshold value.
And S706, determining the human face in the image to be detected as a living body.
And S707, determining that the human face in the image to be detected is a non-living body.
It should be noted that the specific implementation process of steps S705 to S707 is the same as the specific implementation process of steps S103 to S105, and may be referred to each other.
According to the above scheme, the face living body detection method provided by the application first acquires a face image to be detected, then inputs it into a plurality of consecutive dense network layers in the face living body detection model to obtain the output value of each dense network layer, where the model is obtained by training a deep learning model with a plurality of face image training samples comprising living and non-living face images, the deep learning model combines a classification model structure and a spatial attention output layer, and each dense network layer fuses original convolution and dilated convolution; concatenates the output values of each dense network layer to obtain a concatenated feature map; inputs the concatenated feature map into the spatial attention output layer to obtain the living body prediction value of the face image to be detected; and finally, if the living body prediction value is greater than the threshold value, determines that the face in the image is a living body. Both detection speed and detection accuracy are thereby taken into account.
Optionally, in another embodiment of the present application, an implementation manner of the face live detection method, as shown in fig. 8, includes:
and S801, acquiring a face image to be detected.
It should be noted that the specific implementation process of step S801 is the same as the specific implementation process of step S101, and reference may be made to this.
S802, inputting the face image to be detected to a plurality of continuous dense network layers in the face living body detection model to obtain an output value of each dense network layer.
The face living body detection model is obtained by training a deep learning model with a plurality of face image training samples; the training samples comprise a plurality of living face images and a plurality of non-living face images; the deep learning model combines a classification model structure and a spatial attention output layer; and each dense network layer in the classification model structure is a network structure fusing original convolution and dilated convolution.
And S803, concatenating the output values of each dense network layer to obtain a concatenated feature map.
It should be noted that the specific implementation process of step S803 is the same as the specific implementation process of step S501, and reference may be made to this.
S804, inputting the concatenated feature map into the spatial attention output layer to obtain a first living body prediction value of the face image to be detected.
It should be noted that, although step S804 is a step in the actual application of the face living body detection model and step S502 is a step in the model construction process, the specific implementation of step S804 is the same as that of step S502 described above, so reference may be made between them.
And S805, taking the output value of the last of the plurality of consecutive dense network layers in the face living body detection model as a second living body prediction value of the face image to be detected.
It should be noted that the second living body prediction value may also be equal to the living body prediction value of the face image to be detected output in step S102.
And S806, determining the living body prediction value of the face image to be detected according to the first living body prediction value and the second living body prediction value.
Specifically, the average of the first living body prediction value and the second living body prediction value may be used as the living body prediction value of the face image to be detected; alternatively, a weighted average of the two may be used, as sketched below. The options are varied and not limited here, and the weights may be adjusted according to actual use.
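Step S806 then reduces to a one-liner; the weight value here is illustrative.

```python
def fuse_scores(p1: float, p2: float, w: float = 0.5) -> float:
    """Weighted average of the two values; w = 0.5 gives the plain average."""
    return w * p1 + (1.0 - w) * p2
```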
S807, judging whether the living body prediction value of the face image to be detected is larger than a threshold value.
And S808, determining the human face in the image to be detected as a living body.
And S809, determining the human face in the image to be detected as a non-living body.
It should be noted that the specific implementation procedure of steps S807 to S809 is the same as the specific implementation procedure of steps S103 to S105, and reference may be made between them.
According to the above scheme, the face living body detection method provided by the application first acquires a face image to be detected, then inputs it into a plurality of consecutive dense network layers in the face living body detection model to obtain the output value of each dense network layer, where the model is obtained by training a deep learning model with a plurality of face image training samples comprising living and non-living face images, the deep learning model combines a classification model structure and a spatial attention output layer, and each dense network layer fuses original convolution and dilated convolution; concatenates the output values of each dense network layer to obtain a concatenated feature map; inputs the concatenated feature map into the spatial attention output layer to obtain a first living body prediction value of the face image to be detected; takes the output value of the last dense network layer as a second living body prediction value; determines the living body prediction value of the face image to be detected according to the first and second living body prediction values; and finally, if the living body prediction value is greater than the threshold value, determines that the face in the image is a living body. Both detection speed and detection accuracy are thereby taken into account.
Another embodiment of the present application provides a human face live detection device, as shown in fig. 9, specifically including:
an obtaining unit 901, configured to obtain a face image to be detected.
The first input unit 902 is configured to input the face image to be detected into the face living body detection model, so as to obtain a living body prediction value of the face image to be detected.
The face living body detection model is obtained by training a deep learning model with a plurality of face image training samples; the training samples comprise a plurality of living face images and a plurality of non-living face images; the deep learning model contains a classification model structure; and each dense network layer in the classification model structure is a network structure fusing original convolution and dilated convolution.
The first determining unit 903 is configured to determine that a face in the image to be detected is a living body if the predicted value of the living body of the face image to be detected is greater than a threshold value.
For a specific working process of the unit disclosed in the above embodiment of the present application, reference may be made to the content of the corresponding method embodiment, as shown in fig. 1, which is not described herein again.
According to the above scheme, in the face living body detection apparatus provided by the application, the acquisition unit 901 first acquires a face image to be detected; the first input unit 902 then inputs the face image into the face living body detection model to obtain its living body prediction value, the model being obtained by training a deep learning model with a plurality of face image training samples comprising living and non-living face images, the deep learning model containing a classification model structure in which each dense network layer fuses original convolution and dilated convolution; finally, if the living body prediction value is greater than the threshold value, the first determining unit 903 determines that the face in the image to be detected is a living body. Both detection speed and detection accuracy are thereby taken into account.
Optionally, in another embodiment of the present application, an implementation manner of the construction unit of the living human face detection model includes:
and the training sample set constructing unit is used for constructing a training sample set of the face image.
The training sample set of the face images comprises a plurality of training samples of the face images; the training samples of the plurality of face images comprise face images of a plurality of living bodies and face images of a plurality of non-living bodies.
And the second input unit is used for inputting, for each face image training sample, the training sample into a plurality of consecutive dense network layers in the deep learning model to obtain a first prediction result of whether the training sample is a living body and a second prediction result of whether the training sample is a living body in a plurality of scenes.
And the calculating unit is used for calculating the final loss function value of the face image training sample according to the first prediction result of whether the training sample is a living body, the second prediction result of whether the training sample is a living body in a plurality of scenes, the output value of each dense network layer, the real result of whether the training sample is a living body, and the real result of whether the training sample is a living body in a plurality of scenes.
And the adjusting unit is used for adjusting parameters in the deep learning model if the final loss function value does not meet the preset convergence condition until the final loss function value calculated by the adjusted deep learning model meets the preset convergence condition, and taking the deep learning model as the human face living body detection model.
For a specific working process of the unit disclosed in the above embodiment of the present application, reference may be made to the content of the corresponding method embodiment, as shown in fig. 4, which is not described herein again.
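To make the interplay of the second input unit, the calculation unit, and the adjustment unit concrete, the following is a hypothetical training-loop sketch. The batch structure, the convergence test (mean epoch loss below a tolerance), and the `final_loss` argument are all assumptions; one possible form of `final_loss` is sketched after the calculation-unit embodiment below.

```python
import torch

def train_liveness_model(model, loader, optimizer, final_loss,
                         tol=1e-4, max_epochs=100):
    """Hypothetical sketch of the construction unit's training loop."""
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, live_labels, scene_labels in loader:
            # Forward pass through the consecutive dense network layers:
            # first prediction (liveness), second prediction (liveness in
            # several scenes), and each layer's output for the attention
            # branch of the loss.
            first_pred, scene_pred, layer_outputs = model(images)
            loss = final_loss(first_pred, scene_pred, layer_outputs,
                              live_labels, scene_labels)
            optimizer.zero_grad()
            loss.backward()   # adjust the deep learning model's parameters
            optimizer.step()
            epoch_loss += loss.item()
        # Preset convergence condition (assumed form): stop once the mean
        # final loss over an epoch falls below the tolerance.
        if epoch_loss / max(len(loader), 1) < tol:
            break
    return model
```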
Optionally, in another embodiment of the present application, an implementation of the calculation unit specifically includes:
A first merging unit, configured to concatenate the output values of each dense network layer to obtain a concatenated feature map.
A third input unit, configured to input the concatenated feature map into the spatial attention output layer to obtain a third prediction result of whether the training sample of the face image is a living body.
A second determining unit, configured to determine a first loss function value of the training sample of the face image according to the third prediction result of whether the training sample is a living body and the real result of whether the training sample is a living body.
The second determining unit is further configured to determine a second loss function value of the training sample of the face image according to the first prediction result of whether the training sample is a living body and the real result of whether the training sample is a living body.
The second determining unit is further configured to determine a third loss function value of the training sample of the face image according to the second prediction result of whether the training sample is a living body in a plurality of scenes and the real result of whether the training sample is a living body in the plurality of scenes.
A third determining unit, configured to determine the sum of the first loss function value, the second loss function value, and the third loss function value as the final loss function value.
For the specific working process of the units disclosed in the above embodiment of the present application, reference may be made to the corresponding method embodiment shown in fig. 5, which is not repeated here.
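Under the same assumptions, the three loss terms above might be combined as follows; this sketch also yields the `final_loss` used in the training loop earlier. Closing over the spatial attention output layer and using binary cross-entropy for every term are assumptions: the embodiment only specifies which predictions and ground truths feed each term, and that the three values are summed.

```python
import torch
import torch.nn.functional as F

def make_final_loss(attention_head):
    """Builds a final_loss compatible with the training-loop sketch above
    (the closure over the spatial attention output layer is assumed)."""
    def final_loss(first_pred, scene_pred, layer_outputs,
                   live_truth, scene_truth):
        # Concatenate each dense network layer's output into one feature
        # map and obtain the third prediction from the attention layer.
        feat = torch.cat(layer_outputs, dim=1)
        third_pred = attention_head(feat)
        # First loss: third prediction vs. the liveness ground truth.
        l1 = F.binary_cross_entropy(third_pred, live_truth)
        # Second loss: first prediction vs. the liveness ground truth.
        l2 = F.binary_cross_entropy(first_pred, live_truth)
        # Third loss: per-scene prediction vs. per-scene ground truth
        # (both assumed to be (batch, num_scenes) tensors in [0, 1]).
        l3 = F.binary_cross_entropy(scene_pred, scene_truth)
        # Final loss function value: the sum of the three values.
        return l1 + l2 + l3
    return final_loss
```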
Optionally, in another embodiment of the present application, where the deep learning model combines the classification model structure with a spatial attention output layer, an implementation of the first input unit includes:
A first input subunit, configured to input the face image to be detected into a plurality of consecutive dense network layers in the face living body detection model to obtain an output value of each dense network layer.
A second merging unit, configured to concatenate the output values of each dense network layer to obtain a concatenated feature map.
The first input subunit is further configured to input the concatenated feature map into the spatial attention output layer to obtain the living body prediction value of the face image to be detected.
For the specific working process of the units disclosed in the above embodiment of the present application, reference may be made to the corresponding method embodiment shown in fig. 7, which is not repeated here.
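A sketch of this inference path, under the same PyTorch assumptions as above, might look as follows. The internals of the spatial attention output layer (a one-by-one convolution producing a spatial mask, mean pooling, and a linear classifier) are illustrative guesses, since the embodiment does not describe the layer's structure.

```python
import torch
import torch.nn as nn

class SpatialAttentionHead(nn.Module):
    """Assumed form of the spatial attention output layer: a learned
    spatial mask reweights the concatenated feature map, which is then
    pooled and mapped to a single liveness probability."""

    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Conv2d(channels, 1, kernel_size=1)  # spatial weights
        self.fc = nn.Linear(channels, 1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        attn = torch.sigmoid(self.mask(feat))             # (B, 1, H, W)
        weighted = (feat * attn).mean(dim=(2, 3))         # (B, C) pooled
        return torch.sigmoid(self.fc(weighted)).squeeze(1)  # (B,) in (0, 1)

def predict_liveness(dense_layers, head, image):
    # Run the consecutive dense network layers, keeping every output.
    outputs, x = [], image
    for layer in dense_layers:
        x = layer(x)
        outputs.append(x)
    # Concatenate the per-layer outputs along the channel dimension to
    # obtain the concatenated feature map, then apply the attention head.
    feat = torch.cat(outputs, dim=1)
    return head(feat)  # living body prediction value
```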
Optionally, in another embodiment of the present application, where the deep learning model combines the classification model structure with a spatial attention output layer, another implementation of the first input unit includes:
A first input subunit, configured to input the face image to be detected into a plurality of consecutive dense network layers in the face living body detection model to obtain an output value of each dense network layer.
A second merging unit, configured to concatenate the output values of each dense network layer to obtain a concatenated feature map.
The first input subunit is further configured to input the concatenated feature map into the spatial attention output layer to obtain a first living body prediction value of the face image to be detected.
The first input subunit is further configured to take the output value of the last dense network layer among the plurality of consecutive dense network layers in the face living body detection model as a second living body prediction value of the face image to be detected.
A fourth determining unit, configured to determine the living body prediction value of the face image to be detected according to the first living body prediction value and the second living body prediction value.
For the specific working process of the units disclosed in the above embodiment of the present application, reference may be made to the corresponding method embodiment shown in fig. 8, which is not repeated here.
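The embodiment leaves the combination rule to the fourth determining unit. A simple average followed by the threshold test of the first determining unit is one plausible reading, sketched below; both the averaging and the 0.5 default threshold are assumptions.

```python
import torch

def is_live(first_pred: torch.Tensor, second_pred: torch.Tensor,
            threshold: float = 0.5) -> torch.Tensor:
    # Combine the two liveness prediction values (averaging is assumed)
    # and compare against the threshold, as in the first determining unit.
    score = 0.5 * (first_pred + second_pred)
    return score > threshold
```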
Another embodiment of the present application provides an electronic device, as shown in fig. 10, including:
one or more processors 1001; and
a storage device 1002 on which one or more programs are stored.
When the one or more programs are executed by the one or more processors 1001, the one or more processors 1001 implement the method of any of the above embodiments.
Another embodiment of the present application provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method as described in any of the above embodiments.
In the above embodiments disclosed in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present disclosure may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part. The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a live broadcast device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description of the disclosed embodiments enables those skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A face living body detection method is characterized by comprising the following steps:
acquiring a face image to be detected;
inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image to be detected; wherein the face living body detection model is obtained by training a deep learning model with training samples of a plurality of face images; the training samples of the face images comprise a plurality of living face images and a plurality of non-living face images; the deep learning model is a deep learning model containing a classification model structure; and each dense network layer in the classification model structure is a network structure that fuses standard convolution and dilated convolution;
and if the living body prediction value of the face image to be detected is greater than a threshold, determining that the face in the image to be detected is a living body.
2. The face living body detection method according to claim 1, wherein the construction method of the face living body detection model comprises the following steps:
constructing a training sample set of face images; wherein the training sample set of face images comprises training samples of a plurality of face images; the training samples comprise a plurality of living face images and a plurality of non-living face images;
for the training sample of each face image, inputting the training sample of the face image into a plurality of consecutive dense network layers in a deep learning model to obtain a first prediction result of whether the training sample of the face image is a living body and a second prediction result of whether the training sample of the face image is a living body in a plurality of scenes;
calculating a final loss function value of the training sample of the face image according to the first prediction result of whether the training sample of the face image is a living body, the second prediction result of whether the training sample of the face image is a living body in a plurality of scenes, an output value of each dense network layer, a real result of whether the training sample of the face image is a living body, and a real result of whether the training sample of the face image is a living body in a plurality of scenes;
and if the final loss function value does not satisfy a preset convergence condition, adjusting parameters in the deep learning model until the final loss function value calculated by the adjusted deep learning model satisfies the preset convergence condition, and taking the deep learning model as the face living body detection model.
3. The method according to claim 2, wherein the calculating a final loss function value of the training sample of the face image according to the first prediction result of whether the training sample of the face image is a living body, the second prediction result of whether the training sample of the face image is a living body in a plurality of scenes, the output value of each dense network layer, the real result of whether the training sample of the face image is a living body, and the real result of whether the training sample of the face image is a living body in a plurality of scenes comprises:
concatenating the output values of each dense network layer to obtain a concatenated feature map;
inputting the concatenated feature map into a spatial attention output layer to obtain a third prediction result of whether the training sample of the face image is a living body;
determining a first loss function value of the training sample of the face image according to a third prediction result of whether the training sample of the face image is a living body and a real result of whether the training sample of the face image is a living body;
determining a second loss function value of the training sample of the face image according to a first prediction result of whether the training sample of the face image is a living body and a real result of whether the training sample of the face image is a living body;
determining a third loss function value of the training sample of the face image according to a second prediction result of whether the training sample of the face image is a living body in a plurality of scenes and a real result of whether the training sample of the face image is a living body in a plurality of scenes;
and determining a sum of the first loss function value, the second loss function value, and the third loss function value as the final loss function value.
4. The method according to claim 1, wherein the deep learning model is a deep learning model combining a classification model structure and a spatial attention output layer, and the inputting the face image to be detected into the face living body detection model to obtain the living body prediction value of the face image to be detected comprises:
inputting the face image to be detected into a plurality of consecutive dense network layers in the face living body detection model to obtain an output value of each dense network layer;
concatenating the output values of each dense network layer to obtain a concatenated feature map;
and inputting the concatenated feature map into the spatial attention output layer to obtain the living body prediction value of the face image to be detected.
5. The method according to claim 1, wherein the deep learning model is a deep learning model combining a classification model structure and a spatial attention output layer, and the inputting the face image to be detected into the face living body detection model to obtain the living body prediction value of the face image to be detected comprises:
inputting the face image to be detected into a plurality of consecutive dense network layers in the face living body detection model to obtain an output value of each dense network layer;
concatenating the output values of each dense network layer to obtain a concatenated feature map;
inputting the concatenated feature map into the spatial attention output layer to obtain a first living body prediction value of the face image to be detected;
taking the output value of the last dense network layer among the plurality of consecutive dense network layers in the face living body detection model as a second living body prediction value of the face image to be detected;
and determining the living body prediction value of the face image to be detected according to the first living body prediction value and the second living body prediction value.
6. A face living body detection device, comprising:
the acquisition unit is used for acquiring a face image to be detected;
the first input unit is used for inputting the face image to be detected into a face living body detection model to obtain a living body prediction value of the face image to be detected; wherein the face living body detection model is obtained by training a deep learning model with training samples of a plurality of face images; the training samples of the face images comprise a plurality of living face images and a plurality of non-living face images; the deep learning model is a deep learning model containing a classification model structure; and each dense network layer in the classification model structure is a network structure that fuses standard convolution and dilated convolution;
and the first determining unit is used for determining that the face in the image to be detected is a living body if the living body prediction value of the face image to be detected is greater than a threshold.
7. The face living body detection device according to claim 6, wherein the construction unit of the face living body detection model comprises:
the training sample set construction unit is used for constructing a training sample set of face images; wherein the training sample set of face images comprises training samples of a plurality of face images; the training samples comprise a plurality of living face images and a plurality of non-living face images;
the second input unit is used for, for the training sample of each face image, inputting the training sample of the face image into a plurality of consecutive dense network layers in a deep learning model to obtain a first prediction result of whether the training sample of the face image is a living body and a second prediction result of whether the training sample of the face image is a living body in a plurality of scenes;
the calculation unit is used for calculating a final loss function value of the training sample of the face image according to the first prediction result of whether the training sample of the face image is a living body, the second prediction result of whether the training sample of the face image is a living body in a plurality of scenes, an output value of each dense network layer, a real result of whether the training sample of the face image is a living body, and a real result of whether the training sample of the face image is a living body in a plurality of scenes;
and the adjustment unit is used for adjusting parameters in the deep learning model if the final loss function value does not satisfy a preset convergence condition, until the final loss function value calculated by the adjusted deep learning model satisfies the preset convergence condition, and taking the deep learning model as the face living body detection model.
8. The face living body detection device according to claim 7, wherein the calculation unit comprises:
the first merging unit is used for concatenating the output values of each dense network layer to obtain a concatenated feature map;
the third input unit is used for inputting the concatenated feature map into a spatial attention output layer to obtain a third prediction result of whether the training sample of the face image is a living body;
a second determining unit, configured to determine a first loss function value of the training sample of the face image according to a third prediction result of whether the training sample of the face image is a living body and a real result of whether the training sample of the face image is a living body;
the second determining unit is further configured to determine a second loss function value of the training sample of the face image according to a first prediction result of whether the training sample of the face image is a living body and a real result of whether the training sample of the face image is a living body;
the second determining unit is further configured to determine a third loss function value of the training sample of the face image according to a second prediction result of whether the training sample of the face image is a living body in multiple scenes and a real result of whether the training sample of the face image is a living body in multiple scenes;
a third determining unit configured to determine a sum of the first loss function value, the second loss function value, and the third loss function value as the final loss function value.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
10. A computer storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 5.
CN202011526858.3A 2020-12-22 2020-12-22 Face living body detection method and device, electronic equipment and computer storage medium Pending CN112597885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011526858.3A CN112597885A (en) 2020-12-22 2020-12-22 Face living body detection method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN112597885A (en) 2021-04-02

Family

ID=75200061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011526858.3A Pending CN112597885A (en) 2020-12-22 2020-12-22 Face living body detection method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN112597885A (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017000116A1 (en) * 2015-06-29 2017-01-05 北京旷视科技有限公司 Living body detection method, living body detection system, and computer program product
US20180025217A1 (en) * 2016-07-22 2018-01-25 Nec Laboratories America, Inc. Liveness detection for antispoof face recognition
CN107358157A (en) * 2017-06-07 2017-11-17 阿里巴巴集团控股有限公司 A kind of human face in-vivo detection method, device and electronic equipment
US20190026575A1 (en) * 2017-07-20 2019-01-24 Baidu Online Network Technology (Beijing) Co., Ltd. Living body detecting method and apparatus, device and storage medium
CN107679477A (en) * 2017-09-27 2018-02-09 深圳市未来媒体技术研究院 Face depth and surface normal Forecasting Methodology based on empty convolutional neural networks
CN108596041A (en) * 2018-03-28 2018-09-28 中科博宏(北京)科技有限公司 A kind of human face in-vivo detection method based on video
CN108898112A (en) * 2018-07-03 2018-11-27 东北大学 A kind of near-infrared human face in-vivo detection method and system
CN109255322A (en) * 2018-09-03 2019-01-22 北京诚志重科海图科技有限公司 A kind of human face in-vivo detection method and device
CN109409322A (en) * 2018-11-09 2019-03-01 北京京东尚科信息技术有限公司 Biopsy method, device and face identification method and face detection system
WO2020199577A1 (en) * 2019-03-29 2020-10-08 北京市商汤科技开发有限公司 Method and device for living body detection, equipment, and storage medium
CN111860078A (en) * 2019-04-30 2020-10-30 北京眼神智能科技有限公司 Face silence living body detection method and device, readable storage medium and equipment
CN110427828A (en) * 2019-07-05 2019-11-08 中国平安人寿保险股份有限公司 Human face in-vivo detection method, device and computer readable storage medium
CN110674759A (en) * 2019-09-26 2020-01-10 深圳市捷顺科技实业股份有限公司 Monocular face in-vivo detection method, device and equipment based on depth map
CN110929569A (en) * 2019-10-18 2020-03-27 平安科技(深圳)有限公司 Face recognition method, device, equipment and storage medium
CN111310724A (en) * 2020-03-12 2020-06-19 苏州科达科技股份有限公司 In-vivo detection method and device based on deep learning, storage medium and equipment
CN111539942A (en) * 2020-04-28 2020-08-14 中国科学院自动化研究所 Method for detecting face depth tampered image based on multi-scale depth feature fusion
CN112001240A (en) * 2020-07-15 2020-11-27 浙江大华技术股份有限公司 Living body detection method, living body detection device, computer equipment and storage medium
CN112070158A (en) * 2020-09-08 2020-12-11 哈尔滨工业大学(威海) Facial flaw detection method based on convolutional neural network and bilateral filtering

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ABDULKADIR ŞENGÜR et al.: "Deep Feature Extraction for Face Liveness Detection", 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), 24 January 2019 (2019-01-24), pages 4 *
KOSHY, RANJANA, et al.: "Enhanced Deep Learning Architectures for Face Liveness Detection for Static and Video Sequences", Entropy, vol. 22, no. 10, 21 October 2020 (2020-10-21), pages 1186 *
TONG Yueyang: "Research on Live Face Detection Algorithms Based on Convolutional Neural Networks", China Masters' Theses Full-text Database, Information Science and Technology, no. 6, 15 June 2019 (2019-06-15), pages 138-614 *
WU Xiaoli: "Research on Vision-Based Face Liveness Detection Methods", China Masters' Theses Full-text Database, Information Science and Technology, no. 3, 15 March 2022 (2022-03-15), pages 138-2421 *
JIANG Xinkui: "Research on Liveness Recognition Methods Combined with Face Detection", China Masters' Theses Full-text Database, Information Science and Technology, no. 8, 15 August 2019 (2019-08-15), pages 138-1071 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990090A (en) * 2021-04-09 2021-06-18 北京华捷艾米科技有限公司 Face living body detection method and device
CN113052144A (en) * 2021-04-30 2021-06-29 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113052144B (en) * 2021-04-30 2023-02-28 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113221842A (en) * 2021-06-04 2021-08-06 第六镜科技(北京)有限公司 Model training method, image recognition method, device, equipment and medium
CN113221842B (en) * 2021-06-04 2023-12-29 第六镜科技(北京)集团有限责任公司 Model training method, image recognition method, device, equipment and medium
CN113283376A (en) * 2021-06-10 2021-08-20 泰康保险集团股份有限公司 Face living body detection method, face living body detection device, medium and equipment
CN113283376B (en) * 2021-06-10 2024-02-09 泰康保险集团股份有限公司 Face living body detection method, face living body detection device, medium and equipment
WO2023000792A1 (en) * 2021-07-22 2023-01-26 京东科技控股股份有限公司 Methods and apparatuses for constructing living body identification model and for living body identification, device and medium
CN114821823A (en) * 2022-04-12 2022-07-29 马上消费金融股份有限公司 Image processing, training of human face anti-counterfeiting model and living body detection method and device
CN114821823B (en) * 2022-04-12 2023-07-25 马上消费金融股份有限公司 Image processing, training of human face anti-counterfeiting model and living body detection method and device
CN115131880A (en) * 2022-05-30 2022-09-30 上海大学 Multi-scale attention fusion double-supervision human face in-vivo detection method

Similar Documents

Publication Publication Date Title
CN112597885A (en) Face living body detection method and device, electronic equipment and computer storage medium
CN109902546B (en) Face recognition method, face recognition device and computer readable medium
CN108062562B (en) Object re-recognition method and device
US20170345181A1 (en) Video monitoring method and video monitoring system
CN110705478A (en) Face tracking method, device, equipment and storage medium
US11816880B2 (en) Face recognition method and apparatus, computer device, and storage medium
CN109766785B (en) Living body detection method and device for human face
CN108491848B (en) Image saliency detection method and device based on depth information
WO2020199611A1 (en) Liveness detection method and apparatus, electronic device, and storage medium
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN106709404A (en) Image processing device and image processing method
CN111368751A (en) Image processing method, image processing device, storage medium and electronic equipment
US10915739B2 (en) Face recognition device, face recognition method, and computer readable storage medium
CN109784277B (en) Emotion recognition method based on intelligent glasses
CN104850857B (en) Across the video camera pedestrian target matching process of view-based access control model spatial saliency constraint
CN111325107B (en) Detection model training method, device, electronic equipment and readable storage medium
CN111062263A (en) Method, device, computer device and storage medium for hand pose estimation
CN107704813A (en) A kind of face vivo identification method and system
CN110728242A (en) Image matching method and device based on portrait recognition, storage medium and application
Xia et al. Face recognition and application of film and television actors based on Dlib
CN113128428B (en) Depth map prediction-based in vivo detection method and related equipment
CN113205072A (en) Object association method and device and electronic equipment
CN113793251A (en) Pose determination method and device, electronic equipment and readable storage medium
CN110929583A (en) High-detection-precision face recognition method
CN112990009A (en) End-to-end-based lane line detection method, device, equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination