CN108416324B - Method and apparatus for detecting living body - Google Patents

Method and apparatus for detecting living body

Info

Publication number
CN108416324B
Authority
CN
China
Prior art keywords: initial, living body, face image, image, model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810258183.5A
Other languages
Chinese (zh)
Other versions
CN108416324A (en)
Inventor
武慧敏
洪智滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd filed Critical Baidu Online Network Technology Beijing Co Ltd
Priority to CN201810258183.5A priority Critical patent/CN108416324B/en
Publication of CN108416324A publication Critical patent/CN108416324A/en
Application granted granted Critical
Publication of CN108416324B publication Critical patent/CN108416324B/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 — Detection; Localisation; Normalisation
    • G06V40/168 — Feature extraction; Face representation
    • G06V40/172 — Classification, e.g. identification
    • G06V40/40 — Spoof detection, e.g. liveness detection
    • G06V40/45 — Detection of the body part being alive

Abstract

The embodiment of the application discloses a method and a device for detecting a living body. One embodiment of the method comprises: acquiring a face image to be detected; inputting the face image to be detected into a pre-trained feature extraction model to obtain image features corresponding to the face image to be detected, wherein the feature extraction model is used for extracting features of a face image; and inputting the obtained image features into a pre-trained living body detection model to obtain a living body detection result corresponding to the face image to be detected, wherein the living body detection result is used for representing whether the face in the face image is a living body face, and the living body detection model is used for representing the corresponding relation between image features and living body detection results. This embodiment improves the convenience of living body detection for the user and increases the speed of living body detection.

Description

Method and apparatus for detecting living body
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for detecting a living body.
Background
In liveness detection, a video of a user making a specified action (e.g., nodding, shaking the head, raising or lowering the head, blinking, etc.) may first be recorded, and the recorded video may then be analyzed to give a liveness detection result. However, requiring the user to make the specified action is inconvenient, and analyzing the video also takes a relatively long time.
Disclosure of Invention
The embodiment of the application provides a method and a device for detecting a living body.
In a first aspect, an embodiment of the present application provides a method for detecting a living body, the method including: acquiring a face image to be detected; inputting the face image to be detected into a pre-trained feature extraction model to obtain image features corresponding to the face image to be detected, wherein the feature extraction model is used for extracting features of a face image; and inputting the obtained image features into a pre-trained living body detection model to obtain a living body detection result corresponding to the face image to be detected, wherein the living body detection result is used for representing whether the face in the face image is a living body face, and the living body detection model is used for representing the corresponding relation between image features and living body detection results.
In some embodiments, the feature extraction model is trained by a first training step as follows: acquiring a living body face image set; for the living body face images in the living body face image set, executing the following first parameter adjusting step: inputting the living body face image into an initial feature extraction model to obtain image features corresponding to the living body face image; inputting the obtained image features into an initial generator to obtain a generated face image, wherein the initial generator is a generator in an initial generative adversarial network; adjusting parameters of the initial feature extraction model and the initial generator based on the similarity between the obtained generated face image and the living body face image; and determining the initial feature extraction model as the feature extraction model.
In some embodiments, after adjusting the parameters of the initial feature extraction model and the initial generator based on the similarity between the obtained generated face image and the living body face image, the first parameter adjusting step further includes: respectively inputting the obtained generated face image and the living body face image into an initial discriminator to obtain a first discrimination result and a second discrimination result, wherein the initial discriminator is a discriminator in the initial generative adversarial network, and the first discrimination result and the second discrimination result are respectively used for representing whether the obtained generated face image and the living body face image are real face images; and adjusting parameters of the initial feature extraction model, the initial generator, and the initial discriminator based on a first difference between the first discrimination result and a discrimination result indicating that the image input to the initial discriminator is not a real face image, and a second difference between the second discrimination result and a discrimination result indicating that the image input to the initial discriminator is a real face image.
In some embodiments, before the first parameter adjusting step is performed for the living body face images in the living body face image set, the first training step further comprises: determining model structure information of the initial feature extraction model and network structure information of the initial generative adversarial network, and initializing model parameters of the initial feature extraction model and network parameters of the initial generative adversarial network.
In some embodiments, the in-vivo detection model is trained by the following second training step: acquiring a training sample set, wherein each training sample comprises a sample face image and labeling information used for indicating whether the face in the sample face image is a living body face; determining model structure information of the initial living body detection model and initializing model parameters of the initial living body detection model; for the training samples in the training sample set, performing the following second parameter adjusting step: inputting the sample face image in the training sample into a feature extraction model to obtain an image feature corresponding to the training sample; inputting the obtained image characteristics into an initial living body detection model to obtain a living body detection result corresponding to the obtained image characteristics; adjusting model parameters of an initial in-vivo detection model based on the difference between the obtained in-vivo detection result and the labeling information in the training sample; and determining the initial living body detection model as the living body detection model.
In some embodiments, the feature extraction model is a convolutional neural network.
In some embodiments, the liveness detection model includes a fully connected layer and a classifier.
In a second aspect, an embodiment of the present application provides an apparatus for detecting a living body, the apparatus including: the acquisition unit is configured to acquire a face image to be detected; the characteristic extraction unit is configured to input the facial image to be detected into a pre-trained characteristic extraction model to obtain image characteristics corresponding to the facial image to be detected, wherein the characteristic extraction model is used for extracting the characteristics of the facial image; and the living body detection unit is configured to input the obtained image characteristics into a pre-trained living body detection model to obtain a living body detection result corresponding to the face image to be detected, wherein the living body detection result is used for representing whether the face in the face image is a living body face, and the living body detection model is used for representing the corresponding relation between the image characteristics and the living body detection result.
In some embodiments, the feature extraction model is trained by a first training step as follows: acquiring a living body face image set; for the living body face images in the living body face image set, executing the following first parameter adjusting step: inputting the living body face image into an initial feature extraction model to obtain image features corresponding to the living body face image; inputting the obtained image features into an initial generator to obtain a generated face image, wherein the initial generator is a generator in an initial generative adversarial network; adjusting parameters of the initial feature extraction model and the initial generator based on the similarity between the obtained generated face image and the living body face image; and determining the initial feature extraction model as the feature extraction model.
In some embodiments, after adjusting the parameters of the initial feature extraction model and the initial generator based on the similarity between the obtained generated face image and the living body face image, the first parameter adjusting step further includes: respectively inputting the obtained generated face image and the living body face image into an initial discriminator to obtain a first discrimination result and a second discrimination result, wherein the initial discriminator is a discriminator in the initial generative adversarial network, and the first discrimination result and the second discrimination result are respectively used for representing whether the obtained generated face image and the living body face image are real face images; and adjusting parameters of the initial feature extraction model, the initial generator, and the initial discriminator based on a first difference between the first discrimination result and a discrimination result indicating that the image input to the initial discriminator is not a real face image, and a second difference between the second discrimination result and a discrimination result indicating that the image input to the initial discriminator is a real face image.
In some embodiments, before the first parameter adjusting step is performed for the living body face images in the living body face image set, the first training step further comprises: determining model structure information of the initial feature extraction model and network structure information of the initial generative adversarial network, and initializing model parameters of the initial feature extraction model and network parameters of the initial generative adversarial network.
In some embodiments, the in-vivo detection model is trained by the following second training step: acquiring a training sample set, wherein each training sample comprises a sample face image and labeling information used for indicating whether the face in the sample face image is a living body face; determining model structure information of the initial living body detection model and initializing model parameters of the initial living body detection model; for the training samples in the training sample set, performing the following second parameter adjusting step: inputting the sample face image in the training sample into a feature extraction model to obtain an image feature corresponding to the training sample; inputting the obtained image characteristics into an initial living body detection model to obtain a living body detection result corresponding to the obtained image characteristics; adjusting model parameters of an initial in-vivo detection model based on the difference between the obtained in-vivo detection result and the labeling information in the training sample; and determining the initial living body detection model as the living body detection model.
In some embodiments, the feature extraction model is a convolutional neural network.
In some embodiments, the liveness detection model includes a fully connected layer and a classifier.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and the device for detecting a living body provided by the embodiments of the application, the image features of the face image to be detected are extracted, and the obtained image features are then input into a pre-trained living body detection model, so that the living body detection result corresponding to the face image to be detected is obtained. Living body detection can be realized by acquiring only a face image of the user, without acquiring video of the user making a specified action, which improves the convenience of living body detection for the user.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for detecting a living subject according to the present application;
FIG. 3 is a flow diagram for one embodiment of a first training step for training a feature extraction model according to the present application;
FIG. 4 is a flow diagram of one embodiment of a second training step for training a liveness detection model according to the present application;
FIG. 5 is a schematic diagram of the structure of one embodiment of a device for detecting living organisms according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for detecting a living body or the apparatus for detecting a living body of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications, such as an image acquisition application, an image processing application, a living body detection application, a search application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented as a plurality of pieces of software or software modules (for example, to provide an image acquisition service or a living body detection service), or as a single piece of software or software module. This is not specifically limited herein.
The server 105 may be a server that provides various services, such as a detection server that performs live body detection on a face image to be detected uploaded by the terminal apparatuses 101, 102, 103. The detection server may analyze and otherwise process the received data such as the face image to be detected, and feed back a processing result (e.g., a living body detection result) to the terminal device.
It should be noted that the method for detecting a living body provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for detecting a living body is generally disposed in the server 105.
It should be noted that the server 105 may also store the face image to be detected locally, and the server 105 may directly extract the locally stored face image to be detected for living body detection; in this case, the exemplary system architecture 100 may not include the terminal devices 101, 102, and 103 and the network 104.
It should also be noted that the terminal devices 101, 102, 103 may also be installed with a living body detection application, and the terminal devices 101, 102, 103 may also perform living body detection based on the face image to be detected, in which case, the method for detecting a living body may also be executed by the terminal devices 101, 102, 103, and accordingly, the apparatus for detecting a living body may also be installed in the terminal devices 101, 102, 103. At this point, the exemplary system architecture 100 may also not include the server 105 and the network 104.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (for example, to provide a liveness detection service), or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for detecting a living subject according to the present application is shown. The method for detecting a living body includes the steps of:
step 201, obtaining a face image to be detected.
In the present embodiment, an execution subject (for example, a server shown in fig. 1) of the method for detecting a living body may acquire a face image to be detected.
Here, the face image to be detected may be uploaded to the execution subject by a terminal device (e.g., terminal devices 101, 102, 103 shown in fig. 1) in communication connection with the execution subject (e.g., the server shown in fig. 1) through a wired connection manner or a wireless connection manner. At this time, a camera may be mounted in the terminal device (for example, a mobile phone) which is in communication connection with the execution subject. The terminal device can control the camera installed in the terminal device to shoot a face image of the user, and send the shot face image to the execution subject. In this way, the execution subject may use the image received from the terminal device as the face image to be detected. It should be noted that the wireless connection manner may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra Wideband) connection, and other wireless connection manners now known or developed in the future.
The face image to be detected may also be stored locally by the execution subject. For example, when the execution subject is a terminal device, a camera may be mounted in the terminal device. The terminal device can control the camera installed in the terminal device to shoot a face image of the user, store the shot face image locally, and acquire the locally stored face image as the face image to be detected.
Step 202, inputting the face image to be detected into a pre-trained feature extraction model to obtain image features corresponding to the face image to be detected.
In this embodiment, the executing entity (for example, the server shown in fig. 1) may input the face image to be detected obtained in step 201 into the pre-trained feature extraction model, so as to obtain the image features corresponding to the face image to be detected.
Here, the feature extraction model trained in advance may be various models for extracting image features. The image features may also be various features including, but not limited to, color features, texture features, two-dimensional shape features, two-dimensional spatial relationship features, three-dimensional shape features, three-dimensional spatial relationship features, facial features, shape features of five sense organs, position and scale features of five sense organs, and the like.
In some optional implementations of this embodiment, the feature extraction model may be a convolutional neural network. Here, a Convolutional Neural Network (CNN) may include at least one convolutional layer, which may be used to extract image features, and at least one pooling layer, which may be used to down-sample the input information. In practice, a convolutional neural network is a feed-forward neural network whose artificial neurons can respond to surrounding units within a local receptive field, and it performs excellently on image processing, so a convolutional neural network can be used to extract image features, which can be various basic elements of an image (such as colors, lines, textures, and the like). The image features corresponding to the face image to be detected are used to represent the features in the face image to be detected while also reducing the dimensionality of the face image to be detected, thereby reducing the amount of subsequent computation. In practice, the convolutional neural network may further include an activation function layer, which performs nonlinear computation on the input information using various nonlinear activation functions (e.g., the ReLU (Rectified Linear Unit) function, the Sigmoid function, etc.).
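As an illustration only, the following sketch shows what such a convolutional feature extraction model could look like; the framework (PyTorch), the layer and kernel sizes, and the 128-dimensional feature vector are assumptions made for the example and are not specified by the application.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Illustrative convolutional feature extraction model.

    Convolutional layers extract image features, pooling layers down-sample the
    input, and ReLU layers apply the nonlinear activation, mirroring the layer
    types described above. All sizes are assumptions for the example.
    """
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # convolutional layer: extract features
            nn.ReLU(),                                     # activation function layer
            nn.MaxPool2d(2),                               # pooling layer: down-sample
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.project = nn.Linear(64 * 4 * 4, feature_dim)  # flatten to a lower-dimensional feature vector

    def forward(self, face_image: torch.Tensor) -> torch.Tensor:
        x = self.backbone(face_image)
        return self.project(x.flatten(1))

# Example: a batch with one 3-channel 128x128 face image yields a 128-dimensional feature vector.
features = FeatureExtractor()(torch.randn(1, 3, 128, 128))
```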
In some optional implementations of the present embodiment, the feature extraction model may be obtained by training through the following first training step. Referring to fig. 3, fig. 3 shows a flow 300 of one embodiment of a first training step for training a feature extraction model according to the present application, which may include the following steps 301 to 303:
step 301, acquiring a living human face image set.
Here, the execution subject of the first training step may be the same as or different from the execution subject of the method for detecting a living body. If they are the same, the execution subject of the first training step may store the model structure information of the trained feature extraction model and the parameter values of its model parameters locally after the feature extraction model is trained. If they are different, the execution subject of the first training step may send the model structure information of the trained feature extraction model and the parameter values of its model parameters to the execution subject of the method for detecting a living body after the feature extraction model is trained.
Here, the execution subject of the first training step may acquire, locally or from another electronic device network-connected to the execution subject of the first training step, a set of living body face images, each of which is an image obtained by photographing a living body face.
Step 302, for the living face image in the living face image set, executing a first parameter adjusting step.
Here, the executing subject of the first training step may execute the first parameter adjustment step with respect to the living face images in the set of living face images acquired in step 301. Wherein, the first parameter adjusting step may include the following sub-steps 3021 to 3023:
and a substep 3021, inputting the living body face image into an initial feature extraction model to obtain an image feature corresponding to the living body face image.
Here, the executing subject of the first training step may input the living body face image into an initial feature extraction model, resulting in an image feature corresponding to the living body face image. Here, the initial feature extraction model may be a model for feature extraction that is predetermined for training the feature extraction model.
Optionally, the executing agent of the first training step may execute the following first initialization operation before executing step 302:
first, model structure information of an initial feature extraction model may be determined. It is to be understood that, since the initial feature extraction model may include various types of models for extracting image features, model structure information required to be determined is also different for different types of models for extracting image features. Alternatively, the initial feature extraction model may be a convolutional neural network. Since the convolutional neural network is a multi-layer neural network, each layer is composed of a plurality of two-dimensional planes, and each plane is composed of a plurality of independent neurons, it is necessary to determine which layers (e.g., convolutional layers, pooling layers, excitation function layers, etc.), the connection order relationship between layers, and which parameters (e.g., weight, bias, convolution step size) each layer includes, etc. the initial feature extraction model of the convolutional neural network type includes. Among other things, convolutional layers may be used to extract image features. For each convolution layer, it can be determined how many convolution kernels exist, the size of each convolution kernel, the weight of each neuron in each convolution kernel, the bias term corresponding to each convolution kernel, the step length between two adjacent convolutions, whether padding is needed, how many pixel points are padded, and the number value for padding (generally, the padding is 0), etc. While the pooling layer may be used to Down-Sample (Down Sample) the input information to compress the amount of data and parameters to reduce overfitting. For each pooling layer, a pooling method for that pooling layer may be determined (e.g., taking a region average or taking a region maximum). The excitation function layer is used for carrying out nonlinear calculation on input information. A specific excitation function may be determined for each excitation function layer. For example, the activation function may be a ReLU and various variants of ReLU activation functions, a Sigmoid function, a Tanh (hyperbolic tangent) function, a Maxout function, and so on. In practice, a Convolutional Neural Network (CNN) is a feed-forward Neural Network whose artificial neurons can respond to a part of surrounding cells within a coverage range, and has excellent performance on image processing, so that the Convolutional Neural Network can be used for extracting image features, which can be various basic elements of an image (such as color, lines, textures and the like).
Alternatively, the initial Feature extraction Model may also be an Active Shape Model (ASM), a Principal Component Analysis (PCA) Model, an Independent Component Analysis (ICA) Model, a Linear Discriminant Analysis (LDA) Model, a Local Feature Analysis (LFA) Model, or the like, for extracting features of the face image. Correspondingly, the model structure information to be determined is different corresponding to different feature extraction models.
Model parameters of the initial feature extraction model may then be initialized. In practice, the model parameters of the initial feature extraction model may be initialized with a number of different small random numbers. Using small random numbers ensures that the model does not enter a saturation state because the weights are too large, which would cause training to fail, and using different values ensures that the model can learn normally.
In practice, because the specific feature extraction models differ, the obtained image features corresponding to the living body face image may be in the form of a feature map or a feature vector.
And a substep 3022 of inputting the obtained image features into an initial generator to obtain a generated face image.
In this embodiment, the executing subject of the first training step may input the image features obtained in sub-step 3021 into the initial generator to obtain a generated face image. Here, the initial generator is a generator in an initial generative adversarial network. The initial generative adversarial network may be a Generative Adversarial Network (GAN) predetermined for training the feature extraction model, and it includes the initial generator, which is used for generating an image, and an initial discriminator, which is used for determining whether an input image is a generated image or a real image.
In some optional implementations of this embodiment, the executing subject of the first training step may perform the following second initialization operation before performing step 302:
first, network configuration information for initially generating the counterparty network may be determined.
Here, since the initially generated countermeasure network includes an initial generator and an initial discriminator. Thus, here, the execution subject of the first training step may determine the network structure information of the initial generator, and determine the network structure information of the initial discriminator.
It is understood that the initial generator and the initial arbiter can be various neural networks, and for this purpose, it can be determined which neural network the initial generator and the initial arbiter are, respectively, including several layers of neurons, how many neurons there are in each layer, the connection order relationship between the neurons in each layer, which parameters each layer of neurons includes, the corresponding activation function type of each layer of neurons, and so on. It will be appreciated that the network structure information that needs to be determined is different for different neural network types.
Parameter values for network parameters of the initial generator and the initial arbiter in the initially generated counterpoise network may then be initialized. In practice, the various network parameters of the initial generator and the initial arbiter may be initialized with some different small random numbers. The small random number is used for ensuring that the network does not enter a saturation state due to overlarge weight value, so that training fails, and the different random numbers are used for ensuring that the network can normally learn.
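For concreteness, the sketch below shows one possible structure for the initial generator and the initial discriminator; the use of transposed convolutions, the 128-dimensional feature input, the 128x128 output resolution, and the layer widths are assumptions chosen for illustration rather than structures prescribed by the application.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Illustrative initial generator: maps an image feature vector to a generated face image."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 64 * 16 * 16)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 32x32 -> 64x64
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1),   # 64x64 -> 128x128
            nn.Tanh(),
        )

    def forward(self, image_features):
        x = self.fc(image_features).view(-1, 64, 16, 16)
        return self.deconv(x)

class Discriminator(nn.Module):
    """Illustrative initial discriminator: outputs the probability that an input image is a real face image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )

    def forward(self, face_image):
        return self.net(face_image)
```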
And a substep 3023 of adjusting parameters of the initial feature extraction model and the initial generator based on the obtained similarity between the generated face image and the live face image.
In this embodiment, the executing subject of the first training step may adjust the parameters of the initial feature extraction model and the initial generator based on the obtained similarity between the generated face image and the live face image.
In practice, an objective function can be set with the goal of maximizing the similarity between the obtained generated face image and the living body face image, a preset optimization algorithm can then be used to adjust the parameters of the initial feature extraction model and the initial generator so as to optimize the objective function, and the first parameter adjusting step is ended when a first preset training end condition is met. For example, here, the first preset training end condition may include, but is not limited to: the training time exceeds a preset duration; the number of times the first parameter adjusting step has been performed exceeds a preset number; the similarity between the obtained generated face image and the living body face image is greater than a preset similarity threshold.
Here, the preset optimization algorithm may include, but is not limited to, Gradient Descent, Newton's method, Quasi-Newton methods, the Conjugate Gradient method, heuristic optimization methods, and other various optimization algorithms now known or developed in the future. The similarity between two images can be calculated by various methods, for example, histogram matching, mathematical matrix decomposition (such as singular value decomposition and non-negative matrix factorization), image similarity calculation methods based on feature points, and the like.
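Continuing the earlier sketches, sub-steps 3021 to 3023 could be instantiated as follows; treating the pixel-wise L1 distance between the generated face image and the living body face image as the (negative) similarity, and using stochastic gradient descent, are assumptions within the options listed above, not choices fixed by the application.

```python
import torch
import torch.nn as nn

# Reusing the FeatureExtractor and Generator classes from the sketches above (illustrative names).
feature_extractor = FeatureExtractor()
generator = Generator()

reconstruction_loss = nn.L1Loss()  # lower L1 distance == higher similarity (one possible measure)
optimizer = torch.optim.SGD(
    list(feature_extractor.parameters()) + list(generator.parameters()), lr=1e-3
)

def first_tuning_step(live_face_image: torch.Tensor) -> float:
    """One pass of sub-steps 3021-3023 for a batch of living body face images.

    Assumes 3x128x128 inputs so the generated image and the input image have the same shape.
    """
    image_features = feature_extractor(live_face_image)                  # sub-step 3021
    generated_face_image = generator(image_features)                     # sub-step 3022
    loss = reconstruction_loss(generated_face_image, live_face_image)    # sub-step 3023: raise similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```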
In some implementations, after sub-step 3023, the first parameter adjusting step may also include sub-steps 3024 and 3025 below.
And a substep 3024 of inputting the generated face image and the living body face image obtained into an initial discriminator to obtain a first discrimination result and a second discrimination result, respectively.
Here, the executing subject of the first training step may input the generated image obtained in sub-step 3022 and the living body face image to the initial discriminator to obtain a first discrimination result and a second discrimination result, respectively. The initial discriminator is used for representing the corresponding relation between the face image and the discrimination result used for representing whether the input face image is the real face image. The first discrimination result is a discrimination result output by the initial discriminator for the generated face image obtained in the substep 3022 input to the initial discriminator, and the first discrimination result is used to characterize whether the generated face image obtained in the substep 3022 is a real face image. The second judgment result is a judgment result output by the initial discriminator aiming at the living body face image input into the initial discriminator, and the second judgment result is used for representing whether the living body face image is a real face image. Here, the discrimination result output by the initial discriminator may be in various forms. For example, the discrimination result may be a discrimination result (e.g., a number 1 or a vector (1,0)) for characterizing that the face image is a real face image or a discrimination result (e.g., a number 0 or a vector (0,1)) for characterizing that the face image is not a real face image (i.e., a generated face image); for another example, the discrimination result may further include a probability for characterizing that the face image is a real face image and/or a probability for characterizing that the face image is not a real face image (i.e., the generated face image), for example, the discrimination result may be a vector including a first probability for characterizing that the face image is a real face image and a second probability for characterizing that the face image is not a real face image (i.e., the generated face image).
Sub-step 3025, adjusting parameters of the initial feature extraction model, the initial generator, and the initial discriminator based on the first difference and the second difference.
Here, the execution subject of the first training step may first calculate the first difference and the second difference according to a preset loss function (e.g., the L1 norm or the L2 norm, etc.). Here, the first difference is the difference between the first discrimination result obtained in sub-step 3024 and a discrimination result indicating that the image input to the initial discriminator is not a real face image, and the second difference is the difference between the second discrimination result and a discrimination result indicating that the image input to the initial discriminator is a real face image. It will be appreciated that the specific loss function may differ when the form of the discrimination results output by the initial discriminator differs.
Then, the executing agent of the first training step may adjust parameters of the initial feature extraction model, the initial generator, and the initial discriminator based on the calculated first difference and second difference, and end the first parameter adjusting step when a second preset training end condition is satisfied. For example, here, the second preset training end condition may include, but is not limited to: the training time exceeds the preset time, the number of times of executing the first parameter adjusting step exceeds the preset number of times, and the difference between the first probability and the second probability obtained by calculation is smaller than a first preset difference threshold value.
Here, the parameters of the initial feature extraction model, the initial generator, and the initial discriminator may be adjusted based on the calculated first and second differences in various implementations. For example, the model parameters of the initial feature extraction model, the initial generator, and the initial discriminator may be adjusted using a BP (Back Propagation) algorithm or an SGD (Stochastic Gradient Descent) algorithm.
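Continuing the sketch, one common way to use the first and second differences in sub-steps 3024 and 3025 is shown below: the discriminator is updated to reduce both differences (target "not a real face image" for the generated image, target "real face image" for the living body face image), while the feature extraction model and the generator are updated so that generated images are judged real. The binary cross-entropy loss and the split into two optimizers are assumptions made for illustration; the application does not fix these details.

```python
import torch
import torch.nn as nn

discriminator = Discriminator()   # from the earlier sketch (illustrative)
bce = nn.BCELoss()
d_optimizer = torch.optim.SGD(discriminator.parameters(), lr=1e-3)
g_optimizer = torch.optim.SGD(
    list(feature_extractor.parameters()) + list(generator.parameters()), lr=1e-3
)

def adversarial_tuning_step(live_face_image: torch.Tensor) -> None:
    """One illustrative pass of sub-steps 3024 and 3025."""
    generated_face_image = generator(feature_extractor(live_face_image))

    # Sub-step 3024: discrimination results for the generated image and the living body face image.
    first_result = discriminator(generated_face_image.detach())
    second_result = discriminator(live_face_image)

    # Sub-step 3025, discriminator side: first difference against "not a real face image" (0),
    # second difference against "a real face image" (1).
    first_difference = bce(first_result, torch.zeros_like(first_result))
    second_difference = bce(second_result, torch.ones_like(second_result))
    d_optimizer.zero_grad()
    (first_difference + second_difference).backward()
    d_optimizer.step()

    # Sub-step 3025, generator/feature-extractor side: push generated images toward being judged real.
    fooled = discriminator(generated_face_image)
    g_optimizer.zero_grad()
    bce(fooled, torch.ones_like(fooled)).backward()
    g_optimizer.step()
```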
Thus, after the initial feature extraction model and the initial generator have been optimized multiple times in step 302, inputting a face image into the initial feature extraction model yields image features such that the generated image obtained by inputting those image features into the initial generator is similar to the face image input into the initial feature extraction model.
Step 303, determining the initial feature extraction model as a feature extraction model.
In this embodiment, the executing subject of the first training step may determine the initial feature extraction model trained after step 302 as the feature extraction model. Thus, with the first training step described in steps 301 to 303, a pre-trained feature extraction model can be obtained.
And 203, inputting the obtained image characteristics into a pre-trained living body detection model to obtain a living body detection result corresponding to the face image to be detected.
In this embodiment, the executing subject of the method for detecting a living body may input the image features obtained in step 202 into a pre-trained living body detection model, so as to obtain a living body detection result corresponding to the face image to be detected. The living body detection result is used for representing whether the face in the face image is a living body face. For example, the living body detection result may be an identifier (e.g., the number 1 or the vector (1,0)) for characterizing that the face in the face image is a living body face, or an identifier (e.g., the number 0 or the vector (0,1)) for characterizing that the face in the face image is not a living body face; for another example, the living body detection result may also include a probability that the face in the face image is a living body face and/or a probability that the face in the face image is a non-living body face, for example, the living body detection result may be a vector including a third probability and a fourth probability, where the third probability is used for representing the probability that the face in the face image is a living body face, and the fourth probability is used for representing the probability that the face in the face image is a non-living body face.
Here, the living body detection model is used to characterize the correspondence between the image features and the living body detection results. As an example, the living body detection model may be a correspondence table in which correspondence between a plurality of image features and the living body detection result is stored, which is prepared in advance by a technician based on statistics of a large number of image features and the living body detection result; or a calculation formula which is preset by a technician based on statistics of a large amount of data and is stored in the execution subject and is used for performing numerical calculation on one or more numerical values in the image characteristics to obtain a calculation result for representing the living body detection result.
In some optional implementations of the embodiment, the living body detection model may be obtained by training through a second training step, as shown in fig. 4, fig. 4 shows a flow 400 of the second training step for training the living body detection model according to the present application, and the second training step may include the following steps 401 to 404:
step 401, a training sample set is obtained.
Here, the execution subject of the second training step may be the same as or different from the execution subject of the method for detecting a living body. If they are the same, the execution subject of the second training step may store the model structure information of the trained living body detection model and the parameter values of its model parameters locally after the living body detection model is trained. If they are different, the execution subject of the second training step may send the model structure information of the trained living body detection model and the parameter values of its model parameters to the execution subject of the method for detecting a living body after the living body detection model is trained.
Here, the execution subject of the second training step may obtain a training sample set locally or from other electronic devices network-connected to the execution subject of the second training step, where each training sample includes a sample face image and annotation information indicating whether the face in the sample face image is a living body face. That is, the sample face image may be a sample living body face image or a sample non-living body face image. Here, the annotation information may be an identifier (e.g., the number 1 or the vector (1,0)) for characterizing that the face in the sample face image is a living body face, or an identifier (e.g., the number 0 or the vector (0,1)) for characterizing that the face in the sample face image is not a living body face; for another example, the annotation information may also be a probability that the face in the sample face image is a living body face and/or a probability that the face in the sample face image is a non-living body face, for example, the annotation information may be a vector including a third probability and a fourth probability, where the third probability is used for representing the probability that the face in the sample face image is a living body face, and the fourth probability is used for representing the probability that the face in the sample face image is a non-living body face.
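As a purely illustrative representation, a training sample of this kind could be modelled as follows; the field names and the use of a Python dataclass are assumptions, not part of the application.

```python
from dataclasses import dataclass
import torch

@dataclass
class TrainingSample:
    """One training sample: a sample face image plus annotation information."""
    sample_face_image: torch.Tensor   # e.g. a 3 x 128 x 128 image tensor
    is_living_body_face: bool         # annotation: True for a living body face, False otherwise

# A training sample set mixes sample living body face images and sample non-living body face images.
training_sample_set = [
    TrainingSample(sample_face_image=torch.randn(3, 128, 128), is_living_body_face=True),
    TrainingSample(sample_face_image=torch.randn(3, 128, 128), is_living_body_face=False),
]
```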
Step 402, determining model structure information of the initial in-vivo detection model, and initializing model parameters of the initial in-vivo detection model.
Here, the execution subject of the second training step may first determine model structure information of the initial living body detection model. It will be appreciated that, since the initial liveness detection model may be a variety of machine learning models that may implement a classification function, the model structure information that needs to be determined may also be different for different types of models. For example, the initial liveness detection model may be a decision tree, logistic regression, naive bayes, neural networks, or the like.
Alternatively, the initial liveness detection model may include a fully connected layer and a classifier. For the fully-connected layer, since the fully-connected layer is used to connect two layers, and all neurons between the two connected layers have weighted connections, it is necessary to determine the number of neurons in the previous layer and the number of neurons in the next layer of the fully-connected layer for the fully-connected layer in the initial in-vivo detection model, so that the number of weighting parameters in the fully-connected layer can be determined to be M × N, where M is the number of neurons in the previous layer of the fully-connected layer, and N is the number of neurons in the next layer of the fully-connected layer. For a classifier in the initial liveness detection model, the type of the classifier needs to be determined. By way of example, the classifier may be an activation function layer, which may include the ReLU and various variant activation functions of the ReLU, a Sigmoid function, a Tanh (hyperbolic tangent) function, a Maxout function, and so on. For another example, the classifier can also be a logistic regression model. It is understood that the model structure information of the classifier to be determined may be different for different classifier types.
The execution subject of the second training step may then initialize the model parameters of the initial liveness detection model. In practice, the individual model parameters of the initial liveness detection model may be initialized with a number of different small random numbers. Using small random numbers ensures that the model does not enter a saturation state because the weights are too large, which would cause training to fail, and using different values ensures that the model can learn normally.
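A minimal sketch of an initial liveness detection model consisting of a fully connected layer followed by a classifier is given below; the 128-dimensional input, the single-output fully connected layer, and the choice of a Sigmoid classifier are assumptions made for illustration, picked from the options mentioned above.

```python
import torch.nn as nn

class LivenessDetectionModel(nn.Module):
    """Illustrative initial liveness detection model: a fully connected layer followed by a classifier."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.fully_connected = nn.Linear(feature_dim, 1)  # M x N weight parameters (here 128 x 1)
        self.classifier = nn.Sigmoid()                     # classifier chosen as a Sigmoid activation layer

    def forward(self, image_features):
        # Output: probability that the face in the input image is a living body face.
        return self.classifier(self.fully_connected(image_features))

# PyTorch initializes the linear layer with small random values by default,
# consistent with the initialization described above.
liveness_model = LivenessDetectionModel()
```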
In step 403, for the training samples in the training sample set, a second parameter adjusting step is performed.
Here, the executing entity of the second training step may execute a second parameter adjusting step on the training samples in the training sample set acquired in step 401, where the second parameter adjusting step may include the following sub-steps 4031 to 4033:
and a substep 4031, inputting the sample face image in the training sample into a feature extraction model to obtain an image feature corresponding to the training sample.
And a substep 4032 of inputting the obtained image feature into the initial living body detection model to obtain a living body detection result corresponding to the obtained image feature.
And a substep 4033 of adjusting model parameters of the initial in vivo detection model based on the difference between the obtained in vivo detection result and the labeling information in the training sample.
Here, the execution subject of the second training step may calculate the difference between the living body detection result obtained in sub-step 4032 and the annotation information in the training sample using a preset loss function (e.g., the L1 norm or the L2 norm, etc.), adjust the model parameters of the initial liveness detection model based on the calculated difference, and end the second parameter adjusting step when a third preset training end condition is satisfied. For example, here, the third preset training end condition may include, but is not limited to: the training time exceeds a preset duration; the number of times the second parameter adjusting step has been performed exceeds a preset number; the calculated difference is less than a second preset difference threshold.
Here, various implementations may be employed to adjust the model parameters of the above-described initial liveness detection model based on the calculated differences. For example, the BP algorithm or the SGD algorithm may be employed to adjust model parameters of the initial liveness detection model.
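The second parameter adjusting step could then be sketched as follows, reusing the illustrative models above; keeping the feature extraction model fixed, using the squared L2 difference as the loss, and using SGD are assumptions within the options the text mentions.

```python
import torch
import torch.nn as nn

liveness_optimizer = torch.optim.SGD(liveness_model.parameters(), lr=1e-3)
liveness_loss = nn.MSELoss()   # squared L2 difference, one of the loss options mentioned above

def second_tuning_step(sample: TrainingSample) -> float:
    """One pass of sub-steps 4031-4033 for a single training sample (illustrative)."""
    with torch.no_grad():  # the feature extraction model is already trained and is not adjusted here
        image_features = feature_extractor(sample.sample_face_image.unsqueeze(0))    # sub-step 4031
    detection_result = liveness_model(image_features)                                # sub-step 4032
    annotation = torch.tensor([[1.0]] if sample.is_living_body_face else [[0.0]])    # 1 = living body face
    loss = liveness_loss(detection_result, annotation)                               # sub-step 4033
    liveness_optimizer.zero_grad()
    loss.backward()
    liveness_optimizer.step()
    return loss.item()
```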
Through step 403, the model parameters of the initial liveness detection model are optimized.
In step 404, the initial in vivo detection model is determined as the in vivo detection model.
Here, the execution subject of the second training step may determine the initial living body detection model subjected to parameter adjustment in step 403 as the living body detection model.
According to the method for detecting a living body provided by the embodiment of the application, the living body detection result corresponding to the face image to be detected is obtained by extracting the image features of the face image to be detected and then inputting the obtained image features into the pre-trained living body detection model. Living body detection can be realized by acquiring only a face image of the user, without acquiring video of the user making a specified action, which improves the convenience of living body detection for the user.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for detecting a living body, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 5, the apparatus 500 for detecting a living body of the present embodiment includes: an acquisition unit 501, a feature extraction unit 502, and a living body detection unit 503. The acquiring unit 501 is configured to acquire a face image to be detected; a feature extraction unit 502 configured to input the facial image to be detected into a pre-trained feature extraction model to obtain an image feature corresponding to the facial image to be detected, where the feature extraction model is used to extract features of the facial image; the living body detection unit 503 is configured to input the obtained image features into a pre-trained living body detection model, and obtain a living body detection result corresponding to the face image to be detected, where the living body detection result is used to represent whether the face in the face image is a living body face, and the living body detection model is used to represent a corresponding relationship between the image features and the living body detection result.
In this embodiment, specific processes of the obtaining unit 501, the feature extracting unit 502, and the living body detecting unit 503 of the apparatus 500 for detecting a living body and technical effects brought by the specific processes can refer to the related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, respectively, and are not described herein again.
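Chaining the three units together, a single living body detection pass over the illustrative models sketched earlier might look like the following; the function name and the 0.5 decision threshold are assumptions for the example, not requirements of the application.

```python
import torch

def detect_living_body(face_image_to_detect: torch.Tensor) -> bool:
    """Steps 201-203 / units 501-503 chained together over the illustrative models above."""
    with torch.no_grad():
        image_features = feature_extractor(face_image_to_detect)   # feature extraction unit 502
        detection_result = liveness_model(image_features)          # living body detection unit 503
    # detection_result is the probability of a living body face; the 0.5 threshold is an assumption.
    return bool(detection_result.item() > 0.5)

# Example call with a single 3-channel 128x128 face image acquired by unit 501.
is_live = detect_living_body(torch.randn(1, 3, 128, 128))
```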
In some optional implementations of the present embodiment, the feature extraction model may be obtained by training through the following first training step: acquiring a living body face image set; for the living body face images in the living body face image set, executing the following first parameter adjusting step: inputting the living body face image into an initial feature extraction model to obtain image features corresponding to the living body face image; inputting the obtained image features into an initial generator to obtain a generated face image, wherein the initial generator is a generator in an initial generative adversarial network; adjusting parameters of the initial feature extraction model and the initial generator based on the similarity between the obtained generated face image and the living body face image; and determining the initial feature extraction model as the feature extraction model.
In some optional implementations of this embodiment, after adjusting the parameters of the initial feature extraction model and the initial generator based on the similarity between the obtained generated face image and the living body face image, the first parameter adjusting step may further include: respectively inputting the generated face image and the living body face image into an initial discriminator to obtain a first discrimination result and a second discrimination result, wherein the initial discriminator is the discriminator in the initial generative adversarial network, and the first discrimination result and the second discrimination result are respectively used for representing whether the generated face image and the living body face image are real face images; and adjusting parameters of the initial feature extraction model, the initial generator, and the initial discriminator based on a first difference between the first discrimination result and a discrimination result indicating that the image input to the initial discriminator is not a real face image, and a second difference between the second discrimination result and a discrimination result indicating that the image input to the initial discriminator is a real face image.
In some optional implementations of this embodiment, before the first parameter adjusting step is performed for the living body face images in the living body face image set, the first training step may further include: determining model structure information of the initial feature extraction model and network structure information of the initial generative adversarial network, and initializing model parameters of the initial feature extraction model and network parameters of the initial generative adversarial network.
In some optional implementations of the present embodiment, the living body detection model may be obtained by training through the following second training step: acquiring a training sample set, wherein each training sample comprises a sample face image and labeling information used for indicating whether the face in the sample face image is a living body face; determining model structure information of an initial living body detection model and initializing model parameters of the initial living body detection model; for each training sample in the training sample set, performing the following second parameter adjusting step: inputting the sample face image in the training sample into the feature extraction model to obtain an image feature corresponding to the training sample; inputting the obtained image features into the initial living body detection model to obtain a living body detection result corresponding to the obtained image features; adjusting the model parameters of the initial living body detection model based on the difference between the obtained living body detection result and the labeling information in the training sample; and determining the initial living body detection model as the living body detection model.
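One way to realise the second parameter adjusting step, again assuming PyTorch models: binary cross-entropy against the labeling information is an assumed choice for the "difference", and freezing the feature extraction model during this step is likewise an assumption (the embodiment only states that the living body detection model's parameters are adjusted).

```python
import torch
import torch.nn.functional as F

def second_parameter_adjusting_step(sample_face, label, extractor, detector, optimizer):
    """One pass over a batch of training samples.

    sample_face: (N, 3, H, W) sample face images; label: (N, 1) labeling
    information, 1.0 for a living body face and 0.0 otherwise.
    """
    with torch.no_grad():                  # feature extraction model from the first training step
        image_features = extractor(sample_face)
    result = detector(image_features)      # living body detection result for the obtained features
    loss = F.binary_cross_entropy(result, label)  # difference to the labeling information
    optimizer.zero_grad()                  # optimizer holds only the detection model's parameters
    loss.backward()
    optimizer.step()
    return loss.item()
```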
In some optional implementations of this embodiment, the feature extraction model may be a convolutional neural network.
In some optional implementations of this embodiment, the living body detection model may include a fully connected layer and a classifier.
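Taken together, the two optional implementations above could be realised as a small convolutional feature extractor followed by a fully connected layer and a classifier; the specific layer sizes, class names, and the sigmoid classifier are assumptions made only to give a concrete example.

```python
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """A convolutional neural network standing in for the feature extraction model."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, face_image):
        return self.body(face_image)  # (N, 64) image features

class LivenessDetector(nn.Module):
    """A fully connected layer plus a classifier, as in the optional implementation."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 32)                                         # fully connected layer
        self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())   # classifier

    def forward(self, image_features):
        return self.classifier(self.fc(image_features))  # probability of a living body face
```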
It should be noted that, for details of implementation and technical effects of each unit in the apparatus for detecting a living body provided in the embodiments of the present application, reference may be made to descriptions of other embodiments in the present application, and details are not described herein again.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 606 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An Input/Output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: a storage section 606 including a hard disk and the like; and a communication section 607 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 607 performs communication processing via a network such as the Internet. A drive 608 is also connected to the I/O interface 605 as needed. A removable medium 609, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 608 as necessary, so that a computer program read out therefrom is installed into the storage section 606 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 607 and/or installed from the removable medium 609. The computer program, when executed by the Central Processing Unit (CPU) 601, performs the above-described functions defined in the method of the present application.
It should be noted that the computer readable medium described herein may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a feature extraction unit, and a living body detection unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the acquisition unit may also be described as a "unit that acquires a face image to be detected".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a human face image to be detected; inputting a facial image to be detected into a pre-trained feature extraction model to obtain image features corresponding to the facial image to be detected, wherein the feature extraction model is used for extracting the features of the facial image; and inputting the obtained image characteristics into a pre-trained living body detection model to obtain a living body detection result corresponding to the face image to be detected, wherein the living body detection result is used for representing whether the face in the face image is a living body face, and the living body detection model is used for representing the corresponding relation between the image characteristics and the living body detection result.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method for detecting a living body, comprising:
acquiring a human face image to be detected;
inputting the facial image to be detected into a pre-trained feature extraction model to obtain image features corresponding to the facial image to be detected, wherein the feature extraction model is used for extracting features of the facial image;
inputting the obtained image characteristics into a pre-trained living body detection model to obtain a living body detection result corresponding to the face image to be detected, wherein the living body detection result is used for representing whether the face in the face image is a living body face or not, and the living body detection model is used for representing the corresponding relation between the image characteristics and the living body detection result; the feature extraction model is obtained by training through the following first training step:
acquiring a living body face image set;
for each living body face image in the living body face image set, performing the following first parameter adjusting step: inputting the living body face image into an initial feature extraction model to obtain image features corresponding to the living body face image; inputting the obtained image features into an initial generator to obtain a generated face image, wherein the initial generator is a generator in an initial generative adversarial network; adjusting parameters of the initial feature extraction model and the initial generator based on the similarity between the obtained generated face image and the living body face image;
determining the initial feature extraction model as the feature extraction model.
2. The method of claim 1, wherein, after the parameters of the initial feature extraction model and the initial generator are adjusted based on the similarity between the obtained generated face image and the living body face image, the first parameter adjusting step further comprises:
respectively inputting the obtained generated face image and the living body face image into an initial discriminator to obtain a first discrimination result and a second discrimination result, wherein the initial discriminator is the discriminator in the initial generative adversarial network, and the first discrimination result and the second discrimination result are respectively used for representing whether the obtained generated face image and the living body face image are real face images; and adjusting parameters of the initial feature extraction model, the initial generator, and the initial discriminator based on a first difference between the first discrimination result and whether the image input to the initial discriminator is a real face image, and a second difference between the second discrimination result and whether the image input to the initial discriminator is a real face image.
3. The method of claim 2, wherein the first training step further comprises, before the first parameter adjusting step is performed on a living body face image in the living body face image set:
determining model structure information of the initial feature extraction model and network structure information of the initial generative adversarial network, and initializing model parameters of the initial feature extraction model and network parameters of the initial generative adversarial network.
4. The method of claim 1, wherein the living body detection model is trained by a second training step comprising:
acquiring a training sample set, wherein each training sample comprises a sample face image and labeling information used for indicating whether the face in the sample face image is a living body face;
determining model structure information of an initial living body detection model, and initializing model parameters of the initial living body detection model;
for each training sample in the training sample set, performing the following second parameter adjusting step: inputting a sample face image in the training sample into the feature extraction model to obtain an image feature corresponding to the training sample; inputting the obtained image features into the initial living body detection model to obtain a living body detection result corresponding to the obtained image features; adjusting model parameters of the initial living body detection model based on the difference between the obtained living body detection result and the labeling information in the training sample;
determining the initial living body detection model as the living body detection model.
5. The method of claim 1, wherein the feature extraction model is a convolutional neural network.
6. The method of any of claims 1-5, wherein the living body detection model includes a fully connected layer and a classifier.
7. An apparatus for detecting a living body, comprising:
the acquisition unit is configured to acquire a face image to be detected;
the feature extraction unit is configured to input the facial image to be detected into a pre-trained feature extraction model to obtain image features corresponding to the facial image to be detected, wherein the feature extraction model is used for extracting features of the facial image;
the living body detection unit is configured to input the obtained image characteristics into a pre-trained living body detection model to obtain a living body detection result corresponding to the face image to be detected, wherein the living body detection result is used for representing whether the face in the face image is a living body face or not, and the living body detection model is used for representing the corresponding relation between the image characteristics and the living body detection result; the feature extraction model is obtained by training through the following first training step:
acquiring a living body face image set;
for each living body face image in the living body face image set, performing the following first parameter adjusting step: inputting the living body face image into an initial feature extraction model to obtain image features corresponding to the living body face image; inputting the obtained image features into an initial generator to obtain a generated face image, wherein the initial generator is a generator in an initial generative adversarial network; adjusting parameters of the initial feature extraction model and the initial generator based on the similarity between the obtained generated face image and the living body face image;
determining the initial feature extraction model as the feature extraction model.
8. The apparatus of claim 7, wherein, after the parameters of the initial feature extraction model and the initial generator are adjusted based on the similarity between the obtained generated face image and the living body face image, the first parameter adjusting step further comprises:
respectively inputting the obtained generated face image and the living body face image into an initial discriminator to obtain a first discrimination result and a second discrimination result, wherein the initial discriminator is the discriminator in the initial generative adversarial network, and the first discrimination result and the second discrimination result are respectively used for representing whether the obtained generated face image and the living body face image are real face images; and adjusting parameters of the initial feature extraction model, the initial generator, and the initial discriminator based on a first difference between the first discrimination result and whether the image input to the initial discriminator is a real face image, and a second difference between the second discrimination result and whether the image input to the initial discriminator is a real face image.
9. The apparatus according to claim 8, wherein the first training step further comprises, before the first parameter adjusting step is performed on a living body face image in the living body face image set:
determining model structure information of the initial feature extraction model and network structure information of the initial generative adversarial network, and initializing model parameters of the initial feature extraction model and network parameters of the initial generative adversarial network.
10. The apparatus of claim 7, wherein the living body detection model is trained by a second training step comprising:
acquiring a training sample set, wherein each training sample comprises a sample face image and labeling information used for indicating whether the face in the sample face image is a living body face;
determining model structure information of an initial living body detection model, and initializing model parameters of the initial living body detection model;
for each training sample in the training sample set, performing the following second parameter adjusting step: inputting a sample face image in the training sample into the feature extraction model to obtain an image feature corresponding to the training sample; inputting the obtained image features into the initial living body detection model to obtain a living body detection result corresponding to the obtained image features; adjusting model parameters of the initial living body detection model based on the difference between the obtained living body detection result and the labeling information in the training sample;
determining the initial living body detection model as the living body detection model.
11. The apparatus of claim 7, wherein the feature extraction model is a convolutional neural network.
12. The apparatus of any of claims 7-11, wherein the living body detection model includes a fully connected layer and a classifier.
13. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN201810258183.5A 2018-03-27 2018-03-27 Method and apparatus for detecting living body Active CN108416324B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810258183.5A CN108416324B (en) 2018-03-27 2018-03-27 Method and apparatus for detecting living body

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810258183.5A CN108416324B (en) 2018-03-27 2018-03-27 Method and apparatus for detecting living body

Publications (2)

Publication Number Publication Date
CN108416324A CN108416324A (en) 2018-08-17
CN108416324B true CN108416324B (en) 2022-02-25

Family

ID=63132829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810258183.5A Active CN108416324B (en) 2018-03-27 2018-03-27 Method and apparatus for detecting living body

Country Status (1)

Country Link
CN (1) CN108416324B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956056A (en) * 2018-09-26 2020-04-03 北京中科虹星科技有限公司 Face living body detection method and system
CN109284786B (en) * 2018-10-10 2020-05-29 西安电子科技大学 SAR image terrain classification method for generating countermeasure network based on distribution and structure matching
CN109151443A (en) * 2018-10-15 2019-01-04 Oppo广东移动通信有限公司 High degree of comfort three-dimensional video-frequency generation method, system and terminal device
CN109522853B (en) * 2018-11-22 2019-11-19 湖南众智君赢科技有限公司 Face datection and searching method towards monitor video
CN109840467A (en) * 2018-12-13 2019-06-04 北京飞搜科技有限公司 A kind of in-vivo detection method and system
CN110059569B (en) * 2019-03-21 2022-12-27 创新先进技术有限公司 Living body detection method and device, and model evaluation method and device
CN111753595A (en) * 2019-03-29 2020-10-09 北京市商汤科技开发有限公司 Living body detection method and apparatus, device, and storage medium
WO2020211015A1 (en) * 2019-04-17 2020-10-22 深圳大学 Conjugate-gradient-based facial image non-negative feature representation and recognition method and system, and storage medium
CN110245645B (en) * 2019-06-21 2021-06-08 北京字节跳动网络技术有限公司 Face living body identification method, device, equipment and storage medium
CN110490076B (en) * 2019-07-18 2024-03-01 平安科技(深圳)有限公司 Living body detection method, living body detection device, computer equipment and storage medium
CN110716792B (en) * 2019-09-19 2023-06-06 华中科技大学 Target detector and construction method and application thereof
CN111507262B (en) * 2020-04-17 2023-12-08 北京百度网讯科技有限公司 Method and apparatus for detecting living body
CN111914646A (en) * 2020-07-01 2020-11-10 天津中科智能识别产业技术研究院有限公司 Double-current fusion network iris in-vivo detection method based on light field image sequence
CN112052759B (en) * 2020-08-25 2022-09-09 腾讯科技(深圳)有限公司 Living body detection method and device
CN112257561B (en) * 2020-10-20 2021-07-30 广州云从凯风科技有限公司 Human face living body detection method and device, machine readable medium and equipment
CN113033305B (en) * 2021-02-21 2023-05-12 云南联合视觉科技有限公司 Living body detection method, living body detection device, terminal equipment and storage medium
CN113111750A (en) * 2021-03-31 2021-07-13 智慧眼科技股份有限公司 Face living body detection method and device, computer equipment and storage medium
CN116057587A (en) * 2021-05-24 2023-05-02 华为技术有限公司 Living body detection method, training method of living body detection model, device and system thereof
CN113283376B (en) * 2021-06-10 2024-02-09 泰康保险集团股份有限公司 Face living body detection method, face living body detection device, medium and equipment
CN113378715B (en) * 2021-06-10 2024-01-05 北京华捷艾米科技有限公司 Living body detection method based on color face image and related equipment
CN113516107B (en) * 2021-09-09 2022-02-15 浙江大华技术股份有限公司 Image detection method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184246B (en) * 2015-08-28 2020-05-19 北京旷视科技有限公司 Living body detection method and living body detection system
CN107451510B (en) * 2016-05-30 2023-07-21 北京旷视科技有限公司 Living body detection method and living body detection system
CN107590774A (en) * 2017-09-18 2018-01-16 北京邮电大学 A kind of car plate clarification method and device based on generation confrontation network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8457367B1 (en) * 2012-06-26 2013-06-04 Google Inc. Facial recognition
CN105956572A (en) * 2016-05-15 2016-09-21 北京工业大学 In vivo face detection method based on convolutional neural network
CN106203305A (en) * 2016-06-30 2016-12-07 北京旷视科技有限公司 Human face in-vivo detection method and device
CN107066942A (en) * 2017-03-03 2017-08-18 上海斐讯数据通信技术有限公司 A kind of living body faces recognition methods and system
CN106997380A (en) * 2017-03-21 2017-08-01 北京工业大学 Imaging spectrum safe retrieving method based on DCGAN depth networks
CN107292267A (en) * 2017-06-21 2017-10-24 北京市威富安防科技有限公司 Photo fraud convolutional neural networks training method and human face in-vivo detection method
CN107423701A (en) * 2017-07-17 2017-12-01 北京智慧眼科技股份有限公司 The non-supervisory feature learning method and device of face based on production confrontation network
CN107563283A (en) * 2017-07-26 2018-01-09 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and the storage medium of generation attack sample
CN107563355A (en) * 2017-09-28 2018-01-09 哈尔滨工程大学 Hyperspectral abnormity detection method based on generation confrontation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jianwei Yang et al., "Learn Convolutional Neural Network for Face Anti-Spoofing", arXiv, 2014-08-26, pp. 1-8 *
Thomas Schlegl et al., "Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery", arXiv, 2017-03-17, pp. 1-12 *

Also Published As

Publication number Publication date
CN108416324A (en) 2018-08-17

Similar Documents

Publication Publication Date Title
CN108416324B (en) Method and apparatus for detecting living body
CN108537152B (en) Method and apparatus for detecting living body
CN108416323B (en) Method and device for recognizing human face
US20220172450A1 (en) Depth-based object re-identification
CN112446270A (en) Training method of pedestrian re-identification network, and pedestrian re-identification method and device
CN107590473B (en) Human face living body detection method, medium and related device
WO2022134971A1 (en) Noise reduction model training method and related apparatus
US11087140B2 (en) Information generating method and apparatus applied to terminal device
CN108388889B (en) Method and device for analyzing face image
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN113065635A (en) Model training method, image enhancement method and device
CN108509994B (en) Method and device for clustering character images
CN108509888B (en) Method and apparatus for generating information
WO2022111387A1 (en) Data processing method and related apparatus
CN111797882A (en) Image classification method and device
CN108241855B (en) Image generation method and device
CN108399401B (en) Method and device for detecting face image
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
CN114359289A (en) Image processing method and related device
CN111523593A (en) Method and apparatus for analyzing medical images
CN113449548A (en) Method and apparatus for updating object recognition model
CN108038473B (en) Method and apparatus for outputting information
CN113850796A (en) Lung disease identification method and device based on CT data, medium and electronic equipment
CN111259700A (en) Method and apparatus for generating gait recognition model
CN116740808A (en) Animal behavior recognition method based on deep learning target detection and image classification

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant