CN113255511A - Method, apparatus, device and storage medium for living body identification - Google Patents
- Publication number
- CN113255511A (application number CN202110557990.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- dimensional
- dimensional face
- face image
- processed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/40—Spoof detection, e.g. liveness detection
- G06V40/45—Detection of the body part being alive
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The present disclosure provides a method, an apparatus, a device and a storage medium for living body identification, which relate to the field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to smart city and financial scenarios. The specific implementation scheme is as follows: acquiring a plurality of images to be processed of a target object, where each image to be processed corresponds to a different pose of the target object; inputting the plurality of images to be processed into a pre-trained three-dimensional image generation model to obtain a three-dimensional face image of the target object; and obtaining a living body recognition result of the target object based on the three-dimensional face image. The disclosed technology improves the accuracy of living body identification of the target object as well as the effectiveness and generalization of identification against complex and varied attack modes.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the fields of computer vision and deep learning, and can be applied to smart city and financial scenarios.
Background
Human face living body detection is a basic building block of a face recognition system and ensures the security of that system. Face living body detection refers to a computer determining whether a detected face is a real face or a forged face attack, where a face attack may be a picture of a legitimate user, a pre-recorded video, and the like. In the related art, face living body detection algorithms based on deep learning are currently the mainstream approach in this field, and their accuracy is greatly improved compared with traditional algorithms. However, in some application scenarios, problems of poor generalization and poor detection performance against unknown attack modes remain, which affects practical application performance.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for living body identification.
According to an aspect of the present disclosure, there is provided a method for living body identification, including:
acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to different poses of the target object;
inputting a plurality of images to be processed into a pre-trained three-dimensional image generation model to obtain a three-dimensional face image of a target object;
and obtaining a living body recognition result of the target object based on the three-dimensional face image.
According to another aspect of the present disclosure, there is provided a training method of a three-dimensional image generation model, including:
determining a target three-dimensional face image by using the sample image;
inputting a sample image into a three-dimensional image generation model to be trained to obtain a predicted three-dimensional face image;
and determining the difference between the target three-dimensional face image and the predicted three-dimensional face image, and training the three-dimensional image generation model to be trained according to the difference until the difference is within an allowable range.
According to another aspect of the present disclosure, there is provided an apparatus for living body identification, including:
the to-be-processed image acquisition module is used for acquiring a plurality of to-be-processed images of the target object, and each to-be-processed image corresponds to different poses of the target object respectively;
the three-dimensional face image generation module is used for inputting a plurality of images to be processed into a pre-trained three-dimensional image generation model to obtain a three-dimensional face image of a target object;
and the living body recognition module is used for obtaining a living body recognition result of the target object based on the three-dimensional face image.
According to another aspect of the present disclosure, there is provided a training apparatus for a three-dimensional image generation model, including:
the target three-dimensional face image determining module is used for determining a target three-dimensional face image by utilizing the sample image;
the prediction three-dimensional face image generation module is used for inputting the sample image into a three-dimensional image generation model to be trained to obtain a prediction three-dimensional face image;
and the training module is used for determining the difference between the target three-dimensional face image and the predicted three-dimensional face image, and training the three-dimensional image generation model to be trained according to the difference until the difference is within an allowable range.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the disclosed technology, by acquiring a plurality of images to be processed of the target object in different poses, generating a three-dimensional face image of the target object with a pre-trained three-dimensional image generation model, and finally determining the living body identification result of the target object based on the three-dimensional face image, the accuracy of living body identification of the target object is improved, and the effectiveness and generalization of identification against complex and varied attack modes are improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 shows a flow diagram of a method for living body identification according to an embodiment of the present disclosure;
fig. 2 shows a detailed flowchart of obtaining a three-dimensional face image according to a method for living body recognition according to an embodiment of the present disclosure;
fig. 3 shows a detailed flowchart of determining a living body recognition result of a target object according to a method for living body recognition according to an embodiment of the present disclosure;
fig. 4 shows a detailed flowchart of preprocessing a three-dimensional face image according to a method for living body recognition according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a method of training a three-dimensional image generation model according to an embodiment of the present disclosure;
FIG. 6 shows a specific flowchart of a method for training a three-dimensional image generation model to obtain a predicted three-dimensional face image according to an embodiment of the present disclosure;
FIG. 7 shows a specific flowchart of the method for training a three-dimensional image generation model to determine a target three-dimensional face image according to an embodiment of the present disclosure;
FIG. 8 illustrates a flow diagram for pre-processing face region images for a training method of a three-dimensional image generation model according to an embodiment of the present disclosure;
FIG. 9 illustrates a detailed flow chart of a method of training a three-dimensional image generation model to determine variance in accordance with an embodiment of the present disclosure;
FIG. 10 shows a schematic diagram of an apparatus for living body identification, according to an embodiment of the present disclosure;
FIG. 11 shows a schematic diagram of a three-dimensional image training apparatus according to an embodiment of the present disclosure;
fig. 12 is a block diagram of an electronic device for implementing a method for living body identification of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows a flowchart of a method for living body identification according to an embodiment of the present disclosure.
As shown in fig. 1, the method for living body identification of the embodiment of the present disclosure specifically includes the following steps:
s101: acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to different poses of the target object;
s102: inputting a plurality of images to be processed into a pre-trained three-dimensional image generation model to obtain a three-dimensional face image of a target object;
s103: and obtaining a living body recognition result of the target object based on the three-dimensional face image.
In the embodiments of the present disclosure, the target object can be understood as the object to be discriminated in a specific scenario, and the living body recognition result of the target object indicates whether the carrier of the target object is the living body itself. For example, in a face recognition scenario, the target object may be a face, and the living body recognition result indicates whether the carrier of the target object is a living body or an attack, where an attack refers to a picture or a video containing the target object.
The plurality of images to be processed can be understood as a plurality of images each containing the target object, where the pose of the target object differs between images; in other words, the target object is viewed from a different angle in each image to be processed. It should be noted that the pose of the target object may be described by its three translational degrees of freedom in three-dimensional space together with the rotations about those three axes. For example, in a face recognition scenario, the different poses of the target object may correspond to different orientations of the face.
For example, in step S101, a plurality of images to be processed may be acquired by acquiring target objects in different poses through a terminal device. The terminal device may be various image capturing devices, such as a camera, a video camera, and the like.
Exemplarily, in step S102, the three-dimensional image generation model may adopt an SFM (Structure From Motion) model, an SFS (Shape From Shading, a method for recovering three-dimensional information from a single image) model, a 3DMM (3D Morphable Model, a deformable human face model), or the like.
Specifically, taking 3DMM as an example, its core idea is that the key points of a face can be matched one by one in three-dimensional space and expressed as a weighted combination of orthogonal basis vectors, or equivalently as a linear combination of other faces. The face is represented by a fixed number of key points detected for the target object in each image to be processed, where the coordinates of each key point can be expressed in terms of basis vectors along the three spatial directions; for example, the coordinates (x, y, z) of a key point can be obtained by weighted addition of the basis vectors (1,0,0), (0,1,0) and (0,0,1). The basic attributes of the three-dimensional face image include shape and texture, and each three-dimensional face image can be represented by a linear superposition of a shape vector and a texture vector.
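For illustration only, the following NumPy sketch shows this linear-superposition idea. The basis matrices, their dimensions and the coefficient names are hypothetical placeholders, not the actual bases or parameter counts used by the disclosed model:

```python
import numpy as np

# Hypothetical 3DMM-style bases: n_points face key points, each with (x, y, z).
n_points, n_shape_basis, n_tex_basis = 68, 10, 10
rng = np.random.default_rng(0)

mean_shape = rng.normal(size=(n_points * 3,))                 # mean face shape
shape_basis = rng.normal(size=(n_points * 3, n_shape_basis))  # shape basis vectors
mean_texture = rng.normal(size=(n_points * 3,))               # mean per-vertex texture
texture_basis = rng.normal(size=(n_points * 3, n_tex_basis))  # texture basis vectors

# Coefficients that would be predicted for the target object.
alpha = rng.normal(size=(n_shape_basis,))   # shape coefficients
beta = rng.normal(size=(n_tex_basis,))      # texture coefficients

# Each three-dimensional face is a linear superposition of a shape vector
# and a texture vector: mean plus weighted basis vectors.
shape = mean_shape + shape_basis @ alpha
texture = mean_texture + texture_basis @ beta

vertices = shape.reshape(n_points, 3)       # (x, y, z) per key point
print(vertices.shape, texture.shape)        # (68, 3) (204,)
```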
Illustratively, in step S103, the living body recognition result of the target object may be obtained by inputting the three-dimensional face image into a pre-trained living body recognition model. Specifically, a convolutional neural network (CNN) may be used to process the three-dimensional face image; a CNN is a deep feed-forward neural network (FNN) that contains convolution calculations and can implement supervised or unsupervised learning for living body identification.
The method for living body identification according to the embodiment of the present disclosure is described below with reference to a specific application scenario.
In the face recognition scene, in response to a face recognition request, face images of a target object in different poses are acquired, for example, a plurality of to-be-processed images of the target object with the face oriented at different horizontal angles are acquired, or a plurality of to-be-processed images of the target object with the face oriented at different vertical angles are acquired. Then, a plurality of images to be processed of the target object are input into a pre-trained three-dimensional image generation model, and a three-dimensional face image of the target object is obtained. And finally, inputting the three-dimensional face image into a pre-trained living body recognition model to obtain a living body recognition result of the target object. Under the condition that the living body identification result of the target object is a living body, carrying out the next face identification process; and terminating the subsequent face recognition process when the living body recognition result of the target object is not the living body.
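A minimal sketch of this decision flow is given below, assuming two already-trained models are available as torch modules; the function name, threshold value and tensor shapes are illustrative assumptions rather than details from the disclosure:

```python
import torch

def recognize_living_body(images: torch.Tensor,
                          gen_model: torch.nn.Module,
                          liveness_model: torch.nn.Module,
                          threshold: float = 0.5) -> str:
    """images: (num_poses, 3, H, W) tensor holding the multi-pose captures.

    gen_model maps the multi-pose images to a three-dimensional face image;
    liveness_model maps that image to [attack, live] scores. Both are assumed
    to be pre-trained modules with these (hypothetical) interfaces.
    """
    with torch.no_grad():
        face_3d = gen_model(images.unsqueeze(0))               # 3D face image of the target object
        live_prob = liveness_model(face_3d).softmax(dim=-1)[0, 1]
    if live_prob >= threshold:
        return "living body: continue the face recognition process"
    return "attack suspected: terminate the face recognition process"
```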
Compared with the related art in which only a single image of the target object is used as the input of a convolutional neural network, the method for living body identification of the embodiments of the present disclosure acquires a plurality of images to be processed of the target object in different poses, then generates a three-dimensional face image of the target object using a pre-trained three-dimensional image generation model, and finally determines the living body identification result of the target object based on the three-dimensional face image. This improves the accuracy of living body identification and the effectiveness and generalization of identification against complex and varied attack modes, which in turn facilitates wider adoption in business applications.
As shown in fig. 2, in an embodiment, step S102 specifically includes the following steps:
s201: and inputting a plurality of images to be processed into a feature extraction layer in the three-dimensional image generation model, and receiving a three-dimensional face image of the target object from the image generation layer in the three-dimensional image generation model.
The feature extraction layer is configured to extract parameter information of each image to be processed and input the parameter information into the image generation layer; the image generation layer is configured to receive the parameter information of the image to be processed and output a three-dimensional face image of the target object.
Illustratively, the feature extraction layer may employ ResNet34, a deep residual network. ResNet34 uses a 34-layer residual structure (residual units); its skip connections break the convention of traditional neural networks that the output of layer n-1 can only feed layer n, allowing the output of one layer to skip several layers and serve directly as input to a later layer. This addresses the problem that stacking many layers can cause the error rate of the whole model to rise instead of fall, so the depth of the network can exceed previous limits and reach dozens, hundreds or even thousands of layers, making the extraction and classification of high-level semantic features feasible. On this basis, the accuracy of parameter information extraction is improved, and the training efficiency of the three-dimensional face image generation model is improved.
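As a hedged illustration of how ResNet34 might be used as the feature extraction layer, the sketch below wraps the torchvision ResNet34 backbone and replaces its classification head with a regression head that outputs a parameter vector; the parameter dimension of 257 is an assumption for illustration only:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class ParamExtractor(nn.Module):
    """Feature extraction layer: face crop -> 3DMM-style parameter vector."""

    def __init__(self, n_params: int = 257):  # hypothetical parameter dimension
        super().__init__()
        backbone = resnet34(weights=None)      # torchvision >= 0.13 API
        # Replace the classification head with a regression head for parameters.
        backbone.fc = nn.Linear(backbone.fc.in_features, n_params)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, H, W) face crop; returns (batch, n_params)
        return self.backbone(x)

extractor = ParamExtractor()
params = extractor(torch.randn(2, 3, 224, 224))
print(params.shape)  # torch.Size([2, 257])
```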
The image generation layer constructs a three-dimensional face contour of the target object based on the parameter information extracted from each image to be processed by the feature extraction layer, and renders the three-dimensional face contour to finally obtain the three-dimensional face image of the target object.
According to the embodiment, the accuracy and precision of the three-dimensional face image are improved by extracting the parameter information of each image to be processed and generating the three-dimensional face image of the target object based on the parameter information, so that the accuracy and generalization of the living body recognition result are further improved.
In one embodiment, the parameter information includes at least one of a shape parameter, an expression parameter, a texture parameter, and at least one of a pose parameter, a lighting parameter, and a camera parameter. The image generation layer is configured to:
receiving the parameter information of the image to be processed, and obtaining a three-dimensional contour image of the target object based on at least one of the shape parameter, the expression parameter and the texture parameter; and rendering the three-dimensional contour image based on at least one of the pose parameter, the illumination parameter and the camera parameter to obtain the three-dimensional face image of the target object.
It is understood that the shape parameters characterize the basic shape of the face of the target object; the expression parameters characterize the facial expression of the target object; the texture parameters characterize how the texture primitives that make up the face of the target object are arranged and combined, where a texture primitive is an image primitive composed of pixels and having a certain size and shape; the pose parameters characterize the position of the face of the target object in three-dimensional space and the orientation of the face; the illumination parameters characterize properties of the face of the target object such as illumination intensity and glossiness; and the camera parameters characterize the shooting parameters used by the image acquisition device when acquiring the image to be processed.
Illustratively, in the process in which the image generation layer receives the parameter information of the image to be processed from the feature extraction layer and outputs the three-dimensional face image of the target object, the three-dimensional contour image of the target object can be generated based on the shape parameter, the expression parameter and the texture parameter in the parameter information, and the three-dimensional contour image is then subjected to differentiable rendering based on the pose parameter, the illumination parameter and the camera parameter, so as to finally obtain the three-dimensional face image of the target object.
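To make the role of the pose and camera parameters concrete, the following simplified NumPy sketch rotates reconstructed vertices and applies a weak-perspective camera; it is only an approximation of one piece of differentiable rendering, and the illumination model and rasterization are deliberately omitted:

```python
import numpy as np

def project_vertices(vertices, yaw, pitch, roll, scale=1.0, translation=(0.0, 0.0)):
    """Apply pose (rotation) and a weak-perspective camera to 3D vertices.

    vertices: (N, 3) array of 3D face points; angles are in radians.
    Returns (N, 2) image-plane coordinates. Illumination and rasterization,
    which full differentiable rendering would handle, are omitted here.
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    r_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    r_pitch = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    r_roll = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
    rotated = vertices @ (r_roll @ r_pitch @ r_yaw).T
    return scale * rotated[:, :2] + np.asarray(translation)

points_2d = project_vertices(np.random.rand(68, 3), yaw=0.3, pitch=0.0, roll=0.1)
print(points_2d.shape)  # (68, 2)
```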
According to the embodiment, the image generation layer receives parameter information of multiple dimensions, constructs the three-dimensional contour image and renders the three-dimensional contour image based on the parameter information of different dimensions, so that the construction precision and accuracy of the three-dimensional face image are improved in multiple dimensions, and the robustness of subsequent living body identification is further improved.
As shown in fig. 3, in one embodiment, step S103 includes:
s301: preprocessing the three-dimensional face image to obtain a preprocessed three-dimensional face image;
s302: and inputting the preprocessed three-dimensional face image into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
It should be noted that, in step S301, the three-dimensional face image is preprocessed, so that the preprocessed data of the three-dimensional face image can meet the input requirement of the living body recognition model, and the data of the three-dimensional face image is facilitated to be simplified, thereby improving the detection efficiency and stability of the subsequent living body recognition model.
For example, in step S302, the living body recognition model may adopt various models known to those skilled in the art or known in the future.
Taking a convolutional neural network model as an example of the living body recognition model, the network specifically includes a feature extraction layer, a fully connected (FC) layer and a normalization layer. The feature extraction layer extracts features of the three-dimensional face image and inputs the extracted features into the fully connected layer; the fully connected layer classifies the three-dimensional face image according to the extracted features and inputs the classification result into the normalization layer; and the normalization layer finally outputs the living body identification result. The feature extraction layer may adopt MobileNetV2 (a network built on depthwise separable convolutions) as the backbone of the convolutional neural network, and the normalization layer may use a Softmax layer (analogous to logistic regression).
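A minimal sketch of such a classifier, using the torchvision MobileNetV2 implementation as the backbone, is shown below; the input size and the two-class head are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class LivenessClassifier(nn.Module):
    """Feature extraction (MobileNetV2) -> fully connected layer -> Softmax."""

    def __init__(self):
        super().__init__()
        backbone = mobilenet_v2(weights=None)          # torchvision >= 0.13 API
        self.features = backbone.features              # feature extraction layer
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(backbone.last_channel, 2)  # two classes: attack vs. live

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.pool(self.features(x)).flatten(1)
        return torch.softmax(self.fc(feats), dim=-1)   # normalized class probabilities

model = LivenessClassifier()
probs = model(torch.randn(1, 3, 224, 224))
print(probs)  # e.g. tensor([[p_attack, p_live]])
```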
In addition, in the training process of the living body recognition model, a living body training sample and an attack training sample can be respectively input into the three-dimensional image generation model to obtain a living body three-dimensional face image and an attack three-dimensional face image. And training the living body recognition model by using the batch of living body three-dimensional face images and attack three-dimensional face images.
According to this embodiment, the three-dimensional face image is input into the pre-trained living body recognition model to obtain the living body recognition result of the target object. On the one hand, this improves the detection precision and accuracy of the living body recognition result and the generalization of attack detection; on the other hand, it improves the efficiency of living body identification, which facilitates wider adoption of the method of the embodiments of the present disclosure in business applications.
As shown in fig. 4, in one embodiment, step S301 includes:
s401: determining a face area in the three-dimensional face image based on the three-dimensional face image;
s402: extracting a partial image of a face region in the three-dimensional face image based on the face region to serve as a face region image;
s403: and carrying out image normalization processing on the face region image to obtain a preprocessed three-dimensional face image.
For example, in step S401, a face region in the three-dimensional face image may be determined by a pre-trained face detection model.
Exemplarily, in step S402, coordinate values of each keypoint in the face region are detected through a pre-trained face keypoint detection model, a coordinate range of a face frame corresponding to the face region is determined according to the coordinate values of each keypoint, and a partial image of the face region in the three-dimensional face image is captured according to the coordinate range of the face frame, so as to obtain a face region image.
For example, the face key point detection model detects 72 predefined face key point coordinates (x1, y1) … (x72, y72); from these, the two key points with the maximum and minimum horizontal coordinates and the two key points with the maximum and minimum vertical coordinates are selected, the coordinate range of the face frame corresponding to the face region is determined from these four points, and the three-dimensional face image is cropped according to this coordinate range to obtain a face region template image. Finally, the face region template image is enlarged threefold, the face frame is cropped a second time from the enlarged image, and the resulting image is taken as the face region image.
In step S403, for the pixel value of each pixel in the face region image, calculation processing is performed according to a preset rule, and a preprocessed three-dimensional face image is determined based on the calculation result of each pixel value.
For example, 128 is subtracted from each pixel value in the face region image and the difference is divided by 256, so that the resulting value of each pixel lies in the range [-0.5, 0.5].
In addition, after the image normalization processing is performed on the face region image, random data enhancement processing can be applied to the preprocessed three-dimensional face image, and the processed three-dimensional face image is used as the input of the living body recognition model.
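A hedged NumPy sketch of this preprocessing is given below. It collapses the two-stage cropping into a single expanded crop for brevity, and the key point detector is assumed to run elsewhere; only the face-frame computation, threefold expansion, cropping and (pixel - 128) / 256 normalization are shown:

```python
import numpy as np

def preprocess_face(image, keypoints, expand=3.0):
    """image: (H, W, 3) uint8 array; keypoints: (72, 2) array of (x, y) coordinates."""
    h, w = image.shape[:2]
    x_min, y_min = keypoints.min(axis=0)
    x_max, y_max = keypoints.max(axis=0)

    # Expand the face frame threefold around its center, clipped to the image.
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * expand / 2
    half_h = (y_max - y_min) * expand / 2
    x0, x1 = int(max(cx - half_w, 0)), int(min(cx + half_w, w))
    y0, y1 = int(max(cy - half_h, 0)), int(min(cy + half_h, h))
    face = image[y0:y1, x0:x1]

    # Image normalization: (pixel - 128) / 256, giving values in [-0.5, 0.5].
    return (face.astype(np.float32) - 128.0) / 256.0

img = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
kps = np.random.rand(72, 2) * [640, 480]
face = preprocess_face(img, kps)
print(face.min(), face.max())  # both within [-0.5, 0.5]
```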
Through the embodiment, the relevant data of the preprocessed three-dimensional face image can be limited in the preset range, so that the influence of singular data in the relevant data on the finally generated living body recognition result is eliminated, and the accuracy of the living body recognition result is further improved.
In one embodiment, the plurality of images to be processed includes a first image to be processed, a second image to be processed, and a third image to be processed. The first image to be processed corresponds to the pose of the target object in the state that the face turns left by a first preset angle, the second image to be processed corresponds to the pose of the target object in the state that the face faces right ahead, and the third image to be processed corresponds to the pose of the target object in the state that the face turns right by a second preset angle.
Illustratively, in the first image to be processed, the face of the target object is turned 45 degrees to the left with respect to the image capturing device; in the second image to be processed, the face of the target object faces the image acquisition device; in the third image to be processed, the face of the target object turns to the right by 45 degrees relative to the image acquisition device.
By the implementation mode, the plurality of images to be processed respectively acquire the states of the target object in different poses, so that the three-dimensional face image generated by the three-dimensional image generation model can better conform to the real face image of the target object, the accuracy and precision of the three-dimensional face image are improved, and the attack image identification in the subsequent living body identification process is facilitated.
As another aspect of the embodiments of the present disclosure, a training method for a three-dimensional image generation model is also provided.
FIG. 5 illustrates a method of training a three-dimensional image generation model according to an embodiment of the present disclosure.
As shown in fig. 5, the training method of the three-dimensional image generation model includes the following steps:
s501: determining a target three-dimensional face image by using the sample image;
s502: inputting a sample image into a three-dimensional image generation model to be trained to obtain a predicted three-dimensional face image;
s503: and determining the difference between the target three-dimensional face image and the predicted three-dimensional face image, and training the three-dimensional image generation model to be trained according to the difference until the difference is within an allowable range.
Illustratively, in step S501, the sample image may be obtained by respectively acquiring a plurality of images of the living object in different poses and images of the attack object in different poses, wherein the attack object may be a photograph or a video of the living object. The target three-dimensional face image can be obtained by labeling the sample image.
Illustratively, in step S503, the difference between the target three-dimensional face image and the predicted three-dimensional face image may be determined by calculating a loss function of each pixel value between the two images.
According to the method of the embodiments of the present disclosure, training of the three-dimensional image generation model is realized; by taking images of the living body object or the attack object in different poses as sample images, the trained three-dimensional image generation model achieves higher accuracy and generalization.
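A hedged sketch of this training loop is shown below; the optimizer choice, learning rate and stopping tolerance are illustrative assumptions, and compute_difference stands in for the difference described in the later embodiments:

```python
import torch

def train_generator(gen_model, data_loader, compute_difference,
                    tolerance=1e-3, max_epochs=50):
    """Train the 3D image generation model until the difference is within tolerance."""
    optimizer = torch.optim.Adam(gen_model.parameters(), lr=1e-4)
    for epoch in range(max_epochs):
        epoch_diff = 0.0
        for sample_image, target_face_3d in data_loader:
            predicted_face_3d = gen_model(sample_image)
            difference = compute_difference(target_face_3d, predicted_face_3d)
            optimizer.zero_grad()
            difference.backward()
            optimizer.step()
            epoch_diff += difference.item()
        if epoch_diff / len(data_loader) < tolerance:  # difference within allowable range
            break
    return gen_model
```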
As shown in fig. 6, in one embodiment, step S502 includes:
s601: inputting a sample image into a feature extraction layer in a three-dimensional image generation model to be trained, and receiving a predicted three-dimensional face image from an image generation layer in the three-dimensional image generation model to be trained;
wherein the feature extraction layer is configured to extract parameter information of the sample image and input the parameter information of the sample image into the image generation layer; the image generation layer is configured to receive parameter information of the sample image and output a predicted three-dimensional face image.
Illustratively, the feature extraction layer may employ ResNet 34. By adopting ResNet34 as a feature extraction layer, the accuracy of parameter information extraction is improved, and the training efficiency of the three-dimensional face image generation model is improved.
The image generation layer constructs a three-dimensional face contour based on the parameter information extracted from the sample image by the feature extraction layer, and renders the three-dimensional face contour to finally obtain the predicted three-dimensional face image.
Through the embodiment, the trained three-dimensional image generation model can generate the three-dimensional face image of the target object based on the parameter information by extracting the parameter information of each image to be processed. Therefore, the accuracy and precision of the three-dimensional face image output by the three-dimensional image generation model are improved.
As shown in fig. 7, in one embodiment, step S501 includes:
s701: determining face key point data in the sample image based on the sample image;
s702: carrying out segmentation processing and mask processing on the sample image based on the face key point data to obtain a face region image;
s703: and preprocessing the face region image to obtain a target three-dimensional face image.
Exemplarily, coordinate values of the face key points in the sample image are detected using a face key point detection model pre-trained on a key point data set, the coordinate range of the face frame corresponding to the face region is determined from these coordinate values, and the face region of the sample image is subjected to segmentation processing and mask processing according to the coordinate range of the face frame to obtain a face region image.
For example, the face key point data set may be the ibug key point data set, which defines 68 face key point coordinates (x1, y1) … (x68, y68). From these, the two key points with the largest and smallest horizontal coordinates and the two key points with the largest and smallest vertical coordinates are selected, the coordinate range of the face frame corresponding to the face region is determined from these four points, and segmentation processing and mask processing are performed based on this coordinate range to obtain a face region template image. Finally, the face region template image is enlarged threefold, the face frame is cropped a second time from the enlarged image, and the resulting image is taken as the face region image.
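One plausible reading of the segmentation and mask step is sketched below with OpenCV; the convex-hull masking is an assumption for illustration, since the disclosure does not specify the exact masking operation:

```python
import cv2
import numpy as np

def mask_face_region(image, keypoints):
    """image: (H, W, 3) uint8; keypoints: (68, 2) array of ibug-style points.

    Builds a binary mask from the convex hull of the face key points and
    zeroes out everything outside it, approximating the segmentation + mask step.
    """
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    hull = cv2.convexHull(keypoints.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 255)
    masked = cv2.bitwise_and(image, image, mask=mask)

    # Face frame from the extreme key point coordinates, as described above.
    x, y, w, h = cv2.boundingRect(hull)
    return masked[y:y + h, x:x + w]
```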
By the embodiment, the face region image can be accurately extracted from the sample image and used as the input of the three-dimensional image generation model to be trained, so that the influence of singular data in the sample image data on the training process is eliminated.
As shown in fig. 8, in one embodiment, step S703 includes:
s801: and carrying out alignment processing, cutting processing, size adjustment processing and image normalization processing on the face region image to obtain a target three-dimensional face image.
For example, in step S801, using a pre-established high-definition face data set (FFHQ), the face region image may be aligned and cropped according to its matching result in that data set. The face region image is then resized to 224 x 224.
In addition, the normalization processing is performed on the face region image, the calculation processing can be performed according to the preset rule on the pixel value of each pixel in the face region image, and the three-dimensional face image after the preprocessing is determined based on the calculation result of each pixel value.
For example, 128 is subtracted from each pixel value in the face region image and the difference is divided by 256, so that the resulting value of each pixel lies in the range [-0.5, 0.5].
Through the embodiment, the determined target three-dimensional face image can be ensured to have higher accuracy, the manual labeling cost for manual labeling is saved, and the acquisition difficulty of the target three-dimensional face image is reduced.
As shown in fig. 9, in one embodiment, step S503 includes:
s901: calculating a first loss value based on the pixel value of the predicted three-dimensional face image and the pixel value of the target three-dimensional face image;
s902: calculating a second loss value based on the key point coordinate value of the predicted three-dimensional face image and the key point coordinate value of the sample image;
s903: and summing the first loss value and the second loss value to obtain the difference between the target three-dimensional face image and the predicted three-dimensional face image.
Exemplarily, in step S901, calculating the first loss value specifically includes: rendering the predicted three-dimensional face image to obtain a corresponding two-dimensional image, subtracting the pixel values of the sample image from the pixel values of the two-dimensional image element by element, multiplying the resulting pixel differences element by element with the pixel values of the face region image, and summing the absolute values of the products over all pixels to obtain the first loss value.
For example, in step S902, the key point coordinates of the predicted three-dimensional face image may be determined using the key point data set, and an L1 loss between the key point coordinates of the predicted face image and the key point coordinates of the sample image is then calculated to obtain the second loss value.
Exemplarily, in step S903, the first loss value, the second loss value, and regularization terms on parameters such as the shape parameter, expression parameter, texture parameter and pose parameter of the sample image are summed to obtain the difference between the target three-dimensional face image and the predicted three-dimensional face image.
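A sketch of this difference computation is given below, assuming the rendered prediction, the sample image, a face-region mask, the key point coordinates and the predicted coefficient vector are already available as tensors; the regularization weight is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def generation_difference(rendered_pred, sample_image, face_mask,
                          pred_keypoints, sample_keypoints,
                          params, reg_weight=1e-4):
    """rendered_pred / sample_image: (B, 3, H, W); face_mask: (B, 1, H, W) in {0, 1};
    pred_keypoints / sample_keypoints: (B, K, 2); params: predicted coefficient vector."""
    # First loss: per-pixel absolute difference between the rendered prediction
    # and the sample image, weighted by the face region and summed.
    first_loss = ((rendered_pred - sample_image).abs() * face_mask).sum()

    # Second loss: L1 distance between predicted and sample key point coordinates.
    second_loss = F.l1_loss(pred_keypoints, sample_keypoints, reduction="sum")

    # Regularization on the shape/expression/texture/pose coefficients.
    regularization = reg_weight * params.pow(2).sum()

    return first_loss + second_loss + regularization
```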
By the embodiment, the difference between the target three-dimensional face image and the predicted three-dimensional face image can be accurately solved, and the optimization effect of the three-dimensional image generation model in the training process is improved.
According to another aspect of the embodiments of the present disclosure, there is also provided an apparatus for living body identification.
As shown in fig. 10, the apparatus for living body identification includes:
a to-be-processed image obtaining module 1001, configured to obtain a plurality of to-be-processed images of a target object, where each of the to-be-processed images corresponds to a different pose of the target object;
the three-dimensional face image generation module 1002 is configured to input a plurality of images to be processed into a pre-trained three-dimensional image generation model to obtain a three-dimensional face image of a target object;
and the living body recognition module 1003 is configured to obtain a living body recognition result of the target object based on the three-dimensional face image.
In one embodiment, the three-dimensional face image generation module 1002 is further configured to:
inputting a plurality of images to be processed into a feature extraction layer in a three-dimensional image generation model, and receiving a three-dimensional face image of a target object from an image generation layer in the three-dimensional image generation model;
the characteristic extraction layer is configured to extract parameter information of each image to be processed and input the parameter information of the image to be processed into the image generation layer; the image generation layer is configured to receive parameter information of an image to be processed and output a three-dimensional face image of a target object.
In one embodiment, the parameter information includes at least one of a shape parameter, an expression parameter, a texture parameter, and at least one of a pose parameter, a lighting parameter, and a camera parameter; the image generation layer is configured to:
receiving the parameter information of the image to be processed, and obtaining a three-dimensional contour image of the target object based on at least one of the shape parameter, the expression parameter and the texture parameter; and
rendering the three-dimensional contour image based on at least one of the pose parameter, the illumination parameter and the camera parameter to obtain a three-dimensional face image of the target object.
In one embodiment, the living body identification module 1003 includes:
the preprocessing submodule is used for preprocessing the three-dimensional face image to obtain a preprocessed three-dimensional face image;
and the living body recognition submodule is used for inputting the preprocessed three-dimensional face image into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
In one embodiment, the pre-processing sub-module comprises:
the face area determining unit is used for determining a face area in the three-dimensional face image based on the three-dimensional face image;
a face region image determining unit, configured to extract, based on the face region, a partial image of the face region in the three-dimensional face image as a face region image;
and the image normalization processing unit is used for carrying out image normalization processing on the face area image to obtain a preprocessed three-dimensional face image.
In one embodiment, the plurality of images to be processed includes a first image to be processed, a second image to be processed, and a third image to be processed;
the first image to be processed corresponds to the pose of the target object in the state that the face turns left by a first preset angle, the second image to be processed corresponds to the pose of the target object in the state that the face faces right ahead, and the third image to be processed corresponds to the pose of the target object in the state that the face turns right by a second preset angle.
According to another aspect of the present disclosure, a training apparatus for generating a model from a three-dimensional image is also provided.
As shown in fig. 11, the training device for a three-dimensional image generation model includes:
a target three-dimensional face image determining module 1101, configured to determine a target three-dimensional face image by using the sample image;
a predicted three-dimensional face image generation module 1102, configured to input a sample image into a three-dimensional image generation model to be trained, so as to obtain a predicted three-dimensional face image;
the training module 1103 is configured to determine a difference between the target three-dimensional face image and the predicted three-dimensional face image, and train the three-dimensional image generation model to be trained according to the difference until the difference is within an allowable range.
In one embodiment, the predictive three-dimensional face image generation module 1102 is further configured to:
inputting a sample image into a feature extraction layer in a three-dimensional image generation model to be trained, and receiving a predicted three-dimensional face image from an image generation layer in the three-dimensional image generation model to be trained;
wherein the feature extraction layer is configured to extract parameter information of the sample image and input the parameter information of the sample image into the image generation layer; the image generation layer is configured to receive parameter information of the sample image and output a predicted three-dimensional face image.
In one embodiment, the target three-dimensional face image determination module 1101 includes:
the face key point data determining submodule is used for determining face key point data in the sample image based on the sample image;
the face region image determining submodule is used for carrying out segmentation processing and mask processing on the sample image based on the face key point data to obtain a face region image;
and the target three-dimensional face image determining submodule is used for preprocessing the face region image to obtain a target three-dimensional face image.
In one embodiment, the target three-dimensional face image determination sub-module is further configured to:
and carrying out alignment processing, cutting processing, size adjustment processing and image normalization processing on the face region image to obtain a target three-dimensional face image.
In one embodiment, training module 1103 includes:
the first loss value calculation operator module is used for calculating a first loss value based on the pixel value of the predicted three-dimensional face image and the pixel value of the target three-dimensional face image;
the second loss value calculation operator module is used for calculating a second loss value based on the key point coordinate value of the predicted three-dimensional face image and the key point coordinate value of the sample image;
and the difference calculation submodule is used for summing the first loss value and the second loss value to obtain the difference between the target three-dimensional face image and the predicted three-dimensional face image.
The functions of each unit, module or sub-module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method embodiments, and are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 12 shows a schematic block diagram of an example electronic device 1200, which can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic apparatus 1200 includes a computing unit 1201, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the electronic apparatus 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to bus 1204.
Various components in the electronic device 1200 are connected to the I/O interface 1205, including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the electronic device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 1201 performs the respective methods and processes described above, such as a method for living body recognition and/or a training method of a three-dimensional image generation model. For example, in some embodiments, the method for living body recognition and/or the training method for the three-dimensional image generation model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the method for living body recognition and/or the training method of the three-dimensional image generation model described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured by any other suitable means (e.g. by means of firmware) to perform the method for living body recognition and/or the training method of the three-dimensional image generation model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special purpose or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, a special purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall be included in the scope of protection of the present disclosure.
Claims (25)
1. A method for living body identification, comprising:
acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to a different pose of the target object;
inputting a plurality of images to be processed into a pre-trained three-dimensional image generation model to obtain a three-dimensional face image of the target object;
and obtaining a living body recognition result of the target object based on the three-dimensional face image.
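By way of a non-limiting illustration, the sketch below strings the three steps of claim 1 together in PyTorch-style Python. The objects `generation_model` and `liveness_model`, the sigmoid readout, and the 0.5 decision threshold are assumptions introduced for the example; they are not specified by the claim.

```python
# Illustrative sketch only: model objects, sigmoid readout, and threshold are assumptions.
import torch

def recognize_liveness(images_to_process, generation_model, liveness_model, threshold=0.5):
    """images_to_process: list of 3xHxW tensors, one per pose of the target object."""
    batch = torch.stack(images_to_process)       # multiple poses of the target object in one batch
    with torch.no_grad():
        face_3d = generation_model(batch)        # pre-trained three-dimensional image generation model
        score = torch.sigmoid(liveness_model(face_3d)).item()
    return score > threshold                     # living body recognition result
```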
2. The method of claim 1, wherein the inputting the plurality of images to be processed into a pre-trained three-dimensional image generation model to obtain a three-dimensional face image of the target object comprises:
inputting a plurality of images to be processed into a feature extraction layer in the three-dimensional image generation model, and receiving a three-dimensional face image of the target object from an image generation layer in the three-dimensional image generation model;
wherein the feature extraction layer is configured to extract parameter information of each of the images to be processed and input the parameter information of the images to be processed into the image generation layer; the image generation layer is configured to receive parameter information of the image to be processed and output a three-dimensional face image of the target object.
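A schematic PyTorch module matching the two-stage structure of claim 2 is sketched below. Only the split into a feature extraction layer and an image generation layer comes from the claim; the convolutional backbone, the 257-dimensional parameter vector, and the 112x112 output size are illustrative assumptions.

```python
# Sketch under assumptions: backbone, parameter dimension, and output size are placeholders;
# only the feature-extraction / image-generation split follows claim 2.
import torch
import torch.nn as nn

class ThreeDImageGenerationModel(nn.Module):
    def __init__(self, param_dim=257, image_size=112):
        super().__init__()
        # Feature extraction layer: images to be processed -> parameter information
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, param_dim),
        )
        # Image generation layer: parameter information -> three-dimensional face image
        self.image_generation = nn.Sequential(
            nn.Linear(param_dim, 3 * image_size * image_size),
            nn.Unflatten(1, (3, image_size, image_size)),
        )

    def forward(self, images_to_process):
        parameters = self.feature_extraction(images_to_process)  # per-image parameter information
        return self.image_generation(parameters)                 # three-dimensional face image
```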
3. The method of claim 2, wherein the parameter information includes at least one of a shape parameter, an expression parameter, and a texture parameter, and at least one of a pose parameter, an illumination parameter, and a camera parameter;
the image generation layer is configured to:
receiving parameter information of the image to be processed, and obtaining a three-dimensional contour image of the target object based on at least one of the shape parameter, the expression parameter, and the texture parameter; and
rendering the three-dimensional contour image based on at least one of the pose parameter, the illumination parameter, and the camera parameter to obtain the three-dimensional face image of the target object.
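To make the two rendering stages of claim 3 concrete, the sketch below follows a generic 3D morphable-model recipe: geometry and albedo are linear in the shape, expression, and texture parameters, and the result is then rotated, shaded, and projected. The basis matrices, the crude position-based shading (a stand-in for normal-based lighting), and the pinhole camera are assumptions, not details taken from the patent.

```python
# Generic 3DMM-style sketch; basis matrices, shading, and camera model are assumptions.
import numpy as np

def image_generation_layer(shape_p, expr_p, tex_p, pose_p, light_p, cam_p,
                           mean_shape, shape_basis, expr_basis, tex_basis):
    # Three-dimensional contour image: geometry and albedo from shape/expression/texture parameters
    vertices = (mean_shape + shape_basis @ shape_p + expr_basis @ expr_p).reshape(-1, 3)
    albedo = (tex_basis @ tex_p).reshape(-1, 3)

    # Rendering step 1: rotate by the pose parameters (yaw, pitch, roll)
    yaw, pitch, roll = pose_p
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)], [0, 1, 0], [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(pitch), -np.sin(pitch)], [0, np.sin(pitch), np.cos(pitch)]])
    Rz = np.array([[np.cos(roll), -np.sin(roll), 0], [np.sin(roll), np.cos(roll), 0], [0, 0, 1]])
    vertices = vertices @ (Rz @ Rx @ Ry).T

    # Rendering step 2: shade with the illumination parameters (one light direction + ambient term)
    direction, ambient = light_p[:3], light_p[3]
    shading = np.clip(vertices @ direction, 0.0, None) + ambient
    colors = albedo * shading[:, None]

    # Rendering step 3: project with the camera parameters (focal length, principal point)
    focal, cx, cy = cam_p
    projected = vertices[:, :2] * (focal / (vertices[:, 2:3] + 1e-8)) + np.array([cx, cy])
    return projected, colors
```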
4. The method of claim 1, wherein the obtaining a living body recognition result of the target object based on the three-dimensional face image comprises:
preprocessing the three-dimensional face image to obtain a preprocessed three-dimensional face image;
and inputting the preprocessed three-dimensional face image into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
5. The method of claim 4, wherein the preprocessing the three-dimensional face image to obtain a preprocessed three-dimensional face image comprises:
determining a face region in the three-dimensional face image based on the three-dimensional face image;
extracting a partial image of the face region in the three-dimensional face image as a face region image based on the face region;
and carrying out image normalization processing on the face region image to obtain a preprocessed three-dimensional face image.
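A minimal sketch of claim 5's three preprocessing steps is given below, using OpenCV's bundled Haar cascade as a stand-in face detector and a [-1, 1] normalization; neither the detector nor the normalization scheme is specified by the claim.

```python
# Stand-in detector and normalization; only the three-step flow follows claim 5.
import cv2
import numpy as np

def preprocess_three_dimensional_face(face_image, size=112):
    """face_image: the rendered three-dimensional face image as an HxWx3 uint8 array."""
    gray = cv2.cvtColor(face_image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None                                  # no face region determined
    x, y, w, h = boxes[0]                            # determine the face region
    face_region = face_image[y:y + h, x:x + w]       # extract the face region image
    face_region = cv2.resize(face_region, (size, size))
    return (face_region.astype(np.float32) / 255.0 - 0.5) / 0.5  # image normalization
```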
6. The method according to any one of claims 1 to 5, wherein the plurality of images to be processed comprises a first image to be processed, a second image to be processed and a third image to be processed;
the first image to be processed corresponds to a pose of the target object in which the face is turned left by a first preset angle, the second image to be processed corresponds to a pose in which the face faces directly forward, and the third image to be processed corresponds to a pose in which the face is turned right by a second preset angle.
7. A method of training a three-dimensional image generation model, comprising:
determining a target three-dimensional face image by using the sample image;
inputting the sample image into a three-dimensional image generation model to be trained to obtain a predicted three-dimensional face image;
and determining the difference between the target three-dimensional face image and the predicted three-dimensional face image, and training the three-dimensional image generation model to be trained according to the difference until the difference is within an allowable range.
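A minimal PyTorch-style training loop for claim 7 is sketched below. The L1 difference, the Adam optimizer, and the stopping tolerance are assumptions; the claim only requires training until the difference between the target and predicted three-dimensional face images falls within an allowable range.

```python
# Sketch under assumptions: loss, optimizer, and tolerance are placeholders.
import torch

def train_generation_model(model, data_loader, build_target_face, epochs=10, tolerance=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for sample_images in data_loader:
            target_face = build_target_face(sample_images)    # target three-dimensional face image
            predicted_face = model(sample_images)             # predicted three-dimensional face image
            difference = torch.nn.functional.l1_loss(predicted_face, target_face)
            optimizer.zero_grad()
            difference.backward()
            optimizer.step()
            if difference.item() < tolerance:                 # difference within the allowable range
                return model
    return model
```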
8. The method of claim 7, wherein the inputting the sample image into a three-dimensional image generation model to be trained to obtain a predicted three-dimensional face image comprises:
inputting the sample image into a feature extraction layer in a three-dimensional image generation model to be trained, and receiving a predicted three-dimensional face image from an image generation layer in the three-dimensional image generation model to be trained;
wherein the feature extraction layer is configured to extract parameter information of the sample image and input the parameter information of the sample image into the image generation layer; the image generation layer is configured to receive parameter information of the sample image and output the predicted three-dimensional face image.
9. The method of claim 7, wherein determining the target three-dimensional face image using the sample image comprises:
determining face key point data in the sample image based on the sample image;
carrying out segmentation processing and mask processing on the sample image based on the face key point data to obtain a face region image;
and preprocessing the face region image to obtain a target three-dimensional face image.
10. The method of claim 9, wherein the preprocessing the face region image to obtain a target three-dimensional face image comprises:
and carrying out alignment processing, cropping processing, size adjustment processing, and image normalization processing on the face region image to obtain a target three-dimensional face image.
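One common way to realize claim 10's alignment, cropping, resizing, and normalization in a single pass is a similarity warp onto a fixed five-keypoint template, sketched below with OpenCV. The 112x112 template coordinates are a widely used convention, not values from the patent, and the function assumes five facial keypoints are already available.

```python
# Hypothetical five-point template and warp; only the four operations follow claim 10.
import cv2
import numpy as np

REFERENCE_5PTS = np.array([[38.29, 51.70], [73.53, 51.50], [56.03, 71.74],
                           [41.55, 92.37], [70.73, 92.20]], dtype=np.float32)

def align_crop_resize_normalize(face_region_image, keypoints_5, size=112):
    src = np.asarray(keypoints_5, dtype=np.float32)
    matrix, _ = cv2.estimateAffinePartial2D(src, REFERENCE_5PTS)       # alignment (similarity transform)
    aligned = cv2.warpAffine(face_region_image, matrix, (size, size))  # cropping + resizing in one warp
    return (aligned.astype(np.float32) / 255.0 - 0.5) / 0.5            # image normalization
```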
11. The method of claim 8, wherein the determining the difference between the target three-dimensional face image and the predicted three-dimensional face image comprises:
calculating a first loss value based on the pixel values of the predicted three-dimensional face image and the pixel values of the target three-dimensional face image;
calculating a second loss value based on the key point coordinate value of the predicted three-dimensional face image and the key point coordinate value of the sample image;
and summing the first loss value and the second loss value to obtain the difference between the target three-dimensional face image and the predicted three-dimensional face image.
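The difference in claim 11 is the sum of a pixel-value loss and a keypoint-coordinate loss, as in the short sketch below; using L1 for both terms is an assumption, since the claim only specifies what each loss compares and that the two are summed.

```python
# L1 for both terms is an assumption; the sum structure follows claim 11.
import torch

def combined_difference(predicted_face, target_face, predicted_keypoints, sample_keypoints):
    first_loss = torch.nn.functional.l1_loss(predicted_face, target_face)             # pixel values
    second_loss = torch.nn.functional.l1_loss(predicted_keypoints, sample_keypoints)  # keypoint coordinates
    return first_loss + second_loss                                                   # summed difference
```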
12. An apparatus for living body identification, comprising:
the system comprises a to-be-processed image acquisition module, a target object detection module and a target object processing module, wherein the to-be-processed image acquisition module is used for acquiring a plurality of to-be-processed images of a target object, and each to-be-processed image corresponds to different poses of the target object;
the three-dimensional face image generation module is used for inputting a plurality of images to be processed into a pre-trained three-dimensional image generation model to obtain a three-dimensional face image of the target object;
and the living body recognition module is used for obtaining a living body recognition result of the target object based on the three-dimensional face image.
13. The apparatus of claim 12, wherein the three-dimensional face image generation module is further configured to:
inputting a plurality of images to be processed into a feature extraction layer in the three-dimensional image generation model, and receiving a three-dimensional face image of the target object from an image generation layer in the three-dimensional image generation model;
wherein the feature extraction layer is configured to extract parameter information of each of the images to be processed and input the parameter information of the images to be processed into the image generation layer; the image generation layer is configured to receive parameter information of the image to be processed and output a three-dimensional face image of the target object.
14. The apparatus of claim 13, wherein the parameter information includes at least one of a shape parameter, an expression parameter, and a texture parameter, and at least one of a pose parameter, an illumination parameter, and a camera parameter;
the image generation layer is configured to:
receiving parameter information of the image to be processed, and obtaining a three-dimensional contour image of the target object based on at least one of the shape parameter, the expression parameter, and the texture parameter; and
rendering the three-dimensional contour image based on at least one of the pose parameter, the illumination parameter, and the camera parameter to obtain the three-dimensional face image of the target object.
15. The apparatus of claim 12, wherein the living body identification module comprises:
the preprocessing submodule is used for preprocessing the three-dimensional face image to obtain a preprocessed three-dimensional face image;
and the living body recognition submodule is used for inputting the preprocessed three-dimensional face image into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
16. The apparatus of claim 15, wherein the pre-processing sub-module comprises:
the face area determining unit is used for determining a face area in the three-dimensional face image based on the three-dimensional face image;
a face region image determining unit, configured to extract, based on the face region, a partial image of the face region in the three-dimensional face image as a face region image;
and the image normalization processing unit is used for carrying out image normalization processing on the face region image to obtain a preprocessed three-dimensional face image.
17. The apparatus according to any one of claims 12 to 16, wherein the plurality of images to be processed includes a first image to be processed, a second image to be processed, and a third image to be processed;
the first image to be processed corresponds to a pose of the target object in which the face is turned left by a first preset angle, the second image to be processed corresponds to a pose in which the face faces directly forward, and the third image to be processed corresponds to a pose in which the face is turned right by a second preset angle.
18. A training apparatus for a three-dimensional image generation model, comprising:
the target three-dimensional face image determining module is used for determining a target three-dimensional face image by utilizing the sample image;
the predicted three-dimensional face image generation module is used for inputting the sample image into a three-dimensional image generation model to be trained to obtain a predicted three-dimensional face image;
and the training module is used for determining the difference between the target three-dimensional face image and the predicted three-dimensional face image, and training the three-dimensional image generation model to be trained according to the difference until the difference is within an allowable range.
19. The apparatus of claim 18, wherein the predictive three-dimensional face image generation module is further configured to:
inputting the sample image into a feature extraction layer in a three-dimensional image generation model to be trained, and receiving a predicted three-dimensional face image from an image generation layer in the three-dimensional image generation model to be trained;
wherein the feature extraction layer is configured to extract parameter information of the sample image and input the parameter information of the sample image into the image generation layer; the image generation layer is configured to receive parameter information of the sample image and output the predicted three-dimensional face image.
20. The apparatus of claim 18, wherein the target three-dimensional face image determination module comprises:
the face key point data determining submodule is used for determining face key point data in the sample image based on the sample image;
a face region image determining submodule, configured to perform segmentation processing and mask processing on the sample image based on the face key point data to obtain a face region image;
and the target three-dimensional face image determining submodule is used for preprocessing the face region image to obtain a target three-dimensional face image.
21. The apparatus of claim 20, wherein the target three-dimensional face image determination sub-module is further configured to:
and carrying out alignment processing, cropping processing, size adjustment processing, and image normalization processing on the face region image to obtain a target three-dimensional face image.
22. The apparatus of claim 19, wherein the training module comprises:
a first loss value calculation submodule, configured to calculate a first loss value based on the pixel values of the predicted three-dimensional face image and the pixel values of the target three-dimensional face image;
a second loss value calculation submodule, configured to calculate a second loss value based on the key point coordinate values of the predicted three-dimensional face image and the key point coordinate values of the sample image;
and the difference calculation submodule is used for summing the first loss value and the second loss value to obtain the difference between the target three-dimensional face image and the predicted three-dimensional face image.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 11.
24. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110557990.9A CN113255511A (en) | 2021-05-21 | 2021-05-21 | Method, apparatus, device and storage medium for living body identification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113255511A true CN113255511A (en) | 2021-08-13 |
Family
ID=77183640
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110557990.9A Pending CN113255511A (en) | 2021-05-21 | 2021-05-21 | Method, apparatus, device and storage medium for living body identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255511A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740775A (en) * | 2016-01-25 | 2016-07-06 | 北京天诚盛业科技有限公司 | Three-dimensional face living body recognition method and device |
CN106503671A (en) * | 2016-11-03 | 2017-03-15 | 厦门中控生物识别信息技术有限公司 | The method and apparatus for determining human face posture |
CN108062544A (en) * | 2018-01-19 | 2018-05-22 | 百度在线网络技术(北京)有限公司 | For the method and apparatus of face In vivo detection |
WO2020037680A1 (en) * | 2018-08-24 | 2020-02-27 | 太平洋未来科技(深圳)有限公司 | Light-based three-dimensional face optimization method and apparatus, and electronic device |
US20200082160A1 (en) * | 2018-09-12 | 2020-03-12 | Kneron (Taiwan) Co., Ltd. | Face recognition module with artificial intelligence models |
CN110895678A (en) * | 2018-09-12 | 2020-03-20 | 耐能智慧股份有限公司 | Face recognition module and method |
CN110163953A (en) * | 2019-03-11 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Three-dimensional facial reconstruction method, device, storage medium and electronic device |
CN111354079A (en) * | 2020-03-11 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Three-dimensional face reconstruction network training and virtual face image generation method and device |
CN112613357A (en) * | 2020-12-08 | 2021-04-06 | 深圳数联天下智能科技有限公司 | Face measurement method, face measurement device, electronic equipment and medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114445898A (en) * | 2022-01-29 | 2022-05-06 | 北京百度网讯科技有限公司 | Face living body detection method, device, equipment, storage medium and program product |
CN114445898B (en) * | 2022-01-29 | 2023-08-29 | 北京百度网讯科技有限公司 | Face living body detection method, device, equipment, storage medium and program product |
TWI807851B (en) * | 2022-06-08 | 2023-07-01 | 中華電信股份有限公司 | A feature disentanglement system, method and computer-readable medium thereof for domain generalized face anti-spoofing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12062249B2 (en) | System and method for generating image landmarks | |
US11232286B2 (en) | Method and apparatus for generating face rotation image | |
JP7425147B2 (en) | Image processing method, text recognition method and device | |
CN113343826B (en) | Training method of human face living body detection model, human face living body detection method and human face living body detection device | |
Jiang et al. | Dual attention mobdensenet (damdnet) for robust 3d face alignment | |
US9747695B2 (en) | System and method of tracking an object | |
CN113591566A (en) | Training method and device of image recognition model, electronic equipment and storage medium | |
CN112967315B (en) | Target tracking method and device and electronic equipment | |
CN112784765A (en) | Method, apparatus, device and storage medium for recognizing motion | |
CN113221771A (en) | Living body face recognition method, living body face recognition device, living body face recognition equipment, storage medium and program product | |
CN113255511A (en) | Method, apparatus, device and storage medium for living body identification | |
CN112200056A (en) | Face living body detection method and device, electronic equipment and storage medium | |
US20230115765A1 (en) | Method and apparatus of transferring image, and method and apparatus of training image transfer model | |
CN116152334A (en) | Image processing method and related equipment | |
Sun et al. | Deep Evolutionary 3D Diffusion Heat Maps for Large-pose Face Alignment. | |
CN113255512B (en) | Method, apparatus, device and storage medium for living body identification | |
CN115116111B (en) | Anti-disturbance human face living body detection model training method and device and electronic equipment | |
CN116844133A (en) | Target detection method, device, electronic equipment and medium | |
CN111339973A (en) | Object identification method, device, equipment and storage medium | |
US20220180548A1 (en) | Method and apparatus with object pose estimation | |
JP2022141940A (en) | Method and apparatus for detecting live face, electronic device, and storage medium | |
CN114862716A (en) | Image enhancement method, device and equipment for face image and storage medium | |
CN115273184A (en) | Face living body detection model training method and device | |
Lin et al. | 6D object pose estimation with pairwise compatible geometric features | |
CN113205131A (en) | Image data processing method and device, road side equipment and cloud control platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||