CN113255512A - Method, apparatus, device and storage medium for living body identification


Info

Publication number: CN113255512A
Application number: CN202110558078.5A
Authority: CN (China)
Prior art keywords: key point, point information, image, processed, information set
Legal status: Granted; currently Active
Other languages: Chinese (zh)
Other versions: CN113255512B (en)
Inventors: 梁柏荣, 王珂尧
Original and current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110558078.5A; application granted and published as CN113255512B

Classifications

    • G06V 40/45: Detection of the body part being alive (under G06V 40/40, Spoof detection, e.g. liveness detection)
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045: Combinations of networks (neural network architectures)
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods for neural networks
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 40/168: Feature extraction; face representation (human faces)
    • G06V 40/172: Classification, e.g. identification (human faces)

Abstract

The present disclosure provides a method, an apparatus, a device and a storage medium for living body identification, which relate to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied in smart city and financial scenarios. The specific implementation scheme is as follows: acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to a different pose of the target object; inputting each image to be processed into a pre-trained key point detection model to obtain a key point information set of that image, wherein the key point information set comprises a plurality of pieces of key point information; and obtaining a living body identification result of the target object based on the key point information sets of the images to be processed. The disclosed scheme improves the accuracy of living body identification of the target object and improves the effectiveness and generalization of identification against complex and varied attack modes.

Description

Method, apparatus, device and storage medium for living body identification
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and can be applied in smart city and financial scenarios.
Background
Face liveness detection is a basic building block of a face recognition system and safeguards its security. Face liveness detection technology refers to a computer determining whether a detected face is a real face or a spoofed face attack, where an attack may be, for example, a photo of a legitimate user or a pre-recorded video. In the related art, face liveness detection algorithms based on deep learning are currently the mainstream approach in this field, with greatly improved accuracy compared with traditional algorithms. However, in some application scenarios they still suffer from poor generalization and poor detection of unknown attack modes, which affects their performance in practical applications.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for living body identification.
According to an aspect of the present disclosure, there is provided a method for living body identification, including:
acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to different poses of the target object;
inputting an image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of key point information;
and obtaining a living body identification result of the target object based on the key point information set of each image to be processed.
According to another aspect of the present disclosure, there is provided a method for training a keypoint detection model, including:
determining a target key point information set by using the sample image;
inputting a sample image into a key point detection model to be trained to obtain a prediction key point information set;
determining the difference between the target key point information set and the predicted key point information set, and training the key point detection model to be trained according to the difference until the difference is within an allowable range.
According to another aspect of the present disclosure, there is provided an apparatus for living body identification, including:
the image acquisition module is used for acquiring a plurality of images to be processed of the target object, and each image to be processed corresponds to different poses of the target object respectively;
the key point information set generating module is used for inputting the image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, and the key point information set comprises a plurality of pieces of key point information;
and the living body identification result generation module is used for obtaining the living body identification result of the target object based on the key point information set of each image to be processed.
According to another aspect of the present disclosure, there is provided a training apparatus for a keypoint detection model, comprising:
the target key point information set determining module is used for determining a target key point information set by utilizing the sample image;
the prediction key point information set generation module is used for inputting a sample image into a key point detection model to be trained to obtain a prediction key point information set;
and the training module is used for determining the difference between the target key point information set and the prediction key point information set and training the key point detection model to be trained according to the difference until the difference is within an allowable range.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the technical scheme of the present disclosure, images to be processed of the target object in different poses are obtained, key point information sets of the images to be processed are obtained using a pre-trained key point detection model, and the living body identification result of the target object is determined based on these key point information sets.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method for living body identification according to an embodiment of the present disclosure;
fig. 2 is a detailed flowchart of obtaining a living body identification result in the method for living body identification according to the embodiment of the present disclosure;
FIG. 3 is a detailed flowchart of determining a key point array in a method for living body identification according to an embodiment of the present disclosure;
FIG. 4 is a detailed flowchart of determining a rotation matrix array in a method for living body identification according to an embodiment of the present disclosure;
fig. 5 is a detailed flowchart of obtaining a living body identification result based on the key point array and the rotation matrix array in the method for living body identification according to the embodiment of the present disclosure;
FIG. 6 is a detailed flow chart of preprocessing a key point array in a method for living body identification according to an embodiment of the present disclosure;
fig. 7 is a scene diagram in which the method for living body identification according to an embodiment of the present disclosure may be implemented;
FIG. 8 is a flow chart of a method of training a keypoint detection model according to an embodiment of the disclosure;
FIG. 9 is a detailed flow chart of determining a target keypoint information set in a training method according to an embodiment of the present disclosure;
FIG. 10 is a detailed flow chart of determining differences in a training method according to an embodiment of the present disclosure;
FIG. 11 is a schematic view of an apparatus for living body identification according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a training apparatus for a keypoint detection model according to an embodiment of the present disclosure;
fig. 13 is a block diagram of an electronic device for implementing a method for live body recognition and/or a method for training a keypoint detection model according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows a flowchart of a method for living body identification according to an embodiment of the present disclosure.
As shown in fig. 1, the method for living body identification specifically includes the following steps:
s101: acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to different poses of the target object;
s102: inputting an image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of key point information;
s103: and obtaining a living body identification result of the target object based on the key point information set of each image to be processed.
In the embodiment of the present disclosure, the target object may be the object to be discriminated in a specific scene, and the living body recognition result of the target object may indicate whether the carrier of the target object is the living target object itself. For example, in a face recognition scene, the target object may be a face, and the living body recognition result of the target object may be that the carrier of the target object is a living body or an attack, where an attack refers to a picture or a video containing the target object.
The plurality of images to be processed of the target object can be understood as a plurality of images each containing the target object, where the pose of the target object differs between images; in other words, the target object is viewed from a different angle in each image to be processed. It should be noted that the pose of the target object is formed by its position along three degrees of freedom in three-dimensional space together with the spatial rotation about those three degrees of freedom. For example, in a face recognition scenario, the different poses of the target object may correspond to different orientations of the target object's face.
For example, in step S101, a plurality of images to be processed may be acquired by acquiring target objects in different poses through a terminal device. The terminal device may be various image capturing devices, such as a camera, a video camera, and the like.
In one example, the plurality of images to be processed may include a first image to be processed, a second image to be processed, and a third image to be processed. The first image corresponds to a pose in which the face of the target object is turned left by a first preset angle, the second image corresponds to a pose in which the face points straight ahead, and the third image corresponds to a pose in which the face is turned right by a second preset angle. In the first image to be processed, the face of the target object is turned 45 degrees to the left relative to the image acquisition device; in the second image to be processed, the face of the target object faces the image acquisition device; in the third image to be processed, the face of the target object is turned 45 degrees to the right relative to the image acquisition device.
For example, in step S102, the key point detection model may adopt any model for detecting key points of a human face known to those skilled in the art, now or in the future. For example, the keypoint detection model may adopt any one of CNN (Convolutional Neural Network), DCNN (Deep Convolutional Neural Network), TCDCN (Tasks-Constrained Deep Convolutional Network), MTCNN (Multi-Task Cascaded Convolutional Neural Network), TCNN (Tweaked Convolutional Neural Network), and DAN (Deep Alignment Network).
The key point information set of the image to be processed comprises a preset number of key point information, wherein each key point information can comprise coordinate values and pixel values of key points.
In one example, the keypoint detection model includes a feature extraction layer and a fully connected layer. The image to be processed is input into a feature extraction layer of a key point detection model, and then a key point information set of the image to be processed is received from a full connection layer. The feature extraction layer is configured to extract human face features in an image to be processed, the extracted human face feature information is transmitted to the full connection layer, the full connection layer (FC) is configured to receive the human face feature information sent by the feature extraction layer, the human face feature information is classified according to the extracted human face feature information, a preset number of pieces of key point information are obtained based on a classification result, and a key point information set is constructed and output based on the preset number of pieces of key point information.
More specifically, the feature extraction layer may employ ResNet34 (a deep residual network). ResNet34 uses a 34-layer residual network structure (Residual Unit) whose skip connections break the convention of traditional neural networks that the output of layer n-1 can only serve as the input of layer n: the output of a given layer can skip several layers and serve as the input of a later layer. This offers a new direction for the problem that stacking many layers causes the error rate of the whole model to rise rather than fall, so the depth of a neural network can exceed previous constraints and reach dozens, hundreds, or even thousands of layers, making the extraction and classification of high-level semantic features feasible. The accuracy of face feature extraction is thereby improved.
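As an illustration only, a minimal sketch of such a keypoint detection model is given below, assuming PyTorch and torchvision; the class name KeypointDetector, the 68-point output, and the 224 x 224 input follow the examples in this description but are otherwise assumptions, not the patented implementation.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet34

    class KeypointDetector(nn.Module):
        def __init__(self, num_keypoints: int = 68):
            super().__init__()
            backbone = resnet34(weights=None)
            # Keep the ResNet34 feature extraction layers; drop the ImageNet head.
            self.features = nn.Sequential(*list(backbone.children())[:-1])
            # Fully connected layer regressing (x, y, z) for each keypoint.
            self.fc = nn.Linear(512, num_keypoints * 3)
            self.num_keypoints = num_keypoints

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            feats = self.features(x).flatten(1)                    # (B, 512)
            return self.fc(feats).view(-1, self.num_keypoints, 3)  # (B, 68, 3)

    # One 224 x 224 face image in, one keypoint information set out.
    model = KeypointDetector()
    keypoints = model(torch.randn(1, 3, 224, 224))                 # shape (1, 68, 3)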
Illustratively, in step S103, a living body recognition result of the target object may be obtained by inputting the key point information sets corresponding to the plurality of images to be processed of the target object respectively into a pre-trained living body recognition model.
In one example, the living body recognition model employs a convolutional neural network model. Specifically, the convolutional neural network includes a feature extraction layer, fully connected layers (FC), and a normalization processing layer. The feature extraction layer performs feature extraction on the three-dimensional face image and feeds the extracted features to the fully connected layer; the fully connected layer classifies the three-dimensional face image according to the extracted features and feeds the classification result to the normalization processing layer, which produces and outputs the living body recognition result. The feature extraction layer may adopt MobileNetV2 (a network built on depthwise separable convolutions) as the backbone of the convolutional neural network; the normalization processing layer may employ a Softmax layer (a logistic regression model).
Illustratively, the method of the embodiment of the disclosure can be applied to smart cities or financial scenes, and particularly can be applied to a plurality of scenes such as security, attendance, financial payment verification, entrance guard passing and the like.
The method for living body identification according to the embodiment of the present disclosure is described below with reference to a specific application scenario.
In a face recognition scenario of financial payment verification, in response to a face recognition request, face images of a target object in different poses are acquired; for example, a plurality of face images with the face oriented at different horizontal angles, or a plurality of face images with the face oriented at different vertical angles. Then, the face images of the target object are input into a pre-trained key point detection model to obtain the key point information set corresponding to each face image. Finally, a living body recognition result of the target object is obtained based on the key point information sets using a pre-trained living body recognition model. If the living body recognition result of the target object is a living body, the next step of the face recognition process is carried out; if it is not a living body, the subsequent face recognition process is terminated.
Compared with in-vivo detection methods in the related art, which use only a single image of a target object as the input of a convolutional neural network, the method for living body identification of the embodiment of the present disclosure obtains a plurality of images of the target object in different poses, obtains the key point information sets of these images using a pre-trained key point detection model, and determines the living body identification result of the target object based on the key point information sets. For real scenes in which the pose of the target object varies widely, the method is therefore more robust. It helps to improve the accuracy of living body identification and the effectiveness and generalization of identification against complex and varied attack modes, and it effectively improves the technical performance of face liveness detection in application scenarios such as security, attendance, finance, and access control. This improves the application effect and user experience of the many applications built on face liveness detection technology and facilitates the further rollout of such services.
As shown in fig. 2, in one embodiment, step S103 includes:
s201: determining a key point array of a target object based on the key point information set of each image to be processed; determining a rotation matrix array of the target object based on the key point information set of each image to be processed;
s202: and obtaining a living body identification result of the target object based on the key point array and the rotation matrix array.
It should be noted that, for a plurality of images to be processed in which a target object is a living body, because the poses of the target object in different images to be processed are different, and the spatial coordinate systems based on the target object are different, spatial vectors of the same face key point of the target object in different images to be processed are different. For a plurality of identical face key points in any two images to be processed, the space vector rotation transformation relations of the face key points are consistent, wherein the rotation transformation relations can be represented by a rotation matrix. Based on this, the rotation matrix can be used to represent the rotation transformation relationship of each face key point of the target object in different images to be processed.
Illustratively, the key point array includes key point information of a plurality of images to be processed of the target object, and the rotation matrix array includes a rotation transformation relationship of coordinate values of the key point information in any two images to be processed. And inputting the key point array and the rotation matrix array into a pre-trained living body recognition model according to the key point coordinates contained in the key point information in each key point information set in the key point array and the rotation matrix between any two key point information sets in the rotation matrix array, thereby obtaining the living body recognition result of the target object.
According to the above embodiment, the key point array and the rotation matrix array of the target object are determined from the key point information sets of the images to be processed, and the living body identification result is then obtained according to whether the rotation transformation relations of the face key points of the target object across different images to be processed are consistent. This improves the accuracy of living body detection of the target object and improves the generalization and effectiveness of identifying electronic screen attacks.
As shown in fig. 3, in one embodiment, step S201 includes:
s301: and merging the key point information sets of the images to be processed to obtain a key point array of the target object.
Illustratively, for three images to be processed of the target object, 68 pieces of keypoint information of each image to be processed are respectively detected through the keypoint detection model to form a keypoint information set. The key point information set may represent coordinate values of the respective key point information by a vector set of 68 × 3, for example, the key point information set may be represented as { (x1, y1, z1), …, (x68, y68, z68) }. And merging the three key point information sets to obtain a 68 x 9 vector set as a key point array of the target object.
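As a concrete illustration of this merging step, the following minimal sketch (assuming NumPy; all variable names are hypothetical) concatenates three 68 x 3 keypoint sets into the 68 x 9 array described above:

    import numpy as np

    # Three keypoint information sets, one per pose, each of shape (68, 3).
    set_left, set_front, set_right = (np.random.rand(68, 3) for _ in range(3))

    # Merge along the coordinate axis: three (68, 3) sets -> one (68, 9) array.
    keypoint_array = np.concatenate([set_left, set_front, set_right], axis=1)
    assert keypoint_array.shape == (68, 9)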
According to the embodiment, the key point information sets of the images to be processed are combined to obtain the key point array, and the plurality of key point information sets are integrated on the basis of keeping the original key point information so as to meet the input requirement of the living body recognition model.
As shown in fig. 4, in one embodiment, step S201 further includes:
s401: constructing a plurality of to-be-processed image pairs, wherein each to-be-processed image pair comprises any two of the to-be-processed images of the target object;
s402: calculating a rotation matrix of each image pair to be processed based on the key point information set of the image to be processed;
s403: and combining the rotation matrixes of the multiple pairs of images to be processed to obtain a rotation matrix array of the target object.
Illustratively, each image pair to be processed includes two different images to be processed, and the plurality of image pairs may be obtained by repeatedly selecting and combining two different images from the plurality of images to be processed. For each image pair, a rotation matrix between the key point information sets of its two images is calculated from the coordinate values of the key point information they contain, and the rotation matrices corresponding to the multiple image pairs are merged to obtain the rotation matrix array of the target object.
For example, for three to-be-processed images of the target object, three to-be-processed image pairs can be obtained through combination. And respectively calculating rotation matrixes of the three to-be-processed image pairs to obtain three rotation matrixes with the size of 3 x 3. And combining the three rotation matrixes to obtain a rotation matrix array of 9 x 3.
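The text does not name the algorithm used to compute each rotation matrix; a common choice for the best-fit rotation between two corresponding 3D point sets is the Kabsch (SVD-based) method, so the sketch below uses it as an assumption, with itertools.combinations forming the image pairs:

    from itertools import combinations
    import numpy as np

    def rotation_matrix(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
        """Best-fit 3x3 rotation mapping src (N, 3) onto dst (N, 3) (Kabsch)."""
        src_c = src - src.mean(axis=0)
        dst_c = dst - dst.mean(axis=0)
        u, _, vt = np.linalg.svd(src_c.T @ dst_c)
        d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against reflections
        return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Three keypoint sets -> three image pairs -> three 3x3 rotation matrices.
    keypoint_sets = [np.random.rand(68, 3) for _ in range(3)]
    rotations = [rotation_matrix(a, b) for a, b in combinations(keypoint_sets, 2)]
    rotation_array = np.concatenate(rotations, axis=0)        # shape (9, 3)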
According to the above embodiment, by calculating the rotation matrix of each pair of images to be processed and combining the obtained rotation matrices to obtain the rotation matrix array of the target object, the rotation transformation relation data between different sets of keypoint information can be integrated and meet the input requirement of the living body identification model.
As shown in fig. 5, in one embodiment, step S202 includes:
s501: preprocessing the key point array to obtain a preprocessed key point array;
s502: and inputting the preprocessed key point array and the preprocessed rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
It should be noted that, in step S501, the key point array is preprocessed so that the preprocessed key point array meets the input requirement of the living body identification model and its data are simplified, thereby improving the detection efficiency and stability of the subsequent living body identification model.
For example, in step S502, the living body recognition model may adopt various models known to those skilled in the art or known in the future.
Taking as an example a living body identification model that adopts a convolutional neural network: the network includes a feature extraction layer, fully connected layers (FC), and a normalization processing layer. The feature extraction layer performs feature extraction on the key point array and the rotation matrix array and feeds the extracted features to the fully connected layer; the fully connected layer classifies the input according to the extracted features and feeds the classification result to the normalization processing layer, which produces the living body identification result. The feature extraction layer may adopt MobileNetV2 (a network built on depthwise separable convolutions) as the backbone of the convolutional neural network; the normalization processing layer may employ a Softmax layer (a logistic regression model).
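The description above names MobileNetV2 with a fully connected layer and a Softmax output, but does not specify how the (68, 9) keypoint array and (9, 3) rotation matrix array are fed to a CNN backbone; the sketch below therefore substitutes a small fully connected classifier over the flattened arrays, purely to make the input/output contract concrete (all names and sizes are assumptions):

    import torch
    import torch.nn as nn

    class LivenessClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            in_dim = 68 * 9 + 9 * 3   # flattened keypoint array + rotation matrix array
            self.net = nn.Sequential(
                nn.Linear(in_dim, 256),
                nn.ReLU(),
                nn.Linear(256, 2),    # two classes: living body vs. attack
            )

        def forward(self, keypoints: torch.Tensor, rotations: torch.Tensor):
            x = torch.cat([keypoints.flatten(1), rotations.flatten(1)], dim=1)
            return torch.softmax(self.net(x), dim=1)

    model = LivenessClassifier()
    probs = model(torch.randn(1, 68, 9), torch.randn(1, 9, 3))  # (1, 2) probabilities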
In addition, in the training process of the living body recognition model, a plurality of images of the living body object and the attack object may be respectively acquired as a plurality of living body image samples of the living body object and a plurality of attack image samples of the attack object. And respectively inputting the plurality of living body image samples and the plurality of attack image samples into a pre-trained key point detection model to obtain a plurality of key point information sets of the living body object and a plurality of key point information sets of the attack object. Obtaining a key point array and a rotation matrix array of the living object based on a plurality of key point information sets of the living object; and obtaining a key point array and a rotation matrix array of the attack object based on a plurality of key point information sets of the attack object. And respectively taking the key point array and the rotation matrix array of the living body object and the key point array and the rotation matrix array of the attack object as training samples to train the living body recognition model.
According to the above embodiment, the living body recognition result of the target object is obtained by inputting the key point array and the rotation matrix array into a pre-trained living body recognition model. On the one hand, this improves the precision and accuracy of the living body recognition result and the generalization of attack detection; on the other hand, it improves the efficiency of living body detection, which facilitates the further rollout of services that adopt the method of the embodiments of the present disclosure.
As shown in fig. 6, in one embodiment, the key point information includes coordinate values and pixel values corresponding to the coordinate values, and the step S501 includes:
s601: normalizing the pixel value of each key point information in the key point array to obtain a normalized pixel value of each key point information, wherein the numerical value of the normalized pixel value conforms to a preset interval;
s602: and obtaining a preprocessed key point array based on the normalized pixel value of each key point information.
For example, in step S601, a calculation process is performed according to a preset rule on a pixel value included in each piece of keypoint information, so as to obtain a normalized pixel value of each piece of keypoint information.
For example, for the pixel value included in each piece of key point information, 112 is subtracted from the value and the difference is divided by 224 to obtain the normalized pixel value, whose value range is [-0.5, 0.5].
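As a one-line sketch of this rule (assuming NumPy; note that the stated interval [-0.5, 0.5] holds for input values in [0, 224], such as coordinates in a 224 x 224 image):

    import numpy as np

    def normalize(values: np.ndarray) -> np.ndarray:
        # (v - 112) / 224 maps values in [0, 224] into [-0.5, 0.5].
        return (values - 112.0) / 224.0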
Through this embodiment, the values of the key point information in the preprocessed key point array are confined to a preset range, which removes the influence of outlier data on the final living body identification result and further improves its accuracy.
The method for living body identification according to the embodiment of the present disclosure is described below in one specific application scenario with reference to fig. 7.
As shown in fig. 7, the method for living body identification of the embodiment of the present disclosure may be applied to a living body detection scene for a human face. Firstly, a plurality of collected images of an object to be detected in different poses, for example, three collected images of the left side, the front side and the right side of the face of the object to be detected are obtained. And carrying out data preprocessing on the three acquired images so that the data of the three acquired images meet the input requirement of the human face key point detection model. And respectively inputting the three collected images into the face key point detection model to obtain a key point information set corresponding to each collected image. The key point information set comprises a plurality of key point information, and each key point information comprises a key point coordinate and a corresponding pixel value.
And then, based on the key point information sets of the three acquired images, obtaining a key point array through merging processing. And calculating a rotation matrix between any two key point information sets in the key point information sets of the three acquired images, and combining the obtained rotation matrices to obtain a rotation matrix array.
And finally, inputting the key point array and the rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the object to be detected.
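To tie the fig. 7 pipeline together, the following hedged sketch reuses the hypothetical components from the earlier sketches (KeypointDetector, rotation_matrix, LivenessClassifier); the live/attack class-index convention is likewise an assumption:

    from itertools import combinations
    import numpy as np
    import torch

    def recognize_liveness(images):  # three preprocessed face crops, (1, 3, 224, 224) each
        detector, classifier = KeypointDetector(), LivenessClassifier()
        with torch.no_grad():
            sets = [detector(img).squeeze(0).numpy() for img in images]   # 3 x (68, 3)
            kp_array = np.concatenate(sets, axis=1)                       # (68, 9)
            rots = [rotation_matrix(a, b) for a, b in combinations(sets, 2)]
            rot_array = np.concatenate(rots, axis=0)                      # (9, 3)
            probs = classifier(
                torch.from_numpy(kp_array).float().unsqueeze(0),
                torch.from_numpy(rot_array).float().unsqueeze(0),
            )
        return "living body" if probs.argmax(dim=1).item() == 0 else "attack"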
According to another aspect of the present disclosure, a method for training a keypoint detection model is also provided.
As shown in fig. 8, the method for training the keypoint detection model includes:
s801: determining a target key point information set by using the sample image;
s802: inputting a sample image into a key point detection model to be trained to obtain a prediction key point information set;
s803: and determining the difference between the target key point information set and the prediction key point information set, and training the key point detection model to be trained according to the difference until the difference is within an allowable range.
Illustratively, in step S801, a plurality of images of a living object and of an attack object in different poses may be acquired, and the resulting living body sample images and attack sample images taken as sample images, where the attack object may be a photo or a video of a living object.
For example, in step S803, the difference between the target keypoint information set and the predicted keypoint information set may be determined by calculating a loss function between the coordinate values of the predicted keypoint information in the predicted keypoint information set and the coordinate values of the target keypoint information in the target keypoint information set.
The method of the embodiment of the present disclosure thus realizes the training of the key point detection model. By taking multiple images of living objects or attack objects in different poses as sample images, the trained key point detection model can accurately detect the key point information sets of images of different target objects in different poses.
As shown in fig. 9, in one embodiment, step S801 includes:
s901: carrying out image extraction processing on the sample image to obtain a face region image;
s902: carrying out normalization processing on the face region image to obtain a normalized face region image;
s903: and obtaining a target key point information set according to a matching result of the normalized face area image and a pre-established key point data set.
Exemplarily, in step S901, a preset number of reference key points are first determined in the sample image using a pre-established face key point data set, and the sample image is masked based on these reference key points to obtain a templated representation of the sample image. Then, using a pre-established high-definition face data set (FFHQ), the face region is aligned, cropped, and resized according to the matching result of the sample image's templated representation in the high-definition face data set, yielding a face region image of size 224 x 224.
For example, in step S902, the pixel value of each pixel in the face region image is processed according to a preset rule to obtain its normalized pixel value. For example, the pixel value of each pixel in the face region image is divided by 255 and then reduced by 0.5, giving a normalized pixel value in the range [-0.5, 0.5].
For example, in step S903, the key point data set may be any key point data set known to those skilled in the art, now or in the future. The key point data set comprises a batch of high-definition images collected in advance, each of which includes pre-labeled key points. By matching the normalized face region image against the key point data set, the target key point data set of the sample image can be conveniently obtained.
Through this embodiment, the determined target key point data set can be ensured to have high accuracy, the labor cost of manual annotation is saved, and the difficulty of obtaining the target key point data set is reduced.
As shown in fig. 10, in one embodiment, step S803 includes:
s1001: determining target key point information corresponding to each piece of predicted key point information in the predicted key point information set based on the target key point information set and the predicted key point information set;
s1002: and calculating a loss value between the coordinate value of the predicted key point information and the coordinate value of the corresponding target key point information, and determining the difference between the target key point information set and the predicted key point information set according to the loss value.
For example, the loss value between each piece of predicted key point information and its corresponding target key point information may be obtained by calculating an L1 loss function between their coordinate values. The difference between the target key point information set and the predicted key point information set is then determined by averaging the loss values over all pieces of predicted key point information.
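A minimal sketch of this difference computation, assuming PyTorch, whose nn.L1Loss averages absolute errors over all predicted keypoint coordinates:

    import torch
    import torch.nn as nn

    loss_fn = nn.L1Loss()                    # mean absolute error over all elements
    predicted = torch.randn(1, 68, 3)        # predicted keypoint coordinates
    target = torch.randn(1, 68, 3)           # target keypoint coordinates
    difference = loss_fn(predicted, target)  # scalar; train until within the allowed range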
By the implementation mode, the difference between the target key point information set and the prediction key point information set can be accurately determined, and the optimization effect of the key point detection model in the training process is improved.
According to another aspect of the present disclosure, there is also provided an apparatus for living body identification.
As shown in fig. 11, the apparatus for living body identification includes:
an image obtaining module 1101, configured to obtain a plurality of to-be-processed images of a target object, where each of the to-be-processed images corresponds to a different pose of the target object;
a key point information set generating module 1102, configured to input the image to be processed into a pre-trained key point detection model, so as to obtain a key point information set of the image to be processed, where the key point information set includes a plurality of pieces of key point information;
a living body recognition result generating module 1103, configured to obtain a living body recognition result of the target object based on the key point information set of each image to be processed.
In one embodiment, the living body identification result generation module 1103 includes:
the key point array determining submodule is used for determining the key point array of the target object based on the key point information set of each image to be processed; determining a rotation matrix array of the target object based on the key point information set of each image to be processed;
and the living body identification result generation submodule is used for obtaining a living body identification result of the target object based on the key point array and the rotation matrix array.
In one embodiment, the key point group determination submodule includes:
and the key point array determining unit is used for merging the key point information sets of the images to be processed to obtain the key point array of the target object.
In one embodiment, the living body recognition result generation sub-module includes:
the device comprises a to-be-processed image pair construction unit, a target object detection unit and a target object detection unit, wherein the to-be-processed image pair construction unit is used for constructing a plurality of to-be-processed image pairs, and each to-be-processed image pair comprises any two of a plurality of to-be-processed images of the target object;
the rotation matrix calculation unit is used for calculating a rotation matrix of each image pair to be processed based on the key point information set of the image to be processed;
and the rotation matrix array generating unit is used for combining the rotation matrixes of the multiple pairs of images to be processed to obtain a rotation matrix array of the target object.
In one embodiment, the living body recognition result generation sub-module includes:
the preprocessing unit is used for preprocessing the key point array to obtain a preprocessed key point array;
and the living body recognition result generation unit is used for inputting the preprocessed key point array and the preprocessed rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
In one embodiment, the key point information includes coordinate values and pixel values corresponding to the coordinate values; the preprocessing unit is further configured to:
normalizing the pixel value of each piece of key point information in the key point array to obtain a normalized pixel value of each piece of key point information, wherein the numerical value of the normalized pixel value conforms to a preset interval; and
and obtaining a preprocessed key point array based on the normalized pixel value of each key point information.
According to another aspect of the present disclosure, there is also provided a training apparatus for a keypoint detection model.
As shown in fig. 12, the training apparatus for the keypoint detection model includes:
a target keypoint information set determining module 1201, configured to determine a target keypoint information set by using the sample image;
a predicted key point information set generation module 1202, configured to input the sample image into a key point detection model to be trained, to obtain a predicted key point information set;
a training module 1203, configured to determine a difference between the target keypoint information set and the predicted keypoint information set, and train the keypoint detection model to be trained according to the difference until the difference is within an allowable range.
In one embodiment, the target keypoint information set determination module 1201 comprises:
the image extraction submodule is used for carrying out image extraction processing on the sample image to obtain a face region image;
the normalization processing submodule is used for performing normalization processing on the face region image to obtain a normalized face region image;
and the target key point information generating submodule is used for obtaining a target key point information set according to a matching result of the normalized face area image and a key point data set established in advance.
In one embodiment, the training module 1203 includes:
the corresponding relation determining submodule is used for determining target key point information corresponding to each piece of predicted key point information in the predicted key point information set on the basis of the target key point information set and the predicted key point information set;
and the difference determining submodule is used for calculating a loss value between the coordinate value of the predicted key point information and the coordinate value of the corresponding target key point information, and determining the difference between the target key point information set and the predicted key point information set according to the loss value.
The functions of each unit, module or sub-module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method embodiments, and are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 13 illustrates a schematic block diagram of an example electronic device 1300 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 13, the electronic device 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1302 or a computer program loaded from a storage unit 1308 into a Random Access Memory (RAM) 1303. In the RAM 1303, various programs and data necessary for the operation of the electronic device 1300 can also be stored. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other via a bus 1304. An input/output (I/O) interface 1305 is also connected to bus 1304.
A number of components in the electronic device 1300 are connected to the I/O interface 1305, including: an input unit 1306 such as a keyboard, a mouse, or the like; an output unit 1307 such as various types of displays, speakers, and the like; storage unit 1308, such as a magnetic disk, optical disk, or the like; and a communication unit 1309 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1309 allows the electronic device 1300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
Computing unit 1301 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of computing unit 1301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 1301 performs the respective methods and processes described above, such as a method for living body recognition and/or a training method of a keypoint detection model. For example, in some embodiments, the method for live recognition and/or the method for training the keypoint detection model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1308. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program is loaded into the RAM 1303 and executed by the computing unit 1301, one or more steps of the method for living body identification and/or the training method of the keypoint detection model described above may be performed. Alternatively, in other embodiments, the computing unit 1301 may be configured in any other suitable way (e.g. by means of firmware) to perform the method for live body identification and/or the training method of the keypoint detection model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method for living body identification, comprising:
acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to a different pose of the target object;
inputting each image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of pieces of key point information;
and obtaining a living body recognition result of the target object based on the key point information set of each image to be processed.
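By way of illustration only, the method of claim 1 can be sketched in a few lines of Python; the keypoint_model and liveness_model callables are hypothetical stand-ins for the pre-trained models named in the claim, not APIs from the disclosure:

    def recognize_liveness(images_to_process, keypoint_model, liveness_model):
        # One key point information set per to-be-processed image; each set is
        # assumed to hold (x, y, pixel_value) entries for the detected key points.
        keypoint_sets = [keypoint_model(image) for image in images_to_process]
        # The living body recognition result is derived from all sets jointly.
        return liveness_model(keypoint_sets)

In practice the input images would be frames showing the target object in different poses, for example captured while a user turns their head.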
2. The method according to claim 1, wherein the obtaining of the living body recognition result of the target object based on the key point information set of each image to be processed comprises:
determining a key point array of the target object based on the key point information set of each image to be processed; determining a rotation matrix array of the target object based on the key point information set of each image to be processed;
and obtaining the living body recognition result of the target object based on the key point array and the rotation matrix array.
3. The method of claim 2, wherein the determining of the key point array of the target object comprises:
merging the key point information sets of the plurality of images to be processed to obtain the key point array of the target object.
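Claim 3 leaves the merge operation open; one plain reading is concatenation of the per-image sets, as in this NumPy sketch (the N x 3 layout of (x, y, pixel_value) rows is an assumption):

    import numpy as np

    def build_keypoint_array(keypoint_sets):
        # Stack the per-image key point information sets into a single
        # key point array; each set is an N x 3 array of (x, y, pixel_value) rows.
        return np.concatenate([np.asarray(s, dtype=np.float64)
                               for s in keypoint_sets], axis=0)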
4. The method of claim 2, wherein the determining of the rotation matrix array of the target object comprises:
constructing a plurality of to-be-processed image pairs, wherein each to-be-processed image pair comprises any two of the to-be-processed images of the target object;
calculating a rotation matrix for each image pair to be processed based on the key point information sets of the images to be processed;
and combining the rotation matrices of the plurality of image pairs to be processed to obtain the rotation matrix array of the target object.
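The claim does not fix how each pair's rotation matrix is computed; one conventional choice is the Kabsch (SVD-based) best-fit rotation between corresponding key points, sketched below under the assumption that key points are given as N x 2 or N x 3 coordinate arrays with the same point ordering in every image:

    import numpy as np
    from itertools import combinations

    def pair_rotation(kps_a, kps_b):
        # Kabsch algorithm: best-fit rotation aligning point set A onto set B.
        a = kps_a - kps_a.mean(axis=0)                  # centre both point sets
        b = kps_b - kps_b.mean(axis=0)
        u, _, vt = np.linalg.svd(a.T @ b)               # SVD of the covariance matrix
        d = np.eye(a.shape[1])
        d[-1, -1] = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
        return vt.T @ d @ u.T

    def rotation_matrix_array(keypoint_coords):
        # One rotation matrix per unordered pair of to-be-processed images.
        return [pair_rotation(keypoint_coords[i], keypoint_coords[j])
                for i, j in combinations(range(len(keypoint_coords)), 2)]

For a genuine face the per-pair rotations should be mutually consistent with a rigid head moving in 3-D, whereas a flat photo or screen replay tends to yield degenerate or inconsistent rotations; plausibly this is the signal the downstream living body recognition model exploits.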
5. The method of claim 2, wherein the obtaining of the living body recognition result of the target object based on the key point array and the rotation matrix array comprises:
preprocessing the key point array to obtain a preprocessed key point array;
and inputting the preprocessed key point array and the rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
6. The method of claim 5, wherein the key point information comprises coordinate values and pixel values corresponding to the coordinate values;
and wherein the preprocessing of the key point array to obtain a preprocessed key point array comprises:
normalizing the pixel value of each piece of key point information in the key point array to obtain a normalized pixel value of each piece of key point information, wherein the normalized pixel value falls within a preset interval;
and obtaining a preprocessed key point array based on the normalized pixel value of each piece of key point information.
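One possible reading of the normalization in claim 6, with [0, 1] standing in for the unspecified preset interval:

    import numpy as np

    def preprocess_keypoint_array(kp_array, lo=0.0, hi=1.0):
        # kp_array: N x 3 array of (x, y, pixel_value) rows; rescale the
        # pixel-value column so every value falls within the preset interval.
        out = np.asarray(kp_array, dtype=np.float64).copy()
        px = out[:, 2]
        span = px.max() - px.min()
        if span > 0:                        # guard against constant pixel values
            out[:, 2] = lo + (px - px.min()) * (hi - lo) / span
        else:
            out[:, 2] = lo
        return out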
7. A method for training a key point detection model, comprising:
determining a target key point information set using a sample image;
inputting the sample image into a key point detection model to be trained to obtain a prediction key point information set;
and determining the difference between the target key point information set and the prediction key point information set, and training the key point detection model to be trained according to the difference until the difference is within an allowable range.
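A minimal PyTorch-flavoured sketch of the training loop in claim 7; the MSE loss, the Adam optimizer, the tolerance, and the epoch cap are illustrative assumptions, not details taken from the disclosure:

    import torch

    def train_keypoint_model(model, sample_images, target_keypoints,
                             tolerance=1e-3, lr=1e-4, max_epochs=100):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.MSELoss()            # one way to measure the "difference"
        for _ in range(max_epochs):
            optimizer.zero_grad()
            predicted = model(sample_images)    # predicted key point information set
            loss = loss_fn(predicted, target_keypoints)
            if loss.item() < tolerance:         # difference within the allowable range
                break
            loss.backward()
            optimizer.step()
        return model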
8. The method of claim 7, wherein the determining of a target key point information set using a sample image comprises:
performing image extraction processing on the sample image to obtain a face region image;
performing normalization processing on the face region image to obtain a normalized face region image;
and obtaining the target key point information set according to a matching result between the normalized face region image and a pre-established key point data set.
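The extraction and normalization steps of claim 8 could look like the following OpenCV sketch; the Haar-cascade face detector and the 224 x 224 target size are stand-ins for whatever extractor and normalization the actual implementation uses:

    import cv2
    import numpy as np

    def extract_normalized_face(sample_image, size=(224, 224)):
        gray = cv2.cvtColor(sample_image, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None                         # no face region found
        x, y, w, h = faces[0]                   # take the first detection
        face = sample_image[y:y + h, x:x + w]   # face region image
        face = cv2.resize(face, size)           # spatial normalization
        return face.astype(np.float32) / 255.0  # value normalization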
9. The method of claim 7, wherein the determining of the difference between the target key point information set and the predicted key point information set comprises:
determining target key point information corresponding to each piece of predicted key point information in the predicted key point information set based on the target key point information set and the predicted key point information set;
and calculating a loss value between the coordinate value of the predicted key point information and the coordinate value of the corresponding target key point information, and determining the difference between the target key point information set and the predicted key point information set according to the loss value.
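Claim 9 can be read, for instance, as a coordinate-level L2 loss over matched key points; the nearest-neighbour matching below is an assumption about how the corresponding target key point is determined:

    import numpy as np

    def keypoint_set_difference(predicted, targets):
        # predicted, targets: arrays of (x, y) coordinate rows.
        total = 0.0
        for p in predicted:
            dists = np.linalg.norm(targets - p, axis=1)
            t = targets[np.argmin(dists)]         # corresponding target key point
            total += float(np.sum((p - t) ** 2))  # per-key-point L2 loss
        return total / len(predicted)             # the "difference" driving training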
10. An apparatus for living body identification, comprising:
an image acquisition module configured to acquire a plurality of images to be processed of a target object, wherein each image to be processed corresponds to a different pose of the target object;
a key point information set generating module configured to input each image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of pieces of key point information;
and a living body recognition result generating module configured to obtain a living body recognition result of the target object based on the key point information set of each image to be processed.
11. The apparatus of claim 10, wherein the living body recognition result generating module comprises:
a key point array determining submodule configured to determine a key point array of the target object based on the key point information set of each image to be processed, and to determine a rotation matrix array of the target object based on the key point information set of each image to be processed;
and a living body recognition result generating submodule configured to obtain the living body recognition result of the target object based on the key point array and the rotation matrix array.
12. The apparatus of claim 11, wherein the keypoint array determination submodule comprises:
a key point array determining unit configured to merge the key point information sets of the plurality of images to be processed to obtain the key point array of the target object.
13. The apparatus according to claim 11, wherein the living body recognition result generation sub-module includes:
a to-be-processed image pair construction unit configured to construct a plurality of to-be-processed image pairs, wherein each to-be-processed image pair comprises any two of the to-be-processed images of the target object;
a rotation matrix calculation unit configured to calculate a rotation matrix for each image pair to be processed based on the key point information sets of the images to be processed;
and a rotation matrix array generating unit configured to combine the rotation matrices of the plurality of image pairs to be processed to obtain a rotation matrix array of the target object.
14. The apparatus according to claim 11, wherein the living body recognition result generation sub-module includes:
a preprocessing unit configured to preprocess the key point array to obtain a preprocessed key point array;
and a living body recognition result generating unit configured to input the preprocessed key point array and the rotation matrix array into a pre-trained living body recognition model to obtain the living body recognition result of the target object.
15. The apparatus of claim 14, wherein the key point information comprises coordinate values and pixel values corresponding to the coordinate values; and the preprocessing unit is further configured to:
normalize the pixel value of each piece of key point information in the key point array to obtain a normalized pixel value of each piece of key point information, wherein the normalized pixel value falls within a preset interval;
and obtain the preprocessed key point array based on the normalized pixel value of each piece of key point information.
16. A training apparatus for a keypoint detection model, comprising:
a target key point information set determining module configured to determine a target key point information set using a sample image;
a predicted key point information set generating module configured to input the sample image into a key point detection model to be trained to obtain a predicted key point information set;
and a training module configured to determine a difference between the target key point information set and the predicted key point information set, and to train the key point detection model to be trained according to the difference until the difference is within an allowable range.
17. The apparatus of claim 16, wherein the target keypoint information set determination module comprises:
an image extraction submodule configured to perform image extraction processing on the sample image to obtain a face region image;
a normalization processing submodule configured to perform normalization processing on the face region image to obtain a normalized face region image;
and a target key point information generating submodule configured to obtain a target key point information set according to a matching result between the normalized face region image and a pre-established key point data set.
18. The apparatus of claim 16, wherein the training module comprises:
a correspondence determining submodule configured to determine, based on the target key point information set and the predicted key point information set, the target key point information corresponding to each piece of predicted key point information in the predicted key point information set;
and a difference determining submodule configured to calculate a loss value between the coordinate values of the predicted key point information and the coordinate values of the corresponding target key point information, and to determine the difference between the target key point information set and the predicted key point information set according to the loss value.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
CN202110558078.5A 2021-05-21 2021-05-21 Method, apparatus, device and storage medium for living body identification Active CN113255512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110558078.5A CN113255512B (en) 2021-05-21 2021-05-21 Method, apparatus, device and storage medium for living body identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110558078.5A CN113255512B (en) 2021-05-21 2021-05-21 Method, apparatus, device and storage medium for living body identification

Publications (2)

Publication Number Publication Date
CN113255512A 2021-08-13
CN113255512B CN113255512B (en) 2023-07-28

Family

ID=77183646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110558078.5A Active CN113255512B (en) 2021-05-21 2021-05-21 Method, apparatus, device and storage medium for living body identification

Country Status (1)

Country Link
CN (1) CN113255512B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083020A (en) * 2022-07-22 2022-09-20 海易科技(北京)有限公司 Information generation method and device, electronic equipment and computer readable medium


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066983A (en) * 2017-04-20 2017-08-18 腾讯科技(上海)有限公司 A kind of auth method and device
WO2019218621A1 (en) * 2018-05-18 2019-11-21 北京市商汤科技开发有限公司 Detection method for living being, device, electronic apparatus, and storage medium
CN111091063A (en) * 2019-11-20 2020-05-01 北京迈格威科技有限公司 Living body detection method, device and system
CN110942032A (en) * 2019-11-27 2020-03-31 深圳市商汤科技有限公司 Living body detection method and device, and storage medium
CN111091075A (en) * 2019-12-02 2020-05-01 北京华捷艾米科技有限公司 Face recognition method and device, electronic equipment and storage medium
CN112052831A (en) * 2020-09-25 2020-12-08 北京百度网讯科技有限公司 Face detection method, device and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAI SASAKAWA et al.: "Fast Living-Body Localization Algorithm for MIMO Radar in Multipath Environment", IEEE Transactions on Antennas and Propagation, vol. 66, no. 12, XP011702351, DOI: 10.1109/TAP.2018.2870405 *
MA Yuxi (马钰锡) et al.: "Interactive Liveness Detection Algorithm for VTM", Computer Engineering (计算机工程), vol. 45, no. 3 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083020A (en) * 2022-07-22 2022-09-20 海易科技(北京)有限公司 Information generation method and device, electronic equipment and computer readable medium
CN115083020B (en) * 2022-07-22 2022-11-01 海易科技(北京)有限公司 Information generation method and device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN113255512B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
US20210158023A1 (en) System and Method for Generating Image Landmarks
JP7425147B2 (en) Image processing method, text recognition method and device
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN113221771B (en) Living body face recognition method, device, apparatus, storage medium and program product
CN113343826A (en) Training method of human face living body detection model, human face living body detection method and device
CN112967315B (en) Target tracking method and device and electronic equipment
US20220343636A1 (en) Method and apparatus for establishing image recognition model, device, and storage medium
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN112784765A (en) Method, apparatus, device and storage medium for recognizing motion
WO2023202400A1 (en) Training method and apparatus for segmentation model, and image recognition method and apparatus
EP4080470A2 (en) Method and apparatus for detecting living face
CN115861400A (en) Target object detection method, training method and device and electronic equipment
CN113255511A (en) Method, apparatus, device and storage medium for living body identification
CN113255512B (en) Method, apparatus, device and storage medium for living body identification
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN113435408A (en) Face living body detection method and device, electronic equipment and storage medium
CN115116111B (en) Anti-disturbance human face living body detection model training method and device and electronic equipment
CN115273184A (en) Face living body detection model training method and device
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN114549904A (en) Visual processing and model training method, apparatus, storage medium, and program product
CN113869147A (en) Target detection method and device
CN113221766A (en) Method for training living body face recognition model and method for recognizing living body face and related device
CN113869253A (en) Living body detection method, living body training device, electronic apparatus, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant