CN113255512B - Method, apparatus, device and storage medium for living body identification - Google Patents

Method, apparatus, device and storage medium for living body identification

Info

Publication number
CN113255512B
Authority
CN
China
Prior art keywords
key point
processed
point information
image
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110558078.5A
Other languages
Chinese (zh)
Other versions
CN113255512A (en)
Inventor
梁柏荣
王珂尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110558078.5A priority Critical patent/CN113255512B/en
Publication of CN113255512A publication Critical patent/CN113255512A/en
Application granted granted Critical
Publication of CN113255512B publication Critical patent/CN113255512B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

The disclosure provides a method, apparatus, device, and storage medium for living body identification. It relates to the field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to smart city and financial scenarios. The specific implementation scheme is as follows: acquire a plurality of images to be processed of a target object, where each image to be processed corresponds to a different pose of the target object; input each image to be processed into a pre-trained key point detection model to obtain a key point information set of the image, where the set comprises a plurality of pieces of key point information; and obtain a living body identification result of the target object based on the key point information sets of the images to be processed. This scheme improves the accuracy of living body identification of the target object and improves the effectiveness and generalization of identification against complex and diverse attack modes.

Description

Method, apparatus, device and storage medium for living body identification
Technical Field
The disclosure relates to the field of artificial intelligence technology, in particular to computer vision and deep learning, and can be applied to smart city and financial scenarios.
Background
Face liveness detection is a basic building block of a face recognition system and safeguards its security. Face liveness detection refers to a computer determining whether a detected face is a real face or a spoofed face attack, where an attack may be, for example, a picture of a legitimate user or a pre-recorded video. In the related art, face liveness detection algorithms based on deep learning are the mainstream approach in the field and offer greatly improved accuracy over traditional algorithms. However, in some application scenarios they still generalize poorly and detect unknown attack modes poorly, which degrades real-world performance.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for living body identification.
According to an aspect of the present disclosure, there is provided a method for living body identification, including:
acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to different poses of the target object;
inputting an image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of key point information;
and obtaining a living body identification result of the target object based on the key point information set of each image to be processed.
According to another aspect of the present disclosure, there is provided a training method of a keypoint detection model, including:
determining a target key point information set by using the sample image;
inputting the sample image into a key point detection model to be trained to obtain a predicted key point information set;
determining the difference between the target key point information set and the predicted key point information set, and training the key point detection model to be trained according to the difference until the difference is within the allowable range.
According to another aspect of the present disclosure, there is provided an apparatus for living body identification, including:
the image acquisition module is used for acquiring a plurality of images to be processed of the target object, and each image to be processed corresponds to different poses of the target object respectively;
the key point information set generation module is used for inputting the image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of key point information;
and the living body identification result generation module is used for obtaining the living body identification result of the target object based on the key point information set of each image to be processed.
According to another aspect of the present disclosure, there is provided a training apparatus of a keypoint detection model, including:
the target key point information set determining module is used for determining a target key point information set by utilizing the sample image;
the predicted key point information set generation module is used for inputting the sample image into a key point detection model to be trained to obtain a predicted key point information set;
and the training module is used for determining the difference between the target key point information set and the predicted key point information set, and training the key point detection model to be trained according to the difference until the difference is within the allowable range.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the technical solution of the disclosure, a plurality of images to be processed of a target object in different poses are acquired, key point information sets of those images are obtained with a pre-trained key point detection model, and a living body recognition result of the target object is determined based on the plurality of key point information sets. This improves the accuracy of living body recognition when the pose of the target object in a real scene varies greatly, and improves the effectiveness and generalization of recognition against complex and diverse attack modes.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of a method for living body identification according to an embodiment of the present disclosure;
FIG. 2 is a specific flowchart of obtaining a living body recognition result in a method for living body identification according to an embodiment of the present disclosure;
FIG. 3 is a specific flowchart of determining a key point array in a method for living body identification according to an embodiment of the present disclosure;
FIG. 4 is a specific flowchart of determining a rotation matrix array in a method for living body identification according to an embodiment of the present disclosure;
FIG. 5 is a specific flowchart of obtaining a living body recognition result in a method for living body identification according to an embodiment of the present disclosure;
FIG. 6 is a specific flowchart of preprocessing a key point array in a method for living body identification according to an embodiment of the present disclosure;
FIG. 7 is a scene diagram of a method for living body identification in which embodiments of the present disclosure may be implemented;
FIG. 8 is a flowchart of a training method of a key point detection model according to an embodiment of the present disclosure;
FIG. 9 is a specific flowchart of determining a target key point information set in a training method according to an embodiment of the present disclosure;
FIG. 10 is a specific flowchart of determining a difference in a training method according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of an apparatus for living body identification according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a training apparatus of a key point detection model according to an embodiment of the present disclosure;
FIG. 13 is a block diagram of an electronic device used to implement the method for living body identification and/or the training method of the key point detection model of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates a flowchart of a method for in-vivo identification according to an embodiment of the present disclosure.
As shown in fig. 1, the method for living body identification specifically includes the steps of:
S101: acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to a different pose of the target object;
S102: inputting an image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of pieces of key point information;
S103: and obtaining a living body identification result of the target object based on the key point information set of each image to be processed.
In the embodiment of the present disclosure, the target object may be the object to be discriminated in a specific scenario, and the living body recognition result of the target object may indicate whether the carrier of the target object is the live subject itself. For example, in a face recognition scenario, the target object may be a face, and the living body recognition result may indicate whether the carrier of the target object is a real person or an attack, where an attack refers to a picture or video containing the target object.
The multiple images to be processed of the target object can be understood as multiple images each containing the target object, where the pose of the target object differs from image to image; in other words, the viewing angle of the target object differs between images. It should be noted that the pose of the target object is determined by its position along three translational degrees of freedom in three-dimensional space together with the rotations about those axes. For example, in a face recognition scenario, different poses of the target object may be different orientations of its face.
In step S101, the plurality of images to be processed may be acquired by capturing the target object in different poses with a terminal device. The terminal device may be any of various image acquisition devices, such as a camera or a video camera.
In one example, the plurality of images to be processed may include a first, a second, and a third image to be processed. The first image to be processed corresponds to the pose of the target object with the face turned left by a first preset angle, the second image to be processed corresponds to the pose of the target object with the face directly facing the acquisition device, and the third image to be processed corresponds to the pose of the target object with the face turned right by a second preset angle. For example, in the first image to be processed, the face of the target object is turned 45 degrees to the left relative to the image acquisition device; in the second image to be processed, the face of the target object directly faces the image acquisition device; and in the third image to be processed, the face of the target object is turned 45 degrees to the right relative to the image acquisition device.
Illustratively, in step S102, the key point detection model may be any model for detecting face key points known to those skilled in the art, now or in the future. For example, the key point detection model may be any one of CNN (Convolutional Neural Network), DCNN (Deep Convolutional Neural Network), TCDCN (Tasks-Constrained Deep Convolutional Network, a facial landmark detection model), MTCNN (Multi-Task Cascaded Convolutional Networks), TCNN (Tweaked Convolutional Neural Networks), and DAN (Deep Alignment Network).
The key point information set of the image to be processed comprises a preset number of pieces of key point information, where each piece of key point information may comprise the coordinate values and the pixel value of a key point.
In one example, the key point detection model includes a feature extraction layer and a fully connected layer (FC). The image to be processed is input into the feature extraction layer, and the key point information set of the image is then received from the fully connected layer. The feature extraction layer extracts face features from the image to be processed and passes the extracted face feature information to the fully connected layer; the fully connected layer classifies based on the received face feature information, obtains a preset number of pieces of key point information from the classification results, and constructs and outputs the key point information set.
More specifically, the feature extraction layer may employ ResNet34 (a deep residual network). ResNet34 uses a 34-layer residual structure (residual units) whose skip connections break the convention of traditional neural networks that the output of layer n-1 can only serve as the input of layer n: the output of one layer can skip several layers and serve as the input of a later layer. This addresses the problem that simply stacking more layers stops reducing, and can even raise, the error rate of the whole model; it allows network depth to exceed previous limits, reaching tens, hundreds, or even thousands of layers, and makes high-level semantic feature extraction and classification feasible. The accuracy of face feature extraction is thereby improved.
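As a concrete illustration of this structure, the following is a minimal PyTorch sketch of a key point detection model: a ResNet34 backbone feeding a fully connected layer that regresses 68 key points with three coordinates each. The class name, the use of torchvision's resnet34, and the exact head dimensions are assumptions for illustration, not the patent's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class KeypointDetector(nn.Module):
    """Sketch: ResNet34 feature extraction layer + fully connected layer
    regressing 68 key points, each with (x, y, z) coordinates."""
    def __init__(self, num_keypoints: int = 68):
        super().__init__()
        backbone = resnet34(weights=None)
        # Drop the ImageNet classification head; keep the 512-d features.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(512, num_keypoints * 3)
        self.num_keypoints = num_keypoints

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.features(x).flatten(1)                    # (N, 512)
        return self.fc(feats).view(-1, self.num_keypoints, 3)  # (N, 68, 3)

# Usage: one 224x224 RGB face crop in, one 68x3 key point set out.
model = KeypointDetector()
keypoints = model(torch.randn(1, 3, 224, 224))  # shape (1, 68, 3)
```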
Illustratively, in step S103, the living body recognition result of the target object may be obtained by inputting the keypoint information sets respectively corresponding to the plurality of images to be processed of the target object into a living body recognition model trained in advance.
In one example, the living body recognition model is a convolutional neural network comprising a feature extraction layer, a fully connected layer (FC), and a normalization layer. The feature extraction layer extracts features from the model input and passes them to the fully connected layer; the fully connected layer classifies based on the extracted features and passes the classification result to the normalization layer; and the normalization layer normalizes the result to produce and output the living body recognition result. The feature extraction layer may adopt MobileNetV2 (a network built on depthwise separable convolutions) as the backbone of the convolutional neural network; the normalization layer may be a Softmax layer (a logistic regression model).
The method of the embodiment of the disclosure can be applied to smart city or financial scenarios, in particular to scenarios such as security, attendance, financial payment verification, and access control.
The method for living body recognition according to the embodiment of the present disclosure is described below in connection with one specific application scenario.
In a face recognition scenario for financial payment verification, face images of a target object in different poses are acquired in response to a face recognition request: for example, a plurality of face images with the face turned to different horizontal angles, or a plurality of face images with the face tilted to different vertical angles. The face images of the target object are then input into a pre-trained key point detection model to obtain a key point information set for each image. Finally, based on the plurality of key point information sets, a living body recognition result of the target object is obtained with a pre-trained living body recognition model. If the living body recognition result of the target object is a living body, the next step of the face recognition flow is carried out; if not, the subsequent face recognition flow is terminated.
Compared with related-art liveness detection methods, which use only a single image of the target object as the input to a convolutional neural network, the living body identification method of the embodiment of the disclosure acquires a plurality of images of the target object in different poses, obtains a key point information set for each image with a pre-trained key point detection model, and determines the living body recognition result of the target object based on the plurality of key point information sets, thereby improving accuracy and generalization.
As shown in fig. 2, in one embodiment, step S103 includes:
S201: determining a key point array of the target object based on the key point information sets of the images to be processed, and determining a rotation matrix array of the target object based on the key point information sets of the images to be processed;
S202: and obtaining a living body identification result of the target object based on the key point array and the rotation matrix array.
It should be noted that, for a plurality of images to be processed in which the target object is a living body, the pose of the target object differs between images, so the spatial coordinate system attached to the target object differs, and thus the spatial vector of the same face key point of the target object differs between images. For the same set of face key points in any two images to be processed, the spatial-vector rotation transformations are consistent, and such a rotation transformation can be represented by a rotation matrix. On this basis, rotation matrices can be used to represent the rotation transformation relations of the face key points of the target object across different images to be processed.
Illustratively, the key point array contains the key point information of the plurality of images to be processed of the target object, and the rotation matrix array contains the rotation transformation relations between the coordinate values of the key point information in any two images to be processed. The key point array and the rotation matrix array are input into a pre-trained living body recognition model, which uses the key point coordinates contained in each key point information set of the key point array and the rotation matrices between any two key point information sets in the rotation matrix array to produce the living body recognition result of the target object.
According to this embodiment, the key point information sets of the images to be processed are used to determine the key point array and the rotation matrix array of the target object, and the living body recognition result is obtained according to whether the rotation transformation relations of the face key points of the target object are consistent across the images to be processed. This improves the accuracy of liveness detection of the target object and the generalization and effectiveness of identifying electronic screen attacks.
As shown in fig. 3, in one embodiment, step S201 includes:
S301: combining the key point information sets of the plurality of images to be processed to obtain a key point array of the target object.
Illustratively, for three images to be processed of the target object, 68 pieces of key point information are detected for each image through the key point detection model, forming one key point information set per image. A key point information set may represent the coordinate values of its key points as a 68×3 vector set; for example, it may be represented as {(x1, y1, z1), …, (x68, y68, z68)}. Combining the three key point information sets yields a 68×9 vector set that serves as the key point array of the target object.
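A minimal NumPy sketch of this combination step, assuming three detected 68×3 key point coordinate sets (the variable names are illustrative):

```python
import numpy as np

# One 68x3 coordinate set per pose; random data stands in for detector output.
left, front, right = (np.random.rand(68, 3) for _ in range(3))

# Horizontal concatenation yields the 68x9 key point array described above.
keypoint_array = np.concatenate([left, front, right], axis=1)
assert keypoint_array.shape == (68, 9)
```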
According to this embodiment, the key point information sets of the images to be processed are combined into the key point array, integrating them while preserving the original key point information, so as to meet the input requirements of the living body recognition model.
As shown in fig. 4, in one embodiment, step S201 further includes:
S401: constructing a plurality of image pairs to be processed, wherein each image pair to be processed comprises any two of the plurality of images to be processed of the target object;
S402: calculating a rotation matrix of each image pair to be processed based on the key point information sets of the images to be processed;
S403: and combining the rotation matrices of the plurality of image pairs to be processed to obtain a rotation matrix array of the target object.
Each image pair to be processed includes two different images to be processed, and the plurality of image pairs can be obtained by repeatedly selecting two different images from the plurality of images to be processed and combining them. For each image pair to be processed, a rotation matrix between the key point information sets of its two images is calculated from the coordinate values of the key point information they contain, and the rotation matrices of the plurality of image pairs are combined to obtain the rotation matrix array of the target object.
For example, three images to be processed of a target object may be combined into three image pairs to be processed. Calculating the rotation matrix of each of the three pairs yields three 3×3 rotation matrices, and combining the three rotation matrices yields a 9×3 rotation matrix array.
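The patent does not name an algorithm for computing the rotation matrix of an image pair; the sketch below uses the Kabsch algorithm (a standard least-squares rotation estimate between two 3-D point sets) as one plausible choice. Each pair of 68×3 key point sets yields a 3×3 rotation, and the three rotations are stacked into the 9×3 rotation matrix array:

```python
import numpy as np

def estimate_rotation(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Best-fit 3x3 rotation mapping src onto dst (both 68x3), via Kabsch."""
    src_c = src - src.mean(axis=0)          # remove translation
    dst_c = dst - dst.mean(axis=0)
    u, _, vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

# Three images -> three pairs -> three 3x3 rotations, stacked to 9x3.
left, front, right = (np.random.rand(68, 3) for _ in range(3))
pairs = [(left, front), (front, right), (left, right)]
rotation_array = np.concatenate(
    [estimate_rotation(a, b) for a, b in pairs], axis=0)
assert rotation_array.shape == (9, 3)
```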
According to the embodiment, the rotation matrix of each image pair to be processed is calculated, and the obtained rotation matrices are combined to obtain the rotation matrix array of the target object, so that rotation transformation relation data among different key point information sets can be integrated, and the input requirements of a living body identification model are met.
As shown in fig. 5, in one embodiment, step S202 includes:
S501: preprocessing the key point array to obtain a preprocessed key point array;
S502: and inputting the preprocessed key point array and the rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
In step S501, the key point array is preprocessed so that the preprocessed key point array meets the input requirements of the living body recognition model; the preprocessing also simplifies the data of the key point array, improving the efficiency and stability of the subsequent living body recognition model.
Illustratively, in step S502, the living body recognition model may employ various models known to those skilled in the art or to be known in the future.
Taking a convolutional neural network model as an example of the living body recognition model: the network comprises a feature extraction layer, a fully connected layer (FC), and a normalization layer. The feature extraction layer performs feature extraction on the key point array and the rotation matrix array and passes the extracted features to the fully connected layer; the fully connected layer classifies the target object based on the extracted features and passes the classification result to the normalization layer; and the living body recognition result is finally output after normalization. The feature extraction layer may adopt MobileNetV2 (a network built on depthwise separable convolutions) as the backbone of the convolutional neural network; the normalization layer may be a Softmax layer (a logistic regression model).
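For illustration, the sketch below feeds the 68×9 key point array and the 9×3 rotation matrix array to a small classifier ending in Softmax. The patent names MobileNetV2 as the backbone of its convolutional network; a plain fully connected network stands in here to keep the sketch short, so the fusion strategy and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class LivenessNet(nn.Module):
    """Sketch: flatten and concatenate the 68x9 key point array and the
    9x3 rotation matrix array, then classify as {attack, live}."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(68 * 9 + 9 * 3, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, keypoints: torch.Tensor, rotations: torch.Tensor):
        x = torch.cat([keypoints.flatten(1), rotations.flatten(1)], dim=1)
        return torch.softmax(self.net(x), dim=1)  # per-sample [p_attack, p_live]

model = LivenessNet()
probs = model(torch.randn(1, 68, 9), torch.randn(1, 9, 3))  # shape (1, 2)
```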
In addition, during training of the living body recognition model, multiple images of a living object and of an attack object may be acquired as living image samples and attack image samples, respectively. The living image samples and attack image samples are input into the pre-trained key point detection model to obtain a plurality of key point information sets for the living object and for the attack object. A key point array and a rotation matrix array of the living object are obtained from the living object's key point information sets, and a key point array and a rotation matrix array of the attack object are obtained from the attack object's key point information sets. Both the living object's and the attack object's key point arrays and rotation matrix arrays are then used as training samples to train the living body recognition model.
According to this embodiment, inputting the key point array and the rotation matrix array into the pre-trained living body recognition model to obtain the living body recognition result of the target object improves, on the one hand, the precision and accuracy of the living body recognition result and the generalization of attack detection; on the other hand, it improves the efficiency of liveness detection, which facilitates wider adoption of the method of the embodiment of the disclosure in service applications.
As shown in fig. 6, in one embodiment, the key point information includes coordinate values and pixel values corresponding to the coordinate values, and step S501 includes:
S601: carrying out normalization processing on the pixel values of each piece of key point information in the key point array to obtain a normalized pixel value for each piece of key point information, wherein the normalized pixel values fall within a preset interval;
S602: and obtaining a preprocessed key point array based on the normalized pixel values of the key point information.
Illustratively, in step S601, for the pixel value included in each key point information, a calculation process is performed according to a preset rule, so as to obtain a normalized pixel value of each key point information.
For example, for each pixel value included in the key point information, 112 is subtracted from the value and the difference is divided by 224 to obtain the normalized pixel value, whose value falls within [-0.5, 0.5].
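A one-line sketch of the normalization in this example (the function name is illustrative):

```python
import numpy as np

def normalize(values: np.ndarray) -> np.ndarray:
    """(v - 112) / 224, per the example above; inputs in [0, 224]
    (e.g. coordinates in a 224x224 crop) land in [-0.5, 0.5]."""
    return (values.astype(np.float32) - 112.0) / 224.0

print(normalize(np.array([0, 112, 224])))  # [-0.5, 0.0, 0.5]
```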
Through this implementation, the values of the key point information in the preprocessed key point array are confined to a preset range, which eliminates the influence of singular data on the finally generated living body recognition result and further improves the accuracy of the living body recognition result.
A method for living body recognition according to an embodiment of the present disclosure is described below in one specific application scenario with reference to FIG. 7.
As shown in FIG. 7, the method for living body recognition of the embodiment of the present disclosure may be applied to a liveness detection scenario for a human face. First, a plurality of images of an object to be detected in different poses are acquired, for example three images capturing the left side, the front, and the right side of the face of the object to be detected. The three acquired images are preprocessed so that their data meet the input requirements of the face key point detection model, and are then input into the face key point detection model to obtain a key point information set for each acquired image. Each key point information set comprises a plurality of pieces of key point information, and each piece comprises key point coordinates and the corresponding pixel value.
Then, based on the key point information sets of the three acquired images, a key point array is obtained through combination processing. For any two of the three key point information sets, a rotation matrix between the two sets is calculated, and the resulting rotation matrices are combined to obtain a rotation matrix array.
And finally, inputting the key point array and the rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the object to be detected.
According to another aspect of the present disclosure, a method for training a keypoint detection model is also provided.
As shown in fig. 8, the training method of the keypoint detection model includes:
S801: determining a target key point information set by using the sample image;
S802: inputting the sample image into a key point detection model to be trained to obtain a predicted key point information set;
S803: and determining the difference between the target key point information set and the predicted key point information set, and training the key point detection model to be trained according to the difference until the difference is within the allowable range.
Illustratively, in step S801, a plurality of images of a living object and of an attack object in different poses may be acquired, and the resulting living sample images and attack sample images may be used as sample images. The attack object may be a photograph or a video of the living object.
Illustratively, in step S803, the difference between the target set of key point information and the predicted set of key point information may be determined by calculating a loss function between the coordinate values of the predicted key point information in the predicted set of key point information and the coordinate values of the target key point information in the target set of key point information.
According to this method, training of the key point detection model is achieved; by using multiple images of living objects and attack objects in different poses as sample images, the trained model can accurately detect the key point information sets of living objects and attack objects.
As shown in fig. 9, in one embodiment, step S801 includes:
S901: carrying out image extraction processing on the sample image to obtain a face region image;
S902: carrying out normalization processing on the face region image to obtain a normalized face region image;
S903: and obtaining a target key point information set according to a matching result of the normalized face region image and a pre-established key point data set.
Illustratively, in step S901, a preset number of reference key points are first determined in the sample image using a pre-established face key point data set, and mask processing is performed on the sample image based on these reference key points to obtain a templated representation of the sample image. Then, using a pre-established high-definition face data set (Flickr-Faces-HQ, FFHQ), the face region is aligned, cropped, and resized according to the matching result of the templated representation of the sample image in the high-definition face data set, yielding a face region image of size 224×224.
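A heavily simplified OpenCV sketch of the cropping and resizing step, assuming the face bounding box has already been located upstream; the mask processing and FFHQ-based alignment described above are omitted:

```python
import cv2
import numpy as np

def crop_face_region(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop an (x, y, w, h) face box -- assumed to come from an upstream
    detector or template match -- and resize the crop to 224x224."""
    x, y, w, h = box
    face = image[y:y + h, x:x + w]
    return cv2.resize(face, (224, 224), interpolation=cv2.INTER_LINEAR)
```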
Illustratively, in step S902, each pixel value in the face region image is processed according to a preset rule to obtain a normalized pixel value for each pixel. For example, the pixel value of each pixel in the face region image is divided by 255 and then reduced by 0.5, giving a normalized pixel value in the range [-0.5, 0.5].
Illustratively, in step S903, the key point data set may be any key point data set known to those skilled in the art, now or in the future. The key point data set comprises a batch of pre-collected high-definition images, each containing pre-annotated key points. By matching the normalized face region image against the key point data set, the target key point information set of the sample image can be conveniently obtained.
Through this implementation, the determined target key point information set has high accuracy, the cost of manual annotation is saved, and the difficulty of obtaining the target key point information set is reduced.
As shown in fig. 10, in one embodiment, step S803 includes:
S1001: determining target key point information corresponding to each piece of predicted key point information in the predicted key point information set based on the target key point information set and the predicted key point information set;
S1002: and calculating a loss value between the coordinate values of the predicted key point information and the coordinate values of the corresponding target key point information, and determining the difference between the target key point information set and the predicted key point information set according to the loss values.
Illustratively, the loss value between each piece of predicted key point information and its corresponding target key point information may be obtained by calculating an L1 loss between their coordinate values. The difference between the target key point information set and the predicted key point information set is then determined by averaging the loss values over all predicted key point information.
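A minimal PyTorch sketch of this difference, assuming predicted and target key point coordinates are (N, 68, 3) tensors:

```python
import torch
import torch.nn.functional as F

def keypoint_difference(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean L1 loss over all predicted key point coordinates; training
    stops once this value falls within the allowed range."""
    return F.l1_loss(pred, target)  # defaults to the mean over all elements

diff = keypoint_difference(torch.randn(4, 68, 3), torch.randn(4, 68, 3))
```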
By the embodiment, the difference between the target key point information set and the predicted key point information set can be accurately determined, and the optimization effect on the key point detection model in the training process is improved.
According to another aspect of the present disclosure, there is also provided an apparatus for living body identification.
As shown in fig. 11, the apparatus for living body recognition includes:
The image acquisition module 1101 is configured to acquire a plurality of to-be-processed images of the target object, where each to-be-processed image corresponds to a different pose of the target object;
the key point information set generating module 1102 is configured to input an image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, where the key point information set includes a plurality of key point information;
the living body recognition result generating module 1103 is configured to obtain a living body recognition result of the target object based on the key point information set of each image to be processed.
In one embodiment, the living body identification result generation module 1103 includes:
the key point array determining submodule is used for determining a key point array of the target object based on the key point information sets of the images to be processed, and for determining a rotation matrix array of the target object based on the key point information sets of the images to be processed;
and the living body identification result generation sub-module is used for obtaining the living body identification result of the target object based on the key point array and the rotation matrix array.
In one embodiment, the key point array determining submodule includes:
and the key point array determining unit is used for carrying out combination processing on the key point information sets of the plurality of images to be processed to obtain the key point array of the target object.
In one embodiment, the living body recognition result generation submodule includes:
a to-be-processed image pair construction unit configured to construct a plurality of to-be-processed image pairs, each to-be-processed image pair including any two of a plurality of to-be-processed images of the target object;
the rotation matrix calculation unit is used for calculating the rotation matrix of each image pair to be processed based on the key point information set of the image to be processed;
and the rotation matrix array generating unit is used for combining the rotation matrices of the plurality of image pairs to be processed to obtain a rotation matrix array of the target object.
In one embodiment, the living body recognition result generation submodule includes:
the preprocessing unit is used for preprocessing the key point array to obtain a preprocessed key point array;
and the living body recognition result generating unit is used for inputting the preprocessed key point array and the preprocessed rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
In one embodiment, the key point information includes coordinate values and pixel values corresponding to the coordinate values; the preprocessing unit is also used for:
carrying out normalization processing on the pixel values of each piece of key point information in the key point array to obtain a normalized pixel value for each piece of key point information, wherein the normalized pixel values fall within a preset interval; and
obtaining a preprocessed key point array based on the normalized pixel values of the key point information.
According to another aspect of the present disclosure, there is also provided a training apparatus of a keypoint detection model,
as shown in fig. 12, the training device for the keypoint detection model includes:
a target key point information set determining module 1201, configured to determine a target key point information set using the sample image;
the predicted key point information set generating module 1202 is configured to input a sample image into a key point detection model to be trained, so as to obtain a predicted key point information set;
the training module 1203 is configured to determine a difference between the target key point information set and the predicted key point information set, and train the key point detection model to be trained according to the difference until the difference is within the allowable range.
In one embodiment, the target keypoint information set determination module 1201 includes:
the image extraction sub-module is used for carrying out image extraction processing on the sample image to obtain a face area image;
the normalization processing sub-module is used for carrying out normalization processing on the face area image to obtain a normalized face area image;
and the target key point information generation sub-module is used for obtaining a target key point information set according to a matching result of the normalized face area image and the pre-established key point data set.
In one embodiment, training module 1203 includes:
the corresponding relation determining sub-module is used for determining target key point information corresponding to each piece of predicted key point information in the predicted key point information set based on the target key point information set and the predicted key point information set;
and the difference determining sub-module is used for calculating a loss value between the coordinate value of the predicted key point information and the coordinate value of the corresponding target key point information, and determining the difference between the target key point information set and the predicted key point information set according to the loss value.
The functions of each unit, module or sub-module in each apparatus of the embodiments of the present disclosure may be referred to the corresponding descriptions in the above method embodiments, which are not repeated herein.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 13 illustrates a schematic block diagram of an example electronic device 1300 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 13, the electronic device 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1302 or a computer program loaded from a storage unit 1308 into a Random Access Memory (RAM) 1303. In the RAM 1303, various programs and data required for the operation of the electronic device 1300 can also be stored. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An input output (I/O) interface 1305 is also connected to bus 1304.
Various components in electronic device 1300 are connected to I/O interface 1305, including: an input unit 1306 such as a keyboard, a mouse, or the like; an output unit 1307 such as various types of displays, speakers, and the like; storage unit 1308, such as a magnetic disk, optical disk, etc.; and a communication unit 1309 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1309 allows the electronic device 1300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 1301 performs the respective methods and processes described above, for example, a method for living body recognition and/or a training method of a keypoint detection model. For example, in some embodiments, the method for in-vivo identification and/or the training method of the keypoint detection model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program is loaded into the RAM 1303 and executed by the computing unit 1301, one or more steps of the above-described method for living body identification and/or training method of the keypoint detection model may be performed. Alternatively, in other embodiments, computing unit 1301 may be configured to perform the method for in-vivo identification and/or the training method of the keypoint detection model by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (6)

1. A method for living body identification, comprising:
acquiring a plurality of images to be processed of a target object, wherein each image to be processed corresponds to a different pose of the target object;
inputting each image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of pieces of key point information;
combining the key point information sets of the images to be processed to obtain a key point array of the target object;
constructing a plurality of image pairs to be processed, wherein each image pair to be processed comprises any two of the plurality of images to be processed of the target object;
calculating a rotation matrix of each image pair to be processed based on the key point information sets of the images to be processed;
combining the rotation matrices of the plurality of image pairs to be processed to obtain a rotation matrix array of the target object;
preprocessing the key point array to obtain a preprocessed key point array;
inputting the preprocessed key point array and the rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
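For illustration only, and not as part of the patent text, the method of claim 1 might be sketched in Python as follows. Everything in this sketch is an assumption: the key point layout (each key point as an (x, y, pixel-value) triple), the predict interfaces of the two pre-trained models, and the use of the Kabsch algorithm to estimate each image pair's rotation matrix, which the claims do not specify.

    # Hypothetical sketch of the claim-1 pipeline; all interfaces are assumed.
    from itertools import combinations
    import numpy as np

    def estimate_rotation(src_xy, dst_xy):
        """Least-squares rotation aligning two 2-D keypoint sets (Kabsch algorithm)."""
        p = src_xy - src_xy.mean(axis=0)        # center both point sets
        q = dst_xy - dst_xy.mean(axis=0)
        u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the cross-covariance
        d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
        return vt.T @ np.diag([1.0, d]) @ u.T

    def liveness_result(images, keypoint_model, liveness_model,
                        preprocess=lambda a: a):
        # One key point information set (N x 3: x, y, pixel value) per pose.
        keypoint_sets = [keypoint_model.predict(img) for img in images]
        # Combine the per-image sets into the key point array of the target.
        keypoint_array = np.stack(keypoint_sets)
        # Every pair of images to be processed yields one rotation matrix.
        rotation_array = np.stack([
            estimate_rotation(keypoint_sets[i][:, :2], keypoint_sets[j][:, :2])
            for i, j in combinations(range(len(images)), 2)])
        # Preprocess the key point array (see claim 2), then classify liveness.
        return liveness_model.predict(preprocess(keypoint_array), rotation_array)

With M input images this sketch produces M·(M−1)/2 rotation matrices, matching the "any two of the plurality of images" construction recited in the claim.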
2. The method of claim 1, wherein the key point information includes coordinate values and pixel values corresponding to the coordinate values;
the preprocessing of the key point array to obtain a preprocessed key point array comprises the following steps:
normalizing the pixel values of each piece of key point information in the key point array to obtain normalized pixel values of each piece of key point information, wherein the normalized pixel values fall within a preset interval; and
obtaining a preprocessed key point array based on the normalized pixel values of each piece of key point information.
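Continuing the sketch above, the claim-2 preprocessing could be a min-max normalization of the pixel values into a preset interval; the interval [0, 1] and the (x, y, pixel-value) layout are the same assumptions as before.

    import numpy as np

    def preprocess(keypoint_array, interval=(0.0, 1.0)):
        """Min-max normalize the pixel value of each key point into `interval`."""
        lo, hi = interval
        pix = keypoint_array[..., -1]                    # pixel-value channel
        span = pix.max() - pix.min()
        scaled = (pix - pix.min()) / span if span else np.zeros_like(pix)
        out = keypoint_array.astype(float).copy()
        out[..., -1] = lo + scaled * (hi - lo)           # now within interval
        return out

Passing this function as the preprocess argument of liveness_result above completes the claim-1 flow.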
3. An apparatus for living body identification, comprising:
an image acquisition module configured to acquire a plurality of images to be processed of a target object, wherein each image to be processed corresponds to a different pose of the target object;
a key point information set generation module configured to input each image to be processed into a pre-trained key point detection model to obtain a key point information set of the image to be processed, wherein the key point information set comprises a plurality of pieces of key point information;
a key point array determining unit configured to combine the key point information sets of the plurality of images to be processed to obtain a key point array of the target object;
a to-be-processed image pair construction unit configured to construct a plurality of image pairs to be processed, wherein each image pair to be processed comprises any two of the plurality of images to be processed of the target object;
a rotation matrix calculating unit configured to calculate a rotation matrix of each image pair to be processed based on the key point information sets of the images to be processed;
a rotation matrix array generating unit configured to combine the rotation matrices of the plurality of image pairs to be processed to obtain a rotation matrix array of the target object;
a preprocessing unit configured to preprocess the key point array to obtain a preprocessed key point array; and
a living body recognition result generating unit configured to input the preprocessed key point array and the rotation matrix array into a pre-trained living body recognition model to obtain a living body recognition result of the target object.
4. The apparatus of claim 3, wherein the key point information includes coordinate values and pixel values corresponding to the coordinate values, and the preprocessing unit is further configured to:
normalize the pixel values of each piece of key point information in the key point array to obtain normalized pixel values of each piece of key point information, wherein the normalized pixel values fall within a preset interval; and
obtain a preprocessed key point array based on the normalized pixel values of each piece of key point information.
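The apparatus of claims 3 and 4 mirrors the method element for element, so organizationally it can be read as a thin wrapper that wires the same steps together. The class below is a sketch under the same assumptions, reusing the hypothetical helpers from the claim-1 and claim-2 sketches above.

    class LivenessRecognizer:
        """Illustrative module composition mirroring claims 3-4."""

        def __init__(self, keypoint_model, liveness_model):
            self.keypoint_model = keypoint_model  # key point detection model
            self.liveness_model = liveness_model  # living body recognition model

        def __call__(self, images):
            # The preprocessing unit applies the claim-2 normalization; the
            # remaining units correspond to the claim-1 pipeline steps.
            return liveness_result(images, self.keypoint_model,
                                   self.liveness_model, preprocess=preprocess)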
5. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of claim 1 or 2.
6. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of claim 1 or 2.
CN202110558078.5A 2021-05-21 2021-05-21 Method, apparatus, device and storage medium for living body identification Active CN113255512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110558078.5A CN113255512B (en) 2021-05-21 2021-05-21 Method, apparatus, device and storage medium for living body identification

Publications (2)

Publication Number Publication Date
CN113255512A (en) 2021-08-13
CN113255512B (en) 2023-07-28

Family

ID=77183646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110558078.5A Active CN113255512B (en) 2021-05-21 2021-05-21 Method, apparatus, device and storage medium for living body identification

Country Status (1)

Country Link
CN (1) CN113255512B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083020B (en) * 2022-07-22 2022-11-01 海易科技(北京)有限公司 Information generation method and device, electronic equipment and computer readable medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066983A (en) * 2017-04-20 2017-08-18 腾讯科技(上海)有限公司 A kind of auth method and device
WO2019218621A1 (en) * 2018-05-18 2019-11-21 北京市商汤科技开发有限公司 Detection method for living being, device, electronic apparatus, and storage medium
CN111091063A (en) * 2019-11-20 2020-05-01 北京迈格威科技有限公司 Living body detection method, device and system
CN110942032A (en) * 2019-11-27 2020-03-31 深圳市商汤科技有限公司 Living body detection method and device, and storage medium
CN111091075A (en) * 2019-12-02 2020-05-01 北京华捷艾米科技有限公司 Face recognition method and device, electronic equipment and storage medium
CN112052831A (en) * 2020-09-25 2020-12-08 北京百度网讯科技有限公司 Face detection method, device and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fast Living-Body Localization Algorithm for MIMO Radar in Multipath Environment; Dai Sasakawa et al.; IEEE Transactions on Antennas and Propagation; Vol. 66, No. 12; full text *
Interactive Living-Body Detection Algorithm for VTM; Ma Yuxi et al.; Computer Engineering; Vol. 45, No. 3; full text *

Also Published As

Publication number Publication date
CN113255512A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN111368685B (en) Method and device for identifying key points, readable medium and electronic equipment
JP7425147B2 (en) Image processing method, text recognition method and device
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN113343826B (en) Training method of human face living body detection model, human face living body detection method and human face living body detection device
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
US20220343636A1 (en) Method and apparatus for establishing image recognition model, device, and storage medium
CN112784765A (en) Method, apparatus, device and storage medium for recognizing motion
CN112949767A (en) Sample image increment, image detection model training and image detection method
JP7282474B2 (en) Encryption mask determination method, encryption mask determination device, electronic device, storage medium, and computer program
CN115861400A (en) Target object detection method, training method and device and electronic equipment
CN113255512B (en) Method, apparatus, device and storage medium for living body identification
CN113255511A (en) Method, apparatus, device and storage medium for living body identification
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN113435408A (en) Face living body detection method and device, electronic equipment and storage medium
CN115116111B (en) Anti-disturbance human face living body detection model training method and device and electronic equipment
CN115273184A (en) Face living body detection model training method and device
CN115937950A (en) Multi-angle face data acquisition method, device, equipment and storage medium
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN114842411A (en) Group behavior identification method based on complementary space-time information modeling
CN113869147A (en) Target detection method and device
CN113378774A (en) Gesture recognition method, device, equipment, storage medium and program product
CN113221766A (en) Method for training living body face recognition model and method for recognizing living body face and related device
CN115205939B (en) Training method and device for human face living body detection model, electronic equipment and storage medium
CN112070022A (en) Face image recognition method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant