CN113221766A - Method for training living body face recognition model and method for recognizing living body face and related device - Google Patents

Method for training living body face recognition model and method for recognizing living body face and related device

Info

Publication number
CN113221766A
CN113221766A
Authority
CN
China
Prior art keywords
face image
face
living body
image set
reflecting
Prior art date
Legal status
Granted
Application number
CN202110540898.1A
Other languages
Chinese (zh)
Other versions
CN113221766B (en)
Inventor
王珂尧 (Wang Keyao)
Current Assignee
Nanjing Xiyun Information Technology Co ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110540898.1A
Publication of CN113221766A
Application granted
Publication of CN113221766B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/40 Spoof detection, e.g. liveness detection
    • G06V 40/45 Detection of the body part being alive
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V 40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method, an apparatus, an electronic device, a computer-readable storage medium and a computer program product for training a living body face recognition model and for recognizing a living body face. It relates to the field of artificial intelligence, in particular to computer vision and deep learning technology, and can be applied in face recognition scenarios. One embodiment of the method comprises: preprocessing the living person images contained in a living person image set, and extracting face images from the preprocessed images to obtain a living body face image set; generating, in a preset reflection feature simulation mode, reflective face images corresponding to the living body face images in the living body face image set to obtain a reflective face image set; and extracting reflection features from the reflective face images in the reflective face image set by using an encoding-decoding model, and training with the reflection features as training samples to obtain a living body face recognition model. The living body face recognition model provided by this embodiment can better recognize electronic photo attacks.

Description

Method for training living body face recognition model and method for recognizing living body face and related device
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning techniques applicable in face recognition scenarios, and more particularly to a method for training a living body face recognition model, a method for recognizing a living body face, and a corresponding apparatus, electronic device, computer-readable storage medium and computer program product.
Background
Face liveness detection determines whether an image was captured from a real person. It is a basic building block of a face recognition system and safeguards the security of that system.
In the prior art, living body face recognition is generally realized by a face liveness detection algorithm based on deep learning technology.
Disclosure of Invention
The embodiments of the present disclosure provide a method, an apparatus, an electronic device, a computer-readable storage medium and a computer program product for training a living body face recognition model and for recognizing a living body face.
In a first aspect, an embodiment of the present disclosure provides a method for training a living body face recognition model, including: preprocessing the living person images contained in a living person image set, and extracting face images from the preprocessed images to obtain a living body face image set; generating, in a preset reflection feature simulation mode, reflective face images corresponding to the living body face images in the living body face image set to obtain a reflective face image set; and extracting reflection features from the reflective face images in the reflective face image set by using an encoding-decoding model, and training with the reflection features as training samples to obtain the living body face recognition model.
In a second aspect, an embodiment of the present disclosure provides an apparatus for training a living body face recognition model, including: a living body face image set acquisition unit configured to preprocess the living person images contained in a living person image set, and extract face images from the preprocessed images to obtain a living body face image set; a reflective face image set acquisition unit configured to generate, in a preset reflection feature simulation mode, reflective face images corresponding to the living body face images in the living body face image set to obtain a reflective face image set; and a reflection feature extraction and model training unit configured to extract reflection features from the reflective face images in the reflective face image set by using an encoding-decoding model, and train with the reflection features as training samples to obtain the living body face recognition model.
In a third aspect, an embodiment of the present disclosure provides a method for recognizing a living body face, including: acquiring a face image to be recognized; and calling a living body face recognition model to recognize the face image to be recognized; wherein the living body face recognition model is obtained according to the method for training a living body face recognition model described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides an apparatus for recognizing a living body face, including: a to-be-recognized face image acquisition unit configured to acquire a face image to be recognized; and a model-calling recognition unit configured to call a living body face recognition model to recognize the face image to be recognized; wherein the living body face recognition model is obtained according to the apparatus for training a living body face recognition model described in any implementation of the second aspect.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions, when executed, enabling the at least one processor to perform the method for training a living body face recognition model described in any implementation of the first aspect or the method for recognizing a living body face described in any implementation of the third aspect.
In a sixth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions which, when executed, enable a computer to implement the method for training a living body face recognition model described in any implementation of the first aspect or the method for recognizing a living body face described in any implementation of the third aspect.
In a seventh aspect, an embodiment of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the method for training a living body face recognition model described in any implementation of the first aspect or the method for recognizing a living body face described in any implementation of the third aspect.
According to the method for training a living body face recognition model and the method for recognizing a living body face provided by the embodiments of the present disclosure, preprocessing the living person images improves the accuracy of the living body face images, which in turn improves the accuracy of the reflective face images obtained through the preset reflection feature simulation mode. As a result, the living body face recognition model trained in this way can better recognize electronic photo attacks, and the use of the encoding-decoding model makes the extracted reflection features more accurate.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
Fig. 1 is an exemplary system architecture to which the present disclosure may be applied;
Fig. 2 is a flowchart of a method for training a living body face recognition model according to an embodiment of the present disclosure;
Fig. 3 is a flowchart of a method for acquiring a living body face image set according to an embodiment of the present disclosure;
Fig. 4 is a flowchart of a method for obtaining a reflective face image set according to an embodiment of the present disclosure;
Fig. 5 is a flowchart of a method for extracting reflection features through an encoding-decoding model according to an embodiment of the present disclosure;
Fig. 6 is a structural block diagram of an apparatus for training a living body face recognition model according to an embodiment of the present disclosure;
Fig. 7 is a structural block diagram of an apparatus for recognizing a living body face according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of an electronic device suitable for performing the method for training a living body face recognition model and/or the method for recognizing a living body face according to embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict.
In the technical solution of the present disclosure, the acquisition, storage and application of users' personal information all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the methods, apparatuses, electronic devices and computer-readable storage media for training a living body face recognition model and for recognizing a living body face may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 and the server 105 may be installed with various applications for implementing information communication between the two, such as a data encryption application, a living body face recognition application, an instant messaging application, and the like.
The terminal apparatuses 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, and they may be implemented as multiple software or software modules, or may be implemented as a single software or software module, and are not limited in this respect. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server; when the server is software, the server may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited herein.
The server 105 can provide various services through various built-in applications. Taking a living body face recognition application that provides a living body face recognition service to users as an example, the server 105 can achieve the following effects when running the application: first, receiving face images to be recognized sent from the terminal devices 101, 102, 103 through the network 104; then, calling a previously trained living body face recognition model from a preset storage location and feeding the face image to be recognized into the model as input data; finally, returning the recognition result output by the living body face recognition model to the terminal devices 101, 102, 103.
The living body face recognition model can be obtained by training with a model training application built into the server 105 according to the following steps: first, preprocessing the living person images contained in a living person image set, and extracting face images from the preprocessed images to obtain a living body face image set; then, generating, in a preset reflection feature simulation mode, reflective face images corresponding to the living body face images in the living body face image set to obtain a reflective face image set; and then extracting reflection features from the reflective face images in the reflective face image set by using an encoding-decoding model, and training based on the reflection features to obtain the living body face recognition model.
Since training the living body face recognition model requires considerable computing resources and computing power, the method for training a living body face recognition model provided in the following embodiments of the present application is generally executed by the server 105, which has stronger computing power and more computing resources; accordingly, the apparatus for training the living body face recognition model is generally disposed in the server 105. However, it should be noted that when the terminal devices 101, 102, 103 also have computing capabilities and computing resources meeting the requirements, they may complete, through a model training application installed on them, the operations otherwise assigned to the server 105 and output the same results as the server 105. Correspondingly, the apparatus for training the living body face recognition model may also be provided in the terminal devices 101, 102, 103. In such a case, the exemplary system architecture 100 may omit the server 105 and the network 104.
Of course, the server used for training the living body face recognition model may differ from the server used for recognizing the face images to be recognized. In particular, a lightweight living body face recognition model suitable for deployment on the terminal devices 101, 102, 103 may be derived from the model trained by the server 105 through model distillation; that is, the lightweight model on the terminal devices 101, 102, 103 or the full model on the server 105 can be selected flexibly according to the recognition accuracy required in practice.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 is a flowchart of a method for training a living body face recognition model according to an embodiment of the present disclosure, wherein the process 200 includes the following steps:
Step 201: preprocessing the living person images contained in a living person image set, and extracting face images from the preprocessed images to obtain a living body face image set;
This step is intended to have the execution subject of the method for training the living body face recognition model (e.g., the server 105 shown in fig. 1) derive a set of at least one living body face image from the preprocessed living person images. Here, 'living body' in 'living body face image' specifies that the face image comes from a living person, not from spoofing channels such as mask disguise or the display of an electronic photo. The living person images may be captured by another device and sent to the execution subject, or captured directly by the execution subject in a specific scenario.
It should be noted that the present disclosure allows the original images to be processed in the course of acquiring the living body face images, but such processing should avoid, as far as possible, operations such as beautification, brightness enhancement and skin smoothing, which easily remove details of the image content, especially details of the face, so that the original appearance of the face is preserved. Operations that do not affect image content details, such as image segmentation, rectification and resizing, may additionally be used as the actual situation requires.
One implementation, by way of example and not limitation, is:
Firstly, acquiring a living person image set; then, preprocessing the living person images contained in the living person image set, and extracting face images from the preprocessed images to obtain an original face image set; and then normalizing the original face images in the original face image set to obtain the living body face image set. A living person image is an image obtained by naturally photographing a real person; that is, it does not necessarily contain only the person's face. The purpose of the normalization is to unify the presentation of the face images as much as possible so as to facilitate model training. Of course, besides this approach, one may also attempt to directly acquire living body face images to construct the living body face image set.
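As an illustrative orchestration of this flow, the following Python sketch builds the living body face image set; the three helper callables are hypothetical placeholders standing in for the concrete preprocessing, face extraction and normalization steps detailed below with reference to fig. 3:

```python
# A minimal orchestration sketch; preprocess, extract_face and normalize are
# hypothetical placeholders, not functions defined by this disclosure.
from typing import Callable, Iterable, List, Optional
import numpy as np

def build_live_face_set(
    person_images: Iterable[np.ndarray],
    preprocess: Callable[[np.ndarray], np.ndarray],
    extract_face: Callable[[np.ndarray], Optional[np.ndarray]],
    normalize: Callable[[np.ndarray], np.ndarray],
) -> List[np.ndarray]:
    face_set = []
    for img in person_images:
        pre = preprocess(img)      # e.g. rectification/resizing only; never
                                   # beautification or skin smoothing
        face = extract_face(pre)   # returns None when no face is detected
        if face is not None:
            face_set.append(normalize(face))
    return face_set
```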
Step 202: generating a reflecting face image corresponding to a living body face image in the living body face image set in a preset reflecting feature simulation mode to obtain a reflecting face image set;
on the basis of step 201, this step is intended to generate a retroreflective face image corresponding to a living body face image from the execution subject described above. The preset reflection characteristic simulation mode refers to a mode capable of simulating and generating reflection characteristics, such as modes of spacing transparent reflection materials, contrasting reflectors, obtaining images displayed in a display screen directly through lens shooting, and the like. Wherein a transparent light reflecting material, such as transparent glass, and a mirror, such as a mirror.
It should be understood that whatever the reflection feature simulation mode is, in practice, an attack mode that an electronic photograph is used to bypass a living body face recognition mechanism under a simulated actual scene is adopted, so as to acquire the reflection features consistent with an actual attack means as much as possible.
In view of the efficiency of acquiring the retroreflective face image, one implementation includes and is not limited to: and generating a reflecting face image corresponding to the living body face image in the living body face image set by spacing transparent reflecting materials and/or contrasting reflectors to obtain a reflecting face image set.
The "and/or" practice in this embodiment may be broken down into three broad categories of generation: firstly, a reflective face image is generated only by spacing transparent reflective materials; secondly, generating a reflecting face image only by contrasting a reflector; thirdly, generating a reflecting face image by spacing transparent reflecting materials and contrasting reflectors. Furthermore, a third generation mode may have an adjustment space in terms of specific implementation details, and further, multiple implementation modes may be subdivided under such circumstances, for example, an intermediate image is obtained by first using a transparent reflective material at intervals, a reflective face image corresponding to the intermediate image is generated by contrasting reflectors, and another implementation mode may be obtained by changing the execution sequence, which is not listed here one by one.
Step 203: and extracting the reflection characteristics from the reflection face images in the reflection face image set by using a coding-decoding model, and training based on the reflection characteristics to obtain a living body face recognition model.
On the basis of step 202, this step is intended to extract, by the executing agent, the glisten features from the glisten face images in the glisten face image set, and train the living body face recognition model based on the extracted glisten features. The method specifically adopts an encoding-decoding (Encoder-Decoder) model as a feature extraction network, and improves the accuracy of extracted light reflection features by means of a data processing mode that the encoding-decoding model is different from a conventional model.
For image data, the encoding sub-model in the encoding-decoding model performs down-sampling with respect to the image to gradually reduce the image size to mine hidden image features as much as possible, and the decoding sub-model performs reverse operation to gradually restore the reduced image size to the original image size by continuously performing up-sampling operation. The accurate reflective characteristics are mined through the continuous down-sampling operation of the coding-decoding model, and the reflective characteristics are more highlighted in the original-size image through the continuous up-sampling operation. Since the same-size image as the input image can be output by the encoding-decoding model, better recognizability is achieved.
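As a minimal sketch of this down-/up-sampling flow, a PyTorch encoder-decoder might look like the following; the layer sizes and channel counts are illustrative assumptions, not the architecture claimed by this disclosure (the concrete scenario later uses a UNet backbone):

```python
import torch
import torch.nn as nn

class ReflectionEncoderDecoder(nn.Module):
    """Illustrative encoder-decoder that outputs a same-size reflection map."""
    def __init__(self):
        super().__init__()
        # Encoder: stride-2 convolutions shrink the image to mine hidden features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),    # 224 -> 112
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # 112 -> 56
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # 56 -> 28
        )
        # Decoder: transposed convolutions restore the original spatial size.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 28 -> 56
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 56 -> 112
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),               # 112 -> 224
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# A 224x224 face crop yields a single-channel reflection map of the same size.
model = ReflectionEncoderDecoder()
reflection_map = model(torch.randn(1, 3, 224, 224))
assert reflection_map.shape == (1, 1, 224, 224)
```

Because the output has the same spatial size as the input, it can be supervised pixel-wise against a label map, which is what the concrete scenario below does with the L1 loss.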
According to the method for training a living body face recognition model provided by this embodiment of the present disclosure, preprocessing the living person images improves the accuracy of the living body face images, which in turn improves the accuracy of the reflective face images obtained through the preset reflection feature simulation mode. As a result, the trained living body face recognition model can better recognize electronic photo attacks, and the use of the encoding-decoding model makes the extracted reflection features more accurate.
Referring to fig. 3, fig. 3 is a flowchart of a method for acquiring a living body face image set according to an embodiment of the present disclosure; that is, it provides a specific implementation of step 201 in the process 200 shown in fig. 2. The other steps in the process 200 are unchanged, and a new complete embodiment is obtained by replacing step 201 with the specific implementation provided in this embodiment. The process 300 includes the following steps:
Step 301: acquiring a living person image set;
A living person image is an image obtained by naturally photographing a real person; that is, it does not necessarily contain only the person's face.
Step 302: respectively determining the face position information in each living person image in the living person image set;
On the basis of step 301, this step is intended to have the execution subject determine the approximate position of the face in each living person image, so that a key point extraction region can be determined from this approximate position information. Specifically, determining the approximate position information may be implemented by a pre-trained model or a simple localization model.
Step 303: extracting face key points from the face region determined based on the face position information to obtain key point coordinates;
On the basis of step 302, this step is intended to have the execution subject extract the face key points from the face region determined based on the face position information and obtain the coordinates of those key points. The face key points are the key points that outline the whole face; the specific key point extraction may be implemented by a pre-trained model or a known key point extraction algorithm, which is not described again here.
Step 304: determining a face extraction frame according to the key point coordinates, and extracting a face image from the corresponding living person image according to the face extraction frame to obtain an original face image set;
On the basis of step 303, this step is intended to have the execution subject determine a face extraction frame from the key point coordinates. The determined face extraction frame is used to crop as complete a face image as possible from the living person image. The face extraction frame is therefore obtained by appropriately extending the face outline determined by the key point coordinates, and the degree of extension can be set according to the actual situation.
Step 305: normalizing the original face images in the original face image set to obtain the living body face image set.
The purpose of the normalization is to unify, as far as possible, the presentation of the original face images extracted from different living person images, for example their scale and resolution, so that a more accurate living body face recognition model can be trained.
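A hedged sketch of steps 302-305 follows. The upstream face detector and key point model are assumed and not shown; the threefold expansion, 224x224 size and (pixel - 128) / 256 normalization are taken from the concrete example given later in this disclosure:

```python
import numpy as np
import cv2

def crop_and_normalize(image: np.ndarray, keypoints: np.ndarray,
                       expand: float = 3.0, size: int = 224) -> np.ndarray:
    """image: HxWx3 uint8 living person image; keypoints: Nx2 (x, y) array."""
    xmin, ymin = keypoints.min(axis=0)
    xmax, ymax = keypoints.max(axis=0)
    # Face extraction frame: the key point bounding box, expanded around its
    # centre so the crop contains as complete a face as possible (step 304).
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    half_w, half_h = (xmax - xmin) * expand / 2.0, (ymax - ymin) * expand / 2.0
    h, w = image.shape[:2]
    x0, x1 = int(max(cx - half_w, 0)), int(min(cx + half_w, w))
    y0, y1 = int(max(cy - half_h, 0)), int(min(cy + half_h, h))
    face = cv2.resize(image[y0:y1, x0:x1], (size, size))
    # Normalization (step 305): bring every pixel into roughly [-0.5, 0.5].
    return (face.astype(np.float32) - 128.0) / 256.0
```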
In this embodiment, a living person image set is used as the basic data, and the living body face image set is finally obtained through face localization, key point extraction, face extraction frame determination and normalization, performed in sequence.
Referring to fig. 4, fig. 4 is a flowchart of a method for obtaining a reflective face image set according to an embodiment of the present disclosure; that is, it provides a specific implementation of step 202 in the process 200 shown in fig. 2. The other steps in the process 200 are unchanged, and a new complete embodiment is obtained by replacing step 202 with the specific implementation provided in this embodiment. The process 400 includes the following steps:
Step 401: generating first reflective face images corresponding to the living body face images in the living body face image set by shooting through an intervening transparent reflective material, to obtain a first reflective face image set;
Step 402: generating second reflective face images corresponding to the living body face images by shooting against a mirror, to obtain a second reflective face image set;
Step 403: generating the reflective face image set according to the first reflective face image set and the second reflective face image set.
This embodiment is a specific implementation of the third of the three categories provided under step 202 (i.e., using both an intervening transparent reflective material and a mirror): respective reflective face image sets are obtained through the transparent-material mode and the mirror mode, and the reflective face image set finally used for extracting the reflection features is then formed from these two sets.
The reflective face image set finally used for extracting the reflection features can be obtained by directly merging the two reflective face image sets, or the reflection features produced by the two modes can be mixed in other ways to increase the diversity of the reflection features expressed in the images.
One implementation, by way of example and not limitation, may be:
generating third reflective face images corresponding to the second reflective face images in the second reflective face image set by shooting through an intervening transparent reflective material, to obtain a third reflective face image set;
generating fourth reflective face images corresponding to the first reflective face images in the first reflective face image set by shooting against a mirror, to obtain a fourth reflective face image set;
and generating the reflective face image set according to the third reflective face image set and the fourth reflective face image set. That is, the two reflection modes are combined by superposing the other generation mode on each set.
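The composition just described can be written compactly as below; through_glass and against_mirror are placeholders for the physical re-capture through a transparent reflective material and against a mirror, since this disclosure describes physical acquisition rather than a software transform:

```python
from typing import Callable, List, TypeVar

Image = TypeVar("Image")
Capture = Callable[[Image], Image]

def build_reflective_set(live_faces: List[Image],
                         through_glass: Capture,
                         against_mirror: Capture) -> List[Image]:
    first = [through_glass(f) for f in live_faces]    # step 401
    second = [against_mirror(f) for f in live_faces]  # step 402
    third = [through_glass(f) for f in second]        # mirror, then glass
    fourth = [against_mirror(f) for f in first]       # glass, then mirror
    return third + fourth                             # final reflective set
```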
Referring to fig. 5, fig. 5 is a flowchart of a method for extracting reflection features through an encoding-decoding model according to an embodiment of the present disclosure; that is, it provides a specific implementation of step 203 in the process 200 shown in fig. 2. The other steps in the process 200 are unchanged, and a new complete embodiment is obtained by replacing step 203 with the specific implementation provided in this embodiment. The process 500 includes the following steps:
Step 501: down-sampling the reflective face images in the reflective face image set using the encoding sub-model to obtain a down-sampling result;
Step 502: passing the down-sampling result through convolutional layers and a fully connected layer in sequence and performing two-class cross-entropy loss supervision on it to obtain a supervised down-sampling result;
Specifically, three convolutional layers and a fully connected layer can be attached to the output of the encoding sub-model (Encoder), so that through two-class cross-entropy loss supervision the model learns reflection features that distinguish living faces as accurately as possible.
Step 503: up-sampling the supervised down-sampling result using the decoding sub-model to obtain a reflection feature map of the same size as the reflective face image;
Step 504: extracting the reflection features from the reflection feature map.
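An illustrative version of the supervision head in step 502: three convolutional layers and a fully connected layer attached to the encoder output, trained with two-class cross-entropy. The channel counts and pooling are assumptions, since this disclosure does not specify them:

```python
import torch.nn as nn

class LivenessHead(nn.Module):
    """Two-class supervision head attached to the encoder output."""
    def __init__(self, in_channels: int = 128):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool encoder features to 1x1
        )
        self.fc = nn.Linear(16, 2)  # two classes: live vs. spoof

    def forward(self, encoder_features):
        return self.fc(self.convs(encoder_features).flatten(1))

# logits = LivenessHead()(encoder_features)     # encoder_features: Bx128xHxW
# loss = nn.CrossEntropyLoss()(logits, labels)  # labels: 0 = live, 1 = spoof
```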
It should be understood that the embodiments shown in figs. 3-5 each refine a different step of the embodiment shown in fig. 2, and the specific implementations provided by these embodiments may also be combined as actually needed to obtain a preferred embodiment.
To make the trained living body face recognition model genuinely useful in actual scenarios, the following scheme presents a method for recognizing a living body face, comprising the following steps:
acquiring a face image to be recognized;
and calling a living body face recognition model to recognize the face image to be recognized.
The living body face recognition model is obtained by the method for training the living body face recognition model described in the above embodiment.
The face image to be recognized may be captured by another device and sent to the execution subject, or captured directly by the execution subject in a specific scenario. The number of face images to be recognized is not limited; several may even be acquired for the same user. Ordinarily, living body face recognition needs only a single picture, but combining the discrimination results of several pictures can improve the final recognition accuracy. The most suitable frames may also be selected from a captured video to serve as the face images to be recognized. Furthermore, unnecessary image processing operations, such as beautification, brightness enhancement and skin smoothing, which easily remove details of the image content, should be avoided as far as possible for the face image to be recognized, so that the captured image keeps its original state.
The execution subject that performs the steps of the method for recognizing a living body face may be different from the execution subject that performs the steps of the method for training the living body face recognition model.
To deepen understanding, the present disclosure further provides a specific implementation scheme in combination with a concrete application scenario:
Firstly, a plurality of living body face images and electronic-screen-attack face images are acquired;
Then, each acquired image is preprocessed, where the preprocessing mechanism comprises: detecting the approximate position area of the face in the image with a localization detection model; extracting face key points and their coordinate values from the detected approximate position area with a face key point detection model, obtaining 72 face key point coordinates (x1, y1), ..., (x72, y72); computing the maximum and minimum values xmin, xmax, ymin and ymax of x and y from the 72 face key point coordinates, determining a face frame from these extrema, enlarging the face frame threefold (yielding a global face extraction frame), cropping the face image, and resizing it to 224x224; normalizing the resulting 224x224 global face image, specifically subtracting 128 from each pixel value and dividing by 256 so that every pixel value lies in [-0.5, 0.5]; and applying random data augmentation to the normalized image;
Next, a reflection detection model is trained using an open-source reflectance dataset, so that the trained model can generate a reflection map of the same size as the input image. The trained reflection detection model is used to run one round of prediction over the screen attack data (i.e., the electronic-screen-attack face images) in the training set to obtain a reflection feature map for each screen attack sample. These feature maps serve as the labels of the screen attack data, while the labels of the real-person samples are all-zero matrices of the same size;
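A sketch of this labeling scheme; reflection_detector stands for the model trained on the open-source reflectance dataset mentioned above:

```python
import numpy as np

def make_label(image: np.ndarray, is_attack: bool,
               reflection_detector, size: int = 224) -> np.ndarray:
    """Regression target for one training sample."""
    if is_attack:
        # Screen attack: label is the predicted reflection feature map.
        return reflection_detector(image)
    # Live face: label is an all-zero map of the same size.
    return np.zeros((size, size), dtype=np.float32)
```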
Specifically, the encoding-decoding feature extraction network of the screen-reflection liveness model uses UNet (a common image segmentation network) as the backbone convolutional neural network. A preprocessed image serves as the input, and the model outputs a reflection map of the same size as the input (224x224). If the input is a living body face image, the L1 loss (also called least absolute deviation; it minimizes the sum of absolute differences between target values and estimated values) is computed between an all-zero 224x224 matrix and the reflection map generated by the model; if the input is a screen attack image, the L1 loss is computed between the reflection feature map (the label) and the generated reflection map. Meanwhile, so that the model also learns liveness detection features, three convolutional layers and a fully connected layer can be attached to the output of the UNet encoder for two-class cross-entropy loss supervision.
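Putting the two supervision signals together, a hedged sketch of the training loss follows; how the two terms are weighted is not stated in this disclosure, so aux_weight is an assumption:

```python
import torch
import torch.nn as nn

l1_loss = nn.L1Loss()            # reflection map regression
ce_loss = nn.CrossEntropyLoss()  # auxiliary two-class supervision

def training_loss(pred_map: torch.Tensor, target_map: torch.Tensor,
                  logits: torch.Tensor, labels: torch.Tensor,
                  aux_weight: float = 1.0) -> torch.Tensor:
    # pred_map/target_map: Bx1x224x224 (target is all-zero for live faces,
    # the precomputed reflection feature map for screen attacks);
    # logits: Bx2 from the head on the UNet encoder; labels: B (0=live, 1=attack).
    return l1_loss(pred_map, target_map) + aux_weight * ce_loss(logits, labels)
```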
At final prediction time, the average value of the feature map output by the screen-reflection liveness model is used as the prediction result: if the result is greater than a set threshold, the input is a screen attack; if it is less than the threshold, the input is a real person.
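The decision rule as a minimal sketch; the threshold itself is not specified in this disclosure and must be tuned:

```python
def is_screen_attack(reflection_map, threshold: float) -> bool:
    # reflection_map: the model's output map (torch tensor or numpy array).
    return float(reflection_map.mean()) > threshold  # True -> screen attack
```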
In this embodiment, screen reflection features are detected with an encoding-decoding network structure, a reflection feature map of the same size as the original image is generated, and the feature maps are classified. This provides stronger recognition and defense capabilities against electronic photo attacks, especially attacks using photos displayed on high-definition screens.
With further reference to fig. 6 and 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for training a living body face recognition model and an embodiment of an apparatus for recognizing a living body face, respectively, where the embodiment of the apparatus for training a living body face recognition model corresponds to the embodiment of the method for training a living body face recognition model shown in fig. 2, and the embodiment of the apparatus for recognizing a living body face corresponds to the embodiment of the method for recognizing a living body face. The device can be applied to various electronic equipment.
As shown in fig. 6, the apparatus 600 for training a living body face recognition model of this embodiment may include: a living body face image set acquisition unit 601, a reflective face image set acquisition unit 602, and a reflection feature extraction and model training unit 603. The living body face image set acquisition unit 601 is configured to preprocess the living person images contained in a living person image set, and extract face images from the preprocessed images to obtain a living body face image set;
the reflective face image set acquisition unit 602 is configured to generate, in a preset reflection feature simulation mode, reflective face images corresponding to the living body face images in the living body face image set to obtain a reflective face image set;
the reflection feature extraction and model training unit 603 is configured to extract reflection features from the reflective face images in the reflective face image set by using an encoding-decoding model, and train with the reflection features as training samples to obtain the living body face recognition model.
In this embodiment, for the specific processing of the living body face image set acquisition unit 601, the reflective face image set acquisition unit 602 and the reflection feature extraction and model training unit 603 in the apparatus 600 for training a living body face recognition model, and the technical effects brought about by that processing, reference may be made to the related descriptions of steps 201-203 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the reflective face image set acquisition unit 602 may include:
a transparent-material and/or mirror simulation subunit configured to generate the reflective face images corresponding to the living body face images in the living body face image set by shooting through an intervening transparent reflective material and/or against a mirror, to obtain the reflective face image set.
In some optional implementations of this embodiment, the transparent-material and/or mirror simulation subunit may include:
a first reflection processing module configured to generate first reflective face images corresponding to the living body face images in the living body face image set by shooting through an intervening transparent reflective material, to obtain a first reflective face image set;
a second reflection processing module configured to generate second reflective face images corresponding to the living body face images by shooting against a mirror, to obtain a second reflective face image set;
and a reflective face image set generation module configured to generate the reflective face image set according to the first reflective face image set and the second reflective face image set.
In some optional implementations of this embodiment, the reflective face image set generation module may be further configured to:
generate third reflective face images corresponding to the second reflective face images in the second reflective face image set by shooting through an intervening transparent reflective material, to obtain a third reflective face image set;
generate fourth reflective face images corresponding to the first reflective face images in the first reflective face image set by shooting against a mirror, to obtain a fourth reflective face image set;
and generate the reflective face image set according to the third reflective face image set and the fourth reflective face image set.
In some optional implementations of this embodiment, the reflection feature extraction and model training unit 603 may include a reflection feature extraction subunit configured to extract the reflection features from the reflective face images in the reflective face image set using the encoding-decoding model, the reflection feature extraction subunit being further configured to:
down-sample the reflective face images in the reflective face image set using the encoding sub-model to obtain a down-sampling result;
pass the down-sampling result through convolutional layers and a fully connected layer in sequence and perform two-class cross-entropy loss supervision on it to obtain a supervised down-sampling result;
up-sample the supervised down-sampling result using the decoding sub-model to obtain a reflection feature map of the same size as the reflective face image;
and extract the reflection features from the reflection feature map.
In some optional implementations of this embodiment, the living body face image set acquisition unit 601 may include:
a living person image set acquisition subunit configured to acquire a living person image set;
a face extraction subunit configured to preprocess the living person images contained in the living person image set, and extract face images from the preprocessed images to obtain an original face image set;
and a normalization processing subunit configured to normalize the original face images in the original face image set to obtain the living body face image set.
In some optional implementations of this embodiment, the face extraction subunit is further configured to:
respectively determine the face position information in each living person image in the living person image set;
extract face key points from the face region determined based on the face position information to obtain key point coordinates;
and determine a face extraction frame according to the key point coordinates, and extract a face image from the corresponding living person image according to the face extraction frame to obtain the original face image set.
As shown in fig. 7, the apparatus 700 for recognizing a living body face of this embodiment may include: a to-be-recognized face image acquisition unit 701 and a model-calling recognition unit 702. The to-be-recognized face image acquisition unit 701 is configured to acquire a face image to be recognized; the model-calling recognition unit 702 is configured to call a living body face recognition model to recognize the face image to be recognized, wherein the living body face recognition model is obtained according to the apparatus 600 for training a living body face recognition model.
In this embodiment, for the specific processing of the to-be-recognized face image acquisition unit 701 and the model-calling recognition unit 702 in the apparatus 700 for recognizing a living body face, and the technical effects brought about by that processing, reference may be made to the related descriptions in the corresponding method embodiments, which are not repeated here.
The apparatus for training a living body face recognition model and the apparatus for recognizing a living body face provided by this embodiment improve the accuracy of the living body face images by preprocessing the living person images, which in turn improves the accuracy of the reflective face images obtained through the preset reflection feature simulation mode. As a result, the living body face recognition model trained in this way can better recognize electronic photo attacks, and the use of the encoding-decoding model makes the extracted reflection features more accurate.
According to an embodiment of the present disclosure, the present disclosure also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions, when executed by the at least one processor, enabling the at least one processor to implement any of the above methods for training a living body face recognition model and/or methods for recognizing a living body face.
According to an embodiment of the present disclosure, the present disclosure further provides a readable storage medium storing computer instructions which, when executed, enable a computer to implement any of the above methods for training a living body face recognition model and/or methods for recognizing a living body face.
An embodiment of the present disclosure provides a computer program product which, when executed by a processor, is capable of implementing any of the above methods for training a living body face recognition model and/or methods for recognizing a living body face.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802 and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the method for training a living body face recognition model and/or the method for recognizing a living body face. For example, in some embodiments, the method for training a living body face recognition model and/or the method for recognizing a living body face may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the method for training a living body face recognition model and/or the method for recognizing a living body face described above can be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the method for training a living body face recognition model and/or the method for recognizing a living body face by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in conventional physical hosts and Virtual Private Server (VPS) services.
According to the technical solution of the embodiments of the present disclosure, preprocessing the live person images improves the quality of the resulting live face images, which in turn improves the fidelity of the reflective face images generated in the preset reflection feature simulation manner; a living body face recognition model trained in this way can therefore better recognize electronic photo attacks, and the use of the encoding-decoding model makes the extracted reflection features more accurate.
It should be understood that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method of training a living body face recognition model, comprising:
preprocessing the live person images contained in a live person image set, and extracting face images from the preprocessed images to obtain a living body face image set;
generating, in a preset reflection feature simulation manner, reflective face images corresponding to the living body face images in the living body face image set, to obtain a reflective face image set;
and extracting reflection features from the reflective face images in the reflective face image set using an encoding-decoding model, and training with the reflection features as training samples to obtain the living body face recognition model.
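(For illustration only: the following Python sketch shows one way the data-preparation flow of claim 1 could be organized. The function names, the label convention, and the use of PyTorch are assumptions added for readability, not features recited in the claims.)

import torch

def build_training_set(live_person_images, preprocess, extract_face, simulate_reflection):
    """Steps 1-2 of claim 1: preprocess live person images, crop faces, then derive reflective counterparts."""
    live_faces = [extract_face(preprocess(img)) for img in live_person_images]  # living body face image set
    reflective_faces = [simulate_reflection(face) for face in live_faces]       # reflective face image set
    images = live_faces + reflective_faces            # assumes each face is a same-shape torch.Tensor
    labels = [0] * len(live_faces) + [1] * len(reflective_faces)  # 0 = live, 1 = reflective (assumed convention)
    return torch.stack(images), torch.tensor(labels)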
2. The method according to claim 1, wherein the generating, in the preset reflection feature simulation manner, reflective face images corresponding to the living body face images in the living body face image set to obtain the reflective face image set comprises:
generating reflective face images corresponding to the living body face images in the living body face image set by interposing a transparent reflective material and/or by contrast with a reflector, to obtain the reflective face image set.
3. The method according to claim 2, wherein the generating reflective face images corresponding to the living body face images in the living body face image set by interposing a transparent reflective material and/or by contrast with a reflector to obtain the reflective face image set comprises:
generating first reflective face images corresponding to the living body face images by interposing the transparent reflective material, to obtain a first reflective face image set;
generating second reflective face images corresponding to the living body face images by contrast with the reflector, to obtain a second reflective face image set;
and generating the reflective face image set according to the first reflective face image set and the second reflective face image set.
4. The method according to claim 3, wherein the generating the reflective face image set according to the first reflective face image set and the second reflective face image set comprises:
generating third reflective face images corresponding to the second reflective face images in the second reflective face image set by interposing the transparent reflective material, to obtain a third reflective face image set;
generating fourth reflective face images corresponding to the first reflective face images in the first reflective face image set by contrast with the reflector, to obtain a fourth reflective face image set;
and generating the reflective face image set according to the third reflective face image set and the fourth reflective face image set.
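(Illustrative note: claims 2-4 describe physically producing reflective face images through an interposed transparent reflective material or by contrast with a reflector. As a loose, purely synthetic stand-in, which is not what the claims recite, a soft highlight can be composited onto a face image; the function below is a hypothetical sketch.)

import numpy as np

def add_synthetic_highlight(face: np.ndarray, strength: float = 0.6) -> np.ndarray:
    # face: H x W x 3 uint8 image; returns a copy with one soft Gaussian highlight.
    h, w = face.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = np.random.randint(h), np.random.randint(w)   # random highlight center
    sigma = 0.25 * min(h, w)
    highlight = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    out = face.astype(np.float32) / 255.0 + strength * highlight[..., None]
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)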
5. The method according to claim 1, wherein the extracting reflection features from the reflective face images in the reflective face image set using an encoding-decoding model comprises:
down-sampling the reflective face images in the reflective face image set using an encoding sub-model to obtain a down-sampling result;
passing the down-sampling result sequentially through a convolutional layer and a fully connected layer, and applying two-class cross-entropy loss supervision to obtain a supervised down-sampling result;
up-sampling the supervised down-sampling result using a decoding sub-model to obtain a reflection feature map of the same size as the reflective face image;
and extracting the reflection features from the reflection feature map.
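(Illustrative sketch: the following PyTorch module mirrors the structure recited in claim 5: an encoding sub-model that down-samples, a convolutional layer plus fully connected layer supplying two-class cross-entropy supervision, and a decoding sub-model that up-samples back to the input size. All layer widths, kernel sizes, and the global pooling step are assumptions, not claim limitations.)

import torch
import torch.nn as nn

class ReflectionEncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                      # encoding sub-model: down-sampling path
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.cls_conv = nn.Conv2d(32, 8, 3, padding=1)     # convolutional layer of the supervision head
        self.cls_fc = nn.Linear(8, 2)                      # fully connected layer producing two-class logits
        self.decoder = nn.Sequential(                      # decoding sub-model: up-sampling path
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),  # one-channel reflection feature map
        )

    def forward(self, x):
        z = self.encoder(x)                                # down-sampling result
        h = self.cls_conv(z).mean(dim=(2, 3))              # global average pooling before the FC layer
        logits = self.cls_fc(h)                            # supervised with two-class cross-entropy loss
        feat_map = self.decoder(z)                         # same H x W as the input face image
        return logits, feat_map

# Usage: during training, nn.CrossEntropyLoss() is applied to logits against the
# live/reflective labels, while feat_map serves as the reflection feature map.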
6. The method according to any one of claims 1 to 5, wherein the preprocessing the live person images contained in the live person image set and extracting face images from the preprocessed images comprises:
acquiring the live person image set;
preprocessing the live person images contained in the live person image set, and extracting face images from the preprocessed images to obtain an original face image set;
and normalizing the original face images in the original face image set to obtain the living body face image set.
7. The method according to claim 1, wherein the preprocessing the live person images contained in the live person image set and extracting face images from the preprocessed images comprises:
determining face position information in each live person image in the live person image set;
extracting face key points from the face region determined based on the face position information to obtain key point coordinates;
and determining a face extraction box according to the key point coordinates, and extracting a face image from the corresponding live person image according to the face extraction box.
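(Illustrative sketch of the preprocessing recited in claims 6-7: detect the face position, locate key points, derive an extraction box from the key point coordinates, crop, and normalize. The detector and landmark callables are stand-ins to be supplied by any face analysis library, and the margin heuristic is an assumption.)

import numpy as np

def crop_and_normalize(image: np.ndarray, detect_face, detect_keypoints,
                       margin: float = 0.2) -> np.ndarray:
    x0, y0, x1, y1 = detect_face(image)                     # face position information
    keypoints = detect_keypoints(image, (x0, y0, x1, y1))   # (N, 2) key point coordinates
    # Derive the face extraction box from the key point coordinates, padded by a margin.
    kx0, ky0 = keypoints.min(axis=0)
    kx1, ky1 = keypoints.max(axis=0)
    pad_x, pad_y = margin * (kx1 - kx0), margin * (ky1 - ky0)
    box = (int(max(kx0 - pad_x, 0)), int(max(ky0 - pad_y, 0)),
           int(min(kx1 + pad_x, image.shape[1])), int(min(ky1 + pad_y, image.shape[0])))
    face = image[box[1]:box[3], box[0]:box[2]]
    return face.astype(np.float32) / 255.0                  # normalize pixel values to [0, 1]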
8. A method of recognizing a living body face, comprising:
acquiring a face image to be recognized;
and calling a living body face recognition model to recognize the face image to be recognized, wherein the living body face recognition model is obtained according to the method of training a living body face recognition model of any one of claims 1 to 7.
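(Illustrative sketch of claim 8, assuming the ReflectionEncoderDecoder-style model sketched above and a preprocessed input tensor; the class convention 0 = live is carried over from the earlier sketch and is an assumption.)

import torch

@torch.no_grad()
def recognize_live_face(model: torch.nn.Module, face_image: torch.Tensor) -> bool:
    # face_image: C x H x W tensor already cropped and normalized.
    model.eval()
    logits, _ = model(face_image.unsqueeze(0))   # add a batch dimension
    return logits.argmax(dim=1).item() == 0      # 0 = live face (assumed label convention)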
9. An apparatus for training a living body face recognition model, comprising:
a living body face image set acquisition unit configured to preprocess the live person images contained in a live person image set, and extract face images from the preprocessed images to obtain a living body face image set;
a reflective face image set acquisition unit configured to generate, in a preset reflection feature simulation manner, reflective face images corresponding to the living body face images in the living body face image set, to obtain a reflective face image set;
and a reflection feature extraction and model training unit configured to extract reflection features from the reflective face images in the reflective face image set using an encoding-decoding model, and to train with the reflection features as training samples to obtain the living body face recognition model.
10. The apparatus according to claim 9, wherein the reflective face image set acquisition unit comprises:
an interposing and/or contrast simulation subunit configured to generate reflective face images corresponding to the living body face images in the living body face image set by interposing a transparent reflective material and/or by contrast with a reflector, to obtain the reflective face image set.
11. The apparatus according to claim 10, wherein the interposing and/or contrast simulation subunit comprises:
a first reflection processing module configured to generate first reflective face images corresponding to the living body face images by interposing the transparent reflective material, to obtain a first reflective face image set;
a second reflection processing module configured to generate second reflective face images corresponding to the living body face images by contrast with the reflector, to obtain a second reflective face image set;
and a reflective face image set generation module configured to generate the reflective face image set according to the first reflective face image set and the second reflective face image set.
12. The apparatus according to claim 11, wherein the reflective face image set generation module is further configured to:
generate third reflective face images corresponding to the second reflective face images in the second reflective face image set by interposing the transparent reflective material, to obtain a third reflective face image set;
generate fourth reflective face images corresponding to the first reflective face images in the first reflective face image set by contrast with the reflector, to obtain a fourth reflective face image set;
and generate the reflective face image set according to the third reflective face image set and the fourth reflective face image set.
13. The apparatus according to claim 9, wherein the reflection feature extraction and model training unit comprises a reflection feature extraction subunit configured to extract the reflection features from the reflective face images in the reflective face image set using the encoding-decoding model, the reflection feature extraction subunit being further configured to:
down-sample the reflective face images in the reflective face image set using an encoding sub-model to obtain a down-sampling result;
pass the down-sampling result sequentially through a convolutional layer and a fully connected layer, and apply two-class cross-entropy loss supervision to obtain a supervised down-sampling result;
up-sample the supervised down-sampling result using a decoding sub-model to obtain a reflection feature map of the same size as the reflective face image;
and extract the reflection features from the reflection feature map.
14. The apparatus according to any one of claims 9 to 13, wherein the living body face image set acquisition unit comprises:
a live person image set acquisition subunit configured to acquire the live person image set;
a face extraction subunit configured to preprocess the live person images contained in the live person image set, and extract face images from the preprocessed images to obtain an original face image set;
and a normalization processing subunit configured to normalize the original face images in the original face image set to obtain the living body face image set.
15. The apparatus according to claim 14, wherein the face extraction subunit is further configured to:
determine face position information in each live person image in the live person image set;
extract face key points from the face region determined based on the face position information to obtain key point coordinates;
and determine a face extraction box according to the key point coordinates, and extract a face image from the corresponding live person image according to the face extraction box, to obtain the original face image set.
16. An apparatus for recognizing a living body face, comprising:
a to-be-recognized face image acquisition unit configured to acquire a face image to be recognized;
and a model calling and recognition unit configured to call a living body face recognition model to recognize the face image to be recognized, wherein the living body face recognition model is obtained according to the apparatus for training a living body face recognition model of any one of claims 9 to 15.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of training a live face recognition model of any one of claims 1-7 and/or the method of recognizing a live face of claim 8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of training a living body face recognition model of any one of claims 1-7 and/or the method of recognizing a living body face of claim 8.
19. A computer program product comprising a computer program which, when executed by a processor, carries out the method of training a living body face recognition model according to any one of claims 1-7 and/or the method of recognizing a living body face of claim 8.
CN202110540898.1A 2021-05-18 2021-05-18 Method for training living body face recognition model and recognizing living body face and related device Active CN113221766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110540898.1A CN113221766B (en) 2021-05-18 2021-05-18 Method for training living body face recognition model and recognizing living body face and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110540898.1A CN113221766B (en) 2021-05-18 2021-05-18 Method for training living body face recognition model and recognizing living body face and related device

Publications (2)

Publication Number Publication Date
CN113221766A true CN113221766A (en) 2021-08-06
CN113221766B CN113221766B (en) 2024-07-19

Family

ID=77092787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110540898.1A Active CN113221766B (en) 2021-05-18 2021-05-18 Method for training living body face recognition model and recognizing living body face and related device

Country Status (1)

Country Link
CN (1) CN113221766B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070110285A1 (en) * 2005-11-11 2007-05-17 Hanna Keith J Apparatus and methods for detecting the presence of a human eye
CN105389553A (en) * 2015-11-06 2016-03-09 北京汉王智远科技有限公司 Living body detection method and apparatus
WO2019096029A1 (en) * 2017-11-20 2019-05-23 腾讯科技(深圳)有限公司 Living body identification method, storage medium and computer device
CN109993124A (en) * 2019-04-03 2019-07-09 深圳市华付信息技术有限公司 Based on the reflective biopsy method of video, device and computer equipment
WO2020258119A1 (en) * 2019-06-27 2020-12-30 深圳市汇顶科技股份有限公司 Face recognition method and apparatus, and electronic device
CN110765923A (en) * 2019-10-18 2020-02-07 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and storage medium
CN111738122A (en) * 2020-06-12 2020-10-02 Oppo广东移动通信有限公司 Image processing method and related device
CN111783640A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Detection method, device, equipment and storage medium
CN112364828A (en) * 2020-11-30 2021-02-12 姜召英 Face recognition method and financial system
CN112651342A (en) * 2020-12-28 2021-04-13 中国平安人寿保险股份有限公司 Face recognition method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang He; Sun Yu: "Living body face detection method based on dual cameras", 软件 [Software], no. 07 *
Zheng Yuanpan; Li Guangyang; Li Ye: "A survey of research on the application of deep learning in image recognition", 计算机工程与应用 [Computer Engineering and Applications], no. 12 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705392A (en) * 2021-08-16 2021-11-26 百度在线网络技术(北京)有限公司 Working state switching method, device, equipment, storage medium and program product
CN113705392B (en) * 2021-08-16 2023-09-05 百度在线网络技术(北京)有限公司 Working state switching method, device, equipment, storage medium and program product

Also Published As

Publication number Publication date
CN113221766B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
CN113221771B (en) Living body face recognition method, device, apparatus, storage medium and program product
CN113435408A (en) Face living body detection method and device, electronic equipment and storage medium
WO2022227765A1 (en) Method for generating image inpainting model, and device, medium and program product
CN113205057A (en) Face living body detection method, device, equipment and storage medium
CN113011309A (en) Image recognition method, apparatus, device, medium, and program product
US20230115765A1 (en) Method and apparatus of transferring image, and method and apparatus of training image transfer model
CN117523645B (en) Face key point detection method and device, electronic equipment and storage medium
CN113869253A (en) Living body detection method, living body training device, electronic apparatus, and medium
CN112287945A (en) Screen fragmentation determination method and device, computer equipment and computer readable storage medium
CN113221766B (en) Method for training living body face recognition model and recognizing living body face and related device
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN114049290A (en) Image processing method, device, equipment and storage medium
CN113313048A (en) Facial expression recognition method and device
CN113569707A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN113269719A (en) Model training method, image processing method, device, equipment and storage medium
CN112464873A (en) Model training method, face living body recognition method, system, device and medium
CN117152352A (en) Image processing method, deep learning model training method and device
CN115116111B (en) Anti-disturbance human face living body detection model training method and device and electronic equipment
US20230008473A1 (en) Video repairing methods, apparatus, device, medium and products
CN113255512B (en) Method, apparatus, device and storage medium for living body identification
CN113642428B (en) Face living body detection method and device, electronic equipment and storage medium
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN114612971A (en) Face detection method, model training method, electronic device, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20240613
Address after: 211135, 2nd Floor, Unit B, No. 300 Zhihui Road, Qilin Science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province
Applicant after: Nanjing Xiyun Information Technology Co.,Ltd.
Country or region after: China
Address before: 2/F, Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing 100085
Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.
Country or region before: China
GR01 Patent grant