CN115761411A - Model training method, living body detection method, electronic device, and storage medium


Info

Publication number
CN115761411A
CN115761411A
Authority
CN
China
Prior art keywords: living body, probability, face, feature, distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211486526.6A
Other languages
Chinese (zh)
Other versions
CN115761411B (en)
Inventor
何武 (He Wu)
付贤强 (Fu Xianqiang)
朱海涛 (Zhu Haitao)
户磊 (Hu Lei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lumingshi Technology Co ltd
Original Assignee
Beijing Lumingshi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lumingshi Technology Co ltd
Priority to CN202211486526.6A
Publication of CN115761411A
Application granted
Publication of CN115761411B
Legal status: Active (current)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

Embodiments of the present application relate to the field of image recognition and disclose a model training method, a living body detection method, an electronic device, and a storage medium. The model training method comprises the following steps: inputting a multi-channel first processed image, obtained by blocking and fusing a face infrared image labeled as living body or prosthesis, into a living body detection network to obtain a first face feature and a second face feature; constructing a feature loss that makes first face features belonging to a living body and the living body center vector close to each other and makes first face features belonging to a prosthesis and the living body center vector far from each other, and a probability loss that makes the prediction probability of second face features belonging to a living body and the living body probability center close to each other and makes the prediction probability of second face features belonging to a prosthesis and the living body probability center far from each other; and training the living body detection network according to the feature loss and the probability loss to obtain a trained living body detection network. This improves the accuracy of living body detection in complex scenes (such as parchment paper real-person attacks).

Description

Model training method, living body detection method, electronic device, and storage medium
Technical Field
Embodiments of the present application relate to the technical field of image recognition, and in particular to a model training method, a living body detection method, an electronic device, and a storage medium.
Background
With the rapid development of deep learning, face recognition technology has been widely applied in production and daily life. However, existing face recognition systems are vulnerable to various prosthesis attacks such as printed photos, makeup, 3D masks, 3D head models, and parchment paper, so face liveness detection is essential to the security of a face recognition system. Currently adopted liveness detection approaches include binary classification methods, anomaly-detection-based methods, and generative-model-based methods.
However, binary classification methods use a softmax loss and are prone to overfitting, generalizing poorly to prosthesis types that do not appear in the training set. Anomaly-detection-based methods have low accuracy in complex scenes (such as parchment paper real-person attacks). Generative-model-based methods are slow at inference and therefore unsuitable for liveness detection scenarios with strict real-time requirements.
Disclosure of Invention
An object of the embodiments of the present application is to provide a model training method, a living body detection method, an electronic device, and a storage medium that improve the accuracy of living body detection in complex scenes (such as parchment paper real-person attacks), achieve a good detection effect on prosthesis types unseen in the training set, and offer good robustness and high detection speed.
To solve the above technical problem, an embodiment of the present application provides a model training method, comprising: inputting a multi-channel first processed image, obtained by blocking and fusing a face infrared image labeled as living body or prosthesis, into a living body detection network to obtain a first face feature and a second face feature; constructing a feature loss that makes first face features belonging to a living body and the living body center vector close to each other and makes first face features belonging to a prosthesis and the living body center vector far from each other, and a probability loss that makes the prediction probability of second face features belonging to a living body and the living body probability center close to each other and makes the prediction probability of second face features belonging to a prosthesis and the living body probability center far from each other, where the prediction probability is the prediction probability of the second face feature over the living body and prosthesis classes; and training the living body detection network according to the feature loss and the probability loss to obtain the trained living body detection network.
Embodiments of the present application also provide a living body detection method, comprising: inputting a multi-channel first processed image, obtained by blocking and fusing a face infrared image to be detected, into a trained living body detection network to obtain a first face feature and a second face feature; calculating the feature distance between the first face feature and the living body center vector, and the probability distance between the prediction probability corresponding to the second face feature and the living body probability center; and determining the living body detection result of the face infrared image to be detected according to the feature distance and the probability distance. The living body detection network, the living body center vector, and the living body probability center are all obtained by the model training method provided above.
An embodiment of the present application also provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor to enable the at least one processor to perform the model training method or the living body detection method mentioned in the above embodiments.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the model training method or the living body detection method mentioned in the above embodiments.
According to the model training method provided by the embodiments of the present application, the face infrared image is blocked and fused so that a single-channel image becomes a multi-channel first processed image, which prompts the living body detection network to learn finer-grained local features and lays a foundation for improving detection accuracy in complex scenes. The first processed image is input into the living body detection network to obtain a first face feature and a second face feature. During training, first face features belonging to a living body and the living body center vector are made close to each other while first face features belonging to a prosthesis are pushed away from it; likewise, the prediction probability of second face features belonging to a living body is pulled toward the living body probability center while that of second face features belonging to a prosthesis is pushed away from it. Training in this way markedly improves compactness among living bodies and separation between living bodies and prostheses: the face features of living bodies are made as similar as possible to the living body center vector, while the face features of prostheses only need to be as dissimilar as possible from those of living bodies; the specific position, form, or composition of prosthesis face features is not of concern here. As a result, the trained living body detection network improves detection accuracy in complex scenes (such as parchment paper real-person attacks), achieves a good detection effect on prosthesis types that do not appear in the training set, is robust, and detects quickly.
In addition, in the model training method provided by the embodiments of the present application, the multi-channel first processed image is obtained by the following steps: acquiring preset key points of the face infrared image, where the preset key points include a left eye key point, a right eye key point, a mouth key point, and a nose key point; acquiring a plurality of image blocks of preset size, each centered on one of the preset key points in the face infrared image; and fusing the image blocks in a preset order to obtain the multi-channel first processed image. Since the face infrared image is a single-channel image, fusing several single-channel image blocks into a multi-channel first processed image prompts the living body detection network to learn finer-grained local features and improves living body detection accuracy in complex scenes.
In addition, in the model training method provided by the embodiments of the present application, the living body detection network comprises a feature extraction network, a first multilayer perceptron, and a second multilayer perceptron, with the feature extraction network connected to both perceptrons. Inputting the first processed image, obtained by blocking and fusing the face infrared image labeled as living body or prosthesis, into the living body detection network to obtain the first face feature and the second face feature comprises: inputting the first processed image into the feature extraction network to obtain an output feature; and inputting the output feature into the first multilayer perceptron and the second multilayer perceptron respectively to obtain the first face feature and the second face feature. The living body detection network thus consists of a feature extraction network and two multilayer perceptrons, and this structurally simple network can achieve a good living body detection effect through training.
Drawings
One or more embodiments are illustrated by way of example in the corresponding accompanying drawings; like reference numerals denote similar elements, and the figures are not drawn to scale unless otherwise specified.
FIG. 1 is a flow chart of a model training method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a liveness detection network provided by an embodiment of the present application;
FIG. 3 is a flow chart of the living body detection method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, each embodiment of the present application is described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to aid understanding of the present application; the claimed technical solution can nevertheless be implemented without these details, and with various changes and modifications based on the following embodiments.
The implementation details of the model training method of this embodiment are described below. These details are provided to facilitate understanding and are not required to practice the present solution.
An embodiment of the present application relates to a model training method which, as shown in Fig. 1, includes:
Step 101: input a multi-channel first processed image, obtained by blocking and fusing a face infrared image labeled as living body or prosthesis, into a living body detection network to obtain a first face feature and a second face feature.
Specifically, compared with other types of face images, a face infrared image is not affected by factors such as changes in ambient illumination or changes to the face itself (makeup, cosmetic surgery, and the like), and so has strong anti-interference properties. In this embodiment, the acquired training set contains both face infrared images of living bodies and face infrared images of prostheses. The living body images cover a variety of possible conditions and application scenes, such as large pose angles, dark illumination, worn accessories, and bangs; likewise, the prosthesis images cover a variety of possible prosthesis types and scenes, such as a two-dimensional prosthesis video displayed on a phone screen, a two-dimensional prosthesis picture displayed on a tablet screen, an equal-scale print of a two-dimensional prosthesis picture, makeup, a 3D mask, a 3D head model, and a parchment paper real-person attack.
A parchment paper real-person attack is carried out by printing a prosthesis picture on parchment paper, cutting out the eye, nose, and mouth regions of the printed picture, and pasting them onto the corresponding facial features of a real person; during the attack, the eyes, nose, and mouth are prosthesis parchment paper while the rest of the face belongs to the real person. Because parchment paper is strong, purely paper-based, and highly translucent, it adheres closely to the face, and when facial-feature images are printed on it, a living body detection system has difficulty telling them apart, so recognition accuracy is low.
After the face infrared image is acquired, it is blocked and fused to obtain the first processed image. Blocking divides the face infrared image into several image blocks of preset size; fusion combines the resulting image blocks into a multi-channel first processed image. Since the face infrared image is a single-channel image, the preset-size image blocks are also single-channel; fusing them yields a multi-channel first processed image whose channel count equals the number of fused blocks. For example, fusing 4 single-channel image blocks yields a 4-channel first processed image.
The first processed image is then input into the living body detection network to obtain the first face feature and the second face feature; that is, the living body detection network has one input and two outputs.
In one embodiment, obtaining the multi-channel first processed image from the face infrared image labeled as living body or prosthesis, via blocking and fusion, comprises the following steps: acquiring preset key points of the face infrared image, where the preset key points include a left eye key point, a right eye key point, a mouth key point, and a nose key point; acquiring a plurality of image blocks of preset size, each centered on one preset key point in the face infrared image; and fusing the image blocks in a preset order to obtain the multi-channel first processed image.
Specifically, key point detection is performed on the face infrared image to obtain information (position, pixel values, etc.) about the preset key points, which may include a left eye key point, a right eye key point, a mouth key point, and a nose key point. A plurality of image blocks of preset size are then cropped, each centered on one preset key point; the preset sizes may be the same or different. For example, a single-channel 1 x 224 x 224 face infrared image is cropped at a preset size of 96 x 96 around the right eye, left eye, nose, and mouth key points to obtain 4 single-channel 1 x 96 x 96 image blocks, which are then fused into a 4 x 96 x 96 first processed image. During fusion, the blocks must be combined in a preset order. For example, if the preset order is mouth, left eye, right eye, nose, then the mouth block (1 x 96 x 96), left eye block (1 x 96 x 96), right eye block (1 x 96 x 96), and nose block (1 x 96 x 96) are fused in that order to obtain the 4 x 96 x 96 first processed image.
Since an image can be described by a pixel matrix, fusing 4 single-channel image blocks in a preset order to obtain one 4-channel first processed image can be understood as stacking the 4 single-channel pixel matrices (image blocks of preset size) in that order to obtain one 4-channel pixel matrix (a first processed image of the same preset size), as the following sketch illustrates.
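For illustration, the following Python sketch reproduces this block-and-fuse preprocessing with NumPy. The 1 x 224 x 224 input, the 96 x 96 block size, and the fusion order follow the example above; the keypoint coordinates and helper names are assumptions, not taken from the original filing.

import numpy as np

BLOCK_SIZE = 96
FUSION_ORDER = ["mouth", "left_eye", "right_eye", "nose"]  # preset fusion order

def crop_block(img, center, size=BLOCK_SIZE):
    # Crop a size x size single-channel patch centered on a keypoint,
    # clamped so the patch stays inside the image.
    h, w = img.shape
    cy, cx = center
    y0 = min(max(cy - size // 2, 0), h - size)
    x0 = min(max(cx - size // 2, 0), w - size)
    return img[y0:y0 + size, x0:x0 + size]

def fuse_blocks(img, keypoints):
    # Stack the single-channel patches along a new channel axis: (4, 96, 96).
    return np.stack([crop_block(img, keypoints[k]) for k in FUSION_ORDER], axis=0)

ir_face = np.random.randint(0, 256, (224, 224), dtype=np.uint8)  # stand-in IR face image
keypoints = {"mouth": (180, 112), "left_eye": (90, 70),          # (row, col) positions,
             "right_eye": (90, 154), "nose": (135, 112)}         # purely illustrative
first_processed = fuse_blocks(ir_face, keypoints)
assert first_processed.shape == (4, 96, 96)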
Of course, the preset key points in this embodiment may also include other facial key points, such as eyebrow key points, ear key points, and the like.
In one embodiment, the living body detection network includes a feature extraction network, a first multilayer perceptron, and a second multilayer perceptron, with the feature extraction network connected to both perceptrons. Inputting the first processed image, obtained by blocking and fusing the face infrared image labeled as living body or prosthesis, into the living body detection network to obtain the first face feature and the second face feature comprises: inputting the first processed image into the feature extraction network to obtain an output feature; and inputting the output feature into the first multilayer perceptron and the second multilayer perceptron respectively to obtain the first face feature and the second face feature.
In this embodiment, the living body detection network comprises a feature extraction network plus a first multilayer perceptron and a second multilayer perceptron, both connected to the feature extraction network. The two perceptrons have the same model structure but are subject to different loss constraints during training. The structure of the living body detection network is shown schematically in Fig. 2. Of course, this embodiment describes only the main networks that implement the living body detection function; it does not exclude auxiliary components supporting the feature extraction network and the two perceptrons, such as average pooling layers and batch normalization layers.
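A minimal PyTorch sketch of this one-input, two-output topology follows. The patent does not specify the backbone, so the small convolutional feature extraction network and all layer sizes are assumptions; only the overall structure (one shared feature extractor feeding two structurally identical multilayer perceptrons) follows the text and Fig. 2.

import torch
import torch.nn as nn

class LivenessNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Feature extraction network; the backbone is not fixed by the text,
        # so this small convolutional stack is an assumption.
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Two structurally identical MLP heads; they differ only in the loss
        # constraints applied to their outputs during training.
        self.mlp1 = nn.Sequential(nn.Linear(64, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        self.mlp2 = nn.Sequential(nn.Linear(64, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))

    def forward(self, x):
        out = self.backbone(x)                 # shared output feature
        return self.mlp1(out), self.mlp2(out)  # first / second face features

net = LivenessNet()
first_feat, second_feat = net(torch.randn(8, 4, 96, 96))  # batch of fused 4-channel inputs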
Step 102: construct a feature loss that makes first face features belonging to a living body and the living body center vector close to each other and makes first face features belonging to a prosthesis and the living body center vector far from each other, and a probability loss that makes the prediction probability of second face features belonging to a living body and the living body probability center close to each other and makes the prediction probability of second face features belonging to a prosthesis and the living body probability center far from each other.
In this embodiment, the prediction probability is the prediction probability of the second face feature over the living body and prosthesis classes. It may be obtained by inputting the second face feature into a classifier. For example, with the living body prediction probability preset to 1 and the prosthesis prediction probability to 0, a classifier output of 0.2 means the corresponding face infrared image is very likely a prosthesis, and an output of 0.9 means it is very likely a living body. Note that this classifier is not a basic component of the living body detection network; it exists only to obtain the prediction probability of the second face feature over the living body and prosthesis classes. The living body detection network itself only applies constraint training to the first and second face features; the classifier merely assists the computation, as in the sketch below.
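As a sketch only: such an auxiliary classifier can be as simple as a linear layer with a softmax over the second face feature. The text does not fix its form, so the layer sizes below are assumptions.

import torch
import torch.nn as nn

# Hypothetical auxiliary classifier: maps the second face feature to a
# (living body, prosthesis) probability pair. It only assists the computation
# of the prediction probability and is not part of the detection network.
classifier = nn.Sequential(nn.Linear(128, 2), nn.Softmax(dim=1))
second_feat = torch.randn(8, 128)
pred_prob = classifier(second_feat)  # pred_prob[:, 0] ~ probability of living body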
The loss constraints used in training comprise the feature loss and the probability loss. The feature loss constrains the first face feature; the probability loss constrains the second face feature. The feature loss makes the first face features of face infrared images belonging to a living body and the living body center vector close to each other, and makes the first face features of face infrared images belonging to a prosthesis and the living body center vector far from each other. The probability loss makes the prediction probabilities of second face features of images belonging to a living body and the living body probability center close to each other, and makes the prediction probabilities of second face features of images belonging to a prosthesis and the living body probability center far from each other.
The living body center vector and the living body probability center change continuously during training and do not settle to fixed values until final convergence; that is, they are obtained by continually training and optimizing the living body detection network. No prosthesis center vector or prosthesis probability center is trained or optimized; the network pays no attention to them.
Note that for the first face feature, the goal of the feature loss is to minimize the feature distance between a living body's first face feature and the living body center vector, and to maximize the feature distance between a prosthesis's first face feature and the living body center vector. For the second face feature, the goal of the probability loss is to minimize the probability-distribution distance between a living body's prediction probability and the living body probability center, and to maximize that distance for a prosthesis's prediction probability.
That is, the first face feature is constrained from the perspective of feature distance, and the second face feature from the perspective of probability-distribution distance. The feature distance may be a Euclidean distance, Mahalanobis distance, cosine distance, etc.; the probability distance may be the KL divergence (Kullback-Leibler divergence), JS divergence, Hellinger distance, etc.
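For concreteness, the sketch below implements one admissible pairing from these lists: cosine distance as the feature distance and KL divergence as the probability distance. Any other listed choice would slot in the same way; the function names and the example center values are illustrative.

import torch
import torch.nn.functional as F

def feature_distance(feat, live_center):
    # Cosine distance between each first face feature and the living body
    # center vector (one of the listed options; Euclidean etc. also work).
    return 1.0 - F.cosine_similarity(feat, live_center.unsqueeze(0), dim=1)

def probability_distance(pred, live_prob_center, eps=1e-7):
    # KL divergence KL(center || pred) between the living body probability
    # center and each predicted (living body, prosthesis) distribution.
    p = live_prob_center.clamp_min(eps).unsqueeze(0)
    q = pred.clamp_min(eps)
    return (p * (p / q).log()).sum(dim=1)

feats = torch.randn(8, 128)
live_center = torch.randn(128)                  # stand-in living body center vector
preds = torch.softmax(torch.randn(8, 2), dim=1)
live_prob_center = torch.tensor([0.99, 0.01])   # illustrative probability center
d = feature_distance(feats, live_center)        # shape (8,)
D = probability_distance(preds, live_prob_center)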
Step 103: train the living body detection network according to the feature loss and the probability loss to obtain the trained living body detection network.
In this embodiment, the living body detection network is trained with the gradient descent method commonly used in deep learning, constrained by the feature loss and the probability loss. The total loss function during training is L = α·L1 + β·L2, where L1 is the loss function corresponding to the feature loss, L2 is the loss function corresponding to the probability loss, and α and β are the corresponding weight parameters, which can be set empirically and adjusted during training; generally β is greater than α. Training is complete when the living body detection network converges; convergence can be determined by, for example, checking whether the number of training iterations reaches a preset count or whether the total loss function no longer changes appreciably.
In one embodiment, the loss function of the feature loss is constructed by the following formula:
(formula given as image BDA0003962590010000061 in the original filing)
wherein m is a hyperparameter, d_live is the feature distance between a first face feature belonging to a living body and the living body center vector, d_fake is the feature distance between a first face feature belonging to a prosthesis and the living body center vector, and N is the dimension of the first face feature. Feature distances include, but are not limited to, the Euclidean distance, Mahalanobis distance, and cosine distance.
The loss function of the probability loss is constructed by the following formula:
L2 = -log(sigmoid(D_fake - D_live)) + D_live
wherein D_live is the probability distance between the prediction probability of a second face feature belonging to a living body and the living body probability center, and D_fake is the probability distance between the prediction probability of a second face feature belonging to a prosthesis and the living body probability center. Probability distances include, but are not limited to, the KL divergence (Kullback-Leibler divergence), the JS divergence, and the Hellinger distance.
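A hedged PyTorch sketch of both losses and the total loss L = α·L1 + β·L2 follows. L2 is implemented exactly as given above; the formula for L1 appears only as an image in the original filing, so the margin-based contrastive form below (pull d_live down, push d_fake beyond a margin m) is an assumption consistent with the stated variables m, d_live, d_fake, and N.

import torch
import torch.nn.functional as F

def feature_loss(d_live, d_fake, m=1.0):
    # Assumed contrastive form of L1: minimize d_live, maximize d_fake up to
    # margin m. The exact published formula (including how N enters) is only
    # available as an image in the original filing.
    return d_live.mean() + F.relu(m - d_fake).mean()

def probability_loss(D_live, D_fake):
    # L2 = -log(sigmoid(D_fake - D_live)) + D_live, exactly as in the text.
    return (-torch.log(torch.sigmoid(D_fake - D_live)) + D_live).mean()

def total_loss(d_live, d_fake, D_live, D_fake, alpha=0.5, beta=1.0):
    # Total loss L = alpha * L1 + beta * L2; the text suggests beta > alpha.
    return alpha * feature_loss(d_live, d_fake) + beta * probability_loss(D_live, D_fake)

# Example with random stand-in distances (in practice these come from the
# network outputs and the chosen distance functions):
d_live, d_fake = torch.rand(8), torch.rand(8) + 1.0
D_live, D_fake = torch.rand(8), torch.rand(8) + 1.0
loss = total_loss(d_live, d_fake, D_live, D_fake)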
According to the model training method provided by the embodiments of the present application, the face infrared image is blocked and fused so that a single-channel image becomes a multi-channel first processed image, prompting the living body detection network to learn finer-grained local features and laying a foundation for improving detection accuracy in complex scenes. The first processed image is then input into the living body detection network to obtain the first face feature and the second face feature. During training, first face features belonging to a living body and the living body center vector are made close to each other, first face features belonging to a prosthesis and the living body center vector are made far from each other, the prediction probability of second face features belonging to a living body is pulled toward the living body probability center, and the prediction probability of second face features belonging to a prosthesis is pushed away from it. In this way, training markedly improves compactness among living body samples and separation between living bodies and prostheses, so the trained network detects accurately in complex scenes, generalizes to unseen prosthesis types, is robust, and runs fast.
An embodiment of the present application relates to a living body detection method which, as shown in Fig. 3, includes:
step 201, inputting a multi-channel first processed image obtained by blocking and fusing the infrared image of the face to be detected into a trained living body detection network to obtain a first face feature and a second face feature.
In this embodiment, the process of turning the face infrared image to be detected into the first processed image is exactly the same as the image processing in the training stage; for details, refer to the implementation details of the model training method embodiment.
Step 202: calculate the feature distance between the first face feature and the living body center vector, and the probability distance between the prediction probability corresponding to the second face feature and the living body probability center.
In this embodiment, the living body detection network, the living body center vector, and the living body probability center are all obtained by the model training method above. The network has converged through training, and after convergence the living body center vector and the living body probability center have also converged and settled to fixed values.
Specifically, the feature distance may be a Euclidean distance, Mahalanobis distance, cosine distance, etc., and the probability distance may be the KL divergence (Kullback-Leibler divergence), JS divergence, Hellinger distance, etc.
Step 203: determine the living body detection result of the face infrared image to be detected according to the feature distance and the probability distance.
In this embodiment, the living body detection result of the face infrared image to be detected can be determined from the feature distance and the probability distance. Understandably, since training makes a living body's first face feature and the living body center vector close to each other, a prosthesis's first face feature and the living body center vector far from each other, a living body's prediction probability close to the living body probability center, and a prosthesis's prediction probability far from it, a smaller feature distance means the face infrared image to be detected is more likely a living body and a larger feature distance means it is more likely a prosthesis; likewise, a smaller probability distance indicates a living body and a larger one a prosthesis.
In one embodiment, determining the living body detection result of the face infrared image to be detected according to the feature distance and the probability distance comprises: when the probability distance is smaller than a first threshold, determining that the face infrared image to be detected is a living body; when the probability distance is larger than a second threshold, determining that it is a prosthesis; and when the probability distance is between the first threshold and the second threshold, determining the living body detection result according to the feature distance, where the first threshold is smaller than the second threshold.
Specifically, the probability distance is more accurate for living body detection than the feature distance, so the probability distance can be treated as the primary score and the feature distance as the secondary score. Because the two distances are computed in different dimensions, they must be normalized before the judgment. After normalization, the first threshold is typically set to 0.2 and the second threshold to 0.8; of course, both thresholds can be adjusted according to the application scene, the required detection strictness, and other factors.
Further, determining the living body detection result of the face infrared image to be detected according to the feature distance comprises: when the feature distance is smaller than the first threshold, determining that the face infrared image to be detected is a living body; and when the feature distance is larger than the second threshold, determining that it is a prosthesis.
In this embodiment, when the probability distance falls between the first threshold and the second threshold, the feature distance serves as the main basis for judgment; in all other cases, the probability distance does. A sketch of this judgment logic follows.
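Assuming both distances have already been normalized to [0, 1] as described above, the two-stage judgment can be sketched as follows with the example thresholds 0.2 and 0.8. The fallback when the feature distance also lands between the two thresholds is not specified in the text, so the conservative default here is an assumption.

def liveness_decision(prob_dist, feat_dist, t1=0.2, t2=0.8):
    # Primary score: normalized probability distance.
    if prob_dist < t1:
        return "living body"
    if prob_dist > t2:
        return "prosthesis"
    # Ambiguous band: fall back to the normalized feature distance.
    if feat_dist < t1:
        return "living body"
    if feat_dist > t2:
        return "prosthesis"
    return "prosthesis"  # assumed conservative default; the text leaves this case open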
The steps of the above methods are divided only for clarity of description; in implementation they may be combined into one step, or a step may be split into several, and all such divisions fall within the protection scope of this patent as long as the same logical relationship is preserved. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without altering the core design also falls within the protection scope of this patent.
An embodiment of the present application relates to an electronic device which, as shown in Fig. 4, includes:
at least one processor 301; and a memory 302 communicatively coupled to the at least one processor 301; wherein the memory 302 stores instructions executable by the at least one processor 301, the instructions being executable by the at least one processor 301 to enable the at least one processor 301 to perform the model training method as mentioned in the above embodiments, or to perform the liveness detection method as mentioned in the above embodiments.
The electronic device includes one or more processors 301 and a memory 302; one processor 301 is taken as the example in Fig. 4. The processor 301 and the memory 302 may be connected by a bus or by other means; connection by a bus is taken as the example in Fig. 4. The memory 302, as a non-volatile computer-readable storage medium, stores non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions and modules corresponding to the methods of the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 302, the processor 301 executes the functional applications and data processing of the device, thereby implementing the model training method or the living body detection method described above.
The memory 302 may include a program storage area and a data storage area: the program storage area may store an operating system and an application required for at least one function, and the data storage area may store a list of options and the like. The memory 302 may also include high-speed random access memory as well as non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 302 may optionally include memory located remotely from the processor 301, connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 302 and, when executed by the one or more processors 301, perform the model training method or the living body detection method of any of the embodiments described above.
This product can execute the methods provided by the embodiments of the present application and has the corresponding functional modules and beneficial effects. For technical details not described in this embodiment, refer to the methods provided by the embodiments of the present application.
Embodiments of the present application relate to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the model training method or the living body detection method described above.
That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be completed by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Those of ordinary skill in the art will understand that the above embodiments are specific examples of implementing the present application, and that various changes in form and detail may be made in practice without departing from the spirit and scope of the present application.

Claims (10)

1. A method of model training, comprising:
inputting a multi-channel first processed image, obtained by blocking and fusing a face infrared image labeled as living body or prosthesis, into a living body detection network to obtain a first face feature and a second face feature;
constructing a feature loss that makes the first face feature belonging to a living body and a living body center vector close to each other and makes the first face feature belonging to a prosthesis and the living body center vector far from each other, and a probability loss that makes the prediction probability of the second face feature belonging to a living body and a living body probability center close to each other and makes the prediction probability of the second face feature belonging to a prosthesis and the living body probability center far from each other, wherein the prediction probability is the prediction probability of the second face feature over the living body and prosthesis classes; and
training the living body detection network according to the feature loss and the probability loss to obtain the trained living body detection network.
2. The model training method of claim 1, wherein obtaining the multi-channel first processed image by blocking and fusing the face infrared image labeled as living body or prosthesis comprises:
acquiring preset key points of the face infrared image; wherein the preset key points include: a left eye key point, a right eye key point, a mouth key point, and a nose key point;
taking each preset key point in the face infrared image as a center, and acquiring a plurality of image blocks with preset sizes;
and performing fusion processing on the plurality of image blocks according to a preset sequence to obtain a multi-channel first processed image.
3. The model training method of claim 1 or 2, wherein the living body detection network comprises a feature extraction network, a first multilayer perceptron, and a second multilayer perceptron, the feature extraction network being connected to the first multilayer perceptron and the second multilayer perceptron respectively;
wherein inputting the first processed image, obtained by blocking and fusing the face infrared image labeled as living body or prosthesis, into the living body detection network to obtain the first face feature and the second face feature comprises:
inputting the first processed image into a feature extraction network to obtain an output feature;
and inputting the output features into the first multilayer perceptron and the second multilayer perceptron respectively to obtain the first face features and the second face features.
4. The model training method of claim 1, wherein the loss function of the feature loss is constructed by the following formula:
(formula given as image FDA0003962590000000021 in the original filing)
wherein L1 is the loss function of the feature loss, m is a hyperparameter, d_live is the feature distance between a first face feature belonging to a living body and the living body center vector, d_fake is the feature distance between a first face feature belonging to a prosthesis and the living body center vector, and N is the dimension of the first face feature.
5. The model training method of claim 1, wherein the loss function of the probability loss is constructed by the following formula:
L2 = -log(sigmoid(D_fake - D_live)) + D_live
wherein L2 is the loss function of the probability loss, D_live is the probability distance between the prediction probability of a second face feature belonging to a living body and the living body probability center, and D_fake is the probability distance between the prediction probability of a second face feature belonging to a prosthesis and the living body probability center.
6. A living body detection method, comprising:
inputting a multi-channel first processed image, obtained by blocking and fusing a face infrared image to be detected, into a trained living body detection network to obtain a first face feature and a second face feature;
calculating a feature distance between the first face feature and a living body center vector, and a probability distance between a prediction probability corresponding to the second face feature and a living body probability center;
determining a living body detection result of the infrared image of the face to be detected according to the feature distance and the probability distance;
wherein the living body detection network, the living body center vector, and the living body probability center are all obtained by the model training method of any one of claims 1 to 5.
7. The living body detection method of claim 6, wherein the determining the living body detection result of the infrared image of the face to be detected according to the feature distance and the probability distance comprises:
when the probability distance is smaller than a first threshold value, determining the infrared image of the face to be detected as a living body;
when the probability distance is larger than a second threshold value, determining the infrared image of the face to be detected as a prosthesis;
when the probability distance is larger than the first threshold and smaller than the second threshold, determining the living body detection result of the infrared image of the face to be detected according to the feature distance; wherein the first threshold is less than the second threshold.
8. The living body detection method of claim 7, wherein the determining the living body detection result of the infrared image of the face to be detected according to the feature distance comprises:
when the feature distance is smaller than the first threshold, determining that the infrared image of the face to be detected is a living body; and
when the feature distance is larger than the second threshold, determining that the infrared image of the face to be detected is a prosthesis.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1 to 5 or the living body detection method of any one of claims 6 to 8.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the model training method of any one of claims 1 to 5 or the living body detection method of any one of claims 6 to 8.
CN202211486526.6A 2022-11-24 2022-11-24 Model training method, living body detection method, electronic device, and storage medium Active CN115761411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211486526.6A CN115761411B (en) 2022-11-24 2022-11-24 Model training method, living body detection method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211486526.6A CN115761411B (en) 2022-11-24 2022-11-24 Model training method, living body detection method, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN115761411A (en) 2023-03-07
CN115761411B (en) 2023-09-01

Family

ID=85337537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211486526.6A Active CN115761411B (en) 2022-11-24 2022-11-24 Model training method, living body detection method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN115761411B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488778A (en) * 2015-11-23 2016-04-13 浙江大学 Multi-viewpoint image fusion method based on block SPCA
CN107895150A (en) * 2016-11-30 2018-04-10 奥瞳系统科技有限公司 Face datection and head pose angle based on the small-scale convolutional neural networks module of embedded system are assessed
CN106951840A (en) * 2017-03-09 2017-07-14 北京工业大学 A kind of facial feature points detection method
CN110287805A (en) * 2019-05-31 2019-09-27 东南大学 Micro- expression recognition method and system based on three stream convolutional neural networks
CN111008575A (en) * 2019-11-25 2020-04-14 南京邮电大学 Robust face recognition method based on multi-scale context information fusion
CN111862204A (en) * 2019-12-18 2020-10-30 北京嘀嘀无限科技发展有限公司 Method for extracting visual feature points of image and related device
CN112541553A (en) * 2020-12-18 2021-03-23 深圳地平线机器人科技有限公司 Target object state detection method, apparatus, medium, and electronic device
CN113486858A (en) * 2021-08-03 2021-10-08 济南博观智能科技有限公司 Face recognition model training method and device, electronic equipment and storage medium
CN113505768A (en) * 2021-09-10 2021-10-15 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN114038045A (en) * 2021-11-25 2022-02-11 魔视智能科技(上海)有限公司 Cross-modal face recognition model construction method and device and electronic equipment
CN114299567A (en) * 2021-12-02 2022-04-08 北京的卢深视科技有限公司 Model training method, living body detection method, electronic device, and storage medium
CN114913404A (en) * 2022-04-25 2022-08-16 合肥的卢深视科技有限公司 Model training method, face image living body detection method, electronic device and storage medium
CN115376213A (en) * 2022-07-14 2022-11-22 合肥的卢深视科技有限公司 Training of living body detection network and living body detection method, device, equipment and medium
CN115273195A (en) * 2022-07-29 2022-11-01 济南博观智能科技有限公司 Face living body detection method, device, equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GEORGE A. et al., "Biometric face presentation attack detection with multi-channel convolutional neural network", IEEE Transactions on Information Forensics and Security, vol. 15, pp. 42-55, XP011744954, DOI: 10.1109/TIFS.2019.2916652 *
薛远 (XUE Yuan) et al., "技术升级促进供应链整合三维多模态融合识别实践分析" [Technology upgrades promote supply chain integration: an analysis of 3D multimodal fusion recognition practice], 《中国安防》 (China Security & Protection), no. 3, pp. 47-51
赵一洲 (ZHAO Yizhou) et al., "基于轻量化网络和近红外人脸活体检测算法" [Near-infrared face liveness detection algorithm based on a lightweight network], 《现代计算机》 (Modern Computer), no. 16, pp. 122-127
黄俊 (HUANG Jun) et al., "基于优化LeNet-5的近红外图像中的静默活体人脸检测" [Silent live face detection in near-infrared images based on optimized LeNet-5], 《红外技术》 (Infrared Technology), vol. 43, no. 9, pp. 845-851

Also Published As

Publication number Publication date
CN115761411B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
US11651229B2 (en) Methods and systems for face recognition
EP3254238B1 (en) Method for re-identification of objects
WO2022078041A1 (en) Occlusion detection model training method and facial image beautification method
US9405969B2 (en) Face recognition method and device
US11501563B2 (en) Image processing method and system
CN111783748B (en) Face recognition method and device, electronic equipment and storage medium
KR20220136510A (en) Deep neural network for iris identification
JP2010218551A (en) Face recognition method, computer readable medium, and image processor
CN113420731B (en) Model training method, electronic device and computer-readable storage medium
CN111652082B (en) Face living body detection method and device
WO2012162202A2 (en) Dual-phase red eye correction
US20230081982A1 (en) Image processing method and apparatus, computer device, storage medium, and computer program product
US20210097681A1 (en) Method for vein recognition, and apparatus, device and storage medium thereof
US20170185827A1 (en) Emotion estimation apparatus using facial images of target individual, emotion estimation method, and non-transitory computer readable medium
CN112541422A (en) Expression recognition method and device with robust illumination and head posture and storage medium
CN111611849A (en) Face recognition system for access control equipment
CN110674759A (en) Monocular face in-vivo detection method, device and equipment based on depth map
CN113657195A (en) Face image recognition method, face image recognition equipment, electronic device and storage medium
CN111914748A (en) Face recognition method and device, electronic equipment and computer readable storage medium
CN112052832A (en) Face detection method, device and computer storage medium
CN112052830A (en) Face detection method, device and computer storage medium
WO2022262209A1 (en) Neural network training method and apparatus, computer device, and storage medium
CN112069887A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN113128428B (en) Depth map prediction-based in vivo detection method and related equipment
CN112232205B (en) Mobile terminal CPU real-time multifunctional face detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant