WO2023011606A1 - Training method of live body detection network, method and apparatus of live body detection - Google Patents


Info

Publication number
WO2023011606A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
live
body detection
training
domain
Prior art date
Application number
PCT/CN2022/110368
Other languages
French (fr)
Inventor
Yongkai Li
Ningbo WANG
Shulei ZHU
Jingsong HAO
Original Assignee
Zhejiang Dahua Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co., Ltd. filed Critical Zhejiang Dahua Technology Co., Ltd.
Publication of WO2023011606A1 publication Critical patent/WO2023011606A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection

Definitions

  • the present disclosure relates to the field of image detection, and in particular to a training method of a live body detection network, a method and an apparatus of live body detection.
  • Biometric detection technology, especially face recognition, has been developing and progressing in recent years, in applications such as mobile phone unlocking or face payment.
  • a face recognition system faces a risk of being attacked with a fake face of a user. If a face image of the user is stolen, the system may easily be attacked with a photo or a video.
  • Live body detection may determine whether a face acquired by a camera is a face of a real user or a disguised face of the user, such as a face photo taken by a mobile phone, a face printed on paper, or a 3D silicone face mask. Therefore, research on live body detection is becoming an increasingly important research task in face recognition. However, the accuracy of live body detection remains insufficient.
  • a training method of a live body detection network, a method and an apparatus of live body detection are provided to improve the accuracy of live body detection.
  • the present disclosure provides a method for training a live body detection network, including:
  • the at least one non-live data includes expanded non-live data obtained by expanding based on at least a portion of training data.
  • before the inputting at least one non-live data to the live body detection network, the method further includes:
  • the attack-breakage image is obtained by retaining an attack-breakage region of an image and setting each pixel in a non-attack-breakage region to be 0.
  • the processing the at least a portion of training data based on an attack-breakage image includes:
  • the subtracting an average pixel value of the attack-breakage image from a pixel value of each pixel in the superimposed image to obtain the expanded non-live data includes:
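As an illustration of the expansion steps above, the following minimal sketch treats grayscale images as nested Python lists; the function names and the tiny 2×2 example are hypothetical, not taken from the disclosure.

```python
# Hypothetical sketch of the non-live data expansion described above.
# Grayscale images are nested lists; names and sizes are illustrative.

def make_attack_breakage_image(image, mask):
    """Retain pixels in the attack-breakage region (mask == 1) and set
    each pixel in the non-attack-breakage region to 0."""
    return [[px if m else 0 for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

def expand_non_live(training_image, breakage_image):
    """Superimpose the attack-breakage image on a training image, then
    subtract the breakage image's average pixel value from each pixel
    of the superimposed image to obtain one expanded non-live data."""
    h, w = len(breakage_image), len(breakage_image[0])
    avg = sum(sum(row) for row in breakage_image) / (h * w)
    superimposed = [[t + b for t, b in zip(trow, brow)]
                    for trow, brow in zip(training_image, breakage_image)]
    return [[px - avg for px in row] for row in superimposed]

attack_image = [[10, 20], [30, 40]]
mask = [[1, 0], [0, 1]]                     # breakage region on the diagonal
breakage = make_attack_breakage_image(attack_image, mask)
expanded = expand_non_live([[1, 2], [3, 4]], breakage)
```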
  • one sub-classifier is configured for one domain.
  • the method further includes:
  • the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss includes:
  • the calculating the first feature data and the second feature data of the live data of the sub-training data sets for the at least two of the plurality of domains to obtain a feature difference loss includes:
  • the adversarial training network comprises the live body detection network and the domain determiner. Before the performing an adversarial training on the live body detection network and a domain determiner based on the uniform distribution loss and the feature difference loss, the method further includes:
  • the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss includes:
  • before the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, the method further includes:
  • the performing the adversarial training on the live body detection network and the domain determiner based on the uniform distribution loss, the feature difference loss and the domain determination loss includes:
  • one domain determiner is configured for one domain.
  • the inputting the first feature data for each data in the sub-training data sets of the at least two of the plurality of domains to the domain determiner includes:
  • Calculating a domain determination loss based on the domain determination result for each training data predicted by each domain determiner includes:
  • before the inputting first feature data of all the training data to each domain determiner, the method further includes:
  • before the inputting the live data of the sub-training data sets of the at least two of the plurality of domains to sub-classifiers of the at least two of the plurality of domains, which the live data belong to, to obtain second feature data of the live data of the sub-training data sets of the at least two of the plurality of domains, the method further includes:
  • a live body detection method including:
  • the present disclosure provides an electronic device.
  • the electronic device includes a processor, and the processor is configured to execute instructions to implement operations of the above method.
  • the present disclosure provides a computer-readable storage medium, which stores a program and/or instructions.
  • the program and/or the instructions when being executed, implement the operations of the above method.
  • the relative difference between the output distribution, which is generated by the live body detection network processing each non-live data, and the uniform distribution may be calculated to obtain the uniform distribution loss of the live body detection network.
  • the adversarial training network containing the live body detection network is trained with the uniform distribution loss of the live body detection network. In this way, even though the labeled data are limitedly distributed, the non-live data of different domains have no common domain-invariant features, and the non-live data form an open set, the difference between the uniform distribution and the output distribution of the non-live data in the training dataset predicted by the live body detection network is calculated.
  • the live body detection network does not learn features of the non-live data, allowing the live body detection network to focus on learning features of the live data, such that the live body detection network, which is trained by the training method of the live body detection network in the present disclosure, has a low probability output for the non-live data of the open set distribution and has a high probability output for live bodies in different domains.
  • the live body detection network in the present disclosure has a strong generalization ability for the virtually infinite variety of non-live data, reducing the probability that the live body detection network determines the non-live data to be the live data and improving the accuracy of live body detection.
  • FIG. 1 is a flow chart of a training method of a live body detection network according to an embodiment of the present disclosure.
  • FIG. 2 is a flow chart of a training method of a live body detection network according to another embodiment of the present disclosure.
  • FIG. 3 is a schematic view of a process of a training method of a live body detection network according to another embodiment of the present disclosure.
  • FIG. 4 is a working schematic view of a sub-classifier in a training method of a live body detection network according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic view of expanding non-live body data in a training method of a live body detection network according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic view of calculating a loss in a training method of a live body detection network according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic view of a training target of a training method of a live body detection network according to an embodiment of the present disclosure.
  • FIG. 8 is a flow chart of a live body detection method according to an embodiment of the present disclosure.
  • FIG. 9 is a structural schematic view of an electronic device according to an embodiment of the present disclosure.
  • FIG. 10 is a structural schematic view of a computer-readable storage medium according to an embodiment of the present disclosure.
  • the present disclosure provides a training method of live body detection network. While training the live body detection network, the relative difference between the output distribution, which is generated by the live body detection network processing each non-live data, and the uniform distribution may be calculated to obtain the uniform distribution loss of the live body detection network.
  • the adversarial training network containing the live body detection network is trained with the uniform distribution loss of the live body detection network. In this way, even though the labeled data are limitedly distributed, the non-live data of different domains have no common domain-invariant features, and the non-live data form an open set, the difference between the uniform distribution and the output distribution of the non-live data in the training dataset predicted by the live body detection network is calculated.
  • the live body detection network does not learn features of the non-live data, allowing the live body detection network to focus on learning features of the live data, such that the live body detection network, which is trained by the training method of the live body detection network in the present disclosure, has a low probability output for the non-live data of the open set distribution and has a high probability output for live bodies in different domains.
  • the live body detection network in the present disclosure has a strong generalization ability for the virtually infinite variety of non-live data, reducing the probability that the live body detection network determines the non-live data to be the live data and improving the accuracy of live body detection.
  • the training method of the live body detection network in the present implementation includes following operations.
  • reference numerals of the following operations are made for simple illustration only, but not to limit an order of performing the operations. The order of performing the operations may be changed at will without departing from the concept of the present disclosure.
  • non-live data are input to the live body detection network to obtain output distribution generated while the live body detection network is processing each non-live data.
  • the non-live data is input into the live body detection network to obtain the output distribution generated while the live body detection network is processing each non-live data.
  • a relative difference between the output distribution of the non-live data and uniform distribution may be subsequently calculated to obtain a uniform distribution loss of the live body detection network.
  • the uniform distribution loss may be taken to train the adversarial training network that includes the live body detection network. In this way, the live body detection network does not learn features of the non-live data, allowing the live body detection network to focus on learning features of the live data.
  • the live body detection network obtained by being trained by the training method of the live body detection network in the present disclosure has a significantly low probability output for the non-live data with open set distribution, and has a high probability output for live bodies in various domains. Therefore, the live body detection network of the present disclosure has a strong generalization ability to the infinite number of non-live data, reducing the probability that the live body detection network determines the non-live data to be the live data, improving the accuracy of live body detection.
  • the present disclosure does not limit a structure of the live body detection network, as long as the live body detection network includes a feature extraction portion and a classification portion.
  • the live body detection network can extract features from the data through the feature extraction portion of the live body detection network, and process the features extracted by the feature extraction portion through the classification portion of the live body detection network, such that a classification result of the data is obtained.
  • the live body detection network may be a common VGG network, or the like.
  • the relative difference between the output distribution of the non-live data and the uniform distribution can be calculated to obtain the uniform distribution loss of the live body detection network.
  • the KL divergence D (p ‖ q) may be calculated based on a formula: D (p ‖ q) = (1/N) Σ_{i=1}^{N} p (x_i) · log ( p (x_i) / q (x_i) )
  • the N is the total number of data of the non-live data.
  • the p (x i ) is output distribution of non-live data x i .
  • the q (x i ) is the uniform distribution.
  • a relative difference between the uniform distribution and each confidence level of the non-live data belonging to each class may also be calculated by other formulas, which will not be limited by the present disclosure.
  • the output distribution generated by the live body detection network processing each non-live data may include a confidence level that each non-live data belongs to each category as predicted by the live body detection network.
  • classes may include a live body class and a non-live body class.
  • the output distribution of each non-live data may be a confidence level that each non-live data belongs to the live body class as predicted by the live body detection network and a confidence level that each non-live data belongs to the non-live body class as predicted by the live body detection network.
  • the uniform distribution may be (0.5, 0.5) .
  • the live body detection network is trained based on the relative differences between the uniform distribution and the confidence levels of the non-live data belonging to each class, such that both the confidence level that the non-live data belongs to the live body class and the confidence level that the non-live data belongs to the non-live body class, as predicted by the live body detection network, approach 0.5.
  • the live body detection network may have a relatively uniform probability output. Therefore, a threshold live body confidence level may be set to achieve a more robust live body detection in applications.
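A minimal sketch of the uniform distribution loss described above, assuming each output distribution is a list of per-class confidence levels and the relative difference is measured by a KL-style divergence against the uniform distribution; the function name is illustrative.

```python
import math

def uniform_distribution_loss(output_distributions):
    """Average KL-style relative difference between each predicted
    output distribution p(x_i) and the uniform distribution q over
    the N non-live data (a sketch of the loss described above)."""
    n_classes = len(output_distributions[0])
    q = 1.0 / n_classes
    total = 0.0
    for p in output_distributions:
        total += sum(pi * math.log(pi / q) for pi in p if pi > 0)
    return total / len(output_distributions)

# A (0.5, 0.5) prediction already matches the uniform distribution and
# contributes no loss; a confident prediction contributes a large loss.
flat = uniform_distribution_loss([[0.5, 0.5]])      # 0.0
sharp = uniform_distribution_loss([[0.99, 0.01]])   # positive
```

Training against this loss pushes the network's confidence for non-live data toward 0.5 per class, as the passage above explains.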
  • the output distribution generated by the live body detection network processing each non-live data may include the confidence level that each non-live data belongs to the non-live body class as predicted by the live body detection network.
  • the live body detection network is an a-class classification network, where the a is the number of classes.
  • classes of the a-class classification network may include the non-live body class.
  • the output distribution obtained for each non-live data is the confidence level that each non-live data belongs to the non-live body class as predicted by the live body detection network.
  • the uniform distribution is 1/a.
  • the live body detection network is trained by taking the relative difference between the uniform distribution and the confidence level that the non-live data belongs to the non-live body class, such that the confidence level that the non-live data belongs to the non-live body class as predicted by the live body detection network approaches 1/a.
  • the live body detection network may have a relatively uniform probability output. In this way, the threshold live body confidence level may be set to achieve more robust live body detection in applications.
  • an output of the softmax layer may be taken as the output distribution generated by the live body detection network processing each non-live data.
  • an adversarial training network that includes the live body detection network is trained based on the uniform distribution loss.
  • the adversarial training network that includes the live body detection network may be trained based on the uniform distribution loss.
  • the relative difference between the output distribution, which is generated by the live body detection network processing each non-live data, and the uniform distribution may be calculated to obtain the uniform distribution loss of the live body detection network.
  • the adversarial training network that includes the live body detection network is trained with the uniform distribution loss of the live body detection network. In this way, even though the labeled data are limitedly distributed, the non-live data of different domains have no common domain-invariant features, and the non-live data form an open set, the difference between the uniform distribution and the output distribution of the non-live data in the training dataset predicted by the live body detection network is calculated.
  • the live body detection network does not learn features of the non-live data, allowing the live body detection network to focus on learning features of the live data, such that the live body detection network, which is trained by the training method of the live body detection network in the present disclosure, has a low probability output for the non-live data of the open set distribution and has a high probability output for live bodies in different domains.
  • the live body detection network in the present disclosure has a strong generalization ability for the virtually infinite variety of non-live data, reducing the probability that the live body detection network determines the non-live data to be the live data and improving the accuracy of live body detection.
  • the present disclosure provides a method for expanding the non-live data.
  • a large number of non-live bodies are created, such that the distribution of non-live bodies is more diverse, thereby approaching a true open set distribution.
  • the adversarial training network in the present disclosure includes the live body detection network and a domain determiner.
  • FIG. 2 is a flow chart of a training method of a live body detection network according to another embodiment of the present disclosure
  • FIG. 3 is a schematic view of a process of a training method of a live body detection network according to another embodiment of the present disclosure.
  • the training method of the live body detection network of the present implementation includes the following operations. To be noted that, reference numerals of the following operations are made for simple illustration only, but not to limit an order of performing the operations. The order of performing the operations may be changed at will without departing from the concept of the present disclosure.
  • all training data are divided into sub-training data sets in a plurality of domains.
  • All training data may be divided into sub-training data sets in the plurality of domains based on imaging factors (such as light intensity or a type of a camera device) , pose features, attack factors, and the like.
  • imaging factors such as light intensity or a type of a camera device
  • all training data may be divided into sub-training data sets in the plurality of domains based on the imaging factors (such as light intensity or a type of a camera device) and the pose features, such that the sub-training data set in each of the plurality of domains may include live data and non-live data.
  • the live data of one sub-training data set of one domain may be significantly different from that of a different sub-training data set of a different domain.
  • the live body detection network may learn common features of the live data with different imaging factors or pose features. Detection accuracy of the trained live body detection network for data with different imaging factors or pose features may be improved.
  • All training data of the present disclosure may be represented as a plurality of face images.
  • the plurality of face images include both live face images and non-live face images (such as captured photo images of faces, 2D or 3D mask images, paper-printed face images, and the like) .
  • all training data may be represented as a plurality of animal images.
  • the training data are face images.
  • all training data may be divided into b sub-training data sets of b different domains based on the light intensity. Data of the sub-training data set of the i-th domain may be denoted accordingly, wherein the b is an integer greater than 1, and the i is an integer greater than 0 and less than or equal to b.
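The division into b domains can be sketched as follows, assuming each training data carries a light-intensity value normalized to [0, 1); the dictionary key, binning rule, and 0-based domain indices are illustrative assumptions, not from the disclosure.

```python
def divide_into_domains(training_data, b, key=lambda d: d["light_intensity"]):
    """Divide all training data into b sub-training data sets of b
    domains by binning an imaging factor (here, a normalized light
    intensity); domain indices are 0-based in this sketch."""
    domains = [[] for _ in range(b)]
    for data in training_data:
        i = min(int(key(data) * b), b - 1)   # which domain this data falls in
        domains[i].append(data)
    return domains

samples = [{"light_intensity": 0.1},
           {"light_intensity": 0.8},
           {"light_intensity": 0.9}]
subsets = divide_into_domains(samples, b=2)  # one dim sample, two bright ones
```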
  • all training data are input into the live body detection network to obtain classification results of all training data.
  • All training data may be input to the live body detection network (the G network in FIG. 3) to obtain a classification result of each training data.
  • a classification loss of the live body detection network may be calculated in the operation S203 based on classification results of all training data, such that the classification loss may be taken to train the live body detection network at a later stage, enabling the live body detection network to determine more accurately whether or not each data is the live body, improving the detection accuracy of the live body detection network.
  • all the training data and the expanded non-live data may be input to the live body detection network to obtain classification results of all the training data and the expanded non-live data. Subsequently, the classification loss of the live body detection network is calculated based on the classification results of all the training data and the expanded non-live data.
  • the distribution of the non-live data may be more diverse, and thus, approaching the true open set distribution, improving the ability of the live body detection network to distinguish between the live body and the non-live body.
  • the order of performing the operation S202 and the operation S201 is not limited, for example, the operation S202 may be performed before the operation S201 or after the operation S201.
  • the classification loss of the live body detection network is calculated based on the classification results of all training data.
  • the classification loss of the live body detection network may be calculated based on the classification result of each training data obtained in the operation S202.
  • the classification loss of the live body detection network may be calculated by performing a loss function, such as a binary classification cross-entropy loss function.
  • the binary classification cross-entropy loss function may be expressed as: L_classification = − (1/c) Σ_{j=1}^{c} [ y_j · log (p_j) + (1 − y_j) · log (1 − p_j) ]
  • the L classification represents the classification loss of the live body detection network
  • the c represents a batch size
  • the y j is a label of the j-th data indicating that the j-th data is the live body or the non-live body
  • the p j is the confidence level that the j-th data is the live body.
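Under the definitions above (c the batch size, y_j the live/non-live label, p_j the predicted live-body confidence), the classification loss may be sketched as follows, assuming the cross-entropy is averaged over the batch.

```python
import math

def classification_loss(labels, confidences):
    """Binary classification cross-entropy over a batch of size c:
    y_j marks the j-th data as live (1) or non-live (0), and p_j is
    the predicted confidence that the j-th data is a live body."""
    c = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, confidences)) / c

# Both predictions below are correct with confidence 0.9, so the loss
# equals -log(0.9).
loss = classification_loss([1, 0], [0.9, 0.1])
```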
  • first feature data is extracted by the live body detection network from the live data in the sub-training data sets of at least two domains.
  • the live data in the sub-training data sets of at least two domains may be input to the live body detection network to allow the live body detection network to perform feature extraction for each live data in the sub-training data sets of the at least two domains to obtain first feature data of each live data in the sub-training data sets of the at least two domains.
  • the first feature data is directly obtained and extracted in the operation S202 by the live body detection network from the live data in the sub-training data sets of the at least two domains.
  • the operation of inputting the live data in the sub-training data sets of the at least two domains into the live body detection network may not be repeated.
  • the operation of inputting the live data of the sub-training data sets of the at least two domains into the live body detection network may be repeated to obtain the first feature data extracted from the live data in the sub-training data sets of the at least two domains by the live body detection network.
  • the live data of the sub-training data sets of the at least two domains are input to sub-classifiers of the at least two domains correspondingly to obtain second feature data of the live data in the sub-training data sets of the at least two domains.
  • the live data of the sub-training data sets of the at least two domains are input to the sub-classifiers of the at least two domains correspondingly to obtain the second feature data of the live data in the sub-training data sets of the at least two domains, such that subsequent computation may be performed by taking the first feature data of the live data of the sub-training data sets of the at least two domains and the second feature data of the live data of the sub-training data sets of the at least two domains to obtain a feature difference loss of the live body detection network.
  • one sub-classifier may be set for one domain (that is, the Di network in FIG. 3, and the D1 network, the D2 network, and the Di network in FIG. 4) , that is, the sub-classifier and the sub-training sample set are in one-to-one correspondence.
  • Each sub-classifier is configured to determine whether the class of the data in the domain corresponding to the sub-classifier is the live body class or the non-live body class.
  • the trained sub-classifier is configured to extract features from each data in the sub-training data set of the domain, which the trained sub-classifier belongs to.
  • the sub-classifiers may be trained to obtain the trained sub-classifiers.
  • each data in the sub-training data set of each domain may be input to the sub-classifier of the domain, which the data belongs to, to train the corresponding sub-classifier.
  • a sub-classifier of a first domain may be trained by taking a sub-training data set of the first domain
  • a sub-classifier of a second domain may be trained by taking a sub-training data set of the second domain
  • a sub-classifier of a third domain may be trained by taking a sub-training data set of the third domain.
  • the sub-classifier of each domain may be trained by the loss function, such as the binary classification cross-entropy loss function.
  • a dimension of the first feature data of the live data in the sub-training data set of the at least two domains may be equal to that of the second feature data of the live data in the sub-training data set of the at least two domains.
  • c of the first feature data and c of the second feature data may both be 100
  • h of the first feature data and h of the second feature data may both be 64
  • w of the first feature data and w of the second feature data may both be 64.
  • the first feature data and the second feature data of the live data in the sub-training data set of the at least two domains are calculated to obtain a feature difference loss.
  • the first feature data of the live data in the sub-training data set of the at least two domains obtained at the operation S204 and the second feature data of the live data in the sub-training data set of the at least two domains obtained at the operation S205 may be calculated to obtain the feature difference loss of the live body detection network.
  • the live body detection network may be trained subsequently by taking the feature difference loss. That is, the live body detection network is trained by the domain adaptive training method, such that a feature output by the live body detection network approaches the intersection of features output by all the sub-classifiers.
  • the data output by the live body detection network may confuse a domain determiner that has a domain determination capability.
  • the domain determiner may determine feature data of live data in the remaining domains extracted by the live body detection network as feature data of live data in the corresponding domains, enabling the live body detection network to learn common features of the live data of various domains.
  • an operation of the live body detection network trained by the training method of the present disclosure detecting the live bodies may not be affected by factors, such as imaging factors, pose features, attack breakages, and the like, improving the robustness of live body detection.
  • the difference between the first feature data of the live data in the sub-training data set of the at least two domains and the second feature data of the live data in the sub-training data set of the at least two domains may be calculated to obtain the feature difference loss of the live body detection network.
  • a maximum mean difference between the first feature data of the live data in the sub-training data set of the at least two domains and the second feature data of the live data in the sub-training data set of the at least two domains may be calculated to obtain the feature difference loss of the live body detection network.
  • the feature difference loss of the live body detection network may be calculated based on the following equation: MMD² (x, y) = (1/m²) Σ_{i=1}^{m} Σ_{j=1}^{m} k (x_i, x_j) − (2/(m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} k (x_i, y_j) + (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} k (y_i, y_j)
  • the k (x, y) represents a kernel function, which in this case may be the Laplace kernel
  • the x represents the first feature data of the live data in the sub-training data set of the at least two domains.
  • the y represents the second feature data of the live data in the sub-training data set of the at least two domains.
  • the m represents the total data amount of the live data corresponding to the first feature data.
  • the n represents the total data amount of the live data corresponding to the second feature data.
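A sketch of the maximum mean difference computation with a Laplace kernel, using small feature vectors as stand-ins for the first and second feature data; the gamma bandwidth and helper names are illustrative assumptions.

```python
import math

def laplace_kernel(x, y, gamma=1.0):
    """Laplace kernel k(x, y) = exp(-gamma * ||x - y||_1)."""
    return math.exp(-gamma * sum(abs(a - b) for a, b in zip(x, y)))

def feature_difference_loss(xs, ys, gamma=1.0):
    """Squared maximum mean difference between the first feature data
    xs (m samples) and the second feature data ys (n samples)."""
    m, n = len(xs), len(ys)
    k = lambda a, b: laplace_kernel(a, b, gamma)
    term_xx = sum(k(a, b) for a in xs for b in xs) / (m * m)
    term_yy = sum(k(a, b) for a in ys for b in ys) / (n * n)
    term_xy = sum(k(a, b) for a in xs for b in ys) / (m * n)
    return term_xx + term_yy - 2.0 * term_xy

feats = [[0.1, 0.2], [0.3, 0.4]]
same = feature_difference_loss(feats, feats)      # identical sets -> 0
far = feature_difference_loss([[0.0]], [[5.0]])   # distinct sets -> positive
```

Minimizing this loss drags the features extracted by the live body detection network toward those of the per-domain sub-classifiers, as described above.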
  • the first feature data extracted by the live body detection network from the live data in the sub-training data sets of all domains may be obtained.
  • the live data in the sub-training data set of each domain may be input to the sub-classifier of the domain, which the sub-training data set belongs to, to obtain the second feature data of the live data in the sub-training data sets of all domains.
  • the first feature data and the second feature data of the sub-training data sets of all domains are calculated to obtain the feature difference loss.
  • the first feature data of each data in the sub-training data set of the at least two domains is input to the domain determiner to obtain a domain determination result of each data predicted by the domain determiner.
  • a domain determination loss is calculated based on domain determination results for all data in the sub-training data sets of the at least two domains predicted by the domain determiner.
  • each data in the sub-training data sets of the at least two domains may be input to the live body detection network to allow the live body detection network to extract features from each data in the sub-training data sets of the at least two domains to obtain the first feature data for each data in the sub-training data sets of the at least two domains.
  • the first feature data of each data in the sub-training data sets of the at least two domains may be input to the domain determiner to obtain the domain determination result of each data predicted by the domain determiner.
  • the domain determination loss is calculated based on the domain determination results of all data in the sub-training data sets of the at least two domains predicted by the domain determiner.
  • the live body detection network is trained by taking the domain determination loss in order to allow the domain determiner to be unable to determine a true domain of each first feature data.
  • the feature data output by the live body detection network can confuse the domain determiner that has the domain determination capability. That is, the domain determiner may determine the feature data of the live data in the remaining domains extracted by the live body detection network as the feature data of the live data in the corresponding domains.
  • the live body detection network may learn the common features of the live data of various domains, such that the operation of the live body detection network, which is trained by the training method of the present disclosure, performing the live body detection may not be affected by the factors, such as imaging factors, pose features or attack breakages, improving the robustness of live body detection.
  • the domain determination results for all data in the sub-training data sets of the at least two domains predicted by the domain determiner may be calculated by taking the loss function, such as the binary classification cross-entropy loss function or the uniform distribution loss function, such that the domain determination loss of the live body detection network may be obtained.
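As a sketch of one of the named options, the binary classification cross-entropy over the domain determiner's predictions can be computed as below; the argument names, the small epsilon guard, and the mean reduction are illustrative assumptions rather than details from the disclosure.

```python
import math

def domain_determination_loss(pred_probs, true_domains):
    """Mean binary cross-entropy over the domain determiner's predictions.

    pred_probs: predicted probability that each sample belongs to the
    determiner's own domain; true_domains: 1 if it does, else 0.
    """
    eps = 1e-12  # guards against log(0) for saturated predictions
    total = 0.0
    for p, t in zip(pred_probs, true_domains):
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(pred_probs)
```

A maximally uncertain prediction of 0.5 gives a loss of log 2 per sample, and the loss shrinks as the determiner becomes confidently correct, which is the quantity the adversarial training pushes in opposite directions for the two networks.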
  • the first feature data extracted by the live body detection network from the sub-training data sets of the at least two domains may be obtained directly in the operation S202, and the operation of inputting the sub-training data sets of the at least two domains to the live body detection network may not be repeated.
  • One domain determiner is set for each domain. Each domain determiner only needs to determine whether the input feature data belongs to the domain corresponding to the present domain determiner. Further, in the operation S207, the first feature data of all training data may be input to each domain determiner (that is, the Mi network in FIG. 3), to obtain the domain determination result of each data of each domain predicted by each domain determiner. Further, in the operation S208, the domain determination loss of each domain determiner is calculated based on the domain determination results of all training data predicted by each domain determiner. Further, the domain determination losses of all domain determiners are added to obtain a total domain determination loss of the domain determiner.
  • training of the domain determiner may be completed, such that the trained domain determiner may be taken to perform domain determination on the first feature data in the operation S207.
  • the domain determiner may be trained by taking the feature data output by the sub-classifier described in the operation S204.
  • a sub-classifier may be set corresponding to each domain. That is, the sub-classifier and the domain determiner are in one-to-one correspondence.
  • Each sub-classifier is configured to determine whether the class of the data of the corresponding domain is the live body class or the non-live body class.
  • the sub-training data set of each domain may be input to the sub-classifier of each domain in order to obtain the second feature data of each data in the sub-training data set of each domain.
  • the domain determiner of each domain is trained by taking the second feature data of each data in the sub-training data set of each domain and the second feature data of each data in the sub-training data set of at least one of the remaining domains. While training the domain determiner of each domain, the second feature data extracted by the sub-classifier in the domain, which the determiner belongs to, may be taken as a positive sample, and the second feature data extracted by the sub-classifier in the domain, which the determiner does not belong to, may be taken as a negative sample.
  • the first feature data and the second feature data of the sub-training data set of the domain, which the determiner belongs to, may be taken as the positive sample
  • the first feature data and the second feature data in the sub-training data sets of at least one of the remaining domains may be taken as the negative sample
  • the domain determiner of each domain may be trained by taking the positive sample and the negative sample.
  • the domain determiner may be trained by taking the second feature data extracted by the live body detection network to increase the number of training samples for the domain determiner.
  • model parameters of the live body detection network may be changed to allow the second feature data output by the live body detection network to be more abundant.
  • the non-live data may be expanded non-live data obtained by expanding at least a portion of all training data (which may include both original live data and original non-live data). That is, the expanded non-live data may be obtained by expanding at least a portion of the original live data and/or at least a portion of the original non-live data from all the training data. In this way, the non-live data is significantly expanded, allowing the distribution of the non-live data to be more diverse, and thus, approaching the true open set distribution.
  • At least a portion of the training data as shown in FIG. 5 (a) may be processed through an attack-breakage image to obtain the expanded non-live data as shown in FIG. 5 (b) .
  • the attack-breakage image is obtained by retaining an attack-breakage region in the image and setting pixels in the non-attack-breakage region to be 0.
  • the attack breakage may include abnormal light spots, paper edges, paper cavities, paper creases, mask reflections and so on.
  • the attack-breakage image and each of the at least a portion of the training data may be added to obtain a superimposed image.
  • An average pixel value of all pixels of the attack-breakage image may be subtracted from a pixel value of each pixel of the superimposed image to obtain the expanded non-live data.
  • alternatively, the subtraction may first yield an intermediate image, and Gaussian noise may be randomly added to the intermediate image to obtain the expanded non-live data.
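The expansion steps above can be sketched as follows on flat lists of pixel values; the noise standard deviation, the seeded random generator, and the list representation are illustrative assumptions for the purpose of the sketch.

```python
import random

def expand_non_live(image, breakage, noise_std=2.0, seed=0):
    """Builds one expanded non-live sample from one training image and one
    attack-breakage image (both given as flat lists of pixel values)."""
    rng = random.Random(seed)
    # 1. Superimpose the attack-breakage image onto the training image.
    superimposed = [p + b for p, b in zip(image, breakage)]
    # 2. Subtract the average pixel value of the attack-breakage image
    #    from every pixel of the superimposed image.
    mean_b = sum(breakage) / len(breakage)
    intermediate = [p - mean_b for p in superimposed]
    # 3. Randomly add Gaussian noise to the intermediate image.
    return [p + rng.gauss(0.0, noise_std) for p in intermediate]
```

With the noise standard deviation set to zero the function reduces to the superposition-and-recentering steps alone, which makes the arithmetic easy to check by hand.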
  • the output distribution generated by the live body detection network processing each expanded non-live data in the operation S202 may be obtained directly.
  • the non-live data may not be repeatedly input to the live body detection network.
  • the non-live data may be input into the live body detection network repeatedly to obtain the output distribution generated by the live body detection network processing each expanded non-live data.
  • the relative difference between the output distribution of the non-live data and the uniform distribution is calculated to obtain the uniform distribution loss.
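One plausible reading of this "relative difference" is the KL divergence from the predicted class distribution to the uniform distribution, sketched below; the disclosure does not fix the exact measure, so this particular choice is an assumption.

```python
import math

def uniform_distribution_loss(output_dist):
    """KL divergence from the network's output distribution to the uniform
    distribution over the same classes; zero when the output is uniform."""
    k = len(output_dist)
    uniform = 1.0 / k
    eps = 1e-12  # guards against log(0) for zero-probability classes
    return sum(p * math.log((p + eps) / uniform) for p in output_dist)
```

Minimizing this quantity over the expanded non-live data pushes the network toward maximally uncertain outputs for non-live inputs, which is the low-probability behavior the training method targets.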
  • the uniform distribution loss, the feature difference loss, the domain determination loss and the classification loss are added by weight to obtain a total loss.
  • the uniform distribution loss, the feature difference loss, the domain determination loss and the classification loss may be added by weight to obtain the total loss, such that adversarial training may be performed on the live body detection network and the domain determiner by taking the total loss.
  • the live body detection network is trained to allow the total loss to be minimized, an absolute value of the domain determination loss to be maximized, the uniform distribution loss to be minimized, the feature difference loss to be minimized, and the classification loss to be minimized, such that the objectives shown in FIG. 7 are achieved, that is, minimizing the classification loss of the live body detection network and maximizing a domain confusion degree.
  • Weighting factors for the uniform distribution loss, the domain determination loss, the feature difference loss and the classification loss are not limited by the present disclosure, and may be determined based on actual situations. For example, when the domain determination loss is negative, a weighting factor for each of the four types of losses may be 1. That is, the uniform distribution loss, the domain determination loss, the feature difference loss and the classification loss can be added directly to obtain the total loss.
  • the order of calculating the uniform distribution loss, the domain determination loss, the feature difference loss and the classification loss is not limited by the present disclosure.
  • the order of the calculation may follow the order shown in the present disclosure.
  • the calculation may be performed in the order of the domain determination loss, the uniform distribution loss, the feature difference loss and the classification loss.
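The weighted combination described above reduces to a plain weighted sum; the unit weights mirror the example from the disclosure, while the argument order and names are assumptions for illustration.

```python
def total_loss(uniform_loss, feature_loss, domain_loss, class_loss,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four training losses; unit weights reproduce the
    direct-addition example given in the disclosure."""
    w_u, w_f, w_d, w_c = weights
    return (w_u * uniform_loss + w_f * feature_loss
            + w_d * domain_loss + w_c * class_loss)
```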
  • adversarial training is performed on the live body detection network and the domain determiner by taking the total loss.
  • the concept of performing abnormality detection is applied to optimize the domain adaptive algorithm.
  • a method of expanding the non-live data is provided to create a large number of non-live bodies, such that the distribution of non-live bodies is more diverse, and thus, approaching the true open set distribution.
  • the concept of performing abnormality detection is provided.
  • domain invariant features for the live bodies in various domains are calculated, and the difference between the predicted distribution of the live body detection network and the uniform distribution for the expanded non-live body features is calculated. In this way, the trained model has a high probability output for the live bodies in various domains and has a low probability output for the non-live body data with the open set distribution, such that the live body detection is optimized.
  • the operations S202-S212 may be repeated until the number of repetitions reaches a preset number or the total loss is less than a threshold. When the number of repetitions reaches the preset number, or when the total loss is less than the threshold, the training is terminated.
  • the preset number of repetitions may be set in advance, and the number of repetitions may be initialized to 0.
  • after each repetition, the number of repetitions is increased by one.
  • the operation S202 is performed to train the live body detection network again, until the number of repetitions is greater than or equal to the preset number.
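The stopping logic described above (repeat until a preset number of repetitions is reached or the total loss falls below a threshold) can be sketched as follows; `train_step`, which stands in for one pass of operations S202-S212 and returns the total loss, is a hypothetical callable, not an interface defined by the disclosure.

```python
def train_until_converged(train_step, preset_number, loss_threshold):
    """Repeats training iterations until the repetition count reaches
    preset_number or the total loss drops below loss_threshold."""
    repetitions = 0           # the number of repetitions is initialized to 0
    total = float("inf")      # no loss computed yet
    while repetitions < preset_number and total >= loss_threshold:
        total = train_step()  # one pass of operations S202-S212
        repetitions += 1      # the number of repetitions is increased by one
    return repetitions, total
```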
  • the trained live body detection network can be applied to perform live body detection.
  • the live body detection method of the present disclosure includes: performing live body detection on an object to be detected through the live body detection network trained by the training method described above to determine whether the object to be detected is a live object or a non-live object.
  • the live body detection method in the embodiment of the present disclosure may specifically include following operations.
  • the object to be detected is input.
  • the object to be detected is fed into the trained live body detection network to obtain a live body confidence level.
  • the confidence threshold may be set according to actual situations, for example, the confidence threshold may be 0.7, 0.8, 0.9, or the like.
  • the object to be detected is a live body.
  • an output of the operation 304 is the live body.
  • the object to be detected is a non-live body.
  • an output of the operation 305 is a non-live body.
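The decision logic of these operations reduces to a threshold comparison on the live body confidence level; the comparison direction (confidence at or above the threshold means live) and the return labels are assumptions for illustration, with 0.7, 0.8 and 0.9 being the example thresholds mentioned above.

```python
def detect_live_body(confidence, confidence_threshold=0.8):
    """Maps the trained network's live body confidence to a detection result."""
    return "live body" if confidence >= confidence_threshold else "non-live body"
```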
  • the live body detection method of the present embodiment is not affected by any scene feature, such as light, pose features and the like.
  • the live body detection method can distinguish live bodies based on distinguishing features between the live bodies and the non-live bodies, and can distinguish whether a face is a real face of a live human or a faked face.
  • the live body detection method has high robustness for live face detection.
  • FIG. 9 is a structural schematic view of an electronic device 20 according to an embodiment of the present disclosure.
  • the electronic device 20 of the present disclosure includes a processor 22, and the processor 22 is configured to execute instructions to implement the method provided by any of the above embodiments of the present disclosure and any non-conflicting combination thereof.
  • the electronic device 20 may be a terminal such as a mobile phone, a laptop computer, and the like, or may be a server.
  • the processor 22 may be called a CPU (Central Processing Unit).
  • the processor 22 may be an integrated circuit chip having a signal processing capability.
  • the processor 22 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general purpose processor may be a microprocessor, or the processor 22 may also be any conventional processor or the like.
  • the electronic device 20 may further include a memory 21 for storing instructions and data required for the processor 22 to operate.
  • FIG. 10 is a structural schematic view of a computer-readable storage medium according to an embodiment of the present disclosure.
  • the computer-readable storage medium 30 of the present embodiment stores instructions/program data 31 which, when executed, implement the method provided in any of the above embodiments of the present disclosure and any non-conflicting combination of the methods.
  • the instructions/program data 31 may form a program file, which is stored in the above-mentioned storage medium 30 in a form of a software product, enabling a computer device (a personal computer, a server, a network device, or the like) or a processor to perform all or some of the operations of the method of each embodiment of the present disclosure.
  • the above-mentioned storage medium 30 includes: a universal serial bus (USB) drive, a portable hard drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk and other media that can store program codes, or devices such as a computer, a server, a mobile phone, a tablet, and the like.
  • the disclosed system, the device and the method can be implemented by other means.
  • the embodiments of apparatuses described above are merely exemplary.
  • units are divided based on logical functions only. Practically, the units can be divided in other ways, for example, multiple units or components can be combined or can be integrated into another system, or some features can be omitted or not implemented.
  • the mutual coupling or direct coupling or communicative connection shown or discussed may be indirect coupling or communicative connection between devices or units via some interfaces, which may be electrical, mechanical or in other forms.
  • various functional units in the various embodiments of the present disclosure can be integrated in one single processing unit, or may be separated physically.
  • two or more units can be integrated in one single unit.
  • the above integrated units can be implemented either in a form of hardware or in a form of software functional units.


Abstract

The present application provides a method for training a live body detection network, a live body detection method and a live body detection apparatus. The training method includes: inputting non-live data to the live body detection network to obtain an output distribution generated by the live body detection network processing each non-live data; calculating a relative difference between the output distribution of the non-live data and the uniform distribution to obtain a uniform distribution loss; and performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss. According to the present application, the accuracy of live body detection is improved.

Description

TRAINING METHOD OF LIVE BODY DETECTION NETWORK, METHOD AND APPARATUS OF LIVE BODY DETECTION
CROSS REFERENCE
The present application claims priority of China Patent Application No. 202110904298.9, filed on August 06, 2021, in the China National Intellectual Property Administration, the entire contents of which are hereby incorporated by reference.
TECHNICAL FIELD
The present disclosure relates to the field of image detection, and in particular to a training method of a live body detection network, a method and an apparatus of live body detection.
BACKGROUND
Biometric detection technology, especially face recognition, has been developing and progressing in recent years, in applications such as mobile phone unlocking and face payment. However, a face recognition system faces a risk of being attacked with a fake face of a user. If a face image of the user is stolen, the system may be easily attacked by a photo or a video. Live body detection may determine whether a face acquired by a camera is a face of a real user or a disguised face of the user, such as a face photo taken by a mobile phone, a face printed on paper, or a 3D silicone face mask. Therefore, live body detection is becoming an increasingly important research task in face recognition. However, the accuracy of existing live body detection remains limited.
SUMMARY OF THE DISCLOSURE
According to the present disclosure, a training method of a live body detection network, a method and an apparatus of live body detection are provided to improve the accuracy of live body detection.
To achieve the above objectives, the present disclosure provides a method for training a live body detection network, including:
inputting at least one non-live data to the live body detection network to obtain an output distribution generated by the live body detection network processing each of the at least one non-live data;
calculating a relative difference between the output distribution of the non-live data and the uniform distribution to obtain a uniform distribution loss; and
performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss.
In some embodiments, the at least one non-live data includes expanded non-live data obtained by expanding based on at least a portion of training data.
In some embodiments, before the inputting at least one non-live data to the live body detection network, the method further includes:
processing the at least a portion of training data based on an attack-breakage image to obtain the expanded non-live data.
The attack-breakage image is obtained by retaining an attack-breakage region of an image and setting each pixel in a non-attack-breakage region to be 0.
In some embodiments, the processing the at least a portion of training data based on an attack-breakage image, includes:
adding the attack-breakage image and the training data to obtain a superimposed image; and
subtracting an average pixel value of the attack-breakage image from a pixel value of each pixel in the superimposed image to obtain the expanded non-live data.
In some embodiments, the subtracting an average pixel value of the attack-breakage image from a pixel value of each pixel in the superimposed image to obtain the expanded non-live data, includes:
subtracting an average pixel value of the attack-breakage image from a pixel value of each pixel in the superimposed image to obtain an intermediate image;
adding Gaussian noise to the intermediate image to obtain the expanded non-live data.
In some embodiments, one sub-classifier is configured for one domain. Before the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, the method further includes:
dividing all the training data into a plurality of sub-training data sets of a plurality of domains;
inputting live data in sub-training data sets of at least two of the plurality of domains to the live body detection network to obtain first feature data of the live data of the sub-training data sets of the at least two of the plurality of domains;
inputting the live data of the sub-training data sets of the at least two of the plurality of domains to sub-classifiers of the at least two of the plurality of domains, which the live data belong to, to obtain second feature data of the live data of the sub-training data sets of the at least two of the plurality of domains;
calculating the first feature data and the second feature data of the live data of the sub-training data sets for the at least two of the plurality of domains to obtain a feature difference loss.
The performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, includes:
performing an adversarial training on the live body detection network and a domain determiner based on the uniform distribution loss and the feature difference loss.
In some embodiments, the calculating the first feature data and the second feature data of the live data of the sub-training data sets for the at least two of the plurality of domains to obtain a feature difference loss, includes:
calculating a maximum mean difference between the first feature data and the second feature data of the sub-training data sets of the at least two of the plurality of domains to obtain the feature difference loss of the live body detection network.
In some embodiments, the adversarial training network comprises the live body detection network and the domain determiner. Before the performing an adversarial training on the live body detection network and a domain determiner based on the uniform distribution loss and the feature difference loss, the method further includes:
inputting the first feature data for each data in the sub-training data sets of the at least two of the plurality of domains to the domain determiner to obtain a domain determination result for each data predicted by the domain determiner;
calculating a domain determination loss based on domain determination results for all data in the sub-training data sets of the at least two of the plurality of domains predicted by the domain determiner.
The performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, includes:
performing the adversarial training on the live body detection network and the domain determiner based on the uniform distribution loss, the feature difference loss and the domain determination loss.
In some embodiments, before the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, the method further includes:
obtaining classification results of all the training data predicted by the live body detection network;
calculating a classification loss of the live body detection network based on the classification results of all the training data.
The performing the adversarial training on the live body detection network and the domain determiner based on the uniform distribution loss, the feature difference loss and the domain determination loss, includes:
adding the uniform distribution loss, the feature difference loss, the domain determination loss and the classification loss by weight to obtain a total loss; and
performing the adversarial training on the live body detection network and the domain determiner by taking the total loss.
In some embodiments, one domain determiner is configured for one domain. The inputting the first feature data for each data in the sub-training data sets of the at least two of the plurality of domains to the domain determiner, includes:
inputting first feature data of all the training data to each domain determiner to obtain a domain determination result for each training data predicted by each domain determiner.
Calculating a domain determination loss based on the domain determination result for each training data predicted by each domain determiner, includes:
calculating a domain determination loss for each domain determiner based on the domain determination results for all the training data predicted by each domain determiner; and
adding domain determination losses for all domain determiners to obtain a total domain determination loss of the domain determiner.
In some embodiments, before the inputting first feature data of all the training data to each domain determiner, the method further includes:
training the domain determiner of each domain by taking the second feature data of the sub-training data set for each domain and the second feature data of a sub-training data set of at least one of the remaining domains.
In some embodiments, before the inputting the live data of the sub-training data sets of the at least two of the plurality of domains to sub-classifiers of the at least two of the plurality of domains, which the live data belong to, to obtain second feature data of the live data of the sub-training data sets of the at least two of the plurality of domains, the method further includes:
training the sub-classifier of each domain through a binary classification cross-entropy loss function by taking the sub-training data set for each domain.
To achieve the above objectives, the present disclosure provides a live body detection method, including:
performing live body detection on an object to be detected by applying the live body detection network trained by the above training method to determine whether the object to be detected is a live body or a non-live body.
To achieve the above objectives, the present disclosure provides an electronic device. The electronic device includes a processor, and the processor is configured to execute instructions to implement operations of the above method.
To achieve the above objectives, the present disclosure provides a computer-readable storage medium, which stores a program and/or instructions. The program and/or the instructions, when being executed, implement the operations of the above method.
According to the present disclosure, while training the live body detection network, the relative difference between the output distribution, which is generated by the live body detection network processing each non-live data, and the uniform distribution may be calculated to obtain the uniform distribution loss of the live body detection network. The adversarial training network containing the live body detection network is trained by taking the uniform distribution loss of the live body detection network. Considering that labeled data are limited in distribution, that non-live data of different domains do not share common domain invariant features, and that the non-live data form an open set, the difference between the output distribution of the non-live data in the training data set predicted by the live body detection network and the uniform distribution is calculated. In this way, the live body detection network does not learn features of the non-live data and instead focuses on learning features of the live data, such that the live body detection network trained by the training method of the present disclosure has a low probability output for the non-live data of the open set distribution and a high probability output for live bodies in different domains. Therefore, the live body detection network of the present disclosure has a strong generalization ability for the infinite number of non-live data, reducing the probability that the live body detection network determines the non-live data to be live data, and improving the accuracy of live body detection.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings herein are illustrated to provide a further understanding of the present disclosure and form a part of the present disclosure. The exemplary embodiments of the present disclosure and description thereof are used to explain the present disclosure and do not limit the present disclosure.
FIG. 1 is a flow chart of a training method of a live body detection network according to an embodiment of the present disclosure.
FIG. 2 is a flow chart of a training method of a live body detection network according to another embodiment of the present disclosure.
FIG. 3 is a schematic view of a process of a training method of a live body detection network according to another embodiment of the present disclosure.
FIG. 4 is a working schematic view of a sub-classifier in a training method of a live body detection network according to an embodiment of the present disclosure.
FIG. 5 is a schematic view of expanding non-live body data in a training method of a live body detection network according to an embodiment of the present disclosure.
FIG. 6 is a schematic view of calculating a loss in a training method of a live body detection network according to an embodiment of the present disclosure.
FIG. 7 is a schematic view of a training target of a training method of a live body detection network according to an embodiment of the present disclosure.
FIG. 8 is a flow chart of a live body detection method according to an embodiment of the present disclosure.
FIG. 9 is a structural schematic view of an electronic device according to an embodiment of the present disclosure.
FIG. 10 is a structural schematic view of a computer-readable storage medium according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below by referring to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present disclosure shall be included in the scope of the present disclosure. In addition, unless otherwise specified (such as "or in addition" or "or alternatively"), the term "or" herein refers to the non-exclusive "or" (i.e., "and/or"). Moreover, various embodiments described herein are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.
In the art, when training the live body detection network by applying a domain adaptive method and/or an adversarial training method, only a distribution similarity loss of live data from various domains is calculated, such that the live body detection network learns common features of the live data from the various domains. However, information of the non-live data from the various domains is not fully exploited. Therefore, although the methods in the art may improve generalization of the live body detection network for live bodies, they may not achieve high generalization for the effectively endless variety of non-live data.
Therefore, the present disclosure provides a training method of a live body detection network. While training the live body detection network, the relative difference between the uniform distribution and the output distribution generated by the live body detection network when processing each non-live data may be calculated to obtain a uniform distribution loss of the live body detection network. The adversarial training network containing the live body detection network is then trained based on the uniform distribution loss. In this way, although the labeled data are limited in distribution, the non-live data of different domains share no common domain-invariant features, and the non-live data form an open set, the difference between the uniform distribution and the output distribution predicted by the live body detection network for the non-live data in the training dataset is calculated. The live body detection network therefore does not learn features of the non-live data and instead focuses on learning features of the live data, such that the live body detection network trained by the training method of the present disclosure has a low probability output for non-live data of the open-set distribution and a high probability output for live bodies in different domains. The live body detection network of the present disclosure thus has a strong generalization ability for the effectively infinite variety of non-live data, reducing the probability that non-live data are determined to be live data and improving the accuracy of live body detection.
Specifically, as shown in FIG. 1, the training method of the live body detection network in the present implementation includes following operations. To be noted that, reference numerals of the following operations are made for simple illustration only, but not to limit an order of performing the operations. The order of performing the operations may be changed at will without departing from the concept of the present disclosure.
In an operation S101, non-live data are input to the live body detection network to obtain output distribution generated while the live body detection network is processing each non-live data.
The non-live data is input into the live body detection network to obtain the output distribution generated while the live body detection network is processing each non-live data. In this way, a relative difference between the output distribution of the non-live data and the uniform distribution may be subsequently calculated to obtain a uniform distribution loss of the live body detection network. The uniform distribution loss may then be used to train the adversarial training network that includes the live body detection network. As a result, the live body detection network does not learn features of the non-live data and instead focuses on learning features of the live data. The live body detection network trained by the training method of the present disclosure thus has a significantly low probability output for non-live data with the open-set distribution and a high probability output for live bodies in various domains. Therefore, the live body detection network of the present disclosure has a strong generalization ability for the infinite variety of non-live data, reducing the probability that the live body detection network determines the non-live data to be the live data, improving the accuracy of live body detection.
The present disclosure does not limit a structure of the live body detection network, as long as the live body detection network includes a feature extraction portion and a classification portion. In this way, the live body detection network can extract features from the data through the feature extraction portion of the live body detection network, and process the features extracted by the feature extraction portion through the classification portion of the live body detection network, such that a classification result of the data is obtained. In detail, the live body detection network may be a common VGG network, or the like.
In an operation S102, the relative difference between the output distribution of the non-live data and the uniform distribution is calculated to obtain the uniform distribution loss.
Based on the output distribution generated while the live body detection network processes each non-live data in the operation S101, the relative difference between the output distribution of the non-live data and the uniform distribution can be calculated to obtain the uniform distribution loss of the live body detection network.
The uniform distribution loss D kl (p||q) may be calculated based on a formula,

$$D_{kl}(p\|q)=\frac{1}{N}\sum_{i=1}^{N}p(x_i)\log\frac{p(x_i)}{q(x_i)}$$

The N is the total number of the non-live data. The p (x i) is the output distribution of the non-live data x i. The q (x i) is the uniform distribution. In other embodiments, a relative difference between the uniform distribution and each confidence level of the non-live data belonging to each class may also be calculated by other formulas, which will not be limited by the present disclosure.
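As a minimal sketch of the calculation above, the uniform distribution loss can be computed as the batch-averaged KL divergence between each predicted distribution and the uniform distribution. The function name and the array shapes below are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def uniform_distribution_loss(p):
    """Average KL divergence D_kl(p || q) between predicted class
    distributions p and the uniform distribution q.

    p: array of shape (N, K), each row a per-sample probability vector.
    """
    n, k = p.shape
    q = np.full_like(p, 1.0 / k)           # uniform distribution over K classes
    ratio = np.clip(p / q, 1e-12, None)    # guard against log(0)
    return float(np.mean(np.sum(p * np.log(ratio), axis=1)))

# A perfectly uniform output incurs zero loss; a confident output does not,
# which is what pushes the network toward uniform outputs on non-live data.
loss_uniform = uniform_distribution_loss(np.array([[0.5, 0.5]]))
loss_confident = uniform_distribution_loss(np.array([[0.9, 0.1]]))
```

Minimizing this loss drives the confidence levels for non-live data toward 1/K, matching the binary case (0.5, 0.5) described below.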
In an implementation, the output distribution generated by the live body detection network processing each non-live data may include a confidence level that each non-live data belongs to each category as predicted by the live body detection network. For example, when the live body detection network is a binary classification network, classes may include a live body class and a non-live body class. In this way, when the non-live data is input to the live body detection network, the output distribution of each non-live data may be a confidence level that each non-live data belongs to the live body class as predicted by the live body detection network and a confidence level that each non-live data belongs to the non-live body class as predicted by the live body detection network. Correspondingly, the uniform distribution may be (0.5, 0.5). In this way, the live body detection network is trained based on the relative differences between the uniform distribution and each of the confidence levels that the non-live data belongs to each class, such that the confidence level that the non-live data belongs to the live body class as predicted by the live body detection network and the confidence level that the non-live data belongs to the non-live body class as predicted by the live body detection network both approach 0.5. For non-live data that the network has never seen, the live body detection network may have a relatively uniform probability output. Therefore, a threshold live body confidence level may be set to achieve a more robust live body detection in applications.
In another embodiment, the output distribution generated by the live body detection network processing each non-live data may include the confidence level that each non-live data belongs to the non-live body class as predicted by the live body detection network. For example, when the live body detection network is an a-class classification network, the a classes may include the non-live body class. In this way, when the non-live data is input to the live body detection network, the output distribution obtained for each non-live data is the confidence level that each non-live data belongs to the non-live body class as predicted by the live body detection network. Correspondingly, the uniform distribution is 1/a. The live body detection network is trained based on the relative difference between the uniform distribution and the confidence level that the non-live data belongs to the non-live body class, such that the confidence level that the non-live data belongs to the non-live body class as predicted by the live body detection network approaches 1/a. For the non-live data that the network has never seen, the live body detection network may have a relatively uniform probability output. In this way, the threshold live body confidence level may be set to achieve more robust live body detection in applications.
If a classifier of the live body detection network includes a softmax layer, the output of the softmax layer may be taken as the output distribution generated by the live body detection network processing each non-live data.
In an operation S103, an adversarial training network that includes the live body detection network is trained based on the uniform distribution loss.
After obtaining the uniform distribution loss of the live body detection network based on the operation S102, the adversarial training network that includes the live body detection network may be trained based on the uniform distribution loss.
In the present embodiment, while training the live body detection network, the relative difference between the uniform distribution and the output distribution generated by the live body detection network when processing each non-live data may be calculated to obtain the uniform distribution loss of the live body detection network. The adversarial training network that includes the live body detection network is then trained based on the uniform distribution loss. In this way, although the labeled data are limited in distribution, the non-live data of different domains share no common domain-invariant features, and the non-live data form an open set, the difference between the uniform distribution and the output distribution predicted by the live body detection network for the non-live data in the training dataset is calculated. The live body detection network therefore does not learn features of the non-live data and instead focuses on learning features of the live data, such that the live body detection network trained by the training method of the present disclosure has a low probability output for non-live data of the open-set distribution and a high probability output for live bodies in different domains. The live body detection network of the present disclosure thus has a strong generalization ability for the effectively infinite variety of non-live data, reducing the probability that non-live data are determined to be live data and improving the accuracy of live body detection.
To address the problem that the data distribution is limited, and the non-live attack data is an open set, the present disclosure provides a method for expanding the non-live data. A large number of non-live bodies are created, such that distribution of non-live bodies is more diverse thereby approaching a true open set distribution. The adversarial training network in the present disclosure includes the live body detection network and a domain  determiner. In detail, as shown in FIG. 2 and FIG. 3, FIG. 2 is a flow chart of a training method of a live body detection network according to another embodiment of the present disclosure, and FIG. 3 is a schematic view of a process of a training method of a live body detection network according to another embodiment of the present disclosure. The training method of the live body detection network of the present implementation includes the following operations. To be noted that, reference numerals of the following operations are made for simple illustration only, but not to limit an order of performing the operations. The order of performing the operations may be changed at will without departing from the concept of the present disclosure.
In an operation S201, all training data are divided into sub-training data sets in a plurality of domains.
All training data may be divided into sub-training data sets in the plurality of domains based on imaging factors (such as light intensity or a type of a camera device) , pose features, attack factors, and the like.
Preferably, all training data may be divided into sub-training data sets in the plurality of domains based on the imaging factors (such as light intensity or a type of a camera device) and the pose features, such that the sub-training data set in each of the plurality of domains may include live data and non-live data. The live data of one sub-training data set of one domain may be significantly different from that of a different sub-training data set of a different domain. In this way, while performing the adversarial training on the live body detection network, the live body detection network may learn common features of the live data with different imaging factors or pose features. Detection accuracy of the trained live body detection network for data with different imaging factors or pose features may be improved.
All training data of the present disclosure may be represented as a plurality of face images. The plurality of face images include both live face images and non-live face images (such as captured photo images of faces, 2D or 3D mask images, paper-printed face images, and the like) . In other embodiments, all training data may be represented as a plurality of animal images. In an example, the training data are face images. In the present disclosure, all training data may be divided into b sub-training data sets of b different domains based on the light intensity. Data of the sub-training data set of the i-th domain may be represented as
$$D_i=\{x^i_1,\ x^i_2,\ \ldots\},\quad i=1,2,\ldots,b,$$
wherein the b is an integer greater than 1, and the i is an integer greater than 0 and less than or equal to b.
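The division of the operation S201 can be sketched as a simple bucketing step. The function name, the per-sample representation (mean light intensity plus live/non-live label), and the bucketing rule below are illustrative assumptions only:

```python
def split_into_domains(samples, b=3, max_intensity=255.0):
    """Divide training samples into b sub-training data sets by light intensity.

    samples: iterable of (mean_intensity, label) pairs, where label 1 marks
    live data and label 0 marks non-live data.
    """
    domains = [[] for _ in range(b)]
    for intensity, label in samples:
        # Map the intensity range [0, max_intensity] onto b equal buckets.
        idx = min(int(intensity / max_intensity * b), b - 1)
        domains[idx].append((intensity, label))
    return domains

# Three samples of low, medium, and high light intensity fall into the
# three corresponding domains.
samples = [(12.0, 1), (130.0, 0), (250.0, 1)]
domains = split_into_domains(samples, b=3)
```

In practice the division may instead be driven by camera type, pose features, or attack factors, as described above; the bucketing criterion is the only part that changes.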
In an operation S202, all training data are input into the live body detection network to obtain classification results of all training data.
All training data may be input to the live body detection network (the G network in FIG. 3) to obtain a classification result of each training data. In this way, a classification loss of the live body detection network may be calculated in the operation S203 based on classification results of all training data, such that the classification loss may be taken to train the live body detection network at a later stage, enabling the live body detection network to determine more accurately whether or not each data is the live body, improving the detection accuracy of the live body detection network.
In the case of expanding the non-live data, in the operation S202, all the training data and the expanded non-live data may be input to the live body detection network to obtain classification results of all the training data and the expanded non-live data. Subsequently, the classification loss of the live body detection network is calculated based on the classification results of all the training data and the expanded non-live data. By expanding the non-live data, the distribution of the non-live data may be more diverse, and thus, approaching the true open set distribution, improving the ability of the live body detection network to distinguish between the live body and the non-live body.
The order of performing the operation S202 and the operation S201 is not limited, for example, the  operation S202 may be performed before the operation S201 or after the operation S201.
In an operation S203, the classification loss of the live body detection network is calculated based on the classification results of all training data.
The classification loss of the live body detection network may be calculated based on the classification result of each training data obtained in the operation S202.
The classification loss of the live body detection network may be calculated using a loss function, such as a binary classification cross-entropy loss function.
In an embodiment, the binary classification cross-entropy loss function may be expressed as,

$$L_{classification}=-\frac{1}{c}\sum_{j=1}^{c}\left[y_j\log p_j+(1-y_j)\log(1-p_j)\right]$$

The L classification represents the classification loss of the live body detection network, the c represents a batch size, the y j is a label of the j-th data indicating that the j-th data is the live body or the non-live body, and the p j is the confidence level that the j-th data is the live body.
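A minimal sketch of this batch-averaged binary cross-entropy, with an assumed function name and NumPy array inputs:

```python
import numpy as np

def classification_loss(y, p, eps=1e-12):
    """Binary cross-entropy over a batch of size c.

    y: labels (1 = live body, 0 = non-live body);
    p: predicted confidence that each data is the live body.
    """
    p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Two live samples and one non-live sample, with imperfect predictions.
y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.8])
loss = classification_loss(y, p)
```

The loss shrinks toward zero as the predicted confidence levels approach the labels, which is what trains the network to determine whether each data is the live body.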
In an operation S204, first feature data extracted by the live body detection network from the live data in the sub-training data sets of at least two domains is obtained.
After dividing all the training data into sub-training data sets in the plurality of domains based on the operation S201, the live data in the sub-training data sets of at least two domains may be input to the live body detection network to allow the live body detection network to perform feature extraction for each live data in the sub-training data sets of the at least two domains to obtain first feature data of each live data in the sub-training data sets of the at least two domains.
When all training data is input to the live body detection network in the operation S202, the first feature data extracted by the live body detection network from the live data in the sub-training data sets of the at least two domains is directly obtained in the operation S202, and the operation of inputting the live data in the sub-training data sets of the at least two domains into the live body detection network may not be repeated. In other embodiments, the operation of inputting the live data in the sub-training data sets of the at least two domains into the live body detection network may be repeated to obtain the first feature data extracted by the live body detection network from the live data in the sub-training data sets of the at least two domains.
In an operation S205, the live data of the sub-training data sets of the at least two domains are input to sub-classifiers of the at least two domains correspondingly to obtain second feature data of the live data in the sub-training data sets of the at least two domains.
The live data of the sub-training data sets of the at least two domains are input to the sub-classifiers of the at least two domains correspondingly to obtain the second feature data of the live data in the sub-training data sets of the at least two domains, such that subsequent computation may be performed by taking the first feature data of the live data of the sub-training data sets of the at least two domains and the second feature data of the live data of the sub-training data sets of the at least two domains to obtain a feature difference loss of the live body detection network.
As shown in FIG. 4, one sub-classifier may be set for one domain (that is, the Di network in FIG. 3, and the D1 network, the D2 network, and the Di network in FIG. 4) , that is, the sub-classifier and the sub-training sample set are in one-to-one correspondence. Each sub-classifier is configured to determine whether the class of the data in the domain corresponding to the sub-classifier is the live body class or the non-live body class.
It shall be understood that, in the present disclosure, the trained sub-classifiers are configured to extract features from each data in the sub-training data set of the domain to which the trained sub-classifier belongs.
That is, before the present operation, the sub-classifiers may be trained to obtain the trained sub-classifiers.
In detail, each data in the sub-training data set of each domain may be input to the sub-classifier of the domain, which the data belongs to, to train the corresponding sub-classifier.
To be exemplary, in the operation S201, when all training data are divided into sub-training data sets of 3 domains, a sub-classifier of a first domain may be trained by taking a sub-training data set of the first domain, a sub-classifier of a second domain may be trained by taking a sub-training data set of the second domain, and a sub-classifier of a third domain may be trained by taking a sub-training data set of the third domain.
In detail, the sub-classifier of each domain may be trained by the loss function, such as the binary classification cross-entropy loss function.
A dimension of the first feature data of the live data in the sub-training data set of the at least two domains may be equal to that of the second feature data of the live data in the sub-training data set of the at least two domains. For example, c of the first feature data and c of the second feature data may both be 100, h of the first feature data and h of the second feature data may both be 64, and w of the first feature data and w of the second feature data may both be 64. In this way, the first feature data of the live data in the sub-training data set of the at least two domains and the second feature data of the live data in the sub-training data set of the at least two domains may be calculated.
In an operation S206, the first feature data and the second feature data of the live data in the sub-training data set of the at least two domains are calculated to obtain a feature difference loss.
The first feature data of the live data in the sub-training data set of the at least two domains obtained at the operation S204 and the second feature data of the live data in the sub-training data set of the at least two domains obtained at the operation S205 may be calculated to obtain the feature difference loss of the live body detection network. In this way, the live body detection network may be trained subsequently based on the feature difference loss. That is, the live body detection network is trained by the domain adaptive training method, such that a feature output by the live body detection network approaches the intersection of features output by all the sub-classifiers. The data output by the live body detection network may confuse a domain determiner that has a domain determination capability. That is, the domain determiner may determine feature data of live data in the remaining domains extracted by the live body detection network as feature data of live data in the corresponding domains, enabling the live body detection network to learn common features of the live data of various domains. In this way, an operation of the live body detection network trained by the training method of the present disclosure detecting the live bodies may not be affected by factors, such as imaging factors, pose features, attack breakages, and the like, improving the robustness of live body detection.
In an embodiment, the difference between the first feature data of the live data in the sub-training data set of the at least two domains and the second feature data of the live data in the sub-training data set of the at least two domains may be calculated to obtain the feature difference loss of the live body detection network.
In an embodiment, a maximum mean discrepancy (MMD) between the first feature data of the live data in the sub-training data set of the at least two domains and the second feature data of the live data in the sub-training data set of the at least two domains may be calculated to obtain the feature difference loss of the live body detection network.
The feature difference loss of the live body detection network may be calculated based on the following equation.

$$MMD^2(x,y)=\frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}k(x_i,x_j)-\frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}k(x_i,y_j)+\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}k(y_i,y_j)$$

The k (x, y) represents a kernel function, which in this case may be the Laplace kernel,

$$k(x,y)=\exp\left(-\frac{\|x-y\|_1}{\sigma}\right)$$
The x represents the first feature data of the live data in the sub-training data set of the at least two domains. The y represents the second feature data of the live data in the sub-training data set of the at least two domains. The m represents the total data amount of the live data corresponding to the first feature data. The n represents the total data amount of the live data corresponding to the second feature data.
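The MMD computation above can be sketched with flattened feature vectors. The function names, the feature shape (m or n samples of dimension d), and the bandwidth sigma are illustrative assumptions:

```python
import numpy as np

def laplace_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||_1 / sigma), evaluated for all sample pairs.
    return np.exp(-np.abs(x[:, None, :] - y[None, :, :]).sum(axis=-1) / sigma)

def feature_difference_loss(x, y, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between
    first feature data x of shape (m, d) and second feature data y of (n, d)."""
    kxx = laplace_kernel(x, x, sigma).mean()   # (1/m^2) * sum of k(x_i, x_j)
    kyy = laplace_kernel(y, y, sigma).mean()   # (1/n^2) * sum of k(y_i, y_j)
    kxy = laplace_kernel(x, y, sigma).mean()   # (1/mn)  * sum of k(x_i, y_j)
    return float(kxx + kyy - 2.0 * kxy)

x = np.array([[0.0, 1.0], [1.0, 0.0]])
```

Identical feature sets give a loss of zero, while disjoint sets give a positive loss, so minimizing it pulls the first and second feature data toward a common distribution.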
Further, in the operation S204, the first feature data extracted by the live body detection network from the live data in the sub-training data sets of all domains may be obtained. In the operation S205, the live data in the sub-training data set of each domain may be input to the sub-classifier of the domain, which the sub-training data set belongs to, to obtain the second feature data of the live data in the sub-training data sets of all domains. In the operation S206, the first feature data and the second feature data of the sub-training data sets of all domains are calculated to obtain the feature difference loss.
In an operation S207, the first feature data of each data in the sub-training data set of the at least two domains is input to the domain determiner to obtain a domain determination result of each data predicted by the domain determiner.
In an operation S208, a domain determination loss is calculated based on domain determination results for all data in the sub-training data sets of the at least two domains predicted by the domain determiner.
After dividing all the training data into sub-training data sets of the plurality of domains based on the operation S201, each data in the sub-training data sets of the at least two domains may be input to the live body detection network to allow the live body detection network to extract features from each data in the sub-training data sets of the at least two domains to obtain the first feature data for each data in the sub-training data sets of the at least two domains. Further, the first feature data of each data in the sub-training data sets of the at least two domains may be input to the domain determiner to obtain the domain determination result of each data predicted by the domain determiner. Further, the domain determination loss is calculated based on the domain determination results of all data in the sub-training data sets of the at least two domains predicted by the domain determiner. The live body detection network is trained based on the domain determination loss so that the domain determiner cannot determine a true domain of each first feature data. In this way, the feature data output by the live body detection network can confuse the domain determiner that has the domain determination capability. That is, the domain determiner may determine the feature data of the live data in the remaining domains extracted by the live body detection network as the feature data of the live data in the corresponding domains. In this way, the live body detection network may learn the common features of the live data of various domains, such that the operation of the live body detection network, which is trained by the training method of the present disclosure, performing the live body detection may not be affected by the factors, such as imaging factors, pose features or attack breakages, improving the robustness of live body detection.
The domain determination results for all data in the sub-training data sets of the at least two domains predicted by the domain determiner may be calculated by taking the loss function, such as the binary classification cross-entropy loss function or the uniform distribution loss function, such that the domain determination loss of the live body detection network may be obtained.
When all the training data has been input to the live body detection network in the operation S202, the first feature data extracted by the live body detection network from the sub-training data sets of the at least two domains may be obtained directly in the operation S202, and the operation of inputting the sub-training data sets of the at least two domains to the live body detection network may not be repeated.
One domain determiner is set for each domain. Each domain determiner only needs to determine whether the input feature data belongs to the domain corresponding to the present domain determiner. Further, in the operation S207, the first feature data of all training data may be input to each domain determiner (that is, the Mi network in FIG. 3) , to obtain the domain determination result of each data of each domain predicted by each domain determiner. Further, in the operation S208, the domain determination loss of each domain determiner is calculated based on the domain determination results of all training data predicted by each domain determiner. Further, the domain determination losses of all domain determiners are added to obtain a total domain determination loss of the domain determiner.
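The summation of the per-determiner losses described above can be sketched as follows. The function name and the array layout (one row of binary predictions per domain determiner) are illustrative assumptions:

```python
import numpy as np

def total_domain_determination_loss(domain_probs, domain_labels, eps=1e-12):
    """Sum of per-domain-determiner binary cross-entropy losses.

    domain_probs: array of shape (num_domains, N) -- determiner i's predicted
    probability that each first feature data belongs to domain i.
    domain_labels: array of shape (num_domains, N) -- 1 where the data truly
    belongs to domain i, otherwise 0.
    """
    p = np.clip(domain_probs, eps, 1.0 - eps)  # guard against log(0)
    # Binary cross-entropy of each determiner over its N inputs.
    per_determiner = -np.mean(
        domain_labels * np.log(p) + (1.0 - domain_labels) * np.log(1.0 - p),
        axis=1)
    # Total domain determination loss is the sum over all determiners.
    return float(per_determiner.sum())

# Two domains, two samples: one sample per domain.
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
```

Perfectly accurate determiners incur near-zero loss, while determiners confused to 0.5 incur a large loss; in the adversarial setup, the live body detection network is trained so that the determiners end up in the confused state.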
Before the operation S207, training of the domain determiner may be completed, such that the trained domain determiner may be taken to perform domain determination on the first feature data in the operation S207.
The domain determiner may be trained by taking the feature data output by the sub-classifier described in the operation S204.
In detail, as shown in FIG. 4, when one domain determiner is set for each domain, a sub-classifier may be set corresponding to each domain. That is, the sub-classifier and the domain determiner are in one-to-one correspondence. Each sub-classifier is configured to determine whether the class of the data of the corresponding domain is the live body class or the non-live body class. In this way, before the operation S205, the sub-training data set of each domain may be input to the sub-classifier of each domain in order to obtain the second feature data of each data in the sub-training data set of each domain. Further, the domain determiner of each domain is trained by taking the second feature data of each data in the sub-training data set of each domain and the second feature data of each data in the sub-training data set of at least one of the remaining domains. While training the domain determiner of each domain, the second feature data extracted by the sub-classifier in the domain, which the determiner belongs to, may be taken as a positive sample, and the second feature data extracted by the sub-classifier in the domain, which the determiner does not belong to, may be taken as a negative sample.
In other embodiments, the first feature data and the second feature data of the sub-training data set of the domain, which the determiner belongs to, may be taken as the positive sample, the first feature data and the second feature data in the sub-training data sets of at least one of the remaining domains may be taken as the negative sample, and the domain determiner of each domain may be trained by taking the positive sample and the negative sample. In this way, the domain determiner may be trained by taking the second feature data extracted by the live body detection network to increase the number of training samples for the domain determiner. In addition, while training the domain determiner, model parameters of the live body detection network may be changed to allow the second feature data output by the live body detection network to be more abundant.
In an operation S209, the output distribution generated by the live body detection network processing each non-live data is obtained.
In some embodiments, the non-live data may be expanded non-live data obtained by expanding at least a portion of all training data (which may include both original live data and original non-live data). That is, the expanded non-live data may be obtained by expanding at least a portion of the original live data and/or at least a portion of the original non-live data from all the training data. In this way, the non-live data is significantly expanded, allowing the distribution of the non-live data to be more diverse, thus approaching the true open set distribution.
According to the present disclosure, at least a portion of the training data as shown in FIG. 5a may be processed based on an attack-breakage image to obtain the expanded non-live data as shown in FIG. 5b. The attack-breakage image is obtained by retaining an attack-breakage region in the image and setting the pixels in the non-attack-breakage region to 0. The attack breakage may include abnormal light spots, paper edges, paper cavities, paper creases, mask reflections and so on.
Further, the attack-breakage image and each of the at least a portion of the training data may be added to obtain a superimposed image. An average pixel value of all pixels of the attack-breakage image may be subtracted from a pixel value of each pixel of the superimposed image to obtain the expanded non-live data. The above method solves the problem that, when the original image and the attack-breakage image are directly added, pixel values of a plurality of pixels corresponding to the attack-breakage region in the expanded image may be greater than 255. In this way, the expanded non-live data may clearly exhibit the attack breakage, allowing the expanded non-live data to be more representative.
In addition, after subtracting the average pixel value of the attack-breakage image from each pixel value of the superimposed image to obtain an intermediate image, Gaussian noise may be randomly added to the intermediate image to obtain the expanded non-live data.
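The expansion pipeline described in this and the preceding paragraph (superimposing the attack-breakage image, subtracting its average pixel value, and adding random Gaussian noise) may be sketched as follows. This is a hedged NumPy illustration; the function name and the noise standard deviation are assumptions, not values from the disclosure.

```python
import numpy as np

def expand_non_live(image, breakage, noise_std=2.0, rng=None):
    """Expand one training image into a non-live sample using an
    attack-breakage image whose non-breakage pixels are already 0."""
    if rng is None:
        rng = np.random.default_rng()
    # Superimpose the attack breakage onto the original image.
    superimposed = image.astype(np.float64) + breakage.astype(np.float64)
    # Subtract the average pixel value of the breakage image so the result
    # stays closer to the displayable range while the breakage remains visible.
    intermediate = superimposed - breakage.mean()
    # Randomly add Gaussian noise to obtain the expanded non-live data.
    noisy = intermediate + rng.normal(0.0, noise_std, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Clipping to [0, 255] is an additional safeguard assumed here for pixels that still overflow after the mean subtraction.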
When all the training data and the expanded non-live data have already been input to the live body detection network in the operation S202, the output distribution generated by the live body detection network processing each expanded non-live data in the operation S202 may be obtained directly. The non-live data may not be repeatedly input to the live body detection network. Of course, in other embodiments, the non-live data may be input into the live body detection network repeatedly to obtain the output distribution generated by the live body detection network processing each expanded non-live data.
In an operation S210, the relative difference between the output distribution of the non-live data and the uniform distribution is calculated to obtain the uniform distribution loss.
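For a softmax output over classes, the relative difference in the operation S210 may be implemented, for example, as the Kullback-Leibler divergence between the predicted distribution and the uniform distribution. A minimal NumPy sketch (the function name is illustrative; the disclosure does not fix a specific divergence):

```python
import numpy as np

def uniform_distribution_loss(probs):
    """KL divergence between each predicted class distribution and the
    uniform distribution, averaged over the batch.

    probs: array of shape (batch, num_classes), with rows summing to 1.
    Uses the identity KL(p || uniform) = log(k) - H(p).
    """
    k = probs.shape[-1]
    entropy = -np.sum(probs * np.log(np.clip(probs, 1e-12, None)), axis=-1)
    return float(np.mean(np.log(k) - entropy))
```

The loss is 0 when the network's output on non-live data is exactly uniform, and grows as the output becomes more peaked, which is the behavior penalized during training.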
In an operation S211, the uniform distribution loss, the feature difference loss, the domain difference loss and the classification loss are added by weight to obtain a total loss.
In some embodiments, as shown in FIG. 6, after obtaining the uniform distribution loss, the feature difference loss, the domain difference loss and the classification loss based on the above operations, these four losses may be added by weight to obtain the total loss, such that adversarial training may be performed on the live body detection network and the domain determiner by taking the total loss. The live body detection network is trained to minimize the total loss, maximize an absolute value of the domain determination loss, minimize the uniform distribution loss, minimize the feature difference loss, and minimize the classification loss, thereby achieving the objectives shown in FIG. 7: minimizing the classification loss of the live body detection network and maximizing a domain confusion degree.
Weighting factors for the uniform distribution loss, the domain difference loss, the feature difference loss and the classification loss are not limited by the present disclosure, and may be determined based on actual situations. For example, when the domain determination loss is negative, a weighting factor for each of the four types of losses may be 1. That is, the uniform distribution loss, the domain difference loss, the feature difference loss and the classification loss can be added directly to obtain the total loss.
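With unit weighting factors, the weighted addition in the operation S211 reduces to a direct sum, as the paragraph above notes. A minimal sketch (the function name and the tuple layout are illustrative):

```python
def total_loss(uniform_loss, feature_loss, domain_loss, cls_loss,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss terms from the operation S211.

    With unit weights this is direct addition; note that the domain
    determination loss may be negative during adversarial training.
    """
    w_u, w_f, w_d, w_c = weights
    return (w_u * uniform_loss + w_f * feature_loss
            + w_d * domain_loss + w_c * cls_loss)
```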
It shall be understood that the order in which the uniform distribution loss, the domain difference loss, the feature difference loss and the classification loss are calculated is not limited by the present disclosure. The calculation may follow the order shown in the present disclosure, or may be performed in the order of the domain difference loss, the uniform distribution loss, the feature difference loss and the classification loss.
In an operation S212, adversarial training is performed on the live body detection network and the domain determiner by taking the total loss.
According to the present embodiment, the concept of abnormality detection is applied to optimize the domain adaptive algorithm. To address the problem that the data distribution is limited while the non-live attack data forms an open set, a method of expanding the non-live data is provided to create a large number of non-live samples, such that the distribution of the non-live bodies is more diverse and thus approaches the true open set distribution. To address the problem that the non-live data does not have domain invariant features, the concept of abnormality detection is applied: in the training process, domain invariant features for the live bodies in various domains are calculated, and the difference between the predicted distribution of the live body detection network for the expanded non-live body features and the uniform distribution is calculated. In this way, the trained model has a high probability output for the live bodies in various domains and a low probability output for the non-live data with the open set distribution, such that the live body detection is optimized.
In addition, the operations S202-S212 may be repeated until the number of repetitions reaches a preset number or the total loss is less than a threshold. When the number of repetitions reaches the preset number, or when the total loss is less than the threshold, the training is terminated.
Before the operation S202, the preset number of repetitions may be set in advance, and the number of repetitions may be set to be 0. When one process of the operations S202-S212 is completed, the number of repetitions is added by one. When the number of repetitions is less than the preset number, the operation S202 is performed to train the live body detection network again, until the number of repetitions is greater than or equal to the preset number.
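The repetition logic in the two paragraphs above may be sketched as follows; `train_step` stands for one full pass of the operations S202-S212 and is an assumed placeholder, not a function named in the disclosure.

```python
def train_until_done(train_step, preset_number, loss_threshold):
    """Repeat the training pass until the number of repetitions reaches
    `preset_number` or the total loss drops below `loss_threshold`."""
    repetitions = 0
    total = float("inf")
    while repetitions < preset_number:
        total = train_step()  # one pass of S202-S212, returns the total loss
        repetitions += 1
        if total < loss_threshold:
            break  # early termination on a sufficiently small total loss
    return repetitions, total
```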
After training the live body detection network based on the training method of the live body detection network in the above-described implementation, the trained live body detection network can be applied to perform live body detection. The live body detection method of the present disclosure includes: performing live body detection on an object to be detected through the live body detection network trained by the training method described above to determine whether the object to be detected is a live object or a non-live object.
In detail, as shown in FIG. 8, the live body detection method in the embodiment of the present disclosure may include the following operations.
In an operation 301, the object to be detected is input.
In an operation 302, the object to be detected is fed into the trained live body detection network to obtain a live body confidence level.
In an operation 303, it is determined whether the live body confidence level of the object to be detected is greater than a confidence threshold.
The confidence threshold may be set according to actual situations, for example, the confidence threshold may be 0.7, 0.8, 0.9, or the like.
In an operation 304, the object to be detected is a live body.
When the confidence level of the object to be detected being the live body is greater than the confidence threshold, an output of the operation 304 is the live body.
In an operation 305, the object to be detected is a non-live body.
When the confidence level of the object to be detected being the live body is not greater than the confidence threshold, an output of the operation 305 is a non-live body.
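The thresholding in the operations 303-305 may be sketched as follows (the function name and the example threshold are illustrative; the disclosure only requires a strict comparison against the confidence threshold):

```python
def classify_live_body(confidence, threshold=0.8):
    """Operations 303-305: compare the live body confidence level against
    the confidence threshold; strictly greater means live."""
    return "live" if confidence > threshold else "non-live"
```

Note that a confidence level exactly equal to the threshold is classified as non-live, since the disclosure requires the confidence to be greater than the threshold.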
The live body detection method of the present embodiment is not affected by scene features such as lighting, pose and the like. The live body detection method distinguishes live bodies based on distinguishing features between the live bodies and the non-live bodies, and can distinguish whether a face is a real face of a live human or a faked face. The live body detection method has high robustness for live face detection.
As shown in FIG. 9, FIG. 9 is a structural schematic view of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 of the present disclosure includes a processor 22 configured to execute instructions to implement the method provided by any of the above embodiments of the present disclosure and any non-conflicting combination thereof.
The electronic device 20 may be a terminal such as a mobile phone, a laptop computer, and the like, or may be a server.
The processor 22 may be called a CPU (Central Processing Unit). The processor 22 may be an integrated circuit chip having a signal processing capability. The processor 22 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general purpose processor may be a microprocessor, or the processor 22 may also be any conventional processor or the like.
The electronic device 20 may further include a memory 21 for storing instructions and data required for the processor 22 to operate.
As shown in FIG. 10, FIG. 10 is a structural schematic view of a computer-readable storage medium according to an embodiment of the present disclosure. The computer-readable storage medium 30 of the present embodiment stores instructions/program data 31 which, when being executed, implement the method provided in any of the above embodiments of the present disclosure and any non-conflicting combination of the methods. The instructions/program data 31 may form a program file, which is stored in the above-mentioned storage medium 30 in a form of a software product, enabling a computer device (a personal computer, a server, a network device, or the like) or a processor to perform all or some of the operations of the method of each embodiment of the present disclosure. The above-mentioned storage medium 30 includes: a universal serial bus (USB) drive, a portable hard drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk and other media that can store program codes, or devices such as a computer, a server, a mobile phone, a tablet, and the like.
According to various embodiments of the present disclosure, it shall be understood that, the disclosed system, the device and the method can be implemented by other means. For example, the embodiments of apparatuses described above are merely exemplary. For example, units are divided based on logical functions only. Practically, the units can be divided in other ways, for example, multiple units or components can be combined or can be integrated into another system, or some features can be omitted or not implemented. In addition, the mutual coupling or direct coupling or communicative connection shown or discussed may be indirect coupling or communicative connection between devices or units via some interfaces, which may be electrical, mechanical or in other forms.
In addition, various functional units in the various embodiments of the present disclosure can be integrated in one single processing unit, or may be physically separated. Alternatively, two or more units can be integrated in one single unit. The above integrated units can be implemented either in a form of hardware or in a form of software functional units.
It shall be noted that, terms "includes", "comprises" or any variant thereof are intended to cover non-exclusive inclusion, such that a process, a method, a commodity or a device that includes a series of elements includes not only the listed elements but also other elements not expressly listed, or further includes elements that are inherent to the process, the method, the commodity or the device. Without further limitation, an element defined by a statement of "including a ......" does not preclude presence of additional identical elements in the process, in the method, in the commodity or in the device.
The above shows only embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. Any equivalent structure or equivalent process transformation performed based on the specification and the accompanying drawings of the present disclosure, directly or indirectly applied in other related fields, shall be equally covered by the scope of the present disclosure. 

Claims (15)

  1. A method for training a live body detection network, comprising:
    inputting at least one non-live data to the live body detection network to obtain an output distribution generated by the live body detection network processing each of the at least one non-live data;
    calculating a relative difference between the output distribution of the non-live data and a uniform distribution to obtain a uniform distribution loss; and
    performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss.
  2. The method according to claim 1, wherein the at least one non-live data includes expanded non-live data obtained by expanding based on at least a portion of training data.
  3. The method according to claim 2, wherein before the inputting at least one non-live data to the live body detection network, the method further comprises:
    processing the at least a portion of training data based on an attack-breakage image to obtain the expanded non-live data;
    wherein the attack-breakage image is obtained by retaining an attack-breakage region of an image and setting each pixel in a non-attack-breakage region to be 0.
  4. The method according to claim 3, wherein the processing the at least a portion of training data based on an attack-breakage image, comprises:
    adding the attack-breakage image and the training data to obtain a superimposed image; and
    subtracting an average pixel value of the attack-breakage image from a pixel value of each pixel in the superimposed image to obtain the expanded non-live data.
  5. The method according to claim 4, wherein the subtracting an average pixel value of the attack-breakage image from a pixel value of each pixel in the superimposed image to obtain the expanded non-live data, comprises:
    subtracting an average pixel value of the attack-breakage image from a pixel value of each pixel in the superimposed image to obtain an intermediate image; and
    adding Gaussian noise to the intermediate image to obtain the expanded non-live data.
  6. The method according to claim 1, wherein one sub-classifier is configured for one domain, and before the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, the method further comprises:
    dividing all the training data into a plurality of sub-training data sets of a plurality of domains;
    inputting live data in sub-training data sets of at least two of the plurality of domains to the live body detection network to obtain first feature data of the live data of the sub-training data sets of the at least two of the plurality of domains;
    inputting the live data of the sub-training data sets of the at least two of the plurality of domains to sub-classifiers of the at least two of the plurality of domains, which the live data belong to, to obtain second feature data of the live data of the sub-training data sets of the at least two of the plurality of domains;
    calculating the first feature data and the second feature data of the live data of the sub-training data sets for the at least two of the plurality of domains to obtain a feature difference loss;
    wherein the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, comprises:
    performing an adversarial training on the live body detection network and a domain determiner based on the uniform distribution loss and the feature difference loss.
  7. The method according to claim 6, wherein the calculating the first feature data and the second feature data of the live data of the sub-training data sets for the at least two of the plurality of domains to obtain a feature difference loss, comprises:
    calculating a maximum mean difference between the first feature data and the second feature data of the sub-training data sets of the at least two of the plurality of domains to obtain the feature difference loss of the live body detection network.
  8. The method according to claim 6, wherein the adversarial training network comprises the live body detection network and the domain determiner, and before the performing an adversarial training on the live body detection network and a domain determiner based on the uniform distribution loss and the feature difference loss, the method further comprises:
    inputting the first feature data for each data in the sub-training data sets of the at least two of the plurality of domains to the domain determiner to obtain a domain determination result for each data predicted by the domain determiner; and
    calculating a domain determination loss based on domain determination results for all data in the sub-training data sets of the at least two of the plurality of domains predicted by the domain determiner;
    wherein the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, comprises:
    performing the adversarial training on the live body detection network and the domain determiner based on the uniform distribution loss, the feature difference loss and the domain determination loss.
  9. The method according to claim 8, wherein before the performing an adversarial training on an adversarial training network that includes the live body detection network based on the uniform distribution loss, the method further comprises:
    obtaining classification results of all the training data predicted by the live body detection network; and
    calculating a classification loss of the live body detection network based on the classification results of all the training data; and
    wherein the performing the adversarial training on the live body detection network and the domain determiner based on the uniform distribution loss, the feature difference loss and the domain determination loss, comprises:
    adding the uniform distribution loss, the feature difference loss, the domain determination loss and the classification loss by weight to obtain a total loss; and
    performing the adversarial training on the live body detection network and the domain determiner by taking the total loss.
  10. The method according to claim 8, wherein one domain determiner is configured for one domain, and the inputting the first feature data for each data in the sub-training data sets of the at least two of the plurality of domains to the domain determiner, comprises:
    inputting first feature data of all the training data to each domain determiner to obtain a domain determination result for each training data predicted by each domain determiner; and
    wherein calculating a domain determination loss based on the domain determination result for each training data predicted by each domain determiner, comprises:
    calculating a domain determination loss for each domain determiner based on the domain determination results for all the training data predicted by each domain determiner; and
    adding domain determination losses for all domain determiners to obtain a total domain determination loss of the domain determiner.
  11. The method according to claim 10, wherein before the inputting first feature data of all the training data to each domain determiner, the method further comprises:
    training the domain determiner of each domain by taking the second feature data of the sub-training data set for each domain and the second feature data of a sub-training data set of at least one of the remaining domains.
  12. The method according to claim 11, wherein before the inputting the live data of the sub-training data sets of the at least two of the plurality of domains to sub-classifiers of the at least two of the plurality of domains, which the live data belong to, to obtain second feature data of the live data of the sub-training data sets of the at least two of the plurality of domains, the method further comprises:
    training the sub-classifier of each domain through a binary classification cross-entropy loss function by taking the sub-training data set for each domain.
  13. A live body detection method, comprising:
    performing live body detection on an object to be detected by applying the live body detection network trained by the method of any one of claims 1-12 to determine whether the object to be detected is a live body or a non-live body.
  14. An electronic device, comprising: a processor; wherein the processor is configured to execute instructions to implement operations of the method of any one of claims 1-13.
  15. A computer-readable storage medium, which stores a program and/or instructions, wherein the program and/or the instructions, when being executed, implement the operations of the method of any of claims 1-13.
PCT/CN2022/110368 2021-08-06 2022-08-04 Training method of live body detection network, method and apparatus of live body detectoin WO2023011606A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110904298.9 2021-08-06
CN202110904298.9A CN113723215B (en) 2021-08-06 2021-08-06 Training method of living body detection network, living body detection method and device

Publications (1)

Publication Number Publication Date
WO2023011606A1 true WO2023011606A1 (en) 2023-02-09

Family

ID=78675073

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/110368 WO2023011606A1 (en) 2021-08-06 2022-08-04 Training method of live body detection network, method and apparatus of live body detectoin

Country Status (2)

Country Link
CN (1) CN113723215B (en)
WO (1) WO2023011606A1 (en)



Also Published As

Publication number Publication date
CN113723215A (en) 2021-11-30
CN113723215B (en) 2023-01-17


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE