CN113591736A - Feature extraction network, training method of living body detection model and living body detection method

Info

Publication number: CN113591736A
Application number: CN202110888934.3A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 邹棹帆
Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Prior art keywords: feature, domain, training, probability, network

Classifications

    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 Pattern recognition: classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/29 Pattern recognition: graphical models, e.g. Bayesian networks


Abstract

The present disclosure provides a training method for a feature extraction network, a training method for a living body detection model, and a living body detection method, relating to the field of artificial intelligence, in particular to computer vision and deep learning, and applicable to scenes such as face recognition and living body detection. The training method of the feature extraction network comprises the following steps: inputting a sample into a first extraction sub-network of the feature extraction network to obtain a feature map, the sample having a first label indicating the actual probability that the sample comes from each of a plurality of data domains; inputting the feature map into a second extraction sub-network of the feature extraction network to obtain a domain invariant feature; based on the feature map and the domain invariant feature respectively, employing a domain classifier to obtain a first probability and a second probability that the sample comes from the plurality of data domains; training the first extraction sub-network based on the first probability, the actual probability, and a first weight for the actual probability; and training the second extraction sub-network based on the second probability, the actual probability, and a second weight for the actual probability, the first weight being greater than the second weight.

Description

Feature extraction network, training method of living body detection model and living body detection method
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and can be applied to scenes such as face recognition and living body detection.
Background
With the development of computer technology and network technology, deep learning has been widely applied in many fields. At present, a deep learning model can achieve good test performance on the data domain it was trained on, but this performance cannot be guaranteed on other data domains. That is, the generalization ability of deep learning models is generally low.
Disclosure of Invention
Based on this, the present disclosure provides a training method of a feature extraction network, a training method of a live body detection model, and a live body detection method, apparatus, device, and storage medium that improve generalization ability.
According to an aspect of the present disclosure, there is provided a training method of a feature extraction network, wherein the feature extraction network includes a first extraction sub-network and a second extraction sub-network, the training method including: inputting a first sample image into the first extraction sub-network to obtain a feature map of the first sample image, wherein the first sample image has a first label indicating an actual probability that the first sample image is from each of a plurality of data domains; inputting the feature map into the second extraction sub-network to obtain a first domain invariant feature of the first sample image; obtaining, based on the feature map and the first domain invariant feature respectively, a first probability and a second probability that the first sample image is from the plurality of data domains by employing a predetermined domain classifier; training the first extraction sub-network based on the first probability, the actual probability, and a first predetermined weight for the actual probability; and training the second extraction sub-network based on the second probability, the actual probability, and a second predetermined weight for the actual probability, wherein the first predetermined weight is greater than the second predetermined weight.
According to another aspect of the present disclosure, there is provided a training method of a living body detection model, wherein the living body detection model includes a feature extraction network and a classification network. The method includes: inputting a second sample image into the feature extraction network to obtain a second domain invariant feature of the second sample image, wherein the second sample image includes a target object and has a second label indicating an actual category of the target object; inputting the second domain invariant feature into the classification network to obtain a predicted category of the target object in the second sample image; and training the living body detection model based on the actual category and the predicted category, wherein the feature extraction network is trained with the training method of the feature extraction network described above, and the actual category is either a real category or a non-real category.
According to another aspect of the present disclosure, there is provided a living body detection method, including: inputting an image to be detected into a feature extraction network included in a living body detection model to obtain a third domain invariant feature of the image to be detected; and inputting the third domain invariant feature into a classification network included in the living body detection model to obtain a category of the target object in the image to be detected, wherein the living body detection model is trained with the training method of the living body detection model described above.
According to another aspect of the present disclosure, there is provided a training apparatus of a feature extraction network, wherein the feature extraction network includes a first extraction sub-network and a second extraction sub-network, the training apparatus including: a first feature extraction module for inputting a first sample image into the first extraction sub-network to obtain a feature map of the first sample image, wherein the first sample image has a first label indicating an actual probability that the first sample image is from a plurality of data domains; a second feature extraction module for inputting the feature map into the second extraction sub-network to obtain a first domain invariant feature of the first sample image; a probability obtaining module for obtaining, based on the feature map and the first domain invariant feature respectively, a first probability and a second probability that the first sample image is from the plurality of data domains by employing a predetermined domain classifier; a first training module for training the first extraction sub-network based on the first probability, the actual probability, and a first predetermined weight for the actual probability; and a second training module for training the second extraction sub-network based on the second probability, the actual probability, and a second predetermined weight for the actual probability, wherein the first predetermined weight is greater than the second predetermined weight.
According to another aspect of the present disclosure, there is provided a training apparatus for a living body detection model, wherein the living body detection model includes a feature extraction network and a classification network, the apparatus including: a third feature extraction module for inputting a second sample image into the feature extraction network to obtain a second domain invariant feature of the second sample image, wherein the second sample image includes a target object and has a second label indicating an actual category of the target object; a first category obtaining module for inputting the second domain invariant feature into the classification network to obtain a predicted category of the target object in the second sample image; and a seventh training module for training the living body detection model based on the actual category and the predicted category, wherein the feature extraction network is trained with the training apparatus of the feature extraction network described above, and the actual category is either a real category or a non-real category.
According to another aspect of the present disclosure, there is provided a living body detection apparatus including: a fourth feature extraction module for inputting an image to be detected into a feature extraction network included in the living body detection model to obtain a third domain invariant feature of the image to be detected; and a second category obtaining module for inputting the third domain invariant feature into a classification network included in the living body detection model to obtain a category of the target object in the image to be detected, wherein the living body detection model is trained with the training apparatus of the living body detection model described above.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the following methods provided by the present disclosure: a training method of a feature extraction network, a training method of a living body detection model and a living body detection method.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform at least one of the following methods provided by the present disclosure: a training method of a feature extraction network, a training method of a living body detection model and a living body detection method.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements at least one of the following methods provided by the present disclosure: a training method of a feature extraction network, a training method of a living body detection model and a living body detection method.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an application scenario of the training methods of a feature extraction network and a living body detection model, the living body detection method, and the corresponding apparatuses according to an embodiment of the disclosure;
FIG. 2 is a flow diagram of a method of training a feature extraction network according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a feature extraction network according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a training feature extraction network according to an embodiment of the present disclosure;
FIG. 5 is a flow chart diagram of a method of training a liveness detection model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a principle of training a liveness detection model according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart diagram of a liveness detection method according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a training apparatus for a feature extraction network according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of a configuration of a training apparatus for a liveness detection model according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of the configuration of a living body detecting apparatus according to an embodiment of the present disclosure; and
FIG. 11 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Deep learning techniques have achieved great success in living body detection. Although a deep learning model can achieve good results on a certain data domain, the detection effect is often poor when living body detection is performed on images from other data domains, which is caused by the poor generalization ability of the model. In implementing the disclosed concept, the inventors discovered that the poor generalization ability arises because the intrinsic distribution relationships of data from different data domains are not considered, so the features learned by the model are biased.
To improve the generalization of the model, one may adopt expanded data collection, Domain Adaptation techniques, or domain augmentation of the source-domain data with a generative adversarial network (GAN).
Specifically, expanded data collection uses images acquired by more capture devices in more scenes to train the model, thereby enlarging the coverage of the source domain. However, the cycle from data collection through data reflow and processing is long, and data diversity is hard to guarantee during collection. The process not only consumes labor and material resources, but also rarely yields data that meets expectations.
Domain adaptation techniques can reduce the difference between the source domain and the target domain by mapping both into the same feature space. However, with domain adaptation, an additional iteration is required every time a new domain flows back so that the model adapts to it, which incurs high labor cost.
Domain augmentation with a generative adversarial network can generate features of different domains to enhance the model's generalization to unknown domains. This approach expands the variety of generated data domains through noise perturbation, requires no data reflow or repeated collection, and raises the probability of covering unknown data domains through data expansion. However, there remains a high probability of encountering unknown domains to which the model does not generalize.
To overcome the problems of the foregoing methods, the present disclosure starts from the perspective of Domain Generalization and trains a feature extraction network on existing source data, so that the extracted features align the data distributions of different domains, features tied to specific data domains are eliminated from the extracted features, and a target object (e.g., a human face) from an unknown data domain is mapped near a shared feature space. A living body detection model equipped with this feature extraction network can thus generalize well to new data domains. Specifically, the present disclosure provides a training method for a feature extraction network that includes a feature extraction stage, a probability determination stage, and a model training stage. In the feature extraction stage, a first sample image is input into a first extraction sub-network of the feature extraction network to obtain a feature map of the first sample image, and the feature map is input into a second extraction sub-network of the feature extraction network to obtain a first domain invariant feature of the first sample image. The first sample image has a first label indicating the actual probability that it comes from each of a plurality of data domains. In the probability determination stage, a predetermined domain classifier is employed to obtain, based on the feature map and the first domain invariant feature respectively, a first probability and a second probability that the first sample image comes from the plurality of data domains. In the model training stage, the first extraction sub-network is trained based on the first probability, the actual probability, and a first predetermined weight for the actual probability, and the second extraction sub-network is trained based on the second probability, the actual probability, and a second predetermined weight for the actual probability, wherein the first predetermined weight is greater than the second predetermined weight.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of the training methods of a feature extraction network and a living body detection model, the living body detection method, and the corresponding apparatuses according to an embodiment of the disclosure.
As shown in fig. 1, the application scenario 100 of the embodiment may include a resident 110, a building 120, an image capture device 130, and an electronic device 140. The image capturing apparatus 130 may be communicatively connected to the electronic device 140 via a network or a near field communication protocol.
The entrance door of the building 120 may be, for example, a smart door, and the electronic device 140 may control its opening and closing. The image capturing device 130 may capture a facial image of the resident 110 walking toward the entrance and transmit it to the electronic device 140, which performs living body detection and face recognition on the image. When the face in the image is determined to be the real face of a predetermined user, an opening instruction can be sent to the entrance door, so that the door opens intelligently.
In one embodiment, the electronic device 140 may be, for example, a terminal device having a display screen and disposed in the building 120, or may be a remote control server.
In one embodiment, the electronic device 140 may use a living body detection model to detect whether a face in the facial image is a real face. The living body detection model may be trained by a server communicatively coupled to the electronic device 140, or may be trained in advance by the electronic device 140 itself.
It should be noted that the training method of the feature extraction network and the training method of the living body detection model provided by the present disclosure may be generally executed by the electronic device 140 or executed by a server communicatively connected to the electronic device 140. Accordingly, the training apparatus of the feature extraction network and the training apparatus of the living body detection model provided by the present disclosure may be provided in the electronic device 140 or in a server communicatively connected to the electronic device 140. The living body detection method provided by the present disclosure may be performed by the electronic device 140. Accordingly, the living body detecting apparatus provided by the present disclosure may be provided in the electronic device 140.
It should be understood that the number and type of image capturing devices, buildings, and electronic equipment, and the number of residents in fig. 1 are merely illustrative. There may be any number and type of image capture devices, buildings and electronic equipment, and any number of residents, as desired for the implementation.
It is to be understood that the application scenario in fig. 1 is only an example to facilitate understanding of the present disclosure, and the method of the present disclosure may also be applied to various scenarios such as object classification, object detection, object recognition, object segmentation, and object prediction.
The training method of the feature extraction network provided by the present disclosure will be described in detail below with reference to figs. 2 to 4, in conjunction with the application scenario of fig. 1.
Fig. 2 is a flow chart diagram of a training method of a feature extraction network according to an embodiment of the present disclosure.
As shown in fig. 2, the training method 200 of the feature extraction network of this embodiment may include operations S210 to S250.
In operation S210, a first sample image is input into a first extraction subnetwork in a feature extraction network, and a feature map of the first sample image is obtained.
According to the embodiment of the present disclosure, the first extraction sub-network may adopt, for example, the feature extraction architecture of a U-Net network, or any encoder structure, as long as it can extract image features; the present disclosure does not limit this. In this operation, a first sample image is input into the first extraction sub-network, and the features output by the first extraction sub-network are taken as the feature map.
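For illustration, a minimal PyTorch sketch of such a first extraction sub-network is given below; the two-block CNN encoder is an assumed stand-in, since the disclosure allows any encoder (for example, the U-Net encoder path).

```python
import torch
import torch.nn as nn

class FirstExtractionSubnetwork(nn.Module):
    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        # Two downsampling conv blocks standing in for an encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)  # feature map of the first sample image

feature_map = FirstExtractionSubnetwork()(torch.randn(4, 3, 224, 224))
print(feature_map.shape)  # torch.Size([4, 64, 56, 56])
```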
In operation S220, the feature map is input into a second extraction subnetwork in the feature extraction network, resulting in a first domain-invariant feature of the first sample image.
According to an embodiment of the present disclosure, the second extraction sub-network may employ, for example, a decoder structure for screening the Domain-Invariant Representation (DIR) from the feature map. A domain invariant feature is a feature that is independent of the data domain to which the sample image belongs; that is, it does not change with that domain. Taking a living body detection scene as an example, the domain invariant feature may be a feature of the human face itself.
In an embodiment, a decoupler may be disposed in the second extraction sub-network, for example, to decouple the domain-invariant features in the feature map from the domain-specific features and thereby facilitate extracting the domain-invariant features. Here, the Domain-Specific Representation (DSR) refers to features specific to the data domain to which the sample image belongs; such features change with that domain. For example, in a living body detection scene, a domain-specific feature may describe the environment surrounding the target object, such as the weather.
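The following sketch assumes one simple realization of the decoupler: after pooling the feature map, two parallel heads separate the DIR branch from the DSR branch. The concrete decoupler structure is not fixed by the disclosure.

```python
import torch
import torch.nn as nn

class SecondExtractionSubnetwork(nn.Module):
    def __init__(self, in_channels: int = 64, feat_dim: int = 128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.dir_head = nn.Linear(in_channels, feat_dim)  # domain-invariant (DIR)
        self.dsr_head = nn.Linear(in_channels, feat_dim)  # domain-specific (DSR)

    def forward(self, feature_map: torch.Tensor):
        pooled = self.pool(feature_map).flatten(1)
        # The two heads act as a simple decoupler: one branch keeps the
        # domain-invariant representation, the other the domain-specific one.
        return self.dir_head(pooled), self.dsr_head(pooled)

dir_feat, dsr_feat = SecondExtractionSubnetwork()(torch.randn(4, 64, 56, 56))
```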
In operation S230, a predetermined domain classifier is used to obtain a first probability and a second probability that the first sample image is from the plurality of data domains based on the feature map and the first domain invariant features, respectively.
The feature map may be input into a predetermined Domain Classifier, which processes it and outputs the probability that the first sample image belongs to each of the plurality of data domains, yielding a first probability vector. The plurality of data domains may be, for example, the collection of data domains to which the sample images belong. Similarly, the first domain invariant feature is input into the predetermined domain classifier, which processes it and outputs the probability that the first sample image belongs to each of the plurality of data domains, yielding a second probability vector. The predetermined domain classifier may be any domain classifier in the related art, which the present disclosure does not limit.
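A minimal sketch of a domain classifier follows; the single linear layer with softmax is an assumption, as the disclosure permits any domain classifier in the related art.

```python
import torch
import torch.nn as nn

class DomainClassifier(nn.Module):
    """Maps a feature vector to a probability over the data domains."""

    def __init__(self, feat_dim: int, num_domains: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_domains)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Each row is a probability vector: one probability per data domain.
        return torch.softmax(self.fc(features), dim=1)

clf = DomainClassifier(feat_dim=128, num_domains=3)
first_probability = clf(torch.randn(4, 128))   # e.g. from the pooled feature map
second_probability = clf(torch.randn(4, 128))  # e.g. from the domain invariant feature
```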
In operation S240, a first extraction subnetwork is trained based on the first probability, the actual probability, and a first predetermined weight for the actual probability.
According to an embodiment of the present disclosure, the first sample image may have a first label indicating the actual probability that the first sample image comes from each of the plurality of data domains. For example, the label may include information such as the model of the image capturing device that captured the first sample image. The actual probabilities may then be set as follows: the actual probability that the first sample image comes from its own data domain is 1, and the actual probability that it comes from any other data domain is 0.
This embodiment may determine the value of a first loss function based on the first probability, the actual probability, and the first predetermined weight. In one embodiment, the first loss function may be a cross entropy loss function, a norm loss function, a mean square error loss function, or the like. In the first loss function, the term characterizing the target value is the product of the first predetermined weight and the actual probability. The first extraction sub-network is then trained based on the value of the first loss function, using a gradient descent algorithm, a back propagation algorithm, or the like. The first predetermined weight may be 1, or any value smaller than but close to 1, which the present disclosure does not limit.
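One plausible reading of this construction is a soft-label cross entropy whose target is the actual probability scaled by the predetermined weight, as sketched below; the weight values 1.0 and 0.3 are illustrative only.

```python
import torch

def first_loss(pred_prob: torch.Tensor, actual_prob: torch.Tensor,
               weight: float) -> torch.Tensor:
    # Soft-label cross entropy whose target is weight * actual probability.
    target = weight * actual_prob
    return -(target * torch.log(pred_prob + 1e-8)).sum(dim=1).mean()

pred = torch.softmax(torch.randn(4, 3), dim=1)     # predicted domain probabilities
actual = torch.eye(3)[torch.tensor([0, 1, 2, 0])]  # one-hot first labels

loss_for_g = first_loss(pred, actual, weight=1.0)  # first predetermined weight
loss_for_d = first_loss(pred, actual, weight=0.3)  # smaller second weight
```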
In an embodiment, there may be multiple first sample images constituting a batch of sample data; the value of the first loss function is then obtained by accumulating, over the batch, the difference between the first probability and the product of the actual probability and the first predetermined weight.
In an embodiment, the first extraction sub-network may be trained together with the domain classifier to further improve the precision of the domain classifier, which in turn benefits the training of the first extraction sub-network.
In operation S250, a second extraction sub-network is trained based on the second probability, the actual probability, and a second predetermined weight for the actual probability. Operation S250 is similar to operation S240, except that the second predetermined weight is smaller than the first predetermined weight.
According to the embodiment of the disclosure, assigning a smaller weight to the actual probability when training the second extraction sub-network makes the domain invariant features extracted by the trained second extraction sub-network indistinguishable with respect to the domain, improving their accuracy. This in turn helps improve the accuracy of downstream tasks (e.g., living body detection or object classification) performed on the domain invariant features.
In one embodiment, the first sample images may be chosen according to the downstream task. For example, for the living body detection task, a first sample image may be an image containing a human face. The first sample images may also include both positive and negative samples, with the feature extraction network trained only on the domain invariant features of the positive samples. This avoids the situation where the large distribution difference among negative samples makes it hard for the model to optimize generalizable features, so that the accuracy of downstream tasks performed on the extracted features cannot improve. For example, in living body detection, because the attack and collection methods of false faces differ, it is difficult to aggregate the false-face data of the various data domains, which hinders finding a generalizable feature space for them.
In particular, the plurality of first sample images includes first samples whose target objects have the real category and second samples whose target objects have a non-real category. A real-category target object may be a real face, and a non-real-category target object a false face. The first extraction sub-network is trained on the first probability and the actual probability only when the first probability was obtained from a first sample. In this way, the expressive power of the first extraction sub-network for real target objects is strengthened, improving detection accuracy in living body detection. Similarly, the second extraction sub-network is trained on the second probability and the actual probability only when the second probability was obtained from a first sample. This enhances the expressive power of the second extraction sub-network for domain-invariant features and avoids convergence difficulties caused by the large feature-distribution differences of non-real targets across data domains.
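A sketch of this restriction with a boolean mask follows; the mask values and the weight are illustrative.

```python
import torch

pred = torch.softmax(torch.randn(4, 3), dim=1)     # first or second probabilities
actual = torch.eye(3)[torch.tensor([0, 1, 2, 0])]  # actual domain probabilities
is_real = torch.tensor([True, False, True, True])  # True marks first samples

# Only first samples (real-category target objects) contribute to the loss;
# the target is again the weighted actual probability (weight = 1.0 here).
masked_pred, masked_actual = pred[is_real], actual[is_real]
loss = -(1.0 * masked_actual * torch.log(masked_pred + 1e-8)).sum(dim=1).mean()
```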
The plurality of first sample images may also come from a plurality of data domains, which further improves the generalization ability of the model.
Fig. 3 is a schematic structural diagram of a feature extraction network according to an embodiment of the present disclosure.
According to the embodiment of the present disclosure, as shown in fig. 3, the feature extraction network of this embodiment 300 may further include a first normalization sub-network 313 and a second normalization sub-network 314 in addition to the first extraction sub-network G311 and the second extraction sub-network D312. The first normalization sub-network 313 is connected to the first extraction sub-network G311 and normalizes the feature map extracted by G311 to obtain a first standard feature. The second normalization sub-network 314 is connected to the second extraction sub-network D312 and normalizes the domain-invariant features extracted by D312 to obtain a second standard feature.
Accordingly, when the feature extraction network is trained, the first sample image 301 is input into the first extraction sub-network G311; the feature map produced by G311 is input into the second extraction sub-network D312 and the first normalization sub-network 313, respectively; and the domain-invariant features output by D312 are input into the second normalization sub-network 314. The first standard feature output by the first normalization sub-network 313 is input into the predetermined domain classifier 320, which outputs the first probability 302. The second standard feature output by the second normalization sub-network 314 is likewise input into the predetermined domain classifier 320, which outputs the second probability 303.
The first normalization sub-network 313 and the second normalization sub-network 314 apply a normalization function to the feature map and the domain-invariant features, respectively. The specific choice of normalization function is not limited by this disclosure.
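As one possible choice (an assumption, since the disclosure leaves the function open), the sketch below uses L2 normalization for both normalization sub-networks.

```python
import torch
import torch.nn.functional as F

feature_map = torch.randn(4, 64, 56, 56)  # output of the first extraction sub-network
dir_feature = torch.randn(4, 128)         # output of the second extraction sub-network

# First and second standard features via L2 normalization (one possible choice).
first_standard = F.normalize(feature_map.flatten(1), p=2, dim=1)
second_standard = F.normalize(dir_feature, p=2, dim=1)
```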
By normalizing the features through the normalization sub-networks, the feature extraction network of the disclosed embodiment can further improve its generalization ability on new-domain data, and thus the accuracy of downstream tasks.
FIG. 4 is a schematic diagram of a training feature extraction network according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the feature extraction network as a whole may also be trained based on the distance between the domain-invariant features of pairs of samples, i.e., the first extraction sub-network and the second extraction sub-network are trained based on such distances so that the feature extraction network selectively learns the commonality between features.
According to the embodiment of the disclosure, the feature extraction network may be trained based on the distance between the domain-invariant features of two second samples from the same data domain, so that the network learns the commonality between second samples of one data domain. The feature distributions of the non-real-category target objects within a data domain then become compact, and the data features of the several data domains distribute more uniformly in the feature space.
Illustratively, the aforementioned second samples may comprise samples from at least two of the plurality of data domains. As shown in fig. 4, in the embodiment 400, when training the feature extraction network, the first domain invariant features of two second samples from the same data domain may be determined, yielding two first target features. The feature extraction network is then trained using a predetermined loss function 480 based on the first distance 440 between the two first target features, wherein the predetermined loss function 480 is positively correlated with the first distance 440.
For example, for the second sample 421 and the second sample 422 from the first data domain 411, a first domain invariant feature 431 and a first domain invariant feature 432 may be obtained, respectively; the distance between them is a first distance 440. Similarly, a first domain invariant feature 433 and a first domain invariant feature 434, and thus another first distance 440, may be derived from the second sample 423 and the second sample 424 of the second data domain 412. Substituting the first distances 440 into the predetermined loss function 480 yields its value. The predetermined loss function may be a contrastive loss function, a triplet loss function, or the like, and the first distance may be a Euclidean distance, a Manhattan distance, a Mahalanobis distance, or the like, which the present disclosure does not limit. It is understood that if the number of second samples in each data domain is M and the number of data domains is N, the number of first distances obtained is N × M(M-1)/2.
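The pair counting can be checked with a short sketch; the Euclidean distance and the feature dimensions are assumptions.

```python
import itertools
import torch

features = torch.randn(6, 128)              # first domain invariant features
domains = torch.tensor([0, 0, 0, 1, 1, 1])  # N = 2 data domains, M = 3 samples each

first_distances = [
    torch.dist(features[i], features[j])    # Euclidean distance between the pair
    for i, j in itertools.combinations(range(len(features)), 2)
    if domains[i] == domains[j]             # keep only same-domain second samples
]
print(len(first_distances))                 # N x M(M-1)/2 = 2 x 3 = 6
```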
According to the embodiment of the disclosure, there may be multiple first samples, and the feature extraction network may further be trained based on the distance between the first domain invariant features of any two first samples, so that the feature distribution of real-category target objects becomes more compact and the model's expressive power for them improves. Data-domain differences are not considered for the first samples because the distribution differences of real-category target objects are usually small, so the feature extraction network easily learns a compact feature space for them.
For example, as shown in fig. 4, in the embodiment 400, when the feature extraction network is trained, the first domain invariant features of any two first samples among the plurality of first samples may be determined, yielding two second target features. The feature extraction network is then trained with the predetermined loss function based on a second distance 450 between the two second target features, wherein the predetermined loss function 480 is positively correlated with the second distance 450.
For example, for the first sample 425 and the first sample 426, a first domain invariant feature 435 and a first domain invariant feature 436 may be obtained, respectively; the distance between feature 435 and feature 436 is a second distance 450. Substituting the second distance 450 into the predetermined loss function 480 yields its value. The predetermined loss function and the second distance are as described for the first distance and are not repeated here. It is understood that if the number of first samples is P, the number of second distances 450 obtained is P(P-1)/2.
The embodiment may also train the feature extraction network based on the distance between the first domain-invariant feature of a first sample and that of a second sample. The feature extraction network thereby learns the distinguishing features between real-category and non-real-category target objects, improving its ability to effectively distinguish target objects of different categories.
Illustratively, as shown in fig. 4, this embodiment 400 may train the feature extraction network with the predetermined loss function based on a third distance 460 between the first domain-invariant feature of a first sample and that of a second sample, wherein the predetermined loss function 480 is negatively correlated with the third distance 460. The details are as described for the first distance and are not repeated here.
For example, for the first sample 426 and the second sample 421, a first domain invariant feature 436 and a first domain invariant feature 431 may be obtained, respectively; the distance between features 436 and 431 is a third distance 460. For the first sample 425 and the second sample 421, a first domain invariant feature 435 and a first domain invariant feature 431 may be obtained, respectively; the distance between features 435 and 431 is another third distance 460. By analogy, if the total number of second samples is m × n and the number of first samples is p, the number of third distances 460 obtained is m × n × p.
According to the embodiment of the disclosure, the feature extraction network may also be trained based on the distance between the domain invariant features of two second samples from different data domains, so that the feature distributions of non-real-category target objects in different data domains stay dispersed; this avoids the optimization and convergence difficulties caused by the large differences between non-real-category target objects across data domains.
For example, as shown in fig. 4, in the embodiment 400, when training the feature extraction network, the first domain invariant features of two second samples from different data domains may be determined, yielding two third target features. The feature extraction network is then trained with the predetermined loss function based on a fourth distance 470 between the two third target features, wherein the predetermined loss function is negatively correlated with the fourth distance 470. The details are as described for the first distance and are not repeated here.
For example, for the second sample 421 from the first data domain 411 and the second sample 423 from the second data domain 412, a first domain invariant feature 431 and a first domain invariant feature 433 may be obtained, respectively; the distance between features 431 and 433 is a fourth distance 470. For the second sample 422 from the first data domain 411 and the second sample 423 from the second data domain 412, a first domain invariant feature 432 and a first domain invariant feature 433 are obtained, respectively; the distance between features 432 and 433 is another fourth distance 470. By analogy, if the number of data domains is M and the number of second samples in each data domain is N, the number of fourth distances 470 obtained is N² × M(M-1)/2.
It will be appreciated that at least two of the aforementioned first, second, third and fourth distances may be considered in training the feature extraction model with a predetermined loss function.
For example, in one embodiment, as shown in FIG. 4, the first distance, the second distance, the third distance, and the fourth distance may be considered in combination. The feature extraction network can then learn a generalizable feature space in which the features of real-category target objects distribute compactly, the features of non-real-category target objects from different data domains distribute dispersedly, and the features of non-real-category target objects from the same data domain distribute compactly. In this case, the predetermined loss function can be expressed as:

\mathcal{L} = \sum_{i,j}\left[\left\|F_i - F_j\right\|_2^2 + \max\left(0,\; m - \left\|\hat{F}_i - \hat{F}_j\right\|_2\right)^2\right]

where $F_i$ and $F_j$ denote the first domain invariant features of two first samples, or of two second samples belonging to the same data domain; $\hat{F}_i$ and $\hat{F}_j$ denote the first domain invariant feature of a first sample and that of a second sample, or the first domain invariant features of two second samples belonging to different data domains; and $m$ is a predetermined boundary value greater than 0. In the summation, the first term is the first distance or the second distance, and the second term is the third distance or the fourth distance; $i$ and $j$ index the first sample images.
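Under the reconstructed formula above, the predetermined loss can be sketched as a contrastive loss with a squared hinge on the negative pairs; the margin value and pair lists are illustrative.

```python
import torch

def predetermined_loss(feats: torch.Tensor, pos_pairs, neg_pairs,
                       m: float = 1.0) -> torch.Tensor:
    loss = feats.new_zeros(())
    for i, j in pos_pairs:   # first/second distances: pull the pair together
        loss = loss + (feats[i] - feats[j]).pow(2).sum()
    for i, j in neg_pairs:   # third/fourth distances: push apart up to margin m
        d = (feats[i] - feats[j]).norm()
        loss = loss + torch.clamp(m - d, min=0.0).pow(2)
    return loss

feats = torch.randn(4, 128, requires_grad=True)
loss = predetermined_loss(feats, pos_pairs=[(0, 1)], neg_pairs=[(0, 2), (1, 3)])
loss.backward()
```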
Based on the training method of the feature extraction network, the disclosure also provides a training method of the living body detection model. The method will be described in detail below with reference to fig. 5 to 6.
Fig. 5 is a flow chart diagram of a training method of an in-vivo detection model according to an embodiment of the present disclosure.
As shown in fig. 5, the training method 500 of the living body detection model of this embodiment may include operations S510 to S530. The living body detection model includes a feature extraction network and a classification network, the feature extraction network being trained as described in the foregoing embodiments.
In operation S510, the second sample image is input to the feature extraction network, and a second domain invariant feature of the second sample image is obtained.
According to an embodiment of the present disclosure, the second sample image may include a target object, which may be of a real category or a non-real category. For example, the target object may be a real human face or a false face. The method for obtaining the second domain invariant feature of the second sample image is similar to the method for obtaining the first domain invariant feature of the first sample image, and is not repeated herein.
In operation S520, the second domain invariant feature is input into the classification network to obtain the predicted category of the target object in the second sample image.
According to an embodiment of the present disclosure, the classification network may include, for example, an activation layer and a fully connected layer; it is similar to the classification network of a living body detection model in the related art, which the present disclosure does not limit. The classification network outputs the predicted probability that the target object in the second sample image is of the real category and the predicted probability that it is of a non-real category. The embodiment may determine the predicted category as the real category when the probability for the real category exceeds a predetermined threshold, and as the non-real category otherwise. When the target object is a human face, the real category is the real-face category and the non-real category is the false-face category.
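A minimal sketch of such a classification network follows; the ReLU activation, the feature dimension, and the 0.5 threshold are assumptions.

```python
import torch
import torch.nn as nn

class ClassificationNetwork(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # An activation layer followed by a fully connected layer.
        self.net = nn.Sequential(nn.ReLU(), nn.Linear(feat_dim, 2))

    def forward(self, dir_feature: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(dir_feature), dim=1)  # [p_real, p_non_real]

probs = ClassificationNetwork()(torch.randn(4, 128))
predicted_real = probs[:, 0] > 0.5  # real category above the threshold, else non-real
```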
In operation S530, a living body detection model is trained based on the actual category and the prediction category.
According to an embodiment of the present disclosure, the second sample image has a second label indicating the actual category of the target object, which is either the real category or a non-real category. For example, if the actual category is the real category, the embodiment may set the actual probability of the real category to 1 and that of the non-real category to 0, and then compute the value of a second loss function from the actual and predicted probabilities. The living body detection model is trained by minimizing the second loss function. Like the first loss function, the second loss function may be a cross entropy loss function, a norm loss function, or a mean square error loss function.
The training of the liveness detection model may be a simultaneous training of the classification network and the feature extraction network.
FIG. 6 is a schematic diagram of a principle of training a liveness detection model according to an embodiment of the present disclosure.
As shown in fig. 6, the living body detection model in this embodiment 600 includes a feature extraction network 610 and a classification network 620. Therein, the feature extraction network 610 comprises a first extraction subnetwork G611, a second extraction subnetwork D612, a first normalization subnetwork 613 and a second normalization subnetwork 614.
The plurality of sample images 601 may include the aforementioned first and second samples, each carrying the aforementioned first and second labels. When training the living body detection model, the sample images 601 are input into the feature extraction network 610 in turn, and the domain classifier 630 performs domain classification on the feature map and domain invariant features of each sample image, giving the first probability 602 and the second probability 603 for that image. Meanwhile, the domain-invariant features of each sample image are input into the classification network 620, and the predicted category 604 for the image is derived from its output. The value of the first loss function for the first extraction sub-network G611 is obtained from the first probabilities 602, the actual probabilities, and the first predetermined weight; the value of the first loss function for the second extraction sub-network D612 from the second probabilities 603, the actual probabilities, and the second predetermined weight; and the value of the second loss function from the predicted and actual categories of the sample images.
The second samples include samples from different data domains, so the value of the predetermined loss function 605 can be calculated from the domain-invariant features of the sample images 601.
After the loss values are obtained, the first extraction sub-network G611, the first normalization sub-network 613, and the domain classifier 630 may be trained based on the value of the first loss function for G611, while the second extraction sub-network D612, the second normalization sub-network 614, the classification network 620, and the domain classifier 630 may be trained based on a weighted sum of the value of the predetermined loss function 605, the value of the first loss function for D612, and the value of the second loss function.
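A toy training step wiring these pieces together is sketched below. The dimensions, optimizer, and loss weights are illustrative; two separate linear heads stand in for the shared predetermined domain classifier so that input sizes match, and down-weighting the D-branch cross entropy is a simplification of weighting the target itself.

```python
import torch
import torch.nn as nn

G, D = nn.Linear(16, 8), nn.Linear(8, 4)         # extraction sub-networks
dom_f, dom_d = nn.Linear(8, 3), nn.Linear(4, 3)  # domain heads (3 data domains)
cls_net = nn.Linear(4, 2)                        # liveness classification network
params = [p for m in (G, D, dom_f, dom_d, cls_net) for p in m.parameters()]
opt = torch.optim.SGD(params, lr=1e-3)
ce = nn.CrossEntropyLoss()

x = torch.randn(4, 16)              # a batch standing in for sample images
dom_y = torch.tensor([0, 1, 2, 0])  # first label: source data domain
cls_y = torch.tensor([1, 0, 1, 1])  # second label: 1 = real category

fmap = G(x)
dir_feat = D(fmap)
loss_g = ce(dom_f(fmap), dom_y)            # first loss for the G branch
loss_d = 0.5 * ce(dom_d(dir_feat), dom_y)  # first loss for the D branch, down-weighted
loss_cls = ce(cls_net(dir_feat), cls_y)    # second loss (liveness classification)
total = loss_g + loss_d + loss_cls         # predetermined loss could be added here
opt.zero_grad()
total.backward()
opt.step()
```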
In an embodiment, the feature extraction network may be pre-trained based on the value of the first loss function; the second extraction sub-network D612 is then trained based on the predetermined loss function; and finally the feature extraction network and the classification network are trained based on the second loss function.
Based on the training method of the in-vivo detection model provided by the disclosure, the disclosure also provides an in-vivo detection method. This method will be described in detail below with reference to fig. 7.
FIG. 7 is a schematic flow chart diagram of a liveness detection method according to an embodiment of the present disclosure.
As shown in fig. 7, the living body detecting method 700 of this embodiment may include operations S710 to S720.
In operation S710, the image to be detected is input to the feature extraction network included in the living body detection model, so as to obtain a third domain invariant feature of the image to be detected.
The living body detection model is obtained by training by the training method of the living body detection model described above. The image to be detected is an image with a target object, and may be an image with a human face, for example. The method for obtaining the third domain invariant feature is similar to the method for obtaining the second domain invariant feature, and is not described herein again.
In operation S720, the third domain invariant feature is input into the classification network included in the living body detection model to obtain the category of the target object in the image to be detected. The method for obtaining the category of the target object is similar to that for obtaining the predicted category and is not repeated here.
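For illustration, inference might be wrapped as follows; `model`, its `feature_extractor` and `classifier` attributes, and the 0.5 threshold are hypothetical.

```python
import torch

@torch.no_grad()
def detect_liveness(model, image: torch.Tensor) -> bool:
    # `model` is a hypothetical wrapper exposing the trained feature
    # extraction network and classification network as attributes.
    dir_feature = model.feature_extractor(image)  # third domain invariant feature
    probs = model.classifier(dir_feature)         # [p_real, p_non_real]
    return bool(probs[0, 0] > 0.5)                # True: the face is real
```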
Based on the training method of the feature extraction network provided by the present disclosure, the present disclosure also provides a training device of the feature extraction network, which will be described in detail below with reference to fig. 8.
Fig. 8 is a block diagram of a training apparatus of a feature extraction network according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 of this embodiment may include a first feature extraction module 810, a second feature extraction module 820, a probability obtaining module 830, a first training module 840, and a second training module 850.
The first feature extraction module 810 is configured to input the first sample image into the first extraction sub-network of the feature extraction network to obtain a feature map of the first sample image, wherein the first sample image has a first label indicating an actual probability that the first sample image is from a plurality of data domains. In an embodiment, the first feature extraction module 810 may be configured to perform operation S210 described above, which is not repeated here.
The second feature extraction module 820 is configured to input the feature map into a second extraction sub-network in the feature extraction network to obtain a first domain invariant feature of the first sample image. In an embodiment, the second feature extraction module 820 may be configured to perform the operation S220 described above, which is not described herein again.
The probability obtaining module 830 is configured to obtain, based on the feature map and the first domain invariant feature respectively, a first probability and a second probability that the first sample image is from the plurality of data domains by employing a predetermined domain classifier. In an embodiment, the probability obtaining module 830 may be configured to perform operation S230 described above, which is not repeated here.
The first training module 840 is configured to train the first extraction subnetwork based on the first probability, the actual probability, and a first predetermined weight for the actual probability. In an embodiment, the first training module 840 may be configured to perform the operation S240 described above, which is not described herein again.
The second training module 850 is for training the second extraction subnetwork based on the second probability, the actual probability, and a second predetermined weight for the actual probability. Wherein the first predetermined weight is greater than the second predetermined weight. In an embodiment, the second training module 850 may be configured to perform the operation S250 described above, which is not described herein again.
According to an embodiment of the present disclosure, the feature extraction network further comprises a first normalization sub-network and a second normalization sub-network. The probability obtaining module 830 may include a first feature obtaining sub-module, a second feature obtaining sub-module, and a probability obtaining sub-module. The first feature obtaining sub-module is used to input the feature map into the first normalization sub-network to obtain a first standard feature. The second feature obtaining sub-module is used to input the first domain invariant feature into the second normalization sub-network to obtain a second standard feature. The probability obtaining sub-module is used to input the first standard feature and the second standard feature into the predetermined domain classifier to obtain the first probability and the second probability, respectively.
According to an embodiment of the present disclosure, there are a plurality of first sample images, and the plurality of first sample images include: first samples in which the target object has a real category, and second samples in which the target object has a non-real category.
According to an embodiment of the disclosure, the second samples comprise a plurality of samples from at least two of the plurality of data domains. The apparatus 800 may further include a first feature determination module and a third training module. The first feature determination module is configured to determine the respective first domain invariant features of two second samples from the same data domain, to obtain two first target features. The third training module is configured to train the feature extraction network with a predetermined loss function based on a first distance between the two first target features, wherein the predetermined loss function is positively correlated with the first distance.
According to an embodiment of the present disclosure, there are a plurality of first samples, and the apparatus 800 may further include a second feature determination module and a fourth training module. The second feature determination module is configured to determine the respective first domain invariant features of any two first samples in the plurality of first samples, to obtain two second target features. The fourth training module is configured to train the feature extraction network with the predetermined loss function based on a second distance between the two second target features, wherein the predetermined loss function is positively correlated with the second distance.
According to an embodiment of the present disclosure, the apparatus 800 may further include a fifth training module, configured to train the feature extraction network with the predetermined loss function based on a third distance between the first domain invariant feature of a first sample and the first domain invariant feature of a second sample, wherein the predetermined loss function is negatively correlated with the third distance.
According to an embodiment of the present disclosure, the apparatus 800 may further include a third feature determination module and a sixth training module. The third feature determination module is configured to determine the respective first domain invariant features of two second samples from different data domains, to obtain two third target features. The sixth training module is configured to train the feature extraction network with the predetermined loss function based on a fourth distance between the two third target features, wherein the predetermined loss function is negatively correlated with the fourth distance.
According to an embodiment of the present disclosure, the predetermined loss function may be expressed, for example, by a contrastive loss of the following form:

$L = \left\| f_a - f_b \right\|_2^2 + \max\left(0,\; m - \left\| f_p - f_q \right\|_2\right)^2$

wherein $f_a$ and $f_b$ respectively represent the first domain invariant features of two first samples, or the first domain invariant features of two second samples belonging to the same data domain; $f_p$ and $f_q$ respectively represent the first domain invariant feature of a first sample and the first domain invariant feature of a second sample, or the first domain invariant features of two second samples belonging to different data domains; and m is a predetermined boundary value greater than 0.
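A minimal sketch of a pairwise loss implementing exactly these distance relationships, under the contrastive form given above (function names, the batching, and the squared-distance choice are assumptions):

```python
import torch
import torch.nn.functional as F


def domain_contrastive_loss(feat_a, feat_b, positive_pair, m=1.0):
    """Pairwise loss over first domain invariant features.

    positive_pair=True: two first samples, or two second samples from the
    same data domain -> the loss grows with their distance (the first and
    second distances above).
    positive_pair=False: a first/second pair, or two second samples from
    different data domains -> the loss shrinks as their distance grows,
    vanishing beyond the predetermined boundary value m > 0 (the third and
    fourth distances above).
    """
    dist = F.pairwise_distance(feat_a, feat_b)  # Euclidean distance per pair
    if positive_pair:
        return (dist ** 2).mean()                            # positively correlated
    return (torch.clamp(m - dist, min=0.0) ** 2).mean()      # negatively correlated
```

Summing this loss over the positive and negative pairs drawn from a batch yields one realization of the predetermined loss function described in the embodiments above.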
According to an embodiment of the present disclosure, the first training module 840 is configured to train the first extraction subnetwork if the first probability is obtained based on the first sample. Alternatively, the second training module 850 is configured to train the second extraction subnetwork if the second probability is obtained based on the first sample.
Based on the training method of the living body detection model provided by the present disclosure, the present disclosure also provides a training device of the living body detection model, which will be described in detail below with reference to fig. 9.
Fig. 9 is a block diagram of the configuration of a training apparatus of a living body detection model according to an embodiment of the present disclosure.
As shown in fig. 9, the training apparatus 900 for a living body detection model of this embodiment may include a third feature extraction module 910, a first class obtaining module 920, and a seventh training module 930. The living body detection model comprises a feature extraction network and a classification network. The feature extraction network is obtained by training with the training device of the feature extraction network described above.
The third feature extraction module 910 is configured to input the second sample image into a feature extraction network, so as to obtain a second domain invariant feature of the second sample image, wherein the second sample image includes a target object and has a second label indicating an actual category of the target object, the actual category including a real category or a non-real category. In an embodiment, the third feature extraction module 910 may be configured to perform the operation S510 described above, which is not described herein again.
The first class obtaining module 920 is configured to input the second domain invariant feature into a classification network, so as to obtain a prediction category of the target object in the second sample image. In an embodiment, the first class obtaining module 920 may be configured to perform the operation S520 described above, which is not described herein again.
The seventh training module 930 is configured to train the living body detection model based on the actual category and the predicted category. In an embodiment, the seventh training module 930 may be configured to perform the operation S530 described above, which is not described herein again.
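As an illustration of this second training stage, here is a sketch that reuses the feature extraction network sketched earlier; the two-way classification head, the cross-entropy objective, and all names are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LivenessModel(nn.Module):
    """Living body detection model: feature extraction network + classification network."""

    def __init__(self, feature_network: nn.Module, feat_dim: int = 128):
        super().__init__()
        self.feature_network = feature_network                 # trained as sketched above
        self.classification_network = nn.Linear(feat_dim, 2)   # real / non-real

    def forward(self, image):
        # The second domain invariant feature of the second sample image.
        _, invariant_feature, _, _ = self.feature_network(image)
        return self.classification_network(invariant_feature)


def train_liveness_step(model, optimizer, image, actual_category):
    """One update on (second sample image, second label) pairs."""
    predicted_logits = model(image)                     # prediction category (logits)
    loss = F.cross_entropy(predicted_logits, actual_category)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```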
Based on the living body detection method provided by the present disclosure, the present disclosure also provides a living body detection apparatus, which will be described in detail below with reference to fig. 10.
Fig. 10 is a block diagram of the configuration of a living body detecting apparatus according to an embodiment of the present disclosure.
As shown in fig. 10, the living body detecting apparatus 1000 of this embodiment may include a fourth feature extraction module 1010 and a second category obtaining module 1020.
The fourth feature extraction module 1010 is configured to input the image to be detected into a feature extraction network included in the living body detection model, so as to obtain a third domain invariant feature of the image to be detected. The living body detection model is obtained by training with the training device of the living body detection model described above. In an embodiment, the fourth feature extraction module 1010 may be configured to perform the operation S710 described above, which is not described herein again.
The second class obtaining module 1020 is configured to input the third domain invariant feature into a classification network included in the living body detection model, so as to obtain a category of the target object in the image to be detected. In an embodiment, the second class obtaining module 1020 may be configured to perform the operation S720 described above, which is not described herein again.
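At detection time the flow reduces to a single forward pass through the trained model; a sketch, assuming class index 1 stands for the real category:

```python
import torch


@torch.no_grad()
def detect_liveness(model, image):
    """image: a (3, H, W) tensor holding the image to be detected."""
    model.eval()
    logits = model(image.unsqueeze(0))   # add a batch dimension
    predicted = logits.argmax(dim=1).item()
    return "real" if predicted == 1 else "non-real"
```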
It should be noted that, in the technical solutions of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement the methods of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the device 1100 comprises a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to one another by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in the device 1100 are connected to the I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, an optical disk, and the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be any of various general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, such as at least one of the following methods: the training method of the feature extraction network, the training method of the living body detection model, and the living body detection method. For example, in some embodiments, at least one of these methods may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of at least one of the methods described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured, by any other suitable means (e.g., by means of firmware), to perform at least one of the training method of the feature extraction network, the training method of the living body detection model, and the living body detection method.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or a cloud host), a host product in the cloud computing service system that remedies the defects of high management difficulty and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. A method of training a feature extraction network, wherein the feature extraction network comprises a first extraction subnetwork and a second extraction subnetwork; the method comprises the following steps:
inputting a first sample image into the first extraction sub-network to obtain a feature map of the first sample image; wherein the first sample image has a first label indicating an actual probability that the first sample image is from a plurality of data domains;
inputting the feature map into the second extraction sub-network to obtain a first domain invariant feature of the first sample image;
obtaining a first probability and a second probability that the first sample image is from the plurality of data domains by using a predetermined domain classifier, based on the feature map and the first domain invariant feature respectively;
training the first extraction subnetwork based on the first probability, the actual probability, and a first predetermined weight for the actual probability; and
training the second extraction sub-network based on the second probability, the actual probability, and a second predetermined weight for the actual probability,
wherein the first predetermined weight is greater than the second predetermined weight.
2. The method of claim 1, wherein the feature extraction network further comprises a first normalization sub-network and a second normalization sub-network; and obtaining a first probability and a second probability that the first sample image is from the plurality of data domains using a predetermined domain classifier comprises:
inputting the feature map into the first normalization sub-network to obtain a first standard feature;
inputting the first domain invariant feature into the second normalization sub-network to obtain a second standard feature; and
inputting the first standard feature and the second standard feature into the predetermined domain classifier respectively, to obtain the first probability and the second probability respectively.
3. The method of claim 1, wherein there are a plurality of first sample images, and the plurality of first sample images comprise: first samples in which a target object has a real category, and second samples in which a target object has a non-real category.
4. The method of claim 3, wherein the second samples comprise a plurality of samples from at least two of the plurality of data domains; and the method further comprises:
determining respective first domain invariant features of two second samples from the same data domain to obtain two first target features; and
training the feature extraction network with a predetermined loss function based on a first distance between the two first target features,
wherein the predetermined loss function and the first distance are positively correlated with each other.
5. The method of claim 3 or 4, wherein there are a plurality of first samples, the method further comprising:
determining respective first domain invariant features of any two first samples in the plurality of first samples to obtain two second target features; and
training the feature extraction network with a predetermined loss function based on a second distance between the two second target features,
wherein the predetermined loss function and the second distance are positively correlated with each other.
6. The method of any of claims 3-5, further comprising:
training the feature extraction network with a predetermined loss function based on a third distance between the first domain invariant feature of the first sample and the first domain invariant feature of the second sample,
wherein the predetermined loss function and the third distance are inversely related to each other.
7. The method of claim 4, further comprising:
determining respective first domain invariant features of two second samples from different data domains to obtain two third target features; and
training the feature extraction network with the predetermined penalty function based on a fourth distance between the two third target features,
wherein the predetermined loss function and the fourth distance are inversely related to each other.
8. The method of claim 7, wherein the predetermined loss function is expressed by a formula of the following form:

$L = \left\| f_a - f_b \right\|_2^2 + \max\left(0,\; m - \left\| f_p - f_q \right\|_2\right)^2$

wherein $f_a$ and $f_b$ respectively represent the first domain invariant features of two first samples, or the first domain invariant features of two second samples belonging to the same data domain; $f_p$ and $f_q$ respectively represent the first domain invariant feature of a first sample and the first domain invariant feature of a second sample, or the first domain invariant features of two second samples belonging to different data domains; and m is a predetermined boundary value greater than 0.
9. The method of claim 3, wherein:
training the first extraction subnetwork comprises: training the first extraction subnetwork if the first probability is obtained based on the first sample; or
training the second extraction subnetwork comprises: training the second extraction subnetwork if the second probability is obtained based on the first sample.
10. A training method of a living body detection model, wherein the living body detection model comprises a feature extraction network and a classification network; the method comprises:
inputting a second sample image into the feature extraction network to obtain a second domain invariant feature of the second sample image; wherein the second sample image includes a target object and the second sample image has a second label indicating an actual category of the target object;
inputting the second domain invariant features into the classification network to obtain a prediction category of a target object in the second sample image; and
training the liveness detection model based on the actual class and the predicted class,
wherein the feature extraction network is obtained by training using the method of any one of claims 1-9; and the actual category includes a real category or a non-real category.
11. A living body detection method, comprising:
inputting an image to be detected into a feature extraction network included in a living body detection model to obtain a third domain invariant feature of the image to be detected; and
inputting the third domain invariant features into a classification network included in the living body detection model to obtain the category of the target object in the image to be detected,
wherein the living body detection model is obtained by training using the method of claim 10.
12. A training apparatus of a feature extraction network, wherein the feature extraction network comprises a first extraction sub-network and a second extraction sub-network; the apparatus comprises:
a first feature extraction module for inputting a first sample image into the first extraction sub-network to obtain a feature map of the first sample image; wherein the first sample image has a first label indicating an actual probability that the first sample image is from a plurality of data domains;
the second feature extraction module is used for inputting the feature map into the second extraction sub-network to obtain a first domain invariant feature of the first sample image;
a probability obtaining module, configured to obtain, by using a predetermined domain classifier, a first probability and a second probability that the first sample image is from the plurality of data domains, based on the feature map and the first domain invariant feature, respectively;
a first training module to train the first extraction subnetwork based on the first probability, the actual probability, and a first predetermined weight for the actual probability; and
a second training module to train the second extraction sub-network based on the second probability, the actual probability, and a second predetermined weight for the actual probability,
wherein the first predetermined weight is greater than the second predetermined weight.
13. The apparatus of claim 12, wherein the feature extraction network further comprises a first normalization sub-network and a second normalization sub-network; and the probability obtaining module comprises:
a first feature obtaining sub-module for inputting the feature map into the first normalization sub-network to obtain a first standard feature;
a second feature obtaining sub-module for inputting the first domain invariant feature into the second normalization sub-network to obtain a second standard feature; and
a probability obtaining sub-module for inputting the first standard feature and the second standard feature into the predetermined domain classifier respectively, to obtain the first probability and the second probability respectively.
14. The apparatus of claim 12, wherein there are a plurality of first sample images, and the plurality of first sample images comprise: first samples in which a target object has a real category, and second samples in which a target object has a non-real category.
15. The apparatus of claim 14, wherein the second samples comprise a plurality of samples from at least two of the plurality of data domains; and the apparatus further comprises:
a first feature determination module for determining respective first domain invariant features of two second samples from the same data domain to obtain two first target features; and
a third training module for training the feature extraction network with a predetermined loss function based on a first distance between the two first target features,
wherein the predetermined loss function and the first distance are positively correlated with each other.
16. The apparatus of claim 14 or 15, wherein there are a plurality of first samples, the apparatus further comprising:
a second feature determination module for determining respective first domain invariant features of any two first samples in the plurality of first samples to obtain two second target features; and
a fourth training module for training the feature extraction network with a predetermined penalty function based on a second distance between the two second target features,
wherein the predetermined loss function and the second distance are positively correlated with each other.
17. The apparatus of any of claims 14-16, further comprising:
a fifth training module for training the feature extraction network with a predetermined loss function based on a third distance between the first domain invariant feature of the first sample and the first domain invariant feature of the second sample,
wherein the predetermined loss function and the third distance are inversely related to each other.
18. The apparatus of claim 15, further comprising:
a third feature determination module for determining respective first domain invariant features of two second samples from different data domains to obtain two third target features; and
a sixth training module for training the feature extraction network with the predetermined penalty function based on a fourth distance between the two third target features,
wherein the predetermined loss function and the fourth distance are inversely related to each other.
19. The apparatus of claim 18, wherein the predetermined loss function is expressed by a formula of the following form:

$L = \left\| f_a - f_b \right\|_2^2 + \max\left(0,\; m - \left\| f_p - f_q \right\|_2\right)^2$

wherein $f_a$ and $f_b$ respectively represent the first domain invariant features of two first samples, or the first domain invariant features of two second samples belonging to the same data domain; $f_p$ and $f_q$ respectively represent the first domain invariant feature of a first sample and the first domain invariant feature of a second sample, or the first domain invariant features of two second samples belonging to different data domains; and m is a predetermined boundary value greater than 0.
20. The apparatus of claim 14, wherein:
the first training module is to: training the first extraction subnetwork if the first probability is obtained based on the first sample; or
The second training module is to: training the second extraction subnetwork if the second probability is obtained based on the first sample.
21. A training device of a living body detection model, wherein the living body detection model comprises a feature extraction network and a classification network; the device comprises:
the third feature extraction module is used for inputting a second sample image into the feature extraction network to obtain a second domain invariant feature of the second sample image; wherein the second sample image includes a target object and the second sample image has a second label indicating an actual category of the target object;
the first class obtaining module is used for inputting the second domain invariant features into the classification network to obtain the prediction class of the target object in the second sample image; and
a seventh training module for training the living body detection model based on the actual category and the predicted category,
wherein the feature extraction network is obtained by training using the apparatus of any one of claims 12-20; and the actual category includes a real category or a non-real category.
22. A living body detection apparatus comprising:
the fourth feature extraction module is used for inputting the image to be detected into a feature extraction network included by the living body detection model to obtain a third domain invariant feature of the image to be detected; and
a second category obtaining module, configured to input the third domain invariant feature into a classification network included in the living body detection model to obtain a category of a target object in the image to be detected,
wherein the living body detection model is obtained by training using the apparatus of claim 21.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 11.
CN202110888934.3A 2021-08-03 2021-08-03 Feature extraction network, training method of living body detection model and living body detection method Pending CN113591736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110888934.3A CN113591736A (en) 2021-08-03 2021-08-03 Feature extraction network, training method of living body detection model and living body detection method

Publications (1)

Publication Number Publication Date
CN113591736A 2021-11-02

Family

ID=78254691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110888934.3A Pending CN113591736A (en) 2021-08-03 2021-08-03 Feature extraction network, training method of living body detection model and living body detection method

Country Status (1)

Country Link
CN (1) CN113591736A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210012198A1 (en) * 2018-05-31 2021-01-14 Huawei Technologies Co., Ltd. Method for training deep neural network and apparatus
US20210043306A1 (en) * 2018-10-25 2021-02-11 Tencent Technology (Shenzhen) Company Limited Detection model training method and apparatus, and terminal device
CN112446239A (en) * 2019-08-29 2021-03-05 株式会社理光 Neural network training and target detection method, device and storage medium
WO2021043112A1 (en) * 2019-09-02 2021-03-11 华为技术有限公司 Image classification method and apparatus
CN111738315A (en) * 2020-06-10 2020-10-02 西安电子科技大学 Image classification method based on countermeasure fusion multi-source transfer learning
CN112364931A (en) * 2020-11-20 2021-02-12 长沙军民先进技术研究有限公司 Low-sample target detection method based on meta-feature and weight adjustment and network model
CN113052144A (en) * 2021-04-30 2021-06-29 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113065614A (en) * 2021-06-01 2021-07-02 北京百度网讯科技有限公司 Training method of classification model and method for classifying target object

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yu Donghang; Zhang Baoming; Zhao Chuan; Guo Haitao; Lu Jun: "Remote sensing image scene classification combining convolutional neural networks and ensemble learning", Journal of Remote Sensing, no. 06 *
Song Jiarong; Yang Zhong; Zhang Tianyi; Han Jiaming; Zhu Jiayuan: "Traffic sign recognition based on convolutional neural network and multi-class SVM", Applied Science and Technology, no. 05 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333011A (en) * 2021-12-28 2022-04-12 北京的卢深视科技有限公司 Network training method, face recognition method, electronic device and storage medium
CN114333011B (en) * 2021-12-28 2022-11-08 合肥的卢深视科技有限公司 Network training method, face recognition method, electronic device and storage medium
CN114445667A (en) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 Image detection method and method for training image detection model
CN114677565A (en) * 2022-04-08 2022-06-28 北京百度网讯科技有限公司 Training method of feature extraction network and image processing method and device
CN114677565B (en) * 2022-04-08 2023-05-05 北京百度网讯科技有限公司 Training method and image processing method and device for feature extraction network

Similar Documents

Publication Publication Date Title
CN113705425B (en) Training method of living body detection model, and method, device and equipment for living body detection
CN113591736A (en) Feature extraction network, training method of living body detection model and living body detection method
CN112906502A (en) Training method, device and equipment of target detection model and storage medium
CN113657289B (en) Training method and device of threshold estimation model and electronic equipment
CN112784760B (en) Human behavior recognition method, device, equipment and storage medium
CN113869449A (en) Model training method, image processing method, device, equipment and storage medium
CN114511743B (en) Detection model training, target detection method, device, equipment, medium and product
CN113947701B (en) Training method, object recognition method, device, electronic equipment and storage medium
CN114387642A (en) Image segmentation method, device, equipment and storage medium
CN113869253A (en) Living body detection method, living body training device, electronic apparatus, and medium
CN113361455B (en) Training method of face counterfeit identification model, related device and computer program product
CN115170919A (en) Image processing model training method, image processing device, image processing equipment and storage medium
CN113139483B (en) Human behavior recognition method, device, apparatus, storage medium, and program product
CN115273148A (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN114067394A (en) Face living body detection method and device, electronic equipment and storage medium
CN113936158A (en) Label matching method and device
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN114155589B (en) Image processing method, device, equipment and storage medium
CN114581751B (en) Training method of image recognition model, image recognition method and device
CN115471717B (en) Semi-supervised training and classifying method device, equipment, medium and product of model
CN113642495B (en) Training method, apparatus, and program product for evaluating model for time series nomination
CN114724090B (en) Training method of pedestrian re-identification model, and pedestrian re-identification method and device
WO2023029702A1 (en) Method and apparatus for verifying image
CN116071791A (en) Training method of living body detection model, human face living body detection method and electronic equipment
CN117218566A (en) Target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination