CN111553202B - Training method, detection method and device for neural network for living body detection - Google Patents

Info

Publication number
CN111553202B
Authority
CN
China
Prior art keywords
neural network
total
sample
sub
training
Prior art date
Legal status
Active
Application number
CN202010270821.2A
Other languages
Chinese (zh)
Other versions
CN111553202A (en)
Inventor
杨赟
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010270821.2A priority Critical patent/CN111553202B/en
Publication of CN111553202A publication Critical patent/CN111553202A/en
Application granted granted Critical
Publication of CN111553202B publication Critical patent/CN111553202B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 - Spoof detection, e.g. liveness detection
    • G06V40/45 - Detection of the body part being alive
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image recognition, and particularly relates to a training method, a detection method, and a device for a neural network for living body detection. The training method comprises the following steps: dividing a total training sample set into a plurality of sub-training sample sets according to scene features, wherein the sub-training sample sets comprise living samples and non-living samples; training the living body classification of sub-neural networks using the sub-training sample sets, and establishing an initial total neural network; inputting part of the living samples in the total training sample set into the total neural network and into the sub-neural networks corresponding to their scene features, respectively, to obtain total sample features and sub-sample features of the living samples; and performing countermeasure training on the scene feature classifiers and the total neural network using the total sample features and the sub-sample features, wherein the trained total neural network is used to perform living body detection. The neural network can accurately detect living objects in different scenes.

Description

Training method, detection method and device for neural network for living body detection
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a training method, a detection method and a detection device for a neural network for living body detection.
Background
Biometric identification technology, particularly face recognition, has made tremendous progress in recent years, in applications such as attendance systems, cell phone unlocking, and face-brushing payment. However, most current face recognition systems do not perform living body detection and are therefore easily spoofed by photographs or videos. Living body detection is generally defined as detecting whether a given face is a genuine person or a counterfeit, such as a printed face photograph, a face in a video, or a 3D face mask. Since living body detection greatly affects the security of face recognition systems, many living body detection algorithms have been proposed; however, living body detection across scenes remains very difficult.
Disclosure of Invention
The invention mainly solves the technical problem of cross-scene living body detection and provides a training method for a neural network for living body detection, a living body detection method, and a computing device.
In order to solve the technical problems, the invention adopts a technical scheme that: provided is a training method of a neural network for living body detection, the training method comprising:
dividing a total training sample set into a plurality of sub-training sample sets according to scene characteristics, wherein the sub-training sample sets comprise living samples and non-living samples;
training living body classification of the sub-neural network by using the sub-training sample set, and establishing an initial total neural network;
inputting part of living body samples in the total training sample set into a total neural network and a sub-neural network corresponding to scene characteristics respectively, and obtaining total sample characteristics and sub-sample characteristics of the living body samples;
and performing countermeasure training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features, wherein the trained total neural network is used for performing living body detection.
Acquiring the total sample features and the sub-sample features of the living samples comprises the following steps:
acquiring the feature map computed and output by each layer of the neural network for the living sample;
and fusing the acquired feature maps, and performing weighted summation on the fused feature maps to obtain the sample features.
Before fusing the acquired feature maps, the training method further comprises: upsampling the feature maps computed and output by the second layer through the last layer of the neural network.
Further, when the acquired feature maps are fused and the fused feature maps are weighted and summed to obtain the sample features, the weighting weights used to obtain the total sample features and the sub-sample features are different.
Wherein the scene features include illumination features and/or pose features.
Specifically, the living body classification of the sub-neural network is trained using the sub-training sample set, wherein the loss function of the sub-neural network comprises a living body classification loss.
Specifically, using the total sample feature and the sub-sample feature to perform countermeasure training on the scene feature classifier and the total neural network, wherein a loss function of the total neural network comprises living body classification loss and generation loss of the total sample feature in the scene feature classifier; the loss function of the scene feature classifier includes a classification loss belonging to the scene feature.
Wherein inputting part of the living samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features, respectively, and obtaining the total sample features and sub-sample features of the living samples, and performing countermeasure training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, includes:
acquiring a unit sample set in a total training sample set, respectively inputting living samples in the unit sample set into a total neural network and a sub-neural network corresponding to scene characteristics, and acquiring total sample characteristics and sub-sample characteristics of the living samples;
performing countermeasure training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
and repeating the processes of obtaining the unit sample set and the countermeasure training until the repetition times reach the preset times.
The invention also comprises a second technical scheme, namely a living body detection method, which comprises the following steps:
inputting the object to be detected into the trained neural network, and outputting whether the object to be detected is a living body or a non-living body; the trained neural network is trained by the training method.
The present invention also includes a third technical solution, a computing device, including at least one processing unit and at least one storage unit, where the storage unit stores a computer program, and when the program is executed by the processing unit, causes the processing unit to execute the steps of the living body detection method described above.
The beneficial effects of the invention are as follows: compared with the prior art, the training method for a neural network for living body detection according to the embodiment of the invention uses the total sample features and the sub-sample features to perform countermeasure training on the scene feature classifier and the total neural network, so that the features that the total neural network has in common with the sub-neural networks of the corresponding scene features can be learned, and these common features are the features that distinguish living from non-living. Through countermeasure training, the neural network of the embodiment of the invention can learn features common to data of different scenes from the sub-training sample sets of different scene features, so that it can distinguish living from non-living across scenes. In living body detection, the living samples form a closed set while the non-living samples form an open set; since only living samples are computed during the countermeasure training, noise from the non-living samples is reduced, and the scene feature classifier distinguishes living samples more effectively. A neural network trained by this method is not affected by scene features when applied to living body detection, which improves the robustness of living body detection.
Drawings
FIG. 1 is a block diagram of a prior art domain adaptation architecture;
FIG. 2 is a schematic diagram showing the steps of a training method of a neural network for living body detection according to an embodiment of the present invention;
FIG. 3 is a training architecture diagram of one embodiment of a sub-neural network of the present invention;
FIG. 4 is a training architecture diagram of one embodiment of the total neural network of the present invention;
FIG. 5 is a schematic view showing the steps of one embodiment of the invention for obtaining the total sample feature and the sub-sample feature of a living sample;
FIG. 6 is a schematic view of steps of another embodiment of the invention for obtaining total sample features and sub-sample features of a living sample;
FIG. 7 is a schematic diagram of a feature fusion module of the present invention;
FIG. 8 is a schematic diagram illustrating steps of another embodiment of a training method of a neural network for living body detection according to the present invention;
FIG. 9 is a schematic diagram showing steps of performing living body detection according to an embodiment of the present invention;
FIG. 10 is a block diagram of a computing device embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples.
Living body detection is strongly affected by scene factors such as illumination, the pose of the living subject, and imaging quality, and it is difficult to handle living body detection across scenes. The effect of a living body detection algorithm can be improved by learning the common features of living and non-living data obeying different data distributions: in addition to the classification loss, a loss on the similarity of the features extracted by a feature extractor, such as a deep neural network, from data of the different distributions is generally added, and by optimizing both the classification loss and the similarity loss, the model can learn the features common to the different data distributions, thereby improving the generalization performance of the living body detection algorithm.
A deep neural network performs well after being trained on a large amount of labeled living and non-living data; however, due to the domain shift problem, its performance can degrade substantially when applied to unseen data, for example when a cat-and-dog classifier trained on Garfield cats and Huskies sees other breeds of dogs. One solution is fine-tuning, i.e., retraining the currently trained classifier on the new data with its weights as pre-training parameters, but fine-tuning cannot be used when the new data has no labels. Another solution is domain adaptation, in which the source domain is defined as a sample set with $n_s$ samples, $\{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, and the target domain as a sample set with $n_t$ samples, $\{x_j^t\}_{j=1}^{n_t}$. Samples from the source domain and the target domain are drawn from the joint distributions $P(X_s, Y_s)$ and $Q(X_t, Y_t)$ respectively, with $P \neq Q$. Because the distributions of the source domain and the target domain differ, a deep neural network trained only on the labeled data of the source domain performs much worse on the target domain; domain adaptation uses the unlabeled target-domain data to improve the performance of the deep neural network on the target domain. The general structure of domain adaptation is shown in fig. 1.
Domain adaptation performs feature extraction on the data of the source domain and the target domain simultaneously through a convolutional neural network (CNN) during training. Since the purpose of domain adaptation is to make the features extracted by the CNN the common features of the source and target domains, the CNN shares parameters when extracting features from the two domains; and since only extracting the common features improves performance on the target domain, the CNN must minimize the distance between the source-domain and target-domain features in addition to optimizing the source-domain classification loss. A discriminator judges whether a feature comes from the source domain or the target domain, and the purpose of the CNN is to fool the discriminator so that it judges source-domain features to be target-domain features; when the discriminator can no longer tell the source domain from the target domain, the CNN can be considered to have learned the features common to both domains.
The embodiment of the invention provides a training method of a neural network for living body detection, which is shown in fig. 2 and comprises the following steps:
step 110: the total training sample set is divided into a plurality of sub-training sample sets according to scene features, wherein the sub-training sample sets comprise living samples and non-living samples.
In the embodiment of the invention, the scene features are light features and pose features. For example, the light features may be indoor and outdoor features, or features of cloudy days, rainy days, early-morning sunlight, afternoon sunlight, evening sunlight, and the like; the pose features may be features of a front face, a left side face, a right side face, and the like. The scene features of the embodiment of the invention combine the light features and the pose features, such as indoor front face, indoor left face, indoor right face, outdoor front face, outdoor left face, and outdoor right face. In other embodiments, the scene features may be only light features or only pose features.
In the embodiment of the invention, the total training sample set includes a plurality of face images, the face images include living face images and non-living face images, and the face images are divided according to light features and pose features into $n$ different sub-training sample sets $N_1 \sim N_n$. As shown in FIG. 2, the data of the total training sample set is $(X, Y)$, and the data of the sub-training sample sets obtained by dividing it according to scene features is $(X_i, Y_i)$, $i = 1, \ldots, n$. In other embodiments, the total training sample set may also include a plurality of animal images.
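As an illustration of this partitioning step (not part of the patent text), the following Python sketch shows one way a total sample set might be split into sub-training sample sets keyed by combined scene features; the sample tuple layout and the scene labels are assumptions made for the example.

```python
from collections import defaultdict

# Each sample: (image_id, live_label, light, pose), e.g. ("img_001", 1, "indoor", "front").
# This tuple layout and these labels are illustrative assumptions, not the patent's format.
def split_by_scene(total_set):
    subsets = defaultdict(list)
    for image_id, live_label, light, pose in total_set:
        # Scene feature = combination of a light feature and a pose feature,
        # e.g. "indoor_front" or "outdoor_left".
        subsets[f"{light}_{pose}"].append((image_id, live_label))
    return dict(subsets)

total_set = [
    ("img_001", 1, "indoor", "front"),   # living sample
    ("img_002", 0, "indoor", "front"),   # non-living sample
    ("img_003", 1, "outdoor", "left"),
]
sub_sets = split_by_scene(total_set)     # {"indoor_front": [...], "outdoor_left": [...]}
```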
Continuing with FIG. 2, step 120: training the living body classification of the sub-neural network using the sub-training sample set, and establishing an initial total neural network.
Specifically, in the embodiment of the present invention, as shown in fig. 3, a corresponding sub-neural network $M_i$ is trained on the data $(X_i, Y_i)$ of each sub-training sample set. Each sub-training sample set is trained within its own scene-feature range. For example, the data $(X_1, Y_1)$ of the face image set $N_1$ of indoor faces is input into the convolution layers, the convolution layers being several Conv layers that perform convolution operations on the data of the sub-training sample set, i.e., the face image set of indoor faces, and serving as the feature extractor; the result is then input into a fully connected layer of two fc units, which performs the two-class living/non-living classification, forming the corresponding sub-neural network $M_1$. The data $(X_2, Y_2)$ of the face image set $N_2$ of outdoor faces is likewise input into the convolution layers for convolution operations and then into the fully connected layer for the living/non-living classification, forming the corresponding sub-neural network $M_2$. The data $(X_n, Y_n)$ of the face image set $N_n$ of another scene feature is input into the convolution layers for convolution operations and then into the fully connected layer for the living/non-living classification, forming the corresponding sub-neural network $M_n$, and so on.
The loss function of the sub-neural network includes a living body classification loss; in the embodiment of the invention, the loss function of the sub-neural network is the binary cross-entropy classification loss:

$$L = -\frac{1}{c}\sum_{j=1}^{c}\left[y_j \log p_j + (1 - y_j)\log(1 - p_j)\right]$$

where $L$ denotes the binary cross-entropy loss of the sub-neural network, $c$ denotes the batch size, $y_j$ is the living/non-living label of the $j$-th sample, and $p_j$ is the probability that the $j$-th sample is living.
the method comprises the steps of establishing an initial total neural network G network, performing two categories of living bodies and non-living bodies of training data (X, Y) of a total training sample set to form the initial total neural network G network, for example, performing a plurality of face images, wherein the face images comprise living body samples and non-living body samples of various scene characteristics such as indoor front faces, indoor left faces, indoor right faces, outdoor front faces, outdoor left faces, outdoor right faces and the like, and performing the category of the living bodies and the non-living bodies to form the initial total neural network G network.
Continuing with FIG. 2, step 130: inputting part of the living samples in the total training sample set into the total neural network and into the sub-neural network corresponding to their scene features, respectively, and obtaining the total sample features and sub-sample features of the living samples.
Specifically, in the embodiment of the present invention, as shown in fig. 4, all living samples in the total training sample set are input into the total neural network G and into the sub-neural network $M_i$ corresponding to their scene features, respectively, to obtain the total sample feature $F_G$ and the sub-sample feature $F_M$ of the living samples. In other embodiments, part of the living samples in the total training sample set may be input into the total neural network G and into the sub-neural network $M_i$ corresponding to their scene features to obtain the total sample feature $F_G$ and the sub-sample feature $F_M$ of the living samples.
More specifically, in the embodiment of the present invention, the living sample data of a unit sample set is selected from the data $(X, Y)$ of the total training sample set. According to its scene features, the data $(x_i, y_i)$ is input into the sub-neural network $M_i$ corresponding to those scene features, i.e., into the convolution layers conv of $M_i$ for feature extraction, to obtain the sub-sample feature $F_M$; the living sample data $(x, y)$ of the unit sample set is also input into the total neural network G, i.e., into its convolution layers conv for feature extraction, to obtain the total sample feature $F_G$.
Continuing with FIG. 2, step 140: performing countermeasure training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, wherein the trained total neural network is used for performing living body detection.
The countermeasure training judges, through a scene feature classifier D, whether a feature is the total sample feature $F_G$ or the sub-sample feature $F_M$. When the scene feature classifier D cannot distinguish whether a feature is $F_G$ or $F_M$, i.e., cannot tell whether the feature originates from the total neural network G or from a sub-neural network $M_i$, the total neural network G can be considered to have learned the features common to the different scene features, and the countermeasure training is achieved. As shown in fig. 3, specifically, the total sample feature $F_G$ and the sub-sample feature $F_M$ are input into the scene feature classifier $D_i$ corresponding to the sub-neural network $M_i$ for feature classification. The scene feature classifier $D_i$ judges whether an input feature belongs to the sub-training sample set of the scene features corresponding to $M_i$; when the total sample feature $F_G$ generated by the total neural network G can spoof the scene feature classifier D, the total neural network G has learned the features common to samples of different scene features.
According to the training method for a neural network for living body detection of the embodiment of the invention, the scene feature classifier and the total neural network are subjected to countermeasure training using the total sample features and the sub-sample features, so that the features that the total neural network has in common with the sub-neural networks of the corresponding scene features can be learned. In the embodiment of the invention, these common features are the features distinguishing living from non-living. Such distinguishing features always exist; they are not any of the known hand-crafted features such as LBP (local binary patterns), HOG (histograms of oriented gradients), or SURF (speeded-up robust features), but an unknown, universally present texture feature that is not constrained by any scene. Through countermeasure training, the neural network of the embodiment of the invention can learn from the sub-training sample sets of different scene features the features common to the data of different scenes, i.e., the features distinguishing living from non-living, and can minimize the distance between the total sample features and the sub-sample features, improving the robustness of living body detection. The trained neural network can extract the distinguishing features of living and non-living without being limited by the scene, so it can perform living body detection and overcome the limitation of living body recognition technology being affected by scene features. In addition, the embodiment of the invention computes only the living samples during countermeasure training, because in living body detection the living samples form a closed set while the non-living samples form an open set: the non-living category may be printed paper, video, a 3D face mask, or any object resembling a human face, and such objects generally have no common features. If similarity calculations were also performed on these negative samples, the distinguishing effect of the scene feature classifier on living samples would be reduced, since these non-living samples can be regarded as noise interfering with the normal data distribution. A neural network trained by this method is not affected by scene features when applied to living body detection, which improves the robustness of living body detection.
As a further preferred embodiment of the present invention, step 130, inputting part of the living samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features, respectively, and obtaining the total sample features and the sub-sample features of the living samples, as shown in fig. 5, includes:
Step 131: obtaining the feature map computed and output by each layer of the neural network for the living sample.
Wherein the neural network is the total neural network G or a sub-neural network $M_i$.
Step 132: fusing the obtained feature maps, and performing weighted summation on the fused feature maps to obtain the sample features.
In the embodiment of the invention, feature fusion is performed when the total neural network G and the sub-neural networks $M_i$ extract features. This reduces the convergence difficulty of the scene feature classifier D, so that the classifier can complete the countermeasure learning and the common features among the different sub-training sample sets can be learned; the total neural network G can thus be trained in a domain-adaptive way, is not affected by scene features when performing living body recognition, and the robustness of living body recognition is improved.
More preferably, as shown in fig. 6, before step 132 of fusing the acquired feature maps, the training method further includes:
Step 1311: upsampling the feature maps computed and output by the second layer through the last layer of the neural network. The process of fusing the acquired feature maps is shown in fig. 7: the feature maps output by the last convolution layer Conv of each stage, from the second stage Stage2 to the last stage Stage n, are upsampled to the same size as the feature map of the first stage Stage1; the feature maps of the last convolution layer of all stages are then concatenated, i.e., stacked in the channel dimension, and the stacked features are weighted and summed by an SE (Squeeze-and-Excitation) module to output the fused feature.
The feature distinguishing living from non-living is generally regarded as a texture feature, whereas existing domain-adaptation-based living body detection methods compute feature similarity on high-level semantic features. This increases the training difficulty of the scene feature classifier D and makes it hard to converge, so the total neural network G loses the countermeasure learning against the scene feature classifier D, fails to learn the common features among the sub-training samples under different scene features, and loses the advantage of the domain adaptation method. As shown in FIG. 3, when the total neural network G and the sub-neural networks $M_i$ extract features, the embodiment of the invention performs feature fusion through the feature fusion module: the extracted features fuse the high-level semantic features with the low-level texture features, and the fused high-level and low-level features are weighted by the SE module of the feature fusion module, so that the model can learn the importance of the high-level and low-level features from the training data, reducing the convergence difficulty of the scene feature classifier D.
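The fusion scheme described above (upsampling, channel stacking, SE weighting) can be sketched in PyTorch roughly as follows, assuming `stage_maps` holds the output feature map of the last conv layer of each stage (Stage1 through Stage n); the channel counts and module name are assumptions for the example, and the SE channel weighting here stands in for the weighted summation described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEFusion(nn.Module):
    """Upsample the Stage2..StageN maps to the Stage1 size, stack them in the
    channel dimension, and weight the result with a Squeeze-and-Excitation module."""
    def __init__(self, total_channels, reduction=4):
        super().__init__()
        self.se = nn.Sequential(              # SE: squeeze (global pool) then excite (two 1x1 convs)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(total_channels, total_channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(total_channels // reduction, total_channels, 1), nn.Sigmoid(),
        )

    def forward(self, stage_maps):
        h, w = stage_maps[0].shape[-2:]       # spatial size of the Stage1 map
        ups = [stage_maps[0]] + [
            F.interpolate(m, size=(h, w), mode="bilinear", align_corners=False)
            for m in stage_maps[1:]           # upsample Stage2..StageN to the Stage1 size
        ]
        stacked = torch.cat(ups, dim=1)       # stack in the channel dimension
        return stacked * self.se(stacked)     # channel-weighted fused feature

# Example with three stages of 16, 32, and 64 channels:
maps = [torch.randn(2, 16, 56, 56), torch.randn(2, 32, 28, 28), torch.randn(2, 64, 14, 14)]
fused = SEFusion(total_channels=16 + 32 + 64)(maps)   # shape (2, 112, 56, 56)
```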
In the embodiment of the present invention, step 130 inputs part of the living samples in the total training sample set into the total neural network G and into the sub-neural network $M_i$ corresponding to their scene features, respectively, and fuses the features extracted by the feature extractors through steps 131, 1311, and 132 to obtain the total sample feature $F_G$ and the sub-sample feature $F_M$ of the living samples. The weighting weights used to obtain the total sample feature $F_G$ and the sub-sample feature $F_M$ are different, and the total neural network G and the sub-neural networks $M_i$ do not share weights.
Specifically, in the embodiment of the invention, the data $(x, y)$ of part of the living samples in the total training sample set is input into the total neural network G for forward propagation to obtain the classification output probability $p$, and the features extracted by the feature extractor of the total neural network G are fused to obtain the total sample feature $F_G$.
Specifically, step 140 performs countermeasure training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, wherein the loss function of the total neural network includes the living body classification loss and the generation loss of the total sample features in the scene feature classifier. The living body classification loss is the binary cross-entropy classification loss:

$$L_{classification} = -\frac{1}{c}\sum_{j=1}^{c}\left[y_j \log p_j + (1 - y_j)\log(1 - p_j)\right]$$

where $L_{classification}$ denotes the binary cross-entropy loss, $c$ denotes the batch size, $y_j$ is the living/non-living label of the $j$-th sample, and $p_j$ is the probability that the $j$-th sample is living;
generation loss of total sample features in the scene feature classifier:
Figure GDA0002579426280000102
wherein L is GAN Represents c k Representing the batch size of the kth subsampled training set, D is the scene feature classifier, F G Is the characteristic generated by the total neural network G network;
and the loss function of the total neural network is: $L = L_{classification} + L_{GAN}$.
The loss function of the scene feature classifier comprises the scene feature classification loss:

$$L_D = -\frac{1}{c}\sum_{j=1}^{c}\left[\log D\!\left(F_M^{(j)}\right) + \log\!\left(1 - D\!\left(F_G^{(j)}\right)\right)\right]$$

where $c$ denotes the batch size, $D$ is the scene feature classifier, $F_G$ is the feature generated by the total neural network G, and $F_M$ is the feature generated by the sub-neural network.
The above are the loss function of the sub-neural network, the loss function of the total neural network, and the loss function of the scene feature classifier in the embodiment of the invention. When training the neural network, the embodiment of the invention optimizes all three losses so as to improve the performance of the common features extracted by the total neural network, thereby improving the robustness of the total neural network in living body detection.
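The three losses above can be combined into one adversarial update roughly as in the following PyTorch sketch. It assumes `G` and `M_k` return a pair (living probability, fused feature) and that `D_k` maps a feature to the probability that it comes from the sub-neural network; these interfaces and the use of `BCELoss` to realize the log-likelihood terms are assumptions for the example.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def adversarial_step(G, M_k, D_k, x, y, opt_G, opt_D):
    """One countermeasure update on the living samples (x, y) of a unit sample
    set drawn from the k-th sub-training sample set (y = 1 for living)."""
    p_live, F_G = G(x)                       # total network: probability and fused feature
    with torch.no_grad():
        _, F_M = M_k(x)                      # sub-network feature (M_k is already trained)

    # Scene feature classifier loss L_D: score F_M as 1 (sub-network) and F_G as 0.
    d_loss = bce(D_k(F_M), torch.ones_like(y).float()) + \
             bce(D_k(F_G.detach()), torch.zeros_like(y).float())
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Total network loss L = L_classification + L_GAN: classify living correctly
    # and fool D_k into scoring F_G as a sub-network feature.
    g_loss = bce(p_live, y.float()) + bce(D_k(F_G), torch.ones_like(y).float())
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```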
Wherein, step 130: inputting part of living body samples in the total training sample set into a total neural network and a sub-neural network corresponding to scene characteristics respectively, and obtaining total sample characteristics and sub-sample characteristics of the living body samples; step 140: using the total sample features and the sub-sample features, performing countermeasure training on the scene feature classifier and the total neural network, as shown in fig. 8, includes:
step 130': acquiring a unit sample set in a total training sample set, respectively inputting living samples in the unit sample set into a total neural network and a sub-neural network corresponding to scene characteristics, and acquiring total sample characteristics and sub-sample characteristics of the living samples;
step 140': performing countermeasure training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
the process of obtaining the cell sample set of step 130 'and the challenge training of step 140' is repeated until the number of repetitions reaches a preset number.
When the number of repetitions reaches the preset number, the training is ended.
Further, before step 130' and step 140', the method includes the following steps:
Step 121': initializing the total neural network, the preset number of repetitions, and the repetition counter i = 0;
Step 122': judging whether the repetition count i is smaller than the preset number;
if not, step 160': the training is ended;
if yes, performing step 130', step 140', and step 150'; step 150': back-propagating the gradients of the total neural network and updating the repetition count, i += 1;
then looping back to step 122'.
In the embodiment of the invention, the unit sample sets are used for iterative training, so as to minimize the loss function of the total neural network G and the loss function of the scene feature classifier D.
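Following the sketches above, the iteration of steps 121' through 150' amounts to a simple counted loop; the `sample_units` iterator and the optimizer settings are assumptions for the example.

```python
import torch

def train_total_network(G, sub_nets, classifiers, sample_units, preset_times):
    """sample_units yields (k, x, y): the living samples of a unit sample set
    drawn from the k-th sub-training sample set. Training stops once the
    repetition count reaches preset_times (steps 121'-122')."""
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
    opts_D = [torch.optim.Adam(D.parameters(), lr=1e-4) for D in classifiers]
    for i, (k, x, y) in enumerate(sample_units):
        if i >= preset_times:                # step 122': stop at the preset number
            break
        # steps 130'-150': obtain features, countermeasure training, update count
        adversarial_step(G, sub_nets[k], classifiers[k], x, y, opt_G, opts_D[k])
```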
The invention also comprises a second technical scheme, namely a living body detection method, which comprises the following steps:
inputting the object to be detected into the trained neural network, and outputting whether the object to be detected is a living body or a non-living body; the trained neural network is trained by the training method.
Specifically, as shown in fig. 9, the living body detection method in the embodiment of the invention includes:
Step 210: inputting the object to be detected;
Step 220: feeding it into the trained total neural network to obtain a living probability value P;
Step 230: judging whether the obtained probability value P is greater than 0.5;
if yes, step 240: outputting the result as living;
if not, step 250: outputting the result as non-living.
In the embodiment of the present invention, it is judged whether the probability value P is greater than 0.5; in other embodiments, it may be judged whether the probability value P is greater than a value such as 0.4, 0.6, or 0.7.
The living body detection method provided by the embodiment of the invention is not affected by scene features such as light and pose; it can distinguish living from non-living according to their distinguishing features, can be used for face detection to determine whether a face is genuine or fake, and gives face living body detection higher robustness.
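The detection steps 210 through 250 reduce to thresholding the living probability output by the trained total network; a minimal sketch, again assuming G returns a (probability, feature) pair, with the 0.5 threshold configurable as noted above.

```python
import torch

def detect_living(G, image, threshold=0.5):
    """Steps 210-250: run the trained total neural network on the object to be
    detected and threshold the living probability P."""
    G.eval()
    with torch.no_grad():
        p, _ = G(image.unsqueeze(0))         # add a batch dimension
    return "living" if p.item() > threshold else "non-living"
```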
The present invention further includes a third technical solution, as shown in fig. 10, a computing device 300, including at least one processing unit 310 and at least one storage unit 320, where the storage unit 320 stores a computer program, and when the program is executed by the processing unit, causes the processing unit 310 to execute the steps of the living body detection method described above.
The computing device 300 may also include a power component configured to perform power management of the device, a wired or wireless network interface configured to connect the device to a network, and an input/output (I/O) interface. The device may operate based on an operating system stored in the memory, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
The foregoing description is only of embodiments of the present invention, and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the present invention.

Claims (10)

1. A method of training a neural network for living body detection, the method comprising:
dividing a total training sample set into a plurality of sub-training sample sets according to scene characteristics, wherein the sub-training sample sets comprise living samples and non-living samples;
training living body classification of a sub-neural network by using the sub-training sample set, and establishing an initial total neural network, wherein the total neural network is formed based on the total training sample set;
inputting part of living body samples in the total training sample set to the total neural network and the sub-neural network corresponding to scene characteristics respectively, and obtaining total sample characteristics and sub-sample characteristics of the living body samples;
and performing countermeasure training on the scene feature classifier corresponding to each sub-neural network and the total neural network using the total sample features and the sub-sample features, so that the scene feature classifier cannot distinguish whether a feature is the total sample feature or the sub-sample feature, wherein the trained total neural network is used for performing living body detection.
2. The training method according to claim 1, wherein the inputting the partial living body samples in the total training sample set into the total neural network and the sub-neural network corresponding to scene features, respectively, obtains total sample features and sub-sample features of the living body samples, includes:
acquiring a characteristic diagram of each layer in the neural network, which is calculated and output on the living body sample;
and fusing the obtained feature images, and carrying out weighted summation on the fused feature images to obtain sample features.
3. Training method according to claim 2, characterized in that before fusing the acquired feature images, the method further comprises:
and upsampling the feature images calculated and output from the second layer to the last layer in the neural network.
4. The training method according to claim 2, wherein fusing the acquired feature maps and weighting and summing the fused feature maps to obtain sample features comprises:
and obtaining the weighting weights of the total sample characteristic and the sub-sample characteristic to be different.
5. Training method according to claim 1, characterized in that the scene features comprise illumination features and/or pose features.
6. The training method of claim 1, wherein training the living body classification of the sub-neural network using the sub-training sample set comprises:
wherein the loss function of the sub-neural network comprises a living body classification loss.
7. The training method of claim 1, wherein using the total sample feature and the sub-sample feature to counter-train a scene feature classifier and a total neural network comprises:
the loss function of the total neural network comprises living body classification loss and generation loss of the total sample characteristics in the scene characteristic classifier;
the loss function of the scene feature classifier includes a classification loss belonging to the scene feature.
8. The training method according to claim 1, wherein a part of living samples in the total training sample set are respectively input into the total neural network and the sub-neural network corresponding to scene features, and total sample features and sub-sample features of the living samples are obtained; using the total sample features and the sub-sample features, performing countermeasure training on the scene feature classifier and the total neural network, including:
acquiring a unit sample set in the total training sample set, respectively inputting living samples in the unit sample set into a total neural network and a sub-neural network corresponding to scene characteristics, and acquiring total sample characteristics and sub-sample characteristics of the living samples;
performing countermeasure training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
and repeating the processes of obtaining the unit sample set and the countermeasure training until the repetition times reach the preset times.
9. A living body detection method, the method comprising:
inputting the object to be detected into the trained neural network, and outputting whether the object to be detected is a living body or a non-living body; wherein the trained neural network is trained by the training method of any one of claims 1-8.
10. A computing device comprising at least one processing unit and at least one storage unit, the storage unit storing a computer program which, when executed by the processing unit, causes the processing unit to perform the steps of the living body detection method of claim 9.
CN202010270821.2A 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection Active CN111553202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010270821.2A CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010270821.2A CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Publications (2)

Publication Number Publication Date
CN111553202A CN111553202A (en) 2020-08-18
CN111553202B true CN111553202B (en) 2023-05-16

Family

ID=72000134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010270821.2A Active CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Country Status (1)

Country Link
CN (1) CN111553202B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723215B (en) * 2021-08-06 2023-01-17 浙江大华技术股份有限公司 Training method of living body detection network, living body detection method and device

Citations (8)

Publication number Priority date Publication date Assignee Title
CN107545241A (en) * 2017-07-19 2018-01-05 百度在线网络技术(北京)有限公司 Neural network model is trained and biopsy method, device and storage medium
CN108537152A (en) * 2018-03-27 2018-09-14 百度在线网络技术(北京)有限公司 Method and apparatus for detecting live body
CN108563998A (en) * 2018-03-16 2018-09-21 新智认知数据服务有限公司 Vivo identification model training method, biopsy method and device
CN109583342A (en) * 2018-11-21 2019-04-05 重庆邮电大学 Human face in-vivo detection method based on transfer learning
CN110059546A (en) * 2019-03-08 2019-07-26 深圳神目信息技术有限公司 Vivo identification method, device, terminal and readable medium based on spectrum analysis
CN110059569A (en) * 2019-03-21 2019-07-26 阿里巴巴集团控股有限公司 Biopsy method and device, model evaluation method and apparatus
CN110490076A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Biopsy method, device, computer equipment and storage medium
CN110706152A (en) * 2019-09-25 2020-01-17 中山大学 Face illumination migration method based on generation of confrontation network

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN107563283B (en) * 2017-07-26 2023-01-06 百度在线网络技术(北京)有限公司 Method, device, equipment and storage medium for generating attack sample
US10846888B2 (en) * 2018-09-26 2020-11-24 Facebook Technologies, Llc Systems and methods for generating and transmitting image sequences based on sampled color information

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
CN107545241A (en) * 2017-07-19 2018-01-05 百度在线网络技术(北京)有限公司 Neural network model is trained and biopsy method, device and storage medium
CN108563998A (en) * 2018-03-16 2018-09-21 新智认知数据服务有限公司 Vivo identification model training method, biopsy method and device
CN108537152A (en) * 2018-03-27 2018-09-14 百度在线网络技术(北京)有限公司 Method and apparatus for detecting live body
CN109583342A (en) * 2018-11-21 2019-04-05 重庆邮电大学 Human face in-vivo detection method based on transfer learning
CN110059546A (en) * 2019-03-08 2019-07-26 深圳神目信息技术有限公司 Vivo identification method, device, terminal and readable medium based on spectrum analysis
CN110059569A (en) * 2019-03-21 2019-07-26 阿里巴巴集团控股有限公司 Biopsy method and device, model evaluation method and apparatus
CN110490076A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Biopsy method, device, computer equipment and storage medium
CN110706152A (en) * 2019-09-25 2020-01-17 中山大学 Face illumination migration method based on generation of confrontation network

Non-Patent Citations (1)

Title
Yang Jucheng et al., "A survey of face liveness detection in face recognition," Journal of Tianjin University of Science and Technology, 2019. *

Also Published As

Publication number Publication date
CN111553202A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
Alani et al. Hand gesture recognition using an adapted convolutional neural network with data augmentation
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
CN111444881A (en) Fake face video detection method and device
CN109614907B (en) Pedestrian re-identification method and device based on feature-enhanced guided convolutional neural network
CN109359608B (en) Face recognition method based on deep learning model
CN111797683A (en) Video expression recognition method based on depth residual error attention network
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN109002755B (en) Age estimation model construction method and estimation method based on face image
KR102132407B1 (en) Method and apparatus for estimating human emotion based on adaptive image recognition using incremental deep learning
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN113205002B (en) Low-definition face recognition method, device, equipment and medium for unlimited video monitoring
CN106611156B (en) Pedestrian identification method and system based on self-adaptive depth space characteristics
CN109145704B (en) Face portrait recognition method based on face attributes
CN115131880A (en) Multi-scale attention fusion double-supervision human face in-vivo detection method
CN115240280A (en) Construction method of human face living body detection classification model, detection classification method and device
Yu et al. Pedestrian detection based on improved Faster RCNN algorithm
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN116229528A (en) Living body palm vein detection method, device, equipment and storage medium
CN111553202B (en) Training method, detection method and device for neural network for living body detection
CN112116012B (en) Finger vein instant registration and identification method and system based on deep learning
CN111881803B (en) Face recognition method based on improved YOLOv3
Srininvas et al. A framework to recognize the sign language system for deaf and dumb using mining techniques
Curran et al. The use of neural networks in real-time face detection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant