CN111553202A - Training method, detection method and device of neural network for detecting living body - Google Patents

Training method, detection method and device of neural network for detecting living body

Info

Publication number
CN111553202A
CN111553202A (application CN202010270821.2A)
Authority
CN
China
Prior art keywords: neural network, total, sub, sample, training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010270821.2A
Other languages
Chinese (zh)
Other versions
CN111553202B (en)
Inventor
杨赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010270821.2A priority Critical patent/CN111553202B/en
Publication of CN111553202A publication Critical patent/CN111553202A/en
Application granted granted Critical
Publication of CN111553202B publication Critical patent/CN111553202B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image recognition, and particularly relates to a training method, a detection method and a device of a neural network for living body detection. The training method comprises: dividing a total training sample set into a plurality of sub-training sample sets according to scene features, wherein the sub-training sample sets comprise living samples and non-living samples; training the living body classification of a sub-neural network with each sub-training sample set, and establishing an initial total neural network; respectively inputting part of the living samples in the total training sample set into the total neural network and into the sub-neural network corresponding to their scene features, to obtain total sample features and sub-sample features of the living samples; and performing adversarial training on a scene feature classifier and the total neural network by using the total sample features and the sub-sample features, the trained total neural network being used for living body detection. The trained neural network can accurately detect living targets in different scenes.

Description

Training method, detection method and device of neural network for detecting living body
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a training method, a detection method and a device of a neural network for living body detection.
Background
Biometric technology, especially face recognition, has developed and advanced greatly in recent years, in applications such as attendance systems, mobile phone unlocking, and face-scan payment. However, most current face recognition systems do not perform living body detection and are therefore easily spoofed by photos or videos. Liveness detection is generally defined as detecting whether a given face comes from a real person or from a forgery such as a printed face photograph, a face in a video, or a 3D face mask. Because liveness detection is of great importance to the safety of a face recognition system, many liveness detection algorithms have been proposed, but cross-scene liveness detection remains difficult.
Disclosure of Invention
The invention mainly solves the technical problem of cross-scene living body detection, and provides a training method of a neural network for living body detection, a living body detection method, and a computing device.
In order to solve the above technical problem, the invention adopts the following technical solution: a training method of a neural network for performing living body detection is provided, the training method including:
dividing the total training sample set into a plurality of sub-training sample sets according to scene characteristics, wherein the sub-training sample sets comprise living samples and non-living samples;
training the living body classification of the sub-neural network by using a sub-training sample set, and establishing an initial total neural network;
respectively inputting part of the living body samples in the total training sample set into a total neural network and a sub-neural network corresponding to the scene characteristics to obtain the total sample characteristics and the sub-sample characteristics of the living body samples;
and performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features, wherein the trained total neural network is used for performing living body detection.
Wherein, partial living body samples in the total training sample set are respectively input into a total neural network and a sub-neural network corresponding to the scene characteristics, and the total sample characteristics and the sub-sample characteristics of the living body samples are obtained, which comprises the following steps:
acquiring the feature map that each layer of the neural network computes and outputs for the living body sample;
and fusing the obtained feature maps, and performing weighted summation on the fused feature maps to obtain sample features.
Before fusing the acquired feature maps, the training method further comprises: up-sampling the feature maps output by the second layer through the last layer of the neural network.
Further, the acquired feature maps are fused and the fused feature maps are weighted and summed to obtain the sample features, wherein the weighting weights used for the total sample features differ from those used for the sub-sample features.
Wherein the scene features include lighting features and/or pose features.
Specifically, the living body classification of a sub-neural network is trained using a sub-training sample set, wherein the loss function of the sub-neural network includes a living body classification loss.
Specifically, a scene feature classifier and the total neural network are subjected to adversarial training by using the total sample features and the sub-sample features, wherein the loss function of the total neural network comprises the living body classification loss and the generation loss of the total sample features in the scene feature classifier; the loss function of the scene feature classifier includes a loss of belonging to the scene feature classification.
Respectively inputting part of the living body samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features, acquiring the total sample features and the sub-sample features of the living body samples, and performing adversarial training on the scene feature classifier and the total neural network by using those features, comprises the following steps:
acquiring a unit sample set in the total training sample set, and respectively inputting the living samples in the unit sample set into the total neural network and the sub-neural network corresponding to the scene features to acquire the total sample features and sub-sample features of the living samples;
performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
and repeating the process of acquiring the unit sample set and the adversarial training until the number of repetitions reaches a preset number.
The invention also includes a second technical solution, a living body detection method comprising:
inputting the object to be detected into the trained neural network, and outputting the object to be detected as a living body or a non-living body; wherein, the trained neural network is obtained by the training of the training method.
The present invention also includes a third technical solution, a computing device comprising at least one processing unit and at least one storage unit, the storage unit storing a computer program which, when executed by the processing unit, causes the processing unit to perform the steps of the above-described living body detection method.
The invention has the following beneficial effects. Different from the prior art, the training method of the neural network for living body detection according to the embodiment of the present invention performs adversarial training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, and can thereby learn the features common to the total neural network and the sub-neural networks corresponding to the scene features; these common features are the features distinguishing living bodies from non-living bodies. Through adversarial training, the neural network of the embodiment of the invention can learn, from the sub-training sample sets of different scene features, the features common to data of different scenes, and can therefore be applied to distinguishing living and non-living bodies across scenes. In living body detection the living bodies form a closed set rather than an open set, and the embodiment of the invention computes only the living samples during adversarial training; this reduces the noise that non-living samples would introduce and improves the discrimination of the scene feature classifier on living samples. Applied to living body detection, the training method is unaffected by scene features and can improve the robustness of living body detection.
Drawings
FIG. 1 is a block diagram of one embodiment of prior art domain adaptation;
FIG. 2 is a schematic diagram illustrating the steps of an embodiment of the neural network training method for living body detection according to the present invention;
FIG. 3 is a training block diagram of an embodiment of a sub-neural network of the present invention;
FIG. 4 is a training block diagram of an embodiment of the total neural network of the present invention;
FIG. 5 is a schematic diagram of one embodiment of obtaining the total sample features and sub-sample features of a living sample according to the present invention;
FIG. 6 is a schematic diagram illustrating the steps of another embodiment of obtaining the total sample features and sub-sample features of a living sample according to the present invention;
FIG. 7 is a schematic view of the feature fusion module of the present invention;
FIG. 8 is a schematic diagram illustrating the steps of another embodiment of the neural network training method for living body detection according to the present invention;
FIG. 9 is a schematic diagram illustrating the steps of one embodiment of the living body detection method of the present invention;
FIG. 10 is a block diagram of an embodiment of a computing device.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Living body detection is strongly affected by the scene, such as illumination, the pose of the living body, and imaging quality, and cross-scene living body detection is difficult to handle. The effect of a living body detection algorithm can be improved by learning features common to living and non-living data following different data distributions. Besides the classification loss, a loss on the similarity of features that a feature extractor such as a deep neural network extracts from data with different distributions is usually added; by optimizing the classification loss and the similarity loss, the model can learn features common to different data distributions, improving the generalization of the living body detection algorithm.
A deep neural network performs well after being trained on large-scale labeled living and non-living data, but due to domain drift its performance drops sharply when applied to unseen data, for example, when a cat-and-dog classifier trained on particular breeds meets cats and dogs of other breeds. One solution is fine-tuning, i.e., retraining the current classifier on new data using its parameters as pre-training; however, when the new data has no labels, fine-tuning cannot be used. Another solution is domain adaptation. In domain adaptation, the source domain is defined as a set of n_s labeled samples D_s = {(x_i^s, y_i^s)}, i = 1, …, n_s, and the target domain as a set of n_t samples D_t = {x_j^t}, j = 1, …, n_t. The joint distributions from which the source domain and the target domain are sampled are P(X_s, Y_s) and Q(X_t, Y_t) respectively, with P ≠ Q. Because the source and target domains are distributed differently, training the deep neural network only on the labeled source-domain data would greatly degrade its performance on the target domain; domain adaptation uses unlabeled target-domain data to improve the network's performance on the target domain. The general structure of domain adaptation is shown in FIG. 1.
In training, a convolutional neural network CNN extracts features from source-domain and target-domain data simultaneously, the CNN sharing its feature-extraction parameters between the two domains. The goal of domain adaptation is that the features extracted by the CNN be features common to the source and target domains; only by extracting common features can the CNN's performance on the target domain be improved. Therefore, besides optimizing the source-domain classification loss, the CNN must minimize the distance between source-domain and target-domain features. A discriminator judges whether a feature comes from the source domain or the target domain, and the CNN aims to fool the discriminator so that it judges source-domain features the same as target-domain features. When the discriminator can no longer distinguish the source domain from the target domain, the CNN can be considered to have learned their common features.
An embodiment of the present invention provides a training method for a neural network for performing in-vivo detection, as shown in fig. 2, the training method includes:
step 110: and dividing the total training sample set into a plurality of sub-training sample sets according to the scene characteristics, wherein the sub-training sample sets comprise living samples and non-living samples.
In the embodiment of the present invention, the scene features are light features and pose features. For example, the light features include indoor and outdoor features, or features of cloudy days, rainy days, early morning sunlight, midday sunlight, evening sunlight, and the like; the pose features can be features of a frontal face, a left side face, a right side face, and the like. The embodiment of the invention combines light features and pose features into scene features such as indoor frontal face, indoor left side face, indoor right side face, outdoor frontal face, outdoor left side face and outdoor right side face. In other embodiments, the scene features may be only light features or only pose features.
In the embodiment of the invention, the total training sample set comprises a plurality of face images, including living face images and non-living face images, and the face images are divided into n different sub-training sample sets N_1 through N_n according to light features and pose features. As shown in FIG. 2, the data of the total training sample set is (X, Y), and the data of the sub-training sample sets obtained by dividing it according to scene features is (X_i, Y_i), i = 1, …, n. In other embodiments, the total training sample set may also include a plurality of animal images.
Continuing as shown in FIG. 2, step 120: training the living body classification of the sub-neural networks by using the sub-training sample sets to establish an initial total neural network.
Specifically, in the embodiment of the present invention, as shown in FIG. 3, the data (X_i, Y_i) of each sub-training sample set trains the corresponding sub-neural network M_i, each sub-training sample set being trained within its own scene feature range. For example, the data (X_1, Y_1) of the face image set N_1 of indoor frontal faces is input into the convolutional layers, which are a plurality of Conv layers serving as the feature extractor, to perform convolution operations on the data and extract features; the result is input into the fully connected layers, which are two fc layers performing the living/non-living binary classification, forming the corresponding sub-neural network M_1. The data (X_2, Y_2) of the face image set N_2 of outdoor frontal faces is likewise input into convolutional layers and then into two fully connected layers for living/non-living binary classification, forming the corresponding sub-neural network M_2. The data (X_n, Y_n) of the face image set N_n of another scene feature is input into convolutional layers and then into two fully connected layers for living/non-living binary classification, forming the corresponding sub-neural network M_n, and so on.
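A minimal PyTorch sketch of one such sub-neural network M_i is given below; the exact layer counts and channel sizes are assumptions, since the patent only specifies several Conv layers followed by two fc layers ending in a living/non-living binary classification:

import torch
import torch.nn as nn

class SubNet(nn.Module):
    # Sub-neural network M_i: Conv feature extractor plus two fc layers.
    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(          # "a plurality of Conv" layers
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(        # the two fc layers
            nn.Flatten(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),                   # logit for living vs. non-living
        )

    def forward(self, x):
        return self.classifier(self.features(x))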
In the embodiment of the present invention, the loss function of the sub-neural network is the binary cross-entropy loss:

L = -(1/c) Σ_{j=1}^{c} [ y_j·log(p_j) + (1 - y_j)·log(1 - p_j) ]

wherein L represents the binary cross-entropy loss of the sub-neural network, c represents the batch size, y_j is the living/non-living label of the j-th sample, and p_j is the probability that the j-th sample is living;
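In PyTorch this loss corresponds directly to the built-in binary cross-entropy criterion. A sketch, assuming an instance sub_net of the SubNet sketch above together with tensors x_batch (images) and y_batch (0/1 labels):

import torch
import torch.nn.functional as F

logits = sub_net(x_batch)                 # shape (c, 1)
p = torch.sigmoid(logits).squeeze(1)      # p_j: probability the j-th sample is living
loss = F.binary_cross_entropy(p, y_batch.float())   # mean over the batch of size c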
the method comprises the steps of establishing an initial total neural network G network, carrying out two classifications of a living body and a non-living body which are trained on data (X, Y) of a total training sample set to form the initial total neural network G network, for example, carrying out training on all face image data to carry out classification of the living body and the non-living body to form the initial total neural network G network for a plurality of face images, wherein the plurality of face images comprise living body samples and non-living body samples with various scene characteristics such as an indoor front face, an indoor left side face, an indoor right side face, an outdoor front face, an outdoor left side face and an outdoor right side face, and the like, but are not classified according to the scene characteristics.
Continuing as shown in FIG. 2, step 130: respectively inputting part of the living samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features to obtain the total sample features and the sub-sample features of the living samples.
Specifically, in the embodiment of the present invention, as shown in FIG. 4, all living samples in the total training sample set are input both into the total neural network G and into the sub-neural network M_i corresponding to their scene features, obtaining the total sample features F_G and the sub-sample features F_M of the living samples. In other embodiments, only part of the living samples in the total training sample set may be input into the total neural network G and the corresponding sub-neural network M_i to obtain the total sample features F_G and the sub-sample features F_M.
More specifically, in the embodiment of the present invention, living sample data in a unit sample set is selected from the data (X, Y) of the total training sample set. According to its scene features, the data (x_i, y_i) is input into the sub-neural network M_i corresponding to those scene features, i.e., into the convolutional layers conv of M_i, for feature extraction, obtaining the sub-sample feature F_M; the living sample data (x, y) selected from the unit sample set is likewise input into the total neural network G, i.e., into its convolutional layers conv, for feature extraction, obtaining the total sample feature F_G.
Continuing as shown in FIG. 2, step 140: performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features, wherein the trained total neural network is used for performing living body detection.
The adversarial training is performed through a scene feature classifier D, which judges whether a feature is a total sample feature F_G or a sub-sample feature F_M. When the scene feature classifier D cannot tell whether a feature is F_G or F_M, i.e., cannot distinguish whether the feature originates from the total neural network G or from the sub-neural network M_i, the total neural network G can be considered to have learned the features common to different scene features, and the adversarial training is accomplished. As shown in FIG. 3, specifically, the total sample feature F_G and the sub-sample feature F_M are input into the scene feature classifier D_i corresponding to the sub-neural network M_i for feature classification; the classifier D_i judges whether an input feature belongs to the sub-training sample set of scene features corresponding to M_i. When the total sample feature F_G generated by the total neural network G can deceive the classifier D_i, the total neural network G has learned the features common to samples of different scene features.
The training method of the neural network for living body detection in the embodiment of the invention performs adversarial training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, and can learn the features common to the total neural network and the sub-neural networks corresponding to the scene features. In the embodiment of the invention these common features are the features distinguishing living bodies from non-living bodies; such distinguishing features exist in both, and they are not any known hand-crafted feature such as LBP (local binary patterns), HOG (which captures contour information) or SURF, but unknown, ubiquitous texture features that are not constrained by any scene. Through adversarial training, the neural network of the embodiment of the invention can learn from the sub-training sample sets of different scene features the features common to different scene data, i.e., the distinguishing features of living and non-living bodies, minimizing the distance between the total sample classification set and the sub-sample classification sets and improving the robustness of living body detection. The trained neural network can extract the distinguishing features of living and non-living bodies without being limited by the scene, and can therefore be used in living body detection to overcome the limitation that living body recognition is affected by scene features. In addition, only living samples are computed during adversarial training, because in living body detection the living bodies form a closed set while the non-living bodies form an open set: a non-living sample can be printed paper, a video, a 3D face mask, or any object resembling a face, and such objects usually share no common features. If the similarity calculation were also performed on these negative samples, the discrimination of the scene feature classifier on living samples would be reduced, because non-living samples can be regarded as noise that interferes with the normal data distribution. When the neural network training method is applied to living body detection, it is unaffected by scene features and can improve the robustness of living body detection.
As a further preferable scheme of the embodiment of the present invention, step 130 (inputting part of the living samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features, respectively, and acquiring the total sample features and sub-sample features of the living samples) comprises, as shown in FIG. 5:
step 131: and acquiring a feature map which is calculated and output by each layer in the neural network on the living body sample.
Wherein the neural network is the total neural network G or a sub-neural network M_i.
Step 132: fusing the acquired feature maps, and performing weighted summation on the fused feature maps to obtain the sample features.
When extracting features from the total neural network G and the sub-neural networks M_i, the embodiment of the invention performs feature fusion, reducing the convergence difficulty of the scene feature classifier D so that it can complete the adversarial learning and learn the features common to the different sub-training sample sets. Through this domain-adaptive training, the total neural network G is unaffected by scene features when performing living body recognition, improving the robustness of living body recognition.
More preferably, as shown in fig. 6, before the step 132 fuses the acquired feature maps, the training method further includes:
step 1311: and (4) up-sampling feature maps output by computing from the second layer to the last layer in the neural network. Specifically, as shown in fig. 7, the process of fusing the acquired feature maps includes up-sampling the feature map output by the last convolutional layer Conv from the second layer Stage2 to the last layer Stage n to make the size of the feature map be the same as that of the first layer Stage1, weighting the feature maps of the last convolutional layers of all the neural network layers, that is, stacking the feature maps in the channel dimension, and then weighting and summing the fused features by a SE (Squeeze-Extract) module to output a fused feature.
The features distinguishing living from non-living bodies are generally regarded as texture features, while existing domain-adaptation-based living body detection methods use high-level semantic features for computing feature similarity. This increases the training difficulty of the scene feature classifier D and makes it hard to converge; the total neural network G then loses its adversarial learning against the scene feature classifier D, fails to learn the features common to the sub-training samples under different scene features, and the advantage of the domain adaptation method is lost. As shown in FIG. 3, when extracting features from the total neural network G and the sub-neural networks M_i, the embodiment of the invention performs feature fusion through the feature fusion module: the extracted features fuse high-level semantic features with low-level texture features, and the fused high-level and low-level features are weighted by the SE feature fusion module, so that the model can learn the importance of the high-level and low-level features from the training data, reducing the convergence difficulty of the scene feature classifier D.
In the embodiment of the present invention, step 130 inputs part of the living samples in the total training sample set into the total neural network G and into the sub-neural network M_i corresponding to the scene features, respectively; the features produced by the feature extractors are fused through steps 131, 1311 and 132 to obtain the total sample feature F_G and the sub-sample feature F_M of the living sample. The weighting weights used to obtain the total sample feature F_G differ from those used for the sub-sample feature F_M, and the total neural network G and the sub-neural networks M_i do not share weights.
Specifically, in the embodiment of the present invention, the data (x, y) of part of the living samples in the total training sample set is input into the total neural network G for forward propagation to obtain the classification output probability p, and the features extracted by the feature extractor of the total neural network G are fused to obtain the total sample feature F_G.
Specifically, step 140 performs adversarial training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, wherein the loss function of the total neural network comprises the living body classification loss and the generation loss of the total sample features in the scene feature classifier. The living body classification loss is the binary cross-entropy loss:

L_classification = -(1/c) Σ_{j=1}^{c} [ y_j·log(p_j) + (1 - y_j)·log(1 - p_j) ]

wherein L_classification represents the binary cross-entropy loss, c represents the batch size, y_j is the living/non-living label of the j-th sample, and p_j is the probability that the j-th sample is living.

The generation loss of the total sample features in the scene feature classifier is:

L_GAN = -(1/c_k) Σ_{j=1}^{c_k} log D(F_G)

wherein L_GAN represents the generation loss, c_k represents the batch size of the k-th sub-sample training set, D is the scene feature classifier, and F_G is a feature generated by the total neural network G.

The loss function of the total neural network is: L = L_classification + L_GAN.

The loss function of the scene feature classifier comprises the loss of belonging to the scene feature classification:

L_D = -(1/c) Σ_{j=1}^{c} [ log D(F_M) + log(1 - D(F_G)) ]

wherein c represents the batch size, D is the scene feature classifier, F_G is a feature generated by the total neural network G, and F_M is a feature generated by the sub-neural network.
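Under the reconstruction above, the three losses can be written as the following PyTorch sketch, where D is assumed to output the probability that a feature came from a sub-neural network, F_G and F_M are the fused features, and L_classification is the cross-entropy term already computed:

import torch

eps = 1e-8                      # numerical floor inside the logarithms
p_G = D(F_G)                    # D's belief that F_G is a sub-network feature
p_M = D(F_M)

# Generation loss: the total network G tries to make D accept F_G.
L_GAN = -torch.log(p_G + eps).mean()

# Total-network loss: classification loss plus generation loss.
L_total = L_classification + L_GAN

# Scene-feature-classifier loss: accept F_M, reject F_G.
L_D = -(torch.log(p_M + eps) + torch.log(1 - p_G + eps)).mean()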
In the embodiment of the present invention, when the neural network is trained, the loss of the sub-neural networks, the loss of the total neural network, and the loss of the scene feature classifier are all optimized, improving the expression of the extracted common features in the total neural network and thereby the robustness of the total neural network in living body detection.
Wherein step 130 (respectively inputting part of the living samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features to obtain the total sample features and sub-sample features of the living samples) and step 140 (performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features) proceed, as shown in FIG. 8, as follows:
Step 130': acquiring a unit sample set from the total training sample set, and respectively inputting the living samples in the unit sample set into the total neural network and the sub-neural network corresponding to the scene features to acquire the total sample features and sub-sample features of the living samples;
Step 140': performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
and repeating step 130' of acquiring the unit sample set and step 140' of adversarial training until the number of repetitions reaches a preset number.
If the number of repetitions reaches the preset number, step 160': the training is finished.
Further, before step 130' and step 140', the method further comprises:
Step 121': initializing the preset number, and initializing the repetition count i of the total neural network to 0;
Step 122': judging whether the repetition count i is less than the preset number;
if not, step 160': the training is finished;
if yes, executing step 130', step 140' and step 150'. Step 150': back-propagating the gradients of the total neural network for this repetition, and updating the repetition count with i += 1;
then looping back to step 122'.
According to the embodiment of the invention, iterative training is carried out through the unit sample sets, so as to minimize the loss function of the total neural network G and the loss function of the scene feature classifier D.
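A condensed sketch of this iterative procedure (steps 121' through 160') is given below, assuming optimizers opt_G and opt_D; sample_unit_set, fuse_G, fuse_M, classification_loss, gan_loss and loss_D are assumed helpers implementing the pieces described above, and keeping the sub-neural networks fixed is an assumption of the sketch:

for i in range(preset_number):                    # step 122': i < preset number
    batch = sample_unit_set(total_training_set)   # step 130': draw a unit sample set
    live = [s for s in batch if s["label"] == 1]  # adversarial terms use living samples only

    F_G = fuse_G(total_net, live)                 # fused total sample features
    F_M = fuse_M(sub_nets, live)                  # fused per-scene sub-sample features

    opt_D.zero_grad()                             # step 140': update classifier D
    loss_D(F_G.detach(), F_M.detach()).backward()
    opt_D.step()

    opt_G.zero_grad()                             # then update the total network G
    (classification_loss(total_net, batch) + gan_loss(F_G)).backward()
    opt_G.step()                                  # step 150': back-propagate and step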
The invention also includes a second technical solution, a living body detection method comprising:
inputting the object to be detected into the trained neural network, and outputting the object to be detected as a living body or a non-living body; wherein, the trained neural network is obtained by the training of the training method.
Specifically, as shown in fig. 9, the living body detection method in the embodiment of the present invention specifically includes:
step 210: inputting an object to be detected;
step 220: sending the data to a trained total neural network to obtain a living body probability value P;
step 230: and judging whether the obtained probability value P is more than 0.5.
If yes, step 240 outputs the result as a living body;
if not, step 250 outputs the result as a non-living body.
In the embodiment of the present invention, it is determined whether the probability value P is greater than 0.5; in other embodiments, it may instead be determined whether the probability value P is greater than 0.4, 0.6, 0.7, or the like.
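Continuing the sketches above, the detection of FIG. 9 reduces to a threshold test on the output probability; the function name detect_liveness is illustrative, and the 0.5 threshold follows the embodiment:

import torch

def detect_liveness(total_net, face_image, threshold=0.5):
    p = torch.sigmoid(total_net(face_image)).item()   # living body probability P
    return "living body" if p > threshold else "non-living body"   # steps 230-250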
The living body detection method provided by the embodiment of the invention is unaffected by scene features such as light and pose; it can distinguish living bodies according to the features distinguishing living from non-living bodies, can be used in face detection to judge whether a face is a real face or a forged face, and provides high robustness in face living body detection.
The present invention also includes a third technical solution; as shown in FIG. 10, a computing device 300 comprises at least one processing unit 310 and at least one storage unit 320, the storage unit 320 storing a computer program which, when executed by the processing unit 310, causes the processing unit 310 to perform the steps of the above-described living body detection method.
The computing device 300 may further include a power supply component configured to perform power management of the device, a wired or wireless network interface configured to connect the device to a network, and an input/output (I/O) interface. The device may operate based on an operating system stored in the storage unit, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
The above description is only an embodiment of the present invention and does not limit the patent scope of the present invention; all equivalent structural or process transformations made using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, are likewise included within the patent protection scope of the present invention.

Claims (10)

1. A training method of a neural network for performing living body detection, the training method comprising:
dividing a total training sample set into a plurality of sub-training sample sets according to scene characteristics, wherein the sub-training sample sets comprise living samples and non-living samples;
training the living body classification of the sub-neural network by using the sub-training sample set to establish an initial total neural network;
respectively inputting part of living body samples in the total training sample set into the total neural network and the sub-neural network corresponding to scene features to obtain total sample features and sub-sample features of the living body samples;
and performing adversarial training on a scene feature classifier and the total neural network by using the total sample features and the sub-sample features, wherein the trained total neural network is used for performing living body detection.
2. The training method according to claim 1, wherein the inputting the partial living body samples in the total training sample set to the total neural network and the sub-neural network corresponding to scene features respectively to obtain total sample features and sub-sample features of the living body samples comprises:
acquiring a feature map that each layer of the neural network computes and outputs for the living body sample;
and fusing the obtained feature maps, and performing weighted summation on the fused feature maps to obtain sample features.
3. The training method of claim 2, wherein before fusing the acquired feature maps, the method further comprises:
and (4) up-sampling feature maps output by computing from the second layer to the last layer in the neural network.
4. The training method according to claim 2, wherein fusing the obtained feature maps and performing weighted summation on the fused feature maps to obtain sample features comprises:
wherein the weighting weights of the total sample feature and the sub-sample feature are different.
5. Training method according to claim 1, wherein the scene features comprise lighting features and/or pose features.
6. The training method of claim 1, wherein the training of the in-vivo classification of the sub-neural network using the sub-training sample set comprises:
wherein the loss function of the sub-neural network comprises a living body classification loss.
7. The training method according to claim 1, wherein the performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features comprises:
wherein the loss function of the total neural network comprises living body classification loss and generation loss of the total sample feature in the scene feature classifier;
the loss function of the scene feature classifier includes a loss belonging to the scene feature classification.
8. The training method according to claim 1, wherein partial living body samples in the total training sample set are respectively input to the total neural network and the sub-neural network corresponding to scene features, and total sample features and sub-sample features of the living body samples are obtained; and adversarial training is performed on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features, comprising:
acquiring a unit sample set in the total training sample set, and respectively inputting living samples in the unit sample set into the total neural network and the sub-neural network corresponding to scene features to acquire total sample features and sub-sample features of the living samples;
performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
and repeating the processes of acquiring the unit sample set and the adversarial training until the number of repetitions reaches a preset number.
9. A living body detection method, the method comprising:
inputting the object to be detected into the trained neural network, and outputting the object to be detected as a living body or a non-living body; wherein the trained neural network is trained by the training method of any one of claims 1-8.
10. A computing device comprising at least one processing unit and at least one storage unit, the storage unit storing a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of the living body detection method of claim 9.
CN202010270821.2A 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection Active CN111553202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010270821.2A CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010270821.2A CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Publications (2)

Publication Number Publication Date
CN111553202A true CN111553202A (en) 2020-08-18
CN111553202B CN111553202B (en) 2023-05-16

Family

ID=72000134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010270821.2A Active CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Country Status (1)

Country Link
CN (1) CN111553202B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723215A (en) * 2021-08-06 2021-11-30 浙江大华技术股份有限公司 Training method of living body detection network, living body detection method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545241A (en) * 2017-07-19 2018-01-05 百度在线网络技术(北京)有限公司 Neural network model is trained and biopsy method, device and storage medium
CN108537152A (en) * 2018-03-27 2018-09-14 百度在线网络技术(北京)有限公司 Method and apparatus for detecting live body
CN108563998A (en) * 2018-03-16 2018-09-21 新智认知数据服务有限公司 Vivo identification model training method, biopsy method and device
US20190034703A1 (en) * 2017-07-26 2019-01-31 Baidu Online Network Technology (Beijing) Co., Ltd. Attack sample generating method and apparatus, device and storage medium
CN109583342A (en) * 2018-11-21 2019-04-05 重庆邮电大学 Human face in-vivo detection method based on transfer learning
CN110059546A (en) * 2019-03-08 2019-07-26 深圳神目信息技术有限公司 Vivo identification method, device, terminal and readable medium based on spectrum analysis
CN110059569A (en) * 2019-03-21 2019-07-26 阿里巴巴集团控股有限公司 Biopsy method and device, model evaluation method and apparatus
CN110490076A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Biopsy method, device, computer equipment and storage medium
CN110706152A (en) * 2019-09-25 2020-01-17 中山大学 Face illumination migration method based on generation of confrontation network
US20200098139A1 (en) * 2018-09-26 2020-03-26 Facebook Technologies, Llc Systems and Methods for Generating and Transmitting Image Sequences Based on Sampled Color Information

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545241A (en) * 2017-07-19 2018-01-05 百度在线网络技术(北京)有限公司 Neural network model is trained and biopsy method, device and storage medium
US20190034703A1 (en) * 2017-07-26 2019-01-31 Baidu Online Network Technology (Beijing) Co., Ltd. Attack sample generating method and apparatus, device and storage medium
CN108563998A (en) * 2018-03-16 2018-09-21 新智认知数据服务有限公司 Vivo identification model training method, biopsy method and device
CN108537152A (en) * 2018-03-27 2018-09-14 百度在线网络技术(北京)有限公司 Method and apparatus for detecting live body
US20200098139A1 (en) * 2018-09-26 2020-03-26 Facebook Technologies, Llc Systems and Methods for Generating and Transmitting Image Sequences Based on Sampled Color Information
CN109583342A (en) * 2018-11-21 2019-04-05 重庆邮电大学 Human face in-vivo detection method based on transfer learning
CN110059546A (en) * 2019-03-08 2019-07-26 深圳神目信息技术有限公司 Vivo identification method, device, terminal and readable medium based on spectrum analysis
CN110059569A (en) * 2019-03-21 2019-07-26 阿里巴巴集团控股有限公司 Biopsy method and device, model evaluation method and apparatus
CN110490076A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Biopsy method, device, computer equipment and storage medium
CN110706152A (en) * 2019-09-25 2020-01-17 中山大学 Face illumination migration method based on generation of confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨巨成等: "人脸识别活体检测综述" (Yang Jucheng et al., "A survey of liveness detection for face recognition") *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723215A (en) * 2021-08-06 2021-11-30 浙江大华技术股份有限公司 Training method of living body detection network, living body detection method and device
WO2023011606A1 (en) * 2021-08-06 2023-02-09 Zhejiang Dahua Technology Co., Ltd. Training method of live body detection network, method and apparatus of live body detectoin

Also Published As

Publication number Publication date
CN111553202B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109145979B (en) Sensitive image identification method and terminal system
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN109815826B (en) Method and device for generating face attribute model
CN108520216B (en) Gait image-based identity recognition method
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN109558810B (en) Target person identification method based on part segmentation and fusion
CN109614907B (en) Pedestrian re-identification method and device based on feature-enhanced guided convolutional neural network
CN106778796B (en) Human body action recognition method and system based on hybrid cooperative training
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN111797683A (en) Video expression recognition method based on depth residual error attention network
KR101777601B1 (en) Distinction method and system for characters written in caoshu characters or cursive characters
CN112784763A (en) Expression recognition method and system based on local and overall feature adaptive fusion
JP2017062778A (en) Method and device for classifying object of image, and corresponding computer program product and computer-readable medium
US20100111375A1 (en) Method for Determining Atributes of Faces in Images
KR101687217B1 (en) Robust face recognition pattern classifying method using interval type-2 rbf neural networks based on cencus transform method and system for executing the same
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN109145704B (en) Face portrait recognition method based on face attributes
Wang et al. Study on the method of transmission line foreign body detection based on deep learning
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN110633689B (en) Face recognition model based on semi-supervised attention network
CN111553202A (en) Training method, detection method and device of neural network for detecting living body
CN109815887B (en) Multi-agent cooperation-based face image classification method under complex illumination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant