CN111553202A - Training method, detection method and device of neural network for detecting living body - Google Patents

Training method, detection method and device of neural network for detecting living body

Info

Publication number
CN111553202A
CN111553202A (application CN202010270821.2A)
Authority
CN
China
Prior art keywords: neural network, total, sub, sample, training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010270821.2A
Other languages
Chinese (zh)
Other versions
CN111553202B (en)
Inventor
杨赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010270821.2A priority Critical patent/CN111553202B/en
Publication of CN111553202A publication Critical patent/CN111553202A/en
Application granted granted Critical
Publication of CN111553202B publication Critical patent/CN111553202B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image recognition, and particularly relates to a training method, a detection method and a device of a neural network for living body detection. The training method comprises: dividing a total training sample set into a plurality of sub-training sample sets according to scene features, wherein the sub-training sample sets comprise living samples and non-living samples; training the living body classification of a sub-neural network with each sub-training sample set, and establishing an initial total neural network; respectively inputting part of the living samples in the total training sample set into the total neural network and into the sub-neural network corresponding to their scene features, to obtain total sample features and sub-sample features of the living samples; and performing adversarial training on a scene feature classifier and the total neural network by using the total sample features and the sub-sample features, the trained total neural network being used for living body detection. The trained neural network can accurately detect living targets in different scenes.

Description

Training method, detection method and device of neural network for detecting living body
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a training method, a detection method and a device of a neural network for living body detection.
Background
Biometric technology, especially face recognition, has developed and advanced greatly in recent years, in applications such as attendance systems, mobile phone unlocking, and face-scan payment. However, most current face recognition systems do not perform living body detection and are therefore easily spoofed by photos or videos. Liveness detection is generally defined as detecting whether a given face comes from a real person or from a forgery such as a printed face photograph, a face in a video, or a 3D face mask. Because liveness detection is of great importance to the safety of a face recognition system, many liveness detection algorithms have been proposed, but cross-scene liveness detection remains difficult.
Disclosure of Invention
The invention mainly solves the technical problem of cross-scene living body detection, and provides a training method of a neural network for living body detection, a living body detection method, and a computing device.
In order to solve the above technical problem, the invention adopts the following technical solution: a training method of a neural network for performing living body detection is provided, the training method including:
dividing the total training sample set into a plurality of sub-training sample sets according to scene characteristics, wherein the sub-training sample sets comprise living samples and non-living samples;
training the living body classification of the sub-neural network by using a sub-training sample set, and establishing an initial total neural network;
respectively inputting part of the living body samples in the total training sample set into a total neural network and a sub-neural network corresponding to the scene characteristics to obtain the total sample characteristics and the sub-sample characteristics of the living body samples;
and performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features, wherein the trained total neural network is used for performing living body detection.
Wherein, partial living body samples in the total training sample set are respectively input into a total neural network and a sub-neural network corresponding to the scene characteristics, and the total sample characteristics and the sub-sample characteristics of the living body samples are obtained, which comprises the following steps:
acquiring the feature map that each layer of the neural network computes and outputs for the living body sample;
and fusing the obtained feature maps, and performing weighted summation on the fused feature maps to obtain sample features.
Before fusing the acquired feature maps, the training method further comprises: up-sampling the feature maps output by the second layer through the last layer of the neural network.
Further, the acquired feature maps are fused and the fused feature maps are weighted and summed to obtain the sample features, wherein the weighting weights used for the total sample features differ from those used for the sub-sample features.
Wherein the scene features include lighting features and/or pose features.
Specifically, the living body classification of a sub-neural network is trained using a sub-training sample set, wherein the loss function of the sub-neural network includes a living body classification loss.
Specifically, a scene feature classifier and the total neural network are subjected to adversarial training by using the total sample features and the sub-sample features, wherein the loss function of the total neural network comprises the living body classification loss and the generation loss of the total sample features in the scene feature classifier; the loss function of the scene feature classifier includes a loss of belonging to the scene feature classification.
Respectively inputting part of the living body samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features, acquiring the total sample features and the sub-sample features of the living body samples, and performing adversarial training on the scene feature classifier and the total neural network by using those features, comprises the following steps:
acquiring a unit sample set in the total training sample set, and respectively inputting the living samples in the unit sample set into the total neural network and the sub-neural network corresponding to the scene features to acquire the total sample features and sub-sample features of the living samples;
performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
and repeating the process of acquiring the unit sample set and the adversarial training until the number of repetitions reaches a preset number.
The invention also includes a second technical solution, a living body detection method comprising:
inputting the object to be detected into the trained neural network, and outputting the object to be detected as a living body or a non-living body; wherein, the trained neural network is obtained by the training of the training method.
The present invention also includes a third technical solution, a computing device comprising at least one processing unit and at least one storage unit, the storage unit storing a computer program which, when executed by the processing unit, causes the processing unit to perform the steps of the above-described living body detection method.
The invention has the following beneficial effects. Different from the prior art, the training method of the neural network for living body detection according to the embodiment of the present invention performs adversarial training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, and can thereby learn the features common to the total neural network and the sub-neural networks corresponding to the scene features; these common features are the features distinguishing living bodies from non-living bodies. Through adversarial training, the neural network of the embodiment of the invention can learn, from the sub-training sample sets of different scene features, the features common to data of different scenes, and can therefore be applied to distinguishing living and non-living bodies across scenes. In living body detection the living bodies form a closed set rather than an open set, and the embodiment of the invention computes only the living samples during adversarial training; this reduces the noise that non-living samples would introduce and improves the discrimination of the scene feature classifier on living samples. Applied to living body detection, the training method is unaffected by scene features and can improve the robustness of living body detection.
Drawings
FIG. 1 is a block diagram of one embodiment of prior art domain adaptation;
FIG. 2 is a schematic diagram illustrating the steps of an embodiment of the neural network training method for living body detection according to the present invention;
FIG. 3 is a training block diagram of an embodiment of a sub-neural network of the present invention;
FIG. 4 is a training block diagram of an embodiment of the total neural network of the present invention;
FIG. 5 is a schematic diagram of one embodiment of obtaining the total sample features and sub-sample features of a living sample according to the present invention;
FIG. 6 is a schematic diagram illustrating the steps of another embodiment of obtaining the total sample features and sub-sample features of a living sample according to the present invention;
FIG. 7 is a schematic view of the feature fusion module of the present invention;
FIG. 8 is a schematic diagram illustrating the steps of another embodiment of the neural network training method for living body detection according to the present invention;
FIG. 9 is a schematic diagram illustrating the steps of one embodiment of the living body detection method of the present invention;
FIG. 10 is a block diagram of an embodiment of a computing device.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Living body detection is strongly affected by the scene, such as illumination, the pose of the living body, and imaging quality, and cross-scene living body detection is difficult to handle. The effect of a living body detection algorithm can be improved by learning features common to living and non-living data following different data distributions. Besides the classification loss, a loss on the similarity of features that a feature extractor such as a deep neural network extracts from data with different distributions is usually added; by optimizing the classification loss and the similarity loss, the model can learn features common to different data distributions, improving the generalization of the living body detection algorithm.
A deep neural network performs well after being trained on large-scale labeled living and non-living data, but due to domain drift its performance drops sharply when applied to unseen data, for example, when a cat-and-dog classifier trained on particular breeds meets cats and dogs of other breeds. One solution is fine-tuning, i.e., retraining the current classifier on new data using its parameters as pre-training; however, when the new data has no labels, fine-tuning cannot be used. Another solution is domain adaptation. In domain adaptation, the source domain is defined as a set of n_s labeled samples D_s = {(x_i^s, y_i^s)}, i = 1, …, n_s, and the target domain as a set of n_t samples D_t = {x_j^t}, j = 1, …, n_t. The joint distributions from which the source domain and the target domain are sampled are P(X_s, Y_s) and Q(X_t, Y_t) respectively, with P ≠ Q. Because the source and target domains are distributed differently, training the deep neural network only on the labeled source-domain data would greatly degrade its performance on the target domain; domain adaptation uses unlabeled target-domain data to improve the network's performance on the target domain. The general structure of domain adaptation is shown in FIG. 1.
In training, a convolutional neural network CNN extracts features from source-domain and target-domain data simultaneously, the CNN sharing its feature-extraction parameters between the two domains. The goal of domain adaptation is that the features extracted by the CNN be features common to the source and target domains; only by extracting common features can the CNN's performance on the target domain be improved. Therefore, besides optimizing the source-domain classification loss, the CNN must minimize the distance between source-domain and target-domain features. A discriminator judges whether a feature comes from the source domain or the target domain, and the CNN aims to fool the discriminator so that it judges source-domain features the same as target-domain features. When the discriminator can no longer distinguish the source domain from the target domain, the CNN can be considered to have learned their common features.
An embodiment of the present invention provides a training method for a neural network for performing in-vivo detection, as shown in fig. 2, the training method includes:
step 110: and dividing the total training sample set into a plurality of sub-training sample sets according to the scene characteristics, wherein the sub-training sample sets comprise living samples and non-living samples.
In the embodiment of the present invention, the scene features are light features and pose features. For example, the light features include indoor and outdoor features, or features of cloudy days, rainy days, early morning sunlight, midday sunlight, evening sunlight, and the like; the pose features can be features of a frontal face, a left side face, a right side face, and the like. The embodiment of the invention combines light features and pose features into scene features such as indoor frontal face, indoor left side face, indoor right side face, outdoor frontal face, outdoor left side face and outdoor right side face. In other embodiments, the scene features may be only light features or only pose features.
In the embodiment of the invention, the total training sample set comprises a plurality of face images, including living face images and non-living face images, and the face images are divided into n different sub-training sample sets N_1 through N_n according to light features and pose features. As shown in FIG. 2, the data of the total training sample set is (X, Y), and the data of the sub-training sample sets obtained by dividing it according to scene features is (X_i, Y_i), i = 1, …, n. In other embodiments, the total training sample set may also include a plurality of animal images.
Continuing as shown in FIG. 2, step 120: training the living body classification of the sub-neural networks by using the sub-training sample sets to establish an initial total neural network.
Specifically, in the embodiment of the present invention, as shown in FIG. 3, the data (X_i, Y_i) of each sub-training sample set trains the corresponding sub-neural network M_i, each sub-training sample set being trained within its own scene feature range. For example, the data (X_1, Y_1) of the face image set N_1 of indoor frontal faces is input into the convolutional layers, which are a plurality of Conv layers serving as the feature extractor, to perform convolution operations on the data and extract features; the result is input into the fully connected layers, which are two fc layers performing the living/non-living binary classification, forming the corresponding sub-neural network M_1. The data (X_2, Y_2) of the face image set N_2 of outdoor frontal faces is likewise input into convolutional layers and then into two fully connected layers for living/non-living binary classification, forming the corresponding sub-neural network M_2. The data (X_n, Y_n) of the face image set N_n of another scene feature is input into convolutional layers and then into two fully connected layers for living/non-living binary classification, forming the corresponding sub-neural network M_n, and so on.
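A minimal PyTorch sketch of one such sub-neural network M_i is given below; the exact layer counts and channel sizes are assumptions, since the patent only specifies several Conv layers followed by two fc layers ending in a living/non-living binary classification:

import torch
import torch.nn as nn

class SubNet(nn.Module):
    # Sub-neural network M_i: Conv feature extractor plus two fc layers.
    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(          # "a plurality of Conv" layers
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(        # the two fc layers
            nn.Flatten(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),                   # logit for living vs. non-living
        )

    def forward(self, x):
        return self.classifier(self.features(x))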
In the embodiment of the present invention, the loss function of the sub-neural network is the binary cross-entropy loss:

L = -(1/c) Σ_{j=1}^{c} [ y_j·log(p_j) + (1 - y_j)·log(1 - p_j) ]

wherein L represents the binary cross-entropy loss of the sub-neural network, c represents the batch size, y_j is the living/non-living label of the j-th sample, and p_j is the probability that the j-th sample is living;
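In PyTorch this loss corresponds directly to the built-in binary cross-entropy criterion. A sketch, assuming an instance sub_net of the SubNet sketch above together with tensors x_batch (images) and y_batch (0/1 labels):

import torch
import torch.nn.functional as F

logits = sub_net(x_batch)                 # shape (c, 1)
p = torch.sigmoid(logits).squeeze(1)      # p_j: probability the j-th sample is living
loss = F.binary_cross_entropy(p, y_batch.float())   # mean over the batch of size c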
the method comprises the steps of establishing an initial total neural network G network, carrying out two classifications of a living body and a non-living body which are trained on data (X, Y) of a total training sample set to form the initial total neural network G network, for example, carrying out training on all face image data to carry out classification of the living body and the non-living body to form the initial total neural network G network for a plurality of face images, wherein the plurality of face images comprise living body samples and non-living body samples with various scene characteristics such as an indoor front face, an indoor left side face, an indoor right side face, an outdoor front face, an outdoor left side face and an outdoor right side face, and the like, but are not classified according to the scene characteristics.
Continuing as shown in FIG. 2, step 130: respectively inputting part of the living samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features to obtain the total sample features and the sub-sample features of the living samples.
Specifically, in the embodiment of the present invention, as shown in FIG. 4, all living samples in the total training sample set are input both into the total neural network G and into the sub-neural network M_i corresponding to their scene features, obtaining the total sample features F_G and the sub-sample features F_M of the living samples. In other embodiments, only part of the living samples in the total training sample set may be input into the total neural network G and the corresponding sub-neural network M_i to obtain the total sample features F_G and the sub-sample features F_M.
More specifically, in the embodiment of the present invention, living sample data in a unit sample set is selected from the data (X, Y) of the total training sample set. According to its scene features, the data (x_i, y_i) is input into the sub-neural network M_i corresponding to those scene features, i.e., into the convolutional layers conv of M_i, for feature extraction, obtaining the sub-sample feature F_M; the living sample data (x, y) selected from the unit sample set is likewise input into the total neural network G, i.e., into its convolutional layers conv, for feature extraction, obtaining the total sample feature F_G.
Continuing as shown in FIG. 2, step 140: performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features, wherein the trained total neural network is used for performing living body detection.
The adversarial training is performed through a scene feature classifier D, which judges whether a feature is a total sample feature F_G or a sub-sample feature F_M. When the scene feature classifier D cannot tell whether a feature is F_G or F_M, i.e., cannot distinguish whether the feature originates from the total neural network G or from the sub-neural network M_i, the total neural network G can be considered to have learned the features common to different scene features, and the adversarial training is accomplished. As shown in FIG. 3, specifically, the total sample feature F_G and the sub-sample feature F_M are input into the scene feature classifier D_i corresponding to the sub-neural network M_i for feature classification; the classifier D_i judges whether an input feature belongs to the sub-training sample set of scene features corresponding to M_i. When the total sample feature F_G generated by the total neural network G can deceive the classifier D_i, the total neural network G has learned the features common to samples of different scene features.
The training method of the neural network for living body detection in the embodiment of the invention performs adversarial training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, and can learn the features common to the total neural network and the sub-neural networks corresponding to the scene features. In the embodiment of the invention these common features are the features distinguishing living bodies from non-living bodies; such distinguishing features exist in both, and they are not any known hand-crafted feature such as LBP (local binary patterns), HOG (which captures contour information) or SURF, but unknown, ubiquitous texture features that are not constrained by any scene. Through adversarial training, the neural network of the embodiment of the invention can learn from the sub-training sample sets of different scene features the features common to different scene data, i.e., the distinguishing features of living and non-living bodies, minimizing the distance between the total sample classification set and the sub-sample classification sets and improving the robustness of living body detection. The trained neural network can extract the distinguishing features of living and non-living bodies without being limited by the scene, and can therefore be used in living body detection to overcome the limitation that living body recognition is affected by scene features. In addition, only living samples are computed during adversarial training, because in living body detection the living bodies form a closed set while the non-living bodies form an open set: a non-living sample can be printed paper, a video, a 3D face mask, or any object resembling a face, and such objects usually share no common features. If the similarity calculation were also performed on these negative samples, the discrimination of the scene feature classifier on living samples would be reduced, because non-living samples can be regarded as noise that interferes with the normal data distribution. When the neural network training method is applied to living body detection, it is unaffected by scene features and can improve the robustness of living body detection.
As a further preferable scheme of the embodiment of the present invention, step 130 (inputting part of the living samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features, respectively, and acquiring the total sample features and sub-sample features of the living samples) comprises, as shown in FIG. 5:
step 131: and acquiring a feature map which is calculated and output by each layer in the neural network on the living body sample.
Wherein the neural network is the total neural network G or a sub-neural network M_i.
Step 132: fusing the acquired feature maps, and performing weighted summation on the fused feature maps to obtain the sample features.
When extracting features from the total neural network G and the sub-neural networks M_i, the embodiment of the invention performs feature fusion, reducing the convergence difficulty of the scene feature classifier D so that it can complete the adversarial learning and learn the features common to the different sub-training sample sets. Through this domain-adaptive training, the total neural network G is unaffected by scene features when performing living body recognition, improving the robustness of living body recognition.
More preferably, as shown in fig. 6, before the step 132 fuses the acquired feature maps, the training method further includes:
step 1311: and (4) up-sampling feature maps output by computing from the second layer to the last layer in the neural network. Specifically, as shown in fig. 7, the process of fusing the acquired feature maps includes up-sampling the feature map output by the last convolutional layer Conv from the second layer Stage2 to the last layer Stage n to make the size of the feature map be the same as that of the first layer Stage1, weighting the feature maps of the last convolutional layers of all the neural network layers, that is, stacking the feature maps in the channel dimension, and then weighting and summing the fused features by a SE (Squeeze-Extract) module to output a fused feature.
The features distinguishing living from non-living bodies are generally regarded as texture features, while existing domain-adaptation-based living body detection methods use high-level semantic features for computing feature similarity. This increases the training difficulty of the scene feature classifier D and makes it hard to converge; the total neural network G then loses its adversarial learning against the scene feature classifier D, fails to learn the features common to the sub-training samples under different scene features, and the advantage of the domain adaptation method is lost. As shown in FIG. 3, when extracting features from the total neural network G and the sub-neural networks M_i, the embodiment of the invention performs feature fusion through the feature fusion module: the extracted features fuse high-level semantic features with low-level texture features, and the fused high-level and low-level features are weighted by the SE feature fusion module, so that the model can learn the importance of the high-level and low-level features from the training data, reducing the convergence difficulty of the scene feature classifier D.
In the embodiment of the present invention, step 130 inputs part of the living samples in the total training sample set into the total neural network G and into the sub-neural network M_i corresponding to the scene features, respectively; the features produced by the feature extractors are fused through steps 131, 1311 and 132 to obtain the total sample feature F_G and the sub-sample feature F_M of the living sample. The weighting weights used to obtain the total sample feature F_G differ from those used for the sub-sample feature F_M, and the total neural network G and the sub-neural networks M_i do not share weights.
Specifically, in the embodiment of the present invention, the data (x, y) of part of the living samples in the total training sample set is input into the total neural network G for forward propagation to obtain the classification output probability p, and the features extracted by the feature extractor of the total neural network G are fused to obtain the total sample feature F_G.
Specifically, step 140 performs adversarial training on the scene feature classifier and the total neural network using the total sample features and the sub-sample features, wherein the loss function of the total neural network comprises the living body classification loss and the generation loss of the total sample features in the scene feature classifier. The living body classification loss is the binary cross-entropy loss:

L_classification = -(1/c) Σ_{j=1}^{c} [ y_j·log(p_j) + (1 - y_j)·log(1 - p_j) ]

wherein L_classification represents the binary cross-entropy loss, c represents the batch size, y_j is the living/non-living label of the j-th sample, and p_j is the probability that the j-th sample is living.

The generation loss of the total sample features in the scene feature classifier is:

L_GAN = -(1/c_k) Σ_{j=1}^{c_k} log D(F_G)

wherein L_GAN represents the generation loss, c_k represents the batch size of the k-th sub-sample training set, D is the scene feature classifier, and F_G is a feature generated by the total neural network G.

The loss function of the total neural network is: L = L_classification + L_GAN.

The loss function of the scene feature classifier comprises the loss of belonging to the scene feature classification:

L_D = -(1/c) Σ_{j=1}^{c} [ log D(F_M) + log(1 - D(F_G)) ]

wherein c represents the batch size, D is the scene feature classifier, F_G is a feature generated by the total neural network G, and F_M is a feature generated by the sub-neural network.
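Under the reconstruction above, the three losses can be written as the following PyTorch sketch, where D is assumed to output the probability that a feature came from a sub-neural network, F_G and F_M are the fused features, and L_classification is the cross-entropy term already computed:

import torch

eps = 1e-8                      # numerical floor inside the logarithms
p_G = D(F_G)                    # D's belief that F_G is a sub-network feature
p_M = D(F_M)

# Generation loss: the total network G tries to make D accept F_G.
L_GAN = -torch.log(p_G + eps).mean()

# Total-network loss: classification loss plus generation loss.
L_total = L_classification + L_GAN

# Scene-feature-classifier loss: accept F_M, reject F_G.
L_D = -(torch.log(p_M + eps) + torch.log(1 - p_G + eps)).mean()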
In the embodiment of the present invention, when the neural network is trained, the loss of the sub-neural networks, the loss of the total neural network, and the loss of the scene feature classifier are all optimized, improving the expression of the extracted common features in the total neural network and thereby the robustness of the total neural network in living body detection.
Wherein step 130 (respectively inputting part of the living samples in the total training sample set into the total neural network and the sub-neural network corresponding to the scene features to obtain the total sample features and sub-sample features of the living samples) and step 140 (performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features) proceed, as shown in FIG. 8, as follows:
Step 130': acquiring a unit sample set from the total training sample set, and respectively inputting the living samples in the unit sample set into the total neural network and the sub-neural network corresponding to the scene features to acquire the total sample features and sub-sample features of the living samples;
Step 140': performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
and repeating step 130' of acquiring the unit sample set and step 140' of adversarial training until the number of repetitions reaches a preset number.
If the number of repetitions reaches the preset number, step 160': the training is finished.
Further, before step 130' and step 140', the method further comprises:
Step 121': initializing the preset number, and initializing the repetition count i of the total neural network to 0;
Step 122': judging whether the repetition count i is less than the preset number;
if not, step 160': the training is finished;
if yes, executing step 130', step 140' and step 150'. Step 150': back-propagating the gradients of the total neural network for this repetition, and updating the repetition count with i += 1;
then looping back to step 122'.
According to the embodiment of the invention, iterative training is carried out through the unit sample sets, so as to minimize the loss function of the total neural network G and the loss function of the scene feature classifier D.
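A condensed sketch of this iterative procedure (steps 121' through 160') is given below, assuming optimizers opt_G and opt_D; sample_unit_set, fuse_G, fuse_M, classification_loss, gan_loss and loss_D are assumed helpers implementing the pieces described above, and keeping the sub-neural networks fixed is an assumption of the sketch:

for i in range(preset_number):                    # step 122': i < preset number
    batch = sample_unit_set(total_training_set)   # step 130': draw a unit sample set
    live = [s for s in batch if s["label"] == 1]  # adversarial terms use living samples only

    F_G = fuse_G(total_net, live)                 # fused total sample features
    F_M = fuse_M(sub_nets, live)                  # fused per-scene sub-sample features

    opt_D.zero_grad()                             # step 140': update classifier D
    loss_D(F_G.detach(), F_M.detach()).backward()
    opt_D.step()

    opt_G.zero_grad()                             # then update the total network G
    (classification_loss(total_net, batch) + gan_loss(F_G)).backward()
    opt_G.step()                                  # step 150': back-propagate and step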
The invention also includes a second technical solution, a living body detection method comprising:
inputting the object to be detected into the trained neural network, and outputting the object to be detected as a living body or a non-living body; wherein, the trained neural network is obtained by the training of the training method.
Specifically, as shown in fig. 9, the living body detection method in the embodiment of the present invention specifically includes:
step 210: inputting an object to be detected;
step 220: sending the data to a trained total neural network to obtain a living body probability value P;
step 230: and judging whether the obtained probability value P is more than 0.5.
If yes, step 240 outputs the result as a living body;
if not, step 250 outputs the result as a non-living body.
In the embodiment of the present invention, it is determined whether the probability value P is greater than 0.5; in other embodiments, it may instead be determined whether the probability value P is greater than 0.4, 0.6, 0.7, or the like.
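Continuing the sketches above, the detection of FIG. 9 reduces to a threshold test on the output probability; the function name detect_liveness is illustrative, and the 0.5 threshold follows the embodiment:

import torch

def detect_liveness(total_net, face_image, threshold=0.5):
    p = torch.sigmoid(total_net(face_image)).item()   # living body probability P
    return "living body" if p > threshold else "non-living body"   # steps 230-250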
The living body detection method provided by the embodiment of the invention is unaffected by scene features such as light and pose; it can distinguish living bodies according to the features distinguishing living from non-living bodies, can be used in face detection to judge whether a face is a real face or a forged face, and provides high robustness in face living body detection.
The present invention also includes a third technical solution; as shown in FIG. 10, a computing device 300 comprises at least one processing unit 310 and at least one storage unit 320, the storage unit 320 storing a computer program which, when executed by the processing unit 310, causes the processing unit 310 to perform the steps of the above-described living body detection method.
The computing device 300 may further include a power supply component configured to perform power management of the device, a wired or wireless network interface configured to connect the device to a network, and an input/output (I/O) interface. The device may operate based on an operating system stored in the storage unit, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
The above description is only an embodiment of the present invention and does not limit the patent scope of the present invention; all equivalent structural or process transformations made using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, are likewise included within the patent protection scope of the present invention.

Claims (10)

1. A training method of a neural network for performing living body detection, the training method comprising:
dividing a total training sample set into a plurality of sub-training sample sets according to scene characteristics, wherein the sub-training sample sets comprise living samples and non-living samples;
training the living body classification of the sub-neural network by using the sub-training sample set to establish an initial total neural network;
respectively inputting part of living body samples in the total training sample set into the total neural network and the sub-neural network corresponding to scene features to obtain total sample features and sub-sample features of the living body samples;
and performing adversarial training on a scene feature classifier and the total neural network by using the total sample features and the sub-sample features, wherein the trained total neural network is used for performing living body detection.
2. The training method according to claim 1, wherein the inputting the partial living body samples in the total training sample set to the total neural network and the sub-neural network corresponding to scene features respectively to obtain total sample features and sub-sample features of the living body samples comprises:
acquiring a feature map that each layer of the neural network computes and outputs for the living body sample;
and fusing the obtained feature maps, and performing weighted summation on the fused feature maps to obtain sample features.
3. The training method of claim 2, wherein before fusing the acquired feature maps, the method further comprises:
and (4) up-sampling feature maps output by computing from the second layer to the last layer in the neural network.
4. The training method according to claim 2, wherein fusing the obtained feature maps and performing weighted summation on the fused feature maps to obtain sample features comprises:
wherein the weighting weights of the total sample feature and the sub-sample feature are different.
5. Training method according to claim 1, wherein the scene features comprise lighting features and/or pose features.
6. The training method of claim 1, wherein the training of the in-vivo classification of the sub-neural network using the sub-training sample set comprises:
wherein the loss function of the sub-neural network comprises a living body classification loss.
7. The training method according to claim 1, wherein the performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features comprises:
wherein the loss function of the total neural network comprises living body classification loss and generation loss of the total sample feature in the scene feature classifier;
the loss function of the scene feature classifier includes a loss belonging to the scene feature classification.
8. The training method according to claim 1, wherein partial living body samples in the total training sample set are respectively input to the total neural network and the sub-neural network corresponding to scene features, and total sample features and sub-sample features of the living body samples are obtained; and adversarial training is performed on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features, comprising:
acquiring a unit sample set in the total training sample set, and respectively inputting living samples in the unit sample set into the total neural network and the sub-neural network corresponding to scene features to acquire total sample features and sub-sample features of the living samples;
performing adversarial training on the scene feature classifier and the total neural network by using the total sample features and the sub-sample features;
and repeating the processes of acquiring the unit sample set and the adversarial training until the number of repetitions reaches a preset number.
9. A living body detection method, the method comprising:
inputting the object to be detected into the trained neural network, and outputting the object to be detected as a living body or a non-living body; wherein the trained neural network is trained by the training method of any one of claims 1-8.
10. A computing device comprising at least one processing unit and at least one storage unit, the storage unit storing a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of the living body detection method of claim 9.
CN202010270821.2A 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection Active CN111553202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010270821.2A CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010270821.2A CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Publications (2)

Publication Number Publication Date
CN111553202A true CN111553202A (en) 2020-08-18
CN111553202B CN111553202B (en) 2023-05-16

Family

ID=72000134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010270821.2A Active CN111553202B (en) 2020-04-08 2020-04-08 Training method, detection method and device for neural network for living body detection

Country Status (1)

Country Link
CN (1) CN111553202B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723215A (en) * 2021-08-06 2021-11-30 浙江大华技术股份有限公司 Training method of living body detection network, living body detection method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545241A (en) * 2017-07-19 2018-01-05 百度在线网络技术(北京)有限公司 Neural network model is trained and biopsy method, device and storage medium
CN108537152A (en) * 2018-03-27 2018-09-14 百度在线网络技术(北京)有限公司 Method and apparatus for detecting live body
CN108563998A (en) * 2018-03-16 2018-09-21 新智认知数据服务有限公司 Vivo identification model training method, biopsy method and device
US20190034703A1 (en) * 2017-07-26 2019-01-31 Baidu Online Network Technology (Beijing) Co., Ltd. Attack sample generating method and apparatus, device and storage medium
CN109583342A (en) * 2018-11-21 2019-04-05 重庆邮电大学 Human face in-vivo detection method based on transfer learning
CN110059546A (en) * 2019-03-08 2019-07-26 深圳神目信息技术有限公司 Vivo identification method, device, terminal and readable medium based on spectrum analysis
CN110059569A (en) * 2019-03-21 2019-07-26 阿里巴巴集团控股有限公司 Biopsy method and device, model evaluation method and apparatus
CN110490076A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Biopsy method, device, computer equipment and storage medium
CN110706152A (en) * 2019-09-25 2020-01-17 中山大学 Face illumination migration method based on generation of confrontation network
US20200098139A1 (en) * 2018-09-26 2020-03-26 Facebook Technologies, Llc Systems and Methods for Generating and Transmitting Image Sequences Based on Sampled Color Information

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545241A (en) * 2017-07-19 2018-01-05 百度在线网络技术(北京)有限公司 Neural network model is trained and biopsy method, device and storage medium
US20190034703A1 (en) * 2017-07-26 2019-01-31 Baidu Online Network Technology (Beijing) Co., Ltd. Attack sample generating method and apparatus, device and storage medium
CN108563998A (en) * 2018-03-16 2018-09-21 新智认知数据服务有限公司 Vivo identification model training method, biopsy method and device
CN108537152A (en) * 2018-03-27 2018-09-14 百度在线网络技术(北京)有限公司 Method and apparatus for detecting live body
US20200098139A1 (en) * 2018-09-26 2020-03-26 Facebook Technologies, Llc Systems and Methods for Generating and Transmitting Image Sequences Based on Sampled Color Information
CN109583342A (en) * 2018-11-21 2019-04-05 重庆邮电大学 Human face in-vivo detection method based on transfer learning
CN110059546A (en) * 2019-03-08 2019-07-26 深圳神目信息技术有限公司 Vivo identification method, device, terminal and readable medium based on spectrum analysis
CN110059569A (en) * 2019-03-21 2019-07-26 阿里巴巴集团控股有限公司 Biopsy method and device, model evaluation method and apparatus
CN110490076A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Biopsy method, device, computer equipment and storage medium
CN110706152A (en) * 2019-09-25 2020-01-17 中山大学 Face illumination migration method based on generation of confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨巨成等: "人脸识别活体检测综述" (Yang Jucheng et al., "A survey of liveness detection for face recognition") *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723215A (en) * 2021-08-06 2021-11-30 浙江大华技术股份有限公司 Training method of living body detection network, living body detection method and device
WO2023011606A1 (en) * 2021-08-06 2023-02-09 Zhejiang Dahua Technology Co., Ltd. Training method of live body detection network, method and apparatus of live body detectoin

Also Published As

Publication number Publication date
CN111553202B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109145979B (en) Sensitive image identification method and terminal system
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN109815826B (en) Method and device for generating face attribute model
CN108520216B (en) Gait image-based identity recognition method
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN109558810B (en) Target person identification method based on part segmentation and fusion
CN109614907B (en) Pedestrian re-identification method and device based on feature-enhanced guided convolutional neural network
CN106778796B (en) Human body action recognition method and system based on hybrid cooperative training
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN111797683A (en) Video expression recognition method based on depth residual error attention network
KR101777601B1 (en) Distinction method and system for characters written in caoshu characters or cursive characters
CN112784763A (en) Expression recognition method and system based on local and overall feature adaptive fusion
JP2017062778A (en) Method and device for classifying object of image, and corresponding computer program product and computer-readable medium
US20100111375A1 (en) Method for Determining Atributes of Faces in Images
KR101687217B1 (en) Robust face recognition pattern classifying method using interval type-2 rbf neural networks based on cencus transform method and system for executing the same
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN109145704B (en) Face portrait recognition method based on face attributes
Wang et al. Study on the method of transmission line foreign body detection based on deep learning
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN110633689B (en) Face recognition model based on semi-supervised attention network
CN111553202A (en) Training method, detection method and device of neural network for detecting living body
CN109815887B (en) Multi-agent cooperation-based face image classification method under complex illumination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant