CN112464864A - Face living body detection method based on tree-shaped neural network structure - Google Patents

Face living body detection method based on tree-shaped neural network structure

Info

Publication number
CN112464864A
Authority
CN
China
Prior art keywords
face
living body
picture
neural network
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011439243.7A
Other languages
Chinese (zh)
Inventor
沈耀
薛迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202011439243.7A
Publication of CN112464864A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive

Abstract

A face liveness (living-body) detection method based on a tree-structured neural network: training samples are collected and labeled and then used to train a face liveness detection model built on the tree-structured network, and a preprocessed picture to be detected is input into the trained model to perform liveness detection. Under varying lighting conditions, varying camera conditions, and attacks of many non-live types against the picture to be detected, the method maintains detection accuracy, markedly improves the usability, reliability, and generality of face liveness detection, and additionally identifies the type of non-live attack.

Description

Face living body detection method based on tree-shaped neural network structure
Technical Field
The invention relates to a technology in the field of image detection, in particular to a face liveness (living-body) detection method based on a tree-structured neural network.
Background
Existing face recognition technology is easily spoofed with photos, videos, and even lifelike molds: a malicious actor disguises himself during the recognition process and passes verification with a picture, achieving illegal entry. Face liveness detection arose to counter this, and the main techniques are as follows. (1) Single-picture liveness detection judges whether the target is live from artifacts of the portrait in the picture (moire patterns, imaging distortion, and the like); it effectively blocks cheating attacks such as re-photographing a screen, and may use single or multiple decision rules. (2) Video-stream liveness detection determines the motion at each pixel position from the temporal variation and correlation of pixel intensities in the image sequence, extracts per-pixel motion information from the sequence, and analyzes it statistically with difference-of-Gaussian filters, LBP features, and support vector machines. Because the optical-flow field is sensitive to object motion, eye movement and blinking can likewise be detected with optical flow; this mode enables blind detection without user cooperation. (3) Near-infrared binocular-camera liveness detection uses the near-infrared imaging principle to judge liveness at night or without natural light; its imaging characteristics (screens do not image in near infrared, different materials have different reflectance, and so on) enable highly robust liveness judgments. (4) 3D-camera liveness detection, based on the 3D structured-light imaging principle, builds a depth map from the light reflected by the face surface and judges whether the target is live; it effectively defends against photo, video, screen, and mold attacks, but requires extra equipment and is costly. (5) Action-coordinated liveness detection issues a prescribed action requirement that the user must perform, and judges liveness by tracking the user's eye, mouth, and head-pose states in real time; this is currently the most widely used technique.
However, existing face liveness detection techniques have the following drawbacks: silent detection that uses no extra equipment and requires no action coordination has low detection accuracy; attacks aimed at a specific detection mode cannot be countered; detection accuracy is unstable under varying lighting conditions; and unknown attacks cannot be handled, i.e. the methods generalize poorly.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a face liveness detection method based on a tree-structured neural network. When the picture to be detected is subject to varying lighting conditions, varying camera conditions, and attacks of many non-live types, the method maintains the detection accuracy of liveness detection, markedly improves the usability, reliability, and generality of face liveness detection, and also identifies the type of non-live attack.
The invention is realized by the following technical scheme:
The invention relates to a face liveness detection method based on a tree-structured neural network: training samples are collected and labeled and then used to train a face liveness detection model built on the tree-structured network, and a preprocessed picture to be detected is input into the trained model to perform liveness detection.
Training sample collection and labeling means: a public data set is used that contains, besides live sample videos and non-live sample videos of paper-print and screen attacks shot under indoor lighting for training, non-live sample videos of further attack types such as makeup and masks collected from video websites for training. Face detection is performed on every frame of the non-live sample videos with the MTCNN face detection algorithm, and masks are produced to serve as training supervision data.
The live sample videos used for training are shot indoors under natural light.
The non-live sample videos consist of photographed-and-printed face photos as the print attack and face photos displayed on a screen as the screen attack.
The non-live sample videos further include videos of several additional attack types collected from video websites: partial masks, silicone masks, paper masks, dummy models, imitation makeup, and paper-printed eyeglass cutouts. Adding them increases the number of non-live sample types and enriches the training data, improving the model. From the training sample videos, 10 frames are randomly extracted per video, i.e. 1626 × 10 images in total, as training data; live samples are labeled 0 and non-live samples 1, which also provides data augmentation.
The face detection (mask making) means: the real-face region is labeled 0 and the prosthetic region 1; specifically, the mask of a real face is 0 over the whole face region, the mask of a printed photo is 1 over the whole face region, and the mask of an eye-region cutout is 1 over the eye region and 0 over the other face regions.
The training means: the live and non-live sample videos are shuffled, video frames and their masks are drawn at random and fed into the tree-structured network for training, yielding a trained face liveness detection model based on the tree-structured neural network.
The face liveness detection model based on the tree-structured neural network is a neural network with a 4-level tree structure, specifically comprising 7 feature-extraction modules, 8 supervised training modules, and 7 routing modules that perform unsupervised clustering of attack samples; each non-leaf node of the tree is a feature-extraction module followed by a routing module, and each leaf node is a supervised training module.
The network input batch size is preferably set to 32.
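For concreteness, a minimal PyTorch sketch of this 4-level tree (7 internal nodes, 8 leaves) follows. The module bodies here are stand-ins for the feature-extraction, routing, and leaf modules detailed below, the class names (TreeNode, Router) are illustrative rather than from the patent, and hard per-sample routing is shown only for single-sample inference:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Router(nn.Module):
    """Routing module sketch: phi(x) = v^T x + tau as one linear layer
    over a channel-reduced, pooled, flattened feature map."""
    def __init__(self, ch):
        super().__init__()
        self.reduce = nn.Conv2d(ch, 20, kernel_size=1)  # 1x1 channel reduction
        self.lin = nn.Linear(20 * 16 * 16, 1)           # weight ~ v, bias ~ tau

    def forward(self, f):
        z = F.adaptive_avg_pool2d(self.reduce(f), 16)   # stand-in for bilinear scaling
        return self.lin(z.flatten(1))                   # (N, 1) routing value

class TreeNode(nn.Module):
    """Non-leaf node: feature extractor + router; leaf node: supervised head."""
    def __init__(self, depth, max_depth, in_ch=40):
        super().__init__()
        self.is_leaf = depth == max_depth
        if self.is_leaf:
            # stand-in for the two-branch leaf head (liveness + mask) sketched later
            self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(in_ch, 2))
        else:
            self.feat = nn.Sequential(                  # stand-in feature extractor
                nn.Conv2d(in_ch, 40, 3, padding=1), nn.BatchNorm2d(40),
                nn.ReLU(inplace=True), nn.MaxPool2d(2))
            self.router = Router(40)
            self.left = TreeNode(depth + 1, max_depth)
            self.right = TreeNode(depth + 1, max_depth)

    def forward(self, x):
        if self.is_leaf:
            return self.head(x)
        f = self.feat(x)
        # hard routing of a single sample; training instead uses the
        # unsupervised routing loss described later
        child = self.right if self.router(f).item() >= 0 else self.left
        return child(f)

tree = TreeNode(depth=0, max_depth=3, in_ch=3)   # 7 internal nodes, 8 leaves
out = tree(torch.randn(1, 3, 256, 256))          # logits for one sample
```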
The feature-extraction module comprises three convolutional layers with a residual structure and one max-pooling layer; all convolution kernels are 3 × 3, ReLU is used as the activation function, and a BN layer follows each of the first, second, and third layers, yielding feature maps of sizes 128 × 128 × 40, 64 × 64 × 40, and 32 × 32 × 40 at the successive tree levels.
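A PyTorch sketch of this feature-extraction module, assuming the residual shortcut spans the last two convolutions (the text does not fix its exact placement):

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Three 3x3 conv layers with BN + ReLU, a residual connection, then 2x2
    max pooling, so the spatial size halves at each tree level
    (256 -> 128 -> 64 -> 32, with 40 channels)."""
    def __init__(self, in_ch, out_ch=40):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                   nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1),
                                   nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1),
                                   nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        h = self.conv1(x)
        h = self.conv3(self.conv2(h)) + h   # residual shortcut (assumed placement)
        return self.pool(h)

feat = FeatureExtractor(in_ch=3)                  # first tree level
print(feat(torch.randn(32, 3, 256, 256)).shape)   # torch.Size([32, 40, 128, 128])
```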
The unsupervised clustering means: the input vector x is passed through the routing function

$$\varphi(x) = v^{T}x + \tau$$

and the sample set proceeding to the left child node

$$S_{\mathrm{left}} = \{ I_k \in S \mid \varphi(x_k) < 0 \}$$

and the sample set proceeding to the right child node

$$S_{\mathrm{right}} = \{ I_k \in S \mid \varphi(x_k) \ge 0 \}$$

are computed, where S is the sample set and I_k, k = 1, 2, 3, ..., K are the data samples.

The loss function used to train the unsupervised-clustering routing module is

$$L_{\mathrm{route}} = -\left( \frac{1}{N_l} \sum_{I_k \in S_{\mathrm{left}}} \varphi(x_k) - \frac{1}{N_r} \sum_{I_k \in S_{\mathrm{right}}} \varphi(x_k) \right)^{2}$$

where N, N_l, N_r are the numbers of samples in the sample sets S, S_left, S_right respectively.
The routing module reduces the dimensionality of the input feature map with a 1 × 1 convolution, scales the feature map to 16 × 16 × 20, reshapes it into a vector of length 16 × 16 × 20, and feeds the vector into the routing function to compute the routing target.
The supervised training module takes a 32 × 32 × 40 feature map as input; one branch passes through a 1 × 1 convolution layer to generate a 32 × 32 × 1 non-live mask map, while the other branch passes through two convolution layers (3 × 3 kernels, 40 channels each) and then two fully connected layers of dimensions 500 and 2 to produce the final live/non-live result, with live = 0 and non-live = 1.
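A PyTorch sketch of this two-branch supervised module; the sigmoid on the mask branch and the class name LeafHead are assumptions:

```python
import torch
import torch.nn as nn

class LeafHead(nn.Module):
    """Leaf supervised module: from a 32x32x40 feature map, one branch
    produces the 32x32x1 spoof mask via a 1x1 conv; the other stacks two
    3x3 convs (40 channels) and two FC layers (500, 2) for the decision."""
    def __init__(self, ch=40):
        super().__init__()
        self.mask_branch = nn.Conv2d(ch, 1, kernel_size=1)
        self.cls_conv = nn.Sequential(
            nn.Conv2d(ch, 40, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(40, 40, 3, padding=1), nn.ReLU(inplace=True))
        self.cls_fc = nn.Sequential(
            nn.Flatten(), nn.Linear(40 * 32 * 32, 500), nn.ReLU(inplace=True),
            nn.Linear(500, 2))

    def forward(self, x):
        mask = torch.sigmoid(self.mask_branch(x))        # (N, 1, 32, 32)
        logits = self.cls_fc(self.cls_conv(x))           # (N, 2): [live, spoof]
        prob_spoof = torch.softmax(logits, dim=1)[:, 1]  # non-live probability
        return prob_spoof, mask

head = LeafHead()
p, m = head(torch.randn(4, 40, 32, 32))
print(p.shape, m.shape)   # torch.Size([4]) torch.Size([4, 1, 32, 32])
```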
The preprocessing means: face detection is performed on the input picture to be detected to obtain the face position, and the picture is then cropped and scaled accordingly. Specifically, the face position coordinates in the input picture are detected with the MTCNN method; when the picture contains several faces, the largest face is selected for processing, and when it contains no face, the detection system exits and returns a no-face exception. The picture is then cropped to the face region according to the detected face position coordinates and scaled to a fixed size of 256 × 256.
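A sketch of this preprocessing step, using the facenet-pytorch MTCNN implementation as one possible stand-in for the MTCNN detector (the helper name preprocess is illustrative):

```python
from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(keep_all=True)   # detect every face, not just the best one

def preprocess(path):
    img = Image.open(path).convert("RGB")
    boxes, _ = mtcnn.detect(img)           # boxes: (K, 4) as [x1, y1, x2, y2]
    if boxes is None:
        raise ValueError("no face in picture")   # the "no-face" exception
    # keep the largest face when several are present
    x1, y1, x2, y2 = max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    face = img.crop(tuple(map(int, (x1, y1, x2, y2))))
    return face.resize((256, 256), Image.BILINEAR)   # bilinear interpolation
```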
The liveness detection means: the preprocessed picture to be detected is input into the trained face liveness detection model, which outputs, as the detection result, a 0/1 live/non-live value and a 32 × 32 × 1 mask of the non-live region in the picture; the liveness judgment is made on this result and the non-live attack type is judged from the mask, giving the final liveness detection result.
The invention also relates to a system implementing the method, comprising a data preprocessing unit and a face liveness detection unit. The data preprocessing unit receives and preprocesses the picture to be detected, reduces it to a fixed-size picture containing only the face region, and inputs it into the face liveness detection unit; the face liveness detection unit, connected to the data preprocessing unit, receives the preprocessed picture data, performs liveness detection on the face in the picture, and generates the probability that the face is non-live together with a mask of the non-live region in the picture.
Technical effects
Compared with the prior art, the method needs no user action coordination; adding the non-live region mask as a supervision signal and clustering the non-live attacks sharpens the semantics of the extracted features, frees the model's performance from dependence on lighting and similar conditions, and further improves the reliability of its results. The network structure can also be changed by adding tree levels or nodes, enabling finer-grained clustering of non-live attack types and improving resistance to unknown attack types.
Drawings
FIG. 1 is a schematic diagram of a training process of the present invention;
FIG. 2 is a schematic view of the detection process of the present invention;
FIG. 3 is a schematic diagram of a tree neural network structure according to the present invention;
FIG. 4 is a block diagram of a tree neural network according to the present invention;
in the figure: a) is a structure diagram of a feature extraction module, b) is a structure diagram of a routing module, and c) is a structure diagram of a leaf supervised training module;
FIG. 5 is a schematic diagram of a partial image of test sample data and its non-live mask according to an embodiment.
Detailed Description
As shown in fig. 1, this embodiment relates to a face liveness detection method based on a tree-structured neural network, comprising the following steps:
Step S101, acquiring training sample videos: using public data sets, live sample videos for training and non-live sample videos of paper-print and screen attacks are captured in an indoor lighting environment, and further non-live sample videos for training are collected from video websites, as shown in fig. 5, covering partial masks, silicone masks, paper masks, dummy models, imitation makeup, and paper-printed eyeglass cutouts, for 1626 videos in total in the data set.
Step S102, performing face detection on the video files frame by frame, cropping the image according to the face position, and scaling, specifically:
(1) for each video frame, detect the face with MTCNN or a comparable method, obtaining the coordinates (x1, y1) and (x2, y2) of the upper-left and lower-right corners of the face bounding box;
(2) crop the picture to the face region according to the box coordinates (x1, y1) and (x2, y2), and scale it with a bilinear interpolation algorithm to a fixed size of 256 × 256.
Step S103, labeling whether the face in each video frame is live according to the non-live attack type, and producing the non-live region mask: live samples are labeled 0 and non-live samples 1; face detection is run on every frame of the sample videos and a mask is produced in which the real-face region is 0 and the prosthetic region is 1 (e.g. the mask of a real face is 0, the mask of a printed photo is 1 over the face, and the mask of an eye cutout is 1 over the eyes and 0 over the other face regions).
Step S104, training the tree neural network model shown in fig. 4. The tree network has a 4-level structure comprising 7 feature-extraction modules, 8 leaf supervised training modules that output the liveness result and the non-live mask, and 7 routing modules.
As shown in fig. 4a), the feature-extraction module of the tree network consists of three convolutional layers with a residual structure and one max-pooling layer; all convolution kernels are 3 × 3, ReLU is the activation function, and BN layers follow the first, second, and third layers, yielding feature maps of sizes 128 × 128 × 40, 64 × 64 × 40, and 32 × 32 × 40 that capture hierarchical image features.
The routing module computes the forward direction of the features through the network. As shown in fig. 4b), the feature map entering the routing module is first reduced from 40 to 20 channels by a convolution layer with 1 × 1 kernels, then scaled by bilinear interpolation to a fixed spatial size of 16 × 16; the resulting 16 × 16 × 20 feature map is reshaped into a vector of length 16 × 16 × 20 and fed into the routing function, which computes the forward direction.

The routing function of the routing module is

$$\varphi(x) = v^{T}x + \tau$$

where x is the vector produced by the feature-map processing above, and v and τ are respectively a parameter vector and a bias that are learned without supervision. Applying the function to the input vector x yields

$$S_{\mathrm{left}} = \{ I_k \in S \mid \varphi(x_k) < 0 \} \quad \text{(proceed to the left child node)}$$

and

$$S_{\mathrm{right}} = \{ I_k \in S \mid \varphi(x_k) \ge 0 \} \quad \text{(proceed to the right child node)}$$

where S is the sample set and I_k, k = 1, 2, 3, ..., K are the data samples; the samples are thereby clustered. The parameters v, τ are obtained by unsupervised training with the loss function

$$L_{\mathrm{route}} = -\left( \frac{1}{N_l} \sum_{I_k \in S_{\mathrm{left}}} \varphi(x_k) - \frac{1}{N_r} \sum_{I_k \in S_{\mathrm{right}}} \varphi(x_k) \right)^{2}$$

where N, N_l, N_r are the numbers of samples in the sample sets S, S_left, S_right. The loss drives the mean routing-function values of the samples sent to the left and right nodes as far apart as possible, achieving clustering; x_k is the vector obtained after processing sample I_k for the routing function, S is the set of non-live samples passing through this node, and S⁻ is the set of other samples, including live samples and non-live samples that do not pass through this node.
As shown in fig. 4c), the input of the leaf supervised training module is a 32 × 32 × 40 feature map, and the module has two branches: one branch reduces the feature map directly through a convolution layer with 1 × 1 kernels to obtain a 32 × 32 × 1 mask; the other branch passes successively through two convolution layers with 3 × 3 kernels and 40 channels, then two fully connected layers of dimensions 500 and 2, followed by a softmax layer that outputs the probability of the sample being non-live, with live = 0 and non-live = 1.
The non-live mask of the leaf supervised training module is trained with the loss function

$$L_{\mathrm{mask}} = \frac{1}{N} \sum_{I_k \in S} \left\| M_k - D_k \right\|$$

where M_k is the mask generated by the network, D_k is the input mask sample, N is the number of samples reaching this node, S is the set of samples reaching this node, and I_k is a sample in that set.
The liveness result of the leaf supervised training module, i.e. the 0/1 binary-classification branch, is trained with an improved focal loss:

$$L_{\mathrm{cls}} = -\frac{1}{N} \left[ \sum_{I_k \in S_1} \alpha \left( 1 - y'_k \right)^{\gamma} \log y'_k + \sum_{I_k \in S_0} (1 - \alpha) \, (y'_k)^{\gamma} \log\left( 1 - y'_k \right) \right]$$

where S_1 is the set of all non-live data samples reaching this node, S_0 the set of all live data samples reaching it, I_k a sample in the node's sample set, N the total number of samples reaching the node, y' the predicted non-live probability of a sample, and α, γ balance factors preferably set to α = 0.25 and γ = 2.
The total loss used to train the tree neural network model is

$$L = \alpha_1 L_{\mathrm{cls}} + \alpha_2 L_{\mathrm{mask}} + \alpha_3 L_{\mathrm{route}}$$

where α_1, α_2, α_3 are weighting parameters preferably set to 0.01, 1, and 2 respectively, and L_cls, L_mask, L_route are the binary-classification loss, the mask loss, and the routing unsupervised-clustering loss.
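Combining the three terms with the preferred weights, as a trivial helper:

```python
def total_loss(l_cls, l_mask, l_route, a1=0.01, a2=1.0, a3=2.0):
    """Weighted sum with the preferred weights alpha1=0.01, alpha2=1, alpha3=2."""
    return a1 * l_cls + a2 * l_mask + a3 * l_route
```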
Step S105, as shown in fig. 2, performing face detection on the picture to be detected: the MTCNN method is used to detect the face and obtain the coordinates (x1, y1) and (x2, y2) of the upper-left and lower-right corners of the face bounding box; if no face is detected in the picture, face liveness detection fails and a no-face error is returned; if the picture contains more than one face, only the largest face is kept for subsequent judgment.
Step S106, preprocessing the picture to be detected: the picture is cropped to the face region according to the face-box coordinates (x1, y1) and (x2, y2) obtained in step S105, and scaled with a bilinear interpolation algorithm to a fixed size of 256 × 256.
Step S107, inputting the picture preprocessed in step S106 into the trained tree neural network model, which infers the non-live probability and the non-live region mask.
Step S108, judging from the 0/1 liveness result whether the face in the picture is live: if live, face liveness detection succeeds; otherwise a non-live result is returned and the non-live attack type is inferred from the mask result.
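Steps S105 through S108 chained together in a short sketch, assuming the preprocess() helper and a tree model whose leaves return (non-live probability, mask) as in the earlier sketches; the 0.5 decision threshold is an assumption equivalent to the 0/1 argmax:

```python
import torchvision.transforms.functional as TF

def detect_liveness(path, tree, threshold=0.5):
    face = preprocess(path)                  # S105/S106: detect, crop, resize
    x = TF.to_tensor(face).unsqueeze(0)      # (1, 3, 256, 256)
    prob_spoof, mask = tree(x)               # S107: model inference
    if prob_spoof.item() < threshold:        # S108: liveness judgment
        return "live", None
    return "non-live", mask                  # mask hints at the attack type
```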
In the actual experimental environment, with the batch size set to 32, the learning rate to 0.001, the learning-rate momentum to 0.999, α = 0.25 and γ = 2 in the binary-classification loss, and α1 = 0.01, α2 = 1, α3 = 2 in the total loss, the above method achieves APCER = 3.62% and BPCER = 12.56% on the test set, where APCER is the proportion of all non-live samples misjudged as live and BPCER is the proportion of all live samples misjudged as non-live.
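APCER and BPCER as defined here can be computed in a few lines (a sketch; 0 = live, 1 = non-live):

```python
import numpy as np

def apcer_bpcer(pred, label):
    """APCER: fraction of non-live (attack) samples misclassified as live;
    BPCER: fraction of live (bona fide) samples misclassified as non-live."""
    pred, label = np.asarray(pred), np.asarray(label)
    apcer = np.mean(pred[label == 1] == 0)
    bpcer = np.mean(pred[label == 0] == 1)
    return apcer, bpcer

print(apcer_bpcer([0, 1, 0, 1], [1, 1, 0, 0]))  # (0.5, 0.5)
```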
Compared with SVM+LBP, a typical face liveness detection algorithm based on traditional methods (APCER 32.8 ± 29.8, BPCER 21.0 ± 2.9), and Auxiliary, a typical deep-learning face liveness detection algorithm (APCER 38.3 ± 37.4, BPCER 8.9 ± 2.0), this method attains an APCER of 3.62% and a BPCER of 12.56%; its non-live false-acceptance rate is lower than the prior art, and the overall accuracy of its detection results is high.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (10)

1. A face liveness detection method based on a tree-structured neural network, characterized in that training samples are collected and labeled for training a face liveness detection model based on the tree-structured neural network, and a preprocessed picture to be detected is then input into the trained model to perform liveness detection;
the face liveness detection model based on the tree-structured neural network is a neural network with a 4-level tree structure, specifically comprising 7 feature-extraction modules, 8 supervised training modules, and 7 routing modules that perform unsupervised clustering of attack samples, wherein each non-leaf node of the tree is a feature-extraction module followed by a routing module and each leaf node is a supervised training module;
the training sample collection and labeling means: using a public data set together with non-live sample videos collected from video websites for training, performing face detection on every frame of the non-live sample videos with the MTCNN face detection algorithm, and producing masks as training supervision data.
2. The method as claimed in claim 1, characterized in that the public data set comprises live sample videos for training and non-live sample videos of paper-print and screen attacks captured in an indoor lighting environment;
the live sample videos used for training are shot indoors under natural light;
the non-live sample videos consist of photographed-and-printed face photos as the print attack and face photos displayed on a screen as the screen attack.
3. The face liveness detection method based on the tree-structured neural network as claimed in claim 1 or 2, characterized in that the non-live sample videos further comprise videos of additional attack types collected from video websites: partial masks, silicone masks, paper masks, dummy models, imitation makeup, and paper-printed eyeglass cutouts; 10 frames are randomly extracted from each training sample video, i.e. 1626 × 10 images in total, as training data, with live samples labeled 0 and non-live samples labeled 1, which also provides data augmentation.
4. The face liveness detection method based on the tree-structured neural network as claimed in claim 1, characterized in that the face detection (mask making) is: the real-face region is 0 and the prosthetic region is 1; specifically, the mask of a real face is 0 over the face region, the mask of a printed photo is 1 over the face region, and the mask of an eye cutout is 1 over the eye region and 0 over the other face regions.
5. The face liveness detection method based on the tree-structured neural network as claimed in claim 1, characterized in that the training is: shuffling the live and non-live sample videos, randomly extracting video frames and their masks, and inputting them into the tree-structured network for training to obtain a trained face liveness detection model based on the tree-structured neural network.
6. The face liveness detection method based on the tree-structured neural network as claimed in claim 1, characterized in that the feature-extraction module comprises three convolutional layers with a residual structure and one max-pooling layer, wherein all convolution kernels are 3 × 3, ReLU is used as the activation function, and BN layers follow the first, second, and third layers, producing feature maps of sizes 128 × 128 × 40, 64 × 64 × 40, and 32 × 32 × 40;
the routing module reduces the dimensionality of the input feature map with a 1 × 1 convolution, scales it to 16 × 16 × 20, reshapes it into a vector of length 16 × 16 × 20, and feeds the vector into the routing function to compute the routing target;
the supervised training module takes a 32 × 32 × 40 feature map as input; one branch passes through a 1 × 1 convolution layer to generate a 32 × 32 × 1 non-live mask map, and the other branch passes through two convolution layers (3 × 3 kernels, 40 channels each) followed by two fully connected layers of dimensions 500 and 2 to produce the final live/non-live result, with live = 0 and non-live = 1.
7. The face liveness detection method based on the tree-structured neural network as claimed in claim 1, characterized in that the unsupervised clustering is: passing the input vector x through the routing function

$$\varphi(x) = v^{T}x + \tau$$

and computing the sample set proceeding to the left child node

$$S_{\mathrm{left}} = \{ I_k \in S \mid \varphi(x_k) < 0 \}$$

and the sample set proceeding to the right child node

$$S_{\mathrm{right}} = \{ I_k \in S \mid \varphi(x_k) \ge 0 \}$$

wherein S is the sample set and I_k, k = 1, 2, 3, ..., K are the data samples;

the loss function used to train the unsupervised-clustering routing module is

$$L_{\mathrm{route}} = -\left( \frac{1}{N_l} \sum_{I_k \in S_{\mathrm{left}}} \varphi(x_k) - \frac{1}{N_r} \sum_{I_k \in S_{\mathrm{right}}} \varphi(x_k) \right)^{2}$$

wherein N, N_l, N_r are the numbers of samples in the sample sets S, S_left, S_right respectively.
8. The face liveness detection method based on the tree-structured neural network as claimed in claim 1, characterized in that the preprocessing is: performing face detection on the input picture to be detected to obtain the face position, then cropping and scaling the picture accordingly; specifically, the face position coordinates in the input picture are detected with the MTCNN method, the largest face is selected for processing when the picture contains several faces, and the detection system exits and returns a no-face exception when it contains none; the picture is cropped to the face region according to the detected face position coordinates and scaled to a fixed size of 256 × 256.
9. The face liveness detection method based on the tree-structured neural network as claimed in claim 1, characterized in that the liveness detection is: inputting the preprocessed picture to be detected into the trained face liveness detection model to obtain, as the detection result, a 0/1 live/non-live value and a 32 × 32 × 1 mask of the non-live region in the picture; the liveness judgment is made on this result and the non-live attack type is judged from the mask, giving the final liveness detection result.
10. A system for implementing the method of any preceding claim, comprising a data preprocessing unit and a face liveness detection unit, wherein the data preprocessing unit receives and preprocesses the picture to be detected, reduces it to a fixed-size picture containing only the face region, and inputs it into the face liveness detection unit; the face liveness detection unit, connected to the data preprocessing unit, receives the preprocessed picture data, performs liveness detection on the face in the picture, and generates the probability that the face is non-live together with a mask of the non-live region in the picture.
CN202011439243.7A (filed 2020-12-08, priority 2020-12-08) Face living body detection method based on tree-shaped neural network structure, Pending, published as CN112464864A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011439243.7A CN112464864A (en) 2020-12-08 2020-12-08 Face living body detection method based on tree-shaped neural network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011439243.7A CN112464864A (en) 2020-12-08 2020-12-08 Face living body detection method based on tree-shaped neural network structure

Publications (1)

Publication Number Publication Date
CN112464864A (en) 2021-03-09

Family

ID=74801916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011439243.7A Pending CN112464864A (en) 2020-12-08 2020-12-08 Face living body detection method based on tree-shaped neural network structure

Country Status (1)

Country Link
CN (1) CN112464864A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026605A1 (en) * 2017-07-19 2019-01-24 Baidu Online Network Technology (Beijing) Co., Ltd . Neural network model training method and apparatus, living body detecting method and apparatus, device and storage medium
CN109255322A (en) * 2018-09-03 2019-01-22 北京诚志重科海图科技有限公司 A kind of human face in-vivo detection method and device
CN110263691A (en) * 2019-06-12 2019-09-20 合肥中科奔巴科技有限公司 Head movement detection method based on android system
CN110472519A (en) * 2019-07-24 2019-11-19 杭州晟元数据安全技术股份有限公司 A kind of human face in-vivo detection method based on multi-model
CN110516576A (en) * 2019-08-20 2019-11-29 西安电子科技大学 Near-infrared living body faces recognition methods based on deep neural network
CN110674730A (en) * 2019-09-20 2020-01-10 华南理工大学 Monocular-based face silence living body detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yaojie Liu et al., "Deep Tree Learning for Zero-shot Face Anti-Spoofing," arXiv:1904.02860v2 [cs.CV] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111750A (en) * 2021-03-31 2021-07-13 智慧眼科技股份有限公司 Face living body detection method and device, computer equipment and storage medium
CN115131880A (en) * 2022-05-30 2022-09-30 上海大学 Multi-scale attention fusion double-supervision human face in-vivo detection method

Similar Documents

Publication Publication Date Title
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN106096538B (en) Face identification method and device based on sequencing neural network model
CN109101865A (en) A kind of recognition methods again of the pedestrian based on deep learning
JP4743823B2 (en) Image processing apparatus, imaging apparatus, and image processing method
CN107122744A (en) A kind of In vivo detection system and method based on recognition of face
CN111783576A (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN109410184B (en) Live broadcast pornographic image detection method based on dense confrontation network semi-supervised learning
Kimura et al. CNN hyperparameter tuning applied to iris liveness detection
CN110674730A (en) Monocular-based face silence living body detection method
CN112926522B (en) Behavior recognition method based on skeleton gesture and space-time diagram convolution network
CN114863263B (en) Snakehead fish detection method for blocking in class based on cross-scale hierarchical feature fusion
CN112464864A (en) Face living body detection method based on tree-shaped neural network structure
CN110599463A (en) Tongue image detection and positioning algorithm based on lightweight cascade neural network
CN112434647A (en) Human face living body detection method
CN111914758A (en) Face in-vivo detection method and device based on convolutional neural network
CN113221655B (en) Face spoofing detection method based on feature space constraint
CN113762009B (en) Crowd counting method based on multi-scale feature fusion and double-attention mechanism
CN112183504B (en) Video registration method and device based on non-contact palm vein image
CN113298018A (en) False face video detection method and device based on optical flow field and facial muscle movement
CN109711232A (en) Deep learning pedestrian recognition methods again based on multiple objective function
CN112560618A (en) Behavior classification method based on skeleton and video feature fusion
Jindal et al. Sign Language Detection using Convolutional Neural Network (CNN)
Zaidan et al. A novel hybrid module of skin detector using grouping histogram technique for Bayesian method and segment adjacent-nested technique for neural network
Karthigayan et al. Genetic algorithm and neural network for face emotion recognition
CN110717544B (en) Pedestrian attribute analysis method and system under vertical fisheye lens

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210309)