CN111967289A - Uncooperative human face in-vivo detection method and computer storage medium - Google Patents

Uncooperative human face in-vivo detection method and computer storage medium

Info

Publication number
CN111967289A
Authority
CN
China
Prior art keywords
image
model
network
frame
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910420108.9A
Other languages
Chinese (zh)
Inventor
毛亮
张宇聪
张�杰
朱婷婷
林焕凯
郝鹏
刘昕
山世光
黄仝宇
汪刚
宋一兵
侯玉清
刘双广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seetatech Beijing Technology Co ltd
Gosuncn Technology Group Co Ltd
Original Assignee
Seetatech Beijing Technology Co ltd
Gosuncn Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seetatech Beijing Technology Co ltd, Gosuncn Technology Group Co Ltd filed Critical Seetatech Beijing Technology Co ltd
Priority to CN201910420108.9A priority Critical patent/CN111967289A/en
Publication of CN111967289A publication Critical patent/CN111967289A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an uncooperative human face in-vivo detection method and a computer storage medium, wherein the method comprises the following steps: S1, acquiring a living body detection image set for model training, and processing the image set; S2, designing an Inception Net classification network; S3, designing an SSD detection network; S4, performing model training on the image set through the Inception Net classification network and the SSD detection network; and S5, acquiring a real-time image, and predicting whether the real-time image is a living human face according to the model training result. The uncooperative human face in-vivo detection method according to the embodiment of the invention requires no hardware beyond a camera and no cooperation from the person under detection, achieves faster-than-real-time face liveness detection, and is low in cost and convenient to use.

Description

Uncooperative human face in-vivo detection method and computer storage medium
Technical Field
The invention relates to the field of face detection, in particular to an uncooperative face in-vivo detection method and a computer storage medium.
Background
Face recognition has gradually become an important means of identity verification due to its speed, effectiveness and user friendliness, but many current face recognition systems cannot distinguish real faces from fake ones; introducing a liveness detection method into a face recognition system therefore improves its practicability and security. At present, the main human face living body detection methods are the following:
1) Interactive active liveness detection based on video streams. Main technical means: the system first performs face detection and facial key point localization; if a face is present in the video, several actions are randomly generated, and if the tester completes the specified actions within the specified time, the system judges the tester to be a living body, otherwise a non-living body. Problems and disadvantages of this method: user cooperation is required and the process takes a long time.
2) Human face living body detection based on the bright pupil effect. Main technical means: living and non-living faces are distinguished by detecting whether the bright pupil effect is present in the eye region of the face. Problems and disadvantages of this method: additional light source equipment is required, the cost is high, and the operation is complicated.
3) Human face living body detection based on image distortion analysis. Main technical means: the system first performs face detection and facial key point localization; if a face is present in the picture, 4 features (specular reflection, blurriness, moment and color diversity features) are extracted, and a support vector machine is applied for training and prediction. Problems and disadvantages: the extracted features are relatively simple, with weak discrimination and generalization capability, so the method does not transfer well to real scenes.
Therefore, the above problems in the prior art remain unsolved, and a novel uncooperative human face living body detection method is urgently needed.
Disclosure of Invention
In view of the above, the present invention provides an uncooperative human face in-vivo detection method and a computer storage medium, which can effectively improve the accuracy and speed of human face living body detection.
In order to solve the above technical problem, in one aspect, the present invention provides an uncooperative human face in-vivo detection method, including the following steps: S1, acquiring a living body detection image set for model training, and processing the image set; S2, designing an Inception Net classification network; S3, designing an SSD detection network; S4, performing model training on the image set through the Inception Net classification network and the SSD detection network; and S5, acquiring a real-time image, and predicting whether the real-time image is a living human face according to the model training result.
According to the uncooperative human face in-vivo detection method of the embodiment of the invention, the Inception Net classification network and the SSD detection network are used in combination, which greatly improves the accuracy and speed of human face living body detection; the SSD detection network can adapt to training and detection tasks with bounding boxes of various scales, and can detect a target in real time and with high precision. The living body detection method requires no hardware beyond a camera and no cooperation from the person under detection, achieves faster-than-real-time face liveness detection, and is low in cost and convenient to use.
According to some embodiments of the invention, step S1 includes: S11, capturing and saving in batches, with a camera device, a living body detection image set Q; S12, performing bounding box annotation on the images of a first set in the image set Q, and acquiring images G with box annotation information and images P without annotation information, wherein the images of the first set are a subset of the image set Q.
According to some embodiments of the present invention, in step S2, the designed Inception Net classification network is denoted M; it includes a deep neural network A and 3 groups of Inception structures C, where each Inception structure C comprises four branches consisting of 1 × 1 convolution, 3 × 3 convolution, 5 × 5 convolution, and 3 × 3 max pooling, respectively.
According to some embodiments of the present invention, in step S3, the designed SSD detection network is denoted S; its backbone is a VGG16 model in which the fully connected layers are converted into convolutional layers, followed by 4 additional convolutional layers.
According to some embodiments of the invention, step S3 further comprises: convolving the outputs of 5 different convolutional layers with two 3 × 3 convolution kernels each, wherein one convolution outputs classification confidences, each default box generating 3 confidences corresponding respectively to the background, a real person and a non-living face in the liveness detection task, and the other convolution outputs the localization of the target position, each default box generating 4 coordinate values (x, y, w, h); the 5 convolutional layers also pass through a priorBox layer to generate default boxes; and the three calculation results are respectively concatenated and passed to the loss layer.
According to some embodiments of the invention, step S4 includes:
S41, dividing the image set Q into a training set T and a validation set V, such that the training set T and the validation set V each contain images G with box annotation information and images P without annotation information;
S42, performing forward computation on part of the face images P in the training set T through the Inception Net classification network M, outputting a recognition result after the layers of the model, performing model training with a mini-batch stochastic gradient descent algorithm according to the difference between the current network output and the labels of the input features, and continuously adjusting the weights in the Inception Net classification network M;
S43, verifying the training effect of the model with the images P without annotation information in the validation set V, and stopping training when the accuracy of the model M on the validation set V no longer improves with training time, to obtain a model M';
S44, performing forward computation on part of the images G with box annotation information in the training set T through the SSD detection network S, and outputting, after the layers of the model, the target boxes detected by the model and the category corresponding to each box;
S45, comparing the result output by the current network with the target annotation information, and respectively calculating the localization loss $L_{loc}$ and the confidence loss $L_{conf}$; the overall target loss function $L(x, c, l, g)$ of the network is a weighted sum of $L_{loc}$ and $L_{conf}$, as shown in equation (1):

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{1}$$

wherein x represents the matching between the prior boxes and the annotation information; c is the confidence of the prediction boxes output by the model; l denotes the prediction boxes of the model; g denotes the positions of the annotated boxes; α is a parameter adjusting the ratio between the confidence loss and the localization loss; and N is the number of prediction boxes matched to annotated-box information. Model training is performed on the objective function in equation (1) with a mini-batch stochastic gradient descent algorithm, continuously adjusting the weights in the SSD detection network S;
S46, verifying the training effect of the model with the images G with box annotation information in the validation set V, and stopping training when the accuracy of the model S on the validation set V no longer improves with training time, to obtain a model S';
and S47, after a picture passes through M' and S', adding the confidences the two models assign to the real-person category, the sum being recorded as the confidence score of the fusion model D.
According to some embodiments of the invention, step S5 includes: S51, acquiring a real-time RGB image A through a camera device, inputting the image A into a cascaded convolutional neural network model, and performing face detection; if a face is detected, normalizing the face to obtain a normalized face image B, and recording the position k of the face in the image A; S52, inputting the image A into the model M' obtained in step S4 to obtain a result x; S53, inputting the picture B into the model S' obtained in step S4 to obtain a box set Y; S54, judging the overlap ratio between the face position k and each box in the box set Y, and if boxes whose overlap ratio exceeds a set value exist, recording the confidence u of each such box and selecting the maximum value v among all qualifying u; and S55, inputting the result x and the maximum confidence v into the fusion model D obtained in step S4, which gives the human face living body detection prediction result for the image A.
According to some embodiments of the invention, in step S51, the face is normalized as follows: the face region picture in the image A is input into a facial key point detection model for key point localization, and face pictures in different poses are transformed into face pictures in the standard pose by computing the affine transformation from the key points to standard key points, obtaining a normalized face image B.
According to some embodiments of the present invention, in step S53, overlapping or erroneous prediction boxes are removed by a non-maximum suppression algorithm to obtain a box set W, and subsequent operations are performed with the box set W.
In a second aspect, embodiments of the present invention provide a computer storage medium comprising one or more computer instructions that, when executed, implement a method as in the above embodiments.
Drawings
FIG. 1 is a general flowchart of a non-cooperative human face live detection method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of the Inception structure C in the uncooperative human face in-vivo detection method according to the embodiment of the present invention;
FIG. 3 is a flow chart of real-time prediction in the uncooperative human face in-vivo detection method according to the embodiment of the invention;
fig. 4 is a schematic diagram of an electronic device according to an embodiment of the invention.
Reference numerals:
an uncooperative human face in-vivo detection method 100;
an electronic device 300;
a memory 310; an operating system 311; an application 312;
a processor 320; a network interface 330; an input device 340; a hard disk 350; a display device 360.
Detailed Description
The following detailed description of embodiments of the present invention will be made with reference to the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
First, the uncooperative human face liveness detection method 100 according to the embodiment of the present invention will be described in detail with reference to the drawings.
As shown in fig. 1, the uncooperative human face live detection method according to the embodiment of the present invention includes the following steps:
and S1, acquiring a living body detection image set for model training, and processing the image set.
And S2, designing an Inception Net classification network.
And S3, designing the SSD detection network.
S4, performing model training on the image set through the Inception Net classification network and the SSD detection network.
And S5, acquiring a real-time image, and predicting whether the real-time image is a human face living body according to the model training result.
In other words, according to the uncooperative human face living body detection method of the embodiment of the invention, before the real-time face image is acquired, living body detection image sets for model training are collected and annotated; the Inception Net classification network and the SSD detection network are then designed, and model training is performed on the image sets through them; on the basis of the trained models, a real-time image is acquired, and whether it shows a living face is predicted with the fusion model obtained from training.
The method classifies the authenticity of a face with a classifier fusing an SSD detection network and a face Inception Net classification network. Its technical advantage: the SSD detection network detects the environment information around the face, the face Inception Net classifies the content of the face region, and the fusion of the two models yields a better liveness detection effect than a single classifier or detector.
Therefore, according to the uncooperative human face in-vivo detection method of the embodiment of the invention, the Inception Net classification network and the SSD detection network are used in combination, which greatly improves the accuracy and speed of human face living body detection; the SSD detection network can adapt to training and detection tasks with bounding boxes of various scales, and can detect a target in real time and with high precision. The living body detection method requires no hardware beyond a camera and no cooperation from the person under detection, achieves faster-than-real-time face liveness detection, and is low in cost and convenient to use.
The following describes the steps of the uncooperative human face live detection method according to the embodiment of the invention.
In some embodiments of the present invention, step S1 includes:
S11, a living body detection image set Q is captured and saved in batches with a camera device.
S12, bounding box annotation is performed on the images of a first set in the image set Q, acquiring images G with box annotation information and images P without annotation information, wherein the images of the first set are a subset of the image set Q.
Specifically, part of the images in the image set Q may be box-annotated manually, where an image with box annotation information is denoted G and an image without box annotation information is denoted P, as sketched below.
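For illustration only, the resulting split could be held in memory as follows; the record fields and the (x, y, w, h) box format are assumptions for the sketch, as the patent does not fix a storage layout.

```python
# A minimal sketch of how the processed set Q might be organized.
image_set_Q = [
    # Image G: carries per-box annotation information (class per box).
    {"path": "q/0001.jpg", "label": None,
     "boxes": [{"cls": "real", "xywh": [120, 80, 96, 96]}]},
    # Image P: only a whole-image real/spoof label, no boxes.
    {"path": "q/0002.jpg", "label": "spoof", "boxes": None},
]

G = [s for s in image_set_Q if s["boxes"] is not None]  # box-annotated subset
P = [s for s in image_set_Q if s["boxes"] is None]      # unannotated subset
```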
In this way, the batch-saved living body detection image set is labeled, so that the authenticity of faces can subsequently be classified.
In step S2, the Inception Net classification network is denoted M and includes a deep neural network A and 3 groups of Inception structures C; each Inception structure C comprises four branches, consisting of 1 × 1 convolution, 3 × 3 convolution, 5 × 5 convolution, and 3 × 3 max pooling, respectively.
Specifically, the Inception Net network structure is deeper than a VGG deep learning convolutional network but has fewer parameters and higher computational efficiency. The newly designed Inception Net classification network, denoted M, combines a deep neural network A (an 8-layer convolutional neural network) with 3 groups of Inception structures C; each Inception structure C uses four branches consisting of 1 × 1 convolution, 3 × 3 convolution, 5 × 5 convolution, and 3 × 3 max pooling, respectively; the feature map of the previous layer produces four outputs after passing through the four branches, and the outputs of the four branches are concatenated along the channel dimension as the input of the next layer.
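For illustration, the Inception structure C could be sketched in PyTorch as follows; this is a minimal sketch in which the branch channel counts are assumptions, and the pooling branch keeps the input channels since the patent describes no projection after it.

```python
import torch
import torch.nn as nn

class InceptionC(nn.Module):
    """Sketch of the four-branch Inception structure C: 1x1, 3x3 and 5x5
    convolutions plus 3x3 max pooling, concatenated along channels.
    Channel counts c1, c3, c5 are illustrative assumptions."""
    def __init__(self, in_ch, c1=32, c3=64, c5=32):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, c3, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, c5, kernel_size=5, padding=2)
        self.bp = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Four parallel branches, superposed (concatenated) in the channel dim.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# The previous layer's feature map passes through the four branches in parallel:
x = torch.randn(1, 128, 28, 28)
y = InceptionC(128)(x)   # -> (1, 32 + 64 + 32 + 128, 28, 28)
```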
Several Inception structures C are added after the last convolutional layer of the deep neural network A, which increases the width of the network as well as its adaptability to different scales; therefore, compared with the deep neural network A alone, the Inception Net classification network M gives a better face liveness classification effect.
Generally, the most direct way to improve the performance of the deep neural network A is to increase its depth and width, but this means a huge number of parameters, which easily causes overfitting and greatly increases the amount of computation. In the present invention, as shown in fig. 2, the Inception structure C performs several convolution or pooling operations on the input in parallel and concatenates all output results into a very deep feature map. Since different convolution operations (1 × 1, 3 × 3, 5 × 5, etc.) and pooling operations obtain different information about the input image, processing these operations in parallel and combining all the results yields a better image representation. The newly designed Inception Net classification network can thus keep the network structure sparse while exploiting the high computational performance of dense matrices, improving the training effect.
Therefore, in the uncooperative human face in-vivo detection method of the embodiment of the invention, the face Inception Net classification network adds Inception structures on top of a conventional convolutional neural network. Its technical advantage: the classification effect of the face liveness classifier is greatly improved.
Optionally, in some embodiments of the present invention, in step S3, the SSD detection network is denoted S; its backbone is a VGG16 model in which the fully connected layers are converted into convolutional layers, followed by 4 additional convolutional layers.
Step S3 further includes:
convolving the outputs of 5 different convolutional layers with two 3 × 3 convolution kernels each, wherein one convolution outputs classification confidences, each default box generating 3 confidences corresponding respectively to the background, a real person and a non-living face in the liveness detection task, and the other convolution outputs the localization of the target position, each default box generating 4 coordinate values (x, y, w, h).
The 5 convolutional layers also pass through the priorBox layer to generate default boxes.
The three calculation results are respectively concatenated and passed to the loss layer.
In other words, the SSD (Single Shot MultiBox Detector) detection network is an object detection algorithm that directly predicts the coordinates and categories of boxes, without a candidate region generation step. For detecting objects of different sizes, the conventional approach is to convert the image into different sizes, process each separately, and finally integrate the results.
The newly designed SSD detection network is denoted S; its backbone structure is a VGG16 model in which two fully connected layers are changed into convolutional layers, followed by 4 additional convolutional layers. The outputs of 5 different convolutional layers (conv4_3, conv7, conv9_2, conv10_2 and conv11_2) are each convolved with two 3 × 3 convolution kernels: one convolution outputs classification confidences, each default box generating 3 confidences (corresponding respectively to the background, a real person and a non-living face in the liveness detection task); the other convolution outputs the localization of the target position, each default box generating 4 coordinate values (x, y, w, h). In addition, the 5 convolutional layers also pass through the priorBox layer to generate default boxes. Finally, the three calculation results are respectively concatenated and passed to the loss layer. Here, a default box is one of a series of boxes with fixed sizes attached to each cell of a feature map, and the priorBox layer is the network layer that deploys the default boxes at each location (pixel) of the feature map.
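For illustration, the two 3 × 3 prediction heads applied to each of the 5 source layers could be sketched as follows; the channel and feature-map sizes and the number of default boxes per cell are assumptions, as the patent does not specify them.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 3   # background, real person, non-living face
K = 4             # default boxes per feature-map cell (assumption)

class SSDHead(nn.Module):
    """Sketch of the per-feature-map SSD heads: two 3x3 convolutions, one
    producing the 3 class confidences and one producing the (x, y, w, h)
    coordinates for each default box."""
    def __init__(self, in_ch):
        super().__init__()
        self.conf = nn.Conv2d(in_ch, K * NUM_CLASSES, kernel_size=3, padding=1)
        self.loc = nn.Conv2d(in_ch, K * 4, kernel_size=3, padding=1)

    def forward(self, fmap):
        b = fmap.size(0)
        # (B, H*W*K, 3) confidences and (B, H*W*K, 4) box coordinates.
        conf = self.conf(fmap).permute(0, 2, 3, 1).reshape(b, -1, NUM_CLASSES)
        loc = self.loc(fmap).permute(0, 2, 3, 1).reshape(b, -1, 4)
        return conf, loc

# One head per source layer (conv4_3, conv7, conv9_2, conv10_2, conv11_2);
# the per-layer results are concatenated before the loss layer.
fmaps = [torch.randn(1, c, s, s)
         for c, s in [(512, 38), (1024, 19), (256, 10), (256, 5), (256, 3)]]
heads = [SSDHead(f.size(1)) for f in fmaps]
confs, locs = zip(*(h(f) for h, f in zip(heads, fmaps)))
all_conf = torch.cat(confs, dim=1)  # merged classification branch
all_loc = torch.cat(locs, dim=1)    # merged localization branch
```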
Generally, a target detector first generates candidate regions by some method and then classifies and precisely localizes them, but this resampling process takes a lot of time and slows detection down. The newly designed SSD detection network localizes and classifies targets in a picture directly: it generates a series of boxes with fixed sizes on each cell of several feature maps at specific layers, so it can detect targets with different aspect ratios.
Therefore, in the uncooperative human face living body detection method of the embodiment of the invention, the SSD detection network adopts a one-stage design to improve detection speed, incorporates the anchor idea from Faster R-CNN, and sequentially performs hierarchical feature extraction, box regression and classification. Its technical advantage: the SSD detection network can adapt to training and detection tasks with bounding boxes of various scales, and can detect a target in real time and with high precision.
According to an embodiment of the present invention, the step S4 includes the following steps:
and S41, dividing the image set Q into a training set T and a check set V, and enabling the training set T and the check set V to respectively contain an image G with frame annotation information and an image P without annotation information.
S42, performing forward calculation on part of the face images P in the training set T through the Incepration Net classification network M, outputting recognition results after each layer of the model, performing model training by applying a batch stochastic gradient descent algorithm according to the label difference between the current network output and input features, and continuously adjusting each weight in the Incepration Net classification network M.
S43, verifying the training effect of the model by using the image P without the labeling information in the verification set V, and stopping the training of the model when the accuracy of the model M on the verification set V cannot be continuously improved along with the training time to obtain the model M'.
And S44, performing forward calculation on part of the image G with the frame marking information in the training set T through the SSD detection network S, and outputting the target frame detected by the model and the category corresponding to each frame after passing through each layer of the model.
S45, comparing the result output by the current network with the target annotation information, and respectively calculating the localization loss $L_{loc}$ and the confidence loss $L_{conf}$; the overall target loss function $L(x, c, l, g)$ of the network is a weighted sum of $L_{loc}$ and $L_{conf}$, as shown in equation (1):

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{1}$$

wherein x represents the matching between the prior boxes and the annotation information; c is the confidence of the prediction boxes output by the model; l denotes the prediction boxes of the model; g denotes the positions of the annotated boxes; α is a parameter adjusting the ratio between the confidence loss and the localization loss; and N is the number of prediction boxes matched to annotated-box information. Model training is performed on the objective function in equation (1) with a mini-batch stochastic gradient descent algorithm, continuously adjusting the weights in the SSD detection network S.
S46, verifying the training effect of the model with the images G with box annotation information in the validation set V, and stopping training when the accuracy of the model S on the validation set V no longer improves with training time, to obtain the model S'.
S47, after a picture passes through M' and S', adding the confidences the two models assign to the real-person category, the sum being recorded as the confidence score of the fusion model D.
In step S42, forward computation is performed on part of the face images P in the training set T through the Inception Net classification network M, and a recognition result is output after the layers of the model (n layers, i.e., M1, … Mn); each layer consists of many neurons, each with a preset weight. Model training is performed with a mini-batch stochastic gradient descent algorithm according to the difference between the current network output and the labels of the input features, continuously adjusting the weights in the Inception Net classification network M.
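As an illustration of steps S42 and S43, the following is a minimal training sketch under stated assumptions: PyTorch, momentum SGD, and a patience counter for the validation-accuracy plateau are choices not fixed by the patent.

```python
import torch

def train_with_early_stopping(model, train_loader, val_loader,
                              patience=5, lr=0.01, max_epochs=100):
    """Sketch of S42/S43: mini-batch SGD adjusts the weights from the
    output/label difference, and training stops once accuracy on the
    validation set V no longer improves (patience epochs in a row)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    best_acc, stale = 0.0, 0
    for _ in range(max_epochs):
        model.train()
        for images, labels in train_loader:        # one epoch of batch SGD
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:      # accuracy on validation set V
                correct += (model(images).argmax(1) == labels).sum().item()
                total += labels.numel()
        acc = correct / total
        if acc > best_acc:
            best_acc, stale = acc, 0               # still improving
        else:
            stale += 1                             # plateau counter
        if stale >= patience:
            break
    return model                                   # this is the model M'
```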
In step S44, the model has k layers, i.e., S1, … Sk, and each layer consists of many neurons, each with a preset weight.
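Equation (1) can be written out concretely as follows. This is a sketch: cross-entropy for the confidence term and smooth L1 for the localization term follow the standard SSD formulation cited in the non-patent literature (and real SSD training adds hard negative mining), since the patent itself only defines the weighted sum.

```python
import torch
import torch.nn.functional as F

def ssd_loss(conf_pred, loc_pred, cls_target, loc_target, matched_mask, alpha=1.0):
    """Sketch of equation (1): L(x,c,l,g) = (1/N) * (L_conf + alpha * L_loc).
    conf_pred: (B, D, 3) class scores, loc_pred: (B, D, 4) box coordinates,
    cls_target: (B, D) class indices, matched_mask: (B, D) bool marking
    prediction boxes matched to annotated boxes."""
    N = matched_mask.sum().clamp(min=1).float()   # number of matched boxes
    l_conf = F.cross_entropy(conf_pred.reshape(-1, conf_pred.size(-1)),
                             cls_target.reshape(-1), reduction="sum")
    l_loc = F.smooth_l1_loss(loc_pred[matched_mask], loc_target[matched_mask],
                             reduction="sum")
    return (l_conf + alpha * l_loc) / N
```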
In step S47, the trained Inception Net classification network M' and the trained SSD detection network S' each recognize the pictures in the training set well; fusing the two models gives stronger discrimination than either single model, and the fused model is denoted D. The fusion method is: after a picture passes through M' and S', the confidences the two models assign to the real-person category are added, and the sum is recorded as the confidence score of the model D.
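The fusion rule of step S47 is a simple sum, as the following sketch shows; how the summed score is thresholded into a final decision is not fixed by the patent.

```python
def fusion_score_D(conf_M_real, conf_S_real):
    """Sketch of S47: the confidences that the classification model M' and
    the detection model S' assign to the real-person category are added and
    recorded as the score of the fusion model D."""
    return conf_M_real + conf_S_real

# A picture scoring 0.9 with M' and 0.8 with S' gets fusion score 1.7.
```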
Therefore, the uncooperative human face in-vivo detection method of the embodiment of the invention classifies the authenticity of a face with a classifier fusing an SSD detection network and a face Inception Net classification network. Its technical advantage: the SSD detection network detects the environment information around the face, the face Inception Net classifies the content of the face region, and the fusion of the two models yields a better liveness detection effect than a single classifier or detector.
As shown in fig. 3, according to an embodiment of the present invention, step S5 includes:
S51, acquiring a real-time RGB image A through a camera device, inputting the image A into a cascaded convolutional neural network model, and performing face detection; if a face is detected, normalizing the face to obtain a normalized face image B, and recording the position k of the face in the image A.
S52, inputting the image A into the model M' obtained in step S4 to obtain a result x.
S53, inputting the picture B into the model S' obtained in step S4 to obtain a box set Y.
S54, judging the overlap ratio between the face position k and each box in the box set Y; if boxes whose overlap ratio exceeds a set value exist, recording the confidence u of each such box; and selecting the maximum value v among all qualifying u.
S55, inputting the result x and the maximum confidence v into the fusion model D obtained in step S4, which gives the human face living body detection prediction result for the image A.
Specifically, in step S51, a real-time RGB image A is acquired by a camera device, and the image A is input into a cascaded convolutional neural network (Cascade CNN) model for face detection. If a face is detected, the face region picture in the image A is input into a facial key point detection model for key point localization, and face pictures in different poses are transformed into face pictures in the standard pose by computing the affine transformation from the key points to standard key points, obtaining a normalized face image B; the position k of the face in the image A is also recorded.
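A minimal sketch of this normalization, assuming OpenCV, five facial key points and a 112 × 112 standard crop; none of these specifics are fixed by the patent.

```python
import cv2
import numpy as np

# Standard key point positions in the normalized crop. These five-point
# coordinates (eyes, nose tip, mouth corners for a 112x112 crop) are a
# common alignment target and are an assumption here.
STD_POINTS = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                         [41.5, 92.4], [70.7, 92.2]])

def normalize_face(image_a, key_points):
    """Sketch of S51's normalization: estimate the affine transform mapping
    the detected key points onto the standard key points, then warp the face
    into the standard pose, giving the normalized face image B."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(key_points), STD_POINTS)
    return cv2.warpAffine(image_a, M, (112, 112))
```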
In step S53, since the model S' generates a large number of prediction boxes, many of which are erroneous or overlapping, a non-maximum suppression algorithm removes the overlapping or erroneous prediction boxes to obtain the box set W.
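A minimal sketch of such a non-maximum suppression pass over the prediction boxes; the IoU threshold value is an assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Sketch of the non-maximum suppression in S53: greedily keep the
    highest-confidence box and drop boxes overlapping it beyond iou_thresh.
    boxes is an (N, 4) array in (x1, y1, x2, y2) form."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # Intersection of box i with the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]   # drop overlapping boxes
    return keep  # indices of the surviving box set W
```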
In other words, after the trained models are established, real-time detection proceeds as follows: a real-time image is acquired by a camera device and checked for a face; if a face is detected, normalization yields the normalized face image B and the position k of the face in the image A is recorded; the image A and the image B are respectively input into the models obtained in step S4; the overlap ratio between the face position k and each box in the box set Y is evaluated; and finally the classification result and the maximum confidence are input into the fusion model, which predicts the face liveness detection result.
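Putting steps S54 and S55 together, the following sketches the overlap test and the final fused decision; the overlap threshold and the decision threshold are assumptions, since the patent only speaks of "a set value".

```python
def iou(a, b):
    """Overlap ratio between two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def predict_liveness(x, face_pos_k, boxes_Y, confs_Y,
                     overlap_thresh=0.5, decide=1.0):
    """Sketch of S54/S55: among boxes in Y overlapping the face position k
    beyond the set value, take the maximum real-person confidence v, add it
    to the classification result x, and threshold the fused score."""
    us = [u for box, u in zip(boxes_Y, confs_Y)
          if iou(face_pos_k, box) > overlap_thresh]
    v = max(us, default=0.0)       # maximum confidence among qualifying boxes
    return (x + v) > decide        # True -> predicted living face
```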
In summary, according to the uncooperative human face in-vivo detection method of the embodiment of the invention, the Inception Net classification network and the SSD detection network are used in combination, which greatly improves the accuracy and speed of human face living body detection; the SSD detection network can adapt to training and detection tasks with bounding boxes of various scales, and can detect a target in real time and with high precision. The living body detection method requires no hardware beyond a camera and no cooperation from the person under detection, achieves faster-than-real-time face liveness detection, and is low in cost and convenient to use.
In addition, the present invention also provides a computer storage medium, which includes one or more computer instructions, and when executed, the one or more computer instructions implement any of the above-mentioned uncooperative human face liveness detection methods 100.
That is, the computer storage medium stores a computer program that, when executed by a processor, causes the processor to perform any of the above-described uncooperative face liveness detection methods 100.
As shown in fig. 4, an embodiment of the present invention provides an electronic device 300, which includes a memory 310 and a processor 320, where the memory 310 is configured to store one or more computer instructions, and the processor 320 is configured to call and execute the one or more computer instructions, so as to implement any one of the methods described above.
That is, the electronic device 300 includes: a processor 320 and a memory 310, in which memory 310 computer program instructions are stored, wherein the computer program instructions, when executed by the processor, cause the processor 320 to perform any of the methods 100 described above.
Further, as shown in fig. 4, the electronic device 300 further includes a network interface 330, an input device 340, a hard disk 350, and a display device 360.
The various interfaces and devices described above may be interconnected by a bus architecture. A bus architecture may be any architecture that may include any number of interconnected buses and bridges. Various circuits of one or more Central Processing Units (CPUs), represented in particular by processor 320, and one or more memories, represented by memory 310, are coupled together. The bus architecture may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like. It will be appreciated that a bus architecture is used to enable communications among the components. The bus architecture includes a power bus, a control bus, and a status signal bus, in addition to a data bus, all of which are well known in the art and therefore will not be described in detail herein.
The network interface 330 may be connected to a network (e.g., the internet, a local area network, etc.), and may obtain relevant data from the network and store the relevant data in the hard disk 350.
The input device 340 may receive various commands input by an operator and send the commands to the processor 320 for execution. The input device 340 may include a keyboard or a pointing device (e.g., a mouse, a trackball, a touch pad, a touch screen, or the like).
The display device 360 may display the result of the instructions executed by the processor 320.
The memory 310 is used for storing programs and data necessary for operating the operating system, and data such as intermediate results in the calculation process of the processor 320.
It will be appreciated that memory 310 in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. The memory 310 of the apparatus and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 310 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof: an operating system 311 and application programs 312.
The operating system 311 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs 312 include various application programs, such as a Browser (Browser), and are used for implementing various application services. A program implementing methods of embodiments of the present invention may be included in application 312.
The method disclosed by the above embodiment of the present invention can be applied to the processor 320, or implemented by the processor 320. Processor 320 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 320. The processor 320 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 310, and the processor 320 reads the information in the memory 310 and completes the steps of the method in combination with the hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
In particular, the processor 320 is also configured to read the computer program and execute any of the methods described above.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the transceiving method according to various embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An uncooperative human face living body detection method, characterized by comprising the following steps:
s1, acquiring a living body detection image set for model training, and processing the image set;
s2, designing an Inception Net classification network;
s3, designing an SSD detection network;
s4, performing model training on the image set through the Inception Net classification network and the SSD detection network;
and S5, acquiring a real-time image, and predicting whether the real-time image is a human face living body according to the model training result.
2. The method according to claim 1, wherein step S1 includes:
s11, capturing and saving in batches, with a camera device, a living body detection image set Q;
s12, performing bounding box annotation on the images of a first set in the image set Q, and acquiring images G with box annotation information and images P without annotation information, wherein the images of the first set are a subset of the image set Q.
3. The method according to claim 2, wherein in step S2, the designed Inception Net classification network is denoted M and comprises a deep neural network A and 3 groups of Inception structures C, wherein each Inception structure C comprises four branches consisting of 1 × 1 convolution, 3 × 3 convolution, 5 × 5 convolution, and 3 × 3 max pooling, respectively.
4. The method of claim 2, wherein in step S3, the designed SSD detection network is denoted S, and the backbone of the SSD detection network is a VGG16 model in which the fully connected layers are converted into convolutional layers, followed by 4 additional convolutional layers.
5. The method according to claim 4, wherein step S3 further comprises:
convolving the outputs of the 5 different convolutional layers with two 3 × 3 convolution kernels each, wherein one convolution outputs classification confidences, each default box generating 3 confidences corresponding respectively to the background, a real person and a non-living face in the liveness detection task, and the other convolution outputs the localization of the target position, each default box generating 4 coordinate values (x, y, w, h);
the 5 convolutional layers also pass through a priorBox layer to generate default boxes;
and the three calculation results are respectively concatenated and passed to the loss layer.
6. The method according to claim 2, wherein step S4 includes:
s41, dividing the image set Q into a training set T and a validation set V, such that the training set T and the validation set V each contain images G with box annotation information and images P without annotation information;
s42, performing forward computation on part of the face images P in the training set T through the Inception Net classification network M, outputting a recognition result after the layers of the model, performing model training with a mini-batch stochastic gradient descent algorithm according to the difference between the current network output and the labels of the input features, and continuously adjusting the weights in the Inception Net classification network M;
s43, verifying the training effect of the model with the images P without annotation information in the validation set V, and stopping training when the accuracy of the model M on the validation set V no longer improves with training time, to obtain a model M';
s44, performing forward computation on part of the images G with box annotation information in the training set T through the SSD detection network S, and outputting, after the layers of the model, the target boxes detected by the model and the category corresponding to each box;
s45, comparing the result output by the current network with the target annotation information, and respectively calculating the localization loss $L_{loc}$ and the confidence loss $L_{conf}$; the overall target loss function $L(x, c, l, g)$ of the network is a weighted sum of $L_{loc}$ and $L_{conf}$, as shown in equation (1):

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{1}$$

wherein x represents the matching between the prior boxes and the annotation information; c is the confidence of the prediction boxes output by the model; l denotes the prediction boxes of the model; g denotes the positions of the annotated boxes; α is a parameter adjusting the ratio between the confidence loss and the localization loss; and N is the number of prediction boxes matched to annotated-box information; performing model training on the objective function in equation (1) with a mini-batch stochastic gradient descent algorithm, and continuously adjusting the weights in the SSD detection network S;
s46, verifying the training effect of the model with the images G with box annotation information in the validation set V, and stopping training when the accuracy of the model S on the validation set V no longer improves with training time, to obtain a model S';
and s47, after a picture passes through M' and S', adding the confidences the two models assign to the real-person category, the sum being recorded as the confidence score of the fusion model D.
7. The method according to claim 6, wherein step S5 includes:
s51, acquiring a real-time RGB image A through a camera device, inputting the image A into a cascaded convolutional neural network model, and performing face detection; if a face is detected, normalizing the face to obtain a normalized face image B, and recording the position k of the face in the image A;
s52, inputting the image A into the model M' obtained in step S4 to obtain a result x;
s53, inputting the picture B into the model S' obtained in step S4 to obtain a box set Y;
s54, judging the overlap ratio between the face position k and each box in the box set Y; if boxes whose overlap ratio exceeds a set value exist, recording the confidence u of each such box; and selecting the maximum value v among all qualifying u;
and s55, inputting the result x and the maximum confidence v into the fusion model D obtained in step S4, the fusion model D giving the human face living body detection prediction result for the image A.
8. The method according to claim 7, wherein in step S51, the step of normalizing the face is:
inputting the face region picture in the image A into a facial key point detection model for key point localization, and transforming face pictures in different poses into face pictures in the standard pose by computing the affine transformation from the key points to standard key points, to obtain the normalized face image B.
9. The method of claim 7, wherein in step S53, overlapping or erroneous prediction boxes are removed by a non-maximum suppression algorithm to obtain a box set W, and subsequent operations are performed with the box set W.
10. A computer storage medium comprising one or more computer instructions which, when executed, implement the method of any of claims 1-9.
CN201910420108.9A 2019-05-20 2019-05-20 Uncooperative human face in-vivo detection method and computer storage medium Pending CN111967289A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910420108.9A CN111967289A (en) 2019-05-20 2019-05-20 Uncooperative human face in-vivo detection method and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910420108.9A CN111967289A (en) 2019-05-20 2019-05-20 Uncooperative human face in-vivo detection method and computer storage medium

Publications (1)

Publication Number Publication Date
CN111967289A true CN111967289A (en) 2020-11-20

Family

ID=73357670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910420108.9A Pending CN111967289A (en) 2019-05-20 2019-05-20 Uncooperative human face in-vivo detection method and computer storage medium

Country Status (1)

Country Link
CN (1) CN111967289A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613470A (en) * 2020-12-30 2021-04-06 山东山大鸥玛软件股份有限公司 Face silence living body detection method, device, terminal and storage medium
CN113486699A (en) * 2021-05-07 2021-10-08 成都理工大学 Automatic detection method and device for fatigue driving

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389554A (en) * 2015-11-06 2016-03-09 北京汉王智远科技有限公司 Face-identification-based living body determination method and equipment
CN107818313A (en) * 2017-11-20 2018-03-20 腾讯科技(深圳)有限公司 Vivo identification method, device, storage medium and computer equipment
CN108038456A (en) * 2017-12-19 2018-05-15 中科视拓(北京)科技有限公司 A kind of anti-fraud method in face identification system
CN108182409A (en) * 2017-12-29 2018-06-19 北京智慧眼科技股份有限公司 Biopsy method, device, equipment and storage medium
CN108416304A (en) * 2018-03-12 2018-08-17 中科视拓(北京)科技有限公司 A kind of three classification method for detecting human face using contextual information
CN108596082A (en) * 2018-04-20 2018-09-28 重庆邮电大学 Human face in-vivo detection method based on image diffusion velocity model and color character
CN108875618A (en) * 2018-06-08 2018-11-23 高新兴科技集团股份有限公司 A kind of human face in-vivo detection method, system and device
CN109166196A (en) * 2018-06-21 2019-01-08 广东工业大学 A kind of hotel's disengaging personnel management methods based on single sample recognition of face
CN108985200A (en) * 2018-07-02 2018-12-11 中国科学院半导体研究所 A kind of In vivo detection algorithm of the non-formula based on terminal device
CN109598242A (en) * 2018-12-06 2019-04-09 中科视拓(北京)科技有限公司 A kind of novel biopsy method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHRISTIAN SZEGEDY et al.: "Going deeper with convolutions", arXiv, pages 3-4 *
WEI LIU et al.: "SSD: Single Shot MultiBox Detector", arXiv, pages 3-4 *
XIAO SONG et al.: "Discriminative Representation Combinations for Accurate Face Spoofing Detection", arXiv, page 3 *


Similar Documents

Publication Publication Date Title
US11011275B2 (en) System and method for diagnosing gastrointestinal neoplasm
Zhang et al. Ensnet: Ensconce text in the wild
Yan et al. Face detection by structural models
US8750573B2 (en) Hand gesture detection
US8792722B2 (en) Hand gesture detection
Chen et al. Adversarial occlusion-aware face detection
US20120183212A1 (en) Identifying descriptor for person or object in an image
CN109657533A (en) Pedestrian recognition methods and Related product again
JP2017062781A (en) Similarity-based detection of prominent objects using deep cnn pooling layers as features
WO2023010758A1 (en) Action detection method and apparatus, and terminal device and storage medium
US20130251246A1 (en) Method and a device for training a pose classifier and an object classifier, a method and a device for object detection
Wang et al. A coupled encoder–decoder network for joint face detection and landmark localization
CN111695392B (en) Face recognition method and system based on cascade deep convolutional neural network
CN111079519B (en) Multi-gesture human body detection method, computer storage medium and electronic equipment
US20210272292A1 (en) Detection of moment of perception
US20230137337A1 (en) Enhanced machine learning model for joint detection and multi person pose estimation
JP2013206458A (en) Object classification based on external appearance and context in image
WO2023109361A1 (en) Video processing method and system, device, medium and product
JP2009064434A (en) Determination method, determination system and computer readable medium
US20240087352A1 (en) System for identifying companion animal and method therefor
CN111967289A (en) Uncooperative human face in-vivo detection method and computer storage medium
Wang et al. Multistage model for robust face alignment using deep neural networks
JP7396076B2 (en) Number recognition device, method and electronic equipment
Oguine et al. Yolo v3: Visual and real-time object detection model for smart surveillance systems (3s)
Li et al. A novel method for lung masses detection and location based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination