CN112232204A - Living body detection method based on infrared image - Google Patents

Living body detection method based on infrared image

Info

Publication number
CN112232204A
Authority
CN
China
Prior art keywords
face
detector
living body
key point
preset
Prior art date
Legal status
Granted
Application number
CN202011106811.1A
Other languages
Chinese (zh)
Other versions
CN112232204B (en)
Inventor
Yan An
Zhou Zhiyin
Current Assignee
Shanghai Dianze Intelligent Technology Co ltd
Zhongke Zhiyun Technology Co ltd
Original Assignee
Shanghai Dianze Intelligent Technology Co ltd
Zhongke Zhiyun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Dianze Intelligent Technology Co ltd and Zhongke Zhiyun Technology Co ltd
Priority to CN202011106811.1A
Publication of CN112232204A
Application granted
Publication of CN112232204B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention belongs to the technical field of face recognition, and particularly relates to a real-time multifunctional face detection method. The living body detection method based on the infrared image comprises the following steps: collecting an infrared picture and preprocessing it; feeding the picture into a detector to obtain a face-frame prediction, face key points and a mask recognition result; decoding the face-frame prediction and the face key points; eliminating overlapping detection frames with a non-maximum suppression algorithm at a threshold of 0.4 to obtain the final face detection frame, face key points and mask recognition result; extracting the coordinates x and y of the two eyes from the face key points and extending them by a preset number of pixels in four directions to obtain eye images; and judging with a living body recognition neural network whether each eye image shows a living body. The invention achieves real-time detection on a mobile terminal that has only a CPU and accurately detects the eye positions.

Description

Living body detection method based on infrared image
Technical Field
The invention belongs to the technical field of face recognition, and particularly relates to a real-time multifunctional face detection method.
Background
A face recognition system takes face recognition technology as its core. It is an emerging biometric technology and a high-precision focus of the current international scientific and technological field. Widely applied to regional feature analysis, it integrates computer image processing with biostatistics: image processing extracts portrait feature points from video, and biostatistical analysis builds the mathematical model, giving the technology broad development prospects. Face detection is a key link in an automatic face recognition system. However, the human face exhibits quite complicated variations in detail: different appearances such as face shape and skin color, different expressions such as the opening and closing of the eyes and mouth, mask occlusion, and so on. These intrinsic and extrinsic variations make face detection a complex and challenging pattern detection problem within face recognition systems.
Although face detection algorithms based on convolutional neural networks have been studied extensively, existing algorithms on mobile devices cannot run in real time, particularly when only a CPU is available.
In addition, existing face detection serves a single function: it cannot accurately locate the eye position, dynamic living-body detection requires multiple steps, is easily affected by external conditions such as natural illumination, and lacks robustness.
Disclosure of Invention
The invention aims to solve the technical problems that existing face detection cannot accurately detect the eye position and that dynamic living-body detection requires numerous steps, and provides a living body detection method based on an infrared image.
The living body detection method based on the infrared image comprises the following steps:
acquiring an infrared picture and preprocessing the picture;
the picture is put into a preset detector for prediction, features obtained through four different convolution layers in a backbone network of the detector are combined with anchor points with multiple sizes, face detection, face key point detection and mask recognition are carried out, and a face frame prediction value, a face key point and a mask recognition result are obtained;
decoding the face-frame prediction and converting it into the real position of the bounding box, and decoding the face key points and converting them into the real key point positions;
eliminating overlapping detection frames with a non-maximum suppression algorithm at a threshold of 0.4 to obtain the final face detection frame, face key points and mask recognition result, comprising the upper-left corner coordinate, the lower-right corner coordinate, the two eye coordinates, the nose coordinate, a pair of mouth-corner coordinates, and the confidence of mask wearing;
extracting coordinates x and y of two eyes according to the key points of the face, and extending the x and y to preset pixels in four directions respectively to obtain an eye image;
and judging whether the eye image is a living body by adopting a preset living body recognition neural network to obtain a judgment result.
Optionally, before the picture is placed in a preset detector for prediction, the method further includes:
loading preset pre-training network parameters to the detector, and generating a default anchor point according to the size and length-width ratio of the preset anchor point;
training the detector through a preset data set to obtain a trained detector;
the detector includes a backbone network, a prediction layer, and a multi-tasking loss layer.
Optionally, the training the detector through a preset data set to obtain a trained detector includes:
acquiring unoccluded data and occluded data serving as a data set, converting a BGR picture in the data set into a YUV format, only storing data of a Y channel, and then performing data enhancement to obtain an enhanced data set;
performing network training with a stochastic optimization algorithm with momentum 0.9 and weight decay factor 0.0005, wherein hard example mining is used to reduce the imbalance between positive and negative samples, the initial learning rate is set to $10^{-3}$ and is reduced by a factor of 10 after the 50th and the 100th training epoch, and during training each prediction is first matched to the anchor with the best Jaccard overlap, after which anchors are matched to any face whose Jaccard overlap exceeds a threshold of 0.35.
Optionally, the unoccluded data are face pictures taken without a mask, the occluded data are face pictures taken with a mask worn, and the occluded data outnumber the unoccluded data.
Optionally, the performing data enhancement includes:
adding data to prevent model overfitting by applying a combination of at least one or more of color distortion, increased brightness contrast, random cropping, horizontal flipping, and transformation channels to pictures in the data set.
Optionally, putting the picture into the preset detector for prediction, combining the features obtained from four different convolutional layers in the detector's backbone network with anchor points of multiple sizes, and performing face detection, face key point detection and mask recognition to obtain the face-frame prediction, the face key points and the mask recognition result, includes:
the pictures are put into the trained detector for prediction, and during prediction the features of the 8th, 11th, 13th and 15th convolutional layers in the backbone network are fed into the respective prediction layers for face-frame localization, face key point localization and mask recognition;
for each anchor point, the prediction comprises 4 coordinate offsets and N classification scores, where N is 2; for each anchor point during detector training, the following multi-task loss function is minimized:

$$L = L_{obj}(p_i, p_i^*) + \lambda_1 p_i^* L_{box}(t_i, t_i^*) + \lambda_2 p_i^* L_{landmark}(l_i, l_i^*)$$

wherein $L_{obj}$ is a cross-entropy loss function detecting whether an anchor contains a target, $p_i$ is the probability that the anchor contains a target, and $p_i^* = 1$ if the anchor contains a target, 0 otherwise; $L_{box}$ adopts the smooth-L1 loss function for face-box localization, $t_i = \{t_x, t_y, t_w, t_h\}_i$ is the coordinate offset of the prediction box, and $t_i^*$ is the coordinate offset of the positive-sample anchor; $L_{landmark}$ adopts the smooth-L1 loss function for face key point localization, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}, \ldots, l_{x5}, l_{y5}\}_i$ is the predicted key point offset, and $l_i^*$ is the key point coordinate offset of the positive sample; if the sample wears a mask, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}\}_i$ and $l_i^* = \{l_{x1}^*, l_{y1}^*, l_{x2}^*, l_{y2}^*\}_i$, wherein $l_{x1}, l_{y1}$ and $l_{x1}^*, l_{y1}^*$ respectively denote the predicted left-eye key point coordinate offset and the positive-sample left-eye key point offset, and $l_{x2}, l_{y2}$ and $l_{x2}^*, l_{y2}^*$ respectively denote the predicted right-eye key point coordinate offset and the positive-sample right-eye key point offset; $\lambda_1$ and $\lambda_2$ are respectively the weight coefficients of the face-frame and key point loss functions.
Optionally, anchor points of 10 to 256 pixels are used to match the minimum size of the corresponding effective receptive field, with the anchor sizes at the four detection layers set to (10, 16, 24), (32, 48), (64, 96) and (128, 192, 256), respectively.
Optionally, decoding the face-frame prediction into the real position of the bounding box and decoding the face key points into the real key point positions includes:

decoding the face-frame prediction $l = (l_{cx}, l_{cy}, l_w, l_h)$ obtained by the detector and converting it into the real bounding-box position $b = (b_{cx}, b_{cy}, b_w, b_h)$:

$$b_{cx} = l_{cx} d_w + d_{cx}, \qquad b_{cy} = l_{cy} d_h + d_{cy},$$
$$b_w = d_w \exp(l_w), \qquad b_h = d_h \exp(l_h);$$

and decoding the face key point predictions $(l_{x1}, l_{y1}, \ldots, l_{x5}, l_{y5})$ obtained by the detector into the real key point positions:

$$b_{x_k} = l_{x_k} d_w + d_{cx}, \qquad b_{y_k} = l_{y_k} d_h + d_{cy}, \qquad k = 1, \ldots, 5;$$

where $d = (d_{cx}, d_{cy}, d_w, d_h)$ represents a generated default anchor.
Optionally, extracting the coordinates x and y of the two eyes according to the face key points and extending x and y by preset pixels in four directions to obtain the eye image includes:
extracting the two eye coordinates x and y according to the face key points, and extending each by 32 pixels in the four directions to obtain a 64 × 64 eye image.
Optionally, judging with the preset living body recognition neural network whether the eye image shows a living body to obtain the judgment result includes:
the living body recognition neural network extracting living-body features with a MobileNet lightweight neural network and using a cross-entropy loss function as its loss function.
The beneficial effects of the invention are as follows. The living body detection method based on the infrared image has these notable advantages:
1. real-time detection is achieved on a mobile terminal that has only a CPU;
2. living-body accuracy is improved by finely detecting the bright-pupil effect;
3. the eye positions are detected accurately;
4. robustness is strong and external influence is small.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a network architecture of the detector of the present invention;
FIG. 3 is a diagram of the attack image results of the present invention;
FIG. 4 is a diagram of a human image result of the present invention.
Detailed Description
In order to make the technical means, creative features, objectives and effects of the invention easy to understand, the invention is further described below with reference to the drawings.
Referring to fig. 1, a living body detecting method based on an infrared image includes:
and S1, inputting the picture, collecting the infrared picture through the infrared camera, and carrying out preprocessing operation on the picture.
In the step, the infrared picture can be directly acquired from the infrared camera end, or the infrared picture can be input through the input interface. The preprocessing operation of the picture comprises image size adjustment and standardization.
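As a minimal sketch of this preprocessing (the 320 × 240 input size and the per-image standardization scheme are assumptions; the patent states only that the picture is resized and standardized, and that the pipeline keeps the Y channel of a YUV conversion):

    import cv2
    import numpy as np

    def preprocess(frame_bgr, size=(320, 240)):
        """Resize an infrared frame, keep only the Y (luma) channel, standardize.

        The target size and the standardization constants are illustrative
        assumptions; keeping only the Y channel of a YUV conversion follows
        the data-processing step described in S202."""
        resized = cv2.resize(frame_bgr, size)
        y = cv2.cvtColor(resized, cv2.COLOR_BGR2YUV)[:, :, 0].astype(np.float32)
        y = (y - y.mean()) / (y.std() + 1e-6)   # per-image standardization (assumed)
        return y[np.newaxis, np.newaxis, :, :]  # (1, 1, H, W) for a single-channel CNN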
S2, predicting by the detector: the picture is put into a preset detector for prediction, features obtained through four different convolution layers in a backbone network of the detector are combined with anchor points of multiple sizes, face detection, face key point detection and mask recognition are carried out, and a face frame prediction value, a face key point and a mask recognition result are obtained.
Before the step of placing the picture into a preset detector for prediction, the method further comprises the following steps:
loading preset pre-trained network parameters into the detector, and generating the default anchors $d = (d_{cx}, d_{cy}, d_w, d_h)$ from the preset anchor sizes and aspect ratios.
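A minimal sketch of this default-anchor generation follows. It combines the per-layer anchor sizes given later in this description, (10, 16, 24), (32, 48), (64, 96) and (128, 192, 256), with assumed feature-map strides of 8, 16, 32 and 64; the patent does not state the strides.

    import itertools
    import torch

    def generate_anchors(img_w, img_h,
                         strides=(8, 16, 32, 64),  # assumed strides of the four prediction layers
                         sizes=((10, 16, 24), (32, 48), (64, 96), (128, 192, 256))):
        """Build square default anchors d = (d_cx, d_cy, d_w, d_h), normalized to [0, 1]."""
        anchors = []
        for stride, layer_sizes in zip(strides, sizes):
            fm_h, fm_w = img_h // stride, img_w // stride
            for i, j in itertools.product(range(fm_h), range(fm_w)):
                cx = (j + 0.5) * stride / img_w   # cell center, normalized
                cy = (i + 0.5) * stride / img_h
                for s in layer_sizes:             # one square anchor per listed size
                    anchors.append([cx, cy, s / img_w, s / img_h])
        return torch.tensor(anchors)              # (num_anchors, 4), center form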
Referring to fig. 2, the detector comprises a backbone network, prediction layers and a multi-task loss layer: a backbone of 15 convolutional layers, 4 prediction layers and 1 multi-task loss layer. The 15 convolutional layers consist of one convolution module 1, thirteen convolution modules 2 and one convolution module 3. Convolution module 1 consists of a convolution, a normalization and an activation layer. Convolution module 2 consists of two groups, each being a set of convolution, normalization and activation layers. Convolution module 3 consists of two groups: a first group of convolution, normalization and activation layers, and a second group containing only a convolution. In this step, the features of the 8th, 11th, 13th and 15th convolutional layers of the backbone are fed into the respective prediction layers for face-frame localization, face key point localization and mask recognition, and every prediction layer feeds the multi-task loss layer to fit the multiple detection results. A sketch of the three module types follows.
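The module descriptions read like a MobileNet-style design; the following is a hedged PyTorch sketch in which the channel counts, kernel sizes and strides are placeholders, and treating module 2 as a depthwise-separable block is an assumption:

    import torch.nn as nn

    def conv_module_1(c_in, c_out, stride=2):
        # Module 1: convolution + normalization + activation.
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True))

    def conv_module_2(c_in, c_out, stride=1):
        # Module 2: two groups of (convolution, normalization, activation);
        # a depthwise convolution followed by a pointwise one is assumed.
        return nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False),
            nn.BatchNorm2d(c_in),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_in, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True))

    def conv_module_3(c_in, c_out):
        # Module 3: a (convolution, normalization, activation) group
        # followed by a group containing only a convolution.
        return nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, 1, 1, bias=False),
            nn.BatchNorm2d(c_in),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_in, c_out, 1))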
The detector is trained on a preset data set to obtain the trained detector. The detector algorithm is preferably implemented with the PyTorch open-source deep learning library. Training comprises the following steps:
s201, data acquisition: the acquisition includes unoccluded data and occlusion data as datasets.
The unoccluded data are face pictures taken without a mask; the occluded data are face pictures taken with a mask worn, the occluded data outnumber the unoccluded data, and most of the occluded data are preferably mask data. For data acquisition, manually processed WiderFace unoccluded data and MAFA occluded data can be adopted.
S202, data processing and enhancing: and converting the BGR pictures in the data set into YUV format, and only storing the data of the Y channel, and then performing data enhancement to obtain an enhanced data set.
The data enhancement adds data to prevent model overfitting by applying one or more of color distortion, brightness-contrast adjustment, random cropping, horizontal flipping and channel transformation, in combination, to the pictures in the data set.
Training on single-channel data reduces the model's parameter count and improves its detection speed. Training directly on single-channel Y images also spares the mobile end any picture-format conversion, saving time so that the model reaches super-real-time detection on a mobile terminal with only a CPU.
The strategy adopted for brightness-contrast enhancement is to reduce the brightness inside the target frame and increase it outside the target frame, as sketched below. The augmentations can be combined in various ways so that the model becomes more robust to illumination conditions.
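A small sketch of that brightness strategy; the gain factors and the uint8 single-channel input are illustrative assumptions, not values from the patent:

    import numpy as np

    def brightness_contrast_augment(y_img, box, inside_gain=0.8, outside_gain=1.2):
        """Darken the face box and brighten the rest, per the strategy above.

        y_img is a single-channel uint8 image and box = (x1, y1, x2, y2);
        the two gain values are assumed."""
        out = y_img.astype(np.float32) * outside_gain
        x1, y1, x2, y2 = box
        out[y1:y2, x1:x2] = y_img[y1:y2, x1:x2].astype(np.float32) * inside_gain
        return np.clip(out, 0, 255).astype(np.uint8)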
S203, training: network training uses a stochastic optimization algorithm with momentum 0.9 and weight decay factor 0.0005, with hard example mining used to reduce the imbalance between positive and negative samples. The initial learning rate is set to $10^{-3}$ and is reduced by a factor of 10 after the 50th and the 100th training epoch. During training, each prediction is first matched to the anchor with the best Jaccard overlap, and anchors are then matched to any face whose Jaccard overlap exceeds a threshold of 0.35.
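A sketch of the two-stage Jaccard matching described here; the corner-form box layout and the helper box_iou are assumptions of this sketch rather than patent text:

    import torch

    def box_iou(a, b):
        # Pairwise Jaccard overlap between corner-form box sets a (N, 4) and b (M, 4).
        tl = torch.max(a[:, None, :2], b[None, :, :2])
        br = torch.min(a[:, None, 2:], b[None, :, 2:])
        inter = (br - tl).clamp(min=0).prod(dim=2)
        area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
        area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
        return inter / (area_a[:, None] + area_b[None, :] - inter)

    def match_anchors(gt_boxes, anchors, iou_threshold=0.35):
        # Stage 1: each ground-truth face claims its best-Jaccard anchor.
        # Stage 2: remaining anchors match any face with overlap above 0.35.
        iou = box_iou(gt_boxes, anchors)                 # (num_gt, num_anchors)
        best_anchor_per_gt = iou.argmax(dim=1)
        best_gt_per_anchor = iou.argmax(dim=0)
        best_iou_per_anchor = iou.max(dim=0).values
        best_iou_per_anchor[best_anchor_per_gt] = 1.0    # keep stage-1 matches
        best_gt_per_anchor[best_anchor_per_gt] = torch.arange(len(gt_boxes))
        positive = best_iou_per_anchor > iou_threshold   # p* = 1 for these anchors
        return best_gt_per_anchor, positive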
By the design, the trained detector can predict pictures.
During prediction, the features of the 8th, 11th, 13th and 15th convolutional layers in the backbone network are fed into the respective prediction layers for face-frame localization, face key point localization and mask recognition.
For each anchor point, the prediction comprises 4 coordinate offsets and N classification scores, where N is 2. For each anchor point during detector training, the following multi-task loss function is minimized:

$$L = L_{obj}(p_i, p_i^*) + \lambda_1 p_i^* L_{box}(t_i, t_i^*) + \lambda_2 p_i^* L_{landmark}(l_i, l_i^*)$$

where $L_{obj}$ is a cross-entropy loss function detecting whether an anchor contains a target, $p_i$ is the probability that the anchor contains a target, and $p_i^* = 1$ if the anchor contains a target, 0 otherwise; $L_{box}$ adopts the smooth-L1 loss function for face-box localization, $t_i = \{t_x, t_y, t_w, t_h\}_i$ is the coordinate offset of the prediction box, and $t_i^*$ is the coordinate offset of the positive-sample anchor; $L_{landmark}$ adopts the smooth-L1 loss function for face key point localization, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}, \ldots, l_{x5}, l_{y5}\}_i$ is the predicted key point offset, and $l_i^*$ is the key point coordinate offset of the positive sample; if the sample wears a mask, only the eyes are supervised: $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}\}_i$ and $l_i^* = \{l_{x1}^*, l_{y1}^*, l_{x2}^*, l_{y2}^*\}_i$, where $l_{x1}, l_{y1}$ and $l_{x1}^*, l_{y1}^*$ respectively denote the predicted left-eye key point coordinate offset and the positive-sample left-eye key point offset, and $l_{x2}, l_{y2}$ and $l_{x2}^*, l_{y2}^*$ respectively denote the predicted right-eye key point coordinate offset and the positive-sample right-eye key point offset; $\lambda_1$ and $\lambda_2$ are respectively the weight coefficients of the face-frame and key point loss functions.
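A hedged PyTorch sketch of this multi-task objective; the tensor layouts, the loss reduction, and the default weights λ1 = λ2 = 1 are assumptions:

    import torch
    import torch.nn.functional as F

    def multitask_loss(cls_logits, box_pred, lmk_pred,
                       pos_mask, box_target, lmk_target, lmk_mask,
                       lambda1=1.0, lambda2=1.0):
        """L = L_obj + lambda1 * p* * L_box + lambda2 * p* * L_landmark.

        cls_logits: (A, 2) face/background scores; pos_mask: (A,) bool with
        p* = 1 for positive anchors; box_pred/box_target: (A, 4) offsets;
        lmk_pred/lmk_target: (A, 10) five-point offsets; lmk_mask: (A, 10)
        bool that keeps only the two eye coordinates for mask-wearing samples."""
        l_obj = F.cross_entropy(cls_logits, pos_mask.long())
        l_box = F.smooth_l1_loss(box_pred[pos_mask], box_target[pos_mask])
        sel = pos_mask[:, None] & lmk_mask               # positives, visible points only
        l_lmk = F.smooth_l1_loss(lmk_pred[sel], lmk_target[sel])
        return l_obj + lambda1 * l_box + lambda2 * l_lmk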
Anchor points of 10 to 256 pixels are employed to match the minimum size of the corresponding effective receptive field, with the anchor sizes at the four detection layers set to (10, 16, 24), (32, 48), (64, 96) and (128, 192, 256), respectively.
This design achieves end-to-end mask recognition: no extra classifier needs to be added to separately identify whether a mask is worn, and operations such as picture rotation and cropping are avoided on a mobile terminal with only a CPU, saving time. In addition, the invention optimizes key point detection for mask-wearing faces: when a mask is worn, only the loss of the visible eye features is optimized during training.
S3, decoding with the generated anchors: the face-frame prediction is decoded and converted into the real position of the bounding box, and the face key points are decoded and converted into the real key point positions.
The specific decoding process is as follows:
the predicted value l of the face frame obtained by the detector is equal to (l)cx,lcy,lw,lh) Decoding operation is carried out, and the real position b ═ b of the boundary box is converted intocx,bcy,bw,bh):
bcx=lcxdw+dcx,bcy=lcydh+dcy
bw=dwexp(lw),bh=dhexp(lh);
Predicting the face key points obtained by the detector
Figure BDA0002727188230000071
Translating to true positions of keypoints
Figure BDA0002727188230000072
Figure BDA0002727188230000073
Wherein d ═ d (d)cx,dcy,dw,dh) Indicating the default anchor generated at step S2.
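A sketch of this decoding in PyTorch, directly implementing the formulas above; the (A, 4) box and (A, 10) landmark tensor layouts are assumptions:

    import torch

    def decode(box_pred, lmk_pred, anchors):
        """Decode box offsets l = (l_cx, l_cy, l_w, l_h) and five landmark
        offsets into real positions using default anchors d = (d_cx, d_cy, d_w, d_h)."""
        d_cx, d_cy, d_w, d_h = anchors.unbind(dim=1)
        b_cx = box_pred[:, 0] * d_w + d_cx
        b_cy = box_pred[:, 1] * d_h + d_cy
        b_w = d_w * torch.exp(box_pred[:, 2])
        b_h = d_h * torch.exp(box_pred[:, 3])
        boxes = torch.stack([b_cx, b_cy, b_w, b_h], dim=1)
        # Each landmark (x, y) decodes like the box center.
        lmk_x = lmk_pred[:, 0::2] * d_w[:, None] + d_cx[:, None]
        lmk_y = lmk_pred[:, 1::2] * d_h[:, None] + d_cy[:, None]
        landmarks = torch.stack([lmk_x, lmk_y], dim=2).reshape(len(anchors), -1)
        return boxes, landmarks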
S4, non-maximum suppression: overlapping detection frames are eliminated with a non-maximum suppression algorithm at a threshold of 0.4, yielding the final face detection frame, face key points and mask recognition result, comprising the upper-left corner coordinate, the lower-right corner coordinate, the two eye coordinates, the nose coordinate, a pair of mouth-corner coordinates, and the confidence of mask wearing.
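A minimal sketch of greedy NMS at the stated 0.4 threshold; corner-form boxes are assumed, and torchvision.ops.nms provides an equivalent built-in operation:

    import torch

    def nms(boxes, scores, iou_threshold=0.4):
        # Greedy suppression over corner-form boxes (x1, y1, x2, y2): keep the
        # highest-scoring box, drop boxes overlapping it above 0.4, repeat.
        order = scores.argsort(descending=True)
        keep = []
        while order.numel() > 0:
            i = order[0].item()
            keep.append(i)
            if order.numel() == 1:
                break
            rest = order[1:]
            tl = torch.max(boxes[i, :2], boxes[rest, :2])
            br = torch.min(boxes[i, 2:], boxes[rest, 2:])
            inter = (br - tl).clamp(min=0).prod(dim=1)
            area_i = (boxes[i, 2:] - boxes[i, :2]).prod()
            area_r = (boxes[rest, 2:] - boxes[rest, :2]).prod(dim=1)
            iou = inter / (area_i + area_r - inter)
            order = rest[iou <= iou_threshold]
        return torch.tensor(keep, dtype=torch.long)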
The picture shown in fig. 3 is preprocessed to adjust and standardize the image size. The standardized picture is converted into YUV format, only the Y-channel data are kept and enhanced, and the result is input into the trained detector for prediction. The network model used for prediction is shown in fig. 2; in the multi-task loss function the anchor contains a target, so $p_i^* = 1$. Finally a face detection frame is detected and marked with a red frame, and each face detection frame contains and marks the two eye coordinates, the nose coordinate and a pair of mouth-corner coordinates. The obtained detection results, namely the face detection frames, face key points and mask recognition results, serve a face recognition scene and can be used as accurate data in subsequent recognition processes. In particular, the invention extracts the two eye coordinates from the face key points of the detection result as accurate data, which after processing provide an important basis for judging whether the subject is a living body.
The picture shown in fig. 4 is likewise preprocessed, resized and standardized. The standardized picture is converted into YUV format, only the Y-channel data are kept and enhanced, and the result is input into the trained detector for prediction. The network model used for prediction is shown in fig. 2; in the multi-task loss function the anchor contains a target, so $p_i^* = 1$. Finally a face detection frame is detected and marked with a red frame, and each face detection frame contains and marks the two eye coordinates, the nose coordinate and a pair of mouth-corner coordinates.
S5, intercepting the eye images: the coordinates x and y of the two eyes are extracted from the face key points and extended by preset pixels in four directions to obtain the eye images.
Specifically, the two eye coordinates x and y are extracted from the face key points and each is extended by 32 pixels in the four directions, yielding 64 × 64 eye images.
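A sketch of this crop; edge padding for eyes near the image border is an assumed detail:

    import numpy as np

    def crop_eyes(y_img, left_eye, right_eye, radius=32):
        # Cut a (2*radius) x (2*radius) patch centered on each eye keypoint;
        # radius = 32 gives the 64 x 64 eye images described here.
        padded = np.pad(y_img, radius, mode='edge')  # border guard (assumed)
        patches = []
        for (x, y) in (left_eye, right_eye):
            cx, cy = int(round(x)) + radius, int(round(y)) + radius
            patches.append(padded[cy - radius:cy + radius, cx - radius:cx + radius])
        return patches  # two 64 x 64 single-channel eye images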
S6, living body recognition neural network: a preset living body recognition neural network judges whether the eye image shows a living body, giving the judgment result.
Specifically, the living body recognition neural network extracts living-body features with a MobileNet lightweight neural network, and uses a cross-entropy loss function as its loss function to judge whether an eye image shows a living body.
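The patent names only a MobileNet lightweight network with a cross-entropy loss; the following is a hedged sketch whose depth and layer widths are assumptions, using depthwise-separable blocks in the MobileNet style:

    import torch.nn as nn

    class LivenessNet(nn.Module):
        """Small MobileNet-style binary classifier over a 1 x 64 x 64 eye patch."""
        def __init__(self):
            super().__init__()
            def dw_block(c_in, c_out, stride):
                # Depthwise 3x3 followed by pointwise 1x1, MobileNet style.
                return nn.Sequential(
                    nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False),
                    nn.BatchNorm2d(c_in), nn.ReLU(inplace=True),
                    nn.Conv2d(c_in, c_out, 1, bias=False),
                    nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, 2, 1, bias=False),  # 64 -> 32
                nn.BatchNorm2d(16), nn.ReLU(inplace=True),
                dw_block(16, 32, 2),    # 32 -> 16
                dw_block(32, 64, 2),    # 16 -> 8
                dw_block(64, 128, 2),   # 8 -> 4
                nn.AdaptiveAvgPool2d(1))
            self.fc = nn.Linear(128, 2)  # live vs. attack

        def forward(self, x):
            return self.fc(self.features(x).flatten(1))

    # Training such a head would minimize nn.CrossEntropyLoss() over
    # live/attack labels, matching the loss named in this step.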
The network used in this step is a trained living body recognition neural network. Its training set uses collected samples: positive samples are real-person pictures shot under an infrared camera, and attack samples are one or more of a phone-screen face, an iPad face, a printed color face, or a gray face shot under infrared imaging.
The picture shown in fig. 3 is an attack sample. After the processing of S4, the two eye coordinates x and y are extracted according to the face key points and each is extended by 32 pixels in four directions, yielding 64 × 64 eye images; after judgment by the living body recognition neural network of this step, the result is "fake", that is, not a living body.
The picture shown in fig. 4 is a real-person picture. After the processing of S4, the two eye coordinates x and y are extracted according to the face key points and each is extended by 32 pixels in four directions, yielding 64 × 64 eye images; after judgment by the living body recognition neural network of this step, the result is "real", that is, a living body.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A living body detection method based on infrared images is characterized by comprising the following steps:
acquiring an infrared picture and preprocessing the picture;
the picture is put into a preset detector for prediction, features obtained through four different convolution layers in a backbone network of the detector are combined with anchor points with multiple sizes, face detection, face key point detection and mask recognition are carried out, and a face frame prediction value, a face key point and a mask recognition result are obtained;
decoding the face-frame prediction and converting it into the real position of the bounding box, and decoding the face key points and converting them into the real key point positions;
eliminating overlapping detection frames with a non-maximum suppression algorithm at a threshold of 0.4 to obtain the final face detection frame, face key points and mask recognition result, comprising the upper-left corner coordinate, the lower-right corner coordinate, the two eye coordinates, the nose coordinate, a pair of mouth-corner coordinates, and the confidence of mask wearing;
extracting coordinates x and y of two eyes according to the key points of the face, and extending the x and y to preset pixels in four directions respectively to obtain an eye image;
and judging whether the eye image is a living body by adopting a preset living body recognition neural network to obtain a judgment result.
2. The living body detection method based on the infrared image as set forth in claim 1, wherein before the picture is placed in a preset detector for prediction, the method further comprises:
loading preset pre-training network parameters to the detector, and generating a default anchor point according to the size and length-width ratio of the preset anchor point;
training the detector through a preset data set to obtain a trained detector;
the detector includes a backbone network, a prediction layer, and a multi-tasking loss layer.
3. The living body detection method based on the infrared image as set forth in claim 2, wherein the training of the detector through a preset data set to obtain a trained detector comprises:
acquiring unoccluded data and occluded data serving as a data set, converting a BGR picture in the data set into a YUV format, only storing data of a Y channel, and then performing data enhancement to obtain an enhanced data set;
performing network training with a stochastic optimization algorithm with momentum 0.9 and weight decay factor 0.0005, wherein hard example mining is used to reduce the imbalance between positive and negative samples, the initial learning rate is set to $10^{-3}$ and is reduced by a factor of 10 after the 50th and the 100th training epoch, and during training each prediction is first matched to the anchor with the best Jaccard overlap, after which anchors are matched to any face whose Jaccard overlap exceeds a threshold of 0.35.
4. The infrared image-based living body detection method according to claim 3, wherein the non-occlusion data is a face picture when a mask is not worn, the occlusion data is a face picture when a mask is worn, and the occlusion data is larger than the non-occlusion data.
5. The infrared image-based liveness detection method of claim 3 wherein said performing data enhancement comprises:
adding data to prevent model overfitting by applying a combination of at least one or more of color distortion, increased brightness contrast, random cropping, horizontal flipping, and transformation channels to pictures in the data set.
6. The method for detecting living bodies based on infrared images according to claim 2, wherein the steps of putting the pictures into a preset detector for prediction, combining features obtained by four different convolution layers in a backbone network of the detector with anchor points with a plurality of sizes, and performing face detection, face key point detection and mask recognition to obtain a face frame prediction value, a face key point and a mask recognition result comprise:
the pictures are put into the trained detector for prediction, and during prediction the features of the 8th, 11th, 13th and 15th convolutional layers in the backbone network are fed into the respective prediction layers for face-frame localization, face key point localization and mask recognition;

for each anchor point, the prediction comprises 4 coordinate offsets and N classification scores, where N is 2; for each anchor point during detector training, the following multi-task loss function is minimized:

$$L = L_{obj}(p_i, p_i^*) + \lambda_1 p_i^* L_{box}(t_i, t_i^*) + \lambda_2 p_i^* L_{landmark}(l_i, l_i^*)$$

wherein $L_{obj}$ is a cross-entropy loss function detecting whether an anchor contains a target, $p_i$ is the probability that the anchor contains a target, and $p_i^* = 1$ if the anchor contains a target, 0 otherwise; $L_{box}$ adopts the smooth-L1 loss function for face-box localization, $t_i = \{t_x, t_y, t_w, t_h\}_i$ is the coordinate offset of the prediction box, and $t_i^*$ is the coordinate offset of the positive-sample anchor; $L_{landmark}$ adopts the smooth-L1 loss function for face key point localization, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}, \ldots, l_{x5}, l_{y5}\}_i$ is the predicted key point offset, and $l_i^*$ is the key point coordinate offset of the positive sample; if the sample wears a mask, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}\}_i$ and $l_i^* = \{l_{x1}^*, l_{y1}^*, l_{x2}^*, l_{y2}^*\}_i$, wherein $l_{x1}, l_{y1}$ and $l_{x1}^*, l_{y1}^*$ respectively denote the predicted left-eye key point coordinate offset and the positive-sample left-eye key point offset, and $l_{x2}, l_{y2}$ and $l_{x2}^*, l_{y2}^*$ respectively denote the predicted right-eye key point coordinate offset and the positive-sample right-eye key point offset; $\lambda_1$ and $\lambda_2$ are respectively the weight coefficients of the face-frame and key point loss functions.
7. The infrared image-based living body detecting method as set forth in claim 6, wherein anchor points of 10 to 256 pixels are employed to match the minimum size of the corresponding effective receptive field, and the size of each anchor point for detecting the feature is set to (10, 16, 24), (32, 48), (64, 96) and (128, 192, 256), respectively.
8. The living body detection method based on the infrared image as claimed in claim 1, wherein decoding the face-frame prediction into the real position of the bounding box and decoding the face key points into the real key point positions comprises:

decoding the face-frame prediction $l = (l_{cx}, l_{cy}, l_w, l_h)$ obtained by the detector and converting it into the real bounding-box position $b = (b_{cx}, b_{cy}, b_w, b_h)$:

$$b_{cx} = l_{cx} d_w + d_{cx}, \qquad b_{cy} = l_{cy} d_h + d_{cy},$$
$$b_w = d_w \exp(l_w), \qquad b_h = d_h \exp(l_h);$$

and decoding the face key point predictions $(l_{x1}, l_{y1}, \ldots, l_{x5}, l_{y5})$ obtained by the detector into the real key point positions:

$$b_{x_k} = l_{x_k} d_w + d_{cx}, \qquad b_{y_k} = l_{y_k} d_h + d_{cy}, \qquad k = 1, \ldots, 5;$$

wherein $d = (d_{cx}, d_{cy}, d_w, d_h)$ represents a generated default anchor.
9. The living body detection method based on the infrared image as claimed in claim 1, wherein extracting the coordinates x and y of the two eyes according to the face key points and extending x and y by preset pixels in four directions to obtain the eye image comprises:
extracting the two eye coordinates x and y according to the face key points, and extending each by 32 pixels in the four directions to obtain a 64 × 64 eye image.
10. The method for detecting living bodies based on infrared images as claimed in claim 1, wherein the determining whether the eye images are living bodies by using a preset living body recognition neural network to obtain a determination result comprises:
the living body recognition neural network adopts a MobileNet lightweight neural network to extract living-body features, and uses a cross-entropy loss function as its loss function.
CN202011106811.1A 2020-10-16 2020-10-16 Living body detection method based on infrared image Active CN112232204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011106811.1A CN112232204B (en) 2020-10-16 2020-10-16 Living body detection method based on infrared image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011106811.1A CN112232204B (en) 2020-10-16 2020-10-16 Living body detection method based on infrared image

Publications (2)

Publication Number Publication Date
CN112232204A true CN112232204A (en) 2021-01-15
CN112232204B CN112232204B (en) 2022-07-19

Family

ID=74118035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011106811.1A Active CN112232204B (en) 2020-10-16 2020-10-16 Living body detection method based on infrared image

Country Status (1)

Country Link
CN (1) CN112232204B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801038A (en) * 2021-03-02 2021-05-14 重庆邮电大学 Multi-view face living body detection method and system
CN113033374A (en) * 2021-03-22 2021-06-25 开放智能机器(上海)有限公司 Artificial intelligence dangerous behavior identification method and device, electronic equipment and storage medium
CN113298008A (en) * 2021-06-04 2021-08-24 杭州鸿泉物联网技术股份有限公司 Living body detection-based driver face identification qualification authentication method and device
WO2021238125A1 (en) * 2020-05-27 2021-12-02 嘉楠明芯(北京)科技有限公司 Face occlusion detection method and face occlusion detection apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A kind of multi-pose eye locating method based on concatenated convolutional neutral net
WO2020151489A1 (en) * 2019-01-25 2020-07-30 杭州海康威视数字技术股份有限公司 Living body detection method based on facial recognition, and electronic device and storage medium
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN110119676A (en) * 2019-03-28 2019-08-13 广东工业大学 A kind of Driver Fatigue Detection neural network based
CN110647817A (en) * 2019-08-27 2020-01-03 江南大学 Real-time face detection method based on MobileNet V3
CN110866490A (en) * 2019-11-13 2020-03-06 复旦大学 Face detection method and device based on multitask learning
CN111680588A (en) * 2020-05-26 2020-09-18 广州多益网络股份有限公司 Human face gate living body detection method based on visible light and infrared light

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TANVI B. PATEL et al.: "Occlusion detection and recognizing human face using neural network", 2017 International Conference on Intelligent Computing and Control (I2C2) *
LIU Qiyuan et al.: "Research progress on occluded face detection methods" (in Chinese), Computer Engineering and Applications *

Also Published As

Publication number Publication date
CN112232204B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
CN112232204B (en) Living body detection method based on infrared image
CN108717524B (en) Gesture recognition system based on double-camera mobile phone and artificial intelligence system
CN111310718A (en) High-accuracy detection and comparison method for face-shielding image
CN109886153B (en) Real-time face detection method based on deep convolutional neural network
CN111368666B (en) Living body detection method based on novel pooling and attention mechanism double-flow network
CN108446690B (en) Human face in-vivo detection method based on multi-view dynamic features
CN112052831A (en) Face detection method, device and computer storage medium
CN112215043A (en) Human face living body detection method
CN112818722A (en) Modular dynamically configurable living body face recognition system
CN109461186A (en) Image processing method, device, computer readable storage medium and electronic equipment
CN111079688A (en) Living body detection method based on infrared image in face recognition
CN114783024A (en) Face recognition system of gauze mask is worn in public place based on YOLOv5
CN112232205B (en) Mobile terminal CPU real-time multifunctional face detection method
CN111832405A (en) Face recognition method based on HOG and depth residual error network
CN112614136A (en) Infrared small target real-time instance segmentation method and device
CN115546683A (en) Improved pornographic video detection method and system based on key frame
CN109325472B (en) Face living body detection method based on depth information
CN114550268A (en) Depth-forged video detection method utilizing space-time characteristics
CN111881841B (en) Face detection and recognition method based on binocular vision
CN111274851A (en) Living body detection method and device
CN111881803B (en) Face recognition method based on improved YOLOv3
CN112200008A (en) Face attribute recognition method in community monitoring scene
CN112818938A (en) Face recognition algorithm and face recognition device adaptive to illumination interference environment
CN111797694A (en) License plate detection method and device
CN115294163A (en) Face image quality evaluation method based on adaptive threshold segmentation algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant