CN112507986B - Multi-channel human face in-vivo detection method and device based on neural network - Google Patents

Multi-channel human face in-vivo detection method and device based on neural network Download PDF

Info

Publication number
CN112507986B
CN112507986B (application CN202110146331.6A)
Authority
CN
China
Prior art keywords
face
image
trained
living body
personal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110146331.6A
Other languages
Chinese (zh)
Other versions
CN112507986A (en)
Inventor
陈俊逸
佐凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Xiaogu Technology Co ltd
Original Assignee
Changsha Xiaogu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Xiaogu Technology Co ltd filed Critical Changsha Xiaogu Technology Co ltd
Priority to CN202110146331.6A priority Critical patent/CN112507986B/en
Publication of CN112507986A publication Critical patent/CN112507986A/en
Application granted granted Critical
Publication of CN112507986B publication Critical patent/CN112507986B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention relates to a multi-channel face living body detection method and device based on a neural network. The face living body recognition model not only inputs X enlarged face images and Y face detail images simultaneously into X+Y channels, but also considers their fusion through superposition and combination, obtaining X+Y+Z face living body recognition uncertainties and judging from them whether the face image to be detected is a living body. Because the X enlarged face images and the Y face detail images are considered simultaneously, and the superposed combination of their internal features is also considered, the fused features contain both global and local information, which benefits the subsequent classification judgment; in addition, during the training of the face living body recognition model, gradient back-propagation promotes the classification capability of the parallel X+Y channels. Compared with the prior art, the face living body recognition precision and accuracy are higher, non-living attacks in various forms such as photos, videos and 3D can be further prevented, and personal and property safety is better guaranteed.

Description

Multi-channel human face in-vivo detection method and device based on neural network
Technical Field
The invention relates to a deep neural network, in particular to a multichannel human face in-vivo detection method, a multichannel human face in-vivo detection device, terminal equipment and a computer readable medium based on a neural network.
Background
The human face, as one of the most influential human biometric features, has been widely used across the industries of authentication systems (access control systems, business systems, payment systems, criminal identification in confidential places, and login and unlocking of terminal devices such as mobile phones and computers). In particular, with the rapid development of artificial intelligence, computer technology and image recognition in recent years, the accuracy of face recognition and face detection has improved greatly; however, if the system does not verify that the input face comes from a live person, it is easily attacked by lawbreakers, causing serious personal and property loss.
According to incomplete statistics, common face non-living attack forms mainly include: photo attacks (paper photos, electronic photos), video playback attacks, 3D attacks (masks, head models), etc. Therefore, the introduction of face living body detection technology, namely recognizing whether the face image captured by an imaging device (camera, mobile phone, etc.) comes from a real face or from some form of attack or disguise, plays a crucial role in the application of the face across authentication-system industries.
Existing face living body detection methods include living body detection based on interaction (blinking, mouth opening, nodding, head shaking, etc.), living body detection based on stereoscopic properties (solid angles, shadows, etc.), living body detection based on subsurface features (skin texture, blood color, etc.), and living body detection based on deep learning neural networks.
However, living body detection based on interaction requires the user to perform the corresponding actions according to the interaction type; it can defeat photo and static 3D attacks, but not video playback and dynamic 3D attacks. Living body detection based on stereoscopic and subsurface properties considers details of the face such as stereoscopy, shadows and skin texture, but its accuracy decreases as face non-living attacks become more lifelike. Therefore, the living body detection method based on deep learning neural networks is the most widely applied, owing to its robustness and detection accuracy.
However, the detection accuracy of existing face living body detection methods based on deep learning neural networks still needs to be improved, and how to provide a high-accuracy face living body detection method is an important technical problem that current face recognition must solve.
Disclosure of Invention
Therefore, it is necessary to provide a multi-channel face in-vivo detection method based on a neural network for solving the technical problem that the detection accuracy needs to be improved in the existing face in-vivo detection technology, which includes:
acquiring N face images to be trained;
processing the N face images to be trained to obtain X enlarged face images and Y face detail images of each face image to be trained;
constructing a face living body recognition model for learning face living body recognition uncertainty; the face living body recognition model comprises X+Y channels, wherein the X channels are used for inputting the X enlarged face images, and the Y channels are used for inputting the Y face detail images;
sequentially inputting the X enlarged face images and the Y face detail images of the N face images to be trained into the X+Y channels of the face living body recognition model respectively, outputting the X+Y+Z face living body recognition uncertainties corresponding to each face image to be trained, and determining the trained face living body recognition model through the face living body recognition uncertainty regression loss;
acquiring a human face image to be detected;
processing the face image to be detected to obtain X enlarged face images and Y face detail images of the face image to be detected;
inputting the X enlarged face images and the Y face detail images of the face image to be detected into the trained face living body recognition model, and outputting the X+Y+Z face living body recognition uncertainties corresponding to the face image to be detected;
judging whether the face image to be detected is a living body according to the X+Y+Z face living body recognition uncertainties of the face image to be detected;
wherein N represents the number of samples of face images to be trained; X represents the selected number of enlarged face images; Y represents the selected number of face detail images; Z represents the selected number of superposed combinations of the enlarged face images and the face detail images;
and Z ≤ Σ_{i=2}^{X+Y} C(X+Y, i), the number of available superposed combinations of at least two of the X+Y inputs.
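As a worked check of the combination count used in this section (the number of ways to superpose at least two of the X+Y channel inputs), here is a minimal sketch; the function name is an illustrative assumption, not from the patent:

```python
from math import comb

def max_combinations(x: int, y: int) -> int:
    """Number of ways to superpose at least two of the X+Y channel
    outputs (enlarged-face and detail-image features): sum of
    C(X+Y, i) for i = 2 .. X+Y."""
    n = x + y
    return sum(comb(n, i) for i in range(2, n + 1))

# The embodiment described later uses X=2 enlarged face images and
# Y=1 face detail image, which gives 4 possible combinations.
print(max_combinations(2, 1))  # → 4
```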
Further, the step of processing the N face images to be trained to obtain the X enlarged face images of each face image to be trained includes:
detecting the N face images to be trained, and acquiring a frame area of each face image to be trained;
selecting X amplification factors, and respectively amplifying the frame area of each face image to be trained by corresponding times according to the X amplification factors to obtain X amplified frame areas of each face image to be trained;
and reshaping the X amplified frame areas of each face image to be trained to a preset specification to obtain X amplified face images of each face image to be trained.
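The enlarge-and-reshape steps above can be sketched as coordinate arithmetic; this is a minimal sketch under assumed conventions (boxes as `(x, y, w, h)` tuples, enlargement about the box center), not the patent's implementation:

```python
def enlarge_bbox(box, factor, img_w, img_h):
    """Enlarge a face frame area (x, y, w, h) about its center by
    `factor` (the text suggests factors of 1-5), clipped to the image."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * factor, h * factor
    nx = max(0, int(cx - nw / 2))
    ny = max(0, int(cy - nh / 2))
    nx2 = min(img_w, int(cx + nw / 2))
    ny2 = min(img_h, int(cy + nh / 2))
    return nx, ny, nx2 - nx, ny2 - ny

# X = 2 magnifications as in the illustrated embodiment (1.2x and 2x);
# each enlarged region would then be resized ("reshaped") to a preset
# specification, e.g. 224x224, before entering its input channel.
boxes = [enlarge_bbox((100, 100, 80, 80), f, 640, 480) for f in (1.2, 2.0)]
```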
Further, the step of processing the N face images to be trained to obtain the Y face detail images of each face image to be trained includes:
detecting the N face images to be trained, and acquiring a frame area of each face image to be trained;
selecting a preset number of key points on a frame area of each face image to be trained;
and cropping a patch image of preset width and height centered on each key point, and selecting and stitching the patch images into the Y face detail images corresponding to each face image to be trained.
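The crop-and-stitch step can be sketched as pure coordinate bookkeeping; the 3x3 mosaic layout and function names are assumptions for illustration (the patent leaves the stitching layout open):

```python
def patch_rect(pt, pw, ph, img_w, img_h):
    """Rectangle of a pw x ph patch centered on key point pt, shifted
    to stay inside the image."""
    px = min(max(0, pt[0] - pw // 2), img_w - pw)
    py = min(max(0, pt[1] - ph // 2), img_h - ph)
    return px, py, pw, ph

def detail_map_layout(keypoints, pw, ph, img_w, img_h, cols=3):
    """Map each key-point patch to its slot in the stitched detail
    image: returns (src_rect, dst_xy) pairs. With the 9 key points
    described in the text, this yields a 3x3 mosaic."""
    out = []
    for i, pt in enumerate(keypoints):
        src = patch_rect(pt, pw, ph, img_w, img_h)
        dst = ((i % cols) * pw, (i // cols) * ph)
        out.append((src, dst))
    return out
```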
Further, the face living body recognition model is used for outputting X face living body recognition uncertainties according to the input X enlarged face images; outputting Y face living body recognition uncertainties according to the input Y face detail images; and outputting Z face living body recognition uncertainties according to the superposition of at least two combined elements from the input X enlarged face images and Y face detail images.
Further, the multi-channel human face living body detection method further comprises the following steps:
and when the face image to be detected is a non-living body, judging the attack type of the face image to be detected according to the X+Y+Z face living body recognition uncertainties of the face image to be detected.
Furthermore, the invention also provides a multi-channel human face living body detection device based on the neural network, which comprises:
the first acquisition module is used for acquiring N face images to be trained;
the first processing module, connected with the first acquisition module, is used for processing the N face images to be trained to obtain X enlarged face images and Y face detail images of each face image to be trained;
the construction module is used for constructing a face living body recognition model for learning face living body recognition uncertainty; the face living body recognition model comprises X+Y channels, wherein the X channels are used for inputting the X enlarged face images, and the Y channels are used for inputting the Y face detail images;
the training module, connected with the first processing module and the construction module, is used for sequentially inputting the X enlarged face images and the Y face detail images of the N face images to be trained into the X+Y channels of the face living body recognition model respectively, outputting the X+Y+Z face living body recognition uncertainties corresponding to each face image to be trained, and determining the trained face living body recognition model through the face living body recognition uncertainty regression loss;
the second acquisition module is used for acquiring a face image to be detected;
the second processing module, connected with the second acquisition module, is used for processing the face image to be detected to obtain X enlarged face images and Y face detail images of the face image to be detected;
the detection module, connected with the second processing module and the training module, is used for inputting the X enlarged face images and the Y face detail images of the face image to be detected into the trained face living body recognition model and outputting the X+Y+Z face living body recognition uncertainties corresponding to the face image to be detected;
the judging module, connected with the detection module, is used for judging whether the face image to be detected is a living body according to the X+Y+Z face living body recognition uncertainties of the face image to be detected;
wherein N represents the number of samples of face images to be trained; X represents the selected number of enlarged face images; Y represents the selected number of face detail images; Z represents the selected number of superposed combinations of the enlarged face images and the face detail images;
and Z ≤ Σ_{i=2}^{X+Y} C(X+Y, i), the number of available superposed combinations of at least two of the X+Y inputs.
further, the first processing module includes:
the detection unit is used for detecting the N face images to be trained and acquiring the frame area of each face image to be trained;
the amplifying unit is connected with the detecting unit and used for selecting X amplifying coefficients and amplifying the frame area of each face image to be trained by corresponding times according to the X amplifying coefficients to obtain X amplified frame areas of each face image to be trained;
and the reshaping unit, connected with the amplifying unit, is used for reshaping the X amplified frame areas of each face image to be trained to a preset specification to obtain the X enlarged face images of each face image to be trained.
Further, the first processing module further includes:
the key point selecting unit is connected with the detecting unit and used for selecting a preset number of key points on the frame area of each face image to be trained;
and the stitching unit, connected with the key point selection unit, is used for cropping a patch image of preset width and height centered on each key point and selecting and stitching the patch images into the Y face detail images corresponding to each face image to be trained.
Further, the present invention also provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and operable on the processor, the computer program being configured to implement any of the above-mentioned multi-channel face liveness detection methods.
Further, the present invention also provides a computer readable medium storing a computer program for implementing any of the above-mentioned multi-channel face in-vivo detection methods.
In the multi-channel face living body detection method provided by the invention, the face living body recognition model comprises X+Y channels; the X enlarged face images and the Y face detail images are simultaneously input into the X+Y channels, the X+Y+Z face living body recognition uncertainties are obtained by superposing and combining the X enlarged face images and the Y face detail images and considering their element fusion, and whether the face image to be detected is a living body is judged according to the X+Y+Z face living body recognition uncertainties. Because the X enlarged face images and the Y face detail images are considered simultaneously, and the superposed combination of their internal features is also considered, on the one hand the fused features contain information on both the whole face (enlarged face images) and the local face (face detail images), which benefits the subsequent classification judgment; on the other hand, during the training of the face living body recognition model, gradient back-propagation promotes the classification capability of the parallel X+Y channels. Therefore, compared with the prior art, the multi-channel face living body detection method based on the neural network has higher face living body recognition precision and accuracy, can further prevent non-living attacks in various forms such as photos, videos and 3D, further improves the security of authentication systems, and guarantees personal and property safety.
Drawings
FIG. 1 is a flow chart of one embodiment of a multi-channel human face in-vivo detection method of the present invention;
FIG. 2 is a schematic diagram of an embodiment of a face detail view of the present invention;
FIG. 3 is a sub-flowchart of one embodiment of the multi-channel human face in-vivo detection method of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a face detail view according to the present invention;
FIG. 5 is a block diagram of one embodiment of a live face recognition model of the present invention;
fig. 6 to 9 are block diagrams of 4 embodiments of feature extraction main networks in the living human face recognition model according to the present invention;
FIG. 10 is a block diagram of an embodiment of a classification and discrimination subnetwork in a living human face recognition model according to the present invention;
FIG. 11 is a block diagram of an embodiment of a multi-channel human face liveness detection apparatus of the present invention;
FIG. 12 is a block diagram of a first processing module of the multi-channel living human face detecting device according to an embodiment of the invention.
Detailed Description
As shown in fig. 1, in order to improve the accuracy and precision of the existing human face in-vivo detection, the invention provides a multi-channel human face in-vivo detection method based on a neural network, which comprises the following steps:
s1: n face images to be trained (one face image as shown in fig. 2) are obtained. Specifically, in step S1, the image of the face to be trained is obtained from an image acquisition device or an image repository such as an RGB or infrared sensor, a camera, a video camera, etc. The number N is optionally, but not limited to, the total number of the facial images to be trained or the total number of the sub-samples of the facial images to be trained.
S2: the N face images to be trained are processed to obtain X enlarged face images and Y face detail images of each face image to be trained. It is worth noting that: (1) the larger X and Y are, the larger the effective number of training samples, the more complicated the computation of the subsequent face living body recognition model, the higher the equipment requirements and the more time required, but the higher the resulting face living body recognition precision; the specific values of X and Y can therefore be set flexibly according to the face living body recognition precision required by the user's current authentication system. (2) The face is enlarged by specific magnification factors, preferably 1-5 times, so as to capture the key central part of the face.
Specifically, as shown in fig. 3, the step S2 may optionally but not limited to include:
S21: detecting the N face images to be trained (using a face detector), and acquiring the frame area (bounding-box, the largest box shown in fig. 2) of each face image to be trained;
S22: selecting X amplification factors, and amplifying the frame area of each face image to be trained by the corresponding factor to obtain X amplified frame areas of each face image to be trained;
S23: reshaping the X amplified frame areas of each face image to be trained to a preset specification to obtain the X enlarged face images of each face image to be trained;
S24: selecting a preset number of key points on the frame area of each face image to be trained;
S25: cropping a patch image of preset width and height centered on each key point, and selecting and stitching the patch images into the Y face detail images corresponding to each face image to be trained. Specifically, one may, but is not limited to, use a landMark detector to obtain the 68 key points of the face, select the center of the forehead, the centers of the left and right pupils, the tip of the nose, the center of the mouth and the four corners of the frame area as 9 key points, and crop a patch of preset width and height centered on each (as shown in fig. 2); or directly select the four corners, the center point and the midpoints of the four sides of the frame area of the face image to be trained as the 9 key points and crop patches likewise (as shown in fig. 4); the patches are finally stitched into a face detail image. More specifically, the selection mode and number of the key points, the selection mode and number of the patch images, and the number of stitched face detail images can all be set flexibly according to the face living body recognition precision required by the user. It should be noted that the specific forms and orders of steps S22-S23 and S24-S25 are only examples and are not limiting.
S3: a face living body recognition model for learning face living body recognition uncertainty is constructed; the model comprises X+Y channels, where the X channels receive the X enlarged face images and the Y channels receive the Y face detail images. Specifically, as shown in fig. 5, taking X=2 and Y=1 as an example, the 1.2x and 2x enlarged faces of each face image to be trained are obtained according to steps S21-S23 and input into Input_1 and Input_2 respectively (the specific number X and the magnifications can be set arbitrarily; with X enlarged face images, they are input into Input_1 to Input_X); at the same time, 1 face detail image is acquired according to steps S21 and S24-S25 and input into Input_3 shown in fig. 5 (Input_{X+1} to Input_{X+Y}); the face living body recognition uncertainty is then learned through the recognition model shown in the dashed box. More preferably, to facilitate data transmission and improve user experience, the X enlarged face images and the Y face detail images may either be input directly into the X+Y channels respectively, or packaged for transmission as shown at the top of fig. 5 and then divided into the X+Y channels by channel split.
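The multi-channel flow of fig. 5 can be sketched abstractly: each of the X+Y inputs passes through its feature-extraction branch, every branch yields one uncertainty through a head, and each chosen superposition is element-wise added before its own head. The toy `block` and `head` stand-ins below are assumptions for illustration, not the patent's networks:

```python
def forward(inputs, block, head, z_combos):
    """Sketch of the X+Y+Z-uncertainty flow: `inputs` are the X+Y
    channel inputs, `z_combos` lists tuples of branch indices whose
    same-dimension features are element-wise added (the Add channel)
    before an extra head. Returns X+Y+len(z_combos) uncertainties."""
    feats = [block(x) for x in inputs]
    heads = [head(f) for f in feats]
    for combo in z_combos:
        fused = [sum(vals) for vals in zip(*(feats[i] for i in combo))]
        heads.append(head(fused))
    return heads

# Toy stand-ins: identity block, mean-valued head; X=2, Y=1, Z=1
# with the all-channel superposition (type 4 in the description).
block = lambda v: v
head = lambda v: sum(v) / len(v)
u = forward([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], block, head, [(0, 1, 2)])
print(len(u))  # prints 4: Head_1 .. Head_4
```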
S4: the X enlarged face images and the Y face detail images of the N face images to be trained are sequentially input into the X+Y channels of the face living body recognition model respectively, the X+Y+Z face living body recognition uncertainties corresponding to each face image to be trained are output, and the trained face living body recognition model is finally determined through the face living body recognition uncertainty regression loss. Specifically, taking X=2, Y=1 and Z=1 as an example, as shown in fig. 5, the 2 enlarged face images and the 1 face detail image are input into Input_1 to Input_3 respectively, features are extracted through several levels of feature extraction backbone networks (the Block boxes illustrated in fig. 5), and the face living body recognition uncertainties Head_1 to Head_3 are obtained through the classification and discrimination branch networks (the Head boxes illustrated in fig. 5) (with X enlarged face images and Y face detail images, Head_1 to Head_{X+Y} are obtained); elements of the same dimension are superposed and combined to form an Add channel (the Add illustrated in fig. 5), and the face living body recognition uncertainty Head_4 is obtained in the same way. Notably, the same-dimension element superposition may be the superposition of any at least two same-dimension elements among the X enlarged face images and the Y face detail images, so the number of combination types is
Σ_{i=2}^{X+Y} C(X+Y, i).
Taking X=2 and Y=1 as an example, there are 4 combinations (C(3,2) + C(3,3) = 4): the superposition of the 2 enlarged face images (type 1), the superposition of one enlarged face image with the face detail image (types 2 and 3), and the superposition of the 2 enlarged face images and the face detail image together (type 4, i.e. the embodiment illustrated in fig. 5); any Z of them may be selected.
When the X enlarged face images, the Y face detail images and the Z superposed combinations are selected, Head_1 to Head_{X+Y+Z} are obtained; equation (1) is then taken as the face living body recognition uncertainty regression loss to train the face living body recognition model and finally determine the trained model.
H = K_1·Head_1 + … + K_X·Head_X + K_{X+1}·Head_{X+1} + … + K_{X+Y}·Head_{X+Y} + K_{X+Y+1}·Head_{X+Y+1} + … + K_{X+Y+Z}·Head_{X+Y+Z}    (1)
where H represents the face living body recognition uncertainty regression loss; K_1 to K_X represent the weight coefficients of the face living body recognition uncertainties of the X enlarged face images, and Head_1 to Head_X those uncertainties; K_{X+1} to K_{X+Y} represent the weight coefficients of the face living body recognition uncertainties of the Y face detail images, and Head_{X+1} to Head_{X+Y} those uncertainties; K_{X+Y+1} to K_{X+Y+Z} represent the weight coefficients of the face living body recognition uncertainties of the element superpositions of enlarged face images and face detail images, and Head_{X+Y+1} to Head_{X+Y+Z} those uncertainties.
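The weighted sum of equation (1) can be sketched directly; the specific weight values below are illustrative assumptions, since the patent leaves the K coefficients as free parameters:

```python
def liveness_regression_loss(heads, weights):
    """Uncertainty regression loss of equation (1):
    H = K_1*Head_1 + ... + K_{X+Y+Z}*Head_{X+Y+Z}."""
    assert len(heads) == len(weights)
    return sum(k * h for k, h in zip(weights, heads))

# X=2, Y=1, Z=1 as in fig. 5, so four heads; equal weights assumed.
H = liveness_regression_loss([0.1, 0.2, 0.3, 0.4], [0.25] * 4)
```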
More specifically, the internal structure and number of convolutional layers of the feature extraction backbone networks (the Block boxes illustrated in fig. 5) can be set arbitrarily; any Block box may be, but is not limited to, a single convolutional layer or a stack of multiple convolutional layers as illustrated in figs. 6 to 9, the only requirement being that the elements output by Block boxes at the same level have the same dimensions, so that the subsequent Add channel can complete the element superposition. Similarly, the specific structure and number of layers of the classification and discrimination sub-networks (the Head boxes illustrated in fig. 5) can be set arbitrarily; for example, the connection layer illustrated in fig. 10 may be used to output two classification values to determine living body or non-living body, and a softMax layer may be used to convert them to a probability output. More preferably, the classification and discrimination sub-network may optionally output multi-class values, so as to determine the specific non-living attack type (photo attack, video attack or 3D attack) once the face image has been judged a non-living body.
S5: the face image to be detected is acquired. Specifically, step S5 is similar to step S1: the face image to be detected may be, but is not limited to being, acquired from an image acquisition device such as an RGB or infrared sensor, camera or video camera, or from an image repository.
S6: the face image to be detected is processed to obtain its X enlarged face images and Y face detail images. Specifically, step S6 is similar to step S2, and the detailed method steps are not repeated here.
S7: the X enlarged face images and the Y face detail images of the face image to be detected are input into the trained face living body recognition model, and the X+Y+Z face living body recognition uncertainties corresponding to the face image to be detected are output. Specifically, step S7 is similar to step S4, and the detailed method steps are not repeated here.
S8: whether the face image to be detected is a living body is judged according to its X+Y+Z face living body recognition uncertainties. More specifically, in step S8, when the face image to be detected is judged to be a non-living body, it is preferable to further determine which type of attack it is, such as a photo attack (paper photo, electronic photo), a video playback attack, a 3D attack (mask, head model), and the like.
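A decision sketch for step S8 follows. The aggregation rule (averaging the uncertainties), the threshold value and the class labels are all assumptions made for illustration; the patent only states that the judgement uses the X+Y+Z uncertainties:

```python
def judge(uncertainties, threshold=0.5):
    """Call the face image live when the mean of the X+Y+Z
    uncertainties stays below an assumed threshold."""
    score = sum(uncertainties) / len(uncertainties)
    return "live" if score < threshold else "non-live"

def attack_type(class_scores):
    """With the multi-class variant of the Head sub-network, pick the
    most probable non-living attack class (photo / video / 3D)."""
    labels = ["photo", "video-replay", "3d"]
    return labels[max(range(len(labels)), key=lambda i: class_scores[i])]
```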
In the multi-channel human face in-vivo detection method provided by the invention, the human face in-vivo identification model comprises X + Y channels; the X personal face enlarged images and the Y personal face detail images are simultaneously input into the X + Y channels; the X + Y + Z personal face in-vivo identification uncertainties are obtained by superposing and combining the X personal face enlarged images and the Y personal face detail images and by considering their element-wise fusion; and whether the human face image to be detected is a living body is judged according to the X + Y + Z personal face in-vivo identification uncertainties. Because the X personal face enlarged images and the Y personal face detail images are considered simultaneously, together with the superposition and combination of their internal features, on the one hand the fused features contain information of both the whole face (enlarged face images) and the local face (detail face images), which is more beneficial to the subsequent classification judgment; on the other hand, during training of the face living body recognition model, gradient back-propagation can promote the classification capability of the parallel X + Y channels. Therefore, compared with the prior art, the multichannel face living body detection method based on the neural network has higher face living body identification precision and accuracy, can further prevent various forms of non-living-body attack such as photos, videos and 3D, further improves the security of an authentication system, and safeguards personal and property safety.
Further, as shown in fig. 11, on the basis of the face live-body detection method, the present invention further provides a multi-channel face live-body detection apparatus based on a neural network, including:
the first acquisition module 10 is used for acquiring N face images to be trained;
the first processing module 20 is connected with the first acquisition module and is used for processing the N human face images to be trained to obtain an X-individual-face enlarged image and a Y-individual-face detail image of each human face image to be trained;
a construction module 30, configured to construct a living human face recognition model for learning living human face recognition uncertainty; the face living body recognition model comprises X + Y channels, wherein the X channels are used for inputting X enlarged faces of the faces; y channels are used for inputting Y face detail maps;
the training module 40 is connected with the first processing module and the construction module and is used for sequentially inputting X personal face enlarged images and Y personal face detail images of N to-be-trained facial images into X + Y channels of the facial living body recognition model respectively, outputting X + Y + Z personal face living body recognition uncertainty corresponding to each to-be-trained facial image and determining the trained facial living body recognition model through facial living body recognition uncertainty regression loss;
the second obtaining module 50 is used for obtaining a face image to be detected;
the second processing module 60 is connected to the second obtaining module and configured to process the to-be-detected face image to obtain an X-face enlarged image and a Y-face detail image of the to-be-detected face image;
the detection module 70 is connected with the second processing module and the training module, and is used for inputting the X personal face enlarged image and the Y personal face detail image of the facial image to be detected into the trained facial living body recognition model and outputting the X + Y + Z personal face living body recognition uncertainty corresponding to the facial image to be detected;
the judging module 80 is connected with the detecting module and is used for judging whether the face image to be detected is a living body or not according to the X + Y + Z personal face living body identification uncertainty of the face image to be detected;
wherein N represents the number of samples of the face image to be trained; x represents the selected number of the face enlarged images; y represents the selected number of the face detail pictures; z represents the selected number of the combination of the human face enlarged image and the human face detail image;
(Formula omitted: image DEST_PATH_IMAGE005, not recoverable from the source.)
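Since Z counts selected combinations of the X enlarged images and the Y detail images, a natural upper bound on Z (one interpretation, since the original formula image is not recoverable) is the number of pairwise combinations of the X + Y channels. The sketch below enumerates those combinations with the standard library; the channel names and counts are hypothetical:

```python
from itertools import combinations
from math import comb

X, Y = 2, 3                      # e.g. 2 enlarged-face channels, 3 detail channels
channels = [f"ch{i}" for i in range(X + Y)]

# All pairwise channel combinations available for element-wise fusion (Add).
pairs = list(combinations(channels, 2))
max_pairwise_z = comb(X + Y, 2)  # upper bound on Z if only pairs are fused
```

For X = 2 and Y = 3 this gives at most 10 fused channels, so the model would output at most 2 + 3 + 10 uncertainties under this reading.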
preferably, as shown in fig. 12, the first processing module 20 includes:
the detection unit 21 is configured to detect N face images to be trained, and obtain a frame region of each face image to be trained;
the amplifying unit 22 is connected with the detecting unit and used for selecting X amplifying coefficients and respectively amplifying the frame area of each face image to be trained by corresponding times according to the X amplifying coefficients to obtain X amplified frame areas of each face image to be trained;
and the reshaping unit 23 is connected with the amplifying unit and is used for reshaping the X amplified frame areas of each to-be-trained face image to a preset specification to obtain an X amplified face image of each to-be-trained face image.
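The amplifying and reshaping units above can be sketched as follows, under the assumption (not stated in the source) that enlargement keeps the frame center fixed and clamps to the image bounds; the coefficients and the 224x224 preset specification are illustrative:

```python
def enlarge_bbox(bbox, factor, img_w, img_h):
    """Enlarge a face frame region about its center by `factor`,
    clamped to the image bounds. bbox = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * factor / 2.0, (y2 - y1) * factor / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))

# X = 3 amplification coefficients -> 3 amplified frame regions per face.
coeffs = (1.5, 2.0, 3.0)
regions = [enlarge_bbox((100, 100, 200, 200), k, 640, 480) for k in coeffs]

# Each region would then be cropped and reshaped (resized) to one preset
# specification, so that all X channels share the same input dimension.
PRESET = (224, 224)
```

The actual resize step would be delegated to an image library; the essential point the source makes is that all X enlarged images end up at the same preset specification.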
More preferably, the first processing module further includes:
the key point selecting unit 24 is connected with the detecting unit and used for selecting a preset number of key points on the frame area of each face image to be trained;
and the splicing unit 25 is connected with the key point selecting unit and used for intercepting the block images with preset width and height by taking the key points as the center, and selecting the block images to splice the block images into Y individual face detail images corresponding to each face image to be trained.
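The key-point selecting and splicing units can be sketched as follows on a toy 2-D array standing in for a grayscale image. The key-point positions, patch size, and side-by-side splicing layout are illustrative assumptions:

```python
def crop_patch(img, cx, cy, w, h):
    """Crop a w*h block image centered on key point (cx, cy) from 2-D list `img`."""
    x1, y1 = cx - w // 2, cy - h // 2
    return [row[x1:x1 + w] for row in img[y1:y1 + h]]

def stitch_row(patches):
    """Splice equally sized block images side by side into one detail image."""
    return [sum((p[r] for p in patches), []) for r in range(len(patches[0]))]

# Toy 8x8 "image" and two hypothetical key points (e.g. eye corners).
img = [[10 * r + c for c in range(8)] for r in range(8)]
keypoints = [(2, 2), (5, 5)]
patches = [crop_patch(img, cx, cy, 2, 2) for cx, cy in keypoints]
detail = stitch_row(patches)   # one 2x4 spliced face detail image
```

In practice the block images would be pixel regions around detected facial landmarks, and different key-point selections would yield the Y distinct face detail images.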
Further, the present invention also provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement any of the above-mentioned multi-channel face liveness detection methods.
Further, the present invention also provides a computer readable medium storing a computer program for implementing any of the above-mentioned multi-channel face live detection methods.
It should be noted that the above multi-channel human face in-vivo detection apparatus, the terminal device and the computer readable medium all correspond to the above multi-channel human face in-vivo detection method, and the specific implementation form, preferred alternative, functional role and beneficial effect of each module and the whole system all correspond to the method, which is not described herein again.
The technical features of the above embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (6)

1. A multi-channel human face in-vivo detection method based on a neural network is characterized by comprising the following steps:
acquiring N face images to be trained;
processing the N human face images to be trained to obtain an X human face enlarged image and a Y human face detail image of each human face image to be trained;
constructing a human face living body recognition model for learning human face living body recognition uncertainty; the face living body recognition model comprises X + Y channels, wherein the X channels are used for inputting X enlarged faces of the face; y channels are used for inputting Y face detail maps;
sequentially inputting the X personal face enlarged images and the Y personal face detail images of the N to-be-trained facial images into X + Y channels of the facial living body recognition model respectively, outputting X + Y + Z personal face living body recognition uncertainty corresponding to each to-be-trained facial image, and determining a trained facial living body recognition model through facial living body recognition uncertainty regression loss;
acquiring a human face image to be detected;
processing the face image to be detected to obtain an X personal face enlarged image and a Y personal face detail image of the face image to be detected;
inputting the X personal face enlarged image and the Y personal face detail image of the facial image to be detected into the trained facial living body recognition model, and outputting X + Y + Z personal face living body recognition uncertainty corresponding to the facial image to be detected;
judging whether the face image to be detected is a living body or not according to the X + Y + Z individual face living body identification uncertainty of the face image to be detected;
wherein N represents the number of samples of the face image to be trained; x represents the selected number of the face enlarged images; y represents the selected number of the face detail pictures; z represents the selected number of the combination of the human face enlarged image and the human face detail image;
(Formula omitted: image FDA0003008304330000011, not recoverable from the source.)
the step of processing the N face images to be trained to obtain X enlarged face images of each face image to be trained comprises the following steps:
detecting the N face images to be trained, and acquiring a frame area of each face image to be trained;
selecting X amplification factors, and respectively amplifying the frame area of each face image to be trained by corresponding times according to the X amplification factors to obtain X amplified frame areas of each face image to be trained;
reshaping the X amplified frame areas of each face image to be trained to a preset specification to obtain X amplified face images of each face image to be trained;
the step of processing the N face images to be trained to obtain a face detail map of each face image to be trained comprises the following steps:
detecting the N face images to be trained, and acquiring a frame area of each face image to be trained;
selecting a preset number of key points on a frame area of each face image to be trained;
and intercepting a block image with preset width and height by taking the key point as a center, selecting the block image and splicing the block image into Y face detail images corresponding to each face image to be trained.
2. The multi-channel human face in-vivo detection method according to claim 1, wherein the human face in-vivo identification model is used for outputting X human face in-vivo identification uncertainty according to an input X human face enlarged image; outputting Y personal face living body identification uncertainty according to the input Y personal face detail map; and outputting the Z-face living body identification uncertainty according to the superposition of at least two combined elements in the input X-face enlarged image and the input Y-face detail image.
3. The multi-channel human face live detection method according to any one of claims 1-2, further comprising:
and when the face image to be detected is a non-living body, judging the attack type of the face image to be detected according to the X + Y + Z personal face living body identification uncertainty of the face image to be detected.
4. A multichannel human face in-vivo detection device based on a neural network is characterized by comprising:
the first acquisition module is used for acquiring N face images to be trained;
the first processing module is connected with the first acquisition module and is used for processing the N human face images to be trained to obtain an X-individual-face enlarged image and a Y-individual-face detail image of each human face image to be trained;
the construction module is used for constructing a human face living body identification model for learning human face living body identification uncertainty; the face living body recognition model comprises X + Y channels, wherein the X channels are used for inputting X enlarged face images; the Y channels are used for inputting Y face detail maps;
the training module is connected with the first processing module and the construction module and is used for respectively inputting the X personal face enlarged images and the Y personal face detail images of the N to-be-trained facial images into X + Y channels of the facial living body recognition model in sequence, outputting X + Y + Z personal face living body recognition uncertainty corresponding to each to-be-trained facial image and determining the trained facial living body recognition model through facial living body recognition uncertainty regression loss;
the second acquisition module is used for acquiring a face image to be detected;
the second processing module is connected with the second acquisition module and is used for processing the face image to be detected to obtain an X personal face enlarged image and a Y personal face detail image of the face image to be detected;
the detection module is connected with the second processing module and the training module and is used for inputting an X personal face enlarged view and a Y personal face detail view of the facial image to be detected into the trained facial living body recognition model and outputting X + Y + Z personal face living body recognition uncertainty corresponding to the facial image to be detected;
the judging module is connected with the detecting module and used for judging whether the face image to be detected is a living body or not according to the X + Y + Z personal face living body identification uncertainty of the face image to be detected;
wherein N represents the number of samples of the face image to be trained; x represents the selected number of the face enlarged images; y represents the selected number of the face detail pictures; z represents the selected number of the combination of the human face enlarged image and the human face detail image;
(Formula omitted: image FDA0003008304330000031, not recoverable from the source.)
the first processing module comprises:
the detection unit is used for detecting the N face images to be trained and acquiring the frame area of each face image to be trained;
the amplifying unit is connected with the detecting unit and used for selecting X amplifying coefficients and amplifying the frame area of each face image to be trained by corresponding times according to the X amplifying coefficients to obtain X amplified frame areas of each face image to be trained;
the reshaping unit is connected with the amplifying unit and is used for reshaping the X amplified frame areas of each face image to be trained to a preset specification to obtain X amplified face images of each face image to be trained;
the key point selecting unit is connected with the detecting unit and used for selecting a preset number of key points on the frame area of each face image to be trained;
and the splicing unit is connected with the key point selecting unit and used for intercepting a block image with preset width and height by taking the key point as a center and selecting the block image to be spliced into Y individual face detail images corresponding to each face image to be trained.
5. A terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said computer program is adapted to implement a multi-channel face liveness detection method according to any of claims 1-3.
6. A computer-readable medium, in which a computer program is stored, wherein the computer program is configured to implement the multi-channel face in-vivo detection method according to any one of claims 1 to 3.
CN202110146331.6A 2021-02-03 2021-02-03 Multi-channel human face in-vivo detection method and device based on neural network Active CN112507986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146331.6A CN112507986B (en) 2021-02-03 2021-02-03 Multi-channel human face in-vivo detection method and device based on neural network

Publications (2)

Publication Number Publication Date
CN112507986A (en) 2021-03-16
CN112507986B (en) 2021-05-11

Family

ID=74952527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146331.6A Active CN112507986B (en) 2021-02-03 2021-02-03 Multi-channel human face in-vivo detection method and device based on neural network

Country Status (1)

Country Link
CN (1) CN112507986B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant