CN115601818B - Lightweight visible light living body detection method and device

Info

Publication number: CN115601818B (granted); earlier publication: CN115601818A
Application number: CN202211503095.XA
Authority: CN (China)
Prior art keywords: living body, target, face, visible light, network
Legal status: Active (granted)
Other languages: Chinese (zh)
Inventors: 蒙顺开, 瞿锐恒, 李叶雨
Current and original assignee: Dolphin Lezhi Technology Chengdu Co ltd
Application filed by Dolphin Lezhi Technology Chengdu Co ltd; priority to CN202211503095.XA


Classifications

    • G06V40/168 Human faces; Feature extraction; Face representation
    • G06N3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N3/08 Neural networks; Learning methods
    • G06V10/774 Image or video recognition using machine learning; Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V40/172 Human faces; Classification, e.g. identification
    • G06V40/40 Spoof detection, e.g. liveness detection
    • Y02A40/70 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in livestock or poultry


Abstract

The invention discloses a lightweight visible light living body detection method and device. The method uses a visible light living body detection model to determine whether a face in a visible light original image is a living body. The model comprises a deep neural network, a first fully-connected network and a second fully-connected network, and an auxiliary supervision network that assists in learning green light intensity features is introduced when the model is trained. Based on the principle that blood flowing through the skin of a living body produces a characteristic intensity distribution in the green light direction, the auxiliary supervision network helps the trained deep neural network accurately extract the living body features of the face. This solves the problem that prior silent liveness detection methods based on visible light images cannot resist 3D non-living attacks, improves liveness detection accuracy, and keeps the model lightweight.

Description

Lightweight visible light living body detection method and device
Technical Field
The invention belongs to the technical field of image processing and target identification, and particularly relates to a light-weight visible light living body detection method and device.
Background
Liveness detection technology determines whether a face presented to a machine is real or fake; a face presented through another medium, such as a printed photo, a screen image, a silicone mask or a stereoscopic 3D figure, is defined as a fake face. Mainstream liveness detection schemes fall into cooperative liveness detection and non-cooperative (silent) liveness detection. Cooperative liveness detection requires the user to complete specified actions according to prompts before verification, and can be called dynamic liveness detection. Silent liveness detection, by contrast, judges whether a real living body is present without requiring cooperative actions such as blinking or opening the mouth. It is therefore technically more difficult and has higher accuracy requirements in practical applications, but because verification is performed without the user's awareness, it offers a better user experience.
Silent liveness detection generally follows three technical routes according to the imaging source: infrared images, 3D structured light, and visible light images. Infrared imaging filters light in specific wavebands and naturally resists fake-face attacks based on screen imaging; 3D structured light introduces depth information and easily distinguishes fake-face attacks from 2D media such as paper photos and screen images; visible light images are discriminated mainly through detail cues such as moire patterns that appear when a screen is re-photographed and reflections from paper photos. Compared with the other two routes, liveness detection based on visible light images can rely only on information in the image itself, and therefore faces greater challenges in open real-world scenes.
Nevertheless, silent liveness detection based on visible light images has the advantages of fast recognition, simple operation and contactless use. Moreover, compared with infrared and 3D structured light imaging devices, visible light imaging devices are cheaper and more highly integrated, and mainstream face recognition systems already use them, so research on liveness detection based on visible light imaging has important value. Meanwhile, with the popularization of 5G, AI and related technologies, face recognition has been widely deployed on all kinds of connected devices, including edge devices whose computing power and power consumption are limited. How to make the algorithm lightweight is therefore another problem that must be considered, so that the method is suitable for edge connected devices with very limited computing power.
Disclosure of Invention
The invention aims to overcome one or more defects in the prior art and provide a lightweight visible light living body detection method and device.
The purpose of the invention is achieved by the following technical scheme:
First aspect
The invention provides a lightweight visible light living body detection method, which comprises the following steps:
s1, acquiring a visible light original image to be processed;
s2, recognizing a human face target from a visible light original image to be processed by utilizing a pre-constructed visible light living body detection model, and determining that the human face target is a living body or a non-living body;
the construction process of the visible light living body detection model is as follows:
SS1, constructing a deep neural network, wherein the deep neural network is used for acquiring a historical visible light original image, extracting a target feature in the historical visible light original image and generating a target feature matrix, the target feature comprises a green light intensity feature, and the green light intensity feature is an intensity distribution feature of green light when blood flows through skin;
SS2, constructing a first fully-connected network, wherein the first fully-connected network is used for receiving the target feature matrix and identifying the position and the size of a human face target in the target feature matrix;
SS3, extracting a face feature matrix in a target feature matrix based on the position and the size of the face target, and performing global maximization processing on the face feature matrix to obtain living body distinguishing feature vectors after the global maximization processing;
SS4, constructing a second fully-connected network, wherein the second fully-connected network is used for receiving the living body distinguishing feature vector and determining that the current face target is a living body or a non-living body according to the living body distinguishing feature vector;
SS5, training the deep neural network, the first fully-connected network and the second fully-connected network by using training samples, introducing an auxiliary supervision network when the deep neural network is trained, taking a loss function as the training constraint, obtaining the network parameters of the deep neural network, the first fully-connected network, the second fully-connected network and the auxiliary supervision network after training is finished, and then generating the visible light living body detection model based on the network parameters of the deep neural network, the first fully-connected network and the second fully-connected network;
the auxiliary supervision network is used for auxiliary supervision when the deep neural network extracts the green light intensity characteristics.
Preferably, in the SS2, the position of the human face target in the target feature matrix is identified based on a non-maximum suppression algorithm.
Preferably, the SS3 specifically includes the following sub-steps:
SS31, extracting the face feature matrix F_H × F_W × N from the target feature matrix based on the position and the size of the face target;
SS32, taking the maximum value of each of the N F_H × F_W × 1 matrices, and generating the living body discrimination feature vector from the N maximum values obtained.
Preferably, in the SS4, determining that the current face target is a living body or a non-living body according to the living body discrimination feature vector specifically includes the following sub-steps:
SS41, a second full-connection network classifies the obtained living body distinguishing feature vector and outputs the probability that the current face target is a living body and the probability that the current face target is a non-living body;
SS42, if the probability that the current face target is a living body is larger than the probability that the current face target is a non-living body, determining that the current face target is a living body; and if the probability that the current face target is a living body is smaller than the probability that the current face target is a non-living body, determining that the current face target is the non-living body.
Preferably, the auxiliary supervision network comprises a supervised learning network and a first spectral feature extraction network;
the first spectral feature extraction network is used for intercepting a face image from a historical visible light original image according to the position and the size of a face target, extracting green light intensity components of the face image, and then generating green light component spatial spectral features of the face image based on Fourier transform;
and the supervised learning network is used for receiving the target feature matrix, extracting a single face feature matrix in the target feature matrix based on the position and the size of a face target, then performing learning supervision, and enabling the green light intensity feature in the single face feature matrix to approach the green light component spatial spectrum feature after the learning supervision.
Preferably, the visible light original image is an RGB three-channel image. Extracting the green light intensity component of the face image specifically comprises the following sub-step: based on a first formula, the RGB three-channel values of each pixel of the face image I_f are converted into a single green-component value, where I_f(m, n, 0) denotes the value of the 0th channel of the pixel in row m and column n of the face image I_f, I_f(m, n, 1) denotes the value of the 1st channel of that pixel, I_f(m, n, 2) denotes the value of the 2nd channel of that pixel, and the transformed value of the pixel in row m and column n is its single green-component value (the explicit expression of the first formula is given only as a formula image in the original publication).
Preferably, the fourier transform-based generation of the spatial spectral feature of the green light component of the face image specifically includes the following sub-steps:
SSS1, performing Fourier transform on the face image after the green light intensity component is extracted;
and SSS2, taking the modulus of the Fourier transform, performing normalization, and then obtaining the green light component spatial spectrum feature of the face image.
Preferably, in the SS1, before extracting a target feature matrix in the historical visible light original image, the deep neural network scales the received historical visible light original image, where the scaled visible light original image is 256 × 256 × 3, and the target feature matrix is 8 × 8 × 128; in the SS2, when the first fully-connected network identifies the position of the face target in the target feature matrix, the preset prior frame sizes include 192 × 192, 128 × 128, and 32 × 32.
Preferably, the loss function is L = λ1·L_face + λ2·L_loc + λ3·L_live + λ4·L_green, where λ1 is a preset first weight coefficient, λ2 is a preset second weight coefficient, λ3 is a preset third weight coefficient, λ4 is a preset fourth weight coefficient, L_face represents the classification loss when discriminating face from non-face, L_loc represents the regression loss of the face target position, L_live represents the classification loss when discriminating living body from non-living body, and L_green represents the green light intensity feature learning loss (the explicit expressions of the four terms are given only as formula images in the original publication).
L_face is summed over the n training samples. The indicator 1_ij^obj gives the true value of whether the jth prior box in the ith grid is responsible for detecting the face: 1_ij^obj = 1 means the jth prior box in the ith grid is responsible for detecting the face, and 1_ij^obj = 0 means it is not. The indicator 1_ij gives the true value of whether the ith grid contains the center point of the jth prior box: 1_ij = 1 means the ith grid contains the jth prior box center point, and 1_ij = 0 means it does not. The number of grids is 64, and each grid corresponds one-to-one to a position of the feature maps in the target feature matrix. Ĉ_ij is the output value indicating whether the ith grid contains the jth prior box center point: Ĉ_ij = 1 means it does, and Ĉ_ij = 0 means it does not.
L_loc is computed from (x̂_ij, ŷ_ij), the estimated center point coordinates of the jth prior box in the ith grid, (x_ij, y_ij), the true center point coordinates, ŵ_ij and ĥ_ij, the estimated width and height of the jth prior box in the ith grid, and w_ij and h_ij, the true width and height.
L_live is computed from y, the true label of the training sample, p_f, the probability that the current face target is a non-living body, and p_t, the probability that the current face target is a living body.
L_green is the distance between Ŝ, the output value of the auxiliary supervision network after assisting the deep neural network in learning the green light intensity feature, and S, the true value of the green light component spatial spectrum feature.
The first aspect of the invention brings the following beneficial effects:
(1) Based on the principle that blood flowing through the skin of a living body produces a characteristic intensity distribution in the green light direction, an auxiliary supervision network is introduced when training the deep neural network, and it assists the deep neural network in accurately extracting the living body feature of the face (the green light intensity feature). The generated visible light living body detection model therefore performs living body discrimination not only from detail cues such as moire patterns produced by re-photographing a screen and reflections from paper photos, but also from the living body feature of the face itself, which solves the problem that prior silent liveness detection methods based on visible light images cannot resist 3D non-living attacks, and improves liveness detection accuracy;
(2) The backbone of the visible light living body detection model contains only the deep neural network, which completes the face detection and liveness recognition tasks simultaneously. This reduces the amount of computation in face liveness detection and achieves a lightweight design, which lowers the computing-resource requirement of the method and reduces the latency of the detection process, so that detection speed is improved together with accuracy;
(3) Because face liveness detection is lightweight, the visible light living body detection method of this embodiment is suitable for edge connected devices, while reducing the cost, size, power consumption and latency of such devices when performing face liveness recognition.
Second aspect of the invention
A second aspect of the present invention provides a lightweight visible light living body detection device, comprising a memory and a processor, wherein the memory is used for storing the lightweight visible light living body detection method according to the first aspect of the present invention, and the processor is used for calling the lightweight visible light living body detection method stored in the memory to perform living body detection.
The second aspect of the present invention brings the same beneficial effects as the first aspect, which are not described in detail here.
Drawings
FIG. 1 is a flow chart of the lightweight visible light living body detection method;
FIG. 2 is a flow chart of a construction of a visible light living body detection model;
FIG. 3 is a schematic diagram of a visible light living body detection model;
fig. 4 is a schematic diagram of a deep neural network.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of protection of the present invention.
Example one
Referring to fig. 1 to 4, this embodiment provides a lightweight visible light living body detection method, comprising the following steps:
s1, acquiring a visible light original image to be processed. In this embodiment, the visible light original image is obtained from the visible light imaging device and is an RGB three-channel image.
And S2, recognizing a human face target from the original visible light image to be processed by utilizing a pre-constructed visible light living body detection model, and determining that the human face target is a living body or a non-living body.
The construction process of the visible light living body detection model is as follows:
and SS1, constructing a deep neural network, wherein the deep neural network is used for acquiring a historical visible light original image, extracting target characteristics in the historical visible light original image and generating a target characteristic matrix, the target characteristics comprise green light intensity characteristics, and the green light intensity characteristics are intensity distribution characteristics of green light when blood flows through skin.
Before being input into the deep neural network, the visible light original image is scaled; the scaled visible light original image has a size of 256 × 256 × 3 and is used as the input image of the deep neural network. In this embodiment, the deep neural network Net(x) includes eight feature extraction modules, which extract the target features and generate the target feature matrix; the target feature matrix has a size of 8 × 8 × 128, and each 1 × 1 × 128 feature vector represents the target features of a 16 × 16 image block in the input image. As shown in fig. 4, the eight feature extraction modules comprise a first 3 × 3 channel-separable convolution, a first 1 × 1 convolution, a first activation layer, a second 3 × 3 channel-separable convolution, a second 1 × 1 convolution, a second activation layer, a max-pooling layer, a channel expansion layer, and an addition layer. A sketch of one such module is given below.
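The following is a minimal PyTorch sketch of one feature extraction module, wired from the layer list above. The channel widths, the ReLU activation, the use of a 1 × 1 convolution as the channel expansion layer, and which modules downsample are assumptions; the patent names only the layer types.

```python
# Minimal sketch of one feature extraction module from the layer list above.
# Channel widths, ReLU activations, and the 1x1-convolution channel expansion on the
# skip path are ASSUMPTIONS; only the layer types come from the text.
import torch
import torch.nn as nn

class FeatureExtractionModule(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, downsample: bool = True):
        super().__init__()
        self.dw1 = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)  # first 3x3 channel-separable convolution
        self.pw1 = nn.Conv2d(in_ch, in_ch, 1)                           # first 1x1 convolution
        self.act1 = nn.ReLU(inplace=True)                               # first activation layer
        self.dw2 = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)  # second 3x3 channel-separable convolution
        self.pw2 = nn.Conv2d(in_ch, out_ch, 1)                          # second 1x1 convolution
        self.act2 = nn.ReLU(inplace=True)                               # second activation layer
        self.pool = nn.MaxPool2d(2) if downsample else nn.Identity()    # max-pooling layer
        self.expand = nn.Conv2d(in_ch, out_ch, 1)                       # channel expansion layer (skip path)

    def forward(self, x):
        y = self.act1(self.pw1(self.dw1(x)))
        y = self.act2(self.pw2(self.dw2(y)))
        y = self.pool(y)
        skip = self.pool(self.expand(x))   # bring the input to the same shape as the main path
        return y + skip                    # addition layer

# Eight such modules map the 256 x 256 x 3 input to the 8 x 8 x 128 target feature matrix
# only if five of them downsample (2^5 = 32); which modules downsample is an assumption.
```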
And SS2, constructing a first full-connection network, wherein the first full-connection network is used for receiving the target feature matrix and identifying the position and the size of the human face target in the target feature matrix.
Specifically, the first fully-connected network FC1(x) receives the target feature matrix, regresses the category, position and size of each target, and outputs them; the output of FC1(x) is denoted Dd(i), i = 0 to 14. Since the aspect ratio of a human face in the input image is close to 1:1, when identifying the position of the face target in the target feature matrix through FC1(x), the prior box sizes are set to 192 × 192, 128 × 128 and 32 × 32, and the position of the face target in the target feature matrix is identified based on the non-maximum suppression algorithm, as sketched below.
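A minimal sketch of the non-maximum suppression step is given here; only the idea of keeping the highest-scoring face box and discarding heavily overlapping ones comes from the text, and the IoU threshold is an assumed value.

```python
# Minimal non-maximum suppression sketch: keep the highest-scoring face box and drop
# boxes that overlap it too much. The IoU threshold of 0.45 is an ASSUMPTION; the
# patent does not give a value.
from typing import List, Tuple

def iou(a: Tuple[float, float, float, float], b: Tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes: List[Tuple[float, float, float, float]], scores: List[float],
        iou_thr: float = 0.45) -> List[int]:
    """Return the indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
    return kept
```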
And SS3, extracting a face feature matrix in the target feature matrix based on the position and size of the face target, performing global maximization processing on the face feature matrix, and obtaining a living body distinguishing feature vector after the global maximization processing.
Optionally, SS3 specifically includes the following sub-steps:
SS31, extracting the face feature matrix F_H × F_W × N from the target feature matrix based on the position and size of the face target;
SS32, taking the maximum value of each of the N F_H × F_W × 1 matrices and generating the living body discrimination feature vector from the N maximum values obtained, where N is 128; a sketch of this global maximization step is given below.
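The following NumPy sketch assumes only what is stated above: the cropped face feature matrix of shape F_H × F_W × N with N = 128 is reduced to a 128-dimensional living body discrimination feature vector by taking the spatial maximum of each channel.

```python
# Global maximization: (F_H, F_W, N) face feature matrix -> N-dimensional vector.
import numpy as np

def liveness_feature_vector(face_features: np.ndarray) -> np.ndarray:
    """face_features: array of shape (F_H, F_W, N); returns a vector of shape (N,)."""
    return face_features.max(axis=(0, 1))   # maximum over the F_H x F_W spatial positions

# Example: a face region covering 3 x 3 cells of the 8 x 8 x 128 target feature matrix
vec = liveness_feature_vector(np.random.rand(3, 3, 128))
assert vec.shape == (128,)
```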
And SS4, constructing a second fully-connected network, wherein the second fully-connected network is used for receiving the living body distinguishing feature vector and determining that the current face target is a living body or a non-living body according to the living body distinguishing feature vector.
Optionally, in SS4, determining that the current face target is a living body or a non-living body according to the living body discrimination feature vector specifically includes the following sub-steps:
SS41, the second fully-connected network FC2(f) classifies the acquired living body discrimination feature vector and outputs the probability that the current face target is a living body and the probability that it is a non-living body. In this embodiment, FC2(f) preferably outputs these two probabilities through a softmax function; the probability that the current face target is a living body is denoted P_t, and the probability that it is a non-living body is denoted P_f.
SS42, if the probability that the current face target is a living body is greater than the probability that it is a non-living body, the current face target is determined to be a living body; if the probability that the current face target is a living body is smaller than the probability that it is a non-living body, the current face target is determined to be a non-living body.
SS5, training the deep neural network, the first fully-connected network and the second fully-connected network with training samples; an auxiliary supervision network is introduced when the deep neural network is trained, and the loss function is used as the training constraint. After training, the network parameters of the deep neural network, the first fully-connected network, the second fully-connected network and the auxiliary supervision network are obtained, and the visible light living body detection model is then generated from the network parameters of the deep neural network, the first fully-connected network and the second fully-connected network; that is, the visible light living body detection model comprises the deep neural network, the first fully-connected network and the second fully-connected network. The auxiliary supervision network is used for auxiliary supervision when the deep neural network extracts the green light intensity feature.
Optionally, the auxiliary supervision network includes a supervised learning network and a first spectral feature extraction network.
The first spectral feature extraction network is used for cropping a face image from the historical visible light original image according to the position and size of the face target, extracting the green light intensity component of the face image, and then generating the green light component spatial spectrum feature of the face image based on Fourier transform. In this embodiment, the cropped face image is scaled before the green light intensity component is extracted; the scaled face image has a size of 256 × 256 × 3 and is denoted I_f.
The supervised learning network is used for receiving the target feature matrix, extracting the single face feature matrix in the target feature matrix based on the position and size of the face target, and then performing learning supervision, after which the green light intensity feature in the single face feature matrix approaches the green light component spatial spectrum feature. In this embodiment, before learning supervision, the extracted single face feature matrix is scaled to 8 × 8 × 128 and input into a supervised convolution network C(V), which performs a 1 × 1 convolution; the single face feature matrix after the 1 × 1 convolution has a size of 8 × 8 × 1, as sketched below.
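A minimal PyTorch sketch of the supervised convolution network C(V): the only stated detail is a 1 × 1 convolution mapping the scaled 8 × 8 × 128 single face feature matrix to an 8 × 8 × 1 map, which training then pushes toward the 8 × 8 × 1 green light component spatial spectrum feature; everything else is kept as simple as possible.

```python
# Supervised convolution network C(V): a single 1x1 convolution, per the text above.
import torch
import torch.nn as nn

supervised_conv = nn.Conv2d(in_channels=128, out_channels=1, kernel_size=1)  # C(V)

face_features = torch.randn(1, 128, 8, 8)    # scaled single face feature matrix (N, C, H, W)
green_map = supervised_conv(face_features)   # 8 x 8 x 1 map compared against the spectral feature
assert green_map.shape == (1, 1, 8, 8)
```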
Optionally, extracting the green light intensity component from the face image I_f yields the corresponding face image I_H, whose size is 256 × 256 × 1. The extraction of the green light intensity component specifically comprises: based on the first formula, the RGB three-channel values of each pixel of the face image I_f are converted into a single green-component value, where I_f(m, n, 0) denotes the value of the 0th channel of the pixel in row m and column n of I_f, I_f(m, n, 1) denotes the value of the 1st channel of that pixel, I_f(m, n, 2) denotes the value of the 2nd channel of that pixel, and I_H(m, n) denotes the transformed value of that pixel (the explicit expression of the first formula is given only as a formula image in the original publication); a sketch of this transform is given below.
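The first formula itself survives only as an image, so the exact weighting is not recoverable from this text. The NumPy sketch below shows only the shape of the operation, collapsing the three RGB channels of each pixel of I_f into one green-emphasis value to form I_H; the 2*G - R - B weighting is a placeholder assumption, not the patent's formula.

```python
# Per-pixel green-component extraction: I_f (256 x 256 x 3) -> I_H (256 x 256 x 1).
# The 2*G - R - B weighting below is an ASSUMPTION standing in for the patent's
# "first formula"; only the input/output shapes and the per-pixel nature of the
# transform are taken from the text. RGB channel order is also assumed.
import numpy as np

def extract_green_component(face_rgb: np.ndarray) -> np.ndarray:
    """face_rgb: array of shape (256, 256, 3) in RGB order; returns I_H of shape (256, 256, 1)."""
    r = face_rgb[:, :, 0].astype(np.float32)
    g = face_rgb[:, :, 1].astype(np.float32)
    b = face_rgb[:, :, 2].astype(np.float32)
    green = np.clip(2.0 * g - r - b, 0.0, None)   # assumed green-emphasis weighting
    return green[:, :, np.newaxis]
```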
Optionally, generating the green light component spatial spectrum feature of the face image I_H based on Fourier transform specifically comprises the following sub-steps:
SSS1, performing Fourier transform on the face image I_H obtained after extracting the green light intensity component;
SSS2, taking the modulus of the Fourier transform, performing normalization, and scaling the result to obtain the green light component spatial spectrum feature of the face image, whose size is 8 × 8 × 1; a sketch of these sub-steps is given below.
Optionally, the loss function is L = λ1·L_face + λ2·L_loc + λ3·L_live + λ4·L_green, where λ1 is a preset first weight coefficient, λ2 is a preset second weight coefficient, λ3 is a preset third weight coefficient, λ4 is a preset fourth weight coefficient, L_face represents the classification loss when discriminating face from non-face, L_loc represents the regression loss of the face target position, L_live represents the classification loss when discriminating living body from non-living body, and L_green represents the green light intensity feature learning loss (the explicit expressions of the four terms are given only as formula images in the original publication).
L_face is summed over the n training samples. The indicator 1_ij^obj gives the true value of whether the jth prior box in the ith grid is responsible for detecting the face: 1_ij^obj = 1 means the jth prior box in the ith grid is responsible for detecting the face, and 1_ij^obj = 0 means it is not. The indicator 1_ij gives the true value of whether the ith grid contains the center point of the jth prior box: 1_ij = 1 means the ith grid contains the jth prior box center point, and 1_ij = 0 means it does not. The number of grids is 64, and each grid corresponds one-to-one to a position of the feature maps in the target feature matrix. Ĉ_ij is the output value indicating whether the ith grid contains the jth prior box center point: Ĉ_ij = 1 means it does, and Ĉ_ij = 0 means it does not.
L_loc is computed from (x̂_ij, ŷ_ij), the estimated center point coordinates of the jth prior box in the ith grid, (x_ij, y_ij), the true center point coordinates, ŵ_ij and ĥ_ij, the estimated width and height of the jth prior box in the ith grid, and w_ij and h_ij, the true width and height.
L_live is computed from y, the true label of the training sample, P_f, the probability that the current face target is a non-living body, and P_t, the probability that the current face target is a living body.
L_green is the distance between Ŝ, the output value of the auxiliary supervision network after assisting the deep neural network in learning the green light intensity feature, and S, the true value of the green light component spatial spectrum feature. A hedged code sketch of this loss follows.
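Because the individual loss expressions survive only as images, the PyTorch sketch below assumes standard forms for each term: squared-error confidence, box regression masked by the responsibility indicators, cross-entropy for the living/non-living classification, and an L2 distance for the green-feature term. Only the four-term weighted structure and the meaning of each tensor come from the text above.

```python
# Hedged sketch of L = l1*L_face + l2*L_loc + l3*L_live + l4*L_green.
# The per-term expressions are ASSUMED standard forms; only the weighted four-term
# structure and the tensor meanings come from the text.
import torch
import torch.nn.functional as F

def total_loss(c_hat, c_true, resp,        # (B, 64, K): confidence output, ground truth, float responsibility mask
               box_hat, box_true,          # (B, 64, K, 4): predicted / true (x, y, w, h) per prior box
               live_logits, y_live,        # (B, 2) scores [non-living, living]; (B,) integer labels in {0, 1}
               s_hat, s_true,              # (B, 8, 8, 1): supervision branch output / green spectral feature
               l1=1.0, l2=1.0, l3=1.0, l4=1.0):
    loss_face = ((c_hat - c_true) ** 2).mean()                                         # face / non-face term
    loss_loc = (resp.unsqueeze(-1) * (box_hat - box_true) ** 2).sum() / resp.sum().clamp(min=1.0)
    loss_live = F.cross_entropy(live_logits, y_live)                                   # living / non-living term
    loss_green = ((s_hat - s_true) ** 2).mean()                                        # distance between S_hat and S
    return l1 * loss_face + l2 * loss_loc + l3 * loss_live + l4 * loss_green
```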
Example two
This embodiment provides a lightweight visible light living body detection device, which comprises a memory and a processor, wherein the memory is used for storing the lightweight visible light living body detection method of the first embodiment, and the processor is used for calling the lightweight visible light living body detection method stored in the memory to perform living body detection.
The lightweight visible light living body detection method realizes the rapid and accurate judgment of whether the human face target in the visible light original image is a living body, and is based on the following principle:
the first visible light living body detection model construction stage:
after a first full-connection network FC1 (x), a second full-connection network FC2 (f) and a deep neural network Net (x) in a backbone are built, when the deep neural network Net (x), the first full-connection network FC1 (x) and the second full-connection network FC2 (f) are trained, an auxiliary monitoring network is introduced, based on the assistance of the auxiliary monitoring network, the deep neural network accurately learns the green light intensity characteristics of the human face, when the human face is classified as a living body, the second full-connection network FC2 (f) classifies whether the current human face is the living body or not based on the green light intensity characteristics of the human face in a target characteristic matrix, and then the probability that the classified output current human face is the living body is compared with the probability that the current human face is the non-living body, so that the judgment result whether the current human face is the living body or not is obtained.
The green light intensity characteristic of the human face is a living body characteristic of the human face, and the green light intensity characteristic is used as a supplement of basic characteristics such as moire and reflection based on a traditional visible light living body detection method.
Secondly, loading the visible light living body detection model on the interconnected device:
The interconnected device is provided with an online visible light living body detection model, which comprises the backbone deep neural network Net(x), the first fully-connected network FC1(x) and the second fully-connected network FC2(f).
The interconnected device performs living body detection as follows: the deep neural network Net(x) obtains an input image, extracts target features and generates a target feature matrix; the first fully-connected network FC1(x) regresses the category, position and size of each target in the target feature matrix and outputs them, thereby obtaining the position and size of the face target; a face feature matrix is cropped from the target feature matrix based on the position and size of the face target, and a living body discrimination feature vector is generated after a global maximization operation over the face feature matrix; the living body discrimination feature vector is input into the second fully-connected network FC2(f), which classifies whether the current face is a living body and outputs the probability P_t that the current face is a living body and the probability P_f that it is a non-living body; if P_t > P_f, the current face is a living body, and if P_t < P_f, the current face is a non-living body. A sketch of this inference pipeline follows.
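A minimal sketch of the deployed inference pipeline just described. Here net, fc1 and fc2 stand for the trained Net(x), FC1(x) and FC2(f); their exact output layouts (a face box expressed in cells of the 8 × 8 grid, and a two-way liveness output) are assumptions used only to make the flow concrete.

```python
# End-to-end inference sketch; `net`, `fc1` and `fc2` are placeholders for the trained
# modules, and their output formats below are ASSUMPTIONS.
import numpy as np

def detect_liveness(image_256x256x3: np.ndarray, net, fc1, fc2) -> str:
    features = net(image_256x256x3)               # 8 x 8 x 128 target feature matrix
    r0, r1, c0, c1 = fc1(features)                # assumed face box in cells of the 8 x 8 grid
    face_features = features[r0:r1, c0:c1, :]     # crop the face feature matrix F_H x F_W x 128
    liveness_vec = face_features.max(axis=(0, 1)) # global maximization -> 128-d discrimination vector
    p_f, p_t = fc2(liveness_vec)                  # probabilities of non-living and living
    return "living" if p_t > p_f else "non-living"
```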
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A lightweight visible light living body detection method, characterized by comprising the following steps:
s1, acquiring a visible light original image to be processed;
s2, recognizing a human face target from a visible light original image to be processed by utilizing a pre-constructed visible light living body detection model, and determining that the human face target is a living body or a non-living body;
the construction process of the visible light living body detection model is as follows:
SS1, constructing a deep neural network, wherein the deep neural network is used for acquiring a historical visible light original image, extracting target features in the historical visible light original image and generating a target feature matrix, the target features comprise green light intensity features, and the green light intensity features are intensity distribution features of green light when blood flows through skin;
SS2, constructing a first fully-connected network, wherein the first fully-connected network is used for receiving the target feature matrix and identifying the position and the size of a human face target in the target feature matrix;
SS3, extracting a face feature matrix in a target feature matrix based on the position and the size of the face target, and performing global maximization processing on the face feature matrix to obtain living body distinguishing feature vectors after the global maximization processing;
SS4, constructing a second fully-connected network, wherein the second fully-connected network is used for receiving the living body distinguishing feature vector and determining that the current face target is a living body or a non-living body according to the living body distinguishing feature vector;
SS5, training the deep neural network, the first fully-connected network and the second fully-connected network by using a training sample, introducing an auxiliary supervision network when the deep neural network is trained, taking a loss function as the training constraint, obtaining network parameters of the deep neural network, the first fully-connected network, the second fully-connected network and the auxiliary supervision network after the training is finished, and then generating a visible light living body detection model based on the network parameters of the deep neural network, the first fully-connected network and the second fully-connected network;
the auxiliary supervision network is used for auxiliary supervision when the deep neural network extracts the green light intensity characteristics;
the auxiliary supervision network comprises a supervision learning network and a first spectral feature extraction network;
the first spectral feature extraction network is used for intercepting a face image from a historical visible light original image according to the position and the size of a face target, extracting green light intensity components of the face image, and then generating green light component spatial spectral features of the face image based on Fourier transform;
the supervised learning network is used for receiving the target feature matrix, extracting a single face feature matrix in the target feature matrix based on the position and the size of a face target, then performing learning supervision, and enabling the green light intensity feature in the single face feature matrix to approach the green light component spatial spectrum feature after the learning supervision;
the visible light original image is an RGB three-channel image;
extracting the green light intensity component of the face image specifically comprises the following sub-step: based on a first formula, the RGB three-channel values of each pixel of the face image I_f are converted into a single green-component value, wherein I_f(m, n, 0) denotes the value of the 0th channel of the pixel in row m and column n of the face image I_f, I_f(m, n, 1) denotes the value of the 1st channel of that pixel, I_f(m, n, 2) denotes the value of the 2nd channel of that pixel, and the first formula maps these three values to the transformed value of the pixel in row m and column n (the explicit expression of the first formula is given only as a formula image in the original publication);
the Fourier transform-based generation of the green light component spatial spectrum feature of the face image specifically comprises the following sub-steps:
SSS1, performing Fourier transform on the face image with the green light intensity component extracted;
and SSS2, taking the modulus of the Fourier transform, performing normalization, and then obtaining the green light component spatial spectrum feature of the face image.
2. The lightweight visible light living body detection method according to claim 1, wherein in the SS2, the position of the human face target in the target feature matrix is identified based on a non-maximum suppression algorithm.
3. The lightweight visible light living body detection method according to claim 1, wherein the SS3 specifically includes the following sub-steps:
SS31, extracting the face feature matrix F_H × F_W × N from the target feature matrix based on the position and size of the face target;
SS32, taking the maximum value of each of the N F_H × F_W × 1 matrices, and generating the living body discrimination feature vector from the N maximum values obtained.
4. The lightweight visible light living body detection method according to claim 1, wherein in the SS4, determining that the current face target is a living body or a non-living body according to the living body discrimination feature vector specifically comprises the following sub-steps:
SS41, a second full-connection network classifies the obtained living body distinguishing feature vector and outputs the probability that the current face target is a living body and the probability that the current face target is a non-living body;
SS42, if the probability that the current face target is a living body is larger than the probability that the current face target is a non-living body, determining that the current face target is a living body; and if the probability that the current face target is a living body is smaller than the probability that the current face target is a non-living body, determining that the current face target is the non-living body.
5. The lightweight visible light living body detection method according to claim 1, wherein
in the SS1, before a target feature matrix in a historical visible light original image is extracted by a deep neural network, scaling the received historical visible light original image, wherein the size of the scaled visible light original image is 256 × 256 × 3, and the size of the target feature matrix is 8 × 8 × 128;
in the SS2, when the first fully-connected network identifies the position of the face target in the target feature matrix, the preset prior frame sizes include 192 × 192, 128 × 128, and 32 × 32.
6. The lightweight visible light living body detection method according to claim 5, wherein
the loss function is L = λ1·L_face + λ2·L_loc + λ3·L_live + λ4·L_green, wherein λ1 is a preset first weight coefficient, λ2 is a preset second weight coefficient, λ3 is a preset third weight coefficient, λ4 is a preset fourth weight coefficient, L_face represents the classification loss when discriminating face from non-face, L_loc represents the regression loss of the face target position, L_live represents the classification loss when discriminating living body from non-living body, and L_green represents the green light intensity feature learning loss (the explicit expressions of the four terms are given only as formula images in the original publication);
wherein L_face is summed over the n training samples; the indicator 1_ij^obj gives the true value of whether the jth prior box in the ith grid is responsible for detecting the face, 1_ij^obj = 1 meaning the jth prior box in the ith grid is responsible for detecting the face and 1_ij^obj = 0 meaning it is not; the indicator 1_ij gives the true value of whether the ith grid contains the center point of the jth prior box, 1_ij = 1 meaning it does and 1_ij = 0 meaning it does not; the number of grids is 64, and each grid corresponds one-to-one to a position of the feature maps in the target feature matrix; Ĉ_ij is the output value indicating whether the ith grid contains the jth prior box center point, Ĉ_ij = 1 meaning it does and Ĉ_ij = 0 meaning it does not;
L_loc is computed from (x̂_ij, ŷ_ij), the estimated center point coordinates of the jth prior box in the ith grid, (x_ij, y_ij), the true center point coordinates, ŵ_ij and ĥ_ij, the estimated width and height of the jth prior box in the ith grid, and w_ij and h_ij, the true width and height;
L_live is computed from y, the true label of the training sample, P_f, the probability that the current face target is a non-living body, and P_t, the probability that the current face target is a living body;
L_green is the distance between Ŝ, the output value of the auxiliary supervision network after assisting the deep neural network in learning the green light intensity feature, and S, the true value of the green light component spatial spectrum feature.
7. A lightweight visible light living body detection device, comprising a memory and a processor, wherein the memory is used for storing the lightweight visible light living body detection method according to any one of claims 1 to 6, and the processor is used for calling the lightweight visible light living body detection method stored in the memory to perform living body detection.
Application CN202211503095.XA (priority date 2022-11-29, filing date 2022-11-29): Lightweight visible light living body detection method and device. Status: Active. Granted publication: CN115601818B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211503095.XA CN115601818B (en) 2022-11-29 2022-11-29 Lightweight visible light living body detection method and device

Publications (2)

Publication Number / Publication Date
CN115601818A (en): 2023-01-13
CN115601818B (en): 2023-04-07

Family

ID=84852200

Country Status (1)

Country Link
CN (1) CN115601818B (en)

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN117576488B (en) * 2024-01-17 2024-04-05 海豚乐智科技(成都)有限责任公司 Infrared dim target detection method based on target image reconstruction


Also Published As

Publication number Publication date
CN115601818A (en) 2023-01-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant