CN115601818B - Lightweight visible light living body detection method and device - Google Patents
- Publication number
- CN115601818B CN115601818B CN202211503095.XA CN202211503095A CN115601818B CN 115601818 B CN115601818 B CN 115601818B CN 202211503095 A CN202211503095 A CN 202211503095A CN 115601818 B CN115601818 B CN 115601818B
- Authority
- CN
- China
- Prior art keywords
- living body
- target
- face
- visible light
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V40/168 — Human faces: feature extraction; face representation
- G06N3/04 — Neural networks: architecture, e.g. interconnection topology
- G06N3/08 — Neural networks: learning methods
- G06V10/774 — Pattern recognition or machine learning: generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V40/172 — Human faces: classification, e.g. identification
- G06V40/40 — Spoof detection, e.g. liveness detection
- Y02A40/70 — Adaptation technologies in livestock or poultry production
Abstract
The invention discloses a lightweight visible light living body detection method and device. The method uses a visible light living body detection model to perform living body discrimination on a human face in a visible light original image. The model comprises a deep neural network, a first fully-connected network and a second fully-connected network; during training of the model, an auxiliary monitoring network is introduced to assist the learning of green light intensity features. Based on the principle that blood flowing through the skin of a living body produces a characteristic intensity distribution in the green light band, the auxiliary monitoring network helps the deep neural network accurately extract living body features of the human face. This solves the problem that prior-art silent living body detection methods based on visible light images cannot resist 3D non-living-body attacks, improves the accuracy of living body detection, and achieves a lightweight implementation.
Description
Technical Field
The invention belongs to the technical field of image processing and target identification, and particularly relates to a light-weight visible light living body detection method and device.
Background
Liveness detection technology discriminates whether the face appearing in front of a machine is real or fake; a face presented via another medium can be defined as a false face, including printed photos, screen images, silicone masks, stereoscopic 3D figures and the like. Current mainstream liveness detection schemes include cooperative liveness detection and non-cooperative (silent) liveness detection. Cooperative liveness detection requires the user to complete specified actions according to prompts before liveness verification is performed, and can be called dynamic liveness detection. Silent liveness detection is the opposite: it judges whether the subject is a real living body without requiring cooperative actions such as blinking or mouth opening. Silent liveness detection is therefore technically more difficult to realize and faces higher accuracy requirements in practical applications; at the same time, because it performs liveness verification without the user's active participation, it provides a better user experience.
According to the imaging source, silent liveness detection is generally divided into three technical routes: infrared images, 3D structured light, and visible light images. Infrared imaging filters out light of specific wavebands and naturally resists false-face attacks based on screen imaging. 3D structured light introduces depth information and can easily distinguish false-face attacks from 2D media such as paper photos and screen images. Visible light images are discriminated mainly through detail information such as moiré patterns produced by screen re-capture and reflections from paper photos. From this analysis, compared with the other two routes, liveness detection based on visible light images can only discriminate using the information in the image itself, and therefore faces a greater challenge in real open scenes.
However, silent liveness detection based on visible light images has the advantages of fast recognition speed, simple operation and contactless use. In addition, compared with infrared imaging devices and 3D structured light imaging devices, visible light imaging devices are cheaper and more highly integrated, and mainstream face recognition systems already use visible light imaging, so research on liveness detection based on visible light imaging has important value. Meanwhile, with the popularization of technologies such as 5G and AI, face recognition has been widely applied to all kinds of interconnected devices, including edge devices. Applying face recognition on edge devices must take their limited computing power and power consumption into account, so making the algorithm lightweight is another problem that research on visible-light liveness detection must consider, in order to suit edge interconnected devices with extremely limited computing power.
Disclosure of Invention
The invention aims to overcome one or more defects in the prior art and provide a light-weight visible light living body detection method and device.
The purpose of the invention is realized by the following technical scheme:
first aspect
The invention provides a light-weight visible light living body detection method, which comprises the following steps:
s1, acquiring a visible light original image to be processed;
s2, recognizing a human face target from a visible light original image to be processed by utilizing a pre-constructed visible light living body detection model, and determining that the human face target is a living body or a non-living body;
the construction process of the visible light living body detection model is as follows:
SS1, constructing a deep neural network, wherein the deep neural network is used for acquiring a historical visible light original image, extracting a target feature in the historical visible light original image and generating a target feature matrix, the target feature comprises a green light intensity feature, and the green light intensity feature is an intensity distribution feature of green light when blood flows through skin;
SS2, constructing a first fully-connected network, wherein the first fully-connected network is used for receiving the target feature matrix and identifying the position and the size of a human face target in the target feature matrix;
SS3, extracting a face feature matrix in a target feature matrix based on the position and the size of the face target, and performing global maximization processing on the face feature matrix to obtain living body distinguishing feature vectors after the global maximization processing;
SS4, constructing a second fully-connected network, wherein the second fully-connected network is used for receiving the living body distinguishing feature vector and determining that the current face target is a living body or a non-living body according to the living body distinguishing feature vector;
SS5, training the deep neural network, the first fully-connected network and the second fully-connected network by using training samples, introducing an auxiliary monitoring network when the deep neural network is trained, taking a loss function as training constraint, obtaining network parameters of the deep neural network, the first fully-connected network, the second fully-connected network and the auxiliary monitoring network after training is finished, and then generating a visible light living body detection model based on the network parameters of the deep neural network, the first fully-connected network and the second fully-connected network;
the auxiliary supervision network is used for auxiliary supervision when the deep neural network extracts the green light intensity characteristics.
Preferably, in the SS2, the position of the human face target in the target feature matrix is identified based on a non-maximum suppression algorithm.
Preferably, the SS3 specifically includes the following sub-steps:
SS31, extracting the face feature matrix F_H × F_W × N from the target feature matrix based on the position and size of the face target;
SS32, taking the maximum value of each of the N F_H × F_W × 1 matrices, and generating the living body discrimination feature vector from the N maximum values obtained.
Preferably, in the SS4, determining that the current face target is a living body or a non-living body according to the living body discrimination feature vector specifically includes the following sub-steps:
SS41, a second full-connection network classifies the obtained living body distinguishing feature vector and outputs the probability that the current face target is a living body and the probability that the current face target is a non-living body;
SS42, if the probability that the current face target is a living body is larger than the probability that the current face target is a non-living body, determining that the current face target is a living body; and if the probability that the current face target is a living body is smaller than the probability that the current face target is a non-living body, determining that the current face target is the non-living body.
Preferably, the auxiliary supervised network comprises a supervised learning network and a first spectral feature extraction network;
the first spectral feature extraction network is used for intercepting a face image from a historical visible light original image according to the position and the size of a face target, extracting green light intensity components of the face image, and then generating green light component spatial spectral features of the face image based on Fourier transform;
and the supervised learning network is used for receiving the target feature matrix, extracting a single face feature matrix in the target feature matrix based on the position and the size of a face target, then performing learning supervision, and enabling the green light intensity feature in the single face feature matrix to approach the green light component spatial spectrum feature after the learning supervision.
Preferably, the visible light original image is an RGB three-channel image; extracting the green light intensity component of the face image specifically comprises the following sub-step: based on a first formula, converting the RGB three-channel values of each pixel point of the face image I_f into a single green component value, the first formula being as follows:
where I_f^0(m,n) denotes the value of the 0th channel of the pixel at row m, column n of the face image I_f; I_f^1(m,n) denotes the value of the 1st channel of that pixel; I_f^2(m,n) denotes the value of the 2nd channel of that pixel; and I_H(m,n) denotes the transformed value of the pixel at row m, column n.
Preferably, the fourier transform-based generation of the spatial spectral feature of the green light component of the face image specifically includes the following sub-steps:
SSS1, performing Fourier transform on the face image after the green light intensity component is extracted;
and SSS2, taking the modulus of the Fourier transform and performing normalization calculation to obtain the green light component spatial spectrum feature of the face image.
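Steps SSS1-SSS2 can be sketched in a few lines of numpy: Fourier-transform the green-intensity image, take the magnitude (modulus), and normalize. The max-based normalization below is one plausible scheme, since the patent text does not spell out the exact normalization.

```python
import numpy as np

def green_spatial_spectrum(green_img: np.ndarray) -> np.ndarray:
    """Fourier transform of a 2-D green-intensity image followed by
    modulus and normalization, as in steps SSS1-SSS2."""
    spectrum = np.fft.fftshift(np.fft.fft2(green_img))  # center low frequencies
    magnitude = np.abs(spectrum)                        # take the modulus
    # normalize to [0, 1] (an assumed normalization scheme)
    return magnitude / (magnitude.max() + 1e-8)

demo = green_spatial_spectrum(np.random.rand(256, 256))
print(demo.shape)  # → (256, 256)
```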
Preferably, in the SS1, before extracting a target feature matrix in the historical visible light original image, the deep neural network scales the received historical visible light original image, where the scaled visible light original image is 256 × 256 × 3, and the target feature matrix is 8 × 8 × 128; in the SS2, when the first fully-connected network identifies the position of the face target in the target feature matrix, the preset prior frame sizes include 192 × 192, 128 × 128, and 32 × 32.
Preferably, the loss function is L = λ1·L1 + λ2·L2 + λ3·L3 + λ4·L4, where λ1 is a preset first weight coefficient, λ2 a preset second weight coefficient, λ3 a preset third weight coefficient and λ4 a preset fourth weight coefficient; L1 denotes the classification loss when distinguishing faces from non-faces, L2 the regression loss for the position of the face target, L3 the classification loss when distinguishing living bodies from non-living bodies, and L4 the green light intensity feature learning loss;
wherein, for L1: N is the number of training samples; 1_ij indicates the true value of whether the j-th prior box in the i-th grid is responsible for detecting a face (1_ij = 1 if it is responsible, 1_ij = 0 if it is not); c_ij indicates the true value of whether the i-th grid contains the center point of the j-th prior box (c_ij = 1 if it does, c_ij = 0 otherwise); the number of grids is 64, each grid corresponding one-to-one to a feature map in the target feature matrix; ĉ_ij is the network output value for whether the i-th grid contains the center point of the j-th prior box (ĉ_ij = 1 if it does, ĉ_ij = 0 otherwise);
for L2: (x̂_ij, ŷ_ij) is the estimated center point coordinate of the j-th prior box in the i-th grid and (x_ij, y_ij) its true value; ŵ_ij and ĥ_ij are the estimated width and height of the j-th prior box in the i-th grid, and w_ij and h_ij the corresponding true values;
for L3: y is the true label of the training sample, p0 the probability that the current face target is not a living body, and p1 the probability that it is a living body;
for L4 = D(Ĝ, G): Ĝ is the output value after the auxiliary monitoring network assists the deep neural network in learning the green light intensity feature, G is the true value of the green light component spatial spectrum feature, and D(Ĝ, G) is the distance between them.
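The four-term weighted loss can be illustrated with a small numpy sketch. The weight values, the cross-entropy form for the liveness term, and the mean-squared form of the distance D are stand-in assumptions consistent with the variable definitions above, not the patent's exact formulas.

```python
import numpy as np

def total_loss(l_face, l_loc, l_live, l_green, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four training losses; the weights here are
    placeholders, not the patent's actual lambda coefficients."""
    return w[0]*l_face + w[1]*l_loc + w[2]*l_live + w[3]*l_green

def liveness_bce(y_true, p_live, eps=1e-12):
    """Binary cross-entropy between the liveness label y and the predicted
    living-body probability p1 (a standard stand-in for L3)."""
    return -(y_true*np.log(p_live+eps) + (1-y_true)*np.log(1-p_live+eps))

def green_feature_loss(pred_spec, true_spec):
    """Mean squared distance between predicted and true green-component
    spatial spectrum features (one plausible choice of the distance D)."""
    return float(np.mean((pred_spec - true_spec)**2))

l3 = liveness_bce(1.0, 0.9)
l4 = green_feature_loss(np.zeros(4), np.zeros(4))
print(total_loss(0.2, 0.1, l3, l4))
```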
The first aspect of the invention brings the following beneficial effects:
(1) Based on the principle that blood flowing through the skin of a living body produces a characteristic intensity distribution in the green light band, an auxiliary monitoring network is set up when the deep neural network is trained; it assists the deep neural network in accurately extracting the living body features (green light intensity features) of the human face. Thus, in addition to discriminating through detail information such as moiré patterns produced by screen re-capture and reflections from paper photos, the generated visible light living body detection model also performs liveness discrimination based on living body features of the face. This solves the problem that prior-art silent liveness detection methods based on visible light images cannot resist 3D non-living-body attacks, and improves the accuracy of living body detection;
(2) The backbone of the visible light living body detection model contains only one deep neural network, which completes both the face detection and liveness discrimination tasks. This reduces the amount of computation in the face liveness detection process and achieves a lightweight design, lowering the computing-resource requirements of the method realized by the embodiment of the invention as well as the latency of the detection process, so that detection speed is improved on top of the improved accuracy;
(3) Because the face liveness detection is lightweight, the visible light living body detection method realized by the embodiment of the invention is applicable to edge interconnected devices, while reducing the cost, volume, power consumption and latency of such devices when performing face liveness recognition.
Second aspect of the invention
A second aspect of the present invention provides a lightweight visible light living body detection device, comprising a memory and a processor, the memory storing a program implementing the lightweight visible light living body detection method according to the first aspect of the present invention, and the processor calling the program stored in the memory to perform living body detection.
The second aspect of the present invention brings about the same advantageous effects as the first aspect, and will not be described in detail herein.
Drawings
FIG. 1 is a flow chart of the lightweight visible light living body detection method;
FIG. 2 is a flow chart of a construction of a visible light living body detection model;
FIG. 3 is a schematic diagram of a visible light living body detection model;
fig. 4 is a schematic diagram of a deep neural network.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of protection of the present invention.
Example one
Referring to fig. 1 to 4, the embodiment provides a light-weight visible light living body detection method, including the following steps:
s1, acquiring a visible light original image to be processed. In this embodiment, the visible light original image is obtained from the visible light imaging device and is an RGB three-channel image.
And S2, recognizing a human face target from the original visible light image to be processed by utilizing a pre-constructed visible light living body detection model, and determining that the human face target is a living body or a non-living body.
The construction process of the visible light living body detection model is as follows:
and SS1, constructing a deep neural network, wherein the deep neural network is used for acquiring a historical visible light original image, extracting target characteristics in the historical visible light original image and generating a target characteristic matrix, the target characteristics comprise green light intensity characteristics, and the green light intensity characteristics are intensity distribution characteristics of green light when blood flows through skin.
Before the visible light original image is input into the deep neural network, it is scaled; the scaled visible light original image has a size of 256 × 256 × 3 and serves as the input image of the deep neural network. In this embodiment, the deep neural network Net(x) comprises eight feature extraction modules, which extract the target features and generate a target feature matrix of size 8 × 8 × 128, where a 1 × 1 × 128 feature vector represents the target features in a 16 × 16 image block of the input image. As shown in fig. 4, the eight feature extraction modules comprise a first 3 × 3 channel-separable convolution, a first 1 × 1 convolution, a first activation layer, a second 3 × 3 channel-separable convolution, a second 1 × 1 convolution, a second activation layer, a max-pooling layer, a channel expansion layer, and an addition layer.
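The core of each feature extraction module is a channel-separable (depthwise) convolution followed by a 1 × 1 (pointwise) convolution and an activation. A minimal numpy sketch of that sequence is given below; the stride-1 / zero-padding configuration and tensor shapes are illustrative assumptions, not the patent's exact layer hyperparameters, and the pooling, channel-expansion, and addition layers are omitted.

```python
import numpy as np

def depthwise_separable_block(x, dw_k, pw_k):
    """One stage of the feature extractor: a 3x3 channel-separable
    (depthwise) convolution, a 1x1 (pointwise) convolution, then ReLU.
    x: (H, W, C); dw_k: (3, 3, C); pw_k: (C, C_out). Stride 1, zero pad."""
    h, w, c = x.shape
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    dw = np.zeros_like(x)
    for i in range(h):                      # depthwise: each channel has its
        for j in range(w):                  # own 3x3 kernel, no channel mixing
            patch = pad[i:i+3, j:j+3, :]
            dw[i, j, :] = np.sum(patch * dw_k, axis=(0, 1))
    pw = dw @ pw_k                          # pointwise 1x1 conv mixes channels
    return np.maximum(pw, 0.0)              # ReLU activation

out = depthwise_separable_block(np.random.rand(8, 8, 4),
                                np.random.rand(3, 3, 4),
                                np.random.rand(4, 16))
print(out.shape)  # → (8, 8, 16)
```

Splitting the 3 × 3 convolution per channel and mixing channels only in the 1 × 1 step is what makes the backbone cheap enough for edge devices.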
And SS2, constructing a first full-connection network, wherein the first full-connection network is used for receiving the target feature matrix and identifying the position and the size of the human face target in the target feature matrix.
Specifically, the first fully-connected network FC1 (x) receives the target feature matrix, regresses the category, position, and size of each target, and outputs the category, position, and size of each target, where the output of the first fully-connected network FC1 (x) is denoted by Dd (i), and i =0 to 14. Since the aspect ratio of the human face in the input image is close to 1:1, when the position of the human face object in the object feature matrix is identified through the first fully connected network FC1 (x), the prior frame sizes are set to 192 × 192, 128 × 128, and 32 × 32, and the position of the human face object in the object feature matrix is identified based on the non-maximum suppression algorithm.
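Non-maximum suppression, used above to pick the face position among overlapping prior boxes, can be sketched as follows. The IoU threshold of 0.5 is illustrative; the patent does not state a value.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Non-maximum suppression over candidate face boxes.
    boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]        # highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection-over-union of the top box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2]-boxes[i, 0]) * (boxes[i, 3]-boxes[i, 1])
        area_r = (boxes[rest, 2]-boxes[rest, 0]) * (boxes[rest, 3]-boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-8)
        order = rest[iou <= iou_thresh]     # drop boxes overlapping the winner
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # → [0, 2]
```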
And SS3, extracting a face feature matrix in the target feature matrix based on the position and size of the face target, performing global maximization processing on the face feature matrix, and obtaining a living body distinguishing feature vector after the global maximization processing.
Optionally, SS3 specifically includes the following sub-steps:
SS31, extracting the face feature matrix F_H × F_W × N from the target feature matrix based on the position and size of the face target;
SS32, taking the maximum value of each of the N F_H × F_W × 1 matrices, and generating the living body discrimination feature vector from the N maximum values obtained; wherein the value of N is 128.
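Steps SS31-SS32 amount to a global max pooling over the spatial dimensions of the cropped face feature matrix; one line of numpy expresses it:

```python
import numpy as np

def global_max_pool(face_feat: np.ndarray) -> np.ndarray:
    """Reduce an F_H x F_W x N face feature matrix to an N-dimensional
    living body discrimination vector by taking the maximum of each
    F_H x F_W x 1 slice (steps SS31-SS32)."""
    return face_feat.max(axis=(0, 1))

vec = global_max_pool(np.random.rand(8, 8, 128))
print(vec.shape)  # → (128,)
```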
And SS4, constructing a second fully-connected network, wherein the second fully-connected network is used for receiving the living body distinguishing feature vector and determining that the current face target is a living body or a non-living body according to the living body distinguishing feature vector.
Optionally, in SS4, determining that the current face target is a living body or a non-living body according to the living body discrimination feature vector includes the following specific sub-steps:
the SS41 and the second fully-connected network FC2 (f) classify the acquired living body discrimination feature vectors, and output the probability that the current face target is a living body and the probability that the current face target is a non-living body. In this embodiment, the second fully-connected network FC2 (f) preferably outputs the probability that the current face target is a living body and the probability that the current face target is a non-living body through a softmax function, where the probability that the current face target is a living body is expressed asThe probability that the current face target is not a living body is expressed as ≥>。
SS42, if the probability that the current face target is a living body is larger than the probability that the current face target is a non-living body, determining that the current face target is a living body; and if the probability that the current face target is a living body is smaller than the probability that the current face target is a non-living body, determining that the current face target is the non-living body.
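The decision rule of SS41-SS42 can be sketched as a two-way softmax followed by a comparison. The ordering of the two classes in the logit vector is an assumption for illustration.

```python
import numpy as np

def liveness_decision(logits: np.ndarray):
    """Two-way softmax over the second fully-connected network's outputs and
    the comparison rule of SS41-SS42. logits[0] is assumed to be the
    non-living score and logits[1] the living score."""
    e = np.exp(logits - logits.max())   # numerically stable softmax
    p = e / e.sum()
    return ("living" if p[1] > p[0] else "non-living"), p

label, probs = liveness_decision(np.array([0.2, 1.5]))
print(label)  # → living
```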
SS5, training the deep neural network, the first fully-connected network and the second fully-connected network by using a training sample, introducing an auxiliary monitoring network when the deep neural network is trained, taking a loss function as the training constraint, obtaining network parameters of the deep neural network, the first fully-connected network, the second fully-connected network and the auxiliary monitoring network after the training is finished, and then generating a visible light living body detection model based on the obtained network parameters of the deep neural network, the first fully-connected network and the second fully-connected network, wherein the visible light living body detection model comprises the deep neural network, the first fully-connected network and the second fully-connected network; the auxiliary monitoring network is used for auxiliary monitoring when the deep neural network extracts the green light intensity characteristics.
Optionally, the auxiliary supervision network includes a supervised learning network and a first spectral feature extraction network.
The first spectral feature extraction network is used for intercepting a face image from a historical visible light original image according to the position and the size of the face target, extracting the green light intensity component of the face image, and then generating the green light component spatial spectrum feature of the face image based on Fourier transform. In this embodiment, the intercepted face image is first scaled and then the green light intensity component is extracted; the scaled face image has a size of 256 × 256 × 3 and is denoted face image I_f.
The supervised learning network is used for receiving the target feature matrix, extracting a single face feature matrix in the target feature matrix based on the position and the size of the face target, and then performing learning supervision, so that after the learning supervision the green light intensity feature in the single face feature matrix approaches the green light component spatial spectrum feature. In this embodiment, before learning supervision is performed, the extracted single face feature matrix is further scaled to a size of 8 × 8 × 128; the scaled matrix is then input into a supervised convolution network C(V), which performs a 1 × 1 convolution operation, and the size of the single face feature matrix after the 1 × 1 convolution operation is 8 × 8 × 1.
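The 1 × 1 convolution of the supervised convolution network C(V) can be sketched as a per-pixel dot product across the 128 channels (a NumPy illustration with a hypothetical kernel, since the real C(V) kernel is learned during training):

```python
import numpy as np

def conv1x1(feat, weights, bias=0.0):
    # feat: (H, W, C) single-face feature matrix; weights: (C,) kernel of a
    # 1x1 convolution with a single output channel. Each spatial location is
    # reduced to one value by a dot product over the channel axis.
    out = np.tensordot(feat, weights, axes=([2], [0])) + bias
    return out[..., np.newaxis]  # shape (H, W, 1)

feat = np.random.rand(8, 8, 128)   # scaled single-face feature matrix
w = np.random.rand(128) / 128.0    # hypothetical learned kernel values
mapped = conv1x1(feat, w)          # 8 x 8 x 1, matching the text above
```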
Optionally, extracting the green light intensity component from the face image I_f yields a corresponding face image I_H, where I_H has a size of 256 × 256 × 1. The extraction of the green light intensity component specifically comprises:
converting, based on a first formula, the RGB three-channel values of each pixel point of the face image I_f into single green component values, wherein the first formula is as follows:
where I_f^0(m,n) represents the value of the 0th channel of the pixel point in row m, column n of the face image I_f; I_f^1(m,n) represents the value of the 1st channel of the pixel point in row m, column n; I_f^2(m,n) represents the value of the 2nd channel of the pixel point in row m, column n; and I_H(m,n) represents the transformed value of the pixel point in row m, column n.
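Since the first formula converts the three RGB channel values into a single green component, a minimal sketch can be given (the exact formula is not reproduced above, so directly keeping the green channel is only an assumed stand-in, not the patent's formula):

```python
import numpy as np

def extract_green_component(face_rgb):
    # face_rgb: (256, 256, 3) RGB face image I_f.
    # Stand-in for the patent's first formula: keep channel 1
    # (green in RGB ordering) as the single-channel image I_H.
    return face_rgb[:, :, 1:2].astype(np.float32)  # shape (256, 256, 1)

i_f = np.zeros((256, 256, 3), dtype=np.uint8)
i_f[:, :, 1] = 200                  # hypothetical pure-green test image
i_h = extract_green_component(i_f)  # 256 x 256 x 1, matching I_H above
```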
Optionally, generating the green light component spatial spectrum feature of the face image I_H based on Fourier transform specifically comprises the following sub-steps:
SSS1, performing Fourier transform on the face image I_H obtained after extracting the green light intensity component;
SSS2, taking the modulus of the Fourier transform, performing normalization calculation, scaling the face image obtained after the normalization calculation, and obtaining the green light component spatial spectrum feature of the face image after scaling, wherein the size of the green light component spatial spectrum feature is 8 × 8 × 1.
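Sub-steps SSS1 and SSS2 can be sketched as follows (a NumPy illustration; the particular normalization and the block-average scaling are assumed choices, as the text does not fix them):

```python
import numpy as np

def green_spatial_spectrum(i_h, out_size=8):
    # i_h: (256, 256, 1) green-component image I_H.
    # SSS1: 2-D Fourier transform; SSS2: modulus, normalization,
    # then scaling down to out_size x out_size x 1.
    spec = np.abs(np.fft.fft2(i_h[:, :, 0]))     # modulus of the transform
    spec = spec / (spec.max() + 1e-12)           # normalization (one simple choice)
    b = spec.shape[0] // out_size                # block size for scaling
    small = spec.reshape(out_size, b, out_size, b).mean(axis=(1, 3))
    return small[..., np.newaxis]                # shape (8, 8, 1)

feature = green_spatial_spectrum(np.random.rand(256, 256, 1))
```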
Optionally, a loss function L = λ1·L_face + λ2·L_loc + λ3·L_live + λ4·L_green is used, where λ1 is a preset first weight coefficient, λ2 is a preset second weight coefficient, λ3 is a preset third weight coefficient, λ4 is a preset fourth weight coefficient; L_face represents the classification loss when discriminating between a face and a non-face; L_loc represents the face target position regression loss; L_live represents the classification loss when discriminating between a living body and a non-living body; and L_green represents the green light intensity feature learning loss.
Wherein, for L_face, N is the number of training samples; a first ground-truth indicator denotes whether the jth prior box in the ith grid is responsible for detecting a face, taking the value 1 if it is responsible and 0 if it is not; a second ground-truth indicator denotes whether the ith grid contains the center point of the jth prior box, taking the value 1 if it does and 0 if it does not; the number of grids is 64, and each grid corresponds one-to-one to a feature map in the target feature matrix; a corresponding output value indicates whether the ith grid contains the center point of the jth prior box, taking the value 1 if it does and 0 if it does not.
Wherein, for L_loc, the estimated center point coordinates, estimated width and estimated height of the jth prior box in the ith grid are compared with the ground-truth center point coordinates, ground-truth width and ground-truth height of the jth prior box in the ith grid.
Wherein, for L_green, the output value produced by the auxiliary supervision network while assisting the deep neural network in learning the green light intensity feature is compared with the ground-truth value of the green light component spatial spectrum feature, and L_green represents the distance between the two.
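The loss terms above can be combined as sketched below (the squared L2 distance for L_green and the weight values are assumptions for illustration; the text only specifies a weighted combination and an unspecified distance):

```python
import numpy as np

def green_feature_loss(s_hat, s_true):
    # L_green: squared L2 distance between the auxiliary network's output
    # and the true green-component spatial spectrum feature (8 x 8 x 1).
    # The squared-L2 choice is an assumption; the text only says "distance".
    return float(np.sum((s_hat - s_true) ** 2))

def total_loss(l_face, l_loc, l_live, l_green, lambdas=(1.0, 1.0, 1.0, 1.0)):
    # Weighted sum of the four loss terms with preset weight coefficients.
    return sum(w * l for w, l in zip(lambdas, (l_face, l_loc, l_live, l_green)))

lg = green_feature_loss(np.ones((8, 8, 1)), np.zeros((8, 8, 1)))
lt = total_loss(0.5, 0.25, 0.5, lg, lambdas=(1.0, 1.0, 1.0, 0.01))
```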
Embodiment 2
This embodiment provides a lightweight visible light living body detection device, comprising a memory and a processor, wherein the memory is used for storing the lightweight visible light living body detection method of the first embodiment, and the processor is used for calling the lightweight visible light living body detection method stored in the memory to perform living body detection.
The lightweight visible light living body detection method quickly and accurately judges whether a face target in a visible light original image is a living body, based on the following principles:
First, the visible light living body detection model construction stage:
After the first fully-connected network FC1(x), the second fully-connected network FC2(f) and the deep neural network Net(x) in the backbone are built, an auxiliary supervision network is introduced when the deep neural network Net(x), the first fully-connected network FC1(x) and the second fully-connected network FC2(f) are trained. With the assistance of the auxiliary supervision network, the deep neural network accurately learns the green light intensity feature of the face. When classifying whether a face is a living body, the second fully-connected network FC2(f) performs the classification based on the green light intensity feature of the face in the target feature matrix; the output probability that the current face is a living body is then compared with the probability that it is a non-living body, yielding the judgment of whether the current face is a living body.
The green light intensity feature of a face is a living body feature of the face, and it serves as a supplement to the basic features, such as moiré and reflection, on which traditional visible light living body detection methods are based.
Second, loading the visible light living body detection model on the interconnected device:

An online visible light living body detection model is arranged in the interconnected device, and comprises the deep neural network Net(x), the first fully-connected network FC1(x) and the second fully-connected network FC2(f) located in the backbone.

The interconnected device performs living body detection, which specifically includes: the deep neural network Net(x) obtains an input image, performs target feature extraction, and generates a target feature matrix; the first fully-connected network FC1(x) performs regression on the category, position and size of each target in the target feature matrix and outputs the category, position and size of each target, thereby obtaining the position and size of the face target; based on the position and size of the face target, a face feature matrix is intercepted from the target feature matrix, and a living body discrimination feature vector is generated after a global maximization operation on the face feature matrix; the living body discrimination feature vector is input into the second fully-connected network FC2(f) to classify whether the current face is a living body, which then outputs the probability P_t that the current face is a living body and the probability P_f that it is a non-living body; if P_t > P_f, the current face is a living body, and if P_t < P_f, the current face is a non-living body.
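The on-device detection flow described above can be sketched end-to-end (the networks are replaced by random stand-ins, and the face position, size, and probabilities are hypothetical, purely to show the data flow):

```python
import numpy as np

# Stand-ins for Net(x), FC1(x) and FC2(f); the real networks are learned,
# so random values are used here only to illustrate shapes and data flow.
def net(image):
    # 256 x 256 x 3 input image -> 8 x 8 x 128 target feature matrix
    return np.random.rand(8, 8, 128)

def crop_face(features, pos, size):
    # Intercept the face region out of the target feature matrix.
    (r, c), (h, w) = pos, size
    return features[r:r + h, c:c + w, :]

def liveness(image, face_pos=(2, 2), face_size=(4, 4)):
    feats = net(image)
    face = crop_face(feats, face_pos, face_size)   # face feature matrix
    vec = face.max(axis=(0, 1))                    # global max -> 128-d vector
    p_t, p_f = 0.7, 0.3                            # hypothetical FC2(f) output
    return ("living" if p_t > p_f else "non-living"), vec

label, vec = liveness(np.zeros((256, 256, 3)))
```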
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A light-weight visible light living body detection method is characterized by comprising the following steps:
s1, acquiring a visible light original image to be processed;
s2, recognizing a human face target from a visible light original image to be processed by utilizing a pre-constructed visible light living body detection model, and determining that the human face target is a living body or a non-living body;
the construction process of the visible light living body detection model is as follows:
SS1, constructing a deep neural network, wherein the deep neural network is used for acquiring a historical visible light original image, extracting target features in the historical visible light original image and generating a target feature matrix, the target features comprise green light intensity features, and the green light intensity features are intensity distribution features of green light when blood flows through skin;
SS2, constructing a first fully-connected network, wherein the first fully-connected network is used for receiving the target feature matrix and identifying the position and the size of a human face target in the target feature matrix;
SS3, extracting a face feature matrix in a target feature matrix based on the position and the size of the face target, and performing global maximization processing on the face feature matrix to obtain living body distinguishing feature vectors after the global maximization processing;
SS4, constructing a second fully-connected network, wherein the second fully-connected network is used for receiving the living body distinguishing feature vector and determining that the current face target is a living body or a non-living body according to the living body distinguishing feature vector;
SS5, training the deep neural network, the first fully-connected network and the second fully-connected network by using a training sample, introducing an auxiliary supervision network when the deep neural network is trained, taking a loss function as the training constraint, obtaining network parameters of the deep neural network, the first fully-connected network, the second fully-connected network and the auxiliary supervision network after the training is finished, and then generating a visible light living body detection model based on the network parameters of the deep neural network, the first fully-connected network and the second fully-connected network;
the auxiliary supervision network is used for auxiliary supervision when the deep neural network extracts the green light intensity characteristics;
the auxiliary supervision network comprises a supervision learning network and a first spectral feature extraction network;
the first spectral feature extraction network is used for intercepting a face image from a historical visible light original image according to the position and the size of a face target, extracting green light intensity components of the face image, and then generating green light component spatial spectral features of the face image based on Fourier transform;
the supervised learning network is used for receiving the target feature matrix, extracting a single face feature matrix in the target feature matrix based on the position and the size of a face target, then performing learning supervision, and enabling the green light intensity feature in the single face feature matrix to approach the green light component spatial spectrum feature after the learning supervision;
the visible light original image is an RGB three-channel image;
the method for extracting the green light intensity component of the face image specifically comprises the following substeps: based on a first formula, the face image I f The RGB three-channel numerical values of each pixel point are converted into single green component numerical values, and the first formula is as follows:
where I_f^0(m,n) represents the value of the 0th channel of the pixel point in row m, column n of the face image I_f; I_f^1(m,n) represents the value of the 1st channel of the pixel point in row m, column n; I_f^2(m,n) represents the value of the 2nd channel of the pixel point in row m, column n; and the transformed value of the pixel point in row m, column n of the face image I_f is obtained accordingly;
the Fourier transform-based generation of the green light component spatial spectrum feature of the face image specifically comprises the following sub-steps:
SSS1, performing Fourier transform on the face image with the green light intensity component extracted;
SSS2, taking the modulus of the Fourier transform, performing normalization calculation, and then obtaining the green light component spatial spectrum feature of the face image.
2. The light weight visible light living body detection method according to claim 1, wherein in the SS2, a position of the human face target in the target feature matrix is identified based on a non-maximum suppression algorithm.
3. The method for detecting a light-weighted visible light living body according to claim 1, wherein the SS3 includes the following steps:
SS31, extracting a face feature matrix F_H × F_W × N from the target feature matrix based on the position and the size of the face target;

SS32, finding the maximum value of each of the N F_H × F_W × 1 matrices respectively, and generating a living body discrimination feature vector from the N maximum values obtained.
4. The method for detecting a light-weighted visible light living body according to claim 1, wherein the SS4 determines whether the current human face target is a living body or a non-living body according to the living body discrimination feature vector, and specifically comprises the following sub-steps:
SS41, a second full-connection network classifies the obtained living body distinguishing feature vector and outputs the probability that the current face target is a living body and the probability that the current face target is a non-living body;
SS42, if the probability that the current face target is a living body is larger than the probability that the current face target is a non-living body, determining that the current face target is a living body; and if the probability that the current face target is a living body is smaller than the probability that the current face target is a non-living body, determining that the current face target is the non-living body.
5. The method for detecting a lightweight visible light living body according to claim 1,
in the SS1, before a target feature matrix in a historical visible light original image is extracted by a deep neural network, scaling the received historical visible light original image, wherein the size of the scaled visible light original image is 256 × 256 × 3, and the size of the target feature matrix is 8 × 8 × 128;
in the SS2, when the first fully-connected network identifies the position of the face target in the target feature matrix, the preset prior frame sizes include 192 × 192, 128 × 128, and 32 × 32.
6. The method for detecting a lightweight visible light living body according to claim 5,
said loss function is L = λ1·L_face + λ2·L_loc + λ3·L_live + λ4·L_green, wherein λ1 is a preset first weight coefficient, λ2 is a preset second weight coefficient, λ3 is a preset third weight coefficient, λ4 is a preset fourth weight coefficient; L_face represents the classification loss when discriminating between a face and a non-face; L_loc represents the face target position regression loss; L_live represents the classification loss when discriminating between a living body and a non-living body; and L_green represents the green light intensity feature learning loss;
wherein, for L_face, N is the number of training samples; a first ground-truth indicator denotes whether the jth prior box in the ith grid is responsible for detecting a face, taking the value 1 if it is responsible and 0 if it is not; a second ground-truth indicator denotes whether the ith grid contains the center point of the jth prior box, taking the value 1 if it does and 0 if it does not; the number of grids is 64, and each grid corresponds one-to-one to a feature map in the target feature matrix; a corresponding output value indicates whether the ith grid contains the center point of the jth prior box, taking the value 1 if it does and 0 if it does not;
wherein, for L_loc, the estimated center point coordinates, estimated width and estimated height of the jth prior box in the ith grid are compared with the ground-truth center point coordinates, ground-truth width and ground-truth height of the jth prior box in the ith grid;
wherein, for L_live, a true label of the training sample is used, P_f represents the probability that the current face target is a non-living body, and P_t represents the probability that the current face target is a living body;
wherein, for L_green, the output value produced by the auxiliary supervision network after assisting the deep neural network in learning the green light intensity feature is compared with the ground-truth value of the green light component spatial spectrum feature, and L_green represents the distance between the two.
7. A lightweight visible-light living-body detection device comprising a memory for storing the lightweight visible-light living-body detection method according to any one of claims 1 to 6, and a processor for calling the lightweight visible-light living-body detection method stored in the memory to perform living-body detection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211503095.XA CN115601818B (en) | 2022-11-29 | 2022-11-29 | Lightweight visible light living body detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115601818A CN115601818A (en) | 2023-01-13 |
CN115601818B (en) | 2023-04-07
Legal Events

Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |