CN115601818A

CN115601818A - Lightweight visible light living body detection method and device

Info

Publication number: CN115601818A
Application number: CN202211503095.XA
Authority: CN
Inventors: 蒙顺开; 瞿锐恒; 李叶雨
Original assignee: Dolphin Lezhi Technology Chengdu Co ltd
Current assignee: Dolphin Lezhi Technology Chengdu Co ltd
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2023-01-13
Anticipated expiration: 2042-11-29
Also published as: CN115601818B

Abstract

The invention discloses a lightweight visible light living body detection method and a device, wherein the lightweight visible light living body detection method utilizes a visible light living body detection model to carry out living body discrimination on a human face in a visible light original image, the visible light living body detection model comprises a deep neural network, a first fully connected network and a second fully connected network, and an auxiliary monitoring network for carrying out auxiliary learning on green light intensity characteristics is introduced during training of the visible light living body detection model. Based on the principle that blood in a living body has certain intensity distribution in the green light direction when flowing through skin, the deep neural network is trained by the aid of the auxiliary monitoring network, the deep neural network is assisted by the auxiliary monitoring network to accurately extract living body characteristics of a human face, the problem that a silent living body detection method based on a visible light image in the prior art cannot resist 3D non-living body attack is solved, the living body detection accuracy is improved, and the light weight is achieved.

Description

Lightweight visible light living body detection method and device

Technical Field

The invention belongs to the technical field of image processing and target identification, and particularly relates to a light-weight visible light living body detection method and device.

Background

The living body detection technology mainly judges whether a human face appearing in front of a machine is real or fake, wherein the human face presented by means of other media can be defined as a false human face, and the false human face comprises a printed photo, a screen image, a silica gel mask, a three-dimensional 3D portrait and the like. Currently, mainstream biopsy schemes include coordinated biopsy and uncoordinated biopsy (silent biopsy), and the like. The cooperative biopsy requires a user to complete a specified action according to a prompt and then perform a biopsy, which may also be referred to as dynamic biopsy. The silent living body detection is opposite to the dynamic living body detection, and whether the living body is a real living body is judged mainly under the condition that a series of actions such as blinking, mouth opening and the like are not matched. Therefore, the silence living body detection is technically more difficult to realize, the requirement on accuracy in practical application is higher, meanwhile, the silence living body detection directly performs living body verification under the condition that a user is not sensitive, and better user experience is achieved.

Silent living body detection is generally divided into three technical routes of infrared images, 3D structural light and visible light images according to the difference of imaging sources: the infrared image filters light rays in a specific wave band, and the false face attack based on screen imaging is naturally resisted; depth information is introduced into the 3D structured light, and false face attacks of 2D media such as paper photos, screen imaging and the like can be easily distinguished; the visible light picture is mainly distinguished through Moire patterns, paper photo reflection and other detailed information which appear in screen shooting. Based on the above analysis, it is found that the living body detection based on the visible light image can only be distinguished by the information of the image itself, and is more challenging in an actual open scene, compared with the other two methods.

However, the silent in-vivo detection based on the visible light image has the advantages of high identification speed, simplicity and convenience in operation, non-contact and the like, and in addition, compared with infrared imaging equipment and 3D structured light imaging equipment, visible light imaging equipment is lower in cost and high in integration level, and the existing face identification system adopts the visible light imaging equipment in the mainstream direction, so that the method has important value in research on in-vivo detection based on visible light imaging. Meanwhile, with the popularization of technologies such as 5G and AI, the world of everything interconnection has come, so that the face recognition technology has been widely applied to various types of interconnection devices, including various interconnection devices at the edge, and the application of the face recognition technology in the interconnection devices at the edge needs to consider the calculation power and power consumption of the interconnection devices at the edge, so that how to realize the lightweight of an algorithm process is also a problem that needs to be considered in the research of a method for performing living body detection based on visible light imaging, and the face recognition technology is suitable for the interconnection devices at the edge with extremely limited calculation power.

Disclosure of Invention

The invention aims to overcome one or more defects in the prior art and provide a light-weight visible light living body detection method and device.

The purpose of the invention is realized by the following technical scheme:

first aspect

The invention provides a light-weight visible light living body detection method, which comprises the following steps:

s1, acquiring a visible light original image to be processed;

s2, recognizing a human face target from a visible light original image to be processed by utilizing a pre-constructed visible light living body detection model, and determining that the human face target is a living body or a non-living body;

the construction process of the visible light living body detection model is as follows:

SS1, constructing a deep neural network, wherein the deep neural network is used for acquiring a historical visible light original image, extracting target features in the historical visible light original image and generating a target feature matrix, the target features comprise green light intensity features, and the green light intensity features are intensity distribution features of green light when blood flows through skin;

SS2, constructing a first fully-connected network, wherein the first fully-connected network is used for receiving the target feature matrix and identifying the position and the size of a human face target in the target feature matrix;

SS3, extracting a face feature matrix in a target feature matrix based on the position and the size of the face target, and performing global maximization processing on the face feature matrix to obtain living body distinguishing feature vectors after the global maximization processing;

SS4, constructing a second fully-connected network, wherein the second fully-connected network is used for receiving the living body distinguishing feature vector and determining that the current face target is a living body or a non-living body according to the living body distinguishing feature vector;

SS5, training the deep neural network, the first fully-connected network and the second fully-connected network by using a training sample, introducing an auxiliary supervision network when the deep neural network is trained, taking a loss function as the training constraint, obtaining network parameters of the deep neural network, the first fully-connected network, the second fully-connected network and the auxiliary supervision network after the training is finished, and then generating a visible light living body detection model based on the network parameters of the deep neural network, the first fully-connected network and the second fully-connected network;

the auxiliary supervision network is used for auxiliary supervision when the deep neural network extracts the green light intensity characteristics.

Preferably, in the SS2, the position of the human face target in the target feature matrix is identified based on a non-maximum suppression algorithm.

Preferably, the SS3 specifically includes the following sub-steps:

SS31, and face feature matrix F in the extracted target feature matrix based on the position and size of the face target _H ×F _W ×N；

SS32, respectively for N F _H ×F _W The x 1 matrix is used to find the maximum value, and generating a living body distinguishing feature vector according to the obtained N maximum values.

Preferably, in the SS4, determining that the current face target is a living body or a non-living body according to the living body discrimination feature vector specifically includes the following sub-steps:

SS41, a second full-connection network classifies the obtained living body distinguishing feature vectors and outputs the probability that the current face target is a living body and the probability that the current face target is a non-living body;

SS42, if the probability that the current face target is a living body is larger than the probability that the current face target is a non-living body, determining that the current face target is a living body; and if the probability that the current face target is a living body is smaller than the probability that the current face target is a non-living body, determining that the current face target is the non-living body.

Preferably, the auxiliary supervised network comprises a supervised learning network and a first spectral feature extraction network;

the first spectral feature extraction network is used for intercepting a face image from a historical visible light original image according to the position and the size of a face target, extracting green light intensity components of the face image, and then generating green light component spatial spectral features of the face image based on Fourier transform;

and the supervised learning network is used for receiving the target feature matrix, extracting a single face feature matrix in the target feature matrix based on the position and the size of a face target, then performing learning supervision, and enabling the green light intensity feature in the single face feature matrix to approach the green light component spatial spectrum feature after the learning supervision.

Preferably, the visible light original image is an RGB three-channel image; the method for extracting the green light intensity component of the face image specifically comprises the following substeps: based on a first formula, the face image I _f The RGB three-channel numerical values of each pixel point are all converted into single green component numerical values, and the first formula is as follows:

，

wherein the content of the first and second substances,

，

，

representing a face image I _f The value of the 0 th channel of the mth row and nth column pixel,

representing a face image I _f The value of the 1 st channel of the mth row and nth column pixel points,

representing a face image I _f The value of the 2 nd channel of the mth row and nth column pixel points,

representing a face image I _f And (4) converting the pixel point of the mth row and the nth column.

Preferably, the fourier transform-based generation of the spatial spectral feature of the green light component of the face image specifically includes the following sub-steps:

SSS1, performing Fourier transform on the face image with the green light intensity component extracted;

and SSS2, performing normalization calculation by taking a Fourier transform module, and then obtaining the green light component spatial spectrum characteristic of the face image.

Preferably, in the SS1, before extracting a target feature matrix in the historical visible light original image, the deep neural network scales the received historical visible light original image, where the scaled visible light original image is 256 × 256 × 3, and the target feature matrix is 8 × 8 × 128; in the SS2, when the first fully-connected network identifies the position of the face target in the target feature matrix, the preset prior frame sizes include 192 × 192, 128 × 128, and 32 × 32.

Preferably, the loss function

，

Is a preset first weight coefficient,

is a preset second weight coefficient,

is a preset third weight coefficient,

is a preset fourth weight coefficient,

representing the classification loss in the discrimination of human faces and non-human faces,

representing the loss in regression of the position of the face object,

indicating a classification loss in discriminating between living bodies and non-living bodies,

representing a green light intensity feature learning loss;

wherein the content of the first and second substances,

n is the number of training samples;

indicating whether the jth prior frame in the ith grid is responsible for detecting the real value of the face or not, if so, determining whether the jth prior frame in the ith grid is responsible for detecting the real value of the face

=1, it means that the jth prior frame in the ith grid is responsible for detecting the face, if yes, the face is detected

=0, it means that the jth prior frame in the ith grid is not responsible for detecting the face;

whether the ith grid contains the true value of the central point of the jth prior frame or not is shown, if so

If =1, it means that the ith grid contains the jth prior frame center point, if yes

=0, then it means that the ith grid does not contain the jth prior frame center point; the number of the grids is 64, and each grid corresponds to each characteristic diagram in the target characteristic matrix one by one;

indicating whether the ith grid contains the output value of the jth prior frame center point, if so

If =0, it means that the ith grid does not contain the jth prior frame center point;

wherein, in the step (A),

represents the estimated value of the coordinate of the center point of the jth prior frame in the ith grid,

representing the true value of the coordinates of the central point of the jth prior box in the ith grid,

representing the width estimate of the jth prior box in the ith trellis,

representing the height estimate of the jth prior box in the ith grid,

the true value of the width of the jth prior box in the ith mesh is represented,

representing the true height value of the jth prior frame in the ith grid;

wherein, in the step (A),

a true label representing the training sample,

representing the probability that the current face object is not a living body,

representing the probability that the current face target is a living body;

wherein, in the step (A),

representing the output value of the auxiliary monitoring network after the auxiliary deep neural network learns the green light intensity characteristic,

representing the true value of the spatial spectral feature of the green component,

to represent

And

the distance between them.

The first aspect of the invention brings the following beneficial effects:

(1) On the basis of the principle that blood in a living body has certain intensity distribution in the green light direction when flowing through skin, an auxiliary monitoring network is arranged when a deep neural network is trained, and the auxiliary monitoring network assists the deep neural network to accurately extract living body characteristics (green light intensity characteristics) of the human face, so that the living body judgment based on the living body characteristics of the human face is further realized on the basis that a generated visible light living body detection model carries out living body judgment through some detail information such as moire patterns, paper photo reflection and the like generated by screen shooting, the problem that a silent living body detection method based on a visible light image in the prior art cannot resist 3D non-living body attack is solved, and the accuracy of living body detection is improved;

(2) The backbone network of the visible light living body detection model only comprises a deep neural network, and simultaneously, the tasks of face detection and living body identification are completed, the calculated amount of the face living body detection process is reduced, and the light weight is realized, so that the requirement of the visible light living body detection method realized by the embodiment of the invention on the calculation resource is reduced, and the delay of the living body detection process is also reduced, so that the living body detection speed is improved on the basis of improving the living body detection accuracy;

(3) The face living body detection is light, so that the visible light living body detection method realized by the embodiment of the invention can be suitable for interconnection equipment at the edge end, and meanwhile, the cost, the volume, the power consumption and the delay of the interconnection equipment in the face living body recognition are reduced.

Second aspect of the invention

A second aspect of the present invention provides a lightweight visible light biopsy device, including a memory for storing the lightweight visible light biopsy method according to the first aspect of the present invention, and a processor for calling the lightweight visible light biopsy method stored in the memory to perform a biopsy.

The second aspect of the present invention brings about the same advantageous effects as the first aspect, and will not be described in detail herein.

Drawings

FIG. 1 is a flow chart of a method for light-weighted visible light biopsy;

FIG. 2 is a flow chart of a construction of a visible light living body detection model;

FIG. 3 is a schematic diagram of a visible light living body detection model;

fig. 4 is a schematic diagram of a deep neural network.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

Example one

Referring to fig. 1 to 4, the embodiment provides a light-weight visible light living body detection method, including the following steps:

s1, acquiring a visible light original image to be processed. In this embodiment, the visible light original image is acquired from the visible light imaging device and is an RGB three-channel image.

And S2, recognizing a human face target from the original visible light image to be processed by utilizing a pre-constructed visible light living body detection model, and determining that the human face target is a living body or a non-living body.

SS1, constructing a deep neural network, wherein the deep neural network is used for acquiring a historical visible light original image, extracting a target feature in the historical visible light original image and generating a target feature matrix, the target feature comprises a green light intensity feature, and the green light intensity feature is an intensity distribution feature of the green light direction when blood flows through the skin.

Before the visible light original image is input into the deep neural network, scaling is carried out, the size of the scaled visible light original image is 256 multiplied by 3, and the scaled visible light original image is used as an input image of the deep neural network. In this embodiment, the deep neural network Net (x) includes eight feature extraction modules, and the eight feature extraction modules extract the target features and then generate a target feature matrix, where the size of the target feature matrix is 8 × 8 × 128, and a feature vector of 1 × 1 × 128 represents the target features in an image block of 16 × 16 size in the input image. As shown in fig. 4, the eight feature extraction modules include a first 3 × 3 channel separable convolution, a first 1 × 1 convolution, a first active layer, a second 3 × 3 channel separable convolution, a second 1 × 1 convolution, a second active layer, a max-pooling layer, a channel expansion layer, and an addition layer.

And SS2, constructing a first full-connection network, wherein the first full-connection network is used for receiving the target feature matrix and identifying the position and the size of the human face target in the target feature matrix.

Specifically, the first fully-connected network FC1 (x) receives the target feature matrix, regresses the category, position, and size of each target, and outputs the category, position, and size of each target, where the output of the first fully-connected network FC1 (x) is denoted by Dd (i), and i =0 to 14. Since the aspect ratio of the face in the input image is close to 1:1, when the position of the face target in the target feature matrix is recognized through the first fully-connected network FC1 (x), the prior frame sizes are set to 192 × 192, 128 × 128, and 32 × 32, and the position of the face target in the target feature matrix is recognized based on the non-maximum suppression algorithm.

And SS3, extracting a face feature matrix in the target feature matrix based on the position and size of the face target, performing global maximization processing on the face feature matrix, and obtaining a living body distinguishing feature vector after the global maximization processing.

Optionally, SS3 specifically includes the following sub-steps:

SS31, and face feature matrix F in the target feature matrix extracted based on position and size of face target _H ×F _W ×N；

SS32, respectively for N F _H ×F _W The x 1 matrix obtains the maximum value, and a living body distinguishing feature vector is generated according to the obtained N maximum values; wherein the value of N is 128.

And SS4, constructing a second fully-connected network, wherein the second fully-connected network is used for receiving the living body distinguishing feature vector and determining that the current face target is a living body or a non-living body according to the living body distinguishing feature vector.

Optionally, in SS4, determining that the current face target is a living body or a non-living body according to the living body discrimination feature vector includes the following specific sub-steps:

the SS41 and the second fully-connected network FC2 (f) classify the acquired living body discrimination feature vectors, and output the probability that the current face target is a living body and the probability that the current face target is a non-living body. In this embodiment, the second fully-connected network FC2 (f) preferably outputs the probability that the current face target is a living body and the probability that the current face target is a non-living body through a softmax function, where the probability that the current face target is a living body is expressed as

The probability that the current face target is not a living body is expressed as

。

SS5, training the deep neural network, the first fully-connected network and the second fully-connected network by using the training samples, introducing an auxiliary monitoring network when the deep neural network is trained, taking a loss function as the training constraint, obtaining network parameters of the deep neural network, the first fully-connected network, the second fully-connected network and the auxiliary monitoring network after the training is finished, and then generating a visible light living body detection model based on the obtained network parameters of the deep neural network, the first fully-connected network and the second fully-connected network, wherein the visible light living body detection model comprises the deep neural network, the first fully-connected network and the second fully-connected network; the auxiliary monitoring network is used for auxiliary monitoring when the deep neural network extracts the green light intensity characteristics.

Optionally, the auxiliary supervised network includes a supervised learning network and a first spectral feature extraction network.

The first spectral feature extraction network is used for intercepting a face image from a historical visible light original image according to the position and size of a face target, extracting green light intensity components of the face image, and generating green light component spatial spectral features of the face image based on Fourier transform. In this embodiment, the cut-out face image is scaled and then the green light intensity component is extracted, and the scaled face image has a size of 256 × 256 × 3, which is represented as a face image I _f 。

And the supervised learning network is used for receiving the target feature matrix, extracting a single face feature matrix in the target feature matrix based on the position and the size of the face target, then performing learning supervision, and enabling the green light intensity feature in the single face feature matrix to approach the green light component spatial spectrum feature after the learning supervision. In this embodiment, before learning and supervision are performed, the extracted single face feature matrix is further scaled, the size of the scaled single face feature matrix is 8 × 8 × 128, then the single face feature matrix is input into a supervised convolution network C (V), 1 × 1 convolution operation is performed by the supervised convolution network C (V), and the size of the single face feature matrix after 1 × 1 convolution operation is 8 × 8 × 1.

Optionally, a face image I _f Extracting green light intensity component to obtain face image I _H Corresponding, face image I _H Has a size of 256 × 256 × 1. Wherein, the extraction of the green light intensity component specifically comprises:

based on a first formula, the face image I _f Of each pixel pointThe RGB three-channel numerical values are all converted into single green component numerical values, wherein the first formula is as follows:

，

wherein the content of the first and second substances,

，

，

representing a face image I _f The value of the 2 nd channel of the mth row and nth column pixel,

Optionally, the face image I is generated based on fourier transform _H The spatial spectrum characteristic of the green light component specifically comprises the following sub-steps:

SSS1, face image I after green light intensity component is extracted _H Performing Fourier transform;

and SSS2, performing normalization calculation by taking a Fourier transform module, scaling the human face image obtained after the normalization calculation, and obtaining the green light component spatial spectrum characteristic of the human face image after scaling. Wherein the size of the spatial spectral feature of the green light component is 8 × 8 × 1.

Optionally, a loss function

，

Is a preset first weight coefficient,

is a preset second weight coefficient,

is a preset third weight coefficient and is a preset third weight coefficient,

is a preset fourth weight coefficient,

representing the loss in regression of the position of the face object,

indicating a loss of learning of the green intensity features.

Wherein the content of the first and second substances,

n is the number of training samples;

If =1, it means that the jth prior frame in the ith grid is responsible for detecting the face, if yes, the face is detected

If =0, it means that the ith grid does not contain the jth prior frame center point; the number of the grids is 64, and each grid corresponds to each characteristic diagram in the target characteristic matrix one by one;

And =0, it means that the ith mesh does not contain the jth prior frame center point.

，

Denotes the jth first in the ith gridThe coordinate estimation value of the central point of the check frame,

representing the width estimate of the jth prior box in the ith trellis,

representing the height estimate of the jth prior box in the ith grid,

representing the highly true value of the jth prior box in the ith mesh.

，

A true label representing the training sample.

，

Represents the output value of the auxiliary monitoring network after the auxiliary deep neural network learns the green light intensity characteristic,

represent

And

the distance between them.

Example two

The embodiment provides a lightweight visible light biopsy device, which comprises a memory and a processor, wherein the memory is used for storing the lightweight visible light biopsy method in the first embodiment, and the processor is used for calling the lightweight visible light biopsy method stored in the memory to perform biopsy.

The lightweight visible light living body detection method realizes the rapid and accurate judgment of whether the human face target in the visible light original image is a living body, and is based on the following principle:

the first visible light living body detection model construction stage:

after a first full-connection network FC1 (x), a second full-connection network FC2 (f) and a deep neural network Net (x) in a backbone are built, when the deep neural network Net (x), the first full-connection network FC1 (x) and the second full-connection network FC2 (f) are trained, an auxiliary monitoring network is introduced, based on the assistance of the auxiliary monitoring network, the deep neural network accurately learns the green light intensity characteristics of the human face, when the human face is classified as a living body, the second full-connection network FC2 (f) classifies whether the current human face is the living body or not based on the green light intensity characteristics of the human face in a target characteristic matrix, and then the probability that the classified output current human face is the living body is compared with the probability that the current human face is the non-living body, so that the judgment result whether the current human face is the living body or not is obtained.

The green light intensity characteristic of the human face is a living body characteristic of the human face, and the human face green light intensity characteristic is used as a supplement of basic characteristics such as moire patterns, reflection and the like based on a traditional visible light living body detection method.

Secondly, loading a visible light living body detection model on the interconnection equipment:

an online visible light living body detection model is arranged in the interconnection equipment, and comprises a deep neural network Net (x), a first fully-connected network FC1 (x) and a second fully-connected network FC2 (f) which are positioned in the backbone;

interconnection equipment carries out the live body and detects, specifically includes: the method comprises the steps that a deep neural network Net (x) obtains an input image, target feature extraction is conducted, and a target feature matrix is generated; the first full-connection network FC1 (x) regresses the category, position and size of each target in the target feature matrix, outputs the category, position and size of each target, and further obtains the position and size of the human face target; based on the position and size of a human face target, a human face feature matrix is intercepted from a target feature matrix, and a living body distinguishing feature vector is generated after the global maximization operation is carried out on the human face feature matrix; inputting the living body discrimination feature vector into a second full-connection network FC2 (f) to classify the current face as a living body or not, and then outputting the probability P that the current face is the living body _t Probability of non-living body P _f If P is _t >P _f Then the current face is live, if P _t <P _f And the current face is a non-living body.

The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A light-weight visible light living body detection method is characterized by comprising the following steps:

s1, acquiring a visible light original image to be processed;

2. The light weight visible light living body detection method according to claim 1, wherein in the SS2, a position of the human face target in the target feature matrix is identified based on a non-maximum suppression algorithm.

3. The method for detecting a light-weighted visible light living body according to claim 1, wherein the SS3 specifically comprises the following substeps:

SS32, respectively for N F _H ×F _W The x 1 matrix finds the maximum value, and generates a living body discrimination feature vector from the N maximum values obtained.

4. The method for detecting a light-weighted visible light living body according to claim 1, wherein the SS4 is configured to determine whether the current human face target is a living body or a non-living body according to the living body discrimination feature vector, and specifically includes the following sub-steps:

SS41, a second full-connection network classifies the obtained living body distinguishing feature vector and outputs the probability that the current face target is a living body and the probability that the current face target is a non-living body;

SS42, if the probability that the current face target is a living body is larger than the probability that the current face target is a non-living body, determining that the current face target is a living body; and if the probability that the current face target is a living body is smaller than the probability that the current face target is a non-living body, determining that the current face target is a non-living body.

5. The light-weight visible light living body detection method according to claim 1, wherein the auxiliary supervision network comprises a supervision learning network and a first spectral feature extraction network;

the first spectral feature extraction network is used for intercepting a face image from a historical visible light original image according to the position and size of a face target, extracting green light intensity components of the face image, and then generating green light component spatial spectral features of the face image based on Fourier transform;

6. The method for detecting a lightweight visible light living body according to claim 5,

the visible light original image is an RGB three-channel image;

the method for extracting the green light intensity component of the face image specifically comprises the following substeps: based on a first formula, the face image I _f The RGB three-channel numerical values of each pixel point are all converted into single green component numerical values, and the first formula is as follows:

，

wherein the content of the first and second substances,

，

，

7. The method for detecting a light-weighted visible light living body according to claim 5, wherein the step of generating the green light component spatial spectrum feature of the face image based on Fourier transform specifically comprises the following sub-steps:

8. The method for detecting a lightweight visible light living body according to claim 1,

in the SS1, before a target feature matrix in a historical visible light original image is extracted by a deep neural network, scaling the received historical visible light original image, wherein the scaled visible light original image is 256 multiplied by 3, and the size of the target feature matrix is 8 multiplied by 128;

in the SS2, when the first fully-connected network identifies the position of the face target in the target feature matrix, the preset prior frame sizes include 192 × 192, 128 × 128, and 32 × 32.

9. The method for detecting a lightweight visible light living body according to claim 8,

said loss function

，

To prepareThe first weight coefficient is set to be a first weight coefficient,

is a preset second weight coefficient,

is a preset third weight coefficient,

is a preset fourth weight coefficient,

representing the loss in regression of the position of the face object,

representing a green light intensity feature learning loss;

wherein the content of the first and second substances,

n is the number of training samples;

If =0, it means that the ith grid does not contain the jth prior frame center point; the number of the grids is 64, and each grid corresponds to each feature map in the target feature matrix one by one;

wherein, in the step (A),

representing the width estimate of the jth prior box in the ith trellis,

representing the height estimate of the jth prior box in the ith grid,

representing the true height value of the jth prior box in the ith grid;

wherein, in the step (A),

a true label representing the training sample,

representing the probability that the current face object is not a living body,

representing the probability that the current face target is a living body;

wherein, in the step (A),

to represent

And with

The distance between them.

10. A lightweight visible light living body detection device comprising a memory for storing the lightweight visible light living body detection method according to any one of claims 1 to 9, and a processor for calling the lightweight visible light living body detection method stored in the memory to perform living body detection.