CN111914758A - Face in-vivo detection method and device based on convolutional neural network - Google Patents


Info

Publication number
CN111914758A
Authority
CN
China
Prior art keywords
neural network
convolutional neural
living body
face
image
Prior art date
Legal status: Pending (assumed; not a legal conclusion)
Application number
CN202010770605.4A
Other languages
Chinese (zh)
Inventor
李薪宇
Current Assignee
Chengdu Aokuai Technology Co ltd
Original Assignee
Chengdu Aokuai Technology Co ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Chengdu Aokuai Technology Co ltd
Priority to CN202010770605.4A
Publication of CN111914758A
Legal status: Pending

Classifications

    • G06V 40/161: Human faces; detection, localisation, normalisation
    • G06F 18/25: Pattern recognition; fusion techniques
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods

Abstract

An embodiment of the invention provides a face living body detection method based on a convolutional neural network. The method comprises: obtaining a face sample image; constructing a convolutional neural network for living body detection that comprises a multi-scale attention fusion module; inputting the face sample image into the network and training to obtain a convolutional neural network model for living body detection; and inputting a face image to be detected into the model to detect the living body face image. An embodiment of the invention also provides a face living body detection device based on a convolutional neural network. By constructing the convolutional neural network for living body detection and then training and optimizing the resulting model, the embodiments can realize living body face recognition quickly and accurately, have strong practicability, and effectively improve face recognition efficiency and security.

Description

Face in-vivo detection method and device based on convolutional neural network
Technical Field
The invention relates to the technical field of computer vision and image recognition, in particular to a face in-vivo detection method and device based on a convolutional neural network.
Background
With the intensive research and rapid development of computer vision and pattern recognition technologies, biometric identification technologies such as face recognition, fingerprint recognition and iris recognition are applied in different scenes. Face recognition is convenient and contactless, and is therefore widely applied in fields such as finance, security and the internet. Meanwhile, face recognition systems are subject to spoofing attacks in which photos, videos, masks and the like are used to disguise as living faces, so how to effectively recognize living faces and thereby ensure the security of the face recognition system has become a general concern of users.
At present, there are two common living body face recognition methods. One is human-computer interaction: the user is required to complete specified actions such as opening the mouth, blinking or shaking the head, and the face recognition system judges whether the detected object is a real person by detecting these actions. Because the user must cooperate to complete the specified actions, the user experience is poor and the detection efficiency is low. The other is to use a depth camera to collect three-dimensional face information of the detected person and to judge the living body face by extracting feature vectors with methods such as optical flow fields; this requires additional equipment and has a higher deployment cost. Therefore, how to realize living body face recognition quickly and accurately, and effectively improve face recognition efficiency and security, has become one of the technical problems to be solved urgently in the development and application of face recognition technology.
Disclosure of Invention
In order to solve at least one of the above technical problems, an embodiment of the present invention provides a face live detection method based on a convolutional neural network, including the following steps: s101, acquiring a face sample image, wherein the face sample image comprises a living body face image and a non-living body face image, and preprocessing the acquired face sample image; s102, constructing a convolutional neural network based on in vivo detection, wherein the convolutional neural network based on in vivo detection comprises a multi-scale attention fusion module; s103, inputting the face sample image into the convolutional neural network based on the living body detection, and training to obtain a convolutional neural network model based on the living body detection; s104, inputting a face image to be detected into the convolutional neural network model based on living body detection, judging whether output data is larger than a preset threshold value, and if so, determining that the image to be detected is a living body face image; and if not, determining that the image to be detected is a non-living body face image.
Preferably, the step of preprocessing the acquired face sample image includes: adjusting the human face sample image to be a preset size; and identifying a living body face image and a non-living body face image in the face sample image.
Preferably, the step S102 specifically includes: searching for optimal features through an NAS algorithm based on a plurality of preset layers of a preset initial convolutional neural network, and determining a backbone network of the convolutional neural network based on living body detection; and integrating the multi-scale attention fusion module into the layer following the backbone network of the convolutional neural network based on living body detection to obtain the convolutional neural network based on living body face detection.
Preferably, the step S103 specifically includes: inputting the face sample image into the convolutional neural network based on the living body detection, training and optimizing the convolutional neural network based on the living body detection, and determining that the convolutional neural network structure when the loss function is smaller than a preset threshold value is the convolutional neural network model based on the living body detection.
Preferably, before the step of identifying the living body face image and the non-living body face image in the face sample image, the method further includes: and positioning a face image area in the face sample image.
The embodiment of the invention also provides a human face living body detection device based on the convolutional neural network, which comprises the following components: the system comprises an acquisition module, a preprocessing module and a display module, wherein the acquisition module is used for acquiring a human face sample image, the human face sample image comprises a living body human face image and a non-living body human face image, and the acquired human face sample image is preprocessed; the system comprises a construction module, a data processing module and a control module, wherein the construction module is used for constructing a convolutional neural network based on in vivo detection, and the convolutional neural network based on in vivo detection comprises a multi-scale attention fusion module; the training module is used for inputting the face sample image into the convolutional neural network based on the living body detection, and training to obtain a convolutional neural network model based on the living body detection; the detection module is used for inputting a face image to be detected into the convolutional neural network model based on living body detection, judging whether output data is larger than a preset threshold value or not, and if so, determining that the image to be detected is a living body face image; and if not, determining that the image to be detected is a non-living body face image.
Preferably, the obtaining module further includes: the preprocessing unit is used for adjusting the human face sample image to be a preset size; and identifying a living body face image and a non-living body face image in the face sample image.
Preferably, the building block comprises: the search unit is used for searching for optimal characteristics through an NAS algorithm based on a plurality of preset layers of a preset initial convolutional neural network and determining a main network of the convolutional neural network based on the living body detection; and the integration unit is used for integrating the multi-scale attention fusion module into the next layer of the main network of the convolutional neural network based on living body detection to obtain the convolutional neural network based on living body face detection.
Preferably, the training module is specifically configured to input the face sample image into the convolutional neural network based on living body detection, perform training optimization on the convolutional neural network based on living body detection, and determine that a convolutional neural network structure when a loss function is smaller than a preset threshold is the convolutional neural network model based on living body detection.
Preferably, the preprocessing unit is further configured to locate a face image region in the face sample image.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a face in-vivo detection method based on a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network architecture of a convolutional neural network DepthNet according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the cell structure of the convolutional neural network DepthNet according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of a multi-scale attention fusion module of a convolutional neural network of an embodiment of the present invention;
FIG. 5 is a schematic diagram of a network architecture of a convolutional neural network VivoNet based on liveness detection according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a multi-scale attention fusion module according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a face in-vivo detection device based on a convolutional neural network according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example one
The embodiment of the invention provides a face living body detection method based on a convolutional neural network, which comprises the following steps as shown in figure 1: s101, obtaining a face sample image, wherein the face sample image comprises a living body face image and a non-living body face image, and preprocessing the obtained face sample image; s102, constructing a convolutional neural network based on in vivo detection, wherein the convolutional neural network based on in vivo detection comprises a multi-scale attention fusion module; s103, inputting the face sample image into the convolutional neural network based on the living body detection, and training to obtain a convolutional neural network model based on the living body detection; s104, inputting a face image to be detected into the convolutional neural network model based on living body detection, judging whether output data is larger than a preset threshold value, and if so, determining that the image to be detected is a living body face image; and if not, determining that the image to be detected is a non-living body face image.
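The threshold decision in step S104 can be sketched in a few lines. The 0.5 default and the name `classify_liveness` are illustrative assumptions; the patent only specifies that the model's output is compared against a preset threshold:

```python
def classify_liveness(score: float, threshold: float = 0.5) -> str:
    """Step S104: if the liveness model's output exceeds the preset
    threshold, the input is judged a living body face image; otherwise
    it is judged a non-living body face image."""
    return "live" if score > threshold else "spoof"
```

Any monotone liveness score works here; only the comparison against a preset threshold is prescribed by the method.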
In the technical scheme, a preset number of face sample images are collected as a training data set, a convolutional neural network based on living body detection is constructed, the face sample images are input into the constructed convolutional neural network based on living body detection for training and optimization to obtain a convolutional neural network model based on living body detection, the face images to be detected in a test data set are input into the convolutional neural network model based on living body detection, and whether the input face images to be detected are living body face images or not is judged according to output data of the convolutional neural network model based on living body detection.
In the above technical solution, the step of preprocessing the acquired face sample image includes: adjusting the human face sample image to be a preset size; and identifying a living body face image and a non-living body face image in the face sample image.
In this technical scheme, a preset number of living body face images and non-living body face images are collected as training images: a first preset number of living body face images collected under different lighting conditions, backgrounds and capture devices, and a second preset number of non-living body face images collected from pictures, videos and the like. Sample images are selected from the collected living body and non-living body face images, and samples that do not contain a complete face image or do not meet preset conditions (for example, an image resolution lower than a preset threshold) are removed. Face region recognition is performed on the sample images that meet the preset conditions to locate the face image region; specifically, the 3D shape of the face in the sample image is detected with a Mask R-CNN model, the position region of the face image is determined, and the sample image is cropped according to that region and resized to a preset size, for example 256 × 256. The resized sample images are then annotated to identify living body and non-living body face images; specifically, labels are set for the two classes, with label 1 corresponding to living body face images and label 2 to non-living body face images. This yields the face sample image set, i.e. the training data set for training the convolutional neural network based on living body detection.
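The filtering and labelling steps above can be sketched as follows. The record layout (`image_size`, `has_full_face`, `is_live`) and the resolution threshold are hypothetical; face localisation with Mask R-CNN is outside the scope of this sketch and is represented by a flag:

```python
def preprocess_samples(samples, min_resolution=128, target_size=(256, 256)):
    """Filter, resize and label face samples as described above.
    `samples` is a list of dicts with keys 'image_size', 'has_full_face'
    and 'is_live' -- a hypothetical record layout for illustration."""
    dataset = []
    for s in samples:
        h, w = s["image_size"]
        # discard samples without a complete face, or below the resolution threshold
        if not s["has_full_face"] or min(h, w) < min_resolution:
            continue
        dataset.append({
            "size": target_size,                 # resized to the preset 256 x 256
            "label": 1 if s["is_live"] else 2,   # live -> 1, non-live -> 2, as in the patent
        })
    return dataset
```

The actual crop and resize would operate on pixel data; here only the bookkeeping is shown.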
In the above technical solution, step S102 specifically includes: searching for optimal features through an NAS algorithm based on a plurality of preset layers of a preset initial convolutional neural network, and determining a backbone network of the convolutional neural network based on in vivo detection; and integrating the multi-scale attention fusion module into the next layer of the main network of the convolutional neural network based on living body detection to obtain the convolutional neural network based on living body face detection.
In this technical scheme, an initial convolutional neural network DepthNet is preset; its architecture is shown in fig. 2. DepthNet consists of three stacked cells (a low-level, a mid-level and a high-level cell), each followed by a max-pooling layer, and the stem and head layers use 3 × 3 convolution kernels. As shown in fig. 3, each cell contains 6 nodes, where each node represents a network structure and the operation space of each node is denoted O. The 6 nodes comprise an input node, four intermediate nodes B1, B2, B3 and B4, and an output node. Fig. 4 shows two of the candidate operations, namely None and Skip-Connect. The edge (i, j) between two nodes (except the output node) represents the flow of information between those nodes and is composed of operations weighted by the parameters a^(i,j). Specifically, each edge (i, j) is represented by a function o^(i,j), where o^(i,j)(x_i) = Σ_{o∈O} η_o^(i,j) · o(x_i). The softmax function turns a^(i,j) into the weight parameters, i.e. η_o^(i,j) = exp(a_o^(i,j)) / Σ_{o'∈O} exp(a_{o'}^(i,j)), so that an intermediate node can be written x_j = Σ_{i<j} o^(i,j)(x_i). The output node x_{N−1} is represented by a weighted sum of all intermediate nodes, x_{N−1} = Σ_{0<i<N−1} β_i · x_i, where the importance weights β of the intermediate nodes are normalized as β_i = exp(β'_i) / Σ_{0<j<N−1} exp(β'_j), and β'_i is the initial learnable weight of intermediate node x_i.
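The softmax-weighted edge o^(i,j) can be illustrated with a small numpy sketch. Only the two operations shown in fig. 4 (None and Skip-Connect) are included; a real search space would also contain convolution and pooling operations, and the gradient machinery of the NAS search is omitted:

```python
import numpy as np

def softmax(a):
    # numerically stable softmax over the architecture parameters a^(i,j)
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Candidate operations on an edge; fig. 4 of the patent names two of them.
OPS = {
    "none": lambda x: np.zeros_like(x),   # None: blocks information flow
    "skip_connect": lambda x: x,          # Skip-Connect: identity
}

def mixed_op(x, alphas, ops=OPS):
    """o^(i,j)(x) = sum_o eta_o * o(x) with eta = softmax(alpha):
    the continuous relaxation that makes the edge choice differentiable."""
    etas = softmax(np.asarray(alphas, dtype=float))
    return sum(eta * op(x) for eta, op in zip(etas, ops.values()))
```

With equal architecture parameters both operations contribute 50%, and as one parameter dominates the edge collapses to a single discrete operation, which is how the search is discretized afterwards.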
The network parameter configuration of the convolutional neural network DepthNet is shown in Table 1, where Conv denotes a convolutional layer; the convolution kernel size is 3 × 3 and the stride of all convolutional layers is 1.
[Table 1: network parameter configuration of the convolutional neural network DepthNet; the table appears only as an image in the original and is not reproduced here.]
Taking the convolutional neural network DepthNet as the baseline network, optimal features are searched with the NAS (Neural Architecture Search) algorithm in the three levels (low, mid and high) of DepthNet to form the backbone network for living body detection, as shown in fig. 5.
When searching for optimal features with the NAS algorithm, L_train and L_val denote the training loss and the validation loss, respectively. The network parameters w and the NAS search parameters a in the search stage are obtained by the following two-level optimization:

min_a L_val(w*(a), a), subject to w*(a) = argmin_w L_train(w, a)
Since neurons in different layers have different receptive fields, they are stimulated by different regions of the input. By introducing a Multiscale Attention Fusion Module (MAFM), the low-level, mid-level and high-level features of the DepthNet backbone can be refined and fused with spatially selective attention. Specifically, the multi-scale attention fusion module is integrated into the layer following the backbone network of the convolutional neural network based on living body detection to obtain the convolutional neural network based on living body face detection. As shown in fig. 6, the features F_i from the different hierarchical cells (low, mid and high) are refined by spatial attention and then merged together; the refined feature F'_i can be expressed as:

F'_i = F_i ⊙ σ(C_i([A(F_i), M(F_i)])), i ∈ {low, mid, high}

where A and M denote the average pooling and max pooling operations, respectively, σ is the sigmoid function, C_i is a convolution layer, [·, ·] denotes concatenation, ⊙ denotes element-wise multiplication, and convolution kernels of size 7 × 7, 5 × 5 and 3 × 3 are used for the low-level unit C_low, the mid-level unit C_mid and the high-level unit C_high, respectively.
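The MAFM refinement F'_i = F_i ⊙ σ(C_i([A(F_i), M(F_i)])) can be illustrated as follows, interpreting A and M as channel-wise average and max pooling, as is usual for spatial attention. The k × k convolution C_i is reduced here to a 1 × 1 convolution (a weighted sum of the two pooled maps) to keep the sketch short; `w_avg`, `w_max` and `bias` are illustrative stand-ins for learned parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention_refine(F, w_avg=1.0, w_max=1.0, bias=0.0):
    """Refine a feature map F of shape (C, H, W) as F' = F * sigma(C([A(F), M(F)])).
    A and M pool across channels, producing two (H, W) maps; the per-level
    k x k convolution of the patent is simplified to a 1 x 1 convolution,
    i.e. a weighted sum of the two maps."""
    avg_map = F.mean(axis=0)          # A(F): channel-average pooling, (H, W)
    max_map = F.max(axis=0)           # M(F): channel-max pooling, (H, W)
    att = sigmoid(w_avg * avg_map + w_max * max_map + bias)  # values in (0, 1)
    return F * att                    # broadcast element-wise product over channels
```

The attention map is shared across channels, so the refinement re-weights spatial locations rather than channels, which matches the "spatially selective attention" described above.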
In the above technical solution, step S103 specifically includes: inputting the face sample image into the convolutional neural network based on the living body detection, training and optimizing the convolutional neural network based on the living body detection, and determining that the convolutional neural network structure when the loss function is smaller than a preset threshold value is the convolutional neural network model based on the living body detection.
In this technical scheme, the face sample images in which living body and non-living body face images have been identified by labels are input into the convolutional neural network VivoNet based on living body detection, and the network loss value is calculated through a preset loss function. Specifically, the mean square error loss L_mse is combined with the depth loss L_cdl to calculate the loss of the network, and the network loss function is L_all = L_mse + L_cdl. If the loss function value is not less than the preset threshold, the network parameters of VivoNet are adjusted and training optimization continues, wherein the network parameters comprise the weights of each layer of VivoNet and/or the number of iterations; if the loss function value is smaller than the preset threshold, training ends and the current VivoNet is determined to be the convolutional neural network model based on living body detection.
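The combined objective L_all = L_mse + L_cdl can be sketched as below. The patent does not spell out the exact form of L_cdl, so a finite-difference gradient term between the predicted and ground-truth maps is used here as a hypothetical placeholder:

```python
import numpy as np

def mse_loss(pred, target):
    """L_mse: mean squared error between prediction and ground truth."""
    return float(np.mean((pred - target) ** 2))

def depth_gradient_loss(pred, target):
    """A simplified stand-in for the depth loss L_cdl: penalise the
    difference of horizontal and vertical finite differences between
    the predicted and ground-truth maps."""
    loss = 0.0
    for axis in (0, 1):
        dp = np.diff(pred, axis=axis)
        dt = np.diff(target, axis=axis)
        loss += float(np.mean((dp - dt) ** 2))
    return loss

def total_loss(pred, target):
    """L_all = L_mse + L_cdl, the training objective described above."""
    return mse_loss(pred, target) + depth_gradient_loss(pred, target)
```

Training would minimise `total_loss` until it falls below the preset threshold described in the text.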
In the above technical solution, step S104 specifically includes: adopting four public face data sets, OULU-NPU, CASIA-MFSD, REPLAY-ATTACK and MSU-MFSD, as test data sets; respectively inputting them into the convolutional neural network model based on living body detection and judging whether the output data is larger than a preset threshold; if so, determining that the image to be detected is a living body face image, and if not, determining that it is a non-living body face image.
In the technical scheme, the face data set with the source different from that of the training data set is used as the test data set, so that the identification accuracy of the convolutional neural network model based on the living body detection can be effectively verified, and the improvement of the identification accuracy of the convolutional neural network model based on the living body detection can be facilitated.
On one hand, the face living body detection method based on the convolutional neural network provided by the embodiment of the invention improves the performance and accuracy of the algorithm by designing the convolutional neural network DepthNet, which has stronger robust modeling capability, and alleviates the high time consumption and low computational efficiency of neural networks based on Central Difference Convolution (CDC), as shown in Table 2.
[Table 2: training and testing time comparison; the table appears only as an image in the original and is not reproduced here.]
Based on the time statistics of training and testing on the OULU-NPU videos, compared with a network model combining the DepthNet network with CDC, the training time of the DepthNet network model is reduced by 21.5 minutes and the testing time by 9 minutes, effectively reducing the training and testing time of the network model. Meanwhile, the error rate results for different convolution types shown in Table 3 indicate that the error rate of ordinary convolution is lower than that of Local Binary Convolution (LB-Convolution) and Gabor filter convolution (Gabor Convolution), and its feature extraction effect is better.
[Table 3: error rates of different convolution types; the table appears only as an image in the original and is not reproduced here.]
On the other hand, based on a plurality of preset layers of a preset initial convolutional neural network, optimal features are searched through the NAS algorithm to determine the backbone network of the convolutional neural network based on living body detection, and the multi-scale attention fusion module is integrated into the layer following the backbone network to obtain the convolutional neural network VivoNet based on living body face detection. In the tests shown in Table 4, compared with a neural network whose backbone was not searched by the NAS algorithm, the face detection error rate of VivoNet in the embodiment of the invention is reduced by 0.4%; compared with a network that was searched by the NAS algorithm but does not integrate the multi-scale attention fusion module, the error rate is reduced by 0.1%. The accuracy of living body face detection is thus obviously improved.
[Table 4: error rate comparison with and without NAS search and the multi-scale attention fusion module; the table appears only as an image in the original and is not reproduced here.]
Meanwhile, in order to further verify the generalization ability of the convolutional neural network model based on living body detection, different neural network models are compared on the CASIA-MFSD, REPLAY-ATTACK and MSU-MFSD data sets, verifying the generalization ability of the model to non-living body face samples. As shown in Table 5, the convolutional neural network VivoNet based on living body face detection constructed by the embodiment of the invention generalizes well on the test sets and has better recognition efficiency and performance than the neural network models of the comparison group.
[Table 5: cross-dataset generalization test results; the table appears only as an image in the original and is not reproduced here.]
Example two
An embodiment of the present invention also provides a face in-vivo detection apparatus 200 based on a convolutional neural network, as shown in fig. 7, including: an obtaining module 201, configured to obtain a face sample image, where the face sample image includes a living body face image and a non-living body face image, and pre-process the obtained face sample image; a building module 202, configured to build a convolutional neural network based on in vivo detection, where the convolutional neural network based on in vivo detection includes a multi-scale attention fusion module; the training module 203 is configured to input the face sample image into the convolutional neural network based on living body detection, and train to obtain a convolutional neural network model based on living body detection; the detection module 204 is configured to input the face image to be detected into the convolutional neural network model based on living body detection, determine whether output data is greater than a preset threshold, and determine that the image to be detected is a living body face image if the output data is greater than the preset threshold; and if not, determining that the image to be detected is a non-living body face image.
In the technical scheme, an acquisition module 201 acquires a preset number of face sample images as a training data set, a construction module 202 constructs a convolutional neural network based on in-vivo detection, a training module 203 inputs the face sample images into the constructed convolutional neural network based on in-vivo detection for training and optimization to obtain a convolutional neural network model based on in-vivo detection, a detection module 204 inputs the face images to be detected in a test data set into the convolutional neural network model based on in-vivo detection, and whether the input face images to be detected are living face images is judged according to output data of the convolutional neural network model based on in-vivo detection.
In the foregoing technical solution, the obtaining module 201 further includes: the preprocessing unit is used for adjusting the human face sample image to be a preset size; and identifying a living body face image and a non-living body face image in the face sample image.
In this technical scheme, the obtaining module 201 collects a preset number of live body face images and non-live body face images as training images, including a first preset number of live body face images collected under different light, different backgrounds and different collecting device conditions, and a second preset number of non-live body face images collected through pictures, videos and the like. The preprocessing unit selects sample images from the collected living body face images and non-living body face images, and removes sample images which do not contain complete face images or do not meet preset conditions, such as image resolution lower than a preset threshold value. The preprocessing unit performs face region recognition on a sample image meeting preset conditions, positions a face image region in the sample image, specifically detects a 3D shape of a face in the sample image based on a Mask R-CNN model, determines a position region of the face image in the sample image, performs size clipping on the sample image according to the position region of the face image, and adjusts the sample image to a preset size, for example, the size of the sample image is adjusted to 256 × 256. Marking the sample image after the size adjustment, identifying a living body face image and a non-living body face image in the sample image, and specifically, setting labels for the living body face image and the non-living body face image respectively, wherein the label corresponding to the living body face image is 1, and the label corresponding to the non-living body face image is 2, so as to obtain a face sample image set, namely a training data set for training a convolutional neural network based on living body detection.
In the above technical solution, the building module 202 includes: the search unit is used for searching for optimal characteristics through an NAS algorithm based on a plurality of preset layers of a preset initial convolutional neural network and determining a main network of the convolutional neural network based on the living body detection; and the integration unit is used for integrating the multi-scale attention fusion module into the next layer of the main network of the convolutional neural network based on living body detection to obtain the convolutional neural network based on living body face detection.
In this technical solution, the construction module 202 presets an initial convolutional neural network, DepthNet, whose architecture is shown in fig. 2. The search unit takes DepthNet as the baseline network and uses an NAS algorithm to search for optimal features in three layers of DepthNet (a low layer, a middle layer and a high layer) to form the backbone network for living body detection, as shown in fig. 3. The integration unit then integrates the multi-scale attention fusion module into the layer following the backbone network of the convolutional neural network based on living body detection, obtaining the convolutional neural network VivoNet based on living body face detection.
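The internal design of the multi-scale attention fusion module is not detailed in this text; the sketch below shows one plausible reading, assuming the module upsamples the low-, middle- and high-layer feature maps to a common resolution, scores each scale with a global descriptor, and returns their attention-weighted sum. The function names and the scalar per-scale score are hypothetical illustration, not the patented design.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def upsample(fm, size):
    """Nearest-neighbour upsample of a (C, h, w) feature map to (C, size, size)."""
    c, h, w = fm.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fm[:, rows][:, :, cols]

def multiscale_attention_fusion(features):
    """features: list of (C, h_i, w_i) maps from low/middle/high layers.
    Upsample all maps to the largest spatial size, derive one attention
    weight per scale from a global average descriptor, and return the
    attention-weighted sum together with the weights."""
    size = max(f.shape[1] for f in features)
    ups = [upsample(f, size) for f in features]
    scores = np.array([u.mean() for u in ups])   # one scalar score per scale
    weights = softmax(scores)
    fused = sum(w * u for w, u in zip(weights, ups))
    return fused, weights
```

Fusing attention-weighted features from several depths is what lets the network combine fine texture cues (low layers) with semantic cues (high layers) for spoof detection.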
In the above technical solution, the training module 203 is specifically configured to input the face sample images into the convolutional neural network based on living body detection, perform training optimization on it, and determine the convolutional neural network structure obtained when the loss function falls below a preset threshold to be the convolutional neural network model based on living body detection.
In this technical solution, the training module 203 is configured to input the face sample images, in which living body and non-living body face images are identified by labels, into the convolutional neural network VivoNet based on living body detection, and to calculate the network loss value with a preset loss function. Specifically, the network loss is calculated from the mean square error loss L_mse combined with the depth loss L_cdl; the network loss function is L_all = L_mse + L_cdl. If the loss function value is not smaller than the preset threshold, the network parameters of VivoNet are adjusted and training optimization continues, where the network parameters include the weights of each layer of VivoNet and/or the number of iterations; if the loss function value is smaller than the preset threshold, training ends, and the current VivoNet is determined to be the convolutional neural network model based on living body detection.
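The combined loss L_all = L_mse + L_cdl can be illustrated as below. The exact form of the depth loss L_cdl is not given in this text; the sketch uses a simplified neighbour-difference version (related depth losses in the face anti-spoofing literature compare depth-map gradients in eight directions), so `contrastive_depth_loss` should be read as an assumed placeholder rather than the patented formula.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean square error loss L_mse between predicted and target depth maps."""
    return float(np.mean((pred - target) ** 2))

def contrastive_depth_loss(pred, target):
    """Simplified L_cdl: penalise mismatches between a pixel and its
    horizontal/vertical neighbour differences (eight directions in the
    literature; two here for brevity)."""
    loss = 0.0
    for shift in ((0, 1), (1, 0)):
        dp = pred - np.roll(pred, shift, axis=(0, 1))
        dt = target - np.roll(target, shift, axis=(0, 1))
        loss += np.mean((dp - dt) ** 2)
    return float(loss)

def total_loss(pred, target):
    """L_all = L_mse + L_cdl, matching the loss definition in the text."""
    return mse_loss(pred, target) + contrastive_depth_loss(pred, target)
```

Training would stop once `total_loss` drops below the preset threshold; otherwise the network weights are updated and another iteration runs.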
In the above technical solution, the detection module 204 is configured to use four public face data sets, namely OULU-NPU, CASIA-MFSD, REPLAY-ATTACK and MSU-MFSD, as test data sets, input each into the convolutional neural network model based on living body detection, and determine whether the output data is greater than a preset threshold; if so, the image to be detected is determined to be a living body face image, and if not, it is determined to be a non-living body face image.
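The thresholded decision used by the detection module 204, plus a simple accuracy measure over a labelled test set, might look like the sketch below; the 0.5 default threshold is an assumed placeholder, not a value given in the text.

```python
def classify(score, threshold=0.5):
    """Score above the preset threshold -> live face; otherwise spoof.
    The 0.5 default is an assumed placeholder threshold."""
    return "live" if score > threshold else "spoof"

def accuracy(scores, labels, threshold=0.5):
    """Fraction of model output scores whose thresholded decision
    matches the ground-truth label ('live' or 'spoof')."""
    preds = [classify(s, threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

Evaluating the same decision rule across the four public data sets would give one accuracy figure per data set.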
The face living body detection device based on a convolutional neural network provided by the embodiment of the invention can execute the face living body detection method based on a convolutional neural network provided by the embodiment of the invention, has the corresponding functional modules for executing that method, and achieves the beneficial effects of the method.
In the present invention, the term "plurality" means two or more unless explicitly defined otherwise. The terms "mounted," "connected," "fixed," and the like are to be construed broadly, and for example, "connected" may be a fixed connection, a removable connection, or an integral connection; "coupled" may be direct or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the description herein, the description of the terms "one embodiment," "some embodiments," "specific embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A face living body detection method based on a convolutional neural network is characterized by comprising the following steps:
s101, acquiring a face sample image, wherein the face sample image comprises a living body face image and a non-living body face image, and preprocessing the acquired face sample image;
s102, constructing a convolutional neural network based on in vivo detection, wherein the convolutional neural network based on in vivo detection comprises a multi-scale attention fusion module;
s103, inputting the face sample image into the convolutional neural network based on the living body detection, and training to obtain a convolutional neural network model based on the living body detection;
s104, inputting a face image to be detected into the convolutional neural network model based on living body detection, judging whether output data is larger than a preset threshold value, and if so, determining that the image to be detected is a living body face image; and if not, determining that the image to be detected is a non-living body face image.
2. The convolutional neural network-based living human face detection method according to claim 1, wherein the step of preprocessing the acquired human face sample image comprises:
adjusting the human face sample image to be a preset size;
and identifying a living body face image and a non-living body face image in the face sample image.
3. The face living body detection method based on the convolutional neural network as claimed in claim 2, wherein step S102 specifically comprises:
searching for optimal features through an NAS algorithm based on a plurality of preset layers of a preset initial convolutional neural network, and determining a backbone network of the convolutional neural network based on living body detection;
and integrating the multi-scale attention fusion module into the next layer of the backbone network of the convolutional neural network based on living body detection to obtain the convolutional neural network based on living body face detection.
4. The face living body detection method based on the convolutional neural network as claimed in claim 3, wherein step S103 specifically comprises:
inputting the face sample image into the convolutional neural network based on the living body detection, training and optimizing the convolutional neural network based on the living body detection, and determining that the convolutional neural network structure when the loss function is smaller than a preset threshold value is the convolutional neural network model based on the living body detection.
5. The convolutional neural network-based living human face detection method as claimed in any one of claims 2 to 4, wherein the step of identifying the living human face image and the non-living human face image in the human face sample image is preceded by:
and positioning a face image area in the face sample image.
6. A face in vivo detection device based on a convolutional neural network is characterized by comprising:
the system comprises an acquisition module, a preprocessing module and a display module, wherein the acquisition module is used for acquiring a human face sample image, the human face sample image comprises a living body human face image and a non-living body human face image, and the acquired human face sample image is preprocessed;
the system comprises a construction module, a data processing module and a control module, wherein the construction module is used for constructing a convolutional neural network based on in vivo detection, and the convolutional neural network based on in vivo detection comprises a multi-scale attention fusion module;
the training module is used for inputting the face sample image into the convolutional neural network based on the living body detection, and training to obtain a convolutional neural network model based on the living body detection;
the detection module is used for inputting a face image to be detected into the convolutional neural network model based on living body detection, judging whether output data is larger than a preset threshold value or not, and if so, determining that the image to be detected is a living body face image; and if not, determining that the image to be detected is a non-living body face image.
7. The convolutional neural network-based living human face detection device as claimed in claim 6, wherein the acquiring module further comprises:
the preprocessing unit is used for adjusting the human face sample image to be a preset size; and identifying a living body face image and a non-living body face image in the face sample image.
8. The convolutional neural network-based living human face detection device as claimed in claim 7, wherein the construction module comprises:
the search unit is used for searching for optimal characteristics through an NAS algorithm based on a plurality of preset layers of a preset initial convolutional neural network and determining a main network of the convolutional neural network based on the living body detection;
and the integration unit is used for integrating the multi-scale attention fusion module into the next layer of the main network of the convolutional neural network based on living body detection to obtain the convolutional neural network based on living body face detection.
9. The convolutional neural network-based human face in-vivo detection device as claimed in claim 8, wherein the training module is specifically configured to input the human face sample image into the convolutional neural network based on in-vivo detection, perform training optimization on the convolutional neural network based on in-vivo detection, and determine that the convolutional neural network structure when the loss function is smaller than a preset threshold is the convolutional neural network model based on in-vivo detection.
10. The convolutional neural network-based living human face detection device as claimed in any one of claims 7 to 9, wherein the preprocessing unit is further configured to locate a human face image region in the human face sample image.
CN202010770605.4A 2020-08-04 2020-08-04 Face in-vivo detection method and device based on convolutional neural network Pending CN111914758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010770605.4A CN111914758A (en) 2020-08-04 2020-08-04 Face in-vivo detection method and device based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010770605.4A CN111914758A (en) 2020-08-04 2020-08-04 Face in-vivo detection method and device based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN111914758A true CN111914758A (en) 2020-11-10

Family

ID=73288090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010770605.4A Pending CN111914758A (en) 2020-08-04 2020-08-04 Face in-vivo detection method and device based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111914758A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180260643A1 (en) * 2017-03-07 2018-09-13 Eyn Limited Verification method and system
US20200005061A1 (en) * 2018-06-28 2020-01-02 Beijing Kuangshi Technology Co., Ltd. Living body detection method and system, computer-readable storage medium
CN109255322A (en) * 2018-09-03 2019-01-22 北京诚志重科海图科技有限公司 A kind of human face in-vivo detection method and device
CN109508654A (en) * 2018-10-26 2019-03-22 中国地质大学(武汉) Merge the human face analysis method and system of multitask and multiple dimensioned convolutional neural networks
WO2020125623A1 (en) * 2018-12-20 2020-06-25 上海瑾盛通信科技有限公司 Method and device for live body detection, storage medium, and electronic device
WO2020151489A1 (en) * 2019-01-25 2020-07-30 杭州海康威视数字技术股份有限公司 Living body detection method based on facial recognition, and electronic device and storage medium
CN111027400A (en) * 2019-11-15 2020-04-17 烟台市广智微芯智能科技有限责任公司 Living body detection method and device
CN111079739A (en) * 2019-11-28 2020-04-28 长沙理工大学 Multi-scale attention feature detection method
CN111460931A (en) * 2020-03-17 2020-07-28 华南理工大学 Face spoofing detection method and system based on color channel difference image characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZITONG YU ET AL.: "Searching Central Difference Convolutional Networks for Face Anti-Spoofing", pages 1 - 4, Retrieved from the Internet <URL:https://arxiv.org/pdf/2003.04092.pdf> *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158773A (en) * 2021-03-05 2021-07-23 普联技术有限公司 Training method and training device for living body detection model
CN113158773B (en) * 2021-03-05 2024-03-22 普联技术有限公司 Training method and training device for living body detection model
CN112990090A (en) * 2021-04-09 2021-06-18 北京华捷艾米科技有限公司 Face living body detection method and device
CN115131880A (en) * 2022-05-30 2022-09-30 上海大学 Multi-scale attention fusion double-supervision human face in-vivo detection method

Similar Documents

Publication Publication Date Title
CN107194341B (en) Face recognition method and system based on fusion of Maxout multi-convolution neural network
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111666843B (en) Pedestrian re-recognition method based on global feature and local feature splicing
CN109815826B (en) Method and device for generating face attribute model
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN103942577B (en) Based on the personal identification method for establishing sample database and composite character certainly in video monitoring
CN111914758A (en) Face in-vivo detection method and device based on convolutional neural network
CN108537136A (en) The pedestrian&#39;s recognition methods again generated based on posture normalized image
CN109815874A (en) A kind of personnel identity recognition methods, device, equipment and readable storage medium storing program for executing
CN110781829A (en) Light-weight deep learning intelligent business hall face recognition method
CN106407369A (en) Photo management method and system based on deep learning face recognition
CN110263768A (en) A kind of face identification method based on depth residual error network
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN109063643B (en) Facial expression pain degree identification method under condition of partial hiding of facial information
CN112801015A (en) Multi-mode face recognition method based on attention mechanism
CN109766873A (en) A kind of pedestrian mixing deformable convolution recognition methods again
CN107563319A (en) Face similarity measurement computational methods between a kind of parent-offspring based on image
CN114241517A (en) Cross-modal pedestrian re-identification method based on image generation and shared learning network
CN112464864A (en) Face living body detection method based on tree-shaped neural network structure
CN109711232A (en) Deep learning pedestrian recognition methods again based on multiple objective function
CN114429646A (en) Gait recognition method based on deep self-attention transformation network
CN110084110B (en) Near-infrared face image recognition method and device, electronic equipment and storage medium
Yang et al. Video system for human attribute analysis using compact convolutional neural network
CN112270228A (en) Pedestrian re-identification method based on DCCA fusion characteristics
Narlagiri et al. Biometric authentication system based on face recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination