CN116798132A - Method, system and detection method for constructing flash living body detection model - Google Patents

Method, system and detection method for constructing flash living body detection model

Info

Publication number
CN116798132A
Authority
CN
China
Prior art keywords: classification, model, samples, sample, training
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310940768.6A
Other languages
Chinese (zh)
Other versions
CN116798132B (en)
Inventor
刘伟华
严宇
左勇
罗艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Athena Eyes Co Ltd filed Critical Athena Eyes Co Ltd
Priority to CN202310940768.6A priority Critical patent/CN116798132B/en
Publication of CN116798132A publication Critical patent/CN116798132A/en
Application granted granted Critical
Publication of CN116798132B publication Critical patent/CN116798132B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V40/40: Spoof detection, e.g. liveness detection
    • G06V40/45: Detection of the body part being alive
    • G06N3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N5/04: Inference or reasoning models
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/806: Fusion of extracted features, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V40/168: Human faces; feature extraction; face representation
    • G06V40/172: Human faces; classification, e.g. identification

Abstract

The application provides a method and a system for constructing a flash living body detection model, and a detection method, relating to the field of living body detection. The method for constructing a flash living body detection model comprises the following steps: acquiring face image samples; processing the face image samples to obtain training samples; classifying the training samples according to their categories to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples carrying classification labels; taking the classification samples as input and the predicted depth information of the classification samples as output, constructing a prediction model based on a hybrid expert network; taking the predicted depth information output by the prediction model as input and the probability that the predicted depth information belongs to each classification label as output, constructing a classification model; and jointly training the prediction model and the classification model using the classification samples to obtain the flash living body detection model. The application can improve the detection precision of flash living body detection.

Description

Method, system and detection method for constructing flash living body detection model
Technical Field
The application relates to the field of living body detection, in particular to a method, a system and a detection method for constructing a flash living body detection model.
Background
When performing face recognition, a face recognition model can often be deceived in various ways, for example by presenting a photo of a person on a printed matter or an electronic screen. Therefore, before recognition, a living body detection model is generally used to determine whether the object currently undergoing face recognition is a real person or a dummy.
Flash living body detection is a method that uses the light emitted by a mobile phone screen as an auxiliary signal to judge whether the object photographed by the phone's front camera is a real person or a dummy (such as a person displayed on a printed matter or an electronic screen). Its principle is as follows: when the light of the mobile phone screen changes, the color of the photographed face changes accordingly, and this color change information differs markedly between a real person (with a three-dimensional form) and a dummy (with a planar form). The color change information of the photographed face (caused by the change of the light emitted by the mobile phone screen) can therefore be fed into a deep learning model to judge whether the photographed person is a real person or a dummy. However, different attack data (such as printed matter attacks and electronic screen attacks) have different data distributions, and the inherent conflict between these data distributions damages the prediction effect of the model, so the accuracy of existing flash living body detection models is not high.
Therefore, how to improve the detection accuracy of the flash living body detection is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To solve the above technical problems, the application provides a method for constructing a flash living body detection model, which can improve the detection precision of flash living body detection. The application also provides a system for constructing a flash living body detection model and a flash living body detection method, which achieve the same technical effects.
The first object of the application is to provide a method for constructing a flash living body detection model.
The first object of the present application is achieved by the following technical solutions:
a method for constructing a flash living body detection model comprises the following steps:
acquiring a face image sample;
processing the face image sample to obtain a training sample;
classifying the training samples according to the categories of the training samples to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples carrying classification labels;
taking the classified samples as input and the predicted depth information of the classified samples as output to construct a prediction model based on a hybrid expert network;
taking the predicted depth information output by the predicted model as input, and taking the probability that the predicted depth information belongs to each classification label as output to construct a classification model;
performing joint training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
Preferably, in the method for constructing a flash living body detection model, the prediction model includes a plurality of expert networks and 1 gating network, and when the joint training is performed, the classification sample is used to train the prediction model, including:
distributing different types of the attack face samples to a corresponding expert network according to the classification labels of the attack face samples in the classification samples, and training the weight of the gating network by using the classification labels of the attack face samples;
and distributing the real face samples in the classified samples to each expert network, and training the weight of the gating network in an adaptive mode.
Preferably, the method for constructing the flash living body detection model further comprises:
and taking the classified sample as input, taking the attention information of the classified sample as output, and constructing a first model based on an attention mechanism.
Preferably, the method for constructing the flash living body detection model further comprises:
acquiring a plurality of pieces of prediction depth information output after the classification sample is input into the prediction model and a plurality of pieces of attention information output after the classification sample is input into the first model;
and fusing the plurality of predicted depth information and the plurality of attention information to obtain fused information.
Preferably, in the method for constructing a flash living body detection model, the constructing a classification model with the prediction depth information output by the prediction model as input and the probability that the prediction depth information belongs to each classification label as output includes:
and taking the fusion information as input, and taking the probability of the fusion information belonging to each classification label as output to construct a classification model.
Preferably, in the method for constructing a flash living body detection model, the performing joint training on the prediction model and the classification model by using the classification sample to obtain the flash living body detection model specifically includes:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
A second object of the present application is to provide a flash living body detection method.
The second object of the present application is achieved by the following technical solutions:
a flash in vivo detection method comprising:
acquiring face image data to be detected;
processing the face image data to be detected to obtain input data;
inputting the input data into a flash living body detection model to obtain a detection result;
the flash living body detection model is obtained by adopting the method for constructing the flash living body detection model.
A third object of the present application is to provide a system for constructing a flash living body detection model.
The third object of the present application is achieved by the following technical solutions:
a system for building a flash living body detection model, comprising:
the first acquisition unit is used for acquiring a face image sample;
the processing unit is used for processing the face image sample to obtain a training sample;
the classification unit is used for classifying the training samples according to the categories of the training samples to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples which carry classification labels;
the first construction unit is used for taking the classified samples as input, taking the predicted depth information of the classified samples as output and constructing a prediction model based on a mixed expert network;
the second construction unit is used for taking the prediction depth information output by the prediction model as input, and taking the probability that the prediction depth information belongs to each classification label as output to construct a classification model;
the training unit is used for carrying out combined training on the prediction model and the classification model by utilizing the classification sample to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
Preferably, in the system for constructing a flash living body detection model, the prediction model comprises a plurality of expert networks and 1 gating network,
the training unit is further configured to assign different types of the attack face samples to a corresponding one of the expert networks according to the classification labels of the attack face samples in the classification samples, and train weights of the gating networks by using the classification labels of the attack face samples;
the training unit is further configured to assign the real face sample in the classification sample to each of the expert networks, and train the weight of the gating network in an adaptive manner.
Preferably, in the system for constructing a flash living body detection model, the method further comprises:
a third construction unit, configured to construct a first model based on an attention mechanism by taking the classification sample as input and attention information of the classification sample as output;
a second acquisition unit configured to acquire a plurality of pieces of predicted depth information output after the classification sample is input to the prediction model and a plurality of pieces of attention information output after the classification sample is input to the first model;
the fusion unit is used for fusing the plurality of predicted depth information and the plurality of attention information to obtain fusion information;
the second construction unit is specifically configured to, when executing, as input, prediction depth information output by the prediction model, and output probability that the prediction depth information belongs to each classification label, construct a classification model:
taking the fusion information as input, and taking the probability of the fusion information belonging to each classification label as output to construct a classification model;
the training unit is specifically configured to, when performing joint training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
According to the above technical solution, face image samples are acquired; the face image samples are processed to obtain training samples; the training samples are classified according to their categories to obtain classification samples, which comprise attack face samples and real face samples carrying classification labels; taking the classification samples as input and their predicted depth information as output, a prediction model is constructed based on a hybrid expert network; taking the predicted depth information output by the prediction model as input and the probability that it belongs to each classification label as output, a classification model is constructed; and the prediction model and the classification model are jointly trained using the classification samples to obtain the flash living body detection model. Because the prediction model is built on a hybrid expert network, the inherent conflicts between data distributions can be resolved by the multiple experts, and jointly training the prediction model and the classification model with the classification samples reduces the damage those conflicts do to the prediction effect of the flash living body detection model, thereby improving the detection precision of the model. In summary, the above technical solution can improve the detection accuracy of flash living body detection.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings may be obtained from these drawings by those skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a method for constructing a flash living body detection model in an embodiment of the application;
FIG. 2 is a schematic diagram of a network structure of a flash living body detection model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another network structure of a flash living body detection model according to an embodiment of the present application;
FIG. 4 is a flow chart of a flash living body detection method according to an embodiment of the application;
fig. 5 is a schematic structural diagram of a system for constructing a flash living body detection model according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions of the embodiments of the present application will be clearly and completely described below, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the embodiments provided in the present application, it should be understood that the disclosed method and system may be implemented in other manners. The system embodiments described below are merely illustrative; for example, the division into modules is merely a division by logical function, and other divisions may be used in practice, such as combining multiple modules or components, integrating them into another system, or omitting or not performing some features. In addition, the components shown or discussed may be coupled, directly coupled, or communicatively connected to each other via some interfaces, and indirect couplings or communication connections between devices or modules may be electrical, mechanical, or in other forms.
In addition, each functional unit in each embodiment of the present application may be integrated in one processor, or each unit may be separately used as one device, or two or more units may be integrated in one device; the functional units in the embodiments of the present application may be implemented in hardware, or may be implemented in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will appreciate that all or part of the steps of the method embodiments described below may be implemented by program instructions together with associated hardware. The program instructions may be stored in a computer-readable storage medium and, when executed, perform the steps of the method embodiments described below. The storage medium includes any medium that can store program code, such as a mobile storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
It should be appreciated that the use of "systems," "devices," "units," and/or "modules" in this disclosure is but one way to distinguish between different components, elements, parts, portions, or assemblies at different levels. However, if other words can achieve the same purpose, the word can be replaced by other expressions.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" or "a number" means two or more, unless specifically defined otherwise.
If a flowchart is used in the present application, the flowchart is used to describe the operations performed by a system according to an embodiment of the present application. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
It should also be noted that, in this document, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in an article or apparatus that comprises such element.
The embodiments of the application are described below in a progressive manner.
As shown in fig. 1, an embodiment of the present application provides a method for constructing a flash living body detection model, including:
s101, acquiring a face image sample;
in S101, specifically, a plurality of frames of face photos are collected as face image samples by means of mobile phone photographing by means of flash of a mobile phone screen. In some embodiments, a large number of face image samples can be collected under the conditions of different ages, different environments, different attack modes and different mobile phone devices so as to ensure the training effect of the model. The face image sample can also be obtained directly in other reasonable ways, and the application is not limited to this.
S102, processing a face image sample to obtain a training sample;
in S102, one implementation of this step includes: and calculating color change information of the face image sample to obtain a training sample, wherein the color change information comprises Normal Cues. Specifically, the mobile phone camera performs image acquisition on the irradiated face to obtain a face reflection image sequence (Encoded Sequence Frames), and the acquired face reflection image sequence is calculated according to a Lambertian reflection model to obtain corresponding Normal Cues. Face depth maps can be obtained based on Normal Cues.
S103, classifying the training samples according to the categories of the training samples to obtain classified samples, wherein the classified samples comprise attack face samples and real face samples which carry classification labels;
in S103, definition of classification tags: two classification labels are defined according to the attack face sample and the real face sample to which the training sample belongs, and a plurality of classification labels are defined in a refined manner under the class of the attack face sample, namely, the classification labels can comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample. The classification labels belonging to the attack face sample may be classified according to one or more classification dimensions, for example, classified according to attack categories, and the classification labels of the attack face sample may include an electronic screen attack label, a printed matter attack label, and the like. In some embodiments, the number of samples corresponding to each class label in the class sample should be consistent.
S104, taking the classified samples as input, taking the predicted depth information of the classified samples as output, and constructing a prediction model based on the mixed expert network;
in S104, a prediction model based on the hybrid expert network is constructed, the prediction model takes the classification sample as an input, predicts according to the classification sample, and outputs prediction depth information of the classification sample. In some embodiments, the predictive model may be used to predict depth information for a person's face, with subsequent training forcing the predictive model to understand the facial form and facial features information contained in the input data, and for a dummy, no depth information (e.g., a person printed on a piece of 2D paper), should a plane be predicted at this time. The predicted depth information may be a depth information map.
Among them, the hybrid expert network was first used in recommendation algorithms: Google's 2018 paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts uses a hybrid expert network with a multi-gate mechanism to achieve multi-task training (multi-task learning is, in essence, one model making multiple predictions simultaneously). The multi-gate hybrid expert network first feeds the input data to n expert networks to obtain n feature vectors, then feeds the input data to a gating network that outputs n weight values, multiplies the feature vectors output by the n expert networks by these weight values, and sums the results to obtain one feature vector. It should be noted that the n weight values output by the gating network are typically adaptive: no labels in the training data guide their learning. Like a gate, the network controls the fusion ratio of the feature vectors of the corresponding expert networks, hence the name gating network. Whereas a traditional flash living body model uses only one network, in this step the prediction model is built on a hybrid expert network, so that multiple expert networks can be used to resolve the inherent conflicts between data distributions.
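A minimal sketch of such a mixture-of-experts forward pass is shown below (PyTorch is assumed; the linear expert and gate architectures and all dimensions are placeholders rather than the concrete networks of the application):

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """n expert networks whose outputs are fused by a gating network."""

    def __init__(self, in_dim: int, hid_dim: int, out_dim: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                          nn.Linear(hid_dim, out_dim))
            for _ in range(n_experts)
        ])
        # The gate outputs one fusion weight per expert.
        self.gate = nn.Sequential(nn.Linear(in_dim, n_experts), nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor):
        feats = torch.stack([e(x) for e in self.experts], dim=1)  # (B, n, out_dim)
        w = self.gate(x)                                          # (B, n)
        fused = (w.unsqueeze(-1) * feats).sum(dim=1)              # weighted sum
        return fused, w  # gate weights are returned so they can be supervised
```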
S105, taking the predicted depth information output by the predicted model as input, and taking the probability of the predicted depth information belonging to each classification label as output to construct a classification model;
in S105, the classification model takes as input the prediction depth information output by the prediction model, classifies the prediction depth information, and outputs the probability that the prediction depth information belongs to each classification label. In some embodiments, the classification model may determine whether the face image is a real person or a dummy person according to the depth information map predicted by the prediction model. In other embodiments, a conventional deep learning network may be constructed as a classification model, and other types of networks may be reasonably employed, which is not limited by the comparison of the present application.
S106, performing combined training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model.
In S106, the constructed prediction model and classification model are jointly trained using the classification samples to obtain the flash living body detection model, which may comprise the trained prediction model and classification model. In some embodiments, since the prediction model is built on a hybrid expert network, samples with different classification labels in the classification samples can be assigned to different expert networks, each expert focusing on the data of a particular classification label, while the weights of the gating network are trained adaptively. Other reasonable training modes can also be adopted; the application is not limited in this respect. In some embodiments, the classification model may be trained by a general model training method, for example randomly initializing its parameters and then training automatically, which the application likewise does not limit.
The embodiment above, by acquiring a face image sample; processing the face image sample to obtain a training sample; classifying the training samples according to the categories of the training samples to obtain classified samples, wherein the classified samples comprise attack face samples and real face samples carrying classification labels; taking the classified samples as input and the predicted depth information of the classified samples as output, and constructing a prediction model based on the hybrid expert network; taking the predicted depth information output by the predicted model as input, and taking the probability of the predicted depth information belonging to each classification label as output to construct a classification model; and carrying out joint training on the prediction model and the classification model by using the classification sample to obtain the flash living body detection model. According to the embodiment, the prediction model is constructed based on the mixed expert network, the mixed expert network can be used for solving the inherent conflict between the data distribution, the classification sample is used for carrying out combined training on the prediction model and the classification model to obtain the flash living body detection model, the damage of the inherent conflict between the data distribution to the prediction effect of the flash living body detection model can be reduced, and therefore the detection precision of the model is improved. In summary, the above-described embodiments can improve the detection accuracy of flash living body detection.
In other embodiments of the present application, the prediction model includes a plurality of expert networks and 1 gating network, and the training the prediction model by using the classification samples during the joint training includes:
s201, distributing different types of attack face samples to a corresponding expert network according to the classification labels of the attack face samples in the classification samples, and training the weight of a gating network by using the classification labels of the attack face samples;
s202, distributing the real face samples in the classified samples to each expert network, and training the weight of the gating network in a self-adaptive mode.
In this embodiment, based on the classification labels of the attack face samples, different types of attack face samples are allocated to corresponding expert networks, so that different expert networks process different types of attack face data and each expert network focuses on the data it handles best; the output results of the expert networks are then merged by the gating network. The learning of the gating network is generally adaptive. In this embodiment, the real face samples in the classification samples are allocated to every expert network, and the gating network learns adaptively on them; on attack face samples, however, the output of the gating network is guided by the classification labels of the attack face samples and is not adaptive (for example, for the expert network handling a given type of attack face sample, the gating network is trained to output a corresponding preset weight value). This improves the prediction effect of the model.
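One possible realization of this mixed gate training is sketched below (an assumption-laden sketch: attack samples are taken to carry the index of their designated expert, real-face samples carry -1 and contribute no gate label):

```python
import torch
import torch.nn.functional as F

def gate_loss(gate_weights: torch.Tensor, expert_idx: torch.Tensor) -> torch.Tensor:
    """Supervise the gating network on attack samples only.

    gate_weights: (B, n_experts) softmax outputs of the gating network.
    expert_idx:   (B,) designated expert index for attack samples,
                  or -1 for real-face samples (left adaptive).
    """
    is_attack = expert_idx >= 0
    if not is_attack.any():
        return gate_weights.new_zeros(())  # real faces only: gate learns adaptively
    log_p = torch.log(gate_weights[is_attack] + 1e-8)
    return F.nll_loss(log_p, expert_idx[is_attack])
```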
In other embodiments of the present application, the method for constructing a flash living body detection model further includes:
s301, taking a classified sample as input, taking attention information of the classified sample as output, and constructing a first model based on an attention mechanism;
in S301, the first model takes the classified sample as input, and may obtain attention information of the classified sample from the classified sample based on an attention mechanism. In some embodiments, the attention information may be an attention profile, a first model, using an attention mechanism, that focuses attention on a noise-free portion, generating a multi-frame attention profile, based on the features required to extract the attention profile. In other embodiments, a conventional deep learning network may be constructed as the first model, and other types of networks may be reasonably employed, which is not limited by the present application.
S302, acquiring a plurality of pieces of prediction depth information output after a classification sample is input into a prediction model and a plurality of pieces of attention information output after the classification sample is input into a first model;
in S302, specifically, a multi-frame depth information map obtained by predicting a classification sample through a prediction model and a multi-frame attention map obtained by processing a classification sample through a first model may be obtained, and the number of specific acquisitions may be confirmed according to actual application requirements, for example, a 6-frame depth information map and a 6-frame attention map may be obtained.
S304, fusing the plurality of predicted depth information and the plurality of attention information to obtain fused information;
In S304, specifically, the multi-frame attention maps may be used to weight and fuse the multi-frame depth information maps into a fused information map, which reduces the noise on each map. The fused information map can then serve as the input of the subsequent classification model. In some embodiments, normalizing all attention maps before fusing them with the depth information maps can speed up model training and improve model accuracy.
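The weighted fusion can be sketched as follows (normalizing the attention maps across frames so that the per-pixel weights sum to one is an assumption consistent with the normalization mentioned above):

```python
import torch

def fuse(depth_maps: torch.Tensor, attention_maps: torch.Tensor,
         eps: float = 1e-8) -> torch.Tensor:
    """Fuse multi-frame depth information maps with attention maps.

    depth_maps:     (B, T, 1, H, W) predicted depth information maps.
    attention_maps: (B, T, 1, H, W) raw attention maps.
    Returns one fused information map of shape (B, 1, H, W).
    """
    w = attention_maps / (attention_maps.sum(dim=1, keepdim=True) + eps)
    return (w * depth_maps).sum(dim=1)  # attention-weighted average over frames
```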
The implementation manner of the step of constructing the classification model by taking the prediction depth information output by the prediction model as input and the probability that the prediction depth information belongs to each classification label as output specifically comprises the following steps:
s305, taking fusion information as input, taking probability of the fusion information belonging to each classification label as output, and constructing a classification model;
in S305, the classification model takes the fusion information as input, classifies the fusion information, and outputs probabilities that the fusion information belongs to each classification label. In some embodiments, the classification model may determine whether the face image is a real person or a dummy person according to the fusion information map obtained in S304.
The method comprises the steps of utilizing a classification sample to carry out joint training on a prediction model and a classification model to obtain a flash living body detection model, and specifically comprises the following steps:
s306, performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
In S306, the constructed prediction model, first model and classification model are jointly trained using the classification samples to obtain the flash living body detection model, which may comprise the trained prediction model, first model and classification model. In some embodiments, the prediction model may adopt the training method shown in S201-S202 of the foregoing embodiments, while the first model and the classification model may be trained by a general model training method, for example randomly initializing their parameters and then training automatically.
In this embodiment, considering that the input data of the prediction model may contain color change information of multiple face frames, the multi-frame face depth information maps output by the prediction model are noisy. By constructing a first model and using an attention mechanism, attention is focused on the noise-free parts of the depth information maps to obtain multi-frame attention information maps; the multi-frame attention information maps and depth information maps are fused into a fused information map, and the final classification is completed using the fused information map, thereby improving the classification effect of the model.
In a specific embodiment, the prediction model, the first model and the classification model are jointly trained by using the classification sample, and the obtained flash living body detection model has a specific network structure schematic diagram, and reference may be made to fig. 2.
The acquired images are classified and further processed to obtain multiple frames of Normal Cues, denoted $N_1, N_2, \dots, N_i$.
Formalized representation of the prediction model is as follows:

$$\hat{D}_i = \sum_{j=1}^{n}\left[S_{gate}\left(S_{gen}(N_i)+P_{embd}\right)\right]_j \cdot E_j\left(S_{gen}(N_i)+P_{embd}\right)$$

wherein $S_{gen}$ represents the shared feature extraction module, $P_{embd}$ represents a position vector (used to mark absolute position information in a picture), $E_j$ represents the j-th expert network, $S_{gate}$ represents the gating network (whose output is the fusion weight of each expert network), $N_i$ represents the input data of the i-th classification sample of the prediction model, and $\hat{D}_i$ represents the i-th output of the prediction model, i.e., the depth information map.
It should be noted that some auxiliary labels may be used in the training process: for example, a tool (e.g., the PRNet network) may be used in advance to generate depth maps of real faces to guide the learning of $\hat{D}_i$; the depth map of a dummy is represented by a uniform gray map; and when an attack sample is input, its classification label is used as a label to guide the learning of $S_{gate}$.
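Putting these pieces together, the joint training objective could take a form like the following (a sketch only: the MSE depth loss against PRNet-generated depth maps or gray maps and the loss weights are assumptions):

```python
import torch.nn.functional as F

def joint_loss(pred_depth, target_depth, gate_weights, expert_idx,
               class_logits, class_labels, lam_depth=1.0, lam_gate=0.5):
    """Joint objective over the prediction, gating and classification parts.

    target_depth: PRNet-style depth maps for real faces and uniform gray
                  maps for attack faces (the auxiliary labels noted above).
    """
    l_depth = F.mse_loss(pred_depth, target_depth)
    l_gate = gate_loss(gate_weights, expert_idx)   # gate supervision, sketched earlier
    l_cls = F.cross_entropy(class_logits, class_labels)
    return l_cls + lam_depth * l_depth + lam_gate * l_gate
```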
From the classification samples, the attention information is obtained as follows:

$$A_i = u_{atten}\left(S_{atten}(N_i)\right), \qquad \bar{A}_i = A_i \Big/ \sum_{k} A_k$$

wherein $S_{atten}$ represents the module extracting the features required for the attention maps, $u_{atten}$ generates the multi-frame attention maps, $N_i$ represents the input data of the i-th classification sample of the first model, and $\bar{A}_i$ represents the attention map after normalization over all attention maps.
Formalized representation of the classification model is as follows:

$$Pred = c\left(\sum_{i}\bar{A}_i \odot \hat{D}_i\right)$$

where c represents the classification model and Pred represents the output result of the classification model, i.e., the probability that the input belongs to each classification label.
Wherein, the specific network structure diagram of different models can refer to fig. 3.
As shown in fig. 4, in other embodiments of the present application, there is also provided a flash living body detection method, including:
s401, acquiring face image data to be detected;
in S401, specifically, by means of flash of a mobile phone screen, a multi-frame face photo is collected by using a mobile phone photographing mode, and is used as face image data to be measured. The face image data to be measured can also be directly obtained in other reasonable modes, and the application is not limited to the above.
S402, processing face image data to be detected to obtain input data;
in S402, specifically, color change information of the face image data to be detected may be calculated to obtain input data, where the color change information includes Normal Cues.
S403, inputting the input data into a flash living body detection model to obtain a detection result, wherein the flash living body detection model is obtained by adopting the construction method of the flash living body detection model.
In S403, the flash living body detection model may comprise the trained prediction model and classification model; specifically, the input data is fed into the prediction model to obtain its predicted depth information, and the predicted depth information is then fed into the classification model to obtain the probability that it belongs to each classification label, which serves as the detection result. In other embodiments, the flash living body detection model may comprise the trained prediction model, first model and classification model; specifically, the input data is fed into the prediction model to obtain its predicted depth information; the input data is fed into the first model to obtain its attention information; the attention information and predicted depth information are fused to obtain fused information; and finally the fused information is fed into the classification model to obtain the probability that it belongs to each classification label, which serves as the detection result.
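As an end-to-end illustration of this detection flow (a hedged sketch: the model interfaces follow the sketches given in the construction method above, and thresholding the real-face probability is an assumed decision rule):

```python
import torch

@torch.no_grad()
def detect(frames, prediction_model, first_model, classifier,
           real_label=0, threshold=0.5):
    """Run flash living body detection on preprocessed input data.

    frames: (1, T, C, H, W) Normal-Cue input computed from the captured
    face photos (see the preprocessing step above).
    """
    depth = prediction_model(frames)            # (1, T, 1, H, W) depth maps
    attention = first_model(frames)             # (1, T, 1, H, W) attention maps
    fused = fuse(depth, attention)              # fusion sketched earlier
    probs = classifier(fused).softmax(dim=-1)   # probability per classification label
    is_real = bool(probs[0, real_label] >= threshold)
    return is_real, probs
```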
In this embodiment, the flash living body detection model is obtained by adopting the method for constructing the flash living body detection model according to any one of the above, and this embodiment can improve the detection accuracy of flash living body detection.
As shown in fig. 5, in another embodiment of the present application, there is also provided a system for constructing a flash living body detection model, including:
a first acquiring unit 10, configured to acquire a face image sample;
a processing unit 11, configured to process the face image sample to obtain a training sample;
the classifying unit 12 is configured to classify the training sample according to the class of the training sample, so as to obtain a classification sample, where the classification sample includes an attack face sample and a real face sample that carry classification labels;
a first construction unit 13 for constructing a hybrid expert network-based prediction model with the classification samples as input and prediction depth information of the classification samples as output;
a second construction unit 14, configured to construct a classification model by taking as input prediction depth information output by the prediction model, and taking as output a probability that the prediction depth information belongs to each classification label;
the training unit 15 is configured to perform joint training on the prediction model and the classification model by using the classification sample, so as to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
On the basis of the embodiment, in the system for constructing the flash living body detection model, the prediction model comprises a plurality of expert networks and 1 gating network,
the training unit 15 is further configured to assign different types of attack face samples to a corresponding expert network according to the classification labels of the attack face samples in the classification samples, and train the weights of the gating networks by using the classification labels of the attack face samples;
the training unit 15 is further configured to assign a real face sample in the classification samples to each expert network, and train the weights of the gating networks in an adaptive manner.
On the basis of the foregoing embodiment, the system for constructing a flash living body detection model further includes:
a third construction unit, configured to construct a first model based on an attention mechanism with the classification sample as input and attention information of the classification sample as output;
a second acquisition unit configured to acquire a plurality of pieces of predicted depth information output after the classification sample is input to the prediction model and a plurality of pieces of attention information output after the classification sample is input to the first model;
the fusion unit is used for fusing the plurality of predicted depth information and the plurality of attention information to obtain fusion information;
the second construction unit 14 is specifically configured to, when executing, as input, prediction depth information output by the prediction model, and as output, probabilities that the prediction depth information belongs to each classification label, construct the classification model:
taking fusion information as input, taking probability of the fusion information belonging to each classification label as output, and constructing a classification model;
the training unit 15 is specifically configured to, when performing joint training on the prediction model and the classification model using the classification samples to obtain the flash living body detection model:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The method for constructing the flash living body detection model is characterized by comprising the following steps of:
acquiring a face image sample;
processing the face image sample to obtain a training sample;
classifying the training samples according to the categories of the training samples to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples carrying classification labels;
taking the classified samples as input and the predicted depth information of the classified samples as output to construct a prediction model based on a hybrid expert network;
taking the predicted depth information output by the predicted model as input, and taking the probability that the predicted depth information belongs to each classification label as output to construct a classification model;
performing joint training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
2. The method of claim 1, wherein the predictive model comprises a plurality of expert networks and 1 gating network, and wherein the training the predictive model using the classification samples in the joint training comprises:
distributing different types of the attack face samples to a corresponding expert network according to the classification labels of the attack face samples in the classification samples, and training the weight of the gating network by using the classification labels of the attack face samples;
and distributing the real face samples in the classified samples to each expert network, and training the weight of the gating network in an adaptive mode.
3. The method as recited in claim 1, further comprising:
and taking the classified sample as input, taking the attention information of the classified sample as output, and constructing a first model based on an attention mechanism.
4. A method as recited in claim 3, further comprising:
acquiring a plurality of pieces of prediction depth information output after the classification sample is input into the prediction model and a plurality of pieces of attention information output after the classification sample is input into the first model;
and fusing the plurality of predicted depth information and the plurality of attention information to obtain fused information.
5. The method of claim 4, wherein the constructing a classification model with the predicted depth information output by the prediction model as input and the probability that the predicted depth information belongs to each of the classification labels as output comprises:
and taking the fusion information as input, and taking the probability of the fusion information belonging to each classification label as output to construct a classification model.
6. The method of claim 5, wherein the jointly training the prediction model and the classification model to obtain a flash living body detection model by using the classification sample, specifically comprises:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
7. A flash in vivo detection method, comprising:
acquiring face image data to be detected;
processing the face image data to be detected to obtain input data;
inputting the input data into a flash living body detection model to obtain a detection result;
wherein the flash living body detection model is obtained by the method according to any one of claims 1 to 6.
8. A system for building a flash living body detection model, comprising:
the first acquisition unit is used for acquiring a face image sample;
the processing unit is used for processing the face image sample to obtain a training sample;
the classification unit is used for classifying the training samples according to the categories of the training samples to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples which carry classification labels;
the first construction unit is used for taking the classified samples as input, taking the predicted depth information of the classified samples as output and constructing a prediction model based on a mixed expert network;
the second construction unit is used for taking the prediction depth information output by the prediction model as input, and taking the probability that the prediction depth information belongs to each classification label as output to construct a classification model;
the training unit is used for carrying out combined training on the prediction model and the classification model by utilizing the classification sample to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
9. The system of claim 8, wherein the predictive model includes a plurality of expert networks and 1 gating network,
the training unit is further configured to assign different types of the attack face samples to a corresponding one of the expert networks according to the classification labels of the attack face samples in the classification samples, and train weights of the gating networks by using the classification labels of the attack face samples;
the training unit is further configured to assign the real face sample in the classification sample to each of the expert networks, and train the weight of the gating network in an adaptive manner.
10. The system as recited in claim 8, further comprising:
a third construction unit, configured to construct a first model based on an attention mechanism by taking the classification sample as input and attention information of the classification sample as output;
a second acquisition unit configured to acquire a plurality of pieces of predicted depth information output after the classification sample is input to the prediction model and a plurality of pieces of attention information output after the classification sample is input to the first model;
the fusion unit is used for fusing the plurality of predicted depth information and the plurality of attention information to obtain fusion information;
the second construction unit is specifically configured to, when executing, as input, prediction depth information output by the prediction model, and output probability that the prediction depth information belongs to each classification label, construct a classification model:
taking the fusion information as input, and taking the probability of the fusion information belonging to each classification label as output to construct a classification model;
the training unit is specifically configured to, when performing joint training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
CN202310940768.6A 2023-07-28 2023-07-28 Method, system and detection method for constructing flash living body detection model Active CN116798132B (en)

Priority Applications (1)

CN202310940768.6A (priority and filing date 2023-07-28): Method, system and detection method for constructing flash living body detection model

Applications Claiming Priority (1)

CN202310940768.6A (priority and filing date 2023-07-28): Method, system and detection method for constructing flash living body detection model

Publications (2)

CN116798132A, published 2023-09-22
CN116798132B, published 2024-02-27

Family

ID=88043945

Family Applications (1)

CN202310940768.6A (Active, filed 2023-07-28): Method, system and detection method for constructing flash living body detection model

Country Status (1)

CN: CN116798132B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569808A (en) * 2019-09-11 2019-12-13 腾讯科技(深圳)有限公司 Living body detection method and device and computer equipment
CN114333011A (en) * 2021-12-28 2022-04-12 北京的卢深视科技有限公司 Network training method, face recognition method, electronic device and storage medium
US20220189147A1 (en) * 2020-02-13 2022-06-16 Tencent Technology (Shenzhen) Company Limited Object detection model training method and apparatus, object detection method and apparatus, computer device, and storage medium
CN114842267A (en) * 2022-05-23 2022-08-02 南京邮电大学 Image classification method and system based on label noise domain self-adaption
CN115063866A (en) * 2022-06-30 2022-09-16 南京邮电大学 Expression recognition method integrating reinforcement learning and progressive learning
CN115240280A (en) * 2022-03-29 2022-10-25 浙大城市学院 Construction method of human face living body detection classification model, detection classification method and device
CN115880740A (en) * 2021-09-27 2023-03-31 腾讯科技(深圳)有限公司 Face living body detection method and device, computer equipment and storage medium
CN115937993A (en) * 2022-12-14 2023-04-07 北京百度网讯科技有限公司 Living body detection model training method, living body detection device and electronic equipment


Also Published As

CN116798132B (en), published 2024-02-27


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
    Address after: No. 205, Building B1, Huigu Science and Technology Industrial Park, No. 336 Bachelor Road, Bachelor Street, Yuelu District, Changsha City, Hunan Province, 410000; Applicant after: Wisdom Eye Technology Co.,Ltd. (China)
    Address before: Building 14, Phase I, Changsha Zhongdian Software Park, No. 39 Jianshan Road, Changsha High tech Development Zone, Changsha City, Hunan Province, 410205; Applicant before: Wisdom Eye Technology Co.,Ltd. (China)
GR01: Patent grant