CN116798132A - Method, system and detection method for constructing flash living body detection model - Google Patents

Method, system and detection method for constructing flash living body detection model

Info

Publication number
CN116798132A
Authority
CN
China
Prior art keywords: classification, model, samples, sample, training
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310940768.6A
Other languages
Chinese (zh)
Other versions
CN116798132B (en)
Inventor
刘伟华
严宇
左勇
罗艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Athena Eyes Co Ltd filed Critical Athena Eyes Co Ltd
Priority to CN202310940768.6A priority Critical patent/CN116798132B/en
Publication of CN116798132A publication Critical patent/CN116798132A/en
Application granted granted Critical
Publication of CN116798132B publication Critical patent/CN116798132B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V40/40: Spoof detection, e.g. liveness detection
    • G06V40/45: Detection of the body part being alive
    • G06N3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N5/04: Inference or reasoning models
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/806: Fusion of extracted features, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V40/168: Human faces; feature extraction; face representation
    • G06V40/172: Human faces; classification, e.g. identification

Abstract

The application provides a method and a system for constructing a flash living body detection model, and a detection method, relating to the field of living body detection. The method for constructing a flash living body detection model comprises the following steps: acquiring face image samples; processing the face image samples to obtain training samples; classifying the training samples according to their categories to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples carrying classification labels; taking the classification samples as input and the predicted depth information of the classification samples as output, constructing a prediction model based on a hybrid expert network; taking the predicted depth information output by the prediction model as input and the probability that the predicted depth information belongs to each classification label as output, constructing a classification model; and jointly training the prediction model and the classification model using the classification samples to obtain the flash living body detection model. The application can improve the detection precision of flash living body detection.

Description

Method, system and detection method for constructing flash living body detection model
Technical Field
The application relates to the field of living body detection, in particular to a method, a system and a detection method for constructing a flash living body detection model.
Background
When performing face recognition, a face recognition model can often be deceived in various ways, for example by presenting a photo of a person on a printed matter or an electronic screen. Therefore, before recognition, a living body detection model is generally used to determine whether the object currently undergoing face recognition is a real person or a dummy.
Flash living body detection is a method that uses the light emitted by a mobile phone screen as an auxiliary signal to judge whether the object photographed by the phone's front camera is a real person or a dummy (such as a person displayed on a printed matter or an electronic screen). Its principle is as follows: when the light of the mobile phone screen changes, the color of the photographed face changes accordingly, and this color change information differs markedly between a real person (with a three-dimensional form) and a dummy (with a planar form). The color change information of the photographed face (caused by the change of the light emitted by the mobile phone screen) can therefore be fed into a deep learning model to judge whether the photographed person is a real person or a dummy. However, different attack data (such as printed matter attacks and electronic screen attacks) have different data distributions, and the inherent conflict between these data distributions damages the prediction effect of the model, so the accuracy of existing flash living body detection models is not high.
Therefore, how to improve the detection accuracy of the flash living body detection is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To solve the above technical problems, the application provides a method for constructing a flash living body detection model, which can improve the detection precision of flash living body detection. The application also provides a system for constructing a flash living body detection model and a flash living body detection method, which achieve the same technical effects.
The first object of the application is to provide a method for constructing a flash living body detection model.
The first object of the present application is achieved by the following technical solutions:
a method for constructing a flash living body detection model comprises the following steps:
acquiring a face image sample;
processing the face image sample to obtain a training sample;
classifying the training samples according to the categories of the training samples to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples carrying classification labels;
taking the classified samples as input and the predicted depth information of the classified samples as output to construct a prediction model based on a hybrid expert network;
taking the predicted depth information output by the predicted model as input, and taking the probability that the predicted depth information belongs to each classification label as output to construct a classification model;
performing joint training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
Preferably, in the method for constructing a flash living body detection model, the prediction model includes a plurality of expert networks and 1 gating network, and when the joint training is performed, the classification sample is used to train the prediction model, including:
distributing different types of the attack face samples to a corresponding expert network according to the classification labels of the attack face samples in the classification samples, and training the weight of the gating network by using the classification labels of the attack face samples;
and distributing the real face samples in the classified samples to each expert network, and training the weight of the gating network in an adaptive mode.
Preferably, the method for constructing the flash living body detection model further comprises:
and taking the classified sample as input, taking the attention information of the classified sample as output, and constructing a first model based on an attention mechanism.
Preferably, the method for constructing the flash living body detection model further comprises:
acquiring a plurality of pieces of prediction depth information output after the classification sample is input into the prediction model and a plurality of pieces of attention information output after the classification sample is input into the first model;
and fusing the plurality of predicted depth information and the plurality of attention information to obtain fused information.
Preferably, in the method for constructing a flash living body detection model, the constructing a classification model with the prediction depth information output by the prediction model as input and the probability that the prediction depth information belongs to each classification label as output includes:
and taking the fusion information as input, and taking the probability of the fusion information belonging to each classification label as output to construct a classification model.
Preferably, in the method for constructing a flash living body detection model, the performing joint training on the prediction model and the classification model by using the classification sample to obtain the flash living body detection model specifically includes:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
A second object of the present application is to provide a flash living body detection method.
The second object of the present application is achieved by the following technical solutions:
a flash in vivo detection method comprising:
acquiring face image data to be detected;
processing the face image data to be detected to obtain input data;
inputting the input data into a flash living body detection model to obtain a detection result;
the flash living body detection model is obtained by adopting the method for constructing the flash living body detection model.
A third object of the present application is to provide a system for constructing a flash living body detection model.
The third object of the present application is achieved by the following technical solutions:
a system for building a flash living body detection model, comprising:
the first acquisition unit is used for acquiring a face image sample;
the processing unit is used for processing the face image sample to obtain a training sample;
the classification unit is used for classifying the training samples according to the categories of the training samples to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples which carry classification labels;
the first construction unit is used for taking the classified samples as input, taking the predicted depth information of the classified samples as output and constructing a prediction model based on a mixed expert network;
the second construction unit is used for taking the prediction depth information output by the prediction model as input, and taking the probability that the prediction depth information belongs to each classification label as output to construct a classification model;
the training unit is used for carrying out combined training on the prediction model and the classification model by utilizing the classification sample to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
Preferably, in the system for constructing a flash living body detection model, the prediction model comprises a plurality of expert networks and 1 gating network,
the training unit is further configured to assign different types of the attack face samples to a corresponding one of the expert networks according to the classification labels of the attack face samples in the classification samples, and train weights of the gating networks by using the classification labels of the attack face samples;
the training unit is further configured to assign the real face sample in the classification sample to each of the expert networks, and train the weight of the gating network in an adaptive manner.
Preferably, in the system for constructing a flash living body detection model, the method further comprises:
a third construction unit, configured to construct a first model based on an attention mechanism by taking the classification sample as input and attention information of the classification sample as output;
a second acquisition unit configured to acquire a plurality of pieces of predicted depth information output after the classification sample is input to the prediction model and a plurality of pieces of attention information output after the classification sample is input to the first model;
the fusion unit is used for fusing the plurality of predicted depth information and the plurality of attention information to obtain fusion information;
the second construction unit is specifically configured to, when executing, as input, prediction depth information output by the prediction model, and output probability that the prediction depth information belongs to each classification label, construct a classification model:
taking the fusion information as input, and taking the probability of the fusion information belonging to each classification label as output to construct a classification model;
the training unit is specifically configured to, when performing joint training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
According to the above technical solution, face image samples are acquired; the face image samples are processed to obtain training samples; the training samples are classified according to their categories to obtain classification samples, which comprise attack face samples and real face samples carrying classification labels; taking the classification samples as input and their predicted depth information as output, a prediction model is constructed based on a hybrid expert network; taking the predicted depth information output by the prediction model as input and the probability that it belongs to each classification label as output, a classification model is constructed; and the prediction model and the classification model are jointly trained using the classification samples to obtain the flash living body detection model. Because the prediction model is built on a hybrid expert network, the inherent conflicts between data distributions can be resolved by the multiple experts, and jointly training the prediction model and the classification model with the classification samples reduces the damage those conflicts do to the prediction effect of the flash living body detection model, thereby improving the detection precision of the model. In summary, the above technical solution can improve the detection accuracy of flash living body detection.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings may be obtained from these drawings by those skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a method for constructing a flash living body detection model in an embodiment of the application;
FIG. 2 is a schematic diagram of a network structure of a flash living body detection model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another network structure of a flash living body detection model according to an embodiment of the present application;
FIG. 4 is a flow chart of a flash living body detection method according to an embodiment of the application;
fig. 5 is a schematic structural diagram of a system for constructing a flash living body detection model according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions of the embodiments of the present application will be clearly and completely described below, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the embodiments provided in the present application, it should be understood that the disclosed method and system may be implemented in other manners. The system embodiments described below are merely illustrative; for example, the division into modules is merely a division by logical function, and other divisions may be used in practice, such as combining multiple modules or components, integrating them into another system, or omitting or not performing some features. In addition, the components shown or discussed may be coupled, directly coupled, or communicatively connected to each other via some interfaces, and indirect couplings or communication connections between devices or modules may be electrical, mechanical, or in other forms.
In addition, each functional unit in each embodiment of the present application may be integrated in one processor, or each unit may be separately used as one device, or two or more units may be integrated in one device; the functional units in the embodiments of the present application may be implemented in hardware, or may be implemented in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will appreciate that all or part of the steps of the method embodiments described below may be implemented by program instructions together with associated hardware. The program instructions may be stored in a computer-readable storage medium and, when executed, perform the steps of the method embodiments described below. The storage medium includes any medium that can store program code, such as a mobile storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
It should be appreciated that the use of "systems," "devices," "units," and/or "modules" in this disclosure is but one way to distinguish between different components, elements, parts, portions, or assemblies at different levels. However, if other words can achieve the same purpose, the word can be replaced by other expressions.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" or "a number" means two or more, unless specifically defined otherwise.
If a flowchart is used in the present application, the flowchart is used to describe the operations performed by a system according to an embodiment of the present application. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
It should also be noted that, in this document, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in an article or apparatus that comprises such element.
The embodiments of the application are described below in a progressive manner.
As shown in fig. 1, an embodiment of the present application provides a method for constructing a flash living body detection model, including:
s101, acquiring a face image sample;
in S101, specifically, a plurality of frames of face photos are collected as face image samples by means of mobile phone photographing by means of flash of a mobile phone screen. In some embodiments, a large number of face image samples can be collected under the conditions of different ages, different environments, different attack modes and different mobile phone devices so as to ensure the training effect of the model. The face image sample can also be obtained directly in other reasonable ways, and the application is not limited to this.
S102, processing a face image sample to obtain a training sample;
in S102, one implementation of this step includes: and calculating color change information of the face image sample to obtain a training sample, wherein the color change information comprises Normal Cues. Specifically, the mobile phone camera performs image acquisition on the irradiated face to obtain a face reflection image sequence (Encoded Sequence Frames), and the acquired face reflection image sequence is calculated according to a Lambertian reflection model to obtain corresponding Normal Cues. Face depth maps can be obtained based on Normal Cues.
S103, classifying the training samples according to the categories of the training samples to obtain classified samples, wherein the classified samples comprise attack face samples and real face samples which carry classification labels;
in S103, definition of classification tags: two classification labels are defined according to the attack face sample and the real face sample to which the training sample belongs, and a plurality of classification labels are defined in a refined manner under the class of the attack face sample, namely, the classification labels can comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample. The classification labels belonging to the attack face sample may be classified according to one or more classification dimensions, for example, classified according to attack categories, and the classification labels of the attack face sample may include an electronic screen attack label, a printed matter attack label, and the like. In some embodiments, the number of samples corresponding to each class label in the class sample should be consistent.
S104, taking the classified samples as input, taking the predicted depth information of the classified samples as output, and constructing a prediction model based on the mixed expert network;
in S104, a prediction model based on the hybrid expert network is constructed, the prediction model takes the classification sample as an input, predicts according to the classification sample, and outputs prediction depth information of the classification sample. In some embodiments, the predictive model may be used to predict depth information for a person's face, with subsequent training forcing the predictive model to understand the facial form and facial features information contained in the input data, and for a dummy, no depth information (e.g., a person printed on a piece of 2D paper), should a plane be predicted at this time. The predicted depth information may be a depth information map.
Among them, the hybrid expert network was first used in recommendation algorithms: Google's 2018 paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts uses a hybrid expert network with a multi-gate mechanism to achieve multi-task training (multi-task learning is, in essence, one model making multiple predictions simultaneously). The multi-gate hybrid expert network first feeds the input data to n expert networks to obtain n feature vectors, then feeds the input data to a gating network that outputs n weight values, multiplies the feature vectors output by the n expert networks by these weight values, and sums the results to obtain one feature vector. It should be noted that the n weight values output by the gating network are typically adaptive: no labels in the training data guide their learning. Like a gate, the network controls the fusion ratio of the feature vectors of the corresponding expert networks, hence the name gating network. Whereas a traditional flash living body model uses only one network, in this step the prediction model is built on a hybrid expert network, so that multiple expert networks can be used to resolve the inherent conflicts between data distributions.
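A minimal sketch of such a mixture-of-experts forward pass is shown below (PyTorch is assumed; the linear expert and gate architectures and all dimensions are placeholders rather than the concrete networks of the application):

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """n expert networks whose outputs are fused by a gating network."""

    def __init__(self, in_dim: int, hid_dim: int, out_dim: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                          nn.Linear(hid_dim, out_dim))
            for _ in range(n_experts)
        ])
        # The gate outputs one fusion weight per expert.
        self.gate = nn.Sequential(nn.Linear(in_dim, n_experts), nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor):
        feats = torch.stack([e(x) for e in self.experts], dim=1)  # (B, n, out_dim)
        w = self.gate(x)                                          # (B, n)
        fused = (w.unsqueeze(-1) * feats).sum(dim=1)              # weighted sum
        return fused, w  # gate weights are returned so they can be supervised
```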
S105, taking the predicted depth information output by the predicted model as input, and taking the probability of the predicted depth information belonging to each classification label as output to construct a classification model;
in S105, the classification model takes as input the prediction depth information output by the prediction model, classifies the prediction depth information, and outputs the probability that the prediction depth information belongs to each classification label. In some embodiments, the classification model may determine whether the face image is a real person or a dummy person according to the depth information map predicted by the prediction model. In other embodiments, a conventional deep learning network may be constructed as a classification model, and other types of networks may be reasonably employed, which is not limited by the comparison of the present application.
S106, performing combined training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model.
In S106, the constructed prediction model and classification model are jointly trained using the classification samples to obtain the flash living body detection model, which may comprise the trained prediction model and classification model. In some embodiments, since the prediction model is built on a hybrid expert network, samples with different classification labels in the classification samples can be assigned to different expert networks, each expert focusing on the data of a particular classification label, while the weights of the gating network are trained adaptively. Other reasonable training modes can also be adopted; the application is not limited in this respect. In some embodiments, the classification model may be trained by a general model training method, for example randomly initializing its parameters and then training automatically, which the application likewise does not limit.
The embodiment above, by acquiring a face image sample; processing the face image sample to obtain a training sample; classifying the training samples according to the categories of the training samples to obtain classified samples, wherein the classified samples comprise attack face samples and real face samples carrying classification labels; taking the classified samples as input and the predicted depth information of the classified samples as output, and constructing a prediction model based on the hybrid expert network; taking the predicted depth information output by the predicted model as input, and taking the probability of the predicted depth information belonging to each classification label as output to construct a classification model; and carrying out joint training on the prediction model and the classification model by using the classification sample to obtain the flash living body detection model. According to the embodiment, the prediction model is constructed based on the mixed expert network, the mixed expert network can be used for solving the inherent conflict between the data distribution, the classification sample is used for carrying out combined training on the prediction model and the classification model to obtain the flash living body detection model, the damage of the inherent conflict between the data distribution to the prediction effect of the flash living body detection model can be reduced, and therefore the detection precision of the model is improved. In summary, the above-described embodiments can improve the detection accuracy of flash living body detection.
In other embodiments of the present application, the prediction model includes a plurality of expert networks and 1 gating network, and the training the prediction model by using the classification samples during the joint training includes:
s201, distributing different types of attack face samples to a corresponding expert network according to the classification labels of the attack face samples in the classification samples, and training the weight of a gating network by using the classification labels of the attack face samples;
s202, distributing the real face samples in the classified samples to each expert network, and training the weight of the gating network in a self-adaptive mode.
In this embodiment, based on the classification labels of the attack face samples, different types of attack face samples are allocated to corresponding expert networks, so that different expert networks process different types of attack face data and each expert network focuses on the data it handles best; the output results of the expert networks are then merged by the gating network. The learning of the gating network is generally adaptive. In this embodiment, the real face samples in the classification samples are allocated to every expert network, and the gating network learns adaptively on them; on attack face samples, however, the output of the gating network is guided by the classification labels of the attack face samples and is not adaptive (for example, for the expert network handling a given type of attack face sample, the gating network is trained to output a corresponding preset weight value). This improves the prediction effect of the model.
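One possible realization of this mixed gate training is sketched below (an assumption-laden sketch: attack samples are taken to carry the index of their designated expert, real-face samples carry -1 and contribute no gate label):

```python
import torch
import torch.nn.functional as F

def gate_loss(gate_weights: torch.Tensor, expert_idx: torch.Tensor) -> torch.Tensor:
    """Supervise the gating network on attack samples only.

    gate_weights: (B, n_experts) softmax outputs of the gating network.
    expert_idx:   (B,) designated expert index for attack samples,
                  or -1 for real-face samples (left adaptive).
    """
    is_attack = expert_idx >= 0
    if not is_attack.any():
        return gate_weights.new_zeros(())  # real faces only: gate learns adaptively
    log_p = torch.log(gate_weights[is_attack] + 1e-8)
    return F.nll_loss(log_p, expert_idx[is_attack])
```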
In other embodiments of the present application, the method for constructing a flash living body detection model further includes:
s301, taking a classified sample as input, taking attention information of the classified sample as output, and constructing a first model based on an attention mechanism;
in S301, the first model takes the classified sample as input, and may obtain attention information of the classified sample from the classified sample based on an attention mechanism. In some embodiments, the attention information may be an attention profile, a first model, using an attention mechanism, that focuses attention on a noise-free portion, generating a multi-frame attention profile, based on the features required to extract the attention profile. In other embodiments, a conventional deep learning network may be constructed as the first model, and other types of networks may be reasonably employed, which is not limited by the present application.
S302, acquiring a plurality of pieces of prediction depth information output after a classification sample is input into a prediction model and a plurality of pieces of attention information output after the classification sample is input into a first model;
in S302, specifically, a multi-frame depth information map obtained by predicting a classification sample through a prediction model and a multi-frame attention map obtained by processing a classification sample through a first model may be obtained, and the number of specific acquisitions may be confirmed according to actual application requirements, for example, a 6-frame depth information map and a 6-frame attention map may be obtained.
S304, fusing the plurality of predicted depth information and the plurality of attention information to obtain fused information;
In S304, specifically, the multi-frame attention maps may be used to weight and fuse the multi-frame depth information maps into a fused information map, which reduces the noise on each map. The fused information map can then serve as the input of the subsequent classification model. In some embodiments, normalizing all attention maps before fusing them with the depth information maps can speed up model training and improve model accuracy.
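The weighted fusion can be sketched as follows (normalizing the attention maps across frames so that the per-pixel weights sum to one is an assumption consistent with the normalization mentioned above):

```python
import torch

def fuse(depth_maps: torch.Tensor, attention_maps: torch.Tensor,
         eps: float = 1e-8) -> torch.Tensor:
    """Fuse multi-frame depth information maps with attention maps.

    depth_maps:     (B, T, 1, H, W) predicted depth information maps.
    attention_maps: (B, T, 1, H, W) raw attention maps.
    Returns one fused information map of shape (B, 1, H, W).
    """
    w = attention_maps / (attention_maps.sum(dim=1, keepdim=True) + eps)
    return (w * depth_maps).sum(dim=1)  # attention-weighted average over frames
```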
The implementation manner of the step of constructing the classification model by taking the prediction depth information output by the prediction model as input and the probability that the prediction depth information belongs to each classification label as output specifically comprises the following steps:
s305, taking fusion information as input, taking probability of the fusion information belonging to each classification label as output, and constructing a classification model;
in S305, the classification model takes the fusion information as input, classifies the fusion information, and outputs probabilities that the fusion information belongs to each classification label. In some embodiments, the classification model may determine whether the face image is a real person or a dummy person according to the fusion information map obtained in S304.
The method comprises the steps of utilizing a classification sample to carry out joint training on a prediction model and a classification model to obtain a flash living body detection model, and specifically comprises the following steps:
s306, performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
In S306, the constructed prediction model, first model and classification model are jointly trained using the classification samples to obtain the flash living body detection model, which may comprise the trained prediction model, first model and classification model. In some embodiments, the prediction model may adopt the training method shown in S201-S202 of the foregoing embodiments, while the first model and the classification model may be trained by a general model training method, for example randomly initializing their parameters and then training automatically.
In this embodiment, considering that the input data of the prediction model may contain color change information of multiple face frames, the multi-frame face depth information maps output by the prediction model are noisy. By constructing a first model and using an attention mechanism, attention is focused on the noise-free parts of the depth information maps to obtain multi-frame attention information maps; the multi-frame attention information maps and depth information maps are fused into a fused information map, and the final classification is completed using the fused information map, thereby improving the classification effect of the model.
In a specific embodiment, the prediction model, the first model and the classification model are jointly trained by using the classification sample, and the obtained flash living body detection model has a specific network structure schematic diagram, and reference may be made to fig. 2.
The acquired images are classified and further processed to obtain multiple frames of Normal Cues, denoted $N_1, N_2, \dots, N_i$.
Formalized representation of the prediction model is as follows:

$$\hat{D}_i = \sum_{j=1}^{n}\left[S_{gate}\left(S_{gen}(N_i)+P_{embd}\right)\right]_j \cdot E_j\left(S_{gen}(N_i)+P_{embd}\right)$$

wherein $S_{gen}$ represents the shared feature extraction module, $P_{embd}$ represents a position vector (used to mark absolute position information in a picture), $E_j$ represents the j-th expert network, $S_{gate}$ represents the gating network (whose output is the fusion weight of each expert network), $N_i$ represents the input data of the i-th classification sample of the prediction model, and $\hat{D}_i$ represents the i-th output of the prediction model, i.e., the depth information map.
It should be noted that some auxiliary labels may be used in the training process: for example, a tool (e.g., the PRNet network) may be used in advance to generate depth maps of real faces to guide the learning of $\hat{D}_i$; the depth map of a dummy is represented by a uniform gray map; and when an attack sample is input, its classification label is used as a label to guide the learning of $S_{gate}$.
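Putting these pieces together, the joint training objective could take a form like the following (a sketch only: the MSE depth loss against PRNet-generated depth maps or gray maps and the loss weights are assumptions):

```python
import torch.nn.functional as F

def joint_loss(pred_depth, target_depth, gate_weights, expert_idx,
               class_logits, class_labels, lam_depth=1.0, lam_gate=0.5):
    """Joint objective over the prediction, gating and classification parts.

    target_depth: PRNet-style depth maps for real faces and uniform gray
                  maps for attack faces (the auxiliary labels noted above).
    """
    l_depth = F.mse_loss(pred_depth, target_depth)
    l_gate = gate_loss(gate_weights, expert_idx)   # gate supervision, sketched earlier
    l_cls = F.cross_entropy(class_logits, class_labels)
    return l_cls + lam_depth * l_depth + lam_gate * l_gate
```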
From the classification samples, the attention information is obtained as follows:

$$A_i = u_{atten}\left(S_{atten}(N_i)\right), \qquad \bar{A}_i = A_i \Big/ \sum_{k} A_k$$

wherein $S_{atten}$ represents the module extracting the features required for the attention maps, $u_{atten}$ generates the multi-frame attention maps, $N_i$ represents the input data of the i-th classification sample of the first model, and $\bar{A}_i$ represents the attention map after normalization over all attention maps.
Formalized representation of the classification model is as follows:

$$Pred = c\left(\sum_{i}\bar{A}_i \odot \hat{D}_i\right)$$

where c represents the classification model and Pred represents the output result of the classification model, i.e., the probability that the input belongs to each classification label.
Wherein, the specific network structure diagram of different models can refer to fig. 3.
As shown in fig. 4, in other embodiments of the present application, there is also provided a flash living body detection method, including:
s401, acquiring face image data to be detected;
in S401, specifically, by means of flash of a mobile phone screen, a multi-frame face photo is collected by using a mobile phone photographing mode, and is used as face image data to be measured. The face image data to be measured can also be directly obtained in other reasonable modes, and the application is not limited to the above.
S402, processing face image data to be detected to obtain input data;
in S402, specifically, color change information of the face image data to be detected may be calculated to obtain input data, where the color change information includes Normal Cues.
S403, inputting the input data into a flash living body detection model to obtain a detection result, wherein the flash living body detection model is obtained by adopting the construction method of the flash living body detection model.
In S403, the flash living body detection model may comprise the trained prediction model and classification model; specifically, the input data is fed into the prediction model to obtain its predicted depth information, and the predicted depth information is then fed into the classification model to obtain the probability that it belongs to each classification label, which serves as the detection result. In other embodiments, the flash living body detection model may comprise the trained prediction model, first model and classification model; specifically, the input data is fed into the prediction model to obtain its predicted depth information; the input data is fed into the first model to obtain its attention information; the attention information and predicted depth information are fused to obtain fused information; and finally the fused information is fed into the classification model to obtain the probability that it belongs to each classification label, which serves as the detection result.
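As an end-to-end illustration of this detection flow (a hedged sketch: the model interfaces follow the sketches given in the construction method above, and thresholding the real-face probability is an assumed decision rule):

```python
import torch

@torch.no_grad()
def detect(frames, prediction_model, first_model, classifier,
           real_label=0, threshold=0.5):
    """Run flash living body detection on preprocessed input data.

    frames: (1, T, C, H, W) Normal-Cue input computed from the captured
    face photos (see the preprocessing step above).
    """
    depth = prediction_model(frames)            # (1, T, 1, H, W) depth maps
    attention = first_model(frames)             # (1, T, 1, H, W) attention maps
    fused = fuse(depth, attention)              # fusion sketched earlier
    probs = classifier(fused).softmax(dim=-1)   # probability per classification label
    is_real = bool(probs[0, real_label] >= threshold)
    return is_real, probs
```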
In this embodiment, the flash living body detection model is obtained by adopting the method for constructing the flash living body detection model according to any one of the above, and this embodiment can improve the detection accuracy of flash living body detection.
As shown in fig. 5, in another embodiment of the present application, there is also provided a system for constructing a flash living body detection model, including:
a first acquiring unit 10, configured to acquire a face image sample;
a processing unit 11, configured to process the face image sample to obtain a training sample;
the classifying unit 12 is configured to classify the training sample according to the class of the training sample, so as to obtain a classification sample, where the classification sample includes an attack face sample and a real face sample that carry classification labels;
a first construction unit 13 for constructing a hybrid expert network-based prediction model with the classification samples as input and prediction depth information of the classification samples as output;
a second construction unit 14, configured to construct a classification model by taking as input prediction depth information output by the prediction model, and taking as output a probability that the prediction depth information belongs to each classification label;
the training unit 15 is configured to perform joint training on the prediction model and the classification model by using the classification sample, so as to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
On the basis of the embodiment, in the system for constructing the flash living body detection model, the prediction model comprises a plurality of expert networks and 1 gating network,
the training unit 15 is further configured to assign different types of attack face samples to a corresponding expert network according to the classification labels of the attack face samples in the classification samples, and train the weights of the gating networks by using the classification labels of the attack face samples;
the training unit 15 is further configured to assign a real face sample in the classification samples to each expert network, and train the weights of the gating networks in an adaptive manner.
On the basis of the foregoing embodiment, the system for constructing a flash living body detection model further includes:
a third construction unit, configured to construct a first model based on an attention mechanism with the classification sample as input and attention information of the classification sample as output;
a second acquisition unit configured to acquire a plurality of pieces of predicted depth information output after the classification sample is input to the prediction model and a plurality of pieces of attention information output after the classification sample is input to the first model;
the fusion unit is used for fusing the plurality of predicted depth information and the plurality of attention information to obtain fusion information;
the second construction unit 14 is specifically configured to, when executing, as input, prediction depth information output by the prediction model, and as output, probabilities that the prediction depth information belongs to each classification label, construct the classification model:
taking fusion information as input, taking probability of the fusion information belonging to each classification label as output, and constructing a classification model;
the training unit 15 is specifically configured to, when performing joint training on the prediction model and the classification model using the classification samples to obtain the flash living body detection model:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The method for constructing the flash living body detection model is characterized by comprising the following steps of:
acquiring a face image sample;
processing the face image sample to obtain a training sample;
classifying the training samples according to the categories of the training samples to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples carrying classification labels;
taking the classified samples as input and the predicted depth information of the classified samples as output to construct a prediction model based on a hybrid expert network;
taking the predicted depth information output by the predicted model as input, and taking the probability that the predicted depth information belongs to each classification label as output to construct a classification model;
performing joint training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
2. The method of claim 1, wherein the predictive model comprises a plurality of expert networks and 1 gating network, and wherein the training the predictive model using the classification samples in the joint training comprises:
distributing different types of the attack face samples to a corresponding expert network according to the classification labels of the attack face samples in the classification samples, and training the weight of the gating network by using the classification labels of the attack face samples;
and distributing the real face samples in the classified samples to each expert network, and training the weight of the gating network in an adaptive mode.
3. The method as recited in claim 1, further comprising:
and taking the classified sample as input, taking the attention information of the classified sample as output, and constructing a first model based on an attention mechanism.
4. A method as recited in claim 3, further comprising:
acquiring a plurality of pieces of prediction depth information output after the classification sample is input into the prediction model and a plurality of pieces of attention information output after the classification sample is input into the first model;
and fusing the plurality of predicted depth information and the plurality of attention information to obtain fused information.
5. The method of claim 4, wherein the constructing a classification model with the predicted depth information output by the prediction model as input and the probability that the predicted depth information belongs to each of the classification labels as output comprises:
and taking the fusion information as input, and taking the probability of the fusion information belonging to each classification label as output to construct a classification model.
6. The method of claim 5, wherein the jointly training the prediction model and the classification model to obtain a flash living body detection model by using the classification sample, specifically comprises:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
7. A flash in vivo detection method, comprising:
acquiring face image data to be detected;
processing the face image data to be detected to obtain input data;
inputting the input data into a flash living body detection model to obtain a detection result;
wherein the flash living body detection model is obtained by the method according to any one of claims 1 to 6.
8. A system for building a flash living body detection model, comprising:
the first acquisition unit is used for acquiring a face image sample;
the processing unit is used for processing the face image sample to obtain a training sample;
the classification unit is used for classifying the training samples according to the categories of the training samples to obtain classification samples, wherein the classification samples comprise attack face samples and real face samples which carry classification labels;
the first construction unit is used for taking the classified samples as input, taking the predicted depth information of the classified samples as output and constructing a prediction model based on a mixed expert network;
the second construction unit is used for taking the prediction depth information output by the prediction model as input, and taking the probability that the prediction depth information belongs to each classification label as output to construct a classification model;
the training unit is used for carrying out combined training on the prediction model and the classification model by utilizing the classification sample to obtain a flash living body detection model;
the classification labels comprise a plurality of classification labels belonging to the attack face sample and 1 classification label belonging to the real face sample.
9. The system of claim 8, wherein the predictive model includes a plurality of expert networks and 1 gating network,
the training unit is further configured to assign different types of the attack face samples to a corresponding one of the expert networks according to the classification labels of the attack face samples in the classification samples, and train weights of the gating networks by using the classification labels of the attack face samples;
the training unit is further configured to assign the real face sample in the classification sample to each of the expert networks, and train the weight of the gating network in an adaptive manner.
10. The system as recited in claim 8, further comprising:
a third construction unit, configured to construct a first model based on an attention mechanism by taking the classification sample as input and attention information of the classification sample as output;
a second acquisition unit configured to acquire a plurality of pieces of predicted depth information output after the classification sample is input to the prediction model and a plurality of pieces of attention information output after the classification sample is input to the first model;
the fusion unit is used for fusing the plurality of predicted depth information and the plurality of attention information to obtain fusion information;
the second construction unit is specifically configured to, when executing, as input, prediction depth information output by the prediction model, and output probability that the prediction depth information belongs to each classification label, construct a classification model:
taking the fusion information as input, and taking the probability of the fusion information belonging to each classification label as output to construct a classification model;
the training unit is specifically configured to, when performing joint training on the prediction model and the classification model by using the classification sample to obtain a flash living body detection model:
and performing joint training on the prediction model, the first model and the classification model by using the classification sample to obtain a flash living body detection model.
CN202310940768.6A 2023-07-28 2023-07-28 Method, system and detection method for constructing flash living body detection model Active CN116798132B (en)

Priority Applications (1)

CN202310940768.6A (priority and filing date 2023-07-28): Method, system and detection method for constructing flash living body detection model

Applications Claiming Priority (1)

CN202310940768.6A (priority and filing date 2023-07-28): Method, system and detection method for constructing flash living body detection model

Publications (2)

CN116798132A, published 2023-09-22
CN116798132B, published 2024-02-27

Family

ID=88043945

Family Applications (1)

CN202310940768.6A (Active, filed 2023-07-28): Method, system and detection method for constructing flash living body detection model

Country Status (1)

CN: CN116798132B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569808A (en) * 2019-09-11 2019-12-13 腾讯科技(深圳)有限公司 Living body detection method and device and computer equipment
CN114333011A (en) * 2021-12-28 2022-04-12 北京的卢深视科技有限公司 Network training method, face recognition method, electronic device and storage medium
US20220189147A1 (en) * 2020-02-13 2022-06-16 Tencent Technology (Shenzhen) Company Limited Object detection model training method and apparatus, object detection method and apparatus, computer device, and storage medium
CN114842267A (en) * 2022-05-23 2022-08-02 南京邮电大学 Image classification method and system based on label noise domain self-adaption
CN115063866A (en) * 2022-06-30 2022-09-16 南京邮电大学 Expression recognition method integrating reinforcement learning and progressive learning
CN115240280A (en) * 2022-03-29 2022-10-25 浙大城市学院 Construction method of human face living body detection classification model, detection classification method and device
CN115880740A (en) * 2021-09-27 2023-03-31 腾讯科技(深圳)有限公司 Face living body detection method and device, computer equipment and storage medium
CN115937993A (en) * 2022-12-14 2023-04-07 北京百度网讯科技有限公司 Living body detection model training method, living body detection device and electronic equipment


Also Published As

CN116798132B (en), published 2024-02-27


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
    Address after: No. 205, Building B1, Huigu Science and Technology Industrial Park, No. 336 Bachelor Road, Bachelor Street, Yuelu District, Changsha City, Hunan Province, 410000; Applicant after: Wisdom Eye Technology Co.,Ltd. (China)
    Address before: Building 14, Phase I, Changsha Zhongdian Software Park, No. 39 Jianshan Road, Changsha High tech Development Zone, Changsha City, Hunan Province, 410205; Applicant before: Wisdom Eye Technology Co.,Ltd. (China)
GR01: Patent grant