
Living body detection method, living body detection device and storage medium

Info

Publication number
CN116959059A
CN116959059A (publication number); CN202310390679.9A (application number)
Authority
CN
China
Prior art keywords
sample
living body
information
body detection
feature
Legal status
Pending
Application number
CN202310390679.9A
Other languages
Chinese (zh)
Inventor
张克越
周千寓
姚太平
尹邦杰
丁守鸿
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority claimed from CN202310390679.9A
Publication of CN116959059A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a living body detection method, a living body detection device and a storage medium. Adaptive sampling is performed on the sample features in a sample data set to obtain base sample features corresponding to each sample category; the base sample features are linearly combined according to weight information sampled from a preset distribution to obtain new style information; feature recombination is then performed to obtain recombined features; and corresponding multi-dimensional loss information is configured for model training, so that living body detection can be performed. Because sample-adaptive feature enhancement is used to construct the recombined features and corresponding whitening loss information is configured, the generalization capability of the model on unseen target domains is enhanced, and the cross-domain invariant features of each sample can be learned without accessing any domain label, thereby improving the generalization of the model and the accuracy of living body detection.

Description

Living body detection method, living body detection device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for living body detection, and a storage medium.
Background
Human face living body detection is a key step in the face recognition process and bears directly on the security of user identity verification. However, because real data is unseen and diverse, making full use of training data to obtain a model with good generalization has become a difficulty in current research.
In general, better generalization capability can be achieved by learning cross-domain invariant features in a common feature space, but this learning process relies on domain labels in the dataset that have been manually pre-assigned.
However, because a face living body data set contains complex variations such as different illumination, different attack types, and different photographing devices, manually defined domain labels cannot accurately, comprehensively, and truthfully reflect the data distribution of a domain, which affects the accuracy of living body detection.
Disclosure of Invention
In view of the above, the present application provides a living body detection method, which can effectively improve the accuracy of living body detection.
The first aspect of the present application provides a method for detecting a living body, which can be applied to a system or a program including a function of detecting a living body in a terminal device, and specifically includes:
acquiring a training sample, and determining a sample data set according to a depth map corresponding to the training sample, wherein the sample data set is configured based on different sample categories;
sampling sample characteristics in the sample data set to obtain base sample characteristics corresponding to each sample category, wherein the base sample characteristics are obtained by concentrating the sample characteristics, the base sample characteristics are used for determining first style information of the sample category, and the first style information is represented by means of mean and variance of the sample characteristics in the sample category;
Sampling each sample category by adopting preset distribution to obtain weight information, and linearly combining the first style information according to the weight information to obtain second style information;
carrying out feature recombination on the sample features in each sample category based on the second style information to obtain recombined features;
configuring multidimensional loss information according to the sample characteristics and the recombination characteristics so as to determine target loss information;
training a preset recognition model based on the target loss information to obtain a living body detection model, and carrying out living body detection on a target object according to the living body detection model.
Optionally, in some possible implementations of the present application, the obtaining a training sample and determining a sample data set according to a depth map corresponding to the training sample includes:
acquiring a living body sample and an attack sample in the training sample;
identifying the living body sample according to a depth estimation network to obtain a sample depth map;
configuring a depth map corresponding to the attack sample as a black base map;
and determining the sample data set according to the sample depth map and the black base map.
Optionally, in some possible implementations of the present application, the identifying the living sample according to the depth estimation network to obtain a sample depth map includes:
determining a living body detection area in the living body sample;
adjusting the living body detection area based on preset expansion parameters to obtain an expansion identification area;
cutting the face image in the enlarged recognition area to obtain a sample to be processed;
and identifying the sample to be processed according to a depth estimation network so as to obtain the sample depth map.
Optionally, in some possible implementations of the present application, the adjusting the living body detection area based on a preset expansion parameter to obtain an expanded identification area includes:
acquiring an adjustment image set configured for the living body sample, the adjustment image set being obtained by adjusting the living body sample based on an enlarged parameter sequence;
determining face integrity information corresponding to images in the adjustment image set;
determining the preset expansion parameters in the expansion parameter sequence based on the face integrity information;
and adjusting the living body detection area based on the preset expansion parameters so as to obtain an expansion identification area.
Optionally, in some possible implementations of the present application, the sampling the sample features in the sample dataset to obtain the base sample features corresponding to the sample categories includes:
determining the feature to be processed corresponding to the training sample in the sample data set;
splitting the feature to be processed evenly in the channel dimension to obtain a first input feature and a second input feature;
inputting the first input feature into a static convolution branch to perform convolution calculation based on a static convolution kernel to obtain a static feature;
inputting the second input feature into a dynamic convolution branch, and carrying out average pooling on the second input feature to obtain pooled features;
inputting the pooled features into a dynamic convolution network so as to match the dynamic convolution parameters of the training samples;
configuring a dynamic convolution kernel based on the dynamic convolution parameters, and adopting the dynamic convolution kernel to carry out convolution calculation on the second input characteristic to obtain a dynamic characteristic;
splicing the static features and the dynamic features to obtain the sample features;
and sampling the sample characteristics to obtain the base sample characteristics corresponding to each sample category.
Optionally, in some possible implementations of the present application, the method further includes:
and if the feature to be processed cannot be split evenly in the channel dimension, copying the feature to be processed to obtain the first input feature and the second input feature.
Optionally, in some possible implementations of the present application, the sampling the sample feature to obtain a base sample feature corresponding to each sample class includes:
respectively executing a furthest point sampling algorithm on the sample characteristics to obtain a mean value and a variance corresponding to the sample characteristics;
storing the mean and the variance corresponding to the sample characteristics in a memory pool to obtain first style information corresponding to the sample characteristics in different sample categories;
and configuring and obtaining the base sample characteristics based on the first style information.
Optionally, in some possible implementations of the present application, the inputting the pooled feature into a dynamic convolution network so as to match a dynamic convolution parameter of a training sample includes:
extracting low-frequency information in the pooling feature;
the low frequency information is input to the dynamic convolution network so as to match the dynamic convolution parameters of training samples.
Optionally, in some possible implementations of the present application, the configuring the multidimensional loss information according to the sample feature and the reorganization feature to determine target loss information includes:
inputting the sample characteristics into a normalization layer to obtain a first covariance matrix corresponding to the sample characteristics;
inputting the recombined characteristics into the normalization layer to obtain a second covariance matrix corresponding to the recombined characteristics;
calculating variance information based on the first covariance matrix and the second covariance matrix;
traversing variance information corresponding to sample features in the sample data set to obtain a variance matrix;
sorting according to the values of the elements in the variance matrix to obtain salient elements;
suppressing the salient elements to obtain the whitening loss information;
configuring the classification loss information according to the sample characteristics and sample categories corresponding to the recombination characteristics;
configuring the depth loss information according to the sample characteristics and the depth map corresponding to the reorganization characteristics;
the target loss information is configured based on the whitening loss information, the classification loss information, and the depth loss information.
Optionally, in some possible implementations of the present application, the configuring to obtain the target loss information based on the whitening loss information, the classification loss information, and the depth loss information includes:
acquiring weighting parameters configured for the training samples;
weighting the depth loss information based on the weighting parameters to obtain weighted depth information;
the target loss information is configured based on the whitening loss information, the classification loss information, and the weighted depth information.
Optionally, in some possible implementations of the present application, training the preset recognition model based on the target loss information to obtain a living body detection model, so as to perform living body detection on the target object according to the living body detection model, including:
training a preset recognition model based on the target loss information to obtain the living body detection model, and configuring the living body detection model in a back-end server;
responding to the triggering operation, and acquiring a face image of the target object acquired by the image acquisition equipment;
and transmitting the face image to the back-end server so as to identify the face image based on the living body detection model to obtain a living body detection result.
Optionally, in some possible implementations of the present application, the living body detection model is encapsulated in a front-end device, and the method further includes:
triggering the face image of the target object acquired by the image acquisition equipment in response to the initiation of the admission request acquired by the sensing module;
and inputting the face image into the front-end equipment to identify the face image based on the living body detection model so as to obtain a living body detection result.
A second aspect of the present application provides a living body detection apparatus including: the acquisition unit is used for acquiring training samples, determining a sample data set according to a depth map corresponding to the training samples, and configuring the sample data set based on different sample categories;
the processing unit is used for sampling sample characteristics in the sample data set to obtain base sample characteristics corresponding to each sample category, the base sample characteristics are obtained by concentrating the sample characteristics, the base sample characteristics are used for determining first style information of the sample category, and the first style information is represented by means of a mean value and a variance of the sample characteristics in the sample category;
the processing unit is further used for sampling each sample category by adopting preset distribution to obtain weight information, and linearly combining the first style information according to the weight information to obtain second style information;
The processing unit is further configured to perform feature recombination on the sample features in each sample category based on the second style information, so as to obtain recombined features;
the processing unit is further used for configuring multi-dimensional loss information according to the sample characteristics and the recombination characteristics so as to determine target loss information;
and the detection unit is used for training a preset identification model based on the target loss information to obtain a living body detection model so as to carry out living body detection on the target object according to the living body detection model.
Optionally, in some possible implementations of the present application, the acquiring unit is specifically configured to acquire a living sample and an attack sample in the training sample;
the acquisition unit is specifically configured to identify the living sample according to a depth estimation network, so as to obtain a sample depth map;
the obtaining unit is specifically configured to configure a depth map corresponding to the attack sample as a black background map;
the acquisition unit is specifically configured to determine the sample data set according to the sample depth map and the black base map.
Optionally, in some possible implementations of the present application, the acquiring unit is specifically configured to determine a living body detection area in the living body sample;
The acquisition unit is specifically configured to adjust the living body detection area based on a preset expansion parameter, so as to obtain an expansion identification area;
the acquisition unit is specifically configured to cut the face image in the enlarged recognition area to obtain a sample to be processed;
the acquisition unit is specifically configured to identify the sample to be processed according to a depth estimation network, so as to obtain the sample depth map.
Optionally, in some possible implementations of the present application, the acquiring unit is specifically configured to acquire an adjustment image set configured for the living sample, where the adjustment image set is obtained by adjusting the living sample based on an expansion parameter sequence;
the acquiring unit is specifically configured to determine face integrity information corresponding to an image in the adjustment image set;
the acquiring unit is specifically configured to determine the preset expansion parameter in the expansion parameter sequence based on the face integrity information;
the acquiring unit is specifically configured to adjust the living body detection area based on the preset expansion parameter, so as to obtain an expansion identification area.
Optionally, in some possible implementations of the present application, the processing unit is specifically configured to determine a feature to be processed corresponding to a training sample in the sample dataset;
The processing unit is specifically configured to split the feature to be processed in the channel dimension on average, so as to obtain a first input feature and a second input feature;
the processing unit is specifically configured to input the first input feature into a static convolution branch, so as to perform convolution calculation based on a static convolution kernel to obtain a static feature;
the processing unit is specifically configured to input the second input feature into a dynamic convolution branch, and perform average pooling on the second input feature to obtain a pooled feature;
the processing unit is specifically configured to input the pooled feature into a dynamic convolution network, so as to match the dynamic convolution parameter of the training sample;
the processing unit is specifically configured to configure a dynamic convolution kernel based on the dynamic convolution parameter, and perform convolution calculation on the second input feature by adopting the dynamic convolution kernel to obtain a dynamic feature;
the processing unit is specifically configured to splice the static feature and the dynamic feature to obtain the sample feature;
the processing unit is specifically configured to sample the sample features to obtain base sample features corresponding to the sample categories.
Optionally, in some possible implementation manners of the present application, the processing unit is specifically configured to replicate the feature to be processed to obtain the first input feature and the second input feature if the feature to be processed cannot be split equally in a channel dimension.
Optionally, in some possible implementations of the present application, the processing unit is specifically configured to execute a furthest point sampling algorithm on the sample features respectively, so as to obtain a mean value and a variance corresponding to the sample features;
the processing unit is specifically configured to store the mean and the variance corresponding to the sample feature in the memory pool, so as to obtain first style information corresponding to the sample feature in different sample categories;
the processing unit is specifically configured to obtain the base sample feature based on the first style information configuration.
Optionally, in some possible implementations of the present application, the processing unit is specifically configured to extract low-frequency information in the pooling feature;
the processing unit is specifically configured to input the low-frequency information into the dynamic convolution network so as to match the dynamic convolution parameters of the training samples.
Optionally, in some possible implementations of the present application, the processing unit is specifically configured to input the sample feature into a normalization layer to obtain a first covariance matrix corresponding to the sample feature;
the processing unit is specifically configured to input the reorganization feature into the normalization layer, so as to obtain a second covariance matrix corresponding to the reorganization feature;
The processing unit is specifically configured to calculate variance information based on the first covariance matrix and the second covariance matrix;
the processing unit is specifically configured to traverse variance information corresponding to sample features in the sample dataset to obtain a variance matrix;
the processing unit is specifically configured to sort according to the values of the elements in the variance matrix, so as to obtain salient elements;
the processing unit is specifically configured to suppress the salient element to obtain the whitening loss information;
the processing unit is specifically configured to configure the classification loss information according to the sample characteristics and sample types corresponding to the reorganization characteristics;
the processing unit is specifically configured to configure the depth loss information according to the sample feature and a depth map corresponding to the reorganization feature;
the processing unit is specifically configured to obtain the target loss information based on the whitening loss information, the classification loss information, and the depth loss information.
Optionally, in some possible implementations of the present application, the processing unit is specifically configured to obtain a weighting parameter configured for the training sample;
The processing unit is specifically configured to weight the depth loss information based on the weighting parameter to obtain weighted depth information;
the processing unit is specifically configured to obtain the target loss information based on the whitening loss information, the classification loss information, and the weighted depth information.
Optionally, in some possible implementations of the present application, the detection unit is specifically configured to train a preset recognition model based on the target loss information to obtain the living body detection model, and configure the living body detection model in a back-end server;
the detection unit is specifically used for responding to the triggering operation and acquiring the face image of the target object acquired by the image acquisition equipment;
the detection unit is specifically configured to transmit the face image to the backend server, so as to identify the face image based on the living body detection model, and obtain a living body detection result.
Optionally, in some possible implementations of the present application, the detection unit is specifically configured to trigger a face image of the target object acquired by the image acquisition device in response to the initiation of the admission request acquired by the sensing module;
The detection unit is specifically configured to input the face image into the front-end device, so as to identify the face image based on the living body detection model, and obtain a living body detection result.
A third aspect of the present application provides a computer apparatus comprising: a memory, a processor, and a bus system; the memory is used for storing program codes; the processor is configured to perform the method of living body detection according to the first aspect or any one of the first aspects described above according to instructions in the program code.
A fourth aspect of the application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of living body detection of the first aspect or any one of the first aspects described above.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from a computer-readable storage medium by a processor of a computer device, which executes the computer instructions, causing the computer device to perform the method of in-vivo detection provided in the first aspect or various alternative implementations of the first aspect described above.
From the above technical solutions, the embodiment of the present application has the following advantages:
a training sample is acquired, and a sample data set is determined according to the depth map corresponding to the training sample, where the sample data set is configured based on different sample categories; the sample features in the sample data set are sampled to obtain the base sample features corresponding to each sample category, where the base sample features are used to indicate the first style information of the sample category, and the first style information is determined based on the mean and variance of the sample features in the sample category; each sample category is further sampled with a preset distribution to obtain weight information, and the first style information is linearly combined according to the weight information to obtain second style information; feature recombination is performed on the sample features in each sample category based on the second style information to obtain recombined features; whitening loss information, classification loss information and depth loss information are then configured according to the sample features and the recombined features to determine target loss information; and a preset recognition model is trained based on the target loss information to obtain a living body detection model, so that living body detection can be performed on a target object according to the living body detection model. Because sample-adaptive feature enhancement is used to construct the recombined features, style-diversified sample pairs are obtained, and because corresponding whitening loss information is configured, the generalization capability of the model on unseen target domains is enhanced; the cross-domain invariant features of each sample can be learned without accessing any domain label, which improves the generalization of the model and the accuracy of living body detection.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a network architecture in which the living body detection system operates;
FIG. 2 is a schematic diagram of a living body detection flow framework according to an embodiment of the present application;
FIG. 3 is a flowchart of a living body detection method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a scenario of a living body detection method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another scenario of a living body detection method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a model structure of a living body detection method according to an embodiment of the present application;
FIG. 7 is a flowchart of another living body detection method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a living body detection apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a living body detection method and a related apparatus, which can be applied to a system or a program with a living body detection function in a terminal device. A training sample is acquired, and a sample data set is determined according to the depth map corresponding to the training sample, where the sample data set is configured based on different sample categories; the sample features in the sample data set are sampled to obtain the base sample features corresponding to each sample category, where the base sample features are used to indicate the first style information of the sample category, and the first style information is determined based on the mean and variance of the sample features in the sample category; each sample category is further sampled with a preset distribution to obtain weight information, and the first style information is linearly combined according to the weight information to obtain second style information; feature recombination is performed on the sample features in each sample category based on the second style information to obtain recombined features; whitening loss information, classification loss information and depth loss information are then configured according to the sample features and the recombined features to determine target loss information; and a preset recognition model is trained based on the target loss information to obtain a living body detection model, so that living body detection can be performed on a target object according to the living body detection model. Because sample-adaptive feature enhancement is used to construct the recombined features, style-diversified sample pairs are obtained, and because corresponding whitening loss information is configured, the generalization capability of the model on unseen target domains is enhanced; the cross-domain invariant features of each sample can be learned without accessing any domain label, which improves the generalization of the model and the accuracy of living body detection.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "includes" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that the method for detecting a living body provided by the present application may be applied to a system or a program including a function of detecting a living body in a terminal device, for example, security application, specifically, the system for detecting a living body may operate in a network architecture as shown in fig. 1, which is a network architecture diagram in which the system for detecting a living body operates, as shown in fig. 1, the system for detecting a living body may provide a procedure of detecting a living body with a plurality of information sources, that is, send a corresponding living body image to a server through a triggering operation on a terminal side, so that the server returns a corresponding detection result; it will be appreciated that various terminal devices are shown in fig. 1, the terminal devices may be computer devices, in an actual scenario, there may be more or less terminal devices participating in the living body detection process, and the specific number and types are not limited herein, and in addition, one server is shown in fig. 1, but in an actual scenario, there may also be participation of multiple servers, and the specific number of servers is determined by the actual scenario.
In this embodiment, the server may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like. The terminals and servers may be directly or indirectly connected by wired or wireless communication, and the terminals and servers may be connected to form a blockchain network, which is not limited herein.
It will be appreciated that the above-described system for in-vivo detection may be operated on a personal mobile terminal, for example: the security application can be used as an application which can also be run on a server, and can also be used as a processing result which can be run on third-party equipment to provide living body detection so as to obtain living body detection of an information source; the specific living body detection system may be in a program form, may also be operated as a system component in the device, and may also be used as a cloud service program, where the specific operation mode is determined according to the actual scenario and is not limited herein.
Human face living body detection is a key step in the face recognition process and bears directly on the security of user identity verification. However, because real data is unseen and diverse, making full use of training data to obtain a model with good generalization has become a difficulty in current research.
In general, better generalization capability can be achieved by learning cross-domain invariant features in a common feature space, but this learning process relies on domain labels in the dataset that have been manually pre-assigned.
However, because a face living body data set contains complex variations such as different illumination, different attack types, and different photographing devices, manually defined domain labels cannot accurately, comprehensively, and truthfully reflect the data distribution of a domain, which affects the accuracy of living body detection.
In order to solve the above problems, the present application proposes a living body detection method, which is applied to the living body detection flow framework shown in FIG. 2. As shown in FIG. 2, which is a flow framework diagram of living body detection provided by an embodiment of the present application, a user triggers the living body detection process through interaction with a terminal; the server performs convolution of sample features and adaptive sample enhancement to learn the bilateral whitening loss, thereby obtaining a living body detection model; the living body detection process sent by the terminal is then executed according to the trained living body detection model, and the corresponding detection result is returned.
In this embodiment, domain generalization learning no longer depends on domain labels; instead, dynamic domain-generalized face living body detection is performed adaptively for each sample, style-diversified sample pairs are constructed through sample-adaptive feature enhancement, and the generalization capability is further improved through an explicit sample-adaptive whitening loss function.
It can be understood that the method provided by the present application may be implemented as a program, serving as processing logic in a hardware system, or as a living body detection apparatus whose processing logic is realized in an integrated or external manner. As one implementation, the living body detection apparatus acquires a training sample and determines a sample data set according to the depth map corresponding to the training sample, where the sample data set is configured based on different sample categories; samples the sample features in the sample data set to obtain the base sample features corresponding to each sample category, where the base sample features are used to indicate the first style information of the sample category and the first style information is determined based on the mean and variance of the sample features in the sample category; further samples each sample category with a preset distribution to obtain weight information, and linearly combines the first style information according to the weight information to obtain second style information; performs feature recombination on the sample features in each sample category based on the second style information to obtain recombined features; then configures whitening loss information, classification loss information and depth loss information according to the sample features and the recombined features to determine target loss information; and trains a preset recognition model based on the target loss information to obtain a living body detection model, so as to perform living body detection on a target object according to the living body detection model. Because sample-adaptive feature enhancement is used to construct the recombined features, style-diversified sample pairs are obtained, and because corresponding whitening loss information is configured, the generalization capability of the model on unseen target domains is enhanced; the cross-domain invariant features of each sample can be learned without accessing any domain label, which improves the generalization of the model and the accuracy of living body detection.
The scheme provided by the embodiment of the application relates to an artificial intelligence computer vision technology, and is specifically described by the following embodiments:
With reference to the above flow framework, the living body detection method of the present application will be described below in conjunction with FIG. 3. FIG. 3 is a flowchart of a living body detection method provided by an embodiment of the present application, where the method may be performed by the living body detection apparatus; the embodiment of the present application includes at least the following steps:
301. Acquire a training sample, and determine a sample data set according to a depth map corresponding to the training sample.
In this embodiment, the sample data set is configured based on different sample categories, which include a living body type, i.e., real-person samples, and an attack type, i.e., non-living samples or other samples that are not real persons.
Specifically, for the construction of the sample data set, depth maps are configured, and the depth maps corresponding to different sample categories differ. First, a living body sample and an attack sample in the training samples are acquired; then, the living body sample is identified according to a depth estimation network to obtain a sample depth map, while the depth map corresponding to the attack sample is configured as a black base map; that is, a depth map is computed for each real-person picture, and a black base map of the same size is used as the depth map of each attack picture. The sample data set is then determined from the sample depth maps and the black base maps.
It can be understood that, when constructing the sample data set used for training, image cropping is performed on positive samples (real persons) and negative samples (attacks) respectively, but the positive and negative samples in the sample data set do not need to be matched one by one, because no domain-label configuration is required in this embodiment and feature enhancement is performed at the sample level.
In one possible scenario, the living body sample may be preprocessed before the sample depth map is determined, because the face region annotated for face recognition may cover only the facial-feature region, and face elements may be missed in scenes with side faces or strong illumination. The preprocessing may follow the process shown in FIG. 4, which is a schematic diagram of a scenario of a living body detection method provided by an embodiment of the present application. First, the living body detection area A1 in the living body sample is determined; the living body detection area is then adjusted based on a preset expansion parameter to obtain an expanded recognition area; the face image in the expanded recognition area is cropped to obtain a sample to be processed; and the sample to be processed is identified by the depth estimation network to obtain the sample depth map. For example, after the user image is obtained, the region where the user's face is located is first framed by a face detection technique and enlarged by a factor of 1.8 (the preset expansion parameter) around its center to include more background content and improve the integrity of the face, and the enlarged region is cropped out. The cropped real-person face is then passed through the depth estimation network to obtain the depth map corresponding to the face; the real-person picture shown in FIG. 4 has a depth map, while the depth map corresponding to an attack picture is a black base map.
In addition, to determine the preset expansion parameter, i.e., to improve the integrity of the face elements, an adjustment image set configured for the living body sample may first be obtained, where the adjustment image set is obtained by adjusting the living body sample based on an expansion parameter sequence, for example, magnification factors from 1.5 to 2.5; the face integrity information corresponding to the images in the adjustment image set is then determined, i.e., whether all feature elements of the face are included; the preset expansion parameter is determined in the expansion parameter sequence based on the face integrity information; and the living body detection area is then adjusted based on the preset expansion parameter to obtain the expanded recognition area, thereby improving the accuracy of the expanded recognition area and avoiding the omission of face elements. A sketch of this preprocessing is given below.
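For illustration only, the expanded crop and depth-label construction described above might be sketched as follows in Python (OpenCV-based; the crop size, depth-map size, helper names, and the `depth_net` placeholder for the depth estimation network are assumptions, not the patent's implementation):

```python
import cv2
import numpy as np

def build_training_pair(image_bgr: np.ndarray, face_box, expand: float = 1.8,
                        depth_net=None):
    """Build one training sample: expanded face crop plus depth label (sketch).
    `face_box` is (x, y, w, h) from a face detector."""
    x, y, w, h = face_box
    cx, cy = x + w / 2, y + h / 2
    ew, eh = w * expand / 2, h * expand / 2            # expand around the center
    x0, y0 = max(0, int(cx - ew)), max(0, int(cy - eh))
    x1 = min(image_bgr.shape[1], int(cx + ew))
    y1 = min(image_bgr.shape[0], int(cy + eh))
    crop = cv2.resize(image_bgr[y0:y1, x0:x1], (256, 256))
    if depth_net is not None:                          # living sample: estimated depth
        return crop, depth_net(crop)
    return crop, np.zeros((32, 32), dtype=np.float32)  # attack sample: black base map
```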
302. Sample the sample features in the sample data set to obtain the base sample features corresponding to each sample category.
In this embodiment, the base sample features are obtained by sampling the sample features; that is, the base sample features can be understood as condensed sample features, for example, 10,000 sample features condensed into 1,000 base sample features. Further, the base sample features are used to determine the first style information of the sample category, and the first style information is represented by the mean and variance of the sample features in the sample category; that is, the style information reflects the style of a sample, and the style of a sample indicates its texture distribution, which is a global texture feature.
In one possible scenario, before the style information is enhanced, the sample features in the sample data set may be determined using an adaptive dynamic convolution kernel. That is, the features to be processed corresponding to the training samples in the sample data set are first determined; the features to be processed are then split evenly in the channel dimension to obtain a first input feature and a second input feature; the first input feature is input into a static convolution branch, where convolution is computed with a static convolution kernel to obtain a static feature; the second input feature is input into a dynamic convolution branch and average-pooled to obtain a pooled feature; the pooled feature is then input into a dynamic convolution network to match the dynamic convolution parameters of the training sample; a dynamic convolution kernel is configured based on the dynamic convolution parameters, and the dynamic convolution kernel is used to perform convolution on the second input feature to obtain a dynamic feature; the static feature and the dynamic feature are spliced to obtain the sample feature; and the sample features are further sampled to obtain the base sample features corresponding to each sample category.
Specifically, regarding the above configuration of the dynamic convolution kernel: considering the diversity of samples across multiple source domains, it is difficult for a single static filter to extract sample-specific features for each sample. Therefore, a sample-adaptive dynamic convolution kernel generator is designed to automatically generate dynamic filters, helping the static filter learn overall sample-specific features and further improve generalization. The static convolution branch and the dynamic convolution branch by which the dynamic convolution kernel generator extracts features can be written as:

$$F_i^{s} = W_s \ast X_i^{(1)}, \qquad F_i^{d} = W_i^{d} \ast X_i^{(2)}$$

where $W_s$ is the static filter (static convolution kernel) and $W_i^{d}$ is the adaptive dynamic filter (dynamic convolution kernel) for sample $X_i$.
It will be appreciated that, given an input feature $X_i$ split evenly along the channel dimension, the first half $X_i^{(1)}$ is passed through the static filter to generate a common feature, while the second half $X_i^{(2)}$ is average-pooled and used to generate the dynamic convolution kernel $W_i^{d}$. The formula of this process is as follows:

$$W_i^{d} = g\bigl(\mathrm{AvgPool}(X_i^{(2)})\bigr)$$

where g denotes the dynamic convolution kernel generator network; in the branch formulas above, the former term is the static common feature generated from the first half of the channels, and the latter term is the dynamic convolution feature generated from the second half.
Further, the output feature $F_i$ of the module is the sample feature obtained by splicing the static feature and the dynamic feature, i.e., the concatenation of the two branch output features in the channel dimension:

$$F_i = \mathrm{concatenate}\bigl(F_i^{s},\, F_i^{d}\bigr)$$

where concatenate denotes the feature stitching operation in the channel dimension.
In one possible scenario, if the feature to be processed cannot be split equally in the channel dimension (for example, the number of channels is 3), the feature to be processed is copied to obtain the first input feature and the second input feature, so that the situation that the two branches cannot be aligned is avoided.
In the above embodiment, the average pooling layer is used to reduce the size of the feature map, which reduces the amount of computation and the required GPU memory. Furthermore, the dynamic convolution can be performed using only the low-frequency information in the features, further reducing the amount of computation, since the low-frequency information can represent most of the feature content: the low-frequency information in the pooled feature is extracted first, and then input into the dynamic convolution network to match the dynamic convolution parameters of the training sample, thereby reducing the data throughput of the convolution process. A sketch of the whole module follows.
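As an illustrative PyTorch sketch of the dynamic kernel generator described above (module and parameter names are assumptions; channels are assumed even, with the replication fallback for odd counts omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicKernelGenerator(nn.Module):
    """Sketch of the sample-adaptive dynamic convolution described above."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.half = channels // 2
        self.k = kernel_size
        # Static branch: one shared, sample-agnostic filter.
        self.static_conv = nn.Conv2d(self.half, self.half, kernel_size,
                                     padding=kernel_size // 2)
        # Dynamic branch: pooled (low-frequency) features -> per-sample kernel weights.
        self.pool = nn.AdaptiveAvgPool2d(kernel_size)
        self.kernel_head = nn.Conv2d(self.half, self.half * kernel_size ** 2,
                                     kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        x1, x2 = x[:, :self.half], x[:, self.half:]    # even split on channels
        f_static = self.static_conv(x1)                # static common feature
        # Generate one depthwise kernel per sample from the pooled second half.
        weight = self.kernel_head(self.pool(x2))       # (B, half*k*k, 1, 1)
        weight = weight.view(b * self.half, 1, self.k, self.k)
        # Grouped convolution applies each sample's own kernel to its own channels.
        f_dynamic = F.conv2d(x2.reshape(1, b * self.half, h, w), weight,
                             padding=self.k // 2, groups=b * self.half)
        f_dynamic = f_dynamic.view(b, self.half, h, w)
        return torch.cat([f_static, f_dynamic], dim=1)  # channel-wise stitching
```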
As can be appreciated, building on the diversified sample-specific features output by the dynamic convolution kernel generator, this embodiment proposes a category-based style reorganization module to further construct style-diversified sample pairs in the feature space for sample-level feature whitening. Existing domain-generalized face living body detection methods use an adaptive instance normalization layer to perform style enhancement; however, that feature enhancement method only randomly swaps or mixes the source styles of different samples, without considering the frequency of different source-domain styles or the class information of the source samples.
Therefore, the category-based style reorganization module of the embodiment fully utilizes the source domain styles of each sample, including high-frequency styles and rare styles, so as to improve the diversity of style enhancement. In addition, the embodiment introduces the category information into the style reorganization module so as to avoid negative influence caused by exchanging the style information among different categories.
Specifically, for the determination process of the base sample feature, the furthest point sampling algorithm may be firstly executed on the sample feature respectively to obtain the mean and variance corresponding to the sample feature; then, the mean and the variance corresponding to the sample characteristics are stored in a memory pool to obtain first style information corresponding to the sample characteristics in different sample categories; and obtaining the base sample characteristics based on the first style information configuration. I.e. the furthest point sampling algorithm is performed across the entire dataset, L mutually dissimilar base sample features are obtained for each class. The style information of the base sample features is stored in a memory pool for dynamic update, i.e. the memory pool (style base) mainly stores statistical information of the mean and variance of the base sample features:
$$\mu(F) = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} F_{h,w}, \qquad \sigma^{2}(F) = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(F_{h,w} - \mu(F)\bigr)^{2}$$

where the former term represents the mean of the features and the latter term represents the variance of the features; F is the input feature, and H and W are the height and width of the feature map.
It can be understood that the furthest point sampling algorithm could also be replaced by other sampling algorithms such as random sampling; the advantage of the furthest point sampling algorithm is its comprehensive feature coverage, as sketched below.
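For illustration, a minimal sketch of farthest point sampling over per-class feature vectors might look as follows (the function name, Euclidean metric, and random starting point are assumptions; the patent only specifies that L mutually dissimilar base samples are selected per class):

```python
import torch

def farthest_point_sampling(feats: torch.Tensor, num_base: int) -> torch.Tensor:
    """Select `num_base` mutually dissimilar base features from (N, D) class features."""
    n = feats.size(0)
    chosen = [int(torch.randint(n, (1,)))]            # arbitrary starting sample
    dist = torch.full((n,), float("inf"))
    for _ in range(num_base - 1):
        # Distance of every point to its nearest already-chosen base sample.
        dist = torch.minimum(dist, (feats - feats[chosen[-1]]).norm(dim=1))
        chosen.append(int(dist.argmax()))             # farthest from all chosen
    return feats[torch.tensor(chosen)]
```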
303. Sample each sample category with a preset distribution to obtain weight information, and linearly combine the first style information according to the weight information to obtain second style information.
In this embodiment, each sample category is sampled with a preset distribution to obtain weight information, i.e., to aggregate the base sample features described above. That is, for each sample category c, a weight $W^{c}$ used to aggregate the base sample features is sampled from a Dirichlet distribution, and this weight $W^{c}$ is used to linearly combine the set of base sample styles to obtain the mean $\mu_{aug}^{c}$ and variance $\sigma_{aug}^{c}$ of the new style:

$$\mu_{aug}^{c} = \sum_{l=1}^{L} w_{l}^{c}\,\mu_{l}^{c}, \qquad \sigma_{aug}^{c} = \sum_{l=1}^{L} w_{l}^{c}\,\sigma_{l}^{c}, \qquad W^{c} = (w_{1}^{c},\dots,w_{L}^{c}) \sim \mathrm{Dir}(\alpha)$$
It can be understood that the preset distribution may be a uniform distribution, a Dirichlet distribution, or another high-dimensional continuous probability distribution; Dirichlet sampling has been verified to produce more diverse styles and therefore works better, and the specific distribution is determined by the actual scenario. A sketch of this assembly step follows.
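As a toy illustration of the Dirichlet-weighted combination above (not the patent's code; the shapes and the concentration parameter `alpha` are assumptions):

```python
import torch

def assemble_new_style(base_mu: torch.Tensor, base_sigma: torch.Tensor,
                       alpha: float = 0.1):
    """Combine L base styles of one class with Dirichlet-sampled weights (sketch).
    base_mu / base_sigma: (L, C) statistics from the memory pool."""
    l = base_mu.size(0)
    w = torch.distributions.Dirichlet(torch.full((l,), alpha)).sample()  # (L,)
    mu_aug = (w[:, None] * base_mu).sum(dim=0)        # new style mean, (C,)
    sigma_aug = (w[:, None] * base_sigma).sum(dim=0)  # new style deviation, (C,)
    return mu_aug, sigma_aug
```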
304. Perform feature recombination on the sample features in each sample category based on the second style information to obtain recombined features.
In this embodiment, the feature recombination process performs feature recombination on the original sample feature $F_{org}$ based on the newly generated second style information $(\mu_{aug}, \sigma_{aug})$, and the recombined feature is denoted $F_{aug}$:

$$F_{aug} = \sigma_{aug} \cdot \frac{F_{org} - \mu(F_{org})}{\sigma(F_{org})} + \mu_{aug}$$
It can be understood that the mean and variance in the second style information should take the category information into account, so that feature enhancement is performed within the same category:

$$F_{aug}^{c} = \sigma_{aug}^{c} \cdot \frac{F_{org}^{c} - \mu(F_{org}^{c})}{\sigma(F_{org}^{c})} + \mu_{aug}^{c}, \qquad c \in \{r, s\}$$

where c represents the category, and r and s represent the real-person class and the attack class respectively; that is, feature enhancement is performed between samples of the same category, avoiding the negative influence caused by exchanging style information between different categories.
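A minimal sketch of this category-wise feature recombination, assuming the AdaIN-style formulation above and per-channel statistics (an illustrative interpretation, not the patent's code):

```python
import torch

def recombine_features(f_org: torch.Tensor, mu_aug: torch.Tensor,
                       sigma_aug: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """AdaIN-style recombination of (B, C, H, W) features within one class (sketch)."""
    mu = f_org.mean(dim=(2, 3), keepdim=True)           # per-sample channel mean
    sigma = f_org.std(dim=(2, 3), keepdim=True) + eps   # per-sample channel deviation
    normed = (f_org - mu) / sigma                       # strip the original style
    # Dress the normalized content in the newly assembled style.
    return sigma_aug.view(1, -1, 1, 1) * normed + mu_aug.view(1, -1, 1, 1)
```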
305. And configuring the multi-dimensional loss information according to the sample characteristics and the recombination characteristics so as to determine target loss information.
In this embodiment, the multi-dimensional loss information includes whitening loss information, classification loss information, and depth loss information; since the purpose of feature whitening is to remove redundant information from the input data, the corresponding whitening loss information is configured to perform a bilateral constraint based on the sample features and the reorganized features.
Specifically, for the whitening loss information, namely the bilateral instance-adaptive whitening loss, the covariance matrix of the feature map carrying the stored style information is considered, so the bilateral instance-adaptive whitening loss is introduced as an explicit constraint to adaptively eliminate style-sensitive features. First, the sample features are input into a normalization layer to obtain a first covariance matrix corresponding to the sample features; the reorganized features are then input into the normalization layer to obtain a second covariance matrix corresponding to the reorganized features. That is, the feature map $X$ is sent to an instance normalization layer to obtain the normalized feature $X_n \in \mathbb{R}^{C \times HW}$, whose covariance matrix $\Sigma_n$ is as follows:

$$\Sigma_n = \frac{1}{HW - 1}\, X_n X_n^{\top}$$
further, variance information is calculated based on the first covariance matrix and the second covariance matrix The method comprises the steps of carrying out a first treatment on the surface of the And traversing the variance information corresponding to the sample characteristics in the sample data set to obtain a variance matrix. I.e. the covariance sigma org of the original sample feature and the covariance difference sigma aug of the reconstructed feature (i.e. the enhancement feature) are calculated to obtain their variance sigma 2 The calculation process is as follows:
/>
then, all samples are traversed to obtain a variance matrix diagram V for which the selectivity mask M is derived.
In order to select style-sensitive features, the elements of the variance matrix can be sorted by value to obtain the salient elements; the salient elements are then suppressed to obtain the whitening loss information. Specifically, the element values of the variance matrix V are sorted from large to small, the top k% of elements are extracted, the top-ranked positions are set to 1, and the remaining positions are set to 0:

$$M_{ij} = \begin{cases} 1, & V_{ij} \in \text{top-}k\% \\ 0, & \text{otherwise} \end{cases}$$
in one possible scenario, the value of k is between 0.3% and 0.6% determined by the effect of the test on the data validation set, the specific value being dependent on the actual scenario.
Then, based on the above mask M containing the salient elements, the style-sensitive element values in the feature covariances ($\Sigma_{org}$, $\Sigma_{aug}$), i.e. the positions where M = 1, are selectively suppressed, meaning that these element values are driven toward 0 so that style-insensitive features can be learned; the formula can be expressed as:

$$\mathcal{L}_{BIAW} = \left\| \Sigma_{org} \odot M \right\|_{1} + \left\| \Sigma_{aug} \odot M \right\|_{1}$$
It will be appreciated that this loss is bilateral because a unilateral whitening loss cannot guarantee that the style-insensitive elements of the feature covariance remain insensitive after the style transformation.
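Putting the steps of this loss together, a minimal PyTorch sketch might look as follows; the fraction k and the reduction (mean of absolute values) are assumptions of the example.

```python
import torch
import torch.nn.functional as F

def covariance(x):
    # x: (C, H, W) -> instance-normalize, then (C, C) covariance matrix
    c, h, w = x.shape
    xn = F.instance_norm(x.unsqueeze(0)).squeeze(0).reshape(c, h * w)
    return xn @ xn.t() / (h * w - 1)

def biaw_loss(f_org, f_aug, k=0.005):
    s_org, s_aug = covariance(f_org), covariance(f_aug)
    mu = (s_org + s_aug) / 2
    var = ((s_org - mu) ** 2 + (s_aug - mu) ** 2) / 2   # element-wise variance
    n_keep = max(1, int(k * var.numel()))               # top k% salient entries
    thresh = var.flatten().topk(n_keep).values.min()
    mask = (var >= thresh).float()                      # 1 on style-sensitive entries
    # bilateral: both covariances are suppressed on the masked positions
    return (s_org * mask).abs().mean() + (s_aug * mask).abs().mean()
```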
In addition, for the configuration of the classification loss information, that is, the process of configuring the classification loss information according to the sample categories corresponding to the sample features and the reorganized features: through a classification loss (cross-entropy loss), task-related features are learned so that the classifier Cls can distinguish a living body from an attack; the formula can be expressed as:

$$\mathcal{L}_{cls} = -\sum_{i} y_i \log \mathrm{Cls}(F_i)$$
in addition, the depth loss information is configured according to the depth map corresponding to the sample characteristics and the reorganization characteristics; specifically, by a depth prediction loss (MSE loss), the depth predictor Dep predicts the depth of the real person, predicts zero elements for attack, and as additional supervision, the formula can be expressed as:
in order for the model to learn features consistent with tasks, the feature-enhanced branches are also used for supervision. The branch is removed during the test phase. For the determination of the target loss information, i.e. the total loss function, a weighting parameter λ configured for the training samples can be obtained; then weighting the depth loss information based on the weighting parameters to obtain weighted depth information; and obtaining target loss information based on the whitening loss information, the classification loss information, and the weighted depth information configuration. I.e. the total loss function is a weighted sum, the formula of which can be expressed as:
$$\mathcal{L}_{total} = \mathcal{L}_{cls}^{org} + \mathcal{L}_{cls}^{aug} + \lambda\left(\mathcal{L}_{dep}^{org} + \mathcal{L}_{dep}^{aug}\right) + \mathcal{L}_{BIAW}$$

where λ is the weighting coefficient, $\mathcal{L}_{cls}^{org}$ and $\mathcal{L}_{cls}^{aug}$ denote the classification loss information corresponding to the sample feature and the reorganized feature, $\mathcal{L}_{dep}^{org}$ and $\mathcal{L}_{dep}^{aug}$ denote the corresponding depth loss information, and $\mathcal{L}_{BIAW}$ denotes the whitening loss information.
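As an illustration, the weighted sum can be assembled as below; the heads producing logits and depth maps, and the default λ, are assumptions of the sketch.

```python
import torch.nn.functional as F

def total_loss(logits_org, logits_aug, y,
               depth_org, depth_aug, depth_gt,
               loss_biaw, lam=0.1):
    # classification losses for the original and reorganized branches
    l_cls = F.cross_entropy(logits_org, y) + F.cross_entropy(logits_aug, y)
    # depth losses; depth_gt is the pseudo depth map (all-zero for attacks)
    l_dep = F.mse_loss(depth_org, depth_gt) + F.mse_loss(depth_aug, depth_gt)
    return l_cls + lam * l_dep + loss_biaw
```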
According to this embodiment, the bilateral instance-adaptive whitening loss is introduced as an explicit constraint to adaptively eliminate style-sensitive features, and combining it with the classification loss information and the depth-map loss information improves the effectiveness of the training process.
306. Training the preset recognition model based on the target loss information to obtain a living body detection model, and carrying out living body detection on the target object according to the living body detection model.
In this embodiment, the preset recognition model is an untrained living body detection model, or a living body detection model previously used for sample detection; by training this model, the resulting living body detection model can overcome interference from attack objects, improving the accuracy of living body detection of the target object.
Specifically, the living body detection process of this embodiment combined with the above embodiments is shown in fig. 5; fig. 5 is a schematic view of a scene of another living body detection method according to an embodiment of the present application. The figure shows the technical process of applying a sample-adaptive dynamic convolution kernel, sample-adaptive feature style enhancement, and a sample-adaptive feature whitening loss to an input sample. First, this embodiment designs a sample-adaptive dynamic convolution kernel generator to automatically generate a convolution filter specific to each sample, combined with a static filter, to assist comprehensive sample-specific feature learning. Then, this embodiment introduces a sample-adaptive category-based style reorganization module to generate style-diversified samples in the feature space, simulating sample-level domain shift. Furthermore, this embodiment proposes the bilateral sample-adaptive whitening loss to explicitly remove style-sensitive features and improve generalization capability.
In a possible scenario, the living body detection model of this embodiment is shown in fig. 6; fig. 6 is a schematic diagram of a model structure of a living body detection method according to an embodiment of the present application. The living body detection model comprises a dynamic convolution kernel generator (DKG), a category-based style reorganization module (CSA), and a bilateral sample-adaptive whitening loss module (BIAW). That is, a style-diversified sample pair is generated for each sample, and style-sensitive features are then adaptively eliminated, realizing domain generalization. Training of this model need not rely on any manually defined domain labels. All three modules participate in gradient optimization and updating during the training phase; the bilateral sample-adaptive whitening loss module and the category-based style reorganization module are removed during the testing phase, and the test data stream propagates forward along the solid arrows.
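A hedged skeleton of this wiring, reusing sample_new_style and restyle from the sketches above; how the style base is looked up per batch is left outside, since the embodiment only requires that CSA and BIAW are active during training and absent at test time.

```python
import torch
import torch.nn as nn

class LivenessModel(nn.Module):
    def __init__(self, backbone, classifier, depth_head):
        super().__init__()
        self.backbone = backbone      # feature extractor with DKG blocks
        self.classifier = classifier  # Cls head (living body vs. attack)
        self.depth_head = depth_head  # Dep head (pseudo depth)

    def forward(self, x, new_style=None):
        f_org = self.backbone(x)                     # (N, C, H, W)
        if new_style is None:                        # test path (solid arrows)
            return self.classifier(f_org)
        mu_aug, sigma_aug = new_style                # sampled by the CSA module
        f_aug = torch.stack([restyle(f, mu_aug, sigma_aug) for f in f_org])
        return f_org, f_aug                          # heads and BIAW applied outside
```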
In addition, regarding the application scenarios of the living body detection model, the technology combining human face living body detection with deep learning methods is mature. In actual use, living body detection is often combined with other techniques, such as face and body recognition. Living body detection serves as the first line of defense and is an important link in authentication security. At present, face verification has been applied in many services: remote bank account opening, face payment, remote driver authentication, and community access control systems.
In one possible scenario, the process of living body detection may include back-end detection or front-end detection.
The back-end detection process can cover scenarios such as bank account opening and mobile phone payment. First, the preset recognition model is trained based on the target loss information to obtain a living body detection model, and the living body detection model is configured in a back-end server; then, in response to a triggering operation (such as an account-opening or payment operation), a face image of the target object is acquired by the image acquisition equipment; the face image is then transmitted to the back-end server to be identified based on the living body detection model, obtaining a living body detection result.
Specifically, in the process of remote bank account opening, in order to confirm the true identity of the account opener, face verification technology is adopted, in which a living body detection algorithm is applied. The specific flow is as follows: first, the user acquires an image containing a face through the camera at the front end of the application; the front end transmits the image to the back end and invokes the algorithm; the algorithm performs living body detection and returns the result to the front end. If a living body is determined, verification passes; otherwise, verification fails.
In addition, face verification plays an important role in the face payment process, where living body detection is an important link in controlling payment security; a high-precision living body detection method can reject illegal attacks attempting transactions, ensuring transaction security so that the interests of companies and individuals are not harmed.
The front-end detection process can cover access scenarios such as entrance guard. The living body detection model is packaged in the front-end device: first, in response to the sensing module (sensing equipment such as an access-card reader) detecting the initiation of an admission request, acquisition of a face image of the target object by the image acquisition equipment is triggered; the face image is then input into the front-end device to be identified based on the living body detection model, obtaining a living body detection result.
Specifically, in the access control system, the verification strategy adopted differs from that of the bank: to improve verification efficiency, the community access control system sends the face image obtained by the front end directly into the packaged model, which directly makes the judgment and feeds back the result, avoiding long waits for entering personnel.
In combination with the above embodiments, a training sample is acquired and a sample data set is determined according to the depth map corresponding to the training sample, the sample data set being configured based on different sample categories; sample features in the sample data set are sampled to obtain the base sample features corresponding to each sample category, the base sample features indicating the first style information of the sample category, the first style information being determined from the mean and variance of the sample features in that category; each sample category is further sampled with a preset distribution to obtain weight information, and the first style information is linearly combined according to the weight information to obtain second style information; the sample features in each sample category are recombined based on the second style information to obtain reorganized features; whitening loss information, classification loss information, and depth loss information are then configured according to the sample features and the reorganized features to determine the target loss information; and the preset recognition model is trained based on the target loss information to obtain a living body detection model, so that living body detection is performed on the target object according to the living body detection model. Because sample-adaptive feature enhancement is adopted to construct the reorganized features, style-diversified sample pairs are obtained and the corresponding whitening loss information is configured, enhancing the generalization capability of the model on unseen target domains; the cross-domain invariant characteristics of each sample can be learned without accessing any domain label, improving the generalization of the model and the accuracy of living body detection.
The above embodiments introduced the procedure of performing living body detection in different scenes. In a front-end scene, diversified attack types may not be handled due to the processing performance limitations of the front-end device; this scene is described below. Referring to fig. 7, fig. 7 is a flowchart of another living body detection method according to an embodiment of the application, and the method includes at least the following steps:
701. And acquiring, through the sensing module, the admission instructions collected in the process of acquiring admission requests.
In this embodiment, since the front end is an access control system with limited processing capacity, the updating process can be performed at the back end; that is, the back end periodically acquires the detection conditions of the front end, for example every month or every half year, thereby collecting the related object information indicated by the front end's admission instructions within that period.
702. And detecting the updating information of the attack type collected by the back end.
In this embodiment, attack types are continuously updated; after an attack-type update is detected, the relevant features of the attack, i.e. the update information, are determined.
Specifically, the update information of the attack type can be detected by the current back end, or shared from the cloud by other back ends, realizing a large-scale collaborative protection process.
703. And if the admitted object contained in the admission instruction matches the object indicated by the update information of the attack type, updating the living body detection model.
In this embodiment, if the admitted object contained in the admission instruction matches the object indicated by the update information of the attack type, this indicates that the front-end device may have been attacked; at this time, the living body detection model encapsulated in it needs to be updated.
704. And configuring the updated living body detection model into front-end equipment to perform living body detection.
In this embodiment, configuring the updated living body detection model into the front-end device may be an on-site configuration performed by related personnel, or a model update performed by the front-end device connecting to a network directly or through a third-party device.
Through the dynamic model updating process of this embodiment, potential vulnerabilities in the front-end detection scene are avoided and the accuracy of living body detection is improved.
In order to better implement the above-described aspects of the embodiments of the present application, the following provides related apparatuses for implementing the above-described aspects. Referring to fig. 8, fig. 8 is a schematic structural diagram of a living body detection device according to an embodiment of the present application, and a detection device 800 includes:
An obtaining unit 801, configured to obtain a training sample, and determine a sample data set according to a depth map corresponding to the training sample, where the sample data set is configured based on different sample types;
a processing unit 802, configured to sample the sample features in the sample data set to obtain the base sample features corresponding to each sample category, the base sample features being obtained by concentrating the sample features and used to determine the first style information of the sample category, the first style information being represented by the mean and variance of the sample features in the sample category;
the processing unit 802 is further configured to sample each sample class with a preset distribution to obtain weight information, so as to linearly combine the first style information according to the weight information to obtain second style information;
the processing unit 802 is further configured to perform feature recombination on the sample features in each sample category based on the second style information, so as to obtain recombined features;
the processing unit 802 is further configured to perform configuration of multi-dimensional loss information according to the sample feature and the reorganization feature, so as to determine target loss information;
And the detection unit 803 is used for training a preset recognition model based on the target loss information to obtain a living body detection model so as to carry out living body detection on the target object according to the living body detection model.
Optionally, in some possible implementations of the present application, the obtaining unit 801 is specifically configured to obtain a living sample and an attack sample in the training sample;
the obtaining unit 801 is specifically configured to identify the living sample according to a depth estimation network, so as to obtain a sample depth map;
the obtaining unit 801 is specifically configured to configure a depth map corresponding to the attack sample as a black background map;
the obtaining unit 801 is specifically configured to determine the sample data set according to the sample depth map and the black background map.
Optionally, in some possible implementations of the present application, the acquiring unit 801 is specifically configured to determine a living body detection area in the living body sample;
the acquiring unit 801 is specifically configured to adjust the living body detection area based on a preset expansion parameter, so as to obtain an expansion identification area;
the acquiring unit 801 is specifically configured to cut the face image in the enlarged recognition area to obtain a sample to be processed;
The obtaining unit 801 is specifically configured to identify the sample to be processed according to a depth estimation network, so as to obtain the sample depth map.
Optionally, in some possible implementations of the present application, the obtaining unit 801 is specifically configured to obtain an adjustment image set configured for the living sample, where the adjustment image set is obtained by adjusting the living sample based on an expansion parameter sequence;
the acquiring unit 801 is specifically configured to determine face integrity information corresponding to an image in the adjusted image set;
the acquiring unit 801 is specifically configured to determine the preset expansion parameter in the expansion parameter sequence based on the face integrity information;
the obtaining unit 801 is specifically configured to adjust the living body detection area based on the preset expansion parameter, so as to obtain an expanded identification area.
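As a sketch of the acquiring unit's pipeline just described (detect the living body area, expand it, crop, estimate depth), assuming a PyTorch-style depth network; the expansion factor and helper name are illustrative, not fixed by the embodiment.

```python
import torch

def make_depth_label(image, box, depth_net, expand=1.2, is_live=True):
    # image: (3, H, W); box: (x0, y0, x1, y1) detected living body area
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    hw, hh = (x1 - x0) * expand / 2, (y1 - y0) * expand / 2
    x0, x1 = int(max(cx - hw, 0)), int(min(cx + hw, image.shape[2]))
    y0, y1 = int(max(cy - hh, 0)), int(min(cy + hh, image.shape[1]))
    crop = image[:, y0:y1, x0:x1]                   # enlarged recognition area
    if not is_live:
        return torch.zeros(crop.shape[1:])          # black background map for attacks
    return depth_net(crop.unsqueeze(0)).squeeze(0)  # sample depth map
```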
Optionally, in some possible implementations of the present application, the processing unit 802 is specifically configured to determine a feature to be processed corresponding to a training sample in the sample dataset;
the processing unit 802 is specifically configured to split the feature to be processed in the channel dimension on average, so as to obtain a first input feature and a second input feature;
The processing unit 802 is specifically configured to input the first input feature into a static convolution branch, so as to perform convolution calculation based on a static convolution kernel to obtain a static feature;
the processing unit 802 is specifically configured to input the second input feature into the dynamic convolution branch, and average-pool the second input feature to obtain a pooled feature;
the processing unit 802 is specifically configured to input the pooled feature into a dynamic convolution network so as to match the dynamic convolution parameters of the training samples;
the processing unit 802 is specifically configured to configure a dynamic convolution kernel based on the dynamic convolution parameter, and perform convolution calculation on the second input feature by adopting the dynamic convolution kernel to obtain a dynamic feature;
the processing unit 802 is specifically configured to splice the static feature and the dynamic feature to obtain the sample feature;
the processing unit 802 is specifically configured to sample the sample features to obtain base sample features corresponding to the sample categories.
Optionally, in some possible implementations of the present application, the processing unit 802 is specifically configured to replicate the feature to be processed to obtain the first input feature and the second input feature if the feature to be processed cannot be split equally in a channel dimension.
Optionally, in some possible implementations of the present application, the processing unit 802 is specifically configured to perform a most distant point sampling algorithm on the sample features to obtain a mean value and a variance corresponding to the sample features;
the processing unit 802 is specifically configured to store the mean and the variance corresponding to the sample feature in a memory pool, so as to obtain first style information corresponding to the sample feature in different sample categories;
the processing unit 802 is specifically configured to obtain the base sample feature based on the first style information configuration.
Optionally, in some possible implementations of the present application, the processing unit 802 is specifically configured to extract low-frequency information in the pooling feature;
the processing unit 802 is specifically configured to input the low frequency information into the dynamic convolution network so as to match the dynamic convolution parameters of training samples.
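The split static/dynamic computation these paragraphs describe can be sketched as follows; the kernel size, the linear generator, and the per-sample loop are assumptions of the example rather than the embodiment's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicKernelBlock(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        half = channels // 2
        self.static = nn.Conv2d(half, half, k, padding=k // 2)  # shared kernel
        # small generator mapping pooled (low-frequency) content to a kernel
        self.gen = nn.Linear(half, half * half * k * k)
        self.half, self.k = half, k

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)                     # split channel dimension
        static_out = self.static(x1)                   # static convolution branch
        pooled = F.adaptive_avg_pool2d(x2, 1).flatten(1)   # (N, half)
        dyn_out = []
        for i in range(x2.size(0)):                    # one kernel per sample
            w = self.gen(pooled[i]).view(self.half, self.half, self.k, self.k)
            dyn_out.append(F.conv2d(x2[i:i + 1], w, padding=self.k // 2))
        return torch.cat([static_out, torch.cat(dyn_out, dim=0)], dim=1)
```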
Optionally, in some possible implementations of the present application, the processing unit 802 is specifically configured to input the sample feature into a normalization layer to obtain a first covariance matrix corresponding to the sample feature;
the processing unit 802 is specifically configured to input the reorganization feature into the normalization layer to obtain a second covariance matrix corresponding to the reorganization feature;
The processing unit 802 is specifically configured to calculate variance information based on the first covariance matrix and the second covariance matrix;
the processing unit 802 is specifically configured to traverse variance information corresponding to sample features in the sample dataset to obtain a variance matrix;
the processing unit 802 is specifically configured to sort according to the values of the elements in the variance matrix, so as to obtain salient elements;
the processing unit 802 is specifically configured to suppress the salient element to obtain the whitening loss information;
the processing unit 802 is specifically configured to configure the classification loss information according to the sample characteristics and sample types corresponding to the reorganization characteristics;
the processing unit 802 is specifically configured to configure the depth loss information according to the sample feature and a depth map corresponding to the reorganization feature;
the processing unit 802 is specifically configured to obtain the target loss information based on the whitening loss information, the classification loss information, and the depth loss information.
Optionally, in some possible implementations of the present application, the processing unit 802 is specifically configured to obtain a weighting parameter configured for the training sample;
The processing unit 802 is specifically configured to weight the depth loss information based on the weighting parameter to obtain weighted depth information;
the processing unit 802 is specifically configured to obtain the target loss information based on the whitening loss information, the classification loss information, and the weighted depth information.
Optionally, in some possible implementations of the present application, the detection unit 803 is specifically configured to train a preset recognition model based on the target loss information to obtain the living body detection model, and configure the living body detection model in a back-end server;
the detection unit 803 is specifically configured to obtain a face image of the target object acquired by the image acquisition device in response to a triggering operation;
the detection unit 803 is specifically configured to transmit the face image to the backend server, so as to identify the face image based on the living body detection model, and obtain a living body detection result.
Optionally, in some possible implementations of the present application, the detecting unit 803 is specifically configured to trigger a face image of the target object acquired by an image acquisition device in response to the induction module acquiring initiation of the admission request;
The detection unit 803 is specifically configured to input the face image into the front-end device, so as to identify the face image based on the living body detection model, and obtain a living body detection result.
A training sample is acquired and a sample data set is determined according to the depth map corresponding to the training sample, the sample data set being configured based on different sample categories; sample features in the sample data set are sampled to obtain the base sample features corresponding to each sample category, the base sample features indicating the first style information of the sample category, the first style information being determined from the mean and variance of the sample features in that category; each sample category is further sampled with a preset distribution to obtain weight information, and the first style information is linearly combined according to the weight information to obtain second style information; the sample features in each sample category are recombined based on the second style information to obtain reorganized features; whitening loss information, classification loss information, and depth loss information are then configured according to the sample features and the reorganized features to determine the target loss information; and the preset recognition model is trained based on the target loss information to obtain a living body detection model, so that living body detection is performed on the target object according to the living body detection model. Because sample-adaptive feature enhancement is adopted to construct the reorganized features, style-diversified sample pairs are obtained and the corresponding whitening loss information is configured, enhancing the generalization capability of the model on unseen target domains; the cross-domain invariant characteristics of each sample can be learned without accessing any domain label, improving the generalization of the model and the accuracy of living body detection.
The embodiment of the present application further provides a terminal device. As shown in fig. 9, which is a schematic structural diagram of another terminal device provided in an embodiment of the present application, only the portion related to the embodiment of the present application is shown for convenience of explanation; for specific technical details not disclosed, please refer to the method portion of the embodiments of the present application. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, and the like, taking a mobile phone as an example:
fig. 9 is a block diagram showing a part of the structure of a mobile phone related to a terminal provided by an embodiment of the present application. Referring to fig. 9, the mobile phone includes: radio Frequency (RF) circuitry 910, memory 920, input unit 930, display unit 940, sensor 950, audio circuitry 960, wireless fidelity (wireless fidelity, wiFi) module 970, processor 980, and power source 990. It will be appreciated by those skilled in the art that the handset construction shown in fig. 9 is not limiting of the handset and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The following describes the components of the mobile phone in detail with reference to fig. 9:
the RF circuit 910 may be used for receiving and transmitting signals during a message or a call, and particularly, after receiving downlink information of a base station, the signal is processed by the processor 980; in addition, the data of the design uplink is sent to the base station. Typically, the RF circuitry 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (low noise amplifier, LNA), a duplexer, and the like. In addition, the RF circuitry 910 may also communicate with networks and other devices via wireless communications. The wireless communications may use any communication standard or protocol including, but not limited to, global system for mobile communications (global system of mobile communication, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), long term evolution (long term evolution, LTE), email, short message service (short messaging service, SMS), and the like.
The memory 920 may be used to store software programs and modules, and the processor 980 performs various functional applications and data processing by operating the software programs and modules stored in the memory 920. The memory 920 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, memory 920 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 930 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 930 may include a touch panel 931 and other input devices 932. The touch panel 931, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on or thereabout the touch panel 931 using a finger, a stylus, or any other suitable object or accessory, and spaced touch operations within a certain range on the touch panel 931), and drive the corresponding connection device according to a predetermined program. Alternatively, the touch panel 931 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 980, and can receive commands from the processor 980 and execute them. In addition, the touch panel 931 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 930 may include other input devices 932 in addition to the touch panel 931. In particular, other input devices 932 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 940 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The display unit 940 may include a display panel 941, and alternatively, the display panel 941 may be configured in the form of a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 931 may overlay the display panel 941, and when the touch panel 931 detects a touch operation thereon or thereabout, the touch operation is transferred to the processor 980 to determine a type of touch event, and then the processor 980 provides a corresponding visual output on the display panel 941 according to the type of touch event. Although in fig. 9, the touch panel 931 and the display panel 941 are implemented as two separate components for the input and output functions of the mobile phone, in some embodiments, the touch panel 931 may be integrated with the display panel 941 to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 950, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 941 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 941 and/or the backlight when the mobile phone moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and direction when stationary, and can be used for applications of recognizing the gesture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the handset are not described in detail herein.
Audio circuitry 960, speaker 961, and microphone 962 may provide an audio interface between the user and the mobile phone. The audio circuit 960 may transmit the electrical signal converted from received audio data to the speaker 961, where it is converted into a sound signal for output; conversely, the microphone 962 converts collected sound signals into electrical signals, which are received by the audio circuit 960 and converted into audio data; the audio data is processed by the processor 980 and then transmitted, for example, to another mobile phone via the RF circuit 910, or output to the memory 920 for further processing.
WiFi belongs to a short-distance wireless transmission technology, and a mobile phone can help a user to send and receive emails, browse webpages, access streaming media and the like through a WiFi module 970, so that wireless broadband Internet access is provided for the user. Although fig. 9 shows a WiFi module 970, it is understood that it does not belong to the necessary constitution of the handset, and can be omitted entirely as required within the scope of not changing the essence of the embodiment.
The processor 980 is a control center of the handset, connects various parts of the entire handset using various interfaces and lines, and performs various functions and processes of the handset by running or executing software programs and/or modules stored in the memory 920 and invoking data stored in the memory 920, thereby performing overall detection of the handset. Optionally, processor 980 may include one or more processing units; alternatively, processor 980 may integrate an application processor with a modem processor, where the application processor primarily handles operating systems, user interfaces, applications programs, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 980.
The mobile phone further includes a power supply 990 (e.g., a battery) for powering the various components; optionally, the power supply may be logically connected to the processor 980 through a power management system, thereby implementing charging, discharging, and power-consumption management functions.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In the embodiment of the present application, the processor 980 included in the terminal further has the function of executing the steps of the living body detection method as described above.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 1000 may have a relatively large difference due to different configurations or performances, and may include one or more central processing units (central processing units, CPU) 1022 (e.g., one or more processors) and a memory 1032, one or more storage media 1030 (e.g., one or more mass storage devices) storing application programs 1042 or data 1044. Wherein memory 1032 and storage medium 1030 may be transitory or persistent. The program stored on the storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations on a server. Further, central processor 1022 may be configured to communicate with storage medium 1030 to perform a series of instruction operations in storage medium 1030 on server 1000.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the management apparatus in the above-described embodiments may be based on the server structure shown in fig. 10.
In an embodiment of the present application, there is further provided a computer-readable storage medium having stored therein instructions for living body detection, which when executed on a computer, cause the computer to perform the steps performed by the living body detection apparatus in the method described in the embodiment shown in fig. 3 to 7.
There is also provided in an embodiment of the application a computer program product comprising instructions for in vivo detection which, when run on a computer, cause the computer to perform the steps performed by the in vivo detection apparatus in the method described in the embodiment shown in the foregoing figures 3 to 7.
The embodiment of the application also provides a living body detection system, which can comprise a living body detection device in the embodiment shown in fig. 8, or a terminal device in the embodiment shown in fig. 9, or a server shown in fig. 10.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a living body detection device, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (15)

1. A method of living body detection, comprising:
acquiring a training sample, and determining a sample data set according to a depth map corresponding to the training sample, wherein the sample data set is configured based on different sample categories;
sampling sample features in the sample data set to obtain base sample features corresponding to each sample category, wherein the base sample features are obtained by concentrating the sample features, the base sample features are used for determining first style information of the sample category, and the first style information is represented by the mean and variance of the sample features in the sample category;
sampling each sample category by adopting preset distribution to obtain weight information, and linearly combining the first style information according to the weight information to obtain second style information;
Carrying out feature recombination on the sample features in each sample category based on the second style information to obtain recombined features;
configuring multidimensional loss information according to the sample characteristics and the recombination characteristics so as to determine target loss information;
training a preset recognition model based on the target loss information to obtain a living body detection model, and carrying out living body detection on a target object according to the living body detection model.
2. The method of claim 1, wherein the acquiring training samples and determining a sample dataset from a depth map corresponding to the training samples comprises:
acquiring a living body sample and an attack sample in the training sample;
identifying the living body sample according to a depth estimation network to obtain a sample depth map;
configuring a depth map corresponding to the attack sample as a black base map;
and determining the sample data set according to the sample depth map and the black base map.
3. The method of claim 2, wherein the identifying the living sample from the depth estimation network to obtain a sample depth map comprises:
determining a living body detection area in the living body sample;
Adjusting the living body detection area based on preset expansion parameters to obtain an expansion identification area;
cutting the face image in the enlarged recognition area to obtain a sample to be processed;
and identifying the sample to be processed according to a depth estimation network so as to obtain the sample depth map.
4. The method of claim 3, wherein adjusting the living body detection area based on a preset expansion parameter to obtain an expanded identification area comprises:
acquiring an adjustment image set configured for the living body sample, the adjustment image set being obtained by adjusting the living body sample based on an enlarged parameter sequence;
determining face integrity information corresponding to images in the adjustment image set;
determining the preset expansion parameters in the expansion parameter sequence based on the face integrity information;
and adjusting the living body detection area based on the preset expansion parameters so as to obtain an expansion identification area.
5. The method of claim 1, wherein the sampling sample features in the sample dataset to obtain base sample features for each of the sample categories comprises:
Determining the feature to be processed corresponding to the training sample in the sample data set;
the feature to be processed is divided in the channel dimension in an average mode to obtain a first input feature and a second input feature;
inputting the first input feature into a static convolution branch to perform convolution calculation based on a static convolution kernel to obtain a static feature;
inputting the second input feature into a dynamic convolution branch, and average-pooling the second input feature to obtain a pooled feature;
inputting the pooled features into a dynamic convolution network so as to match the dynamic convolution parameters of the training samples;
configuring a dynamic convolution kernel based on the dynamic convolution parameters, and adopting the dynamic convolution kernel to carry out convolution calculation on the second input characteristic to obtain a dynamic characteristic;
splicing the static features and the dynamic features to obtain the sample features;
and sampling the sample characteristics to obtain the substrate sample characteristics corresponding to each sample category.
6. The method of claim 5, wherein the method further comprises:
and if the feature to be processed cannot be split evenly in the channel dimension, copying the feature to be processed to obtain the first input feature and the second input feature.
7. The method of claim 5, wherein the sampling the sample features to obtain base sample features corresponding to each of the sample categories comprises:
respectively executing a furthest point sampling algorithm on the sample characteristics to obtain a mean value and a variance corresponding to the sample characteristics;
storing the mean and the variance corresponding to the sample characteristics in a memory pool to obtain first style information corresponding to the sample characteristics in different sample categories;
and configuring and obtaining the base sample characteristics based on the first style information.
8. The method of claim 5, wherein inputting the pooled feature into a dynamic convolution network such that it matches a dynamic convolution parameter of a training sample comprises:
extracting low-frequency information in the pooling feature;
the low frequency information is input to the dynamic convolution network so as to match the dynamic convolution parameters of training samples.
9. The method of claim 1, wherein the configuring of the multi-dimensional loss information based on the sample features and the reorganization features to determine target loss information comprises:
Inputting the sample characteristics into a normalization layer to obtain a first covariance matrix corresponding to the sample characteristics;
inputting the recombined characteristics into the normalization layer to obtain a second covariance matrix corresponding to the recombined characteristics;
calculating variance information based on the first covariance matrix and the second covariance matrix;
traversing variance information corresponding to sample features in the sample data set to obtain a variance matrix;
sorting according to the values of the elements in the variance matrix to obtain salient elements;
suppressing the salient elements to obtain the whitening loss information;
configuring the classification loss information according to the sample characteristics and sample categories corresponding to the recombination characteristics;
configuring the depth loss information according to the sample characteristics and the depth map corresponding to the reorganization characteristics;
the target loss information is configured based on the whitening loss information, the classification loss information, and the depth loss information.
10. The method of claim 9, wherein the configuring the target loss information based on the whitening loss information, the classification loss information, and the depth loss information comprises:
Acquiring weighting parameters configured for the training samples;
weighting the depth loss information based on the weighting parameters to obtain weighted depth information;
the target loss information is configured based on the whitening loss information, the classification loss information, and the weighted depth information.
11. The method according to claim 1, wherein training the preset recognition model based on the target loss information to obtain a living detection model, so as to perform living detection on the target object according to the living detection model, includes:
training a preset recognition model based on the target loss information to obtain the living body detection model, and configuring the living body detection model in a back-end server;
responding to the triggering operation, and acquiring a face image of the target object acquired by the image acquisition equipment;
and transmitting the face image to the back-end server so as to identify the face image based on the living body detection model to obtain a living body detection result.
12. The method of claim 11, wherein the living body detection model is packaged in a front-end device, the method further comprising:
Triggering the face image of the target object acquired by the image acquisition equipment in response to the initiation of the admission request acquired by the sensing module;
and inputting the face image into the front-end equipment to identify the face image based on the living body detection model so as to obtain a living body detection result.
13. A living body detecting device, characterized by comprising:
the acquisition unit is used for acquiring training samples, determining a sample data set according to a depth map corresponding to the training samples, and configuring the sample data set based on different sample categories;
the processing unit is used for sampling sample features in the sample data set to obtain base sample features corresponding to each sample category, the base sample features being obtained by concentrating the sample features and used for determining the first style information of the sample category, the first style information being represented by the mean and variance of the sample features in the sample category;
the processing unit is further used for sampling each sample category by adopting preset distribution to obtain weight information, and linearly combining the first style information according to the weight information to obtain second style information;
The processing unit is further configured to perform feature recombination on the sample features in each sample category based on the second style information, so as to obtain recombined features;
the processing unit is further used for configuring multi-dimensional loss information according to the sample characteristics and the recombination characteristics so as to determine target loss information;
and the detection unit is used for training a preset identification model based on the target loss information to obtain a living body detection model so as to carry out living body detection on the target object according to the living body detection model.
14. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes; the processor is configured to perform the method of living body detection of any one of claims 1 to 12 according to instructions in the program code.
15. A computer program product comprising computer programs/instructions stored on a computer readable storage medium, characterized in that the computer programs/instructions in the computer readable storage medium, when executed by a processor, implement the steps of the method of living body detection according to any of the preceding claims 1 to 12.
CN202310390679.9A 2023-04-04 2023-04-04 Living body detection method, living body detection device and storage medium Pending CN116959059A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310390679.9A CN116959059A (en) 2023-04-04 2023-04-04 Living body detection method, living body detection device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310390679.9A CN116959059A (en) 2023-04-04 2023-04-04 Living body detection method, living body detection device and storage medium

Publications (1)

Publication Number Publication Date
CN116959059A true CN116959059A (en) 2023-10-27

Family

ID=88457131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310390679.9A Pending CN116959059A (en) 2023-04-04 2023-04-04 Living body detection method, living body detection device and storage medium

Country Status (1)

Country Link
CN (1) CN116959059A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117455013A (en) * 2023-11-10 2024-01-26 无锡鸣石峻致医疗科技有限公司 Training sample data generation method, system, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication