CN112766162A - Living body detection method, living body detection device, electronic apparatus, and computer-readable storage medium


Info

Publication number
CN112766162A
Authority
CN
China
Prior art keywords
prototype
preset category
prototypes
image
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110075262.4A
Other languages
Chinese (zh)
Other versions
CN112766162B (en)
Inventor
孙庆宏
尹榛菲
邵婧
吴一超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110075262.4A priority Critical patent/CN112766162B/en
Publication of CN112766162A publication Critical patent/CN112766162A/en
Application granted granted Critical
Publication of CN112766162B publication Critical patent/CN112766162B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a living body detection method, a living body detection device, an electronic apparatus, and a computer-readable storage medium. After an image to be recognized including a target object is acquired, image features of the image are extracted. Through this image feature extraction, multi-dimensional features are obtained, overcoming the prior-art defect that the information captured during face living body detection is single. Meanwhile, the method sets a plurality of prototypes for each preset category, that is, a multi-dimensional subclass for each preset category; setting a plurality of subclasses overcomes the prior-art defect of the limited expressive ability of binary classification. Performing living body detection by combining the acquired multi-dimensional features with the multi-dimensional prototypes obtained by training can effectively improve the accuracy of face living body detection, avoids requiring the tested subject to complete sequence actions such as blinking and mouth opening, and improves user experience.

Description

Living body detection method, living body detection device, electronic apparatus, and computer-readable storage medium
Technical Field
The disclosure relates to the fields of computer vision and face recognition, and in particular to a living body detection method and device, an electronic apparatus, and a computer-readable storage medium.
Background
At present, face living body detection is a key problem in the fields of computer vision and face recognition. In recent years, face recognition, as a common identity-authentication technique, has been applied in many scenes such as face payment, face unlocking, and access control.

The information captured by traditional face living body detection techniques is extremely single; because the data involved in face living body detection spans a wide range of dimensions, the attack types are various, and the scenes are complex and changeable, traditional techniques cannot cope with complex and changeable practical application scenes. In addition, there currently exist techniques that perform face living body detection as a binary classification task; however, in practical application scenes the attack means are diversified, the differences between scenes are large, and the differences between subjects are large, so the expressive ability of binary classification is limited and the detection accuracy is low.
Disclosure of Invention
The embodiment of the disclosure at least provides a living body detection method, a living body detection device, electronic equipment and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for detecting a living body, including:
obtaining image characteristics corresponding to an image to be recognized based on the image to be recognized comprising a target object;
obtaining a plurality of prototypes, wherein the prototypes comprise prototypes corresponding to each of at least two preset categories;
and performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result.
In this aspect, multi-dimensional features are obtained through image feature extraction, overcoming the prior-art defect that the information captured during face living body detection is single. Meanwhile, a plurality of prototypes is set for each preset category, that is, a multi-dimensional subclass is set for each preset category; setting a plurality of subclasses overcomes the prior-art defect of the limited expressive ability of binary classification. Performing living body detection by combining the acquired multi-dimensional features with the multi-dimensional prototypes obtained by training can effectively improve the accuracy of face living body detection, avoids requiring the tested subject to complete sequence actions such as blinking and mouth opening, and improves user experience.
In a possible implementation manner, the performing living body detection on the target object in the image to be recognized based on the image feature and the plurality of prototypes to obtain a living body detection result includes:
respectively determining first similarity information between the image features and each prototype in each preset category;
and screening a target class to which the target object belongs from the plurality of preset classes based on the determined first similarity information, and taking the target class as the living body detection result.
According to the embodiment, the target category to which the target object belongs can be accurately screened from the preset categories by utilizing the similarity information between the image characteristics and each prototype, so that an accurate living body detection result is obtained.
In one possible embodiment, the plurality of prototypes includes a first prototype trained using a first training sample in a first scenario, and a second prototype trained using a second training sample in a second scenario;
the image to be recognized is an image shot in the first scene.
Face living body detection is widely applied, so its scenes are changeable. To improve the scene applicability of the trained prototypes, a first prototype corresponding to the new scene, i.e., the first scene, is trained using training samples from the first scene, a second prototype corresponding to the old scene, i.e., the second scene, is trained using training samples from the second scene, and both are used for detection. This improves the accuracy of face living body detection in the new scene while maintaining the detection accuracy of the old scene.
In a possible implementation manner, the first prototype includes a prototype corresponding to each preset class, and the second prototype includes a plurality of prototypes on a plurality of data dimensions corresponding to each preset class.
Face living body detection is widely applied, so its scenes are changeable. To improve the scene applicability of the trained prototypes and reduce the labeling cost of training samples for the new scene, i.e., the first scene, a small number of training samples from the new scene are used to train the first prototypes corresponding to the new scene, while a large number of training samples from the old scene, i.e., the second scene, are used to train the second prototypes corresponding to the old scene. The accuracy of face living body detection in the new scene can thus be improved while reducing the labeling cost of the new scene and maintaining the detection accuracy of the old scene.
In a possible implementation, the above-mentioned living body detection method further comprises the step of determining the first prototype:
acquiring a plurality of first sample images which are shot in the first scene and respectively correspond to each preset category;
and aiming at each preset category, respectively extracting image features from a plurality of first sample images corresponding to the preset category by using a feature extraction network, and determining a first prototype corresponding to the preset category based on the extracted image features.
In this embodiment, each preset category obtains one first prototype, which is adapted to the new scene; the first prototype is added to the set of second prototypes, and using prototypes from both scenes improves the scene adaptability and the detection accuracy in the new scene. In addition, because only one first prototype is obtained per preset category, the number of first sample images required for each preset category is small, which effectively reduces the labeling cost and improves the prototype training efficiency.
In a possible implementation manner, the determining, based on the extracted image features, a first prototype corresponding to the preset category includes:
and taking the average value of the extracted image features as a first prototype corresponding to the preset category.
According to the embodiment, the average value of the image features can accurately represent the corresponding features of the preset type of images, so that the adaptability of the in-vivo detection method to a new scene can be improved, and the detection precision in the new scene can be improved.
In a possible implementation, the above living body detection method further comprises the step of determining the second prototype:
acquiring a second sample image corresponding to each preset category and a plurality of initial prototypes on a plurality of data dimensions corresponding to each preset category, wherein the second sample image is shot in the second scene;
extracting image features in the second sample image by using a feature extraction network to be trained to obtain sample features;
determining second similarity information between the sample features and each initial prototype;
and determining a second prototype corresponding to each initial prototype based on the obtained second similarity information.
Because the data involved in face living body detection spans a wide range of dimensions, the attack types are various, and the scenes are complex, a traditional classifier can hardly obtain a robust solution. In this embodiment, a plurality of prototypes corresponding to different dimensions is set under each preset category; the prototype-learning method increases the dimensionality of the original fully connected layer and converts the problem into a multi-classification problem, which increases the learning difficulty of the face living body detection model, further improves its adaptability to complex data, and improves robustness.
In a possible implementation manner, the determining, based on the obtained second similarity information, a second prototype corresponding to each initial prototype includes:
for each preset category, determining the category of the second sample image as probability information of the preset category based on a plurality of pieces of second similarity information corresponding to the preset category;
generating a first loss based on the probability information corresponding to each preset category;
and determining a second prototype corresponding to each initial prototype based on the corresponding first loss of each second sample image.
According to this embodiment, the probability information that a second sample image belongs to a certain preset category can be accurately determined using the second similarity information between the second sample image and each initial prototype of that preset category; each initial prototype is then trained using the first losses generated from this probability information for the respective second sample images, so as to obtain a plurality of second prototypes that can accurately represent the image features of the preset categories.
In a possible embodiment, the determining a second prototype corresponding to each initial prototype based on the first loss corresponding to each second sample image includes:
determining third similarity information between every two initial prototypes in each preset category according to each preset category;
generating a second loss based on the third similarity information corresponding to each preset category and the first similarity threshold;
and determining a second prototype corresponding to each initial prototype based on the first loss and the second loss.
In this embodiment, in order to improve the detection accuracy, the intra-class constraint requires that different prototypes within the same preset category represent different data attributes; that is, the similarity between different prototypes in the same preset category is specified to be greater than a preset first similarity threshold.
In a possible implementation, the determining a second prototype corresponding to each initial prototype based on the first loss and the second loss includes:
for each preset category, screening the maximum third similarity information from the third similarity information corresponding to the preset category;
determining minimum similarity information between initial prototypes of different preset categories;
generating a third loss based on the maximum third similarity information, the minimum similarity information, and the second similarity threshold;
and determining a second prototype corresponding to each initial prototype based on the first loss, the second loss and the third loss.
In this embodiment, in order to improve the detection accuracy, it is necessary to ensure that the similarity between different prototypes within the same preset category is smaller than the similarity between prototypes of different preset categories; that is, it is specified that the minimum similarity between prototypes of different preset categories, minus the maximum similarity between different prototypes within the same preset category, is greater than the second similarity threshold.
In one possible embodiment, the in-vivo detection method further includes:
and training the feature extraction network to be trained by utilizing the first loss, the second loss and the third loss to obtain the trained feature extraction network.
In order to adapt the feature extraction network to living body detection, the sample features can be extracted using the feature extraction network to be trained, so that while the initial prototypes are trained using the first loss, the second loss, and the third loss, the feature extraction network is trained as well. The trained feature extraction network is thus obtained together with the trained second prototypes, enabling the network to extract image features suitable for living body detection, which improves the quality of the target prototypes to be selected subsequently and improves the living body detection accuracy.
In a possible implementation, after the second prototype corresponding to each initial prototype is determined, the method further includes:
extracting image features in the second sample image by using the trained feature extraction network to obtain target features;
for each preset category, screening target prototypes from second prototypes corresponding to the preset category based on the target features of the second sample images corresponding to the preset category;
the performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result includes:
and performing living body detection on the target object in the image to be recognized based on the image features, the first prototype and the target prototype to obtain a living body detection result.
The number of prototypes has a great influence on the accuracy of the living body detection of the human face, so after a trained second prototype in a second scene is obtained, the obtained second prototype can be selected to improve the accuracy of the living body detection.
In a possible embodiment, the screening of target prototypes from the second prototypes based on the target features of the second sample images includes:
respectively determining fourth similarity information between each second prototype corresponding to each preset type and each target feature corresponding to each preset type aiming at each preset type;
respectively determining density information corresponding to each second prototype based on the fourth similarity information and a third similarity threshold corresponding to the preset category;
and screening the target prototype corresponding to the preset category from the second prototype corresponding to the preset category based on the density information corresponding to each second prototype.
According to the embodiment, the target prototype with higher matching degree with the corresponding preset category can be selected by using the density information determined by the similarity information between the second prototype and each target feature belonging to the same preset category, and the living body detection precision can be improved by using the target prototype with higher matching degree with the preset category.
In a possible embodiment, the screening the target prototype from the second prototypes based on the density information corresponding to each second prototype includes:
for each preset category, taking a second prototype corresponding to the maximum density information in the preset category as a target prototype corresponding to the preset category, and removing the second prototype corresponding to the maximum density information from the second prototype corresponding to the preset category;
removing target features of which fourth similarity information with the target prototype is larger than the third similarity threshold;
and returning to the step of respectively determining fourth similarity information between each second prototype corresponding to each preset category and each target feature corresponding to each preset category.
According to the embodiment, the second prototype corresponding to the maximum density information is used as the target prototype, and the matching degree of the target prototype obtained through screening and the corresponding preset type can be guaranteed.
In a possible implementation manner, the screening, from the second prototypes corresponding to the preset category, the target prototypes corresponding to the preset category based on the density information corresponding to each second prototype includes:
and under the condition that the maximum density information corresponding to the preset category is larger than zero, screening the target prototype corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each second prototype.
In the embodiment, the target prototype is screened under the condition that the maximum density information corresponding to the preset category is greater than zero, so that the matching degree of the screened target prototype and the corresponding preset category can be ensured; when the maximum density information corresponding to the preset category is equal to zero, the matching of the residual second prototypes and the corresponding preset categories is poor, and the target prototypes are not screened from the residual second prototypes at the moment, so that the living body detection precision is improved.
In a possible implementation manner, the screening, based on the density information corresponding to each of the second prototypes, the target prototype corresponding to the preset category from the second prototypes corresponding to the preset category further includes:
and under the condition that the number of the second prototypes corresponding to the preset category is greater than zero, screening the target prototypes corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each second prototype.
In this embodiment, the target prototype can be continuously screened from the remaining second prototypes only if the number of the remaining second prototypes is greater than zero.
In a second aspect, the present disclosure provides a living body detection apparatus comprising:
the characteristic extraction module is used for obtaining image characteristics corresponding to an image to be recognized based on the image to be recognized comprising a target object;
the prototype acquiring module is used for acquiring a plurality of prototypes, and the prototypes comprise prototypes corresponding to each of at least two preset categories;
and the detection module is used for carrying out living body detection on the target object in the image to be recognized based on the image characteristics and the plurality of prototypes to obtain a living body detection result.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, performs the steps in the first aspect or any one of the possible implementation manners of the first aspect.
For the description of the effects of the above-described living body detecting apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the above-described living body detecting method, which is not repeated herein.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art can derive additional related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a method of live detection provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating another in vivo detection method provided by embodiments of the present disclosure;
FIG. 3 illustrates a schematic diagram of a living body detection apparatus provided by an embodiment of the present disclosure;
fig. 4 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Face living body detection is a key problem in the fields of computer vision and face recognition. In recent years, face recognition, as a common identity-authentication technique, has been applied in many scenes such as face payment, face unlocking, and access control. Face recognition technology has two main branches: face comparison and face living body detection. Face living body detection judges whether an input face is a real face or a fake face generated by some attack means, and plays an extremely important role as a technology applied downstream of face comparison.
The information captured by the traditional face in-vivo detection technology is extremely single, and the traditional face in-vivo detection technology cannot be applied to complex and variable practical application scenes due to the fact that the dimensionality of data in the face in-vivo detection technology is wide, the attack types are various, and the scenes are complex and variable.
Currently, some techniques also perform face living body detection using timing information, such as blinking, opening the mouth, and shaking the head. These techniques require the tested subject to perform the above actions, which is time-consuming and easily causes aversion in the subject.
In addition, at present, a technology for performing face live body detection through a two-classification task exists, but in an actual application scene, attack means are diversified, the scene difference is large, the difference between subjects is large, the two-classification expressive ability is limited, and the detection accuracy is low.
In view of the above technical drawbacks, the present disclosure provides a living body detection method and device, an electronic apparatus, and a computer-readable storage medium, in which prototypes in multiple dimensions are predetermined for each preset category, such as real face and false face. The prototypes are obtained by training a face living body detection model and may be sample image features corresponding to the preset categories. After an image to be recognized including a target object is acquired, image features are extracted, and living body detection is performed on the target object in the image to be recognized based on the extracted image features and the plurality of prototypes corresponding to each preset category, to obtain a living body detection result.
According to the method, multi-dimensional features are obtained through image feature extraction, overcoming the prior-art defect that the information captured during face living body detection is single. Meanwhile, the method sets a plurality of prototypes for each preset category, that is, a multi-dimensional subclass for each preset category; setting a plurality of subclasses overcomes the prior-art defect of the limited expressive ability of binary classification. Combining the acquired multi-dimensional features with the multi-dimensional prototypes obtained by training can effectively improve the accuracy of face living body detection; moreover, the detection process does not require active cooperation from the tested subject, for example, sequence actions such as blinking and mouth opening are reduced, which improves user experience.
The method, apparatus, electronic device, and storage medium for detecting a living body according to the present disclosure will be described below with reference to specific examples.
As shown in fig. 1, the embodiment of the present disclosure discloses a method for detecting a living body, which can be applied to a device having a data processing function, such as a processor, a server, etc. Specifically, the living body detecting method may include the steps of:
s110, obtaining image characteristics corresponding to the image to be recognized based on the image to be recognized including the target object.
Firstly, an image to be recognized including a target object needs to be acquired, and then feature extraction is carried out on the image to be recognized to obtain the image features.
The image to be recognized may be captured by an image capture device on the apparatus executing the living body detection method of this embodiment, or it may be captured by a separate image capture apparatus that then uploads the captured image to be recognized to the apparatus executing the living body detection method of this embodiment.
When the image features are extracted, the trained feature extraction network can be used for extracting the image features suitable for living body detection, so that the quality of target prototypes needing to be selected subsequently can be improved, and the living body detection precision can be improved.
And S120, obtaining a plurality of prototypes, wherein the plurality of prototypes comprise prototypes corresponding to each of at least two preset categories.
In order to improve the adaptability of the live body detection method of the embodiment to the new scene and improve the accuracy of the live body detection in the new scene, the prototype here includes a first prototype trained by using a first training sample in a first scene (i.e. the new scene).
In order to ensure the living body detection accuracy of the old scene (i.e. the following practical application scene, the second scene), the prototype herein also includes a second prototype trained by using the second training sample in the second scene.
Further, in order to reduce the labeling cost of the first training sample in the new scene, a small number of training samples in the new scene are used to train the first prototype corresponding to the new scene.
Further, in order to ensure the accuracy of the live body detection in the old scene and improve the accuracy of the live body detection in the new scene, the second prototype includes a plurality of second prototypes in a plurality of data dimensions corresponding to each preset class.
In a specific implementation, the first prototype may be determined by the following steps:
acquiring a plurality of first sample images which are shot in the first scene and respectively correspond to each preset category; and aiming at each preset category, respectively extracting image features from a plurality of first sample images corresponding to the preset category by using a feature extraction network, and determining a first prototype corresponding to the preset category based on the extracted image features. Specifically, the mean value of the extracted image features may be used as the first prototype corresponding to the preset category.
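For illustration, a minimal sketch of this computation follows, assuming PyTorch tensors and L2-normalized features; the helper name first_prototype and the re-normalization of the mean are assumptions of the sketch, not requirements of this embodiment.

```python
import torch
import torch.nn.functional as F

def first_prototype(features: torch.Tensor) -> torch.Tensor:
    """features: (num_images, N) image features extracted, for ONE preset
    category, from the first sample images shot in the first scene.
    Returns that category's single first prototype of shape (N,)."""
    proto = features.mean(dim=0)        # mean value of the extracted image features
    return F.normalize(proto, dim=0)    # assumed: keep the prototype L2-normalized

# One first prototype per preset category (names are illustrative):
# real_first_proto = first_prototype(real_face_features)
# fake_first_proto = first_prototype(fake_face_features)
```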
Each preset category obtains one first prototype, which is adapted to the new scene; the first prototype is added to the set of second prototypes, and using prototypes from both scenes improves the scene adaptability and the detection accuracy in the new scene. In addition, because only one first prototype is obtained per preset category, the number of first sample images required for each preset category is small, which effectively reduces the labeling cost and improves the prototype training efficiency.
The specific determination step for the first prototype is described below in stage four, the prototype training stage in the new scene.
In a specific implementation, the second prototype may be determined by the following steps:
acquiring a second sample image corresponding to each preset category and a plurality of initial prototypes on a plurality of data dimensions corresponding to each preset category, wherein the second sample image is shot in the second scene; extracting image features in the second sample image by using a feature extraction network to be trained to obtain sample features; determining second similarity information between the sample features and each initial prototype; and determining a second prototype corresponding to each initial prototype based on the obtained second similarity information.
The determining of the second prototype corresponding to each initial prototype based on the obtained second similarity information may specifically be implemented by using the following steps:
for each preset category, determining the category of the second sample image as probability information of the preset category based on a plurality of pieces of second similarity information corresponding to the preset category; generating a first loss based on the probability information corresponding to each preset category; and determining a second prototype corresponding to each initial prototype based on the corresponding first loss of each second sample image.
To improve the training accuracy of the prototypes, prototype constraints may be set, which may specifically include an intra-class constraint L_PC^intra (i.e., the second loss described below) and/or an inter-class constraint L_PC^inter (i.e., the third loss described below); the second prototype corresponding to each initial prototype is then determined using the first loss and the prototype constraints.
The second loss may be determined using the following steps: determining third similarity information between every two initial prototypes in each preset category according to each preset category; and generating a second loss based on the third similarity information corresponding to each preset category and the first similarity threshold.
The third loss may be determined using the following steps: for each preset category, screening the maximum third similarity information from the third similarity information corresponding to the preset category; determining minimum similarity information between initial prototypes of different preset categories; generating a third loss based on the maximum third similarity information, the minimum similarity information, and the second similarity threshold.
In order to adapt the feature extraction network to living body detection, the sample features can be extracted using the feature extraction network to be trained, so that while the initial prototypes are trained using the first loss, the second loss, and the third loss, the feature extraction network is trained as well. The trained feature extraction network is thus obtained together with the trained second prototypes, enabling the network to extract image features suitable for living body detection, which improves the quality of the target prototypes to be selected subsequently and improves the living body detection accuracy.
The specific extraction step for the sample features is described below in stage one, the feature extraction stage.
The specific determination step for the second prototype is described below in stage two, the prototype training stage in the practical application scene.
S130, performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result.
In practice, the living body detection can be performed using the following steps: respectively determining first similarity information between the image features and each prototype in each preset category; and screening a target class to which the target object belongs from the plurality of preset classes based on the determined first similarity information, and taking the target class as the living body detection result.
The screening, based on the determined first similarity information, a target category to which the target object belongs from the multiple preset categories, and taking the target category as the living body detection result may specifically include:
for each preset category, carrying out weighted summation on a plurality of pieces of first similarity information corresponding to the preset category to obtain the probability that the image to be identified is the preset category; the preset category comprises a real face and a false face; and determining the target class of the target object based on the probability corresponding to each preset class to obtain a detection result of the human face living body detection of the target object. Illustratively, the preset category corresponding to the larger probability is taken as the target category of the target object in the image to be recognized.
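For illustration only, the following sketch shows this detection step under two assumptions: similarities are inner products of L2-normalized vectors, and the weights of the weighted summation are a softmax over the similarities with temperature τ (a weighting scheme carried over from the stage-two formula below; this paragraph itself only specifies a weighted summation).

```python
import torch

def category_probability(feature: torch.Tensor,
                         prototypes: torch.Tensor,
                         tau: float = 10.0) -> torch.Tensor:
    """feature: (N,) L2-normalized image feature of the image to be recognized.
    prototypes: (K, N) L2-normalized prototypes of ONE preset category.
    Returns the weighted sum of the first similarity information."""
    sims = prototypes @ feature                  # first similarity information, (K,)
    weights = torch.softmax(tau * sims, dim=0)   # assumed weighting scheme
    return (weights * sims).sum()

def liveness_detect(feature, real_prototypes, fake_prototypes):
    p_real = category_probability(feature, real_prototypes)
    p_fake = category_probability(feature, fake_prototypes)
    # The preset category with the larger probability is the target category.
    return "real face" if p_real > p_fake else "false face"
```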
By utilizing the similarity information between the image characteristics and each prototype, the target class to which the target object belongs can be accurately screened from the preset classes, and an accurate living body detection result is obtained.
The number of prototypes has a great influence on the accuracy of the living body detection of the human face, so after a trained second prototype in a second scene is obtained, the obtained second prototype can be selected to improve the accuracy of the living body detection. Specifically, the target prototype may be selected from the second prototype by the following steps: extracting image features in the second sample image by using the trained feature extraction network to obtain target features; and for each preset category, screening target prototypes from second prototypes corresponding to the preset category based on the target features of the second sample images corresponding to the preset category.
After the target prototypes are obtained, the living body detection can be performed using the following steps: performing living body detection on the target object in the image to be recognized based on the image features, the first prototype, and the target prototypes to obtain a living body detection result.
For example, the screening of target prototypes from the second prototypes based on the target features of the second sample images may include:
respectively determining fourth similarity information between each second prototype corresponding to each preset type and each target feature corresponding to each preset type aiming at each preset type; respectively determining density information corresponding to each second prototype based on the fourth similarity information and a third similarity threshold corresponding to the preset category; and screening the target prototype corresponding to the preset category from the second prototype corresponding to the preset category based on the density information corresponding to each second prototype.
The screening of the target prototype from the second prototypes based on the density information corresponding to each second prototype may specifically include: and regarding each preset category, taking the second prototype corresponding to the maximum density information in the preset category as the target prototype corresponding to the preset category.
After a target prototype is screened out, the second prototype corresponding to the maximum density information is removed, so that the next round of screening operates on the remaining second prototypes; the target features whose fourth similarity information with the target prototype is greater than the third similarity threshold are also removed. The process then returns to the step of determining, for each preset category, the fourth similarity information between each remaining second prototype and each remaining target feature, so as to screen the next target prototype.
For each round of screening a target prototype, the cycle stops, for a given preset category, when the maximum density information corresponding to the preset category equals zero, or when the number of remaining second prototypes corresponding to the preset category equals zero.
The screening and determination steps for the target prototypes are described below in stage three, the prototype adaptive selection stage.
As can be seen from the above description, the prototypes play an important role in improving the detection accuracy. The following first describes the prototype training process through some further embodiments, and then describes the specific use of the prototypes in face living body detection.
Face living body detection is widely applied, so its scenes are changeable. To improve the scene applicability of the trained prototypes and reduce the labeling cost of training samples for new scenes, a prototype corresponding to the new scene is trained using a small number of training samples from the new scene, and a plurality of prototypes corresponding to the practical application scene is trained using a large number of training samples from the old scene (e.g., the practical application scene). The accuracy of face living body detection in the new scene can thus be improved while reducing the labeling cost of the new scene.
Illustratively, training of a prototype includes the following four stages:
stage one, a characteristic extraction stage.
Obtaining a large number of training samples in an actual application scene, wherein each training sample comprises a sample image and a preset category to which the sample image belongs. The preset categories herein may include a real face and a false face. The preset category to which the sample image belongs may be pre-labeled.
After the training samples are obtained, extracting the image features of each sample image by using a feature extractor to be trained to obtain the sample features.
For example, in a face payment scene, the sample image may be an image that has been successfully matched with a stored standard face image, the standard face image being stored in advance. After the sample image is successfully matched with the standard face image, living body detection is performed on the sample object in the sample image. A successful match between the sample image and the stored standard face image indicates that the sample object in the sample image is the owner of the payment credentials, and at this time face living body detection needs to be performed on the sample image.
In addition, this step is a training step rather than an actual application step, and actual operations such as payment do not need to be performed; therefore, the sample image is labeled with a preset category and need not be an image matched against a standard face image.
After the sample images are acquired, the sample images may be first scaled to a certain size, for example, 224 × 224 pixels, according to the limit of the computing power of the training apparatus and the requirement of the training speed.
The feature extractor to be trained may be a convolutional neural network capable of extracting N-dimensional image features, where N depends on the expressive capability of the selected feature extractor; for example, a ResNet18 convolutional neural network extracts 512-dimensional image features.
In practical application, a proper feature extractor can be selected according to the requirements of the actual scene on speed and precision.
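As a concrete illustration, a minimal PyTorch sketch of such a feature extractor follows; the ResNet18 backbone, the 512-dimensional output, and the 224 × 224 input size match the examples above, while the wrapper class and the final L2 normalization are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class FeatureExtractor(nn.Module):
    """Hypothetical wrapper: maps a face image to an N-dimensional feature.

    With a ResNet18 backbone, N = 512, matching the example above.
    """

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Keep everything up to global average pooling; drop the final FC layer.
        self.body = nn.Sequential(*list(backbone.children())[:-1])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.body(x).flatten(1)        # (B, 512)
        return F.normalize(feat, dim=1)       # L2 norm normalization, as in stage two


# Example: a batch of sample images scaled to 224 x 224 pixels.
images = torch.randn(4, 3, 224, 224)
features = FeatureExtractor()(images)         # (4, 512)
```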
And stage two, a prototype training stage in an actual application scene.
For example, the preset categories may include both real faces and false faces, and multiple prototypes are initialized for each preset category, for example K prototypes per preset category, where the dimension of each prototype may equal the dimension of the extracted sample features. As shown in FIG. 2, the prototypes initialized for a real face include P_0^S, ..., P_K^S, and the prototypes initialized for a false face include P_0^L, ..., P_K^L.
After the initialized prototypes are obtained, all the initialized prototypes are subjected to L2 norm normalization, and simultaneously, the sample features extracted in the first stage are subjected to L2 norm normalization.
For a sample image, the similarity between the extracted sample feature (e.g., 1 × N in FIG. 2) and each prototype (e.g., P_0^S, ..., P_K^S in FIG. 2) is calculated separately; specifically, the inner product of the sample feature and each prototype may be calculated, and the similarity determined based on the obtained inner product calculation result.
And for each preset category, carrying out weighted summation on a plurality of similarities corresponding to the category to obtain the probability that the sample image is the preset category.
Specifically, the probability corresponding to a certain sample image under each preset category can be calculated by using the following formula:
$$\cos\theta_j=\sum_{r=1}^{K}\frac{\exp\!\left(\tau\,f_i\cdot P_j^r\right)}{\sum_{t=1}^{K}\exp\!\left(\tau\,f_i\cdot P_j^t\right)}\left(f_i\cdot P_j^r\right)$$
where cos θ_j represents the probability that the i-th sample image belongs to preset category j, K represents the number of prototypes corresponding to preset category j, f_i represents the sample feature of the i-th sample image, P_j^r denotes the r-th prototype of preset category j, and τ denotes a preset hyper-parameter, e.g. set to 10. When the i-th sample image is a real face, j is set to 1; when the i-th sample image is a false face, j is set to 0.
Then, a first loss corresponding to a certain sample image is generated by using the obtained probability and a preset category to which the sample image belongs, specifically, the first loss is as follows:
$$L_{PD}^{i}=-\log\frac{e^{\,s\left(\cos\theta_{y_i}-m\right)}}{e^{\,s\left(\cos\theta_{y_i}-m\right)}+e^{\,s\cos\theta_{1-y_i}}}$$
where L_PD^i represents the first loss corresponding to the i-th sample image, s and m represent preset hyper-parameters, and y_i represents the preset category corresponding to the i-th sample image; for example, y_i is set to 1 when the i-th sample image is a real face, and to 0 when it is a false face.
Training with the first losses L_PD corresponding to all sample images yields K trained prototypes under each preset category, and a trained feature extractor is obtained at the same time. The feature extractor is used in practical application to extract the image features of an image to be recognized.
As can be seen from the above description, K prototypes are set under each preset category. To improve the training precision of the prototypes, a prototype constraint L_PC can be set as shown in FIG. 2; the prototype constraints may specifically include an intra-class constraint L_PC^intra and an inter-class constraint L_PC^inter.
For the intra-class constraint, in order to ensure that different prototypes in the same preset category represent different data attributes, it may be specified that the similarity between different prototypes in the same preset category is greater than a preset first similarity threshold. In a specific implementation, the second loss may be established to complete the intra-class constraint using the following steps:
and determining the similarity between any two initially initialized prototypes in each preset category. And then generating a second loss based on the similarity corresponding to each preset category and the first similarity threshold. The similarity between any two of the initially initialized prototypes may be determined specifically according to an inner product calculation result between any two of the initially initialized prototypes.
The second loss equation may be as follows:
$$L_{PC}^{intra}=\sum_{j}\sum_{r\neq t}\max\!\left(0,\;\delta_2-P_j^r\cdot P_j^t\right)$$
where L_PC^intra represents the second loss, δ_2 represents the first similarity threshold, and r, t represent the order of the prototypes within preset category j.
For the inter-class constraint, in order to ensure that the similarity between different prototypes in the same preset category is smaller than the similarity between prototypes of different preset categories, it may be specified that the minimum similarity between prototypes of different preset categories, minus the maximum similarity between different prototypes in the same preset category, is greater than a second similarity threshold. In a specific implementation, the third loss may be established to complete the inter-class constraint using the following steps:
for each preset category, screening the maximum similarity from the similarities between any two initialized prototypes under the preset category; determining the minimum similarity between any two initialized prototypes under different preset categories; generating a third loss based on the maximum similarity, the minimum similarity, and the second similarity threshold. The maximum similarity and the minimum similarity may be determined according to an inner product calculation result between two initialized prototypes.
The third loss equation may be as follows:
$$L_{PC}^{inter}=\max\!\left(0,\;\delta_1-\min_{r_1,\,r_1'}\left(P_1^{r_1}\cdot P_0^{r_1'}\right)+\max_{j,\;r_2\neq r_2'}\left(P_j^{r_2}\cdot P_j^{r_2'}\right)\right)$$
where L_PC^inter represents the third loss, δ_1 represents the second similarity threshold, j represents a preset category, and r_1, r_2, r_1', r_2' represent the order of the prototypes.
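The following sketch writes the two prototype constraints as hinge losses that encode the relations stated above literally; because the original formula images are not reproduced here, the exact hinge forms are assumptions.

```python
import torch

def prototype_constraints(real_protos: torch.Tensor,
                          fake_protos: torch.Tensor,
                          delta1: float, delta2: float):
    """real_protos, fake_protos: (K, N) L2-normalized prototypes of the two
    preset categories. delta2: first similarity threshold (intra-class);
    delta1: second similarity threshold (inter-class)."""
    def off_diagonal_sims(p):
        sims = p @ p.t()                                  # (K, K) inner products
        mask = ~torch.eye(p.shape[0], dtype=torch.bool)
        return sims[mask]                                 # distinct prototype pairs

    intra_real = off_diagonal_sims(real_protos)
    intra_fake = off_diagonal_sims(fake_protos)
    # Second loss: penalizes same-category prototype pairs whose similarity
    # falls below delta2, per the relation stated above.
    l_intra = (torch.relu(delta2 - intra_real).sum()
               + torch.relu(delta2 - intra_fake).sum())
    # Third loss: minimum cross-category similarity minus maximum
    # same-category similarity, against the threshold delta1.
    min_inter = (real_protos @ fake_protos.t()).min()
    max_intra = torch.max(intra_real.max(), intra_fake.max())
    l_inter = torch.relu(delta1 - (min_inter - max_intra))
    return l_intra, l_inter
```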
After the above three losses are obtained, training can be performed using the first loss, the second loss, and the third loss, resulting in a more optimal prototype and feature extractor.
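A hypothetical training step combining the three losses might then look as follows, reusing the sketches above; the equal weighting of the losses, the optimizer, and the threshold values are assumptions, as the text does not specify how the losses are combined.

```python
import torch
import torch.nn.functional as F

# Reusing FeatureExtractor, first_loss and prototype_constraints from the
# sketches above; 100 initial prototypes per preset category as in stage three.
extractor = FeatureExtractor()
real_protos = torch.nn.Parameter(torch.randn(100, 512))
fake_protos = torch.nn.Parameter(torch.randn(100, 512))
optimizer = torch.optim.SGD(
    [*extractor.parameters(), real_protos, fake_protos], lr=0.01)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    feats = extractor(images)                    # L2-normalized sample features
    p_real = F.normalize(real_protos, dim=1)     # L2 norm normalization
    p_fake = F.normalize(fake_protos, dim=1)
    l_pd = first_loss(feats, p_real, p_fake, labels)
    l_intra, l_inter = prototype_constraints(p_real, p_fake,
                                             delta1=0.1, delta2=0.5)
    loss = l_pd + l_intra + l_inter              # assumed equal weighting
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```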
Because the data involved in face living body detection spans a wide range of dimensions, the attack types are various, and the scenes are complex, a traditional classifier can hardly obtain a robust solution. Through deep learning, this method sets a plurality of prototypes corresponding to different dimensions under each preset category and uses the prototype-learning approach to increase the dimensionality of the original fully connected layer, converting the problem into a multi-classification problem so as to increase the learning difficulty of the face living body detection model, thereby improving the model's adaptability to complex data and its robustness.
Stage three, the prototype adaptive selection stage.
The number of prototypes has a great influence on the expressive ability of the face living body detection model: different scenes have different data complexity, and the data distributions of different preset categories differ greatly, which places high requirements on the number of prototypes. Therefore, after the trained prototypes in the practical application scene are obtained, they can be screened to obtain the final target prototypes.
In stage two, more prototypes can be obtained under different preset categories, for example, 100 trained prototypes are obtained for each preset category.
Illustratively, the target prototype may be selected using the following steps:
step one, extracting the image characteristics of the sample image by using a characteristic extractor trained in the stage two to obtain target characteristics; and according to the preset category to which the sample image belongs, dividing the extracted target features into feature sets corresponding to the preset category, for example, the sample image includes a sample image corresponding to a real face and a sample image corresponding to a false face, taking all target features extracted from the sample image corresponding to the real face as a feature set corresponding to the preset category of the real face, and taking all target features extracted from the sample image corresponding to the false face as a feature set corresponding to the preset category of the false face.
Meanwhile, the prototypes trained in the second stage are divided into a plurality of prototype sets according to the preset categories to which the prototypes belong, for example, the prototype corresponding to the preset category of the real face is used as one prototype set, and the prototype corresponding to the preset category of the artificial face is used as one prototype set.
Because the number of the sample images is large, a certain number of sample images can be randomly extracted to extract the image characteristics.
After the target feature is obtained, the L2 norm normalization processing is performed on the target feature.
In this step, the target feature is extracted from the sample image corresponding to the actual application scene, and in the actual application, the target feature is not limited to the sample image used for training a prototype, and may be another image captured in the actual application scene, for example, as long as the target feature is identical to or similar to the scene of the sample image. As shown in fig. 2, the target feature is extracted from other images that are the same as or similar to the scene of the sample image.
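A sketch of step one under stated assumptions (numpy arrays, an extractor callable that returns one feature vector per image, and illustrative names such as build_feature_sets) might look as follows:

import random
import numpy as np

def build_feature_sets(samples: dict, extractor, n_per_category: int = 1000) -> dict:
    # samples maps a preset category (e.g. 'real', 'fake') to a list of images
    feature_sets = {}
    for category, images in samples.items():
        # randomly draw a subset, since the full sample set may be large
        subset = random.sample(images, min(n_per_category, len(images)))
        feats = np.stack([extractor(img) for img in subset])
        feats /= np.linalg.norm(feats, axis=1, keepdims=True)   # L2-norm normalization
        feature_sets[category] = feats
    return feature_sets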
Step two: perform L2-norm normalization on the prototypes in each prototype set. Then, for any prototype set, the target prototypes may be selected according to the following sub-steps:
Sub-step one: acquire the feature set matching the prototype set based on the preset category corresponding to the prototype set; for example, when the preset category corresponding to the prototype set is the real face, acquire the feature set corresponding to the real face.
Sub-step two: for a prototype, calculate the similarity between the prototype and each target feature in the feature set acquired in sub-step one to obtain a plurality of similarities, count the number of similarities greater than a preset similarity threshold, and take that number as the density corresponding to the prototype. Repeat this sub-step until the density corresponding to every prototype in the prototype set is obtained. The similarity in this sub-step may be determined from the result of the inner product of the prototype and the target feature. The similarity threshold is set per preset category, and the thresholds corresponding to different preset categories may differ.
Sub-step three: select the prototype with the maximum density as a target prototype corresponding to the prototype set; remove that prototype from the prototype set, and remove from the feature set the target features whose similarity with the target prototype is greater than the similarity threshold.
As shown in FIG. 2, when the prototype P_L^K is determined through the density calculation, the prototype P_L^K is removed from the prototype set and is simultaneously taken as a target prototype.
Following sub-steps two and three, target prototypes continue to be selected, prototypes continue to be removed from the prototype set, and target features continue to be removed from the feature set, until the maximum density among the prototypes in the prototype set is zero or the prototype set is empty, at which point the selection of target prototypes from that prototype set stops. All target prototypes corresponding to the prototype set are thus obtained.
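The selection loop of sub-steps two and three may be sketched as follows for a single prototype set; the numpy formulation and all names are illustrative assumptions.

import numpy as np

def select_target_prototypes(prototypes: np.ndarray, features: np.ndarray,
                             sim_threshold: float) -> np.ndarray:
    # prototypes: (K, D) and features: (N, D), both L2-normalized and
    # belonging to the same preset category
    prototypes = prototypes.copy()
    features = features.copy()
    targets = []
    while len(prototypes) > 0:
        sims = prototypes @ features.T                  # (K, N) inner-product similarities
        densities = (sims > sim_threshold).sum(axis=1)  # target features covered per prototype
        best = densities.argmax()
        if densities[best] == 0:                        # remaining prototypes are redundant
            break
        targets.append(prototypes[best])                # keep the densest prototype
        covered = sims[best] > sim_threshold
        features = features[~covered]                   # drop the target features it covers
        prototypes = np.delete(prototypes, best, axis=0)
    return np.stack(targets) if targets else np.empty((0, prototypes.shape[1]))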
Step two is performed on each prototype set to obtain the target prototypes corresponding to each prototype set. Since the prototype sets are divided according to the preset categories, obtaining the target prototypes of each prototype set is equivalent to obtaining the target prototypes of each preset category. At this point, a plurality of target prototypes in a plurality of dimensions have been determined for each preset category.
The target prototypes selected in this stage are the necessary prototypes of the face living body detection model; the remaining prototypes are determined to be redundant and may be discarded.
The number of prototypes greatly affects the performance of the face living body detection model. For different scenes, influenced by data complexity, the model needs different numbers of prototypes, and the numbers of target prototypes corresponding to different preset categories may be the same or different. Selecting the prototypes adaptively improves the universality of the face living body detection model across different scenes.
Stage four, the prototype training stage in a new scene.
In practical applications, as attack means and scenes change, the face living body detection model must continually adapt to data of new dimensions. In the prior art, a large number of new training samples are generally used to retrain the face living body detection model so as to obtain higher detection precision. However, this approach not only increases the labeling cost of model training, but also reduces the training efficiency of the model and thereby affects the efficiency of face living body detection.
To address these defects, a new prototype is trained using a small number of newly added training samples, and face living body detection is performed by combining the new prototype with the target prototypes obtained in the previous stage.
Illustratively, the above new prototype may be trained using the following steps:
acquiring a plurality of sample images captured in the new scene and respectively corresponding to each preset category; and, for each preset category, extracting image features from the plurality of sample images corresponding to the preset category, calculating the mean of the extracted image features, and taking the obtained mean as the new prototype corresponding to the preset category. Each preset category thus obtains a new prototype adapted to the new scene. For example, suppose the target prototypes for the actual application scene obtained in stage three include prototypes corresponding to two face forgery modes, "paper-cut forgery" and "screen forgery", and faces forged by "mask forgery" appear in the new scene; such faces cannot be accurately recognized using the target prototypes alone. To solve this problem, the new prototype may be generated using sample images that include faces produced by "mask forgery". The new prototype is then added to the target prototypes, and using the prototypes of the two scenes together improves the accuracy of recognizing faces forged by "mask forgery". Here, "paper-cut forgery" may be a printed face image or a photograph including a face image; "screen forgery" may be a face image displayed on a device's display interface, for example a face image stored in an album that is accessed on a mobile phone and shown on the phone's screen; and "mask forgery" may mean that a face image of one person is worn as a mask on the face of another person.
In this stage, the image features may be extracted using the feature extractor trained in stage two.
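A sketch of the new-prototype computation follows, assuming numpy; re-normalizing the mean feature is this sketch's assumption, made so that inner-product similarities remain comparable with the existing prototypes, and is not stated above.

import numpy as np

def new_prototype(new_scene_images, extractor) -> np.ndarray:
    # extract and L2-normalize features from the small set of new-scene samples
    feats = np.stack([extractor(img) for img in new_scene_images])
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    proto = feats.mean(axis=0)            # the mean feature serves as the new prototype
    return proto / np.linalg.norm(proto)  # assumption: re-normalize the mean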
Performing face living body detection with the new prototype obtained in this stage combined with the target prototypes obtained in stage three effectively improves the detection precision of the face living body detection model on data of the new dimension while preserving its precision on the original data. It also reduces labeling cost, reduces the training complexity of the model, and improves training efficiency, allowing the model to adapt quickly to newly added data whose scene differs greatly from the practical application. This greatly improves the universality of the face living body detection model, so that it can be applied to different scenes.
Through the above four stages, the prototypes used for face living body detection and corresponding to each preset category are determined. These comprise the target prototypes obtained in stage three and the new prototypes obtained in stage four, which together serve as the target application prototypes.
In application, face living body detection may be performed using the following steps:
extracting image features from the image to be recognized using the trained feature extractor, and calculating the similarity between the extracted image features and each target application prototype under each preset category;
for each preset category, performing weighted summation on the plurality of similarities corresponding to the preset category to obtain the probability that the image to be recognized belongs to the preset category, the preset categories including real face and false face;
and determining the target category of the target object based on the probability corresponding to each preset category, thereby obtaining the detection result of the face living body detection of the target object.
Illustratively, the preset category with the higher probability is taken as the target category of the target object in the image to be recognized; for example, where the probability corresponding to a real face is 70% and the probability corresponding to a false face is 30%, the target category of the target object is real face.
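A sketch of this application-stage decision follows; uniform weights in the weighted summation and a softmax turning per-category scores into probabilities are both assumptions of this sketch.

import numpy as np

def detect(image_feature: np.ndarray, app_prototypes: dict):
    # image_feature: L2-normalized (D,) vector; app_prototypes maps each preset
    # category ('real'/'fake') to a (K, D) array of target application prototypes
    scores = {}
    for category, protos in app_prototypes.items():
        sims = protos @ image_feature       # similarity to each prototype of the category
        scores[category] = sims.mean()      # weighted summation with uniform weights
    vals = np.array(list(scores.values()))
    probs = np.exp(vals) / np.exp(vals).sum()   # softmax over the category scores
    result = dict(zip(scores.keys(), probs))
    return max(result, key=result.get), result  # target category and its probability map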
It will be understood by those skilled in the art that, in the method of the present invention, the order in which the steps are written does not imply a strict execution order or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, an embodiment of the present disclosure further provides a living body detection apparatus corresponding to the living body detection method. Since the principle by which the apparatus solves the problem is similar to that of the living body detection method in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 3, a living body detection apparatus provided by an embodiment of the present disclosure may include:
the feature extraction module 301 is configured to obtain an image feature corresponding to an image to be recognized based on the image to be recognized including a target object.
The prototype obtaining module 302 is configured to obtain a plurality of prototypes, where the plurality of prototypes include a prototype corresponding to each of at least two preset categories.
A detecting module 303, configured to perform living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes, so as to obtain a living body detection result.
In some embodiments, the detecting module 303, when performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result, is configured to:
respectively determining first similarity information between the image features and each prototype in each preset category;
and screening a target class to which the target object belongs from the plurality of preset classes based on the determined first similarity information, and taking the target class as the living body detection result.
In some embodiments, the plurality of prototypes includes a first prototype trained using a first training sample in a first scenario, and a second prototype trained using a second training sample in a second scenario;
the image to be recognized is an image shot in the first scene.
In some embodiments, the first prototype includes a prototype corresponding to each preset class, and the second prototype includes a plurality of prototypes on a plurality of data dimensions corresponding to each preset class.
In some embodiments, the above living body detection apparatus further includes a prototype training module 304, and the prototype training module 304 is configured to determine the first prototype by:
acquiring a plurality of first sample images which are shot in the first scene and respectively correspond to each preset category;
and aiming at each preset category, respectively extracting image features from a plurality of first sample images corresponding to the preset category by using a feature extraction network, and determining a first prototype corresponding to the preset category based on the extracted image features.
In some embodiments, the prototype training module 304, when determining the first prototype corresponding to the preset category based on the extracted image features, is configured to:
and taking the average value of the extracted image features as a first prototype corresponding to the preset category.
In some embodiments, the prototype training module 304 is further configured to determine the second prototype by:
acquiring a second sample image corresponding to each preset category and a plurality of initial prototypes on a plurality of data dimensions corresponding to each preset category, wherein the second sample image is shot in the second scene;
extracting image features in the second sample image by using a feature extraction network to be trained to obtain sample features;
determining second similarity information between the sample features and each initial prototype;
and determining a second prototype corresponding to each initial prototype based on the obtained second similarity information.
In some embodiments, the prototype training module 304, when determining the second prototype corresponding to each initial prototype based on the obtained second similarity information, is configured to:
for each preset category, determining, based on a plurality of pieces of second similarity information corresponding to the preset category, probability information that the category of the second sample image is the preset category;
generating a first loss based on the probability information corresponding to each preset category;
and determining a second prototype corresponding to each initial prototype based on the first loss corresponding to each second sample image.
In some embodiments, the prototype training module 304, when determining the second prototype corresponding to each initial prototype based on the first loss corresponding to each second sample image, is configured to:
for each preset category, determining third similarity information between every two initial prototypes in the preset category;
generating a second loss based on the third similarity information corresponding to each preset category and the first similarity threshold;
and determining a second prototype corresponding to each initial prototype based on the first loss and the second loss.
In some embodiments, the prototype training module 304, when determining the second prototype corresponding to each initial prototype based on the first loss and the second loss, is configured to:
for each preset category, screening the maximum third similarity information from the third similarity information corresponding to the preset category;
determining minimum similarity information between initial prototypes of different preset categories;
generating a third loss based on the maximum third similarity information, the minimum similarity information, and the second similarity threshold;
and determining a second prototype corresponding to each initial prototype based on the first loss, the second loss and the third loss.
In some embodiments, the living body detection apparatus further includes a network training module 305, and the network training module 305 is configured to:
and training the feature extraction network to be trained by utilizing the first loss, the second loss and the third loss to obtain the trained feature extraction network.
In some embodiments, the prototype training module 304, after determining the corresponding second prototype for each initial prototype, is further configured to:
extracting image features in the second sample image by using the trained feature extraction network to obtain target features;
for each preset category, screening target prototypes from second prototypes corresponding to the preset category based on the target features of the second sample images corresponding to the preset category;
the detecting module 303, when performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result, is configured to:
and performing living body detection on the target object in the image to be recognized based on the image features, the first prototype and the target prototype to obtain a living body detection result.
In some embodiments, the prototype training module 304, when screening target prototypes from the second prototypes based on the target features of the second sample images, is configured to:
for each preset category, respectively determining fourth similarity information between each second prototype corresponding to the preset category and each target feature corresponding to the preset category;
respectively determining density information corresponding to each second prototype based on the fourth similarity information and a third similarity threshold corresponding to the preset category;
and screening the target prototype corresponding to the preset category from the second prototype corresponding to the preset category based on the density information corresponding to each second prototype.
In some embodiments, the prototype training module 304, when screening the target prototype from the second prototypes based on the density information corresponding to each of the second prototypes, is configured to:
for each preset category, taking a second prototype corresponding to the maximum density information in the preset category as a target prototype corresponding to the preset category, and removing the second prototype corresponding to the maximum density information from the second prototype corresponding to the preset category;
removing target features of which fourth similarity information with the target prototype is larger than the third similarity threshold;
and returning to the step of respectively determining fourth similarity information between each second prototype corresponding to each preset category and each target feature corresponding to each preset category.
In some embodiments, the prototype training module 304, when filtering the target prototype corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each second prototype, is configured to:
and under the condition that the maximum density information corresponding to the preset category is larger than zero, screening the target prototype corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each second prototype.
In some embodiments, the prototype training module 304, when filtering the target prototype corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each second prototype, is configured to:
and under the condition that the number of the second prototypes corresponding to the preset category is greater than zero, screening the target prototypes corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each second prototype.
Based on the same technical concept, an embodiment of the present disclosure further provides a computer device. Referring to fig. 4, a schematic structural diagram of a computer device 400 provided by an embodiment of the present disclosure includes a processor 41, a memory 42, and a bus 43. The memory 42 is used for storing execution instructions and includes an internal memory 421 and an external memory 422. The internal memory 421 temporarily stores operation data for the processor 41 as well as data exchanged with the external memory 422 such as a hard disk, and the processor 41 exchanges data with the external memory 422 through the internal memory 421. When the computer device 400 runs, the processor 41 communicates with the memory 42 through the bus 43, causing the processor 41 to execute the following instructions:
obtaining image characteristics corresponding to an image to be recognized based on the image to be recognized comprising a target object; obtaining a plurality of prototypes, wherein the prototypes comprise prototypes corresponding to each of at least two preset categories; and performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result.
The disclosed embodiments also provide a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the living body detection method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the living body detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the steps of the living body detection method described in the above method embodiments; reference may be made to the above method embodiments, and details are not repeated here.
The embodiments of the present disclosure also provide a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the system and apparatus described above, reference may be made to the corresponding processes in the foregoing method embodiments, and they are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only one logical division, and other divisions are possible in actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-described embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or equivalently replace some of their technical features; such modifications, changes, or replacements do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present disclosure and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (19)

1. A living body detection method, comprising:
obtaining image characteristics corresponding to an image to be recognized based on the image to be recognized comprising a target object;
obtaining a plurality of prototypes, wherein the prototypes comprise prototypes corresponding to each of at least two preset categories;
and performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result.
2. The living body detection method according to claim 1, wherein the performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result comprises:
respectively determining first similarity information between the image features and each prototype in each preset category;
and screening a target class to which the target object belongs from the plurality of preset classes based on the determined first similarity information, and taking the target class as the living body detection result.
3. The living body detection method according to claim 1 or 2, wherein the plurality of prototypes comprises a first prototype trained using a first training sample in a first scene, and a second prototype trained using a second training sample in a second scene;
the image to be recognized is an image shot in the first scene.
4. The living body detection method according to claim 3, wherein the first prototype comprises a prototype corresponding to each preset category, and the second prototype comprises a plurality of prototypes in a plurality of data dimensions corresponding to each preset category.
5. The living body detection method according to claim 3 or 4, further comprising the step of determining the first prototype:
acquiring a plurality of first sample images which are shot in the first scene and respectively correspond to each preset category;
and aiming at each preset category, respectively extracting image features from a plurality of first sample images corresponding to the preset category by using a feature extraction network, and determining a first prototype corresponding to the preset category based on the extracted image features.
6. The living body detection method according to claim 5, wherein the determining the first prototype corresponding to the preset category based on the extracted image features comprises:
and taking the average value of the extracted image features as a first prototype corresponding to the preset category.
7. The living body detection method according to any one of claims 3 to 6, further comprising the step of determining the second prototype:
acquiring a second sample image corresponding to each preset category and a plurality of initial prototypes on a plurality of data dimensions corresponding to each preset category, wherein the second sample image is shot in the second scene;
extracting image features in the second sample image by using a feature extraction network to be trained to obtain sample features;
determining second similarity information between the sample features and each initial prototype;
and determining a second prototype corresponding to each initial prototype based on the obtained second similarity information.
8. The living body detection method according to claim 7, wherein the determining a second prototype corresponding to each initial prototype based on the obtained second similarity information comprises:
for each preset category, determining, based on a plurality of pieces of second similarity information corresponding to the preset category, probability information that the category of the second sample image is the preset category;
generating a first loss based on the probability information corresponding to each preset category;
and determining a second prototype corresponding to each initial prototype based on the first loss corresponding to each second sample image.
9. The living body detection method according to claim 8, wherein the determining a second prototype corresponding to each initial prototype based on the first loss corresponding to each second sample image comprises:
for each preset category, determining third similarity information between every two initial prototypes in the preset category;
generating a second loss based on the third similarity information corresponding to each preset category and the first similarity threshold;
and determining a second prototype corresponding to each initial prototype based on the first loss and the second loss.
10. The living body detection method according to claim 9, wherein the determining a second prototype corresponding to each initial prototype based on the first loss and the second loss comprises:
for each preset category, screening the maximum third similarity information from the third similarity information corresponding to the preset category;
determining minimum similarity information between initial prototypes of different preset categories;
generating a third loss based on the maximum third similarity information, the minimum similarity information, and the second similarity threshold;
and determining a second prototype corresponding to each initial prototype based on the first loss, the second loss and the third loss.
11. The living body detection method according to claim 10, further comprising:
and training the feature extraction network to be trained by utilizing the first loss, the second loss and the third loss to obtain the trained feature extraction network.
12. The living body detection method according to any one of claims 7 to 11, further comprising, after determining the second prototype corresponding to each initial prototype:
extracting image features in the second sample image by using the trained feature extraction network to obtain target features;
for each preset category, screening target prototypes from second prototypes corresponding to the preset category based on the target features of the second sample images corresponding to the preset category;
the performing living body detection on the target object in the image to be recognized based on the image features and the plurality of prototypes to obtain a living body detection result includes:
and performing living body detection on the target object in the image to be recognized based on the image features, the first prototype and the target prototype to obtain a living body detection result.
13. The living body detection method according to claim 12, wherein the screening of target prototypes from the second prototypes based on the target features of the second sample images comprises:
respectively determining fourth similarity information between each second prototype corresponding to each preset type and each target feature corresponding to each preset type aiming at each preset type;
respectively determining density information corresponding to each second prototype based on the fourth similarity information and a third similarity threshold corresponding to the preset category;
and screening the target prototype corresponding to the preset category from the second prototype corresponding to the preset category based on the density information corresponding to each second prototype.
14. The living body detection method according to claim 13, wherein the screening of the target prototype from the second prototypes based on the density information corresponding to each of the second prototypes comprises:
for each preset category, taking a second prototype corresponding to the maximum density information in the preset category as a target prototype corresponding to the preset category, and removing the second prototype corresponding to the maximum density information from the second prototype corresponding to the preset category;
removing target features of which fourth similarity information with the target prototype is larger than the third similarity threshold;
and returning to the step of respectively determining fourth similarity information between each second prototype corresponding to each preset category and each target feature corresponding to each preset category.
15. The living body detection method according to claim 13 or 14, wherein the screening of the target prototype corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each of the second prototypes comprises:
and under the condition that the maximum density information corresponding to the preset category is larger than zero, screening the target prototype corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each second prototype.
16. The living body detection method according to any one of claims 13 to 15, wherein the screening of the target prototype corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each of the second prototypes further comprises:
and under the condition that the number of the second prototypes corresponding to the preset category is greater than zero, screening the target prototypes corresponding to the preset category from the second prototypes corresponding to the preset category based on the density information corresponding to each second prototype.
17. A living body detection device, comprising:
the characteristic extraction module is used for obtaining image characteristics corresponding to an image to be recognized based on the image to be recognized comprising a target object;
the prototype acquiring module is used for acquiring a plurality of prototypes, and the prototypes comprise prototypes corresponding to each of at least two preset categories;
and the detection module is used for carrying out living body detection on the target object in the image to be recognized based on the image characteristics and the plurality of prototypes to obtain a living body detection result.
18. A computer device, comprising: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor communicating with the memory over the bus when the computer device runs, the machine-readable instructions, when executed by the processor, performing the steps of the living body detection method according to any one of claims 1 to 16.
19. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, performs the steps of the living body detection method according to any one of claims 1 to 16.
CN202110075262.4A 2021-01-20 2021-01-20 Living body detection method, living body detection device, electronic equipment and computer readable storage medium Active CN112766162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110075262.4A CN112766162B (en) 2021-01-20 2021-01-20 Living body detection method, living body detection device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110075262.4A CN112766162B (en) 2021-01-20 2021-01-20 Living body detection method, living body detection device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112766162A 2021-05-07
CN112766162B 2023-12-22

Family

ID=75703534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110075262.4A Active CN112766162B (en) 2021-01-20 2021-01-20 Living body detection method, living body detection device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112766162B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354554A (en) * 2015-11-12 2016-02-24 西安电子科技大学 Color and singular value feature-based face in-vivo detection method
US20200005061A1 (en) * 2018-06-28 2020-01-02 Beijing Kuangshi Technology Co., Ltd. Living body detection method and system, computer-readable storage medium
CN109858381A (en) * 2019-01-04 2019-06-07 深圳壹账通智能科技有限公司 Biopsy method, device, computer equipment and storage medium
WO2020155939A1 (en) * 2019-01-31 2020-08-06 广州视源电子科技股份有限公司 Image recognition method and device, storage medium and processor
CN110490076A (en) * 2019-07-18 2019-11-22 平安科技(深圳)有限公司 Biopsy method, device, computer equipment and storage medium
CN111091080A (en) * 2019-12-06 2020-05-01 贵州电网有限责任公司 Face recognition method and system
CN111178341A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Living body detection method, device and equipment
CN111597938A (en) * 2020-05-07 2020-08-28 马上消费金融股份有限公司 Living body detection and model training method and device
CN111368811A (en) * 2020-05-26 2020-07-03 腾讯科技(深圳)有限公司 Living body detection method, living body detection device, living body detection equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JONGWOO SEO et al.: "Face Liveness Detection Using Thermal Face-CNN with External Knowledge", SYMMETRY, pages 1-17 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743220A (en) * 2021-08-04 2021-12-03 深圳商周智联科技有限公司 Biological characteristic in-vivo detection method and device and computer equipment
CN113743220B (en) * 2021-08-04 2024-06-04 深圳商周智联科技有限公司 Biological feature living body detection method and device and computer equipment
CN113793325A (en) * 2021-09-22 2021-12-14 北京市商汤科技开发有限公司 Detection method, detection device, computer equipment and storage medium
WO2023045350A1 (en) * 2021-09-22 2023-03-30 上海商汤智能科技有限公司 Detection method and apparatus, computer device, storage medium, and program product
CN113793325B (en) * 2021-09-22 2024-05-24 北京市商汤科技开发有限公司 Detection method, detection device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112766162B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN111488756B (en) Face recognition-based living body detection method, electronic device, and storage medium
Bayar et al. Towards open set camera model identification using a deep learning framework
Darlow et al. Fingerprint minutiae extraction using deep learning
CN109657554B (en) Image identification method and device based on micro expression and related equipment
Silva et al. An approach to iris contact lens detection based on deep image representations
Gomez-Barrero et al. Is your biometric system robust to morphing attacks?
JP6969663B2 (en) Devices and methods for identifying the user's imaging device
CN111445459A (en) Image defect detection method and system based on depth twin network
McGrath et al. Open source presentation attack detection baseline for iris recognition
CN112766162B (en) Living body detection method, living body detection device, electronic equipment and computer readable storage medium
CN112949464B (en) Face changing counterfeiting detection method, system and equipment based on three-dimensional shape of human face
CN112784741A (en) Pet identity recognition method and device and nonvolatile storage medium
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
CN105844204B (en) Human behavior recognition method and device
Saeed A framework for recognition of facial expression using HOG features
Nikitin et al. Face anti-spoofing with joint spoofing medium detection and eye blinking analysis
Bresan et al. Facespoof buster: a presentation attack detector based on intrinsic image properties and deep learning
CN113807237B (en) Training of in vivo detection model, in vivo detection method, computer device, and medium
Jingade et al. DOG-ADTCP: A new feature descriptor for protection of face identification system
CN117636421A (en) Face deep pseudo detection method based on edge feature acquisition
JP2015162012A (en) Face matching device, face matching method and program
CN110969095A (en) Method and device for analyzing identity information of carrier pigeon
Scherhag Face morphing and morphing attack detection
Zhu et al. Exploiting gaussian mixture importance for person re-identification
CN114821722A (en) Improved face recognition system and method based on Mahalanobis distance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant