CN115830721B - Living body detection method, living body detection device, terminal device and readable storage medium - Google Patents

Living body detection method, living body detection device, terminal device and readable storage medium

Info

Publication number
CN115830721B
CN115830721B (application CN202211361753.6A)
Authority
CN
China
Prior art keywords
image
language
detected
face
living body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211361753.6A
Other languages
Chinese (zh)
Other versions
CN115830721A (en)
Inventor
肖良才
胡文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN ELOAM TECHNOLOGY CO LTD
Original Assignee
SHENZHEN ELOAM TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN ELOAM TECHNOLOGY CO LTD filed Critical SHENZHEN ELOAM TECHNOLOGY CO LTD
Priority to CN202211361753.6A priority Critical patent/CN115830721B/en
Publication of CN115830721A publication Critical patent/CN115830721A/en
Application granted granted Critical
Publication of CN115830721B publication Critical patent/CN115830721B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the invention provides a living body detection method, a living body detection device, a terminal device, and a readable storage medium. An image to be detected is acquired; a plurality of language labels are set for the image using language descriptions; key areas and key point information in the image are determined, and the key areas are cropped and corrected to obtain a corrected image; the language labels are input into a language coding model to obtain language supervision features; the corrected image is input into an image coding model to obtain the image features of the image to be detected; the similarity between the language supervision features and the image features is then calculated, and whether the image is a living body image is judged from the similarity value. Living body training is supervised by language descriptions, i.e., detailed labels, so that as prosthesis types increase, changes in the data do not affect the model at all; this increases the extensibility of the system and improves the accuracy of living body detection.

Description

Living body detection method, living body detection device, terminal device and readable storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a living body detection method, apparatus, terminal device, and readable storage medium.
Background
With the adoption of deep learning, face recognition systems have appeared in numerous products, and liveness detection of faces has consequently become one of the intensively researched subjects in computer vision. The system must judge whether a recognized face is a living body or a prosthesis, to prevent criminals from impersonating others with stolen identity information.
Existing living body detection models are mainly trained with supervision, using one-hot category labels. Such labels are single in form and carry little information; owing to the complexity and uncertainty of prosthesis types in prosthesis data, existing detection methods cannot detect living bodies accurately.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are presented to provide a living body detection method, apparatus, terminal device, and readable storage medium that overcome or at least partially solve the above problems.
In a first aspect, an embodiment of the present invention provides a living body detection method, including:
Acquiring an image to be detected;
setting a plurality of language labels for the image to be detected by adopting a language description mode;
Determining key areas and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key areas to obtain corrected images;
Inputting the language label into a language coding model to obtain a language supervision feature;
inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
And calculating the similarity of the language supervision feature and the image feature, and judging whether the image to be detected is a living body image according to the similarity value.
Optionally, the setting a plurality of language labels for the image to be detected by adopting a language description mode includes:
And marking the living body data in the image to be detected with language descriptions as data annotations to obtain a plurality of language labels, wherein the language labels at least comprise the data type, the data type is at least living body or prosthesis, and a prosthesis label comprises one or more of prosthesis material, accessory, or illumination descriptors.
Optionally, determining a key area and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key area to obtain a corrected image, including:
Inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the detection image according to the face image;
Performing image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
And mapping the face region to a pre-established face standard model according to the face key point information, and carrying out alignment calibration on the face region in the image to be detected and a standard face image to obtain a corrected image.
Optionally, the method further comprises:
And performing rotation, translation, tilt, scaling, and color-jitter operations on the corrected image to obtain processed images, which are added to an image data set.
Optionally, the inputting the corrected image into an image coding model to obtain an image feature of the image to be detected includes:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vector in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
Optionally, the inputting the language tag into a language coding model to obtain a language supervision feature includes:
Converting the language label into BPE (byte-pair encoding) form;
Inputting the BPE-encoded text data into a pre-established language coding model, extracting the label's semantic features, and determining the label semantic features as the language supervision features, wherein the language coding model is at least a Transformer network language model.
Optionally, the calculating the similarity between the language supervision feature and the image feature includes:
calculating the similarity between the image features and the language supervision features as the cosine similarity: similarity(I, T) = (I · T) / (‖I‖ ‖T‖);
wherein I is an image feature; T is a language supervision feature.
In a second aspect, an embodiment of the present invention provides a living body detection apparatus, including:
the acquisition module is used for acquiring the image to be detected;
the setting module is used for setting a plurality of language labels for the image to be detected by adopting a language description mode;
The identification module is used for determining key areas and key point information in the image to be detected according to the image to be detected, and carrying out image cutting and correction processing on the key areas to obtain corrected images;
The first coding module is used for inputting the language tag into a language coding model to obtain language supervision characteristics;
The second coding module is used for inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
the detection module is used for calculating the similarity between the language supervision feature and the image feature and judging whether the image to be detected is a living body image or not according to the similarity value.
Optionally, the setting module is configured to:
And marking the living body data in the image to be detected with language descriptions as data annotations to obtain a plurality of language labels, wherein the language labels at least comprise the data type, the data type is at least living body or prosthesis, and a prosthesis label comprises one or more of prosthesis material, accessory, or illumination descriptors.
Optionally, the identification module is configured to:
Inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the detection image according to the face image;
Performing image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
And mapping the face region to a pre-established face standard model according to the face key point information, and carrying out alignment calibration on the face region in the image to be detected and a standard face image to obtain a corrected image.
Optionally, the identification module is further configured to:
And performing rotation, translation, tilt, scaling, and color-jitter operations on the corrected image to obtain processed images, which are added to an image data set.
Optionally, the first encoding module is configured to:
Converting the language label into BPE (byte-pair encoding) form;
Inputting the BPE-encoded text data into a pre-established language coding model, extracting the label's semantic features, and determining the label semantic features as the language supervision features, wherein the language coding model is at least a Transformer network language model.
Optionally, the second encoding module is configured to:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vector in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
Optionally, the detection module is configured to:
calculating the similarity between the image features and the language supervision features as the cosine similarity: similarity(I, T) = (I · T) / (‖I‖ ‖T‖);
wherein I is an image feature; T is a language supervision feature.
In a third aspect, an embodiment of the present invention provides a terminal device, including: at least one processor and memory;
The memory stores a computer program; the at least one processor executes the computer program stored by the memory to implement the living body detection method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium having stored therein a computer program which, when executed, implements the living body detection method provided in the first aspect.
The embodiment of the invention has the following advantages:
With the living body detection method, living body detection device, terminal device, and readable storage medium provided by the embodiments of the invention, an image to be detected is acquired; a plurality of language labels are set for the image using language descriptions; key areas and key point information in the image are determined, and the key areas are cropped and corrected to obtain a corrected image; the language labels are input into the language coding model to obtain language supervision features; the corrected image is input into the image coding model to obtain the image features of the image to be detected; and the similarity between the language supervision features and the image features is calculated, with whether the image is a living body image judged from the similarity value. Living body training is supervised by language descriptions, i.e., detailed labels, so that as prosthesis types increase, changes in the data do not affect the model at all; this increases the extensibility of the system and improves the accuracy of living body detection.
Drawings
FIG. 1 is a flow chart of the steps of an embodiment of a living body detection method of the present invention;
FIG. 2 is a flow chart of the steps of yet another embodiment of a living body detection method of the present invention;
FIG. 3 is a schematic diagram of a living body detection system embodiment of the present invention;
FIG. 4 is a block diagram showing the construction of an embodiment of a living body detecting device according to the present invention;
Fig. 5 is a schematic structural view of a terminal device of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
An embodiment of the present invention provides a living body detection method for detecting a living body in an image. The execution body of the embodiment is a living body detection device, which is disposed on a terminal device, wherein the terminal device at least includes a computer, a tablet terminal, and the like.
Referring to fig. 1, there is shown a flow chart of steps of an embodiment of a method for in vivo detection of the present invention, which may specifically include the steps of:
S101, acquiring an image to be detected;
Specifically, the terminal device acquires an image to be detected, where the image to be detected includes at least a face image and possibly a prosthesis.
S102, setting a plurality of language labels for an image to be detected by adopting a language description mode;
The terminal device uses concrete language descriptions of the data to build detailed, composite labels, such as "a photo of a {label}, type of paper, type of glass", and sets language labels for the living bodies and prostheses in the image to be detected.
When annotating the living body data, one-hot category labels are abandoned; more detailed and flexible language descriptions are adopted as data annotations, describing the data in detail and providing rich information, such as a living body/prosthesis descriptor for the data type, a prosthesis material descriptor, an accessory descriptor, an illumination descriptor, and so on.
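As an illustration, such composite labels can be generated mechanically from the descriptors. The sketch below is hypothetical: the template string, function name, and parameter slots are assumptions for illustration, not part of the patent text.

```python
# Illustrative sketch of composing descriptive language labels for liveness data.
# The template and descriptor names are assumptions, not from the patent text.
def build_language_label(data_type, material=None, accessory=None, lighting=None):
    """Compose a detailed label such as
    'a photo of a fake, type of paper, type of glass'."""
    parts = [f"a photo of a {data_type}"]
    if material:
        parts.append(f"type of {material}")   # prosthesis material descriptor
    if accessory:
        parts.append(f"type of {accessory}")  # accessory descriptor
    if lighting:
        parts.append(f"{lighting} lighting")  # illumination descriptor
    return ", ".join(parts)

labels = [
    build_language_label("face"),  # live-face label
    build_language_label("fake", material="paper", accessory="glass"),
]
```

One label string per data category can then be fed to the language encoder, and new prosthesis types only require new descriptor combinations, not a new label scheme.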
S103, determining key areas and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key areas to obtain corrected images;
S104, inputting the language label into a language coding model to obtain a language supervision feature;
Specifically, the terminal device inputs the language labels into a language Encoder model to extract the language supervision features, and inputs the calibrated face picture into an image Encoder model to extract the picture features;
The language text labels are converted into BPE form and input into the language Encoder network to extract the labels' semantic features, which serve as the language supervision features; a specific choice for the Encoder is a Transformer network language model.
S105, inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
Specifically, because the face position in the image to be detected is uncertain and the pose is unstable, the image to be detected (the original image) is first passed through a face detection model to obtain the face position, distinguishing the face area from the non-face area. The face area is cropped and sent into a key point detection model to obtain the face key points; the face area is then mapped onto a standard face model according to the key point positions for face alignment and calibration.
S106, calculating the similarity of the language supervision characteristic and the image characteristic, and judging whether the image to be detected is a living body image or not according to the similarity value.
Specifically, the terminal device calculates the similarity between the language supervision features and the image features, and judges whether the image to be detected is a living body image according to the magnitude of the similarity values, for example by selecting the label with the highest similarity as the image's type.
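A minimal sketch of this decision step, assuming cosine similarity over L2-normalized features (consistent with the normalized dot product in the pseudocode given later in the description). The feature vectors here are toy values, not real encoder outputs.

```python
import numpy as np

# Toy sketch: cosine similarity between one image feature and each
# language-label feature, then pick the most similar label.
def cosine_similarity(image_feat, text_feats):
    i = image_feat / np.linalg.norm(image_feat)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return t @ i  # one similarity score per language label

text_feats = np.array([[1.0, 0.0],   # stand-in feature for a "live" label
                       [0.0, 1.0]])  # stand-in feature for a "fake" label
image_feat = np.array([0.9, 0.1])
scores = cosine_similarity(image_feat, text_feats)
is_live = int(np.argmax(scores)) == 0  # index 0 = live-face label
```

With real encoders, `text_feats` would hold one row per language label and the argmax picks the label (and hence live/prosthesis decision) for the image.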
With the living body detection method provided by the embodiment of the invention, an image to be detected is acquired; a plurality of language labels are set for the image using language descriptions; key areas and key point information in the image are determined, and the key areas are cropped and corrected to obtain a corrected image; the language labels are input into the language coding model to obtain language supervision features; the corrected image is input into the image coding model to obtain the image features of the image to be detected; and the similarity between the language supervision features and the image features is calculated, with whether the image is a living body image judged from the similarity value. Living body training is supervised by language descriptions, i.e., detailed labels, so that as prosthesis types increase, changes in the data do not affect the model at all; this increases the extensibility of the system and improves the accuracy of living body detection.
Fig. 2 is a flowchart showing steps of still another embodiment of a living body detecting method of the present invention, which includes:
S1, when labeling the living body data, one-hot category labels are abandoned in favor of language descriptions such as "a photo of a {label}, type of paper, type of glass"; different language labels can be created according to different liveness requirements.
S2, extracting face regions and key points of the images, and performing face segmentation and calibration.
S3, inputting the language labels into a language Encoder model to extract language features, and inputting the calibrated face picture into an image Encoder model to extract picture features.
S4, calculating the similarity between the image features and the language features, and selecting the image type with the highest similarity as the image type.
As shown in fig. 3, a further embodiment of the present invention supplements the living body detection method provided in the above embodiment.
Optionally, a language description mode is adopted to set a plurality of language labels for the image to be detected, including:
The living body data in the image to be detected is marked with language descriptions as data annotations to obtain a plurality of language labels, wherein the language labels at least comprise the data type, the data type is at least living body or prosthesis, and a prosthesis label comprises one or more of prosthesis material, accessory, or illumination descriptors.
Specifically, the language supervision module discards one-hot labels, maps language descriptions into feature vectors, and adopts a metric learning scheme. An image and its own corresponding language label feature form a positive sample pair. Beyond an image's own label, living body language labels usually have inclusion relationships — for example, "prosthesis" contains "mask prosthesis", and "mask prosthesis" contains "mask prosthesis with glasses" — so an image also forms positive pairs with labels related to its own by inclusion. All other image/label combinations are negative sample pairs. The specific pseudocode is as follows:
I_f = image_encoder(I)  # image features, [n, d_i]
T_f = text_encoder(T)   # text features,  [n, d_t]
I_e = l2_normalize(np.dot(I_f, W_i), axis=1)  # project into joint space and normalize
T_e = l2_normalize(np.dot(T_f, W_t), axis=1)
logits = np.dot(I_e, T_e.T) * np.exp(t)  # pairwise similarities scaled by temperature t
labels = ...  # construct positive / negative sample-pair labels
loss_i = sigmoid_entropy_loss(logits, labels, axis=0)
loss_t = sigmoid_entropy_loss(logits, labels, axis=1)
loss = (loss_i + loss_t) / 2
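The pseudocode above can be made runnable as a toy example. `sigmoid_entropy_loss` is not specified in the description, so an element-wise sigmoid cross-entropy is assumed; the encoders, projections W_i/W_t, and temperature t are replaced with random stand-ins, and the positive pairs are placed on the diagonal (each image with its own label), ignoring inclusion relations for brevity.

```python
import numpy as np

# Runnable toy version of the metric-learning pseudocode above.
rng = np.random.default_rng(0)
n, d_i, d_t, d_e = 4, 8, 6, 5
I_f = rng.normal(size=(n, d_i))   # stand-in image features [n, d_i]
T_f = rng.normal(size=(n, d_t))   # stand-in text features  [n, d_t]
W_i = rng.normal(size=(d_i, d_e))
W_t = rng.normal(size=(d_t, d_e))
t = 0.07                          # temperature

def l2_normalize(x, axis=1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

I_e = l2_normalize(I_f @ W_i)                 # joint embedding space
T_e = l2_normalize(T_f @ W_t)
logits = I_e @ T_e.T * np.exp(t)              # [n, n] pairwise similarities

# Positive pairs on the diagonal; label-inclusion relations would add more 1s.
labels = np.eye(n)

def sigmoid_cross_entropy(logits, labels):
    # assumed stand-in for the patent's unspecified sigmoid_entropy_loss
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

loss_i = sigmoid_cross_entropy(logits, labels)      # image -> text direction
loss_t = sigmoid_cross_entropy(logits.T, labels.T)  # text -> image direction
loss = (loss_i + loss_t) / 2
```

Training would minimize `loss` over the encoder and projection parameters; here it merely demonstrates the data flow of the symmetric metric-learning objective.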
The image supervision module has no one-hot class labels and therefore cannot directly use a classifier. However, because of the inclusion relationships among the language labels, the image data can also form positive and negative sample pairs: an image and its matching language description form a positive pair, and all other combinations are negative pairs. This module likewise adopts metric learning; the specific pseudocode is as follows:
I_f = image_encoder(I)  # image features, [n, d_i]
I_e = l2_normalize(np.dot(I_f, W_i), axis=1)  # project and normalize
logits = np.dot(I_e, I_e.T) * np.exp(t)  # image-to-image similarities
labels = ...  # construct positive / negative sample-pair labels
loss_i = sigmoid_entropy_loss(logits, labels, axis=0)
loss_t = sigmoid_entropy_loss(logits, labels, axis=1)
loss = (loss_i + loss_t) / 2
Image positive and negative sample pairs differ from one-hot labels in that the language descriptions have inclusion relationships — for example, "prosthesis" includes paper prostheses, 3D mask prostheses, and so on, and after subdivision a 3D mask prosthesis includes plaster prostheses, graphite prostheses, and so on — so different positive/negative pair criteria can be established for different requirements.
In the living body detection system of the invention there is no classifier, so living body and prosthesis scores cannot be obtained directly; on the other hand, the number of categories is not fixed, which increases the model's flexibility. The invention uses language description features as class centers. Since a class can be described in multiple ways — for example, expression 1: "a photo of fake"; expression 2: "a photo of face, a type of mask" — inputting different descriptions yields different text feature vectors to serve as class centers. Features of different class centers can also be fused, for instance by taking the class-center mean, achieving a model-fusion effect and improving the model's accuracy.
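The class-center fusion described above can be sketched as follows: several phrasings of the same class give several text feature vectors, and their normalized mean serves as a fused class center. The feature values are toy stand-ins for language-encoder outputs.

```python
import numpy as np

# Sketch of class-center fusion: average the normalized text features of
# several phrasings of one class, then re-normalize the mean.
def fused_class_center(text_feats):
    feats = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    center = feats.mean(axis=0)
    return center / np.linalg.norm(center)

# Toy features for two phrasings of the "fake" class,
# e.g. "a photo of fake" and "a photo of face, a type of mask".
fake_prompt_feats = np.array([[2.0, 0.0],
                              [1.0, 1.0]])
center = fused_class_center(fake_prompt_feats)
```

At test time, image features would be compared against such fused centers instead of classifier logits, so adding a new phrasing or class needs no retraining of a classification head.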
When traditional living body data is annotated, one-hot category labels are used, for example 0 for a prosthesis and 1 for a living body; such labels carry little information. Language labels, by contrast, can describe the data flexibly and in detail, providing rich supervision information — for example, for a paper prosthesis whose accessory is glasses: "a photo of a fake, type of paper, type of glass";
optionally, determining key areas and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key areas to obtain corrected images, including:
Inputting an image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the detected image according to the face image;
Carrying out image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
and mapping the face region to a pre-established face standard model according to the face key point information, and carrying out alignment calibration on the face region in the image to be detected and the standard face image to obtain a corrected image.
Specifically, the first step is to run face detection on the input image and crop the face area according to the face detection box; the cropped face area is sent into the key point detection model to obtain the face key points, and face correction is performed according to the key points to obtain the calibrated data;
In some embodiments, the face keypoint model employs a five-point model, namely a left-right eye center, a nose tip, and left-right mouth corners;
in some embodiments, the correction algorithm employs an affine transformation;
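A sketch of this alignment step under the five-point model: the affine transform mapping the detected key points onto a standard template can be estimated by least squares. The template coordinates below are illustrative (a common 112x112-style layout), not taken from the description.

```python
import numpy as np

# Sketch of five-point face alignment: solve (least squares) for the affine
# transform that maps detected key points onto a standard template.
def estimate_affine(src_pts, dst_pts):
    """Return a 2x3 affine matrix A with dst ≈ [src, 1] @ A.T."""
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])      # [n, 3] homogeneous points
    A, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)
    return A.T                                      # [2, 3]

# Illustrative template: left/right eye centers, nose tip, left/right mouth corners.
template = np.array([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                     [41.5, 92.4], [70.7, 92.2]])
detected = template * 1.5 + 10.0   # e.g. the same face, scaled and shifted
A = estimate_affine(detected, template)
aligned = np.hstack([detected, np.ones((5, 1))]) @ A.T
```

The same matrix `A` would then be applied to every pixel coordinate of the cropped face to produce the calibrated (corrected) image.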
optionally, the method further comprises:
The corrected image is subjected to rotation, translation, tilt, scaling, and color-jitter operations to obtain processed images, which are added to the image dataset.
Optionally, inputting the corrected image into an image coding model to obtain image features of the image to be detected, including:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vector in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
The ViT network requires image patches as input, so the image is divided into non-overlapping patches of size 16 x 16, represented as the sequence Xp = [Xp1; Xp2; ...; XpN]; a class token vector Xcls is prepended, giving Z0 = [Xcls; Xp1; Xp2; ...; XpN] as the input to the Encoder.
A specific implementation of the Encoder for extracting the living body feature vector is the ViT network from the Transformer family: the calibrated image is input into the image Encoder network for feature coding.
Optionally, inputting the language tag into the language coding model to obtain the language supervision feature, including:
Converting the language label into BPE (byte-pair encoding) form;
Inputting the BPE-encoded text data into a pre-established language coding model, extracting the label's semantic features, and determining the label semantic features as the language supervision features, wherein the language coding model is at least a Transformer network language model.
In some embodiments, the language model adopts a Transformer with 12 network layers, width 512, 8 attention heads, and a vocabulary of 49,152; the input is a BPE representation with the maximum input length limited to 76. Language features are obtained after the text is sent into the language Encoder;
in some embodiments, the language feature length is 512 dimensions;
In some embodiments, the input RGB image is segmented into non-overlapping patches, each of which is treated as a token, specifically comprising:
Each RGB image I ∈ R^(H×W×C), where H, W, and C represent the height, width, and number of channels, respectively;
The number of patches generated can be described as N = (H × W) / (Ph × Pw), where Ph and Pw represent the resolution of each image patch;
reshaping image I into a flattened sequence of two-dimensional patches Xp ∈ R^(N×(Ph·Pw·C));
In some embodiments, Ph = Pw = 16 is set; when an RGB image of size 224x224 is taken as input, a total of 196 patches are produced;
in some embodiments, the image Encoder model uses a ViT network: the patches are sent into the ViT network structure for encoding, yielding the image features.
In some embodiments, the image feature length is 512 dimensions;
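The patch-splitting step above can be sketched as follows; with Ph = Pw = 16 and a 224x224x3 input, it yields N = (224/16) × (224/16) = 196 patches, each flattened to 16 · 16 · 3 = 768 values.

```python
import numpy as np

# Sketch of the ViT input step: split an H x W x C image into non-overlapping
# Ph x Pw patches and flatten each one into a token vector.
def patchify(image, ph=16, pw=16):
    h, w, c = image.shape
    patches = image.reshape(h // ph, ph, w // pw, pw, c)
    patches = patches.transpose(0, 2, 1, 3, 4)   # [h/ph, w/pw, ph, pw, c]
    return patches.reshape(-1, ph * pw * c)      # [N, ph*pw*c]

image = np.zeros((224, 224, 3))  # stand-in for a calibrated RGB face image
tokens = patchify(image)          # 196 tokens of dimension 768
```

In the full model a class token Xcls would be prepended and each token linearly projected before entering the ViT Encoder.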
optionally, calculating the similarity of the language supervision feature and the image feature includes:
calculating the similarity between the image features and the language supervision features as the cosine similarity: similarity(I, T) = (I · T) / (‖I‖ ‖T‖);
wherein I is an image feature; T is a language supervision feature.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
With the living body detection method provided by the embodiment of the invention, an image to be detected is acquired; a plurality of language labels are set for the image using language descriptions; key areas and key point information in the image are determined, and the key areas are cropped and corrected to obtain a corrected image; the language labels are input into the language coding model to obtain language supervision features; the corrected image is input into the image coding model to obtain the image features of the image to be detected; and the similarity between the language supervision features and the image features is calculated, with whether the image is a living body image judged from the similarity value. Living body training is supervised by language descriptions, i.e., detailed labels, so that as prosthesis types increase, changes in the data do not affect the model at all; this increases the extensibility of the system and improves the accuracy of living body detection.
Another embodiment of the present invention provides a living body detecting apparatus for performing the living body detecting method provided by the above embodiment.
Referring to fig. 4, there is shown a block diagram of an embodiment of a living body detecting apparatus of the present invention, which may include the following modules in particular: an acquisition module 401, a setting module 402, an identification module 403, a first encoding module 404, a second encoding module 405 and a detection module 406, wherein:
The acquisition module 401 is configured to acquire an image to be detected;
The setting module 402 is configured to set a plurality of language tags for an image to be detected in a language description manner;
the recognition module 403 is configured to determine a key area and key point information in the image to be detected according to the image to be detected, and perform image cutting and correction processing on the key area to obtain a corrected image;
the first encoding module 404 is configured to input a language tag into the language encoding model to obtain a language supervision feature;
the second encoding module 405 is configured to input the corrected image into an image encoding model, so as to obtain an image feature of the image to be detected;
the detection module 406 is configured to calculate a similarity between the language supervision feature and the image feature, and determine whether the image to be detected is a living body image according to the similarity value.
The living body detection device provided by the embodiment of the invention obtains an image to be detected; sets a plurality of language labels for the image to be detected in a language description manner; determines key areas and key point information in the image to be detected, and performs image cutting and correction processing on the key areas to obtain a corrected image; inputs the language labels into a language coding model to obtain language supervision features; inputs the corrected image into an image coding model to obtain image features of the image to be detected; and calculates the similarity between the language supervision features and the image features, judging from the similarity value whether the image to be detected is a living body image. Living body training is carried out in a manner supervised by language descriptions, that is, by detailed labels; as the types of prosthesis increase, changes in the data do not affect the model at all, which increases the extensibility of the system and improves the accuracy of living body detection.
A further embodiment of the present invention provides a living body detection device according to the above embodiment.
Optionally, the setting module is configured to:
Mark the living body data in the image to be detected using language descriptions as data marks to obtain a plurality of language labels, wherein a language label at least comprises a data type, the data type at least comprises living body or prosthesis, and a prosthesis label at least comprises one or more of prosthesis material, accessories or illumination.
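As an illustrative sketch of such language-description labels, the Python snippet below builds a small label vocabulary combining the data type (living body or prosthesis) with prosthesis material, accessories and illumination. The concrete label strings and category values are assumptions for illustration only; the patent does not enumerate them.

```python
# Hypothetical label vocabulary for language-supervised liveness training.
# Material/accessory/illumination values are illustrative assumptions.

def build_language_labels():
    """Return a list of natural-language labels for liveness data."""
    labels = ["a live face"]  # the living-body data type
    materials = ["paper", "screen replay", "silicone mask"]
    accessories = ["", " wearing glasses"]
    lighting = ["indoor light", "strong backlight"]
    for m in materials:
        for a in accessories:
            for l in lighting:
                labels.append(f"a {m} prosthesis face{a} under {l}")
    return labels

labels = build_language_labels()
print(len(labels))  # 1 live label + 3*2*2 prosthesis labels = 13
```

Because each label is free-form text rather than a fixed class index, adding a new prosthesis type only adds a new description string; the model itself is unchanged, which is the extensibility property the embodiment emphasizes.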
Optionally, the identification module is configured to:
Inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the image to be detected according to the face image;
Carrying out image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
and mapping the face region to a pre-established face standard model according to the face key point information, and carrying out alignment calibration on the face region in the image to be detected and the standard face image to obtain a corrected image.
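The mapping of the face area onto a pre-established standard face model can be sketched as a similarity transform estimated from key points. The two-eye version below, with assumed standard eye coordinates, is a simplified illustration; real alignment pipelines typically fit the transform to 5 or 68 landmarks by least squares.

```python
import math

# Minimal sketch of face alignment via a similarity transform fitted to
# two eye key points. The standard-model eye coordinates are assumptions.

def align_by_eyes(left_eye, right_eye,
                  std_left=(30.0, 40.0), std_right=(70.0, 40.0)):
    """Return (scale, angle_rad, tx, ty) mapping the detected eyes onto
    the standard-model eyes."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    sdx, sdy = std_right[0] - std_left[0], std_right[1] - std_left[1]
    scale = math.hypot(sdx, sdy) / math.hypot(dx, dy)
    angle = math.atan2(sdy, sdx) - math.atan2(dy, dx)
    # Rotate+scale the detected left eye, then translate it onto the standard left eye.
    c, s = math.cos(angle), math.sin(angle)
    rx = scale * (c * left_eye[0] - s * left_eye[1])
    ry = scale * (s * left_eye[0] + c * left_eye[1])
    tx, ty = std_left[0] - rx, std_left[1] - ry
    return scale, angle, tx, ty

scale, angle, tx, ty = align_by_eyes((100.0, 120.0), (180.0, 120.0))
print(round(scale, 3), round(angle, 3))  # 0.5 0.0
```

Applying the returned transform to every pixel coordinate warps the face area into the standard pose, producing the corrected image used downstream.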
Optionally, the identification module is further configured to:
The corrected image is subjected to rotation, translation, tilting, scaling and color jitter operations to obtain processed images, which are added to the image dataset.
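The augmentation step above can be sketched as sampling one set of transform parameters per training image. The parameter ranges below are assumptions for illustration; the patent does not specify them.

```python
import random

# Sketch of sampling the listed augmentations (rotation, translation,
# tilt/shear, scaling, color jitter). Ranges are illustrative assumptions.

def sample_augmentation(rng):
    return {
        "rotate_deg": rng.uniform(-10, 10),
        "shift_px": (rng.uniform(-8, 8), rng.uniform(-8, 8)),
        "shear_deg": rng.uniform(-5, 5),
        "scale": rng.uniform(0.9, 1.1),
        "brightness": rng.uniform(0.8, 1.2),  # simple color jitter factor
    }

rng = random.Random(0)
aug = sample_augmentation(rng)
print(sorted(aug))
```

Each sampled parameter set is applied to the corrected image to yield one additional processed image for the dataset.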
Optionally, the first encoding module is configured to:
Converting the language tag into text data in BPE (byte-pair encoding) form;
Inputting the text data in BPE form into a pre-established language coding model, extracting the semantic features of the tag, and determining the semantic features of the tag as the language supervision features, wherein the language coding model is at least a Transformer network language model.
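The BPE conversion step can be illustrated with a toy byte-pair-encoding routine that repeatedly merges the most frequent adjacent symbol pair. Real tokenizers learn their merge table from a large corpus; this single-string version is only a sketch of the idea, not the tokenizer the embodiment actually uses.

```python
from collections import Counter

# Toy BPE: repeatedly merge the most frequent adjacent symbol pair.

def bpe_merges(word, num_merges):
    symbols = list(word)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols, merges

tokens, merges = bpe_merges("aaabdaaabac", 2)
print(tokens)
```

The resulting token sequence (rather than raw characters) is what gets fed to the Transformer language coding model.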
Optionally, the second encoding module is configured to:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vectors in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
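The front of a ViT-style image coding model splits the input image into fixed-size patches that are flattened into token vectors before Transformer encoding. The toy image and patch sizes below are assumptions for illustration.

```python
# Sketch of the ViT patch-embedding step: split an H x W image into
# non-overlapping patch tokens. Sizes are illustrative, not the
# configuration actually used.

def split_into_patches(image, patch):
    """image: H x W list of lists; returns a list of flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            tokens.append([image[py + dy][px + dx]
                           for dy in range(patch) for dx in range(patch)])
    return tokens

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
tokens = split_into_patches(img, 2)
print(len(tokens), tokens[0])  # 4 [0, 1, 4, 5]
```

Each patch vector is then linearly projected and encoded by the Transformer; the encoder's output serves as the image feature compared against the language supervision feature.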
Optionally, the detection module is configured to:
calculating the similarity between the image features and the language supervision features according to the following formula, wherein I is an image feature and T is a language supervision feature.
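The similarity formula itself is not reproduced in this text. A common choice for comparing an image feature I with a language supervision feature T in language-image models of this kind is cosine similarity, sketched below; the decision threshold is an illustrative assumption.

```python
import math

# Assumed cosine-similarity comparison between image feature I and
# language supervision feature T; the patent's exact formula and
# threshold are not reproduced here.

def cosine_similarity(I, T):
    dot = sum(i * t for i, t in zip(I, T))
    norm_i = math.sqrt(sum(i * i for i in I))
    norm_t = math.sqrt(sum(t * t for t in T))
    return dot / (norm_i * norm_t)

def is_live(image_feature, live_text_feature, threshold=0.5):
    """Judge the image as living body when similarity exceeds the threshold."""
    return cosine_similarity(image_feature, live_text_feature) >= threshold

I = [0.6, 0.8]
T_live = [0.0, 1.0]
print(round(cosine_similarity(I, T_live), 2))  # 0.8
```

In practice the image feature would be scored against every language label's feature, with the highest-similarity label (or a threshold on the live label's score) deciding living body versus prosthesis.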
It should be noted that the optional implementations in this embodiment may be implemented separately or in any combination without conflict; the application is not limited in this respect.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The living body detection device provided by the embodiment of the invention obtains an image to be detected; sets a plurality of language labels for the image to be detected in a language description manner; determines key areas and key point information in the image to be detected, and performs image cutting and correction processing on the key areas to obtain a corrected image; inputs the language labels into a language coding model to obtain language supervision features; inputs the corrected image into an image coding model to obtain image features of the image to be detected; and calculates the similarity between the language supervision features and the image features, judging from the similarity value whether the image to be detected is a living body image. Living body training is carried out in a manner supervised by language descriptions, that is, by detailed labels; as the types of prosthesis increase, changes in the data do not affect the model at all, which increases the extensibility of the system and improves the accuracy of living body detection.
Still another embodiment of the present invention provides a terminal device for performing the living body detection method provided in the above embodiment.
Fig. 5 is a schematic structural view of a terminal device of the present invention, as shown in fig. 5, the terminal device includes: at least one processor 501 and memory 502;
The memory stores a computer program; the at least one processor executes the computer program stored in the memory to implement the living body detection method provided by the above embodiment.
The terminal device provided by this embodiment obtains an image to be detected; sets a plurality of language labels for the image to be detected in a language description manner; determines key areas and key point information in the image to be detected, and performs image cutting and correction processing on the key areas to obtain a corrected image; inputs the language labels into a language coding model to obtain language supervision features; inputs the corrected image into an image coding model to obtain image features of the image to be detected; and calculates the similarity between the language supervision features and the image features, judging from the similarity value whether the image to be detected is a living body image. Living body training is carried out in a manner supervised by language descriptions, that is, by detailed labels; as the types of prosthesis increase, changes in the data do not affect the model at all, which increases the extensibility of the system and improves the accuracy of living body detection.
Still another embodiment of the present application provides a computer-readable storage medium having stored therein a computer program which, when executed, implements the living body detection method provided in any of the above embodiments.
According to the computer-readable storage medium of this embodiment, an image to be detected is obtained; a plurality of language labels are set for the image to be detected in a language description manner; key areas and key point information in the image to be detected are determined, and the key areas are cut and corrected to obtain a corrected image; the language labels are input into a language coding model to obtain language supervision features; the corrected image is input into an image coding model to obtain image features of the image to be detected; and the similarity between the language supervision features and the image features is calculated, and whether the image to be detected is a living body image is judged from the similarity value. Living body training is carried out in a manner supervised by language descriptions, that is, by detailed labels; as the types of prosthesis increase, changes in the data do not affect the model at all, which increases the extensibility of the system and improves the accuracy of living body detection.
In this specification, each embodiment is described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, electronic devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing electronic device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing electronic device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or electronic device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or electronic device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or electronic device that comprises the element.
The living body detection method and living body detection device provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the above description of the examples is only intended to help understand the method and core ideas of the present invention. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the application scope; in view of the above, this description should not be construed as limiting the present invention.

Claims (8)

1. A living body detection method, the method comprising:
Acquiring an image to be detected;
setting a plurality of language labels for the image to be detected by adopting a language description mode;
Determining key areas and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key areas to obtain corrected images;
Comprising the following steps: inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the image to be detected according to the face image;
Performing image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
According to the face key point information, mapping the face area to a pre-established face standard model, and carrying out alignment calibration on the face area in the image to be detected and a standard face image to obtain a corrected image;
Inputting the language label into a language coding model to obtain a language supervision feature; comprising the following steps:
Converting the language tag into text data in BPE (byte-pair encoding) form;
Inputting the text data in the BPE binary form into a pre-established language coding model, extracting the semantic features of the tag, and determining the semantic features of the tag as language supervision features, wherein the language coding model is at least a Transformer network language model;
The language supervision features abandon one-hot type labels and map the language description into feature vectors; a metric learning manner is adopted, in which an image and its own corresponding language label features form a positive sample pair; besides the label corresponding to the image itself, inclusion relationships exist among the living body language labels, for example a prosthesis comprises a mask prosthesis and the mask prosthesis comprises a mask prosthesis with glasses, so that an image also forms positive sample pairs with the language labels that include its own label, and an image and language labels other than these form negative sample pairs;
inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
And calculating the similarity of the language supervision feature and the image feature, and judging whether the image to be detected is a living body image according to the similarity value.
2. The method according to claim 1, wherein the setting a plurality of language tags for the image to be detected in a language description manner includes:
And marking the living body data in the image to be detected by adopting the language description as a data mark to obtain a plurality of language labels, wherein the language labels at least comprise data types, the data types at least comprise living bodies or prostheses, and the prostheses at least comprise one or more of prosthesis materials, accessories or illumination.
3. The method according to claim 2, wherein the method further comprises:
And performing rotation, translation, tilting, scaling and color jitter operations on the corrected image to obtain a processed image, and adding the processed image to an image data set.
4. A method according to claim 3, wherein said inputting the corrected image into an image coding model to obtain image features of the image to be detected comprises:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vectors in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
5. The method of claim 4, wherein said calculating the similarity of said language supervision feature and said image feature comprises:
calculating the similarity between the image features and the language supervision features according to the following formula, wherein I is an image feature and T is a language supervision feature.
6. A living body detection apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring the image to be detected;
the setting module is used for setting a plurality of language labels for the image to be detected by adopting a language description mode;
The identification module is used for determining key areas and key point information in the image to be detected according to the image to be detected, and carrying out image cutting and correction processing on the key areas to obtain corrected images;
Comprising the following steps: inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the image to be detected according to the face image;
Performing image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
According to the face key point information, mapping the face area to a pre-established face standard model, and carrying out alignment calibration on the face area in the image to be detected and a standard face image to obtain a corrected image;
The first coding module is used for inputting the language tag into a language coding model to obtain language supervision characteristics;
The second coding module is used for inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
The detection module is used for calculating the similarity of the language supervision characteristic and the image characteristic and judging whether the image to be detected is a living body image or not according to the similarity value;
Inputting the language label into a language coding model to obtain a language supervision feature; comprising the following steps:
Converting the language tag into text data in BPE (byte-pair encoding) form;
Inputting the text data in the BPE binary form into a pre-established language coding model, extracting the semantic features of the tag, and determining the semantic features of the tag as language supervision features, wherein the language coding model is at least a Transformer network language model;
The language supervision features abandon one-hot type labels and map the language description into feature vectors; a metric learning manner is adopted, in which an image and its own corresponding language label features form a positive sample pair; besides the label corresponding to the image itself, inclusion relationships exist among the living body language labels, for example a prosthesis comprises a mask prosthesis and the mask prosthesis comprises a mask prosthesis with glasses, so that an image also forms positive sample pairs with the language labels that include its own label, and an image and language labels other than these form negative sample pairs.
7. A terminal device, comprising: at least one processor and memory;
The memory stores a computer program; the at least one processor executes the computer program stored by the memory to implement the living body detection method of any one of claims 1-5.
8. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, which computer program, when executed, implements the living body detection method according to any one of claims 1-5.
CN202211361753.6A 2022-11-02 2022-11-02 Living body detection method, living body detection device, terminal device and readable storage medium Active CN115830721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211361753.6A CN115830721B (en) 2022-11-02 2022-11-02 Living body detection method, living body detection device, terminal device and readable storage medium


Publications (2)

Publication Number Publication Date
CN115830721A CN115830721A (en) 2023-03-21
CN115830721B true CN115830721B (en) 2024-05-03

Family

ID=85526185


Country Status (1)

Country Link
CN (1) CN115830721B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197904B (en) * 2023-03-31 2024-09-17 北京百度网讯科技有限公司 Training method of human face living body detection model, human face living body detection method and human face living body detection device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845499A (en) * 2017-01-19 2017-06-13 清华大学 A kind of image object detection method semantic based on natural language
CN110033018A (en) * 2019-03-06 2019-07-19 平安科技(深圳)有限公司 Shape similarity judgment method, device and computer readable storage medium
CN110909673A (en) * 2019-11-21 2020-03-24 河北工业大学 Pedestrian re-identification method based on natural language description
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111738186A (en) * 2020-06-28 2020-10-02 香港中文大学(深圳) Target positioning method and device, electronic equipment and readable storage medium
WO2021031528A1 (en) * 2019-08-21 2021-02-25 创新先进技术有限公司 Method, apparatus, and device for identifying operation user
CN113033465A (en) * 2021-04-13 2021-06-25 北京百度网讯科技有限公司 Living body detection model training method, device, equipment and storage medium
WO2021147325A1 (en) * 2020-01-21 2021-07-29 华为技术有限公司 Object detection method and apparatus, and storage medium
CN113591526A (en) * 2020-04-30 2021-11-02 华为技术有限公司 Face living body detection method, device, equipment and computer readable storage medium
CN114998962A (en) * 2022-05-31 2022-09-02 北京三快在线科技有限公司 Living body detection and model training method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Face liveness detection using deep optical strain feature maps; Ma Siyuan; Zheng Han; Guo Wen; Journal of Image and Graphics; 2020-03-16 (No. 03); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant