CN115830721B - Living body detection method, living body detection device, terminal device and readable storage medium - Google Patents

Living body detection method, living body detection device, terminal device and readable storage medium

Info

Publication number
CN115830721B
CN115830721B (application CN202211361753.6A)
Authority
CN
China
Prior art keywords
image
language
detected
face
living body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211361753.6A
Other languages
Chinese (zh)
Other versions
CN115830721A (en)
Inventor
肖良才
胡文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN ELOAM TECHNOLOGY CO LTD
Original Assignee
SHENZHEN ELOAM TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN ELOAM TECHNOLOGY CO LTD filed Critical SHENZHEN ELOAM TECHNOLOGY CO LTD
Priority to CN202211361753.6A priority Critical patent/CN115830721B/en
Publication of CN115830721A publication Critical patent/CN115830721A/en
Application granted granted Critical
Publication of CN115830721B publication Critical patent/CN115830721B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the invention provides a living body detection method, a living body detection device, a terminal device, and a readable storage medium. An image to be detected is acquired; a plurality of language labels are set for the image using language descriptions; key areas and key point information in the image are determined, and the key areas are cropped and corrected to obtain a corrected image; the language labels are input into a language coding model to obtain language supervision features; the corrected image is input into an image coding model to obtain the image features of the image to be detected; the similarity between the language supervision features and the image features is then calculated, and whether the image is a living body image is judged from the similarity value. Living body training is supervised by language descriptions, i.e., detailed labels, so that as prosthesis types increase, changes in the data do not affect the model at all; this increases the extensibility of the system and improves the accuracy of living body detection.

Description

Living body detection method, living body detection device, terminal device and readable storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a living body detection method, apparatus, terminal device, and readable storage medium.
Background
With the adoption of deep learning, face recognition systems have appeared in numerous products, and liveness detection of faces has consequently become one of the intensively researched subjects in computer vision. The system must judge whether a recognized face is a living body or a prosthesis, to prevent criminals from impersonating others with stolen identity information.
Existing living body detection models are mainly trained with supervision, using one-hot category labels. Such labels are single in form and carry little information; owing to the complexity and uncertainty of prosthesis types in prosthesis data, existing detection methods cannot detect living bodies accurately.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are presented to provide a living body detection method, apparatus, terminal device, and readable storage medium that overcome or at least partially solve the above problems.
In a first aspect, an embodiment of the present invention provides a living body detection method, including:
Acquiring an image to be detected;
setting a plurality of language labels for the image to be detected by adopting a language description mode;
Determining key areas and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key areas to obtain corrected images;
Inputting the language label into a language coding model to obtain a language supervision feature;
inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
And calculating the similarity of the language supervision feature and the image feature, and judging whether the image to be detected is a living body image according to the similarity value.
Optionally, the setting a plurality of language labels for the image to be detected by adopting a language description mode includes:
And marking the living body data in the image to be detected with language descriptions as data annotations to obtain a plurality of language labels, wherein the language labels at least comprise the data type, the data type is at least living body or prosthesis, and a prosthesis label comprises one or more of prosthesis material, accessory, or illumination descriptors.
Optionally, determining a key area and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key area to obtain a corrected image, including:
Inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the detection image according to the face image;
Performing image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
And mapping the face region to a pre-established face standard model according to the face key point information, and carrying out alignment calibration on the face region in the image to be detected and a standard face image to obtain a corrected image.
Optionally, the method further comprises:
And performing rotation, translation, tilt, scaling, and color-jitter operations on the corrected image to obtain processed images, which are added to an image data set.
Optionally, the inputting the corrected image into an image coding model to obtain an image feature of the image to be detected includes:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vector in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
Optionally, the inputting the language tag into a language coding model to obtain a language supervision feature includes:
Converting the language label into BPE (byte-pair encoding) form;
Inputting the BPE-encoded text data into a pre-established language coding model, extracting the label's semantic features, and determining the label semantic features as the language supervision features, wherein the language coding model is at least a Transformer network language model.
Optionally, the calculating the similarity between the language supervision feature and the image feature includes:
calculating the similarity between the image features and the language supervision features as the cosine similarity: similarity(I, T) = (I · T) / (‖I‖ ‖T‖);
wherein I is an image feature; T is a language supervision feature.
In a second aspect, an embodiment of the present invention provides a living body detection apparatus, including:
the acquisition module is used for acquiring the image to be detected;
the setting module is used for setting a plurality of language labels for the image to be detected by adopting a language description mode;
The identification module is used for determining key areas and key point information in the image to be detected according to the image to be detected, and carrying out image cutting and correction processing on the key areas to obtain corrected images;
The first coding module is used for inputting the language tag into a language coding model to obtain language supervision characteristics;
The second coding module is used for inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
the detection module is used for calculating the similarity between the language supervision feature and the image feature and judging whether the image to be detected is a living body image or not according to the similarity value.
Optionally, the setting module is configured to:
And marking the living body data in the image to be detected with language descriptions as data annotations to obtain a plurality of language labels, wherein the language labels at least comprise the data type, the data type is at least living body or prosthesis, and a prosthesis label comprises one or more of prosthesis material, accessory, or illumination descriptors.
Optionally, the identification module is configured to:
Inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the detection image according to the face image;
Performing image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
And mapping the face region to a pre-established face standard model according to the face key point information, and carrying out alignment calibration on the face region in the image to be detected and a standard face image to obtain a corrected image.
Optionally, the identification module is further configured to:
And performing rotation, translation, tilt, scaling, and color-jitter operations on the corrected image to obtain processed images, which are added to an image data set.
Optionally, the first encoding module is configured to:
Converting the language label into BPE (byte-pair encoding) form;
Inputting the BPE-encoded text data into a pre-established language coding model, extracting the label's semantic features, and determining the label semantic features as the language supervision features, wherein the language coding model is at least a Transformer network language model.
Optionally, the second encoding module is configured to:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vector in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
Optionally, the detection module is configured to:
calculating the similarity between the image features and the language supervision features as the cosine similarity: similarity(I, T) = (I · T) / (‖I‖ ‖T‖);
wherein I is an image feature; T is a language supervision feature.
In a third aspect, an embodiment of the present invention provides a terminal device, including: at least one processor and memory;
The memory stores a computer program; the at least one processor executes the computer program stored by the memory to implement the living body detection method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium having stored therein a computer program which, when executed, implements the living body detection method provided in the first aspect.
The embodiment of the invention has the following advantages:
With the living body detection method, living body detection device, terminal device, and readable storage medium provided by the embodiments of the invention, an image to be detected is acquired; a plurality of language labels are set for the image using language descriptions; key areas and key point information in the image are determined, and the key areas are cropped and corrected to obtain a corrected image; the language labels are input into the language coding model to obtain language supervision features; the corrected image is input into the image coding model to obtain the image features of the image to be detected; and the similarity between the language supervision features and the image features is calculated, with whether the image is a living body image judged from the similarity value. Living body training is supervised by language descriptions, i.e., detailed labels, so that as prosthesis types increase, changes in the data do not affect the model at all; this increases the extensibility of the system and improves the accuracy of living body detection.
Drawings
FIG. 1 is a flow chart of the steps of an embodiment of a living body detection method of the present invention;
FIG. 2 is a flow chart of the steps of yet another embodiment of a living body detection method of the present invention;
FIG. 3 is a schematic diagram of a living body detection system embodiment of the present invention;
FIG. 4 is a block diagram showing the construction of an embodiment of a living body detecting device according to the present invention;
Fig. 5 is a schematic structural view of a terminal device of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
An embodiment of the present invention provides a living body detection method for detecting a living body in an image. The execution body of the embodiment is a living body detection device, which is disposed on a terminal device, wherein the terminal device at least includes a computer, a tablet terminal, and the like.
Referring to fig. 1, there is shown a flow chart of steps of an embodiment of a method for in vivo detection of the present invention, which may specifically include the steps of:
S101, acquiring an image to be detected;
Specifically, the terminal device acquires an image to be detected, where the image to be detected includes at least a face image and possibly a prosthesis.
S102, setting a plurality of language labels for an image to be detected by adopting a language description mode;
The terminal device uses concrete language descriptions of the data to build detailed, composite labels, such as "a photo of a {label}, type of paper, type of glass", and sets language labels for the living bodies and prostheses in the image to be detected.
When annotating the living body data, one-hot category labels are abandoned; more detailed and flexible language descriptions are adopted as data annotations, describing the data in detail and providing rich information, such as a living body/prosthesis descriptor for the data type, a prosthesis material descriptor, an accessory descriptor, an illumination descriptor, and so on.
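As an illustration, such composite labels can be generated mechanically from the descriptors. The sketch below is hypothetical: the template string, function name, and parameter slots are assumptions for illustration, not part of the patent text.

```python
# Illustrative sketch of composing descriptive language labels for liveness data.
# The template and descriptor names are assumptions, not from the patent text.
def build_language_label(data_type, material=None, accessory=None, lighting=None):
    """Compose a detailed label such as
    'a photo of a fake, type of paper, type of glass'."""
    parts = [f"a photo of a {data_type}"]
    if material:
        parts.append(f"type of {material}")   # prosthesis material descriptor
    if accessory:
        parts.append(f"type of {accessory}")  # accessory descriptor
    if lighting:
        parts.append(f"{lighting} lighting")  # illumination descriptor
    return ", ".join(parts)

labels = [
    build_language_label("face"),  # live-face label
    build_language_label("fake", material="paper", accessory="glass"),
]
```

One label string per data category can then be fed to the language encoder, and new prosthesis types only require new descriptor combinations, not a new label scheme.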
S103, determining key areas and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key areas to obtain corrected images;
S104, inputting the language label into a language coding model to obtain a language supervision feature;
Specifically, the terminal device inputs the language labels into a language Encoder model to extract the language supervision features, and inputs the calibrated face picture into an image Encoder model to extract the picture features;
The language text labels are converted into BPE form and input into the language Encoder network to extract the labels' semantic features, which serve as the language supervision features; a specific choice for the Encoder is a Transformer network language model.
S105, inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
Specifically, because the face position in the image to be detected is uncertain and the pose is unstable, the image to be detected (the original image) is first passed through a face detection model to obtain the face position, distinguishing the face area from the non-face area. The face area is cropped and sent into a key point detection model to obtain the face key points; the face area is then mapped onto a standard face model according to the key point positions for face alignment and calibration.
S106, calculating the similarity of the language supervision characteristic and the image characteristic, and judging whether the image to be detected is a living body image or not according to the similarity value.
Specifically, the terminal device calculates the similarity between the language supervision features and the image features, and judges whether the image to be detected is a living body image according to the magnitude of the similarity values, for example by selecting the label with the highest similarity as the image's type.
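A minimal sketch of this decision step, assuming cosine similarity over L2-normalized features (consistent with the normalized dot product in the pseudocode given later in the description). The feature vectors here are toy values, not real encoder outputs.

```python
import numpy as np

# Toy sketch: cosine similarity between one image feature and each
# language-label feature, then pick the most similar label.
def cosine_similarity(image_feat, text_feats):
    i = image_feat / np.linalg.norm(image_feat)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return t @ i  # one similarity score per language label

text_feats = np.array([[1.0, 0.0],   # stand-in feature for a "live" label
                       [0.0, 1.0]])  # stand-in feature for a "fake" label
image_feat = np.array([0.9, 0.1])
scores = cosine_similarity(image_feat, text_feats)
is_live = int(np.argmax(scores)) == 0  # index 0 = live-face label
```

With real encoders, `text_feats` would hold one row per language label and the argmax picks the label (and hence live/prosthesis decision) for the image.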
With the living body detection method provided by the embodiment of the invention, an image to be detected is acquired; a plurality of language labels are set for the image using language descriptions; key areas and key point information in the image are determined, and the key areas are cropped and corrected to obtain a corrected image; the language labels are input into the language coding model to obtain language supervision features; the corrected image is input into the image coding model to obtain the image features of the image to be detected; and the similarity between the language supervision features and the image features is calculated, with whether the image is a living body image judged from the similarity value. Living body training is supervised by language descriptions, i.e., detailed labels, so that as prosthesis types increase, changes in the data do not affect the model at all; this increases the extensibility of the system and improves the accuracy of living body detection.
Fig. 2 is a flowchart showing steps of still another embodiment of a living body detecting method of the present invention, which includes:
S1, when labeling the living body data, one-hot category labels are abandoned in favor of language descriptions such as "a photo of a {label}, type of paper, type of glass"; different language labels can be created according to different liveness requirements.
S2, extracting face regions and key points of the images, and performing face segmentation and calibration.
S3, inputting the language labels into a language Encoder model to extract language features, and inputting the calibrated face picture into an image Encoder model to extract picture features.
S4, calculating the similarity between the image features and the language features, and selecting the image type with the highest similarity as the image type.
As shown in fig. 3, a further embodiment of the present invention supplements the living body detection method provided in the above embodiment.
Optionally, a language description mode is adopted to set a plurality of language labels for the image to be detected, including:
The living body data in the image to be detected is marked with language descriptions as data annotations to obtain a plurality of language labels, wherein the language labels at least comprise the data type, the data type is at least living body or prosthesis, and a prosthesis label comprises one or more of prosthesis material, accessory, or illumination descriptors.
Specifically, the language supervision module discards one-hot labels, maps language descriptions into feature vectors, and adopts a metric learning scheme. An image and its own corresponding language label feature form a positive sample pair. Beyond an image's own label, living body language labels usually have inclusion relationships — for example, "prosthesis" contains "mask prosthesis", and "mask prosthesis" contains "mask prosthesis with glasses" — so an image also forms positive pairs with labels related to its own by inclusion. All other image/label combinations are negative sample pairs. The specific pseudocode is as follows:
I_f = image_encoder(I)  # image features, [n, d_i]
T_f = text_encoder(T)   # text features,  [n, d_t]
I_e = l2_normalize(np.dot(I_f, W_i), axis=1)  # project into joint space and normalize
T_e = l2_normalize(np.dot(T_f, W_t), axis=1)
logits = np.dot(I_e, T_e.T) * np.exp(t)  # pairwise similarities scaled by temperature t
labels = ...  # construct positive / negative sample-pair labels
loss_i = sigmoid_entropy_loss(logits, labels, axis=0)
loss_t = sigmoid_entropy_loss(logits, labels, axis=1)
loss = (loss_i + loss_t) / 2
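The pseudocode above can be made runnable as a toy example. `sigmoid_entropy_loss` is not specified in the description, so an element-wise sigmoid cross-entropy is assumed; the encoders, projections W_i/W_t, and temperature t are replaced with random stand-ins, and the positive pairs are placed on the diagonal (each image with its own label), ignoring inclusion relations for brevity.

```python
import numpy as np

# Runnable toy version of the metric-learning pseudocode above.
rng = np.random.default_rng(0)
n, d_i, d_t, d_e = 4, 8, 6, 5
I_f = rng.normal(size=(n, d_i))   # stand-in image features [n, d_i]
T_f = rng.normal(size=(n, d_t))   # stand-in text features  [n, d_t]
W_i = rng.normal(size=(d_i, d_e))
W_t = rng.normal(size=(d_t, d_e))
t = 0.07                          # temperature

def l2_normalize(x, axis=1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

I_e = l2_normalize(I_f @ W_i)                 # joint embedding space
T_e = l2_normalize(T_f @ W_t)
logits = I_e @ T_e.T * np.exp(t)              # [n, n] pairwise similarities

# Positive pairs on the diagonal; label-inclusion relations would add more 1s.
labels = np.eye(n)

def sigmoid_cross_entropy(logits, labels):
    # assumed stand-in for the patent's unspecified sigmoid_entropy_loss
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

loss_i = sigmoid_cross_entropy(logits, labels)      # image -> text direction
loss_t = sigmoid_cross_entropy(logits.T, labels.T)  # text -> image direction
loss = (loss_i + loss_t) / 2
```

Training would minimize `loss` over the encoder and projection parameters; here it merely demonstrates the data flow of the symmetric metric-learning objective.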
The image supervision module has no one-hot class labels and therefore cannot directly use a classifier. However, because of the inclusion relationships among the language labels, the image data can also form positive and negative sample pairs: an image and its matching language description form a positive pair, and all other combinations are negative pairs. This module likewise adopts metric learning; the specific pseudocode is as follows:
I_f = image_encoder(I)  # image features, [n, d_i]
I_e = l2_normalize(np.dot(I_f, W_i), axis=1)  # project and normalize
logits = np.dot(I_e, I_e.T) * np.exp(t)  # image-to-image similarities
labels = ...  # construct positive / negative sample-pair labels
loss_i = sigmoid_entropy_loss(logits, labels, axis=0)
loss_t = sigmoid_entropy_loss(logits, labels, axis=1)
loss = (loss_i + loss_t) / 2
Image positive and negative sample pairs differ from one-hot labels in that the language descriptions have inclusion relationships — for example, "prosthesis" includes paper prostheses, 3D mask prostheses, and so on, and after subdivision a 3D mask prosthesis includes plaster prostheses, graphite prostheses, and so on — so different positive/negative pair criteria can be established for different requirements.
In the living body detection system of the invention there is no classifier, so living body and prosthesis scores cannot be obtained directly; on the other hand, the number of categories is not fixed, which increases the model's flexibility. The invention uses language description features as class centers. Since a class can be described in multiple ways — for example, expression 1: "a photo of fake"; expression 2: "a photo of face, a type of mask" — inputting different descriptions yields different text feature vectors to serve as class centers. Features of different class centers can also be fused, for instance by taking the class-center mean, achieving a model-fusion effect and improving the model's accuracy.
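The class-center fusion described above can be sketched as follows: several phrasings of the same class give several text feature vectors, and their normalized mean serves as a fused class center. The feature values are toy stand-ins for language-encoder outputs.

```python
import numpy as np

# Sketch of class-center fusion: average the normalized text features of
# several phrasings of one class, then re-normalize the mean.
def fused_class_center(text_feats):
    feats = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    center = feats.mean(axis=0)
    return center / np.linalg.norm(center)

# Toy features for two phrasings of the "fake" class,
# e.g. "a photo of fake" and "a photo of face, a type of mask".
fake_prompt_feats = np.array([[2.0, 0.0],
                              [1.0, 1.0]])
center = fused_class_center(fake_prompt_feats)
```

At test time, image features would be compared against such fused centers instead of classifier logits, so adding a new phrasing or class needs no retraining of a classification head.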
When traditional living body data is annotated, one-hot category labels are used, for example 0 for a prosthesis and 1 for a living body; such labels carry little information. Language labels, by contrast, can describe the data flexibly and in detail, providing rich supervision information — for example, for a paper prosthesis whose accessory is glasses: "a photo of a fake, type of paper, type of glass";
optionally, determining key areas and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key areas to obtain corrected images, including:
Inputting an image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the detected image according to the face image;
Carrying out image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
and mapping the face region to a pre-established face standard model according to the face key point information, and carrying out alignment calibration on the face region in the image to be detected and the standard face image to obtain a corrected image.
Specifically, the first step is to run face detection on the input image and crop the face area according to the face detection box; the cropped face area is sent into the key point detection model to obtain the face key points, and face correction is performed according to the key points to obtain the calibrated data;
In some embodiments, the face keypoint model employs a five-point model, namely a left-right eye center, a nose tip, and left-right mouth corners;
in some embodiments, the correction algorithm employs an affine transformation;
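A sketch of this alignment step under the five-point model: the affine transform mapping the detected key points onto a standard template can be estimated by least squares. The template coordinates below are illustrative (a common 112x112-style layout), not taken from the description.

```python
import numpy as np

# Sketch of five-point face alignment: solve (least squares) for the affine
# transform that maps detected key points onto a standard template.
def estimate_affine(src_pts, dst_pts):
    """Return a 2x3 affine matrix A with dst ≈ [src, 1] @ A.T."""
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])      # [n, 3] homogeneous points
    A, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)
    return A.T                                      # [2, 3]

# Illustrative template: left/right eye centers, nose tip, left/right mouth corners.
template = np.array([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                     [41.5, 92.4], [70.7, 92.2]])
detected = template * 1.5 + 10.0   # e.g. the same face, scaled and shifted
A = estimate_affine(detected, template)
aligned = np.hstack([detected, np.ones((5, 1))]) @ A.T
```

The same matrix `A` would then be applied to every pixel coordinate of the cropped face to produce the calibrated (corrected) image.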
optionally, the method further comprises:
The corrected image is subjected to rotation, translation, tilt, scaling, and color-jitter operations to obtain processed images, which are added to the image dataset.
Optionally, inputting the corrected image into an image coding model to obtain image features of the image to be detected, including:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vector in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
The ViT network requires image patches as input, so the image is divided into non-overlapping patches of size 16 x 16, represented as the sequence Xp = [Xp1; Xp2; ...; XpN]; a class token vector Xcls is prepended, giving Z0 = [Xcls; Xp1; Xp2; ...; XpN] as the input to the Encoder.
A specific implementation of the Encoder for extracting the living body feature vector is the ViT network from the Transformer family: the calibrated image is input into the image Encoder network for feature coding.
Optionally, inputting the language tag into the language coding model to obtain the language supervision feature, including:
Converting the language label into BPE (byte-pair encoding) form;
Inputting the BPE-encoded text data into a pre-established language coding model, extracting the label's semantic features, and determining the label semantic features as the language supervision features, wherein the language coding model is at least a Transformer network language model.
In some embodiments, the language model adopts a Transformer with 12 network layers, width 512, 8 attention heads, and a vocabulary of 49,152; the input is a BPE representation with the maximum input length limited to 76. Language features are obtained after the text is sent into the language Encoder;
in some embodiments, the language feature length is 512 dimensions;
In some embodiments, the input RGB image is segmented into non-overlapping patches, each of which is treated as a token, specifically comprising:
Each RGB image I ∈ R^(H×W×C), where H, W, and C represent the height, width, and number of channels, respectively;
The number of patches generated can be described as N = (H × W) / (Ph × Pw), where Ph and Pw represent the resolution of each image patch;
reshaping image I into a flattened sequence of two-dimensional patches Xp ∈ R^(N×(Ph·Pw·C));
In some embodiments, Ph = Pw = 16 is set; when an RGB image of size 224x224 is taken as input, a total of 196 patches are produced;
in some embodiments, the image Encoder model uses a ViT network: the patches are sent into the ViT network structure for encoding, yielding the image features.
In some embodiments, the image feature length is 512 dimensions;
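The patch-splitting step above can be sketched as follows; with Ph = Pw = 16 and a 224x224x3 input, it yields N = (224/16) × (224/16) = 196 patches, each flattened to 16 · 16 · 3 = 768 values.

```python
import numpy as np

# Sketch of the ViT input step: split an H x W x C image into non-overlapping
# Ph x Pw patches and flatten each one into a token vector.
def patchify(image, ph=16, pw=16):
    h, w, c = image.shape
    patches = image.reshape(h // ph, ph, w // pw, pw, c)
    patches = patches.transpose(0, 2, 1, 3, 4)   # [h/ph, w/pw, ph, pw, c]
    return patches.reshape(-1, ph * pw * c)      # [N, ph*pw*c]

image = np.zeros((224, 224, 3))  # stand-in for a calibrated RGB face image
tokens = patchify(image)          # 196 tokens of dimension 768
```

In the full model a class token Xcls would be prepended and each token linearly projected before entering the ViT Encoder.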
optionally, calculating the similarity of the language supervision feature and the image feature includes:
calculating the similarity between the image features and the language supervision features as the cosine similarity: similarity(I, T) = (I · T) / (‖I‖ ‖T‖);
wherein I is an image feature; T is a language supervision feature.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
With the living body detection method provided by the embodiment of the invention, an image to be detected is acquired; a plurality of language labels are set for the image using language descriptions; key areas and key point information in the image are determined, and the key areas are cropped and corrected to obtain a corrected image; the language labels are input into the language coding model to obtain language supervision features; the corrected image is input into the image coding model to obtain the image features of the image to be detected; and the similarity between the language supervision features and the image features is calculated, with whether the image is a living body image judged from the similarity value. Living body training is supervised by language descriptions, i.e., detailed labels, so that as prosthesis types increase, changes in the data do not affect the model at all; this increases the extensibility of the system and improves the accuracy of living body detection.
Another embodiment of the present invention provides a living body detecting apparatus for performing the living body detecting method provided by the above embodiment.
Referring to fig. 4, there is shown a block diagram of an embodiment of a living body detecting apparatus of the present invention, which may include the following modules in particular: an acquisition module 401, a setting module 402, an identification module 403, a first encoding module 404, a second encoding module 405 and a detection module 406, wherein:
The acquisition module 401 is configured to acquire an image to be detected;
The setting module 402 is configured to set a plurality of language tags for an image to be detected in a language description manner;
the recognition module 403 is configured to determine a key area and key point information in the image to be detected according to the image to be detected, and perform image cutting and correction processing on the key area to obtain a corrected image;
the first encoding module 404 is configured to input a language tag into the language encoding model to obtain a language supervision feature;
the second encoding module 405 is configured to input the corrected image into an image encoding model, so as to obtain an image feature of the image to be detected;
the detection module 406 is configured to calculate a similarity between the language supervision feature and the image feature, and determine whether the image to be detected is a living body image according to the similarity value.
The living body detection device provided by the embodiment of the invention obtains an image to be detected; sets a plurality of language labels for the image to be detected in a language description manner; determines key areas and key point information in the image to be detected, and performs image cutting and correction processing on the key areas to obtain a corrected image; inputs the language labels into a language coding model to obtain language supervision features; inputs the corrected image into an image coding model to obtain image features of the image to be detected; and calculates the similarity between the language supervision features and the image features, judging from the similarity value whether the image to be detected is a living body image. Living body training is carried out in a manner supervised by language descriptions, that is, by detailed labels; as the types of prosthesis increase, changes in the data do not affect the model at all, which increases the extensibility of the system and improves the accuracy of living body detection.
A further embodiment of the present invention provides a living body detection device according to the above embodiment.
Optionally, the setting module is configured to:
Mark the living body data in the image to be detected using language descriptions as data marks to obtain a plurality of language labels, wherein a language label at least comprises a data type, the data type at least comprises living body or prosthesis, and a prosthesis label at least comprises one or more of prosthesis material, accessories or illumination.
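As an illustrative sketch of such language-description labels, the Python snippet below builds a small label vocabulary combining the data type (living body or prosthesis) with prosthesis material, accessories and illumination. The concrete label strings and category values are assumptions for illustration only; the patent does not enumerate them.

```python
# Hypothetical label vocabulary for language-supervised liveness training.
# Material/accessory/illumination values are illustrative assumptions.

def build_language_labels():
    """Return a list of natural-language labels for liveness data."""
    labels = ["a live face"]  # the living-body data type
    materials = ["paper", "screen replay", "silicone mask"]
    accessories = ["", " wearing glasses"]
    lighting = ["indoor light", "strong backlight"]
    for m in materials:
        for a in accessories:
            for l in lighting:
                labels.append(f"a {m} prosthesis face{a} under {l}")
    return labels

labels = build_language_labels()
print(len(labels))  # 1 live label + 3*2*2 prosthesis labels = 13
```

Because each label is free-form text rather than a fixed class index, adding a new prosthesis type only adds a new description string; the model itself is unchanged, which is the extensibility property the embodiment emphasizes.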
Optionally, the identification module is configured to:
Inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the image to be detected according to the face image;
Carrying out image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
and mapping the face region to a pre-established face standard model according to the face key point information, and carrying out alignment calibration on the face region in the image to be detected and the standard face image to obtain a corrected image.
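The mapping of the face area onto a pre-established standard face model can be sketched as a similarity transform estimated from key points. The two-eye version below, with assumed standard eye coordinates, is a simplified illustration; real alignment pipelines typically fit the transform to 5 or 68 landmarks by least squares.

```python
import math

# Minimal sketch of face alignment via a similarity transform fitted to
# two eye key points. The standard-model eye coordinates are assumptions.

def align_by_eyes(left_eye, right_eye,
                  std_left=(30.0, 40.0), std_right=(70.0, 40.0)):
    """Return (scale, angle_rad, tx, ty) mapping the detected eyes onto
    the standard-model eyes."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    sdx, sdy = std_right[0] - std_left[0], std_right[1] - std_left[1]
    scale = math.hypot(sdx, sdy) / math.hypot(dx, dy)
    angle = math.atan2(sdy, sdx) - math.atan2(dy, dx)
    # Rotate+scale the detected left eye, then translate it onto the standard left eye.
    c, s = math.cos(angle), math.sin(angle)
    rx = scale * (c * left_eye[0] - s * left_eye[1])
    ry = scale * (s * left_eye[0] + c * left_eye[1])
    tx, ty = std_left[0] - rx, std_left[1] - ry
    return scale, angle, tx, ty

scale, angle, tx, ty = align_by_eyes((100.0, 120.0), (180.0, 120.0))
print(round(scale, 3), round(angle, 3))  # 0.5 0.0
```

Applying the returned transform to every pixel coordinate warps the face area into the standard pose, producing the corrected image used downstream.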
Optionally, the identification module is further configured to:
The corrected image is subjected to rotation, translation, tilting, scaling and color jitter operations to obtain processed images, which are added to the image dataset.
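The augmentation step above can be sketched as sampling one set of transform parameters per training image. The parameter ranges below are assumptions for illustration; the patent does not specify them.

```python
import random

# Sketch of sampling the listed augmentations (rotation, translation,
# tilt/shear, scaling, color jitter). Ranges are illustrative assumptions.

def sample_augmentation(rng):
    return {
        "rotate_deg": rng.uniform(-10, 10),
        "shift_px": (rng.uniform(-8, 8), rng.uniform(-8, 8)),
        "shear_deg": rng.uniform(-5, 5),
        "scale": rng.uniform(0.9, 1.1),
        "brightness": rng.uniform(0.8, 1.2),  # simple color jitter factor
    }

rng = random.Random(0)
aug = sample_augmentation(rng)
print(sorted(aug))
```

Each sampled parameter set is applied to the corrected image to yield one additional processed image for the dataset.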
Optionally, the first encoding module is configured to:
Converting the language tag into text data in BPE (byte-pair encoding) form;
Inputting the text data in BPE form into a pre-established language coding model, extracting the semantic features of the tag, and determining the semantic features of the tag as the language supervision features, wherein the language coding model is at least a Transformer network language model.
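The BPE conversion step can be illustrated with a toy byte-pair-encoding routine that repeatedly merges the most frequent adjacent symbol pair. Real tokenizers learn their merge table from a large corpus; this single-string version is only a sketch of the idea, not the tokenizer the embodiment actually uses.

```python
from collections import Counter

# Toy BPE: repeatedly merge the most frequent adjacent symbol pair.

def bpe_merges(word, num_merges):
    symbols = list(word)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols, merges

tokens, merges = bpe_merges("aaabdaaabac", 2)
print(tokens)
```

The resulting token sequence (rather than raw characters) is what gets fed to the Transformer language coding model.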
Optionally, the second encoding module is configured to:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vectors in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
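The front of a ViT-style image coding model splits the input image into fixed-size patches that are flattened into token vectors before Transformer encoding. The toy image and patch sizes below are assumptions for illustration.

```python
# Sketch of the ViT patch-embedding step: split an H x W image into
# non-overlapping patch tokens. Sizes are illustrative, not the
# configuration actually used.

def split_into_patches(image, patch):
    """image: H x W list of lists; returns a list of flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            tokens.append([image[py + dy][px + dx]
                           for dy in range(patch) for dx in range(patch)])
    return tokens

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
tokens = split_into_patches(img, 2)
print(len(tokens), tokens[0])  # 4 [0, 1, 4, 5]
```

Each patch vector is then linearly projected and encoded by the Transformer; the encoder's output serves as the image feature compared against the language supervision feature.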
Optionally, the detection module is configured to:
calculating the similarity between the image features and the language supervision features according to the following formula, wherein I is an image feature and T is a language supervision feature.
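The similarity formula itself is not reproduced in this text. A common choice for comparing an image feature I with a language supervision feature T in language-image models of this kind is cosine similarity, sketched below; the decision threshold is an illustrative assumption.

```python
import math

# Assumed cosine-similarity comparison between image feature I and
# language supervision feature T; the patent's exact formula and
# threshold are not reproduced here.

def cosine_similarity(I, T):
    dot = sum(i * t for i, t in zip(I, T))
    norm_i = math.sqrt(sum(i * i for i in I))
    norm_t = math.sqrt(sum(t * t for t in T))
    return dot / (norm_i * norm_t)

def is_live(image_feature, live_text_feature, threshold=0.5):
    """Judge the image as living body when similarity exceeds the threshold."""
    return cosine_similarity(image_feature, live_text_feature) >= threshold

I = [0.6, 0.8]
T_live = [0.0, 1.0]
print(round(cosine_similarity(I, T_live), 2))  # 0.8
```

In practice the image feature would be scored against every language label's feature, with the highest-similarity label (or a threshold on the live label's score) deciding living body versus prosthesis.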
It should be noted that the optional implementations in this embodiment may be implemented separately or in any combination without conflict; the application is not limited in this respect.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The living body detection device provided by the embodiment of the invention obtains an image to be detected; sets a plurality of language labels for the image to be detected in a language description manner; determines key areas and key point information in the image to be detected, and performs image cutting and correction processing on the key areas to obtain a corrected image; inputs the language labels into a language coding model to obtain language supervision features; inputs the corrected image into an image coding model to obtain image features of the image to be detected; and calculates the similarity between the language supervision features and the image features, judging from the similarity value whether the image to be detected is a living body image. Living body training is carried out in a manner supervised by language descriptions, that is, by detailed labels; as the types of prosthesis increase, changes in the data do not affect the model at all, which increases the extensibility of the system and improves the accuracy of living body detection.
Still another embodiment of the present invention provides a terminal device for performing the living body detection method provided in the above embodiment.
Fig. 5 is a schematic structural view of a terminal device of the present invention, as shown in fig. 5, the terminal device includes: at least one processor 501 and memory 502;
The memory stores a computer program; the at least one processor executes the computer program stored in the memory to implement the living body detection method provided by the above embodiment.
The terminal device provided by this embodiment obtains an image to be detected; sets a plurality of language labels for the image to be detected in a language description manner; determines key areas and key point information in the image to be detected, and performs image cutting and correction processing on the key areas to obtain a corrected image; inputs the language labels into a language coding model to obtain language supervision features; inputs the corrected image into an image coding model to obtain image features of the image to be detected; and calculates the similarity between the language supervision features and the image features, judging from the similarity value whether the image to be detected is a living body image. Living body training is carried out in a manner supervised by language descriptions, that is, by detailed labels; as the types of prosthesis increase, changes in the data do not affect the model at all, which increases the extensibility of the system and improves the accuracy of living body detection.
Still another embodiment of the present application provides a computer-readable storage medium having stored therein a computer program which, when executed, implements the living body detection method provided in any of the above embodiments.
According to the computer-readable storage medium of this embodiment, an image to be detected is obtained; a plurality of language labels are set for the image to be detected in a language description manner; key areas and key point information in the image to be detected are determined, and the key areas are cut and corrected to obtain a corrected image; the language labels are input into a language coding model to obtain language supervision features; the corrected image is input into an image coding model to obtain image features of the image to be detected; and the similarity between the language supervision features and the image features is calculated, and whether the image to be detected is a living body image is judged from the similarity value. Living body training is carried out in a manner supervised by language descriptions, that is, by detailed labels; as the types of prosthesis increase, changes in the data do not affect the model at all, which increases the extensibility of the system and improves the accuracy of living body detection.
In this specification, each embodiment is described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, electronic devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing electronic device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing electronic device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or electronic device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or electronic device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or electronic device that comprises the element.
The living body detection method and living body detection device provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the above description of the examples is only intended to help understand the method and core ideas of the present invention. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the application scope; in view of the above, this description should not be construed as limiting the present invention.

Claims (8)

1. A living body detection method, the method comprising:
Acquiring an image to be detected;
setting a plurality of language labels for the image to be detected by adopting a language description mode;
Determining key areas and key point information in the image to be detected according to the image to be detected, and performing image cutting and correction processing on the key areas to obtain corrected images;
Comprising the following steps: inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the image to be detected according to the face image;
Performing image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
According to the face key point information, mapping the face area to a pre-established face standard model, and carrying out alignment calibration on the face area in the image to be detected and a standard face image to obtain a corrected image;
Inputting the language label into a language coding model to obtain a language supervision feature; comprising the following steps:
Converting the language tag into text data in BPE (byte-pair encoding) form;
Inputting the text data in the BPE binary form into a pre-established language coding model, extracting the semantic features of the tag, and determining the semantic features of the tag as language supervision features, wherein the language coding model is at least a Transformer network language model;
The language supervision features abandon one-hot type labels and map the language description into feature vectors; a metric learning manner is adopted, in which an image and its own corresponding language label features form a positive sample pair; besides the label corresponding to the image itself, inclusion relationships exist among the living body language labels, for example a prosthesis comprises a mask prosthesis and the mask prosthesis comprises a mask prosthesis with glasses, so that an image also forms positive sample pairs with the language labels that include its own label, and an image and language labels other than these form negative sample pairs;
inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
And calculating the similarity of the language supervision feature and the image feature, and judging whether the image to be detected is a living body image according to the similarity value.
2. The method according to claim 1, wherein the setting a plurality of language tags for the image to be detected in a language description manner includes:
And marking the living body data in the image to be detected by adopting the language description as a data mark to obtain a plurality of language labels, wherein the language labels at least comprise data types, the data types at least comprise living bodies or prostheses, and the prostheses at least comprise one or more of prosthesis materials, accessories or illumination.
3. The method according to claim 2, wherein the method further comprises:
And performing rotation, translation, tilting, scaling and color jitter operations on the corrected image to obtain a processed image, and adding the processed image to an image data set.
4. A method according to claim 3, wherein said inputting the corrected image into an image coding model to obtain image features of the image to be detected comprises:
Inputting the segmented image into a pre-established image coding model for feature coding, and extracting the living body feature vectors in the segmented image, wherein the pre-established image coding model is a ViT (Vision Transformer) network model.
5. The method of claim 4, wherein said calculating the similarity of said language supervision feature and said image feature comprises:
calculating the similarity between the image features and the language supervision features according to the following formula, wherein I is an image feature and T is a language supervision feature.
6. A living body detection apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring the image to be detected;
the setting module is used for setting a plurality of language labels for the image to be detected by adopting a language description mode;
The identification module is used for determining key areas and key point information in the image to be detected according to the image to be detected, and carrying out image cutting and correction processing on the key areas to obtain corrected images;
Comprising the following steps: inputting the image to be detected into a pre-established face detection model to obtain a face image in the image to be detected, and determining a face area and a non-face area in the image to be detected according to the face image;
Performing image segmentation processing on the face region to obtain segmented images;
Inputting the segmented image into a pre-established key point detection model to obtain face key point information in the segmented image;
According to the face key point information, mapping the face area to a pre-established face standard model, and carrying out alignment calibration on the face area in the image to be detected and a standard face image to obtain a corrected image;
The first coding module is used for inputting the language tag into a language coding model to obtain language supervision characteristics;
The second coding module is used for inputting the corrected image into an image coding model to obtain image characteristics of the image to be detected;
The detection module is used for calculating the similarity of the language supervision characteristic and the image characteristic and judging whether the image to be detected is a living body image or not according to the similarity value;
Inputting the language label into a language coding model to obtain a language supervision feature; comprising the following steps:
Converting the language tag into text data in BPE (byte-pair encoding) form;
Inputting the text data in the BPE binary form into a pre-established language coding model, extracting the semantic features of the tag, and determining the semantic features of the tag as language supervision features, wherein the language coding model is at least a Transformer network language model;
The language supervision features abandon one-hot type labels and map the language description into feature vectors; a metric learning manner is adopted, in which an image and its own corresponding language label features form a positive sample pair; besides the label corresponding to the image itself, inclusion relationships exist among the living body language labels, for example a prosthesis comprises a mask prosthesis and the mask prosthesis comprises a mask prosthesis with glasses, so that an image also forms positive sample pairs with the language labels that include its own label, and an image and language labels other than these form negative sample pairs.
7. A terminal device, comprising: at least one processor and memory;
The memory stores a computer program; the at least one processor executes the computer program stored by the memory to implement the living body detection method of any one of claims 1-5.
8. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, which computer program, when executed, implements the living body detection method according to any one of claims 1-5.
CN202211361753.6A 2022-11-02 2022-11-02 Living body detection method, living body detection device, terminal device and readable storage medium Active CN115830721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211361753.6A CN115830721B (en) 2022-11-02 2022-11-02 Living body detection method, living body detection device, terminal device and readable storage medium


Publications (2)

Publication Number Publication Date
CN115830721A CN115830721A (en) 2023-03-21
CN115830721B true CN115830721B (en) 2024-05-03

Family

ID=85526185


Country Status (1)

Country Link
CN (1) CN115830721B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197904B (en) * 2023-03-31 2024-09-17 北京百度网讯科技有限公司 Training method of human face living body detection model, human face living body detection method and human face living body detection device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845499A (en) * 2017-01-19 2017-06-13 清华大学 A kind of image object detection method semantic based on natural language
CN110033018A (en) * 2019-03-06 2019-07-19 平安科技(深圳)有限公司 Shape similarity judgment method, device and computer readable storage medium
CN110909673A (en) * 2019-11-21 2020-03-24 河北工业大学 Pedestrian re-identification method based on natural language description
CN111598155A (en) * 2020-05-13 2020-08-28 北京工业大学 Fine-grained image weak supervision target positioning method based on deep learning
CN111738186A (en) * 2020-06-28 2020-10-02 香港中文大学(深圳) Target positioning method and device, electronic equipment and readable storage medium
WO2021031528A1 (en) * 2019-08-21 2021-02-25 创新先进技术有限公司 Method, apparatus, and device for identifying operation user
CN113033465A (en) * 2021-04-13 2021-06-25 北京百度网讯科技有限公司 Living body detection model training method, device, equipment and storage medium
WO2021147325A1 (en) * 2020-01-21 2021-07-29 华为技术有限公司 Object detection method and apparatus, and storage medium
CN113591526A (en) * 2020-04-30 2021-11-02 华为技术有限公司 Face living body detection method, device, equipment and computer readable storage medium
CN114998962A (en) * 2022-05-31 2022-09-02 北京三快在线科技有限公司 Living body detection and model training method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Face liveness detection using deep optical strain feature maps; Ma Siyuan; Zheng Han; Guo Wen; Journal of Image and Graphics; 2020-03-16 (No. 03); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant