CN115147936A - Living body detection method, electronic device, storage medium, and program product - Google Patents

Living body detection method, electronic device, storage medium, and program product

Info

Publication number
CN115147936A
Authority
CN
China
Prior art keywords
detected, response, video, color channel, living body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210528092.5A
Other languages
Chinese (zh)
Inventor
马志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN202210528092.5A
Publication of CN115147936A
Priority to PCT/CN2023/094603 (published as WO2023221996A1)
Legal status: Pending

Classifications

    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion of extracted features, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06V40/168 Human faces: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a living body detection method, an electronic device, a storage medium, and a program product. The method comprises the following steps: acquiring a video to be detected of an object to be detected, wherein the video to be detected is a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence; generating a response map of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected; performing living body detection on the object to be detected by using a first living body detection model based on the response map, so as to obtain a first living body detection result of the object to be detected; inputting the video to be detected into a second living body detection model to obtain a second living body detection result of the object to be detected; and determining the final living body detection result of the object to be detected according to the first living body detection result and the second living body detection result.

Description

Living body detection method, electronic device, storage medium, and program product
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method for detecting a living body, an electronic device, a storage medium, and a program product.
Background
Living body detection technology is maturing steadily. The related colored-light living body detection technology mainly comprises two parts: first, illumination sequence verification, in which the emitted illumination sequence is compared with the sequence of reflected light presented by the object to be detected in the video to judge whether camera hijacking exists; and second, detecting, by a living body detection method applied to the captured video or images of the object to be detected, whether the captured colored-light video contains common living body attack behaviors such as screen re-shooting and printed-paper re-shooting.
Existing colored-light living body detection technology generally performs these two parts separately, that is, different models or algorithms are used for the two parts respectively, and the two detection results are then combined to obtain the final living body detection result. Therefore, the existing colored-light living body detection technology still needs improvement.
Disclosure of Invention
In view of the above problems, embodiments of the present application provide a living body detection method, an electronic device, a storage medium, and a program product, so as to overcome the above problems or at least partially solve the above problems.
In a first aspect of the embodiments of the present application, a method for detecting a living body is provided, including:
acquiring a video to be detected of an object to be detected, wherein the video to be detected is a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence;
generating a response graph of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, wherein the response intensity of each pixel point in the response graph is represented by: similarity between a second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence;
performing living body detection on the object to be detected by using a first living body detection model based on the response graph to obtain a first living body detection result of the object to be detected;
inputting the video to be detected into a second in-vivo detection model to obtain a second in-vivo detection result of the object to be detected;
and determining the final in-vivo detection result of the object to be detected according to the first in-vivo detection result and the second in-vivo detection result.
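For illustration only, the flow of the first aspect can be summarized in the following Python-style sketch; the helper callables (capture_video, build_response_map, combine and the two models) are hypothetical placeholders standing in for the steps listed above, not interfaces defined by this application.

    def liveness_detect(first_illumination_sequence, capture_video,
                        build_response_map, first_model, second_model, combine):
        # video captured while the object is illuminated per the first sequence
        video = capture_video(first_illumination_sequence)
        # per-pixel response map from the similarity between the reflected
        # (second) illumination sequences and the emitted (first) sequence
        response_map = build_response_map(video, first_illumination_sequence)
        first_result = first_model(response_map)      # detection based on the response map
        second_result = second_model(video)           # detection based on the raw video
        return combine(first_result, second_result)   # final living body decision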
Optionally, the second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected is obtained according to the following steps:
extracting a plurality of video frames of the video to be detected;
aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames;
for each entity position point of the object to be detected, obtaining a second illumination sequence reflected by the entity position point according to the illumination reflected by the pixel point describing the entity position point in each of the plurality of video frames.
Optionally, aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames includes:
performing face key point detection on the video frames aiming at each video frame in the plurality of video frames to obtain face key points contained in the video frames, performing face key point level alignment processing on the plurality of video frames according to the face key points contained in each video frame, and performing pixel point level alignment processing on the plurality of video frames after the face key point level alignment processing is performed;
or,
and carrying out pixel point level alignment processing on the plurality of video frames.
Optionally, the alignment processing at the pixel point level is performed through the following process:
respectively calculating dense optical flow data between each of other video frames except a reference video frame in the plurality of video frames participating in the alignment processing at the pixel point level and the reference video frame, wherein the reference video frame is any one of the plurality of video frames participating in the alignment processing at the pixel point level;
and according to the dense optical flow data between each other video frame and the reference video frame, carrying out pixel point level alignment processing on each other video frame and the reference video frame.
Optionally, the first illumination sequence and the second illumination sequence each comprise a color light sequence of a plurality of color channels;
generating a response map of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, including:
separating the first illumination sequence to obtain a first color light sequence corresponding to each color channel, and separating the second illumination sequence to obtain a second color light sequence corresponding to each color channel;
for each color channel, generating a response subgraph of the object to be detected in the color channel according to the similarity between the second color light sequence of the color channel reflected by each entity position point of the object to be detected and the first color light sequence corresponding to the color channel;
and performing fusion processing on the response subgraphs of the object to be detected in each color channel to obtain the response graph of the object to be detected.
Optionally, performing fusion processing on the response subgraph of the object to be detected in each color channel to obtain a response graph of the object to be detected, where the fusion processing includes:
acquiring a response intensity mean value of a face region of the object to be detected in each color channel according to the response subgraph of the object to be detected in each color channel;
performing regularization processing on a response subgraph of the object to be detected in each color channel according to the response intensity average value of the face region of the object to be detected in each color channel to obtain a regularized response subgraph of the object to be detected in each color channel;
and performing fusion processing on the regularized response subgraphs of the to-be-detected object in each color channel to obtain a response graph of the to-be-detected object.
Optionally, before processing the response map by using the first in-vivo detection model, the method further includes:
obtaining attribute values of a response graph of the object to be detected, wherein the attribute values comprise at least one of the following: mean value and quality value of response intensity;
determining a living body detection result of the object to be detected according to the size relation between the attribute value of the response image of the object to be detected and the corresponding attribute threshold value;
determining that the object to be detected is not a living body under the condition that the attribute value of the response image of the object to be detected is smaller than the corresponding attribute threshold value;
and under the condition that the attribute value of the response map of the object to be detected is not less than the corresponding attribute threshold value, executing the step of performing in-vivo detection on the object to be detected by utilizing a first in-vivo detection model based on the response map.
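As a rough illustration of the optional pre-check above, the following sketch gates the model-based detection on attribute values of the response map; the threshold values and the quality metric are not specified in the text and are therefore left as assumed parameters.

    import numpy as np

    def precheck_response_map(response_map, mean_threshold,
                              quality_value=None, quality_threshold=None):
        # attribute 1: mean response intensity of the response map
        if float(np.mean(response_map)) < mean_threshold:
            return "not a living body"
        # attribute 2 (optional): a quality value of the response map computed elsewhere
        if quality_value is not None and quality_threshold is not None:
            if quality_value < quality_threshold:
                return "not a living body"
        # otherwise proceed to the model-based living body detection step
        return "run first living body detection model"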
Optionally, the first living body detection model is a model in which a first image feature of a response map of a living body face and a second image feature of the living body face are learned;
performing in-vivo detection on the object to be detected by using a first in-vivo detection model based on the response diagram to obtain a first in-vivo detection result of the object to be detected, including:
extracting a first image feature of the response map;
extracting any video frame of the video to be detected, and acquiring a second image characteristic of the video frame;
fusing the first image characteristic and the second image characteristic to obtain a fused image characteristic;
and processing the fused image features by using the first living body detection model to obtain a first living body detection result of the object to be detected.
In a second aspect of the embodiments of the present application, there is provided a method for detecting a living body, including:
acquiring a video to be detected of an object to be detected, wherein the video to be detected is a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence;
generating a response graph of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, wherein the response intensity of each pixel point in the response graph is represented by: similarity between a second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence;
and performing in-vivo detection on the object to be detected by using an in-vivo detection model based on the response graph to obtain an in-vivo detection result of the object to be detected.
Optionally, the second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected is obtained according to the following steps:
extracting a plurality of video frames of the video to be detected;
aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames;
for each entity position point of the object to be detected, obtaining a second illumination sequence reflected by the entity position point according to the illumination reflected by the pixel point describing the entity position point in each of the plurality of video frames.
Optionally, aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames includes:
performing face key point detection on the video frames aiming at each video frame in the plurality of video frames to obtain face key points contained in the video frames, performing face key point level alignment processing on the plurality of video frames according to the face key points contained in each video frame, and performing pixel point level alignment processing on the plurality of video frames after the face key point level alignment processing is performed;
or,
and carrying out pixel point level alignment processing on the plurality of video frames.
Optionally, the alignment processing at the pixel point level is performed through the following process:
respectively calculating dense optical flow data between each of other video frames except a reference video frame in the plurality of video frames participating in the alignment processing at the pixel point level and the reference video frame, wherein the reference video frame is any one of the plurality of video frames participating in the alignment processing at the pixel point level;
and according to the dense optical flow data between each other video frame and the reference video frame, carrying out pixel point level alignment processing on each other video frame and the reference video frame.
Optionally, the first illumination sequence and the second illumination sequence each comprise a color light sequence of a plurality of color channels;
generating a response map of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected includes:
separating the first illumination sequence to obtain a first color light sequence corresponding to each color channel, and separating the second illumination sequence to obtain a second color light sequence corresponding to each color channel;
for each color channel, generating a response subgraph of the object to be detected in the color channel according to the similarity between the second color light sequence of the color channel reflected by each entity position point of the object to be detected and the first color light sequence corresponding to the color channel;
and performing fusion processing on the response subgraphs of the object to be detected in each color channel to obtain the response graph of the object to be detected.
Optionally, performing fusion processing on the response subgraph of each color channel of the object to be detected to obtain a response graph of the object to be detected, where the fusion processing includes:
according to the response subgraph of the object to be detected in each color channel, obtaining the mean value of the response intensity of the face area of the object to be detected in each color channel;
performing regularization processing on a response subgraph of the object to be detected in each color channel according to the response intensity average value of the face region of the object to be detected in each color channel to obtain a regularized response subgraph of the object to be detected in each color channel;
and performing fusion processing on the regularized response subgraphs of the to-be-detected object in each color channel to obtain a response graph of the to-be-detected object.
Optionally, before processing the response map by using the in-vivo detection model, the method further includes:
obtaining attribute values of a response graph of the object to be detected, wherein the attribute values comprise at least one of the following: mean value and quality value of response intensity;
determining a living body detection result of the object to be detected according to the size relation between the attribute value of the response image of the object to be detected and the corresponding attribute threshold value;
determining that the object to be detected is not a living body under the condition that the attribute value of the response image of the object to be detected is smaller than the corresponding attribute threshold value;
and under the condition that the attribute value of the response map of the object to be detected is not less than the corresponding attribute threshold value, executing the step of performing in-vivo detection on the object to be detected by using an in-vivo detection model based on the response map.
Optionally, the living body detection model is a model in which a first image feature of a response map of a living body face and a second image feature of the living body face are learned;
performing in-vivo detection on the object to be detected by using an in-vivo detection model based on the response map to obtain an in-vivo detection result of the object to be detected includes:
extracting a first image feature of the response map;
extracting any video frame of the video to be detected, and acquiring a second image characteristic of the video frame;
fusing the first image characteristic and the second image characteristic to obtain a fused image characteristic;
and processing the fused image features by using the living body detection model to obtain a living body detection result of the object to be detected.
In a third aspect of embodiments of the present application, there is provided an electronic device comprising a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the liveness detection method according to the first aspect; alternatively, the processor executes the computer program to implement the living body detection method according to the second aspect.
In a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program/instructions which, when executed by a processor, implement the liveness detection method according to the first aspect; alternatively, the computer program/instructions, when executed by a processor, implement the liveness detection method as described in the second aspect.
In a fifth aspect of embodiments of the present application, there is provided a computer program product comprising a computer program/instructions which, when executed by a processor, implements the liveness detection method according to the first aspect; alternatively, the computer program/instructions, when executed by a processor, implement the liveness detection method as described in the second aspect.
The embodiment of the application has the following advantages:
in this embodiment, the response map of the object to be detected is refined to the pixel point level, and the different reflection patterns exhibited by each entity position area of the object to be detected can be obtained from the similarity, represented by each pixel point of the response map, between the second illumination sequence reflected by the corresponding entity position point and the first illumination sequence. A living body face is uneven, whereas the screen or printing paper used in a re-shooting attack is smooth and highly reflective, so a living body face and a re-shot image reflect light in different patterns; the object to be detected can therefore be distinguished as a re-shot image or a living body face based on its response map, yielding the first living body detection result of the object to be detected. That is, the characteristics of the reflected light are introduced into the colored-light living body detection method, so that re-shooting attack detection can additionally be realized on top of colored-light living body detection, effectively improving the accuracy and detection capability of living body detection;
in addition, the second living body detection model can obtain living body detection results for other attack types against the object to be detected. Combining the first living body detection model with the second living body detection model effectively improves the ability to detect other attacks without reducing the probability that a real person is judged to be a living body, thereby effectively improving the accuracy of the final living body detection result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive labor.
FIG. 1 is a flow chart illustrating the steps of a method for detecting a living organism in an embodiment of the present application;
FIG. 2 is a schematic flow chart of obtaining a first in-vivo detection result in an embodiment of the present application;
FIG. 3 is a schematic flow chart of in vivo detection in an embodiment of the present application;
FIG. 4 is a flow chart illustrating the steps of a method for detecting a living subject according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a living body detecting apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a living body detecting apparatus according to an embodiment of the present application;
fig. 7 is a schematic diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has been advanced significantly. Artificial Intelligence (AI) is an emerging scientific technology for studying and developing theories, methods, techniques and application systems for simulating and extending human Intelligence. The artificial intelligence subject is a comprehensive subject and relates to various technical categories such as chips, big data, cloud computing, internet of things, distributed storage, deep learning, machine learning and neural networks. Computer vision is used as an important branch of artificial intelligence, specifically, a machine is used for identifying the world, and computer vision technologies generally comprise technologies such as face identification, living body detection, fingerprint identification and anti-counterfeiting verification, biological feature identification, face detection, pedestrian detection, target detection, pedestrian identification, image processing, image identification, image semantic understanding, image retrieval, character identification, video processing, video content identification, three-dimensional reconstruction, virtual reality, augmented reality, synchronous positioning and map construction (SLAM), computational photography, robot navigation and positioning and the like. With the research and progress of artificial intelligence technology, the technology is applied to many fields, such as safety control, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, intelligent medical treatment, face payment, face unlocking, fingerprint unlocking, person certificate verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
In the related art, living body detection judges whether camera hijacking exists by comparing the similarity between the emitted illumination sequence and the reflected illumination sequence captured in the video. However, if the judgment is made only on the similarity between illumination sequences, cases such as a printed human face reflecting the illumination may arise, and the illumination sequence reflected by a printed face also has high similarity with the emitted illumination sequence. Therefore, the related living body detection technology additionally needs to determine, based on the captured video, whether a living body attack behavior is present, for example whether the video is a screen re-shot, a printed-paper re-shot, or the like.
However, the related living body detection technology splits the similarity detection and the living body attack detection into separate steps. The applicant proposes that the two can be combined by exploiting the fact that light shone on a real person's face and light shone on a screen or printing paper are reflected in different patterns, so that the difference between the light reflected by a real person and that reflected by a screen or printing paper is taken into account while both the similarity check and the attack-behavior check are performed. Because a human face is uneven, different regions reflect light differently, and the background region behind the face often reflects only weak light, or none at all, because the illumination intensity it receives is low; a screen or printing paper, by contrast, is smooth and flat, so its reflected light tends to be uniform and regular. The applicant therefore recognized that the accuracy of colored-light living body detection can be improved by using the information that a living body face has a reflection pattern different from that of a screen or printing paper.
Referring to fig. 1, a flowchart illustrating steps of a living body detection method in an embodiment of the present application is shown, and as shown in fig. 1, the living body detection method may be applied to a background server, and includes the following steps:
step S11: acquiring a video to be detected of an object to be detected, wherein the video to be detected is a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence;
step S12: generating a response graph of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, wherein the response intensity of each pixel point in the response graph is represented by: similarity between a second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence;
step S13: performing living body detection on the object to be detected by using a first living body detection model based on the response graph to obtain a first living body detection result of the object to be detected;
step S14: inputting the video to be detected into a second in-vivo detection model to obtain a second in-vivo detection result of the object to be detected;
step S15: and determining the final in-vivo detection result of the object to be detected according to the first in-vivo detection result and the second in-vivo detection result.
In specific implementation, the background server may issue the first illumination sequence to the terminal. The terminal emits light according to the first light sequence to irradiate the object to be detected, and collects the video to be detected of the object to be detected during the period when the light is emitted according to the first light sequence to irradiate the object to be detected. The object to be detected is an object collected by a camera of the terminal.
Optionally, in some specific embodiments, the scheme of the application may also be executed by an electronic device such as a terminal, for example, when performing living body detection, a first illumination sequence is generated by the electronic device itself, and a video of an object to be detected is acquired during illumination of the object to be detected according to the first illumination sequence; and then the electronic equipment executes a subsequent living body detection process according to the first illumination sequence and the video to be detected. Specifically, the first illumination sequence is issued by a background server or generated by an electronic device such as a terminal, and the in-vivo detection process is executed by the background server or the electronic device such as the terminal, and even a part of steps in the in-vivo detection process are executed by the electronic device such as the terminal, a part of steps are executed by the background server, and the like, which can be set according to actual requirements.
Each video frame of the video to be detected is processed to determine, for each entity position point of the object to be detected, the corresponding pixel point in each video frame, and the illumination reflected by that pixel point in each video frame is acquired, so as to obtain the second illumination sequence reflected by each entity position point of the object to be detected. For example, when the object to be detected is a human face, an entity position point of the object to be detected may be a point on the nose of the face, whose size is the size represented by one pixel point in the video.
The similarity between the first illumination sequence and the second illumination sequence reflected by each entity position point of the object to be detected is calculated and used as the response intensity to generate the response map of the object to be detected. Optionally, when the illumination is white light, each element of the first and second illumination sequences may represent the intensity of the white light; when the illumination is colored light, each element may characterize the intensity and/or the color of the colored light. The similarity between the first illumination sequence and the second illumination sequence can be obtained as the dot product of the two sequences. Because the response map of the object to be detected is refined to the pixel point level, the different reflection patterns presented by different position areas of the object to be detected can be obtained from the response map.
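As a minimal sketch of this step (assuming the per-pixel reflected sequences have already been extracted from aligned frames), the dot-product similarity can be computed for every pixel at once; the zero-mean normalisation used here is one reasonable choice and is not mandated by the text.

    import numpy as np

    def response_map_from_sequences(emitted, reflected):
        """emitted: (T,) first illumination sequence (one intensity per frame).
        reflected: (T, H, W) second illumination sequence of every pixel.
        Returns an (H, W) response map whose value at each pixel is the
        dot-product similarity between the two sequences."""
        e = emitted - emitted.mean()
        r = reflected - reflected.mean(axis=0, keepdims=True)
        return np.einsum("t,thw->hw", e, r)  # per-pixel dot product over time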
The response map of the object to be detected is input into the first living body detection model, which performs living body detection on the object to be detected according to the response map and obtains the first living body detection result. The first living body detection model is a model that has learned, through supervised training, the first image features of response maps of living body faces and can distinguish the response map of a living body face from the response maps of other attacks (such as re-shooting attacks), so the first living body detection model can obtain the first living body detection result of the object to be detected from its response map. The supervised training of the first living body detection model may be as follows: response maps of a plurality of sample objects (including living human faces and other objects) are acquired and input into the first living body detection model to be trained to obtain the predicted probability that each sample object is a living body; a loss function is established according to the predicted probability and whether the sample object is actually a living body, and the model parameters of the first living body detection model to be trained are updated based on the loss function to obtain the first living body detection model. In this way, the first living body detection model learns the response maps of living body faces. The response map of a sample object may be obtained in the same way as the response map of the object to be detected.
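A minimal supervised-training sketch for the first living body detection model is shown below, assuming a PyTorch classifier that maps a response map to a liveness logit; the network architecture, optimiser, and batch format are illustrative assumptions rather than details specified by this application.

    import torch.nn as nn

    def train_step(model, optimizer, response_maps, is_live_labels):
        """response_maps: (B, 1, H, W) tensor of sample-object response maps.
        is_live_labels: (B,) tensor with 1.0 for living faces, 0.0 for other objects."""
        criterion = nn.BCEWithLogitsLoss()         # loss built from prediction vs. label
        optimizer.zero_grad()
        logits = model(response_maps).squeeze(-1)  # predicted liveness score (as a logit)
        loss = criterion(logits, is_live_labels.float())
        loss.backward()                            # update model parameters from the loss
        optimizer.step()
        return loss.item()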
In order to simultaneously detect more types of attacks and improve the accuracy of the living body detection result, a second living body detection model can be used to obtain a second living body detection result of the object to be detected from the video to be detected. The second living body detection model may be a common living body detection model in the related art, and may perform living body detection from an input video or video frame. Optionally, in a specific implementation, the second living body detection model may be of any type, such as a mask-attack living body detection model, an action-based living body detection model, and the like. Some living body detection models require the user to perform a corresponding action; therefore, in order to obtain both the first and the second living body detection result from the same video to be detected, the user may be instructed, when the video is recorded, to perform the action required by the second living body detection model. In a specific application scenario, this may be set according to the actual requirements of that scenario, which is not limited by the embodiments of the present application.
Optionally, the first living body detection model and the second living body detection model may work in parallel, obtaining the first and second living body detection results in parallel; or the first living body detection model may obtain the first living body detection result before the second model obtains the second result; or the second living body detection model may obtain the second living body detection result before the first model obtains the first result.
The first living body detection result and the second living body detection result are combined to obtain the final living body detection result of the object to be detected. Optionally, the first and second living body detection results may each represent a probability that the object to be detected is a living human face. The smaller of the two probabilities is compared with a living body threshold: if the smaller value is larger than the living body threshold, the final living body detection result of the object to be detected is that it is a living body; if the smaller value is not larger than the living body threshold, the final result is that it is not a living body. The living body threshold may be a reasonable value set in advance. Optionally, different weights may be assigned to the first and second living body detection results, and the final living body detection result of the object to be detected is obtained by combining the weighted first and second results.
In this way, the ability to detect other attacks can be effectively improved without reducing the probability that a real person is determined to be a living body.
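The two combination rules described above (taking the smaller probability, or a weighted combination) can be sketched as follows; the living body threshold and the weights are deployment-time choices rather than values fixed by this application.

    def final_liveness_result(p_first, p_second, live_threshold, weights=None):
        """p_first, p_second: probabilities that the object is a living face,
        output by the first and second living body detection models."""
        if weights is None:
            score = min(p_first, p_second)        # compare the smaller value with the threshold
        else:
            w1, w2 = weights
            score = w1 * p_first + w2 * p_second  # weighted combination of the two results
        return "living body" if score > live_threshold else "not a living body"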
By adopting the technical solution of this embodiment of the application, the response map of the object to be detected is refined to the pixel point level, and the different reflection patterns presented by the various entity position areas of the object to be detected can be obtained from the similarity, represented by each pixel point of the response map, between the second illumination sequence reflected by the corresponding entity position point and the first illumination sequence. A living body face is uneven, whereas the screen or printing paper used in a re-shooting attack is smooth and highly reflective, so the light reflected by a living body face and by a re-shot image follows different patterns. The first living body detection model can therefore distinguish, based on the response map, whether the object to be detected is a re-shot image or a living body face, and thus produce the first living body detection result. In this way, the first living body detection model exploits the information that re-shot images and living body faces have different reflection patterns, realizing the combination of the two detection modes (illumination sequence verification and re-shooting detection), and the first living body detection result it determines is more accurate. In addition, the second living body detection model obtains a second living body detection result of the object to be detected, and the final living body detection result determined by combining the first and second results can effectively improve the ability to detect other attacks without reducing the probability that a real person is judged to be a living body, thereby effectively improving the accuracy of the final living body detection result.
There may also be attacks such as wearing a mask. Because the structure of a mask is similar to that of a face, the illumination it reflects, and hence the response map it generates, is similar to that of a living face, and such an attack may be difficult to recognize with a first living body detection model that has learned only the first image features of response maps of living body faces. Therefore, during supervised training, the first living body detection model can also be made to learn the second image features of living body faces. Optionally, the supervised training of the first living body detection model may be as follows: response maps of a plurality of sample objects (including living human faces and other objects) and a captured video of each sample object are acquired, any video frame is extracted from the video, and the first living body detection model to be trained is trained in a supervised manner based on the response map, the video frame, and the information on whether the sample object is truly a living body. In the case where the model structure of the first living body detection model allows only one input, the first image feature of the response map and the second image feature of the video frame can be extracted and fused, and the fused feature input into the first living body detection model to be trained to obtain the first living body detection result of the sample object.
Accordingly, the first living body detection model, having learned the image features of living body faces, can obtain a first living body detection result of the object to be detected from the fused image features. The fused image features are acquired as follows: extracting the first image feature of the response map; extracting any video frame of the video to be detected and acquiring the second image feature of that video frame; and fusing the first image feature and the second image feature to obtain the fused image feature. That is, the image features of the response map of the object to be detected are fused with the image features of any video frame extracted from the video of the object to be detected to obtain the fused image features.
By adopting the technical solution of this embodiment, the first living body detection model learns both the response maps of living body faces and the image features of living body faces, so attacks such as wearing a mask can be defended against and the accuracy of the first living body detection is improved.
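A sketch of this fused-feature variant is given below, assuming small convolutional encoders for the two inputs; the backbone sizes are arbitrary placeholders, and only the structure (encode the response map, encode one video frame, concatenate, classify) follows the description above.

    import torch
    import torch.nn as nn

    class FusionLivenessModel(nn.Module):
        def __init__(self, feat_dim=128):
            super().__init__()
            # first image feature: encoder for the single-channel response map
            self.resp_encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
            # second image feature: encoder for one RGB video frame
            self.frame_encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
            self.head = nn.Linear(2 * feat_dim, 1)  # classify the fused feature

        def forward(self, response_map, frame):
            fused = torch.cat([self.resp_encoder(response_map),
                               self.frame_encoder(frame)], dim=1)
            return self.head(fused)                 # liveness logit from the fused features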
On the basis of the technical scheme, the second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected is obtained according to the following steps: extracting a plurality of video frames of the video to be detected; aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames; and aiming at each entity position point of the object to be detected, obtaining a second illumination sequence reflected by the entity position point according to illumination reflected by the pixel point describing the entity position point in each video frame in the plurality of video frames.
The video to be detected has a plurality of video frames, the position of an entity position point of the object to be detected in each video frame may be different, and in order to obtain the second illumination sequence reflected by each entity position point of the object to be detected, the plurality of video frames need to be aligned to obtain a pixel point corresponding to any entity position point of the object to be detected in each video frame. According to the illumination reflected by the corresponding pixel point of any entity position point of the object to be detected in each video frame and the time sequence between each video frame, a second illumination sequence reflected by each entity position point of the object to be detected can be obtained.
For example, suppose the illumination is red light, there are 5 video frames in total, and the pixel points of an entity position point of the object to be detected in the respective video frames are A, B, C, D, and E. If points A, C, and D reflect red light while points B and E do not, the second illumination sequence of red light reflected by that entity position point of the object to be detected may be 10110. It will be appreciated that, depending on the intensity of the reflected light, the numbers in the second illumination sequence may also be values between 0 and 1, with larger numbers corresponding to stronger reflected light.
By adopting the technical scheme of the embodiment of the application, the problem that the second illumination sequence reflected by each entity position point of the object to be detected is difficult to acquire due to different positions of the object to be detected in each video frame can be solved.
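Once the frames are aligned, extracting the second illumination sequences reduces to stacking the frames along the time axis, as in the small sketch below (grayscale frames are assumed for simplicity).

    import numpy as np

    def second_illumination_sequences(aligned_frames):
        """aligned_frames: list of T aligned grayscale frames, each (H, W).
        After alignment, (y, x) refers to the same entity position point in every
        frame, so stacking along time gives a (T, H, W) array in which
        sequences[:, y, x] is that point's reflected illumination sequence."""
        return np.stack([f.astype(np.float32) for f in aligned_frames], axis=0)

For instance, a point that reflects the light in the first, third and fourth frames but not in the second and fifth yields a sequence shaped like the 10110 example above.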
The alignment of multiple video frames can be realized based on face key points. First, face key point detection is performed on each of the plurality of video frames of the video to be detected to obtain the face key points contained in each frame. The specific face key point detection method is not limited; a related face key point detection algorithm or software may be used. Less time-consuming methods may also be adopted: for example, after the face key points of the first video frame are obtained, 5 points, namely the left eye corner, the right eye corner, the nose tip, the left lip corner and the right lip corner, are used as anchor points, and the remaining video frames are mapped onto this template through a Thin Plate Spline interpolation algorithm. The anchor points can be customized according to the input face key points, and after coarse alignment the positions of these key points are fixed. The more anchor points, the better the alignment, but 5 points are sufficient. It is understood that when no human face is detected, the object to be detected can be directly determined to be a non-living body.
According to the face key points contained in each video frame, the alignment processing of the face key point level can be carried out on each video frame. Further, pixel point level alignment processing is performed on a plurality of video frames subjected to face key point level alignment processing.
Optionally, pixel-level alignment processing may also be directly performed on multiple video frames.
Performing pixel point level alignment directly makes the alignment process simpler, whereas performing face key point level alignment first and then pixel point level alignment saves the computing resources consumed by the pixel point level alignment. Which way to implement the alignment can be selected according to actual requirements.
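For illustration, the coarse (key point level) alignment can be sketched with OpenCV as below; a similarity transform estimated from the five anchor points is used here as a simple stand-in for the thin plate spline mapping described above.

    import cv2
    import numpy as np

    def coarse_align(frame, frame_keypoints, reference_keypoints):
        """frame_keypoints / reference_keypoints: (5, 2) float32 arrays holding the
        left/right eye corners, nose tip, and left/right lip corners.
        Warps `frame` so its key points match the reference frame's layout."""
        M, _ = cv2.estimateAffinePartial2D(frame_keypoints.astype(np.float32),
                                           reference_keypoints.astype(np.float32))
        h, w = frame.shape[:2]
        return cv2.warpAffine(frame, M, (w, h))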
Optionally, the fine alignment at the pixel point level may be implemented by the following process: respectively calculating dense optical flow data between each of other video frames except a reference video frame in the plurality of video frames participating in the alignment processing at the pixel point level and the reference video frame, wherein the reference video frame is any one of the plurality of video frames participating in the alignment processing at the pixel point level; and according to the dense optical flow data between each other video frame and the reference video frame, carrying out pixel point level alignment processing on each other video frame and the reference video frame.
The first video frame can be taken as a reference video frame, dense optical flow data between each other video frame and the reference video frame can be calculated in a brightness channel, and the dense optical flow data is taken as a mapping basis to perform fine alignment on pixel point level on each other video frame and the reference video frame. The alignment processing at the pixel level may be implemented based on a dense optical flow algorithm, such as Gunnar Farneback algorithm (a dense optical flow algorithm).
Therefore, fine alignment of pixel point levels is achieved, the second illumination sequence reflected by each entity position point of the object to be detected can be accurately obtained, an accurate response diagram is further generated, and an accurate first living body detection result is obtained.
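A sketch of the pixel point level alignment using the Farneback dense optical flow algorithm mentioned above is given below; the flow parameters are commonly used defaults rather than values prescribed by the text.

    import cv2
    import numpy as np

    def pixel_level_align(reference_gray, other_gray):
        """Warps `other_gray` onto `reference_gray` (both single-channel brightness
        images) using dense optical flow from the reference frame to the other frame."""
        flow = cv2.calcOpticalFlowFarneback(reference_gray, other_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = reference_gray.shape
        grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (grid_x + flow[..., 0]).astype(np.float32)
        map_y = (grid_y + flow[..., 1]).astype(np.float32)
        # sample the other frame at the flow-displaced positions so every pixel
        # lines up with the corresponding pixel of the reference frame
        return cv2.remap(other_gray, map_x, map_y, interpolation=cv2.INTER_LINEAR)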
Fig. 2 shows a schematic flowchart of obtaining the first living body detection result. The steps may include: face key point level alignment, pixel point level alignment, obtaining the second illumination sequence, similarity calculation, generating the response map, acquiring a video frame, inputting into the first living body detection model, and obtaining the first living body detection result. These steps form a relatively complete flow, but according to actual requirements one or more steps may be dropped. For example, the face key point level alignment step may be dropped to reduce the complexity of the whole flow; the step of acquiring a video frame may be omitted, in which case only the response map is input into the first living body detection model, while attacks such as wearing a mask can be identified by the second living body detection model, avoiding duplicated work between the first and second living body detection models; or the face key point level alignment and video frame acquisition steps may both be omitted, reducing the complexity of the whole flow while avoiding duplicated work between the first and second living body detection models.
On the basis of the above technical solution, the first illumination sequence and the second illumination sequence may be both color light sequences including a plurality of color channels. When the similarity between the first illumination sequence and the second illumination sequence is calculated, the similarity between the first color light sequence and the second color light sequence in each color channel is calculated, and then a response subgraph of the color channel is obtained.
In order to obtain the color light sequence of each color channel, the first illumination sequence and the second illumination sequence each need to be separated. Separating the first illumination sequence yields a first color light sequence for each color channel, and separating the second illumination sequence yields a second color light sequence for each color channel. When separating the first and second illumination sequences, each color channel may be regularized to improve the accuracy of the separation. Regularization of each color channel can be implemented with the formula x'_t = x_t - (Σ_t x_t)/n, where x'_t denotes the regularized color light sequence (which may be the first or the second color light sequence), x_t denotes the color light sequence before regularization, and n denotes the length of the color light sequence.
The method for regularizing each color channel may also refer to the related art, which is not limited in this application. The first illumination sequence is issued by the background, so that the first color light sequence corresponding to each color channel can be directly acquired from the background.
For example, take one second as the unit of the color light sequence and let the color channels be a red channel, a yellow channel, and a blue channel. Suppose the emitted colored light is red light in the first second, orange light composed of red and yellow light in the second second, white light composed of yellow and blue light in the third second, yellow light in the fourth second, and red light in the fifth second; with 1 indicating that light of the corresponding color channel is present and 0 indicating that it is absent, the resulting first color light sequences may be: 11001 for the red channel, 01110 for the yellow channel, and 00100 for the blue channel. On a similar principle, the second color light sequences of each color channel reflected by each entity position point of the object to be detected can be obtained. It is understood that, depending on the intensity of the emitted light, the numbers in the first color light sequence may be values between 0 and 1; likewise, depending on the intensity of the reflected light, the numbers in the second color light sequence may be values between 0 and 1.
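A small sketch of the channel separation and per-channel regularization x'_t = x_t - (Σ_t x_t)/n is shown below, using the five-second red/yellow/blue example above; the array layout is an assumption made for the sketch.

    import numpy as np

    def regularize_channels(sequence):
        """sequence: (T, C) color light sequence, one column per color channel
        (usable for the first sequence, or per pixel for the second sequence).
        Applies x'_t = x_t - (sum_t x_t) / n to each channel."""
        sequence = np.asarray(sequence, dtype=np.float32)
        return sequence - sequence.mean(axis=0, keepdims=True)

    # the emitted example above: columns are the red, yellow and blue channels
    first_sequences = np.array([[1, 0, 0],
                                [1, 1, 0],
                                [0, 1, 1],
                                [0, 1, 0],
                                [1, 0, 0]], dtype=np.float32)
    regularized = regularize_channels(first_sequences)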
For each color channel, if the similarity between the second color light sequence of that color channel reflected by an entity position point of the object to be detected and the first color light sequence corresponding to that color channel is greater than a preset value, the pixel point corresponding to the entity position point in the response subgraph of that color channel takes the color of that color channel. After the response subgraph of each color channel is obtained, the response subgraphs of all the color channels are fused to obtain the response map of the object to be detected.
For example, if the pixel point corresponding to an entity position point of the object to be detected is red in the response subgraph of the red channel (the similarity is greater than the preset value), yellow in the response subgraph of the yellow channel (the similarity is greater than the preset value), and has no response in the response subgraph of the blue channel (the similarity is not greater than the preset value), then after the response subgraphs of the color channels are fused, the pixel point corresponding to that entity position point in the response map of the object to be detected is orange, composed of red and yellow.
In this way, a response subgraph is obtained for each color channel, and the response map of the object to be detected is then obtained from the response subgraphs of all the color channels.
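A minimal sketch of building a per-channel response subgraph from pixel-wise similarity might look as follows. The choice of cosine similarity, the threshold value, and the decision to store the similarity value itself as the response intensity are illustrative assumptions; the application does not fix a particular similarity measure here:

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def channel_response_subgraph(second_seqs, first_seq, threshold=0.5):
    """second_seqs: (H, W, T) reflected sequences of one color channel,
    first_seq: (T,) emitted sequence of the same channel.
    Returns an (H, W) subgraph holding the similarity wherever it exceeds
    the preset threshold, and zero elsewhere."""
    h, w, _ = second_seqs.shape
    resp = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            s = cosine_similarity(second_seqs[i, j], first_seq)
            if s > threshold:
                resp[i, j] = s
    return resp
```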
On the basis of the above technical solution, for each color channel, in order to avoid adverse effects on the response subgraph caused by excessive brightness, the response subgraph of the color channel may be regularized by using the mean response intensity of the face region in that response subgraph, so as to obtain a regularized response subgraph of the object to be detected in that color channel. The regularized response subgraphs of the object to be detected in all the color channels are then fused to obtain the response map of the object to be detected.
Optionally, using the mean response intensity of the face region in the response subgraph of each color channel, the regularization of the response subgraph of that color channel may be performed as follows: the response intensity of each pixel point of the response subgraph of the color channel is divided by the mean response intensity of the face region of that response subgraph, and the obtained quotient is taken as the response intensity of the corresponding pixel point of the regularized response subgraph of the color channel.
The response intensity of the regularized response subgraph can be calculated by the following formulas:

N_F = (Σ_{(i,j)∈F} r_{i,j}) / n,    r'_{i,j} = r_{i,j} / N_F

where F denotes the face region, r_{i,j} denotes the response intensity of the pixel point (i, j) of the response subgraph, n denotes the number of pixel points of the face region, and N_F denotes the mean response intensity of the face region. The resulting regularized response subgraph is Resp' = {r'_{i,j}}, with r'_{i,j} denoting the response intensity of its pixel point (i, j).
The fusion of the regularized response subgraphs of the color channels can be achieved by the following formula:

ri_{i,j} = uint8(min(max(r'_{i,j} × 255, 0), 255))

where ri_{i,j} denotes the response intensity of the pixel point (i, j) of the fused response map, uint8 denotes converting a floating-point number into an 8-bit unsigned integer, and r'_{i,j} denotes the response intensity of the pixel point (i, j) of the regularized response subgraph.
In this way, the response intensities of the response subgraphs of the color channels can be balanced, avoiding the adverse effect of over-bright regions.
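A minimal sketch of this regularization and fusion, assuming the face region is given as a boolean mask and the per-channel subgraphs are stacked into a multi-channel response map (the mask representation and helper names are illustrative assumptions):

```python
import numpy as np

def regularize_by_face_mean(resp, face_mask, eps=1e-8):
    """Divide each pixel's response intensity by the mean response
    intensity of the face region, i.e. r'_{i,j} = r_{i,j} / N_F."""
    n_f = resp[face_mask].mean()
    return resp / (n_f + eps)

def fuse_to_uint8(channel_subgraphs):
    """channel_subgraphs: list of (H, W) regularized subgraphs, one per color channel.
    Each is scaled to [0, 255], clipped, converted to an 8-bit unsigned integer,
    and the channels are stacked as the fused response map."""
    fused = [np.clip(r * 255.0, 0, 255).astype(np.uint8) for r in channel_subgraphs]
    return np.stack(fused, axis=-1)  # (H, W, C)
```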
Optionally, on the basis of the above technical solution, after the response map of the object to be detected is obtained and before the response map of the object to be detected is input into the first in-vivo detection model, the in-vivo detection result of the object to be detected may be determined according to at least one attribute value of the response map of the object to be detected.
The attribute values of the response map of the object to be detected include at least one of: a mean value of response intensity and a quality value. If any attribute value is smaller than the corresponding attribute threshold, the object to be detected is determined not to be a living body; if no attribute value is smaller than its corresponding attribute threshold, the response map of the object to be detected is input into the first living body detection model.

Fig. 3 shows a schematic flowchart of the living body detection. Before the response map of the object to be detected is input into the first living body detection model, it is determined whether any attribute value of the response map is smaller than the corresponding attribute threshold. If so, the object to be detected is judged not to be a living body and the detection result that the object to be detected is not a living body is output directly; otherwise, the response map of the object to be detected is input into the first living body detection model for living body detection.
If the mean response intensity of the response map of the object to be detected is smaller than the response intensity threshold, the collected second color light sequence is considered too weak for the response map to serve as a reliable clue for living body verification. For safety, an attack can be considered to exist, and the object to be detected is directly judged not to be a living body, so the response map of the object to be detected does not need to be input into the first living body detection model.

Similarly, if the quality value of the response map of the object to be detected is smaller than the quality threshold, an attack may exist, and the object to be detected is directly judged not to be a living body, so the response map of the object to be detected does not need to be input into the first living body detection model. The quality value of the response map can be determined according to the noise in the response map: the more noise, the lower the quality value.
The quality value of the response map may be calculated from r_{i,j}, the response intensity of the pixel point (i, j) of the response map, and from the illumination sequences after color channel regularization, where t denotes each element in an illumination sequence, y'_t denotes the first illumination sequence after color channel regularization, and x'_t denotes the second illumination sequence after color channel regularization (the specific formula is given as an image in the original publication and is not reproduced here).
In this way, some attacks can be intercepted in advance, and the accuracy of the living body detection result is improved.
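A minimal sketch of this attribute-value gate before the model is invoked might look as follows (the threshold values and function names are illustrative assumptions, not values fixed by this application):

```python
import numpy as np

RESPONSE_MEAN_THRESHOLD = 10.0   # assumed threshold values, for illustration only
QUALITY_THRESHOLD = 0.3

def gate_response_map(resp_map, quality_value):
    """Return False (not a living body) if any attribute value falls below
    its threshold; otherwise the response map may be passed on to the
    first living body detection model."""
    if resp_map.mean() < RESPONSE_MEAN_THRESHOLD:
        return False
    if quality_value < QUALITY_THRESHOLD:
        return False
    return True
```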
Fig. 4 is a flowchart illustrating the steps of a living body detection method in an embodiment of the present application. As shown in fig. 4, the living body detection method can be applied to a backend server and includes the following steps:
step S41: acquiring a video to be detected of an object to be detected, where the video to be detected is: a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence;
step S42: generating a response graph of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, wherein the response intensity of each pixel point in the response graph is represented by: similarity between a second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence;
step S43: and carrying out in-vivo detection on the object to be detected by using an in-vivo detection model based on the response diagram to obtain an in-vivo detection result of the object to be detected.
The manner of acquiring the video to be detected of the object to be detected and of generating the response map of the object to be detected may refer to the corresponding description in the foregoing embodiments; the training method of the living body detection model may refer to the training method of the first living body detection model.
And inputting the response image of the object to be detected into the living body detection model to obtain a living body detection result of the object to be detected.
By adopting the technical solution of the embodiment of the present application, the response map of the object to be detected is at the pixel point level: according to the similarity between the first illumination sequence and the second illumination sequence reflected by the entity position point represented by each pixel point of the response map, the different reflection patterns presented by each entity position region of the object to be detected can be obtained. A living body face is uneven, whereas the screen or printing paper used for a recaptured face is smooth and highly reflective, so a living body face and a recapture reflect light in different patterns. Based on the response map of the object to be detected, the living body detection model can therefore distinguish whether the object to be detected is a recapture or a living body face, thereby obtaining the living body detection result of the object to be detected. In this way, the living body detection model makes use of the information that recaptures and living body faces have different reflection patterns, realizing a combination of two detection approaches (illumination sequence verification and recapture detection), so that the determined living body detection result is more accurate.
Optionally, the second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected is obtained according to the following steps:

extracting a plurality of video frames of the video to be detected; aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames; and, for each entity position point of the object to be detected, obtaining a second illumination sequence reflected by the entity position point according to the illumination reflected by the pixel point describing the entity position point in each of the plurality of video frames.
Optionally, aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames, specifically including the following processes:
for each video frame in the plurality of video frames, performing face key point detection on the video frame to obtain the face key points contained in the video frame, performing face key point level alignment processing on the plurality of video frames according to the face key points contained in each video frame, and then performing pixel point level alignment processing on the plurality of video frames after the face key point level alignment processing (a sketch of the key point level alignment is given below);
or,
and carrying out pixel point level alignment processing on the plurality of video frames.
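As an illustration of the key point level alignment in the first alternative above, a minimal sketch in Python might look as follows. The use of a similarity transform estimated with OpenCV, and the assumption that face key points are already available as (N, 2) arrays from some landmark detector, are illustrative choices rather than details fixed by this application:

```python
import cv2
import numpy as np

def align_by_keypoints(frame, frame_keypoints, reference_keypoints):
    """Key-point-level alignment: estimate a similarity transform from this
    frame's face key points to those of a reference frame, then warp the
    frame so its face key points land on the reference positions."""
    m, _ = cv2.estimateAffinePartial2D(
        frame_keypoints.astype(np.float32),
        reference_keypoints.astype(np.float32),
    )
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, m, (w, h))
```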
Optionally, the alignment processing at the pixel point level may be performed by the following process:
respectively calculating dense optical flow data between each of other video frames except a reference video frame in the plurality of video frames participating in the alignment processing at the pixel point level and the reference video frame, wherein the reference video frame is any one of the plurality of video frames participating in the alignment processing at the pixel point level;
and according to the dense optical flow data between each other video frame and the reference video frame, carrying out pixel point level alignment processing on each other video frame and the reference video frame.
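A minimal sketch of the pixel point level alignment based on dense optical flow might look as follows (Farneback's method is used here as one possible dense optical flow algorithm; the application itself does not prescribe a specific one):

```python
import cv2
import numpy as np

def align_by_dense_optical_flow(frame, reference_frame):
    """Pixel-point-level alignment: compute dense optical flow from the reference
    frame to this frame, then remap this frame so that its pixels line up with
    the reference frame's coordinate system."""
    ref_gray = cv2.cvtColor(reference_frame, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(ref_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = ref_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Sampling the current frame at (reference position + flow) pulls each
    # pixel back into the reference frame's coordinates.
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
```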
Optionally, the first illumination sequence and the second illumination sequence may each include color light sequences of a plurality of color channels; generating the response map of the object to be detected according to the first illumination sequence and the second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected includes: separating the first illumination sequence to obtain a first color light sequence corresponding to each color channel, and separating the second illumination sequence to obtain a second color light sequence corresponding to each color channel; for each color channel, generating a response subgraph of the object to be detected in the color channel according to the similarity between the second color light sequence of the color channel reflected by each entity position point of the object to be detected and the first color light sequence corresponding to the color channel; and performing fusion processing on the response subgraphs of the object to be detected in each color channel to obtain the response map of the object to be detected. For specific steps, reference may be made to the foregoing description.
Optionally, performing fusion processing on the response subgraphs of the object to be detected in each color channel to obtain the response map of the object to be detected includes: obtaining the mean response intensity of the face region of the object to be detected in each color channel according to the response subgraph of the object to be detected in that color channel; performing regularization processing on the response subgraph of the object to be detected in each color channel according to the mean response intensity of the face region of the object to be detected in that color channel, to obtain a regularized response subgraph of the object to be detected in each color channel; and performing fusion processing on the regularized response subgraphs of the object to be detected in each color channel to obtain the response map of the object to be detected.
Optionally, before processing the response map by using the in-vivo detection model, the method further includes: obtaining attribute values of a response graph of the object to be detected, wherein the attribute values comprise at least one of the following: mean value and quality value of response intensity; determining a living body detection result of the object to be detected according to the size relation between the attribute value of the response image of the object to be detected and the corresponding attribute threshold value; determining that the object to be detected is not a living body under the condition that the attribute value of the response image of the object to be detected is smaller than the corresponding attribute threshold value; and under the condition that the attribute value of the response map of the object to be detected is not less than the corresponding attribute threshold value, executing the step of performing in-vivo detection on the object to be detected by using an in-vivo detection model based on the response map. The specific steps can be referred to the above.
Optionally, the living body detection model is a model in which a first image feature of a response map of a living body face and a second image feature of the living body face are learned; performing in-vivo detection on the object to be detected by using an in-vivo detection model based on the response diagram to obtain an in-vivo detection result of the object to be detected, including: extracting a first image feature of the response image; extracting any video frame of the video to be detected, and acquiring a second image characteristic of the video frame; fusing the first image characteristic and the second image characteristic to obtain a fused image characteristic; and processing the fusion image characteristics by using the living body model to obtain a living body detection result of the object to be detected. For specific steps, reference may be made to the foregoing description.
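As one minimal sketch of how such a model could fuse the first image feature of the response map with the second image feature of a video frame, consider the following, where the tiny convolutional backbones, the feature size, and the concatenation-based fusion are purely illustrative assumptions rather than the architecture defined by this application:

```python
import torch
import torch.nn as nn

class FusionLivenessModel(nn.Module):
    """Extracts a first image feature from the response map and a second image
    feature from a video frame, fuses them, and predicts living / not living."""
    def __init__(self, feature_dim=128):
        super().__init__()
        def backbone():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feature_dim),
            )
        self.response_branch = backbone()   # first image feature
        self.frame_branch = backbone()      # second image feature
        self.classifier = nn.Linear(2 * feature_dim, 2)  # living / not living

    def forward(self, response_map, video_frame):
        f1 = self.response_branch(response_map)
        f2 = self.frame_branch(video_frame)
        fused = torch.cat([f1, f2], dim=1)  # fused image feature
        return self.classifier(fused)
```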
The specific implementation process of each step in the in-vivo detection method provided in the embodiment of the present application may refer to the description of the foregoing method embodiment, and is not described herein again.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the embodiments of the present application are not limited by the order of acts described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the present application.
Fig. 5 is a schematic structural diagram of a living body detecting apparatus according to an embodiment of the present application, and as shown in fig. 5, the living body detecting apparatus includes a video acquiring module 51, a response map generating module 52, a first living body detection result acquiring module 53, a second living body detection result acquiring module 54, and a final detection result determining module 55, where:
the video acquisition module 51 is configured to acquire a video to be detected of an object to be detected, where the video to be detected is: a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence;
a response map generating module 52, configured to generate a response map of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, where response intensity of each pixel point in the response map is represented by: similarity between a second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence;
a first in-vivo detection result obtaining module 53, configured to perform in-vivo detection on the object to be detected by using a first in-vivo detection model based on the response map, so as to obtain a first in-vivo detection result of the object to be detected;
the second in-vivo detection result obtaining module 54 is configured to input the video to be detected into a second in-vivo detection model, so as to obtain a second in-vivo detection result of the object to be detected;
and a final in-vivo detection result determining module 55, configured to determine a final in-vivo detection result of the object to be detected according to the first in-vivo detection result and the second in-vivo detection result.
Optionally, the second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected is obtained according to the following steps:
extracting a plurality of video frames of the video to be detected;
aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames;
and, for each entity position point of the object to be detected, obtaining a second illumination sequence reflected by the entity position point according to the illumination reflected by the pixel point describing the entity position point in each of the plurality of video frames.
Optionally, aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames includes:
performing face key point detection on the video frames aiming at each video frame in the plurality of video frames to obtain face key points contained in each video frame, performing face key point level alignment processing on the plurality of video frames according to the face key points contained in each video frame, and performing pixel point level alignment processing on the plurality of video frames after the face key point level alignment processing is performed;
or, the alignment processing of the pixel point level is carried out on the plurality of video frames.
Optionally, the alignment processing at the pixel point level is performed through the following processes:
respectively calculating dense optical flow data between each of other video frames except a reference video frame in the plurality of video frames participating in the alignment processing at the pixel point level and the reference video frame, wherein the reference video frame is any one of the plurality of video frames participating in the alignment processing at the pixel point level;
and according to the dense optical flow data between each other video frame and the reference video frame, carrying out pixel point level alignment processing on each other video frame and the reference video frame.
Optionally, the first illumination sequence and the second illumination sequence each comprise a color light sequence of a plurality of color channels; the response map generation module 52 includes:
a first separation unit, configured to separate the first illumination sequence to obtain a first color light sequence corresponding to each color channel, and separate the second illumination sequence to obtain a second color light sequence corresponding to each color channel;
a first response subgraph generation unit, configured to generate, for each color channel, a response subgraph of the object to be detected in the color channel according to a similarity between a second color light sequence of the color channel reflected by each entity position point of the object to be detected and a first color light sequence corresponding to the color channel;
and the first fusion unit is used for carrying out fusion processing on the response subgraph of each color channel of the object to be detected to obtain the response graph of the object to be detected.
Optionally, the first fusion unit includes:
the first mean value obtaining subunit is used for obtaining a mean value of response intensity of the face region of the object to be detected in each color channel according to the response subgraph of the object to be detected in each color channel;
the first regularization subunit is configured to perform regularization processing on a response subgraph of the object to be detected in each color channel according to a response intensity average value of the face region of the object to be detected in each color channel, so as to obtain a regularized response subgraph of the object to be detected in each color channel;
and the first fusion subunit is configured to perform fusion processing on the regularized response subgraph of each color channel of the object to be detected to obtain a response graph of the object to be detected.
Optionally, before processing the response map with the first in-vivo detection model, the apparatus further comprises:
a first attribute value obtaining module, configured to obtain an attribute value of a response map of the object to be detected, where the attribute value includes at least one of: mean value and quality value of response intensity;
the first determining module is used for determining the in-vivo detection result of the object to be detected according to the size relationship between the attribute value of the response graph of the object to be detected and the corresponding attribute threshold value;
the first judging module is used for determining that the object to be detected is not a living body under the condition that the attribute value of the response image of the object to be detected is smaller than the corresponding attribute threshold value;
and the first step execution module is used for executing the step of carrying out the living body detection on the object to be detected by utilizing the first living body detection model based on the response graph under the condition that the attribute value of the response graph of the object to be detected is not less than the corresponding attribute threshold value.
Optionally, the first living body detection model is a model in which a first image feature of a response map of a living body face and a second image feature of the living body face are learned; the first living body detection result acquisition module 53 includes:
a first response map feature extraction unit, configured to extract a first image feature of the response map;
the first image feature extraction unit is used for extracting any video frame of the video to be detected and acquiring a second image feature of the video frame;
the first fused image feature extraction unit is used for fusing the first image feature and the second image feature to obtain a fused image feature;
and the first processing unit is used for processing the fused image features by using the first living body detection model to obtain a first living body detection result of the object to be detected.
Fig. 6 is a schematic structural diagram of a living body detecting apparatus according to an embodiment of the present application, and as shown in fig. 6, the living body detecting apparatus includes a video acquiring module 61, a response map generating module 62, and a detection result determining module 63, where:
the video acquiring module 61 is configured to acquire a video to be detected of an object to be detected, where the video to be detected is: a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence;
a response map generating module 62, configured to generate a response map of the object to be detected according to the first illumination sequence and the second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, where response intensity of each pixel point in the response map is represented by: similarity between a second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence;
and a detection result determining module 63, configured to perform in-vivo detection on the object to be detected by using an in-vivo detection model based on the response map, so as to obtain an in-vivo detection result of the object to be detected.
Optionally, the second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected is obtained according to the following steps:
extracting a plurality of video frames of the video to be detected;
aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames;
and aiming at each entity position point of the object to be detected, obtaining a second illumination sequence reflected by the entity position point according to illumination reflected by a pixel point describing the entity position point in each video frame in the plurality of video frames.
Optionally, aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames includes:
performing face key point detection on the video frames aiming at each video frame in the plurality of video frames to obtain face key points contained in the video frames, performing face key point level alignment processing on the plurality of video frames according to the face key points contained in each video frame, and performing pixel point level alignment processing on the plurality of video frames after the face key point level alignment processing is performed;
or,
and carrying out pixel point level alignment processing on the plurality of video frames.
Optionally, the alignment processing at the pixel point level is performed through the following process:
respectively calculating dense optical flow data between each of the other video frames except the reference video frame in the plurality of video frames participating in the alignment processing at the pixel point level and the reference video frame, wherein the reference video frame is any one of the plurality of video frames participating in the alignment processing at the pixel point level;
and according to the dense optical flow data between each other video frame and the reference video frame, carrying out pixel point level alignment processing on each other video frame and the reference video frame.
Optionally, the first illumination sequence and the second illumination sequence each comprise a color light sequence of a plurality of color channels; the response map generation module 62 includes:
a second separation unit, configured to separate the first illumination sequence to obtain a first color light sequence corresponding to each color channel, and separate the second illumination sequence to obtain a second color light sequence corresponding to each color channel;
a second response subgraph generation unit, configured to generate, for each color channel, a response subgraph of the object to be detected in the color channel according to a similarity between a second color light sequence of the color channel, which is reflected by each entity position point of the object to be detected, and a first color light sequence corresponding to the color channel;
and the second fusion unit is used for carrying out fusion processing on the response subgraph of each color channel of the object to be detected to obtain the response graph of the object to be detected.
Optionally, the second fusion unit comprises:
the second mean value obtaining subunit is configured to obtain a mean value of response intensities of the face region of the object to be detected in each color channel according to the response subgraph of the object to be detected in each color channel;
the second regularization subunit is configured to perform regularization processing on a response subgraph of the object to be detected in each color channel according to a response intensity average value of the face region of the object to be detected in each color channel, so as to obtain a regularized response subgraph of the object to be detected in each color channel;
and the second fusion subunit is used for performing fusion processing on the regularized response subgraph of each color channel of the object to be detected to obtain a response graph of the object to be detected.
Optionally, before processing the response map with the in-vivo detection model, the apparatus further comprises:
a second attribute value obtaining module, configured to obtain an attribute value of a response map of the object to be detected, where the attribute value includes at least one of: mean value and quality value of response intensity;
the second determining module is used for determining the living body detection result of the object to be detected according to the size relation between the attribute value of the response image of the object to be detected and the corresponding attribute threshold value;
the second judgment module is used for determining that the object to be detected is not a living body under the condition that the attribute value of the response image of the object to be detected is smaller than the corresponding attribute threshold value;
and the second step execution module is used for executing the step of performing the living body detection on the object to be detected by using the living body detection model based on the response map under the condition that the attribute value of the response map of the object to be detected is not less than the corresponding attribute threshold value.
Optionally, the living body detection model is a model in which a first image feature of a response map of a living body face and a second image feature of the living body face are learned; the detection result determining module 63 includes:
the second response map feature extraction unit is used for extracting the first image features of the response map;
the second image feature extraction unit is used for extracting any video frame of the video to be detected and acquiring a second image feature of the video frame;
the second fused image feature extraction unit is used for fusing the first image feature and the second image feature to obtain a fused image feature;
and the second processing unit is used for processing the fused image features by using the living body detection model to obtain the living body detection result of the object to be detected.
It should be noted that the device embodiments are similar to the method embodiments, so that the description is simple, and reference may be made to the method embodiments for relevant points.
An embodiment of the present application further provides an electronic device. Referring to fig. 7, fig. 7 is a schematic diagram of the electronic device provided in the embodiment of the present application. As shown in fig. 7, the electronic device 100 includes a memory 110 and a processor 120 that are communicatively connected through a bus; the memory 110 stores a computer program that can be run on the processor 120, so as to implement the steps of the living body detection method disclosed in the embodiments of the present application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program/instruction is stored, which, when executed by a processor, implements the living body detection method as disclosed in embodiments of the present application.
Embodiments of the present application further provide a computer program product, which includes a computer program/instruction, and when executed by a processor, the computer program/instruction implements the living body detection method disclosed in the embodiments of the present application.
The embodiment of the application also provides a computer program, and the computer program can realize the living body detection method disclosed by the embodiment of the application when being executed.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or terminal device that comprises the element.
The above detailed description is provided for a method for detecting a living body, an electronic device, a storage medium, and a program product, and specific examples are applied herein to explain the principles and implementations of the present application, and the descriptions of the above examples are only used to help understand the method and the core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (17)

1. A method of in vivo detection, comprising:
acquiring a video to be detected of an object to be detected, wherein the video to be detected is: a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence;
generating a response graph of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, wherein the response intensity of each pixel point in the response graph is represented by: similarity between a second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence;
performing in-vivo detection on the object to be detected by using a first in-vivo detection model based on the response diagram to obtain a first in-vivo detection result of the object to be detected;
inputting the video to be detected into a second in-vivo detection model to obtain a second in-vivo detection result of the object to be detected;
and determining a final in-vivo detection result of the object to be detected according to the first in-vivo detection result and the second in-vivo detection result.
2. The method according to claim 1, wherein the second illumination sequence reflected by each physical location point of the object to be detected represented by the video to be detected is obtained according to the following steps:
extracting a plurality of video frames of the video to be detected;
aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames;
and aiming at each entity position point of the object to be detected, obtaining a second illumination sequence reflected by the entity position point according to illumination reflected by a pixel point describing the entity position point in each video frame in the plurality of video frames.
3. The method according to claim 2, wherein aligning pixel points describing the same physical location point of the object to be detected in the plurality of video frames comprises:
performing face key point detection on the video frames aiming at each video frame in the plurality of video frames to obtain face key points contained in the video frames, performing face key point level alignment processing on the plurality of video frames according to the face key points contained in each video frame, and performing pixel point level alignment processing on the plurality of video frames after the face key point level alignment processing is performed;
or,
and carrying out pixel point level alignment processing on the plurality of video frames.
4. The method according to claim 3, wherein the alignment processing at the pixel point level is performed by:
respectively calculating dense optical flow data between each of other video frames except a reference video frame in the plurality of video frames participating in the alignment processing at the pixel point level and the reference video frame, wherein the reference video frame is any one of the plurality of video frames participating in the alignment processing at the pixel point level;
and according to the dense optical flow data between each other video frame and the reference video frame, carrying out pixel point level alignment processing on each other video frame and the reference video frame.
5. The method of any of claims 1-4, wherein the first illumination sequence and the second illumination sequence each comprise a color light sequence of a plurality of color channels;
generating a response diagram of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, wherein the response diagram comprises:
separating the first illumination sequence to obtain a first color light sequence corresponding to each color channel, and separating the second illumination sequence to obtain a second color light sequence corresponding to each color channel;
for each color channel, generating a response subgraph of the object to be detected in the color channel according to the similarity between the second color light sequence of the color channel reflected by each entity position point of the object to be detected and the first color light sequence corresponding to the color channel;
and performing fusion processing on the response subgraphs of the object to be detected in each color channel to obtain the response graph of the object to be detected.
6. The method according to claim 5, wherein the step of performing fusion processing on the response subgraph of each color channel of the object to be detected to obtain the response graph of the object to be detected comprises:
acquiring a response intensity mean value of a face region of the object to be detected in each color channel according to the response subgraph of the object to be detected in each color channel;
performing regularization processing on a response subgraph of the object to be detected in each color channel according to the response intensity average value of the face region of the object to be detected in each color channel to obtain a regularized response subgraph of the object to be detected in each color channel;
and performing fusion processing on the regularized response subgraphs of the to-be-detected object in each color channel to obtain a response graph of the to-be-detected object.
7. A method of in vivo detection, comprising:
acquiring a video to be detected of an object to be detected, wherein the video to be detected is: a video of the object to be detected captured while the object to be detected is illuminated according to a first illumination sequence;
generating a response graph of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, wherein the response intensity of each pixel point in the response graph is represented by: similarity between a second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence;
and carrying out in-vivo detection on the object to be detected by using an in-vivo detection model based on the response diagram to obtain an in-vivo detection result of the object to be detected.
8. The method according to claim 7, wherein the second illumination sequence reflected by each physical position point of the object to be detected represented by the video to be detected is obtained according to the following steps:
extracting a plurality of video frames of the video to be detected;
aligning pixel points describing the same entity position point of the object to be detected in the plurality of video frames;
and aiming at each entity position point of the object to be detected, obtaining a second illumination sequence reflected by the entity position point according to illumination reflected by a pixel point describing the entity position point in each video frame in the plurality of video frames.
9. The method according to claim 8, wherein aligning pixel points describing a same physical location point of the object to be detected in the plurality of video frames comprises:
performing face key point detection on the video frames aiming at each video frame in the plurality of video frames to obtain face key points contained in the video frames, performing face key point level alignment processing on the plurality of video frames according to the face key points contained in each video frame, and performing pixel point level alignment processing on the plurality of video frames after the face key point level alignment processing is performed;
or,
and carrying out pixel point level alignment processing on the plurality of video frames.
10. The method according to claim 9, wherein the alignment processing at the pixel point level is performed by:
respectively calculating dense optical flow data between each of other video frames except a reference video frame in the plurality of video frames participating in the alignment processing at the pixel point level and the reference video frame, wherein the reference video frame is any one of the plurality of video frames participating in the alignment processing at the pixel point level;
and according to the dense optical flow data between each other video frame and the reference video frame, carrying out pixel point level alignment processing on each other video frame and the reference video frame.
11. The method of any of claims 7-10, wherein the first illumination sequence and the second illumination sequence each comprise a color light sequence of a plurality of color channels;
generating a response diagram of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each entity position point of the object to be detected represented by the video to be detected, wherein the response diagram comprises:
separating the first illumination sequence to obtain a first color light sequence corresponding to each color channel, and separating the second illumination sequence to obtain a second color light sequence corresponding to each color channel;
for each color channel, generating a response subgraph of the object to be detected in the color channel according to the similarity between the second color light sequence of the color channel reflected by each entity position point of the object to be detected and the first color light sequence corresponding to the color channel;
and performing fusion processing on the response subgraphs of the object to be detected in each color channel to obtain the response graph of the object to be detected.
12. The method according to claim 11, wherein the obtaining of the response graph of the object to be detected by performing fusion processing on the response subgraph of each color channel of the object to be detected comprises:
acquiring a response intensity mean value of a face region of the object to be detected in each color channel according to the response subgraph of the object to be detected in each color channel;
performing regularization processing on a response subgraph of the object to be detected in each color channel according to the response intensity average value of the face region of the object to be detected in each color channel to obtain a regularized response subgraph of the object to be detected in each color channel;
and performing fusion processing on the regularized response subgraphs of the to-be-detected object in each color channel to obtain a response graph of the to-be-detected object.
13. The method of any of claims 7-12, further comprising, prior to processing the response map by using the living body detection model:
obtaining attribute values of a response graph of the object to be detected, wherein the attribute values comprise at least one of the following: mean value and quality value of response intensity;
determining the living body detection result of the object to be detected according to the size relation between the attribute value of the response image of the object to be detected and the corresponding attribute threshold value;
determining that the object to be detected is not a living body under the condition that the attribute value of the response image of the object to be detected is smaller than the corresponding attribute threshold value;
and under the condition that the attribute value of the response map of the object to be detected is not less than the corresponding attribute threshold value, executing the step of performing in-vivo detection on the object to be detected by using an in-vivo detection model based on the response map.
14. The method according to any one of claims 7 to 13, wherein the living body detection model is a model in which a first image feature of a response map of a living body face and a second image feature of the living body face are learned;
performing in-vivo detection on the object to be detected by using an in-vivo detection model based on the response diagram to obtain an in-vivo detection result of the object to be detected, including:
extracting a first image feature of the response image;
extracting any video frame of the video to be detected, and acquiring a second image characteristic of the video frame;
fusing the first image characteristic and the second image characteristic to obtain a fused image characteristic;
and processing the fusion image characteristics by using the living body model to obtain a living body detection result of the object to be detected.
15. An electronic device comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the liveness detection method of any one of claims 1 to 6; alternatively, the processor executes the computer program to implement the living body detection method of any one of claims 7 to 14.
16. A computer-readable storage medium on which a computer program/instructions are stored, characterized in that the computer program/instructions, when executed by a processor, implement the liveness detection method according to any one of claims 1 to 6; alternatively, the computer program/instructions, when executed by a processor, implement the liveness detection method of any one of claims 7 to 14.
17. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the liveness detection method of any one of claims 1 to 6; alternatively, the computer program/instructions when executed by a processor implement the liveness detection method of any one of claims 7 to 14.
CN202210528092.5A 2022-05-16 2022-05-16 Living body detection method, electronic device, storage medium, and program product Pending CN115147936A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210528092.5A CN115147936A (en) 2022-05-16 2022-05-16 Living body detection method, electronic device, storage medium, and program product
PCT/CN2023/094603 WO2023221996A1 (en) 2022-05-16 2023-05-16 Living body detection method, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210528092.5A CN115147936A (en) 2022-05-16 2022-05-16 Living body detection method, electronic device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN115147936A true CN115147936A (en) 2022-10-04

Family

ID=83406192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210528092.5A Pending CN115147936A (en) 2022-05-16 2022-05-16 Living body detection method, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN115147936A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023221996A1 (en) * 2022-05-16 2023-11-23 北京旷视科技有限公司 Living body detection method, electronic device, storage medium, and program product
CN117011950A (en) * 2023-08-29 2023-11-07 国政通科技有限公司 Living body detection method and device
CN117011950B (en) * 2023-08-29 2024-02-02 国政通科技有限公司 Living body detection method and device

Similar Documents

Publication Publication Date Title
CN110765923B (en) Face living body detection method, device, equipment and storage medium
WO2020151489A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN110163078A (en) The service system of biopsy method, device and application biopsy method
RU2431190C2 (en) Facial prominence recognition method and device
CN112801057B (en) Image processing method, image processing device, computer equipment and storage medium
CN115147936A (en) Living body detection method, electronic device, storage medium, and program product
CN112651333B (en) Silence living body detection method, silence living body detection device, terminal equipment and storage medium
WO2022222575A1 (en) Method and system for target recognition
CN112633221B (en) Face direction detection method and related device
CN113312965A (en) Method and system for detecting unknown face spoofing attack living body
CN114387548A (en) Video and liveness detection method, system, device, storage medium and program product
CN115049675A (en) Generation area determination and light spot generation method, apparatus, medium, and program product
CN113033305B (en) Living body detection method, living body detection device, terminal equipment and storage medium
CN116469177A (en) Living body target detection method with mixed precision and training method of living body detection model
CN115223141A (en) Traffic light detection method, electronic device and storage medium
CN113723310B (en) Image recognition method and related device based on neural network
US8538142B2 (en) Face-detection processing methods, image processing devices, and articles of manufacture
CN109409325B (en) Identification method and electronic equipment
CN112926498A (en) In-vivo detection method based on multi-channel fusion and local dynamic generation of depth information
WO2023221996A1 (en) Living body detection method, electronic device, storage medium, and program product
JPH11283036A (en) Object detector and object detection method
Grinchuk et al. Training a multimodal neural network to determine the authenticity of images
CN113516089B (en) Face image recognition method, device, equipment and readable storage medium
CN115841705A (en) Living body detection method
CN116434349A (en) Living body detection method, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination