WO2023221996A1 - A liveness detection method, electronic device, storage medium, and program product - Google Patents

A liveness detection method, electronic device, storage medium, and program product

Info

Publication number
WO2023221996A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
detected
target
video
video frame
Prior art date
Application number
PCT/CN2023/094603
Other languages
English (en)
French (fr)
Inventor
马志明
Original Assignee
北京旷视科技有限公司
Priority date
Filing date
Publication date
Priority claimed from CN202210528092.5A (CN115147936A)
Priority claimed from CN202310306903.1A (CN116434349A)
Application filed by 北京旷视科技有限公司
Publication of WO2023221996A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection

Definitions

  • This application relates to the field of data processing technology, and in particular to a liveness detection method, electronic device, storage medium, and program product.
  • Liveness detection technology is becoming increasingly mature.
  • The related colored-light liveness detection technology mainly includes two parts: first, lighting sequence inspection, which compares the emitted lighting sequence with the sequence of reflected light presented by the object to be detected in the video to determine whether there is camera hijacking;
  • second, liveness detection on the collected video or images of the object to be detected, which detects whether the obtained colored-light video contains common liveness attack behaviors such as screen re-shooting or printed-paper re-shooting.
  • embodiments of the present application provide a living body detection method, electronic device, storage medium and program product to overcome the above problems or at least partially solve the above problems.
  • a first aspect of the embodiments of the present application provides a living body detection method, including:
  • obtaining a video to be detected of an object to be detected, where the video to be detected is a video of the object to be detected collected while the object to be detected is illuminated according to a first illumination sequence;
  • generating a response map of the object to be detected according to the first illumination sequence and a second illumination sequence reflected by each physical position point of the object to be detected represented by the video to be detected, where the response intensity of each pixel in the response map represents the similarity between the second illumination sequence reflected by the physical position point corresponding to that pixel and the first illumination sequence;
  • based on the response map, using a first liveness detection model to perform liveness detection on the object to be detected to obtain a first liveness detection result of the object to be detected.
  • a second aspect of the embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory.
  • the processor executes the computer program to implement the liveness detection method described in the first aspect.
  • a third aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program/instruction is stored.
  • when the computer program/instruction is executed by a processor, the liveness detection method described in the first aspect is implemented.
  • a fourth aspect of the embodiments of the present application provides a computer program product, which includes a computer program/instruction.
  • when the computer program/instruction is executed by a processor, the liveness detection method described in the first aspect is implemented.
  • The response map of the object to be detected is refined to the pixel level. From the similarity, represented by each pixel of the response map, between the second illumination sequence reflected by the corresponding physical position point of the object to be detected and the first illumination sequence, the different reflection patterns presented by each physical location area of the object to be detected can be obtained.
  • A live human face is uneven, while the screen or printing paper used in a re-shot copy is relatively smooth and highly reflective, so the light reflected by a live face and by a re-shot copy has different patterns. Therefore, based on the response map of the object to be detected, it is possible to distinguish whether the object to be detected is a re-shot copy or a live face, thus obtaining the first liveness detection result of the object to be detected. In other words, by introducing the feature of reflected light into colored-light liveness detection, re-shooting attack detection can further be realized on top of colored-light liveness detection, effectively improving the accuracy and detection capability of liveness detection.
  • Figure 1 is a step flow chart of a living body detection method according to an embodiment of the present application.
  • Figure 2 is a schematic flow chart for obtaining the first living body detection result according to an embodiment of the present application
  • Figure 3 is a schematic flow chart of a living body detection according to an embodiment of the present application.
  • Figure 4 is a step flow chart of a living body detection method according to an embodiment of the present application.
  • Figure 5 is a schematic flow chart of a living body detection method according to an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a living body detection device according to an embodiment of the present application.
  • Figure 7 is a schematic diagram of an electronic device according to an embodiment of the present application.
  • Artificial Intelligence is an emerging science and technology that studies and develops theories, methods, technologies and application systems for simulating and extending human intelligence.
  • the subject of artificial intelligence is a comprehensive subject, involving many types of technologies such as chips, big data, cloud computing, Internet of Things, distributed storage, deep learning, machine learning, neural networks, etc.
  • Computer vision, as an important branch of artificial intelligence, enables machines to perceive and recognize the world.
  • Computer vision technology usually includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, text recognition, video processing, video content recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, robot navigation and positioning, and other technologies.
  • the relevant living body detection technology separates similarity detection and living body attack behavior detection.
  • This application combines similarity detection and liveness attack behavior detection. While considering similarity, it also takes into account the difference in reflected light between a real person and a screen or printed paper. A human face is uneven, so different areas reflect different light, and the background area behind the face often reflects only weak light or even no light because it receives light of low intensity; by contrast, when a screen or printing paper is re-shot, the screen or paper is relatively smooth and flat, so the reflected light is relatively uniform and regular. The applicant therefore realized that the fact that a live human face and a screen or printing paper exhibit different reflection patterns can be used to improve the accuracy of colored-light liveness detection.
  • the living body detection method can be applied to a backend server and includes the following steps:
  • Step S11: Obtain a video to be detected of the object to be detected, where the video to be detected is a video of the object to be detected collected while the object to be detected is illuminated according to a first illumination sequence;
  • Step S12: Generate a response map of the object to be detected based on the first illumination sequence and the second illumination sequence reflected by each physical position point of the object to be detected represented by the video to be detected, where the response intensity of each pixel in the response map represents the similarity between the second illumination sequence reflected by the physical position point corresponding to that pixel and the first illumination sequence;
  • Step S13: Based on the response map, use the first liveness detection model to perform liveness detection on the object to be detected and obtain the first liveness detection result of the object to be detected.
  • the background server may send the first illumination sequence to the terminal.
  • the terminal emits light according to the first illumination sequence to illuminate the object to be detected, and collects the video of the object to be detected while emitting light according to the first illumination sequence to illuminate the object to be detected.
  • the object to be detected is an object collected by the camera of the terminal.
  • the solution of this application can also be executed by electronic devices such as terminals.
  • When the electronic device performs liveness detection, the electronic device itself generates the first illumination sequence, illuminates the object to be detected according to the first illumination sequence, and collects the video of the object to be detected during this period; the electronic device then executes the subsequent liveness detection process itself based on the first illumination sequence and the video to be detected.
  • Whether the first illumination sequence is issued by the backend server or generated by an electronic device such as a terminal, and whether the specific liveness detection process is executed by the backend server, by an electronic device such as a terminal, or partly by each (some steps executed by the terminal and some by the backend server), can all be set according to actual needs.
  • The embodiments of this application do not limit this, and the possible implementations are not listed one by one.
  • a location point of the object to be detected refers to a point where the object to be detected actually exists. For example, when the object to be detected is a face, an entity location point of the object to be detected can be a point on the nose of the face.
  • The size of such a point is the size represented by one pixel in the video.
  • Each element in the first illumination sequence and the second illumination sequence can represent the illumination intensity of white light; alternatively, each element in the first illumination sequence and the second illumination sequence can represent the illumination intensity of colored light and/or the color of the colored light.
  • the similarity between the first illumination sequence and the second illumination sequence can be obtained through the dot product of the two sequences. Because the response map of the object to be detected is refined to the pixel level, through the response map of the object to be detected, different reflection patterns presented in different locations of the object to be detected can be obtained.
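  • As an illustration of the dot-product similarity described above, the following sketch (an assumption of this description, not code from the application) computes a per-pixel response map from a stack of pixel-aligned video frames and the first illumination sequence.

```python
import numpy as np

def response_map(aligned_frames: np.ndarray, first_sequence: np.ndarray) -> np.ndarray:
    """Per-pixel dot-product similarity between emitted and reflected illumination sequences.

    aligned_frames: (T, H, W) stack of pixel-aligned frames in one brightness/colour channel,
                    values normalised to [0, 1]; aligned_frames[:, i, j] is the second
                    illumination sequence reflected by the physical point imaged at pixel (i, j).
    first_sequence: (T,) emitted (first) illumination sequence.
    Returns an (H, W) response map whose intensity is the dot product of the two sequences.
    """
    # Contract over the time axis: for every pixel, dot(first_sequence, reflected_sequence).
    return np.tensordot(first_sequence, aligned_frames, axes=([0], [0]))
```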
  • The response map of the object to be detected is input into the first liveness detection model, and the first liveness detection model performs liveness detection on the object to be detected according to the response map, so that the first liveness detection result of the object to be detected can be obtained.
  • The first liveness detection model is a model that has learned, through supervised training, the first image feature of the response map of a live human face, and can distinguish the response map of a live face from the response maps of other attacks (for example, re-shooting attacks). Therefore, the first liveness detection model can obtain the first liveness detection result of the object to be detected from the response map of the object to be detected.
  • The supervised training of the first liveness detection model can be: obtaining the response maps of multiple sample objects (including live faces and other objects), inputting the response maps of the sample objects into the first liveness detection model to be trained, and obtaining the predicted probability that each sample object is a living body; establishing a loss function based on the predicted probability and whether the sample object is actually a living body, and updating the model parameters of the first liveness detection model to be trained based on the loss function to obtain the first liveness detection model.
  • In this way, the first liveness detection model can learn the response map of a live human face.
  • The method of obtaining the response map of a sample object can refer to the method of obtaining the response map of the object to be detected.
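  • A minimal sketch of the supervised training described above, assuming a PyTorch binary classifier; the model architecture, optimiser, learning rate and data loader are placeholders, not details taken from the application.

```python
import torch
import torch.nn as nn

def train_first_liveness_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4) -> nn.Module:
    """Supervised training: response maps of sample objects in, liveness probability out."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()            # loss built from predicted probability vs. ground truth
    for _ in range(epochs):
        for response_maps, is_live in loader:     # is_live: 1.0 for live-face samples, 0.0 otherwise
            logits = model(response_maps).squeeze(1)
            loss = criterion(logits, is_live.float())
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()                      # update the parameters of the model to be trained
    return model
```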
  • The method further includes: inputting the video to be detected into a second liveness detection model to obtain a second liveness detection result of the object to be detected; and determining the final liveness detection result of the object to be detected based on the first liveness detection result and the second liveness detection result.
  • That is, a second liveness detection model can also be used to obtain a second liveness detection result of the object to be detected from the video to be detected.
  • The second liveness detection model may be a model commonly used for liveness detection in the related art, and may perform liveness detection based on the input video or video frames.
  • The second liveness detection model can be any type of liveness detection model, such as a mask-attack liveness detection model or an action-based liveness detection model; some liveness detection models may require the user to perform corresponding actions. Therefore, in order to obtain both the first liveness detection result and the second liveness detection result based on the same video to be detected, the user can be instructed during video recording to perform the actions required by the second liveness detection model.
  • Specific settings can be made according to the actual needs of the application scenario, and the embodiments of the present application do not limit this.
  • The first liveness detection model and the second liveness detection model can work in parallel to obtain the first liveness detection result and the second liveness detection result at the same time; alternatively, the first liveness detection model may first obtain the first liveness detection result and then the second liveness detection model obtains the second liveness detection result, or the second liveness detection model may first obtain the second liveness detection result and then the first liveness detection model obtains the first liveness detection result.
  • By combining the first liveness detection result and the second liveness detection result, the final liveness detection result of the object to be detected can be obtained.
  • The first liveness detection result and the second liveness detection result may each represent the probability that the object to be detected is a live human face.
  • When the smaller of the two values is greater than the liveness threshold, it is determined that the final liveness detection result of the object to be detected is: it is a living body; when the smaller value is not greater than the liveness threshold, it is determined that the final liveness detection result of the object to be detected is: it is not a living body.
  • The liveness threshold can be a reasonable value set in advance.
  • Alternatively, different weights can be set for the first liveness detection result and the second liveness detection result, and the final liveness detection result of the object to be detected is obtained by combining the weighted first liveness detection result and the weighted second liveness detection result.
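  • The two decision rules above can be written as a small helper; the threshold value and the weights below are hypothetical placeholders, not figures from the application.

```python
def final_liveness_result(p_first: float, p_second: float,
                          live_threshold: float = 0.5,
                          weights: tuple | None = None) -> bool:
    """Combine the first and second liveness probabilities into the final decision."""
    if weights is None:
        # Rule from the description: the smaller of the two values is compared to the threshold,
        # i.e. both models must consider the object sufficiently likely to be a living body.
        return min(p_first, p_second) > live_threshold
    w1, w2 = weights
    # Alternative rule: weight the two results and compare the combined score to the threshold.
    return (w1 * p_first + w2 * p_second) > live_threshold
```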
  • The response map of the object to be detected is refined to the pixel level. From the similarity, represented by each pixel of the response map, between the second illumination sequence reflected by the corresponding physical position point of the object to be detected and the first illumination sequence, the different reflection patterns presented by each physical location area of the object to be detected can be obtained.
  • A live human face is uneven, while the screen or printing paper used in a re-shot copy is relatively smooth and highly reflective, so the light reflected by a live face and by a re-shot copy has different patterns.
  • The first liveness detection model can therefore distinguish, based on the response map of the object to be detected, whether the object to be detected is a re-shot copy or a live face, thereby obtaining the first liveness detection result of the object to be detected.
  • The first liveness detection model makes use of the information that re-shot copies and live faces have different reflection patterns, and realizes the combination of two detection methods (lighting sequence inspection and detection of whether the object is a re-shot copy), so the determined first liveness detection result is more accurate.
  • In addition, the second liveness detection model can obtain the second liveness detection result of the object to be detected.
  • The final liveness detection result of the object to be detected is determined by combining the first liveness detection result and the second liveness detection result, which can effectively improve the ability to detect other attacks without reducing the probability that a real person is determined to be a living body, thus effectively improving the accuracy of the final liveness detection result.
  • the first living body detection model can also be allowed to learn the second image features of the living human face.
  • In this case, the supervised training of the first liveness detection model can be: obtaining the response maps of multiple sample objects (including live faces and other objects) and the collected video of each sample object, and extracting a video frame from each video;
  • the first liveness detection model to be trained then performs supervised training based on the response map and the video frame, as well as the information on whether the sample object is actually a living body.
  • When the model structure of the first liveness detection model allows only one input, the first image feature of the response map and the second image feature of the video frame can be extracted; the first image feature of the response map and the second image feature of the video frame are fused, and the fused feature is input into the first liveness detection model to be trained to obtain the first liveness detection result of the sample object.
  • the first living body detection model that has learned the image features of the living human face can obtain the first live body detection result of the object to be detected based on the fused image features.
  • the method for obtaining the fused image features is: extracting the first image feature of the response graph; extracting any video frame of the video to be detected, and obtaining the second image feature of the video frame; fusing the first image features and the second image features to obtain fused image features.
  • In the technical solutions of the embodiments of the present application, the first liveness detection model learns not only the response map of a live human face but also the image features of a live human face, which can defend against attacks such as wearing masks, thereby improving the detection accuracy of the first liveness detection model.
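  • A sketch of the fused-input variant described above, assuming two convolutional backbones and channel-wise concatenation as the fusion operation; the concrete architectures and the fusion operator are assumptions, since the application does not specify them.

```python
import torch
import torch.nn as nn

class FusedLivenessModel(nn.Module):
    """First liveness model variant that consumes both the response map and one video frame."""

    def __init__(self, response_backbone: nn.Module, frame_backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.response_backbone = response_backbone   # extracts the first image feature (response map)
        self.frame_backbone = frame_backbone         # extracts the second image feature (video frame)
        self.head = nn.Linear(2 * feat_dim, 1)       # classifier applied to the fused image feature

    def forward(self, response_map: torch.Tensor, frame: torch.Tensor) -> torch.Tensor:
        first_feature = self.response_backbone(response_map)
        second_feature = self.frame_backbone(frame)
        fused = torch.cat([first_feature, second_feature], dim=1)   # fused image feature
        return self.head(fused)                                     # liveness logit
```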
  • The second illumination sequence reflected by each physical position point of the object to be detected represented by the video to be detected is obtained according to the following steps: extracting multiple video frames of the video to be detected; aligning the pixel points in the multiple video frames that describe the same physical position point of the object to be detected; and, for each physical position point of the object to be detected, obtaining the second illumination sequence reflected by that physical position point according to the illumination reflected by the pixel point describing that physical position point in each of the multiple video frames.
  • The video to be detected has multiple video frames, and the position of a physical position point of the object to be detected may be different in each video frame.
  • After alignment, the second illumination sequence reflected by each physical position point of the object to be detected can be obtained.
  • For example, assume the illumination is red light and there are 5 video frames in total.
  • Assume the pixel points of a physical position point of the object to be detected in each video frame are A, B, C, D, and E respectively. If points A, C, and D all reflect red light but points B and E do not, then the second illumination sequence of red light reflected by that physical position point can be 10110. It can be understood that, according to the intensity of the reflected light, the numbers in the second illumination sequence can also be numbers between 0 and 1, with the value related to the intensity of the reflected light.
  • In this way, the problem that the object to be detected appears at different positions in different video frames, making it difficult to obtain the second illumination sequence reflected by each physical position point of the object to be detected, can be solved.
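  • A minimal sketch of reading the reflected sequence of one physical position point off a stack of pixel-aligned frames; keeping the normalised intensity or thresholding it into 0/1 are both mentioned above, while the channel handling is an assumption.

```python
import numpy as np

def reflected_sequence(aligned_frames: np.ndarray, i: int, j: int,
                       binary: bool = False, threshold: float = 0.5) -> np.ndarray:
    """Second illumination sequence of the physical position point imaged at pixel (i, j).

    aligned_frames: (T, H, W) pixel-aligned frames of one colour channel, values in [0, 1].
    """
    seq = aligned_frames[:, i, j]                   # reflected light of that point, frame by frame
    if binary:
        # e.g. frames A, C, D reflect the emitted red light while B and E do not -> 1 0 1 1 0
        seq = (seq >= threshold).astype(np.float32)
    return seq
```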
  • Alignment of multiple video frames can be achieved based on facial key points.
  • facial key points are detected on each of the multiple video frames of the video to be detected, and the facial key points contained in each of the multiple video frames are obtained.
  • This application does not require a specific detection method for facial key point detection; relevant facial key point detection algorithms or software can be used, and other less time-consuming methods can also be used. For example, after obtaining the face key points of the first video frame, the corners of the left and right eyes, the tip of the nose, and the left and right corners of the mouth (five points in total) are used as anchor points, and the remaining video frames are mapped onto the template through the Thin Plate Spline interpolation algorithm.
  • The anchor points can be customized according to the detected face key points; it is only necessary to ensure that the positions of some key points are fixed after rough alignment. The greater the number of anchor points, the better the alignment, but 5 points is generally enough. It can be understood that when no human face can be detected, the object to be detected can be directly determined not to be a living body.
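  • The coarse, key-point-level alignment can be sketched as follows. Note that the application describes a Thin Plate Spline mapping onto the template; the partial affine (similarity) transform below is only a simpler stand-in, and the five anchor points are assumed to have been produced by an external landmark detector.

```python
import cv2
import numpy as np

def coarse_align(frame: np.ndarray, anchors: np.ndarray, template_anchors: np.ndarray) -> np.ndarray:
    """Map one video frame onto the template using five facial anchor points.

    anchors / template_anchors: (5, 2) float arrays holding the two eye corners, the nose tip
    and the two mouth corners detected in this frame and in the reference (template) frame.
    """
    matrix, _ = cv2.estimateAffinePartial2D(anchors.astype(np.float32),
                                            template_anchors.astype(np.float32))
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, matrix, (w, h))    # frame expressed in template coordinates
```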
  • each video frame can be aligned at the facial key point level. Further, the multiple video frames that have been aligned at the face key point level are aligned at the pixel level.
  • multiple video frames can also be directly aligned at the pixel level.
  • Directly performing alignment at the pixel level makes the alignment process simpler; performing alignment at the face key point level first and then at the pixel level saves the computing resources consumed by pixel-level alignment. How to implement pixel-level alignment can be selected according to actual needs.
  • Fine alignment at the pixel level can be achieved through the following process: separately calculating the dense optical flow data between the reference video frame and each other video frame among the multiple video frames participating in pixel-level alignment, where the reference video frame is any one of the multiple video frames participating in pixel-level alignment; and, according to the dense optical flow data between each other video frame and the reference video frame, aligning the other video frames with the reference video frame at the pixel level.
  • For example, the first video frame can be used as the reference video frame, and the dense optical flow data between each other video frame and the reference video frame can be calculated in the brightness channel.
  • The dense optical flow data can then be used as the mapping basis to finely align each other video frame with the reference video frame at the pixel level. Pixel-level alignment can be achieved based on dense optical flow algorithms, such as the Gunnar Farneback algorithm (a dense optical flow algorithm).
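  • A sketch of the pixel-level fine alignment based on the Gunnar Farneback dense optical flow named above; the OpenCV parameters are common defaults, not values from the application.

```python
import cv2
import numpy as np

def pixel_align(other_frame: np.ndarray, other_gray: np.ndarray, reference_gray: np.ndarray) -> np.ndarray:
    """Warp one video frame onto the reference frame using dense optical flow on the brightness channel."""
    # Dense optical flow from the reference frame to the other frame (Gunnar Farneback algorithm).
    flow = cv2.calcOpticalFlowFarneback(reference_gray, other_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = reference_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # For every reference pixel, sample the other frame at the position the flow points to.
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(other_frame, map_x, map_y, cv2.INTER_LINEAR)
```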
  • Figure 2 shows a schematic flow chart for obtaining the first living body detection result.
  • The steps for obtaining the first liveness detection result may include: alignment processing at the face key point level, alignment processing at the pixel level, obtaining the second illumination sequence, similarity calculation and response map generation, obtaining video frames, inputting the first liveness detection model, and obtaining the first liveness detection result.
  • the above steps can form a relatively complete process, but according to actual needs, one or more of the steps can be discarded. For example, the step of alignment processing at the face key point level can be discarded to reduce the complexity of the entire process.
  • Alternatively, the step of obtaining video frames can be discarded, and accordingly only the response map is input into the first liveness detection model; identification of attacks such as wearing masks can then be achieved through the second liveness detection model.
  • Discarding the face-key-point-level alignment step and the step of obtaining video frames reduces the complexity of the entire process and avoids duplicated work between the first liveness detection model and the second liveness detection model.
  • the first illumination sequence and the second illumination sequence may both be colored light sequences including multiple color channels.
  • For each color channel, the similarity between the first colored light sequence and the second colored light sequence in that color channel is calculated, and the response sub-map of the color channel is then obtained.
  • each color channel can be regularized.
  • the method of regularizing each color channel can also refer to related technologies, and this application does not limit this. Among them, because the first lighting sequence is issued by the background, the first colored light sequence corresponding to each color channel can be directly obtained from the background.
  • For example, assume one second is used as the unit of each element in the colored light sequences, and the multiple color channels are the red channel, the yellow channel, and the blue channel.
  • Assume the colored light emitted is: red light in the first second; red light and yellow light (i.e., the orange light they compose) in the second second; the white light composed of yellow light and blue light in the third second; yellow light in the fourth second; and red light in the fifth second.
  • With 1 representing the presence of light of the corresponding color channel and 0 representing its absence, the obtained first colored light sequences may be: the first colored light sequence of the red channel, 11001; the first colored light sequence of the yellow channel, 01110; and the first colored light sequence of the blue channel, 00100.
  • Similarly, a second colored light sequence corresponding to each color channel, reflected by each physical position point of the object to be detected, can be obtained. It can be understood that, according to the intensity of the emitted light, the numbers in the first colored light sequence can be numbers between 0 and 1; according to the intensity of the reflected light, the numbers in the second colored light sequence can also be numbers between 0 and 1.
  • If the similarity between the second colored light sequence of a color channel reflected by a physical position point of the object to be detected and the first colored light sequence corresponding to that color channel is greater than the preset value, then the color of the pixel corresponding to that physical position point in the response sub-map of the color channel is the color of that color channel.
  • The response sub-maps of each color channel are fused to obtain the response map of the object to be detected.
  • For example, if the color of the corresponding pixel of a physical position point in the response sub-map of the red channel is red (the similarity is greater than the preset value), the color of the corresponding pixel in the response sub-map of the yellow channel is yellow (the similarity is greater than the preset value), and the color of the corresponding pixel in the response sub-map of the blue channel is none (the similarity is not greater than the preset value), then after the response sub-maps of each color channel are fused, the corresponding pixel in the response map of the object to be detected is the orange composed of red and yellow.
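  • The per-channel construction and fusion can be sketched as follows, reusing the dot-product similarity from earlier; treating fusion as a simple per-channel stack is an illustrative assumption.

```python
import numpy as np

def channel_response_submaps(aligned_frames: np.ndarray, first_sequences: dict) -> dict:
    """Response sub-map of every colour channel.

    aligned_frames: (T, H, W, C) pixel-aligned frames; the channel order is assumed to match
                    the iteration order of `first_sequences`.
    first_sequences: per-channel first coloured-light sequences as (T,) arrays,
                     e.g. {"red": [1, 1, 0, 0, 1], "yellow": [0, 1, 1, 1, 0], "blue": [0, 0, 1, 0, 0]}.
    """
    submaps = {}
    for channel_index, (name, emitted) in enumerate(first_sequences.items()):
        reflected = aligned_frames[..., channel_index]      # per-pixel second coloured-light sequences
        submaps[name] = np.tensordot(np.asarray(emitted, dtype=float), reflected, axes=([0], [0]))
    return submaps

def fuse_submaps(submaps: dict) -> np.ndarray:
    """Fuse the per-channel response sub-maps into one multi-channel response map."""
    return np.stack(list(submaps.values()), axis=-1)
```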
  • A response map obtained in this way has the advantage of being more accurate.
  • Furthermore, the mean response intensity of the face area in a color channel's response sub-map can be used to regularize the response sub-map of that color channel, obtaining the regularized response sub-map of the object to be detected in that color channel.
  • the regularized response subgraph of each color channel of the object to be detected is fused to obtain the response map of the object to be detected.
  • Specifically, the regularization of the response sub-map of a color channel can be: dividing the response intensity of each pixel of the color channel's response sub-map by the mean response intensity of the face area, and using the resulting quotient as the response intensity of that pixel in the regularized response sub-map of the color channel.
  • The response intensity of the regularized response map can be calculated by the following formulas:
  • N_F = (1/n) · Σ_{(i,j)∈F} r_{i,j}, where F represents the face area, r_{i,j} represents the response intensity of the pixel point (i, j) of the response map, n represents the number of pixels in the face area, and N_F represents the mean response intensity of the face area;
  • r′_{i,j} = r_{i,j} / N_F, where r′_{i,j} represents the response intensity of the pixel point (i, j) of the regularized response map;
  • ri_{i,j} = uint8(min(max(r′_{i,j} × 255, 0), 255)), where ri_{i,j} represents the response intensity of the pixel point (i, j) of the fused response map, and uint8 represents conversion of a floating point number into an 8-bit non-negative integer.
  • the response intensity of the response sub-image of each color channel can be balanced to avoid overly bright scenes.
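  • A sketch of the regularisation step, following the formulas above; the face-area mask is assumed to come from the earlier face key point detection.

```python
import numpy as np

def regularise_submap(submap: np.ndarray, face_mask: np.ndarray) -> np.ndarray:
    """Regularise one colour channel's response sub-map by the mean response of the face area.

    submap:    (H, W) response intensities r_{i,j}.
    face_mask: (H, W) boolean mask of the face area F.
    """
    n_f = submap[face_mask].mean()                  # N_F: mean response intensity of the face area
    r_prime = submap / n_f                          # r'_{i,j} = r_{i,j} / N_F
    # ri_{i,j} = uint8(min(max(r'_{i,j} * 255, 0), 255))
    return np.clip(r_prime * 255.0, 0, 255).astype(np.uint8)
```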
  • After obtaining the response map of the object to be detected, and before inputting the response map into the first liveness detection model, the liveness detection result of the object to be detected can first be determined according to at least one attribute value of the response map of the object to be detected.
  • the attribute values of the response map of the object to be detected include at least one of the following: response intensity mean, quality value.
  • When any attribute value is less than the corresponding attribute threshold, it is determined that the object to be detected is not a living body.
  • When every attribute value is not less than the corresponding attribute threshold, the response map of the object to be detected is input into the first liveness detection model.
  • Figure 3 shows a schematic flow chart of life detection.
  • If the mean response intensity of the response map of the object to be detected is less than the response intensity threshold, it can be considered that the collected second colored light sequence is too weak, so the response map is not reliable enough to serve as a clue for liveness verification. For security reasons it can be assumed that there is an attack, and it is directly determined that the object to be detected is not a living body, so there is no need to input the response map of the object to be detected into the first liveness detection model.
  • If the quality value of the response map of the object to be detected is less than the quality threshold, there may also be an attack, and it is directly determined that the object to be detected is not a living body; again there is no need to input the response map of the object to be detected into the first liveness detection model.
  • the quality value of the response graph of the object to be detected can be determined based on the noise in the response graph. The more noise, the lower the quality value.
  • The quality value of the response map can be calculated by the following formula, in which:
  • quality is the quality value;
  • r_{i,j} represents the response intensity of the pixel point (i, j) of the response map;
  • t indexes each element in the illumination sequences;
  • y′_t represents the first illumination sequence after color channel regularization, and x′_t represents the second illumination sequence after color channel regularization.
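  • The attribute-value gate in front of the first liveness model can be sketched as follows; `quality_of` stands in for the quality formula (which is not reproduced here), and the two thresholds are hypothetical values.

```python
def gate_and_detect(response_map, first_model, quality_of,
                    intensity_threshold: float = 10.0, quality_threshold: float = 0.3):
    """Check the attribute values of the response map before running the first liveness model.

    Returns False (not a living body) when gating fails, otherwise the first liveness detection result.
    """
    if response_map.mean() < intensity_threshold:
        return False        # collected coloured light too weak: treat as an attack for security reasons
    if quality_of(response_map) < quality_threshold:
        return False        # response map too noisy: treat as an attack
    return first_model(response_map)   # every attribute value clears its threshold: run the first model
```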
  • FIG. 4 a flow chart of steps of a living body detection method in an embodiment of the present application is shown.
  • the living body detection method can be applied to a backend server and includes the following steps:
  • Step S41: Obtain a video to be detected of the object to be detected, where the video to be detected is a video of the object to be detected collected while the object to be detected is illuminated according to a first illumination sequence;
  • Step S42: Generate a response map of the object to be detected based on the first illumination sequence and the second illumination sequence reflected by each physical position point of the object to be detected represented by the video to be detected, where the response intensity of each pixel in the response map represents the similarity between the second illumination sequence reflected by the physical position point corresponding to that pixel and the first illumination sequence;
  • Step S43: Based on the response map, use a liveness detection model to perform liveness detection on the object to be detected and obtain the liveness detection result of the object to be detected.
  • The method of obtaining the video to be detected of the object to be detected and generating the response map of the object to be detected can refer to the corresponding methods described above; the training method of the liveness detection model can refer to the training method of the first liveness detection model.
  • the living body detection result of the object to be detected can be obtained.
  • The response map of the object to be detected is refined to the pixel level. From the similarity, represented by each pixel of the response map, between the second illumination sequence reflected by the corresponding physical position point of the object to be detected and the first illumination sequence, the different reflection patterns presented by each physical location area of the object to be detected can be obtained.
  • The face of a living person is uneven, while the screen or printing paper used in a re-shot copy is relatively smooth and highly reflective, so the light reflected by a live face and by a re-shot copy has different patterns. Therefore, the liveness detection model can distinguish, based on the response map of the object to be detected, whether the object to be detected is a re-shot copy or a live face.
  • The liveness detection model makes use of the information that re-shot copies and live faces have different reflection patterns, and realizes the combination of two detection methods (lighting sequence inspection and detection of whether the object is a re-shot copy), so the determined liveness detection result is more accurate.
  • The second illumination sequence reflected by each physical position point of the object to be detected represented by the video to be detected can be obtained through the following steps:
  • aligning pixels describing the same physical location point of the object to be detected in the multiple video frames specifically includes the following process:
  • face key point detection is performed on the video frame to obtain the face key points contained in the video frame.
  • performing alignment processing at the face key point level on the plurality of video frames according to the face key points, and performing alignment processing at the pixel level on the plurality of video frames that have been aligned at the face key point level;
  • the pixel level alignment process can be performed through the following process:
  • each other video frame and the reference video frame are aligned at the pixel level.
  • The first illumination sequence and the second illumination sequence may both include colored light sequences of multiple color channels. Generating a response map of the object to be detected according to the first illumination sequence and the second illumination sequence reflected by each physical position point of the object to be detected represented by the video to be detected includes: separating the first illumination sequence to obtain the first colored light sequence corresponding to each of the color channels, and separating the second illumination sequence to obtain the second colored light sequence corresponding to each of the color channels; for each of the color channels, generating a response sub-map of the object to be detected in that color channel according to the similarity between the second colored light sequence of the color channel reflected by each physical position point of the object to be detected and the first colored light sequence corresponding to the color channel; and fusing the response sub-maps of each of the color channels to obtain the response map of the object to be detected.
  • the specific steps can be referred to the above.
  • Fusing the response sub-maps of the object to be detected in each of the color channels to obtain the response map of the object to be detected includes: according to the response sub-map of the object to be detected in each of the color channels, obtaining the mean response intensity of the face area of the object to be detected in each of the color channels; according to the mean response intensity of the face area of the object to be detected in each of the color channels, regularizing the response sub-map of the object to be detected in that color channel to obtain the regularized response sub-map of the object to be detected in each of the color channels; and fusing the regularized response sub-maps of the object to be detected in each of the color channels to obtain the response map of the object to be detected.
  • Before using the first liveness detection model to process the response map, the method also includes: obtaining attribute values of the response map of the object to be detected, where the attribute values include at least one of the following: mean response intensity and quality value; and determining the liveness detection result of the object to be detected according to the relationship between the attribute values of the response map of the object to be detected and the corresponding attribute thresholds. In the case where an attribute value of the response map of the object to be detected is less than the corresponding attribute threshold, it is determined that the object to be detected is not a living body; in the case where the attribute values of the response map of the object to be detected are not less than the corresponding attribute thresholds, the step of using the first liveness detection model to perform liveness detection on the object to be detected based on the response map is executed.
  • the specific steps can be referred to the above.
  • The first liveness detection model is a model that has learned the first image feature of the response map of a live human face and the second image feature of a live human face. Based on the response map, using the first liveness detection model to perform liveness detection on the object to be detected and obtain the first liveness detection result of the object to be detected includes: extracting the first image feature of the response map; extracting any video frame of the video to be detected and obtaining the second image feature of the video frame; fusing the first image feature and the second image feature to obtain a fused image feature; and using the first liveness detection model to process the fused image feature to obtain the first liveness detection result of the object to be detected.
  • the specific steps can be referred to the above.
  • In some cases, the video obtained is an attack video sent directly to the server through camera hijacking technology rather than a video collected by the terminal.
  • the captured video may have dropped frames, multiple frames, or be out of sync, resulting in misalignment between the emitted lighting sequence and the reflected lighting sequence.
  • For example, suppose the emitted lighting sequence has 24 lighting elements and each lighting element corresponds to one video frame. If video collection starts only when the second lighting element of the lighting sequence is emitted to illuminate the object to be detected, the collected video will have only 23 video frames, i.e., the collected video has dropped frames. If video collection starts before light is emitted to illuminate the object to be detected, so that the first lighting element of the lighting sequence corresponds to the second video frame of the video, the collected video may have 25 frames, i.e., the collected video has multiple frames. If video collection starts only when the second lighting element of the lighting sequence is emitted to illuminate the object to be detected, and the video frame collected when the 24th lighting element illuminates the object to be detected is the penultimate video frame, the collected video also has 24 video frames, but there is an out-of-synchronization problem.
  • If each video frame of the collected video is processed, the length of the reflected illumination sequence obtained is equal to the number of video frames; therefore, when calculating the similarity between the reflected illumination sequence and the emitted illumination sequence, the amount of calculation is relatively large.
  • The liveness detection method proposed in the related art uses the information that a live human face and a screen/printing paper have different reflection patterns to improve the accuracy of liveness detection. Specifically, the human face is uneven and the light reflected by different areas is different, whereas if a screen/printing paper is re-shot, the screen/printing paper is relatively smooth and flat, so the reflected light is relatively uniform and regular.
  • This method needs to generate different reflected illumination sequences for different areas of the object to be detected, calculate the similarity between the issued illumination sequence and the different reflected illumination sequences, then generate a response map, and determine the liveness detection result based on the response map. Because there are multiple reflected illumination sequences, the amount of calculation of this liveness detection method is particularly large, and it takes a long time to obtain the liveness detection result.
  • This embodiment therefore provides a liveness detection method, which includes:
  • obtaining a video to be detected of the object to be detected, where the video to be detected is a video of the object to be detected collected while the object to be detected is illuminated according to the first illumination sequence;
  • extracting a video frame sequence of a target number of frames from the video to be detected, and determining a second illumination sequence based on each video frame in the video frame sequence,
  • where the second illumination sequence represents the reflected light of each physical position point of the object to be detected in each video frame of the video frame sequence;
  • extracting multiple candidate lighting sequences from the first lighting sequence according to the sequence number information of each video frame in the video frame sequence and the index information of each lighting element in the first lighting sequence, where for each candidate lighting sequence the index information of its elements and the sequence number information of the video frames in the video frame sequence satisfy a different matching relationship;
  • and obtaining the liveness detection result of the object to be detected according to the second illumination sequence and the candidate lighting sequences.
  • The second illumination sequence represents the reflected light of each physical position point of the object to be detected in each video frame of the video frame sequence; therefore, the length of the second illumination sequence is the same as the length of the video frame sequence and is shorter than the number of original video frames of the video to be detected. The multiple candidate lighting sequences are extracted from the first lighting sequence according to the sequence number information of each video frame in the video frame sequence and the index information of each lighting element in the first lighting sequence; therefore, the length of each candidate lighting sequence is also the same as the length of the video frame sequence.
  • Obtaining the liveness detection result of the object to be detected from the second illumination sequence and the candidate lighting sequences therefore has the advantages of a small amount of calculation and short time consumption.
  • Moreover, because the candidate lighting sequences and the second illumination sequence have the same length, the determined liveness detection result of the object to be detected is more accurate.
  • a video frame sequence of a target number of frames is extracted from the video to be detected, and a second illumination sequence is determined based on each video frame in the video frame sequence.
  • where the second illumination sequence represents the reflected light of each physical position point of the object to be detected in each video frame of the video frame sequence;
  • extracting multiple candidate lighting sequences from the first lighting sequence according to the sequence number information of each video frame in the video frame sequence and the index information of each lighting element in the first lighting sequence, where for each candidate lighting sequence the index information of its elements and the sequence number information of the video frames in the video frame sequence satisfy a different matching relationship;
  • generating the response map of the object to be detected according to the second illumination sequence reflected by each physical position point then includes:
  • screening out a target candidate lighting sequence from the multiple candidate lighting sequences, and generating the response map of the object to be detected according to the second illumination sequence and the target candidate lighting sequence.
  • the terminal emits light of different colors and/or different illumination intensities according to the first illumination sequence to illuminate the object to be detected, and collects the video of the object to be detected while emitting light according to the first illumination sequence to illuminate the object to be detected.
  • the object to be detected is an object collected by the camera of the terminal.
  • the first lighting sequence may be a lighting sequence generated by the terminal itself, or may be a lighting sequence issued by the server to the terminal.
  • Extract the target number of video frames from the video to be detected and generate a video frame sequence based on the target number of video frames.
  • the video to be detected includes multiple original video frames, the extracted target number of video frames is part of the original video frames, and the extracted target number of video frames is less than the number of original video frames.
  • the order of each video frame in the video frame sequence follows the order of each video frame in the video to be detected.
  • the extraction rules can be set according to the needs, which can be uniform extraction or non-uniform extraction.
  • The target number of frames can be a pre-configured number of frames; however, because the video to be detected may have dropped frames or multiple frames, even if the video frames are extracted according to the same rule, the original-frame numbers corresponding to the extracted video frames may differ.
  • For example, if the extraction rule is uniform extraction, the target number of frames is 8, and the total number of original video frames included in the video to be detected is 24, the 4th extracted video frame may be the 14th of the original video frames; when the total number of original video frames included in the video to be detected is 23, the 4th extracted video frame may be the 13th of the original video frames.
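  • A sketch of a uniform extraction rule under these assumptions; the application does not specify the exact sampling rule, so evenly spaced indices are used purely for illustration.

```python
import numpy as np

def uniform_frame_indices(total_frames: int, target_frames: int = 8) -> list:
    """Evenly spaced 1-based original-frame indices for the extracted video frame sequence."""
    # total_frames = 24 and total_frames = 23 give slightly different index lists,
    # which is why the true correspondence must later be recovered by enumeration.
    return list(np.linspace(1, total_frames, target_frames).round().astype(int))
```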
  • For each video frame in the video frame sequence, the pixel point corresponding to each physical position point of the object to be detected in that video frame is determined, and the pixel value of that pixel point is obtained;
  • the pixel values give the reflected light of each physical position point in each video frame of the video frame sequence, from which the second illumination sequence is obtained.
  • An entity position point of the object to be detected refers to a point where the object to be detected actually exists. For example, when the object to be detected is a face, an entity position point of the object to be detected can be a point on the nose of the face.
  • the size of this point is the size represented by a pixel in the video. Because each lighting element in the second lighting sequence represents reflected light in one video frame, the length of the second lighting sequence is the same as the length of the video frame sequence.
  • the lighting element is an element in the lighting sequence, which can represent the color, intensity and other information of the light.
  • the terminal emits red light, blue light and green light in sequence, and the corresponding lighting sequence can be (red, blue, green), then "red” in the lighting sequence can be a lighting element, which represents the light color.
  • The matching relationship between the lighting elements in the second lighting sequence and the lighting elements in the first lighting sequence may be consistent with the corresponding relationship between each video frame in the video frame sequence and the original video frames of the video to be detected.
  • For example, if the second video frame in the video frame sequence is the fifth original video frame of the video to be detected, then the second lighting element in the second lighting sequence matches the fifth lighting element in the first lighting sequence.
  • Alternatively, the matching relationship between the lighting elements in the second lighting sequence and the lighting elements in the first lighting sequence may not be consistent with the corresponding relationship between each video frame in the video frame sequence and the original video frames of the video to be detected.
  • For example, suppose the second video frame in the video frame sequence is the fifth of the original video frames, but there is an offset distance of one step between the video frames of the video to be detected and the first lighting sequence, causing the light reflected in the 5th video frame of the video to be detected to correspond to the 6th lighting element of the first lighting sequence; then the 2nd lighting element in the second lighting sequence should match the 6th lighting element in the first lighting sequence.
  • Which lighting element in the first lighting sequence each lighting element in the second lighting sequence should match therefore depends, on the one hand, on the correspondence between the extracted video frames and the original video frames and, on the other hand, on the offset distance between the sequence of light reflected in the video to be detected and the first illumination sequence.
  • Each matching relationship corresponds to a correspondence relationship and an offset distance.
  • the correspondence relationship is the correspondence relationship between the sequence number of each video frame in the video frame sequence and the frame sequence number of the original video frame.
  • the offset distance represents the offset distance for sliding matching between the sequence of light reflected by the video to be detected and the first illumination sequence.
  • the correspondence relationship is related to the total number of original video frames and the extraction rules for extracting the video frame sequence from the video to be detected.
  • the offset distance is related to whether the video to be detected has problems such as frame loss, multiple frames, or out-of-synchronization.
  • A matching relationship represents a one-to-one correspondence between the sequence number information of each video frame in the video frame sequence and the index information of the lighting elements of the first lighting sequence used in a candidate lighting sequence. Therefore, according to the sequence number information of each video frame in the video frame sequence and the index information of each lighting element in the first lighting sequence, multiple candidate lighting sequences can be extracted from the first lighting sequence, and the length of each candidate lighting sequence is the same as the length of the video frame sequence. For example, if under a matching relationship the first video frame in the video frame sequence corresponds to index 5 in the first illumination sequence, then the lighting element whose index in the first illumination sequence is 5 is used as the first lighting element of the candidate lighting sequence, and so on, to obtain each lighting element in the candidate lighting sequence.
  • Each matching relationship corresponds to one correspondence relationship and one offset distance. When performing liveness detection, the target number of frames and the extraction rule are determined, but the total number of original video frames is not, so the correspondence relationship contains this unknown; the offset distance is likewise unknown. Multiple matching relationships can therefore be obtained by exhaustively enumerating the possible values of these two unknowns, the total number of original video frames and the offset distance.
  • For example, if the length of the first lighting sequence is 24, then, because the video to be detected usually has only one or two extra or missing frames, the assumed total number of frames of the video to be detected can take the values 22, 23, 24, 25, or 26.
  • the minimum value of the offset distance is 0, and the maximum value will be described in detail later.
  • the similarity values between the multiple candidate illumination sequences and the second illumination sequence can be determined, and the target candidate illumination sequence can be screened out from the multiple candidate illumination sequences based on the similarity values between each candidate illumination sequence and the second illumination sequence.
  • the candidate illumination sequence with the largest similarity value to the second illumination sequence may be determined as the target candidate illumination sequence.
  • a response graph of the object to be detected can be generated, and then based on the response graph, the living body detection result of the object to be detected is determined.
  • the response intensity of each pixel point in the response map represents: the similarity between the second illumination sequence reflected by the entity position point corresponding to the pixel point and the first illumination sequence.
  • The liveness detection model is a model that has learned, through supervised training, the image features of the response maps of live faces; it can distinguish the response map of a live face from the response maps of other attacks (such as remake attacks). Therefore, the liveness detection model can obtain the liveness detection result of the object to be detected from its response map.
  • The supervised training of the liveness detection model can be: obtaining the response maps of multiple sample objects (including live faces and other objects); inputting the response maps of the sample objects into the liveness detection model to be trained to obtain the predicted probability that each sample object is a live face; establishing a loss function based on the predicted probability and whether the sample object is actually a live face; and updating the model parameters of the model to be trained based on the loss function to obtain the liveness detection model.
  • the live body detection model can learn the response map of live human faces.
  • the method of obtaining the response graph of the sample object can refer to the method of obtaining the response graph of the object to be detected.
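As a rough illustration of the supervised training described above, here is a minimal PyTorch-style sketch; the tiny network, the random response maps, and the labels are placeholders, and none of this is the patent's actual model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Placeholder batch: response maps of sample objects and whether each is a live face.
response_maps = torch.randn(4, 3, 112, 112)
is_live = torch.tensor([1.0, 0.0, 1.0, 0.0])

logits = model(response_maps).squeeze(1)   # predicted probability of being live (as logits)
loss = criterion(logits, is_live)          # loss built from prediction vs. ground truth
loss.backward()
optimizer.step()                           # update the model parameters
```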
  • It can be understood that the candidate lighting sequence with the highest similarity value to the second lighting sequence is the one corresponding to the true matching relationship. The target candidate lighting sequence screened out by similarity value therefore has its lighting elements in one-to-one correspondence with the lighting elements of the second lighting sequence, so the liveness detection result obtained from the second lighting sequence and the target candidate lighting sequence is more accurate.
  • The second lighting sequence represents the light reflected by each physical position point of the object to be detected in each video frame of the video frame sequence, so the length of the second lighting sequence is the same as the length of the video frame sequence and smaller than the number of original video frames of the video to be detected.
  • According to the sequence number information of each video frame in the video frame sequence and the index information of each lighting element in the first lighting sequence, multiple candidate lighting sequences are extracted from the first lighting sequence; the length of each candidate lighting sequence is therefore also the same as the length of the video frame sequence.
  • Compared with obtaining the liveness detection result from the per-frame reflected-light sequence of every original video frame and the first lighting sequence, obtaining the liveness detection result from the second lighting sequence and the candidate lighting sequences requires less computation and takes less time. In addition, because the candidate lighting sequences and the second lighting sequence have the same length, the determined liveness detection result of the object to be detected is more accurate.
  • the matching relationship can be determined based on the target video and the target illumination sequence.
  • the target video may be a video to be detected, and the target illumination sequence may be a first illumination sequence; or the target video may be a sample video, and the target illumination sequence may be a sample illumination sequence with the same length as the first illumination sequence.
  • the sample lighting sequence can be any lighting sequence.
  • the first lighting sequence is a lighting sequence delivered according to the delivery strategy. Therefore, the length of the first lighting sequence can be obtained before the first lighting sequence is delivered.
  • the sample video can be any video, because the total number of video frames of the sample video can be the hypothetical total number of frames of the video to be detected.
  • the matching relationship is determined through the following process:
  • Obtain the target video and the target lighting sequence, where the target video includes a plurality of original video frames; obtain the target video frames of the target frame number extracted from the target video. When the target video is the video to be detected, the target video frames are the video frames in the video frame sequence.
  • When the target video is a sample video, the target video frames are video frames extracted from the sample video, and the extraction rule used to extract the target video frames from the target video is the same as the extraction rule used to extract video frames from the video to be detected.
  • When the extraction rule, the target frame number, and the total number of original video frames of the target video are all determined, the correspondence between the target sequence numbers of the target video frames and the frame sequence numbers of the original video frames can be determined.
  • Optionally, determining the correspondence between the sequence numbers of the target video frames extracted from the target video and the frame sequence numbers of the original video frames of the target video includes: obtaining multiple assumed total frame numbers of the target video and, for each assumed total frame number, determining the correspondence between the target sequence numbers of the target video frames and the frame sequence numbers of the original video frames. Determining multiple matching relationships based on the correspondences and the plurality of offset distances then includes: determining multiple matching relationships based on the multiple correspondences and the multiple offset distances.
  • In practice, the total number of original video frames is usually uncertain. Therefore, when the extraction rule and the target frame number are both determined but the total number of original frames is not, different values can be assigned to the total number of original frames to obtain multiple correspondences between the target sequence numbers of the target video frames and the frame sequence numbers of the original video frames: multiple assumed total frame numbers of the target video are obtained, and for each assumed total frame number the corresponding correspondence is determined.
  • the target sequence number of the target video frame represents: the order in which the target video frame is located among multiple target video frames.
  • In this way, the multiple matching relationships obtained subsequently include matching relationships corresponding to different correspondences; thus, even when different videos to be detected contain different numbers of original video frames, among the multiple matching relationships there is at least one that is adapted to the actual number of original video frames of each video to be detected.
  • After obtaining the correspondence between the target sequence numbers of the target video frames and the frame sequence numbers of the original video frames, the original video frames can be slide-matched against the target lighting sequence to determine multiple offset distances.
  • the original video frame and the target illumination sequence are slidingly matched, not to determine at what offset distance the original video frame and the target illumination sequence can match, but to determine the value range of the offset distance.
  • multiple correspondence relationships and offset distance value ranges are determined, multiple correspondence relationships and multiple offset distances can be traversed to determine multiple matching relationships. How to determine multiple matching relationships based on correspondence relationships and offset distances will be described in detail later.
  • During sliding matching, the multiple original video frames are regarded as one sequence; the length of the sliding window is the length of the shorter sequence, the sliding step is one step, the minimum value of the offset distance is 0, and the maximum value is the absolute value of the difference between the total number of original video frames and the length of the target lighting sequence.
  • Optionally, before sliding-matching the original video frames with the target lighting sequence and determining the multiple offset distances, the method further includes: removing the lighting elements at the head and tail ends of the target lighting sequence. Sliding-matching the original video frames with the target lighting sequence and determining the multiple offset distances then includes: sliding-matching the original video frames with the target lighting sequence from which the lighting elements at both ends have been removed, and determining the multiple offset distances.
  • the lighting elements at the beginning and the end of the target illumination sequence can be removed to obtain the target illumination sequence from which the illumination elements at the beginning and the end are removed.
  • The number of lighting elements removed at each end can be set as needed. For example, if the number of colorful lights corresponding to the target lighting sequence is p and each color of light lasts q frames, the length of the target lighting sequence is p × q, and the number of lighting elements removed at each end can be q/2. When removing lighting elements, an entire run of the same color should not be removed: for example, if the first three lighting elements of the target lighting sequence are all red and the fourth is blue, the number of removed elements can be 1 or 2 (see the sketch below).
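A minimal sketch of this trimming step, assuming the target lighting sequence is stored as a Python list and q/2 is rounded down (names and values are illustrative):

```python
def trim_ends(sequence, q):
    """Remove floor(q/2) lighting elements from each end of the sequence."""
    d = q // 2                      # number of elements dropped per end
    return sequence[d:len(sequence) - d] if d > 0 else list(sequence)

# p = 4 colors, q = 6 frames per color -> length 24; 3 elements dropped at each end.
target = ["R"] * 6 + ["G"] * 6 + ["B"] * 6 + ["W"] * 6
print(len(trim_ends(target, 6)))    # -> 18
```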
  • In such an implementation, the multiple offset distances are determined by sliding-matching the original sample video frames against the target lighting sequence from which the lighting elements at both ends have been removed.
  • Frame loss and extra frames in the video to be detected occur at its head and tail. Therefore, by removing the lighting elements at the head and tail ends of the target lighting sequence and then sliding-matching the trimmed target lighting sequence against the original video frames, the resulting matching relationships likewise ignore the head and tail ends of the video to be detected and of the first lighting sequence. Consequently, when a candidate lighting sequence is obtained from a matching relationship and its similarity to the second lighting sequence is calculated, the two ends are excluded, which solves the problem of low similarity caused by frame loss and extra frames.
  • Determining multiple matching relationships according to the correspondences and the offset distances may include: under each correspondence and each offset distance, determining the target index that matches the target sequence number of each target video frame; and obtaining the multiple matching relationships based on the target indices matching the target sequence numbers of the target video frames under each correspondence and each offset distance.
  • For example, if the target sequence numbers 1, 2, and 3 of the target video frames correspond to original frame sequence numbers 5, 10, and 15, and the offset distance is 2, then the matched target indices are 3, 8, and 13, and the matching relationship can be determined as (1-3, 2-8, 3-13). Based on this matching relationship, when extracting the candidate lighting sequence from the first lighting sequence, the candidate lighting sequence can be generated from the 3rd, 8th, and 13th lighting elements of the first lighting sequence.
  • As another example, if the correspondence between the target sequence number i and the frame sequence number i' is i' = 3i, and the offset distance dt ranges over the integers from 0 to 3, the matching relationship is r(i) = i' − dt = 3i − dt; with i = 1 and dt = 1, the target index matching target sequence number 1 is 2. Substituting different values of dt yields multiple matching relationships (illustrated in the sketch below).
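A small sketch of this per-frame index computation, using the i' = 3i correspondence from the example above; representing the correspondence as a Python function is an illustrative choice:

```python
def matched_indices(s, m, dt):
    """Target index matching each target sequence number i = 1..m: r(i) = s(i) - dt."""
    return [s(i) - dt for i in range(1, m + 1)]

s = lambda i: 3 * i                       # example correspondence i' = 3i
for dt in range(0, 4):                    # enumerate offset distances 0..3
    print(dt, matched_indices(s, 3, dt))  # dt = 1 -> [2, 5, 8]
```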
  • In this way, according to the target sequence number of a target video frame, the target index of the matching lighting element in the target lighting sequence can be determined, giving the matching relationship between target sequence numbers and target indices. This matching relationship is the same as the matching relationship between the sequence numbers of the video frames and the indices of the lighting elements in the first lighting sequence; therefore, the matching relationship between the sequence number of each video frame in the video frame sequence and the index, in the first lighting sequence, of each lighting element of each candidate lighting sequence can be obtained.
  • Obtaining the multiple matching relationships based on the target indices that match the target sequence numbers under each correspondence and each offset distance includes: when, under any correspondence and any offset distance, the target index matching the target sequence number of any target video frame is less than 1 or greater than the length of the corresponding target lighting sequence, marking the target index matching that target video frame's sequence number as an empty target index; and determining the matching relationship under that correspondence and offset distance according to the empty target index matching that target video frame and the target indices matching the other target video frames.
  • In other words, under any correspondence and any offset distance, if the target index matching the target sequence number of any target video frame is less than 1 or greater than the length of the corresponding target lighting sequence, the target index matching that target sequence number in the matching relationship under that correspondence and offset distance is recorded as an empty target index.
  • the minimum value of the target index is 1, and the maximum value is the length of the target lighting sequence. Therefore, in any determined matching relationship, if the target index matching any target serial number is less than 1, or greater than the length of the target illumination sequence corresponding to the target index, then the target index corresponding to the target serial number is empty. At this time, the matching relationship can be marked, marking that the target sequence number corresponds to an empty element, and the empty element occupies an element position.
  • When extracting the candidate lighting sequences from the first lighting sequence, for each matching relationship, the positions where the index of the lighting element matching a video frame's sequence number is empty are marked as empty elements; the other lighting elements are determined from the sequence numbers of the other video frames in the video frame sequence and the index information of the lighting elements in the first lighting sequence, and the candidate lighting sequence under that matching relationship is generated from the empty elements and the other lighting elements.
  • For example, if a matching relationship marks target sequence number 3 as corresponding to an empty element, then the 3rd lighting element of the candidate lighting sequence extracted according to that matching relationship is an empty element.
  • obtaining the similarity values of each candidate illumination sequence and the second illumination sequence respectively includes:
  • For each candidate lighting sequence, determining the target lighting element in the second lighting sequence corresponding to the position of each non-empty element in the candidate lighting sequence; calculating the similarity between each non-empty element and the target lighting element at the corresponding position; and determining the average of these similarities as the similarity value between the candidate lighting sequence and the second lighting sequence.
  • That is, when calculating the similarity value between a candidate lighting sequence and the second lighting sequence, the similarity is computed from the non-empty elements of the candidate lighting sequence and the lighting elements of the second lighting sequence at the corresponding positions.
  • Alternatively, when any candidate lighting sequence contains an empty element, the empty element can be deleted from the candidate lighting sequence to obtain a new candidate lighting sequence; the lighting element at the corresponding element position in the second lighting sequence is deleted to obtain a new second lighting sequence; and the similarity value between the new candidate lighting sequence and the new second lighting sequence is determined as the similarity value between the candidate lighting sequence and the second lighting sequence.
  • For example, for a matching relationship of (1-4, 2-7, 3-empty), a candidate lighting sequence is generated from the 4th and 7th elements of the first lighting sequence plus an empty element; because the empty element occupies an element position, the length of the candidate lighting sequence is 3. When computing its similarity with the second lighting sequence, the empty element is deleted from the candidate sequence and the 3rd lighting element is deleted from the second lighting sequence, and the similarity between the two new sequences is taken as the similarity between the original pair (see the sketch below).
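The following sketch illustrates this similarity computation under stated assumptions: lighting elements are represented as RGB vectors, the per-position similarity is a normalized dot product, and empty elements (None) are skipped; these representation choices are illustrative, not prescribed by the patent.

```python
import numpy as np

def sequence_similarity(candidate, second):
    """Mean per-position similarity, skipping positions where the candidate is empty."""
    sims = []
    for cand_elem, second_elem in zip(candidate, second):
        if cand_elem is None:                        # empty element: position is excluded
            continue
        a, b = np.asarray(cand_elem, float), np.asarray(second_elem, float)
        sims.append(float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return sum(sims) / len(sims) if sims else 0.0

candidate = [(1, 0, 0), (0, 0, 1), None]             # e.g. from matching relationship (1-4, 2-7, 3-empty)
second    = [(0.9, 0.1, 0.0), (0.1, 0.0, 0.8), (0.2, 0.7, 0.1)]
print(sequence_similarity(candidate, second))        # the third position is ignored
```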
  • This handles the case where, because of the offset, the index of the lighting element matching a video frame exceeds the range of the lighting sequence: by placing empty elements that occupy element positions, the candidate lighting sequence keeps the same length as the second lighting sequence, and the positions of the empty elements indicate which lighting elements of the second lighting sequence should participate in the similarity calculation.
  • Figure 5 shows a schematic flow chart of a living body detection method in this embodiment.
  • Obtain the video to be detected, whose original video frames are V = (v1, …, vn) with n frames in total, and obtain the first lighting sequence y = (y1, …, yN), whose length is N. Uniformly extract m video frames from the video to be detected to obtain the video frame sequence V′ = (v′1, …, v′m).
  • A physical position point of the object to be detected refers to a point that actually exists on the object; for example, when the object to be detected is a face, a physical position point can be a point on the nose, whose size is the size represented by one pixel in the video. Because each lighting element of the second lighting sequence represents the light reflected in one video frame, the length of the second lighting sequence is the same as the length of the video frame sequence, namely m.
  • the target index matching the target sequence number i of each target video frame is s(i)-dt, where dt is the offset distance.
  • The number of target video frames is m; the correspondence between the target sequence number of each target video frame and the frame sequence number of the original video frame is s, where i′ = s(i) = ceil(c/2 + c × i), c = n/m, n is the assumed total number of frames of the target video, ceil is the round-up function, and i = 1, 2, …, m denotes the target sequence numbers of the m target video frames.
  • the correspondence relationship can also characterize the extraction rule for extracting video frames from the video to be detected. Because the value of m is determined, the extraction rule is related to the value of n. Therefore, under this extraction rule, the correspondence s is related to the number n of original video frames contained in the video to be detected.
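A minimal sketch of this correspondence for several assumed total frame counts n (Python's math.ceil plays the role of the round-up function; the values of m and n below are only examples):

```python
import math

def correspondence(n, m):
    """Frame sequence numbers i' = s(i) = ceil(c/2 + c*i), c = n/m, for i = 1..m."""
    c = n / m
    return [math.ceil(c / 2 + c * i) for i in range(1, m + 1)]

# m = 8 extracted frames; the assumed total frame count n is enumerated around 24.
for n in (22, 23, 24, 25, 26):
    print(n, correspondence(n, 8))
```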
  • Multiple matching relationships can be obtained, and multiple matching relationships can be determined according to the following steps.
  • the extraction rules for extracting target video frames from the original video are the same as those for extracting video frames from the target video. Therefore, the corresponding relationship between the target serial number of the target video frame and the frame serial number of the original video frame is the same as the corresponding relationship between the target sequence number of the target video frame and the frame serial number of the original video frame. The corresponding relationship between the sequence number of the video frame and the frame sequence number of the original video frame is the same, both are s.
  • Obtain the target lighting sequence, whose length is the same as the length of the first lighting sequence. Suppose the target lighting sequence contains p colors of light and each color occupies q element positions; then N = p × q, and the index sequence of the target lighting sequence is indexy = (1, …, N). Remove the lighting elements at both ends of the index sequence; the number removed per end can be set to d = floor(q/2), where floor is the round-down function, giving the new index sequence indexy′ = (d, d+1, …, N−d) of length N′ = N − floor(q/2) × 2. In general N′ is not equal to n, that is, the trimmed target lighting sequence and the sequence formed by the original video frames have different lengths.
  • When n and dt take different values, multiple matching relationships can be obtained; when n and dt take other values, other matching relationships are obtained, and other candidate lighting sequences can be generated based on those matching relationships.
  • By traversing the possible values of n and dt, the multiple matching relationships obtained cover the various possible original frame counts of the video to be detected, as well as the various numbers of steps by which the video to be detected and the first lighting sequence may be out of sync (see the sketch below).
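Putting these pieces together, the sketch below enumerates matching relationships in the spirit of this description, using f(i) = ceil(n/2m + i·n/m) − dt, marking an entry as empty (None) when the index falls outside 1…N′, and discarding duplicates; the helper names, the chosen assumed frame counts, and deduplication by tuple are illustrative assumptions, not the patent's code:

```python
import math

def enumerate_matchings(m, N, q, assumed_frame_counts):
    """Enumerate candidate matching relationships f(i) = ceil(n/2m + i*n/m) - dt."""
    N_trimmed = N - (q // 2) * 2                     # length after trimming both ends
    matchings = set()
    for n in assumed_frame_counts:                   # assumed total frame count of the video
        for dt in range(0, abs(N_trimmed - n) + 1):  # offset distances for sliding matching
            f = []
            for i in range(1, m + 1):
                idx = math.ceil(n / (2 * m) + i * n / m) - dt
                f.append(idx if 1 <= idx <= N_trimmed else None)   # empty when out of range
            matchings.add(tuple(f))                  # drop duplicate matching relationships
    return [list(f) for f in matchings]

# m = 5 extracted frames, first lighting sequence length N = 24, q = 6 frames per color.
for f in sorted(enumerate_matchings(5, 24, 6, (22, 23, 24, 25, 26)), key=str)[:3]:
    print(f)
```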
  • According to the second lighting sequence and the target candidate lighting sequence, a response map of the object to be detected is generated, where the response intensity of each pixel represents the similarity between the second lighting sequence reflected by the physical position point corresponding to that pixel and the first lighting sequence; the liveness detection result of the object to be detected is then obtained according to the response map.
  • The length of the target lighting sequence is the same as the length of the first lighting sequence, the number of target video frames extracted from the original video is the same as the number of video frames extracted from the video to be detected, and the extraction rules are the same. Therefore, based on the target lighting sequence, the original video, the target video frames, and so on, a variety of matching relationships suitable for the video to be detected and the first lighting sequence can be computed in advance, improving the efficiency of liveness detection.
  • The matching relationships include the unknown assumed total number of frames of the original video; therefore, the multiple matching relationships include relationships suited to the cases where the video to be detected has dropped or extra frames. They also include an unknown offset distance; therefore, they include relationships suited to different numbers of steps of desynchronization between the video to be detected and the first lighting sequence.
  • the frame loss or multiple frames in the video to be detected is usually caused by the presence of frame loss or multiple frames in the video frames at the beginning and end.
  • When determining the matching relationships, the lighting elements at the beginning and end of the target lighting sequence are removed. Therefore, when the candidate lighting sequence is extracted based on a matching relationship, the lighting elements at the two ends of the first lighting sequence are also excluded, so that the similarities at the two ends are excluded from the similarity calculation.
  • Compared with computing the similarity from every video frame of the video to be detected, this embodiment only needs to compute the similarity from the target number of video frames, which greatly reduces the amount of computation and shortens the time consumed by liveness detection.
  • FIG. 6 is a schematic structural diagram of a living body detection device according to an embodiment of the present application.
  • the living body detection device includes a video acquisition module 61, a response map generation module 62, and a detection result determination module 63, where:
  • the video acquisition module 61 is used to obtain the to-be-detected video of the object to be detected, where the to-be-detected video is: a video of the object to be detected collected while the object is irradiated according to the first lighting sequence;
  • the response map generation module 62 is configured to generate a response map of the object to be detected based on the first lighting sequence and the second lighting sequence reflected by each physical position point of the object to be detected represented by the to-be-detected video, where the response intensity of each pixel in the response map represents the similarity between the second lighting sequence reflected by the physical position point corresponding to the pixel and the first lighting sequence;
  • the detection result determination module 63 is configured to perform life detection on the object to be detected based on the response graph and using the first life detection model to obtain the first life detection result of the object to be detected.
  • the device embodiment is similar to the method embodiment, so the description is relatively simple. For relevant information, please refer to the method embodiment.
  • FIG. 7 is a schematic diagram of the electronic device provided by the embodiment of the present application.
  • the electronic device 100 includes a memory 110 and a processor 120; the memory 110 and the processor 120 are communicatively connected via a bus; a computer program is stored in the memory 110 and can run on the processor 120, thereby implementing the steps of the living body detection method disclosed in the embodiments of the present application.
  • Embodiments of the present application also provide a computer-readable storage medium on which a computer program/instruction is stored.
  • the computer program/instruction is executed by a processor, the living body detection method disclosed in the embodiment of the present application is implemented.
  • Embodiments of the present application also provide a computer program product, which includes a computer program/instruction. When executed by a processor, the computer program/instruction implements the living body detection method disclosed in the embodiment of the present application.
  • the embodiment of the present application also provides a computer program, which when executed can implement the living body detection method disclosed in the embodiment of the present application.
  • Those skilled in the art will understand that embodiments of the present application may be provided as methods, devices or computer program products. Therefore, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, apparatuses, electronic devices and computer program products according to embodiments of the present application. It will be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory that causes a computer or other programmable data processing terminal equipment to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction means, the The instruction means implements the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing terminal equipment, so that a series of operating steps are executed on the computer or other programmable terminal equipment to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

一种活体检测方法、电子设备、存储介质及程序产品。方法包括：获取待检测对象的待检测视频，待检测视频为：在按照第一光照序列照射待检测对象期间，所采集的待检测对象的视频(S11)；根据第一光照序列，和待检测视频所表征的待检测对象的各个实体位置点反射的第二光照序列，生成待检测对象的响应图(S12)；基于响应图、利用第一活体检测模型对待检测对象进行活体检测，得到待检测对象的第一活体检测结果(S13)。

Description

一种活体检测方法、电子设备、存储介质及程序产品
本申请要求在2022年5月16日提交中国专利局、申请号为202210528092.5、发明名称为“一种活体检测方法、电子设备、存储介质及程序产品”的中国专利申请的优先权,以及,在2023年3月24日提交中国专利局、申请号为202310306903.1、发明名称为“活体检测方法、电子设备、存储介质及程序产品”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及数据处理技术领域,特别是涉及一种活体检测方法、电子设备、存储介质及程序产品。
背景技术
活体检测技术日趋成熟,相关的炫彩活体检测技术主要包括两个部分:一是打光序列检验,对比发出的光照序列和视频中待检测对象呈现的反射光的序列,判断是否存在摄像头劫持;二是通过采集的待检测对象的视频或则和图像的活体检测方法,检测得到的炫彩视频中是否包含普通活体攻击行为,如屏幕翻拍、打印纸翻拍等。
现有的炫彩活体检测技术通常是将上述两个部分分裂开进行检测的,也即,分别采用不同的模型或者算法实现上述两部分的检测,再结合两部分检测结果得到最终的活体检测结果。因此,现有的炫彩光活体检测技术还有待提升。
发明内容
鉴于上述问题,本申请实施例提供了一种活体检测方法、电子设备、存储介质及程序产品,以便克服上述问题或者至少部分地解决上述问题。
本申请实施例的第一方面,提供了一种活体检测方法,包括:
获取待检测对象的待检测视频,所述待检测视频为:在按照第一光照序列照射所述待检测对象期间,所采集的所述待检测对象的视频;
根据所述第一光照序列,和所述待检测视频所表征的所述待检测对象的各个实体位置点反射的第二光照序列,生成所述待检测对象的响应图,所述响应图中各像素点的响应强度表征:所述像素点所对应的实体位置点反射的第二光照序列与所述第一光照序列之间的相似度;
基于所述响应图、利用第一活体检测模型对所述待检测对象进行活体检测,得到所述待检测对象的第一活体检测结果。
本申请实施例的第二方面,提供了一种电子设备,包括存储器、处理器及存储在所述存储器上的计算机程序,所述处理器执行所述计算机程序以实现如第一方面所述的活体检测方法。
本申请实施例的第三方面,提供了一种计算机可读存储介质,其上存储有计算机程序/指令,该计算机程序/指令被处理器执行时实现如第一方面所述的活体检测方法。
本申请实施例的第四方面,提供了一种计算机程序产品,包括计算机程序/指令,该计算机程序/指令被处理器执行时实现如第一方面所述的活体检测方法。
本申请实施例包括以下优点:
本实施例中,待检测对象的响应图具体到了像素点级别,根据响应图的各像素点表征的待检测对象的实体位置点反射的第二光照序列与第一光照序列之间的相似度,可以得到待检测对象各个实体位置区域的呈现出的不同反射模式。活体人脸凹凸不平,翻拍采用的屏幕或打印纸比较平滑且反射率高,因此活体人脸和翻拍反射的光具有不同的模式,因此,可以基于待检测对象的响应图,区分待检测对象为翻拍还是活体人脸,从而得到待检测对象的第一活体检测结果。也即,在炫彩活体检测方法中,引入反射光这一特征,可以在炫彩活体检测的基础上进一步实现翻拍攻击检测,有效提高了活体检测的精度和检出能力。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一实施例的一种活体检测方法的步骤流程图;
图2是本申请一实施例的一种得到第一活体检测结果的流程示意图;
图3是本申请一实施例的一种活体检测的流程示意图;
图4是本申请一实施例的一种活体检测方法的步骤流程图;
图5是本申请一实施例的一种活体检测方法的流程示意图;
图6是本申请一实施例的一种活体检测装置的结构示意图;
图7是本申请一实施例的一种电子设备的示意图。
具体实施方式
为使本申请的上述目的、特征和优点能够更加明显易懂,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
近年来,基于人工智能的计算机视觉、深度学习、机器学习、图像处理、图像识别等技术研究取得了重要进展。人工智能(Artificial Intelligence,AI)是研究、开发用于模拟、延伸人的智能的理论、方法、技术及应用系统的新兴科学技术。人工智能学科是一门综合性学科,涉及芯片、大数据、云计算、物联网、分布式存储、深度学习、机器学习、神经网络等诸多技术种类。计算机视觉作为人工智能的一个重要分支,具体是让机器识别世界,计算机视觉技术通常包括人脸识别、活体检测、指纹识别与防伪验证、生物特征识别、人脸检测、行人检测、目标检测、行人识别、图像处理、图像识别、图像语义理解、图像检索、文字识别、视频处理、视频内容识别、三维重建、虚拟现实、增强现实、同步定位与地图构建(SLAM)、计算摄影、机器人导航与定位等技术。随着人工智能技术的研究和进步,该项技术在众多领域展开了应用,例如安全防控、城市管理、交通管理、楼宇管理、园区管理、人脸通行、人脸考勤、物流管理、仓储管理、机器人、智能营销、计算摄影、手机影像、云服务、智能家居、穿戴设备、无人驾驶、自动驾驶、智能医疗、人脸支付、人脸解锁、指纹解锁、人证核验、智慧屏、智能电视、摄像机、移动互联网、网络直播、美颜、美妆、医疗美容、智能测温等领域。
相关技术中,在进行活体检测时,需要通过对比发出的光照序列和采集到的视频反射的光照序列之间的相似度,判断是否存在摄像头劫持;但是仅仅基于光照序列之间的相似度进行判断,可能存在打印的人脸反射光照等情况,打印的人脸反射的光照序列和发出的光照序列也具有较高的相似度。因此,相关活体检测技术还需要基于采集到的视频判断是否包含活体攻击行为,例如判断是否为屏幕翻拍、打印纸翻拍等。
但是,相关活体检测技术是将相似度检测以及活体攻击行为检测割裂开的,本申请人提出,可以根据光照射在真人脸上和照射在屏幕/打印纸上,反射的光具有不同的反射模式这一信息,将相似度检测以及活体攻击行为检测两部分相结合,在考虑相似度的同时,还考虑了真人和屏幕/打印纸的反射光的区别。因为人脸凹凸不平,不同区域反射的光不同,且人脸之后的背景区域往往因为接受到的光照强度较低,只能反射微弱的光或者甚至不能反射光;而通过屏幕/打印纸等进行翻拍,屏幕/打印纸等比较光滑且平整,因此 反射的光可能比较均匀、有规律。因此,本申请人想到,可以利用活体人脸与屏幕/打印纸具有不同的反射模式这一信息,来提高炫彩活体检测的准确性。
参照图1所示,示出了本申请实施例中一种活体检测方法的步骤流程图,如图1所示,该活体检测方法可以应用于后台服务器,包括以下步骤:
步骤S11:获取待检测对象的待检测视频,所述待检测视频为:在按照第一光照序列照射所述待检测对象期间,所采集的所述待检测对象的视频;
步骤S12:根据所述第一光照序列,和所述待检测视频所表征的所述待检测对象的各个实体位置点反射的第二光照序列,生成所述待检测对象的响应图,所述响应图中各像素点的响应强度表征:所述像素点所对应的实体位置点反射的第二光照序列与所述第一光照序列之间的相似度;
步骤S13:基于所述响应图、利用第一活体检测模型对所述待检测对象进行活体检测,得到所述待检测对象的第一活体检测结果;
在具体实施时,可以由后台服务器将第一光照序列下发给终端。终端按照第一光照序列发出光照照射待检测对象,并在按照第一光照序列发出光照照射待检测对象期间,采集待检测对象的待检测视频。其中,待检测对象为终端的摄像头采集到的对象。
可选的,在一些具体实施方式中,本申请方案也可以由终端等电子设备执行,如在进行活体检测时,通过电子设备自身生成第一光照序列,并按照第一光照序列照射待检测对象期间采集待检测对象的视频;然后由电子设备自身根据第一光照序列和待检测视频执行后续的活体检测流程。具体第一光照序列由后台服务端下发、还是由终端等电子设备自己生成,以及,具体活体检测流程是由后台服务端执行还是由终端等电子设备自己执行,甚至活体检测流程中的部分步骤由终端等电子设备执行、部分步骤由后台服务器执行等均可以根据实际需求进行设置,本申请实施例并不对此进行限定,也不再对可能的实施方式一一列举。
对待检测视频的每一视频帧进行处理,确定待检测对象的实体位置点在每个视频帧中对应的像素点,获取对应的像素点在每个视频帧中反射的光照,可以得到待检测对象的每个实体位置点反射的第二光照序列。待检测对象的一个待检测位置点是指待检测对象实际存在的一个点,例如,待检测对象为人脸时,待检测对象的一个实体位置点可以是人脸鼻子上的一个点,该点的大小为视频中一个像素点表征的大小。
计算第一光照序列和待检测对象的每个实体位置点反射的第二光照序列之间的相似度,将相似度作为响应图的响应强度,可以生成待检测对象的响应图。可选地,在光照序列为白光时,第一光照序列和第二光照序列中每个元素可以表征白光的光照强度;在光照序列为彩光时,第一光照序列和第 二光照序列中每个元素可以表征彩光的光照强度和/或彩光的颜色。其中,第一光照序列和第二光照序列之间的相似度,可以通过两个序列的点积得到。因为待检测对象的响应图是细化到像素点级别的,因此通过待检测对象的响应图,可以得到待检测对象的不同位置区域的呈现出的不同反射模式。
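As noted above, the per-pixel similarity can be taken as the dot product of the two sequences; a minimal NumPy sketch of building a response map this way is given below (the array shapes and values are illustrative placeholders, not the patent's implementation):

```python
import numpy as np

def response_map(first_seq, pixel_seqs):
    """Per-pixel similarity (dot product) between the first and second lighting sequences.

    first_seq: shape (T,); pixel_seqs: shape (H, W, T), the second lighting sequence
    reflected by the physical position point behind each pixel.
    """
    return pixel_seqs @ first_seq            # contracts the time axis -> (H, W)

first_seq = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
pixel_seqs = np.random.rand(4, 4, 5)         # toy 4x4 region, 5 video frames
print(response_map(first_seq, pixel_seqs).shape)   # -> (4, 4)
```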
将待检测对象的响应图输入第一活体检测模型,第一活体检测模型根据待检测对象的响应图对待检测对象进行活体检测,可以得到待检测对象的第一活体检测结果。第一活体检测模型为通过有监督训练,学习了活体人脸的响应图的第一图像特征的模型,可以区分活体人脸的响应图和其它攻击(例如:翻拍攻击)的响应图,因此,第一活体检测模型可以通过待检测对象的响应图,得到待检测对象的第一活体检测结果。其中,第一活体检测模型进行的有监督训练可以为:获取多个样本对象(包括活体人脸以及其它对象)的响应图,将样本对象的响应图输入待训练的第一活体检测模型,得到样本对象为活体的预测概率;根据预测概率和样本对象真实是否为活体,建立损失函数,基于损失函数对待训练的第一活体检测模型的模型参数进行更新,得到第一活体检测模型。如此,第一活体检测模型可以学习到活体人脸的响应图。其中,获取样本对象的响应图的方法可以参照获取待检测对象的响应图的方法。
可选地,所述方法还包括:将所述待检测视频输入第二活体检测模型,得到所述待检测对象的第二活体检测结果;根据所述第一活体检测结果和所述第二活体检测结果,确定所述待检测对象的最终活体检测结果。
为了可以同时检出较多攻击类型的攻击,提高活体检测结果的准确性,还可以采用第二活体检测模型通过待检测视频,得到待检测对象的第二活体检测结果。其中,第二活体检测模型可以为相关技术中用于活体检测的常用模型,第二活体检测模型可以根据输入的视频或视频帧,进行活体检测。可选的,在具体实施时,第二活体检测模型可以为任意类型的活体检测模型,如面具攻击活体检测模型、动作检测活体模型等等;针对某些活体检测模型,可能需要用户执行相应的动作,因此,为了能够基于待检测视频得到第一活体检测结果和第二活体检测结果,在进行视频录制时,可根据第二活体检测模型的需求指示用户执行相应的动作。在具体应用场景中,可根据具体应用场景的实际需求进行设置,本申请实施例并不对此进行限定。
可选地,可以是第一活体检测模型和第二活体检测模型并行工作,并行得到第一活体检测结果和第二活体检测结果;也可以是第一活体检测模型先得到第一活体检测结果,然后第二活体检测模型得到第二活体检测结果;还可以是第二活体检测模型先得到第二活体检测结果,然后第一活体检测模型得到第一活体检测结果。
综合第一活体检测结果和第二活体检测结果,可以得到待检测对象的最 终活体检测结果。可选地,第一活体检测结果和第二活体检测结果可以分别为表征待检测对象是活体人脸的概率。通过比较第一活体检测结果表征的概率和第二活体检测结果表征的概率中的较小值,和活体阈值的大小关系;在该较小值大于活体阈值的情况下,确定待检测对象的最终活体检测结果为:是活体;在该较小值不大于活体阈值的情况下,确定待检测对象的最终活体检测结果为:不是活体。其中,活体阈值可以是预先设置的比较合理的值。可选地,还可以为第一活体检测结果和第二活体检测结果设置不同的权重,综合加权后的第一活体检测结果和加权后的第二活体检测结果,得到待检测对象的最终活体检测结果。
如此,可以在保证真人被判定为活体的概率不降低的情况下,有效提高检出其他攻击的能力。
采用本申请实施例的技术方案,待检测对象的响应图具体到了像素点级别,根据响应图的各像素点表征的待检测对象的实体位置点反射的第二光照序列与第一光照序列之间的相似度,可以得到待检测对象各个实体位置区域的呈现出的不同反射模式。活体人脸凹凸不平,翻拍采用的屏幕或打印纸比较平滑且反射率高,因此活体人脸和翻拍反射的光具有不同的模式,因此,第一活体检测模型可以基于待检测对象的响应图,区分待检测对象为翻拍还是活体人脸,从而得到待检测对象的第一活体检测结果。如此,第一活体检测模型利用了翻拍和活体人脸具有不同反射模式这一信息,实现了两种检测方式(打光序列检验和检测是否为翻拍)的结合,确定的第一活体检测结果更加准确。此外,第二活体检测模型还可以得到待检测对象的第二活体检测结果,综合第一活体检测结果和第二活体检测结果确定的待检测对象的最终活体检测结果,可以在保证真人被判定为活体的概率不降低的情况下,有效提高检出其他攻击的能力,因此有效提升最终活体检测结果的准确性。
考虑到还存在戴面具之类的攻击,因为面具结构和人脸结构类似,因此反射的光照、生成的响应图都类似,而仅仅学习了活体人脸的响应图的第一图像特征的第一活体检测模型,可能难以识别这种攻击。因此,在对第一活体检测模型进行有监督训练时,还可以让第一活体检测模型学习活体人脸的第二图像特征。可选地,第一活体检测模型进行的有监督训练可以为:获取多个样本对象(包括活体人脸以及其它对象)的响应图,以及采集的每个样本对象的视频,从该视频中提取任一个视频帧,待训练的第一活体检测模型基于该响应图和该视频帧,以及样本对象真实是否为活体的信息,进行有监督训练。在第一活体检测模型的模型结构只能允许一个输入的情况下,可以提取该响应图的第一图像特征,以及该视频帧的第二图像特征;将该响应图的第一图像特征,以及该视频帧的第二图像特征进行融合,并将融合后特征输入待训练的第一活体检测模型,得到样本对象的第一活体检测结果。
相应地,学习了活体人脸的图像特征的第一活体检测模型,可以根据融合图像特征得到待检测对象的第一活体检测结果。其中,融合图像特征的获取方法为:提取所述响应图的第一图像特征;提取所述待检测视频的任一个视频帧,并获取该视频帧的第二图像特征;融合所述第一图像特征以及所述第二图像特征,得到融合图像特征。
将待检测对象的响应图的图像特征,与从待检测对象的视频中提取的任一个视频帧的图像特征进行融合得到的特征。
采用本申请实施例的技术方案,既学习了活体人脸的响应图,又学习了该活体人脸的图像特征的第一活体检测模型,可以避免戴面具之类的攻击,从而提高第一活体检测的准确性。
在上述技术方案的基础上,所述待检测视频所表征的所述待检测对象的每个实体位置点反射的第二光照序列,是按照以下步骤得到的:提取所述待检测视频的多个视频帧;对所述多个视频帧中描述所述待检测对象的同一实体位置点的像素点进行对齐;针对所述待检测对象的每个实体位置点,根据所述多个视频帧中各视频帧描述该实体位置点的像素点所反射的光照,得到该实体位置点所反射的第二光照序列。
待检测视频具有多个视频帧,待检测对象的一个实体位置点在每个视频帧中的位置可能都不同,为了获取待检测对象的每个实体位置点反射的第二光照序列,需要对多个视频帧进行对齐,以得到待检测对象的任一实体位置点在每个视频帧中对应的像素点。根据待检测对象的任一实体位置点在每个视频帧中对应的像素点反射的光照,以及每个视频帧之间的时序,则可以得到待检测对象的每个实体位置点反射的第二光照序列。
例如,光照为红光,总共有5个视频帧,待检测对象的一个实体位置点在每个视频帧中的像素点分别为A、B、C、D、E,若A、C、D点都反射了红光,而B、E点没有反射红光,则待检测对象的该实体位置点反射的红光的第二实体序列可以为10110。可以理解的是,根据反射的光的强度,第二实体序列中的数字还可以为0到1之间的数,数值大小和反射的光的强度相关。
采用本申请实施例的技术方案,可以解决待检测对象在各个视频帧中的位置不同,难以获取到待检测对象的每个实体位置点反射的第二光照序列的问题。
可以基于人脸关键点实现多个视频帧的对齐处理。首先对待检测视频的多个视频帧中的各视频帧分别进行人脸关键点检测,得到多个视频帧各自包含的人脸关键点。本申请对人脸关键点检测的具体检测方法不做要求,可以采用相关的人脸关键点检测算法或软件;还可以采用其他耗时更短的方法,如,在得到第一个视频帧中的人脸关键点后,以左右眼眼角、鼻尖、嘴唇左 右嘴角5个点位作为锚定点,将其余视频帧通过薄板样条插值算法(Thin Plate Spline)映射到模板上。其中,可以根据人脸关键点的输入点位自定义锚定点,只需保证在粗对齐后某些关键点所在的位置是固定的。锚定点数量越多,对齐效果越好,但5个点位足以满足要求。可以理解的是,当检测不到人脸时,可以直接将待检测对象确定为非活体。
根据各视频帧所包含的人脸关键点,可以对各视频帧进行人脸关键点级别的对齐处理。进一步地,对进行人脸关键点级别的对齐处理后的多个视频帧,进行像素点级别的对齐处理。
可选地,也可以直接对多个视频帧进行像素点级别的对齐处理。
直接进行像素点级别的对齐处理,可以使对齐的流程比较简单;先进行人脸关键点级别的对齐处理,再进行像素点级别的对齐处理,可以节省进行像素点级别的对齐处理时消耗的计算资源。具体怎样实现进行像素点级别的对齐处理,可以根据实际需求进行选择。
可选地,可以通过如下过程实现像素点级别的细对齐:分别计算参与所述像素点级别的对齐处理的多个视频帧中除参考视频帧外的其它每个视频帧,与所述参考视频帧之间的稠密光流数据,所述参考视频帧为参与所述像素点级别的对齐处理的多个视频帧中的任一个视频帧;根据所述其它每个视频帧与所述参考视频帧之间的稠密光流数据,将所述其它每个视频帧与所述参考视频帧进行像素点级别的对齐处理。
可以以第一个视频帧为参考视频帧,在亮度通道计算其他每一个视频帧与参考视频帧之间的稠密光流数据,以稠密光流数据作为映射依据,将其他每一个视频帧与参考视频帧进行对像素点级别的细对齐。其中,可以基于稠密光流算法实现像素点级别的对齐处理,如Gunnar Farneback算法(一种稠密光流算法)。
如此,实现了像素点级别的细对齐,才能准确得到待检测对象每个实体位置点反射的第二光照序列,进而生成准确的响应图,得到准确的第一活体检测结果。
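A minimal sketch of this pixel-level fine alignment with OpenCV's Farneback dense optical flow is given below; the Farneback parameters are common default-style values chosen for illustration, not values prescribed by this application:

```python
import cv2
import numpy as np

def align_to_reference(reference_gray, frame_gray, frame_color):
    """Warp frame_color onto reference_gray using Farneback dense optical flow.

    reference_gray / frame_gray: 8-bit single-channel (luminance) images of equal size.
    """
    flow = cv2.calcOpticalFlowFarneback(reference_gray, frame_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = reference_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)   # sample positions in the frame
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(frame_color, map_x, map_y, cv2.INTER_LINEAR)
```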
图2示出了得到第一活体检测结果的流程示意图,得到第一活体检测结果的步骤可以包括:人脸关键点级别的对齐处理、像素点级别的对齐处理、得到第二光照序列、相似度计算、生成响应图、获取视频帧、输入第一活体检测模型、得到第一活体检测结果。上述几个步骤可以组成比较完整的流程,但根据实际需求,可以对其中的一个步骤或多个步骤进行舍弃,例如,可以舍弃人脸关键点级别的对齐处理的步骤,以降低整个流程的复杂度;可以舍弃获取视频帧的步骤,并相应地只将响应图输入第一活体检测模型,而对于戴面具之类的攻击的识别,则可以通过第二活体检测模型实现,以避免第一活体检测模型和第二活体检测模型之间存在重复工作;还可以同时舍弃人脸 关键点级别的对齐处理以及获取视频帧的步骤,在降低整个流程的复杂度的同时,避免一活体检测模型和第二活体检测模型之间存在重复工作。
在上述技术方案的基础上,第一光照序列和第二光照序列可以均为包含多个颜色通道的彩光序列。在计算第一光照序列和第二光照序列之间的相似度时,是计算每个颜色通道中第一彩光序列和第二彩光序列之间的相似度,进而得到该颜色通道的响应子图。
为了得到每个颜色通道反射的彩光序列,需要对第一光照序列和第二光照序列分别进行分离。分离第一光照序列,可以得到每个颜色通道所对应的第一彩光序列,以及,分离第二光照序列,可以得到每个颜色通道所对应的第二彩光序列。在分离第一光照序列和第二光照序列时,为了提高分离的准确性,可以对每个颜色通道进行正则化。可以利用公式x′t=xt-(Σxt)/n实现对每个颜色通道的正则化,其中,x′t表征正则化后的彩光序列(可以为第一彩光序列或第二彩光序列),xt表征正则化前的彩光序列,n表征彩光序列的长度。
对每个颜色通道进行正则化的方法还可以参照相关技术,本申请对此不作限制。其中,因为第一光照序列是后台下发的,因此可以直接从后台获取每个颜色通道所对应的第一彩光序列。
例如,以一秒为彩光序列之间的单位,多个颜色通道为红色通道、黄色通道、蓝色通道,发出的彩光在第一秒为红光,第二秒为红光和黄光共同组成的橙色的光,第三秒为黄光和蓝光共同组成的白色的光,第四秒为黄光,第五秒为红光,以1表征存在对应颜色通道的光,以0表征不存在对应颜色通道的光,则得到的第一彩光序列可以为:红色通道的第一彩光序列11001,黄色通道的第一彩光序列01110,蓝色通道的第一彩光序列00100。按照相似的原理,可以得到待检测对象的每个实体位置点反射的在各个颜色通道对应的第二彩光序列。可以理解的是,根据下发的光的强度,第一彩光序列中的数字可以为0到1之间的数;根据反射的光的强度,第二彩光序列中的数字可以为0到1之间的数。
对于每个颜色通道,若待检测对象的一个实体位置点所反射的该颜色通道的第二彩光序列,与该颜色通道所对应的第一彩光序列之间的相似度大于预设值,则该实体位置点在该颜色通道的响应子图中对应的像素点的颜色,为该颜色通道的颜色。在得到每个颜色通道的响应子图后,将各颜色通道的响应子图进行融合,可以得到待检测对象的响应图。
例如,待检测对象的一个实体位置点在红色通道的响应图中对应的像素点的颜色为红色(相似度大于预设值),在黄色通道的响应图中对应的像素点的颜色为黄色(相似度大于预设值),在蓝色通道的响应图中对应的像素点的颜色为无(相似度不大于预设值),则将各颜色通道的响应子图进行融 合后,得到的待检测对象的响应图中对应像素点的位置为红色和黄色组成的橙色。
如此,针对各个颜色通道,得到各个颜色通道的响应子图,再根据各个颜色通道的响应子图得到待检测对象的响应图,相比于直接根据综合所有颜色通道的彩光得到待检测对象的响应图,具有更加准确的优点。
在上述技术方案的基础上,在每个颜色通道,为了避免该颜色通道的响应子图的亮度过高给响应子图带来的不良影响,可以利用该响应子图中人脸区域的响应强度均值,对该颜色通道的响应子图进行正则化处理,以得到待检测对象在该颜色通道的正则化响应子图。融合待检测对象在每个颜色通道的正则化响应子图,得到待检测对象的响应图。
可选地,利用每个颜色通道的响应子图中人脸区域的响应强度均值,对该颜色通道的响应子图进行正则化处理可以是:将该颜色通道的响应子图的人脸区域的响应强度均值,除以该颜色通道的响应子图的每个像素点的响应强度,将得到的商作为该颜色通道的正则化响应子图的每个像素点的响应强度。
其中,可以通过如下公式计算正则化后的响应图的响应强度:
其中,F表征人脸区域,ri,j表征响应图的像素点(i,j)的响应强度;n表征像素点的数量,NF表征人脸区域的响应强度均值。
最终得到的响应图为Resp′={r′i,j},r′i,j表征经过正则化后的响应图的像素点(i,j)的响应强度,可以通过公式计算得到。
可以通过如下公式实现对各颜色通道的正则化后的响应子图的融合:
rii,j=uint8(min(max(r′i,j*255,0),255));
其中,rii,j表征融合后的响应图的像素点(i,j)的响应强度,unit8表示将浮点数转化为8位非负整数,r′i,j表征经过正则化后的响应图的像素点(i,j)的响应强度。
如此,可以使每个颜色通道的响应子图的响应强度比较均衡,避免出现过亮的场景。
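A small NumPy sketch of the clipping-and-conversion formula above; stacking the per-channel sub-maps into channels is an assumed fusion choice for illustration only:

```python
import numpy as np

def to_uint8_response(normalized_response):
    """ri_{i,j} = uint8(min(max(r'_{i,j} * 255, 0), 255)), applied to the whole map."""
    return np.clip(normalized_response * 255.0, 0, 255).astype(np.uint8)

channel_maps = [np.random.rand(112, 112) for _ in range(3)]          # normalized sub-maps (toy)
fused = np.stack([to_uint8_response(m) for m in channel_maps], -1)   # H x W x channels
print(fused.dtype, fused.shape)
```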
可选地,在上述技术方案的基础上,在得到待检测对象的响应图之后,以及在将待检测对象的响应图输入第一活体检测模型之前,可以先根据待检测对象的响应图的至少一个属性值,确定待检测对象的活体检测结果。
待检测对象的响应图的属性值包括以下至少一者:响应强度均值、质量 值。在任一属性值小于对应的属性阈值的情况下,确定待检测对象不是活体,在各属性值都不小于对应的属性阈值的情况下,将待检测对象的响应图输入第一活体检测模型。
图3示出了活体检测的流程示意图,在将待检测对象的响应图输入第一活体检测模型之前,先判断响应图的属性值是否小于对应的属性阈值。在任一属性值小于对应的属性阈值的情况下,判定待检测对象不是活体,可以直接输出结果“不是活体”的检测结果,否则才将待检测对象的响应图输入第一活体检测模型进行活体检测。
待检测对象的响应图的响应强度均值小于响应强度阈值,可以认为采集到的第二彩光序列太弱,因此该响应图不足以作为活体验证的线索,为了安全性考虑,可以认为存在攻击,直接判定待检测对象不是活体,因此也无需将待检测对象的响应图输入第一活体检测模型。
待检测对象的响应图的质量值小于质量阈值,则可能存在攻击,直接判定待检测对象不是活体,因此无需将待检测对象的响应图输入第一活体检测模型。其中,待检测对象的响应图的质量值,可以根据响应图中的噪声确定,噪声越多则质量值越低。
可以通过如下公式计算响应图的质量值:
其中,quality为质量值,ri,j表征响应图的像素点(i,j)的响应强度,t表征光照序列中的每个元素,y′t表征进行颜色通道正则化处理后的第一光照序列,x′t表征进行颜色通道正则化处理后的第二光照序列。
如此,可以避免一些攻击,提升活体检测结果的准确性。
参照图4所示,示出了本申请实施例中一种活体检测方法的步骤流程图,如图4所示,该活体检测方法可以应用于后台服务器,包括以下步骤:
步骤S41:获取待检测对象的待检测视频,所述待检测视频为:在按照第一光照序列照射所述待检测对象期间,所采集的所述待检测对象的视频;
步骤S42:根据所述第一光照序列,和所述待检测视频所表征的所述待检测对象的各个实体位置点反射的第二光照序列,生成所述待检测对象的响应图,所述响应图中各像素点的响应强度表征:所述像素点所对应的实体位置点反射的第二光照序列与所述第一光照序列之间的相似度;
步骤S43:基于所述响应图、利用活体检测模型对所述待检测对象进行活体检测,得到所述待检测对象的活体检测结果。
获取待检测对象的待检测视频,以及生成待检测对象的响应图的方法可 以参照前文所述获取待检测对象的待检测视频,以及生成待检测对象的响应图的方法;活体检测模型的训练方法可以参照第一活体检测模型的训练方法。
将待检测对象的响应图输入活体检测模型,可以得到待检测对象的活体检测结果。
采用本申请实施例的技术方案,检测对象的响应图具体到了像素点级别,根据响应图的各像素点表征的待检测对象的实体位置点反射的第二光照序列与第一光照序列之间的相似度,可以得到待检测对象各个实体位置区域的呈现出的不同反射模式。活体人脸凹凸不平,翻拍采用的屏幕或打印纸比较平滑且反射率高,因此活体人脸和翻拍反射的光具有不同的模式,因此,活体检测模型可以基于待检测对象的响应图,区分待检测对象为翻拍还是活体人脸,从而得到待检测对象的活体检测结果。如此,活体检测模型利用了翻拍和活体人脸具有不同反射模式这一信息,实现了两种检测方式(打光序列检验和检测是否为翻拍)的结合,确定的活体检测结果更加准确。
可选地,得到待检测视频所表征的所述待检测对象的各个实体位置点反射的第二光照序列的步骤,是按照以下步骤得到的:
提取所述待检测视频的多个视频帧;将所述多个视频帧中描述所述待检测对象的同一实体位置点的像素点进行对齐;针对所述待检测对象的各个实体位置点,根据所述多个视频帧中各视频帧描述该实体位置点的像素点所反射的光照,得到该实体位置点所反射的第二光照序列。
可选地,将所述多个视频帧中描述所述待检测对象的同一实体位置点的像素点进行对齐,具体包括如下过程:
针对所述多个视频帧中的各视频帧,对所述视频帧进行人脸关键点检测,得到所述视频帧所包含的人脸关键点,根据各所述视频帧所包含的人脸关键点,对所述多个视频帧进行人脸关键点级别的对齐处理,对进行人脸关键点级别的对齐处理后的所述多个视频帧进行像素点级别的对齐处理;
或者,
对所述多个视频帧进行像素点级别的对齐处理。
可选地,可通过如下过程进行所述像素点级别的对齐处理:
分别计算参与所述像素点级别的对齐处理的多个视频帧中除参考视频帧外的其它每个视频帧,与所述参考视频帧之间的稠密光流数据,所述参考视频帧为参与所述像素点级别的对齐处理的多个视频帧中的任一个视频帧;
根据所述其它每个视频帧与所述参考视频帧之间的稠密光流数据,将所述其它每个视频帧与所述参考视频帧进行像素点级别的对齐处理。
可选地,所述第一光照序列和所述第二光照序列可以均包含多个颜色通道的彩光序列;根据所述第一光照序列,和所述待检测视频所表征的所述待 检测对象的各个实体位置点反射的第二光照序列,生成所述待检测对象的响应图,包括:分离所述第一光照序列,得到每个所述颜色通道所对应的第一彩光序列,以及,分离所述第二光照序列,得到每个所述颜色通道所对应的第二彩光序列;针对每个所述颜色通道,根据所述待检测对象的每个实体位置点所反射的该颜色通道的第二彩光序列,与该颜色通道所对应的第一彩光序列之间的相似度,生成所述待检测对象在该颜色通道的响应子图;将所述待检测对象在每个所述颜色通道的响应子图进行融合处理,得到所述待检测对象的响应图。其中具体步骤,可以参照前文所述。
可选地，将所述待检测对象在每个所述颜色通道的响应子图进行融合处理，得到所述待检测对象的响应图，包括：根据所述待检测对象在每个所述颜色通道的响应子图，获取所述待检测对象的人脸区域在每个所述颜色通道的响应强度均值；根据所述待检测对象的人脸区域在每个所述颜色通道的响应强度均值，对所述待检测对象在该颜色通道的响应子图进行正则化处理，得到所述待检测对象在每个所述颜色通道的正则化响应子图；将所述待检测对象在每个所述颜色通道的正则化响应子图进行融合处理，得到所述待检测对象的响应图。
可选地，在利用第一活体检测模型对所述响应图进行处理之前，还包括：获取所述待检测对象的响应图的属性值，所述属性值包括以下至少一者：响应强度均值、质量值；根据所述待检测对象的响应图的属性值与对应的属性阈值的大小关系，确定所述待检测对象的活体检测结果；在所述待检测对象的响应图的属性值小于对应的所述属性阈值的情况下，确定所述待检测对象不是活体；在所述待检测对象的响应图的属性值不小于对应的所述属性阈值的情况下，执行基于所述响应图、利用第一活体检测模型对所述待检测对象进行活体检测的步骤。其中具体步骤，可以参照前文所述。
可选地,所述第一活体检测模型为学习了活体人脸的响应图的第一图像特征以及所述活体人脸的第二图像特征的模型;基于所述响应图、利用第一活体检测模型对所述待检测对象进行活体检测,得到所述待检测对象的第一活体检测结果,包括:提取所述响应图的第一图像特征;提取所述待检测视频的任一个视频帧,并获取该视频帧的第二图像特征;融合所述第一图像特 征以及所述第二图像特征,得到融合图像特征;利用所述第一活体检测模型对所述融合图像特征进行处理,得到所述待检测对象的第一活体检测结果。其中具体步骤,可以参照前文所述。
在进行活体检测时,需要通过对比发出的光照序列(例如:炫彩光序列)和采集到的视频中对象的反射光照序列之间的相似度,在相似度低于某一阈值时,认为采集到的视频为经摄像头劫持技术直接发送到服务器的攻击视频,而非终端采集的视频。
计算发出的光照序列与反射光照序列之间的相似度,需要保证发出的光照序列与反射光照序列之间严格对齐。然而,采集的视频可能存在丢帧、多帧或不同步的情况,导致发出的光照序列与反射光照序列之间无法对齐。例如,假设发出的光照序列有24个光照元素,每个光照元素对应一个视频帧;在按照光照序列的第2个光照元素发出光照射待检测对象时才开始视频采集,会导致采集的视频只有23个视频帧,则采集的视频存在丢帧的情况;在还未发出光照射待检测对象时就已经开始视频采集,光照序列的第1个光照元素对应视频的第2个视频帧,可能导致采集的视频有25帧,则采集的视频存在多帧的情况;在按照光照序列的第2个光照元素发出光照射待检测对象时才开始视频采集,在按照光照序列的第24个光照元素发出光照射待检测对象时采集的视频帧为倒数第二个视频帧的情况下,虽然采集的视频也有24个视频帧,但是存在不同步的问题。
在发出的光照序列与反射光照序列之间未对齐的情况下,即使并非摄像头劫持攻击,也会导致发出的光照序列和反射光照序列之间的相似度低于阈值,从而被误判为摄像头劫持攻击。
此外,对采集的视频的每一视频帧进行处理,获得的反射光照序列的长度与视频帧的帧数相等,因此,在计算反射光照序列与下发的光照序列的相似度时,计算量较大。
相关技术提出的活体检测方法,利用活体人脸与屏幕/打印纸具有不同的反射模式这一信息,来提高活体检测的准确性。具体地,人脸凹凸不平,不同区域反射的光不同,通过屏幕/打印纸等进行翻拍,屏幕/打印纸等比较光滑且平整,因此反射的光比较均匀、有规律。这种方法需要针对待检测对象的不同区域生成不同的反射的光照序列,并计算下发的光照序列的与不同的反射光照序列的相似度,进而生成响应图,并根据响应图确定活体检测结果。因为存在多个反射光照序列,因此这种活体检测方法的计算量尤其的大,获得活体检测结果的耗时较长。
为解决上述相关技术中活体检测耗时长,计算量大的问题,本实施例提供了一种活体检测方法,所述方法包括:
获取待检测视频,所述待检测视频为:在按照第一光照序列照射待检测 对象期间,所采集的待检测对象的视频;
从所述待检测视频中抽取出目标帧数的视频帧序列,并根据所述视频帧序列中的每个视频帧确定第二光照序列,所述第二光照序列表征所述待检测对象的各个实体位置点的反射光;
根据所述视频帧序列中各个视频帧的序号信息和所述第一光照序列中各个光照元素的索引信息,从所述第一光照序列中提取出多种候选光照序列,其中,各所述候选光照序列中的各个元素的索引信息与所述视频帧序列中各个视频帧的序号信息满足不同的匹配关系;
根据所述多种候选光照序列与所述第二光照序列,获取所述待检测对象的活体检测结果。
本实施例中,第二光照序列表征视频帧序列中的每个视频帧中待检测对象的各个实体位置点的反射光,因此第二光照序列的长度与视频帧序列的长度相同,且小于待检测视频的原始视频帧的数量。根据视频帧序列中各个视频帧的序号信息和第一光照序列中各个光照元素的索引信息,从第一光照序列中提取出多种候选光照序列,因此,候选光照序列的长度也与视频帧序列的长度相同。因此,相较于基于待检测视频的每一视频帧中待检测对象的各个实体位置点的反射光的序列和第一光照序列,获取待检测对象的活体检测结果,本实施例中根据第二光照序列和候选光照序列获取待检测对象的活体检测结果,具有计算量小、耗时较短的优点。此外,因为候选光照序列与第二光照序列的长度相同,因此,确定的待检测对象的活体检测结果更加准确。
聚焦于视频解帧采样和炫彩光序匹配,可作为可选模块置于炫彩响应图算法之前,如下实施例为具体说明。
可选地,在根据所述第一光照序列,和所述待检测视频所表征的所述待检测对象的各个实体位置点反射的第二光照序列,生成所述待检测对象的响应图之前,还包括:
从所述待检测视频中抽取出目标帧数的视频帧序列,并根据所述视频帧序列中的每个视频帧确定第二光照序列,所述第二光照序列表征所述待检测对象的各个实体位置点的反射光;
根据所述视频帧序列中各个视频帧的序号信息和所述第一光照序列中各个光照元素的索引信息,从所述第一光照序列中提取出多种候选光照序列,其中,各所述候选光照序列中的各个元素的索引信息与所述视频帧序列中各个视频帧的序号信息满足不同的匹配关系;
分别获取各个所述候选光照序列与所述第二光照序列的相似度值;
根据各个所述候选光照序列与所述第二光照序列的相似度值,确定目标候选光照序列;
根据所述第一光照序列,和所述待检测视频所表征的所述待检测对象的 各个实体位置点反射的第二光照序列,生成所述待检测对象的响应图,包括:
根据所述第二光照序列和所述目标候选光照序列,生成所述待检测对象的响应图。
终端按照第一光照序列发出不同颜色和/或不同光照强度的光照射待检测对象,并在按照第一光照序列发出光照照射待检测对象期间,采集待检测对象的待检测视频。其中,待检测对象为终端的摄像头采集到的对象。第一光照序列可以是终端本身生成的光照序列,也可以是服务器下发给终端的光照序列。
从待检测视频中提取出目标帧数的视频帧,并根据目标帧数的视频帧生成视频帧序列。待检测视频包括多个原始视频帧,提取的目标帧数的视频帧为原始视频帧中的部分视频帧,提取的视频帧的目标帧数少于原始视频帧的帧数。视频帧序列中各个视频帧的先后顺序遵循各个视频帧在待检测视频中的先后顺序。
从待检测视频中提取目标帧数的视频帧,提取规则可以根据需求进行设置,可以是均匀提取也可以为非均匀提取。目标帧数可以是预先配置好的确定的帧数,但因为待检测视频可能存在丢帧或多帧的情况,因此即使是按照相同规则进行视频帧提取,提取出的每一视频帧对应的原始视频帧的帧序号也可能不同。例如,提取规则都为均匀提取,目标帧数为8,在待检测视频包括的原始视频帧的总帧数为24帧的情况下,提取的第4个视频帧可能为原始视频帧中的第14个视频帧,在待检测视频包括的原始视频帧的总帧数为23帧的情况下,提取的第4个视频帧可能为原始视频帧中的第13个视频帧。
对视频帧序列中的每个视频帧进行处理,确定待检测对象的实体位置点在视频帧序列中的每个视频帧中对应的像素点,获取各个实体位置点在每个视频帧中对应的像素点的像素值,从而得到各个实体位置点在视频帧序列中的每个视频帧中的反射光,进而得到第二光照序列。待检测对象的一个实体位置点是指待检测对象实际存在的一个点,例如,待检测对象为人脸时,待检测对象的一个实体位置点可以是人脸鼻子上的一个点,该点的大小为视频中一个像素点表征的大小。因为第二光照序列中的每个光照元素表征在一个视频帧中的反射光,因此,第二光照序列的长度与视频帧序列的长度相同。
光照元素为光照序列中的元素,可以表征光的颜色、强度等信息。例如,终端依次发出了红光、蓝光和绿光,对应的光照序列可以为(红、蓝、绿),则该光照序列中的“红”可以为一个光照元素,该光照元素表征了光的颜色。
在待检测视频不存在丢帧、多帧或不同步的问题时,第二光照序列中光照元素与第一光照序列中光照元素的匹配关系,符合视频帧序列中每个视频帧与待检测视频的原始视频帧的对应关系。例如,视频帧序列中第2个视频 帧为待检测视频的第5个原始视频帧,则第二光照序列中第2个光照元素与第一光照序列中的第5个光照元素相匹配。
当待检测视频存在丢帧、多帧或不同步的问题时,第二光照序列中光照元素与第一光照序列中光照元素的匹配关系,可能不符合视频帧序列中每个视频帧与待检测视频的原始视频帧的对应关系。例如,视频帧序列中第2个视频帧为原始视频帧中的第5个视频帧,因为待检测视频存在丢帧、多帧或不同步的问题,导致待检测视频的视频帧与第一光照序列之间存在一步的偏移距离,导致待检测视频中第5个视频帧反射的光对应第一光照序列的第6个光照元素,因此,第二光照序列中的第2个光照元素应该与第一光照序列中的第6个光照元素相匹配。
根据上述例子可以看出,第二光照序列中的各个光照元素分别应该与第一光照序列中的哪个光照元素相匹配,一方面取决于提取的视频帧与原始视频帧的对应关系,另一方面取决于待检测视频反射的光的序列和第一光照序列之间的偏移距离。
可以获取多种匹配关系,每种匹配关系对应一种对应关系和一种偏移距离,对应关系为视频帧序列中的每个视频帧的序号与原始视频帧的帧序号之间的对应关系,偏移距离表征待检测视频反射的光的序列与第一光照序列之间进行滑动匹配的偏移距离。对应关系与原始视频帧的总帧数和从待检测视频中提取视频帧序列的提取规则有关,偏移距离与待检测视频是否存在丢帧、多帧或不同步等问题有关。
匹配关系表征视频帧序列中各个视频帧的序号信息与每种候选光照序列中各个光照元素在第一光照序列中的索引信息的一一匹配关系,因此,根据视频帧序列中各个视频帧的序号信息和第一光照序列中各个光照元素的索引信息,可以从第一光照序列中提取出的多种候选光照序列,每种候选光照序列的长度,与视频帧序列的长度相同。在一种匹配关系表征视频帧序列中第1个视频帧与一种候选光照序列中第1个光照元素在第一光照序列中的索引为5的情况下,则将第一光照序列中的第5个光照元素作为候选光照序列的第1个光照元素,以此类推,得到该种候选光照序列中的每个光照元素。
每种匹配关系对应一种对应关系和一种偏移距离,在进行活体检测时目标帧数和提取规则是确定的,原始视频帧的总帧数是不确定的,因此对应关系中包括不确定的原始视频帧的总帧数;因为偏移距离也是不确定的,因此每种匹配关系中可以包括原始视频帧的总帧数和偏移距离两个未知数。可以通过穷举原始视频帧的总帧数和偏移距离两个未知数可能的取值,从而得到多种匹配关系。例如,第一光照序列的长度为24,因为待检测视频通常只会存在一两帧的多帧或少帧,因此待检测视频的总帧数的取值可以为22、23、24、25或26。偏移距离的最小值为0,最大值将在后文详述。
可以理解的是,多种匹配关系中,至多只有一种匹配关系为真实的匹配关系。因此,可以分别多种候选光照序列分别与第二光照序列的相似度值,根据各个候选光照序列与第二光照序列的相似度值,从多种候选光照序列中筛选出目标候选光照序列。在一些实施方式中,可以是将与第二光照序列之间的相似度值最大的候选光照序列,确定为目标候选光照序列。
根据第二光照序列和目标候选光照序列,可以生成待检测对象的响应图,进而根据响应图,确定待检测对象的活体检测结果。其中,响应图中各像素点的响应强度表征:像素点所对应的实体位置点反射的第二光照序列与第一光照序列之间的相似度。
将待检测对象的响应图输入活体检测模型,活体检测模型根据待检测对象的响应图对待检测对象进行活体检测,可以得到待检测对象的活体检测结果。活体检测模型为通过有监督训练,学习了活体人脸的响应图的图像特征的模型,可以区分活体人脸的响应图和其它攻击(例如:翻拍攻击)的响应图,因此,活体检测模型可以通过待检测对象的响应图,得到待检测对象的活体检测结果。其中,活体检测模型进行的有监督训练可以为:获取多个样本对象(包括活体人脸以及其它对象)的响应图,将样本对象的响应图输入待训练的活体检测模型,得到样本对象为活体的预测概率;根据预测概率和样本对象真实是否为活体,建立损失函数,基于损失函数对待训练的活体检测模型的模型参数进行更新,得到活体检测模型。如此,活体检测模型可以学习到活体人脸的响应图。其中,获取样本对象的响应图的方法可以参照获取待检测对象的响应图的方法。
可以理解的是,与第二光照序列之间的相似度值最高的候选光照序列,为真实的匹配关系对应的候选光照序列。因此,根据相似度值筛选出目标候选光照序列中的每个光照元素,与第二光照序列中的每个光照元素一一对应。因此,根据第二光照序列和目标候选光照序列,获取的待检测对象的活体检测结果更加准确。
采用本申请实施例的技术方案,第二光照序列表征视频帧序列中的每个视频帧中待检测对象的各个实体位置点的反射光,因此第二光照序列的长度与视频帧序列的长度相同,且小于待检测视频的原始视频帧的数量。根据视频帧序列中各个视频帧的序号信息和第一光照序列中各个光照元素的索引信息,从第一光照序列中提取出多种候选光照序列,因此,候选光照序列的长度也与视频帧序列的长度相同。因此,相较于基于待检测视频的每一视频帧中待检测对象的各个实体位置点的反射光的序列和第一光照序列,获取待检测对象的活体检测结果,本实施例中根据第二光照序列和候选光照序列获取待检测对象的活体检测结果,具有计算量小、耗时较短的优点。此外,因为候选光照序列与第二光照序列的长度相同,因此,确定的待检测对象的活 体检测结果更加准确。
在上述技术方案的基础上,匹配关系可以是基于目标视频和目标光照序列确定的。其中,目标视频可以为待检测视频,目标光照序列为第一光照序列;或者,目标视频为样本视频,目标光照序列为与第一光照序列长度相同的样本光照序列。
可以是在获取了待检测视频和第一光照序列之后,根据待检测视频和第一光照序列确定多种匹配关系。
也可以是预先根据样本视频和与第一光照序列长度相同的样本光照序列确定多种匹配关系,并在进行活体检测时,直接根据预先生成的多种匹配关系获取多种候选光照序列。样本光照序列可以为任意光照序列。第一光照序列是按照下发策略下发的光照序列,因此,在下发第一光照序列之前即可获取到第一光照序列的长度。样本视频可以为任意视频,因为样本视频的视频总帧数可以为待检测视频的假设总帧数。
预先根据样本视频和与第一光照序列长度相同的样本光照序列确定多种匹配关系,在进行活体检测时直接使用多种匹配关系,可以缩短获取到活体检测结果的时间,保证活体检测结果的实时性,提高了活体检测的效率。
在上述技术方案的基础上,多种匹配关系可以是通过如下所述的过程确定的。
可选地,所述匹配关系是通过如下过程确定的:
获取所述目标视频以及所述目标光照序列,所述目标视频包括多个原始视频帧;
确定从所述目标视频中抽取出的目标视频帧的目标序号,与所述目标视频的原始视频帧的帧序号之间的对应关系;
将所述原始视频帧与所述目标光照序列进行滑动匹配,并确定多个偏移距离;
根据所述对应关系和所述多个偏移距离,确定多种匹配关系。
获取从目标视频中提取出目标帧数的目标视频帧;在目标视频为待检测视频的情况下,目标视频帧为视频帧序列中的视频帧。在目标视频为样本视频的情况下,目标视频帧为从样本视频总提取出的视频帧,且从目标视频中提取目标视频帧时采用的提取规则,与从待检测视频中提取出视频帧采用的提取规则相同。
在提取规则、目标帧数和目标视频的原始视频帧的总帧数都确定的情况下,可以确定目标视频帧的目标序号,与原始视频帧的帧序号之间的对应关系。
可选地,确定从所述目标视频中抽取出的目标视频帧的序号,与所述目标视频的原始视频帧的帧序号之间的对应关系,包括:
获取所述目标视频的多个假设总帧数;
针对每个假设总帧数,确定所述目标视频帧的目标序号,与所述原始视频帧的帧序号之间的对应关系;
所述根据所述对应关系和所述多个偏移距离,确定多种匹配关系,包括:
根据多种所述对应关系和所述多个偏移距离,确定多种匹配关系。
然而,实际情况中,原始视频帧的总帧数通常是不确定的,因此,在提取规则和目标帧数都确定,而原始视频帧的总帧数不确定的情况下,可以通过对原始视频帧的总帧数取不同值,得到目标视频帧的目标序号与原始视频帧的帧序号之间的多种对应关系。可以获取目标视频的多个假设总帧数,针对每个假设总帧数,确定目标视频帧的目标序号,与原始视频帧的帧序号之间的对应关系。目标视频帧的目标序号表征:目标视频帧处于多个目标视频帧中的顺序。
如此,可以使后续得到的多种匹配关系中之间,包含对应不同对应关系的匹配关系。进而在各个待检测视频包含的原始视频帧的数量不同的情况下,多种匹配关系中至少存在与每一待检测视频包含的原始视频帧的真实数量适配的匹配关系。
在得到目标视频帧的目标序号与原始视频帧的帧序号之间的对应关系之后,可以将原始视频帧与目标光照序列进行滑动匹配,并确定多个偏移距离。此处将原始视频帧与目标光照序列进行滑动匹配,并非为了判断原始视频帧与目标光照序列在什么偏移距离下能匹配上,而是为了确定偏移距离的取值范围。
在确定了多种对应关系和偏移距离的取值范围的情况下,则可以遍历多种对应关系和多个偏移距离,从而确定多种匹配关系。后文将详述如何根据对应关系和偏移距离,确定多种匹配关系。此处将多个原始视频帧视作一个序列,进行滑动匹配时,滑动窗口的长度为较短的序列的长度,滑动步长为一步,偏移距离的最小值为0,最大值为原始视频帧的总帧数与目标光照序列的长度之间的差值的绝对值。
可选地,所述将所述原始视频帧与所述目标光照序列进行滑动匹配,并确定多个偏移距离之前,所述方法还包括:
去掉所述目标光照序列的首尾两端的光照元素;
所述将所述原始视频帧与所述目标光照序列进行滑动匹配,并确定多个偏移距离,包括:
将所述原始视频帧与去掉首尾两端的光照元素后的目标光照序列进行滑动匹配,并确定多个偏移距离。
在一些实施方式中,可以去掉目标光照序列的首尾两端的光照元素,得到去掉首尾两端的光照元素的目标光照序列。每端去掉的光照元素的数量可 以根据需求进行设置。例如,若目标光照序列对应的炫彩打光数为p,每种颜色的光打q帧,则目标光照序列的长度为p×q,则每端去掉的光照元素的数量可以为q/2。去掉光照元素时,应该保证不要将整段同种颜色的光照元素去掉,例如,目标光照序列中前3个光照元素都为红光,第4个光照元素为蓝光,则去掉的光照元素的数量可以为1或2。
此种实施方式中,确定的多个偏移距离,可以是将原始样本视频帧与去掉首尾两端的光照元素的目标光照序列进行滑动匹配。
因为待检测视频的丢帧和多帧问题,都存在于待检测视频的首尾两端。因此,通过去掉目标光照序列的首尾两端的光照元素,然后将掉首尾两端的光照元素后的目标光照序列与原始视频帧进行滑动匹配,得到的匹配关系,也未考虑待检测视频和第一光照序列的首尾两端。进而,根据匹配关系获取候选光照序列,并计算候选光照序列与第一光照序列的相似度时,也未考虑首尾两端,从而解决了丢帧和多帧带来的相似度不高的问题。
可选地,根据对应关系和每个偏移距离,确定多种匹配关系,可以包括:确定在每种对应关系以及每个偏移距离下,确定与每个目标视频帧的目标序号相匹配的目标索引;基于在每种对应关系以及每个偏移距离下,与每个目标视频帧的目标序号相匹配的目标索引,得到多种匹配关系。
例如,目标视频帧的目标序号1、2、3分别对应原始视频帧的帧序号5、10、15,偏移距离为2,则与目标序号1匹配的目标索引为5-2=3,与目标序号2匹配的目标索引为8,与目标序号3匹配的目标索引为13;因此,可以确定匹配关系为:(1-3,2-7,3-13)。基于此匹配关系,在从第一光照序列中提取候选光照序列时,可以根据第一光照序列中的第3、7、13个光照元素,生成候选光照序列。
例如,目标视频帧的目标序号与原始视频帧的帧序号的对应关系为i'=3i,其中i为目标序号,i'为与目标序号i对应的帧序号。偏移距离为dt,dt的取值范围为0到3的正整数,则可以得到匹配关系r(i)=i'-dt=3i-dt。当i=1,dt=1时,可以解得r(i)=2。则在对应关系为i'=3i,偏移距离为1的情况下,目标视频帧的目标序号1匹配的目标索引为2。其中,通过代入不同的dt的取值,可以得到多个匹配关系。
如此,根据目标视频帧的序号,可以确定与目标视频帧的目标序号相匹配的光照元素在目标光照序列中的目标索引,进而得到目标视频帧的目标序号和光照元素在光照序列中的目标索引之间的匹配关系,这种匹配关系与视频帧的序号与光照元素在第一光照序列中的索引之间的匹配关系相同,因此,可以得到视频帧序列中各个视频帧的序号,与各个候选光照序列中的各个光照元素在所述第一光照序列中的索引之间的匹配关系。
可选地,所述基于在每种所述对应关系以及每个所述偏移距离下,与每 个目标序号相匹配的目标索引,得到所述多种匹配关系,包括:
在任一所述对应关系以及任一所述偏移距离下,与任一所述目标视频帧的目标序号相匹配的目标索引小于1或大于该目标索引对应的目标光照序列的长度的情况下,将该目标视频帧的目标序号相匹配的目标索引标记为空目标索引;
根据与该目标视频帧相匹配的所述空目标索引,以及与其它目标视频帧相匹配的所述目标索引,确定该对应关系以及该偏移距离下的所述匹配关系。
在上述技术方案的基础上,在任一所述对应关系以及任一所述偏移距离下,与任一所述目标视频帧的目标序号相匹配的目标索引小于1或大于该目标索引对应的目标光照序列的长度的情况下,将该对应关系以及该偏移距离下的匹配关系中,与该目标序号相匹配的目标索引记为空目标索引。
目标索引的取值最小值为1,最大值为目标光照序列的长度。因此,在确定的任一匹配关系中,若与任一目标序号相匹配的目标索引小于1,或大于该目标索引对应的目标光照序列的长度,则该目标序号对应的目标索引为空。此时,可以对匹配关系进行标注,标注该目标序号对应空元素,空元素占据一个元素位置。
在根据视频帧序列中各个视频帧的序号信息和第一光照序列中各个光照元素的索引信息，从第一光照序列中提取出多种候选光照序列时，针对每个匹配关系，确定在该匹配关系下第一光照序列中与任一视频帧的序号相匹配的光照元素的索引为空的空元素。并根据视频帧序列中其它视频帧的序号和第一光照序列中各个光照元素的索引信息，确定其它光照元素。根据空元素和其它光照元素，生成该匹配关系下的候选光照序列。
若一种匹配关系中标注了目标序号3对应空元素,则在根据该种匹配关系提取候选光照序列时,提取的候选光照序列中的第3个光照元素为空元素。
可选地,所述分别获取各个候选光照序列与所述第二光照序列的相似度值,包括:
针对每个所述候选光照序列,确定所述第二光照序列中与所述候选光照序列中的各个非空元素位置对应的目标光照元素;计算各所述非空元素与位置对应的所述目标光照元素的相似度;
将各个所述非空元素与位置对应的所述目标光照元素的相似度的平均值,确定为所述候选光照序列与所述第二光照序列之间的相似度值。
在计算第二光照序列和候选光照序列的相似度值时,确定所述第二光照序列中与所述候选光照序列中的各个非空元素位置对应的目标光照元素;计算各所述非空元素与位置对应的所述目标光照元素的相似度;将各个所述非空元素与位置对应的所述目标光照元素的相似度的平均值,确定为所述候选 光照序列与所述第二光照序列之间的相似度值。
根据候选光照序列中的非空元素,以及与每一非空元素处于对应位置的第二光照序列中的光照元素,计算该候选光照序列与所述第二光照序列之间的相似度。
可以是在任一候选光照序列中包括空元素的情况下,删除该候选光照序列中的空元素,得到新候选光照序列;根据该空元素的元素位置,删除第二光照序列中与空元素处于相应元素位置的光照元素,得到新第二光照序列;将新候选光照序列与新第二光照序列之间的相似度值,确定为该候选光照序列与第二光照序列之间的相似度值。
例如,一种匹配关系为(1-4,2-7,3-空),则可以根据第一光照序列中的第4、7个元素,以及一个空元素,生成候选光照序列。因为空元素占据了一个元素位置,因此,该候选光照序列的长度为3。在计算该候选光照序列与第二光照序列之间的相似度时,删除候选光照序列中的空元素,得到新候选光照序列;删除第二光照序列中的第3个光照元素,得到新第二光照序列。计算新候选光照序列与新第二光照序列之间的相似度,并将该相似度确定为该候选光照序列和第二光照序列之间的相似度。
如此,考虑了存在偏移时,与视频帧匹配的光照元素在光照序列中的索引超出了光照序列的范围的情况。在根据匹配关系从第一光照序列中提取候选光照序列时,通过设置占据元素位置的空元素,使得到的候选光照序列的长度与第二光照序列的长度相同,且能根据空元素占据的元素位置,确定第二光照序列中需要计算相似度的光照元素。
在一些实施方式中,参照图5,图5示出了本实施例中的一种活体检测方法的流程示意图,如图5所示,获取待检测视频,设待检测视频包括的原始视频帧的数量为n,待检测视频包括的原始视频帧为V=(v1,…,vn)。获取第一光照序列y=(y1,…,yN),第一光照序列的长度为N。
从待检测视频中均匀抽取m个视频帧,得到视频帧序列V'=(v'1,…,v'm)。令c=n/m,视频帧序列中的第i个视频帧为原始视频中的第i'个原始视频帧,i'=s(i)=ceil(c/2+c×i)=ceil(n/2m+i×n/m),ceil为向上取整函数;i=1,2,…,m,表征m个视频帧各自的序号。则视频帧序列V'=(v'1,…,v'm)=(vs(1),…,vs(m))。
对视频帧序列中的每个视频帧进行处理,确定待检测对象的实体位置点在视频帧序列中的每个视频帧中对应的像素点,获取一个实体位置点在每个视频帧中对应的像素点,分别在视频帧序列中的每个视频帧中的反射光,进而得到第二光照序列。待检测对象的一个实体位置点是指待检测对象实际存在的一个点,例如,待检测对象为人脸时,待检测对象的一个实体位置点可以是人脸鼻子上的一个点,该点的大小为视频中一个像素点表征的大小。因 为第二光照序列中的每个光照元素表征从一个视频帧中反射的光,因此,第二光照序列的长度与视频帧序列的长度相同,都为m。
可选地,所述每个目标视频帧的目标序号与所述原始视频帧的帧序号的多种对应关系为s,其中,i'=s(i),i为所述目标序号,i'为与目标序号i对应的帧序号;
在所述多种对应关系以及所述多个偏移距离下,所述每个目标视频帧的目标序号i相匹配的目标索引为s(i)-dt,其中,dt为所述偏移距离;
在所述多种对应关系以及所述多个偏移距离下,所述多种匹配关系为f,其中,f(i)=s(i)-dt=i'-dt。
可选地,所述目标视频帧的帧数为m;
所述每个目标视频帧的目标序号与所述原始视频帧的帧序号的对应关系为s;
其中,i'=s(i)=ceil(c/2+c×i),c=n/m,n为所述目标视频的假设总帧数,ceil为向上取整函数,i=1,2,…,m,表征所述m个目标视频帧各自的目标序号。
视频帧序列中的视频帧的序号与原始视频帧的帧序号之间的对应关系可以表征为s,其中,i'=s(i)=ceil(c/2+c×i)=ceil(n/2m+i×n/m)。对应关系也可以表征从待检测视频中提取视频帧的提取规则,因为m的值确定,因此该提取规则与n的取值有关。因此,在这种提取规则之下,对应关系s与待检测视频包含的原始视频帧的数量n有关。
可以获取多种匹配关系,多种匹配关系可以是按照如下步骤确定的。
获取目标视频,从目标视频中提取出m个目标视频帧。从原始视频中提取目标视频帧的提取规则,与从目标视频中提取视频帧的提取规则相同,因此,目标视频帧的目标序号与原始视频帧的帧序号的对应关系,与视频帧序列中的视频帧的序号与原始视频帧的帧序号之间的对应关系相同,都为s。获取目标光照序列,目标光照序列的长度与第一光照序列的长度相同。设目标光照序列中,共有p种颜色的光,每种颜色的光占q个元素位置,则N=p×q。则目标光照序列对应的索引序列为indexy=(1,...,N)。去掉索引序列的首尾两端的光照元素,去掉的的光照元素的数量可以根据需求进行设置,可以设置为d=floor(q/2),floor为向下取整函数,得到新的索引序列为indexy′=(d,d+1,...,N-d),此时新的索引序列的长度为N'=N-floor(q/2)×2。一般情况下N'不等于n,即去掉了首尾两端的光照元素的目标光照序列和原始视频帧组成的序列的长度不同。
通过滑动窗口的方式对目标光照序列和原始视频帧组成的序列进行滑动匹配,记录下每个原始视频帧与目标光照序列中的光照元素在不同偏移距离下对应的关系。用dt表征偏移距离,则偏移距离dt=0,1,...,|N'-n|。|N'-n| 为新的索引序列的长度与原始视频帧组成的序列的长度的差值的绝对值。此时,原始视频中第i'个原始视频帧在新的索引序列中的索引为r(i')=i'-dt,i'=0,1,...,n,因此,目标视频帧中的i个目标视频帧在新的索引序列中的索引为r(i)=s(i)-dt,i=0,1,...,m。
根据目标视频帧中的i个目标视频帧在新的索引序列中的索引为r(i)=s(i)-dt,可以得到匹配关系f,f(i)=i'-dt=s(i)-dt=ceil(n/2m+i×n/m)-dt。在n和dt取不同值时,可以得到多个匹配关系。
受限于新的索引序列的长度,因此,r(i)<1或r(i)大于N'时,则认为该目标视频帧在目标光照序列中没有对应的光照元素。在r(i)<1或r(i)大于N',可以单独记录fi=null。
去掉多个匹配关系中的重复项,则可以得到最终的多个匹配关系。
在进行活体检测时,可以直接根据多个匹配关系,从第一光照序列中获取多个候选光照序列。例如,若m=5,在n=20,dt=1的情况下,则匹配关系f(i)=ceil[20÷(2×5)+i×20÷5]-1。i的取值分别为1、2、3、4、5,则对应的f(i)的取值分别为5、9、13、17、21。因此,可以根据第一光照序列中的第5、9、13、17、21个光照元素,生成候选光照序列。
当n和dt取其它值时,可以得到其它匹配关系,并根据得到的其它匹配关系生成其它候选光照序列。
遍历了n和dt可能的取值,得到的多个匹配关系,涵盖了待检测视频的原始视频帧数存在的多种可能,以及待检测视频和第一光照序列之间存在多种步数的不同步的情况。然而,从多个匹配关系中,存在一种符合真实情况的匹配关系。可以理解的是,符合真实情况的匹配关系中,候选光照序列与第二光照序列之间的相似度最高。因此,可以根据每个候选光照序列与第二光照序列之间的相似度,从多个候选光照序列中筛选出目标候选光照序列。
根据第二光照序列和目标候选光照序列,生成待检测对象的响应图,响应图中各像素点的响应强度表征:像素点所对应的实体位置点反射的第二光照序列与第一光照序列之间的相似度;根据响应图,获取所述待检测对象的活体检测结果。
The technical solution of this embodiment of the present application has the following advantages:

1. The length of the target illumination sequence is the same as that of the first illumination sequence, the number of target video frames extracted from the original video is the same as the number of video frames extracted from the video to be detected, and the extraction rules are the same. Multiple matching relationships applicable to the video to be detected and the first illumination sequence can therefore be computed in advance from the target illumination sequence, the original video and the target video frames, which improves the efficiency of liveness detection.

2. The matching relationships include the unknown hypothesized total number of frames of the original video, so the multiple matching relationships cover the cases in which the video to be detected has dropped or extra frames. The matching relationships also include the unknown offset distance, so the multiple matching relationships cover the cases in which the video to be detected and the first illumination sequence are out of sync by different numbers of steps.

3. Dropped or extra frames in the video to be detected usually occur at the beginning and end of the video. When the matching relationships are determined, the illumination elements at both ends of the target illumination sequence are removed. When candidate illumination sequences are extracted according to the matching relationships, the illumination elements at both ends of the first illumination sequence are therefore also excluded, so that the contributions of both ends are excluded from the similarity computation.

4. Compared with computing the similarity from every video frame of the video to be detected, this embodiment only needs to compute the similarity from the target number of video frames, which greatly reduces the amount of computation and shortens the time consumed by liveness detection.

5. The multiple matching relationships enumerate the hypothesized total numbers of original video frames and the offset distances. The multiple candidate illumination sequences extracted on the basis of these matching relationships therefore cover the candidate illumination sequence corresponding to every combination of the possible numbers of original video frames of the video to be detected and the possible numbers of steps by which the video to be detected and the first illumination sequence may be out of sync. A target candidate illumination sequence that fits the video frame sequence and conforms to the real matching relationship can thus be selected from the multiple candidate illumination sequences, and the resulting liveness detection result is more accurate. For the specific implementation of each step of the liveness detection method provided by this embodiment of the present application, reference may be made to the description of the preceding method embodiments, which is not repeated here.
It should be noted that, for simplicity of description, the method embodiments are all expressed as a series of action combinations. However, those skilled in the art should understand that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Fig. 6 is a schematic structural diagram of a liveness detection apparatus according to an embodiment of the present application. As shown in Fig. 6, the liveness detection apparatus includes a video acquisition module 61, a response map generation module 62 and a detection result determination module 63, wherein:

the video acquisition module 61 is configured to obtain a video to be detected of an object to be detected, the video to be detected being: a video of the object to be detected collected while the object to be detected is illuminated according to a first illumination sequence;

the response map generation module 62 is configured to generate a response map of the object to be detected according to the first illumination sequence and second illumination sequences reflected by the respective entity position points of the object to be detected represented by the video to be detected, the response intensity of each pixel in the response map representing: the similarity between the second illumination sequence reflected by the entity position point corresponding to the pixel and the first illumination sequence;

the detection result determination module 63 is configured to perform liveness detection on the object to be detected on the basis of the response map by using a first liveness detection model, to obtain a first liveness detection result of the object to be detected.

It should be noted that the apparatus embodiment is similar to the method embodiments and is therefore described relatively simply; for relevant details, reference may be made to the method embodiments.
An embodiment of the present application further provides an electronic device. Referring to Fig. 7, which is a schematic diagram of the electronic device proposed by the embodiment of the present application, the electronic device 100 includes a memory 110 and a processor 120 communicatively connected via a bus. The memory 110 stores a computer program that can run on the processor 120 to implement the steps of the liveness detection method disclosed by the embodiments of the present application.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program/instructions are stored, which, when executed by a processor, implement the liveness detection method disclosed by the embodiments of the present application.

An embodiment of the present application further provides a computer program product, including a computer program/instructions which, when executed by a processor, implement the liveness detection method disclosed by the embodiments of the present application.

An embodiment of the present application further provides a computer program which, when executed, implements the liveness detection method disclosed by the embodiments of the present application.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the embodiments, reference may be made to each other.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus or a computer program product. The embodiments of the present application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, apparatus, electronic device and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, can make additional changes and modifications to these embodiments. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present application.

Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or terminal device that includes the element.

The liveness detection method, electronic device, storage medium and program product provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (23)

  1. A liveness detection method, characterized by comprising:
    obtaining a video to be detected of an object to be detected, the video to be detected being: a video of the object to be detected collected while the object to be detected is illuminated according to a first illumination sequence;
    generating a response map of the object to be detected according to the first illumination sequence and second illumination sequences reflected by respective entity position points of the object to be detected represented by the video to be detected, the response intensity of each pixel in the response map representing: the similarity between the second illumination sequence reflected by the entity position point corresponding to the pixel and the first illumination sequence;
    performing liveness detection on the object to be detected on the basis of the response map by using a first liveness detection model, to obtain a first liveness detection result of the object to be detected.
  2. The method according to claim 1, characterized in that the method further comprises:
    inputting the video to be detected into a second liveness detection model to obtain a second liveness detection result of the object to be detected;
    determining a final liveness detection result of the object to be detected according to the first liveness detection result and the second liveness detection result.
  3. The method according to claim 1, characterized in that the second illumination sequences reflected by the respective entity position points of the object to be detected represented by the video to be detected are obtained according to the following steps:
    extracting multiple video frames of the video to be detected;
    aligning the pixels in the multiple video frames that describe the same entity position point of the object to be detected;
    for each entity position point of the object to be detected, obtaining the second illumination sequence reflected by the entity position point according to the illumination reflected by the pixels describing the entity position point in the respective video frames of the multiple video frames.
  4. The method according to claim 3, characterized in that aligning the pixels in the multiple video frames that describe the same entity position point of the object to be detected comprises:
    for each of the multiple video frames, performing facial keypoint detection on the video frame to obtain the facial keypoints contained in the video frame, performing facial-keypoint-level alignment on the multiple video frames according to the facial keypoints contained in each video frame, and performing pixel-level alignment on the multiple video frames after the facial-keypoint-level alignment;
    or,
    performing pixel-level alignment on the multiple video frames.
  5. The method according to claim 4, characterized in that the pixel-level alignment is performed through the following process:
    separately computing dense optical flow data between each video frame, other than a reference video frame, of the multiple video frames participating in the pixel-level alignment and the reference video frame, the reference video frame being any one of the multiple video frames participating in the pixel-level alignment;
    performing pixel-level alignment between each of the other video frames and the reference video frame according to the dense optical flow data between that video frame and the reference video frame.
  6. The method according to any one of claims 1 to 5, characterized in that the first illumination sequence and the second illumination sequence each comprise colored light sequences of multiple color channels;
    generating the response map of the object to be detected according to the first illumination sequence and the second illumination sequences reflected by the respective entity position points of the object to be detected represented by the video to be detected comprises:
    separating the first illumination sequence to obtain a first colored light sequence corresponding to each color channel, and separating the second illumination sequence to obtain a second colored light sequence corresponding to each color channel;
    for each color channel, generating a response sub-map of the object to be detected in the color channel according to the similarity between the second colored light sequence of the color channel reflected by each entity position point of the object to be detected and the first colored light sequence corresponding to the color channel;
    fusing the response sub-maps of the object to be detected in the respective color channels to obtain the response map of the object to be detected.
  7. The method according to claim 6, characterized in that fusing the response sub-maps of the object to be detected in the respective color channels to obtain the response map of the object to be detected comprises:
    obtaining, according to the response sub-map of the object to be detected in each color channel, the mean response intensity of the face region of the object to be detected in each color channel;
    regularizing the response sub-map of the object to be detected in each color channel according to the mean response intensity of the face region of the object to be detected in that color channel, to obtain a regularized response sub-map of the object to be detected in each color channel;
    fusing the regularized response sub-maps of the object to be detected in the respective color channels to obtain the response map of the object to be detected.
  8. The method according to any one of claims 1 to 7, characterized in that before the response map is processed by using the first liveness detection model, the method further comprises:
    obtaining an attribute value of the response map of the object to be detected, the attribute value comprising at least one of the following: a mean response intensity and a quality value;
    determining the liveness detection result of the object to be detected according to the magnitude relationship between the attribute value of the response map of the object to be detected and the corresponding attribute threshold;
    determining that the object to be detected is not a living body in the case where the attribute value of the response map of the object to be detected is smaller than the corresponding attribute threshold;
    performing the step of performing liveness detection on the object to be detected on the basis of the response map by using the liveness detection model in the case where the attribute value of the response map of the object to be detected is not smaller than the corresponding attribute threshold.
  9. The method according to any one of claims 1 to 7, characterized in that the first liveness detection model is a model that has learned first image features of response maps of living faces and second image features of the living faces;
    performing liveness detection on the object to be detected on the basis of the response map by using the first liveness detection model to obtain the first liveness detection result of the object to be detected comprises:
    extracting a first image feature of the response map;
    extracting any one video frame of the video to be detected, and obtaining a second image feature of the video frame;
    fusing the first image feature and the second image feature to obtain a fused image feature;
    processing the fused image feature by using the first liveness detection model to obtain the first liveness detection result of the object to be detected.
  10. The method according to any one of claims 1 to 7, characterized in that before generating the response map of the object to be detected according to the first illumination sequence and the second illumination sequences reflected by the respective entity position points of the object to be detected represented by the video to be detected, the method further comprises:
    extracting a video frame sequence of a target number of frames from the video to be detected, and determining a second illumination sequence according to each video frame in the video frame sequence, the second illumination sequence representing the reflected light of the respective entity position points of the object to be detected;
    extracting multiple candidate illumination sequences from the first illumination sequence according to sequence number information of the video frames in the video frame sequence and index information of the illumination elements in the first illumination sequence, wherein the index information of the elements of each candidate illumination sequence and the sequence number information of the video frames in the video frame sequence satisfy different matching relationships;
    separately obtaining similarity values between the respective candidate illumination sequences and the second illumination sequence;
    determining a target candidate illumination sequence according to the similarity values between the respective candidate illumination sequences and the second illumination sequence;
    generating the response map of the object to be detected according to the first illumination sequence and the second illumination sequences reflected by the respective entity position points of the object to be detected represented by the video to be detected comprises:
    generating the response map of the object to be detected according to the second illumination sequence and the target candidate illumination sequence.
  11. The method according to claim 10, characterized in that separately obtaining the similarity values between the respective candidate illumination sequences and the second illumination sequence comprises:
    for each candidate illumination sequence, determining target illumination elements in the second illumination sequence whose positions correspond to the respective non-empty elements in the candidate illumination sequence;
    computing the similarity between each non-empty element and the target illumination element at the corresponding position;
    determining the average of the similarities between the respective non-empty elements and the target illumination elements at the corresponding positions as the similarity value between the candidate illumination sequence and the second illumination sequence.
  12. The method according to claim 10, characterized in that the matching relationships are determined on the basis of a target video and a target illumination sequence;
    wherein the target video is the video to be detected, and the target illumination sequence is the first illumination sequence;
    or, the target video is a sample video, and the target illumination sequence is a sample illumination sequence of the same length as the first illumination sequence.
  13. The method according to claim 12, characterized in that the matching relationships are determined through the following process:
    obtaining the target video and the target illumination sequence, the target video comprising multiple original video frames;
    determining a correspondence between target sequence numbers of target video frames extracted from the target video and frame numbers of the original video frames of the target video;
    performing sliding matching between the original video frames and the target illumination sequence, and determining multiple offset distances;
    determining multiple matching relationships according to the correspondence and the multiple offset distances.
  14. The method according to claim 13, characterized in that determining the correspondence between the sequence numbers of the target video frames extracted from the target video and the frame numbers of the original video frames of the target video comprises:
    obtaining multiple hypothesized total frame counts of the target video;
    for each hypothesized total frame count, determining a correspondence between the target sequence numbers of the target video frames and the frame numbers of the original video frames;
    determining the multiple matching relationships according to the correspondence and the multiple offset distances comprises:
    determining the multiple matching relationships according to the multiple correspondences and the multiple offset distances.
  15. The method according to claim 13, characterized in that before performing sliding matching between the original video frames and the target illumination sequence and determining the multiple offset distances, the method further comprises:
    removing the illumination elements at both ends of the target illumination sequence;
    performing sliding matching between the original video frames and the target illumination sequence and determining the multiple offset distances comprises:
    performing sliding matching between the original video frames and the target illumination sequence from which the illumination elements at both ends have been removed, and determining the multiple offset distances.
  16. The method according to any one of claims 13 to 15, characterized in that determining the multiple matching relationships according to the correspondence and the multiple offset distances comprises:
    under each correspondence and each offset distance, determining a target index, in the target illumination sequence, of the illumination element that matches the target sequence number of each target video frame;
    obtaining the multiple matching relationships on the basis of the target indices matched with the respective target sequence numbers under each correspondence and each offset distance.
  17. The method according to claim 16, characterized in that obtaining the multiple matching relationships on the basis of the target indices matched with the respective target sequence numbers under each correspondence and each offset distance comprises:
    under any correspondence and any offset distance, in the case where the target index matched with the target sequence number of any target video frame is smaller than 1 or greater than the length of the target illumination sequence to which the target index corresponds, marking the target index matched with the target sequence number of the target video frame as an empty target index;
    determining the matching relationship under the correspondence and the offset distance according to the empty target index matched with the target video frame and the target indices matched with the other target video frames.
  18. The method according to any one of claims 13 to 15, characterized in that the multiple correspondences between the target sequence number of each target video frame and the frame numbers of the original video frames are denoted s, where i'=s(i), i is the target sequence number, and i' is the frame number corresponding to the target sequence number i;
    under the multiple correspondences and the multiple offset distances, the target index matched with the target sequence number i of each target video frame is s(i)-dt, where dt is the offset distance;
    under the multiple correspondences and the multiple offset distances, the multiple matching relationships are denoted f, where f(i)=s(i)-dt=i'-dt.
  19. The method according to any one of claims 13 to 15, characterized in that the number of the target video frames is m;
    the correspondence between the target sequence number of each target video frame and the frame numbers of the original video frames is s;
    where i'=s(i)=ceil(c/2+c×i), c=n/m, n is the hypothesized total number of frames of the target video, ceil is the round-up function, and i=1,2,…,m denotes the target sequence numbers of the m target video frames.
  20. The method according to any one of claims 10 to 15, characterized in that extracting the multiple candidate illumination sequences from the first illumination sequence according to the sequence number information of the video frames in the video frame sequence and the index information of the illumination elements in the first illumination sequence comprises:
    for each matching relationship, determining an empty element for which, under the matching relationship, the index of the illumination element in the first illumination sequence that matches the sequence number of a video frame is empty;
    determining other illumination elements according to the sequence numbers of the other video frames in the video frame sequence and the index information of the illumination elements in the first illumination sequence;
    generating the candidate illumination sequence under the matching relationship according to the empty element and the other illumination elements.
  21. An electronic device, comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the liveness detection method according to any one of claims 1 to 20.
  22. A computer-readable storage medium on which a computer program/instructions are stored, characterized in that the computer program/instructions, when executed by a processor, implement the liveness detection method according to any one of claims 1 to 20.
  23. A computer program product, comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the liveness detection method according to any one of claims 1 to 20.
PCT/CN2023/094603 2022-05-16 2023-05-16 一种活体检测方法、电子设备、存储介质及程序产品 WO2023221996A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210528092.5A CN115147936A (zh) 2022-05-16 2022-05-16 一种活体检测方法、电子设备、存储介质及程序产品
CN202210528092.5 2022-05-16
CN202310306903.1 2023-03-24
CN202310306903.1A CN116434349A (zh) 2023-03-24 2023-03-24 活体检测方法、电子设备、存储介质及程序产品

Publications (1)

Publication Number Publication Date
WO2023221996A1 true WO2023221996A1 (zh) 2023-11-23

Family

ID=88834672

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/094603 WO2023221996A1 (zh) 2022-05-16 2023-05-16 一种活体检测方法、电子设备、存储介质及程序产品

Country Status (1)

Country Link
WO (1) WO2023221996A1 (zh)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018121428A1 (zh) * 2016-12-30 2018-07-05 腾讯科技(深圳)有限公司 一种活体检测方法、装置及存储介质
CN110414346A (zh) * 2019-06-25 2019-11-05 北京迈格威科技有限公司 活体检测方法、装置、电子设备及存储介质
CN110765923A (zh) * 2019-10-18 2020-02-07 腾讯科技(深圳)有限公司 一种人脸活体检测方法、装置、设备及存储介质
US20220148337A1 (en) * 2020-01-17 2022-05-12 Tencent Technology (Shenzhen) Company Limited Living body detection method and apparatus, electronic device, and storage medium
CN113888500A (zh) * 2021-09-29 2022-01-04 平安银行股份有限公司 基于人脸图像的炫光程度检测方法、装置、设备及介质
CN115147936A (zh) * 2022-05-16 2022-10-04 北京旷视科技有限公司 一种活体检测方法、电子设备、存储介质及程序产品

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
REN, YU: "The Research on Face Anti-spoofing Algorithm Based on Multi-scale and Multi-modal Fusion", MASTER THESIS, SICHUAN UNIVERSITY / INFORMATION TECHNOLOGY, CHINA, 1 January 2021 (2021-01-01), China, pages 1 - 78, XP009550593, DOI: 10.27342/d.cnki.gscdu.2021.000060 *
YAO LIU; YING TAI; JILIN LI; SHOUHONG DING; CHENGJIE WANG; FEIYUE HUANG; DONGYANG LI; WENSHUAI QI; RONGRONG JI: "Aurora Guard: Real-Time Face Anti-Spoofing via Light Reflection", ARXIV.ORG, 27 February 2019 (2019-02-27), XP081034077 *

Similar Documents

Publication Publication Date Title
CN110765923B (zh) 一种人脸活体检测方法、装置、设备及存储介质
CN111488756B (zh) 基于面部识别的活体检测的方法、电子设备和存储介质
CN110163078A (zh) 活体检测方法、装置及应用活体检测方法的服务系统
CN107832677A (zh) 基于活体检测的人脸识别方法及系统
CN112801057B (zh) 图像处理方法、装置、计算机设备和存储介质
CN105095870A (zh) 基于迁移学习的行人重识别方法
CN108664843B (zh) 活体对象识别方法、设备和计算机可读存储介质
CN105046219A (zh) 一种人脸识别系统
CN107479801A (zh) 基于用户表情的终端显示方法、装置及终端
WO2022222575A1 (zh) 用于目标识别的方法和系统
CN114241517B (zh) 基于图像生成和共享学习网络的跨模态行人重识别方法
CN114998934B (zh) 基于多模态智能感知和融合的换衣行人重识别和检索方法
CN106991364A (zh) 人脸识别处理方法、装置以及移动终端
WO2022222569A1 (zh) 一种目标判别方法和系统
CN115147936A (zh) 一种活体检测方法、电子设备、存储介质及程序产品
CN112836625A (zh) 人脸活体检测方法、装置、电子设备
CN113312965A (zh) 一种人脸未知欺骗攻击活体检测方法及系统
CN112257685A (zh) 人脸翻拍识别方法、装置、电子设备及存储介质
CN110648336B (zh) 一种舌质和舌苔的分割方法及装置
CN109740527B (zh) 一种视频帧中图像处理方法
CN113111810A (zh) 一种目标识别方法和系统
WO2023221996A1 (zh) 一种活体检测方法、电子设备、存储介质及程序产品
CN109711232A (zh) 基于多目标函数的深度学习行人重识别方法
CN112070041B (zh) 一种基于cnn深度学习模型的活体人脸检测方法和装置
CN116152932A (zh) 活体检测方法以及相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23806947

Country of ref document: EP

Kind code of ref document: A1