CN116206373A - Living body detection method, electronic device and storage medium

Info

Publication number: CN116206373A
Application number: CN202310143646.4A
Authority: CN (China)
Prior art keywords: living body detection, full image frame sequence
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 莫原野, 金宇林, 朱纯博
Current Assignee: Beijing Kuangshi Technology Co Ltd
Original Assignee: Beijing Kuangshi Technology Co Ltd
Application filed by Beijing Kuangshi Technology Co Ltd

Classifications

    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • G06V10/764 Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/809 Fusion of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/811 Fusion of classification results, the classifiers operating on different input data, e.g. multi-modal recognition
    • G06V40/171 Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06V40/172 Human faces: classification, e.g. identification
    • G06V2201/07 Target detection


Abstract

The present disclosure provides a living body detection method, an electronic device, and a storage medium. The method comprises: acquiring a full image frame sequence and a face image frame sequence captured while a user to be detected reads target content; inputting the full image frame sequence into a first living body detection sub-model to obtain a first living body detection result; inputting the face image frame sequence into a second living body detection sub-model to obtain a second living body detection result; and obtaining a target detection result based on the first living body detection result and the second living body detection result. Because living body detection is performed on frame sequences of both the full image and the face image, the global image area and the local face area are examined comprehensively and fully, so the resulting target detection result fully represents the image information and the classification accuracy of living body detection is effectively improved. Combining global and local information during detection also keeps the detection rate stable across different forgery methods, ensuring the generalization and stability of living body detection.

Description

Living body detection method, electronic device and storage medium
Technical Field
The present disclosure belongs to the technical field of computer vision, and in particular relates to a living body detection method, an electronic device, and a storage medium.
Background
Living body detection is a method that verifies, based on biological characteristics, whether an object to be detected is a living body. In face recognition verification scenarios, living body detection can effectively defend against attack means such as photos, videos, and occlusion. Lip-language living body detection is a reading-based living body detection method: a video of the user reading randomly prompted content is captured, lip motion features of the user are extracted from the video frames, and living body detection is performed according to those lip motion features.
At present, in order to improve the accuracy of living body detection and strengthen the performance of living body detection algorithms, frames can be extracted from the video, local face images can be cropped out, and real-versus-fake classification can be performed by a face detection algorithm.
These existing improvements simplify the image processing pipeline, using frame extraction, local face image cropping, and the like to focus the model's attention on particular feature information. However, this prevents the model from fully learning the image information, so the model's ability to detect and recognize that information is insufficient, its generalization is weak, and its detection rate is unstable across different forgery methods.
Disclosure of Invention
The purpose of the present disclosure is to provide a living body detection method, an electronic device, and a storage medium that ensure the model fully learns the image information, enhance the generalization of living body detection, and keep the detection rate stable across different forgery methods.
In order to solve the above technical problems, the present disclosure is implemented as follows:
In a first aspect, the present disclosure provides a living body detection method, the method comprising: acquiring a full image frame sequence and a face image frame sequence captured while a user to be detected reads target content, wherein each full image frame in the full image frame sequence contains the global area while the user to be detected reads the target content, and each face image frame in the face image frame sequence contains the local face area during the reading of the target content; inputting the full image frame sequence into a first living body detection sub-model, which performs living body detection on the user to be detected based on the global area in each full image frame to obtain a first living body detection result for the user to be detected; inputting the face image frame sequence into a second living body detection sub-model, which performs living body detection on the user to be detected based on the local face area in each face image frame to obtain a second living body detection result for the user to be detected; and determining a target detection result for the user to be detected based on the first living body detection result and the second living body detection result.
Optionally, before inputting the full image frame sequence into the first living body detection sub-model, the method further comprises: performing inter-frame difference processing on the full image frame sequence to obtain a full-image frame differential image for every two adjacent full image frames, thereby determining a full-image frame differential image sequence; accordingly, inputting the full image frame sequence into the first living body detection sub-model comprises: inputting the full-image frame differential image sequence into the first living body detection sub-model.
Optionally, before inputting the face image frame sequence into the second living body detection sub-model, the method further comprises: performing inter-frame difference processing on the face image frame sequence to obtain a face frame differential image for every two adjacent face image frames, thereby determining a face frame differential image sequence; correspondingly, inputting the face image frame sequence into the second living body detection sub-model comprises: inputting the face frame differential image sequence into the second living body detection sub-model.
Optionally, the first living body detection sub-model includes a first timing sub-network and first fully-connected sub-networks arranged in parallel, the number of first fully-connected sub-networks being equal to the number of full-image frame differential images in the full-image frame differential image sequence. Performing living body detection on the user to be detected through the first living body detection sub-model to obtain the first living body detection result includes: extracting, through the first timing sub-network, first timing features of the full-image frame differential image sequence, which characterize the change of each full-image frame differential image over time, and performing living body detection on the user to be detected based on the first timing features to obtain a first timing detection result; extracting, through each first fully-connected sub-network, global image features of one full-image frame differential image, and performing living body detection on the user to be detected based on the global image features to obtain a first fully-connected detection result corresponding to each full-image frame differential image; and fusing the first timing detection result and each first fully-connected detection result to obtain the first living body detection result corresponding to the full image frame sequence.
Optionally, the second living body detection sub-model includes a second timing sub-network and second fully-connected sub-networks, the number of second fully-connected sub-networks being equal to the number of face frame differential images in the face frame differential image sequence. Performing living body detection on the user to be detected through the second living body detection sub-model to obtain the second living body detection result includes: extracting, through the second timing sub-network, second timing features of the face frame differential image sequence, which characterize the change of each face frame differential image over time, and performing living body detection on the user to be detected based on the second timing features to obtain a second timing detection result; extracting, through each second fully-connected sub-network, face image features of one face frame differential image, and performing living body detection on the user to be detected based on the face image features to obtain a second fully-connected detection result corresponding to each face frame differential image; and fusing the second timing detection result and each second fully-connected detection result to obtain the second living body detection result corresponding to the face image frame sequence.
Optionally, acquiring the full image frame sequence and the face image frame sequence while the user to be detected reads the target content includes: acquiring an original video of the user to be detected reading the target content; performing frame extraction processing on the original video to obtain the full image frame sequence; determining face key points in each full image frame; performing mean processing on the face key points of each full image frame to obtain registration key points for each full image frame; and, based on the registration key points, centering the local face area in each full image frame to obtain the corresponding face image frames, thereby determining the face image frame sequence.
Optionally, determining the target detection result of the user to be detected based on the first living body detection result and the second living body detection result includes: performing fusion processing on the first living body detection result and the second living body detection result to obtain the target detection result of the user to be detected, where the fusion processing comprises taking the mean or the maximum, or performing classification detection based on the first living body detection result and the second living body detection result.
Optionally, the first living body detection sub-model and the second living body detection sub-model are trained as follows: acquiring a full image frame sample sequence and a face image frame sample sequence; inputting the full image frame sample sequence into an initial first living body detection sub-model, obtaining a third living body detection result output by the initial first living body detection sub-model, and iterating the initial first living body detection sub-model based on the third living body detection result to obtain the first living body detection sub-model; and inputting the face image frame sample sequence into an initial second living body detection sub-model, obtaining a fourth living body detection result output by the initial second living body detection sub-model, and iterating the initial second living body detection sub-model based on the fourth living body detection result to obtain the second living body detection sub-model.
In a second aspect, the present disclosure provides an electronic device comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the living body detection method of the first aspect.
In a third aspect, the present disclosure provides a readable storage medium storing a program or instructions that, when executed by a processor, implement the steps of the living body detection method of the first aspect.
In a fourth aspect, the present disclosure provides a chip comprising a processor and a communication interface coupled to the processor, the processor being configured to run a program or instructions to implement the steps of the living body detection method of the first aspect.
In a fifth aspect, the present disclosure provides a computer program product storing a computer program that, when executed by a processor, implements the living body detection method of the first aspect.
In the living body detection method provided by the present disclosure, a full image frame sequence and a face image frame sequence captured while the user to be detected reads target content are acquired, where each full image frame contains the global area and each face image frame contains the local face area during the reading of the target content; the full image frame sequence is input into a first living body detection sub-model and the face image frame sequence into a second living body detection sub-model to obtain a first living body detection result and a second living body detection result respectively, and a target detection result for the user to be detected is obtained based on the two results. In the embodiments of the present disclosure, living body detection is performed on frame sequences of both the full image and the face image, so the global image area and the local face area are examined comprehensively and fully: the resulting target detection result fully represents the image information, the classification accuracy of living body detection is effectively improved, detection combines global and local information across different forgery methods, and the generalization and stability of living body detection are ensured.
Drawings
FIG. 1 is the first flowchart of steps of a living body detection method provided by an embodiment of the present disclosure;
FIG. 2 is the second flowchart of steps of a living body detection method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a living body detection model provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of steps of a training method for a living body detection model provided by an embodiment of the present disclosure;
FIG. 5 is a structural block diagram of a living body detection apparatus provided by an embodiment of the present disclosure;
FIG. 6 is a structural block diagram of a training apparatus for a living body detection model provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 8 is a hardware schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this disclosure without inventive effort fall within the scope of this disclosure.
The terms "first", "second", and the like in the description and claims are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It is to be understood that data so termed may be interchanged where appropriate, so that embodiments of the disclosure may be practiced in sequences other than those illustrated and described herein; objects identified by "first", "second", etc. are generally of the same type, and the number of objects is not limited, e.g., the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
It should be noted that all data obtained in the present disclosure, including full image frames, face image frames, user identity information, and other related data, are accessed, collected, stored, and used for subsequent analysis only with the consent and authorization of the user or data owner, after the user or data owner has been explicitly informed of the content collected, the data usage, the processing method, and similar information; a way to withdraw consent and authorization and to access, correct, and delete the data is also provided.
The living body detection provided by the embodiments of the present disclosure is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
FIG. 1 is the first flowchart of steps of the living body detection method provided by an embodiment of the present disclosure. The method may comprise the following steps 101 to 103.
Step 101, acquiring a full image frame sequence and a face image frame sequence while a user to be detected reads target content, where each full image frame in the full image frame sequence contains the global area while the user to be detected reads the target content, and each face image frame in the face image frame sequence contains the local face area during the reading of the target content.
The user to be detected is a user awaiting living body detection in a living body detection scenario, which may be an identity verification scenario such as face payment, access control verification, online login, or a spoken-language test. Lip-language living body detection can be implemented by acquiring image frames while the user to be detected reads target content and performing living body detection on the user; the target content may be numbers, text, or question prompts to which the user to be detected answers orally. On this basis, living body detection can be performed through the temporal change characteristics of facial actions while the user reads the target content, so as to determine whether the user to be detected is a living body.
In the embodiments of the present disclosure, the full image frame sequence and the face image frame sequence captured while the user to be detected reads the target content can be acquired. The full image frame sequence consists of full image frames arranged in time order, each full image frame being an image of the global area at the corresponding moment of the reading; the face image frame sequence consists of face image frames arranged in time order, each face image frame being an image of the local face area at the corresponding moment of the reading.
In an optional embodiment of the present disclosure, the full image frame sequence and the face image frame sequence may be acquired by a locally configured shooting module while the user to be detected reads the target content, or a communication connection may be established with another electronic device configured with a shooting module, so that the other electronic device acquires the two sequences while the user reads the target content. The other electronic device may be a mobile phone, a tablet computer, a computer, a camera, a monitoring device, or the like.
Step 102, inputting the full image frame sequence into a first living body detection sub-model, which performs living body detection on the user to be detected based on the global area in each full image frame to obtain a first living body detection result for the user to be detected; and inputting the face image frame sequence into a second living body detection sub-model, which performs living body detection on the user to be detected based on the local face area in each face image frame to obtain a second living body detection result for the user to be detected.
In the embodiments of the present disclosure, the first living body detection sub-model is trained on full-image samples and performs living body detection based on the global area of the full image frames; the second living body detection sub-model is trained on face image samples and performs living body detection on the local face area of the face image frames. On this basis, the first living body detection sub-model focuses more on the overall information of the image, while the second living body detection sub-model focuses more on the local information of the face area.
Further, after the full image frame sequence and the face image frame sequence are acquired, the two sub-models can be used in combination: the full image frame sequence is input into the first living body detection sub-model, which performs living body detection on the overall image information based on the global area in each full image frame to obtain the first living body detection result corresponding to the user to be detected; and the face image frame sequence is input into the second living body detection sub-model, which performs living body detection on the local face information based on the local face area in each face image frame to obtain the second living body detection result corresponding to the user to be detected.
Step 103, determining a target detection result for the user to be detected based on the first living body detection result and the second living body detection result.
In the embodiments of the present disclosure, after obtaining the first living body detection result output by the first living body detection sub-model and the second living body detection result output by the second living body detection sub-model, the target detection result can be obtained based on the first and second living body detection results. The target detection result integrates the whole-image structural information and the local face information of the reading process, so that living body detection of the user to be detected is performed more accurately, the drop in detection rate caused by changes in forgery methods is avoided, and the generalization and robustness of living body detection classification are improved.
FIG. 2 is the second flowchart of steps of a living body detection method provided by an embodiment of the present disclosure. The method may include the following steps 201 to 210.
Step 201, acquiring an original video of the user to be detected reading target content.
In the embodiments of the present disclosure, an original video of the user to be detected reading the target content can be acquired; the original video contains the facial actions of the user to be detected. A locally configured shooting module can capture the user's facial actions while the target content is read, or a communication connection can be established with another electronic device configured with a shooting module, which then acquires the original video.
Step 202, performing frame extraction processing on the original video to obtain the full image frame sequence.
The collected original video can then be subjected to frame extraction processing to obtain the full image frame sequence. Frame extraction obtains part of the image information from the original video to represent the whole, avoiding analysis and computation over all image information in the original video; this reduces computational cost while preserving detection accuracy, and lowers the cost of model operation and deployment. The frame extraction scheme can be chosen by those skilled in the art according to the computing conditions and analysis requirements: for example, one full image frame may be extracted at equal intervals, or entry points may be set at equal intervals in the original video and a fixed number of consecutive full image frames extracted at each entry point. Frame extraction can be performed after the original video is captured or during capture; the embodiments of the present disclosure do not limit the manner of obtaining the original video, the frame extraction scheme, or its timing.
For example, from the original video of the user to be detected reading the digits "6, 1, 9, 4, 2, 9", one full image frame may be extracted at every interval t1, and the extracted frames are arranged in time order to obtain the full image frame sequence; alternatively, n consecutive full image frames may be extracted at every interval t2 and arranged in time order to obtain the full image frame sequence.
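For illustration only, the equal-interval frame extraction above might be implemented as follows (Python with OpenCV; the function name, interval, and frame count are assumptions, not values fixed by the disclosure):

```python
import cv2

def extract_full_frames(video_path, interval=5, max_frames=16):
    """Extract one full image frame every `interval` frames, in time order."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        if idx % interval == 0:
            frames.append(frame)  # full frame: the global area of the scene
        idx += 1
    cap.release()
    return frames  # the full image frame sequence
```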
Step 203, determining the face key points in each full image frame.
In the embodiments of the present disclosure, face key points identify and locate different facial feature positions, such as the eyes, nose, mouth, eyebrows, and face contour. The positions to calibrate and the number of face key points can be set according to the required recognition precision and the actual data processing conditions; the number may be 68, 49, 21, 5, etc. Within a full image frame sequence, the number of face key points and the calibrated features should be the same for every full image frame. The face key points may be annotated manually or by a pre-trained face key point annotation model; the embodiments of the present disclosure do not specifically limit this.
Step 204, performing mean processing on the face key points of each full image frame to obtain registration key points for each full image frame.
In the embodiments of the present disclosure, the mean processing may be a mean operation over the coordinate positions of the face key points, yielding the coordinate positions of the registration key points in the full image frames. Because the registration key points are obtained by averaging the face key points, they retain the key points' representation of facial features while preserving the facial motion information of the user to be detected during the reading of the target content, which improves the accuracy and efficiency of living body detection.
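One plausible reading of this mean processing, averaging each landmark's coordinates across the frames of the sequence so that a single shared set of registration key points aligns every frame while the per-frame facial motion is preserved in the aligned crops, is sketched below (an assumption, since the disclosure does not fix the exact averaging scheme):

```python
import numpy as np

def registration_keypoints(keypoints_per_frame):
    """Average face key points across the full image frame sequence.

    keypoints_per_frame: array of shape (T, K, 2) for T frames and K
    landmarks with (x, y) coordinates; returns one (K, 2) set of
    registration key points shared by all frames.
    """
    kpts = np.asarray(keypoints_per_frame, dtype=np.float32)
    return kpts.mean(axis=0)
```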
Step 205, based on the registration key points, centering the local face area in each full image frame to obtain the corresponding face image frames, thereby determining the face image frame sequence.
In the embodiments of the present disclosure, the face image frame sequence may be obtained by performing face registration on the full image frame sequence. Face registration, also called face alignment, is a technique for detecting and positioning face key points in images; through face registration, the local face area can be determined in each full image frame, so that in the resulting face image frame sequence the interference of background information on the face information is weakened and the characteristics of the local face area are highlighted. Specifically, face registration may center the local face area of a full image frame based on the registration key points, for example by applying an affine transformation to each full image frame based on the registration key points so as to generate face image frames of the same size with the local face area centered. Across the full image frame sequence, each full image frame can be aligned, matched, and scaled based on the registration key points to obtain face-centered face image frames of uniform size, forming the face image frame sequence.
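A minimal sketch of this affine-transform centering is given below (OpenCV-based; the canonical template layout and output size are assumptions, since the disclosure only requires same-size frames with the local face area centered):

```python
import cv2
import numpy as np

def center_face(frame, reg_kpts, template_kpts, out_size=(112, 112)):
    """Warp one full image frame so the face is centered at a fixed size.

    reg_kpts: (K, 2) registration key points in the full image frame.
    template_kpts: (K, 2) hypothetical canonical layout in the output crop.
    """
    M, _ = cv2.estimateAffinePartial2D(
        np.asarray(reg_kpts, np.float32), np.asarray(template_kpts, np.float32)
    )
    return cv2.warpAffine(frame, M, out_size)  # one face image frame
```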
Step 206, performing inter-frame difference processing on the full image frame sequence to obtain a full-image frame differential image for every two adjacent full image frames, thereby determining the full-image frame differential image sequence.
In the embodiments of the present disclosure, inter-frame difference refers to pixel-by-pixel difference processing of two adjacent image frames to obtain the frame differential image corresponding to them. The frame differential image characterizes the dynamic change of the two adjacent frames over time, and thus the behavior of the user to be detected while reading the target content. During living body detection, attention can then focus on behavioral anomalies caused by forgery, while attention to information such as image texture and structure is reduced; on the basis of a smaller data volume, this preserves detection accuracy and improves detection efficiency. Concretely, every two adjacent full image frames in the full image frame sequence are differenced pixel by pixel in time order to obtain their full-image frame differential image, yielding the full-image frame differential image sequence corresponding to the full image frame sequence.
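The pixel-by-pixel difference of adjacent frames could be sketched as follows (absolute difference is used here as one common choice; a signed difference would also fit the description):

```python
import numpy as np

def frame_difference_sequence(frames):
    """Return T-1 differential images for T input frames (time order kept)."""
    seq = [np.asarray(f, dtype=np.int16) for f in frames]  # avoid uint8 wrap
    return [np.abs(b - a).astype(np.uint8) for a, b in zip(seq, seq[1:])]
```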
Step 207, inputting the full-image frame differential image sequence into the first living body detection sub-model, which performs living body detection on the user to be detected based on the global area in each full image frame to obtain the first living body detection result for the user to be detected.
The full-image frame differential image sequence can thus be input into the first living body detection sub-model. Because the dynamic change information of the video has already been extracted into the differential images, the amount of information the first living body detection sub-model must attend to is reduced, so its network structure can be simplified and its parameter count reduced. On this basis of low deployment cost, the first living body detection sub-model can perform living body detection on the full-image frame differential image sequence and obtain the corresponding living body detection result.
In the embodiments of the present disclosure, after the full-image frame differential image sequence is obtained, feature extraction may be performed on it to obtain a feature representation adapted to the structure of the first living body detection sub-model. The feature extraction method may be chosen according to the task requirements and the deployment conditions of the first living body detection sub-model: a lightweight neural network with a small parameter count, such as ResNet (Residual Network), MobileNet, ShuffleNet, EffNet, ViT, or Swin Transformer, may be used to extract features from the full-image frame differential image sequence. In practice, when inference of the feature extraction model runs on a CPU, networks with lower computational requirements such as MobileNet, ShuffleNet, or EffNet can be used; when it runs on a GPU, other network structures with higher computational requirements can be used, improving feature extraction accuracy while satisfying the deployment conditions.
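As an illustration of such a lightweight feature extractor (MobileNetV3-Small is used here purely as an example of the small-parameter backbones named above; the 576-dimensional feature size follows from that choice and is not fixed by the disclosure):

```python
import torch.nn as nn
from torchvision import models

def build_backbone():
    """Map each differential image to a fixed-length feature vector."""
    net = models.mobilenet_v3_small(weights=None)
    net.classifier = nn.Identity()  # expose the pooled 576-d feature
    return net

# Usage sketch: flatten (B, T, 3, H, W) differential images to (B*T, 3, H, W),
# extract features, then regroup into a per-sequence tensor (B, T, 576).
```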
In an alternative method embodiment of the present disclosure, the first living body detection sub-model may include a first timing sub-network and first fully-connected sub-networks arranged in parallel, the number of first fully-connected sub-networks being equal to the number of full-image frame differential images in the full-image frame differential image sequence.
The first timing sub-network extracts dynamic behavior features over time, performing living body detection on the full-image frame differential image sequence as a whole based on the dynamic change information between the differential images; it may be any neural network suited to processing timing information, such as an RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory) network, or CNN (Convolutional Neural Network). Each first fully-connected sub-network performs living body detection on one full-image frame differential image based on the dynamic change information characterized by its pixel differences; on this basis, the number of first fully-connected sub-networks equals the number of differential images in the sequence. Through this parallel structure of the first timing sub-network and the first fully-connected sub-networks, the first living body detection sub-model performs combined whole-and-part living body detection on the full-image frame differential image sequence, further improving the full-image living body detection accuracy.
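A minimal PyTorch sketch of this parallel structure is given below: an LSTM as the timing sub-network and one small fully-connected head per differential image. The layer sizes, the LSTM choice, and the class name are illustrative assumptions; the disclosure allows RNN/LSTM/CNN timing sub-networks and does not fix any dimensions.

```python
import torch
import torch.nn as nn

class ParallelLivenessHead(nn.Module):
    """First timing sub-network in parallel with per-frame FC sub-networks."""

    def __init__(self, feat_dim=576, hidden=128, num_frames=8):
        super().__init__()
        self.timing = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.timing_cls = nn.Linear(hidden, 1)
        self.fc_heads = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(num_frames)]
        )

    def forward(self, x):
        # x: (B, T, feat_dim) features of T differential images
        _, (h, _) = self.timing(x)
        timing_score = self.timing_cls(h[-1])  # (B, 1) timing detection result
        fc_scores = [head(x[:, t]) for t, head in enumerate(self.fc_heads)]
        return timing_score, torch.cat(fc_scores, dim=1)  # (B, 1), (B, T)
```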
Step 207 may include steps A1 through A2 as follows.
Step A1, extracting, through the first timing sub-network, first timing features of the full-image frame differential image sequence, and performing living body detection on the user to be detected based on the first timing features to obtain a first timing detection result, where the first timing features characterize the change of each full-image frame differential image over time; and extracting, through each first fully-connected sub-network, the global image features of one full-image frame differential image, and performing living body detection on the user to be detected based on the global image features to obtain the first fully-connected detection result corresponding to each full-image frame differential image.
In the embodiments of the present disclosure, after the full-image frame differential image sequence is input into the first living body detection sub-model, the first timing sub-network performs living body detection on the sequence as a whole: by extracting the first timing features across the differential images, it detects how each full-image frame differential image changes over time so as to identify possible anomalous behavior of the user to be detected while reading the target content, and outputs the corresponding first timing detection result. Meanwhile, each first fully-connected sub-network performs living body detection on one whole full-image frame differential image: it extracts the global image features of that differential image, performs living body detection on the user to be detected according to those features, and outputs the corresponding first fully-connected detection result.
Step A2, fusing the first timing detection result and each first fully-connected detection result to obtain the first living body detection result corresponding to the full image frame sequence.
In the embodiments of the present disclosure, the first timing detection result and the first fully-connected detection results can be fused to obtain the first living body detection result. Fusing the detection results of the whole sequence with those of the individual image frames effectively improves living body detection accuracy, so that a stable detection rate is maintained in the face of different forgery methods and the versatility and robustness of living body detection are improved. The fusion method can be chosen according to subsequent task requirements, computing conditions, and so on: for example, the maximum or the mean of the first timing detection result and all first fully-connected detection results can be taken, or a fully-connected network can classify further based on all of them and output a single first living body detection result.
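The mean/maximum fusion options named above might look as follows (a sketch; the score shapes match the ParallelLivenessHead example above, and the fully-connected-classifier variant is omitted):

```python
import torch

def fuse_results(timing_score, fc_scores, mode="mean"):
    """Fuse the timing result with the per-frame fully-connected results.

    timing_score: (B, 1); fc_scores: (B, T); returns one score per sample.
    """
    all_scores = torch.cat([timing_score, fc_scores], dim=1)  # (B, 1 + T)
    if mode == "max":
        return all_scores.max(dim=1).values
    return all_scores.mean(dim=1)  # the first living body detection result
```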
Step 208, performing inter-frame difference processing on the face image frame sequence to obtain a face frame differential image for every two adjacent face image frames, thereby determining the face frame differential image sequence.
Step 209, inputting the face frame differential image sequence into the second living body detection sub-model, which performs living body detection on the user to be detected based on the local face area in each face image frame to obtain the second living body detection result for the user to be detected.
In the embodiments of the present disclosure, steps 208 to 209 correspond to the description of the full image frame sequence in steps 206 to 207; to avoid repetition, they are not described again here.
In an alternative method embodiment of the present disclosure, the second living body detection sub-model may include a second timing sub-network and second fully-connected sub-networks arranged in parallel, the number of second fully-connected sub-networks being equal to the number of face frame differential images in the face frame differential image sequence. Step 209 may include the following steps B1 to B2.
Step B1, extracting, through the second timing sub-network, second timing features of the face frame differential image sequence, and performing living body detection on the user to be detected based on the second timing features to obtain a second timing detection result, where the second timing features characterize the change of each face frame differential image over time; and extracting, through each second fully-connected sub-network, the face image features of one face frame differential image, and performing living body detection on the user to be detected based on the face image features to obtain the second fully-connected detection result corresponding to each face frame differential image.
In the embodiments of the present disclosure, after the face frame differential image sequence is input into the second living body detection sub-model, the second timing sub-network performs living body detection on the sequence as a whole: by extracting the second timing features across the face frame differential images, it detects how each face frame differential image changes over time so as to identify possible anomalous behavior of the user to be detected while reading the target content, and outputs the corresponding second timing detection result. Meanwhile, each second fully-connected sub-network performs living body detection on one whole face frame differential image: it extracts the face image features of that differential image, performs living body detection on the user to be detected according to those features, and outputs the corresponding second fully-connected detection result.
Step B2, fusing the second timing detection result and each second fully-connected detection result to obtain the second living body detection result corresponding to the face image frame sequence.
In the embodiments of the present disclosure, step B2 corresponds to the description in step A2 and is not repeated here.
Step 210, performing fusion processing on the first living body detection result and the second living body detection result to obtain the target detection result of the user to be detected, where the fusion processing comprises taking the mean or the maximum, or performing classification detection based on the first living body detection result and the second living body detection result.
In the embodiments of the present disclosure, the first living body detection result and the second living body detection result can be fused to obtain the target detection result corresponding to the video. Fusing the detection results of the global area and the local face area in the video further improves the living body detection accuracy for the video, and thereby the versatility and robustness of living body detection. The fusion method can be chosen according to subsequent task requirements, computing conditions, and so on: for example, the maximum or the mean of the first and second living body detection results can be taken, or a fully-connected network can classify further based on both results and output a single target detection result.
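In the simplest two of those fusion options, the target detection result reduces to an elementary combination of the two sub-model scores, sketched here in plain Python (the fully-connected-classifier option would instead feed both scores to a small trained network):

```python
def fuse_target_result(score_full, score_face, mode="mean"):
    """Combine the full-image and face sub-model scores into one result."""
    if mode == "max":
        return max(score_full, score_face)
    return 0.5 * (score_full + score_face)  # mean of the two detection results
```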
FIG. 3 is a schematic structural diagram of a living body detection model provided by an embodiment of the present disclosure. Depending on the training data, the living body detection model may be the first living body detection sub-model when trained with a full image frame sample sequence, or the second living body detection sub-model when trained with a face image frame sample sequence. As shown in FIG. 3, the living body detection model may include an inter-frame difference processing module 301, a feature extraction module 302, a living body detection module 303, and a result fusion module 304, where the feature extraction module 302 may consist of a residual neural network, and the living body detection module 303 may include one timing sub-network 3031 and m fully-connected sub-networks 3032, the timing sub-network consisting of a recurrent neural network. Taking the case where the living body detection model is the first living body detection sub-model, the image frame sequence X0 is a full image frame sequence: the full image frame sequence X0 extracted from the original video is input into the first living body detection sub-model, and living body detection proceeds as in steps C1 to C4 below.
Step C1, the inter-frame difference processing module 301 performs inter-frame difference processing on the full image frame sequence X0 to obtain the full-image frame differential image sequence X1, which contains n full-image frame differential images, where m ≥ n;
Step C2, the feature extraction module 302 converts the full-image frame differential image sequence X1 into a high-dimensional feature sequence X2;
Step C3, the living body detection module 303 performs living body detection based on the converted high-dimensional feature sequence X2: the timing sub-network 3031 performs living body detection based on the timing information of the high-dimensional feature sequence X2 to obtain a first timing detection result, and each fully-connected sub-network 3032 performs living body detection on one full-image frame differential image to obtain the first fully-connected detection results;
Step C4, the result fusion module 304 fuses the first timing detection result and the first fully-connected detection results to obtain the first living body detection result corresponding to the full image frame sequence X0.
Alternatively, when the living body detection model is the second living body detection sub-model, the image frame sequence X0 is a face image frame sequence: the face image frame sequence X0 is input into the second living body detection sub-model, and living body detection proceeds as in steps D1 to D4 below.
Step D1, the inter-frame difference processing module 301 performs inter-frame difference processing on the face image frame sequence X0 to obtain the face frame differential image sequence X1, which contains n face frame differential images, where m ≥ n;
Step D2, the feature extraction module 302 converts the face frame differential image sequence X1 into a high-dimensional feature sequence X2;
Step D3, the living body detection module 303 performs living body detection based on the converted high-dimensional feature sequence X2: the timing sub-network 3031 performs living body detection based on the timing information of the high-dimensional feature sequence X2 to obtain a second timing detection result, and each fully-connected sub-network 3032 performs living body detection on one face frame differential image to obtain the second fully-connected detection results;
Step D4, the result fusion module 304 fuses the second timing detection result and the second fully-connected detection results to obtain the second living body detection result corresponding to the face image frame sequence X0.
Further, the first living body detection result and the second living body detection result can be fused to obtain the target detection result corresponding to the user to be detected.
In the living body detection method provided by the present disclosure, a full image frame sequence and a face image frame sequence captured while the user to be detected reads target content are acquired, where each full image frame contains the global area and each face image frame contains the local face area during the reading of the target content; the full image frame sequence is input into the first living body detection sub-model and the face image frame sequence into the second living body detection sub-model to obtain the first and second living body detection results respectively, and the target detection result for the user to be detected is obtained based on the two results. In the embodiments of the present disclosure, living body detection is performed on frame sequences of both the full image and the face image, so the global image area and the local face area are examined comprehensively and fully: the resulting target detection result fully represents the image information, the classification accuracy of living body detection is effectively improved, detection combines global and local information across different forgery methods, and the generalization and stability of living body detection are ensured.
FIG. 4 is a flowchart of steps of a training method for a living body detection model provided by an embodiment of the present disclosure; the living body detection model may include the first living body detection sub-model and the second living body detection sub-model and may be applied to the living body detection methods of FIG. 1 and FIG. 2. As shown in FIG. 4, the training method may include the following steps 401 to 403.
Step 401, acquiring a full image frame sample sequence and a face image frame sample sequence.
In the embodiments of the present disclosure, the full image frame sample sequence and the face image frame sample sequence may correspond to the process of any user reading target content, where that user may be a real user or a forged user. Full image frame and face image frame sample sequences captured while a real user reads the target content yield positive samples; sequences captured from a forgery of a real user reading the target content yield negative samples, so image frame sample sequences corresponding to different forgery types and to real users can be obtained. The manner of collecting the full image frame sample sequences and face image frame sample sequences corresponds to the description of step 101 and is not repeated here.
Forgeries of a real user may include deep forgery such as face fusion and tampering, motion-driving of a static image based on a facial action video of the real user reading the target content, image splicing, and the like. Negative samples of physical data can also be collected under conditions such as T-shaped masks, three-dimensional head models, screen replay, paper masks, and occlusion. During collection of the full image frame sample sequences and face image frame sample sequences, different illumination conditions and backgrounds can be used and image frame sequences of users with different identity information collected; to balance model performance and training efficiency, the number of collected sequences can be determined according to the computing conditions. In the embodiments of the present disclosure, the forgery types of the negative samples can be chosen according to computational cost and task requirements, and the sample sequences can be labeled according to the collection method or data type during collection so as to distinguish positive from negative samples.
In an alternative method embodiment of the disclosure, augmentation processing may be performed on the full image frame sample sequence, the face image frame sample sequence and the like: for example, image quality compression may be applied, noise or blur may be added, or image brightness, contrast and chromaticity may be adjusted, so as to expand the number and variety of samples and improve the detection accuracy and versatility of the model.
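The augmentation operations just listed can be sketched compactly; the following Python sketch assumes frames are loaded as PIL images, and the parameter ranges (JPEG quality, noise scale, jitter factors) are illustrative assumptions.

```python
import io
import random

import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def augment_frame(frame: Image.Image) -> Image.Image:
    # Image quality compression: re-encode as JPEG at a random quality.
    buf = io.BytesIO()
    frame.save(buf, format="JPEG", quality=random.randint(30, 90))
    buf.seek(0)
    frame = Image.open(buf).convert("RGB")

    # Additive Gaussian noise.
    arr = np.asarray(frame, dtype=np.float32)
    arr += np.random.normal(0.0, 5.0, arr.shape)
    frame = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    # Mild blur plus brightness / contrast / chromaticity jitter.
    frame = frame.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 1.5)))
    frame = ImageEnhance.Brightness(frame).enhance(random.uniform(0.8, 1.2))
    frame = ImageEnhance.Contrast(frame).enhance(random.uniform(0.8, 1.2))
    frame = ImageEnhance.Color(frame).enhance(random.uniform(0.8, 1.2))
    return frame
```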
In the embodiment of the disclosure, the initial first living body detection sub-model and the initial second living body detection sub-model may be constructed according to task requirements and operating conditions. Because the full image frame sample sequence has undergone frame extraction, partial information is used to represent the whole, and because the face image frame sample sequence has undergone face centering, it carries only local information of the face area; the corresponding initial first living body detection sub-model and initial second living body detection sub-model can therefore adopt simple structures, reducing the number of model parameters without affecting model performance and effectively balancing detection accuracy against training and deployment cost. The initial first living body detection sub-model and the initial second living body detection sub-model may have the same structure; since the training data input to them differ, different first and second living body detection sub-models are obtained through training.
Step 402, inputting the full image frame sample sequence into an initial first living body detection sub-model, obtaining a third living body detection result output by the initial first living body detection sub-model, and iterating the initial first living body detection sub-model based on the third living body detection result to obtain the first living body detection sub-model.
In the embodiment of the disclosure, the full image frame sample sequence may be input into the initial first living body detection sub-model so that the initial first living body detection sub-model performs living body detection on the sequence, obtaining a third living body detection result corresponding to the sample sequence, where the third living body detection result indicates whether the user corresponding to the full image frame sample sequence is a real user or a forged user. On this basis, the initial first living body detection sub-model can be iterated, based on the third living body detection result and the label assigned to the full image frame sample sequence during collection, until a convergence condition is met, thereby obtaining the first living body detection sub-model. The convergence condition may be that the loss value of the third living body detection result relative to the label falls within a preset numerical range, or that the number of iterations reaches a preset count, so that model precision and generalization meet task requirements; the embodiment of the disclosure does not particularly limit this.
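Purely for illustration, the iterate-until-convergence procedure above can be sketched as a short PyTorch loop; the model, data loader, optimizer choice, loss threshold and iteration cap are all assumptions rather than values fixed by this disclosure.

```python
import torch
import torch.nn as nn

def train_submodel(model, loader, loss_threshold=1e-3, max_iters=10_000):
    # Binary labels: 1 for a real user, 0 for a forged user.
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    step = 0
    while step < max_iters:                   # iteration cap (assumed value)
        for diff_seq, label in loader:        # diff_seq: (B, T, ...), label: (B,)
            logits = model(diff_seq).squeeze(-1)
            loss = criterion(logits, label.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            # Convergence: loss within the preset range, or iteration cap reached.
            if loss.item() < loss_threshold or step >= max_iters:
                return model
    return model
```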
In an alternative method embodiment of the present disclosure, the initial first living body detection sub-model includes a third time sequence sub-network and third fully connected sub-networks arranged in parallel, where the number of third fully connected sub-networks is equal to the number of full image frame samples in the full image frame sample sequence.
Step 402 may include steps E1 through E3 as follows.
Step E1: perform inter-frame difference processing on the full image frame sample sequence to obtain a corresponding full-frame differential image sample sequence.
In the embodiment of the disclosure, the inter-frame difference processing performed on the full image frame sample sequence may correspond to the related description of the inter-frame difference processing performed on the full image frame sequence in the foregoing step 206. While preserving the dynamic change information between the full image frame samples, inter-frame differencing reduces interference from background information to improve detection precision, and reduces the data volume, which further simplifies the model structure, cuts the number of parameters, improves model training efficiency and lowers model deployment cost.
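Concretely, inter-frame differencing is a subtraction of each pair of adjacent frames; the sketch below assumes the sample sequence has been stacked into a numpy array.

```python
import numpy as np

def frame_difference(frames: np.ndarray) -> np.ndarray:
    """frames: (N, H, W, C) uint8 -> (N-1, H, W, C) float32 differential images."""
    f = frames.astype(np.float32)   # float conversion avoids uint8 wrap-around
    return f[1:] - f[:-1]           # each output is frame[i+1] - frame[i]
```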
Step E2: input the full-frame differential image sample sequence into the initial first living body detection sub-model, so that the third time sequence sub-network performs living body detection on the full-frame differential image sample sequence to obtain a third time sequence detection result, and each third fully connected sub-network performs living body detection on one second full-frame differential image to obtain a third fully connected detection result corresponding to that second full-frame differential image.
In the embodiment of the disclosure, the initial first living body detection sub-model may include a third time sequence sub-network and third fully connected sub-networks whose number matches the number of second full-frame differential images in the full-frame differential image sample sequence. The third time sequence sub-network can perform living body detection based on the overall time sequence information of the sequence, while each third fully connected sub-network can perform living body detection on one second full-frame differential image based on information such as image structure; a third time sequence detection result for the whole full-frame differential image sample sequence and a third fully connected detection result corresponding to each second full-frame differential image are thereby obtained.
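One way such a parallel time-sequence-plus-per-frame structure could be realized is sketched below; the GRU, the hidden size, the flattened per-frame input and the mean fusion are illustrative assumptions, not the structure prescribed by the disclosure.

```python
import torch
import torch.nn as nn

class ParallelLivenessModel(nn.Module):
    """One temporal sub-network over the whole differential sequence plus one
    fully connected head per differential image, fused by averaging (sketch)."""

    def __init__(self, num_diff_frames: int, feat_dim: int):
        super().__init__()
        # Temporal sub-network: scores the sequence as a whole.
        self.temporal = nn.GRU(feat_dim, 128, batch_first=True)
        self.temporal_head = nn.Linear(128, 1)
        # One fully connected head per differential image.
        self.frame_heads = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(num_diff_frames)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, feat_dim); each row is a flattened differential image.
        _, h = self.temporal(x)                    # h: (1, B, 128)
        timing_logit = self.temporal_head(h[-1])   # sequence-level score, (B, 1)
        frame_logits = torch.stack(
            [head(x[:, t]) for t, head in enumerate(self.frame_heads)], dim=1
        )                                          # per-frame scores, (B, T, 1)
        # Fuse the sequence-level score with the averaged per-frame scores.
        return (timing_logit + frame_logits.mean(dim=1)) / 2
```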
Step E3: fuse the third time sequence detection result and the third fully connected detection results to obtain a third living body detection result corresponding to the full image frame sample sequence.
Further, the third time sequence detection result of the whole full-frame differential image sample sequence may be fused with the third fully connected detection results corresponding to the respective second full-frame differential images; for specifics, refer to the description of fusing the first time sequence detection result and the first fully connected detection results in step A2, which is not repeated here to avoid duplication.
Step 403, inputting the human face image frame sample sequence into an initial second living body detection sub-model, obtaining a fourth living body detection result output by the initial second living body detection sub-model, and iterating the initial second living body detection sub-model based on the fourth living body detection result to obtain a second living body detection sub-model.
In the embodiment of the present disclosure, the training of the initial second living body detection sub-model with the face image frame sample sequence in step 403 may correspond to the description of training the initial first living body detection sub-model with the full image frame sample sequence in step 402, and is not repeated here.
In an alternative method embodiment of the present disclosure, the initial second living body detection sub-model includes a fourth time sequence sub-network and fourth fully connected sub-networks arranged in parallel, where the number of fourth fully connected sub-networks is equal to the number of face image frame samples in the face image frame sample sequence. Step 403 may include the following steps F1 to F3.
Step F1: perform inter-frame difference processing on the face image frame sample sequence to obtain a corresponding face frame differential image sample sequence.
In the embodiment of the disclosure, the inter-frame difference processing performed on the face image frame sample sequence may correspond to the related description of the inter-frame difference processing performed on the full image frame sequence in the foregoing step 205, and is not repeated here to avoid duplication.
Step F2: input the face frame differential image sample sequence into the initial second living body detection sub-model, so that the fourth time sequence sub-network performs living body detection on the face frame differential image sample sequence to obtain a fourth time sequence detection result, and each fourth fully connected sub-network performs living body detection on one second face frame differential image to obtain a fourth fully connected detection result corresponding to that second face frame differential image.
In the embodiment of the disclosure, the initial second living body detection sub-model may include a fourth time sequence sub-network and fourth fully connected sub-networks whose number matches the number of second face frame differential images in the face frame differential image sample sequence. The fourth time sequence sub-network may perform living body detection based on the overall time sequence information of the sequence, and each fourth fully connected sub-network may perform living body detection on one second face frame differential image based on information such as image structure, so as to obtain a fourth time sequence detection result for the whole face frame differential image sample sequence and a fourth fully connected detection result corresponding to each second face frame differential image.
Step F3: fuse the fourth time sequence detection result and the fourth fully connected detection results to obtain a fourth living body detection result corresponding to the face image frame sample sequence.
Further, the fourth time sequence detection result of the whole face frame differential image sample sequence may be fused with the fourth fully connected detection results corresponding to the respective second face frame differential images; for specifics, refer to the description of fusing the first time sequence detection result and the first fully connected detection results in step A2, which is not repeated here to avoid duplication.
In the embodiment of the disclosure, the first living body detection sub-model and the second living body detection sub-model are trained independently of each other, so that the two models can each reach the expected classification performance on their respective inputs, avoiding the inter-model dependence that joint training might introduce. In subsequent living body detection tasks, the detection results output by the first living body detection sub-model and the second living body detection sub-model are fused, making full use of the local face information and the full-image structure information, which effectively improves classification precision and enhances the generalization and robustness of the models.
Fig. 5 is a block diagram showing the structure of a living body detection apparatus 500 according to an embodiment of the present disclosure. As shown in Fig. 5, the apparatus may include: a sequence acquisition module 501, configured to acquire a full image frame sequence and a face image frame sequence when a user to be detected reads target content, where each full image frame in the full image frame sequence contains the global area while the user to be detected reads the target content, and each face image frame in the face image frame sequence contains the local face area during the reading process; a full-image detection module 502, configured to input the full image frame sequence into a first living body detection sub-model and perform living body detection on the user to be detected based on the global area in each full image frame through the first living body detection sub-model, obtaining a first living body detection result for the user to be detected; a face detection module 503, configured to input the face image frame sequence into a second living body detection sub-model and perform living body detection on the user to be detected based on the local face area in each face image frame through the second living body detection sub-model, obtaining a second living body detection result for the user to be detected; and a result determining module 504, configured to determine a target detection result of the user to be detected based on the first living body detection result and the second living body detection result.
In an optional device embodiment of the present disclosure, the sequence acquisition module 501 is specifically configured to perform inter-frame difference processing on the full image frame sequence, obtaining a full-image frame differential image for any two adjacent full image frames in the sequence and thereby determining a full-image frame differential image sequence; correspondingly, the full-image detection module 502 is specifically configured to input the full-image frame differential image sequence into the first living body detection sub-model.
In an optional device embodiment of the present disclosure, the sequence acquisition module 501 is specifically configured to perform inter-frame difference processing on the face image frame sequence, obtaining a face frame differential image for any two adjacent face image frames in the sequence and thereby determining a face frame differential image sequence; correspondingly, the face detection module 503 is specifically configured to input the face frame differential image sequence into the second living body detection sub-model.
In an optional device embodiment of the present disclosure, the first living body detection sub-model includes a first time sequence sub-network and first fully connected sub-networks, where the number of first fully connected sub-networks is equal to the number of full-image frame differential images in the full-image frame differential image sequence. The full-image detection module 502 is specifically configured to: extract a first time sequence feature of the full-image frame differential image sequence through the first time sequence sub-network and perform living body detection on the user to be detected based on the first time sequence feature to obtain a first time sequence detection result, where the first time sequence feature characterizes the change of the full-image frame differential images over time; extract, through each first fully connected sub-network, the global image feature of one full-image frame differential image and perform living body detection on the user to be detected based on that feature, obtaining a first fully connected detection result corresponding to each full-image frame differential image; and fuse the first time sequence detection result and each first fully connected detection result to obtain a first living body detection result corresponding to the full image frame sequence.
In an optional device embodiment of the present disclosure, the second living body detection sub-model includes a second time sequence sub-network and second fully connected sub-networks, where the number of second fully connected sub-networks is equal to the number of face frame differential images in the face frame differential image sequence. The face detection module 503 is specifically configured to: extract a second time sequence feature of the face frame differential image sequence through the second time sequence sub-network and perform living body detection on the user to be detected based on the second time sequence feature to obtain a second time sequence detection result, where the second time sequence feature characterizes the change of the face frame differential images over time; extract, through each second fully connected sub-network, the face image feature of one face frame differential image and perform living body detection on the user to be detected based on that feature, obtaining a second fully connected detection result corresponding to each face frame differential image; and fuse the second time sequence detection result and each second fully connected detection result to obtain a second living body detection result corresponding to the face image frame sequence.
In an optional device embodiment of the present disclosure, the sequence acquisition module 501 is specifically configured to: obtain an original video of the user to be detected reading the target content; perform frame extraction on the original video to obtain the full image frame sequence; determine the face key points in each full image frame; average the face key points of each full image frame to obtain registration key points for each full image frame; and, based on the registration key points, center the local face area in each full image frame to obtain the corresponding face image frame, so as to determine the face image frame sequence.
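A hedged sketch of the mean-of-keypoints registration and centering crop follows; the 224-pixel window and the assumption that keypoints arrive as (x, y) pixel coordinates are illustrative, not values fixed by the disclosure.

```python
import numpy as np

def center_face(frame: np.ndarray, keypoints: np.ndarray, size: int = 224) -> np.ndarray:
    """frame: (H, W, C); keypoints: (K, 2) as (x, y). Returns a face-centered crop."""
    cx, cy = keypoints.mean(axis=0)          # registration key point (mean of key points)
    h, w = frame.shape[:2]
    half = size // 2
    # Clamp the window so the crop stays inside the frame.
    x0 = int(np.clip(cx - half, 0, max(w - size, 0)))
    y0 = int(np.clip(cy - half, 0, max(h - size, 0)))
    return frame[y0:y0 + size, x0:x0 + size]
```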
In an optional device embodiment of the present disclosure, the result determining module 504 is specifically configured to perform fusion processing on the first living body detection result and the second living body detection result to obtain the target detection result of the user to be detected; the fusion processing includes taking a mean value or a maximum value, or performing classification detection based on the first living body detection result and the second living body detection result.
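Purely as an illustration of the three fusion options just listed (mean, maximum, or classification over the two scores), a hedged Python sketch follows; the secondary linear classifier is an assumption, not a structure fixed by the disclosure.

```python
from typing import Optional

import torch
import torch.nn as nn

def fuse_results(score_full: torch.Tensor, score_face: torch.Tensor,
                 mode: str = "mean",
                 classifier: Optional[nn.Module] = None) -> torch.Tensor:
    if mode == "mean":
        return (score_full + score_face) / 2
    if mode == "max":
        return torch.maximum(score_full, score_face)
    if mode == "classify" and classifier is not None:
        # e.g. classifier = nn.Linear(2, 1) trained on the stacked scores
        return classifier(torch.stack([score_full, score_face], dim=-1))
    raise ValueError(f"unsupported fusion mode: {mode}")
```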
In the living body detection device provided by the disclosure, a full image frame sequence and a face image frame sequence captured while a user to be detected reads target content are acquired, where each full image frame in the full image frame sequence contains the global area while the user to be detected reads the target content, and each face image frame in the face image frame sequence contains the local face area during the reading process; the full image frame sequence is input into the first living body detection sub-model of the living body detection model and the face image frame sequence into the second living body detection sub-model, a first living body detection result and a second living body detection result are respectively obtained, and a target detection result for the user to be detected is obtained based on the two results. In the embodiment of the disclosure, living body detection is realized based on the full-image and face-image frame sequences, which reduces the computation of model training and avoids the complicated conversion of image frequency-domain information; and because the global image area, the local face area and the like are detected comprehensively and fully, the obtained target detection result fully represents the image information, the classification precision of living body detection is effectively improved, and since global information and local information are combined even under different forgery modes, the generalization and robustness of forgery detection are also ensured.
Fig. 6 is a block diagram of a device for training a living body detection model according to an embodiment of the present disclosure; the living body detection model obtained by training with the device may include a first living body detection sub-model and a second living body detection sub-model, and may be applied by the living body detection apparatus of Fig. 5. As shown in Fig. 6, the device may include: a sample sequence acquisition module 601, configured to acquire a full image frame sample sequence and a face image frame sample sequence; a first model training module 602, configured to input the full image frame sample sequence into an initial first living body detection sub-model, obtain a third living body detection result output by the initial first living body detection sub-model, and iterate the initial first living body detection sub-model based on the third living body detection result to obtain the first living body detection sub-model; and a second model training module 603, configured to input the face image frame sample sequence into an initial second living body detection sub-model, obtain a fourth living body detection result output by the initial second living body detection sub-model, and iterate the initial second living body detection sub-model based on the fourth living body detection result to obtain the second living body detection sub-model.
In the embodiment of the disclosure, the first living body detection sub-model and the second living body detection sub-model are trained independently of each other, so that the two models can each reach the expected classification performance on their respective inputs, avoiding the inter-model dependence that joint training might introduce. In subsequent living body detection tasks, the detection results output by the first living body detection sub-model and the second living body detection sub-model are fused, making full use of the local face information and the full-image structure information, which effectively improves classification precision and enhances the generalization and robustness of the models.
Fig. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present disclosure. As shown in Fig. 7, the electronic device 700 may include a processor 701, a memory 702, and a program or instruction stored in the memory 702 and executable on the processor 701; when executed by the processor 701, the program or instruction implements each process of the living body detection method embodiments above and can achieve the same technical effects, which are not repeated here to avoid duplication.
It should be noted that, the electronic device 700 shown in fig. 7 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
Fig. 8 is a hardware schematic diagram of an electronic device 800 according to an embodiment of the present disclosure. As shown in Fig. 8, the electronic device 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a ROM (Read-Only Memory) 802 or a program loaded from a storage section 808 into a RAM (Random Access Memory) 803. Various programs and data required for system operation are also stored in the RAM 803. The CPU 801, the ROM 802 and the RAM 803 are connected to one another by a bus 804. An I/O (Input/Output) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse and the like; an output section 807 including a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) display, a speaker and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the various functions defined in the system of the present application are performed.
The embodiment of the present disclosure further provides a readable storage medium on which a program or instruction is stored; when executed by a processor, the program or instruction implements each process of the living body detection method embodiments and can achieve the same technical effects, which are not repeated here to avoid duplication.
The processor is a processor in the electronic device in the above embodiment. A readable storage medium includes a computer readable storage medium such as ROM, RAM, magnetic or optical disk, etc.
The embodiment of the disclosure further provides a chip including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instructions to implement each process of the living body detection method embodiments and achieve the same technical effects; details are not repeated here to avoid duplication.
It should be understood that the chips referred to in the embodiments of the present disclosure may also be referred to as system-level chips, chip systems or system-on-chip chips.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises that element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present disclosure is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed in a substantially simultaneous manner or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and including several instructions for causing a terminal (which may be a mobile phone, a computer, an electronic device, an air conditioner, a network device or the like) to perform the methods of the embodiments of the present disclosure.
The embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is not limited to the above-described embodiments, which are merely illustrative and not restrictive; many variations may be made by those of ordinary skill in the art without departing from the spirit of the disclosure and the scope of the claims, all of which fall within the protection of the present disclosure.

Claims (11)

1. A method of in vivo detection, the method comprising:
acquiring a full image frame sequence and a human face image frame sequence when a user to be detected reads target content, wherein each full image frame in the full image frame sequence comprises a global area when the user to be detected reads the target content, and each human face image frame in the human face image frame sequence comprises a human face local area in the process of the user to be detected reading the target content;
inputting the full image frame sequence into a first living body detection sub-model, and carrying out living body detection on the user to be detected based on a global area in each full image frame through the first living body detection sub-model to obtain a first living body detection result aiming at the user to be detected; inputting the human face image frame sequence into a second living body detection sub-model, and carrying out living body detection on the user to be detected based on the human face local area in each human face image frame through the second living body detection sub-model to obtain a second living body detection result aiming at the user to be detected;
and determining a target detection result of the user to be detected based on the first living body detection result and the second living body detection result.
2. The method of claim 1, wherein before inputting the full image frame sequence into the first living body detection sub-model, the method further comprises:
performing inter-frame difference processing on the full image frame sequence to obtain full image frame differential images of any two adjacent full image frames in the full image frame sequence, and determining the full image frame differential image sequence;
accordingly, inputting the full image frame sequence into a first living body detection sub-model comprises:
the full frame differential image sequence is input to the first living body sub-model.
3. The method according to claim 1 or 2, wherein before said inputting said face image frame sequence into a second living body detection sub-model, the method further comprises:
performing inter-frame difference processing on the face image frame sequence to obtain face frame difference images of any two adjacent face image frames in the face image frame sequence, and determining a face frame difference image sequence;
correspondingly, inputting the sequence of facial image frames into a second living body detection sub-model, comprising:
and inputting the human face frame differential image sequence into the second living body detection sub-model.
4. The method according to claim 2, wherein the first living body detection sub-model includes a first time sequence sub-network and first fully connected sub-networks which are arranged in parallel, the number of the first fully connected sub-networks being equal to the number of the full-frame differential images in the full-frame differential image sequence, and the performing living body detection on the user to be detected based on the global area in each full image frame by the first living body detection sub-model to obtain a first living body detection result for the user to be detected includes:
extracting first time sequence features of the full-frame differential image sequence through the first time sequence sub-network, and performing living body detection on the user to be detected based on the first time sequence features to obtain a first time sequence detection result, wherein the first time sequence features represent the change of each full-frame differential image in time sequence; extracting global image features of one full-image frame differential image through each first fully connected sub-network respectively, and performing living body detection on the user to be detected based on the global image features to obtain first fully connected detection results corresponding to each full-image frame differential image respectively;
and fusing the first time sequence detection result and each first fully connected detection result to obtain the first living body detection result corresponding to the full image frame sequence.
5. The method according to claim 3, wherein the second living body detection sub-model includes a second time sequence sub-network and second fully connected sub-networks, the number of the second fully connected sub-networks being equal to the number of the face frame differential images in the face frame differential image sequence, and the performing living body detection on the user to be detected based on the local face area in each face image frame by the second living body detection sub-model to obtain a second living body detection result for the user to be detected includes:
extracting a second time sequence feature of the face frame differential image sequence through the second time sequence sub-network, and performing living body detection on the user to be detected based on the second time sequence feature to obtain a second time sequence detection result, wherein the second time sequence feature represents the change of each face frame differential image in time sequence; extracting face image features of one face frame differential image through each second fully connected sub-network respectively, and performing living body detection on the user to be detected based on the face image features to obtain second fully connected detection results corresponding to each face frame differential image respectively;
and fusing the second time sequence detection result and each second fully connected detection result to obtain the second living body detection result corresponding to the face image frame sequence.
6. The method according to claim 1, wherein the acquiring the full image frame sequence and the face image frame sequence when the user to be detected reads the target content includes:
acquiring an original video of the user to be detected reading the target content;
performing frame extraction processing on the original video to obtain a full-image frame sequence;
determining face key points in each full image frame respectively;
respectively carrying out mean value processing on the face key points of each full image frame to obtain registration key points of each full image frame;
and based on the registration key points, carrying out centering processing on the partial areas of the human face in each full image frame to obtain corresponding human face image frames so as to determine a human face image frame sequence.
7. The method of claim 1, wherein the determining the target detection result of the user to be detected based on the first living body detection result and the second living body detection result comprises:
performing fusion processing on the first living body detection result and the second living body detection result to obtain the target detection result of the user to be detected, wherein the fusion processing comprises taking a mean value or a maximum value, or performing classification detection based on the first living body detection result and the second living body detection result.
8. The method of claim 1, wherein the first living body detection sub-model and the second living body detection sub-model are trained as follows:
acquiring a full-image frame sample sequence and a face image frame sample sequence;
inputting the full image frame sample sequence into an initial first living body detection sub-model, obtaining a third living body detection result output by the initial first living body detection sub-model, and iterating the initial first living body detection sub-model based on the third living body detection result to obtain the first living body detection sub-model; and
inputting the face image frame sample sequence into an initial second living body detection sub-model, obtaining a fourth living body detection result output by the initial second living body detection sub-model, and iterating the initial second living body detection sub-model based on the fourth living body detection result to obtain the second living body detection sub-model.
9. An electronic device, the electronic device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the in-vivo detection method of any one of claims 1-8 via execution of the executable instructions.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the living body detection method according to any of claims 1-8.
11. A computer program product, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the living body detection method according to any of claims 1-8.
CN202310143646.4A 2023-02-10 2023-02-10 Living body detection method, electronic device and storage medium Pending CN116206373A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310143646.4A CN116206373A (en) 2023-02-10 2023-02-10 Living body detection method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310143646.4A CN116206373A (en) 2023-02-10 2023-02-10 Living body detection method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN116206373A true CN116206373A (en) 2023-06-02

Family

ID=86514303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310143646.4A Pending CN116206373A (en) 2023-02-10 2023-02-10 Living body detection method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116206373A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116403294A (en) * 2023-06-08 2023-07-07 华南理工大学 Transformer-based multi-view width learning living body detection method, medium and equipment
CN116403294B (en) * 2023-06-08 2023-10-27 华南理工大学 Transformer-based multi-view width learning living body detection method, medium and equipment

Similar Documents

Publication Publication Date Title
WO2021179471A1 (en) Face blur detection method and apparatus, computer device and storage medium
CN107316029B (en) A kind of living body verification method and equipment
CN105844206A (en) Identity authentication method and identity authentication device
CN109919754A (en) A kind of data capture method, device, terminal and storage medium
CN109389098B (en) Verification method and system based on lip language identification
CN113095156B (en) Double-current network signature identification method and device based on inverse gray scale mode
CN111222433B (en) Automatic face auditing method, system, equipment and readable storage medium
CN113642639B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN110633655A (en) Attention-attack face recognition attack algorithm
CN116206373A (en) Living body detection method, electronic device and storage medium
CN115240280A (en) Construction method of human face living body detection classification model, detection classification method and device
CN113870254B (en) Target object detection method and device, electronic equipment and storage medium
CN113076860B (en) Bird detection system under field scene
CN112001785A (en) Network credit fraud identification method and system based on image identification
CN108197593B (en) Multi-size facial expression recognition method and device based on three-point positioning method
CN116228644A (en) Image detection method, electronic device and storage medium
CN114005184A (en) Handwritten signature authenticity identification method and device based on small amount of samples
CN111310528B (en) Image detection method, identity verification method, payment method and payment device
CN111126283A (en) Rapid in-vivo detection method and system for automatically filtering fuzzy human face
CN111414895A (en) Face recognition method and device and storage equipment
CN111428670A (en) Face detection method, face detection device, storage medium and equipment
CN111325185A (en) Face fraud prevention method and system
CN113255472B (en) Face quality evaluation method and system based on random embedding stability
CN115953819B (en) Training method, device, equipment and storage medium of face recognition model
Yang et al. Deepfake detection based on no-reference image quality assessment (nr-iqa)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination