CN113486829B - Face living body detection method and device, electronic equipment and storage medium

Face living body detection method and device, electronic equipment and storage medium

Info

Publication number
CN113486829B
CN113486829B (application CN202110793445.XA)
Authority
CN
China
Prior art keywords
face
living body
image sequences
facial image
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110793445.XA
Other languages
Chinese (zh)
Other versions
CN113486829A (en)
Inventor
俞颖超
周秋生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202110793445.XA priority Critical patent/CN113486829B/en
Publication of CN113486829A publication Critical patent/CN113486829A/en
Application granted granted Critical
Publication of CN113486829B publication Critical patent/CN113486829B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a face living body detection method and apparatus, an electronic device and a storage medium. The specific scheme comprises the following steps: acquiring M frames of face images in a face video, and respectively carrying out face alignment on the M frames of face images to obtain M frames of facial images, M being a positive integer greater than 1; recombining the M frames of facial images to obtain a plurality of groups of facial image sequences, each group of facial image sequences comprising N facial images, where N is a positive integer greater than 1 and N is less than or equal to M; respectively inputting the plurality of groups of facial image sequences into a preset face living body detection model to obtain a plurality of detection results; and determining a living body detection result of the face according to the plurality of detection results. The application can avoid repeated training of the model, reduce the cost of model training, improve the accuracy of the detection result and reduce the jitter influence of the model.

Description

Face living body detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing, in particular to the fields of deep learning and computer vision, and more particularly to a face living body detection method and apparatus, an electronic device, and a storage medium.
Background
Living body detection techniques are often applied in face recognition systems. The technology distinguishes whether the face image acquired by the camera comes from a real face, so as to avoid the losses that a malicious attack on the face recognition system would cause. In recent years, thanks to the rapid development of deep learning, living body detection technology has made remarkable progress and has been successfully applied in many face recognition scenarios, such as face payment, face security check and video monitoring. However, current living body detection models generalize poorly, and continually updating the model incurs high cost.
Disclosure of Invention
The application provides a method, a device, electronic equipment and a storage medium for human face living body detection.
According to a first aspect of the present application, there is provided a face in-vivo detection method, comprising:
acquiring M frames of face images in a face video, and respectively carrying out face alignment on the M frames of face images to obtain M frames of facial images; M is a positive integer greater than 1;
recombining the M frames of facial images to obtain a plurality of groups of facial image sequences; each group of facial image sequences comprises N facial images, wherein N is a positive integer greater than 1, and N is less than or equal to M;
respectively inputting the multiple groups of facial image sequences into a preset facial living body detection model to obtain multiple detection results;
and determining a living body detection result of the human face according to the detection results.
In some embodiments of the present application, the reorganizing the M frames of facial images to obtain a plurality of groups of facial image sequences includes:
performing time-sequence combination on the M frames of facial images to obtain C(M,N) combinations, where C(M,N) = M!/(N!(M−N)!) is the number of ways of choosing N images from the M frames;
sorting the facial images in each of the C(M,N) combinations respectively to obtain C(M,N) × N! groups of facial image sequences.
In the embodiment of the application, the human face living body detection model comprises a feature extraction module, a downsampling module and a normalization module; the step of respectively inputting the plurality of groups of facial image sequences into a preset facial living body detection model to obtain a plurality of detection results comprises the following steps:
inputting each group of facial image sequences into the feature extraction module for feature extraction to obtain space-time features of each group of facial image sequences;
inputting the space-time characteristics of each group of facial image sequences to the downsampling module for downsampling operation to obtain first characteristic information of each group of facial image sequences;
and inputting the first characteristic information of each group of facial image sequences to the normalization module for normalization processing to obtain the detection result of each group of facial image sequences.
In an embodiment of the present application, the determining, according to the plurality of detection results, a living body detection result of the face includes:
and carrying out averaging calculation on the detection results, and taking the calculation result as a living body detection result of the human face.
Further, in some embodiments of the present application, the face living body detection method further includes: and identifying whether the human face is a living body according to the living body detection result of the human face.
Wherein each of the detection result and the living body detection result is a two-dimensional array; one dimension of the two-dimensional array represents the probability that the face is a living body, and the other dimension represents the probability that the face is not a living body.
According to a second aspect of the present application, there is provided a face living body detection apparatus comprising:
the acquisition module is used for acquiring M frames of face images in the face video, and respectively carrying out face alignment on the M frames of face images to obtain M frames of facial images; M is a positive integer greater than 1;
the reorganization module is used for reorganizing the M frames of facial images to obtain a plurality of groups of facial image sequences; each group of facial image sequences comprises N facial images, wherein N is a positive integer greater than 1, and N is less than or equal to M;
the detection module is used for respectively inputting the multiple groups of facial image sequences into a preset facial living body detection model to obtain multiple detection results;
and the determining module is used for determining the living body detection result of the human face according to the detection results.
In some embodiments of the application, the reorganization module is specifically configured to:
perform time-sequence combination on the M frames of facial images to obtain C(M,N) combinations;
sort the facial images in each of the C(M,N) combinations respectively to obtain C(M,N) × N! groups of facial image sequences.
In some embodiments of the present application, the face living body detection model includes a feature extraction module, a downsampling module, and a normalization module; wherein, the detection module is specifically used for:
inputting each group of facial image sequences into the feature extraction module for feature extraction to obtain space-time features of each group of facial image sequences;
inputting the space-time characteristics of each group of facial image sequences to the downsampling module for downsampling operation to obtain first characteristic information of each group of facial image sequences;
and inputting the first characteristic information of each group of facial image sequences to the normalization module for normalization processing to obtain the detection result of each group of facial image sequences.
In some embodiments of the present application, the determining module is specifically configured to:
and carrying out averaging calculation on the detection results, and taking the calculation result as a living body detection result of the human face.
Optionally, in an embodiment of the present application, the face living body detection apparatus further includes:
and the identification module is used for identifying whether the human face is a living body or not according to the living body detection result of the human face.
Wherein the detection results and the living body detection results of the human face are two-dimensional arrays; one dimension of the two-dimensional array represents the probability that the face is a living body, and the other dimension represents the probability that the face is not a living body.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect described above.
According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect described above.
According to the technical scheme of the application, a plurality of groups of fixed-length face image sequences are obtained by recombining the multi-frame face images, and face living body detection is carried out according to the fixed-length face image sequences; that is, when the number of face image frames changes, the images input into the model still have a fixed number of frames, so the face living body detection model does not need retraining, which can improve the applicability of the model and reduce the cost of model training. In addition, the living body detection result of the face is determined according to the detection results of the plurality of groups of face image sequences, which can reduce the jitter influence of the model and improve the accuracy of the detection result.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
fig. 1 is a flowchart of a face living body detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of key points of a face in an embodiment of the present application;
FIG. 3 is a flowchart of a multi-frame face image reorganization in an embodiment of the present application;
FIG. 4 is an exemplary diagram of a multi-frame face image reorganization in an embodiment of the present application;
FIG. 5 is a flowchart of the detection of a face living body detection model in an embodiment of the present application;
FIG. 6 is a block diagram of a face living body detection method in an embodiment of the present application;
fig. 7 is a block diagram of a face living body detection apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that face living body detection technology is often applied in face recognition systems. The technology distinguishes whether the face image acquired by the camera comes from a real face, so as to avoid the losses caused by malicious attacks on the face recognition system. In a realistic face recognition scenario, attacks based on physical media are the most common. Such attacks typically present a face through media such as paper or screens. For example, an attacker may use a printed photograph of a legitimate user, or use an electronic device screen to present an electronic photograph of a legitimate user, to attack the face recognition system. Still other attackers attack face recognition systems by making realistic three-dimensional head-model prostheses based on the facial appearance of legitimate users.
For these attack approaches based on physical media, the more common distinguishing criteria at present are: 1) the differing reflection characteristics of different physical media; and 2) the color-gamut distortion of the face in the attack medium. As for criterion 1), since attacks based on physical media undergo secondary imaging, the re-imaged face carries characteristics specific to the attack medium. For example, a paper attack may leave paper fiber texture in the re-imaged face, and an electronic screen used to present an attack may leave moiré patterns. As for criterion 2), when a face photo is printed or displayed on an electronic screen, the color space of the printed or electronic photo differs from that of a real person, and this characteristic can also serve as a basis for distinguishing real faces from fake ones.
With the continuous development of technology, and in view of the above physical-medium attack techniques and criteria, existing solutions include video-based multi-frame face living body detection schemes, which not only model single-frame face features but also extract the temporal relations among multiple face frames; they carry more information and achieve better detection results. However, existing multi-frame face living body detection schemes generally require a fixed number of input image frames, and when the input length changes the model must be updated and retrained, which is costly. In addition, conventional video-based multi-frame face living body detection schemes generally use 3D convolutional neural network models, which have many parameters and easily overfit during training; moreover, a model trained on a small amount of data exhibits large jitter and is not robust.
Based on the above problems, the application provides a face living body detection method, a face living body detection device, electronic equipment and a storage medium.
Fig. 1 is a flowchart of a face living body detection method according to an embodiment of the present application. It should be noted that, the face living body detection method of the embodiment of the present application may be applied to the face living body detection apparatus of the embodiment of the present application, and the face living body detection apparatus may be configured to an electronic device. As shown in fig. 1, the face living body detection method includes the following steps:
step 101, obtaining M frames of face images in a face video, and respectively carrying out face alignment on the M frames of face images to obtain M frames of face images; m is a positive integer greater than 1.
It should be noted that the face living body detection method can be applied to a face recognition system. The M-frame face images in the face video may be a series of face images obtained according to a certain frame rate in the process of recording the face video by the face recognition system, or may be obtained by other modes, which is not limited in the application.
In order to locate each part of the face, extract the corresponding facial features, and improve the accuracy of the model output, the obtained M frames of face images need to be aligned respectively before living body detection. The face alignment process may be implemented with existing techniques; the embodiment of the application takes the RetinaFace face key point detection algorithm as an example, as follows: for each of the obtained M frames of face images, five face key points are obtained through the RetinaFace algorithm. As shown in fig. 2, the five key points are the left eye, the right eye, the nose, the left mouth corner and the right mouth corner. The faces are then aligned according to the five key points obtained for each image, so as to obtain M frames of facial images. The main principle of face alignment is as follows: let A be the matrix of the five key points before alignment and B the matrix of the five key points after alignment; the goal is to find a transformation matrix Ω such that ΩA is closest to B. This process can be expressed as R = argmin_Ω ‖ΩA − B‖_F, where Ω satisfies the condition Ω^T Ω = I and ‖·‖_F is the Frobenius norm. The solution for R is R = UV^T, where M = BA^T and M = UΣV^T; that is, the unitary matrices U and V are obtained by performing singular value decomposition on BA^T, and the affine transformation matrix is finally obtained through UV^T.
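The orthogonal Procrustes solution above can be sketched in a few lines of NumPy. The following is a minimal sketch, not the patent's implementation: it assumes A and B are 2 × 5 arrays of (x, y) key point coordinates, the sample values are illustrative, and only the orthogonal part R = UV^T derived above is computed (a practical alignment pipeline typically also estimates scale and translation).

```python
import numpy as np

def procrustes_rotation(A, B):
    """Solve R = argmin over orthogonal Omega of ||Omega A - B||_F.

    A, B: 2 x 5 arrays of key point coordinates before/after alignment.
    Returns R = U V^T, where B A^T = U Sigma V^T (singular value decomposition).
    """
    M = B @ A.T                      # 2 x 2 cross-covariance matrix B A^T
    U, _, Vt = np.linalg.svd(M)      # M = U Sigma V^T
    return U @ Vt                    # orthogonal transformation R = U V^T

# Illustrative key points (x, y): left eye, right eye, nose, mouth corners
A = np.array([[30.0, 70.0, 50.0, 35.0, 65.0],
              [40.0, 40.0, 60.0, 80.0, 80.0]])
B = np.array([[36.0, 74.0, 55.0, 40.0, 68.0],
              [42.0, 40.0, 62.0, 82.0, 81.0]])
R = procrustes_rotation(A, B)
aligned_points = R @ A               # key points after applying R
```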
Step 102, reorganizing the M frames of facial images to obtain a plurality of groups of facial image sequences.
Each group of facial image sequences includes N facial images, where N is a positive integer greater than 1 and N is less than or equal to M. That is, the length of each facial image sequence obtained after recombination is fixed, and the number of facial images in each recombined sequence is less than or equal to the number of original facial image frames.
It can be understood that the recombination of the multiple frames of facial images into a fixed-length facial image sequence is equivalent to unifying the multiple frames of facial images into a fixed-frame facial image, so that the living body detection of the face is not affected by the frame number of the image. In addition, the M frames of facial images are recombined into a plurality of groups of facial image sequences, wherein each group of facial image sequences are different from each other, and if the plurality of groups of facial image sequences are subjected to facial living detection, the model can learn more information, so that the accuracy of facial living detection can be increased.
In the embodiment of the application, each group of facial image sequences comprises N facial images, wherein the numerical value of N is consistent with the number of facial image frames used in training a facial living body detection model, so that the number of facial image frames to be detected is consistent with the number of image frames corresponding to the detection of the model. That is, the human face living body detection model is not continuously updated along with the change of the number of human face image frames, so that the cost of model training can be reduced, and the applicability of the model is improved. The N corresponding value may be set according to the actual situation, which is not limited in the present application.
In the embodiment of the application, the M frames of facial images may be reorganized by permutation and combination, by random combination, or by other reorganization approaches capable of producing a plurality of groups of facial image sequences, where the obtained groups of facial image sequences differ from one another.
If the number of acquired face image frames is less than N due to transmission failure or other abnormal conditions, the missing frames can be supplemented by randomly selecting from the processed facial images, so that face living body detection does not fail because of such abnormal conditions.
Step 103, respectively inputting a plurality of groups of facial image sequences into a preset facial living body detection model to obtain a plurality of detection results.
That is, after each group of facial image sequences is input into the preset face living body detection model, one detection result is obtained; the model can thus learn more information from the multiple input groups of facial image sequences, so that the original M frames of facial images yield a plurality of detection results.
In the embodiment of the application, the preset human face living body detection model can perform feature extraction through convolution operation according to the input human face image sequence, learn according to the extracted feature information and output a corresponding detection result. The preset human face living body detection model is trained in advance.
Step 104, determining the living body detection result of the human face according to the detection results.
It can be understood that, because the multiple groups of facial image sequences can carry more information than the original M frames of facial images, the model can extract more features and output multiple detection results, so that the living body detection result of the face can be determined more accurately according to the multiple detection results.
It should be noted that the living body detection result of the face may be determined from the plurality of detection results by fusing them, for example by averaging, weighted averaging, or other methods. The embodiment of the present application takes averaging as an example: the plurality of detection results are averaged, and the calculated result is taken as the face living body detection result.
In the embodiment of the application, each detection result and the final face living body detection result can take the form of a two-dimensional array, where one dimension of the two-dimensional array represents the probability that the face is a living body, and the other dimension represents the probability that the face is not a living body. Whether the face is a living body can therefore be identified according to the living body detection result: for example, a threshold may be set, and if the probability that the face is a living body exceeds the threshold, the face is a living body; otherwise, it is not.
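As a concrete illustration of the averaging fusion and thresholding just described, the following sketch assumes each detection result is a length-2 array [p_live, p_not_live] produced by the softmax output; the function name, the sample values and the threshold of 0.5 are illustrative assumptions, not values fixed by the application.

```python
import numpy as np

def fuse_and_decide(results, threshold=0.5):
    """Average per-sequence detection results and threshold the live probability.

    results: list of length-2 arrays, each [p_live, p_not_live] for one
    facial image sequence. Returns the fused result and a live/not-live flag.
    """
    fused = np.mean(np.stack(results), axis=0)   # element-wise average
    return fused, bool(fused[0] > threshold)

# Example: detection results for three recombined facial image sequences
results = [np.array([0.91, 0.09]),
           np.array([0.87, 0.13]),
           np.array([0.94, 0.06])]
fused, is_live = fuse_and_decide(results)        # fused is about [0.91, 0.09]
```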
According to the face living body detection method provided by the embodiment of the application, a plurality of groups of fixed-length face image sequences are obtained by recombining the multi-frame face images, and face living body detection is carried out according to the fixed-length face image sequences; that is, when the number of face image frames changes, the images input into the model still have a fixed number of frames, so the face living body detection model does not need retraining, which can improve the applicability of the model and reduce the cost of model training. In addition, the living body detection result of the face is determined according to the detection results of the plurality of groups of face image sequences, which can reduce the jitter influence of the model and improve the accuracy of the detection result.
Based on the above embodiment, in order to make the facial image sequences obtained by recombining the multi-frame facial images as many as possible, and further make the face living body detection model able to extract more features, another embodiment is provided for recombining the multi-frame facial images.
Fig. 3 is a flowchart of a multi-frame face image reorganization according to an embodiment of the present application. As shown in fig. 3, the implementation manner of reorganizing M frames of face images to obtain a plurality of groups of face image sequences may be:
step 301, performing time sequence combination on M frames of face images to obtainA combination.
That is, N face images are taken out of the M face images and are combined into one group, thereby obtainingDifferent combinations are provided, each of which includes N facial images.
As an example, if m=4, n=3, and M face images x 0 = (f 1, f2, f3, f 4), wherein f1, f2, f3, f4 are 4 face images respectively, then (f 1, f2, f 3), (f 2, f3, f 4), (f 1, f2, f 4), (f 1, f3, f 4) combinations, that isA combination. Wherein each combination does not take into account the order between the images.
Step 302, sorting the facial images in each of the C(M,N) combinations respectively to obtain C(M,N) × N! groups of facial image sequences.
It can be understood that facial image sequences composed of the same facial images in different orders carry different information, so the feature information extracted by the face living body detection model also differs. Therefore, in order to enable the face living body detection model to learn more information and improve the robustness of the model, the facial images in each combination can be sorted respectively to obtain as many facial image sequences as possible.
In the embodiment of the application, the N facial images in each combination are sorted respectively, so that each combination yields N! differently ordered facial image sequences; the C(M,N) combinations therefore yield C(M,N) × N! different facial image sequences in total. Following the above example, as shown in fig. 4, the facial images in the four combinations (f1, f2, f3), (f2, f3, f4), (f1, f2, f4) and (f1, f3, f4) are sorted respectively; each combination has 3! = 6 orderings, giving 4 × 6 = 24 facial image sequences in total.
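The recombination described above is a direct application of combinations followed by permutations. The sketch below is a minimal illustration, assuming the aligned frames arrive as a Python list; the random padding branch follows the fallback for under-length inputs mentioned in step 102, and the function name is illustrative.

```python
import random
from itertools import combinations, permutations

def recombine(frames, n):
    """Recombine M aligned facial images into C(M, n) * n! fixed-length sequences.

    frames: list of M aligned facial images; n: the sequence length expected
    by the detection model. If fewer than n frames arrive (e.g. after a
    transmission failure), pad by randomly re-sampling existing frames.
    """
    while len(frames) < n:
        frames = frames + [random.choice(frames)]
    sequences = []
    for combo in combinations(frames, n):        # C(M, n) unordered picks
        sequences.extend(permutations(combo))    # n! orderings of each pick
    return sequences

# Example from the text: M = 4, n = 3 gives C(4, 3) * 3! = 4 * 6 = 24 sequences
sequences = recombine(["f1", "f2", "f3", "f4"], 3)
assert len(sequences) == 24
```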
According to the face living body detection method provided by the embodiment of the application, the multi-frame facial images are first combined in time sequence, and then the facial images in each combination are sorted respectively, which amounts to permuting and combining the multi-frame facial images so as to obtain as many fixed-length facial image sequences as possible. The face living body detection model can therefore extract more feature information, further improving the accuracy of model detection.
In order to further describe the detection flow of the face living body detection model in the above embodiment in detail, another embodiment of the present application is provided for the detection flow of the face living body detection model.
Fig. 5 is a detection flow chart of a face living body detection model according to an embodiment of the present application. As shown in the network structure of fig. 6, the face living body detection model includes a feature extraction module, a downsampling module and a normalization module, where the feature extraction module can be a lightweight model based on the R2Plus1D (R(2+1)D) model. As shown in fig. 5, the step of respectively inputting the plurality of groups of facial image sequences into the face living body detection model to obtain a plurality of detection results may include:
step 501, inputting each group of facial image sequences to a feature extraction module for feature extraction to obtain space-time features of each group of facial image sequences.
It can be understood that, since each group of facial image sequences has N frames of facial images, after the facial image sequences are input to the feature extraction module, the feature extraction module may extract the spatiotemporal features of the corresponding N frames of facial images through convolution operation, thereby obtaining the spatiotemporal features of each group of facial image sequences.
Step 502, the space-time characteristics of each group of facial image sequences are input to a downsampling module to perform downsampling operation, so as to obtain first characteristic information of each group of facial image sequences.
The downsampling operation can reduce the dimension of the feature information, reduce the number of parameters the network has to learn, and avoid overfitting; it can also enlarge the receptive field and improve the effect of network learning. Therefore, the feature information can be downsampled after the convolution operation.
In the embodiment of the application, the downsampling operation in the spatio-temporal dimension is performed on the space-time features, which can reduce the number of model parameters, avoid overfitting, and reduce the computation required for network learning. The first characteristic information of each group of facial image sequences is thereby obtained, where the first characteristic information is a 2-dimensional feature.
Step 503, inputting the first feature information of each group of facial image sequences to a normalization module for normalization processing, so as to obtain the detection result of each group of facial image sequences.
It will be appreciated that normalizing the first characteristic information of each group of facial image sequences corresponds to mapping the output of the neural network into the value range (0, 1), so that the output can be interpreted as a probability indicating whether the face detection result is a living body.
In the embodiment of the application, the normalization module can use a softmax function; since this function introduces an exponential, larger values become relatively larger and smaller values relatively smaller, which increases the contrast between features, so that the detection result of each group of facial image sequences can be obtained more efficiently. The normalization module may also use other existing functions according to the actual situation, which is not limited in the application.
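The three-module structure described in this flow can be sketched with off-the-shelf PyTorch components. This is a minimal sketch under stated assumptions, not the patent's exact network: torchvision's r2plus1d_18 stands in for the lightweight R(2+1)D-style feature extraction module, adaptive average pooling plays the role of the spatio-temporal downsampling module, a linear layer plus softmax forms the normalization module, and the input resolution and class layout are illustrative choices.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18

class FaceLivenessNet(nn.Module):
    """Sketch of the three modules: feature extraction -> downsampling -> softmax."""

    def __init__(self):
        super().__init__()
        backbone = r2plus1d_18(weights=None)          # R(2+1)D feature extractor
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.downsample = nn.AdaptiveAvgPool3d(1)     # spatio-temporal downsampling
        self.classifier = nn.Linear(512, 2)           # 2 dims: live / not live

    def forward(self, x):
        # x: (batch, 3, N, H, W), one facial image sequence of N frames
        f = self.features(x)                          # spatio-temporal features
        f = self.downsample(f).flatten(1)             # first feature information
        return torch.softmax(self.classifier(f), dim=1)   # [p_live, p_not_live]

# Example: one sequence of N = 3 aligned 112 x 112 facial images
scores = FaceLivenessNet()(torch.randn(1, 3, 3, 112, 112))
```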
According to the face living body detection method provided by the embodiment of the application, the face living body detection model is divided into the feature extraction module, the downsampling module and the normalization module, which can effectively reduce the number of model parameters and the computation of the network model, lower the risk of overfitting in model calculation, and improve the efficiency of model calculation, thereby improving the efficiency and accuracy of face living body detection.
In order to achieve the above embodiments, the present application proposes a face living body detection apparatus.
Fig. 7 is a block diagram of a face living body detection apparatus according to an embodiment of the present application. As shown in fig. 7, the face living body detection apparatus includes:
the acquiring module 701 is configured to acquire M frames of face images in a face video, and respectively align the M frames of face images to obtain M frames of facial images; M is a positive integer greater than 1;
the reorganization module 702 is configured to reorganize M frames of facial images to obtain a plurality of groups of facial image sequences; each group of facial image sequences comprises N facial images, wherein N is a positive integer greater than 1, and N is less than or equal to M;
the detection module 703 is configured to input a plurality of groups of facial image sequences into a preset facial living body detection model, respectively, to obtain a plurality of detection results;
a determining module 704, configured to determine a living body detection result of the face according to the multiple detection results.
In some embodiments of the present application, the reorganization module 702 is specifically configured to:
perform time-sequence combination on the M frames of facial images to obtain C(M,N) combinations;
sort the facial images in each of the C(M,N) combinations respectively to obtain C(M,N) × N! groups of facial image sequences.
In some embodiments of the present application, a face living body detection model includes a feature extraction module, a downsampling module, and a normalization module; the detection module 703 is specifically configured to:
inputting each group of facial image sequences into a feature extraction module for feature extraction to obtain space-time features of each group of facial image sequences;
inputting the space-time characteristics of each group of facial image sequences into a downsampling module for downsampling operation to obtain first characteristic information of each group of facial image sequences;
and inputting the first characteristic information of each group of facial image sequences to a normalization module for normalization processing to obtain the detection result of each group of facial image sequences.
In some embodiments of the present application, the determining module 704 is specifically configured to:
and carrying out averaging calculation on the plurality of detection results, and taking the calculation result as a living body detection result of the human face.
Optionally, in an embodiment of the present application, the face living body detection apparatus further includes:
the identifying module 705 is configured to identify whether the face is a living body according to a living body detection result of the face.
Wherein the detection results and the living body detection result of the face are two-dimensional arrays; one dimension of the two-dimensional array represents the probability that the face is a living body, and the other dimension represents the probability that the face is not a living body.
According to the face living body detection device provided by the embodiment of the application, a plurality of groups of fixed-length face image sequences are obtained by recombining the multi-frame face images, and face living body detection is carried out according to the fixed-length face image sequences; that is, when the number of face image frames changes, the images input into the model still have a fixed number of frames, so the face living body detection model does not need retraining, which can improve the applicability of the model and reduce the cost of model training. In addition, the living body detection result of the face is determined according to the detection results of the plurality of groups of face image sequences, which can reduce the jitter influence of the model and improve the accuracy of the detection result.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 8 is a block diagram of an electronic device 800, according to an exemplary embodiment. The electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit the implementations of the application described and/or claimed herein.
As shown in fig. 8, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a touch-sensitive display screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the touch display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800 and the relative positioning of components such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above-described face in vivo detection method.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including instructions executable by processor 820 of electronic device 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
A non-transitory computer readable storage medium, which when executed by a processor of the electronic device 800, causes the electronic device 800 to perform the face in-vivo detection method described above.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved, and are not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (14)

1. A face living body detection method, characterized by comprising:
acquiring M frames of face images in a face video, and respectively carrying out face alignment on the M frames of face images to obtain M frames of facial images; M is a positive integer greater than 1;
recombining the M frames of facial images to obtain a plurality of groups of facial image sequences; each group of facial image sequences comprises N facial images, wherein N is a positive integer greater than 1, and N is less than or equal to M;
respectively inputting the multiple groups of facial image sequences into a preset facial living body detection model to obtain multiple detection results;
and determining a living body detection result of the human face according to the detection results.
2. The method of claim 1, wherein the reorganizing the M frames of facial images to obtain a plurality of groups of facial image sequences comprises:
performing time-sequence combination on the M frames of facial images to obtain C(M,N) combinations;
sorting the facial images in each of the C(M,N) combinations respectively to obtain C(M,N) × N! groups of facial image sequences.
3. The method of claim 1, wherein the face in-vivo detection model comprises a feature extraction module, a downsampling module, and a normalization module; the step of respectively inputting the plurality of groups of facial image sequences into a preset facial living body detection model to obtain a plurality of detection results comprises the following steps:
inputting each group of facial image sequences into the feature extraction module for feature extraction to obtain space-time features of each group of facial image sequences;
inputting the space-time characteristics of each group of facial image sequences to the downsampling module for downsampling operation to obtain first characteristic information of each group of facial image sequences;
and inputting the first characteristic information of each group of facial image sequences to the normalization module for normalization processing to obtain the detection result of each group of facial image sequences.
4. The method of claim 1, wherein the determining the living body detection result of the face according to the plurality of detection results comprises:
and carrying out averaging calculation on the detection results, and taking the calculation result as a living body detection result of the human face.
5. The method according to any one of claims 1 to 4, further comprising:
and identifying whether the human face is a living body according to the living body detection result of the human face.
6. The method of claim 1, wherein each of the detection result and the living detection result is a two-dimensional array; one dimension of the two-dimensional array represents the probability that the face is a living body, and the other dimension represents the probability that the face is not a living body.
7. A human face living body detection apparatus, characterized by comprising:
the acquisition module is used for acquiring M frames of face images in the face video, and respectively carrying out face alignment on the M frames of face images to obtain M frames of facial images; M is a positive integer greater than 1;
the reorganization module is used for reorganizing the M frames of facial images to obtain a plurality of groups of facial image sequences; each group of facial image sequences comprises N facial images, wherein N is a positive integer greater than 1, and N is less than or equal to M;
the detection module is used for respectively inputting the multiple groups of facial image sequences into a preset facial living body detection model to obtain multiple detection results;
and the determining module is used for determining the living body detection result of the human face according to the detection results.
8. The apparatus of claim 7, wherein the reorganization module is specifically configured to:
perform time-sequence combination on the M frames of facial images to obtain C(M,N) combinations;
sort the facial images in each of the C(M,N) combinations respectively to obtain C(M,N) × N! groups of facial image sequences.
9. The apparatus of claim 7, wherein the face in-vivo detection model comprises a feature extraction module, a downsampling module, and a normalization module; wherein, the detection module is specifically used for:
inputting each group of facial image sequences into the feature extraction module for feature extraction to obtain space-time features of each group of facial image sequences;
inputting the space-time characteristics of each group of facial image sequences to the downsampling module for downsampling operation to obtain first characteristic information of each group of facial image sequences;
and inputting the first characteristic information of each group of facial image sequences to the normalization module for normalization processing to obtain the detection result of each group of facial image sequences.
10. The apparatus of claim 7, wherein the determining module is specifically configured to:
and carrying out averaging calculation on the detection results, and taking the calculation result as a living body detection result of the human face.
11. The apparatus as recited in claim 7, further comprising:
and the identification module is used for identifying whether the human face is a living body or not according to the living body detection result of the human face.
12. The apparatus of claim 7, wherein the plurality of detection results and the living body detection result of the face are each a two-dimensional array; one dimension of the two-dimensional array represents the probability that the face is a living body, and the other dimension represents the probability that the face is not a living body.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202110793445.XA 2021-07-15 2021-07-15 Face living body detection method and device, electronic equipment and storage medium Active CN113486829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110793445.XA CN113486829B (en) 2021-07-15 2021-07-15 Face living body detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110793445.XA CN113486829B (en) 2021-07-15 2021-07-15 Face living body detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113486829A CN113486829A (en) 2021-10-08
CN113486829B true CN113486829B (en) 2023-11-07

Family

ID=77938549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110793445.XA Active CN113486829B (en) 2021-07-15 2021-07-15 Face living body detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113486829B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913608A * 2022-06-17 2022-08-16 Alipay (Hangzhou) Information Technology Co., Ltd. Living body detection method and system


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11348375B2 (en) * 2019-10-15 2022-05-31 Assa Abloy Ab Systems and methods for using focal stacks for image-based spoof detection

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273794A (en) * 2017-04-28 2017-10-20 北京建筑大学 Live body discrimination method and device in a kind of face recognition process
WO2019214201A1 (en) * 2018-05-10 2019-11-14 北京市商汤科技开发有限公司 Live body detection method and apparatus, system, electronic device, and storage medium
CN109886080A (en) * 2018-12-29 2019-06-14 深圳云天励飞技术有限公司 Human face in-vivo detection method, device, electronic equipment and readable storage medium storing program for executing
WO2020151489A1 (en) * 2019-01-25 2020-07-30 杭州海康威视数字技术股份有限公司 Living body detection method based on facial recognition, and electronic device and storage medium
CN110276277A (en) * 2019-06-03 2019-09-24 罗普特科技集团股份有限公司 Method and apparatus for detecting facial image
CN110598580A (en) * 2019-08-25 2019-12-20 南京理工大学 Human face living body detection method
CN111814567A (en) * 2020-06-11 2020-10-23 上海果通通信科技股份有限公司 Method, device and equipment for detecting living human face and storage medium
CN112016437A (en) * 2020-08-26 2020-12-01 中国科学院重庆绿色智能技术研究院 Living body detection method based on face video key frame
CN112528872A (en) * 2020-12-15 2021-03-19 中化资本数字科技有限公司 Training method and device of face detection model based on video stream and computing equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Face-based living body detection system; Zhang Gaoming; Feng Rui; Computer Systems & Applications (12); full text *

Also Published As

Publication number Publication date
CN113486829A (en) 2021-10-08

Similar Documents

Publication Publication Date Title
CN109871883B (en) Neural network training method and device, electronic equipment and storage medium
CN110688951B (en) Image processing method and device, electronic equipment and storage medium
CN111310616B (en) Image processing method and device, electronic equipment and storage medium
US11321575B2 (en) Method, apparatus and system for liveness detection, electronic device, and storage medium
CN111553864B (en) Image restoration method and device, electronic equipment and storage medium
US11455491B2 (en) Method and device for training image recognition model, and storage medium
CN109934275B (en) Image processing method and device, electronic equipment and storage medium
CN109800744A (en) Image clustering method and device, electronic equipment and storage medium
CN110287671B (en) Verification method and device, electronic equipment and storage medium
CN109145150B (en) Target matching method and device, electronic equipment and storage medium
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
CN109165738B (en) Neural network model optimization method and device, electronic device and storage medium
CN111242188B (en) Intrusion detection method, intrusion detection device and storage medium
CN111310664B (en) Image processing method and device, electronic equipment and storage medium
CN104077597B (en) Image classification method and device
CN111582383B (en) Attribute identification method and device, electronic equipment and storage medium
CN111259967A (en) Image classification and neural network training method, device, equipment and storage medium
CN104867112B (en) Photo processing method and device
CN109101542B (en) Image recognition result output method and device, electronic device and storage medium
CN111062401A (en) Stacked object identification method and device, electronic device and storage medium
CN111311588B (en) Repositioning method and device, electronic equipment and storage medium
CN113486829B (en) Face living body detection method and device, electronic equipment and storage medium
CN109447258B (en) Neural network model optimization method and device, electronic device and storage medium
CN111582381B (en) Method and device for determining performance parameters, electronic equipment and storage medium
CN111553865B (en) Image restoration method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant