CN113591526A - Face liveness detection method, apparatus, device, and computer-readable storage medium


Info

Publication number
CN113591526A
Authority
CN
China
Prior art keywords
face
image
auxiliary
face image
main
Prior art date
Legal status
Pending
Application number
CN202110009371.6A
Other languages
Chinese (zh)
Inventor
王雅萍
薛传颂
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN113591526A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods


Abstract

The application provides a face liveness detection method, apparatus, device, and computer-readable storage medium. The face liveness detection method includes: acquiring, through a camera of a terminal device, a main-view face image and an auxiliary face image of the face to be detected at different view angles, where the view angles include shooting angles and/or field angles; and performing face liveness detection through a liveness detection model based on the main-view face image and the auxiliary face image, and outputting a detection result. Because the auxiliary face image and the main-view face image are taken at different view angles, more difference information can be obtained, which effectively defends against face attacks. This solves the problem that conventional face liveness detection methods, which acquire and use only a main-view face image, are easily attacked and give poor face recognition security, and can thereby improve the security of face recognition.

Description

Face liveness detection method, apparatus, device, and computer-readable storage medium
Technical Field
The present application relates to the field of face recognition, and in particular to a face liveness detection method, apparatus, device, and computer-readable storage medium.
Background
Face recognition technology is widely used in terminal devices (such as mobile phones and monitoring systems). For example, a user can unlock a phone and pay by face, and a monitoring system can capture face images through a camera to perform face recognition, and thereby judge whether the person is a legitimate user and whether to allow the current person to enter.
However, because users' face data leaks easily, face recognition must be secure and protected against face attacks.
At present, there are three main modes of face attack, as shown in Fig. 1: (a) a face attack using a two-dimensional (2D) face photo of a legitimate user; (b) a face attack using a 2D face video of a legitimate user; (c) a face attack using a three-dimensional (3D) face mask of a legitimate user. Each attack mode is described in more detail below.
(a) Using a 2D face photo of a legitimate user. Specifically, the attacker presents a paper print, a color print, or a photo stored on a mobile phone of the impersonated legitimate user; during face recognition, the illegitimate person uses the legitimate user's face photo to pass recognition and thereby impersonate that user.
(b) Using a 2D face video of a legitimate user. The attacker records a video of the impersonated legitimate user in advance, covering action instructions such as blinking, turning the head, and opening the mouth; during face recognition, the illegitimate person replays the face video to pass recognition and impersonate the legitimate user.
(c) Using a 3D face mask of a legitimate user. Face masks are currently not difficult to obtain on the market. Masks made of plastic or hard paper are common; they are cheap, their material resembles real facial skin very little, and they can be identified by ordinary texture features. Compared with such crude masks, stereoscopic masks made of silica gel, latex, or 3D printing look much closer to skin. In Fig. 1(c), the left side shows a legitimate user and the right side a 3D face mask generated from that user's facial features; the mask even reproduces facial details such as wrinkles. An illegitimate person using such a 3D face mask for face recognition to impersonate the legitimate user poses a very high risk.
To defend against face attacks, the common practice at present is face liveness detection: judging by algorithm whether the face image captured by the camera comes from a real face.
Currently, mainstream vendors adopt a deep learning scheme for face liveness detection: a single-frame face picture is fed into a neural network for binary classification. The method exploits the fact that a photo or video used to imitate an attack contains extra noise from secondary imaging, and distinguishes a real person from photos, videos, and 3D face masks by analyzing texture details.
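For concreteness, the following is a minimal sketch of such a conventional single-frame binary classification pipeline, assuming a PyTorch-style API; the ResNet-18 backbone and all names are illustrative, not taken from the patent.

    import torch
    import torch.nn as nn
    from torchvision import models

    class SingleFrameLivenessNet(nn.Module):
        """Single-frame two-class classifier: real face vs. attack (photo/video/mask)."""
        def __init__(self):
            super().__init__()
            backbone = models.resnet18(weights=None)
            # Replace the 1000-class head with a two-class head: {spoof, live}.
            backbone.fc = nn.Linear(backbone.fc.in_features, 2)
            self.backbone = backbone

        def forward(self, face_image: torch.Tensor) -> torch.Tensor:
            # face_image: (N, 3, H, W) cropped face; returns two-class logits.
            return self.backbone(face_image)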
Disclosure of Invention
In view of the above, the present application provides a face liveness detection method, apparatus, device, and computer-readable storage medium. The method defends against face attacks more effectively, solves the problem that conventional face liveness detection, which acquires and uses only a main-view face image, is easily attacked and gives poor face recognition security, and can thereby improve the security of face recognition.
The applicant found that conventional face liveness detection uses only a single camera and a face image from a single view angle, i.e., a single-frame image. Because details of the face must be recognized, the camera concentrates on the face region and ignores other information; a single-view face image therefore carries relatively little information, and the detection is easily defeated by a face attack. The applicant therefore creatively proposes that, in addition to a main-view face image suitable for face recognition, auxiliary face images at other view angles be additionally acquired. The difference between the main-view and auxiliary face images provides more edge information (such as photo edges and mask edges) for liveness detection, which improves the defense against face attacks and the security of face recognition.
The present application is described below in terms of several aspects, whose embodiments and advantageous effects are mutually referenced.
In a first aspect, the present application provides a face liveness detection method for a terminal device, the method including: acquiring, through a camera of the terminal device, a main-view face image and an auxiliary face image of the face to be detected at different view angles, where the view angles include shooting angles and/or field angles; and performing face liveness detection through a liveness detection model based on the main-view face image and the auxiliary face image, and outputting a detection result.
The "shooting angle" may be understood as an included angle formed by a connecting line between the center of the lens of the camera and the center of the face as the shooting object and a normal of a plane where the midpoint of the face is located. If the shooting angles are different, the pixel contents in the shot pictures are different. For example, when a picture taken at an angle of 0 ° shows a front view of a person, a picture taken at an angle of 45 ° shows an image taken around the right face of the person's face. In addition, the photographing angle is determined due to a fixed position of the camera, that is, the photographing angle can be changed by changing the position of the camera, and the photographing angle can also be changed by a camera disposed at a different position.
The "angle of view" is an angle formed by two edges of the lens at which the object image of the object to be measured can pass through the maximum range, with the center of the lens of the camera as a vertex. The field angle and the focal length of the lens have a uniquely determined one-to-one correspondence relationship, that is, different field angles represent different focal lengths, and the larger field angle represents a smaller focal length value, so that more pixel contents around the human face and the like can be acquired.
According to the embodiments of the application, when a terminal device (for example, a mobile phone, a tablet computer, or a monitoring device) performs face recognition, a main-view face image and an auxiliary face image of the face to be detected are acquired at different view angles; face liveness detection is then performed through a liveness detection model based on the two images, and a detection result is output. Compared with conventional methods that use only a main-view face image from a single view angle, the auxiliary face image provides much information that differs from the main-view image (hereinafter also called difference information). Using this difference information for liveness detection improves its accuracy and thereby the security of face recognition.
In a possible implementation of the first aspect, the main-view face image and the auxiliary face image are captured by a main camera and by one or more auxiliary cameras, respectively. That is, the main-view face image is shot by the main camera, and one or more auxiliary face images are shot by the auxiliary camera(s). Setting the auxiliary cameras to different view angles yields more difference information for face liveness detection and improves resistance to attacks.
Further, the main camera and the auxiliary camera have different shooting angles. For example, the main and auxiliary cameras can be placed at different positions on the terminal device, so that the angle between each lens-center-to-face-center line and the normal of the face plane differs; the different shooting angles then produce the main-view face image and the auxiliary face image.
Still further, the main camera and the auxiliary camera can also have different focal lengths. On top of the different shooting angles, different cameras can use different focal lengths to capture information in further dimensions (such as depth-of-field information and field-of-view range) and improve liveness detection accuracy.
In a possible implementation of the first aspect, the main-view face image and the auxiliary face image are captured by a zoom camera at different focal lengths. That is, besides the multi-camera scheme with a main camera and auxiliary cameras, the difference information can also be obtained by shooting the main-view and auxiliary face images with one zoom camera at different focal lengths. Because the field angle and the focal length of the lens are in a uniquely determined one-to-one correspondence, different focal lengths produce different field angles. Main-view and auxiliary face images with different field angles can thus be realized, yielding more difference information between them, further improving liveness detection accuracy and face recognition security.
The zooming can be done in various ways. For example, the device may be preset to take the first picture, zoom automatically, and then take the second picture. Alternatively, after the first picture is taken, a prompt box may let the user choose whether and/or how much to zoom, after which the second picture is taken. These zooming manners are given only as examples and do not limit the scope of the application. Realizing main-view and auxiliary face images at different view angles by zooming requires little hardware change to existing devices, so cost is saved while liveness detection accuracy and face recognition security improve.
Further, the focal length used for the auxiliary face image is at most 1/2 of the focal length used for the main-view face image. A properly chosen focal length balances field-of-view range against noise. That is, the camera shoots the auxiliary face image at a focal length no greater than 1/2 of that of the main-view face image; for example, after the main-view face image is shot, the focal length is reduced to 1/2 or less of the original to shoot the auxiliary face image. With the camera position unchanged, the smaller the focal length, the larger the captured area and the easier it is to find peripheral information beyond the face. At half the main focal length or less, edge information of a face attack is easier to find, improving liveness detection accuracy and face recognition security.
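The following sketch illustrates this zoom-based capture flow. The camera object and its set_focal_length()/capture() methods are hypothetical, used only to make the sequence concrete; a real device would go through its platform's camera API.

    def capture_main_and_auxiliary(camera, main_focal_mm: float):
        # Shoot the main-view face image at the normal (longer) focal length.
        camera.set_focal_length(main_focal_mm)
        main_view = camera.capture()

        # Shoot the auxiliary face image at no more than half the focal length,
        # widening the field angle so edge information around the face is kept.
        camera.set_focal_length(main_focal_mm / 2)
        auxiliary = camera.capture()
        return main_view, auxiliary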
In a possible implementation of the first aspect, there are multiple auxiliary face images, each shot by a camera of the terminal device at a different view angle. For example, the auxiliary face images may be shot by several different auxiliary cameras (each image having a different shooting angle), or by the same camera several times at different focal lengths (each image having a different field angle). The more auxiliary face images there are, the more information liveness detection has, the higher the achievable detection accuracy, and the better the defense against face attacks and the security of face recognition.
In a possible implementation of the first aspect, performing face liveness detection through a liveness detection model based on the main-view and auxiliary face images includes: performing face detection and cropping on the main-view face image and the auxiliary face image respectively to obtain a main-view face region image and an auxiliary face region image, where the proportion of the image occupied by the face is smaller in the auxiliary face region image than in the main-view face region image; and detecting the auxiliary face region image and the main-view face region image through the liveness detection model and outputting a detection result. That is, after the face images are obtained, the face is detected, the images are cropped accordingly to remove redundant information, and liveness detection is performed on the cropped main-view and auxiliary face region images. Note that when the face region in the auxiliary face image is cropped, to avoid cutting away useful difference information, the face must occupy a smaller proportion of the cropped auxiliary region than of the main-view region (the exact proportions can be set as photographic requirements dictate). Because the auxiliary face region keeps more non-face pixel content, edge information of a face attack (a photo or a video playback device) is easier to find, which improves liveness detection accuracy and the defense against face attacks.
Further, the liveness detection model is obtained by the following training: first, main-view and auxiliary face images of real persons are acquired as positive samples, and main-view and auxiliary images of attack tools (including photos, display screens for video attacks, and the like) are acquired as negative samples; then an objective function is trained on the positive and negative samples to obtain the liveness detection model. In other words, the purpose of the model is to tell live faces from attack tools, and to this end it learns from a large number of live-face samples and attack-tool samples, as sketched below.
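A minimal training sketch under this positive/negative sampling scheme, assuming PyTorch; the two-view model, the data loader, and all names are illustrative rather than the patent's implementation.

    import torch
    import torch.nn as nn

    def train_liveness_model(model, loader, epochs=10, lr=1e-4):
        # loader yields ((main_img, aux_img), label), with label 1 for a real
        # person (positive sample) and 0 for an attack tool such as a photo,
        # screen, or mask (negative sample).
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for (main_img, aux_img), label in loader:
                logits = model(main_img, aux_img)  # two-view liveness model
                loss = loss_fn(logits, label)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return model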
Further, the attack tools include one or more of photos, videos, and face masks.
That is, the liveness detection model is trained with one or more of photos, videos, and face masks as negative samples, so that it can effectively identify these attack tools. The above is given only as examples of attack tools and does not limit the scope of the application.
Further, when the auxiliary face region image is detected to contain edge information, information that the object to be detected is a non-living body is output and displayed, where the edge information includes any of: photo-frame information, stand information, screen information, and information on the joint edge between a mask and a human body part.
That is, when the liveness detection model detects that the auxiliary face region image contains edge information, it judges the object to be detected to be a non-living body and outputs and displays that information. The kinds of edge information above are given only as examples and do not limit the scope of the application.
In a second aspect, the present application provides a face liveness detection apparatus, including: a face image acquisition module, configured to acquire a main-view face image and an auxiliary face image of the face to be detected, shot through a camera of a terminal device at different view angles, where the view angles include shooting angles and/or field angles; and a detection module, configured to perform face liveness detection through a liveness detection model based on the main-view face image and the auxiliary face image and to output a detection result.
In a possible implementation of the second aspect, the main-view face image and the auxiliary face image are captured by a main camera and by one or more auxiliary cameras, respectively.
Further, the main camera and the auxiliary camera have different shooting angles.
Still further, the main camera and the auxiliary camera can also have different focal lengths.
In another possible implementation of the second aspect, the main-view face image and the auxiliary face image are captured by a zoom camera at different focal lengths.
Further, the focal length used for the auxiliary face image is at most 1/2 of the focal length used for the main-view face image.
In a possible implementation of the second aspect, there are multiple auxiliary face images, each shot by a camera of the terminal device at a different view angle.
In a possible implementation of the second aspect, the detection module may include:
a preprocessing module, configured to perform face detection and cropping on the main-view face image and the auxiliary face image respectively to obtain a main-view face region image and an auxiliary face region image, where the proportion of the image occupied by the face is smaller in the auxiliary face region image than in the main-view face region image;
and a liveness detection module, configured to perform liveness detection on the auxiliary face region image and the main-view face region image and to output a detection result. The liveness detection module can be trained by the liveness-model training method described above, so a detailed description is omitted here.
Further, the liveness detection module may be configured to:
output and display information that the object to be detected is a non-living body when the auxiliary face region image is detected to contain edge information, where the edge information includes any of photo-frame information, stand information, screen information, and information on the joint edge between a mask and a human body part.
In a third aspect, the present application provides a terminal device, including:
a camera, configured to acquire a main-view face image and an auxiliary face image;
and one or more processors and one or more memories storing computer-readable code that, when executed by the one or more processors, causes the processors to perform the face liveness detection method of any possible implementation of the first aspect of the application.
Further, the camera includes a main camera and one or more auxiliary cameras.
Further, the camera is a zoom camera, and the main-view face image and the auxiliary face image are images obtained at different focal lengths.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer-readable code that, when executed by one or more processors, causes the processors to perform the face liveness detection method of any possible implementation of the first aspect of the application.
Drawings
FIG. 1 is a diagram illustrating prior-art face attack modes;
FIG. 2 shows application scenarios of a face liveness detection method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device provided according to an embodiment of the present application;
FIG. 4 is a schematic view of the cameras of a mobile phone according to the present application;
FIG. 5 is a schematic view of the camera of another mobile phone according to the present application;
FIG. 6 is a schematic flowchart of a face liveness detection method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a main-view face image and an auxiliary face image according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of face liveness detection by a liveness detection model according to an embodiment of the present application;
FIG. 9 is a schematic flowchart of a face liveness detection method according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of a face liveness detection method according to another embodiment of the present application;
FIG. 11 is a schematic view of the liveness detection model of the embodiment of FIG. 10;
FIG. 12 is a block diagram of a face liveness detection apparatus provided according to an embodiment of the present application;
FIG. 13 is a block diagram of the detection module of a face liveness detection apparatus provided according to an embodiment of the present application;
FIG. 14 is a block diagram of a device according to some embodiments of the present application;
FIG. 15 is a block diagram of a system on a chip (SoC) according to some embodiments of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.
It will be appreciated that as used herein, the term module may refer to or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable hardware components that provide the described functionality, or may be part of such hardware components.
It will be appreciated that in the various embodiments of the present application, the processor may be a microprocessor, a digital signal processor, a microcontroller, the like, and/or any combination thereof. According to another aspect, the processor may be a single-core processor, a multi-core processor, the like, and/or any combination thereof.
Hereinafter, embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Fig. 2 shows two possible application scenarios of the face liveness detection method according to an embodiment of the present application. Fig. 2(a) shows a mobile phone used as the terminal device to perform face recognition for online payment of the goods in a shopping cart. Fig. 2(b) shows face liveness detection performed by a terminal device (here an access control device) to determine whether a visitor is a legitimate user and, based on the result, whether to open the door.
In these scenarios, the terminal device acquires a main-view face image and an auxiliary face image of the face to be detected at different view angles through its camera(s), then performs face liveness detection through a liveness detection model based on the two images and outputs a detection result. That is to say, the processor of the terminal device invokes the face liveness detection method of this application for processing and finally outputs a detection result; if a non-living body is judged, the corresponding operation is not allowed even when the face recognition similarity reaches the threshold.
The concept of view angle in this application covers the shooting angle at capture time and/or the field angle of the lens. The shooting angle is the angle between the line connecting the lens center of the camera to the center of the photographed face and the normal of the plane in which the face lies. The field angle of the lens is the angle, with the lens center as vertex, subtended by the largest range through which the image of the subject can pass through the lens.
A difference in view angle thus covers three cases: a difference in shooting angle, a difference in field angle, or a difference in both.
In the prior art, face liveness detection is performed with information from a single view angle (a single camera, a single shooting angle, a single field angle). Although a deep learning scheme improves detection accuracy to a certain degree, the limited input information leaves many potential security holes, and an attacker can break the face liveness detection algorithm. For example, with a two-dimensional (2D) camera an attacker can trick the algorithm with a high-definition printed photo, and even with a three-dimensional (3D) camera an attacker can trick it with a high-fidelity 3D mask.
Compared with the prior-art scheme of liveness detection on a single camera, a single view angle, and a single image, the scheme of this application additionally acquires auxiliary face images at other view angles on top of the image acquired at one view angle as the main-view face image. The difference between the main-view and auxiliary face images provides more difference information (such as photo edges and mask edges) for liveness detection, improving the defense against face attacks and the security of face recognition.
FIG. 3 illustrates a schematic structural diagram of an electronic device 100 according to some embodiments of the present application, for example a terminal device such as the mobile phone mentioned in the application scenarios above.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The processor 110 may generate operation control signals according to the instruction operation code and the timing signals, so as to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, and a Subscriber Identity Module (SIM) interface.
The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K via an I2C interface, such that the processor 110 and the touch sensor 180K communicate via an I2C bus interface to implement the touch functionality of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may communicate audio signals to the wireless communication module 160 via the I2S interface, enabling answering of calls via a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
It should be understood that the interface connection relationship between the modules illustrated in the embodiments of the present application is only an illustration, and does not limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, with N being a positive integer greater than 1.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
For example, taking a mobile phone as an example, Figs. 4 and 5 show two different camera configurations.
Fig. 4 shows a mobile phone with 2 cameras, of which the camera located in the middle may serve as the main camera and the camera located to one side as the auxiliary camera. The physical focal lengths of the two cameras may be the same or different, and either or both may be zoom cameras. It should also be understood that Fig. 4 shows only the 2-camera case; with the development of phone photography, phones with 3, 4, or even 6 cameras have gradually appeared, and such multi-camera phones can also implement the face liveness detection method of this application, which should be understood as falling within its scope.
Fig. 5 shows a mobile phone whose camera has a zoom function: Fig. 5(a) is a schematic view of the phone, and (b) an exploded view of part of the camera 193. Moving a lens element within the lens changes the position of the focal point and hence the focal length of the lens, which changes the lens's field angle and thus magnifies or shrinks the image. When the focal point moves away from the imaging surface, the focal length lengthens; conversely, it shortens.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. Applications such as intelligent recognition of the electronic device 100 can be realized through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
According to some embodiments of the present application, the internal memory 121 stores instructions (in other words, computer-readable code), and when the processor 110 reads the instructions stored in the internal memory 121, it executes the face liveness detection method of this application. For details, refer to the face liveness detection method of the following embodiments.
Fig. 6 shows a flowchart of a face liveness detection method according to an embodiment of the present application.
The method is described below with reference to Fig. 6. As one embodiment, it may be implemented on a mobile phone.
S100: acquire, through the camera(s) of the mobile phone, a main-view face image and an auxiliary face image of the face to be detected at different view angles. That is to say, when the phone performs face recognition, it captures the main-view face image and the auxiliary face image at different view angles.
Of the two images, either can serve as the main-view face image, with the other as the auxiliary face image. For example, the main role of the main-view face image can be regarded as identity authentication, while the auxiliary face image serves liveness authentication by providing auxiliary information.
An auxiliary face image taken at a shooting angle different from the main-view face image contains different image information (hereinafter also called difference information). The main-view face image may be the image shot by the phone's main camera at a shooting angle of zero, while the auxiliary face image is shot by the auxiliary camera at some other shooting angle. Moreover, at different shooting angles the brightness distribution of the face received by the camera's image sensor differs, so the brightness distribution of the face image differs as well; and the focal planes formed by the face differ, yielding different depth-of-field information.
An auxiliary face image taken at a field angle different from the main-view face image also contains different image information. Different field angles mean different camera focal lengths, and therefore different depth-of-field information for the subject: the image is sharpest at the focal point and blurrier farther from it, and the smaller the focal length, the greater the depth of field. Different field angles also yield different face-surrounding information: the larger the field angle, the smaller the image area the face occupies, so more of the surroundings are visible. As described above, different shooting angles likewise yield different pixel content.
The main camera and the auxiliary camera can be placed at different positions, can have different focal lengths, and can even have image sensors of different sizes, so that main-view and auxiliary face images at different view angles can be shot. The main-view face image can be shot by the phone's main camera, and the auxiliary face image by one or more auxiliary cameras. The case where the two images are shot by different cameras (a main camera and an auxiliary camera) and the case where they are shot by the same camera at different focal lengths are each described in detail below with a specific embodiment.
For example, Fig. 7(a) is a schematic diagram of a main-view face image shot by a camera at a small field angle, and Fig. 7(b) of an auxiliary face image shot at a large field angle. As can be seen from Fig. 7, the face occupies relatively little of the auxiliary face image in (b), which can therefore provide more information around the face; the frame information of the photo it reveals can serve as a basis for subsequently judging a non-living face.
S200: perform face liveness detection through a liveness detection model based on the main-view face image and the auxiliary face image, and output a detection result.
That is, after the main-view and auxiliary face images are acquired, face liveness detection is performed through the liveness detection model based on them, and the result is output.
In this way, more difference information can be obtained from the auxiliary and main-view face images at different view angles, effectively defending against face attacks, solving the problem that conventional liveness detection methods performing detection with only a main-view face image are easily attacked and give poor recognition security, and improving the security of face recognition.
Fig. 8 shows a schematic flowchart of face liveness detection by the liveness detection model.
As shown in Fig. 8, performing face liveness detection through the liveness detection model based on the main-view face image and the auxiliary face image includes the following steps.
Step S201: perform face detection and cropping on the main-view face image and the auxiliary face image respectively to obtain a main-view face region image and an auxiliary face region image, where the proportion of the image occupied by the face is smaller in the auxiliary face region image than in the main-view face region image.
That is, after the main-view and auxiliary face images are acquired, they are preprocessed: face detection and cropping are performed on each to obtain the main-view and auxiliary face region images, under the proportion constraint above. Face detection can be implemented by detecting key points such as eye, mouth, and nose contours, for example with the open-source DSFD algorithm. After the face region is detected, a region of a certain area centered on the face contour is cropped out, yielding the main-view face region image and the auxiliary face region image respectively. For example, the main-view face image may be cropped to a region containing only the face as the main-view face region, while the auxiliary face image may be cropped, with the face center as base point, to a region a predetermined multiple of the face size as the auxiliary face region. The main-view image can then support accurate face recognition, while the auxiliary image retains more difference information than the main view without bringing in excessive interference from an over-large image (such as other nearby people or surrounding objects).
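The cropping step can be sketched as follows, assuming images as H x W x C numpy arrays and a face detector returning one bounding box (x, y, w, h); the 5x auxiliary region follows the example in Embodiment 1 below, and all names are illustrative.

    def crop_regions(main_img, aux_img, detect_face):
        # Main view: keep only the detected face region.
        mx, my, mw, mh = detect_face(main_img)      # e.g. a DSFD-style detector
        main_region = main_img[my:my + mh, mx:mx + mw]

        # Auxiliary view: keep a region 5x the face size, centered on the face,
        # so surroundings (photo frames, screens, stands) stay visible.
        ax, ay, aw, ah = detect_face(aux_img)
        cx, cy = ax + aw / 2, ay + ah / 2
        x0 = max(0, int(cx - 5 * aw / 2))
        x1 = min(aux_img.shape[1], int(cx + 5 * aw / 2))
        y0 = max(0, int(cy - 5 * ah / 2))
        y1 = min(aux_img.shape[0], int(cy + 5 * ah / 2))
        aux_region = aux_img[y0:y1, x0:x1]
        return main_region, aux_region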
Step S202: detect the auxiliary face region image and the main-view face region image through the liveness detection model, and output a detection result.
The input and output of the liveness detection model can be illustrated as follows. Machine learning can be performed on the difference information between main-view and auxiliary face images to form the liveness detection model; when the difference between the acquired main-view and auxiliary face images is not within a preset range, information that liveness detection failed is output, for example by prompting on the phone screen that the operation is by an illegitimate user, issuing a warning, or interrupting the current identity-authentication-based application operations.
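The inference step can be sketched as below, assuming PyTorch, a trained two-view model as in the earlier training sketch, and cropped region tensors; the 0.5 threshold is an illustrative assumption.

    import torch

    @torch.no_grad()
    def detect_liveness(model, main_region, aux_region) -> bool:
        # Returns True for a live face; False means non-living, so the caller
        # should block the operation even if face recognition itself passed.
        model.eval()
        logits = model(main_region.unsqueeze(0), aux_region.unsqueeze(0))
        p_live = torch.softmax(logits, dim=1)[0, 1].item()
        return p_live >= 0.5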
Consider several face-attack examples. When the attack tool is a face photo, its two-dimensional image differs significantly from the three-dimensional image information of a real face (depth-of-field difference information). When the attack tool is a three-dimensional face mask, its reflectance under light differs from that of a real face, so the difference information (face brightness distribution difference information) also differs, and the face attack can likewise be defended against effectively. As another example, the auxiliary face image with its larger viewing angle captures more information beyond the face itself, which allows cues that a real face cannot produce, such as edge information, to be identified. The edge information here may include photo frame information, stand or bracket information, screen information, information on the boundary between a mask and the human body, and so on.
By performing machine learning on such non-real-face information to form the liveness detection model, the model can output a non-live result whenever this information is detected, thereby effectively defending against face attacks.
It should be noted that the above is only an example and does not limit the scope of the present application; the model may also learn on its own, through deep learning, the difference features between the main-view and auxiliary face images of real faces and those of face attack tools, and perform face liveness detection based on these difference features.
That is, the auxiliary face area image and the main-view face area image are detected by the liveness detection model, and the detection result is output.
To effectively prevent attack tools from interfering with face image recognition, the liveness detection model of the present application is trained on positive and negative samples together. First, main-view and auxiliary face images of real persons are acquired as positive samples, and main-view and auxiliary images of attack tools are acquired as negative samples; an objective function is then trained on the positive and negative samples to obtain the liveness detection model. Trained in this way, the model can effectively learn the difference features between real faces and various attack tools, so that attack tools can be identified quickly and accurately, improving the security of face liveness detection.
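A training loop in this spirit might look as follows (PyTorch sketch; the binary cross-entropy objective, optimizer, and epoch count are assumptions — the application states only that an objective function is trained on positive and negative samples):

```python
import torch
import torch.nn as nn

def train_liveness_model(model, loader, epochs=10, lr=1e-4):
    """Positive samples: (main, aux) image pairs of real faces, label 1.
    Negative samples: pairs captured from attack tools (photos, videos,
    masks), label 0. The loader yields (main_img, aux_img, label)."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for main_img, aux_img, label in loader:
            logit = model(main_img, aux_img)   # fused two-branch score
            loss = criterion(logit.squeeze(1), label.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```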
Two specific embodiments of the face liveness detection method are described in detail below with reference to figs. 9 and 10.
Example 1
Fig. 9 is a flowchart of a face liveness detection method using the main camera and an auxiliary camera of a mobile phone. The flow shown in fig. 9 is explained below; the method is performed with a mobile phone having multiple cameras, as shown in fig. 4.
In step S01, face recognition is started. Face recognition can be triggered by pressing a key on the mobile phone or tapping the screen.
In step S02, a processor in the mobile phone controls the main camera to capture a main-view face image and synchronously starts the auxiliary camera to capture an auxiliary face image.
In step S03, a processor in the mobile phone, such as a CPU or a GPU, preprocesses the captured main-view and auxiliary face images: it detects and crops the face region in the main-view face image to obtain the main-view face area, and likewise detects and crops the face region in the auxiliary face image to obtain the auxiliary face area. For example, the main-view face image may be cropped to a region containing only the face as the main-view face area, while the auxiliary face image may be cropped, with the face center as the base point, to a region 5 times the face size as the auxiliary face area; the auxiliary face area can then effectively reveal edge information while filtering out some interference.
In step S04, the processor inputs the preprocessed main-view face image and auxiliary face image into the trained liveness detection model for face liveness detection.
In step S05, the processor outputs the face liveness detection result on the screen, informing the current user of the mobile phone whether the face to be detected is a real face or a fake one. Further, in the case of a fake face, the mobile phone may close the currently invoked program.
Example 2
In this embodiment, face liveness detection is performed through the zoom function of the mobile phone camera, as shown in fig. 10 and explained below. The method of fig. 10 is performed with a mobile phone including a zoom lens, as shown in fig. 5.
In step S10, face recognition is started; it can be triggered by a mobile phone key or by tapping the screen.
In step S20, the processor of the mobile phone controls the camera to capture the main-view face image at the default focal length; capture at the default focal length may start automatically when face recognition is triggered.
In step S30, after the main-view face image has been captured and the processor has verified that the image is valid (for example, ruling out out-of-focus shots), the processor controls the camera to switch to the next focal length and captures an auxiliary face image (when multiple auxiliary face images are to be acquired, they can be captured by zooming automatically according to a preset program); a sketch of this capture sequence follows the step list below.
In step S40, the processor preprocesses the captured main-view face image and auxiliary face image.
In step S50, the processor inputs the preprocessed main-view face image and auxiliary face image into the trained liveness detection model for face liveness detection.
In step S60, the processor outputs the face liveness detection result on the screen, informing the user of the mobile phone whether the face to be detected is a real face or a fake one.
The liveness detection model according to an embodiment of the present application, and its training, are described below with reference to fig. 11.
As shown in fig. 11, the liveness detection model may take multiple inputs, which are fused to produce the face liveness detection result.
Face image 1 is the main-view face area image captured by the camera and preprocessed; face image 2 is the auxiliary face area image captured by the camera and preprocessed.
First, in the liveness detection model using a neural network according to the present application, face image 1 and face image 2 are fed into the model at the input layer.
Next, in the intermediate layer, face image 1 and face image 2 are each processed by a series of convolution processing blocks (convolution processing blocks 1 to N in fig. 11). Each convolution processing block consists, in order, of a convolution layer conv, a batch normalization layer bn, a scale layer scale, an activation function layer relu, and a pooling layer pool.
After face image 1 and face image 2 have each passed through the N convolution processing blocks, the results are processed by the fully connected layer fc of the intermediate layer.
Finally, the features of face image 1 and face image 2 output from the fully connected layer fc are fused by concatenation (concat), and the fused feature is output at the output layer as the final face liveness detection result.
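The two-branch layout of fig. 11 can be approximated with the following sketch (PyTorch; the block count, channel widths, and pooled fc head are assumptions — the application fixes only the conv/bn/scale/relu/pool block order, the per-branch fc stage, and the concat fusion; note that PyTorch folds the bn and scale layers into one affine BatchNorm2d):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """conv -> bn (with scale) -> relu -> pool, matching the per-block
    layout described for the intermediate layer."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),   # affine bn covers both bn and scale
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.body(x)

class TwoBranchLivenessNet(nn.Module):
    """Face image 1 (main view) and face image 2 (auxiliary) each pass
    through N convolution processing blocks and a fully connected layer;
    the two features are concatenated and mapped to one liveness logit."""
    def __init__(self, n_blocks=4, width=32, feat_dim=128):
        super().__init__()

        def make_branch():
            chans = [3] + [width * 2 ** i for i in range(n_blocks)]
            blocks = [ConvBlock(chans[i], chans[i + 1]) for i in range(n_blocks)]
            return nn.Sequential(*blocks,
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(chans[-1], feat_dim))

        self.branch1 = make_branch()
        self.branch2 = make_branch()
        self.head = nn.Linear(2 * feat_dim, 1)   # output layer after concat

    def forward(self, img1, img2):
        fused = torch.cat([self.branch1(img1), self.branch2(img2)], dim=1)
        return self.head(fused)
```

A model of this shape is what the training loop sketched earlier would consume.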
Next, a face liveness detection apparatus 1000 according to an embodiment of the present application is described with reference to figs. 12 and 13.
Fig. 12 is a block diagram of the face liveness detection apparatus 1000 according to an embodiment of the present application, and fig. 13 is a block diagram of the detection module in that apparatus.
As shown in fig. 12, the face liveness detection apparatus 1000 includes a face image acquisition module 400 and a detection module 500.
Referring to figs. 6 and 8, when the face liveness detection method is carried out, the face image acquisition module 400 acquires the main-view face image and the auxiliary face image of the person to be detected captured by the camera, and the detection module 500 performs face liveness detection through the liveness detection model based on the main-view face image and the auxiliary face image and outputs the detection result.
As shown in fig. 13, the detection module 500 may further include a preprocessing module 501 and a liveness detection module 502. The preprocessing module 501 is configured to perform face detection and cropping on the main-view face image and the auxiliary face image respectively to obtain a main-view face area image and an auxiliary face area image; on this basis, the liveness detection module 502 detects the auxiliary face area image and the main-view face area image and outputs the detection result.
Furthermore, the present application also provides a computer-readable storage medium storing a computer program which, when executed, implements the face liveness detection method of the above embodiments.
Next, a device 1200 (which may be, for example, a mobile phone) according to an embodiment of the present application is described with reference to fig. 14. Fig. 14 is a block diagram of the device 1200 according to an embodiment of the present application. The device 1200 may include one or more processors 1201 coupled to a controller hub 1203. In at least one embodiment, the controller hub 1203 communicates with the processor 1201 via a multi-drop bus such as a Front Side Bus (FSB), a point-to-point interface such as a Quick Path Interconnect (QPI), or a similar connection 1206. The processor 1201 executes instructions that control data processing operations of a general type. In one embodiment, the controller hub 1203 includes, but is not limited to, a Graphics Memory Controller Hub (GMCH) (not shown) and an Input/Output Hub (IOH) (which may be on a separate chip, not shown), where the GMCH includes memory and graphics controllers and is coupled to the IOH.
The device 1200 may also include a coprocessor 1202 and a memory 1204 coupled to the controller hub 1203. Alternatively, one or both of the memory and the GMCH may be integrated within the processor (as described herein), with the memory 1204 and the coprocessor 1202 directly coupled to the processor 1201, and the controller hub 1203 and the IOH in a single chip. The memory 1204 may be, for example, a Dynamic Random Access Memory (DRAM), a Phase Change Memory (PCM), or a combination of the two. In one embodiment, the coprocessor 1202 is a special-purpose processor, such as a high-throughput Many Integrated Core (MIC) processor, a network or communication processor, a compression engine, a graphics processor, a General Purpose Graphics Processing Unit (GPGPU), an embedded processor, or the like. The optional nature of the coprocessor 1202 is represented in fig. 14 by dashed lines.
The memory 1204, as a computer-readable storage medium, may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. For example, the memory 1204 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives.
In one embodiment, device 1200 may further include a Network Interface Controller (NIC) 1206. Network interface 1206 may include a transceiver to provide a radio interface for device 1200 to communicate with any other suitable device (e.g., front end module, antenna, etc.). In various embodiments, the network interface 1206 may be integrated with other components of the device 1200. The network interface 1206 may implement the functions of the communication unit in the above-described embodiments.
The device 1200 may further include an Input/Output (I/O) device 1205. The I/O device 1205 may include: a user interface designed to enable the user to interact with the device 1200; a peripheral component interface designed to enable peripheral components to interact with the device 1200; and/or sensors configured to determine environmental conditions and/or location information associated with the device 1200.
It is noted that fig. 14 is merely exemplary. That is, although fig. 14 shows the device 1200 including multiple components, such as the processor 1201, the controller hub 1203, and the memory 1204, in practical applications a device using the methods of the present application may include only some of these components, for example only the processor 1201 and the NIC 1206. Optional components in fig. 14 are shown in dashed lines.
According to some embodiments of the present application, the memory 1204, serving as a computer-readable storage medium, stores instructions that, when executed on a computer, cause the device 1200 to perform the face liveness detection method of the above embodiments; for details, reference may be made to the method of the above embodiments, which is not repeated here.
Fig. 15 is a block diagram of a SoC (System on Chip) 1300 according to an embodiment of the present application. In fig. 15, like parts have the same reference numerals, and dashed boxes represent optional features of more advanced SoCs. The SoC 1300 includes: an interconnect unit 1350 coupled to an application processor 1310; a system agent unit 1380; a bus controller unit 1390; an integrated memory controller unit 1340; a set of one or more coprocessors 1320, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a Static Random Access Memory (SRAM) unit 1330; and a Direct Memory Access (DMA) unit 1360. In one embodiment, the coprocessor 1320 includes a special-purpose processor, such as a network or communication processor, a compression engine, a GPGPU, a high-throughput MIC processor, an embedded processor, or the like.
The Static Random Access Memory (SRAM) unit 1330 may include one or more computer-readable media for storing data and/or instructions. The computer-readable storage medium may store instructions, in particular temporary and permanent copies of the instructions. When these instructions are executed by at least one unit in the processor, the SoC 1300 can perform the face liveness detection method according to the foregoing embodiments; for details, reference may be made to the method of the foregoing embodiments, which is not repeated here.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and to generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, Compact Disc Read-Only Memories (CD-ROMs), magneto-optical disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), Erasable Programmable Read-Only Memories (EPROMs), Electrically Erasable Programmable Read-Only Memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, a logical unit/module may be one physical unit/module, a part of one physical unit/module, or a combination of multiple physical units/modules; the physical implementation of the logical unit/module itself is not what matters most, and the combination of functions implemented by the logical units/modules is what solves the technical problem addressed by the present application. Furthermore, to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are less closely related to solving the technical problem addressed herein, which does not mean that no other units/modules exist in the above apparatus embodiments.
It is noted that, in the examples and description of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (16)

1. A face liveness detection method for a terminal device, characterized by comprising:
acquiring a main-view face image and an auxiliary face image of the face of a person to be detected at different viewing angles through a camera of the terminal device, wherein the viewing angles comprise shooting angles and/or field angles;
and performing face liveness detection through a liveness detection model based on the main-view face image and the auxiliary face image, and outputting a detection result.
2. The face liveness detection method according to claim 1, wherein the main-view face image and the auxiliary face image are captured by a main camera and an auxiliary camera respectively, and the number of the auxiliary cameras is 1 or more.
3. The face liveness detection method according to claim 2, wherein the main camera and the auxiliary camera have different shooting angles.
4. The face liveness detection method of claim 3 wherein the primary camera and the secondary camera further have different focal lengths.
5. The face liveness detection method according to claim 1, wherein the primary-view face image and the auxiliary face image are captured by a zoom camera at different focal lengths.
6. The face liveness detection method according to claim 5, wherein the focal length used for capturing the auxiliary face image is below 1/2 of the focal length used for capturing the main-view face image.
7. The face liveness detection method according to claim 1, wherein the auxiliary face image comprises a plurality of auxiliary face images, and each auxiliary face image is obtained by shooting at different viewing angles through a camera of a terminal device.
8. The face liveness detection method according to any one of claims 1 to 7, wherein performing face liveness detection through the liveness detection model based on the main-view face image and the auxiliary face image comprises:
performing face detection and cropping on the main-view face image and the auxiliary face image to obtain a main-view face area image and an auxiliary face area image, wherein the proportion of the face part in the auxiliary face area image relative to the whole image is smaller than the proportion of the face part in the main-view face area image relative to the whole image;
and performing face liveness detection on the auxiliary face area image and the main-view face area image through the liveness detection model, and outputting a detection result.
9. The face liveness detection method according to claim 8, wherein the liveness detection model is obtained by training as follows:
acquiring main-view face images and auxiliary face images of real persons as positive samples, and acquiring main-view images and auxiliary images of attack tools as negative samples;
and training an objective function based on the positive samples and the negative samples to obtain the liveness detection model.
10. The face liveness detection method according to claim 9, wherein the attack tools comprise one or more of a photo, a video, and a face mask.
11. The face liveness detection method according to claim 10, wherein, when the auxiliary face area image is detected to contain edge information, information indicating that the object to be detected is not a live body is output, wherein the edge information includes any one of photo frame information, stand information, screen information, and boundary information between a mask and a human body part.
12. A face liveness detection device, comprising:
the system comprises a face image acquisition module, a face image acquisition module and a face image acquisition module, wherein the face image acquisition module is used for acquiring a main-view face image and an auxiliary face image of a face of a person to be detected, which are acquired through a camera of terminal equipment at different viewing angles respectively, and the viewing angles comprise shooting angles and/or field angles;
and the detection module is used for carrying out human face living body detection through a living body detection model based on the main view human face image and the auxiliary human face image and outputting a detection result.
13. A terminal device, comprising:
the camera is used for acquiring a main-view face image and an auxiliary face image;
one or more processors;
one or more memories having computer-readable code stored therein, which, when executed by the one or more processors, causes the processors to perform the face liveness detection method of any one of claims 1 to 11.
14. The terminal device according to claim 13, wherein the cameras include a main camera and a sub-camera, and the number of the sub-cameras is 1 or more.
15. The terminal device according to claim 13, wherein the camera is a zoom camera, and the main-view face image and the auxiliary face image are images obtained at different focal lengths.
16. A computer readable storage medium having computer readable code stored therein, which when executed by one or more processors, causes the processors to perform the face liveness detection method of any one of claims 1 to 11.
CN202110009371.6A 2020-04-30 2021-01-05 Face living body detection method, device, equipment and computer readable storage medium Pending CN113591526A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010364979 2020-04-30
CN2020103649796 2020-04-30

Publications (1)

Publication Number Publication Date
CN113591526A true CN113591526A (en) 2021-11-02

Family

ID=78237968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110009371.6A Pending CN113591526A (en) 2020-04-30 2021-01-05 Face living body detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113591526A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463859A (en) * 2021-11-03 2022-05-10 马上消费金融股份有限公司 Anti-attack method and device for living body detection, electronic equipment and storage medium
CN114463859B (en) * 2021-11-03 2023-08-11 马上消费金融股份有限公司 Method and device for generating challenge sample for living body detection, electronic device and storage medium
CN115830721A (en) * 2022-11-02 2023-03-21 深圳市新良田科技股份有限公司 Living body detection method, living body detection device, terminal equipment and readable storage medium
CN115830721B (en) * 2022-11-02 2024-05-03 深圳市新良田科技股份有限公司 Living body detection method, living body detection device, terminal device and readable storage medium

Similar Documents

Publication Publication Date Title
CN108197586B (en) Face recognition method and device
US10896518B2 (en) Image processing method, image processing apparatus and computer readable storage medium
CN109034102B (en) Face living body detection method, device, equipment and storage medium
CN109544618B (en) Method for obtaining depth information and electronic equipment
WO2019218621A1 (en) Detection method for living being, device, electronic apparatus, and storage medium
CN106446873B (en) Face detection method and device
CN111079576B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
KR101569268B1 (en) Acquisition System and Method of Iris image for iris recognition by using facial component distance
WO2021031609A1 (en) Living body detection method and device, electronic apparatus and storage medium
CN111242090B (en) Human face recognition method, device, equipment and medium based on artificial intelligence
WO2019137178A1 (en) Face liveness detection
TWI752105B (en) Feature image acquisition method, acquisition device, and user authentication method
CN112633306A (en) Method and device for generating confrontation image
CN115526983B (en) Three-dimensional reconstruction method and related equipment
CN113591526A (en) Face living body detection method, device, equipment and computer readable storage medium
CN112446252A (en) Image recognition method and electronic equipment
CN107977636B (en) Face detection method and device, terminal and storage medium
CN111199171B (en) Wrinkle detection method and terminal equipment
CN113711123A (en) Focusing method and device and electronic equipment
CN114429495A (en) Three-dimensional scene reconstruction method and electronic equipment
WO2021218695A1 (en) Monocular camera-based liveness detection method, device, and readable storage medium
JP2022535639A (en) Performance parameter determination method and device, electronic device, storage medium, and program product
CN112153300A (en) Multi-view camera exposure method, device, equipment and medium
EP4303815A1 (en) Image processing method, electronic device, storage medium, and program product
CN113486714B (en) Image processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination