WO2020177226A1

WO2020177226A1 - Improved Resnet-based human face in-vivo detection method and related device

Info

Publication number: WO2020177226A1
Application number: PCT/CN2019/089163
Authority: WO
Inventors: 庞烨; 王义文; 王健宗
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-03-04
Filing date: 2019-05-30
Publication date: 2020-09-10
Also published as: CN110059542A

Abstract

The present application discloses an improved Resnet-based human face in-vivo detection method and related device, relating to the field of in-vivo detection. The method comprises: obtaining a single-frame image to be detected containing a human face image; for each human face image in the single-frame image to be detected, obtaining, based on the improved Resnet, a probability value that the human face image directly comes from a living body; and judging whether the human face image directly comes from a living body based on a matching result of the probability value with a preset threshold. The method improves the accuracy of human face in-vivo detection.

Description

Method and related equipment for face live detection based on improved Resnet

Technical field

This application is based on and claims priority to Chinese patent application CN201910160807.4, filed on March 4, 2019 and titled "Improved Resnet-based method and related equipment for face living detection", the entire content of which is hereby incorporated by reference.

This application relates to the field of live body detection, and in particular to a method and related equipment for face live body detection based on an improved Resnet.

Background technique

Today, with the rapid development of Internet technology, face recognition is increasingly used in everyday life, for example in access control systems and facial payment. In these scenarios, in addition to recognizing the human face, it is also necessary to judge the authenticity of the face, to prevent malicious persons from using other people's photos and videos for illegal activities. Judging the authenticity of a face is known as live body detection. The inventor of the present application realizes that most existing living body detection technology relies on temperature sensing hardware, such as an infrared sensor, to determine whether the currently acquired video comes directly from a living body. The disadvantages of this approach are that it is not suitable for portable terminals and that deploying the hardware increases the cost of the living body detection system. Moreover, in some special environments, the accuracy of this kind of living body detection system is not satisfactory.

Summary of the invention

Technical problem

Based on this, in order to solve the technical problem of how to improve the accuracy of face live detection in related technologies, this application provides a face live detection method and related equipment based on an improved Resnet.

Technical solutions

According to one aspect of the present disclosure, a method for face living detection based on an improved Resnet is provided, which includes: obtaining a single frame image to be detected containing a face image; for each face image in the single frame image to be detected, acquiring, based on the improved Resnet, the probability value that the face image is directly derived from a living body; and determining, based on the matching result of the probability value and a preset threshold, whether the face image is directly derived from a living body.

According to one aspect of the present disclosure, there is provided an apparatus for face living detection based on an improved Resnet, including: a first acquisition module configured to acquire a single frame image to be detected containing a face image; a second acquisition module configured to obtain, for each face image in the single frame image to be detected and based on the improved Resnet, the probability value that the face image is directly derived from a living body; and a determination module configured to determine, based on the matching result of the probability value and a preset threshold, whether the face image is directly derived from a living body.

According to one aspect of the present disclosure, there is provided an electronic device for face living detection based on an improved Resnet, including: a memory configured to store executable instructions; and a processor configured to execute the executable instructions stored in the memory to perform the method described above.

According to one aspect of the present disclosure, there is provided a computer non-volatile readable storage medium, which stores computer program instructions that, when executed by a computer, cause the computer to execute the method described above.

Compared with the traditional technology that relies on external temperature sensing equipment for live detection of face images, the embodiments of the present disclosure use an improved Resnet to perform live detection of face images, which reduces hardware requirements and improves the accuracy of face live detection.

Other characteristics and advantages of the present disclosure will become apparent through the following detailed description, or partly learned through the practice of the present disclosure.

It should be understood that the above general description and the following detailed description are only exemplary and cannot limit the present disclosure.

Brief description of the drawings

Fig. 1 shows a flow chart of the steps of face living detection based on improved Resnet according to an exemplary embodiment of the present disclosure.

Fig. 2 shows a flow chart of partial steps of face living detection based on improved Resnet according to an exemplary embodiment of the present disclosure.

Fig. 3 shows a flow chart of partial steps of face living detection based on improved Resnet according to an exemplary embodiment of the present disclosure.

Fig. 4 shows a flow chart of partial steps of face living detection based on improved Resnet according to an exemplary embodiment of the present disclosure.

Fig. 5 shows a block diagram of a device for face living detection based on an improved Resnet according to an exemplary embodiment of the present disclosure.

Fig. 6 shows a system architecture diagram of face living detection based on improved Resnet according to an exemplary embodiment of the present disclosure.

Fig. 7 shows a diagram of an electronic device for face living detection based on an improved Resnet according to an exemplary embodiment of the present disclosure.

FIG. 8 shows a diagram of a computer non-volatile readable storage medium for face living detection based on improved Resnet according to an exemplary embodiment of the present disclosure.

Embodiments of the invention

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more comprehensive and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in one or more embodiments in any suitable way. In the following description, many specific details are provided to give a sufficient understanding of the embodiments of the present disclosure. However, those skilled in the art will realize that the technical solutions of the present disclosure can be practiced without one or more of the specific details, or that other methods, components, devices, steps, etc. can be used. In other cases, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.

In addition, the drawings are only schematic illustrations of the present disclosure, and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, and thus their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in the form of software, or implemented in one or more hardware modules or integrated circuits, or implemented in different networks and/or processor devices and/or microcontroller devices.

The purpose of the present disclosure is to improve the accuracy of face living detection from a technical standpoint. According to an embodiment of the present disclosure, a method for face living detection based on an improved Resnet includes: acquiring a single frame image to be detected containing a face image; for each face image in the single frame image to be detected, obtaining, based on the improved Resnet, the probability value that the face image is directly derived from a living body; and determining, based on the matching result of the probability value and a preset threshold, whether the face image is directly derived from a living body. Compared with the traditional technology that relies on external temperature sensing equipment for live detection of face images, the embodiments of the present disclosure use an improved Resnet to perform live detection of face images, which reduces hardware requirements and improves the accuracy of face live detection.

Fig. 1 shows a flow chart of face living detection based on an improved Resnet according to an exemplary embodiment of the present disclosure. Step S100: obtain a single frame image to be detected containing a face image. Step S110: for each face image in the single frame image to be detected, obtain, based on the improved Resnet, the probability value that the face image is directly derived from a living body. Step S120: based on the matching result of the probability value and a preset threshold, determine whether the face image is directly derived from a living body.

In the embodiments of the present disclosure, the structure of the deep residual network Resnet used for face live detection has been improved in advance, and the improved Resnet achieves better face live detection performance. To perform face live detection, a single frame image to be detected containing a face image is obtained, the improved Resnet is applied to each face image in the single frame image to be detected, and the probability value that each face image is directly derived from a living body is determined. Based on the probability value, it is judged whether the corresponding face image is directly derived from a living body.

Hereinafter, each step in the embodiment of the present disclosure will be explained and described in detail with reference to the accompanying drawings.

In step S100, a single frame image to be detected containing a human face image is obtained.

The single-frame image to be detected refers to the image obtained by decomposing the to-be-detected video into single frames.

In one embodiment, as shown in FIG. 2, step S100 includes: step S1001: obtaining a video to be detected; step S1002: decomposing the video to be detected into single frame images; step S1003: based on the dlib framework, obtaining a single frame image to be detected containing a face image from the single frame images.

The video to be detected refers to the video obtained by the server that needs to detect whether the face image appearing in the video is directly derived from a living body.

dlib is a toolkit containing machine learning algorithms that can determine the area of the face in the image, that is, recognize the face image in a single frame of image.

In one embodiment, the server obtains the video to be detected from a video recording terminal, such as a camera; this is a video for which it must be detected whether the face images appearing in it are directly derived from a living body. The video to be detected may have been obtained by the video recording terminal directly shooting a living body, or by the video recording terminal shooting a video played on an electronic device. Therefore, live detection must be performed on the acquired video to determine whether the face images appearing in it are directly derived from a living body. After the video to be detected is obtained, it is decomposed into single frame images. Based on the dlib framework, face detection is performed on a single frame image, and a single frame image containing a face image is determined as the single frame image to be detected. In this way, the single frame image to be detected containing the face image is extracted, so that the server can further perform live detection on it.

In one embodiment, as shown in FIG. 3, step S1003 includes: step S10031: randomly extracting one of the single frame images as the original single frame image; step S10032: confirming, based on the dlib framework, whether the original single frame image contains a face image; step S10033: if it is confirmed that the original single frame image contains a face image, using the original single frame image as the single frame image to be detected; if it is confirmed that the original single frame image contains no face image, randomly selecting another of the single frame images as the original single frame image, until it is confirmed that the original single frame image contains a face image, and then using that original single frame image as the single frame image to be detected.

In an embodiment, the server decomposes the to-be-detected video frame by frame into single frame images. It randomly selects an image from the single frame images and determines, based on the dlib framework, whether the image contains a face image. If so, the image is used as the single frame image to be detected for live detection; if not, another image is randomly selected, and this is repeated until a single frame image containing a face image is obtained, which is then used as the single frame image to be detected. In this way, a single frame image to be detected containing a face image is obtained.
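The random selection in steps S10031 to S10033 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: `pick_frame_with_face` and `contains_face` are hypothetical names, and `contains_face` stands in for a dlib-based face detector. Trying each frame at most once and returning `None` when no frame contains a face are assumptions added here so that the loop always terminates.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def pick_frame_with_face(
    frames: Sequence[Frame],
    contains_face: Callable[[Frame], bool],
    rng: Optional[random.Random] = None,
) -> Optional[Frame]:
    """Randomly try frames until one is confirmed to contain a face.

    `contains_face` is a hypothetical stand-in for the dlib-based check.
    Each frame is tried at most once (an assumption, not stated in the text).
    """
    rng = rng or random.Random()
    order = list(range(len(frames)))
    rng.shuffle(order)  # random visiting order over all frames
    for index in order:
        candidate = frames[index]  # the "original single frame image"
        if contains_face(candidate):
            return candidate  # becomes the single frame image to be detected
    return None  # no frame contained a face
```

Passing the detector in as a callable keeps the selection logic independent of any particular face detection library.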

In step S110, for each face image in the single frame image to be detected, the probability value that the face image is directly derived from a living body is obtained based on the improved Resnet.

Resnet refers to a deep residual network, which uses residual learning to mitigate the vanishing gradient problem during training.

In one embodiment, after the server obtains the single frame image to be detected containing face images, it extracts each face image contained in the single frame image separately and inputs it into the improved Resnet, obtaining the probability value, output by Resnet, that each face image is directly derived from a living body. In this way, when there are multiple face images in a single frame image to be detected, the server can judge whether each face image is directly derived from a living body according to its probability value.

In one embodiment, as shown in FIG. 4, the method for obtaining each face image in the single frame image to be detected in step S110 includes: step S1101: extracting, based on the dlib framework, the face feature points in the single frame image to be detected; step S1102: using the image of the area of predetermined shape and size in which each group of face feature points is located as a face image.

In one embodiment, extracting face feature points from a single frame image to be detected containing face images yields one or more groups of face feature points, where each group corresponds to one face image. The image of the square area of predetermined size in which each group of face feature points is located is determined as the face image corresponding to that group. In this way, each face image in the single frame image to be detected is determined.
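One way to derive the square area from a group of face feature points is sketched below. The text does not say how the square is positioned, so centring it on the mean of the feature points and shifting it to stay inside the image are assumptions, and `square_region` is a hypothetical helper name.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def square_region(
    points: Iterable[Point], side: int, image_width: int, image_height: int
) -> Box:
    """Return a square of a predetermined side length around one point group.

    Assumption: the square is centred on the mean of the feature points and
    shifted, if necessary, so that it stays inside the image.
    """
    pts = list(points)
    if not pts:
        raise ValueError("at least one feature point is required")
    center_x = sum(x for x, _ in pts) // len(pts)
    center_y = sum(y for _, y in pts) // len(pts)
    half = side // 2
    # Clamp the top-left corner so the square does not leave the image.
    left = min(max(center_x - half, 0), max(image_width - side, 0))
    top = min(max(center_y - half, 0), max(image_height - side, 0))
    right = min(left + side, image_width)
    bottom = min(top + side, image_height)
    return left, top, right, bottom
```

Applying the function once per group of feature points yields one crop box per face image in the frame.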

In one embodiment, the improved Resnet includes: adding a dropout layer after the Resnet pooling layer; and using a sigmoid function to output the probability value that the face image is directly derived from a living body.

The dropout layer makes the neural network ignore half of the feature detectors in each training batch, thereby reducing the occurrence of overfitting during the training process.

The sigmoid function is a special case of the logistic function. Its curve is S-shaped, and it is used to handle binary classification problems.

In one embodiment, adding a dropout layer after the Resnet pooling layer can effectively prevent the occurrence of overfitting.
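The effect of a dropout layer on a vector of pooled features can be illustrated with a short sketch. The drop probability of 0.5 follows the "half of the feature detectors" wording above; rescaling the kept values by 1 / (1 - drop_prob), often called inverted dropout, is a common convention assumed here and is not stated in the text. The function name `dropout` is illustrative.

```python
import random
from typing import List, Optional, Sequence


def dropout(
    activations: Sequence[float],
    drop_prob: float = 0.5,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Randomly zero activations during training; pass them through otherwise.

    Assumption: kept activations are scaled by 1 / (1 - drop_prob) so that
    their expected value is unchanged (inverted dropout).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)  # dropout is inactive at inference time
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]
```

Because a different random subset is silenced on each training pass, the network cannot rely on any single feature detector, which is the mechanism by which dropout reduces overfitting.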

In one embodiment, a Resnet designed for multi-class classification is adapted to binary classification: the sigmoid function, instead of the softmax function used in the prior art, outputs the probability value that the face image is directly derived from a living body. The probability value output by the sigmoid function is better suited to a subsequent binary decision, so the sigmoid function performs better on binary classification problems than the softmax function, which is designed for multi-class problems. A Resnet improved in this way therefore performs better on the binary classification problem of living detection ("living" versus "non-living").
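For reference, the two output functions can be written out in a few lines. The sketch below only illustrates the mathematics: a sigmoid maps a single score to one probability, whereas a softmax normalises a vector of scores, and for two classes the sigmoid of a score z equals the first softmax output for the pair (z, 0). The function names are illustrative and do not come from the disclosure.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Map one score to a probability in [0, 1], in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalise a vector of scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# With a sigmoid output the network needs only one output unit for "living".
live_probability = sigmoid(2.0)
two_class = softmax([2.0, 0.0])  # same probability, but two output units
```

The single probability produced by the sigmoid can be compared directly with a threshold.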

In an embodiment, the improved Resnet is trained in the following manner:

Face images labeled in advance as "living" or "non-living", according to whether they are directly derived from a living body, are taken as samples and randomly divided into a training set and a verification set. Based on the gradient descent algorithm, the improved Resnet is trained with the training set: for each input sample of the training set, the improved Resnet outputs the probability value that the sample is directly derived from a living body; samples whose probability value is greater than or equal to the preset standard value are labeled "living", and samples whose probability value is less than the preset standard value are labeled "non-living", so as to determine the accuracy with which the improved Resnet judges whether the training set samples are directly derived from a living body. If the judgment accuracy on the training set samples is less than the preset expected value, the improved Resnet is updated and trained again with the training set, until the judgment accuracy on the training set samples is greater than or equal to the preset expected value. The verification set is then used to verify the improved Resnet whose judgment accuracy on the training set samples is greater than or equal to the preset expected value: the accuracy with which the improved Resnet judges whether the verification set samples are directly derived from a living body is determined; if this accuracy is greater than or equal to the preset expected value, the Resnet training is complete; if it is less than the preset expected value, the improved Resnet is re-trained until the judgment accuracy on the verification set samples is greater than or equal to the preset expected value.

In one embodiment, when the training set samples are used for training, the preset standard value is 97% and the preset expected value is 99%. The preset standard value measures how likely a training set sample is to be directly derived from a living body; the preset expected value measures the accuracy of Resnet's judgments on the training set samples. That is, a training set sample is labeled "living" only when the probability value output by Resnet for that sample is greater than or equal to 97%. Because Resnet is subject to error, some samples will be mislabeled by this procedure, that is, the judgment of the training set samples may not be accurate. The purpose of training with the training set is to make the accuracy of the judgments on the training set samples greater than or equal to the preset expected value of 99%.

After the training goal is met on the training set samples, Resnet must be verified. This is because training repeatedly uses the same set of samples, which introduces sample bias: a judgment accuracy on the training set samples that is greater than or equal to the preset expected value does not imply the same for samples outside the training set. Therefore, the verification set samples are used for verification and adjustment, so that Resnet's judgment accuracy on both the training set samples and the verification set samples is greater than or equal to the preset expected value of 99%. At this point, Resnet's training is complete. This further reduces over-fitting, so that in practical applications Resnet can correctly determine whether an input face image is directly derived from a living body.
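The acceptance logic described above, with its preset standard value and preset expected value, can be sketched independently of any deep learning framework. In the sketch below, `update` and `predict` are hypothetical callables standing in for one gradient descent pass and for the improved Resnet's forward pass; the `max_rounds` limit is an assumption added so that the loop always terminates.

```python
from typing import Callable, List, Optional, Sequence


def accuracy(
    probabilities: Sequence[float], is_live: Sequence[bool], standard_value: float = 0.97
) -> float:
    """Share of samples whose thresholded label matches the annotated label."""
    hits = sum((p >= standard_value) == label for p, label in zip(probabilities, is_live))
    return hits / len(is_live)


def train_until_accepted(
    update: Callable[[], None],
    predict: Callable[[Sequence], List[float]],
    train_x: Sequence, train_y: Sequence[bool],
    val_x: Sequence, val_y: Sequence[bool],
    standard_value: float = 0.97,
    expected_value: float = 0.99,
    max_rounds: int = 100,  # assumption: the text loops until the goal is met
) -> Optional[int]:
    """Return the round at which both accuracy checks pass, or None."""
    for round_number in range(1, max_rounds + 1):
        update()  # hypothetical: one gradient descent pass over the training set
        if accuracy(predict(train_x), train_y, standard_value) < expected_value:
            continue  # training goal not met yet, keep training
        if accuracy(predict(val_x), val_y, standard_value) >= expected_value:
            return round_number  # verification passed, training is complete
    return None
```

Checking the verification set only after the training goal is met mirrors the order of the two stages in the description.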

In an embodiment, obtaining, for each face image in the single frame image to be detected and based on the improved Resnet, the probability value that the face image is directly derived from a living body includes: inputting the face images into the improved Resnet one by one, in left-to-right order of the areas in which the face images are located, and obtaining the probability value, output by the improved Resnet, that each face image is directly derived from a living body. In this way, the probability value that each face image in the single frame image to be detected is directly derived from a living body is determined.
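The left-to-right ordering can be expressed as a sort on the horizontal position of each face area. In the sketch below, ordering by the left edge of the box is an assumption (the text only says "from left to right"), and `score_face` is a hypothetical callable standing in for cropping the face and running the improved Resnet.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def score_faces_left_to_right(
    boxes: Sequence[Box], score_face: Callable[[Box], float]
) -> List[Tuple[Box, float]]:
    """Score each face area in left-to-right order of its left edge."""
    ordered = sorted(boxes, key=lambda box: box[0])
    return [(box, score_face(box)) for box in ordered]
```

Returning each box together with its probability keeps the result traceable to a position in the frame.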

In an embodiment, determining, based on the matching result of the probability value and a preset threshold, whether the face image is directly derived from a living body includes: if the probability value is greater than or equal to the preset threshold, determining that the face image is directly derived from a living body; if the probability value is less than the preset threshold, determining that the face image is directly derived from a non-living body.

In one embodiment, the preset threshold is 98.7%, that is, a face image is determined to be directly derived from a living body only if its probability value is greater than or equal to 98.7%. For example, suppose there are two face images in the single frame image to be detected: face image A and face image B. After face image A is input to the improved Resnet, the probability value output by Resnet is 99.1%, which is greater than the preset threshold; therefore, face image A is determined to be directly derived from a living body. After face image B is input to the improved Resnet, the probability value output by Resnet is 95.3%, which is less than the preset threshold; therefore, face image B is determined to be directly derived from a non-living body. By comparing the probability value with the preset threshold, it is determined whether the face image is directly derived from a living body, thereby achieving the purpose of living body detection.
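The comparison in step S120 amounts to a single inequality. The sketch below replays the example with face images A and B; the names `LIVENESS_THRESHOLD` and `is_live` are illustrative and do not come from the disclosure.

```python
LIVENESS_THRESHOLD = 0.987  # preset threshold from the example (98.7%)


def is_live(probability: float, threshold: float = LIVENESS_THRESHOLD) -> bool:
    """Return True when the probability reaches the preset threshold."""
    return probability >= threshold


# Probability values from the example: face image A and face image B.
probabilities = {"A": 0.991, "B": 0.953}
decisions = {name: is_live(p) for name, p in probabilities.items()}
# decisions == {"A": True, "B": False}
```

A probability exactly equal to the threshold counts as "living", matching the "greater than or equal to" wording of the embodiment.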

In one embodiment, as shown in FIG. 5, a face living detection device 20 based on an improved Resnet is provided, which specifically includes: a first acquisition module 201 configured to acquire a single frame image to be detected containing a face image; a second acquisition module 202 configured to acquire, for each face image in the single frame image to be detected and based on the improved Resnet, the probability value that the face image is directly derived from a living body; and a determination module 203 configured to determine, based on the matching result of the probability value and the preset threshold, whether the face image is directly derived from a living body.

As shown in FIG. 5, the first acquisition module 201 in the improved Resnet-based face living detection device 20 includes: a to-be-detected video acquisition module 2011, configured to acquire a video to be detected; a decomposition module 2012, configured to decompose the video to be detected into single frame images; and a to-be-detected single-frame image acquisition module 2013, configured to acquire, based on the dlib framework, a single frame image to be detected containing a face image from the single frame images.

As shown in FIG. 5, the to-be-detected single-frame image acquisition module 2013 in the improved Resnet-based face living detection device 20 includes: a single-frame image extraction module 20131, configured to randomly extract one of the single frame images as the original single frame image; a face image detection module 20132, configured to confirm, based on the dlib framework, whether the original single frame image contains a face image; and a discrimination module 20133, configured to: if it is confirmed that the original single frame image contains a face image, use the original single frame image as the single frame image to be detected; if it is confirmed that the original single frame image does not contain a face image, randomly select another of the single frame images as the original single frame image, until it is confirmed that the original single frame image contains a face image, and then use that original single frame image as the single frame image to be detected.

As shown in FIG. 5, the second acquisition module 202 in the improved Resnet-based face living detection device 20 includes: a face image acquisition module 2021, configured to acquire each face image in the single frame image to be detected; and a probability value acquisition module 2022, configured to acquire, based on the improved Resnet, the probability value that the face image is directly derived from a living body.

As shown in FIG. 5, the face image acquisition module 2021 in the improved Resnet-based face living detection device 20 includes: a face feature point extraction module 20111, configured to extract, based on the dlib framework, the face feature points in the single frame image to be detected; and a face feature point combination module 20112, configured to use the image of the area of predetermined shape and size in which each group of face feature points is located as a face image.

For the implementation process of the functions and roles of each module in the above-mentioned device, please refer to the implementation process of corresponding steps in the above-mentioned improved Resnet-based face living detection method for details, which will not be repeated here.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.

In addition, although the various steps of the method of the present disclosure are described in a specific order in the drawings, this does not require or imply that these steps must be performed in the specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution, etc.

Through the description of the foregoing embodiments, those skilled in the art can easily understand that the exemplary embodiments described herein can be implemented by software, or by combining software with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions that cause a computing device (which may be a personal computer, a server, a mobile terminal, a network device, etc.) to execute the method according to the embodiments of the present disclosure.

Fig. 6 shows a system architecture diagram of face living detection based on improved Resnet according to an exemplary embodiment of the present disclosure. The system architecture includes: a video recording terminal 310, a server 320, and a management terminal 330.

In one embodiment, the management terminal 330 sends the parameters required for Resnet training: preset standard values and preset expected values to the server 320, so that the server 320 can complete the Resnet training. The server 320 obtains the video to be detected uploaded from the video recording terminal 310, and obtains a single frame image after framing the video to be detected. After obtaining the single-frame image to be detected containing the face image therefrom, input each face image in the single-frame image to be detected into the improved Resnet, thereby determining whether each face image is directly derived from a living body. The server 320 sends the recognition result to the management terminal 330, so that the management terminal 330 performs corresponding service processing based on the recognition result.

Through the above description of the system architecture, those skilled in the art can easily understand that the system architecture described here can realize the functions of each module in the device for face living detection based on the improved Resnet shown in FIG. 5.

In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

Those skilled in the art can understand that various aspects of the present application can be implemented as a system, method, or program product. Therefore, each aspect of the present application can be specifically implemented in the following forms, namely: complete hardware implementation, complete software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, which can be collectively referred to herein as "Circuit", "Module" or "System".

The electronic device 400 according to this embodiment of the present application will be described below with reference to FIG. 7. The electronic device 400 shown in FIG. 7 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.

As shown in FIG. 7, the electronic device 400 takes the form of a general-purpose computing device. The components of the electronic device 400 may include, but are not limited to: the aforementioned at least one processing unit 410, the aforementioned at least one storage unit 420, and a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410).

Wherein, the storage unit stores program code, and the program code can be executed by the processing unit 410, so that the processing unit 410 executes the steps according to the various exemplary embodiments described in the "Exemplary Method" section of this specification. For example, the processing unit 410 may perform the steps shown in FIG. 1: step S100, obtain a single-frame image to be detected containing a face image; step S110, for each face image in the single-frame image to be detected, obtain, based on the improved Resnet, the probability value that the face image is directly derived from a living body; step S120, based on the matching result of the probability value and a preset threshold, determine whether the face image is directly derived from a living body.
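Steps S110 and S120 can be illustrated with a minimal Python sketch. It is not the implementation of the present application: the function name `judge_faces`, the `live_probability` callable standing in for the trained improved Resnet, the default threshold of 0.5, and the rule that a probability at or above the threshold counts as a match are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

Face = TypeVar("Face")


def judge_faces(
    faces: Sequence[Face],
    live_probability: Callable[[Face], float],
    threshold: float = 0.5,
) -> List[bool]:
    """Return one live / not-live decision per face image."""
    decisions = []
    for face in faces:
        # Step S110: a hypothetical scorer returns the probability that the
        # face image is directly derived from a living body.
        p = live_probability(face)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Step S120: compare with the preset threshold.
        # Assumed matching rule: p >= threshold means "living body".
        decisions.append(p >= threshold)
    return decisions


# Usage with a stand-in scorer; a real system would call the trained network.
scores = {"face_a": 0.93, "face_b": 0.12}
print(judge_faces(["face_a", "face_b"], lambda f: scores[f], threshold=0.5))
# prints [True, False]
```

Passing the scorer in as a callable keeps the thresholding logic independent of whichever deep learning framework hosts the network.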

The storage unit 420 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 4201 and/or a cache storage unit 4202, and may further include a read-only storage unit (ROM) 4203.

The storage unit 420 may also include a program/utility tool 4204 having a set of (at least one) program modules 4205. Such program modules 4205 include but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include the implementation of a network environment.

The bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

The electronic device 400 can also communicate with one or more external devices 500 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (such as a router, modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 450. Moreover, the electronic device 400 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 460. As shown in the figure, the network adapter 460 communicates with other modules of the electronic device 400 through the bus 430. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

Through the description of the foregoing embodiments, those skilled in the art can easily understand that the exemplary embodiments described herein can be implemented by software, or by combining software with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, USB flash drive, removable hard disk, etc.) or on a network, and which includes several instructions that cause a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In the exemplary embodiment of the present disclosure, there is also provided a computer non-volatile readable storage medium on which is stored a program product capable of implementing the above-mentioned method in this specification. In some possible implementation manners, various aspects of the present application can also be implemented in the form of a program product, which includes program code. When the program product runs on a terminal device, the program code is used to enable the terminal device to execute the steps according to various exemplary embodiments of the present application described in the above-mentioned "Exemplary Method" section of this specification.

Referring to FIG. 8, a program product 600 for implementing the above method according to an embodiment of the present application is described. It can adopt a portable compact disk read-only memory (CD-ROM) that includes program code, and can run on a terminal device, for example a personal computer. However, the program product of this application is not limited to this. In this document, the non-volatile readable storage medium can be any tangible medium that contains or stores a program. The program can be used by or in combination with an instruction execution system, apparatus, or device.

The program product can use any combination of one or more readable media. The non-volatile readable storage medium may be a readable signal medium or a readable storage medium. The non-volatile readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of non-volatile readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.

The computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a non-volatile readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.

The program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.

The program code for performing the operations of this application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, via the Internet using an Internet service provider).

In addition, the above-mentioned drawings are only schematic illustrations of the processing included in the method according to the exemplary embodiments of the present application, and are not intended for limitation. It is easy to understand that the processing shown in the above drawings does not indicate or limit the time sequence of these processings. In addition, it is easy to understand that these processes can be executed synchronously or asynchronously in multiple modules, for example.

Those skilled in the art will easily think of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptive changes of the present disclosure that follow the general principles of the present disclosure and include common knowledge or conventional technical means in the technical field not disclosed in the present disclosure. The description and embodiments are only regarded as exemplary, and the true scope and spirit of the present disclosure are pointed out by the claims.

Claims

A method for face live detection based on improved Resnet, which is characterized in that it includes:

Obtain a single frame image to be detected containing a face image;

For each face image in the single frame image to be detected, based on the improved Resnet, obtain the probability value that the face image is directly derived from a living body;

Based on the matching result of the probability value and the preset threshold value, it is determined whether the face image is directly derived from a living body.
The method according to claim 1, wherein the obtaining a single frame image to be detected containing a human face image comprises:

Obtain the video to be detected;

Decompose the to-be-detected video into single frame images;

Based on the dlib framework, a single frame image to be detected containing a face image is obtained from the single frame image.
The method according to claim 2, wherein the obtaining a single frame image to be detected containing a face image from the single frame image based on the dlib framework comprises:

Randomly extract one from the single frame image as the original single frame image;

Based on the dlib framework, confirm whether the original single frame image contains a face image;

If it is confirmed that the original single frame image contains a human face image, use the original single frame image as the single frame image to be detected; if it is confirmed that the original single frame image does not contain a human face image, randomly select another one from the single frame images as the original single frame image, until it is confirmed that the original single frame image contains a human face image, and use that original single frame image as the single frame image to be detected.
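The random selection loop recited above can be sketched in Python as follows. The sketch makes several assumptions: the name `pick_frame_with_face` is hypothetical, the `contains_face` callable stands in for a dlib-based face check, and frames are visited in shuffled order without repetition so that the loop terminates even when no frame contains a face (the text itself only says that another frame is randomly selected).

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def pick_frame_with_face(
    frames: Sequence[Frame],
    contains_face: Callable[[Frame], bool],
    rng: Optional[random.Random] = None,
) -> Optional[Frame]:
    """Randomly try frames until one contains a face; None if none does."""
    rng = rng or random.Random()
    # Shuffle the indices so each frame is tested at most once
    # (an assumption added here to guarantee termination).
    order = list(range(len(frames)))
    rng.shuffle(order)
    for index in order:
        candidate = frames[index]
        # contains_face is a hypothetical stand-in for the dlib-based check.
        if contains_face(candidate):
            return candidate
    return None


# Usage with string placeholders instead of decoded video frames.
frames = ["blank1", "face1", "blank2", "face2"]
picked = pick_frame_with_face(frames, lambda f: f.startswith("face"))
```

Returning `None` when no frame qualifies leaves it to the caller to decide how to handle a video without any detectable face.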
The method according to claim 1, characterized in that, for each face image in the single frame image to be detected, based on an improved Resnet, obtaining the probability value that the face image directly comes from a living body includes:

Acquiring each face image in the single frame image to be detected;

Based on the improved Resnet, the probability value that the face image is directly derived from a living body is obtained.
The method according to claim 4, wherein the acquiring each face image in the single frame image to be detected comprises:

Based on the dlib framework, extract the facial feature points in the single frame image to be detected;

The image of the region of predetermined shape and size in which each group of facial feature points is located is used as a face image.
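One way to derive a region of predetermined shape and size from a group of facial feature points is sketched below in Python. This is an illustrative interpretation only: the function name `crop_box`, the choice of a square region centered on the centroid of the feature points, and the shifting of the region to stay inside the image are assumptions, and the feature points are taken as plain `(x, y)` pixel coordinates rather than dlib objects.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def crop_box(points: Iterable[Point], side: int, width: int, height: int) -> Box:
    """Fixed-size square around the centroid of one face's feature points."""
    pts = list(points)
    if not pts:
        raise ValueError("no feature points")
    if side <= 0 or side > width or side > height:
        raise ValueError("crop side must be positive and fit inside the image")
    # Centroid of the feature points, in integer pixel coordinates.
    cx = sum(x for x, _ in pts) // len(pts)
    cy = sum(y for _, y in pts) // len(pts)
    # Shift the square back inside the image if it would cross a border.
    left = min(max(cx - side // 2, 0), width - side)
    top = min(max(cy - side // 2, 0), height - side)
    return (left, top, left + side, top + side)


# Usage: three feature points of one face in a 640x480 image, 64-pixel crop.
print(crop_box([(100, 100), (120, 100), (110, 120)], 64, 640, 480))
# prints (78, 74, 142, 138)
```

Because every crop has the same side length, the resulting face images can be passed to a network with a fixed input size after at most a simple resize.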
An improved Resnet-based live face detection device, which is characterized in that it comprises:

The first acquisition module is configured to acquire a single frame image to be detected containing a face image;

The second acquisition module is configured to acquire, for each face image in the single frame image to be detected, the probability value that the face image is directly derived from a living body based on the improved Resnet;

The living body determination module is configured to determine whether the face image is directly derived from a living body based on the matching result of the probability value and a preset threshold.
The apparatus according to claim 6, wherein the first obtaining module comprises:

The video acquisition module to be detected is configured to acquire the video to be detected;

A decomposition module, configured to decompose the to-be-detected video into a single frame image;

The single-frame image acquisition module to be detected is configured to obtain a single-frame image to be detected containing a face image from the single-frame image based on the dlib framework.
The device according to claim 6, wherein the acquisition module of the single frame image to be detected comprises:

The single frame image extraction module is configured to randomly extract one of the single frame images as the original single frame image;

The face image detection module is configured to confirm whether the original single frame image contains a face image based on the dlib framework;

The discrimination module is configured to use the original single frame image as the single frame image to be detected if it is confirmed that the original single frame image contains a human face image; and, if it is confirmed that the original single frame image does not contain a human face image, to randomly select another one from the single frame images as the original single frame image, until it is confirmed that the original single frame image contains a face image, and to use that original single frame image as the single frame image to be detected.
The device according to claim 6, wherein the second acquisition module comprises:

A face image acquisition module, configured to acquire each face image in the single frame image to be detected;

The probability value acquisition module is configured to acquire the probability value that the face image is directly derived from a living body based on the improved Resnet.
The apparatus according to claim 6, wherein the facial image acquisition module comprises:

The facial feature point extraction module is configured to extract the facial feature points in the single frame image to be detected based on the dlib framework;

The face feature point combination module is configured to use the image of the region of predetermined shape and size in which each group of facial feature points is located as a face image.
An electronic device for face live detection based on improved Resnet, which is characterized in that it includes:

Memory, configured to store executable instructions;

A processor configured to execute executable instructions stored in the memory;

Wherein, the processor is configured to perform the following processing when executing the executable instruction:

Obtain a single frame image to be detected containing a face image;

For each face image in the single frame image to be detected, based on the improved Resnet, obtain the probability value that the face image is directly derived from a living body;

Based on the matching result of the probability value and the preset threshold value, it is determined whether the face image is directly derived from a living body.
The electronic device according to claim 11, wherein the processor is configured to perform the following processing when executing the executable instruction to implement the acquisition of the single-frame image to be detected containing the face image:

Obtain the video to be detected;

Decompose the to-be-detected video into single frame images;

Based on the dlib framework, a single frame image to be detected containing a face image is obtained from the single frame image.
The electronic device according to claim 11, wherein the processor is configured to perform the following processing when executing the executable instruction, to implement the obtaining, based on the dlib framework, of a single frame image to be detected containing a face image from the single frame image:

Randomly extract one from the single frame image as the original single frame image;

Based on the dlib framework, confirm whether the original single frame image contains a face image;

If it is confirmed that the original single frame image contains a human face image, use the original single frame image as the single frame image to be detected; if it is confirmed that the original single frame image does not contain a human face image, randomly select another one from the single frame images as the original single frame image, until it is confirmed that the original single frame image contains a human face image, and use that original single frame image as the single frame image to be detected.
The electronic device according to claim 11, wherein the processor is configured to perform the following processing when executing the executable instruction, to implement the obtaining, for each face image in the single frame image to be detected, based on the improved Resnet, of the probability value that the face image is directly derived from a living body:

Acquiring each face image in the single frame image to be detected;

Based on the improved Resnet, the probability value that the face image is directly derived from a living body is obtained.
The electronic device according to claim 11, wherein the processor is configured to perform the following processing when executing the executable instruction to obtain each face image in the single frame image to be detected:

Based on the dlib framework, extract the facial feature points in the single frame image to be detected;

The image of the region of predetermined shape and size in which each group of facial feature points is located is used as a face image.
A computer non-volatile readable storage medium, characterized in that it stores computer program instructions, and when the computer instructions are executed by a processor of a computer, the processor is configured to:

Obtain a single frame image to be detected containing a face image;

For each face image in the single frame image to be detected, based on the improved Resnet, obtain the probability value that the face image is directly derived from a living body;

Based on the matching result of the probability value and the preset threshold value, it is determined whether the face image is directly derived from a living body.
The computer non-volatile readable storage medium according to claim 16, wherein when the computer instructions are executed by the processor, the processor is configured to perform the following processing to implement the obtaining of a single frame image to be detected containing a face image:

Obtain the video to be detected;

Decompose the to-be-detected video into single frame images;

Based on the dlib framework, a single frame image to be detected containing a face image is obtained from the single frame image.
The computer non-volatile readable storage medium according to claim 16, wherein when the computer instructions are executed by the processor, the processor is configured to perform the following processing to implement the obtaining, based on the dlib framework, of a single frame image to be detected containing a face image from the single frame image:

Randomly extract one from the single frame image as the original single frame image;

Based on the dlib framework, confirm whether the original single frame image contains a face image;

If it is confirmed that the original single frame image contains a human face image, use the original single frame image as the single frame image to be detected; if it is confirmed that the original single frame image does not contain a human face image, randomly select another one from the single frame images as the original single frame image, until it is confirmed that the original single frame image contains a human face image, and use that original single frame image as the single frame image to be detected.
The computer non-volatile readable storage medium according to claim 16, wherein when the computer instructions are executed by a processor, the processor is configured to perform the following processing to implement the obtaining, for each face image in the single frame image to be detected, based on the improved Resnet, of the probability value that the face image is directly derived from a living body:

Acquiring each face image in the single frame image to be detected;

Based on the improved Resnet, the probability value that the face image is directly derived from a living body is obtained.
The computer non-volatile readable storage medium according to claim 16, wherein when the computer instructions are executed by the processor, the processor is configured to perform the following processing to implement the obtaining of each face image in the single frame image to be detected:

Based on the dlib framework, extract the facial feature points in the single frame image to be detected;

The image of the region of predetermined shape and size in which each group of facial feature points is located is used as a face image.