WO2023061123A1 - Facial silent living body detection method and apparatus, and storage medium and device - Google Patents

Facial silent living body detection method and apparatus, and storage medium and device

Info

Publication number
WO2023061123A1
WO2023061123A1 (PCT/CN2022/118048)
Authority
WO
WIPO (PCT)
Prior art keywords
face
face image
parameters
parameter
living body
Prior art date
Application number
PCT/CN2022/118048
Other languages
French (fr)
Chinese (zh)
Inventor
王洪 (Wang Hong)
Original Assignee
北京眼神科技有限公司 (Beijing Eyecool Technology Co., Ltd.)
北京眼神智能科技有限公司 (Beijing Eyecool Intelligent Technology Co., Ltd.)
Priority date: 2021-10-15 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京眼神科技有限公司 (Beijing Eyecool Technology Co., Ltd.) and 北京眼神智能科技有限公司 (Beijing Eyecool Intelligent Technology Co., Ltd.)
Publication of WO2023061123A1 publication Critical patent/WO2023061123A1/en


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/0464 Convolutional networks [CNN, ConvNet]
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/10 Image acquisition
              • G06V10/12 Details of acquisition arrangements; Constructional details thereof
                • G06V10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
                  • G06V10/141 Control of illumination
            • G06V10/40 Extraction of image or video features
              • G06V10/56 Extraction of image or video features relating to colour
            • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
          • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
            • G06V40/40 Spoof detection, e.g. liveness detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02B CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
          • Y02B20/00 Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
            • Y02B20/40 Control techniques providing energy savings, e.g. smart controller or presence detection

Definitions

  • The present application relates to the field of face recognition, and in particular to a facial silent liveness detection method, apparatus, storage medium, and device.
  • Face recognition technology is now widely used in finance and security. A face is easy to acquire and can be captured without contact, but for the same reasons it is easy for others to exploit, defeating a face recognition system with photos or re-shot videos. Face liveness detection is therefore especially important as the first gate of face recognition.
  • Currently, there are three main approaches to face liveness detection on mobile terminals. The first is facial motion liveness detection: the liveness detection system issues random head and face motion instructions, and if the user completes the corresponding actions as instructed, the subject is judged to be live. The second extracts features that distinguish real faces from prosthetic faces from RGB images and performs binary real/prosthetic classification. The third integrates the first two: while motion liveness detection is carried out in the first way, multiple multi-expression RGB images are supplied to the second way to determine whether the face is real.
  • Facial motion liveness detection requires a high degree of user cooperation, and the random head and face motion instructions issued are generally single actions such as nodding, turning the head, blinking, or opening the mouth. An attacker who shakes, twists, or rotates a photo has a high probability of fooling the liveness detection system, and a video of the user performing multiple actions can also pass it fairly easily.
  • In the second approach, liveness detection based on features extracted from RGB images to distinguish real from prosthetic faces is known as silent liveness detection; it requires no user cooperation and is more readily accepted by users.
  • Deep learning methods are generally used to extract these features: driven by large amounts of live and non-live face data, the network automatically learns features that effectively separate real faces from prosthetic faces and capture the imaging differences between them. However, RGB images vary with lighting, resolution, and the lenses of different phones, and high-definition video playback can easily fool the liveness detection algorithm and cause misjudgment.
  • According to various embodiments of the present application, a facial silent liveness detection method, apparatus, storage medium, and device are provided.
  • In a first aspect, the present application provides a facial silent liveness detection method, the method comprising:
  • acquiring a face image collected under specific fill-light parameters and preprocessing the face image to obtain a preprocessed face image, wherein the fill-light parameters include RGB color parameters and a brightness parameter;
  • inputting the preprocessed face image into a trained convolutional neural network to extract a feature vector used to distinguish real faces from prosthetic faces, obtaining a liveness score from the feature vector, and regressing the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters; and
  • when the liveness score is greater than a set threshold and the difference between the regressed fill-light parameters and the specific fill-light parameters is within a preset range, determining that the face image comes from a real face; otherwise, determining that the face image comes from a prosthetic face.
  • In an embodiment, acquiring the face image collected under the specific fill-light parameters includes: during face image capture, randomly changing the fill-light parameters at fixed time intervals and recording the fill-light parameters corresponding to the capture moment as the specific fill-light parameters.
  • In an embodiment, the RGB color parameter is α and the brightness parameter is β, with the RGB color parameter computed as α = (R/255, G/255, B/255), where R, G, and B are the R, G, and B values of the fill light and β ∈ [0, 1].
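The normalization above is simple enough to state in code. A minimal Python sketch (illustrative only, not part of the patent text; the function name is invented):

```python
def fill_light_params(r, g, b, beta):
    """alpha = (R/255, G/255, B/255); beta is the brightness in [0, 1]."""
    assert all(0 <= v <= 255 for v in (r, g, b)) and 0.0 <= beta <= 1.0
    return (r / 255.0, g / 255.0, b / 255.0), beta

alpha, beta = fill_light_params(255, 0, 128, 0.8)  # e.g. magenta fill light at 80% brightness
```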
  • In an embodiment, the convolutional neural network is trained as follows: face images collected under multiple fill-light parameters are acquired, preprocessed, and labeled to obtain a training sample set, the labels including the RGB color parameter α and the brightness parameter β of the fill-light parameters;
  • the training sample set is input into the convolutional neural network to extract feature vector samples used to distinguish real faces from prosthetic faces, and the fill-light parameters are regressed from the training sample set;
  • the loss of the feature vector samples is computed with ArcFace Loss, the loss of the fill-light parameter regression is computed with a Euclidean loss function, and the parameters of the network are updated by backpropagation; when computing the regression loss, the regression targets of prosthetic training samples are α = (0, 0, 0) and β = 0.
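To make the two training objectives concrete, the following is a hedged PyTorch sketch of how the ArcFace classification loss and the Euclidean regression loss might be combined. The head implementation, the label convention (1 = real, 0 = prosthetic), and the use of mean-squared error as the Euclidean loss are assumptions, not details from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Simplified ArcFace margin head for the real/prosthetic two-class problem."""
    def __init__(self, feat_dim, n_classes=2, s=32.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.s, self.m = s, m  # scale and angular margin (illustrative values)

    def forward(self, feat, labels):
        cos = F.linear(F.normalize(feat), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        onehot = F.one_hot(labels, cos.size(1)).bool()
        logits = self.s * torch.where(onehot, torch.cos(theta + self.m), cos)
        return F.cross_entropy(logits, labels)

def total_loss(feat, fill_pred, labels, fill_target, arc_head):
    # Prosthetic samples (label 0) regress to alpha = (0, 0, 0), beta = 0.
    fill_target = fill_target * labels.unsqueeze(1).float()
    return arc_head(feat, labels) + F.mse_loss(fill_pred, fill_target)
```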
  • In an embodiment, preprocessing the face image includes: performing face detection and facial key point localization on the face image to obtain the coordinates of both eyes;
  • aligning and scaling the face image by the eye coordinates to obtain the preprocessed face image.
  • In an embodiment, the fill light is provided through the screen of the mobile terminal.
  • In an embodiment, the method further includes: calculating the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters to obtain the difference between them.
  • In an embodiment, obtaining the liveness score from the feature vector includes: inputting the feature vector into the ArcFace Loss layer to obtain the liveness score.
  • In an embodiment, the facial key points further include a nose key point, a left mouth-corner key point, and a right mouth-corner key point.
  • In an embodiment, determining that the face image comes from a real face when the liveness score is greater than the set threshold and the difference between the regressed and specific fill-light parameters is within the preset range, and otherwise from a prosthetic face, includes: when the liveness score is greater than or equal to the set threshold, calculating the difference between the regressed fill-light parameters and the specific fill-light parameters;
  • if the difference is within the preset range, determining that the face image comes from a real face; if it is not, determining that the face image comes from a prosthetic face;
  • and when the liveness score is less than the set threshold, determining that the face image comes from a prosthetic face.
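A short Python sketch of this decision rule; both threshold values are illustrative placeholders, since the patent does not disclose them:

```python
import numpy as np

def is_real_face(score, regressed, specific, score_thresh=0.5, dist_thresh=0.1):
    """Real only if the liveness score passes AND the fill-light gap is small."""
    if score < score_thresh:          # score below threshold: prosthetic face
        return False
    gap = np.linalg.norm(np.asarray(regressed) - np.asarray(specific))
    return gap <= dist_thresh         # Euclidean distance within the preset range
```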
  • In a second aspect, the present application provides a facial silent liveness detection apparatus, the apparatus comprising:
  • an image acquisition module configured to acquire a face image collected under specific fill-light parameters and preprocess it to obtain a preprocessed face image, wherein the fill-light parameters include RGB color parameters and a brightness parameter;
  • a processing module configured to input the preprocessed face image into a trained convolutional neural network to extract a feature vector used to distinguish real faces from prosthetic faces, obtain a liveness score from the feature vector, and regress the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters; and
  • a judging module configured to determine that the face image comes from a real face when the liveness score is greater than a set threshold and the difference between the regressed and specific fill-light parameters is within a preset range, and otherwise that the face image comes from a prosthetic face.
  • In an embodiment, the image acquisition module is specifically configured to: during face image capture, randomly change the fill-light parameters at fixed time intervals and record the fill-light parameters corresponding to the capture moment as the specific fill-light parameters.
  • In an embodiment, the RGB color parameter is α and the brightness parameter is β; the apparatus further includes a color parameter determination module configured to compute the RGB color parameters as α = (R/255, G/255, B/255), where R, G, and B are the R, G, and B values of the fill light and β ∈ [0, 1].
  • In an embodiment, the apparatus further includes:
  • a sample acquisition module configured to acquire face images collected under multiple fill-light parameters, preprocess them, and add labels to obtain a training sample set, the labels including the RGB color parameter α and the brightness parameter β of the fill-light parameters;
  • a forward processing module configured to input the training sample set into the convolutional neural network, extract feature vector samples for distinguishing real faces from prosthetic faces, and regress the fill-light parameters from the training sample set;
  • a backpropagation module configured to compute the loss of the feature vector samples with ArcFace Loss, compute the loss of the fill-light parameter regression with a Euclidean loss function, and update the network parameters by backpropagation; when computing the regression loss, the regression targets of prosthetic training samples are α = (0, 0, 0) and β = 0.
  • In an embodiment, the image acquisition module includes:
  • a face detection and localization unit configured to perform face detection and facial key point localization on the face image to obtain the eye coordinates;
  • a face normalization unit configured to align and scale the face image by the eye coordinates to obtain the preprocessed face image.
  • In an embodiment, the apparatus further includes:
  • a calculation module configured to calculate the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters to obtain the difference between them.
  • In an embodiment, the processing module is specifically configured to input the feature vector into the ArcFace Loss layer to obtain the liveness score.
  • In an embodiment, the judging module is specifically configured to: when the liveness score is greater than or equal to the set threshold, calculate the difference between the regressed fill-light parameters and the specific fill-light parameters;
  • if the difference is within the preset range, determine that the face image comes from a real face; if it is not, determine that the face image comes from a prosthetic face;
  • and when the liveness score is less than the set threshold, determine that the face image comes from a prosthetic face.
  • In a third aspect, the present application provides a computer-readable storage medium for facial silent liveness detection, including a memory storing processor-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method of the first aspect.
  • In a fourth aspect, the present application provides a device for facial silent liveness detection, including at least one processor and a memory storing computer-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method of the first aspect.
  • Fig. 1 is a flowchart of a facial silent liveness detection method according to one or more embodiments;
  • Fig. 2 is a schematic diagram of the training process of a convolutional neural network according to one or more embodiments;
  • Fig. 3 is a flowchart of a training method for a convolutional neural network according to one or more embodiments;
  • Fig. 4 is a schematic diagram of a facial silent liveness detection apparatus according to one or more embodiments.
  • Currently, facial liveness detection faces two main challenges. The first is ambient lighting: variable ambient light has long been a major challenge for face-related tasks and severely affects the accuracy of facial silent liveness detection. The second is the ultra-high-definition capability of lenses and screens: an ultra-high-definition face video or image captured by such a lens and played on such a screen can, to some extent, be difficult even for the human eye to distinguish from a real face, posing an equally great challenge to liveness detection algorithms.
  • To solve the above problems, an embodiment of the present application provides a facial silent liveness detection method. As shown in Fig. 1, the method includes:
  • S100: Obtain a face image collected under specific fill-light parameters, preprocess the face image, and obtain a preprocessed face image.
  • The fill-light parameters include RGB color parameters and a brightness parameter.
  • There are four fill-light parameters. The first three are the fill-light RGB values, i.e., the RGB color parameters: the R color parameter (red), the G color parameter (green), and the B color parameter (blue), which together specify the color of the supplemental light. The fourth parameter is the brightness parameter.
  • In the embodiment of the present application, fill light with specific fill-light parameters can be provided, so that the face image is captured under those parameters. On this basis, the specific fill-light parameters can be recorded together with the face image captured under them.
  • The face image is then preprocessed to obtain the preprocessed face image.
  • S200: Input the preprocessed face image into the trained convolutional neural network, extract the feature vector, and obtain the liveness score from the feature vector; regress the fill-light parameters from the preprocessed face image to obtain the regressed fill-light parameters.
  • The feature vector is used to distinguish real faces from prosthetic faces.
  • In this embodiment, the preprocessed face image can be input into the pre-trained convolutional neural network, whose output is the feature vector of the preprocessed face image; the liveness score is computed from this feature vector. Reverse inference based on the pre-trained network and the preprocessed face image can also be performed to regress the fill-light parameters.
  • In other words, the convolutional neural network of the present application not only produces the liveness score for liveness detection, but also performs reverse inference on the input face image to regress the fill-light parameters, yielding the regressed fill-light parameters.
  • The regressed fill-light parameters are then verified against the aforementioned specific fill-light parameters.
  • By comparing the regressed fill-light parameters with the specific fill-light parameters and computing their difference, the present application can detect whether the captured face video or image was played back on an ultra-high-definition screen.
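One plausible shape for such a network is a shared backbone with two heads. The sketch below uses PyTorch and a ResNet-18 backbone as assumptions; the patent names ResNet as one option further below but does not prescribe this exact layout:

```python
import torch.nn as nn
from torchvision.models import resnet18

class LivenessNet(nn.Module):
    """Backbone with two heads: an embedding used for real/prosthetic scoring
    and a 4-dim regressor for the fill-light parameters (alpha, beta)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep the 512-d pooled features
        self.backbone = backbone
        self.embed = nn.Linear(512, feat_dim)  # feature vector for the liveness score
        self.fill = nn.Sequential(nn.Linear(512, 4), nn.Sigmoid())  # alpha (3) + beta, all in [0, 1]

    def forward(self, x):
        h = self.backbone(x)
        return self.embed(h), self.fill(h)
```

At inference time the embedding feeds the liveness scorer, while the second head yields the regressed fill-light parameters used for verification.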
  • The Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters gives the difference between them.
  • For a real face, this difference is small. For a face played on a screen, the played frames additionally carry a layer of fill-light information of their own (the light emitted by the playback screen), so the regressed fill-light parameters deviate substantially from the specific fill-light parameters; this deviation allows attacks using faces on high-definition screens to be rejected accurately.
  • In the facial silent liveness detection method, a face image captured under specific fill-light parameters is fed into the trained convolutional neural network to extract a feature vector; a liveness score is obtained from the feature vector, and the fill-light parameters are regressed at the same time. Whether the image shows a real face is then judged from the liveness score together with the difference between the regressed and specific fill-light parameters.
  • The RGB fill light highlights the material and 3D differences between real and prosthetic faces, which helps separate real from fake faces and effectively mitigates the influence of ambient light; regressing the fill-light parameters from the face image and verifying them against the specific fill-light parameters resolves the misjudgment caused by high-definition remakes of prosthetic faces.
  • RGB fill light effectively reduces the influence of ambient light on the facial silent liveness detection method, and an ordinary high-definition remake of a face, once the colored lighting factor introduced by the RGB fill light is taken into account, is essentially always rejected. That is, capturing the face image under RGB fill light mitigates ambient light; a prosthetic face made of a different material also differs markedly in surface material from real facial skin,
  • and the fill light amplifies this difference, making real and fake faces easier to distinguish. A real face further carries rich 3D information, while photo and screen prostheses are essentially flat (lacking the rich 3D structure of a real face);
  • illuminating the face with RGB fill light therefore also enlarges the 3D difference between real and prosthetic faces, further helping to tell them apart.
  • In an embodiment, during face image capture, the fill-light parameters are randomly changed at fixed intervals, and the fill-light parameters corresponding to the capture moment are recorded as the specific fill-light parameters.
  • For example, the front camera of a mobile phone captures the face images while the phone screen randomly switches among lights of different colors and brightness; the fill-light parameters in effect at each capture are recorded.
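A hedged Python sketch of such a capture loop; set_screen and grab_frame are hypothetical device callbacks, and the interval and frame count are illustrative:

```python
import random
import time

def capture_with_random_fill_light(set_screen, grab_frame, interval_s=1.0, n_frames=5):
    """Every interval, pick random fill-light parameters, light the screen,
    grab a frame, and record the parameters in effect at the capture moment."""
    records = []
    for _ in range(n_frames):
        rgb = tuple(random.randint(0, 255) for _ in range(3))   # random fill-light color
        beta = random.random()                                  # random brightness in [0, 1]
        set_screen(rgb, beta)                                   # display the fill light
        time.sleep(interval_s)
        records.append((grab_frame(), rgb, beta))               # frame + its specific parameters
    return records
```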
  • As above, the RGB color parameter is α and the brightness parameter is β, with α = (R/255, G/255, B/255), where R, G, and B are the R, G, and B values of the fill light and β ∈ [0, 1].
  • The R, G, and B values of the fill light are divided by 255 so that their range is restricted to [0, 1], which aids convergence during training.
  • In an embodiment, the aforementioned convolutional neural network may be a ResNet (deep residual network), trained as shown in Fig. 3:
  • S10: Acquire face images collected under multiple fill-light parameters, preprocess the face images, and add labels to the face images to obtain a training sample set.
  • The labels include the RGB color parameter α and the brightness parameter β of the fill-light parameters; α and β together form a 1×4-dimensional vector.
  • The preprocessing in this step is the same as in S100 and mainly includes face detection, facial key point localization, head pose estimation, and face normalization, as follows:
  • Face detection and facial feature point localization can be implemented with the MTCNN (multi-task convolutional neural network) detection algorithm.
  • For example, the following five facial key points can be used: left eye, right eye, nose, left mouth corner, and right mouth corner.
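As an illustration, the facenet-pytorch implementation of MTCNN exposes these five landmarks directly (one of several available implementations; the patent does not mandate a library, and the file path is hypothetical):

```python
from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN()
img = Image.open("face.jpg")  # hypothetical input image
boxes, probs, landmarks = mtcnn.detect(img, landmarks=True)
# landmarks has shape (n_faces, 5, 2); each row gives the five points in the
# order: left eye, right eye, nose, left mouth corner, right mouth corner
five_points = landmarks[0]
```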
  • The 3D pose information is estimated from the 2D coordinates of the key points; the estimation can use the SolvePnP algorithm (a relative pose estimation function) in OpenCV.
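A hedged OpenCV sketch of this pose-estimation step; the 3D reference coordinates are an assumed generic face model, not values from the patent:

```python
import cv2
import numpy as np

# Assumed generic 3D model points (mm) for the five landmarks.
model_3d = np.array([[-36.0,  39.0, -27.0],   # left eye
                     [ 36.0,  39.0, -27.0],   # right eye
                     [  0.0,   0.0,   0.0],   # nose tip
                     [-28.0, -29.0, -24.0],   # left mouth corner
                     [ 28.0, -29.0, -24.0]], dtype=np.float64)  # right mouth corner

def head_pose(landmarks_2d, img_w, img_h):
    """Estimate rotation/translation from the five 2D landmarks via EPnP."""
    cam = np.array([[img_w, 0, img_w / 2],
                    [0, img_w, img_h / 2],
                    [0, 0, 1]], dtype=np.float64)     # approximate pinhole camera
    ok, rvec, tvec = cv2.solvePnP(model_3d,
                                  np.asarray(landmarks_2d, dtype=np.float64),
                                  cam, None, flags=cv2.SOLVEPNP_EPNP)
    return rvec, tvec  # rotation vector (cv2.Rodrigues converts it) and translation
```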
  • A normalization operation consisting of face alignment and scaling is then performed on the face image: for example, the eyes can be aligned to (94, 108) and (129, 108), the nose to (112, 128), and the mouth corners to (98, 148) and (126, 148), with the image scaled to a size of (224, 224).
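A hedged OpenCV sketch of the alignment, using the template coordinates above; solving a similarity transform with estimateAffinePartial2D is an assumption, as the patent specifies only the target coordinates:

```python
import cv2
import numpy as np

# Target template from the text: eyes, nose, mouth corners, in a 224x224 crop.
TEMPLATE = np.array([[ 94, 108], [129, 108], [112, 128],
                     [ 98, 148], [126, 148]], dtype=np.float32)

def align_face(img, landmarks_2d):
    """Fit a similarity transform from detected landmarks to the template
    and warp the image to the normalized 224x224 face crop."""
    src = np.asarray(landmarks_2d, dtype=np.float32)
    M, _ = cv2.estimateAffinePartial2D(src, TEMPLATE)  # rotation + scale + translation
    return cv2.warpAffine(img, M, (224, 224))
```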
  • S20: Input the training sample set into the convolutional neural network and extract feature vector samples, which are used to distinguish real faces from prosthetic faces; regress the fill-light parameters from the training sample set.
  • S30: Compute the loss of the feature vector samples with ArcFace Loss and the loss of the fill-light parameter regression with a Euclidean loss function, and update the network parameters by backpropagation; when computing the regression loss, the regression targets of prosthetic samples are α = (0, 0, 0) and β = 0.
  • In an embodiment, the method further includes providing the fill light through the screen of the mobile terminal.
  • The method provided by the embodiments of the present disclosure can be applied to mobile terminals such as mobile phones; in view of how such terminals are actually used, RGB fill light of different brightness is provided through the terminal's screen. This makes full use of the device itself without adding lighting hardware, and screen lighting illuminates the face plane uniformly, unifying the light intensity to some extent and effectively mitigating ambient light.
  • Embodiments of the present application are not limited to mobile terminals; they can also be applied to other devices, providing RGB fill light of different brightness through dedicated fill-light hardware instead of the screen.
  • In an embodiment, the method further includes: calculating the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters; the gap between them can be determined accurately in this way.
  • In an embodiment, obtaining the liveness score from the feature vector includes inputting the feature vector into the ArcFace Loss layer to obtain the liveness score.
  • In an embodiment, the facial key points further include a nose key point, a left mouth-corner key point, and a right mouth-corner key point.
  • In an embodiment, aligning and scaling the face image by the eye coordinates to obtain the preprocessed face image includes: aligning the face image by the eye coordinates and scaling it to obtain the preprocessed face image.
  • Aligning and scaling the face images during preprocessing ensures the consistency of the samples in the sample dataset and improves the training effect.
  • Although the steps in the flowcharts of Fig. 1 and Fig. 3 are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless otherwise specified herein, there is no strict ordering restriction, and the steps may be executed in other orders. Moreover, at least some of the steps in Fig. 1 and Fig. 3 may include multiple sub-steps or stages that are not necessarily executed at the same time but may be executed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with at least part of the sub-steps or stages of other steps.
  • An embodiment of the present application provides a facial silent liveness detection apparatus. As shown in Fig. 4, the apparatus includes:
  • an image acquisition module 401 configured to acquire a face image collected under specific fill-light parameters and preprocess it to obtain a preprocessed face image, wherein the fill-light parameters include RGB color parameters and a brightness parameter;
  • a processing module 402 configured to input the preprocessed face image into a trained convolutional neural network to extract a feature vector used to distinguish real faces from prosthetic faces, obtain a liveness score from the feature vector, and regress the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters;
  • a judging module 403 configured to determine that the face image comes from a real face when the liveness score is greater than the set threshold and the difference between the regressed and specific fill-light parameters is within a preset range;
  • otherwise, to determine that the face image comes from a prosthetic face.
  • The present application obtains a face image captured under specific fill-light parameters, feeds it into the trained convolutional neural network to extract a feature vector, obtains a liveness score from the feature vector, and regresses the fill-light parameters at the same time; whether the image shows a real face is judged from the liveness score together with the difference between the regressed and specific fill-light parameters.
  • The RGB fill light highlights the material and 3D differences between real and prosthetic faces, which helps separate real from fake faces and effectively mitigates ambient light; regressing the fill-light parameters from the face image and verifying them against the specific fill-light parameters resolves the misjudgment caused by high-definition remakes of prosthetic faces.
  • In an embodiment, acquiring the face image collected under the specific fill-light parameters includes:
  • randomly changing the fill-light parameters at fixed intervals during capture and recording the fill-light parameters corresponding to the capture moment as the specific fill-light parameters.
  • The RGB color parameter is α and the brightness parameter is β; the RGB color parameter is computed as α = (R/255, G/255, B/255), where R, G, and B are the R, G, and B values of the fill light and β ∈ [0, 1].
  • The convolutional neural network of the present application is trained using the following modules:
  • a sample acquisition module configured to acquire face images collected under multiple fill-light parameters, preprocess them, and add labels to obtain a training sample set, the labels including the RGB color parameter α and the brightness parameter β of the fill-light parameters;
  • a forward processing module configured to input the training sample set into the convolutional neural network, extract feature vector samples used to distinguish real faces from prosthetic faces, and regress the fill-light parameters from the training sample set;
  • a backpropagation module configured to compute the loss of the feature vector samples with ArcFace Loss, compute the loss of the fill-light parameter regression with a Euclidean loss function, and update the network parameters by backpropagation; when computing the regression loss, the regression targets of prosthetic training samples are α = (0, 0, 0) and β = 0.
  • In an embodiment, the image acquisition module includes:
  • a face detection and localization unit configured to perform face detection and facial key point localization on the face image to obtain the eye coordinates;
  • a face normalization unit configured to align and scale the face image by the eye coordinates.
  • The present application can be used on mobile terminals, where the fill light can be provided through the screen of the mobile terminal.
  • In an embodiment, the apparatus further includes:
  • a calculation module configured to calculate the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters to obtain the difference between them.
  • In an embodiment, the processing module is specifically configured to input the feature vector into the ArcFace Loss layer to obtain the liveness score.
  • In an embodiment, the judging module is specifically configured to: when the liveness score is greater than or equal to the set threshold, calculate the difference between the regressed fill-light parameters and the specific fill-light parameters;
  • if the difference is within the preset range, determine that the face image comes from a real face; if it is not, determine that the face image comes from a prosthetic face;
  • and when the liveness score is less than the set threshold, determine that the face image comes from a prosthetic face.
  • Each module in the above facial silent liveness detection apparatus can be implemented wholly or partly in software, hardware, or a combination of the two.
  • The network interface may be, for example, an Ethernet card or a wireless network card.
  • The above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a server and a server can be a component.
  • One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.
  • The methods described in the above embodiments of this application can implement their business logic in computer programs recorded on a storage medium; the storage medium can be read and executed by a computer to achieve the effects of the solutions described in this specification.
  • Accordingly, the present application also provides a computer-readable storage medium for facial silent liveness detection, including a memory storing processor-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method of the above embodiments.
  • The storage medium may include a physical device for storing information, typically by digitizing the information and then storing it in an electrical, magnetic, or optical medium. The storage medium may include: devices that store information electrically, such as various memories, e.g. RAM and ROM; devices that store information magnetically, such as hard disks, floppy disks, magnetic tapes, and USB sticks; and devices that store information optically, such as CDs or DVDs. Of course, there are other types of readable storage media, such as quantum memories and graphene memories.
  • The present application also provides a device for facial silent liveness detection, which may be a standalone computer or an operating apparatus that uses one or more of the methods, or one or more of the embodiments, described in this specification.
  • The device for facial silent liveness detection may include at least one processor and a memory storing computer-executable instructions; when the processor executes the instructions, the steps of the facial silent liveness detection method of any one or more of the above embodiments are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

A facial silent living body detection method and apparatus, and a storage medium and a device. The method comprises: acquiring a facial image that is collected on the basis of a specific light supplementing parameter, and performing pre-processing (S100); inputting the facial image into a trained convolutional neural network, extracting a feature vector, obtaining a living body score according to the extracted feature vector, and regressing a light supplementing parameter according to the facial image (S200); and when the living body score is greater than a set threshold value and the difference between the regressed light supplementing parameter and the specific light supplementing parameter is within a preset range, determining that the facial image is from a real face, otherwise, determining that the facial image is from a fake face (S300).

Description

Facial silent liveness detection method, apparatus, storage medium and device
This application claims priority to Chinese patent application No. 202111201839.8, filed on October 15, 2021 and entitled "Facial silent liveness detection method, apparatus, storage medium and device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of face recognition, and in particular to a facial silent liveness detection method, apparatus, storage medium, and device.
Background
Face recognition technology is now widely used in finance and security. A face is easy to acquire and can be captured without contact, but for the same reasons it is easy for others to exploit, defeating a face recognition system with photos or re-shot videos. Face liveness detection is therefore especially important as the first gate of face recognition.
Currently, there are three main approaches to face liveness detection on mobile terminals. The first is facial motion liveness detection: the liveness detection system issues random head and face motion instructions, and if the user completes the corresponding actions as instructed, the subject is judged to be live. The second extracts features that distinguish real faces from prosthetic faces from RGB images and performs binary real/prosthetic classification. The third integrates the first two: while motion liveness detection is carried out in the first way, multiple multi-expression RGB images are supplied to the second way to determine whether the face is real.
Facial motion liveness detection requires a high degree of user cooperation, and the random motion instructions issued are generally single actions such as nodding, turning the head, blinking, or opening the mouth. An attacker who shakes, twists, or rotates a photo has a high probability of fooling the liveness detection system, and a video of the user performing multiple actions can also pass it fairly easily.
In the second approach, liveness detection based on features extracted from RGB images is known as silent liveness detection; it requires no user cooperation and is more readily accepted by users. Deep learning methods are generally used to extract these features: driven by large amounts of live and non-live face data, the network automatically learns features that effectively separate real faces from prosthetic faces and capture the imaging differences between them. However, RGB images vary with lighting, resolution, and the lenses of different phones, and high-definition video playback can easily fool the liveness detection algorithm and cause misjudgment.
Summary
According to various embodiments of the present application, a facial silent liveness detection method, apparatus, storage medium, and device are provided. The technical solutions provided by this application are as follows.
In a first aspect, the present application provides a facial silent liveness detection method, the method comprising:
acquiring a face image collected under specific fill-light parameters and preprocessing the face image to obtain a preprocessed face image, wherein the fill-light parameters include RGB color parameters and a brightness parameter;
inputting the preprocessed face image into a trained convolutional neural network to extract a feature vector used to distinguish real faces from prosthetic faces, obtaining a liveness score from the feature vector, and regressing the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters; and
when the liveness score is greater than a set threshold and the difference between the regressed fill-light parameters and the specific fill-light parameters is within a preset range, determining that the face image comes from a real face; otherwise, determining that the face image comes from a prosthetic face.
In an embodiment, acquiring the face image collected under the specific fill-light parameters includes: during face image capture, randomly changing the fill-light parameters at fixed time intervals and recording the fill-light parameters corresponding to the capture moment as the specific fill-light parameters.
In an embodiment, the RGB color parameter is α and the brightness parameter is β, with the RGB color parameter computed as
α = (R/255, G/255, B/255),
where R, G, and B are the R, G, and B values of the fill light and β ∈ [0, 1].
In an embodiment, the convolutional neural network is trained as follows:
acquiring face images collected under multiple fill-light parameters, preprocessing them, and adding labels to obtain a training sample set, the labels including the RGB color parameter α and the brightness parameter β of the fill-light parameters;
inputting the training sample set into the convolutional neural network, extracting feature vector samples used to distinguish real faces from prosthetic faces, and regressing the fill-light parameters from the training sample set; and
computing the loss of the feature vector samples with ArcFace Loss, computing the loss of the fill-light parameter regression with a Euclidean loss function, and updating the network parameters by backpropagation; when computing the regression loss, the regression targets of prosthetic training samples include the RGB color parameters and the brightness parameter, with α = (0, 0, 0) and β = 0.
In an embodiment, preprocessing the face image includes: performing face detection and facial key point localization on the face image to obtain the eye coordinates; and aligning and scaling the face image by the eye coordinates to obtain the preprocessed face image.
In an embodiment, the fill light is provided through the screen of the mobile terminal.
In an embodiment, the method further includes: calculating the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters to obtain the difference between them.
In an embodiment, obtaining the liveness score from the feature vector includes: inputting the feature vector into the ArcFace Loss layer to obtain the liveness score.
In an embodiment, the facial key points further include a nose key point, a left mouth-corner key point, and a right mouth-corner key point.
In an embodiment, determining that the face image comes from a real face when the liveness score is greater than the set threshold and the difference between the regressed and specific fill-light parameters is within the preset range, and otherwise from a prosthetic face, includes:
when the liveness score is greater than or equal to the set threshold, calculating the difference between the regressed fill-light parameters and the specific fill-light parameters;
if the difference is within the preset range, determining that the face image comes from a real face;
if the difference is not within the preset range, determining that the face image comes from a prosthetic face; and
when the liveness score is less than the set threshold, determining that the face image comes from a prosthetic face.
In a second aspect, the present application provides a facial silent liveness detection apparatus, the apparatus comprising:
an image acquisition module configured to acquire a face image collected under specific fill-light parameters and preprocess it to obtain a preprocessed face image, wherein the fill-light parameters include RGB color parameters and a brightness parameter;
a processing module configured to input the preprocessed face image into a trained convolutional neural network to extract a feature vector used to distinguish real faces from prosthetic faces, obtain a liveness score from the feature vector, and regress the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters; and
a judging module configured to determine that the face image comes from a real face when the liveness score is greater than a set threshold and the difference between the regressed and specific fill-light parameters is within a preset range, and otherwise that the face image comes from a prosthetic face.
In an embodiment, the image acquisition module is specifically configured to: during face image capture, randomly change the fill-light parameters at fixed time intervals and record the fill-light parameters corresponding to the capture moment as the specific fill-light parameters.
In an embodiment, the RGB color parameter is α and the brightness parameter is β; the apparatus further includes a color parameter determination module configured to compute the RGB color parameters as
α = (R/255, G/255, B/255),
where R, G, and B are the R, G, and B values of the fill light and β ∈ [0, 1].
In an embodiment, the apparatus further includes:
a sample acquisition module configured to acquire face images collected under multiple fill-light parameters, preprocess them, and add labels to obtain a training sample set, the labels including the RGB color parameter α and the brightness parameter β of the fill-light parameters;
a forward processing module configured to input the training sample set into the convolutional neural network, extract feature vector samples for distinguishing real faces from prosthetic faces, and regress the fill-light parameters from the training sample set; and
a backpropagation module configured to compute the loss of the feature vector samples with ArcFace Loss, compute the loss of the fill-light parameter regression with a Euclidean loss function, and update the network parameters by backpropagation; when computing the regression loss, the regression targets of prosthetic training samples include the RGB color parameters and the brightness parameter, with α = (0, 0, 0) and β = 0.
In an embodiment, the image acquisition module includes:
a face detection and localization unit configured to perform face detection and facial key point localization on the face image to obtain the eye coordinates; and
a face normalization unit configured to align and scale the face image by the eye coordinates to obtain the preprocessed face image.
In an embodiment, the apparatus further includes:
a calculation module configured to calculate the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters to obtain the difference between them.
In an example, the processing module is specifically configured to input the feature vector into the ArcFace Loss layer to obtain the liveness score.
In an example, the judging module is specifically configured to: when the liveness score is greater than or equal to the set threshold, calculate the difference between the regressed fill-light parameters and the specific fill-light parameters; if the difference is within the preset range, determine that the face image comes from a real face; if it is not, determine that the face image comes from a prosthetic face; and when the liveness score is less than the set threshold, determine that the face image comes from a prosthetic face.
In a third aspect, the present application provides a computer-readable storage medium for facial silent liveness detection, including a memory storing processor-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method of the first aspect.
In a fourth aspect, the present application provides a device for facial silent liveness detection, including at least one processor and a memory storing computer-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method of the first aspect.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
Fig. 1 is a flowchart of a facial silent liveness detection method according to one or more embodiments;
Fig. 2 is a schematic diagram of the training process of a convolutional neural network according to one or more embodiments;
Fig. 3 is a flowchart of a training method for a convolutional neural network according to one or more embodiments;
Fig. 4 is a schematic diagram of a facial silent liveness detection apparatus according to one or more embodiments.
To better describe and illustrate the embodiments and/or examples of the inventions disclosed herein, reference may be made to one or more of the accompanying drawings. The additional details or examples used to describe the drawings should not be considered limitations on the scope of any of the disclosed inventions, the presently described embodiments and/or examples, or the best mode of these inventions as currently understood.
具体实施方式Detailed ways
为使本申请要解决的技术问题、技术方案和优点更加清楚,下面将结合附图及具体实施例对本申请的技术方案进行清楚、完整地描述。显 然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本申请实施例的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本申请的实施例的详细描述并非旨在限制要求保护的本申请的范围,而是仅仅表示本申请的选定实施例。基于本申请的实施例,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本申请保护的范围。In order to make the technical problems, technical solutions and advantages to be solved by the present application clearer, the technical solutions of the present application will be clearly and completely described below in conjunction with the accompanying drawings and specific embodiments. Apparently, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. The components of the embodiments of the application generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without making creative efforts belong to the scope of protection of the present application.
In one embodiment, two main factors currently affect face liveness detection. The first is ambient illumination: highly variable ambient light has long been a major challenge for face-related tasks and seriously degrades the accuracy of facial silent liveness detection. The second is the ultra-high-definition capability of modern lenses and screens: when an ultra-high-definition face video or image captured by such a lens is played on an ultra-high-definition screen, the human eye can hardly distinguish it from a real face, which likewise poses a major challenge to liveness detection algorithms.
To solve the above problems, an embodiment of the present application provides a facial silent liveness detection method. As shown in Fig. 1, the method includes:
S100: Acquire a face image captured under specific fill-light parameters, and preprocess the face image to obtain a preprocessed face image.
The fill-light parameters comprise RGB color parameters and a brightness parameter, four values in total: the first three are the RGB fill-light values, i.e., the RGB color parameters (an R color parameter for red, a G color parameter for green, and a B color parameter for blue), which specify the color of the supplementary light; the fourth is the brightness parameter.
In the embodiment of the present application, supplementary light with specific fill-light parameters can be provided, so that the face image is captured under those specific parameters. On this basis, the specific fill-light parameters and the face image captured under them can both be recorded. The face image is then preprocessed to obtain the preprocessed face image.
S200: Input the preprocessed face image into a trained convolutional neural network to extract a feature vector, and obtain a liveness score from the feature vector; in addition, regress the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters.
The feature vector is used to discriminate between real faces and prosthetic (spoof) faces.
In this embodiment, the preprocessed face image is fed into the pre-trained convolutional neural network, whose output is the feature vector of the preprocessed face image. A liveness score is computed from this feature vector. Inverse inference can also be performed on the preprocessed face image with the pre-trained convolutional neural network to regress the fill-light parameters.
On this basis, the convolutional neural network of the present application not only produces a liveness score for liveness detection, but also regresses the fill-light parameters from the input face image; the regressed fill-light parameters are then checked against the aforementioned specific fill-light parameters.
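As a concrete illustration of such a network, the following is a minimal PyTorch sketch of a dual-head model, assuming a ResNet-18 backbone (the backbone named later in this description); the class name, head sizes, and the Sigmoid on the regression head are illustrative assumptions rather than the exact architecture of the application.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class LivenessNet(nn.Module):
    """Dual-head CNN: one head emits a 512-d liveness feature vector,
    the other regresses the 4-d fill-light parameters (alpha, beta)."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()               # keep the 512-d pooled features
        self.backbone = backbone
        self.feat_head = nn.Linear(512, feat_dim)  # feature vector for real/spoof discrimination
        self.light_head = nn.Sequential(           # regressed fill-light parameters in [0, 1]
            nn.Linear(512, 4),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor):
        h = self.backbone(x)                       # (B, 512)
        return self.feat_head(h), self.light_head(h)

# usage: feats, light = LivenessNet()(torch.randn(1, 3, 224, 224))
```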
S300: When the liveness score is greater than a set threshold and the gap between the regressed fill-light parameters and the specific fill-light parameters is within a preset range, determine that the face image comes from a real face; otherwise, determine that it comes from a prosthetic face.
The present application compares the regressed fill-light parameters with the specific fill-light parameters and computes their gap, which makes it possible to tell whether the captured content is a face video or image played on an ultra-high-definition screen; the gap can be obtained by computing the Euclidean distance between the regressed and the specific fill-light parameters. For a real face, the regressed fill-light parameters differ little from the specific fill-light parameters; for a face played on a screen, the screen superimposes an additional layer of light (the light it emits), so the regressed fill-light parameters deviate substantially from the specific ones, and this deviation can be used to precisely reject attacks with high-definition screen faces.
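The decision rule of S300 can be sketched as follows; the threshold and preset-range values below are placeholders for illustration, not values specified by this application.

```python
import numpy as np

def is_real_face(liveness_score: float,
                 regressed: np.ndarray,       # regressed (alpha, beta), shape (4,)
                 expected: np.ndarray,        # specific fill-light parameters used at capture
                 score_thresh: float = 0.5,   # placeholder score threshold
                 dist_thresh: float = 0.2) -> bool:  # placeholder preset range
    # Euclidean distance between the regressed and the expected parameter vectors
    gap = float(np.linalg.norm(regressed - expected))
    return liveness_score > score_thresh and gap <= dist_thresh
```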
The facial silent liveness detection method provided in the embodiments of the present application acquires a face image captured under specific fill-light parameters, inputs it into a trained convolutional neural network, extracts a feature vector, obtains a liveness score from the extracted feature vector, and simultaneously regresses the fill-light parameters; whether the image shows a real face is then judged from the liveness score and from the gap between the regressed and the specific fill-light parameters. With this method, RGB fill light accentuates the differences in material and 3D information between real and prosthetic faces, which helps distinguish real from fake faces and effectively mitigates the influence of ambient light; regressing the fill-light parameters from the face image and checking them against the specific fill-light parameters eliminates misjudgments on high-definition re-shot prosthetic faces.
In other words, RGB fill light effectively shields the facial silent liveness detection method from ambient illumination, and because the fill light of the present application adds a multicolored lighting factor, ordinary high-definition re-shots of faces are essentially never misjudged. Specifically: capturing face images under RGB fill light effectively mitigates the influence of ambient light; prosthetic faces of various materials differ markedly in surface material from real facial skin, and RGB fill light of different intensities amplifies this difference, making real and fake faces easier to tell apart; and a human face carries rich 3D information, whereas photo and screen prosthetic faces are essentially planar (or lack the rich 3D information of a real face), so illuminating the face with RGB fill light magnifies the 3D difference between real and prosthetic faces, which likewise helps distinguish them.
In one example, during face image capture, the fill-light parameters are changed randomly at fixed intervals, and the fill-light parameters in effect at each capture moment are recorded as the specific fill-light parameters.
Taking a mobile terminal as an example, face images are captured with the front camera of a mobile phone. During capture, at every time interval (denoted τ), the phone screen randomly switches to light of a different color and brightness; a face image is captured and the corresponding fill-light parameters are recorded.
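A minimal sketch of this timed random switching is given below; set_screen_light and capture_frame are hypothetical device hooks for driving the screen and camera, and the parameter ranges are assumptions for illustration.

```python
import random
import time

def capture_with_random_fill_light(tau: float, n_frames: int):
    """Every tau seconds, switch to random fill-light parameters and record
    the parameters in effect at the capture moment."""
    recorded_params = []
    for _ in range(n_frames):
        r, g, b = [random.randint(0, 255) for _ in range(3)]  # random fill-light color
        brightness = random.random()                          # brightness in [0, 1]
        # set_screen_light(r, g, b, brightness)  # hypothetical device hook
        # frame = capture_frame()                # hypothetical camera hook
        recorded_params.append((r, g, b, brightness))
        time.sleep(tau)
    return recorded_params
```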
Because the fill-light parameters are generated randomly, the captured face images differ from capture to capture; verifying the fill-light parameters regressed from the facial image features therefore effectively strengthens resistance to attacks (anti-hack capability) against high-definition video.
In one example, the aforementioned RGB color parameter is α and the brightness parameter is β, where
α = (R/255, G/255, B/255), with R, G, and B being the R, G, and B values of the fill light, and β ∈ [0, 1].
The present application divides the R, G, and B values of the fill light by 255 to constrain them to the range [0, 1], which makes convergence easier during training.
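The normalization can be written directly from the formula above; a minimal sketch:

```python
def normalize_fill_light(r: int, g: int, b: int, brightness: float):
    """Map raw fill-light values to the normalized parameters used in training:
    alpha = (R/255, G/255, B/255) and beta in [0, 1]."""
    assert 0.0 <= brightness <= 1.0
    alpha = (r / 255.0, g / 255.0, b / 255.0)
    beta = brightness
    return (*alpha, beta)  # a 1x4 vector (alpha, beta)

# e.g. normalize_fill_light(255, 128, 0, 0.8) -> (1.0, 0.50196..., 0.0, 0.8)
```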
In one example, the aforementioned convolutional neural network may be a ResNet (deep residual network), trained by the method shown in Fig. 3:
S10: Acquire face images captured under multiple sets of fill-light parameters, preprocess the face images, and attach a label to each face image to obtain a training sample set.
Each label comprises the RGB color parameter α and the brightness parameter β of the fill-light parameters; α and β together form a 1×4 vector.
The preprocessing in this step is identical to that of S100. In one example, the preprocessing mainly comprises face detection, facial landmark localization, head pose estimation, and face normalization. Specifically:
the MTCNN (multi-task convolutional neural network) detection algorithm is used for face detection and facial landmark localization; for example, the following five facial landmarks may be used: left eye, right eye, nose, left mouth corner, and right mouth corner.
3D pose information is estimated from the 2D coordinates of the landmarks; the estimation may use the SolvePnP algorithm (a monocular relative pose estimation function) in OpenCV.
Using the eye coordinates, the face image is normalized by alignment and scaling: for example, the eyes may be aligned to (94, 108) and (129, 108), the nose to (112, 128), and the mouth corners to (98, 148) and (126, 148), with the image scaled to a size of (224, 224).
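A hedged sketch of this alignment step is given below, assuming a similarity transform estimated from the five landmarks (one common realization; the application does not fix the exact transform).

```python
import cv2
import numpy as np

# Target positions of the five landmarks in the 224x224 normalized image,
# in the order: left eye, right eye, nose, left mouth corner, right mouth corner.
TARGET = np.float32([[94, 108], [129, 108], [112, 128], [98, 148], [126, 148]])

def align_face(image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (5, 2) array of detected points in the same order as TARGET."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), TARGET)
    assert M is not None, "similarity transform estimation failed"
    return cv2.warpAffine(image, M, (224, 224))
```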
S20: Input the training sample set into the convolutional neural network and extract feature vector samples, which are used to discriminate between real and prosthetic faces; in addition, regress the fill-light parameters from the training sample set.
S30: Compute the loss of the feature vector samples with ArcFace Loss and the loss of the fill-light parameter regression with a Euclidean loss function, as shown in Fig. 2, and update the parameters of the convolutional neural network by backpropagation.
In this step, a Euclidean loss is used for the parameter regression; for the label loss, the network outputs a 512-dimensional feature vector that is trained with ArcFace Loss (a face recognition loss function). Backpropagating both the label loss and the regression loss then updates the parameters of the convolutional neural network (a ResNet-18).
The present application treats real faces as a closed set and prosthetic faces as an open set: prosthetic face attacks take many forms and, compared with real faces, constitute a relatively open collection. Accordingly, when computing the fill-light parameter regression loss, the regression target of a prosthetic training sample comprises the RGB color parameters and the brightness parameter with α = (0, 0, 0) and β = 0.
With this scheme, backpropagating the label loss together with the regression loss achieves a better training effect and improves the accuracy of the convolutional neural network.
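A hedged PyTorch sketch of this joint objective is given below, reusing the dual-head model sketched earlier; the two-class ArcFace margin m, scale s, and the loss weight lam are illustrative assumptions, not values fixed by the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """Minimal two-class ArcFace: additive angular margin on the target logit."""
    def __init__(self, feat_dim=512, num_classes=2, s=32.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        cos = cos.clamp(-1 + 1e-7, 1 - 1e-7)
        margin_cos = torch.cos(torch.acos(cos) + self.m)  # cos(theta + m)
        onehot = F.one_hot(labels, cos.size(1)).float()
        logits = self.s * (onehot * margin_cos + (1 - onehot) * cos)
        return F.cross_entropy(logits, labels)

def joint_loss(feats, light_pred, labels, light_gt, arcface, lam=1.0):
    # Prosthetic samples (label 0) regress to alpha = (0, 0, 0), beta = 0.
    target = torch.where(labels.unsqueeze(1) > 0, light_gt,
                         torch.zeros_like(light_gt))
    reg = F.mse_loss(light_pred, target)  # Euclidean-style regression loss
    return arcface(feats, labels) + lam * reg
```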
In one example, the method further includes: providing the fill light through the screen of a mobile terminal.
In the embodiments of the present application, the provided method can be applied to mobile terminals such as mobile phones. Considering the actual usage scenarios of a mobile terminal, RGB fill light of different brightnesses is provided through its screen. This makes full use of the advantages of the mobile terminal without any additional fill-light hardware; the screen also provides uniform planar illumination, which unifies the light intensity to a certain extent and effectively mitigates the influence of ambient light.
Of course, the embodiments of the present application are not limited to mobile terminals; they can also be applied to devices other than mobile terminals, providing RGB fill light of different brightnesses through additional fill-light hardware instead of a screen.
With this scheme, different degrees of fill light can be provided through the mobile terminal, increasing the flexibility and convenience of the fill lighting.
In one example, the method further includes:
computing the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters to obtain the gap between them.
With this scheme, the gap between the regressed and the specific fill-light parameters can be determined accurately by computing the Euclidean distance.
In one example, obtaining the liveness score from the feature vector includes:
inputting the feature vector into ArcFace Loss to obtain the liveness score.
With this scheme, the accuracy of the liveness decision can be improved.
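One plausible reading of this step at inference time is to score the feature vector against the trained ArcFace class weights, as in the sketch below (reusing the ArcFaceLoss sketch above; the margin-free softmax scoring is an assumption, not a procedure fixed by the application).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def liveness_score(feats: torch.Tensor, arcface) -> torch.Tensor:
    """Cosine logits against the trained ArcFace weights (class 1 = real),
    turned into a probability-like score via softmax; no margin at test time."""
    cos = F.linear(F.normalize(feats), F.normalize(arcface.weight))
    return torch.softmax(arcface.s * cos, dim=1)[:, 1]  # score of the 'real' class
```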
In one example, the facial landmarks further include a nose landmark, a left mouth-corner landmark, and a right mouth-corner landmark.
In one example, aligning and scaling the face image using the eye coordinates to obtain the preprocessed face image includes:
aligning the coordinates of each facial landmark; and
scaling the face image to obtain the preprocessed face image.
With this scheme, preprocessing the face images by alignment and scaling ensures the consistency of the data samples in the sample data set and improves the training effect.
It should be understood that although the steps in the flowcharts of Fig. 1 and Fig. 3 are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 1 and Fig. 3 may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times; their execution order likewise need not be sequential, and they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one embodiment, the embodiments of the present application provide a facial silent liveness detection apparatus. As shown in Fig. 4, the apparatus includes:
an image acquisition module 401, configured to acquire a face image captured under specific fill-light parameters and preprocess the face image to obtain a preprocessed face image, the fill-light parameters including RGB color parameters and a brightness parameter;
a processing module 402, configured to input the preprocessed face image into a trained convolutional neural network and extract a feature vector, the feature vector being used to discriminate between real and prosthetic faces, obtain a liveness score from the feature vector, and regress the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters; and
a judgment module 403, configured to determine that the face image comes from a real face when the liveness score is greater than a set threshold and the gap between the regressed fill-light parameters and the specific fill-light parameters is within a preset range, and otherwise determine that the face image comes from a prosthetic face.
The present application acquires a face image captured under specific fill-light parameters, inputs it into a trained convolutional neural network, extracts a feature vector, obtains a liveness score from the extracted feature vector, and simultaneously regresses the fill-light parameters; whether the image shows a real face is judged from the liveness score and from the gap between the regressed and the specific fill-light parameters. Through RGB fill light, the present application accentuates the differences in material and 3D information between real and prosthetic faces, which helps distinguish real from fake faces and effectively mitigates the influence of ambient light; regressing the fill-light parameters from the face image and checking them against the specific fill-light parameters eliminates misjudgments on high-definition re-shot prosthetic faces.
In the image acquisition module, acquiring a face image captured under specific fill-light parameters includes:
during face image capture, randomly changing the fill-light parameters at fixed intervals, and recording the fill-light parameters in effect at each capture moment as the specific fill-light parameters.
The aforementioned RGB color parameter is α and the brightness parameter is β; the RGB color parameter is computed by the following formula:
α = (R/255, G/255, B/255),
where R, G, and B are the R, G, and B values of the fill light, and β ∈ [0, 1].
The convolutional neural network of the present application is trained by the following modules:
a sample acquisition module, configured to acquire face images captured under multiple sets of fill-light parameters, preprocess the face images, and attach labels to them to obtain a training sample set,
each label including the RGB color parameter α and the brightness parameter β of the fill-light parameters;
a forward processing module, configured to input the training sample set into the convolutional neural network, extract feature vector samples used to discriminate between real and prosthetic faces, and regress the fill-light parameters from the training sample set; and
a backpropagation module, configured to compute the loss of the feature vector samples with ArcFace Loss and the loss of the fill-light parameter regression with a Euclidean loss function, and to update the parameters of the convolutional neural network by backpropagation,
where, when computing the fill-light parameter regression loss, the regression target of a prosthetic training sample comprises the RGB color parameters and the brightness parameter with α = (0, 0, 0) and β = 0.
In one example, the image acquisition module includes:
a face detection and localization unit, configured to perform face detection and facial landmark localization on the face image to obtain the eye coordinates; and
a face normalization unit, configured to align and scale the face image using the eye coordinates.
The present application can be used on a mobile terminal, with the fill light provided through the screen of the mobile terminal.
In one example, the apparatus further includes:
a calculation module, configured to compute the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters to obtain the gap between them.
In one example, the processing module is specifically configured to:
input the feature vector into ArcFace Loss to obtain the liveness score.
In one example, the judgment module is specifically configured to determine that the face image comes from a real face when the liveness score is greater than the set threshold and the gap between the regressed fill-light parameters and the specific fill-light parameters is within the preset range, and otherwise determine that the face image comes from a prosthetic face, by:
when the liveness score is greater than or equal to the set threshold, computing the gap between the regressed fill-light parameters and the specific fill-light parameters;
if the gap is within the preset range, determining that the face image comes from a real face;
if the gap is not within the preset range, determining that the face image comes from a prosthetic face; and
if the liveness score is less than the set threshold, determining that the face image comes from a prosthetic face.
The implementation principle and technical effects of the apparatus provided by the embodiments of the present application are the same as those of the method provided by the foregoing embodiments; for brevity, where this apparatus embodiment is silent, reference may be made to the corresponding content of the method provided by the foregoing embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and units described above may refer to the corresponding processes in the method provided by the foregoing embodiments and are not repeated here.
Each module in the above facial silent liveness detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The network interface may be an Ethernet card, a wireless network card, or the like. The above modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
As used in the present application, the terms "component", "module", and "system" are intended to refer to a computer-related entity, which may be hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers.
In one embodiment, the methods described in the foregoing embodiments of the present application may implement their business logic through a computer program recorded on a storage medium, which can be read and executed by a computer to achieve the effects of the solutions described in the embodiments of this specification. Accordingly, the present application also provides a computer-readable storage medium for facial silent liveness detection, comprising a memory storing processor-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method of the foregoing embodiments.
Through RGB fill light, the present application accentuates the differences in material and 3D information between real and prosthetic faces, which helps distinguish real from fake faces and effectively mitigates the influence of ambient light; regressing the fill-light parameters from the face image and checking them against the specific fill-light parameters eliminates misjudgments on high-definition re-shot prosthetic faces.
The storage medium may comprise a physical device for storing information, typically by digitizing the information and then storing it in a medium that works electrically, magnetically, or optically. The storage medium may include: devices that store information electrically, such as various memories (e.g., RAM and ROM); devices that store information magnetically, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are also other forms of readable storage media, such as quantum memories and graphene memories.
The above storage medium may also include other implementations according to the description of the methods provided by the foregoing embodiments; the implementation principle and technical effects of this embodiment are the same as those of the method provided by the foregoing embodiments, to whose description reference may be made, and which is not repeated here.
In one embodiment, the present application also provides a device for facial silent liveness detection. The device may be a standalone computer, or may comprise an actual operating apparatus that uses one or more of the methods, or one or more of the embodiment apparatuses, of this specification. The device for facial silent liveness detection may comprise at least one processor and a memory storing computer-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method of any one or more of the above embodiments.
Through RGB fill light, the present application accentuates the differences in material and 3D information between real and prosthetic faces, which helps distinguish real from fake faces and effectively mitigates the influence of ambient light; regressing the fill-light parameters from the face image and checking them against the specific fill-light parameters eliminates misjudgments on high-definition re-shot prosthetic faces.
The above device may also include other implementations according to the description of the methods provided by the foregoing embodiments; the implementation principle and technical effects of this embodiment are the same as those of the method provided by the foregoing embodiments, to whose description reference may be made, and which is not repeated here.
Finally, it should be noted that the above embodiments are merely specific implementations of the present application, used to illustrate rather than limit its technical solutions, and the scope of protection of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that anyone familiar with the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some of the technical features; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (15)

  1. A facial silent liveness detection method, characterized in that the method comprises:
    acquiring a face image captured under specific fill-light parameters, and preprocessing the face image to obtain a preprocessed face image, wherein the fill-light parameters comprise RGB color parameters and a brightness parameter;
    inputting the preprocessed face image into a trained convolutional neural network and extracting a feature vector, the feature vector being used to discriminate between real faces and prosthetic faces, and obtaining a liveness score from the feature vector; and regressing the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters; and
    when the liveness score is greater than a set threshold and the gap between the regressed fill-light parameters and the specific fill-light parameters is within a preset range, determining that the face image comes from a real face, and otherwise determining that the face image comes from a prosthetic face.
  2. The facial silent liveness detection method according to claim 1, characterized in that acquiring the face image captured under the specific fill-light parameters comprises:
    during face image capture, randomly changing the fill-light parameters at fixed intervals, and recording the fill-light parameters corresponding to the capture moment as the specific fill-light parameters.
  3. The facial silent liveness detection method according to claim 1 or 2, characterized in that the RGB color parameter is α and the brightness parameter is β, the RGB color parameter being computed by the following formula:
    α = (R/255, G/255, B/255),
    where R, G, and B are the R, G, and B values of the fill light, and β ∈ [0, 1].
  4. The facial silent liveness detection method according to any one of claims 1-3, characterized in that the convolutional neural network is trained by the following method:
    acquiring a plurality of face images captured under a plurality of sets of fill-light parameters, preprocessing each face image, and attaching a label to each face image to obtain a training sample set, wherein the label comprises the RGB color parameter α and the brightness parameter β of the fill-light parameters;
    inputting the training sample set into the convolutional neural network and extracting feature vector samples, the feature vector samples being used to discriminate between real faces and prosthetic faces, and regressing the fill-light parameters from the training sample set; and
    computing the loss of the feature vector samples with ArcFace Loss, computing the loss of the fill-light parameter regression with a Euclidean loss function, and updating the parameters of the convolutional neural network by backpropagation, wherein, when computing the loss of the fill-light parameter regression, the regression target of a prosthetic training sample comprises the RGB color parameters and the brightness parameter with α = (0, 0, 0) and β = 0.
  5. The facial silent liveness detection method according to claim 4, characterized in that preprocessing the face image comprises:
    performing face detection and facial landmark localization on the face image to obtain eye coordinates; and
    aligning and scaling the face image using the eye coordinates to obtain the preprocessed face image.
  6. The facial silent liveness detection method according to any one of claims 1-5, characterized in that the fill light is provided through the screen of a mobile terminal.
  7. The facial silent liveness detection method according to claim 1, characterized in that the method further comprises:
    computing the Euclidean distance between the regressed fill-light parameters and the specific fill-light parameters to obtain the gap between the regressed fill-light parameters and the specific fill-light parameters.
  8. The facial silent liveness detection method according to claim 1, characterized in that obtaining the liveness score from the feature vector comprises:
    inputting the feature vector into ArcFace Loss to obtain the liveness score.
  9. The facial silent liveness detection method according to claim 5, characterized in that the facial landmarks further comprise a nose landmark, a left mouth-corner landmark, and a right mouth-corner landmark.
  10. The facial silent liveness detection method according to claim 1, characterized in that, when the liveness score is greater than the set threshold and the gap between the regressed fill-light parameters and the specific fill-light parameters is within the preset range, determining that the face image comes from a real face, and otherwise determining that the face image comes from a prosthetic face, comprises:
    when the liveness score is greater than or equal to the set threshold, computing the gap between the regressed fill-light parameters and the specific fill-light parameters;
    if the gap is within the preset range, determining that the face image comes from a real face;
    if the gap is not within the preset range, determining that the face image comes from a prosthetic face; and
    if the liveness score is less than the set threshold, determining that the face image comes from a prosthetic face.
  11. A facial silent liveness detection apparatus, characterized in that the apparatus comprises:
    an image acquisition module, configured to acquire a face image captured under specific fill-light parameters and preprocess the face image to obtain a preprocessed face image, wherein the fill-light parameters comprise RGB color parameters and a brightness parameter;
    a processing module, configured to input the preprocessed face image into a trained convolutional neural network and extract a feature vector, the feature vector being used to discriminate between real faces and prosthetic faces, obtain a liveness score from the feature vector, and regress the fill-light parameters from the preprocessed face image to obtain regressed fill-light parameters; and
    a judgment module, configured to determine that the face image comes from a real face when the liveness score is greater than a set threshold and the gap between the regressed fill-light parameters and the specific fill-light parameters is within a preset range, and otherwise determine that the face image comes from a prosthetic face.
  12. The facial silent liveness detection apparatus according to claim 11, characterized in that the apparatus further comprises:
    a sample acquisition module, configured to acquire face images captured under a plurality of sets of fill-light parameters, preprocess them, and attach labels to obtain a training sample set, wherein the labels comprise the RGB color parameter α and the brightness parameter β of the fill-light parameters;
    a forward processing module, configured to input the training sample set into the convolutional neural network, extract feature vector samples used to discriminate between real faces and prosthetic faces, and regress the fill-light parameters from the training sample set; and
    a backpropagation module, configured to compute the loss of the feature vector samples with ArcFace Loss, compute the loss of the fill-light parameter regression with a Euclidean loss function, and update the parameters of the convolutional neural network by backpropagation, wherein, when computing the loss of the fill-light parameter regression, the regression target of a prosthetic training sample comprises the RGB color parameters and the brightness parameter with α = (0, 0, 0) and β = 0.
  13. The facial silent liveness detection apparatus according to claim 11 or 12, characterized in that the image acquisition module comprises:
    a face detection and localization unit, configured to perform face detection and facial landmark localization on the face image to obtain eye coordinates; and
    a face normalization unit, configured to align and scale the face image using the eye coordinates to obtain the preprocessed face image.
  14. A computer-readable storage medium for facial silent liveness detection, characterized by comprising a memory for storing processor-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method according to any one of claims 1-7.
  15. A device for facial silent liveness detection, characterized by comprising at least one processor and a memory storing computer-executable instructions which, when executed by the processor, implement the steps of the facial silent liveness detection method according to any one of claims 1-7.
PCT/CN2022/118048 2021-10-15 2022-09-09 Facial silent living body detection method and apparatus, and storage medium and device WO2023061123A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111201839.8 2021-10-15
CN202111201839.8A CN115995102A (en) 2021-10-15 2021-10-15 Face silence living body detection method, device, storage medium and equipment

Publications (1)

Publication Number Publication Date
WO2023061123A1 true WO2023061123A1 (en) 2023-04-20

Family

ID=85987262

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118048 WO2023061123A1 (en) 2021-10-15 2022-09-09 Facial silent living body detection method and apparatus, and storage medium and device

Country Status (2)

Country Link
CN (1) CN115995102A (en)
WO (1) WO2023061123A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992794A * 2016-12-30 2018-05-04 Tencent Technology (Shenzhen) Co., Ltd. Liveness detection method, apparatus and storage medium
US20180173980A1 * 2016-12-15 2018-06-21 Beijing Kuangshi Technology Co., Ltd. Method and device for face liveness detection
CN109558840A * 2018-11-29 2019-04-02 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences Liveness detection method with feature fusion
CN110516644A * 2019-08-30 2019-11-29 Shenzhen Qianhai WeBank Co., Ltd. Liveness detection method and device
CN110969077A * 2019-09-16 2020-04-07 Chengdu Hengdao Zhirong Information Technology Co., Ltd. Liveness detection method based on color change
CN111241919A * 2019-12-27 2020-06-05 Shenzhen Ningyuan Electronic Technology Co., Ltd. Face liveness detection method and system combining an optical verification code
CN112906610A * 2021-03-05 2021-06-04 Shanghai Zhaoguan Electronic Technology Co., Ltd. Method for liveness detection, electronic circuit, electronic apparatus, and medium
CN113361349A * 2021-05-25 2021-09-07 Beijing Baidu Netcom Science and Technology Co., Ltd. Face liveness detection method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN115995102A (en) 2023-04-21
