CN114093006A - Training method, device and equipment of living human face detection model and storage medium - Google Patents


Info

Publication number
CN114093006A
Authority
CN
China
Prior art keywords
image
training
detection model
initial
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111417758.1A
Other languages
Chinese (zh)
Inventor
王珂尧
张国生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111417758.1A priority Critical patent/CN114093006A/en
Publication of CN114093006A publication Critical patent/CN114093006A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a training method, apparatus, device, and storage medium for a living body face detection model, relating to the technical field of artificial intelligence, in particular to deep learning and computer vision, and applicable to scenarios such as smart cities and smart finance. The method comprises the following steps: acquiring an initial image set; detecting each initial image in the initial image set with at least one pre-trained detection model and determining a training image set based on the detection results, wherein the detection models in the at least one detection model are trained on different training data sets; adding the training image set to the initial training set to obtain a target training set; and training the initial living body face detection model with the target training set to obtain a trained living body face detection model. Because this training method requires no manual data labeling, it improves both the training efficiency and the accuracy of the model.

Description

Training method, device and equipment of living human face detection model and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to deep learning and computer vision, and more particularly to a method, an apparatus, a device, and a storage medium for training a living body face detection model, which can be applied to scenarios such as smart cities and smart finance.
Background
Living body face detection determines whether an image was shot of a real person. It is a basic building block of a face recognition system and safeguards the system's security. In practical applications, however, deep-learning-based living body face detection algorithms suffer from poor generalization: they perform well on data seen during training, but their effectiveness drops on unknown attack samples and attack modes, which degrades real-world performance.
Disclosure of Invention
The disclosure provides a training method, a device, equipment and a storage medium for a living body face detection model.
According to a first aspect of the present disclosure, there is provided a training method for a living body face detection model, including: acquiring an initial image set; detecting each initial image in the initial image set by using at least one pre-trained detection model, and determining a training image set based on the detection results, wherein the detection models in the at least one detection model are trained on different training data sets; adding the training image set to an initial training set to obtain a target training set, wherein the initial training set is the training data set of an initial living body face detection model; and training the initial living body face detection model by using the target training set to obtain a trained living body face detection model.
According to a second aspect of the present disclosure, there is provided a living body face detection method, including: acquiring an image to be detected; inputting an image to be detected into a pre-trained living body face detection model, and outputting to obtain a living body face detection result, wherein the living body face detection model is obtained by training according to the method described in any one of the first aspect.
According to a third aspect of the present disclosure, there is provided a training apparatus for a living body face detection model, comprising: a first acquisition module configured to acquire an initial set of images; a determining module configured to detect each initial image in the initial image set by using at least one detection model trained in advance, and determine a training image set based on the detection result, wherein the training data sets of the at least one detection model are different; the adding module is configured to add the training image set into an initial training set to obtain a target training set, wherein the initial training set is a training data set of an initial living body face detection model; and the training module is configured to train the initial living body face detection model by using the target training set to obtain a trained living body face detection model.
According to a fourth aspect of the present disclosure, there is provided a living body face detection apparatus including: a second acquisition module configured to acquire an image to be detected; the detection module is configured to input an image to be detected into a pre-trained living body face detection model, and output a living body face detection result, wherein the living body face detection model is obtained by training according to the method described in any one of the first aspect.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect or the second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any one of the implementation manners of the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described in any of the implementations of the first or second aspect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of training a live face detection model according to the present disclosure;
FIG. 3 is a flow diagram of another embodiment of a training method for a live face detection model according to the present disclosure;
FIG. 4 is a decomposition flow diagram of the generation steps of an initial image set of a training method of a live face detection model according to the present disclosure;
FIG. 5 is a flow diagram of one embodiment of a live face detection method according to the present disclosure;
FIG. 6 is a schematic structural diagram of an embodiment of a training apparatus for a living human face detection model according to the present disclosure;
FIG. 7 is a schematic structural diagram of one embodiment of a living body face detection apparatus according to the present disclosure;
fig. 8 is a block diagram of an electronic device for implementing a training method of a living body face detection model or a living body face detection method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which an embodiment of the training method of the living body face detection model or the training apparatus of the living body face detection model of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or transmit information or the like. Various client applications may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the above-described electronic apparatuses. It may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may provide various services. For example, the server 105 may analyze and process the initial image set acquired from the terminal devices 101, 102, 103 and generate a processing result (e.g., a trained live face detection model).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the training method of the living body face detection model provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the training device of the living body face detection model is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method of training a live face detection model according to the present disclosure is shown. The training method of the living human face detection model comprises the following steps:
step 201, an initial image set is obtained.
In this embodiment, an executing body (for example, the server 105 shown in fig. 1) of the training method for the living body face detection model may acquire an initial image set, where the initial image set includes at least one initial image, and each initial image needs to include a face so that the face can be detected. As an example, the initial image set is generally obtained by processing an image set acquired online: the executing body may obtain the latest image set from the online service and then process it, for example by face detection and face alignment, to obtain a processed image set, that is, the initial image set. As another example, the initial image set may be the latest image set acquired by the executing body in other ways, such as images captured by a fixed camera, which are then processed to obtain the initial image set. Because the initial image set is obtained by processing the acquired images, the following steps can produce more accurate detection results on the initial images.
Step 202, detecting each initial image in the initial image set by using at least one detection model trained in advance, and determining a training image set based on the detection result.
In this embodiment, the executing body may detect each initial image in the initial image set by using at least one pre-trained detection model, and determine the training image set based on the detection results, where the detection models are trained on different training data sets. That is, the executing body may pre-train M detection models, where M is an integer greater than or equal to 1, such that the training data set used for each detection model is different; in other words, the detection models may be trained on different domains. For example, 5 detection models may be pre-trained, each on one of 5 different domains. Because the detection models are trained on different domains, each detection model has a different detection capability.
Then, the executing body detects each initial image in the initial image set with each of the M detection models, so that every initial image has M detection results, and the executing body may decide based on these M results whether to use the initial image as a training image. For example, it may be determined whether the M detection scores corresponding to each initial image are all greater than a preset threshold; if they are, the initial image is taken as a training image, yielding a training image set.
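A minimal sketch of this unanimous-vote filter, assuming each of the M detectors outputs a score in [0, 1] (the helper name and the 0.9 threshold are illustrative assumptions, not values fixed by the disclosure):

```python
import numpy as np

def select_training_images(image_ids, scores, threshold=0.9):
    """Keep an image only when all M detector scores exceed the threshold.

    image_ids: list of N image identifiers.
    scores: (N, M) array-like; scores[i, j] is detector j's score for image i.
    threshold: a preset value (0.9 is an illustrative assumption).
    """
    votes = np.asarray(scores, dtype=float) > threshold
    keep = votes.all(axis=1)  # unanimous agreement across the M models
    return [img for img, k in zip(image_ids, keep) if k]
```

For example, `select_training_images(["a", "b"], [[0.95, 0.92], [0.95, 0.40]])` keeps only `"a"`, because one detector scores `"b"` below the threshold.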
And step 203, adding the training image set into the initial training set to obtain a target training set.
In this embodiment, the executing body may add the training image set to an initial training set to obtain a target training set, where the initial training set is the training data set of the initial living body face detection model. That is, after determining the training image set, the executing body adds it to the initial training set (the training set of the initial living body face detection model) to obtain a target training set that includes the newly determined training images.
And step 204, training the initial living body face detection model by using the target training set to obtain a trained living body face detection model.
In this embodiment, the executing body may train the initial living body face detection model by using the target training set determined in step 203 to obtain a trained living body face detection model. Here, the initial living body face detection model is a pre-trained model, trained on image data acquired across all the different domains; in this embodiment it is trained again using the target training set, so that the trained living body face detection model achieves higher detection accuracy.
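The disclosure does not fix an architecture or optimizer for this retraining step. As a toy stand-in only, the idea of fine-tuning existing parameters on the enlarged target set can be sketched with a binary logistic-regression model trained by gradient descent (the model, learning rate, and epoch count below are all illustrative assumptions):

```python
import numpy as np

def retrain(weights, features, labels, lr=0.1, epochs=200):
    """Fine-tune a binary classifier on the target training set.

    weights: (D,) initial parameters, standing in for the initial model.
    features: (N, D) feature matrix for the target training set.
    labels: (N,) array of 1 (living body) / 0 (attack) pseudo labels.
    """
    w = np.asarray(weights, dtype=float).copy()
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the log loss
    return w
```

Starting from the pre-trained weights rather than from scratch is what lets the newly pseudo-labeled online data refine, rather than replace, the model's existing detection capability.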
The training method for the living body face detection model provided by this embodiment of the present disclosure first acquires an initial image set; then detects each initial image in the initial image set by using at least one pre-trained detection model and determines a training image set based on the detection results; then adds the training image set to the initial training set to obtain a target training set; and finally trains the initial living body face detection model with the target training set to obtain a trained living body face detection model. The method can determine the latest training data automatically, without manual data labeling, which removes the time-consuming and labor-intensive manual labeling process and improves the efficiency of acquiring new training data. Adding the latest training data to the initial training set to obtain the target training set and training the initial living body face detection model on it improves the detection accuracy and generalization of the trained model, and thereby its ability to defend against attacks.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
With continued reference to fig. 3, fig. 3 illustrates a flow 300 of another embodiment of a method of training a live face detection model according to the present disclosure. The training method of the living human face detection model comprises the following steps:
step 301, an initial image set is obtained.
In this embodiment, an executing body (e.g., the server 105 shown in fig. 1) of the training method for the living body face detection model may acquire an initial image set. The initial image set includes at least one initial image. Step 301 is substantially the same as step 201 in the foregoing embodiment; for its specific implementation, refer to the foregoing description of step 201, which is not repeated here.
Step 302, each initial image in the initial image set is detected by using at least one detection model trained in advance.
In this embodiment, the executing entity may detect each initial image in the initial image set by using M detection models trained in advance, where M is an integer greater than or equal to 1. Step 302 is substantially the same as step 202 in the foregoing embodiment, and the detailed implementation manner may refer to the foregoing description of step 202, which is not described herein again.
Step 303, in response to that the detection result of each detection model in the at least one detection model on the initial image meets a preset condition, determining the initial image as a training image to obtain a training image set.
In this embodiment, for each initial image in the initial image set, the executing entity uses M detection models to detect the initial image, so as to obtain M corresponding detection results. The executing body determines the initial image as a training image under the condition that the M detection results all meet the preset condition, and then obtains a training image set comprising a plurality of training images.
In some optional implementations of this embodiment, step 303 includes: marking the pseudo label of the initial image as a living body in response to the detection score of each detection model in the at least one detection model to the initial image being greater than a preset living body threshold value; in response to that the detection score of each detection model in the at least one detection model to the initial image is smaller than a preset attack threshold value, marking the pseudo label of the initial image as an attack; and determining the initial image marked with the false label as the living body and the attack as the training image.
In this implementation manner, if M detection scores output by M detection models are all greater than a preset living body threshold, the initial image is considered as a living body image, and the pseudo label of the initial image is labeled as a living body; and if the M detection scores output by the M detection models are all smaller than a preset attack threshold value, the initial image is considered as an attack image, the attack image can be an image obtained by shooting a false face (such as a face photo), and a pseudo label of the initial image is marked as an attack. Then, the executing entity determines the pseudo-label-labeled images as training images, so as to obtain a training image set.
It should be noted that, since the initial image is unlabeled data, in this implementation, labels of the unlabeled data (initial image) are predicted by using a trained model (detection model), so as to create pseudo labels for the initial image, that is, the pseudo labels here refer to labels created for the unlabeled data (initial image) by using the trained detection model.
Using multiple detection models trained on different domains to label the online data (the initial image set) by voting yields up-to-date labeled data without manual annotation, which removes the time-consuming and labor-intensive manual labeling process and improves the efficiency of data acquisition.
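The three-way decision described above can be sketched as follows (the threshold values are illustrative assumptions; an image whose scores do not agree unanimously is simply left unlabeled and excluded from the training set):

```python
import numpy as np

def pseudo_label(scores, live_threshold=0.9, attack_threshold=0.1):
    """Assign a pseudo label to one image from its M detector scores.

    Returns 'living' if every score exceeds live_threshold, 'attack' if
    every score is below attack_threshold, and None otherwise (the image
    is not added to the training set).
    """
    s = np.asarray(scores, dtype=float)
    if (s > live_threshold).all():
        return "living"
    if (s < attack_threshold).all():
        return "attack"
    return None
```

Requiring agreement at both ends of the score range is what keeps low-confidence or contested images out of the pseudo-labeled training set.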
And step 304, adding the training image set into the initial training set to obtain a target training set.
And 305, training the initial living body face detection model by using a target training set to obtain a trained living body face detection model.
The steps 304-305 are substantially the same as the steps 203-204 of the foregoing embodiment, and the specific implementation manner can refer to the foregoing description of the steps 203-204, which is not described herein again.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the training method for the living body face detection model in this embodiment highlights the step of determining the initial image as the training image, and determines the initial image as the training image under the condition that M detection results corresponding to each initial image all satisfy the preset condition, so that the efficiency of determining the training image is improved, and further, the generalization of the trained living body face detection model is improved.
With continued reference to fig. 4, fig. 4 shows a decomposition flow 400 of the generation step of the initial image set of the training method of the living body face detection model according to the present disclosure. The generating step of the initial image set comprises:
step 401, an original image set is obtained.
In this embodiment, the executing body obtains an original image set, where the original image set is the latest online image set.
Step 402, preprocessing the original image for each original image in the original image set to obtain a corresponding face image.
In this embodiment, for each original image in the original image set acquired in step 401, the executing body preprocesses the original image to obtain a processed face image. There are various image preprocessing methods, including but not limited to at least one of the following: face detection, face alignment, normalization, data enhancement, and the like. Preprocessing the original image eliminates irrelevant information, recovers useful real information, enhances the detectability of relevant information, and simplifies the data to the maximum extent, thereby improving the reliability of feature extraction, image recognition, and image detection.
In some optional implementations of this embodiment, step 402 includes: carrying out face detection on the original image by using a face detection model to obtain a first image containing a face; performing face key point detection on the first image by using a face key point detection model to obtain face key point coordinates in the first image; and carrying out face alignment on the face in the first image based on the face key point coordinates to obtain a corresponding face image.
In this implementation, a face is first defined to include 72 key points, represented as (x1, y1), (x2, y2), ..., (x72, y72). The executing body then performs face detection on the original image using a pre-trained face detection model, that is, a model that detects the position of a face, to obtain an image containing a face, recorded as the first image.
Then, based on the detected face region, the executing body detects the face key points in the first image using a pre-trained face key point detection model to obtain the coordinates of the face key points in the first image. The face key point detection model is an existing model; inputting the first image into it yields the 72 face key point coordinates (x1, y1), (x2, y2), ..., (x72, y72).
Then, the executing body performs face alignment on the target face in the first image according to the key point coordinates to obtain a corresponding face image. Specifically, from the 72 key point coordinates, the minimum and maximum values of x and y can be obtained, denoted x_min, x_max, y_min, and y_max. A face box can then be determined from these extremes, e.g., the rectangle with diagonal vertices (x_min, y_min) and (x_max, y_max). The face box is enlarged by a preset multiple, for example 3 times, and the face image is cropped out. The cropped face image is then resized to a preset size, for example 224 x 224, to obtain the corresponding face image.
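Assuming the image is a NumPy array and using nearest-neighbour resizing to avoid external dependencies, this box-expansion-and-crop step might look like the following sketch (all names are hypothetical; the disclosure does not mandate a particular resizing method):

```python
import numpy as np

def align_face_crop(image, landmarks, expand=3.0, out_size=224):
    """Crop and resize a face region from 72 key point coordinates.

    image: (H, W, C) array; landmarks: (72, 2) array of (x, y) points.
    The box spanning the landmark extremes is enlarged `expand` times
    around its centre, clipped to the image, cropped, and resized.
    """
    pts = np.asarray(landmarks, dtype=float)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * expand / 2.0
    half_h = (y_max - y_min) * expand / 2.0
    x0, x1 = max(int(cx - half_w), 0), min(int(cx + half_w) + 1, image.shape[1])
    y0, y1 = max(int(cy - half_h), 0), min(int(cy + half_h) + 1, image.shape[0])
    crop = image[y0:y1, x0:x1]
    # nearest-neighbour resize to out_size x out_size
    ry = np.arange(out_size) * crop.shape[0] // out_size
    rx = np.arange(out_size) * crop.shape[1] // out_size
    return crop[ry][:, rx]
```

Enlarging the box before cropping keeps surrounding context (hairline, background edges) in the crop, which is often useful for distinguishing a real face from a photo of a face.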
Through the above steps, irrelevant information in the original image can be eliminated, useful real information can be recovered, the detectability of relevant information can be enhanced, and the data can be simplified to the maximum extent, thereby improving the reliability of feature extraction, image recognition, and image detection.
Step 403, performing normalization processing on the face image.
In this embodiment, the executing body may normalize the face image. Specifically, the executing body may normalize the pixels in the face image, for example by subtracting 128 from each pixel value and dividing by 256, so that each pixel value lies in [-0.5, 0.5]. Normalization simplifies the data to the maximum extent.
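The per-pixel normalization described above is simply (for 8-bit pixel values, mapping [0, 255] to roughly [-0.5, 0.5]):

```python
import numpy as np

def normalize(pixels):
    """Map 8-bit pixel values [0, 255] to roughly [-0.5, 0.5]."""
    return (np.asarray(pixels, dtype=np.float32) - 128.0) / 256.0
```

For example, pixel values 0, 128, and 255 map to -0.5, 0.0, and about 0.496 respectively.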
And step 404, performing random data enhancement processing on the normalized face image to obtain an initial image.
In this embodiment, the executing body may perform random data enhancement on the normalized face image to obtain an initial image, where the data enhancement may include various operations such as flipping, rotation, cropping, deformation, and scaling. Data enhancement can improve the detectability of the relevant information.
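A minimal sketch of random enhancement on a square face image follows; the particular operations (horizontal flip, 90-degree rotations) and probabilities are illustrative choices, not mandated by the text:

```python
import random
import numpy as np

def random_augment(image, seed=None):
    """Randomly flip and rotate a square (H, H, C) image."""
    rng = random.Random(seed)
    out = np.asarray(image)
    if rng.random() < 0.5:
        out = out[:, ::-1]                  # horizontal flip
    out = np.rot90(out, rng.randrange(4))   # rotate by 0/90/180/270 degrees
    return out
```

Restricting rotations to multiples of 90 degrees keeps the output shape equal to the input shape for square crops such as 224 x 224.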
As can be seen from fig. 4, in the above-mentioned method for generating an initial image set, by performing image preprocessing, face detection, face alignment, normalization processing, data enhancement processing, etc. on an original image, irrelevant information in the original image can be eliminated, useful real information can be recovered, detectability of relevant information can be enhanced, and data can be simplified to the maximum extent, thereby improving reliability of detection on the generated initial image.
With continuing reference to fig. 5, a flow 500 of one embodiment of a live face detection method according to the present disclosure is shown. The living human face detection method comprises the following steps:
and step 501, acquiring an image to be detected.
In the present embodiment, the execution subject of the living body face detection method (for example, the server 105 shown in fig. 1) may acquire an image to be detected. The image to be detected may be an image obtained by shooting a human face. The face may be a real face or a false face.
And 502, inputting an image to be detected into a pre-trained living body face detection model, and outputting to obtain a living body face detection result.
In this embodiment, the execution subject may input the image to be detected into a pre-trained living body face detection model, and output the result to obtain a living body face detection result, where the living body face detection model may be used to detect whether a living body exists in the image, and is obtained by training using the training method provided in any one of embodiments of fig. 1 to 4, which is not described herein again.
The living body face detection method provided by this embodiment of the present disclosure first acquires an image to be detected, then inputs the image into a pre-trained living body face detection model and outputs a living body face detection result. Because the image to be detected is analyzed by the pre-trained living body face detection model, the accuracy of the resulting living body face detection result is improved.
With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for training a living body face detection model, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 6, the training apparatus 600 of the living body face detection model of the present embodiment includes: a first acquisition module 601, a determination module 602, an addition module 603, and a training module 604. The first obtaining module 601 is configured to obtain an initial image set; the determining module 602 is configured to detect each initial image in the initial image set by using at least one detection model trained in advance, and determine a training image set based on a detection result; the adding module 603 is configured to add the training image set to the initial training set to obtain a target training set; and the training module 604 is configured to train the initial living body face detection model with the target training set, resulting in a trained living body face detection model.
In the present embodiment, in the training apparatus 600 for a living body face detection model: the specific processes of the first obtaining module 601, the determining module 602, the adding module 603, and the training module 604 and the technical effects thereof can be referred to the related descriptions of steps 201 to 204 in the corresponding embodiment of fig. 2, and are not described herein again.
In some optional implementations of this embodiment, the determining module includes: a determining sub-module configured to determine the initial image as a training image in response to the detection result of each detection model in the at least one detection model on the initial image satisfying a preset condition, so as to obtain a training image set.
In some optional implementations of this embodiment, the determining sub-module includes: a first labeling unit configured to label the pseudo label of the initial image as a living body in response to the detection score of each detection model in the at least one detection model on the initial image being greater than a preset living body threshold; a second labeling unit configured to label the pseudo label of the initial image as an attack in response to the detection score of each detection model in the at least one detection model on the initial image being smaller than a preset attack threshold; and a determination unit configured to determine the initial images whose pseudo labels are labeled as the living body and the attack as training images.
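The two labeling rules above amount to an ensemble agreement filter. The sketch below is a hedged illustration: the 0.9 living body threshold and 0.1 attack threshold are example values (the disclosure only requires preset thresholds), and images on which the detection models disagree receive no pseudo label and are discarded.

```python
def pseudo_label(scores, live_threshold=0.9, attack_threshold=0.1):
    # "Living body" only if every detection model scores above the live
    # threshold; "attack" only if every model scores below the attack
    # threshold; otherwise the image is too ambiguous to pseudo-label.
    if all(s > live_threshold for s in scores):
        return "live"
    if all(s < attack_threshold for s in scores):
        return "attack"
    return None

def build_training_set(initial_images, detection_models, **thresholds):
    # Keep only images that all models agree on, paired with the pseudo label.
    labeled = []
    for image in initial_images:
        label = pseudo_label([m(image) for m in detection_models], **thresholds)
        if label is not None:
            labeled.append((image, label))
    return labeled
```

Because the detection models are trained on different data sets, requiring unanimous agreement keeps only high-confidence pseudo labels, which is what makes it safe to merge the resulting training image set into the initial training set.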
In some optional implementations of this embodiment, the initial image set includes at least one initial image; the training apparatus of the living body face detection model further comprises an initial image generation module, wherein the initial image generation module includes: an acquisition sub-module configured to acquire an original image set; a preprocessing sub-module configured to preprocess, for each original image in the original image set, the original image to obtain a corresponding face image; a normalization sub-module configured to perform normalization processing on the face image; and a data enhancement sub-module configured to perform random data enhancement processing on the normalized face image to obtain an initial image.
In some optional implementations of this embodiment, the preprocessing sub-module includes: the face detection unit is configured to perform face detection on the original image by using a face detection model to obtain a first image containing a face; the face key point detection unit is configured to perform face key point detection on the first image by using the face key point detection model to obtain face key point coordinates in the first image; and the face alignment unit is configured to perform face alignment on the face in the first image based on the face key point coordinates to obtain a corresponding face image.
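The face alignment performed by the units above can be realized with a two-point similarity transform estimated from detected key points. A pure-Python sketch under stated assumptions: the two source points are, for example, the detected eye key points, and the destination points are canonical eye positions in the aligned crop chosen here for illustration only.

```python
import math

def similarity_transform(src_pts, dst_pts):
    """Estimate the 2x3 similarity matrix mapping two source points
    (e.g. detected eye key points) onto two destination points
    (e.g. canonical eye positions in the aligned face crop)."""
    (sx1, sy1), (sx2, sy2) = src_pts
    (dx1, dy1), (dx2, dy2) = dst_pts
    # Scale is the ratio of segment lengths; rotation is the angle difference.
    scale = math.hypot(dx2 - dx1, dy2 - dy1) / math.hypot(sx2 - sx1, sy2 - sy1)
    angle = math.atan2(dy2 - dy1, dx2 - dx1) - math.atan2(sy2 - sy1, sx2 - sx1)
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    # Translation chosen so the first source point maps exactly onto its target.
    tx = dx1 - (a * sx1 - b * sy1)
    ty = dy1 - (b * sx1 + a * sy1)
    return [[a, -b, tx], [b, a, ty]]

def apply_transform(matrix, point):
    # Apply the 2x3 affine matrix to a single (x, y) coordinate.
    (m00, m01, m02), (m10, m11, m12) = matrix
    x, y = point
    return (m00 * x + m01 * y + m02, m10 * x + m11 * y + m12)
```

Passing the estimated matrix to a warp routine (for example OpenCV's `warpAffine`) then resamples the first image into the aligned face image; with more than two key points, a least-squares fit over all of them is the usual refinement.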
With further reference to fig. 7, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a living body face detection apparatus, which corresponds to the embodiment of the method shown in fig. 5, and which can be applied in various electronic devices.
As shown in fig. 7, the living body face detection apparatus 700 of the present embodiment includes: a second acquisition module 701 and a detection module 702. The second obtaining module 701 is configured to obtain an image to be detected; and the detection module 702 is configured to input the image to be detected into a pre-trained living body face detection model, and output a living body face detection result.
In the present embodiment, in the living body face detection apparatus 700: the specific processing of the second obtaining module 701 and the detecting module 702 and the technical effects thereof can refer to the related descriptions of step 501 and step 502 in the corresponding embodiment of fig. 5, which are not repeated herein.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the training method of the living body face detection model or the living body face detection method. For example, in some embodiments, the training method of the living body face detection model or the living body face detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method of the living body face detection model or the living body face detection method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the training method of the living body face detection model or the living body face detection method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A training method of a living body face detection model comprises the following steps:
acquiring an initial image set;
detecting each initial image in the initial image set by using at least one detection model trained in advance, and determining a training image set based on a detection result, wherein the training data sets of all the detection models in the at least one detection model are different;
adding the training image set into an initial training set to obtain a target training set, wherein the initial training set is a training data set of an initial living body face detection model;
and training the initial living body face detection model by using the target training set to obtain a trained living body face detection model.
2. The method according to claim 1, wherein the determining a training image set based on the detection result comprises:
in response to the detection result of each detection model in the at least one detection model on the initial image satisfying a preset condition, determining the initial image as a training image to obtain a training image set.
3. The method according to claim 2, wherein the determining the initial image as a training image in response to the detection result of each of the at least one detection model on the initial image satisfying a preset condition comprises:
marking the pseudo label of the initial image as a living body in response to the detection score of each detection model of the at least one detection model on the initial image being greater than a preset living body threshold value;
in response to that the detection score of each detection model in the at least one detection model on the initial image is smaller than a preset attack threshold value, marking the pseudo label of the initial image as an attack;
and determining the initial image marked with the false label as the living body and the attack as the training image.
4. The method of claim 1, wherein the initial image set includes at least one initial image; and
the initial image is obtained by the following steps:
acquiring an original image set;
preprocessing each original image in the original image set to obtain a corresponding face image;
carrying out normalization processing on the face image;
and carrying out random data enhancement processing on the normalized human face image to obtain the initial image.
5. The method of claim 4, wherein the preprocessing the original image to obtain a corresponding face image comprises:
carrying out face detection on the original image by using a face detection model to obtain a first image containing a face;
performing face key point detection on the first image by using a face key point detection model to obtain face key point coordinates in the first image;
and carrying out face alignment on the face in the first image based on the face key point coordinates to obtain a corresponding face image.
6. A living body face detection method comprises the following steps:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained living body face detection model, and outputting to obtain a living body face detection result, wherein the living body face detection model is obtained by training according to the method of any one of claims 1 to 5.
7. A training apparatus for a living body face detection model, comprising:
a first acquisition module configured to acquire an initial set of images;
a determining module configured to detect each initial image in the initial image set by using at least one detection model trained in advance, and determine a training image set based on a detection result, wherein the training data sets of all the detection models in the at least one detection model are different;
an adding module configured to add the training image set to an initial training set to obtain a target training set, wherein the initial training set is a training data set of an initial living human face detection model;
and the training module is configured to train the initial living body face detection model by using the target training set to obtain a trained living body face detection model.
8. The apparatus of claim 7, wherein the means for determining comprises:
the determining sub-module is configured to determine the initial image as a training image to obtain a training image set in response to that the detection result of each detection model of the at least one detection model on the initial image meets a preset condition.
9. The apparatus of claim 8, wherein the determination submodule comprises:
a first labeling unit configured to label the pseudo label of the initial image as a living body in response to a detection score of each of the at least one detection model on the initial image being greater than a preset living body threshold;
a second labeling unit configured to label the pseudo label of the initial image as an attack in response to a detection score of each of the at least one detection model on the initial image being smaller than a preset attack threshold;
a determination unit configured to determine an initial image in which the pseudo tag is labeled as a living body and an attack as a training image.
10. The apparatus of claim 7, wherein the initial image set includes at least one initial image; and the apparatus further comprises an initial image generation module comprising:
an acquisition sub-module configured to acquire an original image set;
the preprocessing submodule is configured to preprocess each original image in the original image set to obtain a corresponding face image;
a normalization submodule configured to normalize the face image;
and the data enhancement sub-module is configured to perform random data enhancement processing on the normalized face image to obtain the initial image.
11. The apparatus of claim 10, wherein the pre-processing sub-module comprises:
the face detection unit is configured to perform face detection on the original image by using a face detection model to obtain a first image containing a face;
the face key point detection unit is configured to perform face key point detection on the first image by using a face key point detection model to obtain face key point coordinates in the first image;
and the face alignment unit is configured to perform face alignment on the face in the first image based on the face key point coordinates to obtain a corresponding face image.
12. A living body face detection apparatus comprising:
a second acquisition module configured to acquire an image to be detected;
a detection module configured to input the image to be detected into a pre-trained living body face detection model, and output a living body face detection result, wherein the living body face detection model is trained by the method according to any one of claims 1 to 5.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202111417758.1A 2021-11-26 2021-11-26 Training method, device and equipment of living human face detection model and storage medium Pending CN114093006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111417758.1A CN114093006A (en) 2021-11-26 2021-11-26 Training method, device and equipment of living human face detection model and storage medium


Publications (1)

Publication Number Publication Date
CN114093006A true CN114093006A (en) 2022-02-25

Family

ID=80304672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111417758.1A Pending CN114093006A (en) 2021-11-26 2021-11-26 Training method, device and equipment of living human face detection model and storage medium

Country Status (1)

Country Link
CN (1) CN114093006A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798002A (en) * 2022-11-24 2023-03-14 北京的卢铭视科技有限公司 Face detection method, system, electronic device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination