CN113283388B - Training method, device, equipment and storage medium of living body face detection model - Google Patents


Info

Publication number
CN113283388B
CN113283388B (application CN202110703241.2A; other publication CN113283388A)
Authority
CN
China
Prior art keywords
face picture
detection model
picture sample
sample
predicted value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110703241.2A
Other languages
Chinese (zh)
Other versions
CN113283388A
Inventor
Yu Chenxi (喻晨曦)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202110703241.2A
Publication of CN113283388A
Application granted
Publication of CN113283388B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a training method for a living face detection model, applied in the field of artificial intelligence, which addresses the technical problem that a binary classification model trained on extremely imbalanced positive and negative samples fails to reach the required prediction accuracy. The method comprises: acquiring a labeled face picture sample set; inputting the normal face picture samples and the abnormal face picture samples into a living face detection model to be trained to obtain original predicted values; correcting the original predicted values of the abnormal face picture samples to obtain corrected predicted values; training the model with the corrected predicted value as the predicted value of each abnormal face picture sample and the original predicted value as the predicted value of each normal face picture sample, taking as target the category to which each face picture sample actually belongs as identified in its label; and obtaining the trained living face detection model when its loss function converges.

Description

Training method, device, equipment and storage medium of living body face detection model
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a training method, apparatus, device, and storage medium for a living face detection model.
Background
In anti-fraud risk control, intelligent customer service scenarios include certificate fraud prevention, face liveness detection, identification of high-risk clients/users from client history data, big-data screening for kickbacks, and the like. These scenarios are common in security risk control. Moreover, they all exhibit extremely imbalanced sample distributions: in many projects the target-sample rate of some risk-control scenarios is even below 2%, and the difficulty caused by this extreme imbalance greatly hinders improvement of model performance.
Taking the face liveness detection model as an example: because the number of forged (non-live) face samples is far smaller than the number of normal live face samples, training and testing a binary classification model for intelligent liveness detection suffers from a lack of sufficient real business data. Most publicly available laboratory data is inconsistent in distribution with real business scenarios, so a model trained on it fails in the real scene. Under the current approach of directly training a supervised machine-learning model, a scenario in which the target samples make up only 0.3% of the training data, or in which the positive-to-negative sample ratio is at most 1:200, is insufficient to reach a reasonably good accuracy for the training task.
Disclosure of Invention
The embodiments of the invention provide a training method, apparatus, computer device, and storage medium for a living face detection model, to solve the technical problem that a binary classification model trained on extremely imbalanced positive and negative samples fails to reach the required prediction accuracy.
A training method of a living face detection model, the method comprising:
Acquiring a face picture sample set carrying a label, wherein the face picture sample set comprises normal face picture samples and abnormal face picture samples, and the number of the normal face picture samples is larger than that of the abnormal face picture samples;
Inputting the normal face picture sample and the abnormal face picture sample into a living body face detection model to be trained to obtain an original predicted value of the normal face picture sample and an original predicted value of the abnormal face picture sample;
Correcting the original predicted value of the abnormal face picture sample to obtain the corrected predicted value of the abnormal face picture sample;
Taking the correction predicted value as a predicted value of the living body face detection model to be trained on the abnormal face picture sample, taking an original predicted value of the normal face picture sample as a predicted value of the living body face detection model to be trained on the normal face picture sample, taking a normal category or an abnormal category, which is identified in the label and to which the corresponding face picture sample actually belongs, as a target, and training the living body face detection model to be trained;
and when the loss function of the living body face detection model converges, obtaining the trained living body face detection model.
A training apparatus for a living face detection model, the apparatus comprising:
The sample acquisition module is used for acquiring a face picture sample set carrying labels, wherein the face picture sample set comprises normal face picture samples and abnormal face picture samples, and the number of the normal face picture samples is greater than the number of the abnormal face picture samples;
the input module is used for inputting the normal face picture sample and the abnormal face picture sample into a living body face detection model to be trained to obtain an original predicted value of the normal face picture sample and an original predicted value of the abnormal face picture sample;
The correction processing module is used for carrying out correction processing on the original predicted value of the abnormal face picture sample to obtain the corrected predicted value of the abnormal face picture sample;
The training module is used for taking the correction predicted value as the predicted value of the living body face detection model to be trained on the abnormal face picture sample, taking the original predicted value of the normal face picture sample as the predicted value of the living body face detection model to be trained on the normal face picture sample, taking the normal category or the abnormal category of the corresponding face picture sample identified in the label as the target, and training the living body face detection model to be trained;
And the convergence module is used for obtaining the trained living body face detection model when the loss function of the living body face detection model converges.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the training method of the living face detection model described above when the computer program is executed.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the training method of a living face detection model described above.
According to the training method, apparatus, computer device, and storage medium for a living face detection model, a labeled face picture sample set is first acquired, in which the number of normal face picture samples is greater than the number of abnormal face picture samples. The normal and abnormal face picture samples are input into the living face detection model to be trained to obtain their original predicted values, and the original predicted values of the abnormal face picture samples, whose number is small, are corrected. The corrected predicted value is used as the model's predicted value for each abnormal face picture sample, the original predicted value is used as the model's predicted value for each normal face picture sample, and the model is trained with the normal or abnormal category to which each face picture sample actually belongs, as identified in its label, as the target. Because the correction makes the model attend more to the feature information of the scarce abnormal face picture samples during training, the trained living face detection model is more sensitive when detecting abnormal face pictures, which improves its detection accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application environment of a training method of a living face detection model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training method of a living face detection model according to an embodiment of the present invention;
FIG. 3 is a flowchart of a training method of a living face detection model according to another embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a living body face detection apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a computer device in accordance with an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The training method of the living body face detection model provided by the application can be applied to an application environment as shown in figure 1, wherein the computer equipment can communicate with a server through a network. The computer device may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, among others. The server may be implemented as a stand-alone server or as a cluster of servers.
In one embodiment, as shown in fig. 2, a training method of a living body face detection model is provided, and the method is applied to the computer device in fig. 1 for illustration, and includes the following steps:
S101, acquiring a face picture sample set carrying a label, wherein the face picture sample set comprises normal face picture samples and abnormal face picture samples, and the number of the normal face picture samples is larger than that of the abnormal face picture samples.
It will be appreciated that the label marks whether a face picture sample is a normal sample or an abnormal sample. An abnormal face picture sample represents a forged live face picture, for example a re-shot photo or a frame from a recorded video, whereas a normal face picture sample is a live face picture shot of the user's current, live state.
S102, inputting the normal face picture sample and the abnormal face picture sample into a living body face detection model to be trained, and obtaining an original predicted value of the normal face picture sample and an original predicted value of the abnormal face picture sample.
It can be understood that the obtained original predicted value of the abnormal face picture sample is a result of forward propagation of the abnormal face picture sample by the living body face detection model to be trained, and the obtained original predicted value of the normal face picture sample is a result of forward propagation of the normal face picture sample by the living body face detection model to be trained.
In one embodiment, the original predicted value may be represented by an abnormal probability, where the obtained original predicted value of the normal face picture sample is the predicted probability that the normal face picture sample obtained by inputting the normal face picture sample to the living face detection model to be trained is an abnormal picture, and the obtained original predicted value of the abnormal face picture sample is the predicted probability that the abnormal face picture sample obtained by inputting the abnormal face picture sample to the living face detection model to be trained is an abnormal picture.
In other embodiments, the original predicted value may also be represented by a normal probability, where the obtained original predicted value of the normal face picture sample is the predicted probability that the normal face picture sample obtained by inputting the normal face picture sample to the living face detection model to be trained is a normal picture, and the obtained original predicted value of the abnormal face picture sample is the predicted probability that the abnormal face picture sample obtained by inputting the abnormal face picture sample to the living face detection model to be trained is a normal picture.
In one embodiment, the living face detection model may be any binary or multi-class model suitable for supervised learning. Since the scheme targets live faces, it is applicable to a wide range of backbone models, such as MobileNet, ResNet, VGGNet, and the like. In a multi-classification setting, a model with a classifier module may also be used.
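For illustration, a minimal PyTorch sketch of such a detector follows. The ResNet-18 backbone, the single-logit head, and all names are assumptions for the sketch; the patent only names backbone families and does not prescribe an architecture.

```python
# Minimal sketch of a liveness classifier on a generic backbone.
# The backbone choice (ResNet-18) and head shape are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class LivenessDetector(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any of the named backbones works
        backbone.fc = nn.Identity()               # strip the ImageNet head
        self.backbone = backbone
        self.classifier = nn.Linear(512, 1)       # single logit: abnormal vs normal

    def forward(self, x):
        feat = self.backbone(x)
        return torch.sigmoid(self.classifier(feat))  # predicted abnormality probability

model = LivenessDetector()
p = model(torch.randn(4, 3, 224, 224))  # original predicted values for a batch
```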
In one embodiment, when the normal face picture sample and the abnormal face picture sample are input to the living face detection model to be trained, the face picture samples can be input to the living face detection model in batches by randomly extracting the normal face picture sample and the abnormal face picture sample.
Further, the face picture sample set containing the normal and abnormal face picture samples can be randomly sampled in a miniBatch manner, so that the whole face sample set D is finally divided into k sample sets, as sketched below.
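A minimal sketch of this splitting step, assuming D is a list of (image, label) pairs; the function name, seeding, and equal-size policy are illustrative choices.

```python
import random

def minibatch_split(dataset, k, seed=0):
    """Randomly shuffle the full face sample set D and split it into k
    mini-batches mixing normal and abnormal samples."""
    rng = random.Random(seed)
    samples = list(dataset)
    rng.shuffle(samples)
    size = (len(samples) + k - 1) // k  # ceiling division so nothing is dropped
    return [samples[i * size:(i + 1) * size] for i in range(k)]

# Usage (illustrative): batches = minibatch_split(D, k=32)
```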
And S103, performing correction processing on the original predicted value of the abnormal face picture sample to obtain the corrected predicted value of the abnormal face picture sample.
In one embodiment, the step of correcting the predicted value of the abnormal face picture sample to obtain the corrected predicted value of the abnormal face picture sample further includes:
Calculating the correction predicted value of the abnormal face picture sample through the following formula:
wherein mid_i represents the corrected predicted value of abnormal face picture sample i, P represents the original predicted value of the abnormal face picture sample, C represents a hyperparameter, and Max represents the number of samples in the class with the largest sample size.
In this embodiment, the classes of the samples include a normal class and an abnormal class, and since the number of normal samples is much larger than the number of abnormal samples in this embodiment, the Max represents the number of normal face picture samples.
Correcting the original predicted values of the abnormal face picture samples makes the model attend more to the abnormal class, whose sample size is small, when its parameters are adjusted, which improves the living face detection model's recognition precision on abnormal (non-live) faces.
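The exact correction formula is not reproduced in this text, so the sketch below uses an LDAM-style class-size-dependent margin as a stand-in. The functional form, the role of Max, and the value of C are all assumptions, not the patent's formula.

```python
def correct_prediction(p, n_i, c=0.5):
    # ASSUMPTION: the patent's correction formula is not reproduced in the
    # source text; C / n_i ** 0.25 is an LDAM-style margin used as a stand-in.
    # Smaller classes (small n_i) get a larger margin, so the loss on abnormal
    # samples grows and the model attends to them more during updates.
    mid = p - c / (n_i ** 0.25)
    return max(mid, 0.0)  # keep the corrected value a valid probability

# e.g. with 300 abnormal samples: correct_prediction(0.8, n_i=300)
```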
S104, taking the correction predicted value as a predicted value of the living body face detection model to be trained on the abnormal face picture sample, taking an original predicted value of the normal face picture sample as a predicted value of the living body face detection model to be trained on the normal face picture sample, taking a normal category or an abnormal category of the corresponding face picture sample identified in the label as a target, and training the living body face detection model to be trained.
It is understood that the predicted value of the abnormal face picture sample and the predicted value of the normal face picture sample may be expressed as:
wherein (x, y) represents a sample x with label y in the first sample set, LD(·) represents a label-distribution-adaptive loss function, f_θ represents the living face detection model with parameter θ to be learned, y = 1 indicates an abnormal face picture sample, y = 0 indicates a normal face picture sample, and P' represents the predicted value of a normal face picture sample.
In one embodiment, after the step of obtaining the normal face picture sample and the abnormal face picture sample carrying the tag, the method further comprises:
randomly extracting a first sample set from a face picture sample set, wherein the first sample set comprises the normal face picture sample and the abnormal face picture sample;
Randomly extracting a second sample set from the face picture sample set, wherein the second sample set comprises the normal face picture sample and the abnormal face picture sample;
further, the step of training the living body face detection model to be trained by taking the normal category or the abnormal category to which the corresponding face picture sample identified in the label actually belongs as a target includes:
Selecting a first-stage loss function, and performing first-stage training on the living body face detection model through the first sample set;
And when the first-stage loss function converges, selecting a second-stage loss function, and continuing the second-stage training of the living face detection model through the second sample set.
In one embodiment, the first-stage loss function is:

$$\mathcal{L}_1(f_\theta) = \frac{1}{m} \sum_{(x,y) \in D_1} LD\big((x,y), f_\theta\big)$$

wherein f_θ represents the living face detection model with parameter θ to be learned in the first stage, (x, y) represents a sample x with label y in the first sample set D_1, m represents the number of samples in the first sample set, and LD((x, y), f_θ) represents the predicted value of the living face detection model on face picture sample x in the first stage.
Whether the first-stage loss function has converged is judged through the following formula:

$$f_1^{\theta} = f_1^{\theta'}, \qquad \theta' = \theta - \alpha_1 \nabla_\theta \mathcal{L}_1(f_\theta)$$

wherein ∇_θ denotes the gradient with respect to the parameter θ, α₁ represents the learning rate of the first-stage loss function, and f_1^{θ'} represents the output of the living face detection model in the first stage when the most recently adjusted parameter is θ'. When the absolute value of the difference between the left side f and the right side of the equation is within a preset first range, the first-stage loss function is judged to have converged.
It will be appreciated that when the gradient with respect to θ becomes small enough during first-stage training, f_1^θ and f_1^{θ'} become arbitrarily close, which shows that the first-stage loss function has converged.
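A minimal sketch of this criterion, assuming a PyTorch model and a single plain SGD step; the tolerance value and probe batch stand in for the "preset first range" and are illustrative.

```python
import torch

def sgd_step_converged(model, loss_fn, batch, lr, tol=1e-4):
    """One SGD step theta' = theta - lr * grad; convergence is declared when
    |f(theta)(x) - f(theta')(x)| stays within tol on the probe batch."""
    x, y = batch
    out_before = model(x).detach()
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad       # theta' = theta - alpha_1 * gradient
    out_after = model(x).detach()
    return (out_before - out_after).abs().max().item() < tol
```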
In one embodiment, the second-stage loss function is:

$$\mathcal{L}_2(f_\theta) = \frac{1}{m} \sum_{(x',y') \in D_2} \frac{1}{n_i} LD\big((x',y'), f_\theta\big)$$

wherein f_θ represents the living face detection model with parameter θ to be learned in the second stage, (x', y') represents a sample x' with label y' in the second sample set D_2, n_i represents the total number of samples in the i-th class, m represents the number of samples in the second sample set, and LD((x', y'), f_θ) represents the predicted value of the living face detection model on face picture sample x' in the second stage.
It will be appreciated that when label y' marks a normal face picture sample, n_i represents the total number of normal face picture samples in the second sample set, and when label y' marks an abnormal face picture sample, n_i represents the total number of abnormal face picture samples in the second sample set.
Further, whether the second-stage loss function has converged is judged through the following formula:

$$f_2^{\theta} = f_2^{\theta'}, \qquad \theta' = \theta - \alpha \nabla_\theta \mathcal{L}_2(f_\theta)$$

wherein ∇_θ denotes the gradient with respect to the parameter θ, α represents the learning rate of the second-stage loss function, n_i represents the total number of samples in the i-th class, and f_2^{θ'} represents the output of the living face detection model in the second stage when the most recently adjusted parameter is θ'. When the absolute value of the difference between the left side f and the right side of the equation is within a preset second range, the second-stage loss function is judged to have converged.
It will be appreciated that when the gradient with respect to θ becomes small enough during second-stage training, f_2^θ and f_2^{θ'} become arbitrarily close, which shows that the second-stage loss function has converged.
wherein the learning rate is scaled geometrically per iteration, α_t = α'/γ^t with t the iteration index, α' represents the initialized learning rate of the second-stage loss function, and γ is a preset constant with γ > 1.
In one embodiment, γ has a value of, for example, 1.1, 1.2, 1.3, etc.
In one embodiment, α' = α₁.
According to the second-stage loss function, the learning rate α is scaled geometrically, and the geometrically scaled learning rate is used as the decay at each iteration of the second-stage loss function. This adjusts the learning speed and accelerates the convergence of the living face detection model in the second stage.
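A minimal sketch of such an equal-ratio (geometric) schedule. The per-iteration division by γ is an assumption consistent with the description above, not a formula reproduced from the patent.

```python
def geometric_lr(alpha_init, gamma, step):
    """Equal-ratio (geometric) decay of the second-stage learning rate.
    ASSUMPTION: per-iteration division by gamma (> 1) stands in for the
    patent's 'equal-ratio scaling' description."""
    return alpha_init / (gamma ** step)

# e.g. gamma = 1.1: the rate shrinks by roughly 9% each iteration
# lrs = [geometric_lr(0.01, 1.1, t) for t in range(5)]
```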
Further, in order to increase the convergence speed of the loss function of the living face detection model and shorten the training time of the living face detection model, the step of performing the first stage training on the living face detection model through the first sample set further includes:
randomly extracting a first sample subset and a second sample subset from the first sample set;
Selecting the first stage loss function and the first sample subset, and training the living body face detection model by selecting an adaptive learning rate mechanism;
When the first-stage loss function converges, the second sample subset and a cosine-decay learning rate are selected to continue training the living face detection model;
and when the first-stage loss function is converged again, judging that the living body face detection model is trained in the first stage.
It is understood that the cosine decay learning rate may also be referred to as a cosine learning rate.
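For reference, a minimal sketch of the cosine-decay (cosine) learning rate mentioned above; the floor value alpha_min is an illustrative addition, not taken from the patent.

```python
import math

def cosine_lr(alpha_init, step, total_steps, alpha_min=0.0):
    """Standard cosine-decay schedule, as used for the second sample subset
    in the first training stage."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return alpha_min + (alpha_init - alpha_min) * cos
```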
And S105, when the loss function of the living body face detection model converges, obtaining the trained living body face detection model.
Further, when the loss function of the living body face detection model converges, the step of obtaining the trained living body face detection model specifically includes:
and when the loss function of the second stage of the living body face detection model converges, obtaining the trained living body face detection model.
In the training method of the living face detection model provided by this embodiment, a labeled face picture sample set is first acquired, in which the number of normal face picture samples is greater than the number of abnormal face picture samples. The normal and abnormal face picture samples are input into the living face detection model to be trained to obtain their original predicted values, and the original predicted values of the abnormal face picture samples, whose number is small, are corrected to obtain corrected predicted values. The corrected predicted value is used as the model's predicted value for each abnormal face picture sample, the original predicted value is used as the model's predicted value for each normal face picture sample, and the model is trained with the normal or abnormal category to which each face picture sample actually belongs, as identified in its label, as the target. The living face detection model therefore attends more to the feature information of the scarce abnormal face picture samples during training, so the trained model has higher recognition sensitivity when detecting abnormal face pictures, which improves its detection accuracy.
Further, when unlabeled face picture samples are newly added, the living face detection model continues to be trained and its parameters updated. Fig. 3 is a flowchart of a training method of a living face detection model according to another embodiment of the present invention. As shown in fig. 3, in order to reduce the workload of manually labeling face picture samples, after the second-stage loss function of the living face detection model converges, the method further includes the following steps S301 to S309:
S301, acquiring an unlabeled face picture sample without a label and an initial label preset value of the unlabeled face picture sample;
s302, inputting the unlabeled face picture sample into the living face detection model trained in the second stage to obtain a first pseudo label of the unlabeled face picture sample;
S303, calculating the loss of the initial pseudo tag according to the first round pseudo tag and the initial tag preset value;
S304, calculating the loss of the living body face detection model according to the loss of the initial pseudo tag and the loss of the second-stage loss function;
s305, reversely adjusting parameters of the living body face detection model according to the calculated loss of the living body face detection model;
S306, inputting the unlabeled face picture sample into the living face detection model again for forward propagation, to obtain a current pseudo label of the unlabeled face picture sample;
S307, calculating the current pseudo-label loss according to the current pseudo label and the pseudo label obtained in the immediately preceding forward pass;
S308, calculating the loss of the living face detection model according to the current pseudo-label loss and the loss of the second-stage loss function;
S309, when the loss of the living face detection model has not converged, repeating the steps from reversely adjusting the parameters of the living face detection model through calculating the loss of the living face detection model, until the loss of the living face detection model converges, thereby obtaining the trained living face detection model.
In one embodiment, the absolute value of the difference between the current pseudo label and the pseudo label obtained in the immediately preceding forward pass of the living face detection model is used as the current pseudo-label loss; when the current pseudo label is obtained from the model's first forward pass, the absolute value of the difference between the current pseudo label and the initial label preset value is used instead.
The current pseudo-label loss can be expressed mathematically, for example, as:

$$\mathcal{L}_{pl} = \sum_i \left| y_i' - y_i'' \right|$$

wherein y_i' represents the current pseudo label and y_i'' represents the pseudo label obtained in the immediately preceding forward pass; when y_i' is the pseudo label obtained from the first forward pass, y_i'' is the initial label preset value.
In one embodiment, the step of calculating the loss of the living face detection model from the first-round pseudo labels and the initial label preset values further includes calculating the loss of the living face detection model through the following formula:

$$\mathcal{L} = \mathcal{L}_2 + \mu \, \mathcal{L}_{pl}$$

wherein ℒ₂ represents the loss of the second-stage loss function, ℒ_pl represents the current pseudo-label loss, and μ represents a preset parameter.
Performing semi-supervised learning on the living face detection model with unlabeled face picture samples means that, when unlabeled face picture samples are newly added and the model must continue to be trained, the workload of manually labeling face picture samples is reduced and the updating efficiency of the living face detection model is improved.
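A minimal sketch of the loop in steps S301 to S309, assuming a PyTorch model; mu, lr, tol, the optimizer, and the helper stage2_loss() (assumed to return the second-stage loss on labeled data) are all illustrative.

```python
import torch

def pseudo_label_update(model, unlabeled_x, y_init, stage2_loss, mu=0.1,
                        lr=1e-3, tol=1e-4, max_rounds=100):
    """Alternate forward passes that refresh pseudo labels with backward
    passes that update the model, until the combined loss converges."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    y_prev = y_init                      # initial label preset values
    prev_total = None
    for _ in range(max_rounds):
        y_curr = model(unlabeled_x)      # forward pass -> current pseudo labels
        pl_loss = (y_curr - y_prev).abs().sum()   # current pseudo-label loss
        total = stage2_loss() + mu * pl_loss      # combined model loss
        if prev_total is not None and abs(total.item() - prev_total) < tol:
            break                        # loss converged -> training done
        opt.zero_grad()
        total.backward()
        opt.step()                       # reversely adjust model parameters
        y_prev = y_curr.detach()
        prev_total = total.item()
    return model
```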
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.
In an embodiment, a training apparatus for a living face detection model is provided, which corresponds one-to-one to the training method of the living face detection model in the foregoing embodiments. In one embodiment, the living face detection model may be any binary or multi-class model suitable for supervised learning; since the scheme targets live faces, it is applicable to a wide range of backbone models, such as MobileNet, ResNet, VGGNet, and the like, and in a multi-classification setting a model with a classifier module may also be used. As shown in fig. 4, the training apparatus 100 of the living face detection model includes a sample acquisition module 11, an input module 12, a correction processing module 13, a training module 14, and a convergence module 15. The functional modules are described in detail as follows:
The sample acquisition module 11 is configured to acquire a face picture sample set carrying a label, where the face picture sample set includes normal face picture samples and abnormal face picture samples, and the number of the normal face picture samples is greater than the number of the abnormal face picture samples;
The input module 12 is configured to input the normal face picture sample and the abnormal face picture sample to a living face detection model to be trained, so as to obtain an original predicted value of the normal face picture sample and an original predicted value of the abnormal face picture sample;
The correction processing module 13 is configured to perform correction processing on the original predicted value of the abnormal face picture sample, so as to obtain a corrected predicted value of the abnormal face picture sample;
The training module 14 is configured to use the correction predicted value as a predicted value of the living face detection model to be trained on the abnormal face picture sample, use an original predicted value of the normal face picture sample as a predicted value of the living face detection model to be trained on the normal face picture sample, and use a normal category or an abnormal category, to which the corresponding face picture sample identified in the tag actually belongs, as a target, to train the living face detection model to be trained;
and the convergence module 15 is configured to obtain the trained live face detection model when the loss function of the live face detection model converges.
In this apparatus, after the original predicted values of the normal and abnormal face picture samples are obtained, the original predicted values of the abnormal face picture samples, whose number is small, are corrected to obtain corrected predicted values. The corrected predicted value is used as the model's predicted value for each abnormal face picture sample, the original predicted value is used as the model's predicted value for each normal face picture sample, and the living face detection model to be trained is trained with the normal or abnormal category to which each face picture sample actually belongs, as identified in its label, as the target. The model therefore attends more to the feature information of the scarce abnormal face picture samples during training, so the trained living face detection model has higher recognition sensitivity when detecting abnormal face pictures, which improves its detection accuracy.
In one embodiment, the correction processing module 13 is specifically configured to calculate the correction predicted value of the abnormal face picture sample according to the following formula:
wherein mid_i represents the corrected predicted value of abnormal face picture sample i, P represents the original predicted value of the abnormal face picture sample, C represents a hyperparameter, and Max represents the number of samples in the class with the largest sample size.
In one embodiment, the training device 100 for a living body face detection model further includes:
The first extraction module is used for randomly extracting a first sample set from the face picture sample set, wherein the first sample set comprises the normal face picture sample and the abnormal face picture sample;
the second extraction module is used for randomly extracting a second sample set from the face picture sample set, wherein the second sample set comprises the normal face picture sample and the abnormal face picture sample.
Further, the training module 14 specifically includes:
the first stage training unit is used for selecting a first stage loss function and carrying out first stage training on the living body face detection model through the first sample set;
And the second stage training unit is used for selecting a second stage loss function when the first stage loss function converges, and continuing to train the second stage on the living body face detection model through the second sample set.
In one embodiment, the first-stage loss function is:

$$\mathcal{L}_1(f_\theta) = \frac{1}{m} \sum_{(x,y) \in D_1} LD\big((x,y), f_\theta\big)$$

wherein f_θ represents the living face detection model with parameter θ to be learned in the first stage, (x, y) represents a sample x with label y in the first sample set D_1, m represents the number of samples in the first sample set, and LD((x, y), f_θ) represents the predicted value of the living face detection model on face picture sample x in the first stage.
In one embodiment, whether the first-stage loss function has converged is judged through the following formula:

$$f_1^{\theta} = f_1^{\theta'}, \qquad \theta' = \theta - \alpha_1 \nabla_\theta \mathcal{L}_1(f_\theta)$$

wherein ∇_θ denotes the gradient with respect to the parameter θ, α₁ represents the learning rate of the first-stage loss function, and f_1^{θ'} represents the output of the living face detection model in the first stage when the most recently adjusted parameter is θ'. When the absolute value of the difference between the left side f and the right side of the equation is within a preset first range, the first-stage loss function is judged to have converged.
Further, the second-stage loss function is:

$$\mathcal{L}_2(f_\theta) = \frac{1}{m} \sum_{(x',y') \in D_2} \frac{1}{n_i} LD\big((x',y'), f_\theta\big)$$

wherein f_θ represents the living face detection model with parameter θ to be learned in the second stage, (x', y') represents a sample x' with label y' in the second sample set D_2, n_i represents the total number of samples in the i-th class, m represents the number of samples in the second sample set, and LD((x', y'), f_θ) represents the predicted value of the living face detection model on face picture sample x' in the second stage.
It will be appreciated that when label y' marks a normal face picture sample, n_i represents the total number of normal face picture samples in the second sample set, and when label y' marks an abnormal face picture sample, n_i represents the total number of abnormal face picture samples in the second sample set.
Further, the second-stage training unit judges whether the second-stage loss function has converged through the following formula:

$$f_2^{\theta} = f_2^{\theta'}, \qquad \theta' = \theta - \alpha \nabla_\theta \mathcal{L}_2(f_\theta)$$

wherein ∇_θ denotes the gradient with respect to the parameter θ, α represents the learning rate of the second-stage loss function, n_i represents the total number of samples in the i-th class, and f_2^{θ'} represents the output of the living face detection model in the second stage when the most recently adjusted parameter is θ'. When the absolute value of the difference between the left side f and the right side of the equation is within a preset second range, the second-stage loss function is judged to have converged.
wherein the learning rate is scaled geometrically per iteration, α_t = α'/γ^t with t the iteration index, α' represents the initialized learning rate of the second-stage loss function, and γ is a preset constant with γ > 1.
In one embodiment, γ has a value of, for example, 1.1, 1.2, 1.3, etc.
According to the second-stage loss function, the learning rate α is scaled geometrically, and the geometrically scaled learning rate is used as the decay at each iteration of the second-stage loss function. This adjusts the learning speed and accelerates the convergence of the living face detection model in the second stage.
When unlabeled face picture samples are newly added and the living face detection model continues to be trained with its parameters updated, in order to reduce the workload of manually labeling face picture samples, the training apparatus 100 of the living face detection model further includes:
The initial label acquisition module is used for acquiring an unlabeled face picture sample and an initial label preset value of the unlabeled face picture sample;
the label-free sample input module is used for inputting the label-free face picture sample into the living face detection model trained in the second stage to obtain a first pseudo label of the label-free face picture sample;
the initial label loss calculation module is used for calculating the loss of the initial pseudo label according to the first round of pseudo labels and the initial label preset value;
a first model loss calculation module, configured to calculate a loss of the living face detection model according to a loss of the initial pseudo tag and a loss of the second-stage loss function;
The parameter adjusting module is used for reversely adjusting parameters of the living body face detection model according to the calculated loss of the living body face detection model;
the propagation module is used for inputting the label-free face picture sample into the living face detection model again to conduct forward propagation, so that a current pseudo label of the label-free face picture sample is obtained;
The label loss calculation module is used for calculating the current pseudo-label loss according to the current pseudo label and the pseudo label obtained in the immediately preceding forward pass;
A second model loss calculation module, configured to calculate a loss of the living face detection model according to the loss of the current pseudo tag and the loss of the second-stage loss function;
and the circulation module is used for, when the loss of the living face detection model has not converged, repeating the steps from reversely adjusting the parameters of the living face detection model through calculating the loss of the living face detection model, until the loss of the living face detection model converges, thereby obtaining the trained living face detection model.
In one embodiment, the absolute value of the difference between the current pseudo label and the pseudo label obtained in the immediately preceding forward pass of the living face detection model is used as the current pseudo-label loss; when the current pseudo label is obtained from the model's first forward pass, the absolute value of the difference between the current pseudo label and the initial label preset value is used instead.
The current pseudo-label loss can be expressed mathematically, for example, as:

$$\mathcal{L}_{pl} = \sum_i \left| y_i' - y_i'' \right|$$

wherein y_i' represents the current pseudo label and y_i'' represents the pseudo label obtained in the immediately preceding forward pass; when y_i' is the pseudo label obtained from the first forward pass, y_i'' is the initial label preset value.
In one embodiment, the second model loss calculation module is specifically configured to:
calculating the loss of the living face detection model through the following formula:

$$\mathcal{L} = \mathcal{L}_2 + \mu \, \mathcal{L}_{pl}$$

wherein ℒ₂ represents the loss of the second-stage loss function, ℒ_pl represents the current pseudo-label loss, and μ represents a preset parameter.
In this embodiment, performing semi-supervised learning on the living face detection model with unlabeled face picture samples means that, when unlabeled face picture samples are newly added and the model must continue to be trained, the workload of manually labeling face picture samples is reduced and the updating efficiency of the living face detection model is improved.
The terms "first" and "second" in the above modules/units merely distinguish different modules/units and do not imply that one module/unit has higher priority or any other limiting meaning. Furthermore, the terms "comprises", "comprising", and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to the steps or modules expressly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
For specific limitations of the training device for the living face detection model, reference may be made to the above limitations of the training method for the living face detection model, and details thereof will not be repeated here. The modules in the training device of the living body face detection model can be all or partially realized by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is for communicating with an external server via a network connection. The computer program, when executed by a processor, implements a training method for a living face detection model.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the training method of the living face detection model in the above embodiment when the computer program is executed, such as steps 101 to 105 shown in fig. 2 and other extensions of the method and extensions of related steps. Or the processor when executing the computer program implements the functions of the respective modules/units of the training apparatus of the living body face detection model in the above embodiment, such as the functions of the modules 11 to 15 shown in fig. 4. In order to avoid repetition, a description thereof is omitted.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the computer device and connects the various parts of the whole computer device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor may implement various functions of the computer device by running or executing the computer program and/or modules stored in the memory, and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, video data, etc.) created according to the use of the cellular phone, etc.
The memory may be integrated in the processor or may be provided separately from the processor.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the steps of the training method of the living face detection model in the above embodiment, such as steps 101 to 105 shown in fig. 2 and other extensions of the method and extensions of related steps. Or the computer program when executed by the processor, implements the functions of the respective modules/units of the training apparatus of the living body face detection model in the above-described embodiment, such as the functions of the modules 11 to 15 shown in fig. 4. In order to avoid repetition, a description thereof is omitted.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-volatile computer-readable storage medium which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (6)

1. A method for training a living face detection model, the method comprising:
Acquiring a face picture sample set carrying a label, wherein the face picture sample set comprises normal face picture samples and abnormal face picture samples, and the number of the normal face picture samples is greater than that of the abnormal face picture samples;
Inputting the normal face picture sample and the abnormal face picture sample into a living body face detection model to be trained to obtain an original predicted value of the normal face picture sample and an original predicted value of the abnormal face picture sample;
Correcting the original predicted value of the abnormal face picture sample to obtain an abnormal face picture sample corrected predicted value;
Taking the correction predicted value as a predicted value of the living body face detection model to be trained on the abnormal face picture sample, taking an original predicted value of the normal face picture sample as a predicted value of the living body face detection model to be trained on the normal face picture sample, taking a normal category or an abnormal category, which actually belongs to the corresponding face picture sample identified in the label, as a target, and training the living body face detection model to be trained;
When the loss function of the living body face detection model converges, obtaining a trained living body face detection model;
The step of correcting the original predicted value of the abnormal face picture sample to obtain the corrected predicted value of the abnormal face picture sample comprises the following steps: calculating the correction predicted value of the abnormal face picture sample through the following formula:
wherein mid_i represents the corrected predicted value of abnormal face picture sample i, P represents the original predicted value of the abnormal face picture sample, C represents a hyperparameter, Max represents the number of samples in the class with the largest sample size, and n_i represents the total number of samples in the i-th class;
wherein, after the step of obtaining the normal face picture samples and the abnormal face picture samples carrying labels, the method further comprises: randomly extracting a first sample set from the face picture sample set, wherein the first sample set comprises the normal face picture samples and the abnormal face picture samples; and randomly extracting a second sample set from the face picture sample set, wherein the second sample set comprises the normal face picture samples and the abnormal face picture samples; and the step of training the living face detection model to be trained, with the normal category or abnormal category to which the corresponding face picture sample identified in the label actually belongs as the target, comprises: selecting a first-stage loss function, and performing first-stage training of the living face detection model through the first sample set; and when the first-stage loss function converges, selecting a second-stage loss function, and continuing the second-stage training of the living face detection model through the second sample set;
Wherein the second-stage loss function is:
Wherein f_2θ represents the living body face detection model with the parameter θ to be learned in the second stage, (x′, y′) represents a sample x′ in the second sample set with label y′, n_i represents the total number of samples in the i-th class, m represents the number of samples in the first sample set, and LD((x′, y′), f_2θ) represents the predicted value of the living body face detection model on the face picture sample x′ in the second stage;
wherein whether the second-stage loss function converges is judged through the following formula:
wherein ∇θ represents the gradient with respect to the parameter θ, α represents the learning rate of the second-stage loss function, n_i represents the total number of samples in the i-th class, f_2θ′ represents the output of the living body face detection model in the second stage when the most recently adjusted parameter is θ′, and the second-stage loss function is judged to have converged when the absolute value of the difference between f on the left side of the equation and the right side of the equation is within a preset second range.
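Read as an algorithm, claim 1 combines a class-size-based correction of the raw predicted value for the under-represented abnormal class with two training stages driven by independently drawn sample sets. The Python sketch below is a minimal rendering of that flow under stated assumptions, not the patented implementation: the formula images are not reproduced in this text, so the log-ratio form of `correct_predicted_value` is only one plausible instantiation of the claimed relation between P, C, max and n_i, and the sampling helper and toy data are hypothetical.

```python
import math
import random

def correct_predicted_value(p, n_i, n_max, c=1.0):
    # Assumed correction rule: shift the abnormal sample's original
    # predicted value P by a term governed by the hyperparameter C and
    # by how small class i (n_i samples) is relative to the largest
    # class (n_max samples). The patent's exact formula is not shown
    # in the extracted text.
    return p + c * math.log(n_i / n_max)

def draw_sample_sets(dataset, k1, k2, seed=0):
    # Randomly extract the first and second sample sets from the face
    # picture sample set; each mixes normal and abnormal samples.
    rng = random.Random(seed)
    return rng.sample(dataset, k1), rng.sample(dataset, k2)

# Toy usage mirroring the claim's premise that normal samples
# outnumber abnormal ones.
dataset = [(i, "normal") for i in range(900)] + \
          [(i, "abnormal") for i in range(100)]
class_counts = {"normal": 900, "abnormal": 100}
n_max = max(class_counts.values())

first_set, second_set = draw_sample_sets(dataset, k1=256, k2=256)
p_raw = -0.3  # hypothetical original predicted value for an abnormal sample
p_mid = correct_predicted_value(p_raw, class_counts["abnormal"], n_max, c=0.5)
print(f"corrected predicted value: {p_mid:.4f}")
```

With these counts the corrected value moves well below the raw score, i.e. the correction's magnitude tracks the class imbalance; the direction and exact form in the actual patent may differ.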
2. The method for training a living body face detection model according to claim 1, wherein the first-stage loss function is:
Wherein f_1θ represents the living body face detection model with the parameter θ to be learned in the first stage, (x, y) represents a sample x in the first sample set with label y, m represents the number of samples in the first sample set, and LD((x, y), f_1θ) represents the predicted value of the living body face detection model on the face picture sample x in the first stage.
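The formula image for the first-stage loss is likewise not reproduced in this text. Given the definitions in claim 2, the most natural reading is the empirical mean of LD over the first sample set; the LaTeX below is a hedged reconstruction in which S_1 is a symbol introduced here for the first sample set:

```latex
L_1(\theta) = \frac{1}{m} \sum_{(x,\,y) \in S_1} LD\big((x, y),\, f_{1\theta}\big)
```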
3. The method for training a living body face detection model according to claim 2, wherein whether the first-stage loss function converges is judged through the following formula:
wherein ∇θ represents the gradient with respect to the parameter θ, α_1 represents the learning rate of the first-stage loss function, f_1θ′ represents the output of the living body face detection model in the first stage when the most recently adjusted parameter is θ′, and the first-stage loss function is judged to have converged when the absolute value of the difference between f on the left side of the equation and the right side of the equation is within a preset first range.
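Claims 1 and 3 frame convergence as an equation whose two sides are compared after each parameter update, with training stopped once their absolute difference falls within a preset range. The image of that equation is missing from this text, so the check below assumes a right-hand side of the form f_1θ′ − α_1·∇θL, which matches the variables the claim defines; the tolerance `eps` stands in for the "preset first range", and all names are illustrative.

```python
import numpy as np

def first_stage_converged(f_theta, f_theta_prev, grad, alpha_1, eps=1e-4):
    # Assumed convergence test for claim 3: compare the model output
    # after the update (left side) with the previous output shifted by
    # one gradient step of size alpha_1 (assumed right side); declare
    # convergence when every entry differs by at most eps.
    lhs = np.asarray(f_theta)
    rhs = np.asarray(f_theta_prev) - alpha_1 * np.asarray(grad)
    return bool(np.all(np.abs(lhs - rhs) <= eps))

# Example: outputs that barely move between updates count as converged.
print(first_stage_converged([0.80, 0.10], [0.80005, 0.10002],
                            grad=[0.0005, 0.0002], alpha_1=0.1))
```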
4. A training device for a living body face detection model, the device comprising:
A sample acquisition module, used for acquiring a face picture sample set carrying labels, wherein the face picture sample set comprises normal face picture samples and abnormal face picture samples, and the number of the normal face picture samples is greater than that of the abnormal face picture samples;
The first extraction module is used for randomly extracting a first sample set from the face picture sample set, wherein the first sample set comprises the normal face picture sample and the abnormal face picture sample;
the second extraction module is used for randomly extracting a second sample set from the face picture sample set, wherein the second sample set comprises the normal face picture sample and the abnormal face picture sample;
The input module is used for inputting the normal face picture sample and the abnormal face picture sample into a living body face detection model to be trained to obtain an original predicted value of the normal face picture sample and an original predicted value of the abnormal face picture sample;
The correction processing module is used for correcting the original predicted value of the abnormal face picture sample to obtain the corrected predicted value of the abnormal face picture sample, and is specifically used for calculating the corrected predicted value of the abnormal face picture sample through the following formula:
Wherein mid represents the corrected predicted value of the abnormal face picture sample i, P represents the original predicted value of the abnormal face picture sample, C represents a hyperparameter, max represents the number of samples in the class with the largest sample size, and n_i represents the total number of samples in the i-th class;
The training module is used for taking the corrected predicted value as the predicted value of the living body face detection model to be trained on the abnormal face picture sample, taking the original predicted value of the normal face picture sample as the predicted value of the living body face detection model to be trained on the normal face picture sample, taking the normal category or abnormal category to which the corresponding face picture sample identified in the label actually belongs as the target, and training the living body face detection model to be trained; the training module specifically comprises: a first-stage training unit, used for selecting a first-stage loss function and performing first-stage training on the living body face detection model through the first sample set; and a second-stage training unit, used for selecting a second-stage loss function when the first-stage loss function converges, and continuing to perform second-stage training on the living body face detection model through the second sample set; the second-stage loss function is:
Wherein f_2θ represents the living body face detection model with the parameter θ to be learned in the second stage, (x′, y′) represents a sample x′ in the second sample set with label y′, n_i represents the total number of samples in the i-th class, m represents the number of samples in the first sample set, and LD((x′, y′), f_2θ) represents the predicted value of the living body face detection model on the face picture sample x′ in the second stage; whether the second-stage loss function converges is judged through the following formula:
wherein ∇θ represents the gradient with respect to the parameter θ, α represents the learning rate of the second-stage loss function, n_i represents the total number of samples in the i-th class, f_2θ′ represents the output of the living body face detection model in the second stage when the most recently adjusted parameter is θ′, and the second-stage loss function is judged to have converged when the absolute value of the difference between f on the left side of the equation and the right side of the equation is within a preset second range;
And the convergence module is used for obtaining the trained living body face detection model when the loss function of the living body face detection model converges.
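Claim 4 re-expresses the method of claim 1 as cooperating modules. As a structural illustration only (all names and the wiring are hypothetical, not taken from the patent), the decomposition can be sketched as a Python class whose attributes correspond one-to-one to the claimed modules:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class LivenessTrainerDevice:
    # One attribute per claimed module; each holds a callable supplied
    # by the user, since the claim does not constrain implementations.
    sample_acquisition: Callable   # fetch labelled face picture sample set
    first_extraction: Callable     # randomly draw the first sample set
    second_extraction: Callable    # randomly draw the second sample set
    input_module: Callable         # original predicted values from the model
    correction: Callable           # corrected values for abnormal samples
    training: Callable             # two-stage training units
    convergence: Callable          # detect loss convergence

    def run(self):
        samples = self.sample_acquisition()
        s1 = self.first_extraction(samples)
        s2 = self.second_extraction(samples)
        preds = self.input_module(samples)
        preds = self.correction(preds)
        model = self.training(s1, s2, preds)
        return model if self.convergence(model) else None
```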
5. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for training a living body face detection model according to any one of claims 1 to 3.
6. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for training a living body face detection model according to any one of claims 1 to 3.
CN202110703241.2A 2021-06-24 2021-06-24 Training method, device, equipment and storage medium of living body face detection model Active CN113283388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110703241.2A CN113283388B (en) 2021-06-24 2021-06-24 Training method, device, equipment and storage medium of living body face detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110703241.2A CN113283388B (en) 2021-06-24 2021-06-24 Training method, device, equipment and storage medium of living body face detection model

Publications (2)

Publication Number Publication Date
CN113283388A CN113283388A (en) 2021-08-20
CN113283388B true CN113283388B (en) 2024-05-24

Family

ID=77285427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110703241.2A Active CN113283388B (en) 2021-06-24 2021-06-24 Training method, device, equipment and storage medium of living body face detection model

Country Status (1)

Country Link
CN (1) CN113283388B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529993A (en) * 2022-02-25 2022-05-24 支付宝(杭州)信息技术有限公司 Picture identification method and device
CN114495291B (en) * 2022-04-01 2022-07-12 杭州魔点科技有限公司 Method, system, electronic device and storage medium for in vivo detection

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472208A (en) * 2018-10-12 2019-03-15 平安科技(深圳)有限公司 Certificates handling method, apparatus, computer equipment and storage medium based on recognition of face
WO2019114580A1 (en) * 2017-12-13 2019-06-20 深圳励飞科技有限公司 Living body detection method, computer apparatus and computer-readable storage medium
CN110751069A (en) * 2019-10-10 2020-02-04 武汉普利商用机器有限公司 Face living body detection method and device
CN111768336A (en) * 2020-07-09 2020-10-13 腾讯科技(深圳)有限公司 Face image processing method and device, computer equipment and storage medium
CN112036339A (en) * 2020-09-03 2020-12-04 福建库克智能科技有限公司 Face detection method and device and electronic equipment
WO2021068322A1 (en) * 2019-10-10 2021-04-15 平安科技(深圳)有限公司 Training method and apparatus for living body detection model, computer device, and storage medium

Also Published As

Publication number Publication date
CN113283388A (en) 2021-08-20

Similar Documents

Publication Publication Date Title
US11403876B2 (en) Image processing method and apparatus, facial recognition method and apparatus, and computer device
CN107577945B (en) URL attack detection method and device and electronic equipment
WO2019100724A1 (en) Method and device for training multi-label classification model
CN112330685B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN109816200B (en) Task pushing method, device, computer equipment and storage medium
CN111476268A (en) Method, device, equipment and medium for training reproduction recognition model and image recognition
CN113052144B (en) Training method, device and equipment of living human face detection model and storage medium
CN110781976B (en) Extension method of training image, training method and related device
CN111079841A (en) Training method and device for target recognition, computer equipment and storage medium
CN113283388B (en) Training method, device, equipment and storage medium of living body face detection model
US20210390370A1 (en) Data processing method and apparatus, storage medium and electronic device
US10692089B2 (en) User classification using a deep forest network
CN110197107B (en) Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium
CN113505797B (en) Model training method and device, computer equipment and storage medium
CN112434556A (en) Pet nose print recognition method and device, computer equipment and storage medium
CN111523479A (en) Biological feature recognition method and device for animal, computer equipment and storage medium
CN111047088A (en) Prediction image acquisition method and device, computer equipment and storage medium
CN111291773A (en) Feature identification method and device
CN112884147A (en) Neural network training method, image processing method, device and electronic equipment
CN110942067A (en) Text recognition method and device, computer equipment and storage medium
CN113221695B (en) Method for training skin color recognition model, method for recognizing skin color and related device
CN112529888B (en) Face image evaluation method, device, equipment and medium based on deep learning
CN109101984B (en) Image identification method and device based on convolutional neural network
US20230021551A1 (en) Using training images and scaled training images to train an image segmentation model
CN112699809B (en) Vaccinia category identification method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant