CN114299567B - Model training method, living body detection method, electronic device, and storage medium - Google Patents
- Publication number
- CN114299567B (application CN202111463661.4A)
- Authority
- CN
- China
- Prior art keywords
- feature vector
- class
- image
- living body
- loss
- Prior art date
- Legal status: Active
Landscapes
- Image Analysis (AREA)
Abstract
Embodiments of the invention relate to the field of image processing and disclose a model training method, a living body detection method, an electronic device, and a storage medium. Image samples of live and non-live faces are acquired together with class labels for those samples; the class labels comprise a plurality of labels belonging to the live class and a plurality belonging to the non-live class. A feature extraction model is constructed that takes an image sample as input and outputs its feature vector, and a classifier is constructed that takes that feature vector as input and outputs the probability that it belongs to each class label. The feature extraction model and the classifier are then trained jointly to obtain the trained feature extraction model and classifier. Through a carefully designed data labeling scheme and a new loss function, the scheme greatly improves the generalization ability of the trained model.
Description
Technical Field
The present invention relates to the field of image processing, and in particular to a model training method, a living body detection method, an electronic device, and a storage medium.
Background
Existing face liveness detection algorithms are mainly based on deep learning: sample data of live faces and spoofs are collected in advance, and a deep learning model is trained on them to learn to extract features that distinguish live faces from spoofs.
The conventional training procedure is as follows: the sample data are divided into two classes, live and non-live, and given L = 2 distinct class labels (e.g. 0 and 1) as supervision during training. Training runs for E epochs; each epoch uses M samples, which are randomly divided into K batches of B samples each, and the K batches are fed to the feature extraction model in turn. For each sample, the features extracted by the feature extraction model are passed through a classifier to obtain the probability of each label, and cross-entropy loss measures the gap between the predicted probabilities and the ground truth in order to optimize the model parameters.
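The conventional cross-entropy objective described above can be sketched as follows. This is an illustrative numpy implementation under the stated L = 2 setting, not the patent's code; the function name `cross_entropy` is ours.

```python
import numpy as np

def cross_entropy(p_pred, y_true, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot labels.

    p_pred: (B, L) predicted class probabilities for a batch of B samples.
    y_true: (B, L) one-hot ground-truth labels (L = 2 for live/spoof).
    """
    return float(-np.mean(np.sum(y_true * np.log(p_pred + eps), axis=1)))

# Two-class (live = 0, spoof = 1) batch of B = 2 samples.
p = np.array([[0.9, 0.1], [0.2, 0.8]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = cross_entropy(p, y)
```

The loss shrinks toward zero as the predicted probabilities approach the one-hot ground truth.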
This training approach simply divides faces into live and spoof and ignores their finer subdivisions. Live faces differ greatly across age groups and types of people; spoofs differ across many material types, such as A4 paper, photos, phone-screen replays, cloth, latex hoods, silicone hoods, and plastic masks. For some spoof types a large amount of training data is easy to collect, while for others it is difficult. Under these conditions the conventional method trains poorly on rare spoof types, or on spoof types absent from the training data, so the resulting face liveness detection algorithm generalizes poorly to unknown spoof types.
Disclosure of Invention
Embodiments of the present invention aim to provide a model training method, a living body detection method, an electronic device, and a storage medium that greatly improve the generalization ability of the trained model through a carefully designed data labeling scheme and a new loss function.
In order to solve the above technical problem, an embodiment of the present invention provides a model training method, including:
acquiring image samples of live and non-live faces and class labels of the image samples, the class labels including a plurality of class labels belonging to the live class and a plurality belonging to the non-live class;
constructing a feature extraction model that takes an image sample as input and outputs the feature vector of the image sample;
constructing a classifier that takes the feature vector output by the feature extraction model as input and outputs the probability that the feature vector belongs to each class label;
and jointly training the feature extraction model and the classifier, wherein the loss function used in the joint training is constructed from a first loss between a feature vector output by the feature extraction model and the class-center feature vector of the class label to which it belongs, and a second loss between the predicted class output by the classifier and the class label.
The embodiment of the invention also provides a living body detection method, which comprises the following steps:
processing the face image to be detected with the feature extraction model and the classifier obtained by the joint training of the above model training method, to obtain the feature vector of the face image and the probability that the feature vector belongs to each class label;
determining a first probability that the face image is live based on the cosine of the angle between the feature vector and the class-center feature vector of each class label belonging to the live class;
determining a second probability that the face image is live based on the probability that the feature vector belongs to each live class label;
and determining a final probability that the face image is live based on the first probability and the second probability.
An embodiment of the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method described above and the living body detection method described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the model training method described above and the living body detection method described above.
Compared with the prior art, embodiments of the invention acquire image samples of live and non-live faces together with class labels for those samples, the class labels comprising a plurality of labels belonging to the live class and a plurality belonging to the non-live class; construct a feature extraction model that takes an image sample as input and outputs its feature vector; construct a classifier that takes the feature vector as input and outputs the probability that it belongs to each class label; and jointly train the two, with a loss function built from a first loss between each feature vector and the class-center feature vector of its class label and a second loss between the classifier's predicted class and the class label. Unlike the traditional scheme, which labels image samples only as live or non-live, this scheme labels them with a plurality of live and a plurality of non-live class labels, so the finer subdivisions of both categories can be mined.
Meanwhile, because the joint-training loss combines the first loss (between each feature vector and the class-center feature vector of its class label) with the second loss (between the predicted class and the class label), the feature extraction and classification abilities of the model are greatly improved. This in turn improves the generalization ability of the trained model and markedly improves its judgment of non-live types that have few or even no samples in the training set.
Drawings
FIG. 1 is a detailed flowchart of a model training method according to an embodiment of the invention;
FIG. 2 is a detailed flowchart of an image sample acquisition method according to an embodiment of the invention;
FIG. 3 is a detailed flowchart of a first loss acquisition method according to an embodiment of the invention;
FIG. 4 is a detailed flowchart of a living body detection method according to an embodiment of the invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in order to provide a thorough understanding of the present application; however, the technical solutions claimed in the present application can be practiced without these details, and with various changes and modifications based on the following embodiments.
An embodiment of the present invention relates to a model training method, and as shown in fig. 1, the model training method provided in this embodiment includes the following steps.
Step 101: acquiring living body and non-living body image samples containing human faces and class labels of the image samples; the category labels include a plurality of category labels belonging to living bodies, and a plurality of category labels belonging to non-living bodies.
Specifically, original images of live and non-live faces can be collected, for example by photographing, to form the image samples for model training, and each image sample is labeled with its type in advance to obtain its class label. Unlike the training process of a conventional liveness detection algorithm, the class labels in this embodiment are not merely the two labels live and non-live; instead, a plurality of finer class labels is defined under each of the live and non-live categories, i.e., the class labels include a plurality of class labels belonging to the live class and a plurality belonging to the non-live class. The live class labels may be divided along one or more classification dimensions, as may the non-live class labels, and the dimensions used for the two need not be identical.
In one example, as shown in FIG. 2, this step may be implemented by the following substeps.
Substep 1011: an original image of a living body and a non-living body including a face of a person is acquired.
Specifically, the original images of the living body and the non-living body including the face of a person can be acquired by photographing or the like.
Sub-step 1012: an original image belonging to a living body is labeled based on a plurality of category labels defined in advance by the age group of the living body.
Specifically, live faces can be divided into 5 categories by the age group of the subject, with a class label assigned to each category, for example: ages 0 to 6 are labeled 100, ages 7 to 12 are 101, ages 13 to 50 are 102, ages 51 to 70 are 103, and ages over 70 are 104. Then, for an original image of a live face, a class label is assigned according to the age of the face in the image.
Substep 1013: the original image belonging to the non-living body is labeled based on a plurality of category labels defined in advance by the material of the non-living body.
Specifically, non-live objects (i.e., "spoofs" or "prostheses") can be divided into categories by their material, with a class label assigned to each category. For example, 2D spoofs may be divided into 7 classes by material (the exact number depends on the materials actually present): unknown material is 200, color A4 paper is 201, black-and-white A4 paper is 202, color photo is 203, black-and-white photo is 204, color coated paper is 205, and black-and-white coated paper is 206. 3D spoofs may be divided into 5 classes by material: unknown material is 300, plastic 3D spoof is 301, latex 3D spoof is 302, silicone 3D spoof is 303, and resin 3D spoof is 304. Then, for an original image of a non-live object, a class label is assigned according to the material of the face in the image.
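The fine-grained labeling scheme above can be captured in a small lookup table. The numeric codes come from the text; the dictionary and function names (`LIVE_LABELS`, `is_live`, and the key strings) are illustrative, not from the patent.

```python
# Live age-group labels (100-104) and spoof material labels (2D: 200-206,
# 3D: 300-304), as enumerated in the description above.
LIVE_LABELS = {"age_0_6": 100, "age_7_12": 101, "age_13_50": 102,
               "age_51_70": 103, "age_over_70": 104}
SPOOF_2D_LABELS = {"unknown_2d": 200, "color_a4": 201, "bw_a4": 202,
                   "color_photo": 203, "bw_photo": 204,
                   "color_coated": 205, "bw_coated": 206}
SPOOF_3D_LABELS = {"unknown_3d": 300, "plastic_3d": 301, "latex_3d": 302,
                   "silicone_3d": 303, "resin_3d": 304}

def is_live(label: int) -> bool:
    """A sample is live iff its fine-grained label is one of the live codes."""
    return label in LIVE_LABELS.values()
```

Under this scheme the total class label count is c = 5 + 7 + 5 = 17.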
Sub-step 1014: extract a specified number of original images as image samples; the extracted images must cover all class labels, with the same number of original images for each class label.
Specifically, after the original images are acquired and labeled with their class labels, the images used for model training are drawn from them as image samples. To improve the generalization ability of the model, a specified number of images is drawn such that the class labels of the drawn images cover all predefined class labels and each class label contributes the same number of images.
For example, the process of extracting the image sample may be implemented by the following steps.
Step one: for the original images of any class label, randomly take m of them, and compare m with the quotient of the specified number divided by the total class label count.
Specifically, suppose the number of image samples required for one round (epoch) of training is M, and the original images cover c classes in total (so the total class label count is c). To improve the generalization ability of the model, the same number of images must be drawn for each class label, so M/c original images should be drawn per class label.
When drawing for any class label, m original images with that label may be taken at random, and m is then compared with M/c.
Step two: if m is larger than the quotient value, deleting part of original images from the m original images to enable the number of the remaining original images to be equal to the quotient value.
Specifically, when m is greater than M/c, more images have been drawn for the current class label than needed; m − M/c of them are therefore deleted, leaving exactly M/c images for that label.
Step three: if m is smaller than the quotient value, randomly taking part of original images from the original images of the type labels again to enable the total number of the taken original images to be equal to the quotient value.
Specifically, when m is less than M/c, fewer images have been drawn for the current class label than needed; M/c − m additional images are therefore drawn again, by random sampling, from all original images of that class label, bringing the total for that label to M/c.
Step four: take the M selected original images of all classes as M image samples, shuffle them randomly, and divide them evenly into M/B batches, where B is the batch size.
Specifically, for each class label, M/c original images are drawn using steps one to three, giving M image samples in total. These M samples are shuffled randomly and grouped into M/B batches of B samples each. In subsequent model training, batches of image samples are used one at a time, and the processing of one batch of image samples constitutes one training period.
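Steps one to four can be sketched as a balanced sampler. This is a minimal illustration; the function name and the data layout (`images_by_label` as a dict mapping class label to a list of images) are assumptions, not the patent's interface.

```python
import random

def balanced_batches(images_by_label, M, B, rng=random.Random(0)):
    """Draw M/c images per class label (re-drawing with replacement when a
    class is short, dropping the surplus when it is long), shuffle the M
    samples, and split them into M/B batches of size B."""
    c = len(images_by_label)
    per_class = M // c
    sample = []
    for label, imgs in images_by_label.items():
        picked = list(imgs)
        rng.shuffle(picked)
        if len(picked) >= per_class:
            picked = picked[:per_class]          # step two: drop the surplus
        else:
            while len(picked) < per_class:       # step three: re-draw to fill
                picked.append(rng.choice(imgs))
        sample.extend((label, img) for img in picked)
    rng.shuffle(sample)                          # step four: shuffle, then batch
    return [sample[i:i + B] for i in range(0, len(sample), B)]
```

With two labels, M = 8, and B = 4, each label contributes exactly 4 samples across 2 batches, even when one label has only 3 original images.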
Step 102: and taking the image sample as input, and taking the feature vector of the image sample as output to construct a feature extraction model.
Specifically, a conventional deep learning network E (model E for short) is constructed as the feature extraction model, and its trainable parameters are denoted W_E. The input of model E is an image sample containing a face, and the output is an n-dimensional feature vector v, where n is a hyperparameter set empirically, e.g. n = 128.
Step 103: and taking the feature vector output by the feature extraction model as input, and taking the probability that the feature vector belongs to each class label as output to construct a classifier.
Specifically, a conventional deep learning network C (model C for short) is constructed as the classifier, and its trainable parameters are denoted W_C. The input of model C is the n-dimensional feature vector v output by the feature extraction model of step 102 (model E), and the output is a c-dimensional vector p, c being the total class label count, where p_{j,i} is the probability that the feature vector of the jth image sample belongs to the ith class label.
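A minimal stand-in for model E and model C might look like the following. The patent leaves both as "conventional deep learning networks", so these single-matrix versions only illustrate the input/output contract (n = 128 features in, c = 17 class probabilities out); the class names and the flattened 64×64 input are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 128, 17   # feature dim n = 128; c = total fine-grained label count

class TinyExtractor:
    """Stand-in for model E: maps a flattened face image to an n-dim vector v."""
    def __init__(self, in_dim):
        self.W_E = rng.normal(0, 0.01, (in_dim, n))  # trainable parameters W_E
    def __call__(self, x):
        return x @ self.W_E                          # v: (B, n)

class TinyClassifier:
    """Stand-in for model C: maps v to per-label probabilities p of shape (B, c)."""
    def __init__(self):
        self.W_C = rng.normal(0, 0.01, (n, c))       # trainable parameters W_C
    def __call__(self, v):
        z = v @ self.W_C
        z -= z.max(axis=1, keepdims=True)            # numerically stable softmax
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(4, 64 * 64))                    # batch of B = 4 fake images
E, C = TinyExtractor(64 * 64), TinyClassifier()
p = C(E(x))
```

Each row of p is a probability distribution over the c class labels, matching the vector p described in step 103.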
Step 104: and performing combined training on the feature extraction model and the classifier, wherein a loss function in the combined training is constructed on the basis of a first loss between a feature vector output by the feature extraction model and a class center feature vector of a class label to which the feature vector belongs and a second loss between a prediction class output by the classifier and the class label.
Specifically, the constructed feature extraction model (model E) and classifier (model C) are trained jointly on the image samples to obtain the trained feature extraction model and classifier. The loss function used during joint training can be constructed from a first loss between each feature vector output by the feature extraction model and the class-center feature vector of its class label, and a second loss between the predicted class output by the classifier and the class label.
The methods of constructing the first loss and the second loss will be described below, respectively.
As shown in fig. 3, the first loss construction process can be implemented as follows.
Step 201: calculate the distance D(v) between a feature vector v and the class-center feature vector v_CF of the class label to which it belongs, by formula (1):

D(v) = ||v − v_CF||_2 ………………………(1)

Specifically, a central feature of n = 128 dimensions (the dimension of the feature vector output by the feature extraction model), i.e. a class-center feature vector, may be defined for each class label; the class-center feature vector of the ith class label may be denoted v_CF^i, and its initial value is random. If the number of image samples in a batch is B, the feature vectors output by the feature extraction model form v ∈ R^{n×B}, where R denotes the real numbers.
The distance D(v) between each feature vector v and the class-center feature vector v_CF of its class label is then obtained from formula (1).
In one example, before performing step 201, the central feature vectors v of each class can be processed by the following steps CF And (4) updating.
Each class-center feature vector v_CF used in the current training period is updated by formula (2):

v_CF = v̄_CF + (a / b) · Σ_{k=1..b} e_k ………………………(2)

where v_CF is the updated class-center feature vector, v̄_CF is the class-center feature vector before the update, a is a hyperparameter, b is the number of image samples under the same class label, and e_k is the vector difference between the feature vector v_k of the kth image sample under that class label and the corresponding class center, calculated by formula (3):

e_k = v_k − v̄_CF ………………………(3)

Specifically, suppose the B image samples used in the current training period (batch) involve ct class labels (ct ≤ c, where c is the total class label count). For the image samples under each of these class labels, the new class-center feature vector v_CF of that label is obtained by formula (2), where v̄_CF is the old class-center feature vector of that label, i.e. the one used in the previous training period.
Step 202: the first loss is calculated by formula (5):

L_E(W_E) = (1/B) · Σ_{j=1..B} D(v_j) ………………………(5)

where W_E denotes the trainable parameters of the feature extraction model E, L_E(W_E) is the first loss, B is the batch size of the image samples, and D(v_j) is the distance corresponding to the jth image sample v_j in the batch.
Specifically, once the distance D(v) between each feature vector v and the class-center feature vector v_CF of its class label is obtained, the first loss is calculated by formula (5). The first loss constrains the trainable parameters W_E of the feature extraction model so that each feature vector is highly similar to the class-center feature vector of its own class label and dissimilar to the class centers of other labels, thereby training W_E in model E.
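The center update of formula (2) and the first loss of formula (5) can be sketched as below. The Euclidean form of D(v) follows the reconstruction above and is an assumption; the function names are ours.

```python
import numpy as np

def update_center(v_cf_old, batch_feats, a=0.5):
    """Formula (2): move the class center toward the mean offset e_k of this
    batch's features for that class; a is the update-rate hyperparameter."""
    e = batch_feats - v_cf_old            # e_k = v_k - old center, formula (3)
    return v_cf_old + a * e.mean(axis=0)

def first_loss(feats, centers, labels):
    """Formula (5): mean distance D(v_j) between each feature vector and the
    center of its own class (Euclidean distance assumed)."""
    d = np.linalg.norm(feats - centers[labels], axis=1)
    return float(d.mean())
```

As a sanity check, features lying exactly on their class centers give a first loss of zero, and the update pulls a stale center halfway (for a = 0.5) toward the batch mean.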
The second loss construction process can be achieved by the following steps.
The second loss is calculated by formula (6):

L_C(W_C) = −(1/B) · Σ_{j=1..B} Σ_{i=1..c} y_{j,i} · log p_{j,i} ………………………(6)

where W_C denotes the trainable parameters of classifier C, L_C(W_C) is the second loss, B is the batch size of the image samples, c is the total class label count, y_{j,i} is the actual probability that the jth image sample in the batch belongs to the ith class label, and p_{j,i} is the predicted probability that the jth image sample belongs to the ith class label.
The second loss constrains the trainable parameters W_C of classifier C so that each feature vector receives a high predicted probability for its own label class and a low predicted probability for the other classes, thereby training W_C in model C.
On this basis, when the feature extraction model and the classifier are trained jointly, the loss function of the joint training can be constructed by formula (7):

loss = L_E(W_E) + L_C(W_C) ………………………(7)

where loss is the loss value of the joint training, L_E(W_E) is the first loss, and L_C(W_C) is the second loss.
Specifically, with the joint-training loss calculated by formula (7), the parameters W_E of model E and W_C of model C are optimized by a conventional deep learning network optimization method, namely:

(W_E, W_C) = argmin_{W_E, W_C} [ L_E(W_E) + L_C(W_C) ]
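The joint objective of formula (7) can be sketched as follows, again assuming the Euclidean first loss from the reconstruction above; the function name `joint_loss` is illustrative.

```python
import numpy as np

def joint_loss(feats, centers, labels, p_pred, y_onehot, eps=1e-12):
    """Formula (7): loss = L_E(W_E) + L_C(W_C).

    L_E pulls each feature vector toward its class center (Euclidean distance
    assumed); L_C is the cross-entropy of the classifier output, formula (6)."""
    L_E = float(np.linalg.norm(feats - centers[labels], axis=1).mean())
    L_C = float(-np.mean(np.sum(y_onehot * np.log(p_pred + eps), axis=1)))
    return L_E + L_C
```

A model whose features sit on their class centers and whose classifier predicts the labels perfectly reaches a joint loss of zero; either misplaced features or wrong class probabilities raise it.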
compared with the related art, the embodiment obtains the image samples of the living body and the non-living body containing the human face and the class labels of the image samples; the category labels include a plurality of category labels belonging to living bodies, and a plurality of category labels belonging to non-living bodies; taking an image sample as input, taking a feature vector of the image sample as output, and constructing a feature extraction model; taking a feature vector output by the feature extraction model as input, taking the probability of the feature vector belonging to each class of labels as output, and constructing a classifier; and performing combined training on the feature extraction model and the classifier, wherein a loss function in the combined training is constructed on the basis of a first loss between a feature vector output by the feature extraction model and a class center feature vector of a class label to which the feature vector belongs and a second loss between a prediction class output by the classifier and the class label. The scheme is different from the traditional method that living bodies and non-living bodies are used as class labels of the image samples, and a plurality of class labels belonging to the living bodies and a plurality of class labels belonging to the non-living bodies are used as the class labels of the image samples, so that the subdivision conditions of the living bodies and the non-living bodies can be further mined. 
Meanwhile, when the feature extraction model and the classifier are jointly trained, a loss function of the joint training is constructed according to a first loss between the feature vector extracted by the feature extraction model and the class center feature vector of the class label to which the feature vector belongs and a second loss between the prediction class output by the classifier and the class label, so that the feature extraction capability and the classification prediction capability of the model can be greatly improved, the generalization capability of the trained model is further improved, and the judgment capability of the non-living body type with less or even no samples in the training set is remarkably improved.
Another embodiment of the present invention relates to a living body detection method, which is implemented based on the above-described model training method. As shown in fig. 4, the living body detecting method includes the following steps.
Step 301: process the face image to be detected with the feature extraction model and classifier obtained by the joint training of the above model training method, to obtain the feature vector of the face image and the probability that the feature vector belongs to each class label.
Specifically, the feature extraction model E obtained by the above training extracts features from the face image to be detected, yielding its feature vector v. The classifier C obtained by the same training then classifies v to obtain the probability p that the feature vector belongs to each class label.
Step 302: and determining a first probability that the face image belongs to the living body based on cosine values of included angles between the feature vectors and class center feature vectors of various classes of labels belonging to the living body.
Specifically, the cosine of the angle between the feature vector v obtained by feature extraction and the class-center feature vector v_CF^i of each class label belonging to the live class is computed, where v_CF^i is the class-center feature vector of the ith live class label, i.e. the ith label among the class labels {100, 101, 102, 103, 104}.
The first probability p_E is then calculated by formula (8):

p_E = max_i cos<v, v_CF^i> ………………………(8)

where max_i denotes the maximum, over the plurality of class labels belonging to the live class, of cos<v, v_CF^i>.
That is, once the cosines between the feature vector v of the face image to be detected and the class-center feature vectors of all live class labels are obtained, the first probability p_E follows from formula (8).
Step 303: and determining a second probability that the face image belongs to the living body based on the probability that the feature vector belongs to each class label of the living body.
Specifically, once the probability p_i of each live class label to which the feature vector belongs is obtained, the second probability p_C is calculated by formula (9):

p_C = Σ p_i ………………………(9)

where the ith class label ranges over the live class labels {100, 101, 102, 103, 104}.
Step 304: and determining the final probability that the face image belongs to the living body based on the first probability and the second probability.
Specifically, once the first probability p_E and the second probability p_C are obtained, the final probability P is calculated by formula (10):

P = d × p_E + (1 − d) × p_C ………………………(10)

where d is a hyperparameter, empirically set to 0.1.
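The inference-time fusion of formulas (8)–(10) can be sketched as below. The layout of `centers` and `probs` as dicts keyed by class label is an illustrative assumption, as is the function name.

```python
import numpy as np

LIVE = [100, 101, 102, 103, 104]  # live class labels from the text

def live_probability(v, centers, probs, d=0.1):
    """Fuse formulas (8)-(10): p_E is the best cosine to a live class center,
    p_C sums the classifier's live-class probabilities, and
    P = d * p_E + (1 - d) * p_C with d = 0.1 as in the text."""
    cos = [np.dot(v, centers[l]) / (np.linalg.norm(v) * np.linalg.norm(centers[l]))
           for l in LIVE]
    p_E = max(cos)                         # formula (8)
    p_C = sum(probs[l] for l in LIVE)      # formula (9)
    return d * p_E + (1 - d) * p_C         # formula (10)
```

For example, a feature vector aligned with the center of label 100 (p_E = 1) whose classifier assigns total live probability 0.9 yields P = 0.1 × 1 + 0.9 × 0.9 = 0.91.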
Compared with the prior art, the embodiment of the invention processes the face image to be detected through the feature extraction model and the classifier obtained by the combined training of the model training method to obtain the feature vector of the face image and the probability of the feature vector belonging to each class of labels; determining a first probability that the face image belongs to the living body based on cosine values of included angles between the feature vectors and class center feature vectors of various classes of labels belonging to the living body; determining a second probability that the face image belongs to the living body based on the probability that the feature vector belongs to each class label of the living body; and determining the final probability that the face image belongs to the living body based on the first probability and the second probability.
In this scheme, the feature extraction model and the classifier are obtained by joint training on image samples labeled with multiple class labels belonging to living bodies and multiple class labels belonging to non-living bodies, so the subdivided conditions of living and non-living bodies can also be detected. Meanwhile, during joint training, the loss function is constructed from a first loss between the feature vector extracted by the feature extraction model and the class-center feature vector of the class label to which the feature vector belongs, and a second loss between the predicted class output by the classifier and the class label. This greatly improves the feature extraction and classification prediction capabilities of the model, further improves the generalization capability of the trained model, and notably improves the discrimination of non-living-body types with few or even no samples in the training set. On this basis, during living body detection, the probability that the face to be detected belongs to a living body is jointly determined from the first probability obtained via the feature extraction model and the second probability obtained via the classifier, thereby improving the accuracy of living body detection.
Another embodiment of the invention relates to an electronic device, as shown in FIG. 5, comprising at least one processor 402; and a memory 401 communicatively coupled to the at least one processor 402; the memory 401 stores instructions executable by the at least one processor 402, and the instructions are executed by the at least one processor 402 to enable the at least one processor 402 to perform any one of the method embodiments described above.
Where the memory 401 and the processor 402 are coupled by a bus, which may include any number of interconnected buses and bridges that couple one or more of the various circuits of the processor 402 and the memory 401 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 402 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 402.
The processor 402 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 401 may be used to store data used by processor 402 in performing operations.
Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes any of the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to enable a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.
Claims (8)
1. A method of model training, comprising:
acquiring image samples of living bodies and non-living bodies containing human faces and class labels of the image samples; the category labels include a plurality of category labels belonging to living bodies, and a plurality of category labels belonging to non-living bodies;
taking the image sample as input, and taking the feature vector of the image sample as output to construct a feature extraction model;
taking a feature vector output by the feature extraction model as input, and taking the probability that the feature vector belongs to each class label as output to construct a classifier;
performing joint training on the feature extraction model and the classifier, wherein a loss function in the joint training is constructed on the basis of a first loss between a feature vector output by the feature extraction model and a class center feature vector of a class label to which the feature vector belongs and a second loss between a prediction class output by the classifier and the class label;
the first loss is constructed by the following method:
calculating the distance D(v) between the feature vector v and the class-center feature vector v_CF of the class label to which the feature vector belongs, and constructing the first loss as

L_E(W_E) = (1/B) ∑_{j=1}^{B} D(v_j)

wherein W_E denotes the trainable parameters of the feature extraction model E, L_E(W_E) is the first loss, B is the batch size of the image samples, and D(v_j) is the distance corresponding to the j-th image sample v_j in a batch of image samples;
before the first loss is constructed, the method further comprises the following steps:
the class-center feature vectors used in the current training period are updated by the following formula:

v_CF = v'_CF + (a/b) ∑_{K=1}^{b} e_K

wherein v_CF is the updated class-center feature vector, v'_CF is the class-center feature vector before updating, a is a hyper-parameter, b is the number of image samples under the same class label, and e_K is the vector difference between the feature vector v_K of the K-th image sample under the same class label and the class-center feature vector corresponding to that class label.
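The class-center update described above can be sketched as follows; the function name and the example value of the hyper-parameter a are illustrative assumptions:

```python
import numpy as np

def update_class_center(center, batch_features, a=0.5):
    """Update one class-center feature vector from the b image samples of
    that class in the current training period.

    center         : class-center feature vector before updating
    batch_features : (b, dim) feature vectors of samples sharing the label
    a              : update-rate hyper-parameter (value here is illustrative)
    """
    b = batch_features.shape[0]
    # e_K: vector difference between each sample feature and the old center.
    e = batch_features - center
    # Move the center by a fraction a of the mean offset toward the samples.
    return center + (a / b) * e.sum(axis=0)
```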
2. The method of claim 1, wherein the obtaining of image samples of living and non-living subjects including a human face and class labels of the image samples comprises:
acquiring original images of a living body and a non-living body containing a human face;
labeling an original image belonging to a living body based on a plurality of category labels predefined according to the age group of the living body;
labeling an original image belonging to a non-living body based on a plurality of category labels predefined according to non-living body materials;
extracting a specified number of original images from the original images as the image samples;
wherein the specified number of original images cover all the class labels, and the number of original images corresponding to each class label is the same.
3. The method according to claim 2, wherein the extracting a specified number of original images from the original images as the image samples comprises:
for the original images of any class label, randomly taking m original images from the original images, and judging the magnitude relation between m and the quotient obtained by dividing the specified number by the total number of class labels;

if m is larger than the quotient, deleting some original images from the m original images so that the number of remaining original images equals the quotient;

if m is smaller than the quotient, randomly taking additional original images from the original images of that class label so that the total number of original images equals the quotient;

and randomly shuffling the selected M original images of all classes as M image samples, and evenly dividing them into M/B batches, wherein B is the batch size.
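The selection procedure of claims 2 and 3 can be sketched as follows; all names are illustrative, and topping up a short class by re-sampling from its own images is a simplifying assumption of this sketch:

```python
import random

def build_balanced_batches(images_by_label, specified_number, batch_size):
    """Draw a class-balanced sample of M images and split it into M/B batches.

    images_by_label  : dict mapping class label -> list of original images
    specified_number : total number of image samples M to select
    batch_size       : batch size B; M is assumed divisible by B
    """
    quota = specified_number // len(images_by_label)  # per-class quota
    selected = []
    for label, images in images_by_label.items():
        m = len(images)
        if m > quota:
            # Too many: keep a random subset of size quota.
            selected.extend(random.sample(images, quota))
        else:
            # Too few: top up by re-sampling from the same class.
            picked = list(images)
            while len(picked) < quota:
                picked.append(random.choice(images))
            selected.extend(picked)
    random.shuffle(selected)  # randomly shuffle all M selected samples
    return [selected[i:i + batch_size]
            for i in range(0, len(selected), batch_size)]
```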
4. The method of claim 1, wherein the second loss is constructed by the following formula:

L_C(W_C) = -(1/B) ∑_{j=1}^{B} ∑_{i=1}^{C} y_{j,i} log(p_{j,i})

wherein W_C denotes the trainable parameters of the classifier C, L_C(W_C) is the second loss, B is the batch size of the image samples, C is the total number of class labels, y_{j,i} is the actual probability that the j-th image sample in a batch of image samples belongs to the i-th class label, and p_{j,i} is the predicted probability that the j-th image sample in the batch belongs to the i-th class label.
5. The method of claim 1, wherein the loss function in the joint training is constructed by the following formula:
loss = L_E(W_E) + L_C(W_C)

wherein loss is the loss value during joint training, L_E(W_E) is the first loss, and L_C(W_C) is the second loss.
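A minimal numpy sketch of the joint loss of claim 5, combining the first loss of claim 1 and the second loss of claim 4. The use of Euclidean distance for D(v) is an assumption, since the concrete distance formula is not reproduced in this excerpt:

```python
import numpy as np

def joint_loss(features, centers, labels, probs):
    """loss = L_E(W_E) + L_C(W_C), as in claim 5.

    features : (B, dim) feature vectors from the feature extraction model
    centers  : (C, dim) class-center feature vectors, one per class label
    labels   : (B,) integer class indices (ground-truth class labels)
    probs    : (B, C) classifier output probabilities
    """
    B = features.shape[0]
    # First loss: mean distance between each feature vector and the class
    # center of its label (Euclidean distance assumed for D(v)).
    d = np.linalg.norm(features - centers[labels], axis=1)
    l_e = d.mean()
    # Second loss: cross-entropy between one-hot targets and predictions.
    l_c = -np.log(probs[np.arange(B), labels] + 1e-12).mean()
    return l_e + l_c
```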
6. A living body detection method, comprising:
processing a face image to be detected by using a feature extraction model and a classifier obtained by joint training according to the model training method of any one of claims 1 to 5, to obtain a feature vector of the face image and the probability that the feature vector belongs to each class label;
determining a first probability that the face image belongs to the living body based on a cosine value of an included angle between the feature vector and class center feature vectors of various classes of labels belonging to the living body;
determining a second probability that the face image belongs to the living body based on the probability that the feature vector belongs to each class label of the living body;
and determining a final probability that the face image belongs to the living body based on the first probability and the second probability.
7. An electronic device, comprising:
at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1 to 5 and the living body detection method of claim 6.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the model training method according to any one of claims 1 to 5 and the living body detection method according to claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111463661.4A CN114299567B (en) | 2021-12-02 | 2021-12-02 | Model training method, living body detection method, electronic device, and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114299567A CN114299567A (en) | 2022-04-08 |
CN114299567B true CN114299567B (en) | 2022-11-18 |
Family
ID=80965390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111463661.4A Active CN114299567B (en) | 2021-12-02 | 2021-12-02 | Model training method, living body detection method, electronic device, and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114299567B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115761411B (en) * | 2022-11-24 | 2023-09-01 | 北京的卢铭视科技有限公司 | Model training method, living body detection method, electronic device, and storage medium |
CN116077066A (en) * | 2023-02-10 | 2023-05-09 | 北京安芯测科技有限公司 | Training method and device for electrocardiosignal classification model and electronic equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705383A (en) * | 2021-08-12 | 2021-11-26 | 南京英诺森软件科技有限公司 | Cross-age face recognition method and system based on ternary constraint |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329696A (en) * | 2020-11-18 | 2021-02-05 | 携程计算机技术(上海)有限公司 | Face living body detection method, system, equipment and storage medium |
CN113609944A (en) * | 2021-07-27 | 2021-11-05 | 东南大学 | Silent in-vivo detection method |
Also Published As
Publication number | Publication date |
---|---|
CN114299567A (en) | 2022-04-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110443143B (en) | Multi-branch convolutional neural network fused remote sensing image scene classification method | |
US11593943B2 (en) | RECIST assessment of tumour progression | |
CN105144239B (en) | Image processing apparatus, image processing method | |
WO2019015246A1 (en) | Image feature acquisition | |
CN114299567B (en) | Model training method, living body detection method, electronic device, and storage medium | |
US9330336B2 (en) | Systems, methods, and media for on-line boosting of a classifier | |
CN112016464A (en) | Method and device for detecting face shielding, electronic equipment and storage medium | |
CN109376796A (en) | Image classification method based on active semi-supervised learning | |
CN104850860A (en) | Cell image recognition method and cell image recognition device | |
CN110705489B (en) | Training method and device for target recognition network, computer equipment and storage medium | |
CN113449704B (en) | Face recognition model training method and device, electronic equipment and storage medium | |
Song et al. | Hybrid deep autoencoder with Curvature Gaussian for detection of various types of cells in bone marrow trephine biopsy images | |
CN113870254B (en) | Target object detection method and device, electronic equipment and storage medium | |
CN112101114B (en) | Video target detection method, device, equipment and storage medium | |
CN112819821A (en) | Cell nucleus image detection method | |
CN116844217B (en) | Image processing system and method for generating face data | |
CN111931867A (en) | New coronary pneumonia X-ray image classification method and system based on lightweight model | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN107729863B (en) | Human finger vein recognition method | |
CN113486202A (en) | Method for classifying small sample images | |
Gunawan et al. | Fuzzy Region Merging Using Fuzzy Similarity Measurement on Image Segmentation | |
CN114913404A (en) | Model training method, face image living body detection method, electronic device and storage medium | |
CN112347879B (en) | Theme mining and behavior analysis method for video moving target | |
CN115410250A (en) | Array type human face beauty prediction method, equipment and storage medium | |
CN114463574A (en) | Scene classification method and device for remote sensing image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
Effective date of registration: 20220627 Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province Applicant after: Hefei lushenshi Technology Co.,Ltd. Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD. Applicant before: Hefei lushenshi Technology Co.,Ltd. |
GR01 | Patent grant | ||