CN114299567A - Model training method, living body detection method, electronic device, and storage medium - Google Patents

Model training method, living body detection method, electronic device, and storage medium

Info

Publication number
CN114299567A
CN114299567A (Application CN202111463661.4A)
Authority
CN
China
Prior art keywords
feature vector
class
living body
image
original images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111463661.4A
Other languages
Chinese (zh)
Other versions
CN114299567B (en)
Inventor
王军华
付贤强
朱海涛
户磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Dilusense Technology Co Ltd
Original Assignee
Beijing Dilusense Technology Co Ltd
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dilusense Technology Co Ltd, Hefei Dilusense Technology Co Ltd filed Critical Beijing Dilusense Technology Co Ltd
Priority to CN202111463661.4A priority Critical patent/CN114299567B/en
Publication of CN114299567A publication Critical patent/CN114299567A/en
Application granted granted Critical
Publication of CN114299567B publication Critical patent/CN114299567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The embodiment of the invention relates to the field of image processing and discloses a model training method, a living body detection method, an electronic device, and a storage medium. Image samples of living bodies and non-living bodies containing human faces, together with class labels of the image samples, are obtained; the class labels include a plurality of class labels belonging to living bodies and a plurality of class labels belonging to non-living bodies. A feature extraction model is constructed that takes an image sample as input and outputs the feature vector of the image sample; a classifier is constructed that takes the feature vector output by the feature extraction model as input and outputs the probability that the feature vector belongs to each class label. The feature extraction model and the classifier are jointly trained to obtain the trained feature extraction model and the trained classifier. Through the carefully designed data labeling scheme and the new loss function, the scheme greatly improves the generalization capability of the trained model.

Description

Model training method, living body detection method, electronic device, and storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to a method for model training and in-vivo detection, an electronic device, and a storage medium.
Background
Existing face living body detection algorithms are mainly deep-learning-based: sample data of living bodies and prostheses is collected in advance and used to train a deep learning model, helping the model learn to extract features that distinguish living bodies from non-living bodies.
The conventional training method is as follows: the sample data is divided into the two classes of living bodies and non-living bodies, and L = 2 different class labels, such as 0 and 1, are given as supervision information during training; training runs for E rounds in total, M data items out of all the samples are used in each round, the M data items are randomly divided into K batches within a round, each batch contains B data items, and the K batches are then used in sequence to train the feature extraction model. For each data item, the features extracted by the feature extraction model are passed through a classifier to obtain the probability of belonging to each label, and cross-entropy loss is then used to measure the difference between the predicted probability and the actual situation so as to optimize the model parameters.
This training approach simply divides faces into living bodies and prostheses and ignores the finer subdivisions within each; for example, the differences between age groups and ethnicities among living bodies are very large, and non-living bodies are even more diverse because of the variety of prosthesis types, such as A4 paper, photos, mobile phone screen images, clothing, latex hoods, silicone hoods, plastic masks, and the like. Large amounts of training data are easy to collect for some prostheses but difficult for others. Under these conditions, the conventional training method often performs poorly on prostheses with few samples, or on prosthesis types that do not appear in the training data at all, so the trained face living body detection algorithm generalizes poorly to prostheses of unknown types.
Disclosure of Invention
The embodiment of the invention aims to provide a model training method, a living body detection method, an electronic device, and a storage medium, in which the generalization capability of the trained model is greatly improved through a carefully designed data labeling scheme and a new loss function.
In order to solve the above technical problem, an embodiment of the present invention provides a model training method, including:
acquiring image samples of living bodies and non-living bodies containing human faces and class labels of the image samples; the category labels include a plurality of category labels belonging to living bodies, and a plurality of category labels belonging to non-living bodies;
taking the image sample as input and the feature vector of the image sample as output to construct a feature extraction model;
taking a feature vector output by the feature extraction model as input, and taking the probability that the feature vector belongs to each class label as output to construct a classifier;
and performing combined training on the feature extraction model and the classifier, wherein a loss function in the combined training is constructed on the basis of a first loss between a feature vector output by the feature extraction model and a class center feature vector of a class label to which the feature vector belongs and a second loss between a prediction class output by the classifier and the class label.
The embodiment of the invention also provides a living body detection method, which comprises the following steps:
processing the face image to be detected by adopting the feature extraction model and the classifier obtained by the model training method through combined training to obtain the feature vector of the face image and the probability of the feature vector belonging to each class of labels;
determining a first probability that the face image belongs to the living body based on a cosine value of an included angle between the feature vector and class center feature vectors of various classes of labels belonging to the living body;
determining a second probability that the face image belongs to the living body based on the probability that the feature vector belongs to each class label of the living body;
and determining the final probability that the face image belongs to the living body based on the first probability and the second probability.
An embodiment of the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a model training method as described above, and a liveness detection method as described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the model training method as described above, and the in-vivo detection method as described above.
Compared with the prior art, the embodiment of the invention obtains image samples of living bodies and non-living bodies containing human faces, together with class labels of the image samples; the class labels include a plurality of class labels belonging to living bodies and a plurality of class labels belonging to non-living bodies. A feature extraction model is constructed that takes an image sample as input and outputs the feature vector of the image sample; a classifier is constructed that takes the feature vector output by the feature extraction model as input and outputs the probability that the feature vector belongs to each class label. The feature extraction model and the classifier are then jointly trained, and the loss function in the joint training is constructed on the basis of a first loss between a feature vector output by the feature extraction model and the class-center feature vector of the class label to which that feature vector belongs, and a second loss between the prediction class output by the classifier and the class label. Unlike the traditional approach of giving image samples only the two class labels "living body" and "non-living body", this scheme assigns image samples a plurality of class labels belonging to living bodies and a plurality of class labels belonging to non-living bodies, so that the finer subdivisions within living bodies and non-living bodies can be further mined. Meanwhile, because the loss function of the joint training is constructed from the first loss between the feature vector extracted by the feature extraction model and the class-center feature vector of the class label to which it belongs and the second loss between the prediction class output by the classifier and the class label, the feature extraction capability and the classification prediction capability of the model are greatly improved, the generalization capability of the trained model is further improved, and the ability to judge non-living-body types with few or even no samples in the training set is markedly improved.
Drawings
FIG. 1 is a detailed flow diagram of a model training method according to an embodiment of the invention;
FIG. 2 is a detailed flow chart of an image sample acquisition method according to an embodiment of the invention;
fig. 3 is a detailed flowchart of a first loss acquisition method according to an embodiment of the present invention;
FIG. 4 is a detailed flowchart of a living body detection method according to an embodiment of the invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments.
An embodiment of the present invention relates to a model training method, and as shown in fig. 1, the model training method provided in this embodiment includes the following steps.
Step 101: acquiring living body and non-living body image samples containing human faces and class labels of the image samples; the category labels include a plurality of category labels belonging to living bodies, and a plurality of category labels belonging to non-living bodies.
Specifically, original images of living bodies and non-living bodies containing human faces can be acquired by photographing or the like to form the image samples for model training, and each image sample is labeled with its category in advance to obtain its class label. Unlike the training process of a conventional living body detection algorithm, the class labels in this embodiment are not limited to two labels defined according to whether an image sample belongs to a living body or a non-living body; instead, a plurality of refined class labels are defined under the living-body and non-living-body categories, that is, the class labels include a plurality of class labels belonging to living bodies and a plurality of class labels belonging to non-living bodies. The plurality of class labels belonging to living bodies may be divided according to one or more classification dimensions, the plurality of class labels belonging to non-living bodies may likewise be divided according to one or more classification dimensions, and the classification dimensions used for living bodies and non-living bodies need not be identical.
In one example, as shown in FIG. 2, this step may be implemented by the following substeps.
Substep 1011: an original image of a living body and a non-living body including a face of a person is acquired.
Specifically, the original images of the living body and the non-living body including the face of a person can be acquired by photographing or the like.
Substep 1012: an original image belonging to a living body is labeled based on a plurality of category labels defined in advance by the age group of the living body.
Specifically, living bodies may be divided into 5 categories by age group, with a class label set for each category, for example: ages 0-6 are labeled 100, ages 7-12 are 101, ages 13-50 are 102, ages 51-70 are 103, and ages over 70 are 104. Then, for an original image belonging to a living body, a class label is set for the original image according to the age group of the face in the original image.
Substep 1013: the original image belonging to the non-living body is labeled based on a plurality of category labels defined in advance by the material of the non-living body.
Specifically, non-living bodies (i.e., "prostheses") may be divided into a plurality of categories by material, with a class label set for each category, for example: 2D prostheses are divided into 7 categories by material (the specific number depends on the actual materials): unknown material is 200, color A4 paper is 201, black-and-white A4 paper is 202, color photo is 203, black-and-white photo is 204, color coated paper is 205, and black-and-white coated paper is 206; 3D prostheses are divided into 5 categories by material (the specific number depends on the actual materials): unknown material is 300, plastic 3D prosthesis is 301, latex 3D prosthesis is 302, silicone 3D prosthesis is 303, and resin 3D prosthesis is 304. Then, for an original image belonging to a non-living body, a class label is set for the original image according to the material of the face in the original image.
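For illustration only, the example labels above can be gathered into a lookup table; the following Python sketch is hypothetical (the label values come from the examples in this description, while the key names are invented and not part of the patent).

```python
# Hypothetical label map built from the example labels in this description.
LIVE_LABELS = {            # living bodies, divided by age group
    "age_0_6": 100, "age_7_12": 101, "age_13_50": 102,
    "age_51_70": 103, "age_over_70": 104,
}
SPOOF_2D_LABELS = {        # 2D prostheses, divided by material
    "unknown_2d": 200, "color_a4": 201, "bw_a4": 202,
    "color_photo": 203, "bw_photo": 204,
    "color_coated_paper": 205, "bw_coated_paper": 206,
}
SPOOF_3D_LABELS = {        # 3D prostheses, divided by material
    "unknown_3d": 300, "plastic": 301, "latex": 302,
    "silicone": 303, "resin": 304,
}
```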
Substep 1014: extracting a specified number of original images from the original images as image samples; the original images with the specified number cover all the category labels, and the number of the original images corresponding to each category label is the same.
Specifically, after the original image is acquired and the class label corresponding to the original image is marked, the original image for model training needs to be extracted from the original image as an image sample. In order to improve the generalization capability of the model, a specified number of images can be extracted from the original images, the category labels corresponding to the extracted original images need to cover all predefined category labels, and the number of the original images corresponding to each category label is the same.
The process of extracting an image sample can be realized, for example, by the following steps.
Step one: for the original images of any one class label, randomly take m original images from them, and determine the magnitude relation between m and the quotient obtained by dividing the specified number by the total number of class labels.
Specifically, assume that when training the model, the number of image samples required for one round of training is specified as M, and the number of classes covered by all the original images (i.e., the total number of class labels) is c. In order to improve the generalization capability of the model, the same number of original images needs to be extracted under each class label as image samples, so the number of original images extracted under each class label should be M/c.
When extracting original images for any one class label, m original images belonging to that class label can be randomly taken from the original images, and then the magnitude relation between m and M/c is judged.
Step two: if m is greater than the quotient, delete some of the m original images so that the number of remaining original images equals the quotient.
Specifically, when m is greater than M/c, the number of original images taken under the current class label exceeds the number to be extracted; in this case, a portion (m - M/c) of the taken original images needs to be deleted, so that the number of remaining original images under the current class label equals M/c.
Step three: if m is smaller than the quotient, randomly take additional original images from the original images of that class label so that the total number of taken original images equals the quotient.
Specifically, when m is smaller than M/c, the number of original images taken under the current class label is less than the number to be extracted; in this case, a further (M/c - m) original images need to be taken from (all) the original images of that class label, so that after re-extraction the total number of original images taken under the current class label equals M/c. The additional original images can be obtained by random sampling.
Step four: take the M original images selected across all categories as M image samples, randomly shuffle their order, and divide them evenly into M/B batches, where B is the batch size.
Specifically, for each class label, the specified number (M/c) of original images under that class label is extracted by the method of steps one to three, giving M original images in total as the M image samples; the extracted M image samples are randomly shuffled, and every B samples form one batch, yielding M/B batches, where B is the batch size. In the subsequent model training, image samples are selected batch by batch, and the training pass over one batch of image samples serves as one training period.
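A minimal sketch of this class-balanced sampling is given below, assuming the original images are grouped by class label in a dictionary; the function and variable names are illustrative, not from the patent.

```python
import random

def build_balanced_batches(images_by_label, m_total, batch_size):
    """Hedged sketch of steps one to four: draw M/c original images per class
    label (re-sampling when a label has too few), shuffle, and split into
    batches of size B. Structure and names are illustrative only."""
    labels = list(images_by_label)
    per_label = m_total // len(labels)                       # M / c images per class label
    samples = []
    for label in labels:
        pool = images_by_label[label]
        taken = random.sample(pool, min(len(pool), per_label))  # steps one and two: take at most M/c
        while len(taken) < per_label:                            # step three: take more if short
            taken.append(random.choice(pool))
        samples.extend((img, label) for img in taken)
    random.shuffle(samples)                                   # step four: shuffle all M samples
    return [samples[i:i + batch_size]                         # M / B batches of size B each
            for i in range(0, len(samples), batch_size)]
```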
Step 102: and taking the image sample as input, and taking the feature vector of the image sample as output to construct a feature extraction model.
Specifically, a conventional deep learning network E (referred to as "model E" for short) is constructed as the feature extraction model, and the trainable parameters of model E are denoted W_E. The input of model E is an image sample containing a face, and the output is an n-dimensional feature vector v, where n is a hyper-parameter set empirically, for example n = 128.
Step 103: and taking the feature vector output by the feature extraction model as input, and taking the probability that the feature vector belongs to each class label as output to construct a classifier.
Specifically, a conventional deep learning network C (referred to as "model C" for short) is constructed as the classifier, and the trainable parameters of model C are denoted W_C. The input of model C is the n-dimensional feature vector v output by the feature extraction model (model E) in step 102, and the output is a c-dimensional vector p (c equals the total number of class labels), where p_{j,i} denotes the probability that the feature vector corresponding to the j-th image sample belongs to the i-th class label.
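As a non-authoritative illustration of the shapes involved, model E and model C could be sketched in PyTorch as follows; only the input/output dimensions (an n = 128-dimensional feature v and a c-dimensional probability vector p) follow the description, while the layer choices and the value c = 17 (the count of the example labels above) are assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """'Model E' sketch: image sample -> n-dimensional feature vector v."""
    def __init__(self, n=128):
        super().__init__()
        self.backbone = nn.Sequential(             # illustrative layers, not specified by the patent
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n),
        )

    def forward(self, x):                          # x: (B, 3, H, W) face image samples
        return self.backbone(x)                    # v: (B, n)

class Classifier(nn.Module):
    """'Model C' sketch: feature vector v -> probability over c class labels."""
    def __init__(self, n=128, c=17):               # c = 17 assumes the example label set above
        super().__init__()
        self.fc = nn.Linear(n, c)

    def forward(self, v):
        return torch.softmax(self.fc(v), dim=1)    # p: (B, c); p[j, i] = prob. sample j has label i
```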
Step 104: and performing combined training on the feature extraction model and the classifier, wherein a loss function in the combined training is constructed on the basis of a first loss between a feature vector output by the feature extraction model and a class center feature vector of a class label to which the feature vector belongs and a second loss between a prediction class output by the classifier and the class label.
Specifically, the constructed feature extraction model (model E) and the classifier (model C) are jointly trained by using an image sample to obtain the trained feature extraction model and classifier. The loss function in the process of performing the joint training can be constructed based on a first loss between the feature vector output by the feature extraction model and the class center feature vector of the class label to which the feature vector belongs, and a second loss between the prediction class output by the classifier and the class label.
The methods of constructing the first loss and the second loss will be described below, respectively.
As shown in fig. 3, the first loss construction process can be implemented as follows.
Step 201: calculating a feature vector v and a class center feature vector v of a class label to which the feature vector belongs by the following formula (1)CFD (v) from the first to the second.
Figure BDA0003389535240000051
Specifically, each class label may be defined with a central feature with n-128 dimensions (dimensions of the feature vector output by the feature extraction model), that is, a class-center feature vector. Class-centric feature vectors such as the ith class label may be denoted as
Figure BDA0003389535240000052
The initial value is a random value. If the number of the image samples in a batch is B, the feature vector v epsilon R output after the feature extraction modeln×BWherein R is a real number.
Obtaining each feature vector v and class center feature vector v of class label to which the feature vector belongsCFThe distance D (v) between the two vectors is obtained by calculation according to the formula (1).
In one example, before step 201 is performed, each class-center feature vector v_CF can be updated as follows.
Each class-center feature vector v_CF used in the current training period is updated by formulas (2) and (3), where v_CF is the updated class-center feature vector, the class-center feature vector before updating is the one used in the previous training period, a is a hyper-parameter, b is the number of image samples under the same class label, and e_k is the vector difference between the feature vector v_k of the k-th image sample under that class label and the class-center feature vector before updating.
Specifically, suppose the B image samples used in the current training period (batch) involve ct class labels (ct ≤ c, where c is the total number of class labels). For the image samples under each of these ct class labels, the class-center feature vector under that label is updated to the new value v_CF by formula (2), where the class-center feature vector before updating is the one used in the previous training period.
The vector difference e_k between the feature vector v_k of the k-th image sample under the same class label and the corresponding class-center feature vector before updating is calculated by formula (4).
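The update formulas themselves appear only as images in this publication. Under the assumption of a standard center-loss-style moving-average update, formulas (2) and (4) could plausibly read as follows; this is a hedged reconstruction, not the verbatim patent formulas.

```latex
% Assumed reconstruction, not the verbatim patent formulas:
e_k = v_k - \tilde{v}_{CF}
\qquad
v_{CF} = \tilde{v}_{CF} + a \cdot \frac{1}{b}\sum_{k=1}^{b} e_k
% \tilde{v}_{CF}: class-center feature vector before updating (previous training period)
% a: hyper-parameter;  b: number of image samples under the same class label
```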
Step 202: the first loss is calculated by the following formula (5).
Figure BDA00033895352400000510
Wherein, WEExtracting trainable parameters, L, of the model E for the featuresE(WE) For the first loss, B is the batch size of the image sample, D (v)j) Is the jth image sample v in a batch of image samplesjThe corresponding distance.
Specifically, a feature vector v and a class center feature vector v of a class label to which the feature vector belongs are obtainedCFAfter the distance d (v), the first loss can be calculated by equation (5). First loss may be for trainable parameters W of the feature extraction modelEConstraining to train trainable parameters W in the model E along the direction of high similarity between the class center feature vector of a certain feature vector and the class center feature vector of the class label of the certain feature vector and low similarity between the class center feature vectors of the class labels of the certain feature vector and the class center feature vector of the class labels of the non-certain feature vectorE
The second loss can be constructed as follows.
The second loss is calculated by formula (6), where W_C denotes the trainable parameters of the classifier C, L_C(W_C) is the second loss, B is the batch size of the image samples, c is the total number of class labels, y_{j,i} is the actual probability that the j-th image sample in a batch of image samples belongs to the i-th class label, and p_{j,i} is the predicted probability that the j-th image sample belongs to the i-th class label.
The second loss constrains the trainable parameters W_C of classifier C, so that model C is trained in the direction of a high predicted probability for the class label to which a feature vector belongs and low predicted probabilities for the class labels to which it does not belong.
On this basis, when the feature extraction model and the classifier are jointly trained, the loss function used during the joint training can be constructed by formula (7):
loss = L_E(W_E) + L_C(W_C) ……………………… (7)
where loss is the loss value during joint training, L_E(W_E) is the first loss, and L_C(W_C) is the second loss.
Specifically, after the joint-training loss is computed by formula (7), the parameters (W_E, W_C) of model E and model C are optimized by a conventional deep learning network optimization method, i.e., by minimizing the loss with respect to (W_E, W_C).
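As an illustration of how formulas (5), (6) and (7) could be combined in training code, the sketch below assumes D(v) is a squared Euclidean distance to the class center and the second loss is standard cross-entropy; the patent gives these formulas only as images, so this is a hedged reading rather than the verbatim method, and all names are illustrative.

```python
import torch.nn.functional as F

def joint_loss(features, labels, class_centers, logits):
    """Hedged reading of loss = L_E(W_E) + L_C(W_C) (formula (7)).

    Assumptions (the loss formulas are given only as images in the patent):
      - D(v) is taken as the squared Euclidean distance to the class center;
      - the second loss is taken as standard cross-entropy over the c labels.
    features:      (B, n) feature vectors v output by model E
    labels:        (B,)   class-label indices of the batch
    class_centers: (c, n) class-center feature vectors v_CF
    logits:        (B, c) classifier outputs of model C before softmax
    """
    centers = class_centers[labels]                              # v_CF of each sample's own label
    first_loss = ((features - centers) ** 2).sum(dim=1).mean()   # L_E: mean assumed D(v_j) over batch
    second_loss = F.cross_entropy(logits, labels)                # L_C: cross-entropy (assumed form)
    return first_loss + second_loss                              # formula (7)
```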
compared with the related art, the embodiment obtains the image samples of the living body and the non-living body containing the human face and the class labels of the image samples; the category labels include a plurality of category labels belonging to living bodies, and a plurality of category labels belonging to non-living bodies; taking an image sample as input and a feature vector of the image sample as output, and constructing a feature extraction model; taking a feature vector output by the feature extraction model as input, taking the probability that the feature vector belongs to each class of label as output, and constructing a classifier; and performing combined training on the feature extraction model and the classifier, wherein a loss function in the combined training is constructed on the basis of a first loss between a feature vector output by the feature extraction model and a class center feature vector of a class label to which the feature vector belongs and a second loss between a prediction class output by the classifier and the class label. The scheme is different from the traditional category labels with living bodies and non-living bodies as the image samples, and uses a plurality of category labels belonging to the living bodies and a plurality of category labels belonging to the non-living bodies as the category labels for the image samples, so that the subdivision conditions of the living bodies and the non-living bodies can be further mined. Meanwhile, when the feature extraction model and the classifier are jointly trained, a loss function of the joint training is constructed according to a first loss between the feature vector extracted by the feature extraction model and the class center feature vector of the class label to which the feature vector belongs and a second loss between the prediction class output by the classifier and the class label, so that the feature extraction capability and the classification prediction capability of the model can be greatly improved, the generalization capability of the trained model is further improved, and the judgment capability of the non-living body type with less or even no samples in the training set is remarkably improved.
Another embodiment of the present invention relates to a living body detection method implemented based on the above-described model training method. As shown in fig. 4, the living body detecting method includes the following steps.
Step 301: and processing the face image to be detected by adopting a feature extraction model and a classifier obtained by joint training of a model training method to obtain a feature vector of the face image and the probability of the feature vector belonging to each class of labels.
Specifically, the feature extraction model E obtained by training with the model training method is used to perform feature extraction on the face image to be detected, so as to obtain a feature vector v corresponding to the face image. And classifying the feature vector v corresponding to the face image to be detected output by the feature extraction model E by using the classifier C obtained by training by using the model training method to obtain the probability p that the feature vector belongs to each class of label.
Step 302: and determining a first probability that the face image belongs to the living body based on the cosine value of an included angle between the feature vector and the class center feature vector of each class of label belonging to the living body.
Specifically, the cosine of the angle between the feature vector v obtained by feature extraction and the class-center feature vector v_CF^i of each class label belonging to living bodies is computed, where v_CF^i denotes the class-center feature vector of the i-th class label belonging to living bodies, that is, the i-th label among the living-body class labels {100, 101, 102, 103, 104}.
The first probability p_E is then calculated according to formula (8), in which max_i denotes taking the maximum of the corresponding cosine values over the plurality of class labels belonging to living bodies.
In other words, after the cosine of the angle between the feature vector v of the face image to be detected and the class-center feature vector of each class label belonging to living bodies is obtained, the first probability p_E can be obtained according to formula (8).
Step 303: and determining a second probability that the face image belongs to the living body based on the probability that the feature vector belongs to each class label of the living body.
Specifically, after the probability p_i that the feature vector belongs to each class label of the living body is obtained, the second probability p_C can be calculated according to the following formula (9):
p_C = Σ p_i ……………………… (9)
wherein the i-th class label ranges over the living-body class labels {100, 101, 102, 103, 104}.
Step 304: and determining the final probability that the face image belongs to the living body based on the first probability and the second probability.
Specifically, after the first probability p_E and the second probability p_C are obtained, the final probability P can be calculated according to the following formula (10):
P = d × p_E + (1 - d) × p_C ……………………… (10)
where d is a hyper-parameter, empirically set to 0.1.
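As a non-authoritative sketch of steps 301 to 304, the following Python fragment combines the two probabilities; it assumes formula (8) reads as the maximum cosine similarity over the living-body class centers (the formula itself is given only as an image), and the function and variable names are illustrative.

```python
import torch.nn.functional as F

def live_probability(feature, probs, live_centers, live_label_ids, d=0.1):
    """Hedged sketch of steps 302-304.

    Assumption: formula (8), shown only as an image, is read here as
    p_E = max_i cos(v, v_CF^i) over the living-body class labels; formulas
    (9) and (10) follow the text directly. All names are illustrative.
    feature:        (n,)   feature vector v of the face image to be detected
    probs:          (c,)   classifier output p over all class labels
    live_centers:   (k, n) class-center feature vectors of the living-body labels
    live_label_ids: index list of the living-body labels within probs
    """
    cos = F.cosine_similarity(feature.unsqueeze(0), live_centers, dim=1)
    p_e = cos.max().item()                      # first probability (formula (8), as assumed)
    p_c = probs[live_label_ids].sum().item()    # second probability (formula (9))
    return d * p_e + (1.0 - d) * p_c            # final probability (formula (10))
```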
Compared with the prior art, the embodiment of the invention processes the face image to be detected through the feature extraction model and the classifier obtained by the combined training of the model training method to obtain the feature vector of the face image and the probability of the feature vector belonging to each class of labels; determining a first probability that the face image belongs to the living body based on cosine values of included angles between the feature vectors and class center feature vectors of various classes of labels belonging to the living body; determining a second probability that the face image belongs to the living body based on the probability that the feature vector belongs to each class label of the living body; and determining the final probability that the face image belongs to the living body based on the first probability and the second probability.
In the scheme, the adopted feature extraction model and the classifier are obtained by joint training of the image samples marked by the plurality of category labels belonging to the living body and the plurality of category labels belonging to the non-living body, so that the subdivision conditions of the living body and the non-living body can be further detected. Meanwhile, when the feature extraction model and the classifier are jointly trained, a loss function of the joint training is constructed according to a first loss between the feature vector extracted by the feature extraction model and the class center feature vector of the class label to which the feature vector belongs and a second loss between the prediction class output by the classifier and the class label, so that the feature extraction capability and the classification prediction capability of the model can be greatly improved, the generalization capability of the trained model is further improved, and the judgment capability of the non-living body type with less or even no samples in the training set is remarkably improved. On the basis, when the living body detection is carried out, the probability that the face to be detected belongs to the living body is judged together based on the first probability obtained by the feature extraction model and the second probability obtained by the classifier, so that the accuracy of the living body detection is improved.
Another embodiment of the invention relates to an electronic device, as shown in FIG. 5, comprising at least one processor 402; and a memory 401 communicatively coupled to the at least one processor 402; wherein the memory 401 stores instructions executable by the at least one processor 402, the instructions being executable by the at least one processor 402 to enable the at least one processor 402 to perform any of the method embodiments described above.
Where the memory 401 and the processor 402 are coupled by a bus, which may include any number of interconnected buses and bridges that couple one or more of the various circuits of the processor 402 and the memory 401 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 402 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 402.
The processor 402 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 401 may be used to store data used by processor 402 in performing operations.
Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes any of the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing the relevant hardware; the program is stored in a storage medium and includes several instructions for causing a device (such as a single-chip microcomputer or a chip) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A method of model training, comprising:
acquiring image samples of living bodies and non-living bodies containing human faces and class labels of the image samples; the category labels include a plurality of category labels belonging to living bodies, and a plurality of category labels belonging to non-living bodies;
taking the image sample as input and the feature vector of the image sample as output to construct a feature extraction model;
taking a feature vector output by the feature extraction model as input, and taking the probability that the feature vector belongs to each class label as output to construct a classifier;
and performing combined training on the feature extraction model and the classifier, wherein a loss function in the combined training is constructed on the basis of a first loss between a feature vector output by the feature extraction model and a class center feature vector of a class label to which the feature vector belongs and a second loss between a prediction class output by the classifier and the class label.
2. The method of claim 1, wherein the obtaining of image samples of living and non-living subjects including a human face and class labels of the image samples comprises:
acquiring original images of a living body and a non-living body containing a human face;
labeling an original image belonging to a living body based on a plurality of category labels predefined according to the age group of the living body;
labeling an original image belonging to a non-living body based on a plurality of category labels predefined according to non-living body materials;
extracting a specified number of original images from the original images as the image samples;
and the original images with the specified number cover all the class labels, and the number of the original images corresponding to each class label is the same.
3. The method of claim 2, wherein the extracting a specified number of original images from the original images as the image samples comprises:
randomly taking m original images from the original images aiming at the original images of any category of labels, and judging the magnitude relation between m and a quotient value obtained by dividing the designated number by the total category label number;
if the m is larger than the quotient value, deleting partial original images from the m original images to enable the number of the remaining original images to be equal to the quotient value;
if m is smaller than the quotient value, randomly taking part of original images from the original images of the class labels again to enable the total number of the original images to be equal to the quotient value;
and taking the M original images of all the selected categories as M image samples to randomly disorder the sequence, and averagely dividing the M original images into M/B batches, wherein B is the batch size.
4. The method of claim 1, wherein the first loss is constructed by:
calculating, by formula (1), the distance D(v) between the feature vector v and the class-center feature vector v_CF of the class label to which the feature vector belongs;
calculating the first loss by formula (5);
wherein W_E denotes the trainable parameters of the feature extraction model E, L_E(W_E) is the first loss, B is the batch size of the image samples, and D(v_j) is the distance corresponding to the j-th image sample v_j in a batch of image samples.
5. The method of claim 4, wherein before calculating, by formula (1), the distance D(v) between the feature vector v and the class-center feature vector v_CF of the class label to which the feature vector belongs, the method comprises:
updating each class-center feature vector v_CF used in the current training period by formulas (2) and (3);
wherein v_CF is the updated class-center feature vector, the class-center feature vector before updating is the one used in the previous training period, a is a hyper-parameter, b is the number of image samples under the same class label, and e_k is the vector difference between the feature vector v_k of the k-th image sample under the same class label and the corresponding class-center feature vector before updating.
6. The method of claim 4, wherein the second loss is constructed by:
calculating the second loss by formula (6);
wherein W_C denotes the trainable parameters of the classifier C, L_C(W_C) is the second loss, B is the batch size of the image samples, c is the total number of class labels, y_{j,i} is the actual probability that the j-th image sample in a batch of image samples belongs to the i-th class label, and p_{j,i} is the predicted probability that the j-th image sample belongs to the i-th class label.
7. The method of claim 1, wherein the loss function in the joint training is constructed by the following formula:
loss = L_E(W_E) + L_C(W_C)
wherein loss is the loss value during the joint training, L_E(W_E) is the first loss, and L_C(W_C) is the second loss.
8. A method of in vivo detection, comprising:
processing a face image to be detected by using a feature extraction model and a classifier obtained by joint training according to the model training method of any one of claims 1 to 7 to obtain a feature vector of the face image and the probability of the feature vector belonging to each class of labels;
determining a first probability that the face image belongs to the living body based on a cosine value of an included angle between the feature vector and class center feature vectors of various classes of labels belonging to the living body;
determining a second probability that the face image belongs to the living body based on the probability that the feature vector belongs to each class label of the living body;
and determining the final probability that the face image belongs to the living body based on the first probability and the second probability.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1 to 7 and the liveness detection method of claim 8.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the model training method according to any one of claims 1 to 7 and the in-vivo detection method according to claim 8.
CN202111463661.4A 2021-12-02 2021-12-02 Model training method, living body detection method, electronic device, and storage medium Active CN114299567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111463661.4A CN114299567B (en) 2021-12-02 2021-12-02 Model training method, living body detection method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111463661.4A CN114299567B (en) 2021-12-02 2021-12-02 Model training method, living body detection method, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN114299567A true CN114299567A (en) 2022-04-08
CN114299567B CN114299567B (en) 2022-11-18

Family

ID=80965390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111463661.4A Active CN114299567B (en) 2021-12-02 2021-12-02 Model training method, living body detection method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN114299567B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761411A (en) * 2022-11-24 2023-03-07 北京的卢铭视科技有限公司 Model training method, living body detection method, electronic device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329696A (en) * 2020-11-18 2021-02-05 携程计算机技术(上海)有限公司 Face living body detection method, system, equipment and storage medium
CN113609944A (en) * 2021-07-27 2021-11-05 东南大学 Silent in-vivo detection method
CN113705383A (en) * 2021-08-12 2021-11-26 南京英诺森软件科技有限公司 Cross-age face recognition method and system based on ternary constraint

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329696A (en) * 2020-11-18 2021-02-05 携程计算机技术(上海)有限公司 Face living body detection method, system, equipment and storage medium
CN113609944A (en) * 2021-07-27 2021-11-05 东南大学 Silent in-vivo detection method
CN113705383A (en) * 2021-08-12 2021-11-26 南京英诺森软件科技有限公司 Cross-age face recognition method and system based on ternary constraint

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANG XU ET AL.: "On Improving Temporal Consistency for Online Face Liveness Detection System", 《2021 ICCVW》 *
游锦成: "Research on Face Recognition Technology Based on Deep Learning", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761411A (en) * 2022-11-24 2023-03-07 北京的卢铭视科技有限公司 Model training method, living body detection method, electronic device, and storage medium
CN115761411B (en) * 2022-11-24 2023-09-01 北京的卢铭视科技有限公司 Model training method, living body detection method, electronic device, and storage medium

Also Published As

Publication number Publication date
CN114299567B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN107506761B (en) Brain image segmentation method and system based on significance learning convolutional neural network
CN108182441B (en) Parallel multichannel convolutional neural network, construction method and image feature extraction method
CN109558942B (en) Neural network migration method based on shallow learning
WO2019015246A1 (en) Image feature acquisition
CN108090472B (en) Pedestrian re-identification method and system based on multi-channel consistency characteristics
CN109376796A (en) Image classification method based on active semi-supervised learning
CN112308862A (en) Image semantic segmentation model training method, image semantic segmentation model training device, image semantic segmentation model segmentation method, image semantic segmentation model segmentation device and storage medium
CN109145944B (en) Classification method based on longitudinal three-dimensional image deep learning features
CN104268552B (en) One kind is based on the polygonal fine classification sorting technique of part
CN112633382A (en) Mutual-neighbor-based few-sample image classification method and system
Song et al. Hybrid deep autoencoder with Curvature Gaussian for detection of various types of cells in bone marrow trephine biopsy images
CN110705489B (en) Training method and device for target recognition network, computer equipment and storage medium
CN110414541B (en) Method, apparatus, and computer-readable storage medium for identifying an object
CN112819821A (en) Cell nucleus image detection method
CN114299567B (en) Model training method, living body detection method, electronic device, and storage medium
CN107729863B (en) Human finger vein recognition method
CN113870254A (en) Target object detection method and device, electronic equipment and storage medium
CN113313169A (en) Training material intelligent identification method, device and equipment based on deep learning
CN108460406B (en) Scene image attribute identification method based on minimum simplex fusion feature learning
CN116188428A (en) Bridging multi-source domain self-adaptive cross-domain histopathological image recognition method
CN114913404A (en) Model training method, face image living body detection method, electronic device and storage medium
CN112347879B (en) Theme mining and behavior analysis method for video moving target
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN113420636A (en) Nematode identification method based on deep learning and threshold segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220627

Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province

Applicant after: Hefei lushenshi Technology Co.,Ltd.

Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing

Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.

Applicant before: Hefei lushenshi Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant