CN115063374A - Model training method, face image quality scoring method, electronic device and storage medium - Google Patents

Model training method, face image quality scoring method, electronic device and storage medium

Info

Publication number: CN115063374A
Application number: CN202210731159.5A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: vector, normal distribution, quality, face image, face
Legal status: Pending
Inventors: 刘冲冲, 付贤强, 何武, 朱海涛, 户磊
Assignee (original and current): Hefei Dilusense Technology Co Ltd
Application filed by Hefei Dilusense Technology Co Ltd; priority to CN202210731159.5A

Classifications

    • G06T 7/00: Image analysis
    • G06V 10/70: Image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762: ... using clustering, e.g. of similar faces in social networks
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation
    • G06T 2207/20081: Training; Learning
    • G06T 2207/30168: Image quality inspection
    • G06T 2207/30201: Face

Abstract

The embodiments of the invention relate to the field of face image processing and disclose a model training method, a face image quality scoring method, an electronic device and a storage medium. In the invention, the model training method comprises the following steps: constructing a feature coding network for extracting a face feature vector from a face image, wherein the face feature vector, taken as a sample, obeys a first normal distribution; constructing a clustering center network for generating two clustering center vectors describing good and bad face image quality, wherein the two clustering center vectors, each taken as a sample, obey second normal distributions; constructing a classification probability model for generating the classification probability values of the face image belonging to good quality and bad quality; and taking unlabeled face images as training samples and jointly training the feature coding network, the clustering center network and the classification probability model to obtain the trained feature coding network and clustering center network.

Description

Model training method, face image quality scoring method, electronic device and storage medium
Technical Field
The embodiment of the invention relates to the field of face image processing, in particular to a model training method, a face image quality scoring method, electronic equipment and a storage medium.
Background
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or realizes human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its performance. It is the core of artificial intelligence and the fundamental way to endow computers with intelligence.
Training data refers to the data used to train an algorithm model in a machine learning process. The algorithm learns from the training data: it finds relationships in the data, forms an understanding, makes decisions and evaluates confidence. The better the quality of the training data, the better the model performs.
The quality of a face image is influenced by factors such as illumination, face pose angle, face contrast, face integrity, face blur, expression exaggeration and makeup heaviness. Therefore, some methods adopt a plurality of sub-modules corresponding to the factors affecting face image quality, for example sub-modules that respectively detect the degree of blur, the degree of integrity and the magnitude of the pose angle of the face image, and an industry expert comprehensively gives the face image quality score according to the output of each sub-module; however, the data collection and labeling on which such methods depend are time-consuming and labor-intensive, and rapid scoring of face image quality is difficult to realize.
Disclosure of Invention
The embodiment of the invention aims to provide a model training method, a face image quality grading method, electronic equipment and a storage medium, which can realize rapid grading of face image quality.
In order to solve the above technical problem, an embodiment of the present invention provides a model training method, including: constructing a feature coding network for extracting a face feature vector from a face image, wherein the face feature vector, taken as a sample, obeys a first normal distribution; constructing a clustering center network for generating two clustering center vectors describing good and bad face image quality, wherein the two clustering center vectors, each taken as a sample, obey second normal distributions; constructing a classification probability model for generating the classification probability values of the face image belonging to good quality and bad quality; and taking unlabeled face images as training samples and jointly training the feature coding network, the clustering center network and the classification probability model to obtain the trained feature coding network and clustering center network; wherein the loss of the joint training comprises: the distance loss of the face feature vector to the two clustering center vectors, weighted by the corresponding classification probability values, and the distance loss between the first normal distribution obeyed (as a sample) by the face feature vectors corresponding to the face images labeled as good quality and the second normal distributions obeyed (as samples) by the two clustering center vectors respectively.
In order to solve the above technical problem, an embodiment of the present invention further provides a face image quality scoring method, including the following steps: inputting the face image to be recognized into the feature coding model to obtain the third normal distribution obeyed, as a sample, by the face feature vector corresponding to the face image to be recognized; and determining the quality of the face image to be recognized according to the distances from that third normal distribution to the fourth normal distributions obeyed, as samples, by the two clustering center vectors output by the clustering center model; the feature coding model and the clustering center model are obtained through training by the above model training method.
An embodiment of the present invention also provides an electronic device, including: at least one processor; a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method or the face image quality scoring method described above.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to realize the above model training method or the human face image quality scoring method.
According to the embodiments of the invention, a feature coding model is constructed to encode the face image and represent it by a face feature vector, and a clustering center model and a classification probability model are constructed to operate on the face feature vector and cluster the face images around two cluster centers, one of good quality and one of bad quality. The face feature vector of a face image, the cluster center vector of the good-quality classification and the cluster center vector of the bad-quality classification are each expressed by a normal distribution, converting image computation into correlation computation between different normal distributions. Taking a large number of unlabeled face images and a small number of face images labeled as good quality as training samples, the feature coding network, the clustering center network and the classification probability model are jointly trained, which effectively reduces the data labeling workload and improves model training efficiency. Among the losses adopted in the joint training, the distance loss of the face feature vector to the two cluster center vectors, weighted by the corresponding classification probability values, effectively drives the face feature vectors toward the two cluster center vectors of the good-quality and bad-quality classes; meanwhile, the distance loss between the first normal distribution obeyed (as a sample) by the face feature vectors of the small number of face images labeled as good quality and the second normal distributions obeyed (as samples) by the two cluster center vectors makes it possible to verify which of the two cluster centers trained on unlabeled samples is the good-quality one and which the bad-quality one, and finally to determine the cluster center vectors of the good-quality and bad-quality centers. Quality scoring of a face image to be recognized can then be performed based on the trained feature coding network and clustering center network, realizing face image quality scoring under weak supervision.
In addition, the feature encoding network includes: a feature coding model and a first generation module. Constructing a feature coding network for extracting a face feature vector from a face image, wherein the face feature vector as a sample obeys a first normal distribution, comprises the following steps: constructing a feature coding model whose input is a face image and whose output is a first mean vector and a first standard deviation vector; sampling from the standard normal distribution through the first generation module to obtain a first sampling value vector, and constructing the first normal distribution through the following formula:

z_n^(b) = μ_n^(b) ⊕ (var_n^(b) ⊗ s_n^(b))   …(1)

wherein z_n^(b) is the face feature vector, s_n^(b) the first sampling value vector, μ_n^(b) the first mean vector and var_n^(b) the first standard deviation vector; b = 1,2,…,B, where B is the batch size of the training samples; n is the vector index; ⊕ denotes element-wise addition and ⊗ element-wise multiplication. In the present application, a first sampling value vector is obtained by sampling from the standard normal distribution, and the first normal distribution is constructed with the first mean vector as mean and the first standard deviation vector as standard deviation, representing image features in reduced dimensions and describing the face feature vector obtained by encoding the face image with a normal distribution.
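As an illustration only, the encoder and the resampling of formula (1) might be sketched in PyTorch as follows; the backbone layers, the class name FeatureEncoder and the choice N = 128 are assumptions of this sketch, not an architecture specified by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N = 128  # vector length N; the text suggests N = 128 as a typical value

class FeatureEncoder(nn.Module):
    """Feature coding model E: face image -> first mean / standard deviation vectors."""
    def __init__(self, n=N):
        super().__init__()
        self.backbone = nn.Sequential(  # placeholder convolutional backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu_head = nn.Linear(64, n)   # first mean vector mu_n
        self.var_head = nn.Linear(64, n)  # first standard deviation vector var_n

    def forward(self, x):
        h = self.backbone(x)
        mu = self.mu_head(h)
        var = F.softplus(self.var_head(h))  # keep the standard deviation positive
        return mu, var

def sample_feature(mu, var):
    """Formula (1): z = mu + var * s, with s drawn from a standard normal."""
    s = torch.randn_like(mu)  # first sampling value vector s_n^(b)
    return mu + var * s       # element-wise addition and multiplication
```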
In addition, the cluster center network includes: a cluster center model and a second generation module. Constructing a cluster center network for generating two cluster center vectors describing good and bad face image quality, wherein the two cluster center vectors each obey a second normal distribution as samples, comprises the following steps: constructing a cluster center model which has no input and outputs two vector expressions describing good and bad face image quality, each vector expression comprising a second mean vector and a second standard deviation vector; sampling from the standard normal distribution through the second generation module to obtain a second sampling value vector, and constructing the second normal distribution through the following formula:

C_{k,n} = C_{k,1,n} ⊕ (C_{k,2,n} ⊗ ss_n)

wherein C_{k,n} is the cluster center vector; ss_n is the second sampling value vector; C_{k,1,n} is the second mean vector and C_{k,2,n} the second standard deviation vector; k = 1, 2 represents good quality and bad quality respectively; n is the vector index; ⊕ denotes element-wise addition and ⊗ element-wise multiplication. In the present application, a second sampling value vector is obtained by sampling from the standard normal distribution, and the second normal distribution is constructed with the second mean vector as mean and the second standard deviation vector as standard deviation, so that the cluster center vectors corresponding to the two cluster centers are each described by a normal distribution.
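Since the cluster center model has no input, it can be realized simply as a pair of trainable mean/standard-deviation parameter tensors; a minimal sketch under that assumption (names and initialization are assumptions):

```python
import torch
import torch.nn as nn

class ClusterCenters(nn.Module):
    """Cluster center model C: no input; describes two centers, each by a
    second mean vector C_{k,1,n} and a second standard deviation vector C_{k,2,n}."""
    def __init__(self, n=128):
        super().__init__()
        self.mean = nn.Parameter(torch.randn(2, n))     # C_{k,1,n}, k = good/bad
        self.log_std = nn.Parameter(torch.zeros(2, n))  # parameterizes C_{k,2,n}

    def forward(self):
        std = self.log_std.exp()          # keep the standard deviation positive
        ss = torch.randn_like(self.mean)  # second sampling value vector ss_n
        centers = self.mean + std * ss    # resampled cluster center vectors C_{k,n}
        return centers, self.mean, std
```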
In addition, constructing the distance loss of the face features to the two cluster centers weighted by the corresponding classification probability values includes: calculating that distance loss by the following formula:

[The loss_c formula is given as an image in the original; it is computed from J_k^(b), var_n^(b), z_n^(b), C_{k,n} and the hyperparameter α_2.]

wherein loss_c is the distance loss of the face features to the two cluster centers weighted by the corresponding classification probability values, J_k^(b) are the classification probability values of the face image belonging to good quality and bad quality, var_n^(b) is the first standard deviation vector, z_n^(b) the face feature vector, C_{k,n} the cluster center vector, α_2 a hyperparameter and B the batch size of the training samples; b = 1,2,…,B; k = 1, 2 represents good quality and bad quality respectively; n is the vector index. In the method, the classification process of the face features is constrained using the classification probability values of the face image belonging to good quality and bad quality, the first standard deviation vector, the face feature vectors and the cluster center vectors; the face feature vectors are controlled to be evenly distributed around the two classifications, and the joint training constrains the classification mode of the classification probability model, the selection of the two classification cluster centers and the face encoding mode.
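The exact form of loss_c is only given as an image in the original; the sketch below implements one plausible probability-weighted distance loss consistent with the quantities listed above, and is an assumption rather than the patent's formula (in particular, the role of var_n^(b) is not recoverable from the text and is omitted here).

```python
import torch

def distance_loss(z, probs, centers, alpha2=1.0):
    """A plausible loss_c: squared distance of each feature vector to each
    cluster center, weighted by the probability of belonging to that class.
    z: (B, N) face feature vectors; probs: (B, 2) classification probabilities;
    centers: (2, N) cluster center vectors C_{k,n}."""
    d2 = ((z.unsqueeze(1) - centers.unsqueeze(0)) ** 2).sum(dim=2)  # (B, 2)
    return alpha2 * (probs * d2).sum(dim=1).mean()
```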
In addition, the loss of the joint training further comprises a classification probability value uniformity loss, which is constructed as follows: the classification probability value uniformity loss is calculated by the following formula:

[The loss_y formula is given as an image in the original; it is computed from J_k^(b) and the hyperparameter α_1.]

wherein loss_y is the classification probability value uniformity loss, J_k^(b) are the classification probability values of the face image belonging to good quality and bad quality, α_1 is a hyperparameter and B the batch size of the training samples; b = 1,2,…,B; k = 1, 2 represents good quality and bad quality respectively. In the present application, the classification probability values of the face image belonging to good quality and bad quality are used to construct the classification probability value uniformity loss and control how evenly the training-sample face images are apportioned between the two classifications, thereby constraining the encoding mode of the feature coding model and the classification mode of the classification probability model.
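The exact form of loss_y is likewise only given as an image; a plausible reading is a penalty that pushes the batch-average class probabilities toward 1/2 so that the unlabeled faces do not collapse into a single cluster. The sketch below is an assumption under that reading.

```python
import torch

def uniformity_loss(probs, alpha1=1.0):
    """A plausible classification probability value uniformity loss.
    probs: (B, 2) classification probabilities J_k^(b)."""
    mean_prob = probs.mean(dim=0)                   # batch-average per class
    return alpha1 * ((mean_prob - 0.5) ** 2).sum()  # deviation from uniform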
Drawings
One or more embodiments are illustrated by way of example with reference to the corresponding accompanying drawings; in the figures, like reference numerals denote similar elements, and the figures are not to scale unless otherwise specified.
FIG. 1 is a flow chart of steps of a model training method provided according to an embodiment of the invention;
FIG. 2 is a flowchart illustrating steps of a method for scoring facial image quality according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate, however, that numerous technical details are set forth in the embodiments in order to provide a better understanding of the present application; the technical solutions claimed in the present application can nevertheless be implemented without these technical details, and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and should not constitute any limitation on the specific implementation of the present invention; the embodiments may be combined with and referred to each other where not contradictory.
The embodiment of the invention relates to a model training method. The specific process is shown in FIG. 1.
Step 101, constructing a feature coding network for extracting a face feature vector from a face image, wherein the face feature vector is used as a sample and obeys a first normal distribution;
Step 102, constructing a clustering center network for generating two clustering center vectors describing good and bad face image quality, wherein the two clustering center vectors, each taken as a sample, obey second normal distributions;
Step 103, constructing a classification probability model for generating the classification probability values of the face image belonging to good quality and bad quality;
Step 104, taking unlabeled face images as training samples and jointly training the feature coding network, the clustering center network and the classification probability model to obtain the trained feature coding network and clustering center network;
wherein the loss of the joint training comprises: the distance loss of the face feature vector to the two clustering center vectors, weighted by the corresponding classification probability values, and the distance loss between the first normal distribution obeyed (as a sample) by the face feature vectors corresponding to the face images labeled as good quality and the second normal distributions obeyed (as samples) by the two clustering center vectors respectively.
The model training method of this embodiment is used in electronic devices capable of machine learning, such as computers, tablets and mobile phones. Since training data is the data used to train an algorithm model in the machine learning process, obtaining good training data is an important problem in machine learning. In algorithms for face image processing, face image quality is affected by factors such as illumination, face pose angle, face contrast, face integrity, face blur, expression exaggeration and makeup heaviness. Therefore, some methods adopt a plurality of sub-modules corresponding to the factors affecting face image quality, for example sub-modules that respectively detect the degree of blur, the degree of integrity and the magnitude of the pose angle of the face image, with an industry expert comprehensively giving the face image quality score according to the output of each sub-module; however, the data collection and labeling on which such methods depend are time-consuming and labor-intensive. Other methods label face images with scores according to subjective human perception and construct a deep learning network trained under supervision to directly obtain a scoring model matching subjective perception; but since many factors influence face image quality, accurately labeling quality scores is very difficult or even infeasible, so rapid scoring of face image quality is hard to realize.
According to the embodiment of the invention, based on a large number of unlabelled face images and a small number of face images labeled with good quality as training samples, the feature coding network, the clustering center network and the classification probability model are jointly trained, so that the workload of data labeling can be effectively reduced, and the model training efficiency is improved.
The following describes the implementation details of the model training method of the present embodiment in detail, and the following is only provided for the convenience of understanding and is not necessary for implementing the present embodiment.
In step 101, the electronic device constructs a feature coding network for extracting a face feature vector from a face image, where the face feature vector, taken as a sample, obeys a first normal distribution. The feature coding network may encode the face image along dimensions such as contrast, brightness and gray scale, or may extract the contour of the face structure in the face image and then encode the individual organs separately, for use in calculating the face pose angle, face contrast, face integrity, face blur degree, expression exaggeration degree and makeup heaviness. The feature vector obtained after encoding may be one-dimensional, or a two-dimensional, three-dimensional or N-dimensional vector.
In one example, the feature encoding network includes a feature coding model and a first generation module. Constructing a feature coding network for extracting face features from a face image, where the face feature vector as a sample obeys a first normal distribution, is realized as follows: a feature coding model is constructed whose input is a face image and whose output is a first mean vector and a first standard deviation vector.
The feature coding model may be denoted as E, and its trainable parameters as w_E. The input of E is a face image and the output is 2 one-dimensional feature vectors, each of length N, namely the first mean vector and the first standard deviation vector, denoted μ_n and var_n respectively, where n = 1,2,3,…,N; N is a hyperparameter greater than 2, set empirically, for example N = 128.
Specifically, each time the electronic device performs model training, it randomly selects B different face images from the data set, denoted x^(b), where b = 1,2,3,…,B; B is a positive-integer hyperparameter, set empirically, for example B = 64. The images x^(b) are input into the feature coding model E in sequence, giving the first mean vectors μ_n^(b) and first standard deviation vectors var_n^(b).
The electronic equipment determines that the face feature vector of the face image serving as a sample obeys a first normal distribution with the first mean vector as a mean and the first standard deviation vector as a standard deviation through a first generation module. The electronic device may construct a first normal distribution in an N-dimensional euclidean space or other N-dimensional spaces, so as to map all face images input by the feature coding model into face feature vectors.
In one example, after the sampling value vector is obtained by sampling in the predetermined number domain, the first normal distribution is constructed based on the sampling value vector.
Specifically, the electronic device may sample B times from the standard normal distribution (once for each of the B images), drawing N values each time, denoted as the first sampling value vector s_n^(b), and calculate the intermediate vector z_n^(b) from μ_n^(b), var_n^(b) and s_n^(b) to describe the first normal distribution. The sampling from the standard normal distribution may be random sampling or sampling at preset intervals.
In one example, the first generation module samples a first sampling value vector from the standard normal distribution and constructs the first normal distribution through the following formula:

z_n^(b) = μ_n^(b) ⊕ (var_n^(b) ⊗ s_n^(b))   …(1)

wherein z_n^(b) is the face feature vector, s_n^(b) the first sampling value vector, μ_n^(b) the first mean vector and var_n^(b) the first standard deviation vector; b = 1,2,…,B, where B is the batch size of the training samples, i.e., the number of face images used per training batch; n is the vector index, n = 1,2,…,N; ⊕ denotes element-wise addition and ⊗ element-wise multiplication.
In this embodiment, a first sampling value vector is obtained by sampling from the standard normal distribution, and the first normal distribution is constructed with the first mean vector as mean and the first standard deviation vector as standard deviation, representing image features in reduced dimensions and describing the face feature vector obtained by encoding the face image with a normal distribution.
In step 102, the electronic device constructs a cluster center network for generating two cluster center vectors describing good and bad face image quality, the two cluster center vectors each obeying a second normal distribution as samples. The cluster center network can be a one-layer or multi-layer network; it is used to calculate the positions of the two cluster centers of good and bad quality, and the position information is expressed by a vector, described by a second mean vector and a second standard deviation vector.
In one example, the cluster center network includes a cluster center model and a second generation module. Constructing a cluster center network for generating two cluster center vectors describing good and bad face image quality, wherein the two cluster center vectors each obey a second normal distribution as samples, is realized as follows: a cluster center model is constructed which has no input and whose output is two vector expressions describing good and bad face image quality, each vector expression comprising a second mean vector and a second standard deviation vector.
The cluster center model may be denoted as C, and its trainable parameters as w_C. C has no input; its output is the vector expression of two cluster centers, each expressed by two one-dimensional feature vectors of length N, namely a second mean vector and a second standard deviation vector.
The output of the cluster center model C is denoted C_{k,m,n}, where k indexes the two classes (k = 1 is the good-quality class, k = 2 the bad-quality class), m indexes the two vectors (m = 1 is the mean vector, m = 2 the standard deviation vector), and n is the vector index; i.e., C_{k,m,n} represents two classes k, each class corresponding to two vectors m, each of length N. Correspondingly, C_{1,1,n} is the second mean vector of the good-quality cluster center, C_{1,2,n} its second standard deviation vector, C_{2,1,n} the second mean vector of the bad-quality cluster center, and C_{2,2,n} its second standard deviation vector.
The electronic equipment determines that the clustering center vector of each clustering center serves as a sample and obeys a second normal distribution with a second mean vector as a mean value and a second standard deviation vector as a standard deviation in the vector expression of the clustering center through a second generation module. The electronic device may construct a second normal distribution in an N-dimensional euclidean space or other N-dimensional space, which represents a cluster center vector corresponding to two cluster centers to be output by the cluster center model.
In one example, after the sampling value vector is obtained by sampling in the predetermined number domain, the second normal distribution is constructed based on the sampling value vector.
Specifically, the electronic device may sample N values from the standard normal distribution, denoted as the second sampling value vector ss_n, and calculate the clustering center vector C_{k,n} from C_{k,1,n}, C_{k,2,n} and ss_n to describe the second normal distribution. The sampling from the standard normal distribution may be random sampling or sampling at preset intervals. The second sampling value vector ss_n may also reuse the first sampling value vector s_n^(b).
In one example, a second sampling value vector may be obtained by sampling from the standard normal distribution through the second generation module, and the second normal distribution is constructed through the following formula:

C_{k,n} = C_{k,1,n} ⊕ (C_{k,2,n} ⊗ ss_n)

wherein C_{k,n} is the clustering center vector, which as a sample obeys the second normal distribution; ss_n is the second sampling value vector; C_{k,1,n} is the second mean vector and C_{k,2,n} the second standard deviation vector; k = 1, 2 represents good quality and bad quality respectively; n is the vector index; ⊕ denotes element-wise addition and ⊗ element-wise multiplication.
In this embodiment, a second sampling value vector is obtained by sampling from the standard normal distribution, and the second normal distribution is constructed with the second mean vector as mean and the second standard deviation vector as standard deviation, so that the cluster center vectors corresponding to the two cluster centers are each described by a normal distribution.
In step 103, the electronic device constructs a classification probability model, which is used to generate the classification probability values of the face image belonging to good quality and bad quality. The classification probability model can divide the face feature vectors into two classes in a clustering manner and output, as probability values, the degree to which each face feature vector leans toward the two classifications. For example, it can be implemented based on the K-means algorithm, the CLARANS algorithm or other algorithms.
In one example, the classification probability model is denoted as J, and its trainable parameters as w_J. Its input is a one-dimensional feature vector of length N, namely the face feature vector; the penultimate layer outputs a one-dimensional vector of length 2, and the last layer is a conventional softmax layer whose output is the probability value obtained by normalizing the two values of the penultimate layer's output (corresponding to the good-quality and bad-quality classifications; for example, if the penultimate layer outputs (2, 3), the normalized probability values are 40% and 60%).
The output of J is denoted J_k^(b), where k = 1, 2; J_k is the classification probability of the two classes (k = 1 is the good-quality class, k = 2 the bad-quality class), i.e., J_1 is the probability that the face feature vector belongs to the good-quality class and J_2 the probability that it belongs to the bad-quality class; b = 1,2,…,B, where B is the batch size of the training samples and different values of b correspond to different input face images.
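A minimal sketch of such a classification head; the hidden width of 64 is an assumption, while the length-2 penultimate output and the final softmax follow the description above.

```python
import torch.nn as nn

class ClassProb(nn.Module):
    """Classification probability model J: face feature vector (length N)
    -> probabilities of the good-quality and bad-quality classes."""
    def __init__(self, n=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n, 64), nn.ReLU(),
            nn.Linear(64, 2),   # penultimate layer: one value per class
            nn.Softmax(dim=1),  # final softmax layer -> J_k^(b)
        )

    def forward(self, z):
        return self.net(z)
```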
In step 104, the electronic device takes unlabeled face images as training samples and jointly trains the feature coding network, the clustering center network and the classification probability model to obtain the trained feature coding network and clustering center network. The loss of the joint training includes: the distance loss of the face feature vector to the two cluster center vectors, weighted by the corresponding classification probability values, and the distance loss between the first normal distribution obeyed (as a sample) by the face feature vectors corresponding to the face images labeled as good quality and the second normal distributions obeyed (as samples) by the two cluster center vectors respectively. Joint training means, based on the coupling between models, constraining the parameters of multiple models with the loss calculated from the output of one model, constraining the parameters of one model with the loss calculated from the outputs of multiple models, or constraining the parameters of multiple models with the loss calculated from the outputs of multiple models.
In one example, constructing the distance loss of the face features to the two cluster centers weighted by the corresponding classification probability values can be realized as follows: calculate the distance between each face feature vector and the vectors of the two classification centers; compute statistics such as the minimum distance, maximum distance, mean and median between the face feature vectors and the classification center within each classification, and derive the face feature classification loss from them; or use another algorithm that measures whether the face feature vectors are evenly distributed across the two classifications.
In one example, the distance loss of the face features to the two cluster centers weighted by the corresponding classification probability values is calculated by the following formula:

[The loss_c formula is given as an image in the original; it is computed from J_k^(b), var_n^(b), z_n^(b), C_{k,n} and the hyperparameter α_2.]

wherein loss_c is the distance loss of the face features to the two cluster centers weighted by the corresponding classification probability values, J_k^(b) are the classification probability values of the face image belonging to good quality and bad quality, var_n^(b) is the first standard deviation vector, z_n^(b) the face feature vector, C_{k,n} the cluster center vector, α_2 a hyperparameter and B the batch size of the training samples; b = 1,2,…,B; k = 1, 2 represents good quality and bad quality respectively; n is the vector index.
In this embodiment, the classification process of the face features is constrained using the classification probability values of the face image belonging to good quality and bad quality, the first standard deviation vector, the face feature vectors and the cluster center vectors; the face feature vectors are controlled to be evenly distributed around the two classifications, and the joint training constrains the classification mode of the classification probability model, the selection of the two classification cluster centers and the face encoding mode.
In one example, when calculating the distance loss between the first normal distribution corresponding to good-quality face images and the second normal distributions, the electronic device may take the distance between their mean vectors (i.e., between the first mean vector and the second mean vector) as the distance between the distributions, or may calculate the distance loss based on a divergence (e.g., KL divergence or JS divergence).
In one example, the electronic device constructs the distance loss between the first normal distribution corresponding to face images labeled as good quality and the second normal distributions as follows:
B good-quality face images are input into the feature coding model (E) to obtain B good-quality first mean vectors and B good-quality first standard deviation vectors; the mean of the B good-quality first mean vectors is recorded as the good-quality mean vector, and the mean of the B good-quality first standard deviation vectors as the good-quality standard deviation vector;
the good-quality face feature vector of a good-quality face image is determined, as a sample, to obey a good-quality first normal distribution with the good-quality mean vector as mean and the good-quality standard deviation vector as standard deviation;
the distance loss between the first normal distribution corresponding to good-quality face images and the second normal distributions is calculated by the following formula:

[The loss_f formula is given as an image in the original; it is computed from the Kullback-Leibler divergences between p(zg_n) and p(C_{1,n}) and between p(zg_n) and p(C_{2,n}).]

wherein loss_f is the distance loss between the first normal distribution corresponding to good-quality face images and the second normal distributions; KL(p_1 || p_2) denotes the Kullback-Leibler divergence between distributions p_1 and p_2; p(zg_n) denotes the probability distribution of zg_n, which as a sample obeys the normal distribution with the good-quality mean vector mugm_n as mean and the good-quality standard deviation vector vargm_n as standard deviation; p(C_{1,n}) denotes the probability distribution of the good-quality cluster center vector C_{1,n}, which as a sample obeys the normal distribution with the good-quality second mean vector C_{1,1,n} as mean and the good-quality second standard deviation vector C_{1,2,n} as standard deviation; p(C_{2,n}) denotes the probability distribution of the bad-quality cluster center vector C_{2,n}, which as a sample obeys the normal distribution with the bad-quality second mean vector C_{2,1,n} as mean and the bad-quality second standard deviation vector C_{2,2,n} as standard deviation; n is the vector index.
In this embodiment, minimizing loss_f amounts to driving the distribution obeyed by zg_n as a sample toward the distribution obeyed by C_{1,n} as a sample and away from the distribution obeyed by C_{2,n} as a sample, which reveals which of the 2 cluster centers is the center of good-quality face images, C_{1,n}.
In this embodiment, the good-quality face feature vectors are expressed by the first normal distribution, the distance between the first normal distribution corresponding to good-quality face images and the second normal distributions is calculated by divergence, and a loss constraint is constructed on that distance; this determines the categories of the two classification cluster centers, i.e., which cluster center is the good-quality one and which the bad-quality one, so that the feature coding model, the cluster center model and the classification probability model can all be constrained.
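Because all distributions involved are diagonal Gaussians, the KL divergence has a closed form; below is a sketch of that computation together with one plausible form of loss_f (attract toward the good-quality center, repel from the bad-quality one; the exact combination in the patent is only given as an image, so this form is an assumption).

```python
import torch

def kl_diag_gauss(mu1, std1, mu2, std2):
    """Closed-form KL(N(mu1, std1^2) || N(mu2, std2^2)) for diagonal Gaussians,
    summed over the N vector dimensions."""
    var1, var2 = std1 ** 2, std2 ** 2
    kl = torch.log(std2 / std1) + (var1 + (mu1 - mu2) ** 2) / (2 * var2) - 0.5
    return kl.sum()

def center_separation_loss(mu_g, std_g, c_mean, c_std):
    """A plausible loss_f: pull the good-quality distribution toward the k=1
    center and push it away from the k=2 center.
    mu_g/std_g: good-quality mean and standard deviation vectors (length N);
    c_mean/c_std: (2, N) second mean and standard deviation vectors."""
    d_good = kl_diag_gauss(mu_g, std_g, c_mean[0], c_std[0])
    d_bad = kl_diag_gauss(mu_g, std_g, c_mean[1], c_std[1])
    return d_good - d_bad
```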
In one example, the loss of the joint training further includes a face feature loss. The face feature loss can be calculated by applying the same processing to the original face image and to the image restored from the face feature vector, extracting features from the processed results, and comparing whether the extracted features are similar.
In one example, the face feature loss may be constructed as follows:
constructing an image restoration model whose input is the face feature vector and whose output is a tensor in the shape of a face image, adding the image restoration model into the joint training process, and calculating the face feature loss through the following formula:

[The loss_e formula is given as an image in the original; it is computed from x^(b), dx^(b) and the hyperparameter α_4.]

wherein loss_e is the face feature loss, x^(b) the face image, dx^(b) the tensor in the shape of the face image, α_4 a hyperparameter and B the batch size of the training samples; b = 1,2,…,B.
The image restoration model is denoted as G, and its trainable parameters as w_G. The input of G is a one-dimensional feature vector of length N, namely the face feature vector, and its output is a tensor of the image shape, denoted dx, used to assist the training of E. z_n^(b) is input into the image restoration model G to obtain the image tensor dx^(b) corresponding to z_n^(b).
In this embodiment, by constructing the image restoration model, an image is restored from the encoding result after feature-encoding the face image, the difference between the restored image and the original face image is measured, and the joint training applies a loss constraint to the feature coding model and trains it.
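A sketch of an image restoration model and a reconstruction loss; the decoder layers, the 112x112 output resolution and the mean-squared-error form of loss_e are assumptions (the patent gives the loss_e formula only as an image).

```python
import torch
import torch.nn as nn

class ImageRestorer(nn.Module):
    """Image restoration model G: face feature vector (length N) -> a tensor
    of face-image shape dx; used only to assist the training of E."""
    def __init__(self, n=128, out_shape=(3, 112, 112)):
        super().__init__()
        c, h, w = out_shape
        self.net = nn.Sequential(nn.Linear(n, 256), nn.ReLU(),
                                 nn.Linear(256, c * h * w))
        self.out_shape = out_shape

    def forward(self, z):
        return self.net(z).view(-1, *self.out_shape)

def restoration_loss(x, dx, alpha4=1.0):
    """A plausible loss_e: mean squared error between the face image x^(b)
    and its restoration dx^(b)."""
    return alpha4 * ((x - dx) ** 2).mean()
```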
In one example, the loss of the joint training further includes the classification probability value uniformity loss. The distance between each face feature vector and the vectors of the two classification centers may be calculated, and the classification probability value uniformity loss computed from the ratio of the distance values, or from the density of the face feature vector positions within the two classifications, or by any other algorithm that can measure how evenly the training-sample face images are apportioned between the two classifications.
In one example, the classification probability value uniformity loss is calculated by the following formula:

[The loss_y formula is given as an image in the original; it is computed from J_k^(b) and the hyperparameter α_1.]

wherein loss_y is the classification probability value uniformity loss, J_k^(b) are the classification probability values of the face image belonging to good quality and bad quality, α_1 is a hyperparameter, and B is the batch size of the training samples, i.e., the number of face images used per training batch; b = 1,2,…,B; k = 1, 2 represents good quality and bad quality respectively.
In this embodiment, through the classification probability values of the face images belonging to good quality and bad quality, the uniformity loss of the classification probability values can be constructed, and the uniformity degree of the biased probability of the face images of the training samples between the two classifications is controlled, so that the coding mode of the feature coding model and the classification mode of the classification probability model are constrained.
In one example, the loss function of the joint training may be obtained by simply adding the face feature loss, the classification probability value uniformity loss, the face feature classification loss and the distance loss between the first normal distribution corresponding to face images labeled as good quality and the second normal distributions, or by a weighted sum of these four losses.
In one example, the loss function of the joint training is set as follows:

loss = loss_e + loss_y + loss_c + loss_f   …(7)

wherein α_4 is a hyperparameter greater than 0, set empirically; for example, α_4 = 1.0 may be taken.
The electronic device may optimize the parameters of E, G, C and J according to conventional deep learning network optimization methods, such as gradient descent:

w ← w − η · ∂loss/∂w

where w stands for the trainable parameters w_E, w_G, w_C, w_J and η is the learning rate. Convergence of E, G, C, J is judged according to conventional methods, such as whether training has run a specified number of iterations or the loss no longer decreases significantly; if E, G, C, J have converged, training stops, otherwise a new round of training begins.
The steps of the above methods are divided only for clarity of description; in implementation they may be merged into one step, or a step may be split into multiple steps, and as long as the same logical relationship is preserved such variants are within the protection scope of this patent; adding insignificant modifications to the algorithm or process, or introducing insignificant design changes without altering the core design, is likewise within the protection scope of the patent.
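Before moving on to the scoring method, the pieces sketched above can be combined into one joint-training iteration; E, G, C, J and the loss helpers are the hypothetical modules from the earlier sketches, and the unweighted sum mirrors formula (7).

```python
def train_step(E, G, C, J, optimizer, x, x_good):
    """One joint-training iteration over an unlabeled batch x and a batch
    x_good of face images labeled as good quality."""
    mu, var = E(x)                # first mean / standard deviation vectors
    z = sample_feature(mu, var)   # formula (1): face feature vectors
    dx = G(z)                     # restored images
    probs = J(z)                  # classification probability values J_k^(b)
    centers, c_mean, c_std = C()  # resampled cluster center vectors C_{k,n}

    mu_g, var_g = E(x_good)       # encode the labeled good-quality batch
    loss = (restoration_loss(x, dx)                  # loss_e
            + uniformity_loss(probs)                 # loss_y
            + distance_loss(z, probs, centers)       # loss_c
            + center_separation_loss(mu_g.mean(0), var_g.mean(0),
                                     c_mean, c_std)) # loss_f
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```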
The embodiment of the invention also relates to a face image quality scoring method. The specific flow is shown in fig. 2.
Step 201, inputting a face image to be recognized into a feature coding model, and obtaining a face feature vector corresponding to the face image to be recognized as a third normal distribution obeyed by a sample;
step 202, determining the quality of the facial image to be recognized according to the distance from the third normal distribution, to which the facial feature vector corresponding to the facial image to be recognized obeys, serving as a sample to the fourth normal distribution, to which the two clustering center vectors output by the clustering center model serve as sample obeys respectively; the feature coding model and the clustering center model are obtained through training by the model training method.
The implementation details of the face image quality scoring method according to the present embodiment are specifically described below, and the following description is only provided for the convenience of understanding, and is not necessary for implementing the present embodiment.
The third normal distribution of this embodiment, like the first normal distribution of the above embodiment, is the normal distribution obeyed, as a sample, by the face feature vector obtained by inputting a face image into the feature coding model; the fourth normal distribution, like the second normal distribution of the above embodiment, is the normal distribution obeyed, as a sample, by each of the two cluster center vectors output by the cluster center model.
The difference is that the first normal distribution is produced by the feature coding model during training, while the third normal distribution is produced by the trained feature coding model; likewise, the second normal distributions are produced by the cluster center model during training, while the fourth normal distributions are produced by the trained cluster center model. Since model training is the process of iteratively revising the values of the trainable parameters, the feature coding model corresponding to the first normal distribution is the not-yet-trained model with undetermined parameters, and the one corresponding to the third normal distribution is the trained model with determined parameters; the cluster center model corresponding to the second normal distribution is the not-yet-trained model with undetermined parameters, and the one corresponding to the fourth normal distribution is the trained model with determined parameters. This embodiment therefore distinguishes the models before and after joint training by the third versus first normal distribution and the fourth versus second normal distribution, respectively.
In step 201, the electronic device records the face image whose quality is to be scored as x and inputs x into the trained feature coding model E to obtain the first mean vector mux_n and first standard deviation vector varx_n corresponding to the face image to be recognized; the output of C is C_{k,m,n}.
The face feature vector of the face image to be quality-scored is recorded as zx_n, and the third normal distribution obeyed by zx_n as a sample is determined, i.e., zx_n is calculated from mux_n and varx_n in the same manner as in equation (1).
In step 202, the electronic device calculates the distance from the third normal distribution to the fourth normal distribution of the two cluster centers, and determines the quality of the facial image to be recognized according to the distance.
The distance from the third normal distribution obeyed by zx_n as a sample to the fourth normal distribution obeyed, as a sample, by the good-quality cluster center vector C_{1,n} is denoted d_g; the distance from that third normal distribution to the fourth normal distribution obeyed, as a sample, by the bad-quality cluster center vector C_{2,n} is denoted d_b.
In one example, the distances between the third normal distribution obeyed by the face feature vector corresponding to the face image to be recognized as a sample and the fourth normal distributions obeyed by the two cluster center vectors output by the cluster center model as samples are calculated by the following formulas:

d_g = KL(p(zx_n) || p(C_{1,n}))   …(8)

d_b = KL(p(zx_n) || p(C_{2,n}))   …(9)

wherein d_g is the distance from the third normal distribution to the fourth normal distribution obeyed by the good-quality cluster center vector as a sample, and d_b the distance from the third normal distribution to the fourth normal distribution obeyed by the bad-quality cluster center vector as a sample; KL(p_1 || p_2) denotes the Kullback-Leibler divergence between distributions p_1 and p_2; p(zx_n) denotes the probability distribution of zx_n, the face feature vector obeying the third normal distribution as a sample; p(C_{1,n}) denotes the probability distribution of the good-quality cluster center vector C_{1,n}, which as a sample obeys the normal distribution with the good-quality second mean vector C_{1,1,n} as mean and the good-quality second standard deviation vector C_{1,2,n} as standard deviation; p(C_{2,n}) denotes the probability distribution of the bad-quality cluster center vector C_{2,n}, which as a sample obeys the normal distribution with the bad-quality second mean vector C_{2,1,n} as mean and the bad-quality second standard deviation vector C_{2,2,n} as standard deviation; n is the vector index.

The quality score score of the face image to be recognized is then calculated by the following formula:

[The quality score formula is given as an image in the original; it computes score from d_g and d_b.]
in this embodiment, by calculating the distance between the third normal distribution and the two second normal distributions with divergence, the quality of the face image can be measured by the normal distribution corresponding to the face image, the second normal distribution corresponding to the cluster center with good quality, and the distance of the second normal distribution corresponding to the cluster center with bad quality, so as to quantify the scoring standard, and further score the face image.
In a specific embodiment, the following four models are constructed, training of a face image quality scoring model is realized in a weak supervision training mode based on the four models, and the trained models are used for scoring the face image quality.
1. Construct a deep learning convolutional network E as the feature coding model, with trainable parameters w_E; the input of E is a face image and the output is 2 one-dimensional feature vectors of length N each, recorded as the mean vector μ_n and the standard deviation vector var_n, where n = 1,2,3,…,N; N is a hyperparameter greater than 2, set empirically, for example N = 128;
2. Construct a deep learning convolutional network G as the image restoration model, with trainable parameters w_G; the input of G is a one-dimensional feature vector of length N and the output is a tensor of the image shape, recorded as dx;
3. Construct a deep learning network C as the cluster center model, with trainable parameters w_C; C has no input, and the output is the vector expression of two cluster centers, each represented by two one-dimensional feature vectors of length N (a mean vector and a standard deviation vector), i.e., 4 vectors of length 1×N in total, which can be expressed as a feature tensor of shape 2×2×N;
4. Construct a deep learning network J as the classification probability model, with trainable parameters w_J; the input is a one-dimensional feature vector of length N, the output of the penultimate layer is a one-dimensional vector of length 2, and the last layer is a conventional softmax layer whose output is the probability value obtained by normalizing the two values of the penultimate layer's output vector (for example, if the penultimate layer outputs (2, 3), the normalized probability values are 40% and 60%). J is used to binary-classify the input vector.
The four models were trained as follows:
step 1, selecting N from data set g As a good quality face image as a good quality subset D g In which N is g Is a hyperparameter greater than 1, and can be set empirically, for example, N can be taken g Divide by D in dataset 200 g The face image data set remaining outside is marked as a remaining subset D, i.e. the current data set is divided into good quality subsets D g And a residual subset D (the D has good quality face images and also has bad quality face images, and all face images are not marked);
step 2, randomly taking B different face images from the residue subset D and recording the B different face images as x (b) Wherein B is 1,2,3, …, B is less than N g The positive integer superparameter (B samples taken each time are smaller than the number of images of the good quality subset Dg) is set empirically, for example, B is 64;
step 3, mixing x (b) Sequentially inputting the average value vector (i.e. the first average value vector) mu into the feature coding model E n (b) And the standard deviation vector (i.e., the first standard deviation vector) var n (b)
Step 4, sampling B (corresponding to B images of one sample) times from the standard normal distribution, sampling n values each time, and recording as a sampling value (first sampling value vector) s n (b) (ii) a According to the mean vector mu n (b) Standard deviation vector var n (b) And the sampled value s n (b) Calculating the intermediate vector z n (b) (i.e., face feature vector):
z_n^(b) = mu_n^(b) ⊕ var_n^(b) ⊗ s_n^(b) …………………(1)
where ⊕ and ⊗ denote element-by-element addition and element-by-element multiplication between vectors, respectively (multiplication is applied before addition);
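Formula (1) is the standard reparameterization trick; a minimal sketch of step 4, continuing the PyTorch sketch above (the function name is an assumption):

```python
def reparameterize(mu, var):
    """Formula (1): z = mu (+) var (*) s, with s drawn from the standard
    normal distribution; (+)/(*) are element-wise, multiply before add."""
    s = torch.randn_like(mu)   # first sampling-value vector s_n^(b)
    return mu + var * s        # face feature vector z_n^(b)
```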
Step 5: input the intermediate vector z_n^(b) into the image restoration model G to obtain the image tensor (i.e. the tensor with the shape of the face image) dx^(b) corresponding to z_n^(b);
Step 6: input the intermediate vector z_n^(b) into the classification probability model J to obtain J_k^(b) (i.e. the classification probability values of the face image belonging to the good-quality and bad-quality classes), where k = 1, 2 (k = 1 is the good-quality class, k = 2 the bad-quality class). E and J are loss-constrained by computing loss_y (i.e. the classification probability value uniformity loss):
[formula image defining loss_y, the classification probability value uniformity loss, not reproduced]
where α_1 is a hyperparameter, set empirically, e.g. α_1 = 1.0;
Step 7: randomly draw N values from the standard normal distribution, recorded as the sampling values ss_n (i.e. the second sampling-value vector). Record the output of the cluster center model C as C_k,m,n, where k indexes the two classes (k = 1 is the good-quality class, k = 2 the bad-quality class), m indexes the two vectors (m = 1 the mean vector, m = 2 the standard deviation vector), and n runs over the vector length (e.g. 128). That is, C_k,m,n represents two classes k, each corresponding to two vectors m, each of length N.
From the sampling value ss_n and the vector expressions of the two classes output by model C (four vectors in total), the cluster center vectors C_k,n of the two classes can be computed by the following formula:
C_k,n = C_k,1,n ⊕ C_k,2,n ⊗ ss_n
C_k,m,n is a 2×2×N tensor: k takes values 1 and 2, representing the good-quality and bad-quality classes of face images respectively, and m takes values 1 and 2, indexing each class's vector expression, which comprises a mean vector mu and a standard deviation vector var.
As with the earlier calculation of z_n^(b), the cluster center vector C_k,n is computed by the same resampling process (using ss_n). The C_k,n obtained after this sampled calculation follows, as a sample, a normal distribution; since k = 1, 2, there are two normal distributions, each cluster center vector corresponding to one of them. C_k,n can thus be regarded as characterizing two cluster centers.
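Continuing the sketch, the resampling of the cluster centers might be written as follows; the helper name and the softplus-based std parameterization carry over from the assumed ClusterCenters module above:

```python
def sample_cluster_centers(C):
    """C_k,n = C_k,1,n (+) C_k,2,n (*) ss_n -- one fresh draw per class."""
    c_mu, c_std = C()              # each of shape (2, N)
    ss = torch.randn_like(c_mu)    # second sampling-value vector ss_n
    return c_mu + c_std * ss       # shape (2, N): the two cluster centers
```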
The following loss formula (i.e. the face feature classification loss) applies a loss constraint to E, C and J:
[formula image defining loss_c, the face feature classification loss, not reproduced]
where α_2 is a hyperparameter, set empirically, e.g. α_2 = 1.0;
Step 8: randomly take B face images from the good-quality subset D_g and input them into E to obtain the good-quality mean vectors mug_n^(b) and the good-quality standard deviation vectors varg_n^(b); then compute the average of each kind of vector: the mean mugm_n of the good-quality mean vectors and the mean vargm_n of the good-quality standard deviation vectors:
mugm_n = (1/B) Σ_{b=1…B} mug_n^(b)
vargm_n = (1/B) Σ_{b=1…B} varg_n^(b)
The averaged mu and the averaged var of the B good-quality face images after E coding are then used in the same way as formula (1):
zg_n = mugm_n ⊕ vargm_n ⊗ sg_n
where sg_n is a sampling value freshly drawn from the standard normal distribution (denoted sg_n here).
zg_n thus describes a cluster center vector for the good-quality face images.
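Step 8 reduces to averaging the E-codings of the good-quality batch and resampling once; a sketch continuing the code above (the helper name and the fresh draw sg are assumptions):

```python
def good_quality_anchor(E, xg_batch):
    """Step 8: average the E-codings of B good-quality images, then draw
    zg_n = mugm_n (+) vargm_n (*) sg_n once from the averaged statistics."""
    mug, varg = E(xg_batch)                  # shapes (B, N)
    mugm, vargm = mug.mean(0), varg.mean(0)  # mugm_n and vargm_n
    sg = torch.randn_like(mugm)              # fresh standard-normal draw
    return mugm, vargm, mugm + vargm * sg
```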
Step 9: compute the distance loss between the first normal distribution corresponding to the good-quality face images and the two second normal distributions:
loss_f = α_3 · [ KL(p(zg_n) || p(C_1,n)) − KL(p(zg_n) || p(C_2,n)) ]
where KL(*_1 || *_2) denotes the Kullback-Leibler divergence between *_1 and *_2; p(zg_n) denotes the probability distribution of zg_n, which as a sample obeys the normal distribution with mean mugm_n and standard deviation vargm_n; p(C_1,n) denotes the probability distribution of C_1,n, which as a sample obeys the normal distribution with mean C_1,1,n and standard deviation C_1,2,n; and p(C_2,n) denotes the probability distribution of C_2,n, which as a sample obeys the normal distribution with mean C_2,1,n and standard deviation C_2,2,n.
where α_3 is a hyperparameter greater than 0, set empirically, e.g. α_3 = 0.05;
The preceding C_k,n describes two cluster centers, one for good quality and one for bad quality, but at this point it is not known which center corresponds to which. Step 9 uses a small anchor center, built from the average features of several good-quality face images, to distinguish the good and bad cluster centers in C_k,n.
Minimizing loss_f corresponds to requiring that the distribution obeyed by zg_n as a sample be close to the distribution obeyed by C_1,n as a sample and far from the distribution obeyed by C_2,n as a sample; it then becomes known which of the two cluster centers is the center of the good-quality face images (C_1,n).
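All the KL terms here are between diagonal Gaussians, for which the divergence has a well-known closed form; below is a sketch of loss_f built on it, following the reconstructed formula above (the eps guard and function names are assumptions):

```python
def kl_diag_gauss(mu1, std1, mu2, std2, eps=1e-8):
    """Closed-form KL( N(mu1, std1^2) || N(mu2, std2^2) ) for diagonal
    Gaussians, summed over the N vector dimensions."""
    v1, v2 = std1.pow(2) + eps, std2.pow(2) + eps
    return 0.5 * (torch.log(v2 / v1)
                  + (v1 + (mu1 - mu2).pow(2)) / v2 - 1.0).sum(-1)

def loss_f(mugm, vargm, c_mu, c_std, alpha3=0.05):
    """Pull the good-quality anchor toward center 1, push it from center 2."""
    return alpha3 * (kl_diag_gauss(mugm, vargm, c_mu[0], c_std[0])
                     - kl_diag_gauss(mugm, vargm, c_mu[1], c_std[1]))
```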
Step 10: compute the overall loss function:
loss = loss_y + loss_c + loss_f + loss_e, where loss_e = (α_4 / B) Σ_{b=1…B} (x^(b) − dx^(b))^2
where α_4 is a hyperparameter greater than 0, set empirically, e.g. α_4 = 1.0;
Step 11: optimize the parameters of E, G, C and J according to a conventional deep learning network optimization method:
w ← w − lr · ∂loss/∂w,  for each w ∈ { w_E, w_G, w_C, w_J }
Step 12: judge whether E, G, C and J have converged according to conventional criteria, e.g. whether training has run for a specified number of iterations or the loss has stopped decreasing significantly; if E, G, C and J have converged, go to step 13, otherwise return to step 2;
Step 13: output the trained E, G, C and J.
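Steps 2 to 12 might be tied together roughly as below. The optimizer choice, the learning rate, and the reconstruction term stand in for details the patent leaves open; loss_y and loss_c are omitted because their formula images are not recoverable from this text:

```python
E, G, C, J = FeatureEncoder(), ImageRestorer(), ClusterCenters(), Classifier()
params = [*E.parameters(), *G.parameters(), *C.parameters(), *J.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)  # optimizer and lr are assumptions

def train_step(x_batch, xg_batch, alpha4=1.0):
    mu, var = E(x_batch)                      # step 3
    z = reparameterize(mu, var)               # step 4
    dx = G(z)                                 # step 5
    probs = J(z)                              # step 6 (feeds loss_y / loss_c,
                                              # omitted here, see note above)
    c_mu, c_std = C()                         # step 7, distribution parameters
    mugm, vargm, _ = good_quality_anchor(E, xg_batch)  # step 8
    l_f = loss_f(mugm, vargm, c_mu, c_std)             # step 9
    l_e = alpha4 * (x_batch - dx).pow(2).mean()        # assumed restoration term
    loss = l_f + l_e                          # step 10 (partial)
    opt.zero_grad(); loss.backward(); opt.step()       # step 11
    return loss.item()
```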
The steps for scoring the quality of a face image are as follows:
Step 1: load the E and C obtained after the deep learning network training step is completed, and record the face image to be quality-scored as x;
Step 2: input x into E to obtain mux_n and varx_n; the output of C is C_k,m,n;
Step 3: compute the distances from the normal distribution obeyed as a sample by zx_n (the intermediate vector computed from mux_n and varx_n by formula (1), i.e. the face feature vector of the face image to be quality-scored) to the normal distributions of the two cluster centers, namely: the distance d_g from the normal distribution obeyed by zx_n as a sample to the normal distribution obeyed by the good-quality cluster center vector as a sample, and the distance d_b from the normal distribution obeyed by zx_n as a sample to the normal distribution obeyed by the bad-quality cluster center vector as a sample:
d_g = KL(p(zx_n) || p(C_1,n)) …………………(8)
d_b = KL(p(zx_n) || p(C_2,n)) …………………(9)
where KL(*_1 || *_2) denotes the Kullback-Leibler divergence between *_1 and *_2; p(zx_n) denotes the probability distribution of zx_n, which as a sample obeys the normal distribution with mean mux_n and standard deviation varx_n; p(C_1,n) denotes the probability distribution of C_1,n, which as a sample obeys the normal distribution with mean C_1,1,n and standard deviation C_1,2,n; and p(C_2,n) denotes the probability distribution of C_2,n, which as a sample obeys the normal distribution with mean C_2,1,n and standard deviation C_2,2,n.
Step 4: calculate the score of the face image x to be quality-scored:
score = d_b / (d_g + d_b) …………………(10)
The score is then output.
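A sketch of the whole scoring procedure, reusing the helpers above; note that the score follows the reconstruction score = d_b / (d_g + d_b), which matches the direction of training (good images score near 1) but is an assumption where the patent's formula image is not reproduced:

```python
@torch.no_grad()
def quality_score(x, E, C):
    """Steps 1-4 of scoring: encode, measure KL to both centers, normalize."""
    mux, varx = E(x)                                   # step 2
    c_mu, c_std = C()
    d_g = kl_diag_gauss(mux, varx, c_mu[0], c_std[0])  # formula (8)
    d_b = kl_diag_gauss(mux, varx, c_mu[1], c_std[1])  # formula (9)
    return (d_b / (d_g + d_b)).item()                  # reconstructed (10)
```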
It should be understood that this embodiment corresponds to the above embodiments and can be implemented in cooperation with them. The technical details mentioned in the above embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the technical details mentioned in this embodiment can also be applied to the above embodiments.
The steps of the above methods are divided only for clarity of description. In implementation, steps may be combined into one or some steps may be split into several, and all such variants fall within the protection scope of this patent as long as they contain the same logical relationship. Adding insignificant modifications to an algorithm or flow, or introducing insignificant design changes without altering its core design, also falls within the protection scope of this patent.
Embodiments of the present invention also relate to an electronic device, as shown in fig. 3, including: at least one processor 301; a memory 302 communicatively coupled to the at least one processor; the memory 302 stores instructions executable by the at least one processor 301, and the instructions are executed by the at least one processor 301 to perform the method of any of the embodiments.
The memory 302 and the processor 301 are connected by a bus. The bus may comprise any number of interconnected buses and bridges linking the processor 301, the memory 302 and various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Information processed by the processor 301 is transmitted over a wireless medium through an antenna, which also receives incoming information and passes it to the processor 301.
The processor 301 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory 302 may be used to store information used by the processor in performing operations.
Embodiments of the present invention also relate to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the method embodiments described above.
That is, as those skilled in the art will understand, all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Claims (11)

1. A method of model training, comprising:
constructing a feature coding network for extracting a face feature vector from a face image, wherein the face feature vector is used as a sample and obeys a first normal distribution;
constructing a clustering center network for generating two clustering center vectors for describing the good quality and the bad quality of the face image, wherein the two clustering center vectors are respectively used as samples to obey second normal distribution;
constructing a classification probability model, wherein the classification probability model is used for generating classification probability values of good quality and bad quality of the face image respectively;
taking unlabeled face images as training samples, and performing joint training on the feature coding network, the clustering center network and the classification probability model to obtain the trained feature coding network and clustering center network;
wherein the losses of the joint training comprise: a distance loss from the face feature vector to the two cluster center vectors, weighted by the corresponding classification probability values; and a distance loss between the first normal distribution obeyed, as a sample, by the face feature vector of a face image labeled as good quality and the second normal distributions obeyed, as samples, by the two cluster center vectors respectively.
2. The model training method of claim 1, wherein the feature coding network comprises: a feature coding model and a first generation module;
the constructing of the feature coding network for extracting the face feature vector from the face image, wherein the face feature vector as a sample obeys a first normal distribution, comprises:
constructing the feature coding model, wherein the input of the feature coding model is a human face image, and the output of the feature coding model is a first mean vector and a first standard deviation vector;
sampling from the standard normal distribution through the first generation module to obtain a first sampling value vector, and constructing the first normal distribution through the following formula:
z_n^(b) = mu_n^(b) ⊕ var_n^(b) ⊗ s_n^(b)
wherein z_n^(b) is the face feature vector; s_n^(b) is the first sampling value vector; mu_n^(b) is the first mean vector; var_n^(b) is the first standard deviation vector; b = 1, 2, …, B; B is the batch size of the training samples; n is the vector length; and ⊕ and ⊗ denote element-by-element addition and element-by-element multiplication between vectors.
3. The model training method of claim 1, wherein the cluster-centric network comprises: a cluster center model and a second generation module;
the constructing of the cluster center network for generating two cluster center vectors for describing the good quality and the bad quality of the face image, wherein the two cluster center vectors, as samples, respectively obey second normal distributions, comprises:
constructing the clustering center model, wherein the clustering center model has no input and the output is two vector expressions for describing the good quality and the bad quality of the face image, and each vector expression comprises a second mean vector and a second standard deviation vector;
sampling from the standard normal distribution through the second generation module to obtain a second sampling value vector, and constructing the second normal distribution through the following formula:
C_k,n = C_k,1,n ⊕ C_k,2,n ⊗ ss_n
wherein C_k,n is the cluster center vector; ss_n is the second sampling value vector; C_k,1,n is the second mean vector; C_k,2,n is the second standard deviation vector; k = 1, 2 represent good quality and bad quality respectively; n is the vector length; and ⊕ and ⊗ denote element-by-element addition and element-by-element multiplication between vectors.
4. The model training method according to any one of claims 1 to 3, wherein constructing the distance loss from the face feature vector to the two cluster center vectors, weighted by the corresponding classification probability values, comprises:
calculating said distance loss by the following formula:
[formula image defining loss_c not reproduced]
wherein loss_c is the distance loss from the face feature vector to the two cluster center vectors weighted by the corresponding classification probability values; J_k^(b) are the classification probability values of the face image being of good quality and of bad quality; var_n^(b) is the first standard deviation vector; z_n^(b) is the face feature vector; C_k,n is the cluster center vector; α_2 is a hyperparameter; B is the batch size of the training samples; b = 1, 2, …, B; k = 1, 2 represent good quality and bad quality respectively; and n is the vector length.
5. The model training method according to any one of claims 1 to 3, wherein constructing the distance loss between the first normal distribution and the second normal distribution corresponding to the face image labeled as good quality comprises:
inputting B good-quality face images into the feature coding model to obtain B good-quality first mean vectors and B good-quality first standard deviation vectors; recording the mean of the B good-quality first mean vectors as the good-quality mean vector and the mean of the B good-quality first standard deviation vectors as the good-quality standard deviation vector;
determining that the good-quality face feature vector of the good-quality face images, as a sample, obeys a good-quality first normal distribution whose mean is the good-quality mean vector and whose standard deviation is the good-quality standard deviation vector;
calculating the distance loss between the first normal distribution and the second normal distribution corresponding to the good-quality face image through the following formula:
loss_f = KL(p(zg_n) || p(C_1,n)) − KL(p(zg_n) || p(C_2,n))
wherein loss_f is the distance loss between the first normal distribution corresponding to the good-quality face images and the second normal distributions; KL(*_1 || *_2) denotes the Kullback-Leibler divergence between *_1 and *_2; p(zg_n) denotes the probability distribution of zg_n, which as a sample obeys the normal distribution whose mean is the good-quality mean vector and whose standard deviation is the good-quality standard deviation vector; p(C_1,n) denotes the probability distribution of the good-quality cluster center vector C_1,n, which as a sample obeys the normal distribution with the good-quality second mean vector C_1,1,n as mean and the good-quality second standard deviation vector C_1,2,n as standard deviation; p(C_2,n) denotes the probability distribution of the bad-quality cluster center vector C_2,n, which as a sample obeys the normal distribution with the bad-quality second mean vector C_2,1,n as mean and the bad-quality second standard deviation vector C_2,2,n as standard deviation; and n is the vector length.
6. The model training method of claim 1, wherein the losses of the joint training further include face feature losses, and wherein constructing the face feature losses comprises:
constructing an image restoration model, wherein the input of the image restoration model is the face feature vector and the output is a tensor with the shape of the face image; adding the image restoration model into the joint training process, and calculating the face feature loss by the following formula:
loss_e = (α_4 / B) Σ_{b=1…B} (x^(b) − dx^(b))^2
wherein loss_e is the face feature loss; x^(b) is the face image; dx^(b) is the tensor with the shape of the face image; α_4 is a hyperparameter; B is the batch size of the training samples; and b = 1, 2, …, B.
7. The model training method of claim 1, wherein the losses of the joint training further comprise classification probability value uniformity losses, and wherein constructing the classification probability value uniformity losses comprises:
calculating the classification probability value uniformity loss by the following formula:
[formula image defining loss_y not reproduced]
wherein loss_y is the classification probability value uniformity loss; J_k^(b) are the classification probability values of the face image being of good quality and of bad quality; α_1 is a hyperparameter; B is the batch size of the training samples; b = 1, 2, …, B; and k = 1, 2 represent good quality and bad quality respectively.
8. A face image quality scoring method is characterized by comprising the following steps:
inputting a face image to be recognized into a feature coding model, and obtaining a third normal distribution obeyed, as a sample, by the face feature vector corresponding to the face image to be recognized;
determining the quality of the face image to be recognized according to the distances from the third normal distribution obeyed, as a sample, by the face feature vector corresponding to the face image to be recognized to the fourth normal distributions obeyed, as samples, by the two cluster center vectors output by a cluster center model; wherein the feature coding model and the cluster center model are obtained by training with the model training method of any one of claims 1 to 7.
9. The method according to claim 8, wherein determining the quality of the face image to be recognized according to the distance between the third normal distribution obeyed, as a sample, by the face feature vector corresponding to the face image to be recognized and the fourth normal distributions obeyed, as samples, by the two cluster center vectors output by the cluster center model comprises:
calculating, by the following formulas, the distances from the third normal distribution obeyed, as a sample, by the face feature vector corresponding to the face image to be recognized to the fourth normal distributions obeyed, as samples, by the two cluster center vectors output by the cluster center model:
d_g = KL(p(zx_n) || p(C_1,n))
d_b = KL(p(zx_n) || p(C_2,n))
wherein d_g is the distance between the third normal distribution and the fourth normal distribution obeyed, as a sample, by the cluster center vector describing good face image quality; d_b is the distance between the third normal distribution and the fourth normal distribution obeyed, as a sample, by the cluster center vector describing bad face image quality; KL(*_1 || *_2) denotes the Kullback-Leibler divergence between *_1 and *_2; p(zx_n) denotes the probability distribution of zx_n, the face feature vector corresponding to the face image to be recognized, which as a sample obeys the third normal distribution; p(C_1,n) denotes the probability distribution of the good-quality cluster center vector C_1,n, which as a sample obeys the normal distribution with the good-quality second mean vector C_1,1,n as mean and the good-quality second standard deviation vector C_1,2,n as standard deviation; p(C_2,n) denotes the probability distribution of the bad-quality cluster center vector C_2,n, which as a sample obeys the normal distribution with the bad-quality second mean vector C_2,1,n as mean and the bad-quality second standard deviation vector C_2,2,n as standard deviation; and n is the vector length;
calculating the quality score of the face image to be recognized by the following formula:
score = d_b / (d_g + d_b)
10. an electronic device, comprising:
at least one processor;
a memory communicatively coupled to the at least one processor;
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a model training method as claimed in any one of claims 1 to 7 or a face image quality scoring method as claimed in any one of claims 8 to 9.
11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the model training method according to any one of claims 1 to 7 or the face image quality scoring method according to any one of claims 8 to 9.
CN202210731159.5A 2022-06-24 2022-06-24 Model training method, face image quality scoring method, electronic device and storage medium Pending CN115063374A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210731159.5A CN115063374A (en) 2022-06-24 2022-06-24 Model training method, face image quality scoring method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210731159.5A CN115063374A (en) 2022-06-24 2022-06-24 Model training method, face image quality scoring method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN115063374A true CN115063374A (en) 2022-09-16

Family

ID=83202013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210731159.5A Pending CN115063374A (en) 2022-06-24 2022-06-24 Model training method, face image quality scoring method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115063374A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620083A (en) * 2022-09-29 2023-01-17 北京的卢深视科技有限公司 Model training method, face image quality evaluation method, device and medium
CN115620083B (en) * 2022-09-29 2023-08-29 合肥的卢深视科技有限公司 Model training method, face image quality evaluation method, equipment and medium

Similar Documents

Publication Publication Date Title
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN107251059A (en) Sparse reasoning module for deep learning
CN111428818A (en) Deep learning model test method and device based on neural pathway activation state
CN111126488A (en) Image identification method based on double attention
CN112115967B (en) Image increment learning method based on data protection
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN112465120A (en) Fast attention neural network architecture searching method based on evolution method
CN110188827B (en) Scene recognition method based on convolutional neural network and recursive automatic encoder model
WO2022179533A1 (en) Quantum convolution operator
CN112101162A (en) Image recognition model generation method and device, storage medium and electronic equipment
CN114821217B (en) Image recognition method and device based on quantum classical hybrid neural network
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN109934352B (en) Automatic evolution method of intelligent model
CN111753995A (en) Local interpretable method based on gradient lifting tree
CN114170484B (en) Picture attribute prediction method and device, electronic equipment and storage medium
CN115239967A (en) Image generation method and device for generating countermeasure network based on Trans-CSN
CN114692746A (en) Information entropy based classification method of fuzzy semi-supervised support vector machine
CN114357221A (en) Self-supervision active learning method based on image classification
Xue et al. Fast and unsupervised neural architecture evolution for visual representation learning
CN112307288A (en) User clustering method for multiple channels
Eghbali et al. Deep Convolutional Neural Network (CNN) for Large-Scale Images Classification
CN117690178B (en) Face image recognition method and system based on computer vision
CN113723456B (en) Automatic astronomical image classification method and system based on unsupervised machine learning
CN117688974B (en) Knowledge graph-based generation type large model modeling method, system and equipment
Mousser et al. Incremental learning of convolutional neural networks in bioinformatics

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination