CN115619729A - Face image quality evaluation method and device and electronic equipment

Face image quality evaluation method and device and electronic equipment

Info

Publication number
CN115619729A
CN115619729A
Authority
CN
China
Prior art keywords
network
robust
face image
face recognition
regression
Prior art date
Legal status
Pending
Application number
CN202211234838.8A
Other languages
Chinese (zh)
Inventor
黄泽元
王夏洪
Current Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202211234838.8A
Publication of CN115619729A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Abstract

The application provides a face image quality evaluation method and apparatus, and an electronic device. The method comprises the following steps: acquiring a pre-trained first face recognition network and determining the number of layers of the residual network it contains; randomly discarding parameters in each residual network layer according to a preset random discard probability, and obtaining at least two discarded second face recognition networks after at least two rounds of random discarding; processing the same face image with each second face recognition network to obtain the feature vector corresponding to each second face recognition network; and calculating the distances between the feature vectors corresponding to the second face recognition networks, and evaluating the quality of the face image based on the distances and a threshold. The method improves the efficiency of face image quality evaluation, produces highly credible evaluation results, and keeps the evaluation process fast and convenient.

Description

Face image quality evaluation method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular to a face image quality evaluation method and apparatus, and an electronic device.
Background
In actual industrial production, evaluating the quality of face images is very important. In general, the input to a face system is arbitrary: various kinds of noise (such as blur, occlusion, and lighting) exist, as do various face attacks. Without a face quality control step, false recognition and false acceptance will occur in subsequent tasks such as face recognition, comparison, and retrieval.
Existing face image quality evaluation methods comprise manual methods and deep learning methods. Manual methods hand-design features of the face image, for example computing the blur and contrast of a face image or its inter-eye distance. Such methods usually consider only the influence of a single factor, cannot account for all influences comprehensively, and require manual participation, which reduces the efficiency of face image quality evaluation. Deep learning methods regress a quality score for the face image with a neural network, but the quality scores used to train the network are generally annotated manually according to the experience and intuition of the annotators, so the credibility of the network's evaluation results is low.
Disclosure of Invention
In view of this, embodiments of the present application provide a face image quality evaluation method and apparatus, and an electronic device, to solve the prior-art problems that all influences cannot be considered comprehensively, the efficiency of face image quality evaluation is reduced, and the credibility of a neural network's evaluation results is low.
In a first aspect of the embodiments of the present application, a face image quality evaluation method is provided, comprising: acquiring a pre-trained first face recognition network, and determining the number of layers of the residual network contained in the first face recognition network; randomly discarding parameters in each residual network layer according to a preset random discard probability, and obtaining at least two discarded second face recognition networks after at least two rounds of random discarding; processing the same face image with each second face recognition network to obtain the feature vector corresponding to each second face recognition network; and calculating the distances between the feature vectors corresponding to the second face recognition networks, and evaluating the quality of the face image based on the distances and a threshold.
In a second aspect of the embodiments of the present application, a face image quality evaluation apparatus is provided, comprising: an acquisition module configured to acquire a pre-trained first face recognition network and determine the number of layers of the residual network contained in it; a discarding module configured to randomly discard parameters in each residual network layer according to a preset random discard probability, obtaining at least two discarded second face recognition networks after at least two rounds of random discarding; a processing module configured to process the same face image with each second face recognition network to obtain the feature vector corresponding to each second face recognition network; and an evaluation module configured to calculate the distances between the feature vectors corresponding to the second face recognition networks and evaluate the quality of the face image based on the distances and a threshold.
In a third aspect of the embodiments of the present application, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above method.
The embodiments of the present application adopt at least one technical scheme that can achieve the following beneficial effects:
the number of layers of the residual network contained in a pre-trained first face recognition network is determined; parameters in each residual network layer are randomly discarded according to a preset random discard probability, and at least two discarded second face recognition networks are obtained after at least two rounds of random discarding; the same face image is processed with each second face recognition network to obtain the feature vector corresponding to each second face recognition network; and the distances between the feature vectors corresponding to the second face recognition networks are calculated, and the quality of the face image is evaluated based on the distances and a threshold. By processing the same face image with the second face recognition networks obtained after random discarding and evaluating image quality from the distances between the resulting feature vectors, all influences can be considered comprehensively, the efficiency of face image quality evaluation is improved, the evaluation results are highly credible, and the evaluation process is fast and convenient.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the face image quality evaluation method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram illustrating the training process of the robust regression network according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of the face image quality evaluation apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In actual industrial production, evaluating the quality of face images is very important. In general, the input to a face system is arbitrary: various kinds of noise (such as blur, occlusion, and lighting) exist, as do various face attacks. Without a face quality control step, false recognition and false acceptance will occur in subsequent tasks such as face recognition, comparison, and retrieval.
Existing face image quality evaluation methods comprise manual methods and deep learning methods. Manual methods hand-design features of the face image, for example computing the blur and contrast of a face image or its inter-eye distance. Such methods usually consider only the influence of a single factor, cannot account for all influences comprehensively, and require manual participation, which reduces the efficiency of face image quality evaluation. Deep learning methods regress a quality score for the face image with a neural network, but the quality scores used to train the network are generally annotated manually according to the experience and intuition of the annotators, so the credibility of the network's evaluation results is low.
In view of this, to solve the face image quality evaluation problem, the embodiments of the present application obtain a plurality of discarded second face recognition networks by randomly discarding parameters in each residual network layer of the first face recognition network, process the same face image with each second face recognition network to obtain the corresponding feature vectors, and finally evaluate the quality of the face image by calculating the distance between each pair of feature vectors and comparing it with a preset threshold. The present disclosure is described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic flow chart of the face image quality evaluation method provided in an embodiment of the present application. The method of Fig. 1 may be performed by a server. As shown in Fig. 1, the face image quality evaluation method may specifically include:
s101, acquiring a pre-trained first face recognition network, and determining the number of layers of a residual error network contained in the first face recognition network;
s102, randomly discarding parameters in each layer of residual error network according to a preset random discarding probability, and obtaining at least two discarded second face recognition networks after at least two times of random discarding;
s103, processing the same face image by using each second face recognition network respectively to obtain a feature vector corresponding to each second face recognition network;
and S104, calculating the distance between the feature vectors corresponding to the second face recognition network, and evaluating the quality of the face image based on the distance and a threshold value.
Specifically, the first face recognition network in the embodiments of the present application may adopt any conventional face recognition neural network whose structure includes a multilayer residual network. For example, the network can be designed as a classical 50-layer residual network (IR50); its input is a cropped face image, e.g. the original face image cropped to a face picture of width and height (112, 112), and its output can be a 512-dimensional feature vector.
Further, random discarding in the embodiments of the present application means randomly discarding the parameters contained in each residual network layer of a face recognition neural network (here, the first face recognition network) according to a preset random discard probability. It should be noted that the discard probability may differ from layer to layer. Once every residual layer has been randomly discarded, one randomly discarded face recognition neural network (i.e., a second face recognition network) is obtained; repeating this operation several times yields a plurality of second face recognition networks.
In some embodiments, randomly discarding the parameters in each residual network layer according to a preset random discard probability comprises: setting a random discard probability interval for each residual network layer in the first face recognition network, and randomly selecting a discard probability from that interval as the probability of randomly discarding that layer's parameters; and randomly discarding the parameters in each residual network layer according to its discard probability to obtain a discarded second face recognition network.
Specifically, in an alternative example, the first face recognition network is denoted N1, and its structure is a classical 50-layer residual network (IR50); that is, the first face recognition network N1 contains 50 residual layers. A random discard probability interval is set for each residual layer. When a random discard operation is executed, a probability value is randomly selected from the interval corresponding to each layer as that layer's discard probability, the parameters of each residual layer are randomly discarded in turn, and the discarded parameters are set to 0, so that a new second face recognition network is obtained.
For example: divide 0-0.5 into 50 equal parts as the lower discard bound of each layer, and 0.1-0.6 into 50 equal parts as the upper discard bound of each layer. That is, for layer 1 of the network the random discard probability is drawn from (0, 0.1); for layer 2, from (0.01, 0.11); and so on, until for layer 49 it is drawn from (0.49, 0.59) and for layer 50 from (0.50, 0.60).
The equal-division setting is used because the shallower layers of a face recognition network carry more, and more fine-grained, detail information, so a large discard probability there strongly affects the quality judgment; the deeper layers are semantically richer and remain discriminative even when fewer features are retained, so their random discard probability can be set larger.
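To make the discard step concrete, the following is a minimal PyTorch sketch of per-layer random discarding under the interval scheme just described. It assumes the backbone exposes its 50 residual blocks through a body module list; that attribute name and the helper name make_dropped_variant are illustrative assumptions, not part of the patent text:

import copy
import random

import torch

def make_dropped_variant(model, num_layers=50):
    """Return a copy of `model` whose per-layer parameters are randomly zeroed.

    Assumes `model.body` is a ModuleList of residual blocks (hypothetical
    attribute). Layer i draws its discard probability from an interval that
    shifts upward with depth, mirroring the equal-division scheme above.
    """
    variant = copy.deepcopy(model)
    for i, block in enumerate(variant.body[:num_layers]):
        lower = 0.5 * i / num_layers        # lower bounds 0.00, 0.01, ..., 0.49
        upper = lower + 0.1                 # upper bounds 0.10, 0.11, ..., 0.59
        p = random.uniform(lower, upper)    # this layer's discard probability
        with torch.no_grad():
            for param in block.parameters():
                keep = (torch.rand_like(param) >= p).to(param.dtype)
                param.mul_(keep)            # discarded parameters become 0
    return variant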
In a specific example, under the random discard scheme above, each of the 50 residual layers is given a random discard probability, and the probability grows as the layer gets deeper. The first face recognition network N1 is randomly discarded 100 times to obtain 100 variant networks (i.e., second face recognition networks), and the same face image is then passed through these 100 variants to obtain 100 feature vectors: v1, v2, ..., v100.
The first face recognition network N1 is randomly discarded at least twice. For example, one random discard (dropout) of N1 yields a second face recognition network N2, and another random discard yields another second face recognition network N3. It should be understood that, to reduce computational complexity, the embodiments of the present application may perform only two random discards to obtain two different second face recognition networks, process the same face image with both to obtain two feature vectors, compute the distance between them, and use that distance to evaluate the quality of the face image. However, the embodiments are not limited to two random discards; more than two are equally valid. For example, randomly discarding N1 100 times yields 100 different second face recognition networks, each outputting one feature vector, in which case the distance between every pair of feature vectors must instead be computed to evaluate the face image quality.
In practical application, if a face image is of high quality for the first face recognition network N1, the image is considered robust. Therefore, after one random discard of N1 yields a second face recognition network N2 and another random discard yields a second face recognition network N3, the two feature vectors obtained by passing the same face image through N2 and N3 should be nearly identical, i.e. the distance between them should be close to 0.
In some embodiments, processing the same face image with each second face recognition network to obtain the feature vector corresponding to each second face recognition network comprises: cropping the face image to a preset width and height, inputting the cropped face image into each second face recognition network, and having each second face recognition network output the feature vector corresponding to the cropped face image.
Specifically, in an alternative example, the face image is cropped to a face picture of width and height (112, 112). Denoting the cropped face image as img and the feature vectors output by face recognition networks N2 and N3 as v1 and v2, the following formulas hold:
N2 = dropout(N1)
v1 = N2(img)
N3 = dropout(N1)
v2 = N3(img)
where N2 is the face recognition network obtained by one random discard of N1, N3 is the face recognition network obtained by another random discard of N1, v1 is the feature vector output by N2 when the cropped face image is input to it, and v2 is the feature vector output by N3 when the cropped face image is input to it.
It should be noted that the embodiments of the present application are explained with two random discards as an example: the face recognition network N1 is randomly discarded twice to obtain face recognition networks N2 and N3, so each discarded face recognition network outputs one feature vector. When there are more random discards, e.g. N1 is randomly discarded 100 times, 100 second face recognition networks are obtained, which together output 100 feature vectors.
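As a hedged sketch of steps S102 and S103, the variant networks and their feature vectors can be collected as follows. This builds on the make_dropped_variant sketch above; the (1, 3, 112, 112) input shape follows the (112, 112) crop and the 512-dimensional output described earlier:

import torch

def extract_variant_features(model, img_tensor, num_variants=100):
    """img_tensor: a cropped face image of shape (1, 3, 112, 112)."""
    features = []
    for _ in range(num_variants):
        variant = make_dropped_variant(model)   # one fresh random discard
        variant.eval()
        with torch.no_grad():
            v = variant(img_tensor)             # (1, 512) feature vector
        features.append(v.squeeze(0))
    return torch.stack(features)                # (num_variants, 512)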
In some embodiments, calculating the distances between the feature vectors corresponding to the second face recognition networks and evaluating the quality of the face image based on the distances and a threshold comprises: calculating the inner products between the feature vectors output by the second face recognition networks, taking the inner products as the distances between the feature vectors, and calculating the mean and standard deviation of all the distances; and calculating a network robust value of the face image based on the mean and standard deviation, comparing the network robust value with a robust threshold, and evaluating the quality of the face image according to the comparison result.
Specifically, after the feature vector output by each second face recognition network is obtained, the distance between every pair of feature vectors is computed as an inner product. The distance formula for feature vectors is:
dist = dot(norm(v1), norm(v2))
where dist denotes the distance between feature vectors, dot denotes the inner product, and norm denotes normalization of a feature vector. The higher the quality of a face image, the smaller dist is; a larger dist indicates lower face image quality, i.e. poor robustness between the face image and the first face recognition network.
In a specific example, assume there are 100 feature vectors; computing pairwise distances with the inner product yields 100 × 99 = 9900 distance values. These distance values form a distribution, whose mean and standard deviation are computed as follows.
The 9900 distance values are summed and averaged by the following formula to obtain the mean of the "image-network robustness":
n_mean = (1/9900) * Σ dist(vi, vj), summed over all ordered pairs i ≠ j
The standard deviation of the "image-network robustness" is found by the following formula:
n_std = sqrt((1/9900) * Σ (dist(vi, vj) - n_mean)^2)
Finally, the network robust value of the face image is computed as:
n_rob = 1 - (0.5 * n_mean + 0.5 * n_std)
where a larger network robust value n_rob of the face image corresponds to a lower mean and standard deviation, better network robustness of the face image, and higher face image quality. The embodiments of the present application use the network robust value of the face image to measure the image's robustness within the face recognition network, which not only improves the efficiency of face image quality evaluation but also makes the evaluation results highly credible and the evaluation process fast and convenient.
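A minimal sketch of this distance and robust-value computation, assuming the (n, 512) feature matrix produced by the previous sketch; the parenthesized form of n_rob follows the reconstruction above:

import torch
import torch.nn.functional as F

def network_robust_value(features):
    """features: (n, 512) tensor of variant feature vectors."""
    normed = F.normalize(features, dim=1)        # norm(v_i)
    sims = normed @ normed.t()                   # dot(norm(v_i), norm(v_j))
    n = features.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool)   # keep the n*(n-1) pairs, i != j
    dists = sims[off_diag]                       # e.g. 100 * 99 = 9900 values
    n_mean = dists.mean()
    n_std = dists.std(unbiased=False)            # population std, as in the formula
    n_rob = 1.0 - (0.5 * n_mean + 0.5 * n_std)
    return n_mean.item(), n_std.item(), n_rob.item()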
In some further embodiments, the method further comprises: taking the mean, standard deviation, and network robust value corresponding to each face image as that image's label, and training a preconfigured regression network with a training set composed of the face images and labels to obtain a training-fitted regression network; the regression network structure comprises four stages, each stage outputting a feature map, with a robust value regression module placed after the last stage.
Specifically, in the foregoing embodiment a single face image requires 100 computations (forward passes through the 100 second face recognition networks) before its network robust value can be calculated, which results in a long computation time. A regression network is therefore trained to predict the robust value directly.
Further, in a specific example, assume a face training data set consisting of 500,000 face images. First, the technical solutions provided by the foregoing embodiments are used to compute the mean and standard deviation of the network robustness of these 500,000 face images: each face image passes through a randomly discarded face recognition network to produce 1 feature vector, so the 100 randomly discarded face recognition networks yield 100 feature vectors; the pairwise inner products of these 100 feature vectors give the mean, standard deviation, and network robust value, which are used as the labels of the face image.
Further, in a specific example, a regression network N_r containing an 18-layer residual network (IR18) is constructed; the backbone uses the classical IR18 architecture, comprising four stages and 18 layers. The face image input to the regression network has width and height (112, 112), and the 4 stages output 4 feature maps F1, F2, F3, F4 with dimensions (56, 56, 64), (28, 28, 128), (14, 14, 256), and (7, 7, 512), respectively.
A robust value regression module R for regressing the robust value is added after the 4th feature map F4. The module contains 3 convolutions with 3 × 3 kernels, each convolution followed by Batch Normalization (BN) and an activation function (ReLU), then a global average pooling, then a (512, 256) matrix computation and a (256, 128) matrix computation, followed by a sigmoid function (activation function). This yields 128 values, from which the mean and standard deviation can be calculated, thereby obtaining the network robust value.
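The module R described here might be sketched as follows in PyTorch. This is a sketch under stated assumptions: the convolution padding, the exact placement of the sigmoid, and the class name are illustrative choices not fixed by the text:

import torch
import torch.nn as nn

class RobustValueRegressor(nn.Module):
    """Sketch of module R attached to F4 of shape (B, 512, 7, 7)."""
    def __init__(self, in_ch=512):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.BatchNorm2d(in_ch), nn.ReLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.BatchNorm2d(in_ch), nn.ReLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.BatchNorm2d(in_ch), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)              # global average pooling
        self.fc = nn.Sequential(nn.Linear(512, 256), nn.Linear(256, 128))

    def forward(self, f4):
        x = self.pool(self.convs(f4)).flatten(1)         # (B, 512)
        out = torch.sigmoid(self.fc(x))                  # (B, 128) values in (0, 1)
        mean, std = out.mean(dim=1), out.std(dim=1)      # per-image mean and std
        rob = 1.0 - (0.5 * mean + 0.5 * std)             # network robust value
        return mean, std, rob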
It should be noted that, when training the regression network with the face training data set, the following two network loss functions are used:
first, N is required to be used 1 Identifying networks to constrain N r And outputting characteristic graphs of the first stage and the second stage of the regression network. Namely N r Output characteristic diagram F of the first stage and the second stage of the regression network 1 ,F 2 Requires and N 1 Identifying output characteristic diagram F of first stage and second stage of network 1-1 ,F 1-2 This constitutes the first network loss function, which is the subtraction of the profiles of the two networks to average the absolute values. The formula is as follows:
loss1=mean(|F 1 -F 1-1 |)+mean(|F 2 -F 1-2 |)
Second, the mean, standard deviation, and robust value output by the regression module R of the regression network each incur a smooth_L1 loss (i.e., the second loss function) against the labels of the picture:
loss2 = smoothl1(mean - mean′) + smoothl1(std - std′) + smoothl1(rob - rob′)
where the primed quantities in the formula are the outputs of the regression module and the unprimed quantities are the labels.
After training and fitting, a training-fitted regression network is obtained. When inferring on a face picture, 100 computations through 100 face recognition networks are no longer needed; a single computation with the training-fitted regression network yields the network robust value.
In some embodiments, after the training-fitted regression network is obtained, the method further comprises: inputting the face image to be evaluated into the training-fitted regression network, outputting the network robust value corresponding to the face image with the training-fitted regression network, comparing the network robust value with a robust threshold, and evaluating the quality of the face image to be evaluated according to the comparison result.
Specifically, the face image whose quality needs to be evaluated is input into the training-fitted regression network (i.e., the robust value regression network); the network robust value corresponding to the face image is obtained through the computation of the robust value regression network and compared with a robust threshold, thereby judging the quality of the face image.
Further, the foregoing embodiments propose two ways of computing robust values: the first is robust value computation based on the face recognition network, and the second is directly regressing the robust value with a robust regression network. The following embodiments measure the robustness of the regression network itself. Fig. 2 is a schematic diagram of the training process of the robust regression network provided in an embodiment of the present disclosure. As shown in Fig. 2, the training process of the robust regression network may specifically include:
s201, randomly discarding parameters of each layer of residual error network in the regression network, and discarding for multiple times to obtain multiple discarded regression networks;
s202, taking the same face image as the input of a plurality of discarded regression networks to obtain a plurality of network robust values corresponding to the discarded regression networks;
s203, calculating discarded robust mean values corresponding to the plurality of network robust values, calculating to obtain a quality robust value based on the discarded robust mean values and the original robust value, and using the quality robust value as a label of the face image;
s204, training a pre-configured robust regression network by using a training set consisting of the face images and the labels to obtain a trained and fitted robust regression network;
the robust regression network is a neural network obtained by adding a mass regression module behind the last stage of the regression network, when the robust regression network is trained, other parts except the mass regression module are frozen, and only the mass regression module is subjected to parameter adjustment.
In a specific example, the regression network is randomly discarded 100 times (dropout), and the robust values obtained by passing the same face picture through the 100 differently discarded regression networks differ. Because information has been discarded, these 100 robust values are necessarily reduced compared to the original robust value. If the reduction is small, the overall quality is high; if the reduction is large, the overall quality is poor.
The 100 robust values regressed by the 100 networks are averaged to obtain the "discarded robust mean" rob2; the difference between the original robust value rob1 and rob2 is then subtracted from 1 to obtain the "quality robust value" rob3:
rob3 = 1 - (rob1 - rob2)
The quality robust values of the face pictures in the training set are computed, giving each face picture its quality robust value label.
Further, the quality robust value is used as the label of the face image, and a training set composed of the face images and labels is used to train a preconfigured robust regression network. Compared with the regression network of the previous embodiment, the robust regression network adds, after the fourth feature map F4 of the regression network, a quality regression module QR whose structure is consistent with that of the robust value regression module R, followed by a (128, 1) matrix and a sigmoid function (activation function), yielding a robust regression network capable of regressing the quality robust value. When the robust regression network is trained, all other parts of it are frozen; only the QR part is left unfrozen so that its parameters can be adjusted, and the training loss uses smooth_L1 loss (i.e., the second loss function).
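A hedged sketch of the QR module and the freezing step follows. It reuses the RobustValueRegressor sketch above; the text leaves ambiguous whether R's sigmoid precedes the (128, 1) matrix, so this sketch applies the sigmoid only at the end:

import torch
import torch.nn as nn

class QualityRegressor(RobustValueRegressor):
    """QR: same trunk as R, with an extra (128, 1) matrix and final sigmoid."""
    def __init__(self, in_ch=512):
        super().__init__(in_ch)
        self.head = nn.Linear(128, 1)

    def forward(self, f4):
        x = self.pool(self.convs(f4)).flatten(1)           # (B, 512)
        x = self.fc(x)                                     # (B, 128)
        return torch.sigmoid(self.head(x)).squeeze(1)      # quality robust value

def freeze_all_but_qr(robust_regression_net, qr_module):
    for p in robust_regression_net.parameters():
        p.requires_grad = False        # freeze everything else
    for p in qr_module.parameters():
        p.requires_grad = True         # only QR's parameters are adjusted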
In some embodiments, after the training-fitted robust regression network is obtained, the method further comprises: inputting the face image to be evaluated into the training-fitted robust regression network, and outputting the network robust value and quality robust value corresponding to the face image with the training-fitted robust regression network; and calculating a quality score based on the network robust value and the quality robust value, comparing the quality score with a quality score threshold, and evaluating the quality of the face image to be evaluated according to the comparison result.
Specifically, after training yields the robust value regression network, its main part consists of IR18 with the two regression modules R and QR. R outputs 128 values, from which the network robust value q1 of the face image is computed, and QR outputs one value, the quality robust value q2 of the face image. The final face image quality score q is computed by the following formula:
q = q1 * (q2 * 0.2 + 0.8)
The computed quality score q of the face image is compared with a preset quality score threshold, thereby judging whether the quality of the face image is qualified.
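The final scoring rule can be stated directly; in this sketch the threshold value is an assumption, since the patent only says it is preset:

def face_quality_qualified(q1: float, q2: float, threshold: float = 0.8) -> bool:
    """q1: network robust value from R; q2: quality robust value from QR."""
    q = q1 * (q2 * 0.2 + 0.8)      # quality score formula from the text
    return q >= threshold          # True if the face image quality is qualified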
It should be noted that, in the above embodiments, the random discarding applied to the face recognition network or the regression network may instead be implemented as random pixel erasure on the face image, or as directly discarding one or more residual layers. The specific discard scheme can be determined by actual requirements; the manner of random discarding applied to the face recognition network or the regression network does not limit the technical solution of the present application.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 3 is a schematic structural diagram of a face image quality evaluation apparatus according to an embodiment of the present application. As shown in fig. 3, the face image quality evaluation apparatus includes:
an acquisition module 301, configured to acquire a pre-trained first face recognition network and determine the number of layers of the residual network contained in the first face recognition network;
a discarding module 302, configured to randomly discard parameters in each residual network layer according to a preset random discard probability, obtaining at least two discarded second face recognition networks after at least two rounds of random discarding;
a processing module 303, configured to process the same face image with each second face recognition network to obtain the feature vector corresponding to each second face recognition network;
and an evaluation module 304, configured to calculate the distances between the feature vectors corresponding to the second face recognition networks and evaluate the quality of the face image based on the distances and a threshold.
In some embodiments, the discarding module 302 of Fig. 3 sets a random discard probability interval for each residual network layer in the first face recognition network and randomly selects a discard probability from that interval as the probability of randomly discarding the layer's parameters; it then randomly discards the parameters in each residual network layer according to its discard probability to obtain a discarded second face recognition network.
In some embodiments, the processing module 303 in fig. 3 crops the face image according to a preset width and a preset height, inputs the cropped face image into each second face recognition network, and outputs a feature vector corresponding to the cropped face image by using each second face recognition network.
In some embodiments, the evaluation module 304 of Fig. 3 calculates the inner products between the feature vectors output by the second face recognition networks, takes the inner products as the distances between the feature vectors, and calculates the mean and standard deviation of all the distances; it then calculates a network robust value of the face image based on the mean and standard deviation, compares the network robust value with a robust threshold, and evaluates the quality of the face image according to the comparison result.
In some embodiments, the training module 305 of Fig. 3 takes the mean, standard deviation, and network robust value corresponding to each face image as that image's label and trains a preconfigured regression network with a training set composed of the face images and labels to obtain a training-fitted regression network; the regression network structure comprises four stages, each stage outputting a feature map, with a robust value regression module after the last stage.
In some embodiments, after the training-fitted regression network is obtained, the evaluation module 304 of Fig. 3 inputs the face image to be evaluated into the training-fitted regression network, outputs the network robust value corresponding to the face image with the training-fitted regression network, compares the network robust value with a robust threshold, and evaluates the quality of the face image to be evaluated according to the comparison result.
In some embodiments, the training module 305 of Fig. 3 randomly discards the parameters of each residual network layer in the regression network, repeating the discarding multiple times to obtain multiple discarded regression networks; takes the same face image as the input of the multiple discarded regression networks to obtain multiple network robust values corresponding to the discarded regression networks; calculates the discarded robust mean of the multiple network robust values, computes a quality robust value from the discarded robust mean and the original robust value, and uses the quality robust value as the label of the face image; and trains a preconfigured robust regression network with a training set composed of the face images and labels to obtain a training-fitted robust regression network; the robust regression network is a neural network obtained by adding a quality regression module after the last stage of the regression network, and when the robust regression network is trained, all parts other than the quality regression module are frozen and only the quality regression module has its parameters adjusted.
In some embodiments, after the training-fitted robust regression network is obtained, the evaluation module 304 of Fig. 3 inputs the face image to be evaluated into the training-fitted robust regression network and outputs the network robust value and quality robust value corresponding to the face image with the training-fitted robust regression network; it then calculates a quality score based on the network robust value and the quality robust value, compares the quality score with a quality score threshold, and evaluates the quality of the face image to be evaluated according to the comparison result.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 4 is a schematic structural diagram of an electronic device 4 provided in an embodiment of the present application. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and operable on the processor 401. The steps in the various method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the respective modules/units in the above-described respective apparatus embodiments when executing the computer program 403.
Illustratively, the computer program 403 may be partitioned into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 403 in the electronic device 4.
The electronic device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other electronic devices. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. Those skilled in the art will appreciate that fig. 4 is merely an example of the electronic device 4, and does not constitute a limitation of the electronic device 4, and may include more or fewer components than shown, or some of the components may be combined, or different components, e.g., the electronic device may also include an input-output device, a network access device, a bus, etc.
The Processor 401 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory Card (Flash Card) provided on the electronic device 4. Further, the memory 402 may also include both an internal storage unit of the electronic device 4 and an external storage device. The memory 402 is used to store the computer program and other programs and data required by the electronic device. The memory 402 may also be used to temporarily store data that has been output or is to be output.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the device is divided into different functional units or modules, so as to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the above-described apparatus/computer device embodiments are merely illustrative, and for example, a module or a unit may be divided into only one logical function, another division may be made in actual implementation, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program instructing related hardware; the computer program can be stored in a computer-readable storage medium, and when executed by a processor, the steps of the method embodiments described above can be realized. The computer program may comprise computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be subject to appropriate additions or deletions according to the requirements of legislative and patent practice within the jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A face image quality evaluation method, characterized by comprising:
acquiring a pre-trained first face recognition network, and determining the number of layers of the residual network contained in the first face recognition network;
randomly discarding parameters in each layer of the residual network according to a preset random discard probability, and obtaining at least two discarded second face recognition networks after at least two rounds of random discarding;
processing the same face image with each second face recognition network to obtain the feature vector corresponding to each second face recognition network;
and calculating the distances between the feature vectors corresponding to the second face recognition networks, and evaluating the quality of the face image based on the distances and a threshold.
2. The method according to claim 1, wherein randomly discarding the parameters in each layer of the residual network according to a preset random discard probability comprises:
setting a random discard probability interval for each layer of the residual network in the first face recognition network, and randomly selecting a discard probability from the random discard probability interval as the probability of randomly discarding that layer's parameters;
and randomly discarding the parameters in each layer of the residual network according to its random discard probability to obtain a discarded second face recognition network.
3. The method of claim 1, wherein processing the same face image with each second face recognition network to obtain the feature vector corresponding to each second face recognition network comprises:
cropping the face image according to a preset width and height, inputting the cropped face image into each second face recognition network, and outputting the feature vector corresponding to the cropped face image with each second face recognition network.
4. The method according to claim 3, wherein calculating the distances between the feature vectors corresponding to the second face recognition networks and evaluating the quality of the face image based on the distances and a threshold comprises:
calculating the inner products between the feature vectors output by the second face recognition networks, taking the inner products as the distances between the feature vectors, and calculating the mean and standard deviation of all the distances;
and calculating a network robust value of the face image based on the mean and standard deviation, comparing the network robust value with a robust threshold, and evaluating the quality of the face image according to the comparison result.
5. The method of claim 4, further comprising:
taking the mean, standard deviation, and network robust value corresponding to each face image as the label of the face image, and training a preconfigured regression network with a training set composed of the face images and labels to obtain a training-fitted regression network;
wherein the regression network structure comprises four stages, each stage outputting a feature map, with a robust value regression module after the last stage.
6. The method of claim 5, wherein after obtaining the training-fitted regression network, the method further comprises:
inputting the face image to be evaluated into the training-fitted regression network, outputting the network robust value corresponding to the face image with the training-fitted regression network, comparing the network robust value with a robust threshold, and evaluating the quality of the face image to be evaluated according to the comparison result.
7. The method of claim 6, further comprising:
randomly discarding the parameters of each residual network layer in the regression network, repeating the discarding multiple times to obtain multiple discarded regression networks;
taking the same face image as the input of the multiple discarded regression networks to obtain multiple network robust values corresponding to the discarded regression networks;
calculating the discarded robust mean of the multiple network robust values, computing a quality robust value from the discarded robust mean and the original robust value, and using the quality robust value as the label of the face image;
training a preconfigured robust regression network with a training set composed of the face images and labels to obtain a training-fitted robust regression network;
wherein the robust regression network is a neural network obtained by adding a quality regression module after the last stage of the regression network; when the robust regression network is trained, all parts other than the quality regression module are frozen, and only the quality regression module has its parameters adjusted.
8. The method of claim 7, wherein after obtaining the training-fitted robust regression network, the method further comprises:
inputting the face image to be evaluated into the training-fitted robust regression network, and outputting the network robust value and quality robust value corresponding to the face image with the training-fitted robust regression network;
and calculating a quality score based on the network robust value and the quality robust value, comparing the quality score with a quality score threshold, and evaluating the quality of the face image to be evaluated according to the comparison result.
9. A face image quality evaluation apparatus, characterized by comprising:
an acquisition module configured to acquire a pre-trained first face recognition network and determine the number of layers of the residual network contained in the first face recognition network;
a discarding module configured to randomly discard parameters in each layer of the residual network according to a preset random discard probability, obtaining at least two discarded second face recognition networks after at least two rounds of random discarding;
a processing module configured to process the same face image with each second face recognition network to obtain the feature vector corresponding to each second face recognition network;
and an evaluation module configured to calculate the distances between the feature vectors corresponding to the second face recognition networks and evaluate the quality of the face image based on the distances and a threshold.
10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 8 when executing the program.
CN202211234838.8A 2022-10-10 2022-10-10 Face image quality evaluation method and device and electronic equipment Pending CN115619729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211234838.8A CN115619729A (en) 2022-10-10 2022-10-10 Face image quality evaluation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211234838.8A CN115619729A (en) 2022-10-10 2022-10-10 Face image quality evaluation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115619729A (en) 2023-01-17

Family

ID=84862347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211234838.8A Pending CN115619729A (en) 2022-10-10 2022-10-10 Face image quality evaluation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115619729A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078664A (en) * 2023-10-13 2023-11-17 脉得智能科技(无锡)有限公司 Computer-readable storage medium, ultrasonic image quality evaluation device, and electronic apparatus
CN117078664B (en) * 2023-10-13 2024-01-23 脉得智能科技(无锡)有限公司 Computer-readable storage medium, ultrasonic image quality evaluation device, and electronic apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination