CN114119551A - Quantitative analysis method for human face image quality - Google Patents

Quantitative analysis method for human face image quality

Info

Publication number: CN114119551A
Application number: CN202111424323.XA
Authority: CN (China)
Prior art keywords: face, face image, detected, local, angle direction
Legal status: Pending
Other languages: Chinese (zh)
Inventor
夏立
蔡娜娜
郑鹏
李峰岳
王康
张晓燕
Current Assignee
Nanjing Fiberhome Telecommunication Technologies Co ltd
Original Assignee
Nanjing Fiberhome Telecommunication Technologies Co ltd
Application filed by Nanjing Fiberhome Telecommunication Technologies Co ltd filed Critical Nanjing Fiberhome Telecommunication Technologies Co ltd
Priority to CN202111424323.XA
Publication of CN114119551A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 - Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30168 - Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a quantitative analysis method of face image quality, which introduces fine-grained category, posture, definition and illumination factors for analysis. Fine-grained classification of the face image is used to distinguish problems such as false detection, low quality and occlusion; a definition estimation model is trained through ranking learning, realizing accurate estimation of face image definition; a posture angle regression model obtains the specific angles of the face in three-dimensional space; and, combined with illumination analysis, score mapping parameters and influence-factor weights suitable for different categories are fitted on test images from different scenes. Relatively accurate face quality scores can therefore be given in different application scenarios, low-quality faces can be filtered effectively, and a face quality evaluation consistent with subjective perception is provided for the face image. In addition, the models used are all lightweight models after compression optimization, with high calculation speed and low resource occupation.

Description

Quantitative analysis method for human face image quality
Technical Field
The invention relates to a quantitative analysis method for the quality of a face image, and belongs to the technical field of face image quality evaluation.
Background
Image quality assessment algorithms aim to use a mathematical model to give an objective quantitative value that agrees with a person's subjective quality judgment. Face image quality evaluation differs in that it must consider not only the image quality but also whether the face image can be used for face recognition: factors such as the posture, degree of occlusion, degree of blur and illumination conditions of the face in the image all affect the face recognition result, so the subjective impression of a face image is difficult to convert into a number for measurement, which is a great challenge for face image quality evaluation.
In tasks such as face recognition, face clustering and face attribute analysis, face images of poor quality need to be filtered out. False detections inevitably occur in face detection algorithms, and if detection results are passed directly to the subsequent stages, the effect of the downstream algorithms is degraded to a great extent. A face quality evaluation algorithm is therefore needed that can effectively filter low-quality faces and accurately quantify the quality of face images.
At present, face quality evaluation methods can be divided into two types: one subjectively defines quality indexes through the human visual system (HQV), and the other determines quality scores directly from face recognition performance (MQV). HQV methods reflect face image quality by calculating influence factors such as posture and definition; MQV methods usually obtain the quality score from the matching similarity of face recognition features.
In practical application scenarios, a simple subjectively defined quality evaluation method is not sufficient to evaluate the quality of a face image accurately, while a quality score generated from face recognition similarity depends entirely on the face recognition model and can differ greatly from human subjective perception.
The existing human face quality evaluation method has the following defects:
(1) Low accuracy of definition and posture estimation
In subjective face quality definition methods, definition, posture and similar factors are mostly computed with traditional methods. Traditional definition estimation methods (such as the Laplacian or Sobel operators) cannot handle blur and noise effectively at the same time; traditional posture estimation methods, for example those that evaluate the posture by measuring the symmetry of the face region, cannot obtain the specific angles of the face in three-dimensional space and are not accurate enough for quantitative posture analysis.
(2) The face occlusion problem is neglected
At present, methods that subjectively define face quality carry out almost no evaluation of occlusion factors, yet occlusion has a great influence on face recognition. External occlusions such as masks and helmets are very common in application scenes such as video surveillance. A face quality evaluation algorithm should therefore take the occlusion factor into account and reflect the influence of different degrees of occlusion on quality.
(3) The method based on the face recognition features has poor generalization capability and low calculation speed
To use face recognition features for knowledge transfer, the quality score regression network must have essentially the same structure as the face recognition network. Such a network is not lightweight enough for practical application, and its generalization across different face feature extraction models is poor.
(4) Quality evaluation based on face recognition features is inconsistent with subjective perception
In quality evaluation methods based on face recognition features, the quality score labels are generated from recognition similarity. However, face recognition datasets generally lack low-quality samples of the same person, so a model trained with such generated labels cannot discriminate low-quality samples well. Computing the similarity also requires selecting a high-quality reference sample, so the quality labels generated for different subjects are only relative, there is no clear standard, and they differ from the results of subjective perception.
Disclosure of Invention
The invention aims to solve the technical problem of providing a quantitative analysis method for the quality of a face image, which can effectively filter a low-quality face and provide subjective face quality evaluation for the face image through fine-grained classification of the face quality, face pose estimation, a face definition estimation model and illumination analysis.
The invention adopts the following technical scheme for solving the technical problems: the invention designs a quantitative analysis method of human face image quality, which is used for quantifying the quality of a local human face image to be detected in an image to be detected, and executes the following steps A to J to obtain a score corresponding to the local human face image to be detected and is used for performing quality screening on the local human face image to be detected;
step A, processing a local face image to be detected based on a quality classification network which takes the face image as input and takes preset face image fine-grained classification corresponding to the face image as output, obtaining fine-grained classification corresponding to the local face image to be detected, using the fine-grained classification as the quality classification corresponding to the local face image to be detected, and entering step B;
b, based on the image to be detected, taking a local face image to be detected as a center, presetting a first proportion aiming at the outer expansion of a selection frame corresponding to the local face image to be detected, obtaining a first local outer expansion face image to be detected, if the first local outer expansion face image exceeds the area of the image to be detected, filling the first local outer expansion face image by using all 0 pixel values, and entering the step C;
step C, processing a first partially externally expanded to-be-detected face image based on an attitude classification network which takes the face image as input and outputs probabilities of preset angle intervals corresponding to the yaw angle direction, the pitch angle direction and the roll angle direction of the face in the face image as output, obtaining prediction results of preset angle intervals corresponding to the yaw angle direction, the pitch angle direction and the roll angle direction of the face in the first partially externally expanded to-be-detected face image respectively, executing mathematical expectation regression to obtain prediction angle continuous values corresponding to the yaw angle direction, the pitch angle direction and the roll angle direction of the face in the first partially externally expanded to-be-detected face image respectively, namely obtaining prediction angle continuous values corresponding to the yaw angle direction, the pitch angle direction and the roll angle direction of the face in the partial to-be-detected face image respectively, and then entering step D;
step D, based on the image to be detected, with the local face image to be detected as the center, presetting a second proportion aiming at the outer expansion of the selection frame corresponding to the local face image to be detected to obtain a second local outer expansion face image to be detected, and entering the step E;
step E, processing the second local externally-expanded face image to be detected based on a definition estimation network which takes the face image as input and the definition data corresponding to the face image as output, obtaining the definition data corresponding to the second local externally-expanded face image to be detected, mapping the definition data between 0 and 1 by using a sigmoid function, taking the definition data as the definition value corresponding to the second local externally-expanded face image to be detected, namely obtaining the definition value corresponding to the local face image to be detected, and then entering the step F;
step F, based on the position coordinates of each corner point of the local face image to be detected and the length and width of the local face image to be detected, reducing a selection frame corresponding to the local face image to be detected to obtain a local illumination area in the local face image to be detected, calculating to obtain an average value of V channels in HSV color space corresponding to the local illumination area, using the average value as a face illumination value corresponding to the local face image to be detected, and entering step G;
g, calculating to obtain a score corresponding to a predicted angle continuous value of the face corresponding to the yaw angle direction, a score corresponding to a predicted angle continuous value of the pitch angle direction and a score corresponding to a predicted angle continuous value of the roll angle direction in the local face image to be detected, applying a weighting mode according to preset yaw angle direction weight, preset pitch angle direction weight and roll angle direction weight under each quality classification to obtain an attitude score corresponding to the local face image to be detected, and entering the step H;
step H, calculating to obtain a definition score corresponding to the local face image to be detected according to the definition value corresponding to the local face image to be detected, and then entering the step I;
step I, calculating to obtain an illumination score corresponding to the local face image to be detected according to the face illumination value corresponding to the local face image to be detected, and then entering step J;
step J, according to the posture weight, the definition weight and the illumination weight respectively corresponding to each quality classification, combining the quality classification corresponding to the local face image to be detected, according to the following formula:

Score = m_t · (w_P^t · S_P + w_C^t · S_C + w_L^t · S_L)

obtaining the Score corresponding to the local face image to be detected, wherein w_P^t, w_C^t and w_L^t respectively represent the posture weight, the definition weight and the illumination weight under the quality classification corresponding to the local face image to be detected, S_P, S_C and S_L respectively represent the posture score, the definition score and the illumination score corresponding to the local face image to be detected, and m_t represents the preset maximum face image score under the quality classification corresponding to the local face image to be detected.
As a preferred technical scheme of the invention: in the step A, based on the fine-grained classification corresponding to the obtained local face image to be detected, the quality classification corresponding to the fine-grained classification corresponding to the local face image to be detected is obtained by combining the fine-grained classification with the preset mapping relation between the quality classifications, namely the quality classification corresponding to the local face image to be detected is obtained, and then the step B is carried out.
As a preferred technical scheme of the invention: in the step C, the face in the obtained first partially-expanded face image to be detected respectively corresponds to the prediction results of each preset angle interval in the yaw angle direction, the pitch angle direction and the roll angle direction, and the prediction results are as follows:
Figure BDA0003378442230000041
Figure BDA0003378442230000042
Figure BDA0003378442230000043
performing mathematical expectation regression to obtain continuous values yaw of the predicted angles of the faces in the first partially external-extended face image to be detected in the yaw angle directionpPitch angle direction predicted angle continuous value pitchpRolling angle direction predicted angle continuous value rollpObtaining the continuous values of the prediction angles of the face in the local face image to be detected in the yaw angle direction, the pitch angle direction and the roll angle direction respectively; where, I is {0, 1, 2, …, I }, where I represents the number of angle sections that the face is divided into in each attitude angle direction, and is a localyawRepresenting the output, logit, of the pose classification network corresponding to the face in the yaw directionpitchRepresenting the output, logit, of the pose classification network corresponding to the face in the pitch angle directionrollThe output, softmax (logit) of the posture classification network corresponding to the direction of the roll angle of the human face is shownyaw)iRepresents logityawProbability, softmax (logit) corresponding to the ith angle intervalpitch)iRepresents logitpitchProbability, softmax (logit) corresponding to the ith angle intervalroll)iRepresents logitrollThe probability corresponding to the i-th angle interval.
As a preferred technical scheme of the invention: in step F, according to the following formula:
L = (1 / (W × H)) · Σ_{w=1}^{W} Σ_{h=1}^{H} V_wh

calculating to obtain the average value L of the V channel in HSV color space over the local illumination area, used as the face illumination value corresponding to the face image to be detected, wherein W and H are respectively the width and the height of the local illumination area, and V_wh is the pixel value of the V channel in HSV color space at the (w, h) coordinate position in the local illumination area.
As a preferred technical scheme of the invention: in the step G, executing the following steps G1 to G4 to obtain a score corresponding to a predicted angle continuous value of the face in the local face image to be detected in the yaw angle direction, a score corresponding to a predicted angle continuous value in the pitch angle direction and a score corresponding to a predicted angle continuous value in the roll angle direction;
step G1. is based on the coordinate system with the abscissa as the attitude angle and the ordinate as the fraction, and aiming at the yaw angle direction, the pitch angle direction and the roll angle direction corresponding to the human face: based on the variation range of the pose angles of the face in the corresponding direction, forming two-corner-point coordinate positions by respectively setting the corresponding scores of the extreme pose angles of the face rotating towards the two sides of the face in the direction to be 0, forming a vertex coordinate position by a preset maximum score value corresponding to the pose angle 0 of the face rotating in the direction, and then entering the step G2;
step G2. is performed for the pitch angle direction and the roll angle direction corresponding to the human face: on the basis of the fact that a human face rotates to two sides of the human face in the corresponding direction respectively, the same preset first fractional values corresponding to the same preset first rotating attitude angles form two first rotating point coordinate positions, wherein the preset first rotating attitude angles are larger than 0-degree attitude angles and smaller than a limit attitude angle, and the preset first fractional values corresponding to the preset first rotating attitude angles are smaller than the fractional values on a straight line between the corner point coordinate position and the vertex coordinate position of the same side corresponding to the preset first rotating attitude angles;
aiming at the yaw angle direction corresponding to the human face: based on the same preset second score values corresponding to the same preset second rotation attitude angles of the face in the corresponding yaw angle direction, the face rotates to the two sides of the face in the corresponding yaw angle direction respectively, and two second rotation point coordinate positions corresponding to the quality classifications are formed, wherein the preset second rotation attitude angles in the quality classifications are larger than 0-degree attitude angles and smaller than a limit attitude angle, and the preset second score values corresponding to the preset second rotation attitude angles in the quality classifications are larger than the score values of the preset second rotation attitude angles on a straight line connecting the corner point coordinate position and the vertex coordinate position on the same side of the preset second rotation attitude angles;
then go to step G3;
g3., linearly connecting the first rotation point coordinate position and the same-side corner point coordinate position respectively in sequence from the vertex coordinate position to the two sides of the vertex coordinate position to form a corresponding relation between the attitude angle and the score of the face corresponding to the pitch angle direction and the roll angle direction;
respectively aiming at each quality classification, respectively and linearly connecting a second rotating point coordinate position corresponding to the quality classification and an angular point coordinate position on the same side to the two sides of the top coordinate position in sequence to form a corresponding relation between the attitude angle and the score of the face corresponding to the yaw angle direction under the quality classification, and further obtaining a corresponding relation between the attitude angle and the score of the face corresponding to the yaw angle direction under each quality classification;
then go to step G4;
step G4. is based on the corresponding relationship between the attitude angle and the score of the face corresponding to the pitch angle direction and the roll angle direction and the corresponding relationship between the attitude angle and the score of the face corresponding to the yaw angle direction under each quality classification, and combines the quality classification corresponding to the local face image to be detected to obtain the score corresponding to the predicted angle continuous value of the face corresponding to the yaw angle direction in the local face image to be detected, the score corresponding to the predicted angle continuous value of the pitch angle direction and the score corresponding to the predicted angle continuous value under the roll angle direction.
As a preferred technical scheme of the invention: the step H comprises the following steps H1 to H4, and the definition score corresponding to the local face image to be detected is obtained;
step H1, aiming at each preset face sample image, obtaining definition values corresponding to the face sample images according to the method in the step E, and then entering the step H2;
step H2, calculating to obtain definition scores corresponding to the face sample images according to an image definition score calculation method, and then entering step H3;
step H3., based on a coordinate system with the abscissa as the definition value and the ordinate as the definition fraction, forming fitting point positions according to the definition value and the definition fraction corresponding to each face sample image, fitting to obtain the corresponding relation between the definition value and the definition fraction, and then entering step H4;
step H4., obtaining the definition score corresponding to the local face image to be detected according to the corresponding relationship between the definition value and the definition score and the definition value corresponding to the local face image to be detected.
As a preferred technical scheme of the invention: the step I comprises the steps I1 to I4, and the illumination score corresponding to the local face image to be detected is obtained;
step I1., based on a coordinate system with an abscissa as an illumination value and an ordinate as an illumination fraction, combining a preset illumination value range, forming a start coordinate position with a minimum illumination value corresponding fraction as 0, forming an end coordinate position with a maximum illumination value corresponding fraction as a preset fraction, and then entering step I2;
step I2, forming high-resolution coordinate positions respectively corresponding to the quality classifications based on preset maximum illumination scores corresponding to preset high-resolution illumination values between the minimum illumination values and the maximum illumination values under the quality classifications, and then entering step I3;
step I3, respectively aiming at each quality classification, sequentially connecting a high-grade coordinate position and a tail-end coordinate position corresponding to the quality classification by a starting point coordinate position through an opening downward arc line to form a corresponding relation between an illumination value and an illumination score corresponding to the quality classification, further obtaining a corresponding relation between the illumination value and the illumination score corresponding to each quality classification, and then entering step I4;
step I4., obtaining the illumination score corresponding to the local face image to be measured according to the corresponding relationship between the illumination value and the illumination score corresponding to each quality classification, in combination with the quality classification corresponding to the local face image to be measured and the face illumination value corresponding to the local face image to be measured.
As a preferred technical scheme of the invention: the step I2 further includes forming auxiliary coordinate positions corresponding to the quality classifications by preset auxiliary illumination scores corresponding to preset auxiliary illumination values between the preset high-resolution illumination values and the maximum illumination values under the quality classifications based on the high-resolution coordinate positions corresponding to the quality classifications, respectively;
in step I3, for each quality classification, the high-score coordinate position, the auxiliary coordinate position, and the tail coordinate position corresponding to the quality classification are sequentially connected by the start coordinate position and the downward arc line, so as to form a corresponding relationship between the illumination value and the illumination score corresponding to the quality classification, and further obtain a corresponding relationship between the illumination value and the illumination score corresponding to each quality classification.
As a preferred technical scheme of the invention: in the step a, for a Resnet18 network, removing the last residual module in the network, and replacing the average pooling layer in the network with an adaptive average pooling layer to obtain an updated network, where the quality classification network is implemented based on the updated network;
training a quality classification network of the structure according to the preset face images as input and the fine-grained classification corresponding to the face images as output based on the preset face images and the preset face image fine-grained classification corresponding to the sample face images respectively, and updating to obtain the quality classification network;
in the step C, for the Resnet18 network, removing the last residual module in the network, and updating the full connection layer in the network to include branch full connection layers respectively corresponding to the yaw angle direction, the pitch angle direction, and the roll angle direction to obtain an updated network, where the posture classification network is implemented based on the updated network;
presetting various angle interval categories under the conditions that various sample face images are preset and faces in the various sample face images respectively correspond to a yaw angle direction, a pitch angle direction and a roll angle direction, and outputting prediction results of the various angle interval categories under the conditions that the face images serve as input and the faces in the human face images respectively correspond to the yaw angle direction, the pitch angle direction and the roll angle direction, training is carried out on the posture classification network with the structure, and the posture classification network is obtained through updating;
in the step E, the definition estimation network is realized based on a Resnet10 network; and training the definition estimation network with the structure based on the preset definition data corresponding to each sample face image and each sample face image, and taking the face image as input and the definition data corresponding to the face image as output, and updating to obtain the definition estimation network.
As a preferred technical scheme of the invention: in the training process of the definition estimation network, the following steps are carried out: and aiming at the preset face images of all samples, respectively executing distortion processing of different methods and different degrees to obtain low-quality face images of the samples, which respectively correspond to the different degrees under the distortion methods, of the face images of the samples, jointly forming the face images of the samples, sequencing the face images of the samples by a pairwise method, combining a rankloss function, taking the face images as input and the definition data corresponding to the face images as output, and training a definition estimation network.
Compared with the prior art, the quantitative analysis method for the quality of the face image has the following technical effects by adopting the technical scheme:
the invention designs a quantitative analysis method for the quality of a face image, which introduces factors of fine granularity, posture, definition and illumination to analyze, and classifies the face image into fine granularity classification for distinguishing the problems of false detection, low quality, shielding and the like in the face image; the definition estimation model is trained through sequencing learning, so that the accurate estimation of the definition of the face image is realized; obtaining a specific angle of the human face in a three-dimensional space by using a posture angle regression model; the method is characterized in that score mapping parameters and influence factor weights suitable for different types are fitted by combining illumination analysis and testing images in different scenes, so that relatively accurate face quality scores can be given in different application scenes, low-quality face filtering is effectively performed, and subjective face quality evaluation is provided for face images; in addition, the used models are all lightweight models after compression optimization, the calculation speed is high, and the resource occupation is less.
Drawings
FIG. 1 is a schematic diagram of a framework of a quantitative analysis method for human face image quality according to the present invention;
FIG. 2 is a graph showing the correspondence between attitude angle and score in an embodiment of the present invention;
FIG. 3 is a graph showing the relationship between sharpness values and sharpness score values in an embodiment of the present invention;
FIG. 4 shows the correspondence between the illumination values and the illumination scores in the embodiment of the present invention.
Detailed Description
The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention designs a quantitative analysis method of facial image quality, which is used for quantifying the quality of a local facial image to be measured in an image to be measured.
Regarding the quality classification network, removing the last residual module in the network and replacing the average pooling layer in the network to be a self-adaptive average pooling layer aiming at the Resnet18 network to obtain an updated network, wherein the quality classification network is realized based on the updated network; and training the quality classification network of the structure according to the preset face images as input and the fine-grained classification corresponding to the face images as output based on the preset face images of the samples and the fine-grained classification of the preset face images corresponding to the face images of the samples, and updating to obtain the quality classification network.
Regarding the attitude classification network, removing the last residual module in the network aiming at the Resnet18 network, updating a full connection layer in the network to obtain an updated network, wherein the full connection layer comprises branch full connection layers respectively corresponding to a yaw angle direction, a pitch angle direction and a roll angle direction, and the attitude classification network is realized based on the updated network; and based on presetting each sample face image and presetting each angle interval type under the condition that the face in each sample face image respectively corresponds to the yaw angle direction, the pitch angle direction and the roll angle direction, training the posture classification network with the structure and updating to obtain the posture classification network according to the condition that the face image is used as input and the face in the face image respectively corresponds to the yaw angle direction, the pitch angle direction and the roll angle direction and presetting each angle interval type prediction results as output.
The sharpness estimation network is implemented based on the Resnet10 network; and training the definition estimation network with the structure based on the preset definition data corresponding to each sample face image and each sample face image, and taking the face image as input and the definition data corresponding to the face image as output, and updating to obtain the definition estimation network.
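As an illustration of the backbone modifications described above, the following PyTorch sketch builds the quality classification network and the posture classification network from torchvision's ResNet-18. It is a minimal sketch under stated assumptions: the 256-channel feature size follows from dropping the last residual stage of ResNet-18, while the number of angle bins and the exact head layout are assumptions not fixed by the text; the definition estimation network would analogously be a shallow ResNet-10-style backbone with a single regression output.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def make_quality_classifier(num_fine_grained: int = 28) -> nn.Module:
    """ResNet-18 with the last residual stage removed and adaptive pooling,
    classifying a face crop into fine-grained quality categories
    (28 categories per the description)."""
    net = models.resnet18(weights=None)
    net.layer4 = nn.Identity()                   # drop the last residual module
    net.avgpool = nn.AdaptiveAvgPool2d((1, 1))   # adaptive average pooling
    net.fc = nn.Linear(256, num_fine_grained)    # layer3 outputs 256 channels
    return net

class PoseClassifier(nn.Module):
    """Truncated ResNet-18 trunk with one bin-classification head per
    attitude angle (yaw / pitch / roll); num_bins is an assumption."""
    def __init__(self, num_bins: int = 66):
        super().__init__()
        trunk = models.resnet18(weights=None)
        trunk.layer4 = nn.Identity()
        trunk.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        trunk.fc = nn.Identity()                 # expose the 256-d feature
        self.trunk = trunk
        self.fc_yaw = nn.Linear(256, num_bins)
        self.fc_pitch = nn.Linear(256, num_bins)
        self.fc_roll = nn.Linear(256, num_bins)

    def forward(self, x: torch.Tensor):
        feat = self.trunk(x)                     # (B, 256)
        return self.fc_yaw(feat), self.fc_pitch(feat), self.fc_roll(feat)
```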
In the training process of the definition estimation network in this application, distortion processing of different methods and different degrees is applied to the preset sample face images. The distortion processing covers 10 distortion methods such as resolution reduction, Gaussian blur and Gaussian noise, each with 4 different distortion degrees, yielding low-quality versions of every sample face image at different degrees under each distortion method; these images together form the sample face images. The sample face images are ranked by a pairwise method and, combined with a rank loss function, the definition estimation network is trained with the face image as input and the definition data corresponding to the face image as output.
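A minimal sketch of the pairwise ranking step described above, assuming PyTorch's MarginRankingLoss as the rank loss and assuming that a less-distorted version of an image should receive a higher raw definition output than a more-distorted one; the margin value is a placeholder.

```python
import torch
import torch.nn as nn

rank_loss = nn.MarginRankingLoss(margin=0.5)   # margin is an assumed hyperparameter

def pairwise_rank_step(model: nn.Module,
                       img_less_distorted: torch.Tensor,
                       img_more_distorted: torch.Tensor) -> torch.Tensor:
    """One pairwise ranking loss term for the definition estimation network."""
    s_hi = model(img_less_distorted)           # raw definition value (before sigmoid)
    s_lo = model(img_more_distorted)
    target = torch.ones_like(s_hi)             # +1: the first input should rank higher
    return rank_loss(s_hi, s_lo, target)
```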
As shown in Table 1 of the original publication (reproduced there only as an image), the structure of the quality classification network, the pose classification network, and the sharpness estimation network is shown schematically.
Table 2 below shows the sizes of the models as specifically applied in practice, for the quality classification network, the attitude classification network and the definition estimation network.
TABLE 2
Model         Input size    Parameters    FLOPs
Quality       3*96*96       2.8M          258.91M
Posture       3*112*112     2.8M          352.45M
Definition    3*224*224     4.9M          894.16M
Based on the above quality classification network, the posture classification network, and the sharpness estimation network, in practical application, as shown in fig. 1, the following steps a to J are performed to obtain a score corresponding to the local face image to be detected, so as to perform quality screening on the local face image to be detected. The face picture shown in fig. 1 is derived from public data set information about VGGFACE.
Step A, processing a local face image to be detected based on a quality classification network which takes the face image as input and takes preset face image fine-grained classification corresponding to the face image as output to obtain fine-grained classification corresponding to the local face image to be detected; and further based on the fine-grained classification corresponding to the obtained local face image to be detected, combining the preset mapping relation between each fine-grained classification and each preset quality classification to obtain the quality classification corresponding to the fine-grained classification corresponding to the local face image to be detected, namely obtaining the quality classification corresponding to the local face image to be detected, and then entering the step B.
In practical application, 28 kinds of fine-grained classifications of face quality are currently supported (covering various types of false detection, abnormal faces, occlusion, and so on). In line with the above design, common scenes and the requirements of subsequent face feature extraction are taken into account, and the fine-grained classifications are mapped to several common face quality classes for output, namely normal, facial occlusion, abnormal posture, low-quality face, color anomaly and non-face; that is, a mapping relationship is preset between each fine-grained classification and each preset quality classification, as shown in Table 3 below, and the quality classification corresponding to the local face image to be detected is obtained from it.
TABLE 3
Quality output class number    Quality classification output    Corresponding fine-grained classifications
0     Normal              Normal, slight occlusion, etc.
1     Facial occlusion    Mask, cap-brim occlusion, etc.
2     Abnormal posture    Top-down view, 90-degree side face, etc.
3     Low-quality face    Side face, mouth covered, blurred, truncated, etc.
4     Color anomaly       Black-and-white, etc.
-1    Non-face            Animals, tires, etc.
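The Table 3 mapping could be held in a simple lookup, as in the hypothetical sketch below. Only the six output classes come from the table; the individual fine-grained label names are illustrative assumptions, since the patent does not enumerate all 28 fine-grained classes.

```python
QUALITY_CLASSES = {
    0: "normal",
    1: "facial occlusion",
    2: "abnormal posture",
    3: "low-quality face",
    4: "color anomaly",
    -1: "non-face",
}

# Hypothetical fine-grained labels -> quality class number (illustrative only).
FINE_TO_QUALITY = {
    "normal": 0, "slight_occlusion": 0,
    "mask": 1, "cap_brim_occlusion": 1,
    "top_down_view": 2, "profile_90deg": 2,
    "mouth_covered_side_face": 3, "blurred": 3, "truncated": 3,
    "black_and_white": 4,
    "animal": -1, "tire": -1,
}

def quality_class_of(fine_grained_label: str) -> int:
    """Map a fine-grained label to its quality class; unknown labels -> non-face."""
    return FINE_TO_QUALITY.get(fine_grained_label, -1)
```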
And B, based on the image to be detected, taking the local face image to be detected as the center, performing external expansion on a selection frame corresponding to the local face image to be detected by preset first proportion to obtain a first local external expansion face image to be detected, filling the first local external expansion face image to be detected by all 0 pixel values if the first local external expansion face image to be detected exceeds the area of the image to be detected, adjusting the first local external expansion face image to be detected to be in a preset standard size, performing normalization and standardization processing, and entering the step C.
In application, if facial key points are available for the first partially expanded face image to be detected, the key points are used to judge whether the input image is rotated by ±90° or ±180°; if it is, the first partially expanded face image to be detected is rotated back to the normal (upright) direction.
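A minimal sketch of the expansion-and-padding used in steps B and D, assuming the detection box is given as (x1, y1, x2, y2) in pixel coordinates and that the preset ratio scales the box around its centre; pixels that fall outside the source image are filled with zeros as described.

```python
import numpy as np

def expand_and_crop(image: np.ndarray, box, ratio: float) -> np.ndarray:
    """Enlarge a face box about its centre by `ratio` and crop it from `image`
    (H x W x C); regions outside the image are padded with all-zero pixels."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    bw, bh = (x2 - x1) * ratio, (y2 - y1) * ratio
    nx1, ny1 = int(round(cx - bw / 2)), int(round(cy - bh / 2))
    nx2, ny2 = int(round(cx + bw / 2)), int(round(cy + bh / 2))
    out = np.zeros((ny2 - ny1, nx2 - nx1, image.shape[2]), dtype=image.dtype)
    sx1, sy1 = max(nx1, 0), max(ny1, 0)          # part of the box inside the image
    sx2, sy2 = min(nx2, w), min(ny2, h)
    out[sy1 - ny1:sy2 - ny1, sx1 - nx1:sx2 - nx1] = image[sy1:sy2, sx1:sx2]
    return out
```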
And C, processing the first partially externally expanded to-be-detected face image based on an attitude classification network which takes the face image as input and outputs probabilities of preset angle intervals corresponding to the yaw angle direction, the pitch angle direction and the roll angle direction of the face in the face image as output, so as to obtain prediction results of preset angle intervals corresponding to the yaw angle direction, the pitch angle direction and the roll angle direction of the face in the first partially externally expanded to-be-detected face image, and according to the following formula:
yaw_p = Σ_{i=0}^{I} softmax(logit_yaw)_i · θ_i

pitch_p = Σ_{i=0}^{I} softmax(logit_pitch)_i · θ_i

roll_p = Σ_{i=0}^{I} softmax(logit_roll)_i · θ_i

performing mathematical expectation regression to obtain the predicted angle continuous value yaw_p of the face in the first partially-expanded face image to be detected in the yaw angle direction, the predicted angle continuous value pitch_p in the pitch angle direction and the predicted angle continuous value roll_p in the roll angle direction, namely the predicted angle continuous values of the face in the local face image to be detected in the yaw angle direction, the pitch angle direction and the roll angle direction respectively, and then entering the step D; where i ∈ {0, 1, 2, …, I}, I represents the number of angle intervals into which each attitude angle direction is divided, θ_i denotes the representative (centre) angle of the i-th interval, logit_yaw, logit_pitch and logit_roll represent the outputs of the posture classification network for the yaw angle, pitch angle and roll angle directions respectively, and softmax(logit_yaw)_i, softmax(logit_pitch)_i and softmax(logit_roll)_i represent the probability assigned to the i-th angle interval in the corresponding direction.
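A minimal sketch of the expectation regression above. The number of bins and the bin width (and hence the mapping from interval index to a representative angle) are assumptions; the patent only states that a mathematical expectation is taken over the softmax probabilities of the angle intervals.

```python
import torch
import torch.nn.functional as F

def expected_angle(logits: torch.Tensor, bin_deg: float = 3.0) -> torch.Tensor:
    """Turn per-interval logits of shape (..., I+1) into a continuous angle in
    degrees by taking the expectation over assumed bin-centre angles symmetric
    about 0 degrees."""
    probs = F.softmax(logits, dim=-1)
    n = logits.shape[-1]
    idx = torch.arange(n, dtype=probs.dtype, device=probs.device)
    centers = (idx - (n - 1) / 2.0) * bin_deg        # assumed bin centres
    return (probs * centers).sum(dim=-1)

# yaw_p, pitch_p, roll_p = (expected_angle(l) for l in pose_net(face_crop))
```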
And D, based on the image to be detected, taking the local face image to be detected as the center, carrying out external expansion on the selection frame corresponding to the local face image to be detected by a preset second proportion to obtain a second local external expansion face image to be detected, adjusting the second local external expansion face image to be detected to be a preset standard size, carrying out normalization and standardization processing, and entering the step E.
And E, processing the second local externally-expanded face image to be detected based on a definition estimation network which takes the face image as input and the definition data corresponding to the face image as output, obtaining the definition data corresponding to the second local externally-expanded face image to be detected, mapping the definition data between 0 and 1 by using a sigmoid function to be used as the definition value corresponding to the second local externally-expanded face image to be detected, namely obtaining the definition value corresponding to the local face image to be detected, and then entering the step F.
And F, based on the position coordinates of each angular point of the local face image to be detected and the length and width of the local face image to be detected, reducing a selection frame corresponding to the local face image to be detected to obtain a local illumination area in the local face image to be detected, and according to the following formula:
L = (1 / (W × H)) · Σ_{w=1}^{W} Σ_{h=1}^{H} V_wh

calculating to obtain the average value L of the V channel in HSV color space over the local illumination area, used as the face illumination value corresponding to the local face image to be detected (the larger L is, the brighter the face; the smaller L is, the darker the face), and then entering the step G. Here W and H are respectively the width and the height of the local illumination area, and V_wh is the pixel value of the V channel in HSV color space at the (w, h) coordinate position in the local illumination area.
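A minimal OpenCV sketch of the illumination value of step F. The amount by which the face box is shrunk to obtain the local illumination region is an assumption; the patent only states that the selection frame is reduced.

```python
import cv2
import numpy as np

def face_illumination(face_bgr: np.ndarray, keep: float = 0.6) -> float:
    """Mean of the HSV V channel over a shrunken central region of the face crop
    (keep = fraction of width/height retained, an assumed value)."""
    h, w = face_bgr.shape[:2]
    dw, dh = int(w * (1 - keep) / 2), int(h * (1 - keep) / 2)
    region = face_bgr[dh:h - dh, dw:w - dw]          # local illumination area
    v = cv2.cvtColor(region, cv2.COLOR_BGR2HSV)[:, :, 2]
    return float(v.mean())                           # L in [0, 255]
```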
G, executing the following steps G1 to G4, and calculating to obtain a score corresponding to a predicted angle continuous value of the face in the yaw angle direction, a score corresponding to a predicted angle continuous value of the pitch angle direction and a score corresponding to a predicted angle continuous value in the roll angle direction in the local face image to be detected; and according to the preset yaw angle direction weight, the preset pitch angle direction weight and the roll angle direction weight under each quality classification, obtaining an attitude score corresponding to the local face image to be detected by applying a weighting mode, and then entering the step H.
Step G1. is based on the coordinate system with the abscissa as the attitude angle and the ordinate as the fraction, and aiming at the yaw angle direction, the pitch angle direction and the roll angle direction corresponding to the human face: based on the variation range of the pose angle of the face in the corresponding direction, two corner coordinate positions are formed by respectively setting the corresponding scores of the extreme pose angles of the face rotating to the two sides of the face in the direction to be 0, and the vertex coordinate position is formed by a preset maximum score value corresponding to the pose angle 0 of the face rotating in the direction, and then the step G2 is carried out.
Step G2. is performed for the pitch angle direction and the roll angle direction corresponding to the human face: on the basis of the fact that a human face rotates to two sides of the human face in the corresponding direction respectively, the same preset first fractional values corresponding to the same preset first rotating attitude angles form two first rotating point coordinate positions, wherein the preset first rotating attitude angles are larger than 0-degree attitude angles and smaller than a limit attitude angle, and the preset first fractional values corresponding to the preset first rotating attitude angles are smaller than the fractional values on a straight line between the corner point coordinate position and the vertex coordinate position of the same side corresponding to the preset first rotating attitude angles;
aiming at the yaw angle direction corresponding to the human face: based on the same preset second score values corresponding to the same preset second rotation attitude angles of the face in the corresponding yaw angle direction, the face rotates to the two sides of the face in the corresponding yaw angle direction respectively, and two second rotation point coordinate positions corresponding to the quality classifications are formed, wherein the preset second rotation attitude angles in the quality classifications are larger than 0-degree attitude angles and smaller than a limit attitude angle, and the preset second score values corresponding to the preset second rotation attitude angles in the quality classifications are larger than the score values of the preset second rotation attitude angles on a straight line connecting the corner point coordinate position and the vertex coordinate position on the same side of the preset second rotation attitude angles; then proceed to step G3.
G3., linearly connecting the first rotation point coordinate position and the same-side corner point coordinate position with the vertex coordinate position towards the two sides of the vertex coordinate position respectively to form a corresponding relation between the posture angle and the score of the face corresponding to the pitch angle direction and the roll angle direction, as shown in fig. 2;
respectively aiming at each quality classification, respectively and linearly connecting a second rotation point coordinate position corresponding to the quality classification and an angular point coordinate position on the same side to the two sides of the top coordinate position in sequence to form a corresponding relation between the attitude angle and the score of the face corresponding to the yaw angle direction under the quality classification, and further obtaining a corresponding relation between the attitude angle and the score of the face corresponding to the yaw angle direction under each quality classification, as shown in fig. 2; then proceed to step G4.
Step G4. is based on the corresponding relationship between the attitude angle and the score of the face corresponding to the pitch angle direction and the roll angle direction and the corresponding relationship between the attitude angle and the score of the face corresponding to the yaw angle direction under each quality classification, and combines the quality classification corresponding to the local face image to be detected to obtain the score corresponding to the predicted angle continuous value of the face corresponding to the yaw angle direction in the local face image to be detected, the score corresponding to the predicted angle continuous value of the pitch angle direction and the score corresponding to the predicted angle continuous value under the roll angle direction.
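The piecewise-linear angle-to-score mapping of steps G1 to G4 can be sketched with simple interpolation, as below. All breakpoint angles, score values and per-class weights here are illustrative assumptions (scores are kept in [0, 1]); the patent only fixes the shape of the curve: zero at the two limit angles, a maximum at 0 degrees, and one intermediate breakpoint on each side whose height depends on the direction and, for yaw, on the quality class.

```python
import numpy as np

def pose_angle_score(angle_deg: float, limit: float = 90.0, knee: float = 45.0,
                     knee_score: float = 0.6, max_score: float = 1.0) -> float:
    """Piecewise-linear score: 0 at +/-limit, max_score at 0 deg, knee_score at +/-knee."""
    xs = [-limit, -knee, 0.0, knee, limit]
    ys = [0.0, knee_score, max_score, knee_score, 0.0]
    return float(np.interp(np.clip(angle_deg, -limit, limit), xs, ys))

def pose_score(yaw: float, pitch: float, roll: float,
               weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted posture score S_P of step G (weights differ per quality class)."""
    return (weights[0] * pose_angle_score(yaw)
            + weights[1] * pose_angle_score(pitch)
            + weights[2] * pose_angle_score(roll))
```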
And step H, according to the definition value corresponding to the local face image to be detected, executing the following steps H1 to H4, calculating to obtain the definition score corresponding to the local face image to be detected, and then entering the step I.
And H1, aiming at each preset face sample image, obtaining definition values corresponding to the face sample images according to the method in the step E, and then entering the step H2.
And H2, calculating to obtain definition scores corresponding to the face sample images according to an image definition score calculation method, and then entering the step H3.
Step H3. is to form each fitting point position by using the corresponding definition value and definition score of each face sample image based on the coordinate system with the abscissa as the definition value and the ordinate as the definition score, and to fit to obtain the corresponding relationship between the definition value and the definition score, as shown in fig. 3, and then to go to step H4.
Step H4., obtaining the definition score corresponding to the local face image to be detected according to the corresponding relationship between the definition value and the definition score and the definition value corresponding to the local face image to be detected.
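A minimal sketch of the curve fitting of steps H1 to H4. The polynomial form, its degree and the score range are assumptions; the patent only requires fitting a curve through (definition value, definition score) points obtained from sample images and evaluating it for the image under test.

```python
import numpy as np

def fit_definition_curve(values, scores, degree: int = 3) -> np.poly1d:
    """Fit definition score = f(definition value) from sample face images."""
    return np.poly1d(np.polyfit(values, scores, degree))

def definition_score(curve: np.poly1d, value: float) -> float:
    """Evaluate the fitted curve for one face; S_C kept in [0, 1]."""
    return float(np.clip(curve(value), 0.0, 1.0))
```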
Step I, according to the face illumination value corresponding to the local face image to be detected, executing the following steps I1 to I4, calculating to obtain the illumination score corresponding to the local face image to be detected, and then entering the step J.
Step I1. is based on the coordinate system with the abscissa as the illumination value and the ordinate as the illumination fraction, and combines the preset illumination value range, the minimum illumination value corresponding fraction is 0 to form the start coordinate position, the maximum illumination value corresponding fraction is the preset fraction to form the end coordinate position, and then the step I2 is entered.
Step I2, forming high-resolution coordinate positions corresponding to the quality classifications respectively based on preset maximum illumination scores corresponding to preset high-resolution illumination values between the minimum illumination values and the maximum illumination values under the quality classifications; and forming auxiliary coordinate positions corresponding to the quality classifications according to preset auxiliary illumination scores corresponding to auxiliary illumination values preset between the preset high-resolution illumination value and the maximum illumination value under each quality classification based on the high-resolution coordinate positions corresponding to the quality classifications, and then entering step I3.
And step I3, respectively aiming at each quality classification, sequentially connecting a high-grade coordinate position, an auxiliary coordinate position and an end coordinate position corresponding to the quality classification by a starting point coordinate position through an opening downward arc line, forming a corresponding relation between the illumination value and the illumination score corresponding to the quality classification, further obtaining a corresponding relation between the illumination value and the illumination score corresponding to each quality classification, as shown in the figure 4, and then entering the step I4.
Step I4., obtaining the illumination score corresponding to the local face image to be measured according to the corresponding relationship between the illumination value and the illumination score corresponding to each quality classification, in combination with the quality classification corresponding to the local face image to be measured and the face illumination value corresponding to the local face image to be measured.
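A minimal sketch of the illumination-score curve of steps I1 to I4, realized here with a shape-preserving PCHIP spline through the anchor points; the anchor values and the choice of spline to approximate the opening-downward arc are assumptions, and S_L is kept in [0, 1].

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def build_illumination_curve(l_min=0.0, l_high=140.0, l_aux=200.0, l_max=255.0,
                             s_high=1.0, s_aux=0.7, s_end=0.3) -> PchipInterpolator:
    """Curve through the (start, high-score, auxiliary, end) anchor points;
    anchor illumination values and scores are per-quality-class assumptions."""
    xs = np.array([l_min, l_high, l_aux, l_max])
    ys = np.array([0.0, s_high, s_aux, s_end])
    return PchipInterpolator(xs, ys)

def illumination_score(curve: PchipInterpolator, illumination: float) -> float:
    """Evaluate the per-class curve at the measured face illumination value L."""
    return float(np.clip(curve(np.clip(illumination, 0.0, 255.0)), 0.0, 1.0))
```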
Step J, according to the posture weight, the definition weight and the illumination weight respectively corresponding to each quality classification, combining the quality classification corresponding to the local face image to be detected, according to the following formula:

Score = m_t · (w_P^t · S_P + w_C^t · S_C + w_L^t · S_L)

obtaining the Score corresponding to the local face image to be detected, wherein w_P^t, w_C^t and w_L^t respectively represent the posture weight, the definition weight and the illumination weight under the quality classification corresponding to the local face image to be detected, S_P, S_C and S_L respectively represent the posture score, the definition score and the illumination score corresponding to the local face image to be detected, and m_t represents the preset maximum face image score under the quality classification corresponding to the local face image to be detected.
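Finally, a minimal sketch of the step J fusion under the same assumption as the reconstructed formula above: the three sub-scores lie in [0, 1], the per-class weights sum to 1, and the weighted sum is rescaled by the class-specific maximum score m_t. The exact fusion formula appears only as an image in the original publication, so this is an illustrative reading rather than the patented formula.

```python
def face_quality_score(s_pose: float, s_def: float, s_light: float,
                       weights, m_t: float) -> float:
    """Fuse posture / definition / illumination sub-scores into the final score."""
    w_p, w_c, w_l = weights                     # class-specific weights (assumed to sum to 1)
    return m_t * (w_p * s_pose + w_c * s_def + w_l * s_light)

# Example with assumed numbers: a "normal" face whose class maximum m_t is 100
# score = face_quality_score(0.9, 0.8, 0.95, (0.4, 0.4, 0.2), 100.0)
```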
The quantitative analysis method for the quality of the face image designed by the above technical scheme introduces fine-grained category, posture, definition and illumination factors for analysis. Fine-grained classification of the face image is used to distinguish problems such as false detection, low quality and occlusion; a definition estimation model is trained through ranking learning, realizing accurate estimation of face image definition; a posture angle regression model obtains the specific angles of the face in three-dimensional space; and, combined with illumination analysis, score mapping parameters and influence-factor weights suitable for different categories are fitted on test images from different scenes. Relatively accurate face quality scores can therefore be given in different application scenarios, low-quality faces can be filtered effectively, and a face quality evaluation consistent with subjective perception is provided for the face image. In addition, the models used are all lightweight models after compression optimization, with high calculation speed and low resource occupation.
The embodiments of the present invention have been described in detail above with reference to the drawings, but the present invention is not limited to the above embodiments; various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (10)

1. A quantitative analysis method of human face image quality is used for quantifying the quality of a local human face image to be detected in an image to be detected, and is characterized in that the following steps A to J are executed to obtain a score corresponding to the local human face image to be detected and used for performing quality screening on the local human face image to be detected;
step A, processing a local face image to be detected based on a quality classification network which takes the face image as input and takes preset face image fine-grained classification corresponding to the face image as output, obtaining fine-grained classification corresponding to the local face image to be detected, using the fine-grained classification as the quality classification corresponding to the local face image to be detected, and entering step B;
b, based on the image to be detected, taking a local face image to be detected as a center, presetting a first proportion aiming at the outer expansion of a selection frame corresponding to the local face image to be detected, obtaining a first local outer expansion face image to be detected, if the first local outer expansion face image exceeds the area of the image to be detected, filling the first local outer expansion face image by using all 0 pixel values, and entering the step C;
step C, the first externally expanded local face image to be detected is processed based on a posture classification network which takes a face image as input and outputs the probabilities of the preset angle intervals corresponding to the yaw angle direction, the pitch angle direction and the roll angle direction of the face in the face image, so as to obtain the prediction results of the face in the first externally expanded local face image to be detected for the preset angle intervals in the yaw angle direction, the pitch angle direction and the roll angle direction respectively; mathematical expectation regression is then performed to obtain the predicted angle continuous values of the face in the first externally expanded local face image to be detected in the yaw angle direction, the pitch angle direction and the roll angle direction respectively, namely the predicted angle continuous values of the face in the local face image to be detected in the yaw angle direction, the pitch angle direction and the roll angle direction, and then the step D is entered;
step D, based on the image to be detected and taking the local face image to be detected as the center, the selection frame corresponding to the local face image to be detected is expanded outward by a preset second proportion to obtain a second externally expanded local face image to be detected, and the step E is entered;
step E, the second externally expanded local face image to be detected is processed based on a definition estimation network which takes a face image as input and the definition data corresponding to the face image as output, the definition data corresponding to the second externally expanded local face image to be detected is obtained and mapped between 0 and 1 by a sigmoid function to serve as the definition value corresponding to the second externally expanded local face image to be detected, namely the definition value corresponding to the local face image to be detected, and then the step F is entered;
step F, based on the position coordinates of each corner point of the local face image to be detected and the length and width of the local face image to be detected, reducing a selection frame corresponding to the local face image to be detected to obtain a local illumination area in the local face image to be detected, calculating to obtain an average value of V channels in HSV color space corresponding to the local illumination area, using the average value as a face illumination value corresponding to the local face image to be detected, and entering step G;
step G, calculating the score corresponding to the predicted angle continuous value of the face in the local face image to be detected in the yaw angle direction, the score corresponding to the predicted angle continuous value in the pitch angle direction and the score corresponding to the predicted angle continuous value in the roll angle direction, applying a weighted combination according to the preset yaw angle direction weight, pitch angle direction weight and roll angle direction weight under each quality classification, in combination with the quality classification corresponding to the local face image to be detected, to obtain the posture score corresponding to the local face image to be detected, and entering the step H;
step H, calculating to obtain a definition score corresponding to the local face image to be detected according to the definition value corresponding to the local face image to be detected, and then entering the step I;
step I, calculating to obtain an illumination score corresponding to the local face image to be detected according to the face illumination value corresponding to the local face image to be detected, and then entering step J;
step J, according to the posture weight, the definition weight and the illumination weight respectively corresponding to each quality classification, and in combination with the quality classification corresponding to the local face image to be detected, the Score corresponding to the local face image to be detected is obtained according to the following formula:

Score = m_t × (ω_P^t × S_P + ω_C^t × S_C + ω_L^t × S_L)

wherein ω_P^t, ω_C^t and ω_L^t respectively represent the posture weight, the definition weight and the illumination weight under the quality classification corresponding to the local face image to be detected, S_P, S_C and S_L respectively represent the posture score, the definition score and the illumination score corresponding to the local face image to be detected, and m_t represents the preset maximum face image score under the quality classification corresponding to the local face image to be detected.
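As referenced in step B above, the following NumPy sketch illustrates the outward expansion of the selection frame with 0-pixel padding used in steps B and D of claim 1; the function name and the expansion ratio are illustrative assumptions.

import numpy as np

def expand_face_crop(image, box, ratio):
    """Expand a face selection frame (x, y, w, h) outward by `ratio` around its center
    and crop it from the image; any part falling outside the image area is zero-filled."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = int(round(w * (1 + ratio))), int(round(h * (1 + ratio)))
    x0, y0 = int(round(cx - new_w / 2.0)), int(round(cy - new_h / 2.0))
    out = np.zeros((new_h, new_w, image.shape[2]), dtype=image.dtype)  # all-0 padding
    ix0, iy0 = max(x0, 0), max(y0, 0)
    ix1 = min(x0 + new_w, image.shape[1])
    iy1 = min(y0 + new_h, image.shape[0])
    if ix1 > ix0 and iy1 > iy0:
        out[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0] = image[iy0:iy1, ix0:ix1]
    return out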
2. The method for quantitatively analyzing the quality of human face images as claimed in claim 1, wherein: in the step A, based on the obtained fine-grained classification corresponding to the local face image to be detected, and in combination with a preset mapping relation between fine-grained classifications and quality classifications, the quality classification corresponding to the fine-grained classification of the local face image to be detected is obtained, namely the quality classification corresponding to the local face image to be detected, and then the step B is entered.
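A minimal sketch of the claim 2 lookup is given below; the fine-grained categories and quality classifications listed are hypothetical examples, since the preset mapping relation itself is not enumerated in the claim.

# Hypothetical mapping from fine-grained classifications to quality classifications.
FINE_TO_QUALITY = {
    "clear_face":      "normal",
    "blurred_face":    "low_quality",
    "masked_face":     "occluded",
    "sunglasses_face": "occluded",
    "non_face":        "false_detection",
}

def quality_classification(fine_grained_label):
    """Map the fine-grained classification of step A to its quality classification."""
    return FINE_TO_QUALITY[fine_grained_label]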
3. The method for quantitatively analyzing the quality of human face images as claimed in claim 1, wherein: in the step C, the prediction results of the face in the obtained first externally expanded local face image to be detected for the preset angle intervals in the yaw angle direction, the pitch angle direction and the roll angle direction are the probabilities softmax(logit_yaw)_i, softmax(logit_pitch)_i and softmax(logit_roll)_i respectively, and mathematical expectation regression is performed according to:

yaw_p = Σ_{i=0}^{I} i × softmax(logit_yaw)_i

pitch_p = Σ_{i=0}^{I} i × softmax(logit_pitch)_i

roll_p = Σ_{i=0}^{I} i × softmax(logit_roll)_i

to obtain the predicted angle continuous value yaw_p of the face in the first externally expanded local face image to be detected in the yaw angle direction, the predicted angle continuous value pitch_p in the pitch angle direction and the predicted angle continuous value roll_p in the roll angle direction, namely the predicted angle continuous values of the face in the local face image to be detected in the yaw angle direction, the pitch angle direction and the roll angle direction; wherein i = {0, 1, 2, …, I}, I represents the number of angle intervals into which the face is divided in each attitude angle direction, logit_yaw represents the output of the posture classification network corresponding to the face in the yaw angle direction, logit_pitch represents the output of the posture classification network corresponding to the face in the pitch angle direction, logit_roll represents the output of the posture classification network corresponding to the face in the roll angle direction, softmax(logit_yaw)_i represents the probability of logit_yaw corresponding to the i-th angle interval, softmax(logit_pitch)_i represents the probability of logit_pitch corresponding to the i-th angle interval, and softmax(logit_roll)_i represents the probability of logit_roll corresponding to the i-th angle interval.
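The expectation regression of claim 3 can be sketched as follows; here the expectation is taken over assumed interval-centre angles so the result is directly a continuous angle, whereas the claim formulates it over the interval index i, and the bin width and angle range used are assumptions.

import numpy as np

def expected_angle(logits, bin_centers):
    """Softmax the per-interval logits and take the probability-weighted mean of the
    interval centres (mathematical expectation regression) to get a continuous angle."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(np.dot(p, bin_centers))

# Hypothetical 3-degree intervals covering -99..+99 degrees; random logits stand in
# for the posture classification network output of one angle direction.
bin_centers = np.arange(-99, 100, 3, dtype=np.float32)
yaw_p = expected_angle(np.random.randn(len(bin_centers)), bin_centers)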
4. The method for quantitatively analyzing the quality of human face images as claimed in claim 1, wherein: in step F, according to the following formula:

L = (1 / (W × H)) × Σ_{w=1}^{W} Σ_{h=1}^{H} V_wh

the average value L of the V channel in HSV color space corresponding to the local illumination area is calculated and used as the face illumination value corresponding to the local face image to be detected, wherein W and H are respectively the width and the height of the local illumination area, and V_wh is the pixel value of the V channel in HSV color space at the (w, h) coordinate position in the local illumination area.
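A minimal OpenCV sketch of the claim 4 computation is given below; it assumes the local illumination area (the shrunken selection frame of step F) has already been cropped from a BGR image.

import cv2
import numpy as np

def face_illumination_value(local_region_bgr):
    """L = (1 / (W * H)) * sum over (w, h) of V_wh: the mean of the V channel of the
    local illumination area in HSV colour space."""
    hsv = cv2.cvtColor(local_region_bgr, cv2.COLOR_BGR2HSV)
    return float(np.mean(hsv[:, :, 2]))  # index 2 is the V channel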
5. The method for quantitatively analyzing the quality of human face images as claimed in claim 1, wherein: in the step G, executing the following steps G1 to G4 to obtain a score corresponding to a predicted angle continuous value of the face in the local face image to be detected in the yaw angle direction, a score corresponding to a predicted angle continuous value in the pitch angle direction and a score corresponding to a predicted angle continuous value in the roll angle direction;
step G1. based on a coordinate system with the abscissa as the attitude angle and the ordinate as the score, for each of the yaw angle direction, the pitch angle direction and the roll angle direction corresponding to the face: according to the variation range of the attitude angle of the face in the corresponding direction, two corner point coordinate positions are formed by setting the scores corresponding to the limit attitude angles of the face rotating towards the two sides in that direction to 0, and a vertex coordinate position is formed by a preset maximum score value corresponding to the 0-degree attitude angle of the face in that direction, and then the step G2 is entered;
step G2. for the pitch angle direction and the roll angle direction corresponding to the face: two first rotation point coordinate positions are formed by the same preset first score value corresponding to the same preset first rotation attitude angle of the face rotating towards the two sides in the corresponding direction, wherein the preset first rotation attitude angle is larger than the 0-degree attitude angle and smaller than the limit attitude angle, and the preset first score value corresponding to the preset first rotation attitude angle is smaller than the score on the straight line between the same-side corner point coordinate position and the vertex coordinate position at the preset first rotation attitude angle;
for the yaw angle direction corresponding to the face: two second rotation point coordinate positions corresponding to each quality classification are formed by the same preset second score value corresponding to the same preset second rotation attitude angle of the face rotating towards the two sides in the yaw angle direction, wherein the preset second rotation attitude angle under each quality classification is larger than the 0-degree attitude angle and smaller than the limit attitude angle, and the preset second score value corresponding to the preset second rotation attitude angle under each quality classification is larger than the score on the straight line connecting the same-side corner point coordinate position and the vertex coordinate position at the preset second rotation attitude angle;
then go to step G3;
g3., linearly connecting the first rotation point coordinate position and the same-side corner point coordinate position respectively in sequence from the vertex coordinate position to the two sides of the vertex coordinate position to form a corresponding relation between the attitude angle and the score of the face corresponding to the pitch angle direction and the roll angle direction;
respectively aiming at each quality classification, respectively and linearly connecting a second rotating point coordinate position corresponding to the quality classification and an angular point coordinate position on the same side to the two sides of the top coordinate position in sequence to form a corresponding relation between the attitude angle and the score of the face corresponding to the yaw angle direction under the quality classification, and further obtaining a corresponding relation between the attitude angle and the score of the face corresponding to the yaw angle direction under each quality classification;
then go to step G4;
step G4. is based on the corresponding relationship between the attitude angle and the score of the face corresponding to the pitch angle direction and the roll angle direction and the corresponding relationship between the attitude angle and the score of the face corresponding to the yaw angle direction under each quality classification, and combines the quality classification corresponding to the local face image to be detected to obtain the score corresponding to the predicted angle continuous value of the face corresponding to the yaw angle direction in the local face image to be detected, the score corresponding to the predicted angle continuous value of the pitch angle direction and the score corresponding to the predicted angle continuous value under the roll angle direction.
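The piecewise-linear angle-to-score mapping assembled in steps G1 to G3 can be sketched as below; the limit angle, knee angle and all score values are hypothetical placeholders, and the straight-line connections of step G3 are realised by linear interpolation.

import numpy as np

def pose_angle_score(angle, limit, peak, knee_angle, knee_score):
    """Score of one attitude angle: `peak` at 0 degrees, 0 at the +/- limit angles,
    with an intermediate rotation point (knee) on each side, joined by straight lines."""
    xs = [-limit, -knee_angle, 0.0, knee_angle, limit]
    ys = [0.0, knee_score, peak, knee_score, 0.0]
    return float(np.interp(np.clip(angle, -limit, limit), xs, ys))

# Hypothetical curves: pitch/roll use a knee below the straight line (stricter),
# yaw uses a per-classification knee above it (more tolerant), as in steps G2-G3.
pitch_score = pose_angle_score(25.0, limit=90.0, peak=1.0, knee_angle=30.0, knee_score=0.4)
yaw_score = pose_angle_score(25.0, limit=90.0, peak=1.0, knee_angle=45.0, knee_score=0.7)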
6. The method for quantitatively analyzing the quality of human face images as claimed in claim 1, wherein: the step H comprises the following steps H1 to H4, and the definition score corresponding to the local face image to be detected is obtained;
step H1, aiming at each preset face sample image, obtaining definition values corresponding to the face sample images according to the method in the step E, and then entering the step H2;
step H2, calculating to obtain definition scores corresponding to the face sample images according to an image definition score calculation method, and then entering step H3;
step H3., based on a coordinate system with the abscissa as the definition value and the ordinate as the definition score, fitting point positions are formed according to the definition value and the definition score corresponding to each face sample image, the corresponding relation between the definition value and the definition score is obtained by fitting, and then the step H4 is entered;
step H4., obtaining the definition score corresponding to the local face image to be detected according to the corresponding relationship between the definition value and the definition score and the definition value corresponding to the local face image to be detected.
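Steps H1 to H4 can be sketched as a calibration followed by a lookup; the calibration data and the low-order polynomial fit are assumptions, since claim 6 specifies neither the image definition score calculation method nor the fitting form.

import numpy as np

# Hypothetical calibration pairs: definition values from the estimation network (step H1)
# and reference definition scores from an image definition score calculation (step H2).
definition_values = np.array([0.05, 0.20, 0.40, 0.60, 0.80, 0.95])
definition_scores = np.array([5.0, 20.0, 45.0, 70.0, 88.0, 97.0])

# Fit the value-to-score correspondence (step H3); a quadratic is one possible choice.
coeffs = np.polyfit(definition_values, definition_scores, deg=2)

def definition_score(definition_value):
    """Look up the fitted curve for the local face image to be detected (step H4)."""
    return float(np.clip(np.polyval(coeffs, definition_value), 0.0, 100.0))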
7. The method for quantitatively analyzing the quality of human face images as claimed in claim 1, wherein: the step I comprises the steps I1 to I4, and the illumination score corresponding to the local face image to be detected is obtained;
step I1., based on a coordinate system with the abscissa as the illumination value and the ordinate as the illumination score, and in combination with a preset illumination value range, a start coordinate position is formed by setting the score corresponding to the minimum illumination value to 0, an end coordinate position is formed by setting the score corresponding to the maximum illumination value to a preset score, and then step I2 is entered;
step I2, for each quality classification, a high-score coordinate position corresponding to the quality classification is formed from the preset maximum illumination score corresponding to a preset high-score illumination value lying between the minimum illumination value and the maximum illumination value under the quality classification, and then step I3 is entered;
step I3, for each quality classification, the high-score coordinate position and the end coordinate position corresponding to the quality classification are sequentially connected from the start coordinate position by downward-opening arcs to form the corresponding relation between the illumination value and the illumination score under the quality classification, thereby obtaining the corresponding relation between the illumination value and the illumination score under each quality classification, and then step I4 is entered;
step I4., obtaining the illumination score corresponding to the local face image to be measured according to the corresponding relationship between the illumination value and the illumination score corresponding to each quality classification, in combination with the quality classification corresponding to the local face image to be measured and the face illumination value corresponding to the local face image to be measured.
8. The method for quantitatively analyzing the quality of human face images as claimed in claim 7, wherein: the step I2 further includes, on the basis of the high-score coordinate position corresponding to each quality classification, forming an auxiliary coordinate position corresponding to the quality classification from a preset auxiliary illumination score corresponding to a preset auxiliary illumination value lying between the preset high-score illumination value and the maximum illumination value under the quality classification;
in the step I3, for each quality classification, the high-score coordinate position, the auxiliary coordinate position and the end coordinate position corresponding to the quality classification are sequentially connected from the start coordinate position by downward-opening arcs, so as to form the corresponding relation between the illumination value and the illumination score under the quality classification, thereby obtaining the corresponding relation between the illumination value and the illumination score under each quality classification.
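A simplified sketch of the illumination mapping of claims 7 and 8 is given below, taking the high-score coordinate position as the vertex of two downward-opening (concave) arcs and omitting the auxiliary coordinate position of claim 8; all breakpoints are hypothetical placeholders.

def illumination_score(v, v_min, v_high, v_max, s_max, s_end):
    """Concave mapping from illumination value to illumination score: 0 at v_min,
    the preset maximum score s_max at the high-score value v_high, and s_end at v_max."""
    v = min(max(v, v_min), v_max)
    if v <= v_high:
        k = s_max / (v_high - v_min) ** 2            # rising arc from (v_min, 0) to the vertex
        return s_max - k * (v - v_high) ** 2
    k = (s_max - s_end) / (v_max - v_high) ** 2      # falling arc from the vertex to (v_max, s_end)
    return s_max - k * (v - v_high) ** 2

# Hypothetical breakpoints for one quality classification (V channel range 0-255).
print(illumination_score(120.0, v_min=0.0, v_high=140.0, v_max=255.0, s_max=100.0, s_end=60.0))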
9. The method for quantitatively analyzing the quality of human face images as claimed in claim 1, wherein: in the step a, for a Resnet18 network, removing the last residual module in the network, and replacing the average pooling layer in the network with an adaptive average pooling layer to obtain an updated network, where the quality classification network is implemented based on the updated network;
based on preset sample face images and the preset face image fine-grained classifications respectively corresponding to the sample face images, the quality classification network of the above structure is trained with the face image as input and the fine-grained classification corresponding to the face image as output, and the quality classification network is obtained by updating;
in the step C, for the Resnet18 network, removing the last residual module in the network, and updating the full connection layer in the network to include branch full connection layers respectively corresponding to the yaw angle direction, the pitch angle direction, and the roll angle direction to obtain an updated network, where the posture classification network is implemented based on the updated network;
based on preset sample face images and the preset angle interval categories respectively corresponding to the faces in the sample face images in the yaw angle direction, the pitch angle direction and the roll angle direction, the posture classification network of the above structure is trained with the face image as input and the prediction results of the angle interval categories of the face in the face image in the yaw angle direction, the pitch angle direction and the roll angle direction as output, and the posture classification network is obtained by updating;
in the step E, the definition estimation network is realized based on a Resnet10 network; and training the definition estimation network with the structure based on the preset definition data corresponding to each sample face image and each sample face image, and taking the face image as input and the definition data corresponding to the face image as output, and updating to obtain the definition estimation network.
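One possible reading of the claim 9 network modifications is sketched below with PyTorch/torchvision: the last residual stage of ResNet18 is dropped and adaptive average pooling is used, with a single fine-grained head for the quality classification network and three branch fully connected layers for the posture classification network; the class count, interval count and layer grouping are assumptions, not the applicant's implementation.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class QualityClassifier(nn.Module):
    """ResNet18 with the last residual stage removed and adaptive average pooling,
    ending in a fine-grained classification head; the class count is a placeholder."""
    def __init__(self, num_classes=6):
        super().__init__()
        base = resnet18(weights=None)
        self.backbone = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool,
                                      base.layer1, base.layer2, base.layer3)  # layer4 removed
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(256, num_classes)  # 256 channels after layer3 of ResNet18

    def forward(self, x):
        f = self.pool(self.backbone(x)).flatten(1)
        return self.fc(f)

class PoseClassifier(nn.Module):
    """Same truncated backbone with three branch fully connected layers, one per
    attitude angle direction (yaw / pitch / roll), each predicting interval logits."""
    def __init__(self, num_bins=67):
        super().__init__()
        base = resnet18(weights=None)
        self.backbone = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool,
                                      base.layer1, base.layer2, base.layer3)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_yaw = nn.Linear(256, num_bins)
        self.fc_pitch = nn.Linear(256, num_bins)
        self.fc_roll = nn.Linear(256, num_bins)

    def forward(self, x):
        f = self.pool(self.backbone(x)).flatten(1)
        return self.fc_yaw(f), self.fc_pitch(f), self.fc_roll(f)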
10. The method for quantitatively analyzing the quality of human face images as claimed in claim 9, wherein: in the training process of the definition estimation network: for each preset sample face image, distortion processing of different methods and different degrees is performed to obtain low-quality sample face images of different degrees under each distortion method, which together form the sample face images; the sample face images are ranked in a pairwise manner, and, in combination with a rank loss function, the definition estimation network is trained with the face image as input and the definition data corresponding to the face image as output.
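The pairwise ranking training of claim 10 can be sketched with a margin ranking loss as below; the loss choice, margin value and model interface are assumptions, since the claim only specifies pairwise ordering of the sample face images combined with a rank loss function.

import torch
import torch.nn as nn

# One pairwise ranking step for the definition estimation network: from the same source
# face, a lightly distorted copy and a more heavily distorted copy are formed, and the
# network is trained so that the less distorted copy receives the higher definition value.
rank_loss = nn.MarginRankingLoss(margin=0.1)

def ranking_step(model, better, worse, optimizer):
    s_better = model(better).squeeze(-1)   # predicted definition of the less distorted copy
    s_worse = model(worse).squeeze(-1)     # predicted definition of the more distorted copy
    target = torch.ones_like(s_better)     # +1 means the first input should rank higher
    loss = rank_loss(s_better, s_worse, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()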
CN202111424323.XA 2021-11-26 2021-11-26 Quantitative analysis method for human face image quality Pending CN114119551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111424323.XA CN114119551A (en) 2021-11-26 2021-11-26 Quantitative analysis method for human face image quality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111424323.XA CN114119551A (en) 2021-11-26 2021-11-26 Quantitative analysis method for human face image quality

Publications (1)

Publication Number Publication Date
CN114119551A true CN114119551A (en) 2022-03-01

Family

ID=80370436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111424323.XA Pending CN114119551A (en) 2021-11-26 2021-11-26 Quantitative analysis method for human face image quality

Country Status (1)

Country Link
CN (1) CN114119551A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998964A (en) * 2022-06-02 2022-09-02 天津道简智创信息科技有限公司 Novel license quality detection method
CN116740777A (en) * 2022-09-28 2023-09-12 荣耀终端有限公司 Training method of face quality detection model and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination