CN110866471A - Face image quality evaluation method and device, computer readable medium and communication terminal


Info

Publication number: CN110866471A
Authority: CN (China)
Prior art keywords: face, convolution, image, layer, processing
Legal status: Pending
Application number: CN201911055879.9A
Original language: Chinese (zh)
Inventor: 颜波
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911055879.9A
Publication of CN110866471A
Priority to PCT/CN2020/124546 (published as WO2021083241A1)

Classifications

    • G06V 40/168 - Recognition of human faces: feature extraction; face representation
    • G06N 3/045 - Neural network architectures: combinations of networks
    • G06N 3/084 - Neural network learning methods: backpropagation, e.g. using gradient descent
    • G06V 40/161 - Recognition of human faces: detection; localisation; normalisation


Abstract

The present disclosure relates to the field of image recognition technologies, and in particular to a face image quality evaluation method and apparatus, a feature extraction model training method and apparatus, an image processing system, a computer-readable medium, and a communication terminal. The method comprises the following steps: acquiring an image to be processed containing a human face; detecting the image to be processed to obtain a corresponding face image; inputting the face image into a trained feature extraction model based on a mobile face recognition network and extracting features of the face image to obtain feature data; and inputting the feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing, so as to obtain the face quality score of the face image. Because feature extraction and quality scoring of the face image are performed within a single network, the model size is effectively reduced. Moreover, the quality of the face image can be evaluated quickly while the accuracy of the quality evaluation result is ensured.

Description

Face image quality evaluation method and device, computer readable medium and communication terminal
Technical Field
The present disclosure relates to the field of image recognition technologies, and in particular to a face image quality evaluation method, a feature extraction model training method, a face image quality evaluation device, a feature extraction model training device, an image processing system, a computer-readable medium, and a wireless communication terminal.
Background
With the rapid development of image processing technology, face recognition has become indispensable in fields such as surveillance and customs clearance. Low-quality face images, however, greatly reduce the success rate of face recognition.
Existing approaches to face quality evaluation include image processing and matching methods based on feature engineering, and methods based on deep learning. Both have notable problems and disadvantages. In deep-learning-based methods, for example, the score labels for face images must be produced manually, which requires a large amount of time and effort and is inherently subjective. Moreover, many factors affect face quality, and manual labeling cannot comprehensively account for all of them, so the labeled samples are inaccurate, which in turn degrades the accuracy of the model. In addition, more and more models need to run on intelligent mobile terminal devices, which places high demands on model size and performance; existing face quality evaluation methods struggle to meet these requirements on model size and running time.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a face image quality evaluation method and device, a feature extraction model training method and device, an image processing system, a computer-readable medium, and a wireless communication terminal capable of quickly evaluating face quality, thereby overcoming, at least to some extent, the limitations and drawbacks of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, a method for evaluating the quality of a face image is provided, which includes:
acquiring an image to be processed containing a human face;
detecting the image to be processed to obtain a corresponding face image;
inputting the face image into a trained feature extraction model based on a mobile face recognition network, and performing feature extraction on the face image to obtain feature data;
and inputting the feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing, so as to obtain the face quality score of the face image.
According to a second aspect of the present disclosure, there is provided a training method of a feature extraction model, including:
responding to an image processing instruction of an image service system, and acquiring a sample image containing a human face;
inputting the sample image into a continuously arranged convolution layer and a depth convolution layer for continuous convolution processing to obtain a first convolution result;
inputting the first convolution result into n continuously-arranged bottleneck structure layers for continuous convolution processing to obtain a second convolution result; wherein n is greater than 5 and is a positive integer;
performing convolution processing on the second convolution result by using the continuously arranged convolution layer and the linear global depth convolution layer to obtain a third convolution result;
performing full-connection processing on the third convolution result by using a full-connection layer to obtain face feature data corresponding to the sample image;
and inputting the face feature data into a loss function model to calculate loss parameters, and optimizing based on the loss parameters to iteratively train a feature extraction model.
According to a third aspect of the present disclosure, there is provided a face image quality evaluation device, comprising:
the to-be-processed image acquisition module is used for acquiring a to-be-processed image containing a human face;
the face image extraction module is used for detecting the image to be processed to obtain a corresponding face image;
the face feature data extraction module is used for inputting the face image into a trained feature extraction model based on a mobile face recognition network and extracting features of the face image to obtain feature data;
and the face quality scoring module is used for inputting the feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing, so as to obtain the face quality score of the face image.
According to a fourth aspect of the present disclosure, there is provided a training apparatus for a feature extraction model, comprising:
the sample data acquisition module is used for acquiring a sample image containing a human face in response to an image processing instruction of an image service system;
the first convolution result generation module is used for inputting the sample image into continuously arranged convolution layers and depth convolution layers to carry out continuous convolution processing so as to obtain a first convolution result;
the second convolution result generation module is used for inputting the first convolution result into n continuously-arranged bottleneck structure layers for continuous convolution processing so as to obtain a second convolution result; wherein n is greater than 5 and is a positive integer;
a third convolution result generation module, configured to perform convolution processing on the second convolution result by using the continuously-arranged convolutional layers and the linear global depth convolutional layers to obtain a third convolution result;
the face feature data generation module is used for carrying out full-connection processing on the third convolution result by using a full-connection layer so as to obtain face feature data corresponding to the sample image;
and the iterative training module is used for inputting the face feature data into a loss function model to calculate loss parameters, and performing optimization based on the loss parameters to iteratively train a feature extraction model.
According to a fifth aspect of the present disclosure, there is provided an image processing system comprising:
the service module is used for acquiring an image to be processed;
the image processing module is used for responding to a service processing instruction sent by the service module to execute the human face image quality evaluation method in any one of the embodiments so as to obtain a grading result of the image to be processed.
According to a sixth aspect of the present disclosure, there is provided a computer-readable medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the above-mentioned face image quality evaluation method; or the above-mentioned training method of the feature extraction model.
According to a seventh aspect of the present disclosure, there is provided a wireless communication terminal comprising:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above-described face image quality evaluation method; or the above-mentioned training method of the feature extraction model.
In the face image quality evaluation method provided by the embodiments of the present disclosure, a trained feature extraction model based on a mobile face recognition network extracts the features of the face image, and two full-connection layers output the scoring result of the face image, so that feature extraction and quality scoring of the face image are performed in a single network, effectively reducing the model size. Moreover, the quality of the face image can be evaluated quickly while the accuracy of the quality evaluation result is ensured.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 schematically illustrates a face image quality evaluation method in an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating an architecture of a feature extraction model based on a mobile face recognition network in an exemplary embodiment of the present disclosure;
fig. 3 schematically illustrates an architecture diagram of a bottleneck structure layer with a step size of 1 in an exemplary embodiment of the present disclosure;
fig. 4 schematically illustrates an architecture diagram of a bottleneck structure layer with a step size of 2 in an exemplary embodiment of the present disclosure;
fig. 5 schematically illustrates an overall architecture diagram of a face image quality evaluation model in an exemplary embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating a training method of a feature extraction model in an exemplary embodiment of the present disclosure;
fig. 7 schematically illustrates a composition diagram of a face image quality evaluation apparatus in an exemplary embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating components of a training apparatus for a feature extraction model according to an exemplary embodiment of the disclosure;
FIG. 9 schematically illustrates a composition diagram of an image processing system in an exemplary embodiment of the present disclosure;
fig. 10 schematically shows a structural diagram of a computer system of a wireless communication device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the prior art, evaluation of face image quality has several problems. On the one hand, the scoring and labeling of face images during training are completed manually, which requires a large amount of time and effort and carries a degree of subjectivity. On the other hand, the factors influencing face quality are manifold, including face pose, face occlusion, contrast, resolution, illumination, background, and so on; manual labeling cannot comprehensively consider all of these factors, which affects the accuracy of the face evaluation model. In addition, more and more image evaluation methods need to run on intelligent mobile terminal devices such as smartphones and tablet computers, which places higher requirements on model size and performance; existing face quality evaluation methods struggle to meet the requirements on model size and running time, so a lighter face quality evaluation model is needed.
In view of the above disadvantages and shortcomings of the prior art, the present exemplary embodiment provides a face image quality evaluation method whose model is small enough to be deployed on intelligent terminal devices such as mobile phones and tablet computers. Referring to fig. 1, the face image quality evaluation method may include the following steps:
s11, acquiring an image to be processed containing a human face;
s12, detecting the image to be processed to obtain a corresponding face image;
s13, inputting the face image into a trained feature extraction model based on a mobile face recognition network, and performing feature extraction on the face image to acquire feature data;
and S14, inputting the feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing, so as to obtain the face quality score of the face image.
In the face image quality evaluation method provided by this exemplary embodiment, on the one hand, a trained feature extraction model based on a mobile face recognition network extracts the features of the face image, and two full-connection layers output the scoring result, so that the quality of the face image can be evaluated quickly while the accuracy of the quality evaluation result is ensured. On the other hand, because the scoring result is produced by two full-connection layers placed after the feature extraction model, feature extraction and quality scoring are performed in a single network, which effectively reduces the model size.
Hereinafter, each step of the face image quality evaluation method in the present exemplary embodiment will be described in more detail with reference to the drawings and examples.
Step S11, an image to be processed including a human face is acquired.
In this example embodiment, the above-mentioned smart terminal device may be a smart terminal such as a mobile phone and a tablet computer equipped with a camera module. The user can utilize the camera module of the terminal equipment to take a picture to obtain the image to be processed containing the face. Or, the user can also take a picture through an external camera component to obtain the image to be processed containing the face. Or, the image to be processed sent by other equipment can be received through a wired or wireless network.
And step S12, detecting the image to be processed to obtain a corresponding face image.
In this exemplary embodiment, after the to-be-processed image is acquired, since the image may include a background, noise, and the like, the to-be-processed image may be preprocessed to acquire a corresponding face image. Specifically, the following steps may be included:
step S121, carrying out face detection on the image to be processed to obtain a face area;
step S122, carrying out face key point detection on the face region to obtain key points of the face region;
and step S123, aligning the face region based on the key point of the face region to obtain a face image after alignment.
For example, the trained face detection model may be used to perform face detection on the image to be processed to determine a face region, and the trained face keypoint detection model may be used to perform keypoint detection on the face region to extract the keypoint information of the face. And converting the face region into a standard face by using a preset similarity transformation matrix. For example, the similarity transformation matrix may include the following equation:
$$T = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \end{bmatrix}$$

wherein the upper-left 2 × 2 block is the rotation (and scaling) part, and t_x and t_y are the translation factors. The transform has 4 degrees of freedom: the rotation angle, the x-direction translation, the y-direction translation, and the scaling factor s.
Under such a similarity transformation, length ratios and angles in the face region image are preserved, and circles are mapped to circles.
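As an illustrative sketch of this alignment step (the disclosure itself provides no code), the following Python snippet estimates a 4-degree-of-freedom similarity transform from detected landmarks to a canonical five-point template and warps the face accordingly. The template coordinates, output size, and function names are assumptions for illustration, not values taken from the patent:

```python
import cv2
import numpy as np

# Hypothetical canonical 5-point template (eye centers, nose tip,
# mouth corners) for a 112 x 112 aligned face; the coordinates are
# illustrative assumptions only.
TEMPLATE_112 = np.float32([
    [38.3, 51.7], [73.5, 51.5],   # left eye, right eye
    [56.0, 71.7],                 # nose tip
    [41.5, 92.4], [70.7, 92.2],   # left and right mouth corners
])

def align_face(image, landmarks):
    """Warp a detected face to the standard face template.

    `landmarks` is a (5, 2) float32 array of detected key points.
    estimateAffinePartial2D restricts the fit to rotation, uniform
    scaling and translation -- the 4-DoF similarity transform above --
    so length ratios and angles are preserved by the warp.
    """
    matrix, _ = cv2.estimateAffinePartial2D(
        np.asarray(landmarks, dtype=np.float32), TEMPLATE_112)
    return cv2.warpAffine(image, matrix, (112, 112))
```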
In addition, the above-mentioned face detection model and face key point detection model can be implemented using conventional techniques, and the present disclosure is not particularly limited herein. Alternatively, in other exemplary embodiments of the present disclosure, a single model may be used for both face detection and face key point detection; for example, a HyperFace model may be used for face detection and for estimating key point locations and head angles.
And step S13, inputting the face image into a trained feature extraction model based on a mobile face recognition network, and performing feature extraction on the face image to acquire feature data.
In this example embodiment, a mobile face recognition network (MobileFaceNets) model may be trained in advance. Specifically, the following steps may be included:
step S21, obtaining original data, and preprocessing the original data to obtain sample data.
In this exemplary embodiment, face image data of multiple persons in different scenes may be acquired as raw data; for example, images of faces in different states, such as different face poses, occlusions, and expressions. Alternatively, images under different imaging parameters may be acquired: for the image acquisition sensor, different contrast, resolution, or brightness may be configured; for the image acquisition environment, different illumination, locations, backgrounds, and so on may be configured.
After the raw data is obtained, the trained face detection and face key point detection models can be used for face detection and face key point detection, and the face is then converted into a standard face by a similarity transformation. That is, the raw data is preprocessed using the method in the above embodiment to obtain sample data.
Step S22, inputting the sample data into the continuously set convolution layer and depth convolution layer to perform continuous convolution processing to obtain a first convolution result.
In the present exemplary embodiment, referring to fig. 2, the convolution layer has a 3 × 3 kernel with stride s = 2, and the depth convolution (depthwise convolution) layer has a 3 × 3 kernel with stride s = 1.
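A minimal PyTorch sketch of these first two layers is shown below; the 64-channel width and the batch-norm placement follow the common MobileFaceNets configuration and are assumptions, since this paragraph fixes only the kernel sizes and strides:

```python
import torch.nn as nn

# Stem: 3x3 standard convolution with stride 2, then a 3x3 depthwise
# convolution with stride 1 (groups=channels makes a conv depthwise).
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.PReLU(64),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1,
              groups=64, bias=False),
    nn.BatchNorm2d(64),
    nn.PReLU(64),
)
```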
Step S23, inputting the first convolution result into n continuously arranged bottleneck structure layers for continuous convolution processing to obtain a second convolution result; wherein n >5 and is a positive integer.
In the present exemplary embodiment, the preprocessed sample data may then be input into the improved mobile face recognition network. Compared with the prior art, the improved network in this embodiment differs in the number of bottleneck structure (bottleneck) layers, in the internal structure of the bottleneck blocks, and in the last layer. Specifically, referring to fig. 2, the improved mobile face recognition network may include, arranged in sequence: a first convolution layer, a depth convolution layer, six consecutive bottleneck structure layers, a second convolution layer, a linear global depth convolution layer, and a full-connection layer.
The stride and the number of repetitions of each bottleneck layer are configured according to its position among the six consecutive bottleneck structure layers. For example, odd-numbered bottleneck structure layers are configured with a preset stride P and even-numbered ones with a preset stride Q, where P > Q and P, Q are positive integers; for example, P = 2 and Q = 1 may be configured. The first bottleneck structure layer is configured with stride s = 2 and repetition count n = 1; the second with s = 1 and n = 4; the third with s = 2 and n = 1; the fourth with s = 1 and n = 6; the fifth with s = 2 and n = 1; and the sixth with s = 1 and n = 2.
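This stride/repetition schedule can be captured in a small configuration list, reused by the assembly sketch further below; only the (s, n) pairs come from the paragraph above:

```python
# (stride s, repetitions n) for the six consecutive bottleneck layers:
# odd-numbered layers use P = 2, even-numbered layers use Q = 1.
BOTTLENECK_CONFIG = [
    (2, 1),  # 1st bottleneck structure layer
    (1, 4),  # 2nd
    (2, 1),  # 3rd
    (1, 6),  # 4th
    (2, 1),  # 5th
    (1, 2),  # 6th
]
```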
For bottleneck structure layers configured with different preset strides: when the bottleneck structure layer is configured with stride s = 1, referring to fig. 3, the bottleneck structure includes a first convolution layer, a depth convolution layer, a second convolution layer, an SE-Net (Squeeze-and-Excitation Network) layer, and an addition (add) layer, arranged in sequence. The first convolution layer has a 1 × 1 kernel and is activated with a PReLU (Parametric Rectified Linear Unit) activation function; the depth convolution layer has a 3 × 3 kernel and is activated with a PReLU activation function; the second convolution layer has a 1 × 1 kernel and is activated with a linear activation function. The initial input is fed into the first convolution layer for convolution processing; the output of the first convolution layer is fed into the depth convolution layer; the output of the depth convolution layer is fed into the second convolution layer; the output of the second convolution layer is fed into the squeeze-and-excitation network layer for channel weight assignment; and the output of the squeeze-and-excitation network layer is added to the initial input in the addition layer to obtain the final output of the bottleneck structure layer.
When the bottleneck structure layer is configured with stride s = 2, referring to fig. 4, the bottleneck structure includes a first convolution layer, a depth convolution layer, a second convolution layer, and a squeeze-and-excitation network layer, arranged in sequence. The first convolution layer has a 1 × 1 kernel and is activated with a PReLU activation function; the depth convolution layer has a 3 × 3 kernel, is activated with a PReLU activation function, and has its stride configured to 2; the second convolution layer has a 1 × 1 kernel and is activated with a linear activation function.
As above, the first bottleneck structure layer is configured with stride s = 2 and repetition count n = 1, and the second with stride s = 1 and repetition count n = 4; bottleneck structure layers configured with stride s = 1 adopt a residual structure, while those configured with stride s = 2 do not. Therefore, when the second bottleneck structure layer is run repeatedly, the residual structure is reused, which effectively alleviates the gradient vanishing problem caused by deepening the neural network and facilitates model learning and convergence.
In addition, the structure of the bottleneck block is modified by adding a squeeze-and-excitation (SE block) network layer, which accounts for the possibility that different channels have different importance. The feature representation capability of each channel is increased by learning an importance weight for each channel and multiplying it by the channel's original values; this avoids the defect of prior-art structures that treat every channel as equally important. The squeeze-and-excitation network layer takes the feature map as input and outputs a 1 × 1 × C vector used as the importance weight of each channel; the network automatically learns the importance of each channel during training, thereby enhancing the feature extraction and expression capability of the network and improving model performance.
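The following PyTorch sketch puts the two preceding paragraphs together: both bottleneck variants end in a squeeze-and-excitation layer, and only the stride-1 variant adds the residual connection. The expansion factor, SE reduction ratio, and batch-norm placement are assumptions; the disclosure fixes only the kernel sizes, activations, layer order, and the residual rule:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: learn a 1 x 1 x C importance weight per
    channel and multiply it into the feature map."""
    def __init__(self, channels, reduction=4):   # reduction ratio assumed
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze to 1 x 1 x C
        self.fc = nn.Sequential(                  # excitation
            nn.Linear(channels, channels // reduction),
            nn.PReLU(channels // reduction),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # per-channel reweighting

class Bottleneck(nn.Module):
    """1x1 conv (PReLU) -> 3x3 depthwise conv (PReLU) -> 1x1 conv
    (linear) -> SE layer; the residual add is used only when s = 1."""
    def __init__(self, in_ch, out_ch, stride, expansion=2):  # expansion assumed
        super().__init__()
        mid = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid),
            nn.PReLU(mid),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                      groups=mid, bias=False), nn.BatchNorm2d(mid),
            nn.PReLU(mid),
            # linear activation: no non-linearity after the last 1x1 conv
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
            SEBlock(out_ch),
        )

    def forward(self, x):
        out = self.body(x)
        return x + out if self.use_residual else out
```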
Step S24, performing convolution processing on the second convolution result by using the continuously arranged convolution layer and linear global depth convolution layer to obtain a third convolution result;
and step S25, performing full-connection processing on the third convolution result by using a full-connection layer to obtain the face feature data corresponding to the sample data.
In the present exemplary embodiment, referring to fig. 2, the convolution layer in step S24 has a 1 × 1 kernel, and the linear global depth convolution layer has a 7 × 7 kernel. The last layer is a full-connection layer whose final output is a 128-dimensional vector. Setting the last layer as a full-connection layer reduces the dimensionality of the output of the linear global depth convolution layer while keeping the amount of computation small; practical verification shows that it also effectively improves model accuracy.
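Reusing `stem`, `BOTTLENECK_CONFIG`, and `Bottleneck` from the sketches above, the full feature extraction backbone can be assembled as follows; the per-stage channel widths are assumptions, since the disclosure fixes only the layer order, kernel sizes, strides/repetitions, and the 128-dimensional output:

```python
import torch.nn as nn

def make_backbone(embedding_dim=128):
    """Stem -> six bottleneck stages -> 1x1 conv -> linear global
    depthwise 7x7 conv -> full-connection layer (128-d output).
    For a 112 x 112 input the feature map is 7 x 7 before the
    global depthwise convolution."""
    stage_out = [64, 64, 128, 128, 128, 128]   # assumed channel widths
    layers, in_ch = list(stem), 64
    for (stride, reps), out_ch in zip(BOTTLENECK_CONFIG, stage_out):
        layers.append(Bottleneck(in_ch, out_ch, stride))
        for _ in range(reps - 1):              # repeated blocks keep s = 1
            layers.append(Bottleneck(out_ch, out_ch, 1))
        in_ch = out_ch
    layers += [
        nn.Conv2d(in_ch, 512, 1, bias=False), nn.BatchNorm2d(512),
        nn.PReLU(512),
        # linear global depthwise conv: 7x7 kernel, groups = channels,
        # no activation afterwards
        nn.Conv2d(512, 512, 7, groups=512, bias=False),
        nn.BatchNorm2d(512),
        nn.Flatten(),
        nn.Linear(512, embedding_dim),         # final 128-d feature vector
    ]
    return nn.Sequential(*layers)
```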
In addition, in the present exemplary embodiment, a normalization layer may be placed after the above-mentioned improved feature extraction model based on the mobile face recognition network; for example, a normalization layer based on the L2 norm.
After the improved feature extraction model extracts the training face feature data for each piece of sample data, the training face feature data is normalized using the L2 norm to obtain the normalized final face feature (embedding).
For example, the L2 normalization may be expressed as:

$$x_i' = \frac{x_i}{\sqrt{\sum_{k=1}^{K} x_k^2}}$$

where x_i is an element of the feature extraction model's output vector and K is the length of the vector; as in the above embodiment, K = 128.
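In PyTorch this normalization is a single call; `F.normalize` divides each row by its L2 norm, exactly as in the formula above:

```python
import torch
import torch.nn.functional as F

features = torch.randn(8, 128)                 # a batch of 128-d outputs
embedding = F.normalize(features, p=2, dim=1)  # divide by the L2 norm
# every row of `embedding` now has unit L2 norm
```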
In the exemplary embodiment, in the process of model training, after the face features are obtained, the face features may be input into the ArcFace Loss function model to calculate the Loss of the model. Specifically, the formula of the ArcFace Loss function may include:
$$L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)} + \sum_{j=1,\, j\neq y_i}^{n} e^{s\cos\theta_j}}$$

where L is the total loss, N is the number of samples, n is the number of classes, s and m are hyperparameters, and θ is the angle between the face feature and each class weight vector (the subscript y_i denotes the ground-truth class of sample i). In the present exemplary embodiment, s = 64 and m = 0.5.
After the total loss is obtained, the loss can be propagated back to the embedding layer and then to the feature extraction model based on the mobile face recognition network according to the back-propagation algorithm. The model is optimized using the Adam algorithm with an initial learning rate of 0.1, which is then gradually decayed according to the training data and training steps, finally yielding an improved feature extraction model based on the mobile face recognition network that can accurately recognize faces in real time. The feature extraction model can run on an intelligent mobile terminal device.
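A hedged PyTorch sketch of the ArcFace loss above, with s = 64 and m = 0.5 as configured; the learned class-weight matrix and the cosine clamping are standard implementation details assumed here rather than taken from the disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    def __init__(self, embedding_dim=128, num_classes=1000, s=64.0, m=0.5):
        super().__init__()
        # one learned weight vector per identity class (num_classes assumed)
        self.weight = nn.Parameter(torch.randn(num_classes, embedding_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # cos(theta): cosine between normalized features and class weights
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # add the angular margin m to the target-class angle only
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)
```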
After the feature extraction model based on the mobile face recognition network is trained, the detected face image corresponding to the image to be processed can be input into the feature extraction model, and corresponding feature data is extracted.
For example, when the trained feature extraction model is used to extract the features of the face image corresponding to the image to be processed, the aligned face image may be input into the feature extraction model and processed in turn by the model's convolution layer, depth convolution layer, six consecutive bottleneck structure layers, convolution layer, linear global depth convolution layer, and full-connection layer, followed by normalization, to finally output the feature vector of the face image.
And step S14, inputting the feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing, so as to obtain the face quality score of the face image.
In this exemplary embodiment, the face image quality score labels may be generated in advance. Specifically, a standard face image of each subject may be selected as a reference image, the cosine similarity between each of the subject's other face images and the reference image is computed, and the similarity value is used as the quality score of that face image.
When the feature extraction model performs well enough, similarity is proportional to face quality: the reference image serves as a high-quality image, and when the same person's other face images are compared against it, higher image quality yields higher similarity, while lower similarity indicates poorer face image quality.
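A short sketch of this label-generation step: the quality score of each face image is simply the cosine similarity between its embedding and the embedding of the same person's reference image (the function name and tensor shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def quality_labels(reference_embedding, other_embeddings):
    """Cosine similarity of each image embedding to the reference.

    `reference_embedding`: (D,) tensor from the standard face image.
    `other_embeddings`:    (N, D) tensor from the person's other images.
    Returns an (N,) tensor of quality scores used as training labels.
    """
    return F.cosine_similarity(
        other_embeddings, reference_embedding.unsqueeze(0), dim=1)
```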
In the present exemplary embodiment, two full-connection layers may be placed after the normalization layer that follows the feature extraction model, serving as the quality evaluation model. Specifically, the number of neurons in the first full-connection layer may be configured to be one half of the dimension of the face feature (embedding), with a ReLU activation function; the second full-connection layer has 1 neuron with a sigmoid activation function and outputs a face quality score between 0 and 1, thereby mapping the face feature space to the face quality score space.
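For a 128-dimensional embedding this quality evaluation head is two small layers, as a minimal sketch:

```python
import torch.nn as nn

embedding_dim = 128
quality_head = nn.Sequential(
    nn.Linear(embedding_dim, embedding_dim // 2),  # first FC: 64 neurons
    nn.ReLU(),
    nn.Linear(embedding_dim // 2, 1),              # second FC: 1 neuron
    nn.Sigmoid(),                                  # score in (0, 1)
)
```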
The quality evaluation model is trained in a supervised manner using the labeled face image quality scores as training samples. The loss function of the quality evaluation model may be the MSE (mean squared error) loss, whose formula may include:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$

where ŷ_i is the face quality score predicted by the model and y_i is the labeled face quality score.
After the MSE loss is calculated, it can be propagated back to the full-connection layers according to the back-propagation algorithm; the two full-connection layers of the quality evaluation model are optimized with the Adam algorithm at an initial learning rate of 0.01, which is then gradually decayed according to the training data and training steps. After optimization is completed, the quality evaluation model can produce the corresponding face quality score for any face image.
In this exemplary embodiment, when training the above quality evaluation model, the network weights of the feature extraction model based on the mobile face recognition network may be kept fixed.
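A hedged sketch of this training procedure, reusing `make_backbone` and `quality_head` from the sketches above: the backbone is frozen and only the two full-connection layers are optimized against the MSE loss with Adam at the stated initial learning rate of 0.01; the data loader is a placeholder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = make_backbone()
backbone.eval()
for p in backbone.parameters():       # freeze the feature extraction model
    p.requires_grad = False

optimizer = torch.optim.Adam(quality_head.parameters(), lr=0.01)
mse = nn.MSELoss()

for images, labels in train_loader:   # `train_loader` is a placeholder
    with torch.no_grad():
        embeddings = F.normalize(backbone(images))  # fixed 128-d features
    scores = quality_head(embeddings).squeeze(1)
    loss = mse(scores, labels)        # labels: cosine-similarity scores
    optimizer.zero_grad()
    loss.backward()                   # gradients reach the FC layers only
    optimizer.step()
```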
Referring to fig. 5, the method provided by the embodiments of the present disclosure forms a complete face image quality evaluation model by adding two full-connection layers after the feature extraction model and using them for face quality scoring, so that face feature extraction and face quality scoring are completed in the same network, which fully ensures the performance and universality of the model. In addition, the feature extraction model is built on a mobile face recognition network model, and the feature extraction process is improved by modifying the model structure, the configuration of the bottleneck structure layers, and the internal structure of the bottleneck blocks, making the feature extraction model smaller, more accurate, and faster; the model size and running time can meet the requirements of deployment on a mobile terminal, enabling accurate real-time evaluation of face image quality on the mobile terminal. The model can be applied in a face classification system of a mobile device such as a smartphone or tablet computer: for example, selecting images with high face quality from a photo sequence as input to a face recognition system can significantly improve the efficiency and performance of that system; the face quality evaluation model can also be applied to camera snapshot, burst shooting, and similar functions to help users pick satisfactory photos more conveniently.
Further, in an embodiment of the present example, a training method of a feature extraction model is also provided. Referring to fig. 6, the training method may include the following steps:
step S31, responding to the image processing instruction of the image service system, and acquiring a sample image containing a human face;
step S32, inputting the sample image into continuously set convolution layer and depth convolution layer to carry out continuous convolution processing to obtain a first convolution result;
step S33, inputting the first convolution result into n continuously arranged bottleneck structure layers for continuous convolution processing to obtain a second convolution result; wherein n is greater than 5 and is a positive integer;
step S34, performing convolution processing on the second convolution result by using the continuously arranged convolution layer and linear global depth convolution layer to obtain a third convolution result;
step S35, performing full-connection processing on the third convolution result by using a full-connection layer to obtain face feature data corresponding to the sample image;
and step S36, inputting the face feature data into a loss function model to calculate loss parameters, and optimizing based on the loss parameters to iteratively train a feature extraction model.
For example, the image service system may be a service system for processing a face recognition task; for example, a service system for station arrival recognition, a service system for processing monitoring images, an access control system, and the like. The present disclosure does not specifically limit the specific contents of the service system.
In this exemplary embodiment, after the face feature data corresponding to the sample image is obtained, the method further includes inputting the face feature data into a scoring model to train the scoring model, including:
step S41, inputting the face feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing so as to obtain the face quality score of the sample image;
and step S42, inputting the face quality score into a score loss function to obtain a score loss parameter, and optimizing based on the score loss parameter to iteratively train a score model.
In this exemplary embodiment, among n bottleneck structure layers continuously arranged in the feature extraction model, a preset step length corresponding to an odd-numbered bottleneck structure layer is P, and a preset step length corresponding to an even-numbered bottleneck structure layer is Q; wherein P > Q, and P, Q are all positive integers.
In this example embodiment, the method further comprises: configuring the execution repetition count of each bottleneck structure layer based on the position of that layer among the n consecutive bottleneck structure layers.
In this example embodiment, after the first convolution result is input into the bottleneck structure layer, the method includes: sequentially performing convolution, depth convolution, convolution, and channel weight assignment processing on the first convolution result using the first convolution layer, depth convolution layer, second convolution layer, and squeeze-and-excitation network layer of the bottleneck structure layer, to obtain the second convolution result.
The specific training process of the training method of the feature extraction model is described in detail in the above-mentioned human face image quality evaluation method, and is not repeated in this embodiment.
It is to be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, as shown in fig. 7, the embodiment of the present example further provides a face image quality evaluation device 70, including: the system comprises a to-be-processed image acquisition module 701, a face image extraction module 702, a face feature data extraction module 703 and a face quality scoring module 704. Wherein:
the to-be-processed image acquisition module 701 may be configured to acquire an to-be-processed image including a human face.
The facial image extraction module 702 may be configured to detect the image to be processed to obtain a corresponding facial image.
The facial feature data extraction module 703 may be configured to input the facial image into a trained feature extraction model based on a mobile face recognition network, and perform feature extraction on the facial image to obtain feature data.
The face quality scoring module 704 may be configured to input the feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing, so as to obtain the face quality score of the face image.
In an example of the present disclosure, the to-be-processed image acquisition module 701 may include: a face region recognition module, a key point detection module, and an alignment processing module (not shown in the figure). Wherein:
the face region identification module may be configured to perform face detection on the image to be processed to obtain a face region.
The key point detection module may be configured to perform face key point detection on the face region to obtain key points of the face region.
The alignment processing module may be configured to perform alignment processing on the face region based on the key point of the face region, so as to obtain a face image after the alignment processing.
In one example of the present disclosure, the apparatus further comprises: a normalization processing module (not shown in the figure). Wherein:
the normalization processing module may be configured to normalize the feature data to obtain normalized feature data.
In one example of the present disclosure, the apparatus further comprises a feature extraction model training module, which includes: a raw data processing unit, a first convolution processing unit, a bottleneck structure processing unit, a second convolution processing unit, and a full-connection processing unit (not shown in the figure). Wherein:
the raw data processing unit may be configured to obtain raw data and pre-process the raw data to obtain sample data.
The first convolution processing unit may be configured to input the sample data into a convolution layer and a depth convolution layer that are continuously set, and perform continuous convolution processing to obtain a first convolution result.
The bottleneck structure processing unit may be configured to input the first convolution result into n continuously arranged bottleneck structure layers for continuous convolution processing to obtain a second convolution result; wherein n >5 and is a positive integer.
The second convolution processing unit may be configured to perform convolution processing on the second convolution result using the continuously arranged convolutional layers and the linear global depth convolutional layer to obtain a third convolution result.
The full-connection processing unit may be configured to perform full-connection processing on the third convolution result by using a full-connection layer to obtain the face feature data corresponding to the sample data.
In one example of the present disclosure, the apparatus further comprises: and a step size configuration module.
The step length configuration module can be used for setting the preset step length corresponding to the odd-numbered bottleneck structure layers as P and the preset step length corresponding to the even-numbered bottleneck structure layers as Q in n bottleneck structure layers continuously arranged in the feature extraction model; wherein P > Q, and P, Q are all positive integers.
In one example of the present disclosure, the apparatus further comprises: and a repetition number configuration module.
The repetition number configuration module may be configured to configure the execution repetition number of each bottleneck structure layer based on a level of each bottleneck structure layer in the n consecutive bottleneck structure layers.
In an example of the present disclosure, the bottleneck structure layer may use its first convolution layer, depth convolution layer, second convolution layer, and squeeze-and-excitation network layer to sequentially perform convolution, depth convolution, convolution, and channel weight assignment processing on the first convolution result to obtain a second convolution result.
Further, referring to fig. 8, an embodiment of the present example further provides a training apparatus 80 for a feature extraction model, including: a sample data acquisition module 801, a first convolution result generation module 802, a second convolution result generation module 803, a third convolution result generation module 804, a face feature data generation module 805, and an iterative training module 806. Wherein:
the sample data acquiring module 801 may be configured to respond to an image processing instruction of an image service system to acquire a sample image including a human face.
The first convolution result generation module 802 may be configured to input the sample image into the continuously arranged convolution layer and depth convolution layer for continuous convolution processing to obtain a first convolution result.
The second convolution result generation module 803 may be configured to input the first convolution result into n bottleneck structure layers that are consecutively arranged for consecutive convolution processing to obtain a second convolution result; wherein n >5 and is a positive integer.
The third convolution result generation module 804 may be configured to perform convolution processing on the second convolution result by using the continuously arranged convolutional layers and the linear global depth convolutional layers to obtain a third convolution result.
The facial feature data generation module 805 may be configured to perform full-join processing on the third convolution result by using a full-join layer to obtain facial feature data corresponding to the sample image.
The iterative training module 806 may be configured to input the facial feature data into a loss function model to calculate a loss parameter, and perform optimization based on the loss parameter to iteratively train a feature extraction model.
In an example of the present disclosure, the apparatus 80 may further include: a scoring model training module, configured to input the face feature data corresponding to the sample image into the scoring model to train the scoring model. The scoring model training module may include: a scoring unit and an iterative training unit. Wherein:
the scoring unit may be configured to input the facial feature data into a first full-link layer and a second full-link layer that are continuously arranged, and process the facial feature data to obtain a facial quality score of the sample image;
the iterative training unit may be configured to input the face quality score into a score loss function to obtain a score loss parameter, and perform optimization based on the score loss parameter to iteratively train a score model.
In an example of the present disclosure, among n bottleneck structure layers continuously arranged in the feature extraction model, a preset step length corresponding to an odd-numbered bottleneck structure layer is P, and a preset step length corresponding to an even-numbered bottleneck structure layer is Q; wherein P > Q, and P, Q are all positive integers.
In one example of the present disclosure, the apparatus 80 may further include: and a repetition number configuration module.
The repetition number configuration module may be configured to configure the execution repetition number of each bottleneck structure layer based on a level of each bottleneck structure layer in the n consecutive bottleneck structure layers.
In an example of the present disclosure, the bottleneck structure layer may use its first convolution layer, depth convolution layer, second convolution layer, and squeeze-and-excitation network layer to sequentially perform convolution, depth convolution, convolution, and channel weight assignment processing on the first convolution result to obtain a second convolution result.
Further, referring to fig. 9, an embodiment of the present example further provides an image processing system 900, including: a service module 901, an image processing module 902, and a model training module 903. Wherein:
the service module 901 may be configured to obtain an image to be processed.
The image processing module 902 may be configured to respond to a service processing instruction sent by the service module to execute a face image quality evaluation method, so as to obtain a scoring result of the image to be processed.
The model training module 903 may be configured to respond to an image processing instruction issued by the service module to execute a training method of a feature extraction model, so as to obtain the feature extraction model.
For example, the service module may be a service application in scenarios such as a monitoring system, a security check system, or an access control system. The service module can acquire and store to-be-processed images containing human faces in real time.
The details of each module in the above facial image quality evaluation device and the training device of the feature extraction model have been described in detail in the corresponding facial image quality evaluation method and the training method of the feature extraction model, and therefore are not described herein again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Fig. 10 illustrates a schematic block diagram of a computer system suitable for use in implementing a wireless communication device of an embodiment of the present invention.
It should be noted that the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiment of the present invention.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU)1001 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for system operation are also stored. The CPU1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An Input/Output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is installed into the storage section 1008 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the methods illustrated in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the Central Processing Unit (CPU) 1001, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiment of the present invention may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments. For example, the electronic device may implement the steps shown in fig. 1.
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (18)

1. A method for evaluating the quality of a face image is characterized by comprising the following steps:
acquiring an image to be processed containing a human face;
detecting the image to be processed to obtain a corresponding face image;
inputting the face image into a trained feature extraction model based on a mobile face recognition network, and performing feature extraction on the face image to obtain feature data;
and inputting the characteristic data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing so as to obtain the face quality score of the face image.
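By way of illustration only (this sketch is not part of the claim), the scoring pipeline of claim 1 can be expressed in PyTorch as follows; the feature dimension of 128 and hidden width of 64 are assumed values, and `feature_extractor` stands in for the trained model based on the mobile face recognition network:

```python
import torch
import torch.nn as nn

FEATURE_DIM, HIDDEN_DIM = 128, 64  # assumed dimensions, not fixed by the claim

class QualityHead(nn.Module):
    """Two consecutively arranged fully connected layers producing one score."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(FEATURE_DIM, HIDDEN_DIM)  # first full-connection layer
        self.fc2 = nn.Linear(HIDDEN_DIM, 1)            # second full-connection layer

    def forward(self, features):
        return self.fc2(torch.relu(self.fc1(features)))

def evaluate_quality(face_image, feature_extractor, head):
    """face_image: a (1, 3, H, W) tensor of the detected, aligned face."""
    with torch.no_grad():
        features = feature_extractor(face_image)  # feature extraction step
        return head(features)                     # face quality score
```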
2. The method according to claim 1, wherein the detecting the image to be processed to obtain a corresponding face image comprises:
carrying out face detection on the image to be processed to obtain a face area;
performing face key point detection on the face region to acquire key points of the face region;
and aligning the face region based on the key points of the face region to obtain a face image after alignment.
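A minimal sketch of the alignment step of claim 2, assuming five key points and a 112x112 aligned crop (both assumptions; the claim fixes neither). The face detector and key point detector that produce `landmarks` are outside the sketch:

```python
import cv2
import numpy as np

# Assumed template: canonical eye/nose/mouth positions in a 112x112 crop.
REFERENCE_POINTS = np.float32([
    [38.3, 51.7], [73.5, 51.5], [56.0, 71.7], [41.5, 92.4], [70.7, 92.2],
])

def align_face(image, landmarks):
    """Map the detected key points onto the template with a similarity
    transform and warp the face region into an aligned 112x112 image."""
    m, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), REFERENCE_POINTS)
    return cv2.warpAffine(image, m, (112, 112))
```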
3. The method of claim 1, wherein after obtaining the feature data, the method further comprises:
and carrying out standardization processing on the characteristic data to obtain the characteristic data after the standardization processing.
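The claim does not fix the standardization scheme; a common choice, shown here as an assumption only, is zero-mean, unit-variance scaling of the feature vector:

```python
import numpy as np

def standardize(features, eps=1e-8):
    """Zero-mean, unit-variance scaling of a feature vector (assumed scheme)."""
    features = np.asarray(features, dtype=np.float32)
    return (features - features.mean()) / (features.std() + eps)
```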
4. The method of claim 1, further comprising: pre-training the feature extraction model based on the mobile face recognition network, including:
acquiring original data, and preprocessing the original data to acquire sample data;
inputting the sample data into a continuously arranged convolution layer and depth convolution layer for continuous convolution processing, so as to obtain a first convolution result;
inputting the first convolution result into n continuously-arranged bottleneck structure layers for continuous convolution processing to obtain a second convolution result; wherein n is greater than 5 and is a positive integer;
performing convolution processing on the second convolution result by using the continuously arranged convolution layer and the linear global depth convolution layer to obtain a third convolution result;
and carrying out full-connection processing on the third convolution result by using a full-connection layer to obtain the face feature data corresponding to the sample data.
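A minimal sketch of the claim-4 layer ordering, assuming a 3-channel 112x112 input, MobileFaceNet-like channel widths, and externally supplied bottleneck layers that map 64 to 128 channels (all assumptions, since the claim fixes none of them):

```python
import torch.nn as nn

def conv_bn(cin, cout, stride, groups=1, kernel=3):
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel, stride, kernel // 2,
                  groups=groups, bias=False),
        nn.BatchNorm2d(cout),
        nn.PReLU(cout),
    )

class FeatureExtractor(nn.Module):
    def __init__(self, bottlenecks, feature_dim=128):
        super().__init__()
        self.conv1 = conv_bn(3, 64, stride=2)               # convolution layer
        self.dwconv = conv_bn(64, 64, stride=1, groups=64)  # depth convolution layer
        self.bottlenecks = nn.Sequential(*bottlenecks)      # n bottleneck layers, n > 5
        self.conv2 = conv_bn(128, 512, stride=1, kernel=1)  # convolution layer
        self.gdconv = nn.Sequential(                        # linear global depth conv:
            nn.Conv2d(512, 512, 7, 1, 0, groups=512, bias=False),  # 7x7 depthwise,
            nn.BatchNorm2d(512),                                   # no activation
        )
        self.fc = nn.Linear(512, feature_dim)               # full-connection layer

    def forward(self, x):                   # x: (N, 3, 112, 112), assumed
        x = self.dwconv(self.conv1(x))      # first convolution result
        x = self.bottlenecks(x)             # second convolution result (N, 128, 7, 7)
        x = self.gdconv(self.conv2(x))      # third convolution result (N, 512, 1, 1)
        return self.fc(x.flatten(1))        # face feature data
```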
5. The method according to claim 4, wherein among the n bottleneck structure layers continuously arranged in the feature extraction model, the preset step size (stride) corresponding to the odd-numbered bottleneck structure layers is P, and the preset step size corresponding to the even-numbered bottleneck structure layers is Q; wherein P > Q, and P and Q are both positive integers.
6. The method according to claim 4 or 5, characterized in that the method further comprises:
and configuring the number of execution repetitions of each bottleneck structure layer based on the position of that bottleneck structure layer among the n consecutive bottleneck structure layers.
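Claims 5 and 6 together amount to a per-position configuration of stride and repetition count. A sketch under the assumptions P = 2, Q = 1, n = 6, and a purely hypothetical repetition rule:

```python
# Sketch of claims 5 and 6. P, Q, N_LAYERS and the repeat rule are assumptions.
P, Q, N_LAYERS = 2, 1, 6

def bottleneck_config(n_layers=N_LAYERS):
    config = []
    for i in range(1, n_layers + 1):
        stride = P if i % 2 == 1 else Q      # odd-numbered: P, even-numbered: Q
        repeats = 2 if i < n_layers else 1   # hypothetical per-position rule
        config.append({"layer": i, "stride": stride, "repeats": repeats})
    return config
```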
7. The method of claim 4, wherein after inputting the first convolution result into the bottleneck structure layer, the method comprises:
and performing convolution, depthwise convolution, convolution and channel weight distribution processing on the first convolution result in sequence by utilizing the first convolution layer, the depth convolution layer, the second convolution layer and the squeeze-and-excitation network layer arranged in the bottleneck structure layer, so as to obtain a second convolution result.
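A sketch of the claim-7 bottleneck ordering (first convolution, depthwise convolution, second convolution, squeeze-and-excitation channel weighting); the expansion factor and the omission of batch normalization and a residual connection are simplifying assumptions:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel weight distribution (squeeze-and-excitation)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        w = x.mean(dim=(2, 3))                         # squeeze: global average pool
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(w))))
        return x * w[:, :, None, None]                 # excite: reweight channels

class Bottleneck(nn.Module):
    """Claim-7 ordering: conv -> depthwise conv -> conv -> squeeze-excitation."""
    def __init__(self, cin, cout, stride, expand=2):   # expand is an assumption
        super().__init__()
        mid = cin * expand
        self.conv1 = nn.Conv2d(cin, mid, 1, 1, 0, bias=False)   # first convolution
        self.dwconv = nn.Conv2d(mid, mid, 3, stride, 1,
                                groups=mid, bias=False)          # depthwise convolution
        self.conv2 = nn.Conv2d(mid, cout, 1, 1, 0, bias=False)  # second convolution
        self.se = SqueezeExcite(cout)                            # channel weighting

    def forward(self, x):
        return self.se(self.conv2(self.dwconv(self.conv1(x))))
```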
8. A training method of a feature extraction model is characterized by comprising the following steps:
responding to an image processing instruction of an image service system, and acquiring a sample image containing a human face;
inputting the sample image into a continuously arranged convolution layer and a depth convolution layer for continuous convolution processing to obtain a first convolution result;
inputting the first convolution result into n continuously-arranged bottleneck structure layers for continuous convolution processing to obtain a second convolution result; wherein n is greater than 5 and is a positive integer;
performing convolution processing on the second convolution result by using the continuously arranged convolution layer and the linear global depth convolution layer to obtain a third convolution result;
performing full-connection processing on the third convolution result by using a full-connection layer to obtain face feature data corresponding to the sample image;
and inputting the face feature data into a loss function model to calculate loss parameters, and optimizing based on the loss parameters to iteratively train a feature extraction model.
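A minimal sketch of the claim-8 training loop; the concrete loss function model is not fixed by the claim, so an identity-classification cross-entropy with a hypothetical `classifier` head is assumed here:

```python
import torch

def train_feature_extractor(model, classifier, loader, epochs=10, lr=0.001):
    """model: the feature extraction network; classifier: a hypothetical head
    standing in for the loss function model of the claim."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        list(model.parameters()) + list(classifier.parameters()), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            features = model(images)                        # face feature data
            loss = criterion(classifier(features), labels)  # loss parameter
            optimizer.zero_grad()
            loss.backward()      # optimize based on the loss parameter
            optimizer.step()     # iteratively trains the feature extraction model
    return model
```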
9. The method of claim 8, wherein after the obtaining of the face feature data corresponding to the sample image, the method further comprises inputting the face feature data into a scoring model to train the scoring model, including:
inputting the face feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing so as to obtain a face quality score of the sample image;
and inputting the face quality score into a score loss function to obtain a score loss parameter, and optimizing based on the score loss parameter to iteratively train a score model.
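A sketch of the claim-9 scoring-model training; mean squared error against annotated quality scores and a frozen feature extractor are assumptions, not requirements of the claim:

```python
import torch

def train_scoring_head(feature_extractor, head, loader, epochs=10, lr=0.001):
    criterion = torch.nn.MSELoss()   # assumed score loss function
    optimizer = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(epochs):
        for images, target_scores in loader:
            with torch.no_grad():                      # backbone frozen (assumption)
                features = feature_extractor(images)   # face feature data
            scores = head(features).squeeze(1)         # face quality scores
            loss = criterion(scores, target_scores)    # score loss parameter
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                           # iteratively train score model
    return head
```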
10. The method according to claim 8, wherein among the n bottleneck structure layers continuously arranged in the feature extraction model, the preset step size (stride) corresponding to the odd-numbered bottleneck structure layers is P, and the preset step size corresponding to the even-numbered bottleneck structure layers is Q; wherein P > Q, and P and Q are both positive integers.
11. The method according to claim 8 or 9, characterized in that the method further comprises:
and configuring the number of execution repetitions of each bottleneck structure layer based on the position of that bottleneck structure layer among the n consecutive bottleneck structure layers.
12. The method of claim 8, wherein after inputting the first convolution result into the bottleneck structure layer, the method comprises:
and performing convolution, depthwise convolution, convolution and channel weight distribution processing on the first convolution result in sequence by utilizing the first convolution layer, the depth convolution layer, the second convolution layer and the squeeze-and-excitation network layer arranged in the bottleneck structure layer, so as to obtain a second convolution result.
13. A face image quality evaluation device is characterized by comprising:
the image processing device comprises a to-be-processed image acquisition module, a to-be-processed image acquisition module and a processing module, wherein the to-be-processed image acquisition module is used for acquiring a to-be-processed image containing a human face;
the face image acquisition module is used for detecting the image to be processed to acquire a corresponding face image;
the face feature data extraction module is used for inputting the face image into a trained feature extraction model based on a mobile face recognition network and extracting features of the face image to obtain feature data;
and the face quality scoring module is used for inputting the feature data into a first full-connection layer and a second full-connection layer which are continuously arranged for processing, so as to obtain the face quality score of the face image.
14. A training device for a feature extraction model, comprising:
the system comprises a sample data acquisition module, a face recognition module and a face recognition module, wherein the sample data acquisition module is used for responding to an image processing instruction of an image service system and acquiring a sample image containing a face;
the first convolution result generation module is used for inputting the sample image into continuously arranged convolution layers and depth convolution layers to carry out continuous convolution processing so as to obtain a first convolution result;
the second convolution result generation module is used for inputting the first convolution result into n continuously-arranged bottleneck structure layers for continuous convolution processing so as to obtain a second convolution result; wherein n is greater than 5 and is a positive integer;
a third convolution result generation module, configured to perform convolution processing on the second convolution result by using the continuously-arranged convolutional layers and the linear global depth convolutional layers to obtain a third convolution result;
the face feature data generation module is used for carrying out full-connection processing on the third convolution result by using a full-connection layer so as to obtain face feature data corresponding to the sample image;
and the iterative training module is used for inputting the face feature data into a loss function model to calculate loss parameters, and performing optimization based on the loss parameters to iteratively train a feature extraction model.
15. An image processing system, comprising:
the service module is used for acquiring an image to be processed;
the image processing module is used for responding to a service processing instruction sent by the service module to execute the human face image quality evaluation method according to any one of claims 1 to 7 so as to obtain a grading result of the image to be processed.
16. The system of claim 15, further comprising:
a model training module, configured to respond to an image processing instruction issued by the business module to execute the training method of the feature extraction model according to any one of claims 8 to 12, so as to obtain the feature extraction model.
17. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the face image quality evaluation method according to any one of claims 1 to 7, or the training method of a feature extraction model according to any one of claims 8 to 12.
18. A wireless communication terminal, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the face image quality evaluation method according to any one of claims 1 to 7, or the training method of a feature extraction model according to any one of claims 8 to 12.
CN201911055879.9A 2019-10-31 2019-10-31 Face image quality evaluation method and device, computer readable medium and communication terminal Pending CN110866471A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911055879.9A CN110866471A (en) 2019-10-31 2019-10-31 Face image quality evaluation method and device, computer readable medium and communication terminal
PCT/CN2020/124546 WO2021083241A1 (en) 2019-10-31 2020-10-28 Facial image quality evaluation method, feature extraction model training method, image processing system, computer readable medium, and wireless communications terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911055879.9A CN110866471A (en) 2019-10-31 2019-10-31 Face image quality evaluation method and device, computer readable medium and communication terminal

Publications (1)

Publication Number Publication Date
CN110866471A (en) 2020-03-06

Family

ID=69653584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911055879.9A Pending CN110866471A (en) 2019-10-31 2019-10-31 Face image quality evaluation method and device, computer readable medium and communication terminal

Country Status (2)

Country Link
CN (1) CN110866471A (en)
WO (1) WO2021083241A1 (en)


Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435488B (en) * 2021-06-17 2023-11-07 深圳大学 Image sampling probability improving method and application thereof
CN113361524B (en) * 2021-06-29 2024-05-03 北京百度网讯科技有限公司 Image processing method and device
CN113420871B (en) * 2021-07-28 2023-02-24 浙江大华技术股份有限公司 Image quality evaluation method, image quality evaluation device, storage medium, and electronic device
CN113642452B (en) * 2021-08-10 2023-11-21 汇纳科技股份有限公司 Human body image quality evaluation method, device, system and storage medium
CN113792682B (en) * 2021-09-17 2024-05-10 平安科技(深圳)有限公司 Face quality assessment method, device, equipment and medium based on face image
CN116563193A (en) * 2022-01-27 2023-08-08 华为技术有限公司 Image similarity measurement method and device
CN114511058B (en) * 2022-01-27 2023-06-02 国网江苏省电力有限公司泰州供电分公司 Load element construction method and device for electric power user portrait
CN114494246B (en) * 2022-03-31 2022-07-12 腾讯科技(深圳)有限公司 Image quality evaluation method, device, electronic equipment and storage medium
CN115294014B (en) * 2022-06-07 2023-05-16 首都医科大学附属北京朝阳医院 Head and neck artery image processing method and device, storage medium and terminal
CN115050081B (en) * 2022-08-12 2022-11-25 平安银行股份有限公司 Expression sample generation method, expression recognition method and device and terminal equipment
CN115049839B (en) * 2022-08-15 2022-11-01 珠海翔翼航空技术有限公司 Quality detection method for objective quality test of flight simulation training equipment
CN116740777B (en) * 2022-09-28 2024-07-05 荣耀终端有限公司 Training method of face quality detection model and related equipment thereof
CN115830028B (en) * 2023-02-20 2023-05-23 阿里巴巴达摩院(杭州)科技有限公司 Image evaluation method, device, system and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951825A (en) * 2017-02-13 2017-07-14 北京飞搜科技有限公司 A kind of quality of human face image assessment system and implementation method
CN109102014A (en) * 2018-08-01 2018-12-28 中国海洋大学 The image classification method of class imbalance based on depth convolutional neural networks
CN109117797A (en) * 2018-08-17 2019-01-01 浙江捷尚视觉科技股份有限公司 A kind of face snapshot recognition method based on face quality evaluation
CN109447990A (en) * 2018-10-22 2019-03-08 北京旷视科技有限公司 Image, semantic dividing method, device, electronic equipment and computer-readable medium
CN109886341A (en) * 2019-02-25 2019-06-14 厦门美图之家科技有限公司 A kind of trained method for generating Face datection model
WO2019128646A1 (en) * 2017-12-28 2019-07-04 深圳励飞科技有限公司 Face detection method, method and device for training parameters of convolutional neural network, and medium
CN110287880A (en) * 2019-06-26 2019-09-27 西安电子科技大学 A kind of attitude robust face identification method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800796A (en) * 2018-12-29 2019-05-24 上海交通大学 Ship target recognition methods based on transfer learning
CN110866471A (en) * 2019-10-31 2020-03-06 Oppo广东移动通信有限公司 Face image quality evaluation method and device, computer readable medium and communication terminal

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021083241A1 (en) * 2019-10-31 2021-05-06 Oppo广东移动通信有限公司 Facial image quality evaluation method, feature extraction model training method, image processing system, computer readable medium, and wireless communications terminal
CN111401291A (en) * 2020-03-24 2020-07-10 三一重工股份有限公司 Stranger identification method and device
CN113781379B (en) * 2020-05-20 2024-03-19 上海高德威智能交通系统有限公司 Image quality determining method, device, electronic equipment and storage medium
CN113781379A (en) * 2020-05-20 2021-12-10 上海高德威智能交通系统有限公司 Image quality determination method and device, electronic equipment and storage medium
CN111401344A (en) * 2020-06-04 2020-07-10 腾讯科技(深圳)有限公司 Face recognition method and device and training method and device of face recognition system
CN111753731A (en) * 2020-06-24 2020-10-09 上海立可芯半导体科技有限公司 Face quality evaluation method, device and system and training method of face quality evaluation model
CN111783622A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Method, device and equipment for recognizing facial expressions and computer-readable storage medium
CN112365451A (en) * 2020-10-23 2021-02-12 微民保险代理有限公司 Method, device and equipment for determining image quality grade and computer readable medium
WO2022100337A1 (en) * 2020-11-11 2022-05-19 腾讯科技(深圳)有限公司 Face image quality assessment method and apparatus, computer device and storage medium
CN112418098A (en) * 2020-11-24 2021-02-26 深圳云天励飞技术股份有限公司 Training method of video structured model and related equipment
CN112465792A (en) * 2020-12-04 2021-03-09 北京华捷艾米科技有限公司 Human face quality evaluation method and related device
CN112560783A (en) * 2020-12-25 2021-03-26 京东数字科技控股股份有限公司 Methods, apparatus, systems, media and products for assessing a state of interest
CN113158777B (en) * 2021-03-08 2024-07-02 佳都科技集团股份有限公司 Quality scoring method, training method of quality scoring model and related device
CN113158777A (en) * 2021-03-08 2021-07-23 佳都新太科技股份有限公司 Quality scoring method, quality scoring model training method and related device
CN113192028A (en) * 2021-04-29 2021-07-30 北京的卢深视科技有限公司 Quality evaluation method and device for face image, electronic equipment and storage medium
CN113420806B (en) * 2021-06-21 2023-02-03 西安电子科技大学 Face detection quality scoring method and system
CN113420806A (en) * 2021-06-21 2021-09-21 西安电子科技大学 Face detection quality scoring method and system
CN114742779A (en) * 2022-04-01 2022-07-12 中国科学院光电技术研究所 High-resolution self-adaptive optical image quality evaluation method based on deep learning
CN114596620A (en) * 2022-05-10 2022-06-07 深圳市海清视讯科技有限公司 Light supplement control method, device and equipment for face recognition equipment and storage medium
CN115908260A (en) * 2022-10-20 2023-04-04 北京的卢铭视科技有限公司 Model training method, face image quality evaluation method, device and medium
CN115908260B (en) * 2022-10-20 2023-10-20 北京的卢铭视科技有限公司 Model training method, face image quality evaluation method, equipment and medium

Also Published As

Publication number Publication date
WO2021083241A1 (en) 2021-05-06

Similar Documents

Publication Publication Date Title
CN110866471A (en) Face image quality evaluation method and device, computer readable medium and communication terminal
TWI773189B (en) Method of detecting object based on artificial intelligence, device, equipment and computer-readable storage medium
CN111860573B (en) Model training method, image category detection method and device and electronic equipment
US11270099B2 (en) Method and apparatus for generating facial feature
US20200110965A1 (en) Method and apparatus for generating vehicle damage information
CN112801146B (en) Target detection method and system
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
CN113822428A (en) Neural network training method and device and image segmentation method
CN109977832B (en) Image processing method, device and storage medium
CN110738235A (en) Pulmonary tuberculosis determination method, pulmonary tuberculosis determination device, computer device, and storage medium
CN113239807B (en) Method and device for training bill identification model and bill identification
CN112651333B (en) Silence living body detection method, silence living body detection device, terminal equipment and storage medium
CN111950700A (en) Neural network optimization method and related equipment
JP2023526899A (en) Methods, devices, media and program products for generating image inpainting models
CN110909578A (en) Low-resolution image recognition method and device and storage medium
CN114332993A (en) Face recognition method and device, electronic equipment and computer readable storage medium
KR101334858B1 (en) Automatic butterfly species identification system and method, and portable terminal having automatic butterfly species identification function using the same
CN116935130A (en) ResNet and OCR-based picture joint classification method and device, electronic equipment and medium
CN113361336B (en) Pedestrian view attribute positioning and identifying method based on attention mechanism in video monitoring scene
CN115131291A (en) Object counting model training method, device, equipment and storage medium
CN113255819A (en) Method and apparatus for identifying information
Viriyavisuthisakul et al. Parametric loss-based super-resolution for scene text recognition
CN116168398B (en) Examination paper approval method, device and equipment based on image identification
CN117274253B (en) Part detection method and device based on multimode transducer and readable medium
CN110909688B (en) Face detection small model optimization training method, face detection method and computer system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination