CN110197146B - Face image analysis method based on deep learning, electronic device and storage medium

Info

Publication number
CN110197146B
Authority
CN
China
Prior art keywords
face
picture
image
preset
attribute
Legal status
Active
Application number
CN201910432222.3A
Other languages
Chinese (zh)
Other versions
CN110197146A (en)
Inventor
张一帆
邢斌
张颖
万正勇
沈志勇
Current Assignee
China Merchants Finance Technology Co Ltd
Original Assignee
China Merchants Finance Technology Co Ltd
Application filed by China Merchants Finance Technology Co Ltd
Priority to CN201910432222.3A
Publication of CN110197146A
Application granted
Publication of CN110197146B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Abstract

The invention relates to a face image analysis method based on deep learning, an electronic device and a readable storage medium. The method comprises: acquiring a face picture to be analyzed, and calling a face multi-attribute detection model to identify the face picture and obtain face attribute features of a first preset type; calling a face angle discrimination model to identify the face picture and obtain face attribute features of a second preset type; calculating face image features of a third preset type of the face picture by using a preset image processing rule; and converting the identified face attribute features of each first preset type, the face attribute features of each second preset type and the face image features of each third preset type into corresponding picture quality parameter values, and substituting these values into a predetermined picture quality comprehensive parameter calculation formula to calculate the image quality comprehensive parameter value corresponding to the face picture. The invention can comprehensively and accurately analyze the quality of a face image.

Description

Face image analysis method based on deep learning, electronic device and storage medium
Technical Field
The invention relates to the technical field of face recognition, in particular to a face image analysis method based on deep learning, an electronic device and a readable storage medium.
Background
Currently, face recognition technology is increasingly widely applied to various scenes in life, including applications in the fields of security, trade, and economy, such as criminal investigation, certificate verification, and video surveillance.
In recent years, face recognition technology has advanced considerably, but in practical applications the recognition rate and accuracy of face recognition drop significantly when the face image quality is poor. Low-quality images entering a face recognition system increase the computational load of the system and reduce its recognition accuracy. Therefore, a method for performing accurate quality analysis on face images before they enter a face recognition system is urgently needed.
Disclosure of Invention
The invention aims to provide a face image analysis method based on deep learning, an electronic device and a readable storage medium, and aims to perform accurate quality analysis on a face image.
In order to achieve the above object, the present invention provides a method for analyzing a face image based on deep learning, which comprises:
acquiring a face picture to be analyzed, and calling a pre-trained face multi-attribute detection model to identify the face picture so as to identify a plurality of first preset types of face attribute features;
calling a pre-trained face angle discrimination model to identify the face picture so as to identify a plurality of face attribute features of a second preset type;
calculating a plurality of third preset types of face image features of the face picture by using a preset image processing rule;
respectively converting the recognized face attribute characteristics of each first preset type, the recognized face attribute characteristics of the second preset type and the recognized face image characteristics of the third preset type into corresponding picture quality parameter values according to the mapping relation among the predetermined face attribute characteristics, the predetermined face image characteristics and the predetermined picture quality parameter values, and substituting the picture quality parameter values corresponding to the face attribute characteristics of each first preset type, the recognized face attribute characteristics of the second preset type and the recognized face image characteristics of the third preset type into a predetermined picture quality comprehensive parameter calculation formula so as to calculate the picture quality comprehensive parameter values corresponding to the face picture.
Preferably, the method further comprises the following steps:
and if the image quality comprehensive parameter value corresponding to the face picture is greater than or equal to a preset threshold value, determining that the quality of the face picture is qualified, and inputting the face picture into a preset face recognition system for face recognition.
Preferably, the first preset type of face attribute features include attribute features of blur, expression, illumination, occlusion and/or posture of a face, the face multi-attribute detection model is a convolutional neural network model, and the training step of the face multi-attribute detection model includes:
acquiring a preset number of face image samples in a preset first database;
respectively marking fuzzy, expression, illumination, shielding and/or posture attribute characteristics of a corresponding face for each face image sample;
dividing the marked face image sample into a training set with a first proportion and a verification set with a second proportion; training the face multi-attribute detection model by using the face image samples in the training set to obtain a trained face multi-attribute detection model, and verifying the accuracy of the trained face multi-attribute detection model by using the face image samples in the verification set; and
if the accuracy is greater than or equal to a preset threshold, finishing the training, or if the accuracy is less than the preset threshold, increasing the number of the face image samples, and re-executing the steps.
Preferably, the face multi-attribute detection model comprises a multitask convolutional neural network (MTCNN) and a first additional structure; the MTCNN comprises a PNet layer, an RNet layer and an ONet layer network structure, the first additional structure comprises two fully-connected layers, and each fully-connected layer of the first additional structure corresponds to an activation function; the step of acquiring a face picture to be analyzed and calling a pre-trained face multi-attribute detection model to identify the face picture, so as to identify a plurality of first preset types of face attribute features, comprises:
acquiring the face picture to be analyzed, recognizing the face picture by using the PNet, RNet and ONet three-layer network structure in the MTCNN, and outputting facial features, wherein the facial features pass through the two fully-connected layers of the first additional structure to output a plurality of first preset types of face attribute features.
Preferably, the second preset type of face attribute features include a pitch angle, a roll angle and/or a yaw angle, the face angle discrimination model is a convolutional neural network model, and the training step of the face angle discrimination model includes:
acquiring a preset number of face image samples in a preset second database;
respectively marking a pitch angle, a roll angle and/or a yaw angle of a corresponding face for each face image sample;
dividing the marked face image sample into a training set with a first proportion and a verification set with a second proportion;
training the face angle discrimination model by using the face image samples in the training set to obtain a trained face angle discrimination model, and verifying the accuracy of the trained face angle discrimination model by using the face image samples in the verification set; and
if the accuracy is greater than or equal to a preset threshold, finishing the training, or if the accuracy is less than the preset threshold, increasing the number of the face image samples, and re-executing the steps.
Preferably, the face angle discrimination model includes a MobileNet V2 network and a second additional structure, where the second additional structure includes a convolution layer, a pooling layer and a fully-connected layer; the step of calling a pre-trained face angle discrimination model to identify the face picture so as to identify a plurality of face attribute features of a second preset type comprises the following steps:
and identifying the face picture by using the MobileNet V2 network and outputting face key point information, performing a 1 × 1 convolution operation on the face key point information through the convolution layer of the second additional structure to change the output dimensionality, performing batch regularization and average pooling on it through the pooling layer of the second additional structure, and finally outputting a plurality of second preset types of face attribute features through the fully-connected layer of the second additional structure.
Preferably, the third preset type of face image features include an illumination value and a blur value of the face picture, and the step of calculating a plurality of third preset type of face image features of the face picture by using a preset image processing rule includes:
converting the face picture into an HSV image by utilizing OpenCV, and averaging V channels in the converted HSV image to be used as an illumination value of the face picture;
and carrying out boundary detection on the face picture by utilizing a Laplacian algorithm, calculating to obtain boundary parameters, and taking the obtained boundary parameters as fuzzy values of the face picture.
Preferably, the predetermined picture quality comprehensive parameter calculation formula is:
f(x) = (A1B1 + A2B2 + A3B3 + … + AnBn)/(B1 + B2 + B3 + … + Bn)
wherein n is a positive integer greater than 2; A1, A2, …, An are the picture quality parameter values obtained by converting each first preset type of face attribute feature, each second preset type of face attribute feature and each third preset type of face image feature; B1, B2, …, Bn are the preset weights corresponding to each of those features; and f(x) is the calculated image quality comprehensive parameter value corresponding to the face picture.
In addition, in order to achieve the above object, the present invention further provides an electronic device, which includes a memory and a processor, wherein the memory stores a deep learning based face image analysis system operable on the processor, and when the deep learning based face image analysis system is executed by the processor, the steps of the above deep learning based face image analysis method are implemented.
Further, to achieve the above object, the present invention also provides a computer readable storage medium storing a deep learning based face image analysis system, which is executable by at least one processor to cause the at least one processor to execute the steps of the deep learning based face image analysis method as described above.
The invention provides a face image analysis method based on deep learning, an electronic device and a readable storage medium. A face picture to be analyzed is identified by calling a pre-trained face multi-attribute detection model and a face angle discrimination model, so as to identify a plurality of face attribute features of a first preset type and a second preset type; a plurality of third preset types of face image features of the face picture are calculated by using a preset image processing rule; and the identified face attribute features of each first preset type, face attribute features of each second preset type and face image features of each third preset type are converted into corresponding picture quality parameter values and substituted into a predetermined picture quality comprehensive parameter calculation formula, so as to calculate the image quality comprehensive parameter value corresponding to the face picture. Because multiple types of face attributes are integrated, the judgment indexes are more diverse, and the quality of the face image can be analyzed more comprehensively and accurately.
Drawings
FIG. 1 is a schematic view of an operating environment of a deep learning-based face image analysis system according to a preferred embodiment of the present invention;
FIG. 2 is a schematic diagram of a model structure of the face multi-attribute detection model;
FIG. 3 is a schematic diagram of the operation of the face angle discrimination model;
FIG. 4 is a block diagram of a preferred embodiment of a deep learning based facial image analysis system according to the present invention;
fig. 5 is a flowchart illustrating a method for analyzing a face image based on deep learning according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the descriptions involving "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by a person skilled in the art; when technical solutions are contradictory or cannot be realized, such a combination should be considered not to exist and falls outside the protection scope of the present invention.
The invention provides a face image analysis system based on deep learning. Please refer to fig. 1, which is a schematic diagram of an operating environment of a deep learning-based facial image analysis system 10 according to a preferred embodiment of the present invention.
In the present embodiment, the facial image analysis system 10 based on deep learning is installed and operated in the electronic device 1. The electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance. It may be a computer, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing (a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers); it may also be a smart phone, a tablet computer, a portable computer, a desktop computer or other terminal equipment with storage and computing functions. In one embodiment, when the electronic device 1 is a server, the server may be one or more of a rack server, a blade server, a tower server or a cabinet server.
In the present embodiment, the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which may be communicatively connected to each other through a system bus, and the memory 11 stores a deep learning based face image analysis system 10 that may be run on the processor 12. It is noted that fig. 1 only shows the electronic device 1 with components 11-13, but it is to be understood that not all shown components are required to be implemented, and that more or less components may be implemented instead.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card, card type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, it may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used for storing the operating system and the various types of application software installed in the electronic device 1, for example, the deep learning based face image analysis system 10 of an embodiment of the present invention. Further, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 12 is generally used for controlling the overall operation of the electronic apparatus 1, such as performing control and processing related to data interaction or communication with the other devices. In this embodiment, the processor 12 is configured to run the program code stored in the memory 11 or process data, such as the facial image analysis system 10 based on deep learning.
The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is generally used for establishing a communication connection between the electronic apparatus 1 and other electronic devices.
The deep learning based facial image analysis system 10 includes at least one computer readable instruction stored in the memory 11, which is executable by the processor 12 to implement embodiments of the present application.
When executed by the processor 12, the above-mentioned deep learning based face image analysis system 10 implements the following steps:
step S1, obtaining a human face picture to be analyzed, calling a pre-trained human face multi-attribute detection model to identify the human face picture, so as to identify a plurality of human face attribute features of a first preset type.
In this embodiment, the face picture to be analyzed may be one sent by a terminal for a target user; alternatively, after a face picture quality analysis request for the target user is received from the terminal, the terminal may be controlled to take a face picture of the target user as the picture to be analyzed, or the picture to be analyzed may be extracted from a predetermined database according to a user identifier (such as the user's identity card number) carried in the request. The first preset type of face attribute features may include, but are not limited to, the blur, expression, illumination, occlusion and posture attribute features of the face.
And step S2, calling a pre-trained face angle discrimination model to identify the face picture so as to identify a plurality of second preset types of face attribute features. For example, the second preset type of face attribute features include, but are not limited to, face angle features such as the pitch angle, roll angle and yaw angle.
Step S3, calculating a plurality of facial image features of a third preset type of the facial image according to a preset image processing rule. For example, the facial image features of the third preset type may include some characteristics of the picture itself, such as an illumination value, a blur value, and the like of the facial picture.
Step S4, according to the mapping relationship among the predetermined face attribute features, face image features, and picture quality parameter values, respectively converting the recognized face attribute features of each first preset type, face attribute features of the second preset type, and face image features of the third preset type into corresponding picture quality parameter values, and substituting the picture quality parameter values corresponding to the face attribute features of each first preset type, face attribute features of the second preset type, and face image features of the third preset type into a predetermined picture quality comprehensive parameter calculation formula to calculate an image quality comprehensive parameter value corresponding to the face picture.
In this embodiment, a pre-trained face multi-attribute detection model and a face angle discrimination model are called to identify the face picture to be analyzed, so as to identify a plurality of face attribute features of the first and second preset types; a plurality of third preset types of face image features of the face picture are calculated with the preset image processing rule; and the identified features are converted into corresponding picture quality parameter values and substituted into the predetermined picture quality comprehensive parameter calculation formula to calculate the image quality comprehensive parameter value corresponding to the face picture. Because multiple types of face attributes are integrated, the judgment indexes are more diverse, and the quality of the face picture can be analyzed more comprehensively and accurately.
Optionally, when executed by the processor 12, the system 10 for analyzing a face image based on deep learning further implements the following steps:
and if the image quality comprehensive parameter value corresponding to the face picture is greater than or equal to a preset threshold value, determining that the quality of the face picture is qualified, and inputting the face picture into a preset face recognition system for face recognition. If the image quality comprehensive parameter value corresponding to the face picture is smaller than a preset threshold value, which indicates that the image quality of the face picture is low, sending prompt information of low face picture quality to a preset terminal, and/or directly discarding the face picture, namely the face picture cannot be input to a preset face recognition system for face recognition.
In this embodiment, the quality of the face image is comprehensively analyzed before face recognition is performed: only face pictures whose analyzed quality is qualified are input into the face recognition system for recognition, and face pictures with lower image quality are discarded. This effectively avoids the recognition errors caused by inputting face pictures of too low quality, and thus improves the recognition accuracy of the face recognition system; moreover, the rejected low-quality face pictures do not need to undergo complex feature extraction, which reduces the computational load of the face recognition system and improves its working efficiency.
Optionally, before the face image is identified by calling the pre-trained face multi-attribute detection model, the face multi-attribute detection model is pre-trained. In this embodiment, the first preset type of face attribute features include attribute features of blur, expression, illumination, occlusion, and/or posture of a face, the face multi-attribute detection model is a convolutional neural network model, and the training step of the face multi-attribute detection model includes:
acquiring a preset number of face image samples from a preset first database (such as the WiderFace data set, a face detection benchmark data set comprising 32,203 pictures with 393,703 labelled faces); marking the blur, expression, illumination, occlusion and/or posture attribute features of the corresponding face for each face image sample, for example, the label corresponding to face image sample 1 is "no occlusion", the label corresponding to face image sample 2 is "occluded", and so on; dividing the marked face image samples into a training set with a first proportion (e.g., 70%) and a verification set with a second proportion (e.g., 30%); training the face multi-attribute detection model with the face image samples in the training set, and verifying the accuracy of the trained face multi-attribute detection model with the face image samples in the verification set; and if the accuracy is greater than or equal to a preset threshold, ending the training, or if the accuracy is less than the preset threshold, increasing the number of face image samples and re-executing the above steps.
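As an illustration of this training-and-verification procedure, the following Python sketch splits the labelled samples and checks the trained model against an accuracy threshold; the scikit-learn split helper, the 0.95 threshold and the train_fn/eval_fn callables are assumptions for illustration, not part of the patent.

```python
from sklearn.model_selection import train_test_split

def train_with_verification(samples, labels, train_fn, eval_fn,
                            threshold=0.95, train_ratio=0.7):
    # Divide the labelled face image samples into a training set with a
    # first proportion and a verification set with a second proportion.
    x_train, x_val, y_train, y_val = train_test_split(
        samples, labels, train_size=train_ratio)
    model = train_fn(x_train, y_train)        # train the detection model
    accuracy = eval_fn(model, x_val, y_val)   # verify on the held-out set
    if accuracy >= threshold:
        return model                          # training is finished
    # Otherwise the description calls for enlarging the sample set and
    # re-executing the steps, which is left to the caller here.
    raise RuntimeError(f"accuracy {accuracy:.3f} below threshold; "
                       "increase the number of face image samples and retrain")
```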
In an alternative embodiment, as shown in fig. 2, fig. 2 is a schematic diagram of the model structure of the face multi-attribute detection model. The face multi-attribute detection model in this embodiment comprises the main structure of a multi-task convolutional neural network (MTCNN) and a first additional structure. The MTCNN comprises a PNet layer, an RNet layer and an ONet layer network structure; the first additional structure comprises two fully-connected layers, each corresponding to one activation function. After the face picture to be analyzed is obtained, it is recognized by the PNet, RNet and ONet three-layer network structure in the MTCNN, which outputs facial features; these facial features then pass through the two fully-connected layers of the first additional structure, and the face attribute features, including the blur, expression, illumination, occlusion and posture attribute features of the face, are finally output.
As shown in FIG. 2, the main structure of the MTCNN consists of three neural networks: P-Net, R-Net and O-Net. Before these networks are used, the original picture, i.e., the acquired face picture to be analyzed, is first scaled to different sizes to form an "image pyramid", and the picture at each size is then processed by the neural network P-Net. The reason is that the faces in the original picture have different sizes: some are larger, some smaller. A small face can be detected on an enlarged picture, while a relatively large face can be detected on a reduced picture; in this way, faces of different sizes can all be detected at a single network input scale.
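The image-pyramid step can be sketched as follows with OpenCV; the 0.709 scale factor and the 12-pixel floor are illustrative assumptions (the factor is a common MTCNN choice, not specified here).

```python
import cv2

def image_pyramid(picture, scale=0.709, min_size=12):
    """Scale the original picture to successively smaller sizes so that both
    large and small faces can be detected at P-Net's fixed 12 x 12 scale."""
    pyramid = []
    h, w = picture.shape[:2]
    while min(h, w) >= min_size:
        pyramid.append(cv2.resize(picture, (w, h)))
        h, w = int(h * scale), int(w * scale)
    return pyramid
```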
The first network, P-Net, takes as input an RGB image 12 pixels wide and high with 3 channels, determines whether this 12 × 12 image contains a face, and gives a face frame and key point positions. In actual calculation, P-Net performs face detection on every 12 × 12 region of the image. The resulting face frames have different sizes; besides the influence of frame regression, the main reason is that P-Net is run once on each scale of the image pyramid, so face frames of different sizes are produced. P-Net's calculation is also relatively coarse, so R-Net is used for further refinement.
The R-Net input image is 24 × 24 × 3; that is, R-Net determines whether a 24 × 24 × 3 image contains a face and predicts the key point positions. The output of R-Net has the same form as that of P-Net: face discrimination, frame regression and key point position prediction. In actual calculation, each region that P-Net judges may contain a face is scaled to 24 × 24 and input into R-Net for further judgment; R-Net thus eliminates many of P-Net's false positives.
All the remaining regions are then scaled to 48 × 48 and input to the final network, O-Net. The structure of O-Net is similar to that of P-Net, except that its input is a 48 × 48 × 3 image and it has more channels and more layers. From P-Net to R-Net and finally to O-Net, the network input pictures become larger and the convolutional layers gain more channels and more internal layers, so the face recognition accuracy becomes higher and higher; at the same time, P-Net runs fastest, R-Net second, and O-Net slowest. Three networks are used because directly applying O-Net to every region of the original picture would be very slow. Instead, P-Net filters first, R-Net filters the remainder, and the result is finally handed to the most accurate but slowest O-Net for discrimination. Each step thus reduces the number of regions that must be judged next, which effectively reduces the processing time.
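A minimal sketch of this three-stage cascade control flow, assuming p_net, r_net and o_net are callables that each return the surviving candidate face regions as cropped image arrays:

```python
import cv2

def mtcnn_cascade(picture, p_net, r_net, o_net):
    # Each stage discards most candidates, so the slower, more accurate
    # next stage only examines the survivors.
    candidates = p_net(picture)                                        # fast, coarse filtering
    candidates = r_net([cv2.resize(c, (24, 24)) for c in candidates])  # removes false positives
    return o_net([cv2.resize(c, (48, 48)) for c in candidates])       # slowest, most accurate
```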
As shown in fig. 2, in this embodiment the face multi-attribute detection model takes the features extracted by O-Net in the MTCNN, passes them through the two fully-connected layers of the first additional structure, and finally outputs a plurality of first preset types of face attribute features, i.e., the prediction probability results for blur, expression, illumination, occlusion and posture.
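A hedged PyTorch sketch of the first additional structure described above; the feature and hidden dimensions, the choice of ReLU/sigmoid activations and the attribute ordering are assumptions for illustration, not specified by the patent.

```python
import torch
import torch.nn as nn

class FirstAdditionalStructure(nn.Module):
    """Two fully-connected layers, each followed by an activation function,
    mapping the O-Net feature vector to the five first-preset-type
    attribute probabilities."""
    def __init__(self, feature_dim=256, hidden_dim=128, num_attributes=5):
        super().__init__()
        self.fc1 = nn.Linear(feature_dim, hidden_dim)
        self.act1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, num_attributes)
        self.act2 = nn.Sigmoid()   # per-attribute prediction probabilities

    def forward(self, onet_features):
        # onet_features: (batch, feature_dim) tensor extracted by O-Net
        x = self.act1(self.fc1(onet_features))
        # outputs: blur, expression, illumination, occlusion, posture
        return self.act2(self.fc2(x))
```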
Optionally, the face angle discrimination model is trained in advance, before it is called to identify the face picture. In this embodiment, the second preset type of face attribute features include a pitch angle, a roll angle and/or a yaw angle, the face angle discrimination model is a convolutional neural network model, and its training includes:
acquiring a preset number of face image samples from a preset second database (for example, the AFLW face data set, a large-scale face database containing multiple poses and multiple views, in which each face is labelled with 21 feature points; it carries a very large amount of information, covering various poses, expressions, illumination and so on, and contains 25,000 manually labelled face pictures, of which 59% are female and 41% male); marking the corresponding face angle for each face image sample, for example, face image sample 1 labelled "pitch angle" or "pitch angle 20 degrees", face image sample 2 labelled "roll angle" or "roll angle 45 degrees", face image sample 3 labelled "yaw angle" or "yaw angle 60 degrees", and so on; dividing the marked face image samples into a training set with a first proportion (e.g., 70%) and a verification set with a second proportion (e.g., 30%); training the face angle discrimination model with the samples in the training set and verifying the accuracy of the trained model with the samples in the verification set; and if the accuracy is greater than or equal to a preset threshold, ending the training, or if the accuracy is less than the preset threshold, increasing the number of face image samples and re-executing the above steps.
In an optional implementation, the face angle discrimination model in this embodiment comprises the main model structure of a MobileNet V2 network and a second additional structure, where the second additional structure comprises a convolution layer, a pooling layer and a fully-connected layer. After the face picture to be analyzed is acquired, it is identified by the MobileNet V2 network, which outputs face key point information; a 1 × 1 convolution operation is performed on this information by the convolution layer of the second additional structure to change the output dimensionality, batch regularization and average pooling are applied by the pooling layer of the second additional structure, and finally a plurality of second preset types of face attribute features are output by the fully-connected layer of the second additional structure, for example the three predicted angles: pitch angle, roll angle and yaw angle.
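A hedged PyTorch sketch of this arrangement, assuming a recent torchvision MobileNet V2 as the backbone; the intermediate channel width, input size and pre-trained weight tag are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class SecondAdditionalStructure(nn.Module):
    """1x1 convolution to change the output dimension, batch regularization
    and average pooling, then a fully-connected layer predicting the three
    angles."""
    def __init__(self, in_channels=1280, mid_channels=256):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid_channels)     # batch regularization
        self.pool = nn.AdaptiveAvgPool2d(1)        # average pooling
        self.fc = nn.Linear(mid_channels, 3)       # pitch, roll, yaw

    def forward(self, features):
        x = self.pool(self.bn(self.conv1x1(features)))
        return self.fc(torch.flatten(x, 1))

backbone = mobilenet_v2(weights="IMAGENET1K_V1").features  # pre-trained weights
head = SecondAdditionalStructure()
angles = head(backbone(torch.randn(1, 3, 224, 224)))       # shape (1, 3)
```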
MobileNet was pioneered by Google; it is an efficient network architecture that, through two hyper-parameters, allows very small, low-latency models to be built directly, easily meeting the requirements of embedded devices. MobileNet V2 is an improved version of MobileNet V1. MobileNet V1 mainly introduced depthwise separable convolutions to replace the traditional convolution operation, which amounts to decoupling spatial correlation from cross-channel correlation and accelerates the model; its overall network structure also retains the straight-through, stacked topology of the VGG network. MobileNet V2 makes two major improvements over V1. First, linear bottlenecks: the non-linear activation layer after the low-dimensional output layer is removed in order to preserve the expressive power of the model. Second, the inverted residual block: its structure is the opposite of the traditional residual block, which first reduces and then expands the dimension; here the dimension is first expanded and then reduced, so that the shortcut connects the feature maps of reduced dimension.
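For concreteness, a sketch of the stride-1, equal-channel case of the inverted residual block just described; the expansion factor 6 is MobileNet V2's usual choice, assumed here.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand with a 1x1 convolution, filter with a depthwise 3x3
    convolution, project back with a linear 1x1 bottleneck (no activation),
    and connect the shortcut between the low-dimensional feature maps."""
    def __init__(self, channels, expansion=6):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # depthwise separable convolution: one filter per channel
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # linear bottleneck: no non-linear activation after this layer
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)   # shortcut on the narrow feature map
```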
As shown in fig. 3, fig. 3 is a schematic operation diagram of the face angle discrimination model. In this embodiment, when MobileNet V2 is used to predict the pitch, roll and yaw angles of the face, a 1 × 1 convolution operation is performed at the tail of MobileNet V2 to change the output dimension, batch regularization and average pooling are applied, and finally a fully-connected layer directly predicts the three angles. The learning rate of each layer of the MobileNet V2 network is shown in fig. 3, and the network weights are pre-trained weights. The training set of the network is the AFLW face data set described above. After training, tests on a large number of face data sets show that the prediction errors of the three angles are 9.7, 9.3 and 8.5, respectively.
Optionally, the third preset type of face image features include an illumination value and a blur value of the face image, and the step of calculating a plurality of third preset type of face image features of the face image by using a preset image processing rule includes:
converting the face picture into an HSV image by utilizing OpenCV, and averaging V channels in the converted HSV image to be used as an illumination value of the face picture;
and carrying out boundary detection on the face picture by utilizing a Laplacian algorithm, calculating to obtain boundary parameters, and taking the obtained boundary parameters as fuzzy values of the face picture.
In this embodiment, the OpenCV library is used to calculate the illumination value and the blur value of the face picture. For brightness, OpenCV converts the face picture from an RGB image format into an HSV image; the V channel of the HSV image indicates the degree of brightness of the color, so the illumination value of the picture can be calculated by averaging the V channel. For the blur value, the face picture is processed with the Laplacian algorithm. The Laplacian operator measures the second derivative of a picture and emphasizes regions of rapidly changing intensity, i.e., boundaries, so it is commonly used for boundary detection. Boundary detection is performed on the face picture with the Laplacian algorithm, the boundary parameter is calculated, and the obtained boundary parameter is taken as the blur value of the face picture. For example, the boundary parameter may be the variance: the boundaries in a normal picture are relatively clear, so the variance is relatively large, whereas a blurred picture contains very little boundary information, so the variance is small. OpenCV (Open Source Computer Vision Library) is an open-source, cross-platform computer vision library that can run on the Linux, Windows and Mac OS operating systems. It is lightweight and efficient, consists of a series of C functions and a small number of C++ classes, provides interfaces for languages such as Python, Ruby and MATLAB, and implements many general algorithms in image processing and computer vision.
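A minimal Python sketch of these two measurements with OpenCV, assuming OpenCV's default BGR channel ordering for the loaded picture and the variance as the boundary parameter, as in the example above:

```python
import cv2
import numpy as np

def illumination_value(face_bgr):
    # Convert the face picture to HSV and average the V channel, which
    # indicates the degree of brightness of the color.
    hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV)
    return float(np.mean(hsv[:, :, 2]))

def blur_value(face_bgr):
    # Laplacian boundary detection; the variance of the response is the
    # boundary parameter: clear pictures have strong boundaries and a
    # large variance, blurred pictures a small one.
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())
```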
Optionally, after all attributes of the face picture (each face attribute feature and each face image feature, such as the illumination value and the blur value) are identified, the identified face attribute features of each first preset type, face attribute features of each second preset type and face image features of each third preset type are respectively converted into corresponding picture quality parameter values according to the mapping relation among the predetermined face attribute features, face image features and picture quality parameter values, and the converted picture quality parameter values are substituted into the predetermined picture quality comprehensive parameter calculation formula. For example, in one embodiment, the picture quality comprehensive parameter calculation formula is as follows:
f(x) = (A1B1 + A2B2 + A3B3 + … + AnBn)/(B1 + B2 + B3 + … + Bn)
wherein n is a positive integer greater than 2; A1, A2, …, An are the picture quality parameter values obtained by converting each attribute, i.e., each first preset type of face attribute feature, each second preset type of face attribute feature and each third preset type of face image feature; B1, B2, …, Bn are the preset weights corresponding to those features; and f(x) is the weighted average obtained by the final calculation, i.e., the image quality comprehensive parameter value corresponding to the face picture.
For example, after all attributes of the face picture (each face attribute feature, face angle feature, illumination value and blur value) are identified, each identified first preset type of face attribute feature, second preset type of face attribute feature and third preset type of face image feature can be converted into a corresponding picture quality parameter value according to the predetermined mapping relation; that is, each attribute is converted into a corresponding score, and the scores are finally weighted and averaged to obtain the image quality comprehensive parameter value, i.e., the comprehensive score of the face picture. When converting each attribute into a score, a high-quality face picture is generally considered to be one with a normal facial expression, no occlusion, a normal head posture, moderate illumination, low blur and similar attribute features. Scoring criteria can therefore be set according to these attribute features; for example, if there is no occlusion the attribute is scored 100, otherwise 0. One embodiment is shown in table 1 below:
TABLE 1
[Per-attribute scoring criteria; the table is provided as an image in the original publication and is not reproduced here]
After the attributes are individually scored, a weighted average is taken. The weighting coefficients can be set according to the needs of the actual application; for example, the weighting coefficients of the three angles (pitch, roll and yaw) can be set to 2 and those of the other attributes to 1, because the three angles have a large influence on the face recognition system. The composite score of the face picture (the image quality comprehensive parameter value), i.e., the quality score representing the quality of the face picture, is then obtained according to these weighting coefficients.
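A small Python sketch of this weighted-average scoring; the attribute ordering, individual scores and weight values below are made-up illustrative inputs, with the three angle attributes weighted 2 as suggested above.

```python
def quality_score(parameter_values, weights):
    """Weighted average f(x) = (A1*B1 + ... + An*Bn) / (B1 + ... + Bn)."""
    assert len(parameter_values) == len(weights)
    total = sum(a * b for a, b in zip(parameter_values, weights))
    return total / sum(weights)

# Illustrative call: e.g. blur, expression, illumination, occlusion, posture,
# pitch, roll, yaw (scores are example values, not from the patent).
scores  = [100, 80, 100, 90, 70, 85, 95, 75]
weights = [1, 1, 1, 1, 1, 2, 2, 2]
print(quality_score(scores, weights))
```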
Referring to fig. 4, fig. 4 is a functional block diagram of a preferred embodiment of the deep learning based face image analysis system 10. The system is divided into one or more functional modules, which are stored in the memory 11 and executed by the processor 12 to accomplish the present invention. As used herein, a "module" refers to a set of computer program instructions capable of performing a specified function. In this embodiment, the deep learning based face image analysis system 10 is divided into a first identification module 100, a second identification module 110, a first calculation module 120 and a second calculation module 130. This division only serves to express clearly the functions the system can implement; it does not limit the system to this particular division, and in other embodiments a person skilled in the art may divide the system into functional modules different from those of this embodiment, which is not described again here.
The first identification module 100 is configured to: acquiring a face picture to be analyzed, and calling a pre-trained face multi-attribute detection model to identify the face picture so as to identify a plurality of first preset types of face attribute features;
the second identification module 110 is configured to: calling a pre-trained face angle discrimination model to identify the face picture so as to identify a plurality of face attribute features of a second preset type;
the first calculation module 120 is configured to: calculating a plurality of third preset types of human face image features of the human face image by using a preset image processing rule;
the second calculating module 130 is configured to: respectively converting the recognized face attribute characteristics of each first preset type, the recognized face attribute characteristics of the second preset type and the recognized face image characteristics of the third preset type into corresponding picture quality parameter values according to the mapping relation among the predetermined face attribute characteristics, the predetermined face image characteristics and the predetermined picture quality parameter values, and substituting the picture quality parameter values corresponding to the face attribute characteristics of each first preset type, the recognized face attribute characteristics of the second preset type and the recognized face image characteristics of the third preset type into a predetermined picture quality comprehensive parameter calculation formula so as to calculate the picture quality comprehensive parameter values corresponding to the face picture.
The functions and operation steps implemented by the first identification module 100, the second identification module 110, the first calculation module 120 and the second calculation module 130 are substantially the same as those of the embodiments described above and are not repeated herein.
As shown in fig. 5, fig. 5 is a schematic flow chart of a preferred embodiment of the facial image analysis method based on deep learning of the present invention, and the facial image analysis method based on deep learning includes the following steps:
step S10, obtaining a human face picture to be analyzed, calling a pre-trained human face multi-attribute detection model to identify the human face picture, so as to identify a plurality of human face attribute features of a first preset type.
And step S20, calling a pre-trained face angle discrimination model to identify the face picture so as to identify a plurality of second preset types of face attribute features, for example, face angle features such as the pitch angle, roll angle and yaw angle.
Step S30, calculating a plurality of third preset types of face image features of the face picture according to a preset image processing rule, for example, characteristics of the picture itself such as the illumination value and blur value of the face picture.
Step S40, according to the mapping relationship among the predetermined face attribute features, face image features, and picture quality parameter values, respectively converting the recognized face attribute features of each first preset type, face attribute features of the second preset type, and face image features of the third preset type into corresponding picture quality parameter values, and substituting the picture quality parameter values corresponding to the face attribute features of each first preset type, face attribute features of the second preset type, and face image features of the third preset type into a predetermined picture quality comprehensive parameter calculation formula to calculate an image quality comprehensive parameter value corresponding to the face picture.
In the embodiment, a pre-trained face multi-attribute detection model and a face angle discrimination model are called to identify a face picture to be analyzed so as to identify a plurality of face attribute features of a first preset type and a second preset type, a plurality of face image features of a third preset type of the face picture are calculated by using preset image processing rules, and the identified face attribute features of each first preset type, the face attribute features of the second preset type and the face image features of the third preset type are converted into corresponding picture quality parameter values and substituted into a predetermined picture quality comprehensive parameter calculation formula so as to calculate the image quality comprehensive parameter values corresponding to the face picture. The pre-trained face multi-attribute detection model and the pre-trained face angle discrimination model can be used for identifying the face attribute characteristics of a plurality of first preset types and second preset types in the face picture, and the face image characteristics of a plurality of third preset types of the face picture are combined to comprehensively calculate so as to analyze and judge the image quality of the face picture, so that the face attributes of various types are integrated, the judgment indexes are more diverse, and the quality analysis can be more comprehensively and accurately carried out on the face image.
Optionally, when executed by the processor 12, the system 10 for analyzing a face image based on deep learning further implements the following steps:
and if the image quality comprehensive parameter value corresponding to the face picture is greater than or equal to a preset threshold value, determining that the quality of the face picture is qualified, and inputting the face picture into a preset face recognition system for face recognition. If the image quality comprehensive parameter value corresponding to the face picture is smaller than a preset threshold value, which indicates that the image quality of the face picture is low, sending prompt information of low face picture quality to a preset terminal, and/or directly discarding the face picture, namely the face picture cannot be input to a preset face recognition system for face recognition.
In the embodiment, the quality of the face image is comprehensively analyzed before the face recognition system performs face recognition, only the face image with qualified analysis quality is input into the face recognition system for face recognition, and the face image with lower image quality is discarded, so that the recognition error of the face recognition system caused by too low input face image quality is effectively improved, the recognition accuracy of the face recognition system can be improved, the rejected low-quality face image does not need to be subjected to complex feature extraction, the operation load of the face recognition system is reduced, and the working efficiency of the face recognition system is improved.
Optionally, before the face image is identified by calling the pre-trained face multi-attribute detection model, the face multi-attribute detection model is pre-trained. In this embodiment, the first preset type of face attribute features include attribute features of blur, expression, illumination, occlusion, and/or posture of a face, the face multi-attribute detection model is a convolutional neural network model, and the training step of the face multi-attribute detection model includes:
acquiring a preset number of face image samples in a preset first database (such as a WiderFace data set which is a face detection reference data set and comprises 32203 pictures and marks 393703 faces); marking fuzzy, expression, illumination, shielding and/or posture attribute characteristics of a corresponding face for each face image sample; for example, the label corresponding to the face image sample 1 is "no occlusion", the label corresponding to the face image sample 2 is "occluded", and so on. Dividing the marked face image sample into a training set with a first proportion (70%) and a verification set with a second proportion (30%); training the face multi-attribute detection model by using the face image samples in the training set to obtain a trained face multi-attribute detection model, and verifying the accuracy of the trained face multi-attribute detection model by using the face image samples in the verification set; and if the accuracy is greater than or equal to a preset threshold, ending the training, or if the accuracy is less than the preset threshold, increasing the number of the face image samples and re-executing the steps.
In an alternative embodiment, as shown in fig. 2, a schematic diagram of the model structure, the face multi-attribute detection model in this embodiment comprises the main structure of a multi-task convolutional neural network (MTCNN) and a first additional structure. The MTCNN comprises the three-layer network structure of P-Net, R-Net and O-Net; the first additional structure comprises two fully connected layers, each corresponding to one activation function. After the face picture to be analyzed is acquired, it is recognized by the P-Net, R-Net and O-Net three-layer structure of the MTCNN, which outputs facial features; these features then pass through the two fully connected layers of the first additional structure, which finally output the face attribute features, including the blur, expression, illumination, occlusion and pose attribute features of the face.
As shown in FIG. 2, the main structure of the MTCNN consists of three neural networks: P-Net, R-Net and O-Net. Before these networks are used, the original picture, i.e. the acquired face picture to be analyzed, is first scaled to different sizes to form an "image pyramid", and the picture at each size is then passed through P-Net. The reason is that faces in the original picture come in different sizes: a small face can be detected on an enlarged picture, while a relatively large face can be detected on a reduced picture. In this way, faces of all sizes can be detected at one fixed window scale.
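For illustration, a short Python sketch of building such an image pyramid with OpenCV follows; the 0.709 scale factor and the 12-pixel minimum size are common MTCNN conventions assumed here, not values stated in this document.

```python
import cv2

def build_image_pyramid(image, min_size=12, factor=0.709):
    """Scale the input picture to successively smaller sizes so that
    P-Net's fixed 12x12 window can match faces of different sizes."""
    pyramid = []
    h, w = image.shape[:2]
    scale = 1.0
    while min(h, w) * scale >= min_size:
        resized = cv2.resize(image, (int(w * scale), int(h * scale)))
        pyramid.append((scale, resized))
        scale *= factor  # shrink for the next pyramid level
    return pyramid
```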
The first network, P-Net, takes as input a 12 × 12 RGB image with 3 channels, judges whether this 12 × 12 image contains a face, and outputs a face frame and key point positions. In actual computation, P-Net performs face detection on every 12 × 12 region of the image. The resulting face frames differ in size mainly because P-Net is run once for each scale of the image pyramid (frame regression also contributes), so frames of different sizes are produced. P-Net's results are relatively coarse, so R-Net is used for further refinement.
The input to R-Net is a 24 × 24 × 3 image; that is, R-Net judges whether this image contains a face and predicts the key point positions. Its output has the same form as P-Net's: face discrimination, frame regression and key point prediction. In actual computation, every region that P-Net judges may contain a face is scaled to 24 × 24 and fed into R-Net for further judgment. R-Net thereby eliminates many of P-Net's false positives.
All remaining regions are then scaled to 48 × 48 and input to the final network, O-Net. The structure of O-Net is similar to that of R-Net, except that its input is a 48 × 48 × 3 image and the network has more channels and layers. From P-Net to R-Net and finally to O-Net, the input pictures grow larger and the networks gain more convolutional channels and internal layers, so face recognition accuracy rises accordingly; at the same time, P-Net runs fastest, R-Net second, and O-Net slowest. Three networks are used because applying O-Net directly to every region of the original picture, i.e. the acquired face picture to be analyzed, would be very slow. Instead, P-Net filters once, R-Net filters the survivors, and only then does the most accurate but slowest O-Net make the final discrimination. Each step shrinks the number of judgments the next step must make, effectively reducing processing time.
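The coarse-to-fine cascade could be sketched as follows; pnet, rnet and onet are hypothetical callables returning candidate face regions, standing in for the three trained networks.

```python
def mtcnn_cascade(pyramid, pnet, rnet, onet):
    """Coarse-to-fine filtering: fast P-Net proposes candidates,
    R-Net rejects false positives, the slower but more accurate
    O-Net makes the final judgment."""
    candidates = []
    for scale, img in pyramid:
        candidates += pnet(img, scale)   # 12x12 windows, fastest, coarsest
    candidates = rnet(candidates)        # 24x24 crops, removes many false hits
    return onet(candidates)              # 48x48 crops, slowest, most accurate
```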
As shown in fig. 2, in this embodiment the face multi-attribute detection model takes the features extracted by O-Net in the MTCNN network, passes them through the two fully connected layers of the first additional structure, and finally outputs the plurality of first preset types of face attribute features, i.e. a prediction probability result for each of: blur, expression, illumination, occlusion and pose.
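A hedged PyTorch sketch of the first additional structure might look like this; the 256-dimensional O-Net feature input, the 128-unit hidden width and the ReLU/sigmoid activation choices are assumptions for illustration, since the document specifies only two fully connected layers with one activation function each.

```python
import torch
import torch.nn as nn

class FirstAdditionalStructure(nn.Module):
    """Two fully connected layers appended to the O-Net features,
    each paired with one activation function."""
    def __init__(self, in_features=256, hidden=128, num_attributes=5):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.act1 = nn.ReLU()        # activation paired with fc1
        self.fc2 = nn.Linear(hidden, num_attributes)
        self.act2 = nn.Sigmoid()     # activation paired with fc2

    def forward(self, onet_features):
        x = self.act1(self.fc1(onet_features))
        # one prediction probability each for blur, expression,
        # illumination, occlusion and pose
        return self.act2(self.fc2(x))
```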
Optionally, the face angle discrimination model is trained in advance, before it is called to recognize the face picture. In this embodiment, the second preset type of face attribute features include a pitch angle, a roll angle and/or a yaw angle; the face angle discrimination model is a convolutional neural network model, and its training includes:
acquiring a preset number of face image samples from a preset second database (for example, the AFLW face data set, a large-scale face database covering multiple poses and multiple views, with 21 feature points labeled on each face; it carries a very large amount of information, covering various poses, expressions, illumination and so on, and comprises 25,000 manually labeled face pictures, 59% female and 41% male); labeling each face image sample with the corresponding face angle (for example, face image sample 1 is labeled "pitch angle" or "pitch angle 20 degrees", face image sample 2 "roll angle" or "roll angle 45 degrees", face image sample 3 "yaw angle" or "yaw angle 60 degrees", and so on); dividing the labeled face image samples into a training set of a first proportion (70%) and a verification set of a second proportion (30%); training the face angle discrimination model with the samples in the training set and verifying the accuracy of the trained model with the samples in the verification set; and, if the accuracy is greater than or equal to a preset threshold, ending the training, or, if the accuracy is less than the preset threshold, increasing the number of face image samples and re-executing the above steps.
In an optional implementation, the face angle discrimination model in this embodiment comprises the main model structure of the MobileNet V2 network and a second additional structure, where the second additional structure comprises a convolution layer, a pooling layer and a fully connected layer. After the face picture to be analyzed is acquired, it is recognized by the MobileNet V2 network, which outputs face key point information; this information undergoes a 1 × 1 convolution in the convolution layer of the second additional structure to change the output dimensionality, then batch normalization and average pooling in the pooling layer, and the fully connected layer finally outputs the plurality of second preset types of face attribute features, i.e. the three predicted angles: pitch angle, roll angle and yaw angle.
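A hedged PyTorch sketch of the second additional structure follows; the channel widths are assumptions, while the 1 × 1 convolution, batch normalization, average pooling and final fully connected layer mirror the sequence described above.

```python
import torch
import torch.nn as nn

class SecondAdditionalStructure(nn.Module):
    """Head appended to MobileNet V2: 1x1 convolution changes the
    output dimensionality, batch normalization and average pooling
    follow, and a fully connected layer predicts the three angles."""
    def __init__(self, in_channels=1280, mid_channels=256, num_angles=3):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid_channels)   # batch normalization
        self.pool = nn.AdaptiveAvgPool2d(1)      # average pooling
        self.fc = nn.Linear(mid_channels, num_angles)

    def forward(self, feature_map):
        x = self.pool(self.bn(self.conv1x1(feature_map)))
        # pitch, roll and yaw predicted directly by the final FC layer
        return self.fc(torch.flatten(x, 1))
```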
MobileNet was pioneered by Google: an efficient network architecture that, through two hyper-parameters, allows the direct construction of very small, low-latency models that easily meet the requirements of embedded devices. MobileNet V2 is an improved version of MobileNet V1. MobileNet V1 mainly introduced depthwise separable convolution to replace the traditional convolution operation, which is equivalent to decoupling the spatial and channel computations and thereby accelerates the model; its overall network structure also continues the straight up-and-down character of the VGG network. MobileNet V2 brings two major improvements over V1: linear bottlenecks, i.e. the non-linear activation layer after a small-dimension output layer is removed in order to preserve the model's expressive power; and the inverted residual block, whose structure is exactly opposite to the traditional residual block (which reduces the dimension and then expands it), so that the shortcut connects the feature maps of reduced dimension.
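As a hedged illustration of the inverted residual idea, a PyTorch sketch of one such block follows; the expansion factor of 6 is the MobileNet V2 paper's default and an assumption here.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand with a 1x1 convolution, filter with a depthwise 3x3
    convolution, then project back down through a linear bottleneck
    (no activation after the last 1x1, preserving expressiveness)."""
    def __init__(self, channels, expansion=6):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),        # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),              # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),        # linear bottleneck
            nn.BatchNorm2d(channels),                          # no non-linearity here
        )

    def forward(self, x):
        return x + self.block(x)  # shortcut joins the reduced-dimension maps
```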
As shown in fig. 3, a schematic operation diagram of the face angle discrimination model: in this embodiment, when MobileNet V2 is used to predict the pitch, roll and yaw angles of the face, a 1 × 1 convolution is performed at the tail of MobileNet V2 to change the output dimensionality, batch normalization and average pooling are applied, and finally a fully connected layer directly predicts the three angles. The learning rate of each layer of the MobileNet V2 network is shown in fig. 3, and the network weights are initialized from pre-trained weights. The training set of the network is the AFLW face data set described above. After training, tests on a large number of face data sets give prediction errors of [9.7, 9.3, 8.5] for the three angles.
Optionally, the third preset type of face image features include an illumination value and a blur value of the face picture, and the step of calculating the plurality of third preset types of face image features of the face picture using the preset image processing rule includes:
converting the face picture into an HSV image using OpenCV, and taking the mean of the V channel of the converted HSV image as the illumination value of the face picture;
and performing boundary detection on the face picture using the Laplacian algorithm, calculating a boundary parameter, and taking the obtained boundary parameter as the blur value of the face picture.
In this embodiment, the OpenCV library is used to calculate the illumination value and the blur value of the face picture. For brightness, OpenCV converts the face picture from RGB format into an HSV image; the V channel of the HSV image indicates the brightness of color, so the illumination value of the picture is obtained by averaging the V channel. For the blur value, the face picture is processed with the Laplacian algorithm, which measures the second derivative of a picture and emphasizes regions of rapidly changing intensity, i.e. boundaries; this is why it is commonly used for boundary detection. Boundary detection is performed on the face picture with the Laplacian algorithm, a boundary parameter is calculated, and this boundary parameter is taken as the blur value of the face picture. For example, the boundary parameter may be the variance: a normal picture has relatively clear boundaries, so the variance is large, whereas a blurred picture contains very little boundary information, so the variance is small. OpenCV (Open Source Computer Vision Library) is an open-source, cross-platform computer vision library that runs on the Linux, Windows and Mac OS operating systems. It is lightweight and efficient, consists of a series of C functions and a small number of C++ classes, provides interfaces for languages such as Python, Ruby and MATLAB, and implements many general algorithms in image processing and computer vision.
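A minimal OpenCV sketch of both measurements follows; note that cv2.imread returns BGR rather than RGB, so the conversion below starts from BGR, an assumption about how the picture was loaded.

```python
import cv2

def illumination_value(face_bgr):
    """Mean of the V (brightness) channel after BGR->HSV conversion."""
    hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV)
    return hsv[:, :, 2].mean()

def blur_value(face_bgr):
    """Variance of the Laplacian (second derivative): sharp boundaries
    give a large variance, blurred pictures a small one."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()
```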
Optionally, after all attributes of the face picture are recognized (each face attribute feature plus the face image features such as the illumination value and the blur value), each first preset type of face attribute feature, each second preset type of face attribute feature and each third preset type of face image feature is converted into a corresponding picture quality parameter value according to the predetermined mapping relationship between face attribute features, face image features and picture quality parameter values, and the converted values are substituted into the predetermined picture quality comprehensive parameter calculation formula. For example, in one embodiment, the formula is:
f(x) = (A1B1 + A2B2 + A3B3 + … + AnBn) / (B1 + B2 + B3 + … + Bn)
where n is a positive integer greater than 2; A1, A2, …, An are the picture quality parameter values converted from each attribute (each first preset type of face attribute feature, each second preset type of face attribute feature and each third preset type of face image feature); B1, B2, …, Bn are the preset weights corresponding to those features; and f(x) is the finally calculated weighted average, i.e. the image quality comprehensive parameter value corresponding to the face picture.
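A one-function Python sketch of this weighted average follows; the names are illustrative.

```python
def picture_quality_score(values, weights):
    """f(x) = (A1*B1 + ... + An*Bn) / (B1 + ... + Bn): the weighted
    average of the per-attribute picture quality parameter values."""
    assert len(values) == len(weights) and len(values) > 2
    return sum(a * b for a, b in zip(values, weights)) / sum(weights)
```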
For example, after all attributes of the face picture (each face attribute feature, face angle feature, illumination value and blur value) are recognized, each recognized first preset type of face attribute feature, second preset type of face attribute feature and third preset type of face image feature can be converted into a corresponding picture quality parameter value according to the predetermined mapping relationship; that is, each attribute is converted into a score, and the scores are finally combined by weighted averaging to obtain the image quality comprehensive parameter value, i.e. the comprehensive score of the face picture. When converting each attribute into a score, a high-quality face picture is generally considered to have attribute features such as a normal facial expression, no occlusion, a normal head pose, moderate illumination and low blur. Scoring criteria can therefore be set per attribute feature; for example, "no occlusion" scores 100, and otherwise the attribute scores 0. In one embodiment, as shown in table 2 below:
TABLE 2 (per-attribute scoring criteria; reproduced as an image in the original publication)
After the attributes are individually scored, a weighted average is taken. The weighting coefficients can be set according to actual application needs; for example, the weighting coefficients of the three angles (pitch angle, roll angle and yaw angle) can be set to 2 and those of the other attributes to 1, because the three angles strongly influence the face recognition system. The composite score of the face picture (the image quality comprehensive parameter value), i.e. the quality score representing how good the face picture is, is then obtained according to these weighting coefficients, as in the example below.
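As a hedged usage example of the picture_quality_score sketch above, with hypothetical per-attribute scores and the weighting scheme just described:

```python
# Hypothetical per-attribute scores, in the order: blur, expression,
# illumination, occlusion, pose, pitch, roll, yaw, illumination value,
# blur value. The three angle weights are 2, all others 1.
scores  = [100, 100, 80, 0, 100, 90, 95, 85, 70, 100]
weights = [1, 1, 1, 1, 1, 2, 2, 2, 1, 1]
composite = picture_quality_score(scores, weights)
print(round(composite, 1))  # ≈ 83.8, the composite quality score
```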
Furthermore, the present invention also provides a computer-readable storage medium storing a deep learning based facial image analysis system executable by at least one processor to cause the at least one processor to perform the steps of:
acquiring a face picture to be analyzed, and calling a pre-trained face multi-attribute detection model to identify the face picture so as to identify a plurality of first preset types of face attribute features;
calling a pre-trained face angle discrimination model to identify the face picture so as to identify a plurality of face attribute features of a second preset type;
calculating a plurality of third preset types of human face image features of the human face image by using a preset image processing rule;
respectively converting the recognized face attribute characteristics of each first preset type, the recognized face attribute characteristics of the second preset type and the recognized face image characteristics of the third preset type into corresponding picture quality parameter values according to the mapping relation among the predetermined face attribute characteristics, the predetermined face image characteristics and the predetermined picture quality parameter values, and substituting the picture quality parameter values corresponding to the face attribute characteristics of each first preset type, the recognized face attribute characteristics of the second preset type and the recognized face image characteristics of the third preset type into a predetermined picture quality comprehensive parameter calculation formula so as to calculate the picture quality comprehensive parameter values corresponding to the face picture.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the above-mentioned embodiments of the electronic device 1 and the method, and will not be described herein again.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises it.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and are not to be construed as limiting the scope of the invention. The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Additionally, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
Those skilled in the art can implement the invention with various modifications, such as using features from one embodiment in another embodiment to yield a further embodiment, without departing from the scope and spirit of the invention. Any modification, equivalent replacement or improvement made within the technical idea of the present invention shall fall within its scope of protection.

Claims (9)

1. A facial image analysis method based on deep learning is characterized in that the facial image analysis method based on deep learning comprises the following steps:
the method comprises the steps of obtaining a human face picture to be analyzed, calling a pre-trained human face multi-attribute detection model to recognize the human face picture, wherein the human face multi-attribute detection model comprises a multitask convolutional neural network (MTCNN) and a first additional structure, the MTCNN comprises three-layer network structures of a PNet, a RNet and an ONet, the first additional structure comprises two full-connection layers, each full-connection layer of the first additional structure corresponds to an activation function, after the human face picture to be analyzed is obtained, the human face picture is recognized by the three-layer network structures of the PNet, the RNet and the ONet in the MTCNN and then human face features are output, and the human face features pass through the two full-connection layers of the first additional structure and output a plurality of human face attribute features of a first preset type;
calling a pre-trained face angle discrimination model to identify the face picture so as to identify a plurality of face attribute features of a second preset type;
calculating a plurality of third preset types of human face image features of the human face image by using a preset image processing rule;
respectively converting the recognized face attribute characteristics of each first preset type, the recognized face attribute characteristics of the second preset type and the recognized face image characteristics of the third preset type into corresponding picture quality parameter values according to the mapping relation among the predetermined face attribute characteristics, the predetermined face image characteristics and the predetermined picture quality parameter values, and substituting the picture quality parameter values corresponding to the face attribute characteristics of each first preset type, the recognized face attribute characteristics of the second preset type and the recognized face image characteristics of the third preset type into a predetermined picture quality comprehensive parameter calculation formula so as to calculate the picture quality comprehensive parameter values corresponding to the face picture.
2. The method for analyzing a face image based on deep learning of claim 1, further comprising:
and if the image quality comprehensive parameter value corresponding to the face picture is greater than or equal to a preset threshold value, determining that the quality of the face picture is qualified, and inputting the face picture into a preset face recognition system for face recognition.
3. The method for analyzing a facial image based on deep learning of claim 1, wherein the first preset type of face attribute features includes blur, expression, illumination, occlusion and/or pose attribute features of the human face, the face multi-attribute detection model is a convolutional neural network model, and the training step of the face multi-attribute detection model includes:
acquiring a preset number of face image samples in a preset first database;
labeling each face image sample with the blur, expression, illumination, occlusion and/or pose attribute features of the corresponding face;
dividing the marked face image sample into a training set with a first proportion and a verification set with a second proportion; training the face multi-attribute detection model by using the face image samples in the training set to obtain a trained face multi-attribute detection model, and verifying the accuracy of the trained face multi-attribute detection model by using the face image samples in the verification set; and
if the accuracy is greater than or equal to the preset threshold, ending the training, or if the accuracy is less than the preset threshold, increasing the number of the face image samples, and returning to the step of obtaining the preset number of face image samples in the preset first database.
4. The method for analyzing facial images based on deep learning of claim 1, wherein the second preset type of face attribute features includes a pitch angle, a roll angle and/or a yaw angle, the face angle discrimination model is a convolutional neural network model, and the training of the face angle discrimination model includes:
acquiring a preset number of face image samples in a preset second database;
labeling each face image sample with the pitch angle, roll angle and/or yaw angle of the corresponding face;
dividing the marked face image sample into a training set with a first proportion and a verification set with a second proportion;
training the face angle discrimination model by using the face image samples in the training set to obtain a trained face angle discrimination model, and verifying the accuracy of the trained face angle discrimination model by using the face image samples in the verification set; and
if the accuracy is greater than or equal to the preset threshold, ending the training, or if the accuracy is less than the preset threshold, increasing the number of the face image samples, and returning to the step of obtaining the preset number of face image samples in the preset second database.
5. The method according to claim 1 or 4, wherein the face angle discrimination model comprises a MobileNet V2 network and a second additional structure, wherein the second additional structure comprises a convolution layer, a pooling layer and a full-connection layer; the step of calling a pre-trained face angle discrimination model to identify the face picture so as to identify a plurality of face attribute features of a second preset type comprises the following steps:
and identifying the face picture by utilizing the MobileNet V2 network and outputting face key point information; performing a 1 × 1 convolution operation on the face key point information through the convolution layer of the second additional structure to change the output dimensionality, performing batch normalization and average pooling processing through the pooling layer of the second additional structure, and finally outputting a plurality of second preset types of face attribute features through the full-connection layer of the second additional structure.
6. The method as claimed in claim 1, wherein the third predetermined type of facial image features include an illumination value and a blur value of the facial image, and the step of calculating a plurality of third predetermined types of facial image features of the facial image using the predetermined image processing rule comprises:
converting the face picture into an HSV image by utilizing OpenCV, and taking the mean of the V channel of the converted HSV image as the illumination value of the face picture;
and performing boundary detection on the face picture by utilizing the Laplacian algorithm, calculating a boundary parameter, and taking the obtained boundary parameter as the blur value of the face picture.
7. The method for analyzing facial images based on deep learning of claim 1, wherein the predetermined picture quality comprehensive parameter calculation formula is:
f(x) = (A1B1 + A2B2 + A3B3 + … + AnBn) / (B1 + B2 + B3 + … + Bn)
wherein n is a positive integer greater than 2; A1, A2, …, An are the picture quality parameter values converted from each first preset type of face attribute feature, each second preset type of face attribute feature and each third preset type of face image feature; B1, B2, …, Bn are the preset weights corresponding to those features; and f(x) is the calculated image quality comprehensive parameter value corresponding to the face picture.
8. An electronic device, characterized in that the electronic device comprises a memory and a processor, wherein the memory stores a deep learning based face image analysis system which can run on the processor, and the deep learning based face image analysis system realizes the steps of the deep learning based face image analysis method according to any one of claims 1 to 7 when executed by the processor.
9. A computer-readable storage medium, on which a deep learning based facial image analysis system is stored, which when executed by a processor implements the steps of the deep learning based facial image analysis method according to any one of claims 1 to 7.
CN201910432222.3A 2019-05-23 2019-05-23 Face image analysis method based on deep learning, electronic device and storage medium Active CN110197146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910432222.3A CN110197146B (en) 2019-05-23 2019-05-23 Face image analysis method based on deep learning, electronic device and storage medium


Publications (2)

Publication Number Publication Date
CN110197146A CN110197146A (en) 2019-09-03
CN110197146B true CN110197146B (en) 2021-02-23

Family

ID=67751681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910432222.3A Active CN110197146B (en) 2019-05-23 2019-05-23 Face image analysis method based on deep learning, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110197146B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796101A (en) * 2019-10-31 2020-02-14 广东光速智能设备有限公司 Face recognition method and system of embedded platform
CN112825120B (en) * 2019-11-20 2024-04-23 北京眼神智能科技有限公司 Face illumination evaluation method, device, computer readable storage medium and equipment
CN111339889A (en) * 2020-02-20 2020-06-26 浙江大华技术股份有限公司 Face optimization method, face optimization device and storage medium
CN111382693A (en) * 2020-03-05 2020-07-07 北京迈格威科技有限公司 Image quality determination method and device, electronic equipment and computer readable medium
CN111652242B (en) * 2020-04-20 2023-07-04 北京迈格威科技有限公司 Image processing method, device, electronic equipment and storage medium
CN111538344A (en) * 2020-05-14 2020-08-14 重庆科技学院 Intelligent wheelchair based on face key point motion following and control method thereof
CN111931551B (en) * 2020-05-26 2022-04-12 东南大学 Face detection method based on lightweight cascade network
CN114004779A (en) * 2020-07-27 2022-02-01 中移物联网有限公司 Face quality evaluation method and device based on deep learning
CN111861875A (en) * 2020-07-30 2020-10-30 北京金山云网络技术有限公司 Face beautifying method, device, equipment and medium
CN112199530B (en) * 2020-10-22 2023-04-07 天津众颐科技有限责任公司 Multi-dimensional face library picture automatic updating method, system, equipment and medium
CN112200804A (en) * 2020-11-09 2021-01-08 北京地平线信息技术有限公司 Image detection method and device, computer readable storage medium and electronic equipment
CN113077265B (en) * 2020-12-08 2021-11-30 鑫绪(上海)信息技术服务有限公司 Live client credit management system
CN112766158B (en) * 2021-01-20 2022-06-03 重庆邮电大学 Multi-task cascading type face shielding expression recognition method
CN113673448A (en) * 2021-08-24 2021-11-19 厦门立林科技有限公司 Cloud and end integrated face image quality dynamic detection method and system
CN113642541B (en) * 2021-10-14 2022-02-08 环球数科集团有限公司 Face attribute recognition system based on deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325398A (en) * 2018-06-30 2019-02-12 东南大学 A kind of face character analysis method based on transfer learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200828988A (en) * 2006-12-22 2008-07-01 Altek Corp System and method for image evaluation
US8441548B1 (en) * 2012-06-15 2013-05-14 Google Inc. Facial image quality assessment
CN109214406B (en) * 2018-05-16 2021-07-09 长沙理工大学 Image classification method based on D-MobileNet neural network
CN109285149A (en) * 2018-09-04 2019-01-29 杭州比智科技有限公司 Appraisal procedure, device and the calculating equipment of quality of human face image


Also Published As

Publication number Publication date
CN110197146A (en) 2019-09-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant