CN111241927A - Cascading type face image optimization method, system and equipment and readable storage medium - Google Patents

Info

Publication number
CN111241927A
Authority
CN
China
Prior art keywords
face
score
image
optimal
filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911387764.XA
Other languages
Chinese (zh)
Inventor
朱鹏
黄自力
何学智
林林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Newland Digital Technology Co ltd
Original Assignee
Newland Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Newland Digital Technology Co ltd filed Critical Newland Digital Technology Co ltd
Priority to CN201911387764.XA priority Critical patent/CN111241927A/en
Publication of CN111241927A publication Critical patent/CN111241927A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/172 Classification, e.g. identification
    • G06V40/174 Facial expression recognition

Abstract

The invention discloses a cascading face image optimization method. Face detection and coordinate estimation are performed by a multi-task cascaded convolutional neural network; pictures whose resolution is below a threshold are filtered out; pictures in which the face pose is too skewed are filtered out; a composite score is computed for each picture from the resolution score, face pose score, face occlusion score, expression score, sharpness score and illumination intensity score, and the picture with the highest composite score is the optimal face. The face image is tracked and the optimal face is pushed: when the optimal face is updated, or when the optimal face of the same face has not been updated for a preset number of consecutive frames and its composite score is greater than a preset score, the optimal face is pushed to the recognition module. Except for a face appearing for the first time, which requires the complete optimization step, unqualified images can be screened out midway through the cascade, accelerating optimization and reducing the occupation of memory resources.

Description

Cascading type face image optimization method, system and equipment and readable storage medium
Technical Field
The invention relates to the technical field of image recognition, in particular to a cascading type face image optimization method, a system, equipment and a readable storage medium.
Background
With the arrival of the AI era, face recognition technology has been widely applied across industries such as intelligent transportation, public security management, access control systems, checkpoint systems, electronic passports, bank self-service systems, information security, and the like. A face recognition system is built on face detection technology: current face detection runs at the full frame rate of 30 fps over the full frame, and each frame of video can yield more than 100 captured face images. If all detected face images were fed into face recognition, this would not only put severe pressure on the CPU and memory, but the recognition results would also be unstable because of the quality, angle, illumination and other properties of the input images. A high-quality face image should therefore be selected as input for recognition, so that a more accurate and reliable matching result is output. At present, face image screening mainly relies on subjective evaluation, in which humans participate in the scoring process to judge image quality. This approach has high labor cost and poor real-time performance, so it cannot be applied at scale.
Disclosure of Invention
The invention aims to provide a cascading face image optimization method, system, device and readable storage medium with a small amount of computation and high output quality.
In order to solve the technical problems, the technical scheme of the invention is as follows:
in a first aspect, the present invention provides a method for optimizing a cascaded face image, including the steps of:
acquiring a video image, and performing face detection and coordinate estimation through a multitask cascade convolution neural network;
calculating the resolution score of the image, and filtering the pictures with the resolution lower than a threshold value;
calculating the score of the human face posture, and filtering the picture with the too skewed human face posture;
calculating a face shielding score, an expression score, a definition score and an illumination intensity score of the image respectively;
performing weighted calculation on the resolution score, the face posture score, the face shielding score, the expression score, the definition score and the illumination intensity score, and taking the picture with the highest total score as an optimal face;
and carrying out face tracking on the face image in the video, and pushing and updating the optimal face.
Preferably, the face pose score calculation process is performed as follows:
acquiring point coordinates of the nose, left eye, right eye, left mouth corner and right mouth corner of the face through a multi-task cascaded convolutional neural network;
obtaining a predetermined score when the abscissa of the nose is greater than the left eye abscissa and less than the right eye abscissa;
obtaining a predetermined score when the ordinate of the nose is greater than the ordinate of the eyes and less than the ordinate of the mouth;
the minimum distance from the nose to the connecting line of the left eye and the left mouth corner is a first distance, the minimum distance from the nose to the connecting line of the right eye and the right mouth corner is a second distance, the smaller value of the first distance and the second distance is divided by the width of the face rectangular frame, and when the ratio is greater than a threshold value, a preset score is obtained;
the minimum distance from the nose to the connecting line of the left eye and the right eye is a distance three, the minimum distance from the nose to the connecting line of the left mouth corner and the right mouth corner is a distance four, the smaller value of the distance three and the distance four is divided by the height of the face rectangular frame, and a preset score is obtained when the ratio is greater than a threshold value.
Preferably, when a face appears in a picture in the video, the face is tracked through a KCF target tracking algorithm, and each face picture is filtered, wherein the filtering includes filtering pictures with resolution lower than a threshold value and filtering pictures with too-distorted face pose.
Preferably, when the optimal face is updated or the optimal face is not updated in a continuous preset number of frames of the same face, and the total score of the optimal face is greater than a preset score, the optimal face is pushed to a face recognition module.
Preferably, the sharpness score is calculated by the Tenengrad gradient algorithm.
Preferably, the face occlusion score and the expression score are evaluated and calculated by adopting a deep learning model.
Preferably, the illumination intensity score is calculated from the mean gray value of the picture, and the score for a mean gray value within the threshold range is higher than for one outside the range.
In a second aspect, the present invention further provides a cascading type face image optimization system, including:
a cascade detection module: acquiring a video image, and performing face detection and coordinate estimation through a multitask cascade convolution neural network;
a resolution filtering module: calculating the resolution score of the image, and filtering the pictures with the resolution lower than a threshold value;
the gesture filtering module: calculating the score of the human face posture, and filtering the picture with the too skewed human face posture;
a calculation module: calculating a face shielding score, an expression score, a definition score and an illumination intensity score of the image respectively;
an optimal face module: performing weighted calculation on the resolution score, the face posture score, the face shielding score, the expression score, the definition score and the illumination intensity score, and taking the picture with the highest total score as an optimal face;
a pushing module: and carrying out face tracking on the face image in the video, and pushing and updating the optimal face.
In a third aspect, the present invention provides a cascaded face image optimization apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the cascaded face image optimization method when executing the program.
In a fourth aspect, the present invention provides a readable storage medium for a cascaded face image optimization, on which a computer program is stored, the computer program being executed by a processor to implement the steps of the cascaded face image optimization method described above.
The method adopts a cascade scheme together with a tracking scheme, in contrast to the parallel computing mode of the prior art: except for a face appearing for the first time, which requires the complete optimization step, unqualified images can be screened out early in the cascade, accelerating optimization and reducing the occupation of memory resources. In addition, the technical scheme avoids the unstable results and low accuracy caused in current face recognition technology by entering recognition directly without optimizing face image quality. By providing a good-quality face image to the recognition end after optimization, the cascading face image optimization method improves the accuracy and stability of recognition.
Drawings
FIG. 1 is a flowchart of an embodiment of a method for selecting a cascaded face image according to the present invention;
FIG. 2 is a diagram illustrating the results of face detection on public data sets by MTCNN in the present invention;
FIG. 3 is a schematic diagram illustrating an estimation of a pose using five MTCNN points according to the present invention;
FIG. 4 is a flowchart illustrating a preferred method of cascading facial images according to another embodiment of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to fig. 1, the embodiment of the present invention provides a cascading face image optimization method.
Step one: a segment of real-time or local video containing face information is acquired through a standard communication protocol; the standard communication protocol may be any one of RTSP/RTP, ONVIF and GB/T28181.
Step two: and performing face detection and five-point coordinate estimation on the image in the video by using an MTCNN (Multi-Task cascaded convolutional neural network) algorithm to obtain all face frame position information and face five-point coordinate information. Refer to fig. 2.
Step three: the resolution of the face image is taken as the first evaluation index, denoted ScoreReso. FaceSize is obtained from the size of the face frame returned by MTCNN detection, with the calculation formula:
FaceSize=FaceWidth*FaceHeight
wherein, FaceWidth and FaceHeight respectively represent the width and height of the face region obtained by MTCNN face detection.
The image size is divided into three grades: if FaceSize is less than 80 x 80 pixels, the resolution score ScoreReso is 0; between 80 x 80 and 120 x 120 pixels, ScoreReso is 5; above 120 x 120 pixels, ScoreReso is 10. If the resolution score is 0, no further evaluation is performed and the image is deleted directly.
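As an illustrative sketch (the tier boundaries 80x80 and 120x120 come from this embodiment; the function name is our own), the three-grade resolution score can be written as:

```python
def score_resolution(face_width, face_height):
    """Three-grade resolution score (ScoreReso) from the face frame size."""
    face_size = face_width * face_height  # FaceSize = FaceWidth * FaceHeight
    if face_size < 80 * 80:
        return 0   # below the threshold: the image is deleted directly
    if face_size < 120 * 120:
        return 5
    return 10
```

For example, a 100x100 face frame falls in the middle grade and scores 5.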
Step four: referring to fig. 3, the pose of the face is taken as the second evaluation index, denoted ScorePose. The pose is determined from the positional relationships among the five-point coordinates obtained by MTCNN face detection: the nose (x1, y1), the left eye (x2, y2), the right eye (x3, y3), the left mouth corner (x4, y4) and the right mouth corner (x5, y5). The specific criteria are:
(1) the abscissa of the nose point is greater than that of the left eye and less than that of the right eye, i.e. x2 < x1 < x3; this criterion mainly excludes profile faces turned about 90 degrees to the left or right;
(2) the ordinate of the nose point is greater than the ordinates of the eyes and less than the ordinates of the mouth corners, i.e. max(y2, y3) < y1 < min(y4, y5); this mainly excludes large-angle head-down and head-up poses;
(3) the distance from the nose point to the line connecting the left eye and the left mouth corner, and the distance from the nose point to the line connecting the right eye and the right mouth corner, are computed; the smaller of the two is divided by the face rectangle width FaceWidth to obtain NoseToLR. The threshold is set to 0.1, and the face is filtered out when the ratio is less than 0.1. Using the point-to-line distance formula:
NosetoLeft = |(x4-x2)(y2-y1) - (x2-x1)(y4-y2)| / (sqrt((x4-x2)^2 + (y4-y2)^2) * FaceWidth)
NosetoRight = |(x5-x3)(y3-y1) - (x3-x1)(y5-y3)| / (sqrt((x5-x3)^2 + (y5-y3)^2) * FaceWidth)
NoseToLR = min(NosetoLeft, NosetoRight)
(4) the distance from the nose point to the line connecting the two eyes, and the distance from the nose point to the line connecting the two mouth corners, are computed; the smaller of the two is divided by the face rectangle height FaceHeight to obtain NoseToTB. The threshold is set to 0.2, and the face is filtered out when the ratio is less than 0.2. The calculation formula is:
NosetoTop = |(x3-x2)(y2-y1) - (x2-x1)(y3-y2)| / (sqrt((x3-x2)^2 + (y3-y2)^2) * FaceHeight)
NosetoBottom = |(x5-x4)(y4-y1) - (x4-x1)(y5-y4)| / (sqrt((x5-x4)^2 + (y5-y4)^2) * FaceHeight)
NoseToTB = min(NosetoTop, NosetoBottom)
it should be noted that, in the coordinate system where the above points are located, the origin is located at the upper left corner of the image.
A face pose that is filtered out by any threshold receives a ScorePose of 0; each of the four criteria above contributes 5 points, so a pose that passes all of them scores 20. If ScorePose is 0, the face image is deleted directly without entering the next evaluation step.
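The four pose criteria above can be sketched as follows. This is an illustrative reading, not the patent's code: the function names and the 5-points-per-criterion accumulation are our assumptions, and the point-to-line distance is the standard formula (image origin at the top-left, as noted above).

```python
import math

def point_to_line_dist(p, a, b):
    # Distance from point p to the infinite line through points a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    den = math.hypot(bx - ax, by - ay)
    return num / den

def score_pose(nose, l_eye, r_eye, l_mouth, r_mouth, face_w, face_h):
    score = 0
    # (1) nose between the eyes horizontally: x2 < x1 < x3
    if l_eye[0] < nose[0] < r_eye[0]:
        score += 5
    # (2) nose below the eyes, above the mouth: max(y2,y3) < y1 < min(y4,y5)
    if max(l_eye[1], r_eye[1]) < nose[1] < min(l_mouth[1], r_mouth[1]):
        score += 5
    # (3) NoseToLR: smaller distance to the eye-to-mouth-corner lines,
    # normalized by the face frame width, must exceed 0.1
    nose_to_lr = min(point_to_line_dist(nose, l_eye, l_mouth),
                     point_to_line_dist(nose, r_eye, r_mouth)) / face_w
    if nose_to_lr > 0.1:
        score += 5
    # (4) NoseToTB: smaller distance to the eye line and the mouth line,
    # normalized by the face frame height, must exceed 0.2
    nose_to_tb = min(point_to_line_dist(nose, l_eye, r_eye),
                     point_to_line_dist(nose, l_mouth, r_mouth)) / face_h
    if nose_to_tb > 0.2:
        score += 5
    return score  # 20 when all four criteria pass; 0 leads to deletion
```

A roughly frontal face (nose centered between eyes and mouth) passes all four checks and scores 20.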
Step five: illumination is taken as the third evaluation index, denoted ScoreLight. Illumination brightness is computed as the mean gray value, with brightness quantized into 256 levels from 0 to 255, where 0 is completely black and 255 is completely white. The per-pixel gray value Gray is computed as:
Gray=0.299Red+0.587Green+0.114Blue
To exclude over-dark and over-exposed images, a segmented score is used: ScoreLight is 3 in the intervals 0-40 and 220-255; 6 in the intervals 40-80 and 180-220; and 10 within 80-180.
Step six: the definition is taken as a fourth evaluation index and is expressed by Tenengrad, and the main idea is to investigate the field contrast of the image, namely the gradient difference of the gray features between adjacent pixels in a spatial domain; in the frequency domain, the frequency components of the image are considered, and the high-frequency components of the image which is clearly focused are more and the low-frequency components of the image which is blurred are more. The Tenengrad gradient method is mainly used, and the specific method is as follows:
assuming the Sobel convolution kernels are Gx and Gy, the gradient magnitude of image I at point (x, y) is:
G(x, y) = sqrt((Gx * I(x, y))^2 + (Gy * I(x, y))^2), where * denotes convolution
the Tenengrad value for this image is defined as:
Ten = (1/n) * sum over all (x, y) of G(x, y)^2
where n is the total number of pixels in the image. The thresholds in this embodiment are set to 20 and 40: a Tenengrad value between 20 and 40 gives a sharpness score of 5, and a value above 40 gives 10.
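A minimal pure-Python sketch of the Tenengrad measure (no image libraries; `img` is a 2-D list of gray values, and border pixels are skipped, which is a common but here assumed convention):

```python
# 3x3 Sobel kernels for horizontal (Gx) and vertical (Gy) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def tenengrad(img):
    """Mean squared Sobel gradient magnitude over interior pixels."""
    h, w = len(img), len(img[0])
    total, n = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            total += gx * gx + gy * gy
            n += 1
    return total / n if n else 0.0
```

A perfectly flat image yields 0, while any edge raises the value; in a real system a vectorized implementation (e.g. OpenCV's Sobel operator) would be used instead of this loop.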
Step seven: occlusion and expression are taken as the fifth and sixth evaluation indices, the face occlusion score ScoreCover and the expression score ScoreExp respectively. This part is trained with a deep learning model; in this embodiment a MobileFaceNets lightweight network is trained, with the WIDER FACE dataset selected as training data. Occlusion is divided into three levels: 0 - no occlusion, 1 - slight occlusion, 2 - severe occlusion; expression is divided into: 0 - normal, 1 - exaggerated.
Severe occlusion is scored 0, slight occlusion 10, and no occlusion 30; a normal expression gives a ScoreExp of 10 points and an exaggerated expression 5 points.
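The score tables above are simple lookups on the model's predicted labels; the table and function names below are our own:

```python
# Occlusion level -> ScoreCover: 0 = none, 1 = slight, 2 = severe.
OCCLUSION_SCORE = {0: 30, 1: 10, 2: 0}
# Expression label -> ScoreExp: 0 = normal, 1 = exaggerated.
EXPRESSION_SCORE = {0: 10, 1: 5}

def score_cover(level):
    return OCCLUSION_SCORE[level]

def score_exp(label):
    return EXPRESSION_SCORE[label]
```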
Step eight: after new face information appears in the images of step two, the first frame in which the face is detected is taken as the candidate for the optimization result, and the face quality scores of steps three to seven are computed in turn. If the composite score is below 50, the face is not taken as the optimal candidate face image until a frame of that face scores above 50. The face is tracked in subsequent frames with an existing tracking technique, such as KCF (Kernel Correlation Filter) target tracking, and steps three to seven are computed for the tracked face images of the same target; if the score in step three or step four is 0, no further computation is performed, the image is deleted directly, and the face quality of the next frame is evaluated. The composite score is computed as:
FaceScores=ScoreReso+ScorePose+ScoreLight+Tenengrad+ScoreCover+ScoreExp
Step nine: the timing for pushing the optimal result to back-end face recognition is set. The optimal result is pushed to face recognition in two cases: when the optimal face is updated, or when the optimal face of the same face has not been updated for 15 consecutive frames and its composite score FaceScores is greater than 50.
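One hedged reading of this push rule (the text leaves it ambiguous whether the score threshold also gates the "just updated" case; here it does not, since step eight already requires a candidate to score above 50 before it can become the optimal face):

```python
def should_push(best_updated, frames_since_update, face_scores,
                stable_frames=15, min_score=50):
    """Decide whether to push the current optimal face to recognition."""
    if best_updated:
        # Case 1: the optimal face was just replaced by a better frame.
        return True
    # Case 2: the optimal face has been stable for `stable_frames`
    # consecutive frames and its composite score clears the threshold.
    return frames_since_update >= stable_frames and face_scores > min_score
```

This keeps the recognition end responsive to improvements while re-pushing a stable, high-quality face at a bounded interval.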
The embodiment of the invention avoids the unstable results and low accuracy of current face recognition technology, in which many images enter recognition directly without face image quality optimization. By providing a good-quality face image to the recognition end after optimization, the cascading face image quality optimization method and system improve the accuracy and stability of recognition. The method adopts a cascade scheme together with a tracking scheme, in contrast to the parallel computing mode of the prior art: except for a face appearing for the first time, which requires the complete optimization step, unqualified images are screened out early in the cascade, accelerating optimization and reducing the occupation of memory resources. For occlusion and expression, two conditions rarely included in quality optimization in the prior art, a deep learning evaluation scheme is introduced, adding scoring items to the face quality optimization so that the evaluation is more comprehensive and the composite score more reliable. A push mechanism is also set, pushing the preferred face to the recognition end at a certain interval or whenever the preferred result is updated, which improves recognition stability and shortens response time.
Referring to fig. 4, in another aspect, the present invention further provides a cascading face image optimization system, including:
a cascade detection module: acquiring an image sequence, and performing face detection and coordinate estimation through a multitask cascade convolution neural network;
a resolution filtering module: calculating the resolution score of the image, and filtering the pictures with the resolution lower than a threshold value; the process of calculating the image resolution score includes:
the size of the detected face frame is the product of the width of the face frame area and the height of the face frame area;
the face frame is divided into a plurality of grades according to its size; the larger the face frame, the higher the resolution score of that grade.
The gesture filtering module: calculating the score of the human face posture, and filtering the picture with the too skewed human face posture; the calculation process of the face pose score comprises the following steps:
acquiring point coordinates of the nose, left eye, right eye, left mouth corner and right mouth corner of the face through a multi-task cascaded convolutional neural network;
obtaining a predetermined score when the abscissa of the nose is greater than the left eye abscissa and less than the right eye abscissa;
obtaining a predetermined score when the ordinate of the nose is greater than the ordinate of the eyes and less than the ordinate of the mouth;
the minimum distance from the nose to the line connecting the left eye and the left mouth corner is distance one, the minimum distance from the nose to the line connecting the right eye and the right mouth corner is distance two, the smaller of distance one and distance two is divided by the width of the face rectangular frame, and a predetermined score is obtained when the ratio is greater than a threshold;
the minimum distance from the nose to the line connecting the two eye points is distance three, the minimum distance from the nose to the line connecting the two mouth corners is distance four, the smaller of distance three and distance four is divided by the height of the face rectangular frame, and a predetermined score is obtained when the ratio is greater than a threshold.
A calculation module: calculating a face shielding score, an expression score, a definition score and an illumination intensity score of the image respectively;
An optimal face module: a composite score is computed for each picture from the resolution score, face pose score, face occlusion score, expression score, sharpness score and illumination intensity score, and the picture with the highest composite score is the optimal face. The sharpness score is calculated by the Tenengrad gradient algorithm; the face occlusion score and the expression score are evaluated with a deep learning model; the illumination intensity score is calculated from the mean gray value of the picture, and the score for a mean within the threshold range is higher than for one outside it.
A pushing module: the face image is tracked and the optimal face is pushed; when the optimal face is updated, or when the optimal face of the same face has not been updated for a preset number of consecutive frames and its composite score is greater than the preset score, the optimal face is pushed to the recognition module. When a face appears in a picture of the image sequence, it is tracked by a KCF target tracking algorithm, and each face picture is filtered.
In still another aspect, the present invention provides a cascaded face image optimization apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the cascaded face image optimization method when executing the computer program. Referring to fig. 4, the method includes:
s10: acquiring a video image, and performing face detection and coordinate estimation through a multitask cascade convolution neural network;
s20: calculating the resolution score of the image, and filtering the pictures with the resolution lower than a threshold value;
s30: calculating the score of the human face posture, and filtering the picture with the too skewed human face posture;
s40: calculating a face shielding score, an expression score, a definition score and an illumination intensity score of the image respectively;
s50: performing weighted calculation on the resolution score, the face posture score, the face shielding score, the expression score, the definition score and the illumination intensity score, and taking the picture with the highest total score as an optimal face;
s60: and carrying out face tracking on the face image in the video, and pushing and updating the optimal face.
In yet another aspect, the present invention provides a readable storage medium for a cascaded face image optimization, on which a computer program is stored, the computer program being executed by a processor to implement the steps of the cascaded face image optimization method described above. Referring to fig. 4, the method includes:
s10: acquiring a video image, and performing face detection and coordinate estimation through a multitask cascade convolution neural network;
s20: calculating the resolution score of the image, and filtering the pictures with the resolution lower than a threshold value;
s30: calculating the score of the human face posture, and filtering the picture with the too skewed human face posture;
s40: calculating a face shielding score, an expression score, a definition score and an illumination intensity score of the image respectively;
s50: performing weighted calculation on the resolution score, the face posture score, the face shielding score, the expression score, the definition score and the illumination intensity score, and taking the picture with the highest total score as an optimal face;
s60: and carrying out face tracking on the face image in the video, and pushing and updating the optimal face.
For pose judgment, the deep learning method MTCNN is adopted as the basis of face detection. MTCNN detection simultaneously outputs the five-point coordinates of the facial features, and the positional relationships among them are used for face pose evaluation and rotation angle calculation; this yields more accurate results with relatively little computation and high processing speed.
For judging occlusion and expression, traditional image processing methods generally cannot obtain reliable results, so a deep learning method is adopted: a large amount of training data is used to train a multi-attribute model which, given an input face image, outputs the occlusion condition and expression information of the target.
For continuous multi-frame recognition, a tracking strategy is adopted: when a new face is detected at some frame position, the composite image-quality score of that face is computed over consecutive frames; with tracking added and an optimal interval and score threshold set, the frame with the best face quality in the sequence is pushed for face recognition once the threshold condition is met, shortening the recognition response time and improving recognition accuracy.
The invention has the following beneficial effects: a cascade mode is adopted in which multiple factors such as resolution, pose, illumination, sharpness, occlusion and expression are considered in order, with the computation proceeding from easy to difficult. When the judgment at one level fails, the face image information is deleted directly and the evaluation indices of the next stage are not computed, greatly reducing intermediate computation.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. Various changes, modifications, substitutions and alterations that those skilled in the art can make to these embodiments without departing from the principle and spirit of the invention still fall within the scope of protection of the invention.

Claims (10)

1. A cascaded face image optimization method, characterized by comprising the following steps:
acquiring a video image, and performing face detection and coordinate estimation through a multitask cascaded convolutional neural network;
calculating the resolution score of the image, and filtering out pictures whose resolution is below a threshold;
calculating the face pose score, and filtering out pictures in which the face pose is excessively skewed;
calculating the face occlusion score, expression score, sharpness score and illumination intensity score of the image respectively;
computing a weighted sum of the resolution score, the face pose score, the face occlusion score, the expression score, the sharpness score and the illumination intensity score, and taking the picture with the highest total score as the optimal face;
performing face tracking on the face images in the video, and pushing and updating the optimal face.
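The weighted-sum step of claim 1 can be sketched as follows. The weight values are illustrative assumptions; the patent does not publish concrete weights, and each per-factor score is assumed normalized to [0, 1].

```python
# Illustrative weights for the six quality factors (assumed, not from the patent).
WEIGHTS = {
    "resolution": 0.15, "pose": 0.25, "occlusion": 0.20,
    "expression": 0.10, "sharpness": 0.20, "illumination": 0.10,
}

def total_score(scores):
    """Weighted sum of the six per-factor scores, each assumed in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def best_face(candidates):
    """Pick the candidate frame with the highest total score as the optimal face."""
    return max(candidates, key=total_score)

frames = [
    {"resolution": 0.9, "pose": 0.4, "occlusion": 0.8,
     "expression": 0.9, "sharpness": 0.5, "illumination": 0.7},
    {"resolution": 0.9, "pose": 0.9, "occlusion": 0.9,
     "expression": 0.9, "sharpness": 0.8, "illumination": 0.8},
]
best = best_face(frames)
```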
2. The cascaded face image optimization method according to claim 1, wherein the process of calculating the face pose score comprises the following steps:
acquiring the point coordinates of the nose, the left eye, the right eye, the left mouth corner and the right mouth corner of the face through the multitask cascaded convolutional neural network;
obtaining a predetermined score when the abscissa of the nose is greater than the abscissa of the left eye and less than the abscissa of the right eye;
obtaining a predetermined score when the ordinate of the nose is greater than the ordinate of the eyes and less than the ordinate of the mouth;
taking the minimum distance from the nose to the line connecting the left eye and the left mouth corner as the first distance, and the minimum distance from the nose to the line connecting the right eye and the right mouth corner as the second distance; dividing the smaller of the first and second distances by the width of the face bounding rectangle, and obtaining a predetermined score when the ratio is greater than a threshold;
taking the minimum distance from the nose to the line connecting the left eye and the right eye as the third distance, and the minimum distance from the nose to the line connecting the left mouth corner and the right mouth corner as the fourth distance; dividing the smaller of the third and fourth distances by the height of the face bounding rectangle, and obtaining a predetermined score when the ratio is greater than a threshold.
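The four pose checks of claim 2 can be sketched directly from the five landmarks. The per-check score of 25 and the ratio threshold are illustrative assumptions; the claim only says "a predetermined score" and "a threshold".

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    den = math.hypot(by - ay, bx - ax)
    return num / den

def pose_score(nose, l_eye, r_eye, l_mouth, r_mouth, box_w, box_h,
               ratio_thresh=0.1):
    score = 0
    # Check 1: nose abscissa lies between the two eyes.
    if l_eye[0] < nose[0] < r_eye[0]:
        score += 25
    # Check 2: nose ordinate lies between eye level and mouth level.
    if max(l_eye[1], r_eye[1]) < nose[1] < min(l_mouth[1], r_mouth[1]):
        score += 25
    # Check 3: nose is not too close to either side of the face.
    d1 = point_line_distance(nose, l_eye, l_mouth)   # first distance
    d2 = point_line_distance(nose, r_eye, r_mouth)   # second distance
    if min(d1, d2) / box_w > ratio_thresh:
        score += 25
    # Check 4: nose is not too close to the eye line or the mouth line.
    d3 = point_line_distance(nose, l_eye, r_eye)     # third distance
    d4 = point_line_distance(nose, l_mouth, r_mouth) # fourth distance
    if min(d3, d4) / box_h > ratio_thresh:
        score += 25
    return score

# A roughly frontal face should pass all four checks.
s = pose_score(nose=(50, 55), l_eye=(30, 40), r_eye=(70, 40),
               l_mouth=(35, 75), r_mouth=(65, 75), box_w=80, box_h=100)
```

A skewed face fails checks 1 and 3 first, which is why these cheap geometric tests can filter profile views before any heavier model runs.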
3. The cascaded face image optimization method according to claim 1, wherein: when a face appears in the pictures of the video, the face is tracked with the KCF target-tracking algorithm and every face picture is filtered, the filtering comprising filtering out pictures whose resolution is below a threshold and pictures in which the face pose is excessively skewed.
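KCF itself is available in OpenCV's contrib tracking module (`cv2.TrackerKCF_create`); as a library-free stand-in, the sketch below uses simple bounding-box IoU association to illustrate the track-maintenance logic that any such tracker supports. Boxes are `(x, y, w, h)`; the IoU threshold is an assumption.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Match each detection to the best-overlapping track, else open a new track."""
    assigned = {}
    next_id = max(tracks, default=-1) + 1
    for det in detections:
        best_id, best_iou = None, iou_thresh
        for tid, box in tracks.items():
            overlap = iou(box, det)
            if overlap > best_iou:
                best_id, best_iou = tid, overlap
        if best_id is None:               # no existing track overlaps enough
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = det             # update the track with the new box
        assigned[best_id] = det
    return assigned

tracks = {0: (10, 10, 40, 40)}
out = associate(tracks, [(12, 11, 40, 40), (200, 200, 30, 30)])
```

Here the first detection continues track 0, while the distant second detection opens a new track 1; each track then accumulates its own quality scores per claim 1.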
4. The cascaded face image optimization method according to claim 1, wherein: when the optimal face is updated, or when the optimal face of the same face has not been updated for a preset number of consecutive frames and its total score is greater than a preset score, the optimal face is pushed to a face recognition module.
5. The cascaded face image optimization method according to any one of claims 1 to 4, wherein: the sharpness score is calculated by the Tenengrad gradient algorithm.
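The Tenengrad measure named in claim 5 is the mean squared Sobel gradient magnitude of the grayscale image. Below is a minimal NumPy-only sketch (no OpenCV dependency); real pipelines would typically call `cv2.Sobel` instead.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def _filter2d(img, kernel):
    """Valid-mode 2-D cross-correlation via a sliding-window view."""
    h, w = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(img, (h, w))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def tenengrad(gray):
    """Mean squared Sobel gradient magnitude: higher means sharper."""
    gx = _filter2d(gray.astype(float), SOBEL_X)
    gy = _filter2d(gray.astype(float), SOBEL_Y)
    return float(np.mean(gx ** 2 + gy ** 2))

# A sharp step edge scores far higher than a flat patch.
flat = np.full((16, 16), 128.0)
edge = np.hstack([np.zeros((16, 8)), np.full((16, 8), 255.0)])
```

Because the Sobel kernels are symmetric under the sign pattern used, cross-correlation and true convolution give the same squared magnitude here.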
6. The cascaded face image optimization method according to any one of claims 1 to 4, wherein: the face occlusion score and the expression score are evaluated with a deep-learning model.
7. The cascaded face image optimization method according to any one of claims 1 to 4, wherein: the illumination intensity score is calculated from the mean gray value of the picture, and a mean gray value within the threshold range yields a higher illumination intensity score than one outside the range.
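Claim 7's illumination rule can be sketched in a few lines. The gray-value range and the two score levels are illustrative assumptions; the claim only requires in-range means to score higher than out-of-range means.

```python
def illumination_score(gray_pixels, lo=60, hi=190):
    """Score the mean gray value: high inside [lo, hi], low when too dark/bright."""
    mean = sum(gray_pixels) / len(gray_pixels)
    return 1.0 if lo <= mean <= hi else 0.3

well_lit = illumination_score([120] * 100)   # mean 120 -> inside the range
too_dark = illumination_score([15] * 100)    # mean 15  -> outside the range
```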
8. A cascaded face image optimization system, characterized by comprising:
a cascade detection module: acquiring a video image, and performing face detection and coordinate estimation through a multitask cascaded convolutional neural network;
a resolution filtering module: calculating the resolution score of the image, and filtering out pictures whose resolution is below a threshold;
a pose filtering module: calculating the face pose score, and filtering out pictures in which the face pose is excessively skewed;
a calculation module: calculating the face occlusion score, expression score, sharpness score and illumination intensity score of the image respectively;
an optimal face module: computing a weighted sum of the resolution score, the face pose score, the face occlusion score, the expression score, the sharpness score and the illumination intensity score, and taking the picture with the highest total score as the optimal face;
a pushing module: performing face tracking on the face images in the video, and pushing and updating the optimal face.
9. A cascaded face image optimization apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein: the processor, when executing the program, implements the steps of the cascaded face image optimization method of any one of claims 1 to 7.
10. A readable storage medium having a computer program stored thereon, wherein: the computer program, when executed by a processor, implements the steps of the cascaded face image optimization method of any one of claims 1 to 7.
CN201911387764.XA 2019-12-30 2019-12-30 Cascading type face image optimization method, system and equipment and readable storage medium Pending CN111241927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911387764.XA CN111241927A (en) 2019-12-30 2019-12-30 Cascading type face image optimization method, system and equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN111241927A true CN111241927A (en) 2020-06-05

Family

ID=70872239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911387764.XA Pending CN111241927A (en) 2019-12-30 2019-12-30 Cascading type face image optimization method, system and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111241927A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894257A (en) * 2010-07-02 2010-11-24 西安理工大学 Method for evaluating quality of multi-scale gradual iris image
CN106446851A (en) * 2016-09-30 2017-02-22 厦门大图智能科技有限公司 Visible light based human face optimal selection method and system
CN108960047A (en) * 2018-05-22 2018-12-07 中国计量大学 Face De-weight method in video monitoring based on the secondary tree of depth
CN109657587A (en) * 2018-12-10 2019-04-19 南京甄视智能科技有限公司 Side face method for evaluating quality and system for recognition of face
WO2019091271A1 (en) * 2017-11-13 2019-05-16 苏州科达科技股份有限公司 Human face detection method and human face detection system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021197186A (en) * 2020-06-17 2021-12-27 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Image processing method, apparatus, electronic device and readable storage medium
CN111814840A (en) * 2020-06-17 2020-10-23 恒睿(重庆)人工智能技术研究院有限公司 Method, system, equipment and medium for evaluating quality of face image
US11783565B2 (en) 2020-06-17 2023-10-10 Beijing Baidu Netcom Science And Technology Co., Ltd. Image processing method, electronic device and readable storage medium for maintaining a face image buffer queue
CN111986163A (en) * 2020-07-29 2020-11-24 深思考人工智能科技(上海)有限公司 Face image selection method and device
CN111738243A (en) * 2020-08-25 2020-10-02 腾讯科技(深圳)有限公司 Method, device and equipment for selecting face image and storage medium
CN112215156B (en) * 2020-10-13 2022-10-14 北京中电兴发科技有限公司 Face snapshot method and system in video monitoring
CN112215156A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Face snapshot method and system in video monitoring
CN112651321A (en) * 2020-12-21 2021-04-13 浙江商汤科技开发有限公司 File processing method and device and server
CN113158784A (en) * 2021-03-10 2021-07-23 苏州臻迪智能科技有限公司 Face recognition method with improved recognition accuracy and unmanned aerial vehicle
CN112686229A (en) * 2021-03-12 2021-04-20 深圳市安软科技股份有限公司 Data pushing method and device, electronic equipment and storage medium
WO2022247118A1 (en) * 2021-05-24 2022-12-01 深圳市优必选科技股份有限公司 Pushing method, pushing apparatus and electronic device
CN113326775A (en) * 2021-05-31 2021-08-31 Oppo广东移动通信有限公司 Image processing method and device, terminal and readable storage medium
CN113326775B (en) * 2021-05-31 2023-12-29 Oppo广东移动通信有限公司 Image processing method and device, terminal and readable storage medium
CN115810214A (en) * 2023-02-06 2023-03-17 广州市森锐科技股份有限公司 Verification management method, system, equipment and storage medium based on AI face recognition

Similar Documents

Publication Publication Date Title
CN111241927A (en) Cascading type face image optimization method, system and equipment and readable storage medium
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN109409366B (en) Distorted image correction method and device based on angular point detection
CN112308095A (en) Picture preprocessing and model training method and device, server and storage medium
CN106991689B (en) Target tracking method based on FHOG and color characteristics and GPU acceleration
CN109657612B (en) Quality sorting system based on facial image features and application method thereof
CN109711268B (en) Face image screening method and device
CN111144337B (en) Fire detection method and device and terminal equipment
CN111160284A (en) Method, system, equipment and storage medium for evaluating quality of face photo
CN113449606B (en) Target object identification method and device, computer equipment and storage medium
CN109086724A (en) A kind of method for detecting human face and storage medium of acceleration
CN109685045A (en) A kind of Moving Targets Based on Video Streams tracking and system
CN105139417A (en) Method for real-time multi-target tracking under video surveillance
CN111368717A (en) Sight line determining method and device, electronic equipment and computer readable storage medium
CN110008793A (en) Face identification method, device and equipment
CN112016469A (en) Image processing method and device, terminal and readable storage medium
CN106600613B (en) Improvement LBP infrared target detection method based on embedded gpu
CN109472770B (en) Method for quickly matching image characteristic points in printed circuit board detection
CN109961016A (en) The accurate dividing method of more gestures towards Intelligent household scene
CN110378893A (en) Image quality evaluating method, device and electronic equipment
CN111062331A (en) Mosaic detection method and device for image, electronic equipment and storage medium
Katramados et al. Real-time visual saliency by division of gaussians
CN114674826A (en) Visual detection method and detection system based on cloth
CN111414938B (en) Target detection method for bubbles in plate heat exchanger
CN109766860B (en) Face detection method based on improved Adaboost algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination