CN112115835A - Face key point-based certificate photo local anomaly detection method - Google Patents


Info

Publication number
CN112115835A
CN112115835A (application CN202010952760.8A)
Authority
CN
China
Prior art keywords
face
mouth
key points
nose
photo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010952760.8A
Other languages
Chinese (zh)
Inventor
王蒙 (Wang Meng)
杨飞燕 (Yang Feiyan)
宁宏维 (Ning Hongwei)
文涛 (Wen Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202010952760.8A priority Critical patent/CN112115835A/en
Publication of CN112115835A publication Critical patent/CN112115835A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

The invention provides a method for detecting local anomalies in certificate photos based on face key points, belonging to the technical field of artificial intelligence (AI) image detection. The method applies a face detection algorithm to the picture to judge whether a face is present, extracts the face region as input, obtains the facial parts through classification, and trains regression models to obtain 5 key points: left eye center, right eye center, nose tip, left mouth corner and right mouth corner. A MobileNet-fitted regression function performs occlusion detection, explicitly predicting the occlusion state and occlusion anomaly type of the face key points; when the model output exceeds a threshold, the point is abnormal. Finally, geometric features are constructed from the 5 key points, and judgment rules determine whether the head posture of the portrait is abnormal and the type of abnormal deflection. The method automatically audits certificate photos for no-face, non-frontal, eye occlusion, mouth occlusion and nose occlusion anomalies, improving audit efficiency and reducing labor and material costs.

Description

Face key point-based certificate photo local anomaly detection method
Technical Field
The invention relates to the technical field of artificial intelligence (AI) image detection and classification, and in particular to a method for detecting local anomalies in certificate photos based on face key points.
Background
The certificate photo is one of the basic pieces of information for personal identification and is used constantly in daily life: it is embedded in certificate documents such as identity cards, driving licenses and passports, or attached to application forms for admission, employment, examinations and the like, to verify the uniqueness of user data, and it is widely used across social life. Nowadays many occasions require detection and verification of identity information, and the face contour and facial organs have become an important basis for identifying a person. In face comparison, the face region and the facial organs within it (eyes, mouth, nose, etc.) must first be detected accurately and then compared with the acquired face information, so the face and its organs are important both for human visual perception and for machine recognition. The basic requirements of a certificate photo include a bare head (no hat) and a face free of abnormal occlusion; however, owing to limitations of the capture equipment, to carelessness or deliberate action by the operator, and to the differing standards required in different use occasions, the shooting effect varies from person to person, and the conformity and quality of the photos vary accordingly. Relying solely on professionals for manual inspection would undoubtedly waste enormous manpower and material resources.
With the advancement of Internet and government-service systems, combining artificial intelligence with Internet technology to make government services intelligent and informatized improves their process efficiency and service quality, and is of great significance to building a service-oriented government. At present, classification methods such as quality detection and bare-head (no-hat) detection of certificate photos exist, but local anomaly detection and pose estimation for the facial organs of certificate photos have rarely been studied. Therefore, how to use image detection and classification techniques to achieve automatic anomaly detection of user certificate photos, so as to improve audit efficiency and reduce manpower and material costs, has become a technology of general concern to practitioners.
Disclosure of Invention
In view of the existing problems, the invention provides a face key point-based certificate photo anomaly detection method that can automatically identify, check and upload photos, saving the auditing and waiting time of counter staff and improving the efficiency of business handling; it can further detect whether the portrait photo taken by a user meets the basic requirements of a certificate photo.
In order to achieve the above object, the invention provides a method for detecting local anomalies in certificate photos based on face key points, comprising the following steps:
step one, for the photo uploaded by the user, adopting an Adaboost method based on Haar features to quickly check whether a face is detected; if so, outputting the face image as the input of the next step; otherwise, outputting the detection result: unqualified certificate photo, reason: no face detected;
step two, extracting the face image, detecting the face contour and extracting candidate windows from the contour, training a multi-class classification model to classify the facial parts, and training regression models for four modes (left eye and right eye; nose; mouth and left mouth corner; mouth and right mouth corner) to obtain the 5 key points: left eye center, right eye center, nose tip, left mouth corner and right mouth corner;
step three, using MobileNet to fit a regression function for occlusion detection: the model outputs a $D_l$-dimensional floating-point vector that is converted by thresholding into a $D_l$-dimensional {0, 1} Boolean vector; when the model output exceeds the threshold, the occlusion state of the key point is 1 (the point is abnormal), which explicitly detects the occlusion state of the face key points and determines the type of face occlusion anomaly (eye/mouth/nose), enabling automatic auditing of eye, mouth and nose occlusion anomalies; then constructing geometric features from the 5 key points and judging, according to the judgment rules, whether the portrait in the certificate photo has an upright posture (head upright, eyes looking straight ahead); if not, outputting the detection result: unqualified certificate photo, reason: posture skewed left/right, rotationally skewed, head lowered or head raised;
step four, summarizing and outputting the auditing result.
Compared with the prior art, the face key point-based certificate photo anomaly detection method has the following outstanding beneficial effects: by combining face detection, key point localization and pose estimation, it realizes automatic auditing of certificate photos for no-face, non-frontal, eye occlusion, mouth occlusion and nose occlusion anomalies, saves the auditing time of counter staff, improves their working efficiency, improves the user experience, and has good value for popularization and application.
Drawings
FIG. 1 is a flow chart of the feature point-based certificate photo anomaly detection method according to the present invention;
FIG. 2 is a flow chart of the key point localization used in step two to determine whether each point is abnormal;
FIG. 3 shows the network structure of the multi-class classifier in step two;
FIG. 4 is a schematic diagram of the regression model network structure in step two;
FIG. 5 is a flow chart of the key point occlusion anomaly module in step three;
FIG. 6 shows the normal/abnormal determination of the key points and the constructed geometric features in step three;
FIG. 7 shows the head posture judgment rule in step three.
Detailed Description
The feature point-based certificate photo anomaly detection method of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Example 1: as shown in fig. 1, the feature point-based certificate photo anomaly detection method of the present invention comprises:
step one, for the photo uploaded by the user, adopting an Adaboost method based on Haar features to quickly detect whether the photo contains a face; if so, obtaining the face localization region as the input of the next step; otherwise, outputting the detection result: unqualified certificate photo, reason: no face detected;
step two, extracting the face region, detecting the face contour and extracting candidate windows from the contour, training a multi-class classification model to classify the facial parts, training regression models for four modes (left eye and right eye; nose; mouth and left mouth corner; mouth and right mouth corner) to obtain the 5 key points (left eye center, right eye center, nose tip, left mouth corner and right mouth corner), and continuing to the next step;
step three, completing occlusion detection with MobileNet and predicting the occlusion state of the face key points; when the model output exceeds the threshold, the occlusion state of the key point is 1 (the point is abnormal), and the detection result is output: unqualified certificate photo, reason: face occlusion anomaly (eye/mouth/nose occlusion); geometric features are then constructed from the 5 key points, and the judgment rules determine whether the portrait in the certificate photo has an upright posture (head upright, eyes looking straight ahead); if not, the detection result is output: unqualified certificate photo, reason: posture skewed left/right, rotationally skewed, head lowered or head raised;
step four, summarizing and outputting the audit result. The overall verdict is the union of the audit results of the preceding steps: if the certificate photo passes every check, the output is "certificate photo qualified"; otherwise the output is "certificate photo unqualified" together with the reasons. A minimal sketch of this aggregation is given below.
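As an illustration only, the following Python sketch shows how the four steps could be chained; detect_face and pose_anomalies refer to sketches given later in this description, while locate_keypoints and occlusion_reasons are hypothetical stand-ins for the step-two and step-three modules.

```python
# A sketch of the step-four aggregation: the verdict is the union of the
# reasons produced by the individual checks. locate_keypoints and
# occlusion_reasons are hypothetical helper names, not part of the patent.
def audit_certificate_photo(photo_path: str) -> str:
    face, reason = detect_face(photo_path)          # step one
    if face is None:
        return reason
    keypoints = locate_keypoints(face)              # step two (B1-B3): A,B,O,L,R
    reasons = occlusion_reasons(face, keypoints)    # step three, C1-C2
    reasons += pose_anomalies(*keypoints)           # step three, C3
    if reasons:
        return "unqualified certificate photo, reason: " + "; ".join(reasons)
    return "certificate photo qualified"
```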
As a further improved technical solution of the invention, the AdaBoost algorithm selects Haar-like features that match the human face from a large feature pool, constructs weak classifiers from these features, iteratively selects the weak classifiers with the best classification performance, and combines them with different weights into a final strong classifier; a cascade strategy verifies candidates layer by layer, so that a large number of non-face samples can be rejected early. Each weak classifier is $G_m(t): t \to \{-1, +1\}$, and the strong classifier is a weighted combination of weak classifiers; the final classifier is

$$G(t) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(t)\right)$$

where $m = 1, 2, \ldots, M$ and $s = 1, 2, \ldots, S$; $M$ is the number of boosting iterations, $S$ is the number of rectangles composing a Haar-like feature, and $\alpha_m$ represents the importance of $G_m(t)$ in the final classifier. The algorithm detects whether the photo uploaded by the user contains a face; if so, the detected face image is output as the input of the next step; otherwise, the detection result is output: unqualified certificate photo, reason: no face detected.
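As a minimal illustration of this step, the sketch below uses OpenCV's stock Haar-cascade AdaBoost detector; the patent trains its own cascade, so the cascade file and detection parameters here are assumptions.

```python
# A minimal sketch of the step-one face check using OpenCV's bundled
# Haar cascade (an assumption; the patent trains its own classifier).
import cv2

def detect_face(photo_path: str):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread(photo_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None, "unqualified certificate photo, reason: no face detected"
    x, y, w, h = faces[0]               # take the first detection
    return img[y:y + h, x:x + w], None  # the face crop feeds the next step
```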
As shown in fig. 2, in step two the key points of the face image (left eye center, right eye center, nose tip, left mouth corner and right mouth corner) are located; as a further improved technical solution, this specifically comprises the following steps:
b1, carrying out face contour detection on the photo and extracting a candidate window according to the contour;
inputting human face images of length1, length2, scalew and scaleh, converting the images into gray level images and the size of 300 multiplied by 300, and carrying out accurate detection and contour searching; length1 and length2 indicate contour lengths, scalew and scaleh are empirical values for magnifying a window, and length1 is set to 30, length2 is set to 400, scalew is set to 30, scaleh is set to 20.
And (3) circulating each contour, only selecting contour lines with the length being more than length1 and less than length2, obtaining leftmost points (xmin, y1), rightmost points (xmax, y2), uppermost points (x1, ymin) and lowermost points (x2, ymax), and outputting candidate windows (xmin-scale, ymin-scale, xmax-xmin +2 × scale, ymax-ymin +2 × scale).
Extracting a candidate window capable of reflecting the face part of the human face by using the algorithm; when the overlapping proportion is more than 50%, partial windows are removed, and 5 points of the face, namely the left eye center, the right eye center, the nose tip, the left mouth corner and the right mouth corner, are accurately detected.
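A sketch of this candidate-window extraction under the stated parameters follows; the patent does not name the edge detector, so the Canny call is an assumption.

```python
# A sketch of step B1: contour-length filtering plus window enlargement.
# Output coordinates may need clamping to the image bounds.
import cv2

LENGTH1, LENGTH2, SCALEW, SCALEH = 30, 400, 30, 20

def candidate_windows(face_img):
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (300, 300))
    edges = cv2.Canny(gray, 100, 200)              # assumed edge detector
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)
    windows = []
    for c in contours:
        length = cv2.arcLength(c, closed=False)
        if not (LENGTH1 < length < LENGTH2):       # keep mid-length contours
            continue
        x, y, w, h = cv2.boundingRect(c)           # extreme points of contour
        windows.append((x - SCALEW, y - SCALEH,
                        w + 2 * SCALEW, h + 2 * SCALEH))
    return windows
```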
Step B2, training a multi-class classification model and obtaining the facial part classification:
Four classes are defined: left and right eyes; nose; mouth with left and right mouth corners; and negative samples. The learning objective is expressed as a four-class classification problem using the cross-entropy loss:

$$L = -\log \frac{e^{z_y}}{\sum_{j=1}^{J} e^{z_j}}$$

where $z_y$ and $z_j$ denote the prediction scores for the labeled class $y$ and for class $j$ respectively, and $J$ denotes the number of classes; here $J = 4$.
After classification, the left and right eyes must be separated and the left and right mouth corners distinguished; the separation uses the positions and overlap rate of the facial parts. With the overlap threshold set to 50%, the leftmost and rightmost parts are first picked out; if their overlap is less than 50%, the leftmost part is taken as the left eye (left mouth corner) and the rightmost as the right eye (right mouth corner).
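The separation rule can be sketched as follows; the (x, y, w, h) box format and the IoU helper are assumptions.

```python
# A sketch of the left/right separation rule: sort same-organ candidate
# boxes by x, then accept the extremes if they overlap by less than 50%.
def iou(a, b):
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def split_left_right(parts):
    """parts: boxes classified as the same organ (eyes, or mouth corners)."""
    parts = sorted(parts, key=lambda b: b[0])  # sort by x-coordinate
    left, right = parts[0], parts[-1]
    if iou(left, right) < 0.5:                 # overlap below 50%
        return left, right                     # leftmost = left organ
    return None                                # ambiguous: treat as one part
```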
Data set: 5000 pictures are selected from the CelebA, COFW and MTFL databases, plus 20,000 collected certificate photos (with manually labeled coordinates), to cover faces of different shapes. Each sample provides the coordinate information of the 5 key points (left eye center, right eye center, nose tip, left mouth corner, right mouth corner), from which each facial part is easily obtained; negative samples are cropped from other parts of the face and from non-face images.
Multi-class classification model network structure: the neural network comprises 4 convolutional layers and 2 fully connected layers and takes 39 × 39 pictures as input. Convolutional layers C1, C2, C3 and C4 use 4 × 4, 3 × 3, 4 × 4 and 3 × 3 convolution kernels respectively, each convolution followed by 2 × 2 max-pooling; the resulting 2 × 2 × 80 feature map is fed into the fully connected layers, and the trained network is the multi-class classifier, whose structure is shown in figure 3.
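A sketch of this classifier in Keras is given below; the channel widths, the hidden-layer size and the "same" padding on C4 are assumptions chosen so that the spatial sizes come out as described (ending in a 2 × 2 × 80 map).

```python
# A sketch of the Figure-3 classifier: 39x39 input, kernels 4/3/4/3 with
# 2x2 max-pooling, cross-entropy over J = 4 classes.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_part_classifier() -> tf.keras.Model:
    m = models.Sequential([
        layers.Input((39, 39, 1)),
        layers.Conv2D(20, 4, activation="relu"),                 # C1 -> 36x36
        layers.MaxPooling2D(2),                                  # 18x18
        layers.Conv2D(40, 3, activation="relu"),                 # C2 -> 16x16
        layers.MaxPooling2D(2),                                  # 8x8
        layers.Conv2D(60, 4, activation="relu"),                 # C3 -> 5x5
        layers.MaxPooling2D(2),                                  # 2x2
        layers.Conv2D(80, 3, padding="same", activation="relu"), # C4 -> 2x2x80
        layers.Flatten(),
        layers.Dense(120, activation="relu"),
        layers.Dense(4),                                         # J = 4 classes
    ])
    m.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True))                             # cross-entropy
    return m
```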
Step B3, training regression models for the four modes to obtain the key points:
The four modes are: left eye and right eye; nose; mouth and left mouth corner; mouth and right mouth corner. A CNN model is trained to locate the key points; the input to the CNN is a small local region rather than the whole face. The learning objective is formulated as a regression problem using the Euclidean loss:

$$L = \frac{1}{2I} \sum_{i=1}^{I} \left\| \hat{y}_i - y_i \right\|_2^2$$

where $\hat{y}_i$ denotes the facial landmark coordinates predicted by the network, $y_i$ the ground-truth coordinates, and $I$ the number of samples; one model is trained for each key point.
Regression model network structure: the neural network comprises 2 convolutional layers and 2 fully connected layers and takes a 15 × 15 picture as input. Convolutional layers C1 and C2 use 4 × 4 and 3 × 3 convolution kernels respectively, each convolution followed by 2 × 2 max-pooling; the resulting 2 × 2 × 40 feature map is fed into the fully connected layers, and the trained network outputs the key point, whose structure is shown in figure 4.
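A matching sketch of the landmark regressor follows; the channel widths and hidden-layer size are again assumptions, while the kernel sizes, pooling and depth follow the text.

```python
# A sketch of the Figure-4 landmark regressor: 15x15 patch in,
# one (x, y) coordinate out, trained with the Euclidean loss.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_landmark_regressor() -> tf.keras.Model:
    m = models.Sequential([
        layers.Input((15, 15, 1)),
        layers.Conv2D(20, 4, activation="relu"),  # C1: 4x4 -> 12x12
        layers.MaxPooling2D(2),                   # 6x6
        layers.Conv2D(40, 3, activation="relu"),  # C2: 3x3 -> 4x4
        layers.MaxPooling2D(2),                   # 2x2x40
        layers.Flatten(),
        layers.Dense(60, activation="relu"),
        layers.Dense(2),                          # predicted (x, y)
    ])
    m.compile(optimizer="adam", loss="mse")       # Euclidean loss
    return m
```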
As a further improved technical solution of the present invention, step three, as shown in fig. 5, performs occlusion detection with a MobileNet-fitted regression function, constructs geometric features from the position information of the 5 key points, and judges whether the photo is abnormal from the output key point information and the thresholds; it specifically comprises the following steps:
and step C1, introducing an occlusion detection regression model, wherein the occlusion detection is the occlusion state of the predicted face mark. Utilizing MobileNet to fit a regression function to carry out shielding detection, setting a width multiplier alpha to be 1.0 and a resolution multiplier rho to be 1.0 for hyper-parameters, and dividing a two-dimensional floating point vector into D by using a certain threshold valuelDimension 0,1 occlusion state.
Training data set
Figure BDA0002677574370000055
Including a face image XnAnd corresponding DlDimension {0, 1} occlusion State
Figure BDA0002677574370000056
DlIs the number of keypoints; the regression model achieves X by optimizing the following objectivesnMapping to Hn:
Figure BDA0002677574370000061
Wherein
Figure BDA0002677574370000062
Is a feature extraction function and f is an objective regression function. For the
Figure BDA0002677574370000063
The whitened image is used as a feed feature for the prediction model.
The model outputs an L-dimensional floating vector with each element at [0, 1]]Within the interval, the probability of an occluded landmark is represented. After a certain threshold value division, the vector is converted into DlVitamin [0, 1]]Boolean vector, i.e. the predicted occlusion state of the output face region. The threshold is set to 0.12, and when the model output in the positioning is greater than 0.12, the occlusion state of the landmark is 1.
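A sketch of this occlusion regressor is given below; the 224 × 224 input size, the pooling head and the training loss are assumptions, while the MobileNet backbone with α = 1.0 and the 0.12 threshold follow the text.

```python
# A sketch of step C1: MobileNet fitted to output one occlusion
# probability per key point, thresholded at 0.12 into a {0,1} vector.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_KEYPOINTS = 5  # D_l

def build_occlusion_model() -> tf.keras.Model:
    backbone = tf.keras.applications.MobileNet(
        input_shape=(224, 224, 3), alpha=1.0,   # width multiplier = 1.0
        include_top=False, weights=None)
    m = models.Sequential([
        backbone,
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_KEYPOINTS, activation="sigmoid"),  # probs in [0, 1]
    ])
    m.compile(optimizer="adam", loss="mse")  # regression to {0,1} targets
    return m

def occlusion_state(model, face_batch, threshold=0.12):
    probs = model.predict(face_batch)            # D_l-dim float vector
    return (probs > threshold).astype(np.uint8)  # D_l-dim {0,1} Boolean vector
```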
Step C2, judging whether the photo is abnormal from the obtained key point information: the 5 key points are output in the form $\{(x_l, y_l, h_l)\}_{l=1}^{5}$, where $(x_l, y_l)$ is the coordinate position of key point $l$ and $h_l$ its occlusion state. When $h_l = 1$, the key point is displayed in red, indicating that the point is abnormally occluded, and the detection result is output: unqualified certificate photo, reason: eye/mouth/nose occlusion anomaly. When $h_l = 0$, the key point is displayed in green, indicating that it is normal, and processing continues to the next step. As shown in FIG. 6(a), the method thus performs automatic auditing of eye occlusion, mouth occlusion and nose occlusion anomalies of the certificate photo.
Step C3, constructing geometric features to detect whether the portrait in the certificate photo has an upright posture (head upright, eyes looking straight ahead), specifically comprising the following steps:
as shown in fig. 6(b), the features of the configuration include: distance between eyes in the vertical direction (CD) and distance ratio between eyes and nose tip line
Figure BDA0002677574370000069
Ratio of distance from midpoint of eyes to nose to distance from midpoint of corners of mouth to nose
Figure BDA00026775743700000610
The threshold setting for each feature of this embodiment is shown in Table 1 below:
TABLE 1 Setting of threshold values
(table provided as an image in the original publication)
Vertical-direction distance (CD) between the two eyes: the left eye is A and the right eye is B, and the vertical-direction distance between A and B serves as the head-deviation criterion; when this deviation exceeds the threshold, the head is judged to be skewed left/right.
Distance ratio of the eye-to-nose-tip lines, $\frac{OA}{OB}$: the line from the left eye to the nose tip is OA and the line from the right eye to the nose tip is OB, and the ratio of segment OA to segment OB serves as the head-deviation criterion. Rotational deflection of the head changes this ratio: when the head rotates to the left, OA is shorter than OB, so $\frac{OA}{OB} < 1$; when the head rotates to the right, OA is longer than OB, so $\frac{OA}{OB} > 1$; otherwise the ratio approaches 1, i.e. segments OA and OB are of essentially equal length.
Ratio of the eye-midpoint-to-nose distance to the mouth-midpoint-to-nose distance, $\frac{OX}{OY}$: point X is the midpoint of the line connecting the two eyes, and segment OX connects the nose tip to this midpoint; point Y is the midpoint of the line connecting the two mouth corners, and segment OY connects the nose tip to this midpoint. The ratio of the length of segment OX to that of segment OY serves as the head-deviation criterion: when the head is raised, OX shortens and OY lengthens, so the ratio decreases; when the head is lowered, OX lengthens and OY shortens, so the ratio increases.
As shown in fig. 7, the posture judgment rule sets a threshold for each head-deviation feature; when a feature is greater than or less than its threshold, the result is output: unqualified certificate photo, reason: posture not upright (with the corresponding skew type). A sketch of this rule set follows.
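The following sketch implements the three features and the judgment rule; the concrete thresholds are placeholders, since Table 1 is only available as an image in the original publication.

```python
# A sketch of the step-C3 geometric pose check over the 5 key points:
# A = left eye, B = right eye, O = nose tip, L/R = left/right mouth corner.
import math

def pose_anomalies(A, B, O, L, R,
                   cd_thresh=10.0,                # placeholder threshold
                   ratio_lo=0.8, ratio_hi=1.25,   # placeholder OA/OB band
                   oxy_lo=0.8, oxy_hi=1.2):       # placeholder OX/OY band
    reasons = []
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])

    # Feature 1: vertical-direction distance between the two eyes (CD)
    if abs(A[1] - B[1]) > cd_thresh:
        reasons.append("posture skewed left/right")

    # Feature 2: OA/OB, eye-to-nose-tip distance ratio (rotation)
    oa_ob = dist(O, A) / dist(O, B)
    if oa_ob < ratio_lo:
        reasons.append("head rotated left")
    elif oa_ob > ratio_hi:
        reasons.append("head rotated right")

    # Feature 3: OX/OY, eye-midpoint vs mouth-midpoint distance to nose
    X = ((A[0] + B[0]) / 2, (A[1] + B[1]) / 2)
    Y = ((L[0] + R[0]) / 2, (L[1] + R[1]) / 2)
    ox_oy = dist(O, X) / dist(O, Y)
    if ox_oy < oxy_lo:
        reasons.append("head raised")
    elif ox_oy > oxy_hi:
        reasons.append("head lowered")
    return reasons  # empty list means the posture passes
```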
The face key point-based certificate photo anomaly detection method combines face detection, key point localization and pose estimation to realize automatic auditing of certificate photos for no-face, non-frontal, eye occlusion, mouth occlusion and nose occlusion anomalies, saving the auditing time of counter staff and improving the user experience.
The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (4)

1. A method for detecting local anomalies in a certificate photo based on face key points, characterized by comprising the following steps:
step one, adopting an Adaboost algorithm to check whether a face is detected in the certificate photo uploaded by the user; if so, outputting the face region for the subsequent anomaly detection; if no face is detected, outputting the detection result: unqualified certificate photo, reason: no face detected;
step two, collecting pictures and certificate photos to form a data set, performing face contour detection on the face region extracted from the certificate photo, extracting candidate windows from the contours, training a multi-class classification model to classify the facial parts, and training regression models for four modes (left eye and right eye; nose; mouth and left mouth corner; mouth and right mouth corner) to obtain the 5 key points: left eye center, right eye center, nose tip, left mouth corner and right mouth corner;
step three, completing occlusion detection with a MobileNet-fitted regression function and predicting the occlusion state of the face key points; when the model output exceeds the threshold, the occlusion state of the key point is 1, indicating that the point is abnormal, and the detection result is output: unqualified certificate photo, reason: face occlusion anomaly; constructing geometric features from the 5 key points and judging, according to the judgment rules, whether the portrait in the certificate photo has an upright posture; if not, outputting the detection result: unqualified certificate photo, reason: posture not upright;
step four, summarizing and outputting the auditing result.
2. The method for detecting local anomalies in a certificate photo based on face key points according to claim 1, characterized in that the Adaboost algorithm uses an offline-learned classifier to detect the face, performs cascade detection based on the integral image of Haar rectangular features, iteratively selects the weak classifiers with the best classification performance, and combines them with different weights into a final strong classifier; a cascade strategy verifies candidates layer by layer, so that a large number of non-face samples can be rejected. Each weak classifier is $G_m(t): t \to \{-1, +1\}$, the strong classifier is composed of weak classifiers, and the final classifier is

$$G(t) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(t)\right)$$

where $m = 1, 2, \ldots, M$ and $s = 1, 2, \ldots, S$; $M$ denotes the number of iteration rounds, $S$ the number of rectangles composing the Haar feature, $G(t)$ the combination of weak classifiers, and $\alpha_m$ the importance of $G_m(t)$ in the final classifier. The algorithm detects whether the photo uploaded by the user contains a face; if so, the detected face image is output as the input of the next step; otherwise, the detection result is output: unqualified certificate photo, reason: no face detected.
3. The method for detecting local anomalies in a certificate photo based on face key points according to claim 1, characterized in that the second step specifically comprises the following steps:
step B1, performing face contour detection on the photo to extract candidate windows: inputting the face image, converting it to a 300 × 300 grayscale image, performing edge detection and contour search; looping over each contour, outputting candidate windows, and removing overlapping windows according to the overlap rate;
step B2, training a multi-class classification model and obtaining the facial parts through classification: labeling each sample in the data set with the coordinate information of the 5 key points, and cropping negative samples from other parts of the face and from non-face images; four classes are defined: left and right eyes; nose; mouth with left and right mouth corners; negative samples; training with the target loss function

$$L = -\log \frac{e^{z_y}}{\sum_{j=1}^{J} e^{z_j}}$$

where $z_y$ and $z_j$ denote the prediction scores for the labeled class $y$ and for class $j$ respectively, and $J$ denotes the number of classes;
the multi-class classification model network structure comprises 4 convolutional layers and 2 fully connected layers with 39 × 39 input pictures; convolutional layers C1, C2, C3 and C4 use 4 × 4, 3 × 3, 4 × 4 and 3 × 3 convolution kernels respectively, each convolution is followed by 2 × 2 max-pooling, and the resulting 2 × 2 × 80 feature map is fed into the fully connected layers; training yields the multi-class classification model;
step B3, training regression models for the four modes to obtain the key points, the four modes being: left and right eyes; nose; mouth and left mouth corner; mouth and right mouth corner; using the Euclidean loss

$$L = \frac{1}{2I} \sum_{i=1}^{I} \left\| \hat{y}_i - y_i \right\|_2^2$$

where $\hat{y}_i$ denotes the facial landmark coordinates predicted by the network, $y_i$ the ground-truth coordinates, and $I$ the number of samples; one model is trained for each key point;
the regression model network structure comprises 2 convolutional layers and 2 fully connected layers with a 15 × 15 input picture; convolutional layers C1 and C2 use 4 × 4 and 3 × 3 convolution kernels respectively, each convolution is followed by 2 × 2 max-pooling, and the resulting 2 × 2 × 40 feature map is fed into the fully connected layers.
4. The method for detecting local anomalies in a certificate photo based on face key points according to claim 1, characterized in that performing occlusion detection and constructing geometric features from the key points in step three specifically comprises the following steps:
step C1, using MobileNet to fit the regression function for occlusion detection; the training data set $\{(X_n, H_n)\}_{n=1}^{N}$ includes face images $X_n$ and the corresponding $D_l$-dimensional $\{0,1\}$ occlusion states $H_n$, where $D_l$ is the number of key points; the regression model learns the mapping from $X_n$ to $H_n$ by optimizing

$$\min_f \sum_{n=1}^{N} \left\| f\!\left(\phi(X_n)\right) - H_n \right\|_2^2$$

where $\phi(\cdot)$ is the feature extraction function and $f$ is the target regression function; after thresholding, the output vector is converted into a $D_l$-dimensional {0, 1} Boolean vector, i.e. the predicted occlusion state of the face region;
step C2, judging whether the photo is abnormal from the key point information: the 5 key points are output in the form $\{(x_l, y_l, h_l)\}_{l=1}^{5}$, where $(x_l, y_l)$ indicates the coordinate position of key point $l$ and $h_l$ its occlusion state; when $h_l = 1$, the key point is displayed in red, indicating that the point is abnormally occluded, and the detection result is output: unqualified certificate photo, reason: eye/mouth/nose occlusion anomaly; when $h_l = 0$, the key point is displayed in green, indicating that it is normal, and processing continues to the next step;
step C3, constructing geometric features from the 5 key points and judging according to the judgment rules whether the portrait in the certificate photo has an upright posture, specifically as follows:
the constructed features include: the vertical-direction distance between the two eyes; the ratio of the distances from the two eyes to the nose tip; and the ratio of the eye-midpoint-to-nose distance to the mouth-midpoint-to-nose distance;
the vertical-direction distance CD between the two eyes: the left eye is A and the right eye is B, and the vertical-direction distance between A and B serves as the head-deviation criterion; when it exceeds the threshold, the head is judged to be skewed left/right;
the ratio of the eye-to-nose-tip distances, $\frac{OA}{OB}$: the line from the left eye to the nose tip is OA and the line from the right eye to the nose tip is OB, and the ratio of segment OA to segment OB serves as the head-deviation criterion; rotational deflection of the head changes this ratio: when the head rotates to the left, OA is shorter than OB, so $\frac{OA}{OB} < 1$; when the head rotates to the right, OA is longer than OB, so $\frac{OA}{OB} > 1$; otherwise the ratio approaches 1, i.e. segments OA and OB are of equal length;
the ratio of the eye-midpoint-to-nose distance to the mouth-midpoint-to-nose distance, $\frac{OX}{OY}$: point X is the midpoint of the line connecting the two eyes, and segment OX connects the nose tip to this midpoint; point Y is the midpoint of the line connecting the two mouth corners, and segment OY connects the nose tip to this midpoint; the ratio of the length of segment OX to that of segment OY serves as the head-deviation criterion; when the head is raised, OX shortens and OY lengthens, so the ratio decreases; when the head is lowered, OX lengthens and OY shortens, so the ratio increases;
a threshold is set for each head-deviation feature; when a feature exceeds its threshold, the result is output: unqualified certificate photo, reason: posture not upright.
CN202010952760.8A 2020-09-11 2020-09-11 Face key point-based certificate photo local anomaly detection method Pending CN112115835A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010952760.8A CN112115835A (en) 2020-09-11 2020-09-11 Face key point-based certificate photo local anomaly detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010952760.8A CN112115835A (en) 2020-09-11 2020-09-11 Face key point-based certificate photo local anomaly detection method

Publications (1)

Publication Number Publication Date
CN112115835A true CN112115835A (en) 2020-12-22

Family

ID=73802524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010952760.8A Pending CN112115835A (en) 2020-09-11 2020-09-11 Face key point-based certificate photo local anomaly detection method

Country Status (1)

Country Link
CN (1) CN112115835A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046246A (en) * 2015-08-31 2015-11-11 广州市幸福网络技术有限公司 Identification photo camera capable of performing human image posture photography prompting and human image posture detection method
CN110874577A (en) * 2019-11-15 2020-03-10 杭州东信北邮信息技术有限公司 Automatic verification method of certificate photo based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Li Pei et al., "Analysis and Research of Face Detection Algorithms Based on AdaBoost", Journal of Beijing Institute of Graphic Communication *
Li Jun, "Research and Implementation of Face Detection and Key Point Localization Methods in Complex Environments", China Master's Theses Full-text Database, Information Science and Technology *
Li Yang, "Research on Deep Learning-Based Localization of Occluded Facial Features", China Master's Theses Full-text Database, Information Science and Technology *
Cheng Nian, "Head Pose Detection of Certificate Photos Based on Geometric Feature Analysis", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128491A (en) * 2021-05-10 2021-07-16 密传金 Method and system for obtaining marine employee identification photo image
CN113033524A (en) * 2021-05-26 2021-06-25 北京的卢深视科技有限公司 Occlusion prediction model training method and device, electronic equipment and storage medium
CN113449694A (en) * 2021-07-24 2021-09-28 福州大学 Android-based certificate compliance detection method and system

Similar Documents

Publication Publication Date Title
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN106815566B (en) Face retrieval method based on multitask convolutional neural network
CN112115835A (en) Face key point-based certificate photo local anomaly detection method
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
US9025864B2 (en) Image clustering using a personal clothing model
US8818034B2 (en) Face recognition apparatus and methods
WO2019080203A1 (en) Gesture recognition method and system for robot, and robot
CN106407911A (en) Image-based eyeglass recognition method and device
US11194997B1 (en) Method and system for thermal infrared facial recognition
CN111126240B (en) Three-channel feature fusion face recognition method
CN112364827B (en) Face recognition method, device, computer equipment and storage medium
CN109740572A (en) A kind of human face in-vivo detection method based on partial color textural characteristics
CN111401145B (en) Visible light iris recognition method based on deep learning and DS evidence theory
CN108108760A (en) A kind of fast human face recognition
CN114445879A (en) High-precision face recognition method and face recognition equipment
Hebbale et al. Real time COVID-19 facemask detection using deep learning
CN111274883B (en) Synthetic sketch face recognition method based on multi-scale HOG features and deep features
Du High-precision portrait classification based on mtcnn and its application on similarity judgement
CN111815582A (en) Two-dimensional code area detection method for improving background prior and foreground prior
CN113313149B (en) Dish identification method based on attention mechanism and metric learning
CN116363655A (en) Financial bill identification method and system
JP2006285959A (en) Learning method of face recognition device, and method, device and program for face recognition
CN111062338B (en) License and portrait consistency comparison method and system
CN113449694B (en) Android-based certificate compliance detection method and system
Bayraktar et al. Colour recognition using colour histogram feature extraction and K-nearest neighbour classifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination