CN115798008A - Rapid face detection and recognition method based on key point correction - Google Patents

Rapid face detection and recognition method based on key point correction

Info

Publication number
CN115798008A
CN115798008A CN202211515708.1A
Authority
CN
China
Prior art keywords: face, distance, human, judging, human face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211515708.1A
Other languages
Chinese (zh)
Inventor
田维青
肖建军
张显
饶毅
黄勇
宋万礼
陈辉
张超
余玲
王龙
居亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Nanzi Information Technology Co ltd
Guizhou Qianyuan Power Co ltd
Original Assignee
Nanjing Nanzi Information Technology Co ltd
Guizhou Qianyuan Power Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Nanzi Information Technology Co ltd, Guizhou Qianyuan Power Co ltd
Priority to CN202211515708.1A
Publication of CN115798008A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of face recognition applications, and specifically discloses a rapid face detection and recognition method based on key point correction, comprising: step 1, establishing a face library; step 2, detecting faces; step 3, judging resolution and frontal deviation; step 4, extracting features; and step 5, calculating face similarity and matching. The rapid face detection and recognition method based on key point correction has the following beneficial effects: 1. the recognition method filters out faces that are too small or too deviated, solving the problem that faces captured by industrial television cameras are tilted and of low resolution, which hurts recognition accuracy; 2. a new face detection network, RetinaDF, is added to accelerate detection, and a new weighted distance is used to compute face similarity, balancing the Euclidean distance and the cosine distance and improving recognition accuracy.

Description

Rapid face detection and recognition method based on key point correction
Technical Field
The invention belongs to the technical field of face recognition applications, and specifically relates to a rapid face detection and recognition method based on key point correction, suitable for industrial television camera scenes.
Background
With the development of deep learning, deep neural networks are commonly used to detect faces and predict key points, and face recognition based on key point correction has matured. Face recognition is a biometric technology that identifies a person from facial feature information: a camera collects images or video containing faces, the faces in each image or video frame are detected and located, and the detected faces are corrected by affine transformation and then recognized.
Face recognition mainly comprises four parts: face detection, image preprocessing, face feature extraction, and matching and recognition.
Face detection finds face targets in an image and accurately locates the position and size of each face. Because of the particularities of faces, such as small targets, weak features, and occlusion, general-purpose object detection is often limited when applied to faces, so face detection usually uses a dedicated algorithm or improves on a general detection algorithm to raise precision. The face detection model RetinaFace is an improved version of the general object detection network RetinaNet: it adds the detection module of the SSH network and uses three detection heads to output, respectively, face/non-face classification, face box regression, and face key points, improving detection precision; it uses the lightweight network MobileNet as the backbone to extract features, improving detection speed.
Image preprocessing processes the detected faces in service of feature extraction. The raw image obtained from face detection is limited by various conditions and cannot be used directly; preprocessing such as grayscale conversion and illumination compensation is required. The most common preprocessing is an affine transformation of the face based on the detected key points, rotating the face upright to improve recognition precision.
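The rotation-correction step described above can be sketched as follows. This is a minimal illustration, not the patent's exact implementation; it assumes only that the two eye key points define the tilt to be removed.

```python
import numpy as np

def rotation_align(points, left_eye, right_eye):
    """Rotate 2D keypoints about the eye midpoint so that the eye line
    becomes horizontal: a minimal sketch of keypoint-based correction."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.arctan2(dy, dx)               # current tilt of the eye line
    c, s = np.cos(-angle), np.sin(-angle)    # rotate by the opposite angle
    R = np.array([[c, -s], [s, c]])
    center = (np.asarray(left_eye, float) + np.asarray(right_eye, float)) / 2.0
    return (np.asarray(points, float) - center) @ R.T + center
```

In practice the same transform would be applied to the cropped face image (for example with `cv2.warpAffine`); applying it to the key points alone is enough to show the geometry.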
Face feature extraction models the features of a face; with deep learning it is performed by a deep neural network. FaceNet maps faces directly into a Euclidean space in which distance directly corresponds to a measure of face similarity. Here, too, the lightweight network MobileNet extracts face features: during training, a face classifier is trained on a dataset with person names as labels and multiple faces per label; at inference, the classifier's last fully connected layer is removed, and the feature extraction network outputs a 128-dimensional feature vector as the face features.
Matching and recognition searches the face library for the detected face after feature extraction: the Euclidean distance or cosine distance to each library face is computed, a threshold is set, and when the similarity passes the threshold the matched result is output.
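The two similarity measures used above can be written in a few lines. This sketch assumes only that the embeddings are real-valued vectors (the 128-dimensional FaceNet features in this document); note the opposite directions of the two measures.

```python
import numpy as np

def euclidean_distance(a, b):
    """L2 distance between two embeddings; smaller means more similar."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings; larger means more similar."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```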
Industrial television cameras face many limitations and difficulties in realizing key-point-corrected face recognition. A typical station-entrance or access-control face recognition system enrolls faces at high resolution; a detected face generally needs at least 200 pixels to guarantee accurate recognition, so the camera detects faces at close range, and only frontal faces can be recognized reliably. A face detected by an industrial television camera, by contrast, is generally not frontal because of the viewing angle, and even with a high-resolution camera the detected face is small, usually only tens of pixels. Both factors reduce the sufficiency of face feature extraction and therefore the recognition quality.
Overexposed or overly dark lighting also affects recognition, which is why conventional face recognition systems include supplementary lighting. Some industrial television cameras are outdoors, where illumination strongly affects feature extraction; night-scene cameras may switch to a night mode in which features cannot be extracted effectively, and recognition against a face library of ordinary color faces then degrades.
In real scenes people are moving, and motion of the face relative to the camera produces motion blur. Current face recognition systems such as access control have automatic anti-blur functions, typically freezing the picture briefly once a clear, well-posed face is detected to improve both the quality and the speed of the recognition algorithm. Industrial television cameras are low-resolution and blurry and mostly have no anti-blur function, so even when a face is detected the recognition quality suffers.
When the face is occluded by a hat, a mask, or another covering, key facial information is hidden, reducing the sufficiency of face feature extraction and hence recognition quality. Face recognition systems such as access control usually require these coverings to be removed to guarantee algorithm accuracy. In industrial television camera scenes, however, occlusion is routine: for example, power plant personnel must wear safety helmets at all times, so facial occlusion occurs frequently and degrades recognition precision.
In an industrial television camera scene many faces often appear at once, which makes detecting and recognizing a single picture slow. In this scene, computing face similarity with the Euclidean distance easily recognizes an unknown face as a face in the library, while computing it with the cosine distance easily fails to recognize a face that is in the library.
Based on the above problems, the invention provides a rapid face detection and recognition method based on key point correction.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a rapid face detection and recognition method based on key point correction that solves the problems described in the background: the face detected by an industrial television camera is generally not frontal because of the viewing angle, and even with a high-resolution camera the detected face is only tens of pixels, both of which reduce the sufficiency of feature extraction and therefore recognition quality; many faces often appear in an industrial television camera scene, making per-picture detection and recognition slow; and computing similarity with the Euclidean distance easily recognizes an unknown face as a library face, while the cosine distance easily misses a face that is in the library.
Technical scheme: the invention provides a rapid face detection and recognition method based on key point correction, comprising: step 1, establishing a face library, ensuring that the resolution of enrolled faces is not too large, at most 150 pixels; step 2, detecting faces, wherein the collected picture containing faces is input into the face detection network RetinaDF, faces are cropped from the original image according to the output face box information, rotationally corrected according to the key points, and input into the FaceNet feature extraction network, and the extracted face features and corresponding person names are stored in the face library; step 3, inputting the picture or video frame into the face detection network and cropping each face from the face box output by the network; step 4, judging resolution and frontal deviation, wherein faces that pass the judgment are corrected by affine transformation, rotated according to the key points, and input into FaceNet to extract features; and step 5, calculating face similarity and matching, wherein the extracted features are matched against the features in the face library and the similarity distance is computed.
In this technical scheme, in step 1, the face resolution is kept from being too large when enrolling faces, at most 150 pixels. To match the scene, it is best to collect faces directly with the industrial television camera. When collecting, faces at five angles (frontal, left side, right side, downward, and upward) are collected, with multiple faces per angle; faces at the same angle should differ in offset angle as much as possible, with fairly large differences. Enrolled faces must have no facial occlusion.
In this technical scheme, in step 2, the face detection network RetinaDF uses MobileNet as the backbone network and stacks dilation modules with different dilation rates; each dilation module is built from standard convolution, dilated convolution, and a residual connection, covering output features of multiple receptive fields. The output features are then input into an SSH module, and three detection heads output, respectively, face probability, face box regression, and face key points.
In this technical scheme, after the face library is established in step 3, face pictures or video are collected with the industrial television camera to run the face recognition algorithm. Whether the resolution of the cropped face is too small is judged first: if both width and height are smaller than 80 pixels, the face is judged too small and is not sent to the feature extraction network for feature extraction and recognition. Whether the face is too deviated is then judged from the five key points of the face (the two eyes, the nose, and the two mouth corners); if the face is too deviated, it is likewise not sent to the feature extraction network.
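The size gate in this step is trivial but worth pinning down, since the condition is a conjunction: a face is too small only when both sides fall under 80 pixels. A minimal sketch:

```python
def is_too_small(face_w, face_h, min_side=80):
    """A face is judged too small only when BOTH width and height
    are below the 80-pixel floor given in this step."""
    return face_w < min_side and face_h < min_side
```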
In this technical scheme, in step 5 a weighted distance computes the face similarity, balancing the Euclidean distance and the cosine distance: because the cosine distance easily fails to recognize faces that are in the library, the Euclidean distance must be added for balance;
the Euclidean distance threshold is 0.8, the cosine distance threshold is 0.75, the cosine distance is subtracted from 1, the cosine distance threshold is 0.25, the smaller the distance is, the more similar the Euclidean distance is, the Euclidean distance and the cosine distance are weighted and added to obtain a new distance d = alpha, eudis + (1-alpha) (1-cosdis), alpha is the weight for balancing eudis and cosdis, the better effect is achieved by taking 0.2, and the threshold is 0.2.0.8+ 0.8.25 =0.36;
and the similarity between the extracted features and the features in the face library is computed and compared against the threshold: if smaller, the corresponding person name is matched; otherwise the face is not recognized as any library identity.
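The weighted distance and threshold test of step 5 can be written directly from the formula. The function names are illustrative, and `cosdis` follows the document's convention that a larger cosine value means more similar (hence the `1 - cosdis` term):

```python
def weighted_distance(eudis, cosdis, alpha=0.2):
    """d = alpha*eudis + (1 - alpha)*(1 - cosdis); smaller = more similar."""
    return alpha * eudis + (1 - alpha) * (1 - cosdis)

def is_match(eudis, cosdis, threshold=0.36):
    """Accept a library match only when the fused distance is below the
    threshold 0.2*0.8 + 0.8*0.25 = 0.36 derived in the text."""
    return weighted_distance(eudis, cosdis) < threshold
```

At the per-distance thresholds themselves (eudis = 0.8, cosdis = 0.75) the fused distance is exactly 0.36, so a face sitting on both thresholds is rejected.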
In this technical scheme, in step 2, if the nose is too close along the x-axis to either eye or to either mouth corner, the face is judged too deviated: the x-axis distances x1 and x2 from the nose to the left and right eyes are computed and divided by the eye-to-eye distance (x1 + x2), and likewise the x-axis distances x3 and x4 from the nose to the left and right mouth corners are divided by the mouth-corner distance (x3 + x4); a threshold of 0.1 is set, and if any computed ratio is smaller than the threshold the face is judged too deviated. If the nose is too close along the y-axis to the eyes or to the mouth corners, the face is also judged too deviated: the y-axis distance y1 from the nose to the midpoint of the two eyes and the y-axis distance y2 from the nose to the midpoint of the two mouth corners are computed, and each is divided by the distance (y1 + y2) between the two midpoints; since the up-and-down judgment is stricter than the left-and-right judgment, a threshold of 0.25 is set, and if a computed ratio is smaller than the threshold the face is judged too deviated. If the eyes or mouth corners are too close to the face box border, the face is likewise judged too deviated: the distances w1 and w3 from the left eye and left mouth corner to the left border and the distances w2 and w4 from the right eye and right mouth corner to the right border are computed, a threshold of 0.1 is set, and if the ratio of any of these distances to the face box width w is smaller than the threshold the face is judged too deviated; similarly, the distances h1 and h2 from the left and right eyes to the top border and h3 and h4 from the left and right mouth corners to the bottom border are computed, and if the ratio of any of these distances to the face box height h is smaller than the threshold 0.1 the face is judged too deviated.
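The x-axis part of the over-deviation test reads directly as ratio checks. A sketch under the stated thresholds; the y-axis and border checks follow the same pattern with thresholds 0.25 and 0.1:

```python
def over_biased_x(nose_x, left_eye_x, right_eye_x,
                  left_mouth_x, right_mouth_x, thresh=0.1):
    """Judge a face too deviated when, along x, the nose sits too close
    to either eye or either mouth corner relative to the pair's span."""
    x1, x2 = abs(nose_x - left_eye_x), abs(nose_x - right_eye_x)
    x3, x4 = abs(nose_x - left_mouth_x), abs(nose_x - right_mouth_x)
    return (min(x1, x2) / (x1 + x2) < thresh
            or min(x3, x4) / (x3 + x4) < thresh)
```

For a frontal face the nose sits near the middle of both spans and the ratios stay near 0.5; for a strongly turned face the nose approaches one eye and one mouth corner and a ratio drops below 0.1.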
Compared with the prior art, the rapid face detection and recognition method based on key point correction has the following beneficial effects: 1. the recognition method filters out faces that are too small or too deviated, solving the problem that faces captured by industrial television cameras are tilted and of low resolution, which hurts recognition accuracy; 2. a new face detection network, RetinaDF, is designed to accelerate detection, and a new weighted distance is designed to compute face similarity, balancing the Euclidean distance and the cosine distance and improving recognition accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe the embodiments are briefly introduced below. It should be apparent that the drawings described below are merely examples of the invention, and that those of ordinary skill in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow diagram of the rapid face detection and recognition method based on key point correction of the present invention;
FIG. 2 is a flow diagram of face library establishment in the rapid face detection and recognition method based on key point correction of the present invention;
FIG. 3 is a structure diagram of the face detection network RetinaDF of the rapid face detection and recognition method based on key point correction of the present invention;
FIG. 4 is a schematic diagram of the quantities computed and compared when judging whether a face is too deviated in the rapid face detection and recognition method based on key point correction of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that terms such as "top", "bottom", "one side", "the other side", "front", "back", "middle", "inside", "top", and "bottom" indicate orientations or positional relationships based on those shown in the drawings; they are used merely for convenience and simplicity of description and do not indicate or imply that the indicated device or element must have a particular orientation or be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention. The terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, unless expressly stated or limited otherwise, the terms "mounted", "connected", and "coupled" are to be construed broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct or through an intermediate medium; or internal to two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art in specific cases.
The invention relates to a rapid face detection and recognition method based on key point correction, comprising: step 1, establishing a face library, ensuring that the resolution of enrolled faces is not too large, at most 150 pixels; step 2, detecting faces, wherein the collected picture containing faces is input into the face detection network RetinaDF, faces are cropped from the original image according to the output face box information, rotationally corrected according to the key points, and input into the FaceNet feature extraction network, and the extracted face features and corresponding person names are stored in the face library; step 3, inputting the picture or video frame into the face detection network and cropping each face from the face box output by the network; step 4, judging resolution and frontal deviation, wherein faces that pass the judgment are corrected by affine transformation, rotated according to the key points, and input into FaceNet to extract features; and step 5, calculating face similarity and matching, wherein the extracted features are matched against the features in the face library and the similarity distance is computed.
In addition, in preferred step 1, the face resolution must not be too large when enrolling faces, at most 150 pixels; to match the scene, it is best to collect faces directly with the industrial television camera; when collecting, faces at five angles (frontal, left side, right side, downward, and upward) are collected, with multiple faces per angle, faces at the same angle differing in offset angle as much as possible with fairly large differences; and the enrolled faces must have no facial occlusion.
In addition, in preferred step 2, the face detection network RetinaDF uses MobileNet as the backbone network and stacks dilation modules with different dilation rates; each dilation module is built from standard convolution, dilated convolution, and a residual connection, covering output features of multiple receptive fields; the output features are then input into an SSH module, and three detection heads output, respectively, face probability, face box regression, and face key points.
In addition, after the face library is established in preferred step 3, face pictures or video are collected with the industrial television camera to run the face recognition algorithm; whether the resolution of the cropped face is too small is judged first: if both width and height are smaller than 80 pixels, the face is judged too small and is not sent to the feature extraction network for feature extraction and recognition; whether the face is too deviated is then judged from the five key points of the face (the two eyes, the nose, and the two mouth corners), and if the face is too deviated, it is likewise not sent to the feature extraction network.
In addition, in preferred step 5 a weighted distance computes the face similarity, balancing the Euclidean distance and the cosine distance: because the cosine distance easily fails to recognize faces that are in the library, the Euclidean distance must be added for balance;
the Euclidean distance threshold is 0.8, the cosine distance threshold is 0.75, the cosine distance is subtracted from 1, the cosine distance threshold is 0.25, the smaller the distance is, the more similar the Euclidean distance is, the Euclidean distance and the cosine distance are weighted and added to obtain a new distance d = alpha, eudis + (1-alpha) (1-cosdis), alpha is the weight for balancing eudis and cosdis, the better effect is achieved by taking 0.2, and the threshold is 0.2.0.8+ 0.8.25 =0.36;
and the similarity between the extracted features and the features in the face library is computed and compared against the threshold: if smaller, the corresponding person name is matched; otherwise the face is not recognized as any library identity.
Further, in step 2, if the nose is too close along the x-axis to either eye or to either mouth corner, the face is judged too deviated: the x-axis distances x1 and x2 from the nose to the left and right eyes are computed and divided by the eye-to-eye distance (x1 + x2), and likewise the x-axis distances x3 and x4 from the nose to the left and right mouth corners are divided by the mouth-corner distance (x3 + x4); a threshold of 0.1 is set, and if any computed ratio is smaller than the threshold the face is judged too deviated. If the nose is too close along the y-axis to the eyes or to the mouth corners, the face is also judged too deviated: the y-axis distance y1 from the nose to the midpoint of the two eyes and the y-axis distance y2 from the nose to the midpoint of the two mouth corners are computed, and each is divided by the distance (y1 + y2) between the two midpoints; since the up-and-down judgment is stricter than the left-and-right judgment, a threshold of 0.25 is set, and if a computed ratio is smaller than the threshold the face is judged too deviated. If the eyes or mouth corners are too close to the face box border, the face is likewise judged too deviated: the distances w1 and w3 from the left eye and left mouth corner to the left border and the distances w2 and w4 from the right eye and right mouth corner to the right border are computed, a threshold of 0.1 is set, and if the ratio of any of these distances to the face box width w is smaller than the threshold the face is judged too deviated; similarly, the distances h1 and h2 from the left and right eyes to the top border and h3 and h4 from the left and right mouth corners to the bottom border are computed, and if the ratio of any of these distances to the face box height h is smaller than the threshold 0.1 the face is judged too deviated.
Examples
Many faces often appear in an industrial television camera scene, and detecting and recognizing a single picture is often slow.
As shown in fig. 3, RetinaFace is optimized into a new face detection network, RetinaDF. A general detection network includes a feature pyramid network (FPN) that fuses multi-scale features to enhance detection of small targets; however, FPN increases the computational overhead. YOLOF (You Only Look One-level Feature) showed that the C5 feature map (the 32-fold down-sampled feature map) already contains sufficient context information for detecting targets of various scales. Standard convolution and dilated convolution are therefore stacked to enlarge the receptive field, and dilation modules are built with residual connections to obtain a feature map covering all target scales.
MobileNet is used as the backbone network to extract features; a 1×1 convolution reduces the channel dimension, a 3×3 convolution refines the contextual semantic information, and 4 dilation modules with different dilation rates are stacked to cover output features of multiple receptive fields. The result is then input into an SSH module, and three detection heads output, respectively, face probability, face box regression, and face key points. The resulting new detection network, RetinaDF, greatly improves detection speed.
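The claim that stacked dilation modules cover multiple receptive fields can be checked with the standard effective-kernel formula. This sketch assumes stride-1 3×3 convolutions and illustrative dilation rates 1, 2, 4, 8; the patent does not list its exact rates.

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 conv layers given as
    (kernel_size, dilation) pairs. A dilated conv has effective kernel
    k_eff = dilation*(kernel - 1) + 1, and each stride-1 layer adds
    k_eff - 1 to the receptive field."""
    rf = 1
    for kernel, dilation in layers:
        rf += dilation * (kernel - 1)
    return rf
```

With rates 1, 2, 4, 8 the stack sees 31 input pixels in each direction, versus 9 for four plain 3×3 convolutions, which is how a single-level design can compensate for dropping the FPN.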
For an industrial television camera to realize face detection and recognition, a high-quality face library must first be established. The resolution of a face detected by an industrial television camera is generally only tens of pixels and at most no more than 120 pixels, so when the face library is established the enrolled face resolution must not be too large, at most 150 pixels. If a face library from a general scene is used, such as casual self-portraits, the resolution of the library faces far exceeds that of faces detected by the camera, so face similarity calculation and recognition perform poorly. To match the scene, it is also preferable to collect faces directly with the industrial television camera.
Because the industrial television camera is mounted high up, the detected face is generally not frontal due to the viewing angle. When collecting faces, therefore, faces at five angles (frontal, left side, right side, downward, and upward) are collected, with multiple faces per angle (more faces increase recognition accuracy, but too many hurt recognition speed), and the offset angles at each angle should differ as much as possible, with fairly large differences. The enrolled face must have no facial occlusion: overlong bangs must not cover the eyes or eyebrows, and a safety helmet must not cover them either.
As shown in fig. 2, the collected picture containing faces is input into the face detection network RetinaDF; the network outputs the face probability, face box regression, and face key point information of a large number of detection boxes, and NMS (Non-Maximum Suppression) screens out the correct boxes. The face is cropped from the original image according to the screened face box information, rotationally corrected according to the key points, and input into the FaceNet feature extraction network to extract the face features, which are stored in the face library with the corresponding person names.
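The NMS screening step mentioned above can be sketched as the usual greedy IoU suppression; the 0.4 IoU threshold here is illustrative, not taken from the patent.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.4):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.
    Returns indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]          # process by descending score
    keep = []
    while order.size > 0:
        i = int(order[0])
        keep.append(i)
        rest = order[1:]
        # intersection of the top box with each remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # drop heavily overlapping boxes
    return keep
```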
After the face library is established, face pictures or video are collected by the industrial television camera and the face recognition algorithm runs. As during enrollment, each picture or video frame is input to the face detection network, and the face is cropped once the network outputs the face frame.
First, the resolution of the cropped face is checked: if both the width and the height are smaller than 80 pixels, the face is judged too small and is not sent to the feature extraction network. Second, whether the face is over-deviated is judged from the information of the five key points (the two eyes, the nose, and the two mouth corners); an over-deviated face is likewise not sent for feature extraction. Referring to fig. 4, if the nose is too close to the left or right eye, or to the left or right mouth corner, in the x-axis direction, the face is judged over-deviated: compute the x-axis distances x1 and x2 from the nose to the left and right eyes and their ratios to the inter-eye distance (x1 + x2), and likewise the x-axis distances x3 and x4 from the nose to the left and right mouth corners and their ratios to the inter-corner distance (x3 + x4). A threshold of 0.1 is set; if a computed ratio is smaller than the threshold, the face is judged over-deviated.
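The x-axis check above can be sketched directly from the stated ratios. This is a minimal interpretation, with key points passed as `(x, y)` tuples in crop coordinates; note that (x1 + x2) equals the inter-eye x-distance only while the nose lies between the eyes, which is the regime the test cares about:

```python
def is_over_deviated_x(nose, left_eye, right_eye, left_mouth, right_mouth,
                       thresh=0.1):
    """True when the nose sits too close to one eye or one mouth corner
    along the x axis, i.e. the face is rotated too far left or right."""
    x1 = abs(nose[0] - left_eye[0])
    x2 = abs(nose[0] - right_eye[0])
    x3 = abs(nose[0] - left_mouth[0])
    x4 = abs(nose[0] - right_mouth[0])
    eye_ratio = min(x1, x2) / (x1 + x2)      # smaller side over inter-eye span
    mouth_ratio = min(x3, x4) / (x3 + x4)    # smaller side over inter-corner span
    return eye_ratio < thresh or mouth_ratio < thresh
```

A roughly frontal face gives ratios near 0.5; a strong profile pushes one ratio toward 0 and trips the 0.1 threshold.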
If the nose is too close to the eyes or to the mouth corners in the y-axis direction, the face is also judged over-deviated: compute the y-axis distance y1 from the nose to the midpoint of the two eyes and the y-axis distance y2 from the nose to the midpoint of the two mouth corners, and take the ratio of each to the distance (y1 + y2) between the eye midpoint and the mouth-corner midpoint. The up/down judgment is stricter than the left/right judgment, so the threshold is set to 0.25; if a computed ratio is smaller than the threshold, the face is judged over-deviated.
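The y-axis check follows the same pattern with the stricter 0.25 threshold. A sketch under the same `(x, y)`-tuple assumption:

```python
def is_over_deviated_y(nose, left_eye, right_eye, left_mouth, right_mouth,
                       thresh=0.25):
    """True when the nose sits too close (in y) to the eye line or the
    mouth line, i.e. the face is pitched too far up or down."""
    eye_mid_y = (left_eye[1] + right_eye[1]) / 2.0
    mouth_mid_y = (left_mouth[1] + right_mouth[1]) / 2.0
    y1 = abs(nose[1] - eye_mid_y)    # nose to eye-midpoint distance
    y2 = abs(nose[1] - mouth_mid_y)  # nose to mouth-midpoint distance
    return min(y1, y2) / (y1 + y2) < thresh
```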
If the eyes or mouth corners are too close to the face frame, the face is also judged over-deviated. Compute the distances w1 and w3 from the left eye and left mouth corner to the left border, and the distances w2 and w4 from the right eye and right mouth corner to the right border; a threshold of 0.1 is set, and if the ratio of any of these distances to the face-frame width w is smaller than the threshold, the face is judged over-deviated. Likewise compute the distances h1 and h2 from the left and right eyes to the top border, and the distances h3 and h4 from the left and right mouth corners to the bottom border; if the ratio of any of these distances to the face-frame height h is smaller than the 0.1 threshold, the face is judged over-deviated.
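The border-proximity test reduces to comparing the smallest normalised margin against 0.1. A sketch assuming key points are given in the cropped face frame's own coordinates, with `w` and `h` the frame width and height:

```python
def is_near_border(pts, w, h, thresh=0.1):
    """pts: dict with 'left_eye', 'right_eye', 'left_mouth', 'right_mouth'
    as (x, y) tuples in crop coordinates. True when any eye/mouth key
    point hugs a border of the w-by-h face frame."""
    left = min(pts['left_eye'][0], pts['left_mouth'][0]) / w          # w1, w3
    right = min(w - pts['right_eye'][0], w - pts['right_mouth'][0]) / w  # w2, w4
    top = min(pts['left_eye'][1], pts['right_eye'][1]) / h            # h1, h2
    bottom = min(h - pts['left_mouth'][1], h - pts['right_mouth'][1]) / h  # h3, h4
    return min(left, right, top, bottom) < thresh
```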
A face that passes these checks undergoes an affine transformation: it is rotated and centered according to the key points, then input to FaceNet for feature extraction. The extracted features are matched against the features in the face library and the similarity distance is computed.
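The patent does not spell out its affine parameters; a common realisation of key-point rotational correction is to rotate about the eye midpoint until the eye line is horizontal. The sketch below builds the 2x3 matrix (apply it with any warp routine, e.g. OpenCV's `warpAffine`); this is one plausible construction, not necessarily the patent's exact transform:

```python
import math
import numpy as np

def eye_alignment_matrix(left_eye, right_eye):
    """2x3 affine matrix rotating about the eye midpoint so that the
    eye line becomes horizontal (p' = R(p - c) + c, flattened)."""
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    theta = math.atan2(right_eye[1] - left_eye[1],
                       right_eye[0] - left_eye[0])
    c, s = math.cos(-theta), math.sin(-theta)  # rotate by -theta to level the eyes
    return np.array([[c, -s, cx - c * cx + s * cy],
                     [s,  c, cy - s * cx - c * cy]])
```

After applying the matrix, both eyes map to the same y coordinate, which is what FaceNet-style pipelines usually expect from aligned inputs.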
Using the Euclidean distance eudis alone to compute face similarity easily misrecognizes unknown faces as faces in the library, while using the cosine distance cosdis alone easily fails to recognize faces that are in the library. A new weighted distance is therefore designed to compute face similarity, balancing the Euclidean and cosine distances.
The most important goal is to prevent an unknown face from being recognized as a face in the library, so the threshold must be stricter and the cosine distance must carry the larger weight; at the same time, because the cosine distance alone easily fails to recognize library faces, the Euclidean distance is added for balance.
A smaller Euclidean distance means more similar, with a minimum of 0, while a larger cosine distance means more similar, with a maximum of 1. The Euclidean distance threshold is 0.8 and the cosine distance threshold is 0.75. Subtracting the cosine distance from 1 turns its threshold into 0.25 and makes it, like the Euclidean distance, smaller when more similar. The two are then combined by weighted addition into a new distance d = α·eudis + (1 − α)·(1 − cosdis), where α is the weight balancing eudis and cosdis; α = 0.2 gives the best effect, and the combined threshold becomes 0.2 × 0.8 + 0.8 × 0.25 = 0.36.
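The weighted distance and its 0.36 threshold can be sketched directly. Feature vectors are assumed to be FaceNet-style embeddings (typically L2-normalised, though the formula does not require it):

```python
import numpy as np

def weighted_distance(f1, f2, alpha=0.2):
    """d = alpha * eudis + (1 - alpha) * (1 - cosdis); smaller is more similar.
    alpha = 0.2 keeps the (inverted) cosine term dominant, as the text prefers."""
    eudis = float(np.linalg.norm(f1 - f2))
    cosdis = float(np.dot(f1, f2) /
                   (np.linalg.norm(f1) * np.linalg.norm(f2)))
    return alpha * eudis + (1 - alpha) * (1 - cosdis)

# Combined threshold from the per-metric thresholds: 0.2*0.8 + 0.8*0.25 = 0.36
THRESH = 0.2 * 0.8 + 0.8 * 0.25
```

A candidate is accepted only when its weighted distance to some library feature falls below `THRESH`; identical embeddings give d = 0, while e.g. orthogonal unit vectors give d = 0.2·√2 + 0.8 ≈ 1.08, well above 0.36.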
The similarity between the extracted features and the features in the face library is computed and compared with the threshold; if it is smaller than the threshold, the corresponding name is matched, otherwise the face is reported as unknown.
It should be noted that, in this document, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (6)

1. A rapid face detection and recognition method based on key point correction, characterized by comprising the following steps:
step 1, establishing a face library, ensuring that the resolution of an enrolled face is not too large and does not exceed 150 pixels at most;
step 2, detecting the face: inputting a collected picture containing a face into the face detection network RetinaDF, cropping the face from the original image according to the output face-frame information, rotationally correcting the face according to key points, inputting it to the FaceNet feature extraction network, and storing the extracted face features and the corresponding name in the face library;
step 3, inputting the picture or video frame to the face detection network, and cropping the face after the detection network outputs the face frame;
step 4, judging resolution and deviation, performing an affine transformation on the qualified face, rotating and centering it according to the key points, and inputting it to FaceNet to extract features;
and 5, calculating and matching the similarity of the human face, matching the extracted features with the features in the human face library, and calculating the similarity distance.
2. The method for rapid face detection and recognition based on key point correction according to claim 1, characterized in that: in step 1, the face resolution at enrollment is kept from being too large, at most 150 pixels; to match the scene, the faces are preferably collected directly with the industrial television camera; faces are collected at five angles, namely frontal, left profile, right profile, looking down, and looking up; several faces are collected at each angle, the faces at the same angle differing in offset angle as much as possible; and the enrolled faces have no facial occlusion.
3. The method for rapid face detection and recognition based on key point correction according to claim 1, characterized in that: in step 2, the face detection network RetinaDF uses MobileNet as the backbone network and stacks dilation modules with different dilation rates, each dilation module being formed of a standard convolution, a dilated convolution, and a residual connection, covering output features of multiple receptive fields; the output features are then input to an SSH module, and three detection heads output the face probability, face-frame regression, and face key points respectively.
4. The method for rapid face detection and recognition based on key point correction according to claim 1, characterized in that: in step 3, after the face library is established, face pictures or video are collected with the industrial television camera to run the face recognition algorithm; whether the resolution of the cropped face is too small is judged, and if both the width and the height are smaller than 80 pixels, the face is judged too small and is not sent to the feature extraction network; whether the face is over-deviated is judged from the information of the five key points of the two eyes, the nose, and the two mouth corners, and an over-deviated face is not sent to the feature extraction network.
5. The method for rapid face detection and recognition based on key point correction according to claim 1, characterized in that: in step 5, the weighted distance is used to compute face similarity, balancing the Euclidean distance and the cosine distance; because the cosine distance alone easily fails to recognize faces in the face library, the Euclidean distance is added for balance;
the Euclidean distance threshold is 0.8 and the cosine distance threshold is 0.75; subtracting the cosine distance from 1 turns its threshold into 0.25 and makes a smaller value mean more similar; the Euclidean and cosine distances are combined by weighted addition into a new distance d = α·eudis + (1 − α)·(1 − cosdis), where α is the weight balancing eudis and cosdis, 0.2 giving the best effect, and the threshold becomes 0.2 × 0.8 + 0.8 × 0.25 = 0.36;
the similarity between the extracted features and the features in the face library is computed and compared with the threshold; if it is smaller than the threshold, the corresponding name is matched, otherwise the face is reported as unknown.
6. The method of claim 4, characterized in that, in the over-deviation judgment:
if the nose is too close to the left or right eye, or to the left or right mouth corner, in the x-axis direction, the face is judged over-deviated: the x-axis distances x1 and x2 from the nose to the left and right eyes are computed together with their ratios to the inter-eye distance (x1 + x2), and likewise the x-axis distances x3 and x4 from the nose to the left and right mouth corners together with their ratios to the inter-corner distance (x3 + x4); a threshold of 0.1 is set, and if a computed ratio is smaller than the threshold, the face is judged over-deviated;
if the nose is too close to the eyes or to the mouth corners in the y-axis direction, the face is also judged over-deviated: the y-axis distance y1 from the nose to the midpoint of the two eyes and the y-axis distance y2 from the nose to the midpoint of the two mouth corners are computed, together with the ratio of each to the distance (y1 + y2) between the eye midpoint and the mouth-corner midpoint; the up/down judgment is stricter than the left/right judgment, so a threshold of 0.25 is set, and if a computed ratio is smaller than the threshold, the face is judged over-deviated;
if the eyes or mouth corners are too close to the face frame, the face is also judged over-deviated: the distances w1 and w3 from the left eye and left mouth corner to the left border and the distances w2 and w4 from the right eye and right mouth corner to the right border are computed, a threshold of 0.1 is set, and if the ratio of any of these distances to the face-frame width w is smaller than the threshold, the face is judged over-deviated; the distances h1 and h2 from the left and right eyes to the top border and the distances h3 and h4 from the left and right mouth corners to the bottom border are computed, and if the ratio of any of these distances to the face-frame height h is smaller than the 0.1 threshold, the face is judged over-deviated.
CN202211515708.1A 2022-11-30 2022-11-30 Rapid face detection and recognition method based on key point correction Pending CN115798008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211515708.1A CN115798008A (en) 2022-11-30 2022-11-30 Rapid face detection and recognition method based on key point correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211515708.1A CN115798008A (en) 2022-11-30 2022-11-30 Rapid face detection and recognition method based on key point correction

Publications (1)

Publication Number Publication Date
CN115798008A true CN115798008A (en) 2023-03-14

Family

ID=85443374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211515708.1A Pending CN115798008A (en) 2022-11-30 2022-11-30 Rapid face detection and recognition method based on key point correction

Country Status (1)

Country Link
CN (1) CN115798008A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437505A (en) * 2023-12-18 2024-01-23 杭州任性智能科技有限公司 Training data set generation method and system based on video

Similar Documents

Publication Publication Date Title
CN109819208B (en) Intensive population security monitoring management method based on artificial intelligence dynamic monitoring
CN108446617B (en) Side face interference resistant rapid human face detection method
CN109063559B (en) Pedestrian detection method based on improved region regression
CN110598535B (en) Face recognition analysis method used in monitoring video data
WO2022121039A1 (en) Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal
CN110543867A (en) crowd density estimation system and method under condition of multiple cameras
CN112381075B (en) Method and system for carrying out face recognition under specific scene of machine room
CN109685045B (en) Moving target video tracking method and system
CN109782902A (en) A kind of operation indicating method and glasses
CN110728225A (en) High-speed face searching method for attendance checking
CN113592911B (en) Apparent enhanced depth target tracking method
CN107273799A (en) A kind of indoor orientation method and alignment system
CN111582074A (en) Monitoring video leaf occlusion detection method based on scene depth information perception
CN115798008A (en) Rapid face detection and recognition method based on key point correction
CN111507353A (en) Chinese field detection method and system based on character recognition
CN112766046B (en) Target detection method and related device
KR102171384B1 (en) Object recognition system and method using image correction filter
CN112183287A (en) People counting method of mobile robot under complex background
CN109544535B (en) Peeping camera detection method and system based on optical filtering characteristics of infrared cut-off filter
CN112215064A (en) Face recognition method and system for public safety precaution
CN114927236A (en) Detection method and system for multiple target images
CN114332655A (en) Vehicle self-adaptive fusion detection method and system
CN114332775A (en) Smoke detection method based on target detection and disorder characteristics
CN112149598A (en) Side face evaluation method and device, electronic equipment and storage medium
CN111860331A (en) Unmanned aerial vehicle is at face identification system in unknown territory of security protection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination