CN115512408A - Face recognition method and system under natural monitoring based on deep learning - Google Patents

Face recognition method and system under natural monitoring based on deep learning

Info

Publication number
CN115512408A
Authority
CN
China
Prior art keywords
face
picture
skew
image
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211143361.2A
Other languages
Chinese (zh)
Inventor
徐玲杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cosco Shipping Technology Co Ltd
Original Assignee
Cosco Shipping Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cosco Shipping Technology Co Ltd filed Critical Cosco Shipping Technology Co Ltd
Priority to CN202211143361.2A
Publication of CN115512408A
Legal status: Pending

Classifications

    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/765 Classification using rules for classification or partitioning the feature space
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a face recognition method and system under natural monitoring based on deep learning. Based on a natural surveillance video stream to be detected, a face detection algorithm identifies the face pictures of multiple persons in the stream and extracts key feature points. Based on these key feature points, an SVM (support vector machine) algorithm distinguishes frontal face pictures from skewed face pictures; skewed face pictures whose face skew angle is greater than or equal to a preset threshold are screened out, while frontal face pictures and skewed face pictures whose skew angle is smaller than the preset threshold are corrected and preprocessed. The face recognition model ArcFace then extracts a multi-dimensional feature vector for each face in each face picture. The minimum Euclidean distance between the multi-dimensional feature vector of each face in the surveillance video stream and the multi-dimensional feature vectors of all faces in a face information base is calculated and compared with a preset distance threshold to match the identity of the corresponding person, so that more accurate matching results can be obtained.

Description

Face recognition method and system under natural monitoring based on deep learning
Technical Field
The invention relates to the technical field of general video surveillance, and in particular to a face recognition method and system under natural monitoring based on deep learning.
Background
Face recognition is a biometric technology for identity recognition based on the facial feature information of a person. This family of related technologies, also commonly called portrait recognition or facial recognition, uses a camera or video camera to collect images or video streams containing faces, automatically detects and tracks the faces in the images, and then recognizes the detected faces. A typical face recognition system mainly comprises four components: face image acquisition and detection, face image preprocessing, face image feature extraction, and matching and recognition.
Face recognition accuracy is affected by many factors, such as illumination, face angle, and resolution. To mitigate these problems and improve accuracy for scenarios with very high requirements on recognition precision, such as mobile phone unlocking, face-to-ID-card matching, and facial access control, technologies such as three-dimensional face recognition, thermal imaging face recognition, and multi-light-source face recognition based on infrared cameras have been introduced. However, the assistance that non-visible-light cameras provide to face recognition is limited by distance, and overly strong illumination causes the infrared function of such cameras to fail, so these three technologies are not suitable for face recognition under ordinary surveillance video. Face recognition under ordinary natural surveillance therefore mainly depends on effective face detection, face feature extraction, and matching algorithms.
Traditional face detection methods have various drawbacks. The Adaboost face detection method based on an integral image and a cascade detector can quickly detect frontal faces, but in complex scenes its detection results are unstable: regions resembling a face are easily falsely detected as faces, leading to a high false detection rate. Face detection methods based on the edge shape, texture and color features of facial organs essentially detect faces using rules derived from prior knowledge of the face, and likewise suffer from high false detection rates in complex scenes. Template-matching-based methods realize face detection through the correlation between a face template and the image to be detected; the principle is simple and easy to implement, but the size, scale and shape of the template cannot adapt automatically, so the application range of such methods is narrow.
Disclosure of Invention
The invention provides a face recognition method under natural monitoring based on deep learning, which aims to solve problems in the existing face recognition process such as unstable detection results, high false detection rates, and a narrow application range. The invention also relates to a face recognition system under natural monitoring based on deep learning.
The technical scheme of the invention is as follows:
a face recognition method under natural monitoring based on deep learning is characterized by comprising the following steps:
a data acquisition step: acquiring a natural monitoring video stream to be detected, identifying a plurality of face pictures of a plurality of persons in the natural monitoring video stream according to a face detection algorithm, and extracting key feature points from each face picture;
picture processing: based on the key feature points, distinguishing a front face picture and a skew face picture in the face picture by adopting an SVM (support vector machine) algorithm, comparing a face skew angle in the skew face picture with a preset skew angle threshold, if the face skew angle is greater than or equal to the preset skew angle threshold, removing the skew face picture greater than or equal to the preset skew angle threshold, and if the face skew angle is less than the preset skew angle threshold, correcting the skew face picture and the front face picture which are less than the preset skew angle threshold according to a reference feature point set by a standard face to obtain a corrected face picture, and preprocessing the corrected face picture;
extracting and matching the face features: for the preprocessed face pictures, extracting a multidimensional feature vector of each face in each face picture by adopting a face recognition model Arcface to obtain multidimensional feature vectors of a plurality of faces, respectively calculating the minimum Euclidean distance between the multidimensional feature vector of each person in each face picture in the natural monitoring video stream and the multidimensional feature vectors in all face pictures in a face information base according to the multidimensional feature vectors, comparing the calculated minimum Euclidean distance with a preset distance threshold, and if the minimum Euclidean distance is smaller than or equal to the preset distance threshold, recognizing the identity of each person in the natural monitoring video stream according to the face picture in the face information base corresponding to the minimum Euclidean distance.
Preferably, in the data collecting step, the key feature points include a left eye, a right eye, a nose tip, a left mouth corner, and a right mouth corner.
Preferably, in the image processing step, after the SVM support vector machine algorithm is adopted to distinguish the positive face image and the skewed face image in the face image, the collected positive face image and the skewed face image of each person are respectively marked as a positive sample and a negative sample in sequence, and the face images of the same person are summarized under the same folder.
Preferably, in the step of processing the pictures, the preprocessing includes turning over and normalizing each face picture to obtain a normalized picture corresponding to each face picture.
Preferably, in the step of extracting and matching the face features, the face information base includes person identity information and a face picture base of the person.
A human face recognition system under natural monitoring based on deep learning is characterized by comprising a data acquisition module, an image processing module and a human face feature extraction and matching module which are sequentially connected,
the data acquisition module is used for acquiring a natural monitoring video stream to be detected, identifying a plurality of face pictures of a plurality of persons in the natural monitoring video stream according to a face detection algorithm, and extracting key feature points from each face picture;
the image processing module is used for distinguishing a front face image and a skew face image in the face image by adopting an SVM (support vector machine) algorithm based on the key feature points, comparing the skew angle of the face in the skew face image with a preset skew angle threshold value, if the skew angle of the face is greater than or equal to the preset skew angle threshold value, eliminating the skew face image which is greater than or equal to the preset skew angle threshold value, and if the skew angle of the face is less than the preset skew angle threshold value, correcting the skew face image and the front face image which are less than the preset skew angle threshold value according to a reference feature point set by a standard face to obtain a corrected face image, and preprocessing the corrected face image;
and the face feature extraction matching module is used for extracting the multidimensional feature vector of each face in each face picture by adopting a face recognition model Arcface for the preprocessed face pictures to obtain the multidimensional feature vectors of a plurality of faces, respectively calculating the minimum Euclidean distance between the multidimensional feature vector of each person in each face picture in the natural monitoring video stream and the multidimensional feature vectors in all the face pictures in the face information base according to the multidimensional feature vectors, comparing the calculated minimum Euclidean distance with a preset distance threshold, and recognizing the identity of each person in the natural monitoring video stream according to the face picture in the face information base corresponding to the minimum Euclidean distance if the minimum Euclidean distance is less than or equal to the preset distance threshold.
Preferably, the key feature points include a left eye, a right eye, a nose tip, a left mouth corner, and a right mouth corner.
Preferably, in the image processing module, before the SVM support vector machine algorithm is used to distinguish the positive face image and the skewed face image in the face image, the collected positive face image and the skewed face image of each person are respectively and sequentially marked as a positive sample and a negative sample, and the face images of the same person are summarized in the same folder.
Preferably, the preprocessing includes turning over and normalizing each face picture to obtain a normalized picture corresponding to each face picture.
Preferably, the face information base includes person identity information and a face picture base of the person.
The invention has the beneficial effects that:
the face recognition method under natural monitoring based on deep learning can be used for rapidly recognizing multiple persons at the same time, and based on the natural monitoring video stream to be detected, multiple face pictures of the multiple persons in the natural monitoring video stream are recognized by adopting a face detection algorithm, so that the multiple faces under the picture of a natural monitoring camera can be simultaneously and rapidly recognized; key feature points are extracted from each face picture, a fine face screening mechanism is designed to improve face recognition accuracy, face angles acquired by a monitoring camera in a natural scene are random, and a plurality of skewed faces and side faces exist, a front face picture and a skewed face picture in the face pictures are distinguished by adopting an SVM (support vector machine) algorithm based on the face key feature points, faces with large skew angles and blurred faces and side faces are screened, face accuracy is effectively improved, the feature significance of faces extracted from follow-up face features is guaranteed, and the accuracy of face matching is guaranteed. And finally, extracting a multidimensional characteristic vector of the face picture by adopting a face recognition model Arcface, calculating the minimum Euclidean distance between each face picture of each person with the multidimensional characteristic vector in the natural monitoring video stream and the face picture in the face information base according to the multidimensional characteristic vector, and comparing the calculated Euclidean distance with a preset distance threshold value to match the identity of the corresponding person. The method has high reproducibility, can be applied to different scenes, can realize effective transplantation only by acquiring the face picture in a certain scene to perform relay training on the face recognition model ArcFace, can obtain more accurate matching results, and has the advantages of flexible application and high transportability.
The invention also relates to a face recognition system under natural monitoring based on deep learning, which corresponds to the face recognition method under natural monitoring based on deep learning and can be understood as a system for realizing that method. The system can simultaneously and quickly recognize multiple faces in the view of a surveillance camera (an ordinary high-definition camera) in a natural scene and can obtain more accurate matching results.
Drawings
Fig. 1 is a flow chart of the face recognition method based on natural monitoring of deep learning according to the present invention.
Fig. 2 is a preferred flow chart of the face recognition method based on natural monitoring of deep learning according to the present invention.
Detailed Description
The present invention will be described with reference to the accompanying drawings.
The invention relates to a face recognition method under natural monitoring based on deep learning, and a flow chart of the method is shown in figure 1, and the method sequentially comprises the following steps:
A data acquisition step: a natural surveillance video stream to be detected is acquired, the face pictures of multiple persons in the stream are identified by a face detection algorithm, and key feature points are extracted from each face picture. Specifically, as shown in the preferred flowchart of fig. 2, the surveillance video stream of the natural scene to be detected is obtained and the face detection algorithm RetinaFace deployed on a server performs real-time detection; the face pictures of the detected persons are saved (in jpg format) and the five key feature points of each face are extracted (in json format). RetinaFace detects the position coordinates of all face boxes in the surveillance frame together with the coordinates of the five key feature points (left eye, right eye, nose tip, left mouth corner and right mouth corner), and face boxes with implausible aspect ratios or overly small areas are screened out according to the length-width ratio and area information of the face boxes.
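As an illustration of this box-screening idea (a sketch, not code from the patent), the following filters detections by aspect ratio and area; the threshold values and the detection data structure are assumptions.

```python
# Sketch of the face-box screening step; thresholds are illustrative only.
# Each detection is assumed to be ((x1, y1, x2, y2), five (x, y) landmarks).

def filter_face_boxes(detections, min_area=40 * 40, ratio_range=(0.6, 1.6)):
    """Keep faces whose bounding boxes have a plausible aspect ratio and size."""
    kept = []
    for box, landmarks in detections:
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue
        ratio = w / h
        if w * h >= min_area and ratio_range[0] <= ratio <= ratio_range[1]:
            kept.append((box, landmarks))
    return kept
```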
It should be noted that, unlike specific scenarios such as mobile phone face unlocking and facial access control, face pictures collected in a natural scene may be skewed, blurred, or unevenly lit. Existing face recognition data sets collected in laboratory settings are not sufficient to cover the complex face recognition requirements of natural scenes. Therefore, face pictures from the real scene need to be collected to enrich the sample set, and in order to stay as close to the real scene as possible, the collection process should be as unobtrusive as possible, i.e. the face pictures are collected directly without notifying the persons in the frame, at a frame rate of 1 frame per second.
Picture processing step: based on the key feature points, an SVM (support vector machine) algorithm distinguishes the frontal face pictures from the skewed face pictures; the face skew angle in each skewed face picture is compared with a preset skew angle threshold; if the face skew angle is greater than or equal to the preset skew angle threshold, the skewed face picture is rejected, and if the face skew angle is smaller than the preset skew angle threshold, the skewed face picture and the frontal face picture are corrected according to the reference feature points set by a standard face to obtain corrected face pictures, which are then preprocessed. In other words, a face-skew classification SVM operator judges whether a face is frontal; if so, OpenCV face correction is performed, and if not, face recognition for that picture is abandoned and the process returns to the data acquisition step to obtain new pictures from the natural surveillance video stream.
Specifically, based on the coordinate values of the five key feature points, an SVM (support vector machine) algorithm distinguishes the frontal face pictures from the skewed face pictures. To train this classifier, frontal face pictures (positive samples) and skewed face pictures (negative samples) are first distinguished manually, the key-feature-point files (json format) of the faces are labelled as positive or negative samples accordingly, and the face pictures of the same person are gathered in the same folder. Each face sample corresponds to one json file, which records the x and y coordinates (in the picture coordinate system) of the five key points of the face in the picture. Face skew is judged manually by eye, and the face pictures and their corresponding json files are divided into a positive sample group and a negative sample group, which are stored in two folders for training the SVM. Each face picture should show the facial features clearly, each person should preferably have face pictures taken under different angles and lighting conditions, and only 5-10 face pictures per person are needed.
According to the principle of the SVM (support vector machine) algorithm, the problem can be modeled as the following constrained optimization problem:

\min_{w,b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\left(w^T x_i + b\right) \ge 1, \quad i = 1, 2, \ldots, n \qquad (1)

In the above formula, w and b are the weight and bias parameters of the SVM decision function to be optimized, x_i is the vector of coordinate values of the five facial key feature points in the i-th picture, y_i is the true skew label (1 represents a frontal face, -1 represents a skewed face), and T denotes matrix transposition (not a variable). By optimizing in the standard SVM manner, the labelled positive and negative samples can be input into this formulation to obtain the SVM classification model.
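For illustration only, a frontal/skewed classifier of this kind could be trained with scikit-learn from the labelled json files as sketched below; the json field name, the centroid and scale normalisation, and the linear kernel are assumptions rather than details stated in the patent.

```python
# Illustrative sketch (not the patent's code): train a linear SVM on the
# 10-dimensional vector of five landmark (x, y) coordinates, normalised
# so that the classifier is invariant to face position and scale.
import glob
import json

import numpy as np
from sklearn.svm import SVC


def load_samples(folder, label):
    X, y = [], []
    for path in glob.glob(f"{folder}/*.json"):
        with open(path) as f:
            pts = json.load(f)["landmarks"]   # assumed key: five [x, y] pairs
        pts = np.asarray(pts, dtype=float)
        pts -= pts.mean(axis=0)               # translate to the landmark centroid
        pts /= (np.linalg.norm(pts) + 1e-8)   # scale-normalise the point cloud
        X.append(pts.ravel())                 # 10-dimensional feature vector
        y.append(label)
    return X, y


X_pos, y_pos = load_samples("frontal", +1)    # positive samples (frontal faces)
X_neg, y_neg = load_samples("skewed", -1)     # negative samples (skewed faces)
clf = SVC(kernel="linear").fit(np.vstack(X_pos + X_neg), y_pos + y_neg)
```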
The face skew angle in each skewed face picture is then compared with the preset skew angle threshold (e.g. 45 degrees). If the face skew angle is greater than or equal to 45 degrees, the skewed face picture is rejected; if it is smaller than 45 degrees, the skewed face picture is corrected according to the 5 reference feature points set by a standard face to obtain a corrected face picture. The frontal face pictures and the corrected face pictures are then flipped and normalized to obtain a normalized picture corresponding to each face picture.
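The correction and preprocessing just described might be sketched as follows; the five-point reference template, the 112x112 output size, and the pixel scaling follow a common ArcFace-style convention and are assumptions, not values specified in the patent.

```python
# Sketch of the alignment / flip / normalisation step (values are illustrative).
import cv2
import numpy as np

# Canonical five-point template commonly used for ArcFace-style 112x112 alignment.
REF_5PTS = np.array([[38.29, 51.70], [73.53, 51.50], [56.03, 71.74],
                     [41.55, 92.37], [70.73, 92.20]], dtype=np.float32)


def align_and_normalize(img, landmarks, size=(112, 112)):
    """Warp the face so its five landmarks match the reference template."""
    src = np.asarray(landmarks, dtype=np.float32)       # detected 5 points
    M, _ = cv2.estimateAffinePartial2D(src, REF_5PTS)   # similarity transform
    aligned = cv2.warpAffine(img, M, size)
    flipped = cv2.flip(aligned, 1)                       # horizontally flipped copy
    # Scale pixel values to [-1, 1] before feeding the recognition model.
    norm = lambda x: (x.astype(np.float32) - 127.5) / 127.5
    return norm(aligned), norm(flipped)
```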
Extracting and matching the face features: for the preprocessed face pictures, extracting a multidimensional feature vector of each face in each face picture by adopting a face recognition model Arcface to obtain multidimensional feature vectors of a plurality of faces, respectively calculating the minimum Euclidean distance between the multidimensional feature vector of each person appearing under the lens in the natural monitoring video stream in each face picture and the multidimensional feature vectors in all the face pictures in a face information base according to the multidimensional feature vectors, comparing the calculated minimum Euclidean distance with a preset distance threshold, and recognizing the identity of each person in the natural monitoring video stream according to the face picture in the face information base corresponding to the minimum Euclidean distance if the minimum Euclidean distance is smaller than or equal to the preset distance threshold.
Specifically, for the normalized pictures, the face recognition model ArcFace is used to extract a 128-dimensional feature vector (a 1x128 vector) for each face in each face picture, yielding face pictures with 128-dimensional feature vectors. These 128-dimensional feature vectors mainly encode distinctive facial image features such as the face contour, skin texture, skin color, the facial features, and edges and corners.
Then, according to the multi-dimensional feature vectors, the minimum Euclidean distance between the multi-dimensional feature vector of each person appearing in the natural surveillance video stream and the multi-dimensional feature vectors of all face pictures in the face information base is calculated. The Euclidean distance is calculated according to the following formula:

D(X_1, X_2) = \|X_1 - X_2\|_2 \qquad (2)

In the above formula, X_1 and X_2 are two face feature vectors extracted by ArcFace. More concretely, for the feature vectors X_1 = \{x_{1,1}, x_{1,2}, \ldots, x_{1,128}\} and X_2 = \{x_{2,1}, x_{2,2}, \ldots, x_{2,128}\}, the Euclidean distance is

D(X_1, X_2) = \sqrt{\sum_{k=1}^{128} \left(x_{1,k} - x_{2,k}\right)^2}

where x_{i,k} denotes the k-th component of feature vector X_i.
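The matching rule can be sketched as follows; the gallery layout (one embedding matrix per person) is an assumption, and the default threshold of 0.8 anticipates the value reported later in the description.

```python
# Sketch of minimum-Euclidean-distance matching against the face information base.
import numpy as np


def match_identity(query_vec, gallery, threshold=0.8):
    """gallery: dict mapping person id -> array of shape (n_pics, 128)."""
    best_id, best_dist = None, np.inf
    for person_id, vecs in gallery.items():
        dists = np.linalg.norm(vecs - query_vec, axis=1)  # Euclidean distances
        d = dists.min()
        if d < best_dist:
            best_id, best_dist = person_id, d
    # Accept the match only if the smallest distance is within the threshold.
    return (best_id, best_dist) if best_dist <= threshold else (None, best_dist)
```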
The goal of ArcFace training is to shorten the distance between different face pictures of the same person and, conversely, to increase the distance between faces of different persons. By putting the labelled face pictures collected in the natural scene into the model for training, an optimized ArcFace model can be obtained.
After the minimum Euclidean distance is calculated, it is compared with the preset distance threshold: when the minimum distance between the face picture of an unknown person in the natural surveillance video stream and all face pictures in the known face information base is smaller than the distance threshold, the identity associated with the face in the known face information base corresponding to that minimum distance is taken as the identity of the unknown person.
It should be noted that the distance threshold is generally determined by splitting off a verification set and comparing the face recognition accuracy obtained under different candidate thresholds. The value range of the distance threshold is 0 < ε ≤ 2.
Assume that the verification set contains M persons with 5 face pictures each. One face picture per person is extracted to form a face picture library of M faces whose identities are to be determined, and the remaining N (N = 4 x M) faces serve as known-identity faces for comparison. From these two groups of face picture samples, ArcFace yields N feature vectors

D_N = \{X_1, X_2, \ldots, X_N\}

and M feature vectors of identities to be confirmed

D_M = \{X'_1, X'_2, \ldots, X'_M\}

The Euclidean distance between each feature vector in D_M and all feature vectors in D_N is calculated according to formula (2), and the minimum Euclidean distance of each comparison is retained:

Min_i = \min\{Dis_{i,1}, Dis_{i,2}, \ldots, Dis_{i,N}\}, \quad i \in \{1, 2, \ldots, M\} \qquad (3)
When Min_i ≤ ε, the pair of faces corresponding to Min_i is judged to be the same person, and the recognition is counted as correct. The accuracy can be expressed as:

P_{acc} = 100\% \times M_T / M \qquad (4)

where M_T is the number of correctly recognized face identities. By testing different values of ε, it is found that when ε = 0.8, P_acc reaches a maximum of 99.2%. Therefore, in a natural scene the face distance threshold is taken as 0.8; for other application scenarios, the distance threshold can be determined by the same method to obtain the optimal balance between the miss rate and the accuracy rate.
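A sketch of this threshold-selection procedure on a verification set is given below; the candidate grid and the function and variable names are illustrative assumptions.

```python
# Sketch of choosing epsilon by verification-set accuracy (P_acc).
import numpy as np


def pick_threshold(gallery_vecs, gallery_ids, probe_vecs, probe_ids,
                   candidates=np.arange(0.2, 2.01, 0.05)):
    """Return the epsilon that maximises verification accuracy P_acc.

    gallery_vecs: (N, 128) known-identity embeddings, gallery_ids: length-N ids;
    probe_vecs: (M, 128) embeddings of identities to confirm, probe_ids: length-M ids.
    """
    # Minimum distance and matched identity for every probe face.
    best = []
    for v in probe_vecs:
        d = np.linalg.norm(gallery_vecs - v, axis=1)
        best.append((d.min(), gallery_ids[d.argmin()]))
    best_eps, best_acc = None, -1.0
    for eps in candidates:
        correct = sum(1 for (d, pred), true in zip(best, probe_ids)
                      if d <= eps and pred == true)
        acc = correct / len(probe_ids)
        if acc > best_acc:
            best_eps, best_acc = eps, acc
    return best_eps, best_acc
```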
It should be noted that, before the method of the invention is run, a face information base is established in advance; it contains person identity information and a face picture library of the corresponding persons (5-10 pictures per person). When the system starts, the 128-dimensional feature vector of the face in each picture is extracted by the ArcFace network and stored in the computer's memory, so that results can be obtained quickly during matching.
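For illustration, the pre-extraction of gallery embeddings could be organised as in the sketch below; the directory layout and the embed() callable standing in for the ArcFace network are assumptions.

```python
# Sketch of building the in-memory face information base at start-up.
import glob

import numpy as np


def build_face_base(root, embed):
    """root/<person_id>/*.jpg -> dict person_id -> (n, 128) embedding matrix."""
    base = {}
    for person_dir in glob.glob(f"{root}/*/"):
        person_id = person_dir.rstrip("/").split("/")[-1]
        vecs = [embed(path) for path in glob.glob(f"{person_dir}/*.jpg")]
        if vecs:
            base[person_id] = np.vstack(vecs)
    return base
```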
The invention also relates to a face recognition system under natural monitoring based on deep learning, which corresponds to the face recognition method under natural monitoring based on deep learning and can be understood as a system for realizing the method, the system comprises a data acquisition module, an image processing module and a face feature extraction and matching module which are connected in sequence, specifically,
the data acquisition module is used for acquiring a natural monitoring video stream to be detected, identifying a plurality of face pictures of a plurality of persons in the natural monitoring video stream according to a face detection algorithm, and extracting key feature points from each face picture;
the image processing module is used for distinguishing a front face image and a skew face image in the face image by adopting an SVM (support vector machine) algorithm based on the key feature points, comparing the face skew angle in the skew face image with a preset skew angle threshold value, if the face skew angle is greater than or equal to the preset skew angle threshold value, rejecting the skew face image which is greater than or equal to the preset skew angle threshold value, and if the face skew angle is less than the preset skew angle threshold value, correcting the skew face image which is less than the preset skew angle threshold value according to a reference feature point set by a standard face to obtain a corrected face image, and preprocessing the front face image and the corrected face image;
and the face feature extraction matching module is used for extracting the multidimensional feature vector of each face in each face picture by adopting a face recognition model Arcface for the preprocessed face pictures to obtain the multidimensional feature vectors of a plurality of faces, respectively calculating the minimum Euclidean distance between the multidimensional feature vector of each person in each face picture in the natural monitoring video stream and the multidimensional feature vectors in all the face pictures in the face information base according to the multidimensional feature vectors, comparing the calculated minimum Euclidean distance with a preset distance threshold, and recognizing the identity of each person in the natural monitoring video stream according to the face picture in the face information base corresponding to the minimum Euclidean distance if the minimum Euclidean distance is less than or equal to the preset distance threshold.
Preferably, the key feature points include left eye, right eye, nose tip, left mouth corner and right mouth corner.
Preferably, in the image processing module, after the SVM support vector machine algorithm is adopted to distinguish the positive face image and the skewed face image in the face image, the collected positive face image and the skewed face image of each person are respectively marked as a positive sample and a negative sample in sequence, and the face images of the same person are summarized in the same folder.
Preferably, the preprocessing includes turning over and normalizing each face picture to obtain a normalized picture corresponding to each face picture.
Preferably, the face information library includes person identity information and a face picture library of persons.
The invention provides an objective and scientific face recognition method and system under natural monitoring based on deep learning, whose innovations lie mainly in three aspects of face recognition under natural monitoring: data preparation, deep learning model training, and the face recognition workflow. Based on a surveillance video stream in a natural scene, a face detection algorithm identifies the face pictures of multiple persons in the stream and extracts the key feature points of each face picture; an SVM (support vector machine) algorithm and the face recognition model ArcFace are then used in turn for face picture processing and face picture feature extraction, and the identities of the corresponding persons are matched. Multiple faces in the view of a surveillance camera in a natural scene can thus be recognized simultaneously and quickly, and more accurate matching results can be obtained.
It should be noted that the above-mentioned embodiments enable a person skilled in the art to more fully understand the invention, but do not limit it in any way. Therefore, although the present invention has been described in detail with reference to the drawings and examples, it will be understood by those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.

Claims (10)

1. A face recognition method under natural monitoring based on deep learning is characterized by comprising the following steps:
a data acquisition step: acquiring a natural monitoring video stream to be detected, identifying a plurality of face pictures of a plurality of persons in the natural monitoring video stream according to a face detection algorithm, and extracting key feature points from each face picture;
picture processing: based on key feature points, distinguishing a front face picture and a skew face picture in the face picture by adopting an SVM (support vector machine) algorithm, comparing a face skew angle in the skew face picture with a preset skew angle threshold, if the face skew angle is greater than or equal to the preset skew angle threshold, removing the skew face picture which is greater than or equal to the preset skew angle threshold, and if the face skew angle is less than the preset skew angle threshold, correcting the skew face picture and the front face picture which are less than the preset skew angle threshold according to a reference feature point set by a standard face to obtain a corrected face picture, and preprocessing the corrected face picture;
extracting and matching the face features: for the preprocessed face pictures, extracting a multidimensional feature vector of each face in each face picture by adopting a face recognition model Arcface to obtain multidimensional feature vectors of a plurality of faces, respectively calculating the minimum Euclidean distance between the multidimensional feature vector of each person in each face picture in the natural monitoring video stream and the multidimensional feature vectors in all face pictures in a face information base according to the multidimensional feature vectors, comparing the calculated minimum Euclidean distance with a preset distance threshold, and if the minimum Euclidean distance is smaller than or equal to the preset distance threshold, recognizing the identity of each person in the natural monitoring video stream according to the face picture in the face information base corresponding to the minimum Euclidean distance.
2. The method according to claim 1, wherein in the data collection step, the key feature points include left eye, right eye, nose tip, left mouth corner, and right mouth corner.
3. The method for recognizing the human face under the natural monitoring based on the deep learning as claimed in claim 1, wherein in the picture processing step, after a positive face picture and a skewed human face picture in the human face picture are distinguished by adopting an SVM (support vector machine) algorithm, the collected positive face picture and the skewed human face picture of each person are respectively marked as a positive sample and a negative sample in turn, and the human face pictures of the same person are summarized under the same folder.
4. The method according to claim 1, wherein in the image processing step, the preprocessing includes turning over and normalizing each face image to obtain a normalized image corresponding to each face image.
5. The natural monitoring face recognition method based on deep learning of claim 1, wherein in the face feature extraction and matching step, the face information base comprises person identity information and a face picture base of persons.
6. A human face recognition system under natural monitoring based on deep learning is characterized by comprising a data acquisition module, an image processing module and a human face feature extraction and matching module which are sequentially connected,
the data acquisition module is used for acquiring a natural monitoring video stream to be detected, identifying a plurality of face pictures of a plurality of persons in the natural monitoring video stream according to a face detection algorithm, and extracting key feature points from each face picture;
the image processing module is used for distinguishing a front face image and a skew face image in the face image by adopting an SVM (support vector machine) algorithm based on the key feature points, comparing the skew angle of the face in the skew face image with a preset skew angle threshold value, if the skew angle of the face is greater than or equal to the preset skew angle threshold value, eliminating the skew face image which is greater than or equal to the preset skew angle threshold value, and if the skew angle of the face is less than the preset skew angle threshold value, correcting the skew face image and the front face image which are less than the preset skew angle threshold value according to a reference feature point set by a standard face to obtain a corrected face image, and preprocessing the corrected face image;
and the face feature extraction matching module is used for extracting the multidimensional feature vector of each face in each face picture by adopting a face recognition model Arcface for the preprocessed face pictures to obtain the multidimensional feature vectors of a plurality of faces, respectively calculating the minimum Euclidean distance between the multidimensional feature vector of each person in each face picture in the natural monitoring video stream and the multidimensional feature vectors in all the face pictures in the face information base according to the multidimensional feature vectors, comparing the calculated minimum Euclidean distance with a preset distance threshold, and recognizing the identity of each person in the natural monitoring video stream according to the face picture in the face information base corresponding to the minimum Euclidean distance if the minimum Euclidean distance is less than or equal to the preset distance threshold.
7. The deep learning based natural surveillance system according to claim 6, wherein the key feature points include left eye, right eye, nose tip, left mouth corner and right mouth corner.
8. The deep learning-based natural monitoring face recognition system according to claim 6, wherein in the image processing module, before a positive face image and a skewed face image in the face image are distinguished by using an SVM (support vector machine) algorithm, the collected positive face image and skewed face image of each person are respectively marked as a positive sample and a negative sample in sequence, and the face images of the same person are summarized in the same folder.
9. The deep learning-based face recognition system under natural monitoring according to claim 6, wherein the preprocessing includes flipping and normalizing each face picture to obtain a normalized picture corresponding to each face picture.
10. The deep learning based natural surveillance system according to claim 6, wherein the face information base comprises personnel identity information and a face picture base of personnel.
CN202211143361.2A 2022-09-20 2022-09-20 Face recognition method and system under natural monitoring based on deep learning Pending CN115512408A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211143361.2A CN115512408A (en) 2022-09-20 2022-09-20 Face recognition method and system under natural monitoring based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211143361.2A CN115512408A (en) 2022-09-20 2022-09-20 Face recognition method and system under natural monitoring based on deep learning

Publications (1)

Publication Number Publication Date
CN115512408A (en) 2022-12-23

Family

ID=84504522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211143361.2A Pending CN115512408A (en) 2022-09-20 2022-09-20 Face recognition method and system under natural monitoring based on deep learning

Country Status (1)

Country Link
CN (1) CN115512408A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination