CN112149557B - Person identity tracking method and system based on face recognition


Info

Publication number
CN112149557B
CN112149557B (application CN202011000236.7A)
Authority
CN
China
Prior art keywords
face
identity
tracking
person
frame
Prior art date
Legal status
Active
Application number
CN202011000236.7A
Other languages
Chinese (zh)
Other versions
CN112149557A (en)
Inventor
柯逍
林炳辉
陈宇杰
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202011000236.7A
Publication of CN112149557A
Application granted
Publication of CN112149557B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/167Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation


Abstract

The invention relates to a person identity tracking method and system based on face recognition, comprising the following steps: training a neural network on a face data set; acquiring face pictures of the persons whose identities are to be recognized, and constructing a face identity library; detecting the face position in each frame of an input video using a trained yolov3 face detection model; extracting features from each detected face with the trained neural network and comparing them with the face features in the library to determine identity, then initializing the face targets to be tracked; and tracking the identity of the person corresponding to each face. The invention can confirm the id of the person for each tracked target.

Description

Person identity tracking method and system based on face recognition
Technical Field
The invention relates to the technical field of machine vision, in particular to a person identity tracking method and system based on face recognition.
Background
In recent years, with social progress and the continuous development of science and technology, face recognition has remained a popular research field, studied in depth by many experts at home and abroad. As the entry point and foundation of face recognition, techniques such as face detection, alignment, and tracking have developed alongside it. Face recognition is widely applied in practical scenarios such as intelligent monitoring, video conferencing, and access control systems; however, because real scenes have complex backgrounds that vary with illumination, occlusion, and changes in human posture, face recognition in video from real monitoring systems still poses certain challenges.
Meanwhile, object tracking algorithms have developed rapidly in recent years and are widely applied in monitoring scenes, where the demands of intelligent security are high. However, most current tracking algorithms track people only at the pedestrian level, and the id of the target often switches during tracking.
Disclosure of Invention
In view of the above, the present invention is directed to a method and a system for tracking person identity based on face recognition, in which the id of the person can be confirmed for each tracked target.
The invention is realized by adopting the following scheme: a person identity tracking method based on face recognition specifically comprises the following steps:
training a neural network by adopting a face data set; acquiring face pictures of the persons whose identities are to be recognized, and constructing a face identity library to be recognized;
detecting the face position of each frame of image by using a trained yolov3 face detection model according to an input video frame;
extracting features of the detected face by using a trained neural network, comparing the features with face features in a face identity library to be recognized to determine identity, and initializing a face target to be tracked;
and tracking the identity of the person corresponding to the face.
Further, the training of the neural network by using the face data set specifically includes:
collecting a public face data set to obtain pictures of related persons and corresponding person names;
taking the size of the face images in the face data set to be 112 × 112 and using resnet as the backbone network, the loss function is set as follows:

L = −(1/m) · Σ_{i=1}^{m} log( e^{s·cos(θ_{yi} + t)} / ( e^{s·cos(θ_{yi} + t)} + Σ_{j=1, j≠yi}^{n} e^{s·cos θ_j} ) )

where m is the number of samples, i indexes the ith sample, n is the number of classes, j indexes the jth class, e^{s·cos(θ_{yi} + t)} is the score of the class to which the ith sample belongs, y_i is the class to which the ith sample belongs, s is a normalization parameter, i.e., a scaling factor, cos θ_{yi} is the cosine of the angle between the weight W_{yi} and the feature vector x_i, where the weights W_j and feature vectors x_i have been normalized to 1, and t is an introduced hyperparameter used to limit the included angle between different classes.
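As a concrete reading of this loss (the equation appears only as an image in the original filing, so the additive-angular-margin form above is a reconstruction from the variable definitions), a minimal PyTorch sketch with illustrative values s = 64 and t = 0.5 might be:

```python
import torch
import torch.nn.functional as F

def margin_loss(x, W, labels, s=64.0, t=0.5):
    """Sketch of the described loss. x: (m, d) features, W: (n, d) class
    weights, labels: (m,) ground-truth classes. The defaults for s and t
    are illustrative; the filing fixes neither value.
    """
    x = F.normalize(x, dim=1)                 # ||x_i|| = 1
    W = F.normalize(W, dim=1)                 # ||W_j|| = 1
    cosine = x @ W.t()                        # (m, n) entries cos(theta_j)
    theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
    # add the margin t only to the angle of the ground-truth class
    one_hot = F.one_hot(labels, num_classes=W.size(0)).bool()
    logits = torch.where(one_hot, torch.cos(theta + t), cosine)
    return F.cross_entropy(s * logits, labels)  # softmax over scaled scores
```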
Further, the constructing of the face identity library to be recognized specifically includes: selecting a face image of each target person to be tracked, using the person's name as the file name, and placing the face images under a designated folder as the image library of persons to be tracked; the library contains k persons with corresponding names name_1, name_2, ..., name_k.
Further, the detecting the face position of each frame of image by using the trained yolov3 face detection model according to the input video frame specifically comprises:
selecting an image of a first frame of a video stream;
calling a pre-trained yolov3 face detection model, which resizes the input picture to 448 × 448 and divides it evenly into a 7 × 7 = 49 grid, each cell being 64 × 64;

for each grid cell, 2 bounding boxes are predicted, each with five basic parameters (x, y, w, h, confidence), where (x, y) is the center coordinate of the bounding box, (w, h) its width and height, and confidence its confidence score;

from the 7 × 7 × 2 = 98 bounding boxes predicted in the previous step, discarding those whose confidence is below the preset threshold of 0.7, then removing redundant windows with non-maximum suppression; the remaining bounding boxes serve as face detection frames, giving the positions of the faces in the image.
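A minimal sketch of this post-processing step, assuming corner-format boxes (the center-format (x, y, w, h) boxes above would be converted first) and an illustrative NMS overlap threshold of 0.5, since the filing fixes only the 0.7 confidence threshold:

```python
import numpy as np

def postprocess(boxes, scores, conf_thresh=0.7, iou_thresh=0.5):
    """Confidence filtering followed by non-maximum suppression.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    iou_thresh is an assumed value, not given in the text.
    """
    keep = scores >= conf_thresh
    boxes, scores = boxes[keep], scores[keep]
    order = scores.argsort()[::-1]            # highest confidence first
    selected = []
    while order.size > 0:
        i = order[0]
        selected.append(i)
        # IoU of the top box against all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]  # drop overlapping windows
    return boxes[selected], scores[selected]
```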
Further, the step of extracting features of the detected face by using a trained neural network, and comparing the features with the face features in the face identity library to be recognized to determine the identity specifically comprises the following steps:
cropping the image at the face position, aligning the face by a similarity transformation, resizing the crop to 112 × 112, and feeding it into the trained neural network to obtain a feature vector a;

feeding the k pictures in the face identity library to be recognized into the trained neural network to obtain k output feature vectors b_1, b_2, ..., b_k, where k is the number of faces in the library;

computing the cosine similarity between feature vector a and each of b_1, b_2, ..., b_k; the b_i with the highest cosine similarity, provided it exceeds the set threshold of 0.8, identifies the face matched to feature a; otherwise the face corresponding to feature a is set as a stranger.
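A minimal sketch of this matching rule, assuming the library features b_1, ..., b_k are stacked as rows of a matrix:

```python
import numpy as np

def identify(a, gallery, names, threshold=0.8):
    """Match feature a against library vectors b_1..b_k by cosine similarity.

    gallery: (k, d) matrix of library features; names: list of k names.
    Returns the matched name, or "stranger" if nothing exceeds 0.8.
    """
    a = a / np.linalg.norm(a)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ a                      # cosine similarity to each b_i
    best = int(np.argmax(sims))
    return names[best] if sims[best] > threshold else "stranger"
```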
Further, the tracking the person identity corresponding to the face specifically includes:
representing the target state of each tracked face as:

m′ = (u, v, s, r, u̇, v̇, ṡ, ṙ)ᵀ

where m′ is the tracked face target state, u and v are the center coordinates of the tracked face region, s is the aspect ratio of the face frame, r is the height of the face frame, and u̇, v̇, ṡ, ṙ are the respective velocities of (u, v, s, r) in the image coordinate space;
allocating a tracker to each face detection frame to be tracked and setting a counter; the counter is incremented at each Kalman filter prediction, and as soon as a tracker is matched with a yolov3 face detection result, its counter is reset to 0; if a tracker cannot be matched with any yolov3 face detection result within a preset period of time, namely 30 frames, its track is deleted from the track list;
and feeding the track boxes in the track list into the trained neural network in real time to detect the id of the face.
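A minimal sketch of the tracker bookkeeping just described, with the Kalman predict/update mathematics elided (the names Track, misses, and prune are illustrative, not from the filing):

```python
class Track:
    """One tracked face. Any constant-velocity Kalman filter over the
    state (u, v, s, r, du, dv, ds, dr) can supply predict/update; only
    the counter logic named in the text is shown here.
    """
    MAX_AGE = 30  # frames allowed without a matching yolov3 detection

    def __init__(self, track_id, state):
        self.id = track_id
        self.state = state      # (u, v, s, r, du, dv, ds, dr)
        self.misses = 0         # the counter from the description

    def predict(self):
        self.misses += 1        # incremented at each Kalman prediction

    def update(self, detection):
        self.state = detection  # matched a detection: refresh state ...
        self.misses = 0         # ... and reset the counter to 0

def prune(tracks):
    """Delete tracks that went 30 frames without a detection match."""
    return [t for t in tracks if t.misses <= Track.MAX_AGE]
```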
Further, the matching of the tracking result and the detection result is realized by adopting the following method:
linear weighting of three metric approaches is used as the final metric value:
d(i1, j1) = α·d⁽¹⁾(i1, j1) + β·d⁽²⁾(i1, j1) + (1 − α − β)·d⁽³⁾(i1, j1);

where d⁽¹⁾(i1, j1) is the position metric between tracking result c_i1 and detection result d_j1, d⁽²⁾(i1, j1) is the appearance metric between them, d⁽³⁾(i1, j1) is the velocity metric between them, and α and β are weighting coefficients;

if d(i1, j1) is less than the set threshold of 0.3, tracking result c_i1 and detection result d_j1 are judged to match.
Further, the velocity metric between tracking result c_i1 and detection result d_j1 is calculated using the following equation:

d⁽³⁾(i1, j1) = ‖d_j1 − c_i1‖ / f

where ‖d_j1 − c_i1‖ is the distance between tracking result c_i1 and detection result d_j1, and f is the number of frames between them.
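A minimal sketch of the fused metric and the match test, with illustrative weighting coefficients α = β = 0.4, since the filing leaves α and β unspecified:

```python
import numpy as np

def velocity_metric(c, d, f):
    """d3: distance between track c and detection d divided by the f
    frames separating them, approximating speed of motion."""
    return float(np.linalg.norm(np.asarray(d) - np.asarray(c))) / f

def fused_distance(d_pos, d_app, d_vel, alpha=0.4, beta=0.4):
    """Linear weighting of the three metrics, as in the formula above."""
    return alpha * d_pos + beta * d_app + (1 - alpha - beta) * d_vel

def is_match(d_pos, d_app, d_vel, threshold=0.3):
    """A track/detection pair matches when the fused value is below 0.3."""
    return fused_distance(d_pos, d_app, d_vel) < threshold
```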
The invention also provides a person identity tracking system based on face recognition, comprising a processor, a memory and computer program instructions stored on the memory and capable of being executed by the processor, wherein when the computer program instructions are executed by the processor, the steps of the method are realized.
The present invention also provides a computer readable storage medium having stored thereon computer program instructions executable by a processor, the computer program instructions, when executed by the processor, performing the method steps as described above.
Compared with the prior art, the invention has the following beneficial effects: the invention can confirm the id, namely the name, of the person for each tracked target, and if the id of a tracked target changes during tracking, the person's identity can be reconfirmed by face recognition. Meanwhile, the idea of object tracking is used to predict the motion trajectory of the face, avoiding the tracking-box lag that frame-by-frame face recognition would incur.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure herein. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the embodiment provides a person identity tracking method based on face recognition, which specifically includes the following steps:
training a neural network by adopting a face data set; acquiring face pictures of the persons whose identities are to be recognized, and constructing a face identity library to be recognized;
detecting the face position of each frame of image by using a trained yolov3 face detection model according to an input video frame;
extracting features of the detected face by using a trained neural network, comparing the features with face features in a face identity library to be recognized to determine identity, and initializing a face target to be tracked;
and tracking the identity of the person corresponding to the face.
In this embodiment, the training of the neural network by using the face data set specifically includes:
collecting a public face data set to obtain pictures of related persons and corresponding person names;
the face images in the face data set are resized to 112 × 112 and resnet is used as the backbone network; the total batch size during training is set to 512; the learning rate starts at 0.1 and is reduced by one order of magnitude at 100,000, 140,000, and 160,000 iterations, with 200,000 iterations in total; momentum is 0.9 and weight decay is 5e-4. The loss function is set as follows:

L = −(1/m) · Σ_{i=1}^{m} log( e^{s·cos(θ_{yi} + t)} / ( e^{s·cos(θ_{yi} + t)} + Σ_{j=1, j≠yi}^{n} e^{s·cos θ_j} ) )

where m is the number of samples, i indexes the ith sample, n is the number of classes, j indexes the jth class, e^{s·cos(θ_{yi} + t)} is the score of the class to which the ith sample belongs, y_i is the class to which the ith sample belongs, s is a normalization parameter, i.e., a scaling factor, cos θ_{yi} is the cosine of the angle between the weight W_{yi} and the feature vector x_i, where the weights W_j and feature vectors x_i have been normalized to 1, and t is an introduced hyperparameter used to limit the included angle between different classes.
In this embodiment, the constructing of the face identity library to be recognized specifically includes: selecting a face image of each target person to be tracked, using the person's name as the file name, and placing the face images under a designated folder as the image library of persons to be tracked; the library contains k persons with corresponding names name_1, name_2, ..., name_k.
In this embodiment, the detecting, according to the input video frame, the face position of each frame of image using the trained yolov3 face detection model specifically includes:
selecting an image of a first frame of a video stream;
calling a pre-trained yolov3 face detection model, which resizes the input picture to 448 × 448 and divides it evenly into a 7 × 7 = 49 grid, each cell being 64 × 64;

for each grid cell, 2 bounding boxes are predicted, each with five basic parameters (x, y, w, h, confidence), where (x, y) is the center coordinate of the bounding box, (w, h) its width and height, and confidence its confidence score;

from the 7 × 7 × 2 = 98 bounding boxes predicted in the previous step, discarding those whose confidence is below the preset threshold of 0.7, then removing redundant windows with non-maximum suppression; the remaining bounding boxes serve as face detection frames, i.e., the positions of the faces in the image are obtained.
In this embodiment, the extracting features of the detected face by using the trained neural network, and comparing the extracted features with the face features in the face identity library to be recognized to determine the identity specifically includes:
cropping the image at the face position, aligning the face by a similarity transformation, resizing the crop to 112 × 112, and feeding it into the trained neural network to obtain a feature vector a;

feeding the k pictures in the face identity library to be recognized into the trained neural network to obtain k output feature vectors b_1, b_2, ..., b_k, where k is the number of faces in the library;

computing the cosine similarity between feature vector a and each of b_1, b_2, ..., b_k according to the following formula:

cos(a, b_i) = (a · b_i) / (‖a‖ · ‖b_i‖)

the largest of the k similarities identifies the matched face; if the similarity of feature vector a to every feature vector b does not exceed the threshold, the face corresponding to a is a stranger, i.e., not in the library.
In this embodiment, the tracking the person identity corresponding to the face specifically includes:
representing the target state of each tracked face as:

m′ = (u, v, s, r, u̇, v̇, ṡ, ṙ)ᵀ

where m′ is the tracked face target state, u and v are the center coordinates of the tracked face region, s is the aspect ratio of the face frame, r is the height of the face frame, and u̇, v̇, ṡ, ṙ are the respective velocities of (u, v, s, r) in the image coordinate space;

allocating a tracker to each face detection frame to be tracked and setting a counter; the counter is incremented at each Kalman filter prediction, and as soon as a tracker is matched with a yolov3 face detection result, its counter is reset to 0; if a tracker cannot be matched with any yolov3 face detection result within a preset period of time, namely 30 frames, its track is deleted from the track list;

and feeding the track boxes in the track list into the trained neural network in real time to detect the id of the face.
In this embodiment, the matching between the tracking result and the detection result is implemented by the following method:
linear weighting of three metric approaches is used as the final metric value:
d(i1, j1) = α·d⁽¹⁾(i1, j1) + β·d⁽²⁾(i1, j1) + (1 − α − β)·d⁽³⁾(i1, j1);

where d⁽¹⁾(i1, j1) is the position metric between tracking result c_i1 and detection result d_j1, d⁽²⁾(i1, j1) is the appearance metric between them, d⁽³⁾(i1, j1) is the velocity metric between them, and α and β are weighting coefficients;

if d(i1, j1) is less than the set threshold of 0.3, tracking result c_i1 and detection result d_j1 are judged to match; the faces in all video frames are tracked by this method.
The matching of tracking frames uses a position factor and an appearance factor; the Mahalanobis distance is used for the position metric:

d⁽¹⁾(i1, j1) = (d_j1 − c_i1)ᵀ S⁻¹ (d_j1 − c_i1)

the Mahalanobis distance is computed between the object detection frame d_j1 and the object tracking frame c_i1, where S is a covariance matrix and i1, j1 are indices;
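A minimal sketch of this position metric, assuming the tracking frame is summarized by a predicted state mean and its covariance S:

```python
import numpy as np

def position_metric(detection, track_mean, track_cov):
    """d1: squared Mahalanobis distance between a detection vector and
    the track's predicted state distribution (mean, covariance S)."""
    diff = np.asarray(detection) - np.asarray(track_mean)
    return float(diff @ np.linalg.inv(track_cov) @ diff)
```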
For the appearance metric, a corresponding 128-dimensional feature vector r_j1 is computed for each detection box d_j1 through a CNN, and a list is constructed for each tracked target storing the feature vectors of the last 100 frames successfully associated with that target. The appearance metric is then the minimum cosine distance between the tracker's last 100 successfully associated feature vectors and the feature vector of the current frame's detection result:

d⁽²⁾(i1, j1) = min{ 1 − r_j1ᵀ r_k1⁽ⁱ¹⁾ | r_k1⁽ⁱ¹⁾ ∈ R_i1 }

where i1, j1, k1 are indices and R_i1 denotes the set of feature vectors.
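A minimal sketch of the appearance metric and the bounded per-track feature list, assuming the CNN features are already L2-normalized:

```python
import numpy as np
from collections import deque

GALLERY_LEN = 100  # keep the last 100 successfully associated features

def appearance_metric(track_features, r_det):
    """d2: smallest cosine distance between the detection's 128-d feature
    r_det and the track's stored feature vectors (all unit-norm)."""
    r_det = r_det / np.linalg.norm(r_det)
    return min(1.0 - float(np.dot(r, r_det)) for r in track_features)

# per-track storage as a bounded deque, appended on each successful match:
# features = deque(maxlen=GALLERY_LEN); features.append(r_128d)
```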
The velocity metric between tracking result c_i1 and detection result d_j1 is calculated using the following equation:

d⁽³⁾(i1, j1) = ‖d_j1 − c_i1‖ / f

where ‖d_j1 − c_i1‖ is the distance between tracking result c_i1 and detection result d_j1, and f is the number of frames between them. Dividing the distance by f represents the moving speed and direction of the detected object, which better resolves the problem of tracking-id switching when similar-looking people meet.
The present embodiment also provides a person identification tracking system based on face recognition, comprising a processor, a memory, and computer program instructions stored on the memory and capable of being executed by the processor, wherein when the computer program instructions are executed by the processor, the steps of the method as described above are implemented.
The present embodiments also provide a computer readable storage medium having stored thereon computer program instructions executable by a processor, the computer program instructions, when executed by the processor, performing the method steps as described above.
This embodiment focuses on computer-vision recognition and tracking of faces in monitoring scenes, using yolov3 as the face detector to improve detection efficiency. Face recognition is combined with tracking: recognition determines the person's identity during tracking, and that identity is used to restore the id, reducing frequent id switches of the target; the added velocity metric strengthens the constraint on tracking matching, which is of innovative significance. The method of this embodiment has high accuracy and good timeliness, and has practical application value for face recognition and tracking.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. However, any simple modification, equivalent change and modification of the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention.

Claims (6)

1. A person identity tracking method based on face recognition is characterized by comprising the following steps:
training a neural network by adopting a face data set; acquiring face pictures of the persons whose identities are to be recognized, and constructing a face identity library to be recognized;
detecting the face position of each frame of image by using a trained yolov3 face detection model according to an input video frame;
extracting features of the detected face by using a trained neural network, comparing the features with face features in a face identity library to be recognized to determine identity, and initializing a face target to be tracked;
tracking the identity of a person corresponding to the face;
the neural network trained by adopting the face data set specifically comprises the following steps:
collecting a public face data set to obtain pictures of related persons and corresponding person names;
taking the size of the face images in the face data set to be 112 × 112 and using resnet as the backbone network, the loss function is set as follows:

L = −(1/m) · Σ_{i=1}^{m} log( e^{s·cos(θ_{yi} + t)} / ( e^{s·cos(θ_{yi} + t)} + Σ_{j=1, j≠yi}^{n} e^{s·cos θ_j} ) )

where m is the number of samples, i indexes the ith sample, n is the number of classes, j indexes the jth class, e^{s·cos(θ_{yi} + t)} is the score of the class to which the ith sample belongs, y_i is the class to which the ith sample belongs, s is a normalization parameter, i.e., a scaling factor, cos θ_{yi} is the cosine of the angle between the weight W_{yi} and the feature vector x_i, where the weights W_j and feature vectors x_i have been normalized to 1, and t is an introduced hyperparameter used to limit the included angle between different classes;
the tracking of the person identity corresponding to the face specifically comprises the following steps:
representing the target state of each tracked face as:

m′ = (u, v, s′, r, u̇, v̇, ṡ′, ṙ)ᵀ

where m′ is the tracked face target state, u and v are the center coordinates of the tracked face region, s′ is the aspect ratio of the face frame, r is the height of the face frame, and u̇, v̇, ṡ′, ṙ are the respective velocities of (u, v, s′, r) in the image coordinate space;
allocating a tracker for each face detection frame to be tracked, setting a counter, increasing the counter during Kalman filtering prediction, and once the face detection results of one face detection frame tracker and yolov3 can be matched, resetting the counter corresponding to the face detection frame tracker to be 0; if a face detection frame tracker fails to match the face detection result of yolov3 within a preset period of time, deleting the track of the face detection frame tracker from the track list;
transmitting the track frame in the track list into a trained neural network in real time to detect the id of the face;
the matching of the tracking result and the detection result is realized by adopting the following method:
linear weighting of three metric approaches is used as the final metric value:
d(i1, j1) = α·d⁽¹⁾(i1, j1) + β·d⁽²⁾(i1, j1) + (1 − α − β)·d⁽³⁾(i1, j1);

where d⁽¹⁾(i1, j1) is the position metric between tracking result c_i1 and detection result d_j1, d⁽²⁾(i1, j1) is the appearance metric between them, d⁽³⁾(i1, j1) is the velocity metric between them, and α and β are weighting coefficients;

if d(i1, j1) is smaller than the set threshold, tracking result c_i1 and detection result d_j1 are judged to match;

the velocity metric between tracking result c_i1 and detection result d_j1 is calculated using the following equation:

d⁽³⁾(i1, j1) = ‖d_j1 − c_i1‖ / f

where ‖d_j1 − c_i1‖ is the distance between tracking result c_i1 and detection result d_j1, and f is the number of frames between them.
2. The person identity tracking method based on face recognition according to claim 1, wherein the constructing of the face identity library to be recognized specifically comprises: selecting a face image of each target person to be tracked, using the person's name as the file name, and placing the face images under a designated folder as the image library of persons to be tracked; the library contains k persons with corresponding names name_1, name_2, ..., name_k.
3. The person identity tracking method based on face recognition as claimed in claim 1, wherein the detecting the face position of each frame of image using the trained yolov3 face detection model according to the input video frame specifically comprises:
selecting an image of a first frame of a video stream;
calling a pre-trained yolov3 face detection model, which resizes the input picture to 448 × 448 and divides it evenly into a 7 × 7 = 49 grid, each cell being 64 × 64;

for each grid cell, 2 bounding boxes are predicted, each with five basic parameters (x, y, w, h, confidence), where (x, y) is the center coordinate of the bounding box, (w, h) its width and height, and confidence its confidence score;

from the 7 × 7 × 2 = 98 bounding boxes predicted in the previous step, discarding those whose confidence is below a preset threshold, then removing redundant windows with non-maximum suppression; the remaining bounding boxes serve as face detection frames, i.e., the positions of the faces in the image are obtained.
4. The person identity tracking method based on face recognition according to claim 1, wherein the step of extracting features of the detected face by using a trained neural network and comparing the extracted features with the face features in the face identity library to be recognized to determine the identity specifically comprises the following steps:
cropping the image at the face position, aligning the face by a similarity transformation, resizing the crop to 112 × 112, and feeding it into the trained neural network to obtain a feature vector a;

feeding the k pictures in the face identity library to be recognized into the trained neural network to obtain k output feature vectors b_1, b_2, ..., b_k, where k is the number of faces in the library;

computing the cosine similarity between feature vector a and each of b_1, b_2, ..., b_k; the b_i with the highest cosine similarity, provided it exceeds the set threshold, identifies the face matched to feature a; otherwise the face corresponding to feature a is set as a stranger.
5. A person identity tracking system based on face recognition, comprising a processor, a memory and computer program instructions stored on the memory and executable by the processor, which when executed by the processor, implement the method steps of any of claims 1-4.
6. A computer-readable storage medium, having stored thereon computer program instructions executable by a processor, for performing, when the processor executes the computer program instructions, the method steps according to any one of claims 1-4.
CN202011000236.7A 2020-09-22 2020-09-22 Person identity tracking method and system based on face recognition Active CN112149557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011000236.7A CN112149557B (en) 2020-09-22 2020-09-22 Person identity tracking method and system based on face recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011000236.7A CN112149557B (en) 2020-09-22 2020-09-22 Person identity tracking method and system based on face recognition

Publications (2)

Publication Number Publication Date
CN112149557A CN112149557A (en) 2020-12-29
CN112149557B true CN112149557B (en) 2022-08-09

Family

ID=73892695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011000236.7A Active CN112149557B (en) 2020-09-22 2020-09-22 Person identity tracking method and system based on face recognition

Country Status (1)

Country Link
CN (1) CN112149557B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668483B (en) * 2020-12-30 2022-06-10 福州大学 Single-target person tracking method integrating pedestrian re-identification and face detection
CN113705510A (en) * 2021-09-02 2021-11-26 广州市奥威亚电子科技有限公司 Target identification tracking method, device, equipment and storage medium
CN113723375B (en) * 2021-11-02 2022-03-04 杭州魔点科技有限公司 Double-frame face tracking method and system based on feature extraction
CN115206322A (en) * 2022-09-15 2022-10-18 广东海新智能厨房股份有限公司 Intelligent cabinet based on automatic induction and intelligent cabinet control method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1855118A (en) * 2005-04-28 2006-11-01 中国科学院自动化研究所 Method for discriminating face at sunshine based on image ratio
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN109829436A (en) * 2019-02-02 2019-05-31 福州大学 Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
CN109919977A (en) * 2019-02-26 2019-06-21 鹍骐科技(北京)股份有限公司 A kind of video motion personage tracking and personal identification method based on temporal characteristics

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1855118A (en) * 2005-04-28 2006-11-01 中国科学院自动化研究所 Method for discriminating face at sunshine based on image ratio
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN109829436A (en) * 2019-02-02 2019-05-31 福州大学 Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
CN109919977A (en) * 2019-02-26 2019-06-21 鹍骐科技(北京)股份有限公司 A kind of video motion personage tracking and personal identification method based on temporal characteristics

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YOLOv3 as a Deep Face Detector; Filiz Gurkan et al.; 2019 11th International Conference on Electrical and Electronics Engineering (ELECO); 2020-02-13 *
Face recognition and tracking system of a camera robot based on YOLOv3 and ResNet50; Chen Kai et al.; Computer and Modernization, No. 04; 2020-04-15 *
An efficient real-time M:N-mode face recognition method based on deep learning; Zheng Kaifa et al.; Proceedings of the 2019 Electric Power Industry Informatization Annual Conference; 2019-09-30 *

Also Published As

Publication number Publication date
CN112149557A (en) 2020-12-29

Similar Documents

Publication Publication Date Title
CN109829436B (en) Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
CN112149557B (en) Person identity tracking method and system based on face recognition
CN110147743B (en) Real-time online pedestrian analysis and counting system and method under complex scene
CN111860282A (en) Subway section passenger flow volume statistics and pedestrian retrograde motion detection method and system
CN107909027B (en) Rapid human body target detection method with shielding treatment
Azhar et al. People tracking system using DeepSORT
CN110969087B (en) Gait recognition method and system
US20120014562A1 (en) Efficient method for tracking people
Hassan et al. A review on human actions recognition using vision based techniques
US20070291984A1 (en) Robust object tracking system
CN108960047B (en) Face duplication removing method in video monitoring based on depth secondary tree
CN112989889A (en) Gait recognition method based on posture guidance
Serpush et al. Complex human action recognition in live videos using hybrid FR-DL method
Elsayed et al. Abnormal Action detection in video surveillance
CN116342645A (en) Multi-target tracking method for natatorium scene
Xu et al. A novel multi-view face detection method based on improved real adaboost algorithm
Bing et al. Research of face detection based on adaboost and asm
Hashem et al. Human gait identification system based on transfer learning
Ildarabadi et al. Improvement Tracking Dynamic Programming using Replication Function for Continuous Sign Language Recognition
CN117011335B (en) Multi-target tracking method and system based on self-adaptive double decoders
Raskin et al. Tracking and classifying of human motions with gaussian process annealed particle filter
Radulescu et al. Model of human actions recognition based on 2D Kernel
Zheng et al. Object detection and tracking using Bayes-constrained particle swarm optimization
Shah et al. RESTAURANT SYSTEM TO CALCULATE WAITING TIME AND AGE, GENDER INSIGHTS
Abdellaoui et al. Robust Object Tracker in Video via Discriminative Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant