CN113239727A - Person detection and identification method - Google Patents
Person detection and identification method
- Publication number
- CN113239727A (application number CN202110375567.7A)
- Authority
- CN
- China
- Prior art keywords
- face
- image
- face recognition
- training
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Human Computer Interaction (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a person detection and identification method, which relates to the technical field of face recognition and comprises the following steps: performing video frame extraction on an input video to obtain an original face image; carrying out face detection on the original face image by using the BlazeFace network structure to obtain a target image; locating face key points in the target image by using Dlib, aligning the face, and cropping the face region of the target image to generate a training image; carrying out face recognition training on the training image by using ResNet50 + ArcFace Loss to obtain a trained face recognition network; and analyzing the face image to be recognized by using the trained face recognition network to obtain a recognition result. The method can perform face detection and face recognition rapidly, improving detection speed while ensuring detection accuracy.
Description
Technical Field
The invention relates to the technical field of face recognition, in particular to a person detection and recognition method.
Background
At present, all kinds of publicity videos circulate endlessly on the Internet, and some people publish untrue statements online; if such videos flow into the domestic Internet, they can cause adverse effects. Person detection and identification schemes in the prior art have the following defects: first, Faster RCNN is a two-stage general object detection network with high accuracy, but it is relatively slow compared with one-stage detection networks such as YOLO, while face detection in video places high demands on speed; second, with the development of face recognition technology, more and better metric learning methods have emerged that can improve the recall of face recognition. Therefore, how to provide a method for fast face detection and face recognition is a technical problem to be urgently solved by those skilled in the art.
Disclosure of Invention
In view of this, the present invention provides a person detection and identification method to solve the problems in the background art, improving detection speed while ensuring detection accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme: a person detection and identification method comprises the following steps:
performing video frame extraction on an input video to obtain an original face image;
carrying out face detection on the original face image by using the BlazeFace network structure to obtain a target image;
locating face key points in the target image by using Dlib, aligning the face, and cropping the face region of the target image to generate a training image;
carrying out face recognition training on the training image by using ResNet50+ ArcFace Loss to obtain a trained face recognition network;
and analyzing the face image to be recognized by using the trained face recognition network to obtain a recognition result.
Preferably, one frame of image in the input video is extracted as a key frame at regular intervals by using FFmpeg software for the input video.
Preferably, the BlazeFace network structure is improved based on MobileNet + SSD in terms of convolution kernel size and anchor mechanism.
Preferably, the specific steps of the face recognition training are as follows:
inputting the training image into ResNet50 to extract features;
calculating the difference between the predicted label and the real label by using the ArcFace Loss to complete the training stage of face recognition, wherein the calculation formula is as follows:

$$L_1 = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}}$$

wherein $L_1$ is the loss function, $x_i$ is the feature extracted by ResNet50 for the $i$-th sample, $y_i$ is its true label, $W$ is the weight of the fully connected layer, $b$ is the bias of the fully connected layer, $e$ is the base of the natural logarithm, $m$ is the number of samples, and $n$ is the number of classes.
Preferably, at least one image of each target person is taken, features are extracted by the trained face recognition network, and the extracted features are stored in a database to obtain the base-library features.
Preferably, the method further comprises a face recognition test, and the face recognition test specifically comprises the following steps:
inputting a test image, extracting a face region, inputting the face region into the trained face recognition network, and extracting features to obtain test features;
and calculating the Euclidean distance between the test features and the base-library features; if the Euclidean distance is smaller than a specified threshold, the person is determined to be the target person.
Compared with the prior art, the above technical scheme has the beneficial effect that the person detection and identification method assists in achieving identification of a target person, improving detection speed while ensuring detection accuracy, so that detection of the target person is fast and accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a diagram of the improved network structure of the present invention;
FIG. 3(a) is a drawing of a prior-art anchor mechanism;
FIG. 3(b) is a drawing of the improved anchor mechanism of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a person detection and identification method, which comprises the following steps as shown in figure 1:
performing video frame extraction on an input video to obtain an original face image;
carrying out face detection on the original face image by using the BlazeFace network structure to obtain a target image;
locating face key points in the target image by using Dlib, aligning the face, and cropping the face region of the target image to generate a training image;
carrying out face recognition training on the training image by using ResNet50+ ArcFace Loss to obtain a trained face recognition network;
and analyzing the face image to be recognized by using the trained face recognition network to obtain a recognition result.
Furthermore, FFmpeg software is used to perform video frame extraction on the input video; to improve efficiency, one frame is extracted from the input video every two seconds as a key frame.
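The two-second key-frame extraction step can be sketched in Python by invoking the FFmpeg command-line tool. This is a minimal illustration, assuming FFmpeg is installed; the function names, output naming pattern, and JPEG quality flag are illustrative choices, not the patent's exact procedure.

```python
import subprocess

def keyframe_command(video_path, out_pattern, interval_s=2):
    """Build an FFmpeg command line that keeps one frame every
    `interval_s` seconds of the input video."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps=1/{interval_s}",  # sample 1 frame per interval_s seconds
        "-qscale:v", "2",              # high-quality JPEG output
        out_pattern,
    ]

def extract_keyframes(video_path, out_pattern="frame_%04d.jpg", interval_s=2):
    """Run FFmpeg to dump the key frames to numbered image files."""
    subprocess.run(keyframe_command(video_path, out_pattern, interval_s), check=True)
```

The `fps=1/2` video filter keeps one frame per two seconds, matching the key-frame interval described above.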
Further, face detection adopts the BlazeFace network structure, which is improved based on MobileNet + SSD.
It should be noted that SSD is a one-stage detection network and MobileNet is an optimization means for network acceleration; the BlazeFace network structure improves speed as much as possible while maintaining accuracy, with two improvements over MobileNet + SSD:
1. The network structure is modified to replace the 3 × 3 convolutions with 5 × 5 convolutions, as shown in fig. 2; the larger 5 × 5 kernel enlarges the receptive field.
2. The anchor mechanism is improved: the 2 anchors per pixel at the 8 × 8, 4 × 4, and 2 × 2 resolutions are replaced by 6 anchors per pixel at 8 × 8, as shown in figs. 3(a) and 3(b), which improves detection speed.
Further, in the process of performing face detection on the input video, the detected face can visibly jitter between frames; a tie-resolution (blending) strategy is therefore used in place of NMS, estimating the regression parameters of a bounding box as a weighted mean of the overlapping predictions. The improved network structure can achieve sub-millisecond speed on mobile devices with high accuracy.
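A minimal NumPy sketch of the tie-resolution idea: instead of suppressing all but the highest-scoring box as classic NMS does, each cluster of overlapping predictions is collapsed into a score-weighted average box. The function names and the IoU threshold are illustrative assumptions, not the patent's exact procedure.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def blend_overlaps(boxes, scores, iou_thr=0.3):
    """Instead of keeping only the highest-scoring box (classic NMS),
    output the score-weighted mean of each cluster of overlapping boxes."""
    order = np.argsort(scores)[::-1]          # process high scores first
    boxes, scores = boxes[order], scores[order]
    used = np.zeros(len(boxes), dtype=bool)
    merged = []
    for i in range(len(boxes)):
        if used[i]:
            continue
        cluster = (iou(boxes[i], boxes) >= iou_thr) & ~used
        used |= cluster
        w = scores[cluster][:, None]
        merged.append((boxes[cluster] * w).sum(axis=0) / w.sum())
    return np.array(merged)
```

Averaging instead of suppressing is what smooths the frame-to-frame jitter of the detected face described above.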
Furthermore, after face detection is completed, the face region is cropped; because faces in the video may have different orientations, Dlib is used to locate the face key points, the face is then aligned and tilted faces are straightened, which improves the accuracy of the face recognition stage.
Further, the specific steps of the face recognition training are as follows:
inputting the training image into ResNet50 to extract features;
calculating the difference between the predicted label and the real label by using the ArcFace Loss to complete the training stage of face recognition, wherein the calculation formula is as follows:

$$L_1 = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}}$$

wherein $L_1$ is the loss function, $x_i$ is the feature extracted by ResNet50 for the $i$-th sample, $y_i$ is its true label, $W$ is the weight of the fully connected layer, $b$ is the bias of the fully connected layer, $e$ is the base of the natural logarithm, $m$ is the number of samples, and $n$ is the number of classes.
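The training loss can be sketched in NumPy. The first function implements the softmax-based loss over the described quantities (features $x_i$, fully connected weights $W$, bias $b$, for $m$ samples); the second sketches ArcFace's additive angular margin on normalised cosine logits. The scale `s` and `margin` defaults follow the ArcFace paper and are illustrative here, not values claimed by the patent.

```python
import numpy as np

def softmax_loss(x, y, W, b):
    """Mean negative log-likelihood of the true labels y under the fully
    connected layer's softmax: features x (m, d), weights W (d, n), bias b."""
    logits = x @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].mean()

def arcface_logits(x, W, y, s=64.0, margin=0.5):
    """ArcFace-style logits: L2-normalise features and weights, add an
    angular margin to the true-class angle, rescale by s (no bias term)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = np.clip(xn @ Wn, -1.0, 1.0)
    theta = np.arccos(cos)
    theta[np.arange(len(y)), y] += margin    # penalise the target-class angle
    return s * np.cos(theta)
```

In ArcFace training, the margin-adjusted logits replace the plain fully connected logits before the softmax cross-entropy is computed.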
Further, after the face recognition network is trained, the base-library features need to be generated. For each target person to be recognized, at least one image is taken, features are extracted with the trained face recognition network, and the features are stored in a database to obtain the base-library features.
Further, the method also comprises a face recognition test, and the specific steps are as follows:
when a test image is input, extracting a face region, inputting the face region into a trained face recognition network, and extracting features to obtain test features;
and calculating the Euclidean distance between the test features and the base-library features; if the Euclidean distance is smaller than a specified threshold, the person is determined to be the target person.
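The threshold test above can be sketched as a nearest-neighbour search over the base-library features. The function name, gallery layout, and the default threshold value are illustrative assumptions; a real threshold would be tuned on validation data.

```python
import numpy as np

def identify(test_feat, gallery_feats, names, threshold=1.1):
    """Nearest-neighbour search over base-library features; returns the
    matched name (or None) and the best Euclidean distance."""
    dists = np.linalg.norm(gallery_feats - test_feat, axis=1)
    best = int(np.argmin(dists))
    if dists[best] < threshold:
        return names[best], float(dists[best])
    return None, float(dists[best])
```

Returning `None` when every distance exceeds the threshold corresponds to the test image matching no target person in the base library.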
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. A person detection and identification method is characterized by comprising the following steps:
performing video frame extraction on an input video to obtain an original face image;
carrying out face detection on the original face image by using the BlazeFace network structure to obtain a target image;
locating face key points in the target image by using Dlib, aligning the face, and cropping the face region of the target image to generate a training image;
carrying out face recognition training on the training image by using ResNet50+ ArcFace Loss to obtain a trained face recognition network;
and analyzing the face image to be recognized by using the trained face recognition network to obtain a recognition result.
2. The method as claimed in claim 1, wherein FFmpeg software is used to extract one frame of the input video as a key frame at regular intervals.
3. The method of claim 1, wherein the BlazeFace network structure is improved based on MobileNet + SSD in terms of convolution kernel size and anchor mechanism.
4. The method for detecting and identifying a person as claimed in claim 1, wherein the steps of the face recognition training are as follows:
inputting the training image into ResNet50 to extract features;
calculating the difference between the predicted label and the real label by using the ArcFace Loss to complete the training stage of face recognition, wherein the calculation formula is as follows:

$$L_1 = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}}$$

wherein $L_1$ is the loss function, $x_i$ is the feature extracted by ResNet50 for the $i$-th sample, $y_i$ is its true label, $W$ is the weight of the fully connected layer, $b$ is the bias of the fully connected layer, $e$ is the base of the natural logarithm, $m$ is the number of samples, and $n$ is the number of classes.
5. The method of claim 1, wherein at least one image of each target person is taken, features are extracted by the trained face recognition network, and the extracted features are stored in a database to obtain the base-library features.
6. The method of claim 5, further comprising a face recognition test, wherein the face recognition test comprises the following specific steps:
inputting a test image, extracting a face region, inputting the face region into the trained face recognition network, and extracting features to obtain test features;
and calculating the Euclidean distance between the test features and the base-library features; if the Euclidean distance is smaller than a specified threshold, the person is determined to be the target person.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110375567.7A CN113239727A (en) | 2021-04-03 | 2021-04-03 | Person detection and identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113239727A true CN113239727A (en) | 2021-08-10 |
Family
ID=77131254
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110375567.7A Pending CN113239727A (en) | 2021-04-03 | 2021-04-03 | Person detection and identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113239727A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203395A (en) * | 2016-07-26 | 2016-12-07 | 厦门大学 | Face character recognition methods based on the study of the multitask degree of depth |
CN108875602A (en) * | 2018-05-31 | 2018-11-23 | 珠海亿智电子科技有限公司 | Monitor the face identification method based on deep learning under environment |
CN111178228A (en) * | 2019-12-26 | 2020-05-19 | 中云智慧(北京)科技有限公司 | Face recognition method based on deep learning |
CN112070058A (en) * | 2020-09-18 | 2020-12-11 | 深延科技(北京)有限公司 | Face and face composite emotional expression recognition method and system |
CN112488064A (en) * | 2020-12-18 | 2021-03-12 | 平安科技(深圳)有限公司 | Face tracking method, system, terminal and storage medium |
Non-Patent Citations (2)
Title |
---|
JIANKANG DENG et al.: "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", arXiv *
VALENTIN BAZAREVSKY et al.: "BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs", arXiv *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116110100A (en) * | 2023-01-14 | 2023-05-12 | 深圳市大数据研究院 | Face recognition method, device, computer equipment and storage medium |
CN116110100B (en) * | 2023-01-14 | 2023-11-14 | 深圳市大数据研究院 | Face recognition method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Li Yangxi; Miao Yanan; Wang Pei; Liu Kedong; Peng Chengwei; Hu Yanlin
Inventor before: Li Yangxi; Miao Yanan; Wang Pei