CN113239727A - Person detection and identification method - Google Patents
Person detection and identification method
- Publication number
- CN113239727A (application number CN202110375567.7A)
- Authority
- CN
- China
- Prior art keywords
- face
- image
- face recognition
- training
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Human Computer Interaction (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a person detection and identification method, which relates to the technical field of face recognition and comprises the following steps: performing video frame extraction on an input video to obtain an original face image; carrying out face detection on the original face image by using the BlazeFace network structure to obtain a target image; locating face key points in the target image by using Dlib, aligning the face, and cropping the face region of the target image to generate a training image; carrying out face recognition training on the training image by using ResNet50 + ArcFace Loss to obtain a trained face recognition network; and analyzing the face image to be recognized by using the trained face recognition network to obtain a recognition result. The method can perform face detection and face recognition rapidly, improving detection speed while ensuring detection accuracy.
Description
Technical Field
The invention relates to the technical field of face recognition, in particular to a person detection and recognition method.
Background
At present, all kinds of publicity videos circulate endlessly on the Internet, and some people publish untrue statements online; if such videos flow into the domestic Internet, they can cause adverse effects. Person detection and identification schemes in the prior art have the following defects: first, Faster RCNN is a two-stage general object detection network with high accuracy, but it is relatively slow compared with one-stage detection networks such as YOLO, while face detection in video places high demands on speed; second, with the development of face recognition technology, more and better metric learning methods have emerged that can improve the recall of face recognition. Therefore, how to provide a method for fast face detection and face recognition is a technical problem to be urgently solved by those skilled in the art.
Disclosure of Invention
In view of this, the present invention provides a person detection and identification method to solve the problems in the background art, improving detection speed while ensuring detection accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme: a person detection and identification method comprises the following steps:
performing video frame extraction on an input video to obtain an original face image;
carrying out face detection on the original face image by using the BlazeFace network structure to obtain a target image;
locating face key points in the target image by using Dlib, aligning the face, and cropping the face region of the target image to generate a training image;
carrying out face recognition training on the training image by using ResNet50+ ArcFace Loss to obtain a trained face recognition network;
and analyzing the face image to be recognized by using the trained face recognition network to obtain a recognition result.
Preferably, one frame of image in the input video is extracted as a key frame at regular intervals by using FFmpeg software for the input video.
Preferably, the BlazeFace network structure is improved based on MobileNet + SSD in terms of convolution kernel size and anchor mechanism.
Preferably, the specific steps of the face recognition training are as follows:
inputting the training image into ResNet50 to extract features;
calculating the difference between the predicted label and the real label by using the ArcFace Loss to complete the training stage of face recognition, wherein the calculation formula is as follows:

$$L_1 = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}}$$

wherein $L_1$ is the loss function, $x_i$ is the feature extracted by ResNet50 for the $i$-th sample, $y_i$ is its true label, $W$ is the weight of the fully connected layer, $b$ is the bias of the fully connected layer, $e$ is the base of the natural logarithm, $m$ is the number of samples, and $n$ is the number of classes.
Preferably, at least one image of each target person is taken, features are extracted by the trained face recognition network, and the extracted features are stored in a database to obtain the base-library features.
Preferably, the method further comprises a face recognition test, and the face recognition test specifically comprises the following steps:
inputting a test image, extracting a face region, inputting the face region into the trained face recognition network, and extracting features to obtain test features;
and calculating the Euclidean distance between the test features and the base-library features; if the Euclidean distance is smaller than a specified threshold, the person is determined to be the target person.
Compared with the prior art, the above technical scheme has the beneficial effect that the person detection and identification method assists in achieving identification of a target person, improving detection speed while ensuring detection accuracy, so that detection of the target person is fast and accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a diagram of the improved network structure of the present invention;
FIG. 3(a) is a drawing of a prior-art anchor mechanism;
FIG. 3(b) is a drawing of the improved anchor mechanism of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a person detection and identification method, which comprises the following steps as shown in figure 1:
performing video frame extraction on an input video to obtain an original face image;
carrying out face detection on the original face image by using the BlazeFace network structure to obtain a target image;
locating face key points in the target image by using Dlib, aligning the face, and cropping the face region of the target image to generate a training image;
carrying out face recognition training on the training image by using ResNet50+ ArcFace Loss to obtain a trained face recognition network;
and analyzing the face image to be recognized by using the trained face recognition network to obtain a recognition result.
Furthermore, FFmpeg software is used to perform video frame extraction on the input video; to improve efficiency, one frame is extracted from the input video every two seconds as a key frame.
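The two-second key-frame extraction step can be sketched in Python by invoking the FFmpeg command-line tool. This is a minimal illustration, assuming FFmpeg is installed; the function names, output naming pattern, and JPEG quality flag are illustrative choices, not the patent's exact procedure.

```python
import subprocess

def keyframe_command(video_path, out_pattern, interval_s=2):
    """Build an FFmpeg command line that keeps one frame every
    `interval_s` seconds of the input video."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps=1/{interval_s}",  # sample 1 frame per interval_s seconds
        "-qscale:v", "2",              # high-quality JPEG output
        out_pattern,
    ]

def extract_keyframes(video_path, out_pattern="frame_%04d.jpg", interval_s=2):
    """Run FFmpeg to dump the key frames to numbered image files."""
    subprocess.run(keyframe_command(video_path, out_pattern, interval_s), check=True)
```

The `fps=1/2` video filter keeps one frame per two seconds, matching the key-frame interval described above.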
Further, face detection adopts the BlazeFace network structure, which is improved based on MobileNet + SSD.
It should be noted that SSD is a one-stage detection network and MobileNet is an optimization means for network acceleration; the BlazeFace network structure improves speed as much as possible while maintaining accuracy, with two improvements over MobileNet + SSD:
1. The network structure is modified to replace the 3 × 3 convolutions with 5 × 5 convolutions, as shown in fig. 2; the larger 5 × 5 kernel enlarges the receptive field.
2. The anchor mechanism is improved: the 2 anchors per pixel at the 8 × 8, 4 × 4, and 2 × 2 resolutions are replaced by 6 anchors per pixel at 8 × 8, as shown in figs. 3(a) and 3(b), which improves detection speed.
Further, in the process of performing face detection on the input video, the detected face can visibly jitter between frames; a tie-resolution (blending) strategy is therefore used in place of NMS, estimating the regression parameters of a bounding box as a weighted mean of the overlapping predictions. The improved network structure can achieve sub-millisecond speed on mobile devices with high accuracy.
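A minimal NumPy sketch of the tie-resolution idea: instead of suppressing all but the highest-scoring box as classic NMS does, each cluster of overlapping predictions is collapsed into a score-weighted average box. The function names and the IoU threshold are illustrative assumptions, not the patent's exact procedure.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def blend_overlaps(boxes, scores, iou_thr=0.3):
    """Instead of keeping only the highest-scoring box (classic NMS),
    output the score-weighted mean of each cluster of overlapping boxes."""
    order = np.argsort(scores)[::-1]          # process high scores first
    boxes, scores = boxes[order], scores[order]
    used = np.zeros(len(boxes), dtype=bool)
    merged = []
    for i in range(len(boxes)):
        if used[i]:
            continue
        cluster = (iou(boxes[i], boxes) >= iou_thr) & ~used
        used |= cluster
        w = scores[cluster][:, None]
        merged.append((boxes[cluster] * w).sum(axis=0) / w.sum())
    return np.array(merged)
```

Averaging instead of suppressing is what smooths the frame-to-frame jitter of the detected face described above.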
Furthermore, after face detection is completed, the face region is cropped; because faces in the video may have different orientations, Dlib is used to locate the face key points, the face is then aligned and tilted faces are straightened, which improves the accuracy of the face recognition stage.
Further, the specific steps of the face recognition training are as follows:
inputting the training image into ResNet50 to extract features;
calculating the difference between the predicted label and the real label by using the ArcFace Loss to complete the training stage of face recognition, wherein the calculation formula is as follows:

$$L_1 = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}}$$

wherein $L_1$ is the loss function, $x_i$ is the feature extracted by ResNet50 for the $i$-th sample, $y_i$ is its true label, $W$ is the weight of the fully connected layer, $b$ is the bias of the fully connected layer, $e$ is the base of the natural logarithm, $m$ is the number of samples, and $n$ is the number of classes.
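The training loss can be sketched in NumPy. The first function implements the softmax-based loss over the described quantities (features $x_i$, fully connected weights $W$, bias $b$, for $m$ samples); the second sketches ArcFace's additive angular margin on normalised cosine logits. The scale `s` and `margin` defaults follow the ArcFace paper and are illustrative here, not values claimed by the patent.

```python
import numpy as np

def softmax_loss(x, y, W, b):
    """Mean negative log-likelihood of the true labels y under the fully
    connected layer's softmax: features x (m, d), weights W (d, n), bias b."""
    logits = x @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].mean()

def arcface_logits(x, W, y, s=64.0, margin=0.5):
    """ArcFace-style logits: L2-normalise features and weights, add an
    angular margin to the true-class angle, rescale by s (no bias term)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = np.clip(xn @ Wn, -1.0, 1.0)
    theta = np.arccos(cos)
    theta[np.arange(len(y)), y] += margin    # penalise the target-class angle
    return s * np.cos(theta)
```

In ArcFace training, the margin-adjusted logits replace the plain fully connected logits before the softmax cross-entropy is computed.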
Further, after the face recognition network is trained, the base-library features need to be generated. For each target person to be recognized, at least one image is taken, features are extracted with the trained face recognition network, and the features are stored in a database to obtain the base-library features.
Further, the method also comprises a face recognition test, and the specific steps are as follows:
when a test image is input, extracting a face region, inputting the face region into a trained face recognition network, and extracting features to obtain test features;
and calculating the Euclidean distance between the test features and the base-library features; if the Euclidean distance is smaller than a specified threshold, the person is determined to be the target person.
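The threshold test above can be sketched as a nearest-neighbour search over the base-library features. The function name, gallery layout, and the default threshold value are illustrative assumptions; a real threshold would be tuned on validation data.

```python
import numpy as np

def identify(test_feat, gallery_feats, names, threshold=1.1):
    """Nearest-neighbour search over base-library features; returns the
    matched name (or None) and the best Euclidean distance."""
    dists = np.linalg.norm(gallery_feats - test_feat, axis=1)
    best = int(np.argmin(dists))
    if dists[best] < threshold:
        return names[best], float(dists[best])
    return None, float(dists[best])
```

Returning `None` when every distance exceeds the threshold corresponds to the test image matching no target person in the base library.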
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. A person detection and identification method is characterized by comprising the following steps:
performing video frame extraction on an input video to obtain an original face image;
carrying out face detection on the original face image by using the BlazeFace network structure to obtain a target image;
locating face key points in the target image by using Dlib, aligning the face, and cropping the face region of the target image to generate a training image;
carrying out face recognition training on the training image by using ResNet50+ ArcFace Loss to obtain a trained face recognition network;
and analyzing the face image to be recognized by using the trained face recognition network to obtain a recognition result.
2. The method as claimed in claim 1, wherein FFmpeg software is used to extract one frame of the input video as a key frame at regular intervals.
3. The method of claim 1, wherein the BlazeFace network structure is improved based on MobileNet + SSD in terms of convolution kernel size and anchor mechanism.
4. The method for detecting and identifying a person as claimed in claim 1, wherein the steps of the face recognition training are as follows:
inputting the training image into ResNet50 to extract features;
calculating the difference between the predicted label and the real label by using the ArcFace Loss to complete the training stage of face recognition, wherein the calculation formula is as follows:

$$L_1 = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}}$$

wherein $L_1$ is the loss function, $x_i$ is the feature extracted by ResNet50 for the $i$-th sample, $y_i$ is its true label, $W$ is the weight of the fully connected layer, $b$ is the bias of the fully connected layer, $e$ is the base of the natural logarithm, $m$ is the number of samples, and $n$ is the number of classes.
5. The method of claim 1, wherein at least one image of each target person is taken, features are extracted by the trained face recognition network, and the extracted features are stored in a database to obtain the base-library features.
6. The method of claim 5, further comprising a face recognition test, wherein the face recognition test comprises the following specific steps:
inputting a test image, extracting a face region, inputting the face region into the trained face recognition network, and extracting features to obtain test features;
and calculating the Euclidean distance between the test features and the base-library features; if the Euclidean distance is smaller than a specified threshold, the person is determined to be the target person.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110375567.7A CN113239727A (en) | 2021-04-03 | 2021-04-03 | Person detection and identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113239727A true CN113239727A (en) | 2021-08-10 |
Family
ID=77131254
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110375567.7A Pending CN113239727A (en) | 2021-04-03 | 2021-04-03 | Person detection and identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113239727A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203395A (en) * | 2016-07-26 | 2016-12-07 | 厦门大学 | Face character recognition methods based on the study of the multitask degree of depth |
CN108875602A (en) * | 2018-05-31 | 2018-11-23 | 珠海亿智电子科技有限公司 | Monitor the face identification method based on deep learning under environment |
CN111178228A (en) * | 2019-12-26 | 2020-05-19 | 中云智慧(北京)科技有限公司 | Face recognition method based on deep learning |
CN112070058A (en) * | 2020-09-18 | 2020-12-11 | 深延科技(北京)有限公司 | Face and face composite emotional expression recognition method and system |
CN112488064A (en) * | 2020-12-18 | 2021-03-12 | 平安科技(深圳)有限公司 | Face tracking method, system, terminal and storage medium |
Non-Patent Citations (2)
Title |
---|
JIANKANG DENG et al.: "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", arXiv *
VALENTIN BAZAREVSKY et al.: "BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs", arXiv *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116110100A (en) * | 2023-01-14 | 2023-05-12 | 深圳市大数据研究院 | Face recognition method, device, computer equipment and storage medium |
CN116110100B (en) * | 2023-01-14 | 2023-11-14 | 深圳市大数据研究院 | Face recognition method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Li Yangxi; Miao Yanan; Wang Pei; Liu Kedong; Peng Chengwei; Hu Yanlin
Inventor before: Li Yangxi; Miao Yanan; Wang Pei