CN109101915B - Face, pedestrian and attribute recognition network structure design method based on deep learning - Google Patents
- Publication number
- CN109101915B (application CN201810864964.9A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- face
- key point
- ith
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
Abstract
The invention discloses a method for designing a network structure for recognizing faces, pedestrians and their attributes based on deep learning. Pedestrian key point trajectory features are fused with the output of the fully connected layer of a pedestrian feature extraction sub-network to obtain a fusion feature S1. Key point detection is performed on multiple face images of the same person to obtain face key points, from which face key point trajectory features are calculated; these are fused with the fully connected layer of a face multi-task recognition sub-network to obtain a fusion feature S2, which is used to recognize the face and its attributes. The fusion features S1 and S2 are then fused into a feature S3, which is used to recognize the pedestrian and his or her attributes.
Description
Technical Field
The invention relates to the field of deep learning for face recognition, pedestrian recognition and attribute recognition, and in particular to the construction of a network structure.
Background
At present, face recognition has made great strides in academic research, but its reliability remains low when applied in real life. Most current face recognition systems can only be used in restricted environments, requiring, for example: 1. active cooperation of the subject; 2. face images of sufficiently high resolution; 3. good lighting conditions. Natural scenes contain interference factors such as pose, illumination and expression, and this interference must be overcome for face recognition technology to develop and spread.
Pedestrian re-identification uses computer vision to judge whether a specific pedestrian appears in an image or video sequence. Faced with massive volumes of surveillance video, the need arises to re-identify pedestrians in these videos by computer. Pedestrian re-identification has developed rapidly in recent years through the continued efforts of researchers, but a large gap remains between it and the demands of practical application. First, pedestrians in typical surveillance video have low resolution and blurred facial detail, which hampers image analysis, feature extraction and segmentation. Second, pedestrians are often occluded by other pedestrians or objects, which strongly affects their representation. Finally, differences in monitoring environment, camera parameters and illumination greatly change the appearance of the same person, making matching difficult. Overcoming the effect of these factors on the pedestrian matching task, and finding effective methods to do so, is an important research direction for pedestrian re-identification.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides a network structure for face, pedestrian and attribute recognition based on deep learning. Its aim is to recognize faces and their attributes, and pedestrians and their attributes, with a multi-task network based on a convolutional neural network, adding pedestrian key point trajectory features and face key point trajectory features to improve the accuracy of face, pedestrian and attribute recognition.
In order to achieve this purpose, the invention adopts the following technical scheme:
a method for designing a network structure for recognizing human faces, pedestrians and attributes thereof based on deep learning comprises the following steps:
Step (1): input n consecutive frames of video captured by a surveillance camera into a pedestrian detection and tracking module; when the ith pedestrian appears in the video, output the sequence of n consecutive pedestrian images of the ith pedestrian, {P_i^1, P_i^2, …, P_i^n}. Pedestrian detection adopts the open-source Faster R-CNN algorithm, which comprises three basic components: the first is a region proposal network (RPN) that generates candidate regions for each surveillance video frame; the second is a convolutional neural network that extracts pedestrian features from the candidate regions; the third is a binary Softmax classifier that judges whether a candidate region contains a pedestrian. Pedestrian tracking adopts the optical flow tracking function of OpenCV;
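As an illustration of the tracking half of this step, the sketch below follows one detected pedestrian box across frames with OpenCV's pyramidal Lucas-Kanade optical flow. The detection stage (Faster R-CNN) is outside the sketch, and the corner-sampling parameters and median-shift box update are assumptions, not taken from the patent.

```python
import cv2
import numpy as np

def track_pedestrian(frames, init_box):
    """Follow one pedestrian box (x, y, w, h) through a list of BGR frames
    using pyramidal Lucas-Kanade optical flow (cv2.calcOpticalFlowPyrLK)."""
    x, y, w, h = init_box
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    # Sample corner points inside the initial pedestrian box to follow.
    pts = cv2.goodFeaturesToTrack(prev_gray[y:y+h, x:x+w], maxCorners=50,
                                  qualityLevel=0.01, minDistance=3)
    pts = pts.reshape(-1, 1, 2) + np.float32([x, y])  # ROI -> image coordinates
    boxes = [init_box]
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        mask = status.flatten() == 1          # keep points tracked successfully
        good = new_pts[mask]
        # Shift the box by the median point motion; crude but robust.
        # (A real tracker would also handle the case of all points being lost.)
        dx, dy = np.median(good - pts[mask], axis=0).ravel()
        x, y = int(x + dx), int(y + dy)
        boxes.append((x, y, w, h))
        prev_gray, pts = gray, good.reshape(-1, 1, 2)
    return boxes  # one box per frame; crops give {P_i^1, ..., P_i^n}
```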
Step (2): input each image of the sequence {P_i^1, P_i^2, …, P_i^n} obtained in step (1) into a pedestrian feature extraction sub-network based on a convolutional neural network. The network contains two kinds of layers, convolutional layers and max-pooling layers; two convolutional layers followed by one max-pooling layer form a substructure, and the pedestrian feature extraction sub-network comprises N such substructures in series;
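A minimal sketch of how such a sub-network could be declared with Caffe's Python NetSpec API follows. The channel width, ReLU activations, input crop size and the default of four substructures are assumptions (the embodiment stacks ten, which presumes a correspondingly larger input).

```python
import caffe
from caffe import layers as L, params as P

def pedestrian_feature_net(N=4, channels=32, D=512):
    """N substructures of (conv3x3 -> conv3x3 -> max-pool), ending in the
    D-dimensional fully connected layer Pc referenced in step (3)."""
    n = caffe.NetSpec()
    n.data = L.Input(shape=dict(dim=[1, 3, 256, 128]))  # assumed crop size
    top = n.data
    for _ in range(N):
        top = L.Convolution(top, kernel_size=3, pad=1, num_output=channels)
        top = L.ReLU(top, in_place=True)  # activation assumed, not in the patent
        top = L.Convolution(top, kernel_size=3, pad=1, num_output=channels)
        top = L.ReLU(top, in_place=True)
        top = L.Pooling(top, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    n.Pc = L.InnerProduct(top, num_output=D)  # fully connected layer Pc
    return n.to_proto()

print(pedestrian_feature_net())  # emits a Caffe prototxt definition
```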
Step (3): for each image of the sequence {P_i^1, P_i^2, …, P_i^n} obtained in step (1), detect the pedestrian key points to obtain the corresponding m pedestrian key points. From the position changes of the m key points, calculate the pedestrian key point trajectories with formula (1), normalize each of the m trajectory vectors with formula (2), and merge the m normalized vectors into the pedestrian key point trajectory feature. Fuse this feature with the output of the fully connected layer Pc of the pedestrian feature extraction sub-network to obtain the fusion feature S1. Feature fusion adopts the concat layer of the deep learning framework Caffe, taking the pedestrian key point trajectory feature and the fully connected layer Pc as inputs, where the trajectory feature has dimension m × (n−1) × 2, the layer Pc has dimension D, and the concat layer outputs the fusion feature S1;
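The trajectory-feature computation and the concat fusion can be expressed compactly in NumPy, assuming the reconstruction of formulas (1) and (2) given later in this description (per-frame displacements, divided per key point by the total path length):

```python
import numpy as np

def keypoint_trajectory_feature(kps):
    """kps: (n, m, 2) array of key point coordinates over n frames.
    Returns the m*(n-1)*2-dim trajectory feature of formulas (1) and (2)."""
    traj = kps[1:] - kps[:-1]               # formula (1): (n-1, m, 2) displacements
    lengths = np.linalg.norm(traj, axis=2)  # per-segment lengths |T_{i,j}^k|
    total = lengths.sum(axis=0)             # per-keypoint path length, shape (m,)
    # Formula (2) as reconstructed: divide each key point's concatenated
    # displacements by its total path length (an epsilon could guard
    # against perfectly static key points).
    norm_traj = traj / total[None, :, None]
    return norm_traj.transpose(1, 0, 2).ravel()  # concat per key point

# Fusion S1 = concat(trajectory feature, Pc output), as the concat layer does.
n, m, D = 15, 18, 512                  # embodiment values
kps = np.random.rand(n, m, 2) * 100    # stand-in for detected key points
pc_out = np.random.rand(D)             # stand-in for the Pc layer output
S1 = np.concatenate([keypoint_trajectory_feature(kps), pc_out])
assert S1.shape == (m * (n - 1) * 2 + D,)  # 18*14*2 + 512 = 1016
```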
Step (4): input each image of the sequence {P_i^1, P_i^2, …, P_i^n} obtained in step (1) into a face detection module for face detection to obtain the sequence of n consecutive face images of the ith person, {F_i^1, F_i^2, …, F_i^n}. The face detection module is that of the open-source face recognition engine SeetaFace, which adopts a funnel-structured cascade (FuSt): the top of the cascade consists of several fast LAB cascade classifiers for different poses, followed by several multilayer perceptron (MLP) cascades based on SURF features; finally a unified MLP cascade processes the candidate windows of all poses, and the correct face windows are retained to obtain the face images;
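SeetaFace's detector is a C++ engine; the sketch below only mirrors the control flow of the funnel-structured cascade described above, with all three classifier stages as hypothetical callables:

```python
def funnel_cascade_detect(image, lab_cascades, mlp_cascades, unified_mlp):
    """Control-flow sketch of the FuSt cascade. lab_cascades return candidate
    windows; mlp_cascades and unified_mlp accept/reject a window. All three
    arguments are hypothetical stand-ins for the real SeetaFace stages."""
    candidates = []
    for cascade in lab_cascades:          # fast pose-specific LAB classifiers
        candidates += cascade(image)      # wide top of the funnel
    for mlp in mlp_cascades:              # SURF-feature MLP stages narrow it
        candidates = [w for w in candidates if mlp(image, w)]
    # the unified MLP handles windows of every pose and keeps true faces
    return [w for w in candidates if unified_mlp(image, w)]
```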
and (5): judging the continuous n frames of face image sequence of the ith pedestrian obtained in the step (4)Resolution, the face image with resolution larger than A multiplied by B is not subjected to super resolution processing, the face image with resolution smaller than A multiplied by B is subjected to super resolution processing, and finally the ith continuous n-frame face image sequence with higher pedestrian resolution is obtained
Step (6): input each image of the face image sequence obtained in step (5) into a face feature extraction sub-network based on a convolutional neural network; the network consists of M convolutional layers;
Step (7): for each image of the face image sequence obtained in step (5), detect the face key points to obtain the corresponding s face key points. From the position changes of the s key points, calculate the face key point trajectories with formula (1), normalize each of the s trajectory vectors with formula (2), and merge the s normalized vectors into the face key point trajectory feature. Fuse this feature with the output of the fully connected layer Fc of the face feature extraction sub-network to obtain the fusion feature S2, where the face key point trajectory feature has dimension s × (n−1) × 2 and the layer Fc has dimension D;
Step (8): take the fusion feature S2 obtained in step (7) as input to a face identity feature layer and to face attribute 1, face attribute 2, …, face attribute v feature layers; the face identity feature layer is the input of the identity classification layer, the face attribute 1 feature layer is the input of the face attribute 1 classification layer, the face attribute 2 feature layer is the input of the face attribute 2 classification layer, …, and the face attribute v feature layer is the input of the face attribute v classification layer;
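A sketch of these parallel feature and classification heads in Caffe NetSpec follows; the head width feat_dim, the Softmax outputs and the example class counts are assumptions:

```python
import caffe
from caffe import layers as L

def face_recognition_heads(num_ids, attr_classes, feat_dim=256):
    """Step (8) as parallel heads: the fused feature S2 feeds one identity
    feature layer plus one feature layer per face attribute, each followed
    by its own classification layer."""
    n = caffe.NetSpec()
    n.S2 = L.Input(shape=dict(dim=[1, 652]))  # 140 + 512 in the embodiment
    n.id_feat = L.InnerProduct(n.S2, num_output=feat_dim)
    n.id_cls = L.InnerProduct(n.id_feat, num_output=num_ids)
    n.id_prob = L.Softmax(n.id_cls)
    for a, k in enumerate(attr_classes, start=1):  # k classes per attribute
        feat = L.InnerProduct(n.S2, num_output=feat_dim)
        setattr(n, 'attr%d_feat' % a, feat)
        cls = L.InnerProduct(feat, num_output=k)
        setattr(n, 'attr%d_cls' % a, cls)
        setattr(n, 'attr%d_prob' % a, L.Softmax(cls))
    return n.to_proto()

# e.g. gender / expression / age heads as in the detailed description
print(face_recognition_heads(num_ids=1000, attr_classes=[2, 7, 100]))
```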
Step (9): fuse the fusion feature S1 obtained in step (3) with the fusion feature S2 obtained in step (7) to obtain the fusion feature S3, where S1 has dimension m × (n−1) × 2 + D and S2 has dimension s × (n−1) × 2 + D;
Step (10): take the fusion feature S3 obtained in step (9) as input to a pedestrian identity feature layer and to pedestrian attribute 1, pedestrian attribute 2, …, pedestrian attribute u feature layers; the pedestrian identity feature layer is the input of the pedestrian identity classification layer, the pedestrian attribute 1 feature layer is the input of the pedestrian attribute 1 classification layer, the pedestrian attribute 2 feature layer is the input of the pedestrian attribute 2 classification layer, …, and the pedestrian attribute u feature layer is the input of the pedestrian attribute u classification layer;
Formula (1), the key point trajectory formula, is:

T_{i,j}^{k,t} = ( x_{i,j+1}^{k,t} − x_{i,j}^{k,t} , y_{i,j+1}^{k,t} − y_{i,j}^{k,t} )    (1)

wherein, when t = 0, formula (1) computes the pedestrian key point trajectories of the ith pedestrian: k denotes the kth pedestrian key point of the ith pedestrian, k ∈ [1, m]; j denotes the jth pedestrian image frame of the ith pedestrian, j ∈ [1, n−1]; T_{i,j}^{k,0} denotes the trajectory of the kth pedestrian key point from the jth to the (j+1)th pedestrian image frame; (x_{i,j+1}^{k,0}, y_{i,j+1}^{k,0}) and (x_{i,j}^{k,0}, y_{i,j}^{k,0}) denote the x-axis and y-axis coordinates of the kth pedestrian key point in the (j+1)th and jth pedestrian image frames respectively;

when t = 1, formula (1) computes the face key point trajectories of the ith pedestrian: k denotes the kth face key point of the ith pedestrian, k ∈ [1, s]; j denotes the jth face image frame of the ith pedestrian, j ∈ [1, n−1]; T_{i,j}^{k,1} denotes the trajectory of the kth face key point from the jth to the (j+1)th face image frame; (x_{i,j+1}^{k,1}, y_{i,j+1}^{k,1}) and (x_{i,j}^{k,1}, y_{i,j}^{k,1}) denote the x-axis and y-axis coordinates of the kth face key point in the (j+1)th and jth face image frames respectively.
Formula (2), the key point trajectory normalization formula, is:

\bar{T}_i^{k,t} = T_i^{k,t} / Σ_{j=1}^{n−1} |T_{i,j}^{k,t}|    (2)

wherein, when t = 0, formula (2) normalizes the pedestrian key point trajectory vectors of the ith pedestrian: k denotes the kth pedestrian key point, k ∈ [1, m]; j denotes the jth pedestrian image frame, j ∈ [1, n−1]; \bar{T}_i^{k,0} denotes the kth pedestrian key point trajectory feature over the n consecutive pedestrian images of the ith pedestrian, a vector of (n−1) × 2 dimensions; T_i^{k,0} = (T_{i,1}^{k,0}, …, T_{i,n−1}^{k,0}) denotes the kth pedestrian key point trajectory over the n consecutive pedestrian images; |T_{i,j}^{k,0}| denotes the length of the kth pedestrian key point trajectory from the jth to the (j+1)th frame;

when t = 1, formula (2) normalizes the face key point trajectory vectors of the ith pedestrian: k denotes the kth face key point, k ∈ [1, s]; j denotes the jth face image frame, j ∈ [1, n−1]; \bar{T}_i^{k,1} denotes the kth face key point trajectory feature over the n consecutive face images, a vector of (n−1) × 2 dimensions; T_i^{k,1} denotes the kth face key point trajectory over the n consecutive face images; |T_{i,j}^{k,1}| denotes the length of the kth face key point trajectory from the jth to the (j+1)th frame.
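Though the patent does not state it, the reconstructed form of formula (2) has a useful consequence: dividing by the total path length makes the trajectory feature invariant to the apparent scale of the subject, since scaling every image coordinate by a factor λ scales numerator and denominator alike. A short check, under the reconstruction above:

```latex
% Scale check under the reconstructed formula (2): coordinates scaled by \lambda.
T_{i,j}^{k,t} \;\mapsto\; \lambda\,T_{i,j}^{k,t},
\qquad
|T_{i,j}^{k,t}| \;\mapsto\; \lambda\,|T_{i,j}^{k,t}|,
\qquad
\bar{T}_i^{k,t} \;\mapsto\;
\frac{\lambda\,T_i^{k,t}}{\sum_{j=1}^{n-1}\lambda\,|T_{i,j}^{k,t}|}
= \bar{T}_i^{k,t}.
```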
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a method for designing a network structure for identifying human faces, pedestrians and attributes thereof based on deep learning, wherein monitoring video images are input into a pedestrian detection and tracking module to carry out pedestrian detection and tracking so as to obtain a plurality of pedestrian images of the same person; carrying out pedestrian key point detection on the obtained images of a plurality of pedestrians of the same person, obtaining pedestrian key point track characteristics through calculation, carrying out characteristic fusion on the obtained pedestrian key point track characteristics and a full connection layer connected with a pedestrian characteristic extraction sub-network to obtain fusion characteristics S1(ii) a Inputting the obtained multiple pedestrian images of the same person into a face detection module to carry out face detection to obtain multiple face images of the same person; judging the resolutions of a plurality of face images of the same person, directly inputting the face image with higher resolution into a face multitask recognition sub-network, performing super-resolution processing on the face image with lower resolution, and then inputting the face image into the face multitask recognition sub-network; face key points obtained by carrying out key point detection on a plurality of face images of the same person are calculated to obtain face key point track characteristics, and the obtained face key point track characteristics and a full connection layer connected with a face multitask identification sub-network are subjected to characteristic fusion to obtain fusion characteristics S2Using the fusion feature S2Identifying the human face and the attribute thereof; fusing features S1And fusion feature S2Performing feature fusion to obtain feature fusion S3Using the fusion feature S3And identifying the pedestrian and the attribute thereof. The network structure improves the accuracy of face and pedestrian identification and attribute identification.
Drawings
Fig. 1 is a schematic diagram of a network structure for recognizing human faces and pedestrians and their attributes based on deep learning.
Fig. 2 is a schematic diagram of a pedestrian feature extraction sub-network structure.
Fig. 3 is a schematic diagram of a face feature extraction sub-network structure.
Detailed Description
In this embodiment, as shown in fig. 1, the network structure for recognizing faces, pedestrians and their attributes based on deep learning mainly comprises the following steps:
Step (1): input 15 consecutive frames of video captured by a surveillance camera into the pedestrian detection and tracking module; when the ith pedestrian appears in the video, output the sequence of 15 consecutive pedestrian images of the ith pedestrian, {P_i^1, P_i^2, …, P_i^15}. Pedestrian detection adopts the open-source Faster R-CNN algorithm, which comprises three basic components: a region proposal network (RPN) that generates candidate regions for each surveillance video frame, a convolutional neural network that extracts pedestrian features from the candidate regions, and a binary Softmax classifier that judges whether a candidate region contains a pedestrian. Pedestrian tracking adopts the optical flow tracking function of OpenCV;
Step (2): input each image of the sequence {P_i^1, P_i^2, …, P_i^15} obtained in step (1) into the convolutional-neural-network-based pedestrian feature extraction sub-network; the network contains convolutional layers and max-pooling layers, two convolutional layers followed by one max-pooling layer form a substructure, and the sub-network comprises 10 such substructures in series;
Step (3): for each image of the sequence {P_i^1, P_i^2, …, P_i^15} obtained in step (1), detect the pedestrian key points to obtain the corresponding 18 pedestrian key points. From the position changes of the 18 key points, calculate the pedestrian key point trajectories with the key point trajectory formula, normalize each of the 18 trajectory vectors with the key point trajectory normalization formula, and merge the 18 normalized vectors into the pedestrian key point trajectory feature. Fuse this feature with the output of the fully connected layer Pc of the pedestrian feature extraction sub-network to obtain the fusion feature S1. Feature fusion adopts the concat layer of the deep learning framework Caffe, taking the pedestrian key point trajectory feature and the layer Pc as inputs, where the trajectory feature has 504 dimensions (18 × 14 × 2), the layer Pc has 512 dimensions, and the concat layer outputs the fusion feature S1;
Step (4): input each image of the sequence {P_i^1, P_i^2, …, P_i^15} obtained in step (1) into the face detection module for face detection to obtain the sequence of 15 consecutive face images of the ith person. The face detection module is that of the open-source face recognition engine SeetaFace, which adopts a funnel-structured cascade (FuSt): the top of the cascade consists of several fast LAB cascade classifiers for different poses, followed by several multilayer perceptron (MLP) cascades based on SURF features; finally a unified MLP cascade processes the candidate windows of all poses, and the correct face windows are retained to obtain the face images;
and (5): judging the continuous 15 frames of face image sequence of the ith pedestrian obtained in the step (4)Resolution, to be greater than 112 x 112The face image is not subjected to super-resolution processing, the face image with the resolution ratio smaller than 112 multiplied by 112 is subjected to super-resolution processing, and finally the ith continuous 15-frame face image sequence with higher pedestrian resolution ratio is obtained
Step (6): input each image of the face image sequence obtained in step (5) into the convolutional-neural-network-based face feature extraction sub-network; the network consists of 20 convolutional layers;
Step (7): for each image of the face image sequence obtained in step (5), detect the face key points to obtain the corresponding 5 face key points. From the position changes of the 5 key points, calculate the face key point trajectories with the key point trajectory formula, normalize each of the 5 trajectory vectors with the key point trajectory normalization formula, and merge the 5 normalized vectors into the face key point trajectory feature. Fuse this feature with the output of the fully connected layer Fc of the face feature extraction sub-network to obtain the fusion feature S2, where the face key point trajectory feature has 140 dimensions (5 × 14 × 2) and the layer Fc has 512 dimensions;
Step (8): take the fusion feature S2 obtained in step (7) as input to the face identity feature layer and to the gender, expression and age feature layers; the face identity feature layer is the input of the identity classification layer, the gender feature layer of the gender classification layer, the expression feature layer of the expression classification layer, and the age feature layer of the age classification layer;
Step (9): fuse the fusion feature S1 obtained in step (3) with the fusion feature S2 obtained in step (7) to obtain the fusion feature S3, where S1 has 1016 dimensions and S2 has 652 dimensions;
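The embodiment's dimensions can be checked directly; a small sanity check with stand-in arrays:

```python
import numpy as np

# Embodiment dimensions: S1 = 18*14*2 + 512 = 1016, S2 = 5*14*2 + 512 = 652.
S1 = np.zeros(18 * 14 * 2 + 512)
S2 = np.zeros(5 * 14 * 2 + 512)
S3 = np.concatenate([S1, S2])   # step (9): concat fusion feeds the step (10) heads
assert S1.size == 1016 and S2.size == 652 and S3.size == 1668
```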
Step (10): take the fusion feature S3 obtained in step (9) as input to the pedestrian identity feature layer and to the gender, hair style and clothing type feature layers; the pedestrian identity feature layer is the input of the pedestrian identity classification layer, the gender feature layer of the gender classification layer, the hair style feature layer of the hair style classification layer, and the clothing type feature layer of the clothing type classification layer;
The key point trajectory formula is:

T_{i,j}^{k,t} = ( x_{i,j+1}^{k,t} − x_{i,j}^{k,t} , y_{i,j+1}^{k,t} − y_{i,j}^{k,t} )

wherein, when t = 0, the formula computes the pedestrian key point trajectories of the ith pedestrian: k denotes the kth pedestrian key point, k ∈ [1, m]; j denotes the jth pedestrian image frame, j ∈ [1, n−1]; T_{i,j}^{k,0} denotes the trajectory of the kth pedestrian key point from the jth to the (j+1)th pedestrian image frame; (x_{i,j+1}^{k,0}, y_{i,j+1}^{k,0}) and (x_{i,j}^{k,0}, y_{i,j}^{k,0}) denote the x-axis and y-axis coordinates of the kth pedestrian key point in the (j+1)th and jth frames respectively; when j = 1 and k = 1, the trajectory of the 1st pedestrian key point from the 1st to the 2nd pedestrian image frame of the ith person is:

T_{i,1}^{1,0} = ( x_{i,2}^{1,0} − x_{i,1}^{1,0} , y_{i,2}^{1,0} − y_{i,1}^{1,0} )
when t = 1, the formula computes the face key point trajectories of the ith pedestrian: k denotes the kth face key point, k ∈ [1, s]; j denotes the jth face image frame, j ∈ [1, n−1]; T_{i,j}^{k,1} denotes the trajectory of the kth face key point from the jth to the (j+1)th face image frame; (x_{i,j+1}^{k,1}, y_{i,j+1}^{k,1}) and (x_{i,j}^{k,1}, y_{i,j}^{k,1}) denote the x-axis and y-axis coordinates of the kth face key point in the (j+1)th and jth frames respectively; when j = 1 and k = 1, the trajectory of the 1st face key point from the 1st to the 2nd face image frame of the ith person is:

T_{i,1}^{1,1} = ( x_{i,2}^{1,1} − x_{i,1}^{1,1} , y_{i,2}^{1,1} − y_{i,1}^{1,1} )
The key point trajectory normalization formula is:

\bar{T}_i^{k,t} = T_i^{k,t} / Σ_{j=1}^{n−1} |T_{i,j}^{k,t}|

wherein, when t = 0, the formula normalizes the pedestrian key point trajectory vectors of the ith pedestrian: k ∈ [1, m]; j ∈ [1, n−1]; \bar{T}_i^{k,0} denotes the kth pedestrian key point trajectory feature over the n consecutive pedestrian images, a vector of (n−1) × 2 dimensions; T_i^{k,0} = (T_{i,1}^{k,0}, …, T_{i,n−1}^{k,0}) denotes the kth pedestrian key point trajectory over the n consecutive pedestrian images; |T_{i,j}^{k,0}| denotes the length of the kth pedestrian key point trajectory from the jth to the (j+1)th frame; when n = 15 and k = 1, the 1st pedestrian key point trajectory vector of the ith pedestrian over the 15 consecutive pedestrian images is normalized as:

\bar{T}_i^{1,0} = T_i^{1,0} / Σ_{j=1}^{14} |T_{i,j}^{1,0}|
when t = 1, the formula normalizes the face key point trajectory vectors of the ith pedestrian: k ∈ [1, s]; j ∈ [1, n−1]; \bar{T}_i^{k,1} denotes the kth face key point trajectory feature over the n consecutive face images, a vector of (n−1) × 2 dimensions; T_i^{k,1} denotes the kth face key point trajectory over the n consecutive face images; |T_{i,j}^{k,1}| denotes the length of the kth face key point trajectory from the jth to the (j+1)th frame; when n = 15 and k = 1, the 1st face key point trajectory vector of the ith pedestrian over the 15 consecutive face images is normalized as:

\bar{T}_i^{1,1} = T_i^{1,1} / Σ_{j=1}^{14} |T_{i,j}^{1,1}|
Claims (3)
1. A method for designing a network structure for recognizing faces, pedestrians and their attributes based on deep learning, comprising the following steps:
step (1): inputting n consecutive frames of video captured by a surveillance camera into a pedestrian detection and tracking module; when the ith pedestrian appears in the video, outputting the sequence of n consecutive pedestrian images of the ith pedestrian, {P_i^1, P_i^2, …, P_i^n}; pedestrian detection adopts the open-source Faster R-CNN algorithm, which comprises three basic components: a region proposal network (RPN) that generates candidate regions for each surveillance video frame, a convolutional neural network that extracts pedestrian features from the candidate regions, and a binary Softmax classifier that judges whether a candidate region contains a pedestrian; pedestrian tracking adopts the optical flow tracking function of OpenCV;
step (2): inputting each image of the sequence {P_i^1, P_i^2, …, P_i^n} obtained in step (1) into a pedestrian feature extraction sub-network based on a convolutional neural network, the network containing two kinds of layers, convolutional layers and max-pooling layers, two convolutional layers followed by one max-pooling layer forming a substructure, the pedestrian feature extraction sub-network comprising N such substructures in series;
step (3): for each image of the sequence {P_i^1, P_i^2, …, P_i^n} obtained in step (1), detecting the pedestrian key points to obtain the corresponding m pedestrian key points; calculating the pedestrian key point trajectories from the position changes of the m key points, normalizing the m trajectory vectors respectively, and merging the m normalized vectors into the pedestrian key point trajectory feature; fusing the pedestrian key point trajectory feature with the output of the fully connected layer Pc of the pedestrian feature extraction sub-network to obtain the fusion feature S1, the feature fusion adopting the concat layer of the deep learning framework Caffe with the pedestrian key point trajectory feature and the layer Pc as inputs, wherein the trajectory feature has dimension m × (n−1) × 2, the layer Pc has dimension D, and the concat layer outputs the fusion feature S1;
step (4): inputting each image of the sequence {P_i^1, P_i^2, …, P_i^n} obtained in step (1) into a face detection module for face detection to obtain the sequence of n consecutive face images of the ith person, {F_i^1, F_i^2, …, F_i^n}; the face detection module adopts that of the open-source face recognition engine SeetaFace, which uses a funnel-structured cascade (FuSt) whose top consists of several fast LAB cascade classifiers for different poses, followed by several multilayer perceptron (MLP) cascades based on SURF features, and finally a unified MLP cascade that processes the candidate windows of all poses, the correct face windows being retained to obtain the face images;
and (5): judging the continuous n frames of face image sequences { F) of the ith pedestrian obtained in the step (4)i 1,Fi 2,…,Fi nResolving power, namely not performing super-resolution processing on the face image with the resolving power larger than A multiplied by B, and performing super-resolution processing on the face image with the resolving power smaller than A multiplied by B to finally obtain the ith continuous n-frame face image sequence with higher pedestrian resolution
step (6): inputting each image of the face image sequence obtained in step (5) into a face feature extraction sub-network based on a convolutional neural network, the network consisting of M convolutional layers;
step (7): for each image of the face image sequence obtained in step (5), detecting the face key points to obtain the corresponding s face key points; calculating the face key point trajectories from the position changes of the s key points, normalizing the s trajectory vectors respectively, and merging the s normalized vectors into the face key point trajectory feature; fusing the face key point trajectory feature with the output of the fully connected layer Fc of the face feature extraction sub-network to obtain the fusion feature S2, wherein the face key point trajectory feature has dimension s × (n−1) × 2 and the layer Fc has dimension D;
step (8): taking the fusion feature S2 obtained in step (7) as input to a face identity feature layer and to face attribute 1, face attribute 2, …, face attribute v feature layers; the face identity feature layer serves as input to the identity classification layer, the face attribute 1 feature layer to the face attribute 1 classification layer, the face attribute 2 feature layer to the face attribute 2 classification layer, …, and the face attribute v feature layer to the face attribute v classification layer;
step (9): fusing the fusion feature S1 obtained in step (3) with the fusion feature S2 obtained in step (7) to obtain the fusion feature S3, wherein S1 has dimension m × (n−1) × 2 + D and S2 has dimension s × (n−1) × 2 + D;
step (10): taking the fusion feature S3 obtained in step (9) as input to a pedestrian identity feature layer and to pedestrian attribute 1, pedestrian attribute 2, …, pedestrian attribute u feature layers; the pedestrian identity feature layer serves as input to the pedestrian identity classification layer, the pedestrian attribute 1 feature layer to the pedestrian attribute 1 classification layer, the pedestrian attribute 2 feature layer to the pedestrian attribute 2 classification layer, …, and the pedestrian attribute u feature layer to the pedestrian attribute u classification layer.
2. The method for designing a network structure for recognizing faces, pedestrians and their attributes based on deep learning as claimed in claim 1, wherein the pedestrian key point trajectories in step (3) and the face key point trajectories in step (7) are calculated as:

T_{i,j}^{k,t} = ( x_{i,j+1}^{k,t} − x_{i,j}^{k,t} , y_{i,j+1}^{k,t} − y_{i,j}^{k,t} )

wherein, when t = 0, the formula computes the pedestrian key point trajectories of the ith pedestrian: k denotes the kth pedestrian key point of the ith pedestrian, k ∈ [1, m]; j denotes the jth pedestrian image frame of the ith pedestrian, j ∈ [1, n−1]; T_{i,j}^{k,0} denotes the trajectory of the kth pedestrian key point from the jth to the (j+1)th pedestrian image frame; (x_{i,j+1}^{k,0}, y_{i,j+1}^{k,0}) and (x_{i,j}^{k,0}, y_{i,j}^{k,0}) denote the x-axis and y-axis coordinates of the kth pedestrian key point in the (j+1)th and jth pedestrian image frames respectively;

when t = 1, the formula computes the face key point trajectories of the ith pedestrian: k denotes the kth face key point of the ith pedestrian, k ∈ [1, s]; j denotes the jth face image frame of the ith pedestrian, j ∈ [1, n−1]; T_{i,j}^{k,1} denotes the trajectory of the kth face key point from the jth to the (j+1)th face image frame; (x_{i,j+1}^{k,1}, y_{i,j+1}^{k,1}) and (x_{i,j}^{k,1}, y_{i,j}^{k,1}) denote the x-axis and y-axis coordinates of the kth face key point in the (j+1)th and jth face image frames respectively.
3. The method for designing a network structure for recognizing faces, pedestrians and their attributes based on deep learning as claimed in claim 1, wherein the pedestrian key point trajectory vectors in step (3) and the face key point trajectory vectors in step (7) are normalized as:

\bar{T}_i^{k,t} = T_i^{k,t} / Σ_{j=1}^{n−1} |T_{i,j}^{k,t}|

wherein, when t = 0, the formula normalizes the pedestrian key point trajectory vectors of the ith pedestrian: k denotes the kth pedestrian key point, k ∈ [1, m]; j denotes the jth pedestrian image frame, j ∈ [1, n−1]; \bar{T}_i^{k,0} denotes the kth pedestrian key point trajectory feature over the n consecutive pedestrian images of the ith pedestrian, a vector of (n−1) × 2 dimensions; T_i^{k,0} = (T_{i,1}^{k,0}, …, T_{i,n−1}^{k,0}) denotes the kth pedestrian key point trajectory over the n consecutive pedestrian images; |T_{i,j}^{k,0}| denotes the length of the kth pedestrian key point trajectory from the jth to the (j+1)th frame;

when t = 1, the formula normalizes the face key point trajectory vectors of the ith pedestrian: k denotes the kth face key point, k ∈ [1, s]; j denotes the jth face image frame, j ∈ [1, n−1]; \bar{T}_i^{k,1} denotes the kth face key point trajectory feature over the n consecutive face images, a vector of (n−1) × 2 dimensions; T_i^{k,1} denotes the kth face key point trajectory over the n consecutive face images; |T_{i,j}^{k,1}| denotes the length of the kth face key point trajectory from the jth to the (j+1)th frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810864964.9A CN109101915B (en) | 2018-08-01 | 2018-08-01 | Face, pedestrian and attribute recognition network structure design method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810864964.9A CN109101915B (en) | 2018-08-01 | 2018-08-01 | Face, pedestrian and attribute recognition network structure design method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109101915A CN109101915A (en) | 2018-12-28 |
CN109101915B true CN109101915B (en) | 2021-04-27 |
Family
ID=64848324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810864964.9A Active CN109101915B (en) | 2018-08-01 | 2018-08-01 | Face, pedestrian and attribute recognition network structure design method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109101915B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858402B (en) * | 2019-01-16 | 2021-08-31 | 腾讯科技(深圳)有限公司 | Image detection method, device, terminal and storage medium |
CN109886154A (en) * | 2019-01-30 | 2019-06-14 | 电子科技大学 | Most pedestrian's appearance attribute recognition methods according to collection joint training based on Inception V3 |
CN109829436B (en) * | 2019-02-02 | 2022-05-13 | 福州大学 | Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network |
CN110084216B (en) * | 2019-05-06 | 2021-11-09 | 苏州科达科技股份有限公司 | Face recognition model training and face recognition method, system, device and medium |
CN110298278B (en) * | 2019-06-19 | 2021-06-04 | 中国计量大学 | Underground parking garage pedestrian and vehicle monitoring method based on artificial intelligence |
CN110263756A (en) * | 2019-06-28 | 2019-09-20 | 东北大学 | A kind of human face super-resolution reconstructing system based on joint multi-task learning |
CN111553231B (en) * | 2020-04-21 | 2023-04-28 | 上海锘科智能科技有限公司 | Face snapshot and deduplication system, method, terminal and medium based on information fusion |
CN112818833B (en) * | 2021-01-29 | 2024-04-12 | 中能国际建筑投资集团有限公司 | Face multitasking detection method, system, device and medium based on deep learning |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9437009B2 (en) * | 2011-06-20 | 2016-09-06 | University Of Southern California | Visual tracking in video images in unconstrained environments by exploiting on-the-fly context using supporters and distracters |
CN103116756B (en) * | 2013-01-23 | 2016-07-27 | 北京工商大学 | A kind of persona face detection method and device |
CN104077804B (en) * | 2014-06-09 | 2017-03-01 | 广州嘉崎智能科技有限公司 | A kind of method based on multi-frame video picture construction three-dimensional face model |
AU2015224526B2 (en) * | 2014-09-11 | 2020-04-30 | Iomniscient Pty Ltd | An image management system |
CN105518744B (en) * | 2015-06-29 | 2018-09-07 | 北京旷视科技有限公司 | Pedestrian recognition methods and equipment again |
CN108038409B (en) * | 2017-10-27 | 2021-12-28 | 江西高创保安服务技术有限公司 | Pedestrian detection method |
- 2018-08-01 CN CN201810864964.9A patent/CN109101915B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN109101915A (en) | 2018-12-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109101915B (en) | Face, pedestrian and attribute recognition network structure design method based on deep learning | |
Konstantinidis et al. | Sign language recognition based on hand and body skeletal data | |
Zhan et al. | Face detection using representation learning | |
CN111414862B (en) | Expression recognition method based on neural network fusion key point angle change | |
Pu et al. | Facial expression recognition from image sequences using twofold random forest classifier | |
Shirsat et al. | Proposed system for criminal detection and recognition on CCTV data using cloud and machine learning | |
Xia et al. | Face occlusion detection using deep convolutional neural networks | |
Liu et al. | Facial attractiveness computation by label distribution learning with deep CNN and geometric features | |
Yang et al. | Face recognition based on MTCNN and integrated application of FaceNet and LBP method | |
CN117541994A (en) | Abnormal behavior detection model and detection method in dense multi-person scene | |
Archana et al. | Real time face detection and optimal face mapping for online classes | |
Chen et al. | A multi-scale fusion convolutional neural network for face detection | |
Hsiao et al. | EfficientNet based iris biometric recognition methods with pupil positioning by U-net | |
Silwal et al. | A novel deep learning system for facial feature extraction by fusing CNN and MB-LBP and using enhanced loss function | |
Myvizhi et al. | Extensive analysis of deep learning-based deepfake video detection | |
Sajid et al. | Facial asymmetry-based feature extraction for different applications: a review complemented by new advances | |
Liu et al. | Lip event detection using oriented histograms of regional optical flow and low rank affinity pursuit | |
Yang et al. | Heterogeneous face detection based on multi‐task cascaded convolutional neural network | |
Liu et al. | Robust saliency-aware distillation for few-shot fine-grained visual recognition | |
Martinez-Gonzalez et al. | Real time face detection using neural networks | |
Nguyen et al. | A method for hand detection based on Internal Haar-like features and Cascaded AdaBoost Classifier | |
Sadeq et al. | Comparison Between Face and Gait Human Recognition Using Enhanced Convolutional Neural Network | |
Rondón et al. | Machine learning models in people detection and identification: a literature review | |
Ismail et al. | A review on Arabic sign language recognition | |
Papadimitriou et al. | Fingerspelled alphabet sign recognition in upper-body videos |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||