CN111582154A - Pedestrian re-identification method based on multitask skeleton posture division component

Info

Publication number: CN111582154A
Application number: CN202010377073.8A
Authority: CN (China)
Prior art keywords: pedestrian, network, skeleton, feature, multitask
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 陈海英 (Chen Haiying), 王慧燕 (Wang Huiyan)
Current assignee: Zhejiang Gongshang University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Zhejiang Gongshang University
Application filed by Zhejiang Gongshang University on 2020-05-07; priority date 2020-05-07 (the priority date is an assumption and is not a legal conclusion)
Priority to: CN202010377073.8A
Publication of: CN111582154A (published 2020-08-25)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 - Recognition of whole body movements, e.g. for sport training
    • G06V 40/25 - Recognition of walking or running movements, e.g. gait recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Abstract

The invention discloses a pedestrian re-identification method based on multitask skeleton-posture division of components. A model is built by combining two tasks, pedestrian feature extraction and skeleton key-point detection. The pedestrian feature-extraction network adopts an improved InceptionResNetV2 and fuses its features with those of the skeleton key-point detection branch, which improves the feature-expression capability of the network, allows body regions to be partitioned adaptively according to the human body, and improves the fineness and accuracy of detail-feature extraction. The method is therefore well suited to pedestrian re-identification problems in which appearance features are similar and identification must rely on appearance details.

Description

Pedestrian re-identification method based on multitask skeleton posture division component
Technical Field
The invention relates to the technical field of computer vision, in particular to a pedestrian re-identification method based on a multitask skeleton posture division component.
Background
Pedestrian re-identification means identifying the identity of a pedestrian from pedestrian images captured by different cameras. It aims to make up for the visual limitations of fixed cameras and, combined with pedestrian detection and pedestrian tracking, can be widely applied in intelligent video surveillance, intelligent security and related fields. Given an image containing a target pedestrian (the query), pedestrian re-identification (ReID) attempts to retrieve images containing the same pedestrian from a large set of pedestrian images (the gallery), and is widely regarded as a sub-problem of image retrieval. ReID has received great attention from both academia and industry because of its important theoretical value and broad application prospects.
ReID technology has developed very rapidly in recent years, but it remains a challenging task because of significant changes in camera viewpoint, height, pedestrian pose, complex backgrounds, resolution and so on. Compared with face recognition, ReID scenes are more complex and several difficult problems remain unsolved; in particular, recognition is difficult when pedestrians' appearance features, such as clothing, are similar. Existing detail-feature extraction methods are mostly based on uniform partitioning, whose fineness is insufficient.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing pedestrian re-identification technology by providing a pedestrian re-identification method based on multitask skeleton-posture division of components, which comprises the following steps:
Step (1): data preprocessing.
The sample images are normalized to a fixed input size, for example 512 × 512: if a sample image is larger than this size it is randomly cropped; if it is smaller, it is enlarged proportionally and then cropped.
Step (2): designing the feature-extraction network model.
The pedestrian re-identification model based on multitask skeleton-posture division comprises two branches: a pedestrian feature-extraction branch and a skeleton key-point detection branch.
The pedestrian feature-extraction branch is the main network. An improved InceptionResNetV2 is adopted as the backbone: the last downsampling layer of the original InceptionResNetV2 is discarded, yielding a spatial tensor feature set (Tensor T) that provides the global features of the pedestrian.
The skeleton key-point detection branch adopts a VGG network structure; at the end of the network a confidence map is output through a 1 × 1 convolution. The number of layers of the confidence map equals the number of human joint points, each layer representing the heat map of one joint point. The skeleton key points obtained by this branch are used to divide the body into components: seven parts in the horizontal direction, i.e. seven spatial tensors α, giving the local features of the pedestrian.
The global and local features are fused by vector concatenation. If the two feature vectors have the same dimension they are concatenated directly; if their dimensions differ, a linear transformation first maps them to vectors of the same dimension and they are then concatenated, enhancing the expressive power of the features. This yields seven spatial tensors μ.
Finally, the seven spatial tensors μ are average-pooled to obtain seven column vectors β, a 1 × 1 convolution reduces the channel dimension to obtain seven column vectors γ, and the seven vectors γ are connected to seven fully connected (FC) layers and classified by Softmax to obtain seven feature vectors; the weights throughout this process are not shared.
Step (3): training the model with a label-smoothing loss function to optimize the network parameters.
The network is first pre-trained on the ImageNet database; the seven feature vectors generated in step (2) (with unshared weights) are then fed into the label-smoothing loss function to obtain seven losses, and the model parameters of the defined pedestrian re-identification network with skeleton-posture part division are trained with the back-propagation algorithm until the whole network model converges.
Step (4): during testing, the seven column vectors γ are combined into a single feature vector by element-wise addition, the Euclidean distance between the specified object in the query set and each object in the candidate set is computed, and the computed distances are sorted in ascending order to obtain the recognition result.
The beneficial effects of the invention are as follows: the proposed method divides body regions adaptively according to the shape of the human body, improves the fineness of detail-feature extraction compared with existing methods, and is suitable for the pedestrian ReID problem in which appearance features are similar and identification must rely on appearance details.
Drawings
FIG. 1 is a flow chart according to the present invention;
FIG. 2 is a diagram of the overall network architecture according to the present invention.
Detailed Description
In order to describe the present invention more specifically, the technical solution is described in detail below with reference to the accompanying drawings and specific embodiments; the flow of one embodiment of the method is shown in FIG. 1. The pedestrian re-identification method based on skeleton-posture division of components comprises the following steps:
step (1), data preprocessing
A sufficient number of sample images are acquired (100); they can be downloaded from public datasets (Market1501, DukeMTMC-reID, CUHK03) or captured by the user.
The sample images are normalized (101) to a fixed input size, for example 512 × 512: if a sample image is larger than this size it is randomly cropped; if it is smaller, it is enlarged proportionally and then cropped.
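As a concrete illustration of this normalization step, the following is a minimal sketch in Python/PyTorch; the 512 × 512 target size is the example given above, while the helper name and the use of torchvision are illustrative assumptions rather than part of the patent.

```python
from PIL import Image
from torchvision import transforms

TARGET = 512  # example input size from step (1)

def normalize_sample(img: Image.Image) -> Image.Image:
    """If the image is smaller than TARGET, enlarge it proportionally; then take a random TARGET x TARGET crop."""
    w, h = img.size
    if min(w, h) < TARGET:
        scale = TARGET / min(w, h)                               # proportional enlargement
        img = img.resize((round(w * scale), round(h * scale)))   # default resampling filter
    return transforms.RandomCrop(TARGET)(img)                    # random crop for images larger than TARGET

preprocess = transforms.Compose([
    transforms.Lambda(normalize_sample),
    transforms.ToTensor(),                                       # HWC uint8 -> CHW float in [0, 1]
])
```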
Step (2) designing a network model to extract features
The input picture data are fed into a backbone that adopts a modified InceptionResNetV2; during training the InceptionResNetV2 can fuse feature maps of different scales.
The input to the modified InceptionResNetV2 first passes through the stem structure (202): the input has 3 channels (the RGB channels of the picture), and the stem outputs 256 channels.
The 256-channel output of the stem is then fed into 5 Inception-ResNet-A blocks (203); the output still has 256 channels.
The output of the 5 Inception-ResNet-A blocks is fed into Reduction-A (204) with 256 input channels; the output feature maps have 896 channels.
The result of Reduction-A is fed into 10 Inception-ResNet-B blocks (205), giving feature maps with 896 channels.
The output of the Inception-ResNet-B blocks is fed into the second reduction module (206), giving feature maps with 1792 channels.
The result of this reduction module is fed into 5 Inception-ResNet-C blocks (207), giving feature maps with 1792 channels; this produces the spatial tensor feature set Tensor T, i.e. the global features of the pedestrian.
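The channel flow of this modified backbone can be summarised with the schematic PyTorch sketch below. The block bodies are plain residual convolutions standing in for the actual Inception-ResNet-A/B/C and Reduction modules (which are considerably more elaborate); only the stage counts, the channel progression 3 → 256 → 896 → 1792 and the omission of the final downsampling layer follow the description above, and the input size in the usage line is illustrative.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Stand-in for an Inception-ResNet-A/B/C block: a channel-preserving residual convolution."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

def reduction(in_ch, out_ch):
    """Stand-in for a Reduction module: halves the spatial size and changes the channel count."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Backbone(nn.Module):
    """Modified backbone: the last downsampling layer is omitted so that the output
    Tensor T keeps enough spatial resolution for the later part division."""
    def __init__(self):
        super().__init__()
        self.stem = reduction(3, 256)                                   # (202) 3 -> 256 channels
        self.stage_a = nn.Sequential(*[Block(256) for _ in range(5)])   # (203) 5 x Inception-ResNet-A
        self.red_a = reduction(256, 896)                                # (204) 256 -> 896 channels
        self.stage_b = nn.Sequential(*[Block(896) for _ in range(10)])  # (205) 10 x Inception-ResNet-B
        self.red_b = reduction(896, 1792)                               # (206) 896 -> 1792 channels
        self.stage_c = nn.Sequential(*[Block(1792) for _ in range(5)])  # (207) 5 x Inception-ResNet-C
        # no further downsampling here: the result is the spatial tensor feature set Tensor T

    def forward(self, x):
        x = self.stage_a(self.stem(x))
        x = self.stage_b(self.red_a(x))
        return self.stage_c(self.red_b(x))                              # Tensor T (global features)

tensor_t = Backbone()(torch.randn(1, 3, 256, 128))                      # e.g. -> [1, 1792, 32, 16]
```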
The skeleton key points obtained by the skeleton key-point detection branch (208) are then used to divide the body into 7 parts in the horizontal direction, i.e. 7 spatial tensors α, giving the local features of the pedestrian. The method divides the pedestrian into 7 parts using the 14 key points of the human body, which improves the accuracy of re-identification through local features: the head is one part; the upper body is divided into two parts at the pedestrian's elbow key points; the crotch is one part; the legs are divided into two parts at the knee joints; and the feet are one part, giving 7 parts in total. This division helps extract the local features of the pedestrian without destroying important pedestrian features.
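A minimal sketch of this adaptive horizontal division is shown below. It assumes that six cut heights (e.g. at the neck, elbows, crotch, knees and ankles) have already been derived from the detected key points in image coordinates; the function name and the example values are illustrative, and degenerate cases (several cuts falling on the same feature-map row) would need extra clamping in practice.

```python
import torch

def split_into_parts(tensor_t, cut_heights, img_h):
    """Split Tensor T [C, H, W] into 7 horizontal strips (the spatial tensors alpha).

    cut_heights: six y-coordinates in the input image, derived from the skeleton key points,
    that separate head, two upper-body parts, crotch, two leg parts and feet.
    """
    c, h, w = tensor_t.shape
    # map the image-space cut heights onto feature-map rows (assumed ordered and well separated)
    rows = [0] + [round(y / img_h * h) for y in sorted(cut_heights)] + [h]
    return [tensor_t[:, top:bottom, :] for top, bottom in zip(rows[:-1], rows[1:])]

# usage with a 1792 x 32 x 16 Tensor T from a 256-pixel-high crop; cut heights are illustrative
alpha = split_into_parts(torch.randn(1792, 32, 16),
                         cut_heights=[30, 70, 110, 130, 170, 210], img_h=256)
assert len(alpha) == 7
```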
The global and local features are then fused by vector concatenation. If the two feature vectors have the same dimension they are concatenated directly; if their dimensions differ, a linear transformation first maps them to vectors of the same dimension and they are then concatenated, enhancing the expressive power of the features. This yields 7 spatial tensors μ.
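The text leaves the exact tensor shapes of this fusion open; the sketch below shows one plausible reading in which the global Tensor T is pooled to the spatial size of each local strip α and the two are concatenated along the channel axis, with a 1 × 1 convolution (a per-location linear transform) used when the channel counts differ. The module name and the shapes in the usage line are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConcatFusion(nn.Module):
    """Fuse the global Tensor T with one local strip alpha by channel-wise concatenation;
    when the channel counts differ, a 1 x 1 convolution first maps the global features
    to the local channel count (the linear transformation mentioned above)."""
    def __init__(self, global_ch, local_ch):
        super().__init__()
        self.proj = nn.Identity() if global_ch == local_ch else nn.Conv2d(global_ch, local_ch, 1)

    def forward(self, tensor_t, alpha):
        g = F.adaptive_avg_pool2d(tensor_t, alpha.shape[-2:])   # match the strip's spatial size
        return torch.cat([self.proj(g), alpha], dim=1)          # one of the seven spatial tensors mu

fusion = ConcatFusion(global_ch=1792, local_ch=1792)
mu = fusion(torch.randn(1, 1792, 32, 16), torch.randn(1, 1792, 5, 16))   # -> [1, 3584, 5, 16]
```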
Finally, the 7 spatial tensors μ are average-pooled to obtain 7 column vectors β. A 1 × 1 convolution then reduces the number of channels, giving 7 column vectors γ, which are connected to 7 fully connected layers and classified by Softmax to obtain 7 feature vectors (209). The weights throughout this process are not shared, so the training process is equivalent to 7 losses.
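One of these seven unshared heads could look like the sketch below (average pooling to β, a 1 × 1 convolution to γ, a fully connected layer and Softmax). The channel sizes and the 751-identity example (the number of training identities in Market-1501) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PartHead(nn.Module):
    """One of the seven unshared heads: mu -> average pooling (beta) -> 1x1 convolution
    for channel reduction (gamma) -> fully connected layer -> Softmax scores (209)."""
    def __init__(self, in_channels, reduced, num_ids):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                           # spatial tensor mu -> column vector beta
        self.reduce = nn.Conv2d(in_channels, reduced, kernel_size=1)  # beta -> gamma
        self.fc = nn.Linear(reduced, num_ids)                         # one FC layer per part

    def forward(self, mu):
        gamma = self.reduce(self.pool(mu)).flatten(1)                 # [B, reduced]; gamma is reused at test time
        return gamma, torch.softmax(self.fc(gamma), dim=1)

# seven heads with independent (unshared) weights, e.g. for the 751 Market-1501 training identities
heads = nn.ModuleList(PartHead(3584, 256, 751) for _ in range(7))
gamma, scores = heads[0](torch.randn(2, 3584, 5, 16))
```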
The input picture is also passed through the skeleton key-point detection branch (208): it goes through a classic VGG structure followed by a 1 × 1 convolution, and confidence maps are output. If the human body has p joint points, the confidence map has p layers, each layer representing the heat map of one joint point. The loss of each stage is computed from the confidence maps and the labels and stored; at the end of the network the losses of all stages are summed to form the total loss for back-propagation. This realizes intermediate supervision and avoids vanishing gradients.
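A compact sketch of such a multi-stage confidence-map branch with intermediate supervision is given below. The trunk is reduced to a few VGG-style 3 × 3 convolutions, the number of stages and channels are illustrative, and the mean-squared error against ground-truth heat maps is an assumed choice of per-stage loss; the point being illustrated is the per-stage losses being stored and summed into the total loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def vgg_stage(in_ch, num_joints):
    """A small VGG-style stage: 3x3 convolutions followed by a 1x1 convolution that
    outputs one confidence map (heat map) per joint point."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(128, num_joints, 1),                   # 1x1 convolution -> p-channel confidence map
    )

class KeypointBranch(nn.Module):
    def __init__(self, num_joints=14, num_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            [vgg_stage(3, num_joints)] +
            [vgg_stage(3 + num_joints, num_joints) for _ in range(num_stages - 1)]
        )

    def forward(self, image, target_heatmaps=None):
        maps, stage_losses = None, []
        for stage in self.stages:
            inp = image if maps is None else torch.cat([image, maps], dim=1)
            maps = stage(inp)
            if target_heatmaps is not None:              # intermediate supervision:
                stage_losses.append(F.mse_loss(maps, target_heatmaps))
        # the stored per-stage losses are summed at the end as the total loss, so gradients
        # reach the early stages directly and vanishing gradients are avoided
        return maps, (sum(stage_losses) if stage_losses else None)

branch = KeypointBranch()
heatmaps, loss = branch(torch.randn(2, 3, 256, 128), torch.zeros(2, 14, 256, 128))
```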
Step (3), model training (102)
Joint training is carried out with the pedestrian feature-extraction branch and the skeleton key-point detection branch (208). The feature vectors generated by the network are fused by vector concatenation and fed into the label-smoothing loss function, and the defined network model parameters for pedestrian re-identification are trained with the back-propagation algorithm to optimize the model; label-smoothing loss is adopted during model training.
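To make the joint optimisation concrete, the sketch below runs one training step in which the seven per-part classification losses and the skeleton-branch loss are summed and back-propagated together. The tiny two-branch model, the equal weighting of the two tasks, the optimiser settings and the smoothing value ε = 0.1 are all illustrative assumptions rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchToyModel(nn.Module):
    """Toy stand-in for the joint model: a shared trunk, seven unshared identity heads,
    and a key-point head whose output yields the skeleton-branch loss."""
    def __init__(self, num_ids=751, num_joints=14):
        super().__init__()
        self.trunk = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                                   nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.id_heads = nn.ModuleList(nn.Linear(64, num_ids) for _ in range(7))   # unshared weights
        self.kp_head = nn.Conv2d(3, num_joints, 1)

    def forward(self, images, gt_heatmaps):
        feat = self.trunk(images)
        part_logits = [head(feat) for head in self.id_heads]          # seven per-part outputs
        kp_loss = F.mse_loss(self.kp_head(images), gt_heatmaps)       # skeleton key-point branch loss
        return part_logits, kp_loss

model = TwoBranchToyModel()
id_criterion = nn.CrossEntropyLoss(label_smoothing=0.1)               # label-smoothing classification loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, id_labels, gt_heatmaps):
    part_logits, kp_loss = model(images, gt_heatmaps)                           # joint forward pass
    id_loss = sum(id_criterion(logits, id_labels) for logits in part_logits)    # seven losses
    loss = id_loss + kp_loss                                                    # combined objective
    optimizer.zero_grad()
    loss.backward()                                                             # back-propagation
    optimizer.step()
    return loss.item()

train_step(torch.randn(4, 3, 256, 128), torch.randint(0, 751, (4,)), torch.zeros(4, 14, 256, 128))
```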
Classification in pedestrian re-identification commonly uses a cross-entropy loss function:

$$L = -\sum_{i=1}^{N} y_i \log p_i$$

where N is the total number of pedestrian identities and, when an image is input, y_i is the ground-truth label indicator: y_i = 1 if the pedestrian in the image belongs to class i and y_i = 0 otherwise; p_i is the probability predicted by the network that the pedestrian belongs to label i.
The label-smoothing loss function is introduced because the cross-entropy loss depends too strongly on the correct pedestrian label, which easily leads to over-fitting during training, and label smoothing helps avoid this over-fitting. A small number of incorrect labels may also exist in the pedestrian training samples and can influence the prediction result to some extent; the label-smoothing loss likewise prevents the model from relying too heavily on the labels during training. Pedestrian label smoothing therefore sets an error rate ε for the labels during training and uses 1 - ε as the value of the true label.
$$L_{LS} = -\sum_{i=1}^{N} q_i \log p_i, \qquad q_i = \begin{cases} 1-\varepsilon, & i = y \\ \dfrac{\varepsilon}{N-1}, & i \neq y \end{cases}$$

where N is the total number of pedestrian identities, y is the ground-truth pedestrian label of the input image, p_i is the probability predicted by the network that the pedestrian belongs to label i, and ε is the label error rate; the true label is trained with value 1 - ε and the remaining probability mass ε is distributed over the other N - 1 identities.
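As a sketch, the smoothed loss above can be implemented as follows; the ε value and the tensor sizes in the usage line are illustrative, and PyTorch's built-in nn.CrossEntropyLoss(label_smoothing=...) provides a closely related variant that spreads ε over all N classes rather than only the other N - 1.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, labels, epsilon=0.1):
    """Cross entropy with smoothed targets: the true identity gets 1 - epsilon and the
    remaining mass epsilon is spread over the other N - 1 identities (the formula above)."""
    n = logits.size(1)                                       # N: total number of pedestrian identities
    log_p = F.log_softmax(logits, dim=1)                     # log p_i
    q = torch.full_like(log_p, epsilon / (n - 1))            # q_i for i != y
    q.scatter_(1, labels.unsqueeze(1), 1.0 - epsilon)        # q_y = 1 - epsilon
    return -(q * log_p).sum(dim=1).mean()

# usage: one such loss is computed for each of the seven unshared feature vectors
loss = label_smoothing_loss(torch.randn(8, 751), torch.randint(0, 751, (8,)))
```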
Step (4), model testing (103)
For the query set and the candidate set (gallery) contained in the pedestrian re-identification dataset, the 7 column vectors γ are merged into a single feature vector by vector concatenation at test time, the Euclidean distance between the specified object in the query set and each object in the candidate set is computed as the similarity, and the computed distances are sorted in ascending order to obtain the ranking and thus the pedestrian re-identification result.
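A minimal sketch of this test-time ranking is given below; it merges the seven per-part vectors γ by concatenation as described in this step, and the descriptor dimension and gallery size in the usage line are illustrative.

```python
import torch

def rank_gallery(query_parts, gallery_parts):
    """Merge the seven per-part vectors gamma into one descriptor per image and rank the
    candidate set by ascending Euclidean distance to the query.

    query_parts:   list of 7 tensors, each of shape [D]      (one query image)
    gallery_parts: list of 7 tensors, each of shape [M, D]   (M candidate images)
    """
    query = torch.cat(query_parts, dim=0)                         # merged query descriptor [7*D]
    gallery = torch.cat(gallery_parts, dim=1)                     # merged candidate descriptors [M, 7*D]
    dists = torch.cdist(query.unsqueeze(0), gallery).squeeze(0)   # Euclidean distance to every candidate
    return torch.argsort(dists)                                   # ascending order = ReID ranking

# usage with random descriptors: 256 dimensions per part, 100 candidate images
ranking = rank_gallery([torch.randn(256) for _ in range(7)],
                       [torch.randn(100, 256) for _ in range(7)])
```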
In conclusion, in view of the challenges ReID faces from a large number of uncontrolled sources of variation, such as significant changes in pose and viewpoint, complex changes in illumination and poor image quality, the present invention provides a new method that divides pedestrian components based on the skeleton posture to extract local features, without resorting to segmentation estimation for re-identification. The method uses a convolutional neural network to implicitly learn the human posture from a monocular RGB image, using the image's features and the associated spatial model to divide the body into parts. The proposed pedestrian re-identification method based on skeleton-posture division of components brings a certain improvement in accuracy and is a reasonable approach to pedestrian identification.

Claims (4)

1. The pedestrian re-identification method based on the multitask skeleton posture division component is characterized by comprising the following steps:
step (1), data preprocessing;
acquiring a sufficient number of sample images and performing normalization preprocessing on the sample images;
step (2) designing a network model for feature extraction;
the network model consists of two branches: a pedestrian feature extraction branch and a skeleton key point detection branch;
the pedestrian feature extraction branch is the main network, and an improved InceptionResNetV2 is used as the backbone: the last downsampling layer of the original InceptionResNetV2 is discarded to obtain a spatial tensor feature set, giving the global features of the pedestrian;
the skeleton key point detection branch adopts a VGG network structure; at the end of the network a confidence map is output through a 1 × 1 convolution, the number of layers of the confidence map is the same as the number of human joint points, and each layer represents the heat map of one joint point; the skeleton key points obtained by this branch are used to divide the body into seven parts in the horizontal direction, namely seven spatial tensors α, so as to obtain the local features of the pedestrian;
the global features and the local features are fused by vector concatenation to obtain seven spatial tensors μ; the seven spatial tensors μ are then average-pooled to obtain seven column vectors β; channel dimensionality reduction is performed with a 1 × 1 convolution to obtain seven column vectors γ; and the seven column vectors γ are connected to seven fully connected layers and classified by Softmax to obtain seven feature vectors;
step (3), training the network model with a label-smoothing loss function so that the network parameters are optimal;
and (4), during testing, combining the seven column vectors γ into one feature vector by element-wise addition, calculating the Euclidean distance between the specified object in the query set and each object in the candidate set, and then sorting the calculated distances in ascending order to obtain the recognition result.
2. The pedestrian re-identification method based on the multitask skeleton posture division component according to claim 1, characterized in that the preprocessing in step (1) is specifically: setting the size of the input image; if a sample image is larger than this size, it is randomly cropped; and if a sample image is smaller than this size, it is enlarged proportionally and then cropped.
3. The pedestrian re-identification method based on the multitask skeleton posture division component according to claim 1, characterized in that, in step (2), if the two feature vectors have the same dimension they are fused directly by vector concatenation; if their dimensions differ, they are converted to vectors of the same dimension by a linear transformation and then fused by vector concatenation to enhance the expressive power of the features.
4. The pedestrian re-identification method based on the multitask skeleton posture division component according to claim 1, characterized in that step (3) is specifically: pre-training on the ImageNet database to obtain a pre-trained network, inputting the seven feature vectors generated in step (2) into the label-smoothing loss function to obtain seven losses, and training the network model parameters with the back-propagation algorithm until the whole network model converges.
Application CN202010377073.8A, priority date 2020-05-07, filing date 2020-05-07: Pedestrian re-identification method based on multitask skeleton posture division component. Status: Pending. Publication: CN111582154A (en).

Priority Applications (1)

Application CN202010377073.8A (published as CN111582154A, en), priority date 2020-05-07, filing date 2020-05-07: Pedestrian re-identification method based on multitask skeleton posture division component.

Applications Claiming Priority (1)

Application CN202010377073.8A (published as CN111582154A, en), priority date 2020-05-07, filing date 2020-05-07: Pedestrian re-identification method based on multitask skeleton posture division component.

Publications (1)

Publication number: CN111582154A (en). Publication date: 2020-08-25.

Family

ID=72112062

Family Applications (1)

Application CN202010377073.8A (published as CN111582154A, en, status Pending), priority date 2020-05-07, filing date 2020-05-07: Pedestrian re-identification method based on multitask skeleton posture division component.

Country Status (1)

Country: CN (1) | Link: CN111582154A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832672A (en) * 2017-10-12 2018-03-23 北京航空航天大学 A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information
CN108537136A (en) * 2018-03-19 2018-09-14 复旦大学 The pedestrian's recognition methods again generated based on posture normalized image
CN109784258A (en) * 2019-01-08 2019-05-21 华南理工大学 A kind of pedestrian's recognition methods again cut and merged based on Analysis On Multi-scale Features
CN110163110A (en) * 2019-04-23 2019-08-23 中电科大数据研究院有限公司 A kind of pedestrian's recognition methods again merged based on transfer learning and depth characteristic
CN110717411A (en) * 2019-09-23 2020-01-21 湖北工业大学 Pedestrian re-identification method based on deep layer feature fusion
CN110796026A (en) * 2019-10-10 2020-02-14 湖北工业大学 Pedestrian re-identification method based on global feature stitching

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HUANG, H., et al.: "EANet: Enhancing Alignment for Cross-Domain Person Re-identification" *
WU, X., et al.: "Person Re-identification Based on Semantic Segmentation" *
XIE, Y., et al.: "Cross-Camera Person Re-Identification With Body-Guided Attention Network" *
QIN, Xiaofei, et al. (秦晓飞 等): "Person Re-identification Based on Siamese Network and Multi-Distance Fusion" (基于孪生网络和多距离融合的行人再识别) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200093A (en) * 2020-10-13 2021-01-08 北京邮电大学 Pedestrian re-identification method based on uncertainty estimation
CN112966574A (en) * 2021-02-22 2021-06-15 厦门艾地运动科技有限公司 Human body three-dimensional key point prediction method and device and electronic equipment
CN114359970A (en) * 2022-01-12 2022-04-15 平安科技(深圳)有限公司 Pedestrian re-identification method and device, electronic equipment and storage medium
WO2023134071A1 (en) * 2022-01-12 2023-07-20 平安科技(深圳)有限公司 Person re-identification method and apparatus, electronic device and storage medium

Similar Documents

Publication | Publication date | Title
CN107832672B (en) Pedestrian re-identification method for designing multi-loss function by utilizing attitude information
CN108764065B (en) Pedestrian re-recognition feature fusion aided learning method
CN109325952B (en) Fashionable garment image segmentation method based on deep learning
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN111325111A (en) Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
CN111310668B (en) Gait recognition method based on skeleton information
CN111582154A (en) Pedestrian re-identification method based on multitask skeleton posture division component
KR101917354B1 (en) System and Method for Multi Object Tracking based on Reliability Assessment of Learning in Mobile Environment
CN108764019A (en) A kind of Video Events detection method based on multi-source deep learning
CN111582126B (en) Pedestrian re-recognition method based on multi-scale pedestrian contour segmentation fusion
CN113221770A (en) Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN111488766A (en) Target detection method and device
CN111985332A (en) Gait recognition method for improving loss function based on deep learning
Akanksha et al. A Feature Extraction Approach for Multi-Object Detection Using HoG and LTP.
Wang et al. Summary of object detection based on convolutional neural network
CN114973305B (en) Accurate human body analysis method for crowded people
CN112101154B (en) Video classification method, apparatus, computer device and storage medium
Kavimandan et al. Human action recognition using prominent camera
CN111401286B (en) Pedestrian retrieval method based on component weight generation network
CN114663835A (en) Pedestrian tracking method, system, equipment and storage medium
CN111046861B (en) Method for identifying infrared image, method for constructing identification model and application
Cheng et al. Automatic Data Cleaning System for Large-Scale Location Image Databases Using a Multilevel Extractor and Multiresolution Dissimilarity Calculation
Isayev et al. Investigation of optimal configurations of a convolutional neural network for the identification of objects in real-time
Kamaleswari et al. An Assessment of Object Detection in Thermal (Infrared) Image Processing

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination