CN112418134A - Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis - Google Patents

Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis

Info

Publication number
CN112418134A
CN112418134A
Authority
CN
China
Prior art keywords
pedestrian
label
model
identification
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011387800.5A
Other languages
Chinese (zh)
Other versions
CN112418134B (en)
Inventor
王其聪
王旭
Current Assignee
Xiamen University
Shenzhen Research Institute of Xiamen University
Original Assignee
Xiamen University
Shenzhen Research Institute of Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University and Shenzhen Research Institute of Xiamen University
Priority to CN202011387800.5A
Publication of CN112418134A
Application granted
Publication of CN112418134B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/2132: Feature extraction by transforming the feature space, based on discrimination criteria, e.g. discriminant analysis
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F18/245: Classification techniques relating to the decision surface
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/045: Combinations of networks
    • G06N3/084: Backpropagation, e.g. using gradient descent


Abstract

A multi-stream multi-label pedestrian re-identification method based on pedestrian analysis, relating to computer vision. A pedestrian re-identification data set is prepared; each pedestrian picture is parsed by a pedestrian analysis model to obtain masks of the pedestrian's upper body, lower body, and whole body, and a multi-branch attention network model is designed from the obtained masks. The hue of the pedestrian's body region is then adjusted according to the mask, changing the color of the pedestrian's clothes in the picture, and the recolored pictures are added to the data set as new pedestrian categories. The expanded data set is organized with a dual-label structure, different confidences are set for each label, and a multi-label classification loss function is proposed. The trained network model represents the test-set images as features for similarity comparison and ranking. The method obtains better recognition performance on several public data sets and effectively alleviates the interference of background clutter, occlusion, and similar problems with pedestrian re-identification.

Description

Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis
Technical Field
The invention relates to a computer vision technology, in particular to a multi-stream multi-label pedestrian re-identification method based on pedestrian analysis.
Background
Pedestrian re-identification is one of the current research hotspots in computer vision. With the development of modern society, the field of intelligent security is receiving more and more attention, and pedestrian re-identification, as one of the most important research directions within it, is widely studied, which has promoted the rapid development of the field. However, pedestrian re-identification still faces many problems, such as the small number of data sets, cluttered backgrounds in pedestrian pictures, and occlusion. At present, most research addresses pedestrian re-identification with deep neural networks, treating it as a classification task in the model training stage and extracting features for similarity comparison in the testing stage.
Pedestrian re-identification data sets are captured by multiple cameras with non-overlapping fields of view, so the images contain complex background interference; since the task is to judge whether pedestrians captured by different cameras are the same person, these complex background factors strongly affect it and introduce interference into the features extracted by a deep neural network. In addition, because of camera placement and the behavior of the subsequent detection method, the proportion of the bounding box occupied by the pedestrian region varies markedly: normally detected pedestrians occupy most of the bounding box, while some pedestrians occupy only a small part of the picture, which affects model learning, and most current methods do not take this into account. Because pictures are captured casually by cameras, occlusion affects pedestrian re-identification even more significantly. Zhong et al. (Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, "Random erasing data augmentation," arXiv preprint arXiv:1708.04896, 2017) proposed a data enhancement method that erases a region of the picture with a certain probability, which can partially handle occlusion; however, that method considers neither the perspective of model design nor the spatial structure of the pedestrian in the picture, and it is exactly from these perspectives that the occlusion problem can be handled effectively.
Disclosure of Invention
The invention aims to provide a multi-stream multi-label pedestrian re-identification method based on pedestrian analysis that addresses the technical problems present in existing pedestrian re-identification models.
The invention comprises the following steps:
1) preparing a pedestrian re-identification direction data set;
2) designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative features of pedestrians through the network model;
3) expanding the prepared data set by means of a pedestrian analysis model to obtain a newly expanded data set;
4) designing a multi-label classification loss function;
5) designing a multi-stream multi-task loss optimization function;
6) on large-scale image data, pre-training the backbone ResNet50 network with the back-propagation algorithm to obtain a pre-trained ResNet50 model;
7) on the basis of the pre-trained ResNet50 model, using the expanded pedestrian re-identification data set, computing the model's loss with the designed multi-task loss optimization function, and training the whole constructed model end to end with the back-propagation algorithm to obtain the final trained model;
8) performing pedestrian re-identification with the trained model, using the output features of the final network model as the feature representation of each pedestrian image for subsequent similarity measurement and ranking.
In step 1), the specific method for preparing the pedestrian re-identification data set may be: let the pedestrian images in the training set be {(x_i, y_i), i = 1, ..., n}, where n is the number of training samples and is a natural number; x_i is the pedestrian image of the i-th training sample, y_i (1 ≤ y_i ≤ N) is the pedestrian category label of the i-th training sample, and N is the number of pedestrian categories in the training sample set and is a natural number.
In step 2), the specific steps of designing the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis are as follows:
(1) obtaining the masks of the whole body, upper body, and lower body of the pedestrian in the picture by means of a pedestrian analysis model, where in each mask the pixels of the specified body region are 1 and the remaining regions are 0;
(2) removing the final fully connected classification layer of the original network, changing the global average pooling before it to global max pooling, and copying the network after the first convolution-pooling stage into four independent branches with no parameter sharing among them;
(3) leaving the first branch unchanged; applying attention mechanisms to the whole body, upper body, and lower body of the pedestrian in the second, third, and fourth branches respectively, and applying a channel attention mechanism in each branch to the features at that branch's body part, finally obtaining the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis.
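The three design steps above can be sketched compactly. The following is a minimal illustration, not the patented network: plain Python lists stand in for convolutional feature maps, and the toy 2x2 size and the weight value 0.5 are assumptions for demonstration only.

```python
# Sketch of the multi-stream structure: branch 1 is unchanged; branches 2-4
# weight the shared features with whole-body, upper-body and lower-body masks.

def mask_attention(feat, mask, alpha):
    """Weight a feature map by a body-region mask: X' = (1 + alpha * mask) * X."""
    return [[(1 + alpha * m) * x for x, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feat, mask)]

def forward_branches(feat, masks, alphas):
    """Return the four branch inputs: the plain features plus one masked copy per mask."""
    branches = [feat]                      # first branch: no change
    for mask, alpha in zip(masks, alphas):
        branches.append(mask_attention(feat, mask, alpha))
    return branches

feat = [[1.0, 2.0], [3.0, 4.0]]            # toy 2x2 feature map after the stem
whole = [[1, 1], [1, 1]]                   # whole-body mask
upper = [[1, 1], [0, 0]]                   # upper-body mask (top row only)
lower = [[0, 0], [1, 1]]                   # lower-body mask (bottom row only)
branches = forward_branches(feat, [whole, upper, lower], [0.5, 0.5, 0.5])
# branches[1] scales every value by 1.5; branches[2] scales only the top row
```

In the real model each branch would continue with its own (non-shared) convolutional stages, global max pooling, and a channel attention mechanism, as described in step (3).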
In step 3), the specific steps of expanding the prepared data set are as follows:
(1) obtaining the whole-body masks of the pedestrians in the picture by means of a pedestrian analysis model, where in each mask the pixels of the specified body region are 1 and the remaining regions are 0;
(2) separating the pedestrian foreground region from the background region of the picture with the obtained mask, then changing the hue of the pedestrian foreground region so as to change the pedestrian's clothes and thereby expand the data set.
In step 4), the specific steps of designing the multi-label classification loss function are as follows:
(1) for a pedestrian of category y_label on the original data set, let the pedestrian category generated from it be ỹ_label; on the new data set the original picture then has the two category labels (y_label, ỹ_label), with y_label as the first category of the original picture and ỹ_label as its second category; the generated picture likewise has the two category labels (ỹ_label, y_label), with ỹ_label as the first category of the newly generated picture and y_label as its second category; this constitutes the dual-label pedestrian re-identification data set;
(2) different confidences are set for the different categories and smoothing is added, giving the designed multi-label classification loss function L_multi-labels (formula (1)), where P(y_label) is the probability the model predicts for the first category label and P(ỹ_label) is the probability it predicts for the second label.
In step 5), the specific steps of designing the multi-stream multi-task loss optimization function are as follows:
A metric task and a classification task are computed for each branch; the metric task uses the triplet loss function, the classification task uses the multi-label classification loss function, and the finally designed multi-stream multi-task loss optimization function is shown in the following formula:
L = L_multi-labels + L_triplet    (2)
where L_multi-labels is the average classification loss over the branches of the model and L_triplet is the average metric loss over the branches of the model.
Compared with the prior art, the invention has the following advantages:
Firstly, a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis is designed. The multi-branch structure applies an attention mechanism to the pedestrian in the picture to suppress the interference of the background and of occluded regions, and attention is also applied across the channels of the same-layer features of the different branches, so that the model weights the channels expressing different pedestrian information differently and learns more discriminative features. Secondly, an efficient data enhancement method is designed that expands the data set by changing the color of the pedestrian's clothes in the picture. Because each generated picture is produced from an original picture, it remains strongly similar to the real picture in line texture, outline, and background; the newly generated data set is therefore organized as a dual-label data set, a multi-label classification loss function is proposed, and a multi-stream multi-task loss function is designed in combination with the metric task. This optimizes the model better, so that more discriminative feature representations are extracted for the subsequent feature similarity measurement and ranking that produce the final pedestrian re-identification result. The invention effectively alleviates the interference of background clutter, occlusion, and similar problems with pedestrian re-identification.
Drawings
FIG. 1 is a block diagram of an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features, and advantages of the present invention more comprehensible, the method of the present invention is described in detail below with reference to the accompanying drawing and embodiments. The embodiments are carried out on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the protection scope of the present invention is not limited to the following examples.
Referring to fig. 1, an implementation of an embodiment of the invention includes the steps of:
1. A pedestrian re-identification data set is prepared. Let the pedestrian images in the training set be {(x_i, y_i), i = 1, ..., n}, where n is the number of training samples and is a natural number; x_i is the pedestrian image of the i-th training sample, y_i (1 ≤ y_i ≤ N) is the pedestrian category label of the i-th training sample, and N is the number of pedestrian categories in the training sample set and is a natural number.
2. Designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative features of pedestrians through the network model.
B1. The pedestrian analysis model JPPNet marks the masks of the upper-body, lower-body, and whole-body regions of the pedestrian in the picture, denoted mask_upper, mask_lower, and mask_whole respectively. In mask_upper the pixel values of the pedestrian's upper-body region are uniformly set to 1 and the remaining positions to 0; in mask_lower the pixel values of the lower-body region are uniformly set to 1 and the remaining positions to 0; in mask_whole the pixel values of the whole-body region are set to 1 and the remaining positions to 0.
B2. The final fully connected classification layer of the ResNet50 network is removed, the global average pooling before it is changed to global max pooling, and the network after the first convolution-pooling stage is copied into four independent branches with no parameter sharing among them.
B3. The network structure of the first branch is the same as the modified ResNet-50. After global max pooling the obtained feature is 2048-dimensional; a fully connected layer with 512 neurons is then attached to reduce the 2048-dimensional feature, giving the 512-dimensional reduced feature Z_G.
B4. In the second branch, the pedestrian picture is parsed with JPPNet to obtain the whole-body mask mask_whole, and the features X obtained after the first convolution-pooling stage are weighted to give the whole-body-attention feature X_whole:
X_whole = (1 + α_1 · mask_whole) · X    (1)
where α_1 is the weight parameter of the attention mechanism over the pedestrian's whole body in the picture. This branch applies the attention mechanism to the whole body of the person in the picture; the subsequent processing is the same as in the first branch, and the obtained 2048-dimensional feature is reduced to the 512-dimensional feature Z_whole.
B5. In the third and fourth branches, the pedestrian in the picture is parsed with JPPNet in the same way to obtain the upper-body mask mask_upper and the lower-body mask mask_lower, and the features X are weighted to give the features X_upper and X_lower, weighted for the upper-body and lower-body regions of the person in the picture:
X_upper = (1 + α_2 · mask_upper) · X    (2)
X_lower = (1 + α_3 · mask_lower) · X    (3)
where α_2 and α_3 are the weight parameters of the attention mechanisms over the pedestrian's upper body and lower body respectively. These two branches apply the attention mechanism to the upper and lower body of the person in the picture; the subsequent processing is the same as in the first branch, and the 512-dimensional reduced features Z_upper and Z_lower of the two branches are finally obtained.
B6. In the training stage the pedestrian re-identification task is treated as a classification task, so each branch is followed by a fully connected classification layer whose number of neurons equals the number of pedestrian categories in the training set. In the testing stage the reduced features of the four branches are concatenated to obtain the feature Z used for similarity measurement:
Z = [Z_G; Z_whole; Z_upper; Z_lower]    (4)
where [;] denotes the cascade (concatenation) operation; the resulting feature Z used for the similarity measure is 2048-dimensional.
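The cascade operation above is ordinary feature concatenation. A minimal sketch, assuming random placeholder values for the four 512-dimensional branch features Z_G, Z_whole, Z_upper, and Z_lower:

```python
import random

def concat_features(z_g, z_whole, z_upper, z_lower):
    """Concatenate the four 512-dim branch features into the test-time feature Z."""
    return z_g + z_whole + z_upper + z_lower   # list concatenation = cascade op

random.seed(0)
z_g, z_whole, z_upper, z_lower = ([random.random() for _ in range(512)]
                                  for _ in range(4))
z = concat_features(z_g, z_whole, z_upper, z_lower)
# z has 4 * 512 = 2048 dimensions, as used for the similarity measure
```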
3. The prepared data set is expanded by means of a pedestrian analysis model to obtain a newly expanded data set.
C1. JPPNet parses the pedestrian in the picture to obtain the pedestrian's whole-body mask mask_whole; the pedestrian label is recorded as label, a non-negative value; for example, on the Market-1501 data set the value range of label is from 0 to 1501.
C2. Using the obtained mask_whole, the pedestrian foreground region is separated from the background region, giving pictures P and B containing only the foreground and background pixel values respectively.
C3. If the parsed region is small, that is, the pedestrian area within its region is less than 0.3 of the total area, the picture is considered to have failed parsing and is left unprocessed; otherwise the following operations continue.
C4. The pedestrian foreground picture P is converted from RGB format to HSV format. The hue angle in an HSV picture ranges from 0 to 360, so, to keep the converted hue H consistent for pedestrians with the same category label, the new hue is obtained by the formula H = label % 360. The clothes change is then applied to picture P, and the recolored pedestrian foreground picture P is converted back to RGB format.
C5. The foreground picture P is stitched back onto the previous background picture B to obtain a new pedestrian picture, which is saved.
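Steps C1-C5 can be sketched per pixel with the standard-library colorsys module. The 0.3 area-ratio check and the hue rule H = label % 360 follow the text; the pixel representation (RGB floats in [0, 1]) and the helper name recolor_foreground are illustrative assumptions:

```python
import colorsys

def recolor_foreground(pixels, mask, label):
    """Change the clothes color of the pedestrian foreground (steps C3-C5).

    pixels: flat list of (r, g, b) floats in [0, 1]; mask: matching 0/1 flags.
    Returns None when parsing is considered failed (foreground < 0.3 of area).
    """
    if sum(mask) / len(mask) < 0.3:              # C3: parsing failed, skip picture
        return None
    hue = (label % 360) / 360.0                  # C4: same label -> same target hue
    new_pixels = []
    for (r, g, b), is_fg in zip(pixels, mask):
        if is_fg:                                # foreground picture P: shift hue
            _, s, v = colorsys.rgb_to_hsv(r, g, b)
            new_pixels.append(colorsys.hsv_to_rgb(hue, s, v))
        else:                                    # background picture B: unchanged
            new_pixels.append((r, g, b))
    return new_pixels                            # C5: stitched new pedestrian picture

pixels = [(0.8, 0.2, 0.2), (0.1, 0.1, 0.9), (0.5, 0.5, 0.5), (0.0, 1.0, 0.0)]
mask = [1, 1, 0, 1]                              # 3/4 foreground: parsing succeeds
new_pic = recolor_foreground(pixels, mask, label=120)
```

Note that colorsys represents hue on [0, 1], so the 0-360 angle from the text is divided by 360 before conversion.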
4. A multi-label classification loss function is designed.
D1. For a pedestrian of category y_label on the original data set, let the pedestrian category generated from it be ỹ_label. On the new data set the original picture then has the two category labels (y_label, ỹ_label), with y_label as the first category of the original picture and ỹ_label as its second category. The generated picture likewise has the two category labels (ỹ_label, y_label), with ỹ_label as the first category of the newly generated picture and y_label as its second category. This constitutes the dual-label pedestrian re-identification data set.
D2. A multi-label classification loss function L_multi-labels is designed (formula (5)): a smoothed cross-entropy over the two labels, in which α and β are hyper-parameters, K is the number of categories, P(y_label) is the probability the model predicts for the first category label, and P(ỹ_label) is the probability it predicts for the second label; in the experiments, α and β are each set to 0.1.
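The precise formula of L_multi-labels appears only as an image in the source, so the sketch below implements one plausible reading consistent with the description: a cross-entropy whose target distribution puts confidence 1 - α - β on the first label, α on the second label, and spreads β uniformly over all K categories as smoothing, with α = β = 0.1 as in the experiments. This target construction is an assumption, not the patented formula.

```python
import math

def multi_label_loss(probs, y1, y2, alpha=0.1, beta=0.1):
    """Assumed smoothed two-label cross-entropy (cf. formula (5)).

    probs: predicted class probabilities; y1: first label; y2: second label.
    """
    k = len(probs)
    target = [beta / k] * k              # smoothing mass spread over K categories
    target[y1] += 1.0 - alpha - beta     # largest confidence: first label
    target[y2] += alpha                  # smaller confidence: second label
    return -sum(t * math.log(p) for t, p in zip(target, probs))

loss = multi_label_loss([0.7, 0.2, 0.05, 0.05], y1=0, y2=1)
# the loss decreases as probability mass concentrates on the two labels
```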
5. A multi-stream multi-task loss optimization function is designed.
E1. The average metric loss over the branches is calculated:
L_triplet = (1/4) · Σ_{i=1}^{4} L_triplet^(i)    (6)
where L_triplet^(i) denotes the triplet metric loss calculated by the i-th branch, with i ∈ {1, 2, 3, 4} indexing the four branches of the model.
E2. The average multi-label classification loss over the branches is calculated:
L_multi-labels = (1/4) · Σ_{i=1}^{4} L_multi-labels^(i)    (7)
where L_multi-labels^(i) denotes the multi-label classification loss calculated by the i-th branch, with i ∈ {1, 2, 3, 4} indexing the four branches of the model.
E3. The loss function of the whole model is calculated:
L = L_multi-labels + L_triplet    (8)
where L_multi-labels is the average classification loss over the branches of the model and L_triplet is the average metric loss over the branches of the model.
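Steps E1-E3 can be sketched with a standard triplet loss and plain averages over the four branches. The Euclidean distance and the margin value 0.3 are common choices assumed here; the source only names "triplet loss":

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet metric loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def total_loss(branch_cls_losses, branch_tri_losses):
    """Formulas (6)-(8): average each loss over the branches, then add them."""
    l_multi = sum(branch_cls_losses) / len(branch_cls_losses)
    l_tri = sum(branch_tri_losses) / len(branch_tri_losses)
    return l_multi + l_tri

lt = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])   # easy triplet: zero loss
total = total_loss([0.6, 0.5, 0.7, 0.6], [lt, lt, lt, lt])
```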
6. On large-scale image data, the backbone ResNet50 network is pre-trained with the back-propagation algorithm to obtain a pre-trained ResNet50 model.
7. On the basis of the pre-trained ResNet50 model, the expanded pedestrian re-identification data set is used, the model's loss is computed with the designed multi-task loss optimization function, and the whole constructed model is trained end to end with the back-propagation algorithm to obtain the final trained model.
8. Pedestrian re-identification is performed with the trained model, using the output features of the final network model as the feature representation of each pedestrian image for similarity measurement and ranking.
Tables 1-2 compare the method provided by the invention with other pedestrian re-identification methods on the Market-1501 and CUHK03 data sets.
TABLE 1 [comparison results rendered as an image in the source]
TABLE 2 [comparison results rendered as an image in the source]
In tables 1-2, other methods are as follows:
LSRO corresponds to the method proposed by Zheng et al. (Z. Zheng, L. Zheng, and Y. Yang. "Unlabeled samples generated by GAN improve the person re-identification baseline in vitro," arXiv preprint arXiv:1701.07717, vol. 3, 2017.);
PNGAN corresponds to the method proposed by Qian et al. (X. Qian, Y. Fu, T. Xiang, W. Wang, J. Qiu, Y. Wu, Y.-G. Jiang, and X. Xue. "Pose-normalized image generation for person re-identification," in European Conference on Computer Vision, 2018, pp. 661-678.);
CamStyle corresponds to the method proposed by Zhong et al. (Z. Zhong, L. Zheng, Z. Zheng, S. Li, and Y. Yang. "Camera style adaptation for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5157-5166.);
MLFN corresponds to the method proposed by Chang et al. (X. Chang, T. Hospedales, and T. Xiang. "Multi-level factorisation net for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, vol. 1, p. 2.);
HA-CNN corresponds to the method proposed by Li et al. (W. Li, X. Zhu, and S. Gong. "Harmonious attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2285-2294.);
PCB corresponds to the method proposed by Sun et al. (Y. Sun, L. Zheng, Y. Yang, et al. "Beyond part models: Person retrieval with refined part pooling," in Proceedings of the European Conference on Computer Vision, 2018, pp. 480-496.);
MGN corresponds to the method proposed by Wang et al. (G. Wang, Y. Yuan, X. Chen, et al. "Learning discriminative features with multiple granularities for person re-identification," in Proceedings of the 26th ACM International Conference on Multimedia, 2018, pp. 274-282.);
OSNet corresponds to the method proposed by Zhou et al. (K. Zhou, Y. Yang, A. Cavallaro, et al. "Omni-scale feature learning for person re-identification," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 3702-3712.);
PAN corresponds to the method proposed by Zheng et al. (Z. Zheng, L. Zheng, Y. Yang, et al. "Pedestrian alignment network for large-scale person re-identification," IEEE Transactions on Circuits and Systems for Video Technology, 2018, 29(10): 3037-3045.);
AANet corresponds to the method proposed by Tay et al. (C. Tay, S. Roy, and K. Yap. "AANet: Attribute attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 7134-7143.);
FPR corresponds to the method proposed by He et al. (L. He, Y. Wang, W. Liu, et al. "Foreground-aware pyramid reconstruction for alignment-free occluded person re-identification," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 8450-8459.);
CRAN corresponds to the method proposed by Han et al. (C. Han, R. Zheng, C. Gao, et al. "Complementation-reinforced attention network for person re-identification," IEEE Transactions on Circuits and Systems for Video Technology, 2019.);
CASN corresponds to the method proposed by Zheng et al. (M. Zheng, S. Karanam, Z. Wu, et al. "Re-identification with consistent attentive siamese networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5735-5744.);
JPPNet corresponds to the method proposed by Liang et al. (X. Liang, K. Gong, X. Shen, et al. "Look into person: Joint body parsing & pose estimation network and a new benchmark," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(4): 871-885.).
The invention parses a pedestrian picture with a pedestrian analysis model to obtain masks of the pedestrian's upper body, lower body, and whole body, and designs a multi-branch attention network model from the obtained masks, each branch making full use of local pedestrian information for model learning. Secondly, the color of the pedestrian's body region is adjusted according to the obtained mask, changing the color of the pedestrian's clothes in the picture; the recolored pictures are then treated as new pedestrian categories so as to expand the data set. On this basis, the newly expanded data set is organized with a dual-label structure in which every pedestrian has two corresponding labels, a first category label and a second category label; different confidences are set for the different labels during classification, and a multi-label classification loss function is proposed so that the model can learn more distinctive features. Finally, the trained network model represents the test-set images as features for similarity comparison and ranking. Experimental analysis shows that the method reduces the interference of background clutter, occlusion, and similar problems with pedestrian re-identification and obtains better recognition performance on several public data sets.

Claims (6)

1. A multi-stream multi-label pedestrian re-identification method based on pedestrian analysis, characterized by comprising the following steps:
1) preparing a pedestrian re-identification direction data set;
2) designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative features of pedestrians through the network model;
3) expanding the prepared data set by means of a pedestrian analysis model to obtain a newly expanded data set;
4) designing a multi-label classification loss function;
5) designing a multi-stream multi-task loss optimization function;
6) pre-training the backbone ResNet50 network on large-scale image data using the back-propagation algorithm to obtain a pre-trained ResNet50 model;
7) on the basis of the pre-trained ResNet50 model, using the expanded pedestrian re-identification data set, computing the loss of the model with the designed multi-task loss optimization function, and training the whole constructed model end to end with the back-propagation algorithm to obtain the final trained model;
8) performing pedestrian re-identification with the trained model, using the output features of the final network model as the feature representation of each pedestrian image for subsequent similarity measurement and ranking.
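As an illustrative sketch of the retrieval in step 8), the output features can be compared by, e.g., cosine similarity and ranked; the claim does not fix a particular similarity measure, and `rank_gallery` is a hypothetical helper, not code from the patent:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery images by cosine similarity to the query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                # cosine similarity per gallery image
    order = np.argsort(-sims)   # most similar first
    return order, sims[order]

# toy example: 3 gallery features, one nearly parallel to the query
query = np.array([1.0, 0.0, 0.0])
gallery = np.array([[0.0, 1.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.5, 0.5, 0.0]])
order, sims = rank_gallery(query, gallery)
print(order[0])  # index of the best match
```

In practice the same routine is applied per query over the whole gallery, and the resulting ranking is what metrics such as Rank-1 and mAP are computed from.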
2. The multi-stream multi-label pedestrian re-identification method based on pedestrian analysis as claimed in claim 1, wherein in step 1) the specific method for preparing the pedestrian re-identification data set is as follows: let the pedestrian images in the training set be {(x_i, y_i), i = 1, ..., n}, where n is the number of samples in the training set and is a natural number; x_i is the pedestrian image of the i-th training sample, and y_i (1 ≤ y_i ≤ N) is the pedestrian category label of the i-th training sample, where N is the number of pedestrian categories contained in the training sample set and is a natural number.
3. The pedestrian-analysis-based multi-stream multi-label pedestrian re-identification method according to claim 1, wherein in the step 2), the specific steps of designing the pedestrian-analysis-based multi-stream attention pedestrian re-identification network model structure are as follows:
(1) obtaining masks of the whole body, upper body, and lower body of the pedestrian in the picture by means of a pedestrian analysis model, where the pixel points of the specified body region in each mask are 1 and all other regions are 0;
(2) removing the final fully connected classification layer of the original network, replacing the global average pooling before the fully connected layer with global max pooling, and duplicating the network after the first convolution-pooling stage into four independent branches with no parameter sharing among the branches;
(3) leaving the first branch unchanged; in the second, third, and fourth branches, applying attention mechanisms over the whole body, upper body, and lower body of the pedestrian respectively, and within each of these branches applying a channel attention mechanism to the features at the corresponding body-part location, finally obtaining the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis.
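As an illustrative sketch (not the patent's actual implementation), the mask-guided branch of step (3) can be pictured as spatial gating by the body-part mask followed by a channel attention gate; `mask_attention` below is a hypothetical helper that omits the learned excitation weights of a real squeeze-and-excitation block:

```python
import numpy as np

def mask_attention(feat, mask):
    """Spatially gate a C x H x W feature map with a binary body-part mask (H x W),
    then apply a simplified channel-attention gate."""
    spatial = feat * mask[None, :, :]        # keep only the masked body region
    # channel attention: global average pool -> sigmoid gate
    # (a learned excitation MLP is omitted here for brevity)
    pooled = spatial.mean(axis=(1, 2))       # one value per channel
    gate = 1.0 / (1.0 + np.exp(-pooled))     # sigmoid
    return spatial * gate[:, None, None]

feat = np.ones((4, 2, 2))                    # toy 4-channel feature map
mask = np.array([[1.0, 0.0], [0.0, 0.0]])    # upper-left pixel = body region
out = mask_attention(feat, mask)
print(out.shape)
```

Each of the three masked branches would run this kind of gating on its own copy of the backbone features, so background pixels contribute nothing to that branch's descriptor.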
4. The pedestrian-analysis-based multi-stream multi-label pedestrian re-identification method according to claim 1, wherein in the step 3), the specific steps of expanding the already prepared data set are as follows:
(1) obtaining the whole-body mask of each pedestrian in the picture by means of the pedestrian analysis model, where the pixel points of the specified body region in the mask are 1 and all other regions are 0;
(2) separating the pedestrian foreground region from the background region of the picture using the obtained mask, then changing the hue of the pedestrian foreground region so as to change the pedestrian's clothes, thereby expanding the data set.
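A minimal sketch of the hue-change expansion in step (2), assuming images as H x W x 3 RGB floats in [0, 1] and a binary foreground mask; `recolor_foreground` is a hypothetical helper, not code from the patent:

```python
import colorsys
import numpy as np

def recolor_foreground(img, mask, hue_shift):
    """Shift the hue of masked (foreground) pixels; background pixels unchanged.
    img: H x W x 3 floats in [0, 1]; mask: H x W of 0/1; hue_shift in [0, 1)."""
    out = img.copy()
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        h, s, v = colorsys.rgb_to_hsv(*img[y, x])
        out[y, x] = colorsys.hsv_to_rgb((h + hue_shift) % 1.0, s, v)
    return out

img = np.zeros((2, 2, 3))
img[..., 0] = 1.0                    # all-red toy image
mask = np.array([[1, 0], [0, 0]])    # recolor only one foreground pixel
aug = recolor_foreground(img, mask, 0.5)  # half-turn hue shift: red -> cyan
print(aug[0, 0])
```

Because only the hue changes, the pedestrian's pose, texture, and background stay identical; the recolored image can then be registered as a new pedestrian category when expanding the data set.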
5. The pedestrian-parsing-based multi-stream multi-label pedestrian re-identification method according to claim 1, wherein in the step 4), the specific steps of designing the multi-label classification loss function are as follows:
(1) for a pedestrian of class y_label on the original data set, denote the pedestrian category generated from it by y'_label; then on the new data set the original picture has the two category labels (y_label, y'_label), with y_label as the first class of the original picture and y'_label as its second class; the generated picture likewise has the two category labels (y'_label, y_label), with y'_label as the first class of the newly generated picture and y_label as its second class; this constitutes the dual-label pedestrian re-identification data set;
(2) different confidences are set for the different classes and label smoothing is added, yielding the designed multi-label classification loss function as formula (1) (the formula appears only as an image in the source document), where P(y_label) is the probability the model predicts for the first-class label and P(y'_label) is the probability the model predicts for the second-class label.
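Since formula (1) appears only as an image, the sketch below shows one plausible form consistent with the claim: a cross-entropy against a target distribution that assigns a larger confidence to the first-class label, a smaller one to the second-class label, and smooths the remaining mass over all classes. The function name and the values of `alpha`, `beta`, and `eps` are assumptions, not taken from the patent:

```python
import numpy as np

def dual_label_loss(logits, first_cls, second_cls, alpha=0.7, beta=0.2, eps=0.1):
    """Smoothed dual-label cross-entropy: confidence alpha on the first-class
    label, beta on the second-class label, eps spread uniformly over all classes
    (alpha + beta + eps = 1)."""
    n = logits.shape[0]
    target = np.full(n, eps / n)         # label-smoothing floor
    target[first_cls] += alpha
    target[second_cls] += beta
    m = logits.max()                     # numerically stable log-softmax
    log_p = logits - np.log(np.exp(logits - m).sum()) - m
    return -(target * log_p).sum()

logits = np.array([4.0, 2.0, 0.5, 0.1])  # toy scores over 4 pedestrian classes
loss = dual_label_loss(logits, first_cls=0, second_cls=1)
print(float(loss))
```

With this target distribution, a prediction concentrated on the first-class label incurs a smaller loss than one concentrated on an unrelated class, which matches the claim's intent of weighting the two labels by different confidences.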
6. The multi-stream multi-label pedestrian re-identification method based on pedestrian analysis as claimed in claim 1, wherein in step 5) the specific steps of designing the multi-stream multi-task loss optimization function are as follows:
a metric task and a classification task are computed for each branch, where the metric task adopts the triplet loss function and the classification task adopts the multi-label classification loss function; the finally designed multi-stream multi-task loss optimization function is shown in the following formula:
L = L_multi-labels + L_triplet (2)
where L_multi-labels is the average classification loss of the model over the multiple branches and L_triplet is the average metric loss of the model over the multiple branches.
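An illustrative sketch of formula (2), combining a standard triplet metric loss with the per-branch averages; the helper names and the margin value are assumptions, not taken from the patent:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss: pull same-identity features together,
    push different-identity features at least `margin` further apart."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def multi_task_loss(branch_cls_losses, branch_tri_losses):
    """Formula (2): average classification loss plus average metric loss
    over the model's branches."""
    return float(np.mean(branch_cls_losses) + np.mean(branch_tri_losses))

# toy features: positive is close to the anchor, negative far away
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
neg = np.array([0.0, 1.0])
print(triplet_loss(a, p, neg))  # well-separated triplet, loss clipped at 0
```

During training, each of the four branches would contribute one classification term and one triplet term, and `multi_task_loss` sums the two branch averages into the scalar L that is back-propagated.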
CN202011387800.5A 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method Active CN112418134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011387800.5A CN112418134B (en) 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method


Publications (2)

Publication Number Publication Date
CN112418134A true CN112418134A (en) 2021-02-26
CN112418134B CN112418134B (en) 2024-02-27

Family

ID=74829464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011387800.5A Active CN112418134B (en) 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN112418134B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
JP2020101968A (en) * 2018-12-21 2020-07-02 株式会社 日立産業制御ソリューションズ Multi-label data learning assisting apparatus, multi-label data learning assisting method and multi-label data learning assisting program
CN109977893A (en) * 2019-04-01 2019-07-05 厦门大学 Depth multitask pedestrian recognition methods again based on the study of level conspicuousness channel
CN111738213A (en) * 2020-07-20 2020-10-02 平安国际智慧城市科技股份有限公司 Person attribute identification method and device, computer equipment and storage medium

Non-Patent Citations (1)

Title
WU JIE; WANG YIHAN; HOU MINA; QUAN XIAOPENG: "Pedestrian attribute recognition based on attention mechanism", Electronics World (电子世界), no. 02, 30 January 2020 (2020-01-30) *

Cited By (13)

Publication number Priority date Publication date Assignee Title
CN112686228A (en) * 2021-03-12 2021-04-20 深圳市安软科技股份有限公司 Pedestrian attribute identification method and device, electronic equipment and storage medium
CN113095174A (en) * 2021-03-29 2021-07-09 深圳力维智联技术有限公司 Re-recognition model training method, device, equipment and readable storage medium
US11810388B1 (en) 2021-06-29 2023-11-07 Inspur Suzhou Intelligent Technology Co., Ltd. Person re-identification method and apparatus based on deep learning network, device, and medium
CN113255604A (en) * 2021-06-29 2021-08-13 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN113255604B (en) * 2021-06-29 2021-10-15 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN114758362A (en) * 2022-06-15 2022-07-15 山东省人工智能研究院 Clothing changing pedestrian re-identification method based on semantic perception attention and visual masking
CN114758362B (en) * 2022-06-15 2022-10-11 山东省人工智能研究院 Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN114998934A (en) * 2022-06-27 2022-09-02 山东省人工智能研究院 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
CN114998934B (en) * 2022-06-27 2023-01-03 山东省人工智能研究院 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
CN115147873A (en) * 2022-09-01 2022-10-04 汉斯夫(杭州)医学科技有限公司 Method, equipment and medium for automatically classifying dental images based on dual-label cascade
CN115661722B (en) * 2022-11-16 2023-06-06 北京航空航天大学 Pedestrian re-identification method combining attribute and orientation
CN115661722A (en) * 2022-11-16 2023-01-31 北京航空航天大学 Pedestrian re-identification method combining attributes and orientation
CN116129473A (en) * 2023-04-17 2023-05-16 山东省人工智能研究院 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system

Also Published As

Publication number Publication date
CN112418134B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN112418134B (en) Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method
Hafiz et al. A survey on instance segmentation: state of the art
Chen et al. Global context-aware progressive aggregation network for salient object detection
CN107609460B (en) Human body behavior recognition method integrating space-time dual network flow and attention mechanism
Song et al. Mask-guided contrastive attention model for person re-identification
CN109670405B (en) Complex background pedestrian detection method based on deep learning
CN114758288B (en) Power distribution network engineering safety control detection method and device
CN109977893B (en) Deep multitask pedestrian re-identification method based on hierarchical saliency channel learning
CN108647628B (en) Micro-expression recognition method based on multi-feature multi-task dictionary sparse transfer learning
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
CN109035196B (en) Saliency-based image local blur detection method
CN107992874A (en) Image well-marked target method for extracting region and system based on iteration rarefaction representation
CN110033007A (en) Attribute recognition approach is worn clothes based on the pedestrian of depth attitude prediction and multiple features fusion
Qin et al. Automatic skin and hair masking using fully convolutional networks
CN111695455B (en) Low-resolution face recognition method based on coupling discrimination manifold alignment
Zhang et al. A small target detection method based on deep learning with considerate feature and effectively expanded sample size
Zhong et al. Key frame extraction algorithm of motion video based on priori
Zhou et al. An ica mixture hidden markov model for video content analysis
CN116311345A (en) Transformer-based pedestrian shielding re-recognition method
Tang et al. Research of color image segmentation algorithm based on asymmetric kernel density estimation
CN112070041B (en) Living body face detection method and device based on CNN deep learning model
Li et al. Distribution-Guided Hierarchical Calibration Contrastive Network for Unsupervised Person Re-Identification
Pang et al. Rotative maximal pattern: A local coloring descriptor for object classification and recognition
Campbell et al. Automatic Interpretation of Outdoor Scenes.
CN106815845A (en) Color image segmentation method based on pixels probability density classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant