CN112418134A - Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis - Google Patents

Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis

Info

Publication number
CN112418134A
CN112418134A
Authority
CN
China
Prior art keywords
pedestrian
label
model
identification
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011387800.5A
Other languages
Chinese (zh)
Other versions
CN112418134B (en)
Inventor
王其聪
王旭
Current Assignee
Xiamen University
Shenzhen Research Institute of Xiamen University
Original Assignee
Xiamen University
Shenzhen Research Institute of Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University and Shenzhen Research Institute of Xiamen University
Priority to CN202011387800.5A
Publication of CN112418134A
Application granted
Publication of CN112418134B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/2132: Feature extraction by transforming the feature space, based on discrimination criteria, e.g. discriminant analysis
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F18/245: Classification techniques relating to the decision surface
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/045: Combinations of networks
    • G06N3/084: Backpropagation, e.g. using gradient descent


Abstract

A multi-stream multi-label pedestrian re-identification method based on pedestrian analysis, relating to computer vision. A pedestrian re-identification data set is prepared; each pedestrian picture is parsed by a pedestrian analysis model to obtain masks of the pedestrian's upper body, lower body, and whole body, and a multi-branch attention network model is designed from the obtained masks. The hue of the pedestrian's body region is then adjusted according to the mask, changing the color of the pedestrian's clothes in the picture, and the recolored pictures are added to the data set as new pedestrian categories. The expanded data set is organized with a dual-label structure, different confidences are set for each label, and a multi-label classification loss function is proposed. The trained network model represents the test-set images as features for similarity comparison and ranking. The method obtains better recognition performance on several public data sets and effectively alleviates the interference of background clutter, occlusion, and similar problems with pedestrian re-identification.

Description

Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis
Technical Field
The invention relates to a computer vision technology, in particular to a multi-stream multi-label pedestrian re-identification method based on pedestrian analysis.
Background
Pedestrian re-identification is one of the current research hotspots in computer vision. With the development of modern society, the field of intelligent security is receiving more and more attention, and pedestrian re-identification, as one of the most important research directions within it, is widely studied, which has promoted the rapid development of the field. However, pedestrian re-identification still faces many problems, such as the small number of data sets, cluttered backgrounds in pedestrian pictures, and occlusion. At present, most research addresses pedestrian re-identification with deep neural networks, treating it as a classification task in the model training stage and extracting features for similarity comparison in the testing stage.
Pedestrian re-identification data sets are captured by multiple cameras with non-overlapping fields of view, so the images contain complex background interference; since the task is to judge whether pedestrians captured by different cameras are the same person, these complex background factors strongly affect it and introduce interference into the features extracted by a deep neural network. In addition, because of camera placement and the behavior of the subsequent detection method, the proportion of the bounding box occupied by the pedestrian region varies markedly: normally detected pedestrians occupy most of the bounding box, while some pedestrians occupy only a small part of the picture, which affects model learning, and most current methods do not take this into account. Because pictures are captured casually by cameras, occlusion affects pedestrian re-identification even more significantly. Zhong et al. (Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, "Random erasing data augmentation," arXiv preprint arXiv:1708.04896, 2017) proposed a data enhancement method that erases a region of the picture with a certain probability, which can partially handle occlusion; however, that method considers neither the perspective of model design nor the spatial structure of the pedestrian in the picture, and it is exactly from these perspectives that the occlusion problem can be handled effectively.
Disclosure of Invention
The invention aims to provide a multi-stream multi-label pedestrian re-identification method based on pedestrian analysis that addresses the technical problems present in existing pedestrian re-identification models.
The invention comprises the following steps:
1) preparing a pedestrian re-identification direction data set;
2) designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative features of pedestrians through the network model;
3) expanding the prepared data set by means of a pedestrian analysis model to obtain a newly expanded data set;
4) designing a multi-label classification loss function;
5) designing a multi-stream multi-task loss optimization function;
6) on large-scale image data, pre-training the backbone ResNet50 network with the back-propagation algorithm to obtain a pre-trained ResNet50 model;
7) on the basis of the pre-trained ResNet50 model, using the expanded pedestrian re-identification data set, computing the model's loss with the designed multi-task loss optimization function, and training the whole constructed model end to end with the back-propagation algorithm to obtain the final trained model;
8) performing pedestrian re-identification with the trained model, using the output features of the final network model as the feature representation of each pedestrian image for subsequent similarity measurement and ranking.
In step 1), the specific method for preparing the pedestrian re-identification data set may be: let the pedestrian images in the training set be {(x_i, y_i), i = 1, ..., n}, where n is the number of training samples and is a natural number; x_i is the pedestrian image of the i-th training sample, y_i (1 ≤ y_i ≤ N) is the pedestrian category label of the i-th training sample, and N is the number of pedestrian categories in the training sample set and is a natural number.
In step 2), the specific steps of designing the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis are as follows:
(1) obtaining the masks of the whole body, upper body, and lower body of the pedestrian in the picture by means of a pedestrian analysis model, where in each mask the pixels of the specified body region are 1 and the remaining regions are 0;
(2) removing the final fully connected classification layer of the original network, changing the global average pooling before it to global max pooling, and copying the network after the first convolution-pooling stage into four independent branches with no parameter sharing among them;
(3) leaving the first branch unchanged; applying attention mechanisms to the whole body, upper body, and lower body of the pedestrian in the second, third, and fourth branches respectively, and applying a channel attention mechanism in each branch to the features at that branch's body part, finally obtaining the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis.
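The three design steps above can be sketched compactly. The following is a minimal illustration, not the patented network: plain Python lists stand in for convolutional feature maps, and the toy 2x2 size and the weight value 0.5 are assumptions for demonstration only.

```python
# Sketch of the multi-stream structure: branch 1 is unchanged; branches 2-4
# weight the shared features with whole-body, upper-body and lower-body masks.

def mask_attention(feat, mask, alpha):
    """Weight a feature map by a body-region mask: X' = (1 + alpha * mask) * X."""
    return [[(1 + alpha * m) * x for x, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feat, mask)]

def forward_branches(feat, masks, alphas):
    """Return the four branch inputs: the plain features plus one masked copy per mask."""
    branches = [feat]                      # first branch: no change
    for mask, alpha in zip(masks, alphas):
        branches.append(mask_attention(feat, mask, alpha))
    return branches

feat = [[1.0, 2.0], [3.0, 4.0]]            # toy 2x2 feature map after the stem
whole = [[1, 1], [1, 1]]                   # whole-body mask
upper = [[1, 1], [0, 0]]                   # upper-body mask (top row only)
lower = [[0, 0], [1, 1]]                   # lower-body mask (bottom row only)
branches = forward_branches(feat, [whole, upper, lower], [0.5, 0.5, 0.5])
# branches[1] scales every value by 1.5; branches[2] scales only the top row
```

In the real model each branch would continue with its own (non-shared) convolutional stages, global max pooling, and a channel attention mechanism, as described in step (3).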
In step 3), the specific steps of expanding the prepared data set are as follows:
(1) obtaining the whole-body masks of the pedestrians in the picture by means of a pedestrian analysis model, where in each mask the pixels of the specified body region are 1 and the remaining regions are 0;
(2) separating the pedestrian foreground region from the background region of the picture with the obtained mask, then changing the hue of the pedestrian foreground region so as to change the pedestrian's clothes and thereby expand the data set.
In step 4), the specific steps of designing the multi-label classification loss function are as follows:
(1) for a pedestrian of category y_label on the original data set, let the pedestrian category generated from it be ỹ_label; on the new data set the original picture then has the two category labels (y_label, ỹ_label), with y_label as the first category of the original picture and ỹ_label as its second category; the generated picture likewise has the two category labels (ỹ_label, y_label), with ỹ_label as the first category of the newly generated picture and y_label as its second category; this constitutes the dual-label pedestrian re-identification data set;
(2) different confidences are set for the different categories and smoothing is added, giving the designed multi-label classification loss function L_multi-labels (formula (1)), where P(y_label) is the probability the model predicts for the first category label and P(ỹ_label) is the probability it predicts for the second label.
In step 5), the specific steps of designing the multi-stream multi-task loss optimization function are as follows:
A metric task and a classification task are computed for each branch; the metric task uses the triplet loss function, the classification task uses the multi-label classification loss function, and the finally designed multi-stream multi-task loss optimization function is shown in the following formula:
L = L_multi-labels + L_triplet    (2)
where L_multi-labels is the average classification loss over the branches of the model and L_triplet is the average metric loss over the branches of the model.
Compared with the prior art, the invention has the following advantages:
Firstly, a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis is designed. The multi-branch structure applies an attention mechanism to the pedestrian in the picture to suppress the interference of the background and of occluded regions, and attention is also applied across the channels of the same-layer features of the different branches, so that the model weights the channels expressing different pedestrian information differently and learns more discriminative features. Secondly, an efficient data enhancement method is designed that expands the data set by changing the color of the pedestrian's clothes in the picture. Because each generated picture is produced from an original picture, it remains strongly similar to the real picture in line texture, outline, and background; the newly generated data set is therefore organized as a dual-label data set, a multi-label classification loss function is proposed, and a multi-stream multi-task loss function is designed in combination with the metric task. This optimizes the model better, so that more discriminative feature representations are extracted for the subsequent feature similarity measurement and ranking that produce the final pedestrian re-identification result. The invention effectively alleviates the interference of background clutter, occlusion, and similar problems with pedestrian re-identification.
Drawings
FIG. 1 is a block diagram of an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features, and advantages of the present invention more comprehensible, the method of the present invention is described in detail below with reference to the accompanying drawing and embodiments. The embodiments are carried out on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the protection scope of the present invention is not limited to the following examples.
Referring to fig. 1, an implementation of an embodiment of the invention includes the steps of:
1. A pedestrian re-identification data set is prepared. Let the pedestrian images in the training set be {(x_i, y_i), i = 1, ..., n}, where n is the number of training samples and is a natural number; x_i is the pedestrian image of the i-th training sample, y_i (1 ≤ y_i ≤ N) is the pedestrian category label of the i-th training sample, and N is the number of pedestrian categories in the training sample set and is a natural number.
2. Designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative features of pedestrians through the network model.
B1. The pedestrian analysis model JPPNet marks the masks of the upper-body, lower-body, and whole-body regions of the pedestrian in the picture, denoted mask_upper, mask_lower, and mask_whole respectively. In mask_upper the pixel values of the pedestrian's upper-body region are uniformly set to 1 and the remaining positions to 0; in mask_lower the pixel values of the lower-body region are uniformly set to 1 and the remaining positions to 0; in mask_whole the pixel values of the whole-body region are set to 1 and the remaining positions to 0.
B2. The final fully connected classification layer of the ResNet50 network is removed, the global average pooling before it is changed to global max pooling, and the network after the first convolution-pooling stage is copied into four independent branches with no parameter sharing among them.
B3. The network structure of the first branch is the same as the modified ResNet-50. After global max pooling the obtained feature is 2048-dimensional; a fully connected layer with 512 neurons is then attached to reduce the 2048-dimensional feature, giving the 512-dimensional reduced feature Z_G.
B4. In the second branch, the pedestrian picture is parsed with JPPNet to obtain the whole-body mask mask_whole, and the features X obtained after the first convolution-pooling stage are weighted to give the whole-body-attention feature X_whole:
X_whole = (1 + α_1 · mask_whole) · X    (1)
where α_1 is the weight parameter of the attention mechanism over the pedestrian's whole body in the picture. This branch applies the attention mechanism to the whole body of the person in the picture; the subsequent processing is the same as in the first branch, and the obtained 2048-dimensional feature is reduced to the 512-dimensional feature Z_whole.
B5. In the third and fourth branches, the pedestrian in the picture is parsed with JPPNet in the same way to obtain the upper-body mask mask_upper and the lower-body mask mask_lower, and the features X are weighted to give the features X_upper and X_lower, weighted for the upper-body and lower-body regions of the person in the picture:
X_upper = (1 + α_2 · mask_upper) · X    (2)
X_lower = (1 + α_3 · mask_lower) · X    (3)
where α_2 and α_3 are the weight parameters of the attention mechanisms over the pedestrian's upper body and lower body respectively. These two branches apply the attention mechanism to the upper and lower body of the person in the picture; the subsequent processing is the same as in the first branch, and the 512-dimensional reduced features Z_upper and Z_lower of the two branches are finally obtained.
B6. In the training stage the pedestrian re-identification task is treated as a classification task, so each branch is followed by a fully connected classification layer whose number of neurons equals the number of pedestrian categories in the training set. In the testing stage the reduced features of the four branches are concatenated to obtain the feature Z used for similarity measurement:
Z = [Z_G; Z_whole; Z_upper; Z_lower]    (4)
where [;] denotes the cascade (concatenation) operation; the resulting feature Z used for the similarity measure is 2048-dimensional.
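The cascade operation above is ordinary feature concatenation. A minimal sketch, assuming random placeholder values for the four 512-dimensional branch features Z_G, Z_whole, Z_upper, and Z_lower:

```python
import random

def concat_features(z_g, z_whole, z_upper, z_lower):
    """Concatenate the four 512-dim branch features into the test-time feature Z."""
    return z_g + z_whole + z_upper + z_lower   # list concatenation = cascade op

random.seed(0)
z_g, z_whole, z_upper, z_lower = ([random.random() for _ in range(512)]
                                  for _ in range(4))
z = concat_features(z_g, z_whole, z_upper, z_lower)
# z has 4 * 512 = 2048 dimensions, as used for the similarity measure
```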
3. The prepared data set is expanded by means of a pedestrian analysis model to obtain a newly expanded data set.
C1. JPPNet parses the pedestrian in the picture to obtain the pedestrian's whole-body mask mask_whole; the pedestrian label is recorded as label, a non-negative value; for example, on the Market-1501 data set the value range of label is from 0 to 1501.
C2. Using the obtained mask_whole, the pedestrian foreground region is separated from the background region, giving pictures P and B containing only the foreground and background pixel values respectively.
C3. If the parsed region is small, that is, the pedestrian area within its region is less than 0.3 of the total area, the picture is considered to have failed parsing and is left unprocessed; otherwise the following operations continue.
C4. The pedestrian foreground picture P is converted from RGB format to HSV format. The hue angle in an HSV picture ranges from 0 to 360, so, to keep the converted hue H consistent for pedestrians with the same category label, the new hue is obtained by the formula H = label % 360. The clothes change is then applied to picture P, and the recolored pedestrian foreground picture P is converted back to RGB format.
C5. The foreground picture P is stitched back onto the previous background picture B to obtain a new pedestrian picture, which is saved.
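Steps C1-C5 can be sketched per pixel with the standard-library colorsys module. The 0.3 area-ratio check and the hue rule H = label % 360 follow the text; the pixel representation (RGB floats in [0, 1]) and the helper name recolor_foreground are illustrative assumptions:

```python
import colorsys

def recolor_foreground(pixels, mask, label):
    """Change the clothes color of the pedestrian foreground (steps C3-C5).

    pixels: flat list of (r, g, b) floats in [0, 1]; mask: matching 0/1 flags.
    Returns None when parsing is considered failed (foreground < 0.3 of area).
    """
    if sum(mask) / len(mask) < 0.3:              # C3: parsing failed, skip picture
        return None
    hue = (label % 360) / 360.0                  # C4: same label -> same target hue
    new_pixels = []
    for (r, g, b), is_fg in zip(pixels, mask):
        if is_fg:                                # foreground picture P: shift hue
            _, s, v = colorsys.rgb_to_hsv(r, g, b)
            new_pixels.append(colorsys.hsv_to_rgb(hue, s, v))
        else:                                    # background picture B: unchanged
            new_pixels.append((r, g, b))
    return new_pixels                            # C5: stitched new pedestrian picture

pixels = [(0.8, 0.2, 0.2), (0.1, 0.1, 0.9), (0.5, 0.5, 0.5), (0.0, 1.0, 0.0)]
mask = [1, 1, 0, 1]                              # 3/4 foreground: parsing succeeds
new_pic = recolor_foreground(pixels, mask, label=120)
```

Note that colorsys represents hue on [0, 1], so the 0-360 angle from the text is divided by 360 before conversion.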
4. A multi-label classification loss function is designed.
D1. For a pedestrian of category y_label on the original data set, let the pedestrian category generated from it be ỹ_label. On the new data set the original picture then has the two category labels (y_label, ỹ_label), with y_label as the first category of the original picture and ỹ_label as its second category. The generated picture likewise has the two category labels (ỹ_label, y_label), with ỹ_label as the first category of the newly generated picture and y_label as its second category. This constitutes the dual-label pedestrian re-identification data set.
D2. A multi-label classification loss function L_multi-labels is designed (formula (5)): a smoothed cross-entropy over the two labels, in which α and β are hyper-parameters, K is the number of categories, P(y_label) is the probability the model predicts for the first category label, and P(ỹ_label) is the probability it predicts for the second label; in the experiments, α and β are each set to 0.1.
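The precise formula of L_multi-labels appears only as an image in the source, so the sketch below implements one plausible reading consistent with the description: a cross-entropy whose target distribution puts confidence 1 - α - β on the first label, α on the second label, and spreads β uniformly over all K categories as smoothing, with α = β = 0.1 as in the experiments. This target construction is an assumption, not the patented formula.

```python
import math

def multi_label_loss(probs, y1, y2, alpha=0.1, beta=0.1):
    """Assumed smoothed two-label cross-entropy (cf. formula (5)).

    probs: predicted class probabilities; y1: first label; y2: second label.
    """
    k = len(probs)
    target = [beta / k] * k              # smoothing mass spread over K categories
    target[y1] += 1.0 - alpha - beta     # largest confidence: first label
    target[y2] += alpha                  # smaller confidence: second label
    return -sum(t * math.log(p) for t, p in zip(target, probs))

loss = multi_label_loss([0.7, 0.2, 0.05, 0.05], y1=0, y2=1)
# the loss decreases as probability mass concentrates on the two labels
```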
5. A multi-stream multi-task loss optimization function is designed.
E1. The average metric loss over the branches is calculated:
L_triplet = (1/4) · Σ_{i=1}^{4} L_triplet^(i)    (6)
where L_triplet^(i) denotes the triplet metric loss calculated by the i-th branch, with i ∈ {1, 2, 3, 4} indexing the four branches of the model.
E2. The average multi-label classification loss over the branches is calculated:
L_multi-labels = (1/4) · Σ_{i=1}^{4} L_multi-labels^(i)    (7)
where L_multi-labels^(i) denotes the multi-label classification loss calculated by the i-th branch, with i ∈ {1, 2, 3, 4} indexing the four branches of the model.
E3. The loss function of the whole model is calculated:
L = L_multi-labels + L_triplet    (8)
where L_multi-labels is the average classification loss over the branches of the model and L_triplet is the average metric loss over the branches of the model.
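Steps E1-E3 can be sketched with a standard triplet loss and plain averages over the four branches. The Euclidean distance and the margin value 0.3 are common choices assumed here; the source only names "triplet loss":

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet metric loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def total_loss(branch_cls_losses, branch_tri_losses):
    """Formulas (6)-(8): average each loss over the branches, then add them."""
    l_multi = sum(branch_cls_losses) / len(branch_cls_losses)
    l_tri = sum(branch_tri_losses) / len(branch_tri_losses)
    return l_multi + l_tri

lt = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])   # easy triplet: zero loss
total = total_loss([0.6, 0.5, 0.7, 0.6], [lt, lt, lt, lt])
```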
6. On large-scale image data, the backbone ResNet50 network is pre-trained with the back-propagation algorithm to obtain a pre-trained ResNet50 model.
7. On the basis of the pre-trained ResNet50 model, the expanded pedestrian re-identification data set is used, the model's loss is computed with the designed multi-task loss optimization function, and the whole constructed model is trained end to end with the back-propagation algorithm to obtain the final trained model.
8. Pedestrian re-identification is performed with the trained model, using the output features of the final network model as the feature representation of each pedestrian image for similarity measurement and ranking.
Tables 1-2 compare the method provided by the invention with other pedestrian re-identification methods on the Market-1501 and CUHK03 data sets.
TABLE 1 [comparison results rendered as an image in the source]
TABLE 2 [comparison results rendered as an image in the source]
In tables 1-2, other methods are as follows:
LSRO corresponds to the method proposed by Zheng et al. (Z. Zheng, L. Zheng, and Y. Yang. "Unlabeled samples generated by GAN improve the person re-identification baseline in vitro," arXiv preprint arXiv:1701.07717, vol. 3, 2017.);
PNGAN corresponds to the method proposed by Qian et al. (X. Qian, Y. Fu, T. Xiang, W. Wang, J. Qiu, Y. Wu, Y.-G. Jiang, and X. Xue. "Pose-normalized image generation for person re-identification," in European Conference on Computer Vision, 2018, pp. 661-678.);
CamStyle corresponds to the method proposed by Zhong et al. (Z. Zhong, L. Zheng, Z. Zheng, S. Li, and Y. Yang. "Camera style adaptation for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5157-5166.);
MLFN corresponds to the method proposed by Chang et al. (X. Chang, T. Hospedales, and T. Xiang. "Multi-level factorisation net for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, vol. 1, p. 2.);
HA-CNN corresponds to the method proposed by Li et al. (W. Li, X. Zhu, and S. Gong. "Harmonious attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2285-2294.);
PCB corresponds to the method proposed by Sun et al. (Y. Sun, L. Zheng, Y. Yang, et al. "Beyond part models: Person retrieval with refined part pooling," in Proceedings of the European Conference on Computer Vision, 2018, pp. 480-496.);
MGN corresponds to the method proposed by Wang et al. (G. Wang, Y. Yuan, X. Chen, et al. "Learning discriminative features with multiple granularities for person re-identification," in Proceedings of the 26th ACM International Conference on Multimedia, 2018, pp. 274-282.);
OSNet corresponds to the method proposed by Zhou et al. (K. Zhou, Y. Yang, A. Cavallaro, et al. "Omni-scale feature learning for person re-identification," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 3702-3712.);
PAN corresponds to the method proposed by Zheng et al. (Z. Zheng, L. Zheng, Y. Yang, et al. "Pedestrian alignment network for large-scale person re-identification," IEEE Transactions on Circuits and Systems for Video Technology, 2018, 29(10): 3037-3045.);
AANet corresponds to the method proposed by Tay et al. (C. Tay, S. Roy, and K. Yap. "AANet: Attribute attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 7134-7143.);
FPR corresponds to the method proposed by He et al. (L. He, Y. Wang, W. Liu, et al. "Foreground-aware pyramid reconstruction for alignment-free occluded person re-identification," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 8450-8459.);
CRAN corresponds to the method proposed by Han et al. (C. Han, R. Zheng, C. Gao, et al. "Complementation-reinforced attention network for person re-identification," IEEE Transactions on Circuits and Systems for Video Technology, 2019.);
CASN corresponds to the method proposed by Zheng et al. (M. Zheng, S. Karanam, Z. Wu, et al. "Re-identification with consistent attentive siamese networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5735-5744.);
JPPNet corresponds to the method proposed by Liang et al. (X. Liang, K. Gong, X. Shen, et al. "Look into person: Joint body parsing & pose estimation network and a new benchmark," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(4): 871-885.).
The invention parses a pedestrian picture with a pedestrian analysis model to obtain masks of the pedestrian's upper body, lower body, and whole body, and designs a multi-branch attention network model from the obtained masks, each branch making full use of local pedestrian information for model learning. Secondly, the color of the pedestrian's body region is adjusted according to the obtained mask, changing the color of the pedestrian's clothes in the picture; the recolored pictures are then treated as new pedestrian categories so as to expand the data set. On this basis, the newly expanded data set is organized with a dual-label structure in which every pedestrian has two corresponding labels, a first category label and a second category label; different confidences are set for the different labels during classification, and a multi-label classification loss function is proposed so that the model can learn more distinctive features. Finally, the trained network model represents the test-set images as features for similarity comparison and ranking. Experimental analysis shows that the method reduces the interference of background clutter, occlusion, and similar problems with pedestrian re-identification and obtains better recognition performance on several public data sets.

Claims (6)

1. A multi-stream multi-label pedestrian re-identification method based on pedestrian analysis, characterized by comprising the following steps:
1) preparing a pedestrian re-identification direction data set;
2) designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative features of pedestrians through the network model;
3) expanding the prepared data set by means of a pedestrian analysis model to obtain a newly expanded data set;
4) designing a multi-label classification loss function;
5) designing a multi-stream multi-task loss optimization function;
6) pre-training the backbone ResNet50 network on large-scale image data using the back-propagation algorithm to obtain a pre-trained ResNet50 model;
7) on the basis of the pre-trained ResNet50 model, using the expanded pedestrian re-identification data set, computing the loss of the model with the designed multi-task loss optimization function, and training the whole constructed model end to end with the back-propagation algorithm to obtain the final trained model;
8) performing pedestrian re-identification with the trained model, using the output features of the final network model as the feature representation of each pedestrian image for subsequent similarity measurement and ranking.
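As an illustrative sketch of the retrieval in step 8), the output features can be compared by, e.g., cosine similarity and ranked; the claim does not fix a particular similarity measure, and `rank_gallery` is a hypothetical helper, not code from the patent:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery images by cosine similarity to the query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                # cosine similarity per gallery image
    order = np.argsort(-sims)   # most similar first
    return order, sims[order]

# toy example: 3 gallery features, one nearly parallel to the query
query = np.array([1.0, 0.0, 0.0])
gallery = np.array([[0.0, 1.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.5, 0.5, 0.0]])
order, sims = rank_gallery(query, gallery)
print(order[0])  # index of the best match
```

In practice the same routine is applied per query over the whole gallery, and the resulting ranking is what metrics such as Rank-1 and mAP are computed from.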
2. The multi-stream multi-label pedestrian re-identification method based on pedestrian analysis as claimed in claim 1, wherein in step 1) the specific method for preparing the pedestrian re-identification data set is as follows: let the pedestrian images in the training set be {(x_i, y_i), i = 1, ..., n}, where n is the number of samples in the training set and is a natural number; x_i is the pedestrian image of the i-th training sample, and y_i (1 ≤ y_i ≤ N) is the pedestrian category label of the i-th training sample, where N is the number of pedestrian categories contained in the training sample set and is a natural number.
3. The pedestrian-analysis-based multi-stream multi-label pedestrian re-identification method according to claim 1, wherein in the step 2), the specific steps of designing the pedestrian-analysis-based multi-stream attention pedestrian re-identification network model structure are as follows:
(1) obtaining masks of the whole body, upper body, and lower body of the pedestrian in the picture by means of a pedestrian analysis model, where the pixel points of the specified body region in each mask are 1 and all other regions are 0;
(2) removing the final fully connected classification layer of the original network, replacing the global average pooling before the fully connected layer with global max pooling, and duplicating the network after the first convolution-pooling stage into four independent branches with no parameter sharing among the branches;
(3) leaving the first branch unchanged; in the second, third, and fourth branches, applying attention mechanisms over the whole body, upper body, and lower body of the pedestrian respectively, and within each of these branches applying a channel attention mechanism to the features at the corresponding body-part location, finally obtaining the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis.
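As an illustrative sketch (not the patent's actual implementation), the mask-guided branch of step (3) can be pictured as spatial gating by the body-part mask followed by a channel attention gate; `mask_attention` below is a hypothetical helper that omits the learned excitation weights of a real squeeze-and-excitation block:

```python
import numpy as np

def mask_attention(feat, mask):
    """Spatially gate a C x H x W feature map with a binary body-part mask (H x W),
    then apply a simplified channel-attention gate."""
    spatial = feat * mask[None, :, :]        # keep only the masked body region
    # channel attention: global average pool -> sigmoid gate
    # (a learned excitation MLP is omitted here for brevity)
    pooled = spatial.mean(axis=(1, 2))       # one value per channel
    gate = 1.0 / (1.0 + np.exp(-pooled))     # sigmoid
    return spatial * gate[:, None, None]

feat = np.ones((4, 2, 2))                    # toy 4-channel feature map
mask = np.array([[1.0, 0.0], [0.0, 0.0]])    # upper-left pixel = body region
out = mask_attention(feat, mask)
print(out.shape)
```

Each of the three masked branches would run this kind of gating on its own copy of the backbone features, so background pixels contribute nothing to that branch's descriptor.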
4. The pedestrian-analysis-based multi-stream multi-label pedestrian re-identification method according to claim 1, wherein in the step 3), the specific steps of expanding the already prepared data set are as follows:
(1) obtaining the whole-body mask of each pedestrian in the picture by means of the pedestrian analysis model, where the pixel points of the specified body region in the mask are 1 and all other regions are 0;
(2) separating the pedestrian foreground region from the background region of the picture using the obtained mask, then changing the hue of the pedestrian foreground region so as to change the pedestrian's clothes, thereby expanding the data set.
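A minimal sketch of the hue-change expansion in step (2), assuming images as H x W x 3 RGB floats in [0, 1] and a binary foreground mask; `recolor_foreground` is a hypothetical helper, not code from the patent:

```python
import colorsys
import numpy as np

def recolor_foreground(img, mask, hue_shift):
    """Shift the hue of masked (foreground) pixels; background pixels unchanged.
    img: H x W x 3 floats in [0, 1]; mask: H x W of 0/1; hue_shift in [0, 1)."""
    out = img.copy()
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        h, s, v = colorsys.rgb_to_hsv(*img[y, x])
        out[y, x] = colorsys.hsv_to_rgb((h + hue_shift) % 1.0, s, v)
    return out

img = np.zeros((2, 2, 3))
img[..., 0] = 1.0                    # all-red toy image
mask = np.array([[1, 0], [0, 0]])    # recolor only one foreground pixel
aug = recolor_foreground(img, mask, 0.5)  # half-turn hue shift: red -> cyan
print(aug[0, 0])
```

Because only the hue changes, the pedestrian's pose, texture, and background stay identical; the recolored image can then be registered as a new pedestrian category when expanding the data set.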
5. The pedestrian-parsing-based multi-stream multi-label pedestrian re-identification method according to claim 1, wherein in the step 4), the specific steps of designing the multi-label classification loss function are as follows:
(1) for a pedestrian of class y_label on the original data set, denote the pedestrian category generated from it by y'_label; then on the new data set the original picture has the two category labels (y_label, y'_label), with y_label as the first class of the original picture and y'_label as its second class; the generated picture likewise has the two category labels (y'_label, y_label), with y'_label as the first class of the newly generated picture and y_label as its second class; this constitutes the dual-label pedestrian re-identification data set;
(2) different confidences are set for the different classes and label smoothing is added, yielding the designed multi-label classification loss function as formula (1) (the formula appears only as an image in the source document), where P(y_label) is the probability the model predicts for the first-class label and P(y'_label) is the probability the model predicts for the second-class label.
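Since formula (1) appears only as an image, the sketch below shows one plausible form consistent with the claim: a cross-entropy against a target distribution that assigns a larger confidence to the first-class label, a smaller one to the second-class label, and smooths the remaining mass over all classes. The function name and the values of `alpha`, `beta`, and `eps` are assumptions, not taken from the patent:

```python
import numpy as np

def dual_label_loss(logits, first_cls, second_cls, alpha=0.7, beta=0.2, eps=0.1):
    """Smoothed dual-label cross-entropy: confidence alpha on the first-class
    label, beta on the second-class label, eps spread uniformly over all classes
    (alpha + beta + eps = 1)."""
    n = logits.shape[0]
    target = np.full(n, eps / n)         # label-smoothing floor
    target[first_cls] += alpha
    target[second_cls] += beta
    m = logits.max()                     # numerically stable log-softmax
    log_p = logits - np.log(np.exp(logits - m).sum()) - m
    return -(target * log_p).sum()

logits = np.array([4.0, 2.0, 0.5, 0.1])  # toy scores over 4 pedestrian classes
loss = dual_label_loss(logits, first_cls=0, second_cls=1)
print(float(loss))
```

With this target distribution, a prediction concentrated on the first-class label incurs a smaller loss than one concentrated on an unrelated class, which matches the claim's intent of weighting the two labels by different confidences.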
6. The multi-stream multi-label pedestrian re-identification method based on pedestrian analysis as claimed in claim 1, wherein in step 5) the specific steps of designing the multi-stream multi-task loss optimization function are as follows:
a metric task and a classification task are computed for each branch, where the metric task adopts the triplet loss function and the classification task adopts the multi-label classification loss function; the finally designed multi-stream multi-task loss optimization function is shown in the following formula:
L = L_multi-labels + L_triplet (2)
where L_multi-labels is the average classification loss of the model over the multiple branches and L_triplet is the average metric loss of the model over the multiple branches.
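An illustrative sketch of formula (2), combining a standard triplet metric loss with the per-branch averages; the helper names and the margin value are assumptions, not taken from the patent:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss: pull same-identity features together,
    push different-identity features at least `margin` further apart."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def multi_task_loss(branch_cls_losses, branch_tri_losses):
    """Formula (2): average classification loss plus average metric loss
    over the model's branches."""
    return float(np.mean(branch_cls_losses) + np.mean(branch_tri_losses))

# toy features: positive is close to the anchor, negative far away
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
neg = np.array([0.0, 1.0])
print(triplet_loss(a, p, neg))  # well-separated triplet, loss clipped at 0
```

During training, each of the four branches would contribute one classification term and one triplet term, and `multi_task_loss` sums the two branch averages into the scalar L that is back-propagated.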
CN202011387800.5A 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method Active CN112418134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011387800.5A CN112418134B (en) 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method


Publications (2)

Publication Number Publication Date
CN112418134A true CN112418134A (en) 2021-02-26
CN112418134B CN112418134B (en) 2024-02-27

Family

ID=74829464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011387800.5A Active CN112418134B (en) 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN112418134B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
JP2020101968A (en) * 2018-12-21 2020-07-02 株式会社 日立産業制御ソリューションズ Multi-label data learning assisting apparatus, multi-label data learning assisting method and multi-label data learning assisting program
CN109977893A (en) * 2019-04-01 2019-07-05 厦门大学 Depth multitask pedestrian recognition methods again based on the study of level conspicuousness channel
CN111738213A (en) * 2020-07-20 2020-10-02 平安国际智慧城市科技股份有限公司 Person attribute identification method and device, computer equipment and storage medium

Non-Patent Citations (1)

Title
WU JIE; WANG YIHAN; HOU MINA; QUAN XIAOPENG: "Pedestrian attribute recognition based on attention mechanism", Electronics World (电子世界), no. 02, 30 January 2020 (2020-01-30) *

Cited By (13)

Publication number Priority date Publication date Assignee Title
CN112686228A (en) * 2021-03-12 2021-04-20 深圳市安软科技股份有限公司 Pedestrian attribute identification method and device, electronic equipment and storage medium
CN113095174A (en) * 2021-03-29 2021-07-09 深圳力维智联技术有限公司 Re-recognition model training method, device, equipment and readable storage medium
US11810388B1 (en) 2021-06-29 2023-11-07 Inspur Suzhou Intelligent Technology Co., Ltd. Person re-identification method and apparatus based on deep learning network, device, and medium
CN113255604A (en) * 2021-06-29 2021-08-13 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN113255604B (en) * 2021-06-29 2021-10-15 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN114758362A (en) * 2022-06-15 2022-07-15 山东省人工智能研究院 Clothing changing pedestrian re-identification method based on semantic perception attention and visual masking
CN114758362B (en) * 2022-06-15 2022-10-11 山东省人工智能研究院 Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN114998934A (en) * 2022-06-27 2022-09-02 山东省人工智能研究院 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
CN114998934B (en) * 2022-06-27 2023-01-03 山东省人工智能研究院 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
CN115147873A (en) * 2022-09-01 2022-10-04 汉斯夫(杭州)医学科技有限公司 Method, equipment and medium for automatically classifying dental images based on dual-label cascade
CN115661722B (en) * 2022-11-16 2023-06-06 北京航空航天大学 Pedestrian re-identification method combining attribute and orientation
CN115661722A (en) * 2022-11-16 2023-01-31 北京航空航天大学 Pedestrian re-identification method combining attributes and orientation
CN116129473A (en) * 2023-04-17 2023-05-16 山东省人工智能研究院 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system

Also Published As

Publication number Publication date
CN112418134B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN112418134B (en) Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method
Hafiz et al. A survey on instance segmentation: state of the art
Chen et al. Global context-aware progressive aggregation network for salient object detection
CN107609460B (en) Human body behavior recognition method integrating space-time dual network flow and attention mechanism
Song et al. Mask-guided contrastive attention model for person re-identification
CN109670405B (en) Complex background pedestrian detection method based on deep learning
CN114758288B (en) Power distribution network engineering safety control detection method and device
CN109977893B (en) Deep multitask pedestrian re-identification method based on hierarchical saliency channel learning
CN108647628B (en) Micro-expression recognition method based on multi-feature multi-task dictionary sparse transfer learning
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
CN109035196B (en) Saliency-based image local blur detection method
CN107992874A (en) Image well-marked target method for extracting region and system based on iteration rarefaction representation
CN110033007A (en) Attribute recognition approach is worn clothes based on the pedestrian of depth attitude prediction and multiple features fusion
Qin et al. Automatic skin and hair masking using fully convolutional networks
CN111695455B (en) Low-resolution face recognition method based on coupling discrimination manifold alignment
Zhang et al. A small target detection method based on deep learning with considerate feature and effectively expanded sample size
Zhong et al. Key frame extraction algorithm of motion video based on priori
Zhou et al. An ica mixture hidden markov model for video content analysis
CN116311345A (en) Transformer-based pedestrian shielding re-recognition method
Tang et al. Research of color image segmentation algorithm based on asymmetric kernel density estimation
CN112070041B (en) Living body face detection method and device based on CNN deep learning model
Li et al. Distribution-Guided Hierarchical Calibration Contrastive Network for Unsupervised Person Re-Identification
Pang et al. Rotative maximal pattern: A local coloring descriptor for object classification and recognition
Campbell et al. Automatic Interpretation of Outdoor Scenes.
CN106815845A (en) Color image segmentation method based on pixels probability density classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant