CN112418134B

CN112418134B - Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method

Info

Publication number: CN112418134B
Application number: CN202011387800.5A
Authority: CN
Inventors: 王其聪; 王旭
Original assignee: Xiamen University; Shenzhen Research Institute of Xiamen University
Current assignee: Xiamen University; Shenzhen Research Institute of Xiamen University
Priority date: 2020-12-01
Filing date: 2020-12-01
Publication date: 2024-02-27
Anticipated expiration: 2040-12-01
Also published as: CN112418134A

Abstract

A multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis, relating to computer vision technology. A pedestrian re-identification data set is prepared; each pedestrian picture is analyzed by a pedestrian analysis model to obtain masks of the upper body, lower body and whole body of the pedestrian, and a multi-branch attention network model is designed according to the obtained masks. Using the mask, the hue of the pedestrian body area is adjusted to change the color of the pedestrian's clothing in the picture, and the recolored pedestrian is treated as a new pedestrian category to expand the data set. The expanded data set is organized into a double-label structure in which each pedestrian has two corresponding labels with different confidence levels, and a multi-label classification loss function is proposed. The trained network model is used to extract feature representations of the test set images, followed by similarity comparison and ranking. The method achieves good recognition performance on several public data sets and effectively alleviates the interference of background clutter and occlusion on pedestrian re-identification.

Description

Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method

Technical Field

The invention relates to computer vision technology, and in particular to a multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis.

Background

Pedestrian re-identification is a current research hotspot in computer vision. As modern society develops, intelligent security receives increasing attention, and pedestrian re-identification is widely regarded as one of its important research directions, which has driven rapid progress in the field. However, many problems remain, such as the small number of data sets, cluttered backgrounds, and occlusion. At present, most research addresses pedestrian re-identification with deep neural networks: in the training stage re-identification is treated as a classification task, and in the testing stage features are extracted for similarity comparison.

Pedestrian re-identification data sets are captured by multiple cameras with non-overlapping fields of view, so the pictures contain interference from complex backgrounds. The task is to judge whether pedestrians in pictures taken by different cameras are the same person; complex backgrounds strongly affect this task and introduce interference when features are extracted by a deep neural network. In addition, because of camera shooting conditions and the quality of the subsequent detection method, the proportion of the bounding box occupied by the pedestrian varies noticeably: a normally detected pedestrian usually occupies most of the bounding box, but in some pictures the pedestrian occupies only a small part of it, which hinders model learning. Most current methods do not consider this effect. Because the pictures are shot at random by the cameras, occlusion also has a marked influence on pedestrian re-identification. Zhong et al. (Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, "Random erasing data augmentation," arXiv preprint arXiv:1708.04896, 2017) proposed a data enhancement method that erases a region of the picture with a certain probability and can partially handle occlusion, but that method considers neither model design nor the spatial structure of the pedestrian in the picture, and therefore handles the occlusion problem only to a limited extent.

Disclosure of Invention

The invention aims to provide a multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis that addresses the technical problems of existing pedestrian re-identification models.

The invention comprises the following steps:

1) Preparing a pedestrian re-identification data set;

2) Designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative pedestrian features through the network model;

3) Expanding the prepared data set by means of a pedestrian analysis model to obtain a new expanded data set;

4) Designing a multi-label classification loss function;

5) Designing a multi-stream multi-task loss optimization function;

6) Pre-training the backbone ResNet50 network on large-scale image data with the back propagation algorithm to obtain a pre-trained ResNet50 model;

7) On the basis of the pre-trained ResNet50 model, using the expanded pedestrian re-identification data set, computing the model loss with the designed multi-task loss optimization function, and training the whole constructed model end to end with the back propagation algorithm to obtain the final trained model;

8) Performing pedestrian re-identification with the trained model, taking the output features of the final network model as the feature representation of the pedestrian image and using them for the subsequent similarity measurement and ranking.

In step 1), the specific method for preparing the pedestrian re-identification data set may be: assume that the training set of pedestrian images is $\{(x_i, y_i),\ i = 1, \ldots, n\}$, where n is the number of training samples and is a natural number; $x_i$ is the pedestrian image corresponding to the i-th training sample; $y_i$ ($1 \le y_i \le N$) is the pedestrian category label of the i-th training sample; and N is the number of pedestrian categories contained in the training set and is a natural number.

In step 2), the specific steps of designing the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis are as follows:

(1) Obtaining masks of the whole body, the upper body and the lower body of the pedestrian in the picture by means of a pedestrian analysis model, wherein each mask is 1 at the pixels of the specified body region and 0 elsewhere;

(2) Removing the final fully connected layer that the original network uses for classification, changing the global average pooling before that layer into global max pooling, and copying the part of the network after the first convolution and pooling into four independent branches, with no parameter sharing between branches;

(3) Leaving the first branch unchanged; applying attention to the whole body, the upper body and the lower body of the pedestrian in the second, third and fourth branches respectively, and applying a channel attention mechanism to the features of the corresponding body region in each branch, finally obtaining the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis.

In step 3), the specific steps of expanding the already prepared data set are as follows:

(1) Obtaining the mask of the whole body of the pedestrian in the picture by means of a pedestrian analysis model, wherein the mask is 1 at the pixels of the specified body region and 0 in the remaining regions;

(2) Separating the pedestrian foreground region from the background region in the picture using the obtained mask, and then changing the hue of the pedestrian foreground region to change the pedestrian's clothing color, thereby expanding the data set.

In step 4), the specific steps of designing the multi-label classification loss function are as follows:

(1) For a pedestrian of class $y_{label}$ in the original data set, assume that the generated pedestrian category is $\hat{y}_{label}$. The original picture has two category labels $(y_{label}, \hat{y}_{label})$, with $y_{label}$ as its first category and $\hat{y}_{label}$ as its second category; the generated picture likewise has two category labels $(\hat{y}_{label}, y_{label})$, with $\hat{y}_{label}$ as its first category and $y_{label}$ as its second category; a double-labeled pedestrian re-identification data set is thus constructed;

(2) Different confidence levels are set for different categories, and smoothing is added, so that a designed multi-label classification loss function is obtained as follows:

wherein $P(y_{label})$ is the probability that the model predicts for the first category label, and $P(\hat{y}_{label})$ is the probability that the model predicts for the second category label.

In step 5), the specific steps of designing the multi-stream multi-task loss optimization function are as follows:

For each branch, a metric task and a classification task are computed; the metric task uses a triplet loss function and the classification task uses the multi-label classification loss function. The final multi-stream multi-task loss optimization function is:

$$L = L_{\text{multi-labels}} + L_{\text{triplet}} \tag{2}$$

where $L_{\text{multi-labels}}$ is the average classification loss over the branches of the model and $L_{\text{triplet}}$ is the average metric loss over the branches of the model.

Compared with the prior art, the invention has the following advantages:

The invention designs a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis. Its multi-branch structure applies attention to the pedestrian in the picture to eliminate the interference of the background and occluded areas on re-identification, and attention is added to the channels of same-layer features in the different branches, so that the model weights the channels that express information about different pedestrians and can learn more discriminative features. A more efficient data enhancement method is then designed, which expands the data set by changing the colors of pedestrians' clothes in the pictures. Because each newly generated picture is derived from an original picture, it is strongly similar to the real picture in pedestrian texture, outline and background; a double-label data set is therefore built from the expanded data, a multi-label classification loss function is proposed, and a multi-stream multi-task loss function is designed in combination with a metric task to optimize the model better. More discriminative feature representations are thus extracted, the subsequent feature similarity measurement and ranking are completed, and the final pedestrian re-identification result is obtained. The invention effectively alleviates the interference of background clutter, occlusion and similar problems on pedestrian re-identification.

Drawings

Fig. 1 is a framework diagram of an embodiment of the present invention.

Detailed Description

To make the above objects, features and advantages of the present invention easier to understand, the method is described in detail below with reference to the accompanying drawing and an embodiment. The embodiment is implemented on the premise of the technical scheme of the present invention, but the present invention is not limited to the following embodiment.

Referring to fig. 1, the implementation of the embodiment of the present invention includes the following steps:

1. A pedestrian re-identification data set is prepared. Assume that the training set of pedestrian images is $\{(x_i, y_i),\ i = 1, \ldots, n\}$, where n is the number of training samples and is a natural number; $x_i$ is the pedestrian image corresponding to the i-th training sample; $y_i$ ($1 \le y_i \le N$) is the pedestrian category label of the i-th training sample; and N is the number of pedestrian categories contained in the training set and is a natural number.

2. A multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis is designed, and more discriminative pedestrian features are extracted through the network model.

B1. The pedestrian analysis model JPPNet is used to obtain masks marking the positions of the upper-body, lower-body and whole-body areas of the pedestrian in the picture, denoted $mask_{upper}$, $mask_{lower}$ and $mask_{whole}$ respectively. In $mask_{upper}$, the pixels of the area where the pedestrian's upper body is located are uniformly set to 1 and all other positions are set to 0; in $mask_{lower}$, the pixels of the area where the pedestrian's lower body is located are uniformly set to 1 and all other positions are set to 0; in $mask_{whole}$, the pixels of the pedestrian's whole-body area are set to 1 and all other positions are set to 0.
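As a rough illustration of step B1, the sketch below derives the three binary masks from a per-pixel parsing label map. The part ids and the `build_masks` helper are hypothetical: the actual ids depend on the label set of the parsing model and are not given in the text.

```python
import numpy as np

# Hypothetical part ids; the real ids depend on the parsing model's label set.
BACKGROUND_ID = 0
UPPER_IDS = (1, 2)  # e.g. upper clothes, arms (assumed)
LOWER_IDS = (3, 4)  # e.g. trousers, legs (assumed)


def build_masks(parsing_map: np.ndarray):
    """Turn an (H, W) map of part ids into three binary masks (1 inside, 0 outside)."""
    mask_upper = np.isin(parsing_map, UPPER_IDS).astype(np.float32)
    mask_lower = np.isin(parsing_map, LOWER_IDS).astype(np.float32)
    # Assumption: every non-background pixel belongs to the whole-body mask.
    mask_whole = (parsing_map != BACKGROUND_ID).astype(np.float32)
    return mask_upper, mask_lower, mask_whole


# Toy 4x3 parsing map: one column of body pixels surrounded by background.
parsing_map = np.array([[0, 1, 0],
                        [0, 2, 0],
                        [0, 3, 0],
                        [0, 4, 0]])
mask_upper, mask_lower, mask_whole = build_masks(parsing_map)
```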

B2. The final fully connected layer used for classification is removed from the ResNet50 network, the global average pooling before that layer is changed into global max pooling, and the part of the network after the first convolution and pooling is duplicated into four independent branches, with no parameter sharing between branches.

B3. The network structure of the first branch is the same as the modified ResNet50 structure. After global max pooling, the obtained feature is 2048-dimensional; a fully connected layer with 512 neurons then reduces the 2048-dimensional feature to a 512-dimensional feature $z_G$.
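The pooling and dimension reduction in steps B2 and B3 can be sketched in NumPy as follows. The 8x4 spatial size and the random weights are placeholders chosen for illustration only; in the described model the weights are learned and the feature map comes from the ResNet50 branch.

```python
import numpy as np


def global_max_pool(feature_map: np.ndarray) -> np.ndarray:
    """Reduce a (C, H, W) feature map to a C-dimensional vector of spatial maxima."""
    return feature_map.max(axis=(1, 2))


def reduce_dim(feature: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Fully connected layer mapping a 2048-dim feature to a 512-dim feature."""
    return weight @ feature + bias


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((2048, 8, 4))   # placeholder branch output
weight = rng.standard_normal((512, 2048)) * 0.01  # stand-in for learned weights
bias = np.zeros(512)

pooled = global_max_pool(feature_map)   # shape (2048,)
z_G = reduce_dim(pooled, weight, bias)  # shape (512,)
```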

B4. In the second branch, JPPNet is used to analyze the pedestrian picture to obtain the mask $mask_{whole}$ of the pedestrian's whole-body area; the feature X obtained after the first convolution and pooling is then weighted to obtain the feature $X_{whole}$ after attention weighting of the pedestrian's whole-body area:

$$X_{whole} = (1 + \alpha_1 \cdot mask_{whole}) \cdot X \tag{1}$$

where $\alpha_1$ is the weight parameter of the attention mechanism applied to the whole body of the pedestrian in the picture. This branch applies attention to the whole body of the pedestrian; the subsequent processing is the same as in the first branch, and the resulting 2048-dimensional feature is reduced to the 512-dimensional feature $Z_{whole}$.

B5. In the third and fourth branches, JPPNet is likewise used to analyze the pedestrian in the picture to obtain the upper-body mask $mask_{upper}$ and the lower-body mask $mask_{lower}$; the feature X is weighted by each of them to obtain the features $X_{upper}$ and $X_{lower}$ after attention weighting of the pedestrian's upper-body and lower-body areas:

$$X_{upper} = (1 + \alpha_2 \cdot mask_{upper}) \cdot X \tag{2}$$

$$X_{lower} = (1 + \alpha_3 \cdot mask_{lower}) \cdot X \tag{3}$$

where $\alpha_2$ and $\alpha_3$ are the weight parameters of the attention mechanism applied to the upper body and the lower body of the pedestrian in the picture, respectively. These two branches apply attention to the upper body and the lower body of the pedestrian; the subsequent processing is the same as in the first branch, finally giving the dimension-reduced 512-dimensional features $Z_{upper}$ and $Z_{lower}$ of the two branches.
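Equations (1) to (3) share one form, which the following sketch applies to a toy feature map. It assumes the mask has already been resized to the spatial resolution of X and is broadcast over channels; the value 0.5 for the weights is a placeholder, since the values of $\alpha_1$, $\alpha_2$ and $\alpha_3$ are not stated.

```python
import numpy as np


def mask_attention(X: np.ndarray, mask: np.ndarray, alpha: float) -> np.ndarray:
    """Compute (1 + alpha * mask) * X for X of shape (C, H, W) and mask of shape (H, W)."""
    # mask[None] broadcasts the same spatial weighting over every channel.
    return (1.0 + alpha * mask[None, :, :]) * X


X = np.ones((2, 4, 3))                      # toy feature map after the first conv + pooling
mask_upper = np.zeros((4, 3)); mask_upper[:2] = 1.0
mask_lower = np.zeros((4, 3)); mask_lower[2:] = 1.0
mask_whole = np.clip(mask_upper + mask_lower, 0.0, 1.0)

alpha_1 = alpha_2 = alpha_3 = 0.5           # placeholder values (assumption)
X_whole = mask_attention(X, mask_whole, alpha_1)
X_upper = mask_attention(X, mask_upper, alpha_2)
X_lower = mask_attention(X, mask_lower, alpha_3)
```

Positions inside a mask are amplified by a factor of 1 + α, while positions outside keep their original response, so background features are de-emphasized rather than discarded.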

B6. In the training stage, pedestrian re-identification is treated as a classification task, so a fully connected layer is appended to each branch to classify that branch's feature; its number of neurons equals the number of pedestrian classes in the training set. In the test stage, the dimension-reduced features of the four branches are concatenated to obtain the feature Z used for similarity measurement:

$$Z = z_G \oplus Z_{whole} \oplus Z_{upper} \oplus Z_{lower} \tag{4}$$

where $\oplus$ denotes the concatenation operation; the resulting feature Z used for similarity measurement is 2048-dimensional.
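A minimal sketch of the test-stage concatenation: four 512-dimensional branch features are joined into one 2048-dimensional descriptor. The constant-valued vectors are placeholders standing in for real branch outputs.

```python
import numpy as np

# Placeholder 512-dim branch features; real values come from the four branches.
z_G, Z_whole, Z_upper, Z_lower = (
    np.full(512, value, dtype=np.float32) for value in (0.0, 1.0, 2.0, 3.0)
)

# Concatenate in branch order: global, whole body, upper body, lower body.
Z = np.concatenate([z_G, Z_whole, Z_upper, Z_lower])
```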

3. The prepared data set is expanded by means of a pedestrian analysis model to obtain a new expanded data set.

C1. JPPNet is used to analyze the pedestrian in the picture to obtain the mask $mask_{whole}$ of the pedestrian's whole body, and the pedestrian label, denoted label, is recorded. The pedestrian label is a non-negative integer; on the Market-1501 data set, for example, the label ranges from 0 to 1501;

C2. The obtained $mask_{whole}$ is used to separate the pedestrian foreground area from the background area, giving pictures P and B that contain only the foreground pixel values and only the background pixel values respectively;

C3. The pedestrian analysis model is not error-free. If the analyzed area is too small, that is, if the ratio of the pedestrian area to the total picture area is less than 0.3, the analysis of the picture is considered to have failed and the picture is not processed; otherwise the following operations continue.

C4. The pedestrian foreground picture P is converted from RGB format to HSV format. The hue angle in an HSV picture ranges from 0 to 360, so to ensure that pedestrians with the same label receive the same hue H after conversion, the new hue is obtained by the formula H = label % 360. The clothing color of the pedestrian in picture P is changed accordingly, and the recolored foreground picture P is converted back to RGB format.

C5. The foreground picture P is spliced with the background picture B to obtain a new pedestrian picture, and the new picture is stored.
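Steps C2 to C5 can be sketched as a single function. This is an illustrative version only: it uses Python's standard `colorsys` module pixel by pixel (slow but dependency-free), expresses the hue H = label % 360 as a fraction of a full turn because `colorsys` works in [0, 1), and keeps saturation and value unchanged, which is an assumption about how the recoloring is done. The name `change_clothing_hue` is hypothetical.

```python
import colorsys

import numpy as np

MIN_FOREGROUND_RATIO = 0.3  # threshold from step C3


def change_clothing_hue(image: np.ndarray, mask_whole: np.ndarray, label: int):
    """Recolor the masked pedestrian area of an (H, W, 3) uint8 RGB image.

    Returns the new image, or None when the mask covers less than 30% of the
    picture (the parsing result is then treated as failed, step C3).
    """
    mask = mask_whole.astype(bool)
    if mask.mean() < MIN_FOREGROUND_RATIO:
        return None
    new_hue = (label % 360) / 360.0  # step C4, rescaled to colorsys's [0, 1) range
    out = image.copy()               # background pixels (picture B) stay untouched
    for row, col in zip(*np.nonzero(mask)):
        red, green, blue = image[row, col] / 255.0
        _, saturation, value = colorsys.rgb_to_hsv(red, green, blue)
        recolored = colorsys.hsv_to_rgb(new_hue, saturation, value)
        out[row, col] = np.round(np.array(recolored) * 255.0)
    return out                       # foreground P spliced back onto B (step C5)


image = np.zeros((4, 4, 3), dtype=np.uint8)
image[..., 0] = 255                           # pure red toy picture
mask_whole = np.zeros((4, 4), dtype=np.uint8)
mask_whole[:2, :] = 1                         # top half is the "pedestrian" (ratio 0.5)
new_image = change_clothing_hue(image, mask_whole, label=120)
```

Because the hue depends only on the label, every picture of the same pedestrian is recolored consistently, which is what lets the recolored pictures form one new category.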

4. A multi-label classification loss function is designed.

D1. For a pedestrian of class $y_{label}$ in the original data set, assume that the generated pedestrian category is $\hat{y}_{label}$. The original picture has two category labels $(y_{label}, \hat{y}_{label})$, with $y_{label}$ as its first category and $\hat{y}_{label}$ as its second category. The generated picture also has two category labels $(\hat{y}_{label}, y_{label})$, with $\hat{y}_{label}$ as its first category and $y_{label}$ as its second category. A double-labeled pedestrian re-identification data set is thus formed.

D2. A multi-label classification loss function $L_{\text{multi-labels}}$ is designed:

where α and β are hyperparameters, K is the number of categories, $P(y_{label})$ is the probability that the model predicts for the first category label, and $P(\hat{y}_{label})$ is the probability that the model predicts for the second category label; in the experiments, α and β are both set to 0.1.
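The sketch below illustrates the double-label idea under stated assumptions; it is not the formula of this method. It assumes that the generated category id is the original id offset by the number of original classes, and that the loss is a cross-entropy against a soft target that gives weight 1 − α − β to the first label, α to the second label, and spreads β uniformly over all K classes as smoothing. Both helper names are hypothetical.

```python
import numpy as np


def make_double_labels(y_label: int, num_original_classes: int):
    """Return (first, second) label pairs for an original picture and its recolored copy.

    Assumption: the generated category id is y_label + num_original_classes.
    """
    y_generated = y_label + num_original_classes
    return (y_label, y_generated), (y_generated, y_label)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract the max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())


def double_label_loss(logits, first, second, alpha=0.1, beta=0.1) -> float:
    """Cross-entropy against an assumed soft target over K classes."""
    K = logits.shape[0]
    target = np.full(K, beta / K)        # smoothing mass shared by all classes
    target[first] += 1.0 - alpha - beta  # high confidence for the first label
    target[second] += alpha              # lower confidence for the second label
    return float(-(target * log_softmax(logits)).sum())


original_pair, generated_pair = make_double_labels(y_label=2, num_original_classes=5)
logits = np.zeros(10)
logits[original_pair[0]] = 4.0           # model favours the first label
loss = double_label_loss(logits, *original_pair)
```

With this target the weights sum to one, so the loss remains an ordinary cross-entropy; only the distribution it is measured against changes.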

5. A multi-task loss optimization function is designed.

E1. The average metric loss over the branches is calculated:

$$L_{\text{triplet}} = \frac{1}{4} \sum_{i=1}^{4} L^{i}_{\text{triplet}} \tag{6}$$

where $L^{i}_{\text{triplet}}$ is the triplet metric loss calculated for the i-th branch, and $i \in \{1, 2, 3, 4\}$ indexes the four branches of the model.

E2. The average multi-label classification loss over the branches is calculated:

$$L_{\text{multi-labels}} = \frac{1}{4} \sum_{i=1}^{4} L^{i}_{\text{multi-labels}} \tag{7}$$

where $L^{i}_{\text{multi-labels}}$ is the multi-label classification loss calculated for the i-th branch, and $i \in \{1, 2, 3, 4\}$ indexes the four branches of the model.

E3. The loss function of the whole model is calculated:

$$L = L_{\text{multi-labels}} + L_{\text{triplet}} \tag{8}$$

where $L_{\text{multi-labels}}$ is the average classification loss over the branches of the model and $L_{\text{triplet}}$ is the average metric loss over the branches of the model.
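The combination in steps E1 to E3 amounts to averaging each loss over the four branches and adding the two averages. The sketch below shows this together with a basic triplet loss; the Euclidean distance and the margin of 0.3 are assumptions, since the text names the triplet loss without giving its parameters.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.3) -> float:
    """Hinge on (anchor-positive distance) - (anchor-negative distance) + margin.

    The margin value and the Euclidean distance are assumptions.
    """
    d_ap = np.linalg.norm(np.asarray(anchor) - np.asarray(positive))
    d_an = np.linalg.norm(np.asarray(anchor) - np.asarray(negative))
    return float(max(d_ap - d_an + margin, 0.0))


def total_loss(cls_losses, triplet_losses) -> float:
    """L = mean of per-branch classification losses + mean of per-branch triplet losses."""
    L_multi_labels = float(np.mean(cls_losses))
    L_triplet = float(np.mean(triplet_losses))
    return L_multi_labels + L_triplet


# One value per branch (four branches); the numbers are placeholders.
L = total_loss(cls_losses=[1.0, 2.0, 3.0, 4.0], triplet_losses=[0.0, 0.0, 0.0, 4.0])
easy_triplet = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
```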

6. The backbone ResNet50 network is pre-trained on large-scale image data with the back propagation algorithm to obtain a pre-trained ResNet50 model.

7. On the basis of the pre-trained ResNet50 model, the expanded pedestrian re-identification data set is used, the model loss is computed with the designed multi-task loss optimization function, and the whole constructed model is trained end to end with the back propagation algorithm to obtain the final trained model.

8. Pedestrian re-identification is performed with the trained model: the output features of the final network model are taken as the feature representation of the pedestrian image and used for similarity measurement and ranking.
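The final similarity measurement and ranking can be sketched as follows. The text does not name the similarity measure, so cosine similarity is assumed here, and the two-dimensional vectors are toy stand-ins for the 2048-dimensional features Z.

```python
import numpy as np


def rank_gallery(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted from most to least similar to the query.

    Assumption: cosine similarity between feature vectors.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    similarity = g @ q
    return np.argsort(-similarity)


query = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0],   # orthogonal to the query
                    [1.0, 0.0],   # identical direction
                    [1.0, 1.0]])  # 45 degrees away
ranking = rank_gallery(query, gallery)
```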

Tables 1-2 compare the proposed method with other pedestrian re-identification methods on the Market-1501 and CUHK03 data sets, respectively.

TABLE 1

TABLE 2

In tables 1-2, other methods are as follows:

LSRO corresponds to the method proposed by Zheng et al. (Z. Zheng, L. Zheng, and Y. Yang, "Unlabeled samples generated by GAN improve the person re-identification baseline in vitro," arXiv preprint arXiv:1701.07717, vol. 3, 2017);

PNGAN corresponds to the method proposed by Qian et al (X.Qian, Y.Fu, T.Xiang, W.Wang, J.Qiu, Y.Wu, Y. -G.Jiang, and X.Xue. "Pose-normalized image generation for person re-identification" in European Conference on Computer Vision,2018, pp.661-678.);

CamStyle corresponds to the method proposed by Zhong et al (Z.Zhong, L.Zheng, Z.Zheng, S.Li, and Y.Yang. "Camera style adaptation for person re-identification" in proceedings of the IEEE conference on Computer Vision and Pattern Recognition,2018, pp.5157-5166.);

MLFN corresponds to the method proposed by Chang et al (X.Chang, T.Hospedales, and T.Xiang. "Multi-level factorisation net for person re-identification" in proceedings of the IEEE conference on Computer Vision and Pattern Recognition,2018, vol.1, p.2.);

HA-CNN corresponds to the method proposed by Li et al. (W. Li, X. Zhu, and S. Gong, "Harmonious attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2285-2294);

PCB corresponds to the method proposed by Sun et al. (Y. Sun, L. Zheng, Y. Yang, et al., "Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline)," in Proceedings of the European Conference on Computer Vision, 2018: 480-496);

MGN corresponds to the method proposed by Wang et al (G.Wang, Y.Yuan, X.Chen, et al, "Learning discriminative features with multiple granularities for person re-identification" in proceedings of the 26th ACM international conference on Multimedia,2018:274-282);

OSNet corresponds to the method proposed by Zhou et al (K.Zhou, Y.Yang, A.Cavallaro, et al, "Omni-scale feature learning for person re-identification" in proceedings of the IEEE International Conference on Computer Vision, 2019:3702-3712);

PAN corresponds to the method proposed by Zheng et al (Z.Zheng, L.Zheng, Y.Yang, et al, "Pedestrian alignment network for large-scale person re-identification" in IEEE Transactions on Circuits and Systems for Video Technology,2018,29 (10): 3037-3045);

AANet corresponds to the method proposed by Tay et al. (C. Tay, S. Roy, K. Yap, "AANet: attribute attention network for person re-identifications," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 7134-7143);

FPR corresponds to the method proposed by He et al. (L. He, Y. Wang, W. Liu, et al., "Foreground-aware Pyramid Reconstruction for Alignment-free Occluded Person Re-identification," in Proceedings of the IEEE International Conference on Computer Vision, 2019: 8450-8459);

CRAN corresponds to the method proposed by Han et al. (C. Han, R. Zheng, C. Gao, et al., "Complementation-Reinforced Attention Network for Person Re-Identification," in IEEE Transactions on Circuits and Systems for Video Technology, 2019);

CASN corresponds to the method proposed by Zheng et al (M.Zheng, S.Karanam, Z.Wu, et al, "Re-identification with consistent attentive siamese networks" in proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019:5735-5744);

JPPNet corresponds to the method proposed by Liang et al. (X. Liang, K. Gong, X. Shen, et al., "Look into person: joint body parsing & pose estimation network and a new benchmark," in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(4): 871-885).

First, a pedestrian picture is analyzed by a pedestrian analysis model to obtain masks of the upper body, lower body and whole body of the pedestrian, and a multi-branch attention network model is designed according to the obtained masks, in which each branch makes full use of local pedestrian information for learning. Second, according to the obtained mask, the hue of the pedestrian body area is adjusted to change the color of the pedestrian's clothes in the picture, and the recolored pedestrian is treated as a new pedestrian category, thereby expanding the data set. An original picture and its expanded picture are strongly similar in background and pedestrian outline; on this basis, the expanded data set is organized into a double-label structure in which each pedestrian has two corresponding labels, a first-class label and a second-class label, and different confidence levels are set for the two labels during classification. A multi-label classification loss function is proposed accordingly, which helps the model learn more discriminative features. Finally, the trained network model is used to extract feature representations of the test set images for the subsequent similarity comparison and ranking. Experimental analysis shows that the method reduces the interference of background clutter, occlusion and similar problems on pedestrian re-identification and achieves good recognition performance on several public data sets.

Claims

1. A multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis, characterized by comprising the following steps:

1) Preparing a pedestrian re-identification data set;

the specific steps of expanding the prepared data set are as follows:

(2) Separating a pedestrian foreground area from a background area in the picture by using the obtained mask, and then changing the tone of the pedestrian foreground area to realize the clothes changing of the pedestrian, thereby achieving the purpose of expanding the data set;

4) The method for designing the multi-label classification loss function comprises the following specific steps:

(1) For a pedestrian of class $y_{label}$ in the original data set, assume that the generated pedestrian category is $\hat{y}_{label}$; the original picture has two category labels $(y_{label}, \hat{y}_{label})$, with $y_{label}$ as its first category and $\hat{y}_{label}$ as its second category; the generated picture likewise has two category labels $(\hat{y}_{label}, y_{label})$, with $\hat{y}_{label}$ as its first category and $y_{label}$ as its second category; a double-labeled pedestrian re-identification data set is thus constructed;

wherein $P(y_{label})$ is the probability that the model predicts for the first category label, and $P(\hat{y}_{label})$ is the probability that the model predicts for the second category label;

5) Designing a multi-stream multi-task loss optimization function, specifically as follows:

$$L = L_{\text{multi-labels}} + L_{\text{triplet}} \tag{2}$$

wherein $L_{\text{multi-labels}}$ is the average classification loss over the branches of the model, and $L_{\text{triplet}}$ is the average metric loss over the branches of the model;

2. The multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis according to claim 1, wherein in step 1), the specific method for preparing the pedestrian re-identification data set is: assume that the training set of pedestrian images is $\{(x_i, y_i),\ i = 1, \ldots, n\}$, where n is the number of training samples and is a natural number; $x_i$ is the pedestrian image corresponding to the i-th training sample; $y_i$, with value range $1 \le y_i \le N$, is the pedestrian category label of the i-th training sample; and N is the number of pedestrian categories contained in the training set and is a natural number.

3. The pedestrian-analysis-based multi-stream multi-tag pedestrian re-recognition method as set forth in claim 1, wherein in step 2), the specific steps of designing the pedestrian-analysis-based multi-stream attention pedestrian re-recognition network model structure are as follows: