CN113887448A - Pedestrian re-identification method based on deep reloading - Google Patents

Pedestrian re-identification method based on deep reloading

Info

Publication number
CN113887448A
CN113887448A
Authority
CN
China
Prior art keywords
pedestrian
deep
reloading
identity
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111174153.4A
Other languages
Chinese (zh)
Inventor
闫禹铭
于慧敏
李殊昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202111174153.4A
Publication of CN113887448A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on deep reloading (deep-learning-based clothes changing), which comprises a training stage and a testing stage. The overall framework is divided into two branches: an original-picture feature extraction branch network E and a deep-reloading feature extraction branch network M. First, the pictures in the training data set are reloaded with new clothing using an off-the-shelf deep reloading model, and the results are stored in the training data set. In the training stage, both branches participate in training. Taking the deep-reloading feature extraction branch M as an example, a deeply reloaded pedestrian picture is input into the backbone network to extract features, which are then separated into identity features and clothing features by an attention mechanism. The identity features extracted by the two branches are pulled as close together as possible so as to obtain more robust identity features. In the testing stage, only network E is used to extract identity features from input pictures for identity inference. The method can complete the pedestrian re-identification task while effectively reducing the negative influence of appearance changes, such as pedestrians changing clothes, on pedestrian re-identification.

Description

Pedestrian re-identification method based on deep reloading
Technical Field
The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a pedestrian re-identification method based on deep reloading (deep-learning-based clothes changing).
Background
In recent years, with the wide deployment of surveillance equipment, pedestrian identification technologies have attracted increasing attention. Pedestrian identification aims to determine the identity of a photographed pedestrian by finding pedestrians with the same identity in a pedestrian database. It has broad application scenarios in Internet-of-Things and big-data environments, including smart cities and intelligent security. Pedestrian re-identification, which is closely related to identity recognition, has also received wide attention recently and has achieved remarkable performance improvements on public data sets. However, the high cost of labeling pedestrian identities in real scenes, together with the large differences in illumination, background, posture and so on among pedestrian pictures obtained in different domains (scenes), poses great challenges to applying pedestrian re-identification in practice. Current mainstream deep learning methods generally rely on the appearance information of pedestrians for inference, and are therefore difficult to apply in real scenes where pedestrians frequently change clothes.
Most current algorithms use an attention mechanism to focus the model on highly discriminative regions so as to improve performance. However, in real scenes pedestrians frequently change clothes, so the same pedestrian wearing different clothes shows different appearance characteristics; if the model only attends to such local areas, its generalization performance is poor.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method based on deep reloading, addressing the defect that prior-art pedestrian re-identification algorithms perform poorly in clothes-changing scenes. The method can complete the pedestrian re-identification task while effectively reducing the negative influence of appearance changes, such as pedestrians changing clothes, on pedestrian re-identification.
The purpose of the invention is realized by the following technical scheme: a pedestrian re-identification method based on deep reloading comprises the following steps:
1) Re-dress the pedestrians in the training-set pictures using the deep reloading model and preselected clothing templates, and store the resulting pictures as a supplement to the training set.
2) In the training stage, use the original-picture feature extraction branch network E and the deep-reloading feature extraction branch network M to respectively extract the identity features and clothing features of the original pictures and the deeply reloaded pictures, and train networks E and M so that the extracted features achieve a good classification effect.
3) In the training stage, train networks E and M so that the identity features extracted by E and M are pulled closer together.
4) In the testing stage, use only the original-picture feature extraction branch network E to extract identity information, and use it for similarity measurement and identity inference; the entry with the highest similarity is the final matching result.
Further, in step 2), the original-picture feature extraction branch network E and the deep-reloading feature extraction branch network M respectively extract the identity features and clothing features of the original picture and the deeply reloaded picture as follows: a picture is first input into a backbone network to extract a feature f_s, which is then separated into an identity feature and a clothing feature by an attention mechanism:

f_s = Backbone(I)
f_clo = Atten(f_s) * f_s
f_ID = (1 - Atten(f_s)) * f_s

where f_clo is the clothing information feature, f_ID is the identity information feature, f_s is the feature extracted by the backbone network from the pedestrian picture, I is the picture input, and Atten(f_s) is the attention map obtained by applying the attention mechanism to f_s.
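The separation above is a simple element-wise soft gating of the backbone features. The following sketch illustrates it in NumPy, as an assumption for illustration only: in the patent the backbone and attention module are real networks, whereas here the feature map and attention map are random placeholders.

```python
import numpy as np

def separate_features(f_s, atten):
    """Soft split of backbone features into clothing and identity parts:
    f_clo = Atten(f_s) * f_s ; f_ID = (1 - Atten(f_s)) * f_s."""
    f_clo = atten * f_s           # regions the attention highlights -> clothing
    f_id = (1.0 - atten) * f_s    # complementary regions -> identity
    return f_clo, f_id

# toy example: a 4-channel 2x2 feature map and a sigmoid-squashed attention map
rng = np.random.default_rng(0)
f_s = rng.normal(size=(4, 2, 2))
atten = 1.0 / (1.0 + np.exp(-rng.normal(size=(4, 2, 2))))  # values in (0, 1)

f_clo, f_id = separate_features(f_s, atten)
assert np.allclose(f_clo + f_id, f_s)  # the split is a soft partition of f_s
```

By construction f_clo + f_ID = f_s, so the attention map decides how much of each location's response is attributed to clothing versus identity rather than producing two independent embeddings.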
Further, in step 2), the clothing features and identity features separated by the attention mechanism are supervised with a classification loss:

L_E = CE(f_ID^E) + CE(f_clo^E)
L_M = CE(f_ID^M) + CE(f_clo^M)

where CE denotes a classification loss, and the superscripts E and M indicate features extracted by the corresponding branch network.
Further, in step 3), networks E and M respectively extract the identity features and clothing features of the original picture and the deeply reloaded picture. E and M are trained to minimize the distance between the identity features of the two branches:

L_MSE = MSE(f_ID^E(I), f_ID^M(I_c))

where I_c denotes the deeply reloaded picture and I denotes the original picture.
Further, the step 4) is specifically as follows: in the testing stage, the deep reloading feature extraction branch network M is not used, and a pedestrian picture is input into the original picture feature extraction branch network E to extract identity features for pedestrian identity inference.
Further, in step 1), an off-the-shelf model such as PF-AFN is adopted as the deep reloading model.
Further, in step 2), the backbone networks of networks E and M adopt the ResNet-50 network structure.
Further, in step 2), the attention mechanism consists of channel attention and spatial attention.
Further, in step 2), the classification loss adopts a cross-entropy-based classification loss together with a triplet loss.
Further, in step 3), an MSE metric function is used to measure the distance between the identity features extracted by networks E and M.
The invention has the following beneficial effects: the method separates identity features from clothing features via attention and extracts identity features with higher identity discriminability, which are then used for inference, improving the model's adaptability to pedestrians changing clothes. Meanwhile, deep reloading produces pictures of the same pedestrian, in the same pose, wearing different clothing, which helps the model learn identity features that are independent of clothing. In real scenes, pedestrians change clothes frequently; conventional deep learning methods rely on appearance features for inference, so pictures of the same pedestrian wearing different clothes may be misjudged because the appearance difference is too large. The proposed method is expected to reduce, to a certain extent, the negative influence of clothes changing on pedestrian re-identification in real scenes and to improve recognition accuracy.
Drawings
FIG. 1 is a schematic diagram of the overall structure of a pedestrian re-identification network of the present invention;
FIG. 2 is a flow chart of the training phase of the present invention;
FIG. 3 is a flow chart of the testing phase of the present invention;
FIG. 4 is a schematic diagram of an example of the present invention using an attention mechanism;
FIG. 5 is a diagram illustrating matching results sorted by similarity according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in FIG. 1, the pedestrian re-identification method based on deep reloading comprises a training stage and a testing stage. The overall framework is divided into two branches: an original-picture feature extraction branch network E and a deep-reloading feature extraction branch network M. Before training, the pictures in the training data set are reloaded with new clothing using an off-the-shelf deep reloading model, as a form of offline data augmentation, and the results are stored as a supplement to the training data set. In the training stage, both branches participate in training: networks E and M extract the identity features and clothing features of the original pictures and the deeply reloaded pictures respectively, and are trained so that the extracted features achieve a good classification effect while the identity features extracted by E and M are pulled closer together. Taking the deep-reloading branch M as an example, a deeply reloaded pedestrian picture is input into the backbone network to extract features, which are then separated into identity features and clothing features by an attention mechanism. The identity features extracted by the two branches are pulled as close together as possible so as to obtain more robust identity features. After training, in the testing stage, branch network M is not used; only the original-picture feature extraction branch network E is used to extract identity features from the input picture for identity inference. The method specifically comprises the following steps:
1) Re-dress the pedestrians in the training-set pictures using the deep reloading model and preselected clothing templates, and store the results as a supplement to the training set. The deep reloading model can be any currently public model, and the clothing template can be any garment; there is no requirement on its style.
2) As shown in FIG. 2, in the training stage, the original-picture feature extraction branch network E and the deep-reloading feature extraction branch network M extract the identity features and clothing features of the original pictures and the deeply reloaded pictures respectively, and networks E and M are trained so that the extracted features achieve a good classification effect. A pedestrian picture is input into the backbone networks of E and M to extract features, which are then separated into identity features and clothing features by an attention mechanism:

f_s = Backbone(I)
f_clo = Atten(f_s) * f_s
f_ID = (1 - Atten(f_s)) * f_s

where I is the picture input; f_s is the feature extracted by the backbone network Backbone from pedestrian picture I; Atten(f_s) is the attention map obtained by applying the attention mechanism to f_s; f_clo is the clothing information feature; and f_ID is the identity information feature. The backbone networks of E and M can be any available backbone structure, such as ResNet or VGGNet, and the attention mechanism can be any current attention module.
The clothing features and identity features separated by the attention mechanism are supervised with a classification loss:

L_E = CE(f_ID^E) + CE(f_clo^E)
L_M = CE(f_ID^M) + CE(f_clo^M)

where E denotes the original-picture feature extraction branch network and M denotes the deep-reloading feature extraction branch network; CE denotes a classification loss, which may be any loss used for classification.
3) As shown in FIG. 2, in the training stage, networks E and M are trained so that the distance between the identity features extracted by E and M becomes smaller:

L_MSE = MSE(f_ID^E(I), f_ID^M(I_c))

where I_c denotes the deeply reloaded picture and I denotes the original picture.
4) As shown in FIG. 3, in the actual testing stage, only the original-picture feature extraction branch network E is used to extract identity information, which is then used for similarity measurement and identity inference. Specifically, an image is input into network E to extract an identity feature, and this feature is used for similarity measurement and identity inference.
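The patent does not fix a particular similarity measure for this step; assuming cosine similarity, which is common in re-identification, the matching can be sketched as follows (the function name `match_identity` and the toy gallery data are hypothetical, introduced only for illustration):

```python
import numpy as np

def match_identity(query_feat, gallery_feats):
    """Rank gallery identity features by cosine similarity to the query;
    the top-ranked gallery entry is the match."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                # cosine similarity per gallery entry
    order = np.argsort(-sims)   # indices sorted by descending similarity
    return order, sims

# toy gallery of three identity features; the query is closest to entry 1
gallery = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.5, 0.7])
order, sims = match_identity(query, gallery)
assert order[0] == 1
```

The entry ranked first (highest similarity) is then taken as the final matching result, as the method describes.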
The implementation process of one embodiment of the invention is as follows:
1) Re-dress the pedestrians in the training-set pictures using the deep reloading model and preselected clothing templates, and store the results as a supplement to the training set; the third-party deep reloading model PF-AFN (CVPR 2021) is used to re-dress the pedestrians in the training-set pictures.
2) In the training stage, the original picture feature extraction branch network E and the deep reloading feature extraction branch network M are used for respectively extracting the identity features and clothing features of the original picture and the deep reloading picture, and the networks E and M are trained to enable the extracted features to have better classification effects.
The backbone networks of E and M adopt the ResNet-50 network structure. As shown in FIG. 4, the attention mechanism consists of channel attention and spatial attention, and the final attention map Atten is obtained by multiplying the channel attention map A_cha and the spatial attention map A_spa:

A_cha = sigmoid(Relu(Conv(Relu(Conv(GAP(f_s))))))
A_spa = softmax(Relu(Conv(CGAP(f_s))))
Atten = A_spa * A_cha

where GAP, CGAP, Conv, Relu and sigmoid denote global average pooling, global average pooling along the channel direction, a convolution layer, a Relu activation layer and a sigmoid activation layer, respectively.
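A toy NumPy rendering of these formulas is given below; it is an illustrative assumption, not the patent's module: the 1x1 convolutions are modelled as dense matrices over channels, the spatial convolution as a scalar 1x1 conv, and all weights are random, so it only mirrors the shape of the computation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_map(f_s, w1, w2, w_spa):
    """Toy channel + spatial attention following
    A_cha = sigmoid(Relu(Conv(Relu(Conv(GAP(f_s)))))),
    A_spa = softmax(Relu(Conv(CGAP(f_s)))), Atten = A_spa * A_cha."""
    C, H, W = f_s.shape
    g = f_s.mean(axis=(1, 2))                 # GAP over spatial dims -> (C,)
    a_cha = sigmoid(relu(w2 @ relu(w1 @ g)))  # channel attention -> (C,)
    c = f_s.mean(axis=0)                      # CGAP over channels -> (H, W)
    a_spa = softmax(relu(w_spa * c).ravel()).reshape(H, W)  # spatial map
    return a_cha[:, None, None] * a_spa[None, :, :]         # (C, H, W)

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                 # r: channel reduction ratio
f_s = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C))       # first 1x1 conv (C -> C/r)
w2 = rng.normal(size=(C, C // r))       # second 1x1 conv (C/r -> C)
atten = attention_map(f_s, w1, w2, 1.0)
assert atten.shape == (C, H, W)
assert atten.min() >= 0.0 and atten.max() <= 1.0
```

Note that the spatial softmax normalizes over all H*W positions, so Atten acts as a joint per-channel, per-location weighting bounded in [0, 1].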
The classification loss adopts the cross-entropy-based classification loss L_CE and the triplet loss L_TL. The clothing class labels of the original pictures use 11 pre-annotated classes, labeled according to the color and style of the clothes; the clothing class labels of the deeply reloaded pictures are assigned according to the clothing templates used:

L_CE = -(1/N) * Σ_i y_i * log(ŷ_i)
L_TL = max(||f_a - f_p||₂ - ||f_a - f_n||₂ + α, 0)
CE = L_CE + L_TL

where CE denotes the classification loss; y_i denotes the ground-truth label of sample i and ŷ_i its predicted label; N denotes the number of samples; f_a denotes the identity or clothing feature of an anchor sample extracted by network E or M; f_p denotes the feature of a positive sample with the same identity as the anchor; f_n denotes the feature of a negative sample with a different identity; and α denotes the margin by which positive and negative samples are expected to be separated.
3) In the training stage, networks E and M are trained so that the identity features extracted by E and M are pulled closer together. The distance between the identity features extracted by E and M is measured with an MSE (Mean Square Error) metric function, and E and M are trained to reduce this distance:

L_MSE = MSE(f_ID^E(I), f_ID^M(I_c))

where I_c denotes the deeply reloaded picture and I denotes the original picture.
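A minimal sketch of this MSE pulling term follows; the function name `mse_pull_loss` is a hypothetical label introduced here for illustration:

```python
import numpy as np

def mse_pull_loss(f_id_e, f_id_m):
    """MSE between the identity feature of the original picture (branch E)
    and that of its deeply reloaded copy (branch M); minimizing it pulls
    the two identity embeddings together."""
    return np.mean((f_id_e - f_id_m) ** 2)

# identical identity features -> zero loss
f = np.array([0.2, -0.5, 1.0])
assert mse_pull_loss(f, f.copy()) == 0.0
# diverging features are penalized
assert mse_pull_loss(f, f + 1.0) == 1.0
```

Since the two inputs depict the same pedestrian in the same pose but different clothing, driving this loss to zero encourages identity features that ignore clothing.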
4) In the actual testing stage, the deep-reloading feature extraction branch network M is not retained; only the original-picture feature extraction branch network E is used to extract identity information, which is used for similarity measurement and identity inference, improving the robustness of the method to clothes changing. The obtained results are sorted by similarity; an example matching result is shown in FIG. 5, and the entry with the highest similarity is the final matching result.
The above description is only exemplary of the present invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A pedestrian re-identification method based on deep reloading, characterized by comprising the following steps:
1) re-dressing the pedestrians in the training-set pictures using a deep reloading model and preselected clothing templates, and storing the results as a supplement to the training set;
2) in the training stage, using an original-picture feature extraction branch network E and a deep-reloading feature extraction branch network M to respectively extract the identity features and clothing features of the original pictures and the deeply reloaded pictures, and training networks E and M so that the extracted features achieve a good classification effect;
3) in the training stage, training networks E and M so that the identity features extracted by E and M are pulled closer together;
4) in the testing stage, using only the original-picture feature extraction branch network E to extract identity information, and using the identity information for similarity measurement and identity inference, the entry with the highest similarity being the final matching result.
2. The pedestrian re-identification method based on deep reloading according to claim 1, wherein in step 2), the original-picture feature extraction branch network E and the deep-reloading feature extraction branch network M respectively extract the identity features and clothing features of the original picture and the deeply reloaded picture as follows: a picture is first input into a backbone network to extract a feature f_s, which is then separated into an identity feature and a clothing feature by an attention mechanism:

f_s = Backbone(I)
f_clo = Atten(f_s) * f_s
f_ID = (1 - Atten(f_s)) * f_s

where f_clo is the clothing information feature, f_ID is the identity information feature, f_s is the feature extracted by the backbone network from the pedestrian picture, I is the picture input, and Atten(f_s) is the attention map obtained by applying the attention mechanism to f_s.
3. The pedestrian re-identification method based on deep reloading according to claim 2, wherein in step 2), the clothing features and identity features separated by the attention mechanism are supervised with a classification loss:

L_E = CE(f_ID^E) + CE(f_clo^E)
L_M = CE(f_ID^M) + CE(f_clo^M)

where CE denotes a classification loss.
4. The pedestrian re-identification method based on deep reloading according to claim 1, wherein in step 3), networks E and M respectively extract the identity features and clothing features of the original picture and the deeply reloaded picture, and E and M are trained to minimize the distance between the identity features of the two branches:

L_MSE = MSE(f_ID^E(I), f_ID^M(I_c))

where I_c denotes the deeply reloaded picture and I denotes the original picture.
5. The pedestrian re-identification method based on deep reloading as claimed in claim 1, wherein the step 4) is specifically as follows: in the testing stage, the deep reloading feature extraction branch network M is not used, and a pedestrian picture is input into the original picture feature extraction branch network E to extract identity features for pedestrian identity inference.
6. The pedestrian re-identification method based on deep reloading according to claim 1, wherein in step 1), an off-the-shelf model such as PF-AFN is adopted as the deep reloading model.
7. The pedestrian re-identification method based on deep reloading according to claim 2, wherein in step 2), the backbone networks of networks E and M adopt the ResNet-50 network structure.
8. The pedestrian re-identification method based on deep reloading according to claim 2, wherein in step 2), the attention mechanism consists of channel attention and spatial attention.
9. The pedestrian re-identification method based on deep reloading according to claim 3, wherein in step 2), the classification loss adopts a cross-entropy-based classification loss together with a triplet loss.
10. The pedestrian re-identification method based on deep reloading according to claim 4, wherein in step 3), the distance between the identity features extracted by networks E and M is measured using an MSE metric function.
CN202111174153.4A 2021-10-09 2021-10-09 Pedestrian re-identification method based on deep reloading Pending CN113887448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111174153.4A CN113887448A (en) 2021-10-09 2021-10-09 Pedestrian re-identification method based on deep reloading

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111174153.4A CN113887448A (en) 2021-10-09 2021-10-09 Pedestrian re-identification method based on deep reloading

Publications (1)

Publication Number Publication Date
CN113887448A true CN113887448A (en) 2022-01-04

Family

ID=79005618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111174153.4A Pending CN113887448A (en) 2021-10-09 2021-10-09 Pedestrian re-identification method based on deep reloading

Country Status (1)

Country Link
CN (1) CN113887448A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129473A (en) * 2023-04-17 2023-05-16 山东省人工智能研究院 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination