CN112016489B - Pedestrian re-identification method capable of retaining global information and enhancing local features - Google Patents

Info

Publication number
CN112016489B
Authority
CN
China
Prior art keywords
global
local
branch
pedestrian
information
Prior art date
Legal status: Active
Application number
CN202010911071.2A
Other languages
Chinese (zh)
Other versions
CN112016489A (en)
Inventor
栾晓
陈俊恒
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202010911071.2A
Publication of CN112016489A
Application granted
Publication of CN112016489B

Classifications

    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045: Combinations of networks
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Learning methods


Abstract

The invention discloses a pedestrian re-identification method that retains global information and enhances local features, comprising the following steps: S1: resize the original pedestrian image to 384 × 128 × 3; S2: extract global feature information and local feature information of the pedestrian through a global branch (global-branch) and a local branch (local-branch), respectively; S3: a fusion guidance module fuses the global feature information with each piece of local feature information; S4: feed the globally average-pooled features of the global-branch and local-branch into a triplet loss function for metric learning, and feed the features of the global-branch, local-branch and fusion guidance module into a cross-entropy loss function for classification learning; S5: feed the features of the global-branch, local-branch and fusion guidance module into their respective trained classifiers and output the pedestrian re-identification classification results. The invention alleviates the problems of occlusion, image blurring and pedestrian misalignment in pedestrian images.

Description

Pedestrian re-identification method capable of retaining global information and enhancing local features
Technical Field
The invention relates to the field of digital image processing and pattern recognition, in particular to a pedestrian re-identification method for retaining global information and enhancing local features.
Background
Pedestrian re-identification is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. Given an image of a pedestrian captured by one surveillance camera, the goal is to retrieve images of the same pedestrian across other cameras. The technology is intended to compensate for the visual limitations of fixed cameras, can be combined with pedestrian detection and pedestrian tracking, and can be widely applied in intelligent video surveillance, intelligent security and related fields. Because manual data labeling is costly, most data samples are pedestrian images cropped from surveillance video by a pedestrian detector, and problems such as occlusion, image blurring and pedestrian misalignment are common. In addition, existing methods based on local features ignore or under-use the global semantic information of pedestrians: they focus only on mining local information and do not combine the advantages of global and local semantic information.
Disclosure of Invention
The invention provides a pedestrian re-identification method that retains global information and enhances local features, and aims to alleviate the problems of occlusion, image blurring and pedestrian misalignment in pedestrian images.
The invention is realized by the following technical scheme:
A pedestrian re-identification method that retains global information and enhances local features is applied in a corresponding network model and comprises the following steps:
S1, data preparation stage: for the pedestrian image, the size of the original pedestrian image is changed to 384 × 128 × 3 in the network model, where 384 × 128 × 3 corresponds to the height, width and number of channels of the image;
S2, feature extraction stage: global feature information and local feature information of the pedestrian are extracted through the global branch (global-branch) and the local branch (local-branch) respectively, and the resulting information is then subjected to global average pooling and global max pooling, where: the global feature refers to the overall attributes of the image target and the local feature refers to local attributes of the image target; in the network model, the network branch responsible for extracting global feature information is named the global branch and the network branch responsible for extracting local feature information is named the local branch;
S3, feature fusion and guidance stage: the fusion guidance module in the network model fuses the global feature information with each piece of local feature information; the fused feature information retains the global semantic information of the pedestrian while enhancing the expressive power of the corresponding local features fused with the global feature information; through back-propagation and gradient updates during network model training, the fusion guidance module further improves the feature extraction capability of the global-branch and the local-branch;
S4, model training stage: the pooled feature information of the global-branch and the local-branch is fed into a triplet loss function for metric learning, and the features of the global-branch, the local-branch and the fusion guidance module are fed into a cross-entropy loss function for classification learning;
S5, model evaluation stage: the features of the global-branch, the local-branch and the fusion guidance module are fed into their respective trained classifiers, and the pedestrian re-identification classification results are output.
Further, S1 is specifically:
The original pedestrian images i of the data set are resized to a uniform size in the network model, with the conversion formula:
I=resize(i)
wherein the size of I is 384 × 128 × 3.
Further, in S2:
The pedestrian image I is input into the network model, ResNet-50 is adopted as the backbone network for preliminary feature extraction, and the feature map output by the 3rd residual block of ResNet-50 is denoted T_b3, with size 24 × 8 × 1024.
Further, in S2:
In the global-branch, T_b3 is fed into the original 4th residual block of ResNet-50 to obtain the feature map T_b4, with size 12 × 4 × 2048; here, the 1 × 1 convolution, the 3 × 3 convolution and the upsampling operation are denoted Conv1, Conv3 and Upsample, respectively; thus, through the following formula:
T_g = Conv1(Conv3(T_b3 + Conv1(Upsample(T_b4))))
the global information enhancement module obtains the feature map T_g, with size 24 × 8 × 2048; the global feature f_G, with size 1 × 1 × 2048, is obtained by element-wise summation of global average pooling (GAP) and global max pooling (GMP), with the pooling formula:
f_G = GAP(T_g) + GMP(T_g)
further, in S2:
In the local-branch, the downsampling operation of the 4th residual block of ResNet-50 is removed and T_b3 is fed into this residual block to obtain the feature map T_p, with size 24 × 8 × 2048; the local feature f_L, with size 1 × 1 × 2048, is obtained by element-wise summation of global average pooling and global max pooling, with the pooling formula:
f_L = GAP(T_p) + GMP(T_p)
further, in S3:
In the fusion guidance module, the global feature f_G is added element-wise to each local feature f_L^j to obtain the fused feature f_F^j, with the formula:
f_F^j = f_G + f_L^j
further, in S4:
The feature map T_g and the feature map T_p are passed through global average pooling to obtain f'_G and f'_L respectively, each with size 1 × 1 × 2048, with the pooling formulas:
f'_G = GAP(T_g)
f'_L = GAP(T_p)
further, in S4:
f'_G and f'_L are fed into loss 1 for feature metric learning; loss 1 is calculated with the triplet loss function, with the formula:
L_1 = Σ_{i=1}^{N} [ ||f_a^i − f_p^i||_2 − ||f_a^i − f_n^i||_2 + α ]_+
where N is the total number of samples fed into the Triplet Loss in the network model, f_a^i, f_p^i and f_n^i denote the Anchor, Positive and Negative sample features in the network model respectively, α is a margin hyper-parameter, and [·]_+ = max(·, 0).
Further, in S4:
The global feature vector f_G, the local feature vectors f_L^j and the fused feature vectors f_F^j are each fed into a fully connected layer for dimensionality reduction to size 1 × 1 × 256, and then into loss 2 for classification training; loss 2 is calculated with the cross-entropy loss function, with the formula:
L_2 = − Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i) / Σ_{k=1}^{C} exp(W_k^T f_i) )
where W_k is the weight vector of class k, y_i is the ground-truth class of sample i, f_i is the reduced feature vector of sample i, N is the total number of samples fed into the Softmax Loss in the network model, and C is the total number of sample classes; after the network model training is finished, a trained classifier corresponding to each feature vector is obtained.
Further, in S5:
The feature vectors reduced to 1 × 1 × 256 are fed into their corresponding trained classifiers, and the pedestrian re-identification classification results are output.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention designs a global information enhancement module. The residual neural network ResNet-50 contains 4 residual blocks, and by fusing the feature maps output by 2 of these residual blocks at a small cost in memory and parameters, the invention extracts richer global semantic feature information of pedestrians.
2. The invention designs a fusion guidance module that fuses the two types of features from the global-branch and the local-branch, retaining the global semantic information of the pedestrian while enhancing the expressive power of the corresponding local features it is fused with. Through back-propagation during training, the fusion guidance module further improves the feature extraction capability of the global-branch and the local-branch.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a schematic diagram of a network model structure according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example:
s1: a data preparation stage: the size of the original pedestrian image is changed to 384 × 128 × 3 (corresponding to the height, width and number of channels of the image, respectively):
For the original pedestrian image i of the data set, the size is unified, with the conversion formula:
I=resize(i) (1)
where the size of I is 384 × 128 × 3 (image height × image width × number of image channels).
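As an illustrative sketch only (not part of the original disclosure), the S1 resize step could be implemented as below with torchvision; the transform pipeline and the ImageNet normalization statistics are assumptions, not taken from the patent.

```python
# Hedged sketch of the S1 preprocessing step: resize every pedestrian image to
# 384 x 128 x 3 (height x width x channels). The normalization values are an
# assumption (standard ImageNet statistics), not specified in the patent.
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((384, 128)),          # I = resize(i): height 384, width 128
    T.ToTensor(),                  # HWC uint8 -> CHW float tensor, 3 channels
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
# Usage: tensor = preprocess(pil_image)  # shape (3, 384, 128)
```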
S2: Feature extraction stage: global feature information and local feature information of the pedestrian are extracted through the global-branch and the local-branch respectively;
S21: The pedestrian image I is input into the network model, ResNet-50 is adopted as the backbone network for preliminary feature extraction, and the feature map output by the 3rd residual block of ResNet-50 is denoted T_b3, with size 24 × 8 × 1024;
S22: In the global-branch, T_b3 from S21 is fed into the original 4th residual block of ResNet-50 to obtain the feature map T_b4, with size 12 × 4 × 2048. Here, the 1 × 1 convolution, the 3 × 3 convolution and the upsampling operation are denoted Conv1, Conv3 and Upsample, respectively. Thus, through the following formula:
T_g = Conv1(Conv3(T_b3 + Conv1(Upsample(T_b4)))) (2)
the global information enhancement module obtains the feature map T_g, with size 24 × 8 × 2048. After Global Average Pooling (GAP) and Global Max Pooling (GMP), the global feature f_G is obtained by element-wise summation, with size 1 × 1 × 2048. The pooling formula is as follows:
f_G = GAP(T_g) + GMP(T_g)
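A minimal PyTorch sketch of how formula (2) and the GAP + GMP pooling could be realized is given below. The module and variable names are my own; the channel widths of the 1 × 1 and 3 × 3 convolutions are inferred from the stated tensor sizes (T_b3: 1024 channels, T_b4 and T_g: 2048 channels), and the element-wise sum of GAP and GMP is an inference from the 1 × 1 × 2048 output size, so all of these are assumptions rather than the patent's exact implementation.

```python
# Hedged sketch of the global information enhancement module of formula (2):
# T_g = Conv1(Conv3(T_b3 + Conv1(Upsample(T_b4)))), followed by GAP + GMP
# pooling that yields the 1 x 1 x 2048 global feature f_G.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalInfoEnhance(nn.Module):
    def __init__(self, c_b3=1024, c_b4=2048, c_out=2048):
        super().__init__()
        self.reduce = nn.Conv2d(c_b4, c_b3, kernel_size=1)             # inner Conv1: 2048 -> 1024 (assumed)
        self.conv3 = nn.Conv2d(c_b3, c_b3, kernel_size=3, padding=1)   # Conv3
        self.expand = nn.Conv2d(c_b3, c_out, kernel_size=1)            # outer Conv1: 1024 -> 2048 (assumed)

    def forward(self, t_b3, t_b4):
        # Upsample T_b4 (12 x 4) to the spatial size of T_b3 (24 x 8)
        up = F.interpolate(t_b4, size=t_b3.shape[-2:], mode='bilinear', align_corners=False)
        t_g = self.expand(self.conv3(t_b3 + self.reduce(up)))          # (N, 2048, 24, 8)
        f_g = F.adaptive_avg_pool2d(t_g, 1) + F.adaptive_max_pool2d(t_g, 1)  # GAP + GMP
        return t_g, f_g.flatten(1)                                     # f_g: (N, 2048)

# Example shapes:
# t_b3 = torch.randn(2, 1024, 24, 8); t_b4 = torch.randn(2, 2048, 12, 4)
# t_g, f_g = GlobalInfoEnhance()(t_b3, t_b4)   # t_g: (2, 2048, 24, 8), f_g: (2, 2048)
```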
S23: In the local-branch, the downsampling operation of the 4th residual block of ResNet-50 is removed, and T_b3 from S21 is fed into this residual block to obtain the feature map T_p, with size 24 × 8 × 2048. After GAP and GMP, the local feature f_L is obtained by element-wise summation, with size 1 × 1 × 2048. The pooling formula is as follows:
f_L = GAP(T_p) + GMP(T_p)
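The sketch below shows one common way the local branch of S23 could be built: a copy of ResNet-50's 4th residual block whose strides are set to 1, so the spatial size is preserved. Setting the strides to 1 is an assumed interpretation of "removing the downsampling operation", and the pooling again assumes an element-wise GAP + GMP sum.

```python
# Hedged sketch of the local branch (S23): a stride-1 copy of ResNet-50's layer4,
# so the 24 x 8 x 1024 input T_b3 yields a 24 x 8 x 2048 map T_p.
import copy
import torch.nn.functional as F
from torchvision.models import resnet50

def make_local_layer4(backbone):
    layer4 = copy.deepcopy(backbone.layer4)
    layer4[0].conv2.stride = (1, 1)            # main path: no spatial downsampling
    layer4[0].downsample[0].stride = (1, 1)    # shortcut path: no downsampling
    return layer4

backbone = resnet50(weights=None)              # or ImageNet-pretrained weights
local_layer4 = make_local_layer4(backbone)

# t_p = local_layer4(t_b3)                                       # (N, 2048, 24, 8)
# f_l = (F.adaptive_avg_pool2d(t_p, 1) +
#        F.adaptive_max_pool2d(t_p, 1)).flatten(1)               # (N, 2048) local feature
```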
S3: Feature fusion and guidance stage: the fusion guidance module fuses the global feature information with each piece of local feature information; the fused features retain the global semantic information of the pedestrian while enhancing the expressive power of the corresponding local features fused with the global feature information. In addition, through back-propagation during training, the fusion guidance module further improves the feature extraction capability of the global-branch and the local-branch:
S31: In the fusion guidance module, the global feature f_G from S22 is added element-wise to each local feature f_L^j from S23 to obtain the fused feature f_F^j, with the formula:
f_F^j = f_G + f_L^j
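A small sketch of the fusion step of S31 follows; the list-of-local-features interface is an assumption introduced only for illustration.

```python
# Hedged sketch of the fusion guidance module (S31): the global feature f_G is
# added element-wise to each local feature f_L^j, giving fused features f_F^j
# that keep the same 2048-d size.
import torch

def fuse_global_local(f_g, local_feats):
    """f_g: (N, 2048); local_feats: list of (N, 2048) tensors f_L^j."""
    return [f_g + f_l for f_l in local_feats]   # each f_F^j = f_G + f_L^j

# f_fused = fuse_global_local(f_g, [f_l])       # one fused vector per local feature
```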
S32: Through back-propagation and gradient updates during training, the fusion guidance module further improves the feature extraction capability of the global-branch and the local-branch.
S4: Model training: the pooled feature information of the global-branch and the local-branch is fed into a triplet loss function for metric learning, and the features of the global-branch, the local-branch and the fusion guidance module are fed into a cross-entropy loss function for classification learning:
S41: The feature map T_g from S22 and the feature map T_p from S23 are passed through GAP to obtain f'_G and f'_L respectively, each with size 1 × 1 × 2048. The pooling formulas are as follows:
f'_G = GAP(T_g)
f'_L = GAP(T_p)
S42: f'_G and f'_L are fed into loss 1 shown in fig. 1 for feature metric learning. In the invention, loss 1 adopts the triplet loss function (Triplet Loss), with the formula:
L_1 = Σ_{i=1}^{N} [ ||f_a^i − f_p^i||_2 − ||f_a^i − f_n^i||_2 + α ]_+
where N is the total number of samples fed into the Triplet Loss, f_a^i, f_p^i and f_n^i denote the Anchor, Positive and Negative sample features respectively, α is a margin hyper-parameter, and [·]_+ = max(·, 0).
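For illustration, a minimal implementation of loss 1 is sketched below. The margin value and the assumption of pre-formed triplets (the patent does not describe its triplet mining) are hypothetical.

```python
# Hedged sketch of loss 1 (S42): the standard triplet loss with margin alpha,
# L_1 = sum_i [ ||f_a - f_p||_2 - ||f_a - f_n||_2 + alpha ]_+ .
import torch
import torch.nn.functional as F

def triplet_loss(f_a, f_p, f_n, alpha=0.3):      # alpha=0.3 is an assumed example margin
    """f_a, f_p, f_n: (N, D) anchor / positive / negative features."""
    d_ap = F.pairwise_distance(f_a, f_p, p=2)
    d_an = F.pairwise_distance(f_a, f_n, p=2)
    return torch.clamp(d_ap - d_an + alpha, min=0).sum()   # [.]_+ = max(., 0)

# torch.nn.TripletMarginLoss(margin=0.3, reduction='sum') computes the same quantity.
```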
S43: The global feature vector f_G from S22, the local feature vectors f_L^j from S23 and the fused feature vectors f_F^j from S31 are each fed into a fully connected layer for dimensionality reduction to size 1 × 1 × 256, and then fed into loss 2 for classification training. In the invention, loss 2 adopts the cross-entropy loss function (Softmax Loss), with the formula:
L_2 = − Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i) / Σ_{k=1}^{C} exp(W_k^T f_i) )
where W_k is the weight vector of class k, y_i is the ground-truth class of sample i, f_i is the reduced feature vector of sample i, N is the total number of samples fed into the Softmax Loss, and C is the total number of sample classes. After the network model training is finished, a trained classifier corresponding to each feature vector is obtained.
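One possible form of the dimensionality-reduction and classification head of S43 is sketched below; the use of plain Linear layers, the absence of batch normalization or dropout, and the example class count (751 training identities of Market-1501) are assumptions made only for illustration.

```python
# Hedged sketch of loss 2 (S43): each 2048-d feature vector is reduced to 256
# dimensions by a fully connected layer and classified with softmax cross-entropy;
# one such head per feature vector (global, local, fused).
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, in_dim=2048, reduce_dim=256, num_classes=751):
        super().__init__()
        self.reduce = nn.Linear(in_dim, reduce_dim)       # 1 x 1 x 2048 -> 1 x 1 x 256
        self.classifier = nn.Linear(reduce_dim, num_classes)

    def forward(self, feat):
        reduced = self.reduce(feat)          # kept for the evaluation stage (S5)
        logits = self.classifier(reduced)
        return reduced, logits

ce_loss = nn.CrossEntropyLoss(reduction='sum')   # Softmax Loss over N samples, C classes
# reduced, logits = ClassifierHead()(f_g); loss2 = ce_loss(logits, labels)
```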
S5: In the model evaluation stage, the feature vectors reduced to 1 × 1 × 256 in S43 are fed into the corresponding trained classifiers from S43, and the pedestrian re-identification classification results are output.
Based on the pedestrian re-identification method that retains global information and enhances local features, the network model structure shown in figure 1 is constructed and tested on the Market-1501 and DukeMTMC-ReID data sets. In pedestrian re-identification research, the Cumulative Matching Characteristic (CMC) curve and the mean Average Precision (mAP) are generally used to evaluate the performance of a method on a pedestrian re-identification data set. In table 1, four methods are compared: method 1, a local-feature-based method that horizontally partitions the feature map; method 2, which adopts a two-branch network structure; method 3, which adopts a three-branch network that gradually transitions from global feature learning to local feature learning; and method 4, the present invention.
The following table gives the test results on these data sets; the network model of the present invention performs better on each data set in both the CMC (Rank-1 and Rank-5) and mAP indices.
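For reference, the two metrics mentioned above can be computed from a query-gallery distance matrix roughly as sketched below; this is a simplified illustration that omits the camera-based filtering of junk matches used in the standard Market-1501 protocol, and is not the patent's exact evaluation code.

```python
# Hedged sketch of Rank-k (CMC) and mean Average Precision (mAP) from a
# query x gallery distance matrix `dist`, with identity labels q_ids, g_ids.
import numpy as np

def cmc_map(dist, q_ids, g_ids, ranks=(1, 5)):
    order = np.argsort(dist, axis=1)                       # gallery sorted per query
    matches = (g_ids[order] == q_ids[:, None]).astype(float)
    cmc = {f"Rank-{k}": (matches[:, :k].sum(1) > 0).mean() for k in ranks}
    aps = []
    for row in matches:
        hits = np.where(row == 1)[0]
        if len(hits) == 0:
            continue                                       # query with no gallery match
        precision = (np.arange(len(hits)) + 1) / (hits + 1)   # precision at each hit
        aps.append(precision.mean())
    return cmc, float(np.mean(aps))                        # CMC ranks, mAP
```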
(Table 1: comparison of test results on Market-1501 and DukeMTMC-ReID; the table is rendered as an image in the original document.)
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A pedestrian re-identification method for retaining global information and enhancing local features, applied in a corresponding network model, characterized by comprising the following steps:
S1, data preparation stage: for the pedestrian image, the size of the original pedestrian image is changed to 384 × 128 × 3 in the network model, where 384 × 128 × 3 corresponds to the height, width and number of channels of the image;
S2, feature extraction stage: global feature information and local feature information of the pedestrian are extracted through the global-branch and the local-branch respectively, and the resulting information is then subjected to global average pooling and global max pooling, where: the global feature refers to the overall attributes of the image target and the local feature refers to local attributes of the image target; in the network model, the network branch responsible for extracting global feature information is named the global branch and the network branch responsible for extracting local feature information is named the local branch;
S3, feature fusion and guidance stage: the fusion guidance module in the network model fuses the global feature information with each piece of local feature information; the fused feature information retains the global semantic information of the pedestrian while enhancing the expressive power of the corresponding local features fused with the global feature information; through back-propagation and gradient updates during network model training, the fusion guidance module further improves the feature extraction capability of the global-branch and the local-branch;
S4, model training stage: the pooled feature information of the global-branch and the local-branch is fed into a triplet loss function for metric learning, and the features of the global-branch, the local-branch and the fusion guidance module are fed into a cross-entropy loss function for classification learning;
S5, model evaluation stage: the global feature vector, the local feature vectors and the fused feature vectors are each fed into a fully connected layer for dimensionality reduction and then fed into the cross-entropy loss function for classification training; after the network model training is finished, a trained classifier corresponding to each feature vector is obtained;
the feature vectors reduced to 1 × 1 × 256 are fed into the trained classifiers, and the pedestrian re-identification classification results are output.
2. The pedestrian re-identification method for retaining global information and enhancing local features according to claim 1, wherein S1 specifically is:
the original pedestrian images i of the data set in the network model are resized to a uniform size, with the conversion formula:
I=resize(i)
wherein the size of I is 384 × 128 × 3.
3. The pedestrian re-identification method for retaining global information and enhancing local features as claimed in claim 2, wherein in S2:
the pedestrian image I is input into the network model, ResNet-50 is adopted as the backbone network for preliminary feature extraction, and the feature map output by the 3rd residual block of ResNet-50 is denoted T_b3, with size 24 × 8 × 1024.
4. The pedestrian re-identification method for retaining global information and enhancing local features as claimed in claim 3, wherein in S2:
in the global-branch, T_b3 is fed into the original 4th residual block of ResNet-50 to obtain the feature map T_b4, with size 12 × 4 × 2048; here, the 1 × 1 convolution, the 3 × 3 convolution and the upsampling operation are denoted Conv1, Conv3 and Upsample, respectively; thus, through the following formula:
T_g = Conv1(Conv3(T_b3 + Conv1(Upsample(T_b4))))
the global information enhancement module obtains the feature map T_g, with size 24 × 8 × 2048; the global feature f_G, with size 1 × 1 × 2048, is obtained after global average pooling and global max pooling, with the pooling formula:
f_G = GAP(T_g) + GMP(T_g)
5. the pedestrian re-identification method for retaining global information and enhancing local features as claimed in claim 4, wherein in S2:
in the local-branch, the downsampling operation of the 4th residual block of ResNet-50 is removed and T_b3 is fed into this residual block to obtain the feature map T_p, with size 24 × 8 × 2048; the local feature f_L, with size 1 × 1 × 2048, is obtained after global average pooling and global max pooling, with the pooling formula:
f_L = GAP(T_p) + GMP(T_p)
6. the pedestrian re-identification method for retaining global information and enhancing local features as claimed in claim 5, wherein in S3:
in the fusion guidance module, the global feature f_G is added element-wise to each local feature f_L^j to obtain the fused feature f_F^j, with the formula:
f_F^j = f_G + f_L^j
7. the pedestrian re-identification method for retaining global information and enhancing local features as claimed in claim 6, wherein in S4:
the feature map T_g and the feature map T_p are passed through global average pooling to obtain f'_G and f'_L respectively, each with size 1 × 1 × 2048, with the pooling formulas:
f'_G = GAP(T_g)
f'_L = GAP(T_p)
8. the pedestrian re-identification method for retaining global information and enhancing local features as claimed in claim 7, wherein in S4:
f'_G and f'_L are fed into loss 1 for feature metric learning; loss 1 is calculated with the triplet loss function, with the formula:
L_1 = Σ_{i=1}^{N} [ ||f_a^i − f_p^i||_2 − ||f_a^i − f_n^i||_2 + α ]_+
where N is the total number of samples fed into the Triplet Loss in the network model, f_a^i, f_p^i and f_n^i denote the Anchor, Positive and Negative sample features in the network model respectively, α is a margin hyper-parameter, and [·]_+ = max(·, 0).
9. The pedestrian re-identification method for retaining global information and enhancing local features as claimed in claim 8, wherein in S4:
the global feature vector f_G, the local feature vectors f_L^j and the fused feature vectors f_F^j are each fed into a fully connected layer for dimensionality reduction to size 1 × 1 × 256, and then into the cross-entropy loss function, with the formula:
L_2 = − Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i) / Σ_{k=1}^{C} exp(W_k^T f_i) )
where W_k is the weight vector of class k, y_i is the ground-truth class of sample i, f_i is the reduced feature vector of sample i, N is the total number of samples fed into the Softmax Loss in the network model, and C is the total number of sample classes; after the network model training is finished, a trained classifier corresponding to each feature vector is obtained.
CN202010911071.2A 2020-09-02 2020-09-02 Pedestrian re-identification method capable of retaining global information and enhancing local features Active CN112016489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010911071.2A CN112016489B (en) 2020-09-02 2020-09-02 Pedestrian re-identification method capable of retaining global information and enhancing local features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010911071.2A CN112016489B (en) 2020-09-02 2020-09-02 Pedestrian re-identification method capable of retaining global information and enhancing local features

Publications (2)

Publication Number Publication Date
CN112016489A CN112016489A (en) 2020-12-01
CN112016489B true CN112016489B (en) 2022-10-04

Family

ID=73516709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010911071.2A Active CN112016489B (en) 2020-09-02 2020-09-02 Pedestrian re-identification method capable of retaining global information and enhancing local features

Country Status (1)

Country Link
CN (1) CN112016489B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560604A (en) * 2020-12-04 2021-03-26 中南大学 Pedestrian re-identification method based on local feature relationship fusion
CN112560932A (en) * 2020-12-10 2021-03-26 山东建筑大学 Vehicle weight identification method based on dual-branch network feature fusion
CN112801235A (en) * 2021-04-12 2021-05-14 四川大学 Model training method, prediction device, re-recognition model and electronic equipment
CN112927171A (en) * 2021-04-15 2021-06-08 重庆邮电大学 Single image deblurring method based on generation countermeasure network
CN113177464B (en) * 2021-04-27 2023-12-01 浙江工商大学 End-to-end multi-mode gait recognition method based on deep learning
CN113239784B (en) * 2021-05-11 2022-09-30 广西科学院 Pedestrian re-identification system and method based on space sequence feature learning
CN114550315A (en) * 2022-01-24 2022-05-27 云南联合视觉科技有限公司 Identity comparison and identification method and device and terminal equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960140A (en) * 2018-07-04 2018-12-07 国家新闻出版广电总局广播科学研究院 The pedestrian's recognition methods again extracted and merged based on multi-region feature
CN109784258A (en) * 2019-01-08 2019-05-21 华南理工大学 A kind of pedestrian's recognition methods again cut and merged based on Analysis On Multi-scale Features
WO2020125216A1 (en) * 2018-12-18 2020-06-25 深圳云天励飞技术有限公司 Pedestrian re-identification method, device, electronic device and computer-readable storage medium
CN111539370A (en) * 2020-04-30 2020-08-14 华中科技大学 Image pedestrian re-identification method and system based on multi-attention joint learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191558B (en) * 2018-07-27 2020-12-08 深圳市商汤科技有限公司 Image polishing method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960140A (en) * 2018-07-04 2018-12-07 国家新闻出版广电总局广播科学研究院 The pedestrian's recognition methods again extracted and merged based on multi-region feature
WO2020125216A1 (en) * 2018-12-18 2020-06-25 深圳云天励飞技术有限公司 Pedestrian re-identification method, device, electronic device and computer-readable storage medium
CN109784258A (en) * 2019-01-08 2019-05-21 华南理工大学 A kind of pedestrian's recognition methods again cut and merged based on Analysis On Multi-scale Features
CN111539370A (en) * 2020-04-30 2020-08-14 华中科技大学 Image pedestrian re-identification method and system based on multi-attention joint learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-scale and multi-branch feature representation for person re-identification; Shanshan Jiao et al.; Neurocomputing; 2020-06-30; 120-130 *
Research on pedestrian re-identification based on deep learning; Chen Junheng; China Master's Theses Full-text Database, Information Science and Technology; 2022-03-15 (No. 03); I138-2513 *
Research on pedestrian re-identification with multi-granularity feature fusion; Zhang Liang et al.; Chinese Journal of Liquid Crystals and Displays; 2020-06-30; Vol. 35 (No. 06); 46-54 *

Also Published As

Publication number Publication date
CN112016489A (en) 2020-12-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant