CN111898431A

CN111898431A - Pedestrian re-identification method based on attention mechanism part shielding

Info

Publication number: CN111898431A
Application number: CN202010587406.XA
Authority: CN
Inventors: 韩光; 艾岳川; 朱梦成; 刘耀明
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2020-11-06
Anticipated expiration: 2040-06-24
Also published as: CN111898431B

Abstract

The invention discloses a pedestrian re-identification method based on attention mechanism component shielding, which extracts the basic characteristics of a target through a Resnet50 network; extracting local features, global features and attention features of the target respectively through a global feature extractor, a component shielding feature extractor and an attention feature extractor; and obtaining a prediction characteristic vector of the pedestrian through the global characteristic, the component shielding characteristic and the attention characteristic, and carrying out end-to-end training on the whole network to obtain a pedestrian re-identification model. The method fully utilizes the global characteristics and the component shielding characteristics of the pedestrian images, and simultaneously fuses the space attention mechanism and the channel attention mechanism, thereby effectively improving the discriminability of the predicted characteristics.

Description

Pedestrian re-identification method based on attention mechanism part shielding

Technical Field

The invention belongs to the technical field of pattern recognition, and particularly relates to a pedestrian re-recognition method based on attention mechanism component shielding.

Background

Pedestrian re-identification is a technology for researching fire and heat in the field of computer vision in recent years, identifies pedestrians through characteristics such as wearing, posture and hairstyle of the pedestrians, and is mainly oriented to identification and retrieval of the pedestrians in a cross-camera and cross-scene mode. The pedestrian re-identification method has wide application prospect in the fields of video monitoring, intelligent security and the like, and the development of the pedestrian re-identification technology has important significance for building safe cities.

Pedestrian re-identification is a very challenging computer vision task, whose task is to retrieve pedestrians under different cameras, where the difficulties include varying backgrounds, lighting, blurring of pictures, different poses of pedestrians, and occlusion of sundries. In the past, means for solving pedestrian re-identification mainly utilizes manually extracted features to carry out retrieval, and with the application of deep learning in pedestrian re-identification, research on pedestrian re-identification is greatly developed, but the problems of blurring, shielding and the like of pictures are not well solved.

Most of the existing pedestrian re-identification algorithms are based on the global features of pedestrian images for identification, because the backgrounds of the pedestrian images are very complicated, the detailed information of the pedestrian images is not fully utilized, the blocking performance is not robust enough, and the accuracy of pedestrian matching is not high.

Disclosure of Invention

The invention aims to provide a pedestrian re-identification method based on attention mechanism component shielding, and aims to solve the problems that an existing pedestrian re-identification model is insensitive to local detail features of a target and poor in shielding robustness.

In order to solve the technical problems, the invention adopts the technical scheme that:

a pedestrian re-identification method based on attention mechanism component shielding comprises the following steps:

inputting the complete pedestrian target map into a pre-trained pedestrian re-identification model;

the pedestrian re-identification model gives a result whether the complete pedestrian target map has a specific pedestrian or not;

the pedestrian re-identification model comprises a global feature extractor, a component shielding feature extractor, an attention feature extractor and a basic feature extraction model, wherein the output end of the basic feature extraction model is connected with the input ends of the global feature extractor, the component shielding feature extractor and the attention feature extractor respectively.

Further, the training method of the pedestrian re-identification model comprises the following steps:

acquiring a complete pedestrian target map carrying tag data;

according to the complete pedestrian target image, the basic feature extraction model extracts a complete pedestrian target feature image;

according to the complete pedestrian target feature map, the global feature extractor extracts a global feature vector of a pedestrian target;

according to the complete pedestrian target feature map, the component shielding feature extractor extracts component shielding feature vectors of the pedestrian target;

according to the complete pedestrian target feature map, the attention feature extractor extracts an attention feature vector of a pedestrian target;

and based on the global feature vector of the pedestrian target, the component shielding feature vector of the pedestrian target, the attention feature vector of the pedestrian target and the pedestrian label vector of the complete pedestrian target graph, taking the linear addition of a cross entropy loss function and a triplet loss function as a total loss function, taking the reduction of the total loss function value as a target, and training the pedestrian re-identification model.

Further, the basic feature extraction model is based on a ResNet-50 convolutional neural network which is pre-trained by ImageNet and removes a downsampling layer and a full connection layer at the tail end; the basic feature extraction model compresses an input image into an image of a predetermined size before extracting image features.

Further, the extracting, by the global feature extractor, a global feature vector of the pedestrian target according to the complete pedestrian target feature map includes:

inputting the complete pedestrian target feature map into a global feature extractor;

and the global feature extractor performs global average pooling processing and convolution dimension reduction processing on the input complete pedestrian target feature map to obtain a global feature vector of the pedestrian target.

Further, the extracting, according to the complete pedestrian target feature map, a component occlusion feature vector of the pedestrian target by the component occlusion feature extractor includes:

inputting a complete pedestrian target feature map into a component occlusion feature extractor;

the component shielding characteristic extractor horizontally and uniformly divides the input complete pedestrian target characteristic graph into four component blocks;

randomly selecting one of the four component blocks through mask operation, setting the response value of the selected block to be zero, keeping the response values of the rest component blocks unchanged, and acquiring a pedestrian target characteristic diagram shielded by the components;

and obtaining an intermediate feature vector by performing global maximum pooling on the pedestrian target feature map shielded by the component, and then performing dimension reduction on the intermediate feature vector through convolution, batch normalization and ReLU operation to obtain the component shielding feature vector of the pedestrian target.

Further, the extracting, by the attention feature extractor, an attention feature vector of the pedestrian target according to the complete pedestrian target feature map includes:

inputting the complete pedestrian target feature map into an attention feature extractor;

the attention feature extractor comprises an attention module including a spatial attention module and a channel attention module;

the channel attention module is used for respectively carrying out global maximum pooling and global average pooling on the complete pedestrian target feature map, then respectively carrying out convolution dimension reduction, ReLU and convolution dimension increase processing, then carrying out element addition and combination on the feature maps obtained after the respective processing, and multiplying the combined feature map obtained after Sigmoid operation with the complete pedestrian target feature map to obtain a channel attention weighted feature map;

the spatial attention module is used for respectively averaging and maximizing complete pedestrian target feature maps in channel dimensions, then respectively passing through a full connection layer, a ReLU and a full connection layer, then carrying out element addition and combination on feature maps obtained after respective processing, multiplying a combined feature map obtained after Sigmoid operation by the complete pedestrian target feature map, and finally obtaining a spatial attention weighted feature map;

carrying out element addition and combination on the space attention weighted feature map, the channel attention weighted feature map and the complete pedestrian target feature map to obtain an attention feature map;

and carrying out global average pooling on the attention feature map to obtain an intermediate feature vector, and then carrying out dimension reduction processing on the intermediate feature vector through convolution, batch normalization and ReLU operation to obtain the attention feature vector of the pedestrian target.

Compared with the prior art, the invention has the following beneficial effects and advantages:

the pedestrian re-identification method based on attention mechanism component shielding is reasonable in design, global features, component shielding features and attention features are combined in the feature extraction process, the overall information of a pedestrian image can be obtained through the global features, and most visual clues are utilized to identify pedestrians; the component occlusion feature has better robustness to occlusion; the attention characteristic comprises space attention and channel attention, the space attention mechanism captures context dependency information of each position of the target characteristic, and the channel attention mechanism captures channel dependency among characteristic graphs, so that better characteristic representation and recognition effects are obtained.

Drawings

FIG. 1 is a schematic flow chart of a pedestrian re-identification method based on attention mechanism component occlusion according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a spatial attention module according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of a channel attention module according to an embodiment of the invention.

Detailed Description

The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

Aiming at the problem that the existing pedestrian re-identification model is poor in shielding robustness, the method utilizes the masks of the component shielding branches to carry out random shielding on the components on the feature map, and improves the shielding robustness through shielding training; acquiring global information of the pedestrian image by using the global feature branch; aiming at the problem that the complex background in the pedestrian image interferes with the identification, the context dependence information of each position of the pedestrian image and the interdependence relation between characteristic diagram channels are obtained by utilizing the attention characteristic branch so as to enhance the discriminability of the output characteristics.

Fig. 1 is a schematic flow chart of a pedestrian re-identification method based on attention mechanism component occlusion according to an embodiment of the present invention, where the method includes the following steps:

step 1, acquiring a complete pedestrian target map carrying tag data as a training sample.

And 2, extracting a complete pedestrian target feature map by the basic feature extraction model according to the complete pedestrian target map.

The specific implementation method of the step is as follows:

and scaling the obtained complete pedestrian target image into a fixed size of 384 multiplied by 128, taking the ResNet-50 convolutional neural network pre-trained by ImageNet as a basic feature extraction model, removing the final down-sampling and full-connection layer, and outputting the pedestrian target feature image with the size of 24 multiplied by 8.

And 3, extracting the global feature vector of the pedestrian target by a global feature extractor according to the complete pedestrian target feature map.

The specific implementation method of the step is as follows:

completely inputting a pedestrian target feature map generated by a basic feature extraction model into a global feature extractor, down-sampling the feature map into 2048 multiplied by 1 feature vectors through global average pooling, and then reducing the dimensions of the feature vectors by using 1 multiplied by 1 convolution, batch normalization and ReLU operation to generate 512 multiplied by 1 feature vectors.

And 4, extracting the component shielding characteristic vector of the pedestrian target by the component shielding characteristic extractor according to the complete pedestrian target characteristic diagram.

The specific implementation method of the step is as follows:

step 1), inputting a pedestrian target feature map generated by a basic feature extraction model into a component occlusion feature extractor;

step 2), the component shielding feature extractor horizontally and uniformly divides the input pedestrian target feature map into four component blocks;

step 3), randomly selecting one of the four component blocks through mask operation, setting the response value of the selected component block to be zero, keeping the response values of the other component blocks unchanged, and acquiring a pedestrian target characteristic diagram shielded by the components;

and 4) generating a characteristic vector with the size of 2048 multiplied by 1 by using the pedestrian target characteristic diagram shielded by the mask through global maximum pooling, and then reducing the dimension of the characteristic vector by using convolution of 1 multiplied by 1, batch normalization and ReLU operation to generate a characteristic vector of 1024 multiplied by 1.

And 5, extracting the attention feature vector of the pedestrian target by an attention feature extractor according to the complete pedestrian target feature map.

The specific implementation manner of the step is as follows:

step 1), inputting a pedestrian target feature map generated by a basic feature extraction model into an attention feature extractor; the attention feature extractor comprises an attention module, wherein the attention module comprises a space attention module and a channel attention module;

step 2), the channel attention module carries out global maximum pooling and global average pooling on the input pedestrian target feature map, then carries out convolution dimension reduction, ReLU and convolution dimension increase processing, then carries out element addition and combination on the feature maps obtained after respective processing, carries out Sigmoid operation, multiplies the obtained feature map by the original pedestrian target feature map, and finally obtains a channel attention weighted feature map;

step 3), the spatial attention module respectively averages and maximizes the input pedestrian target characteristic graphs on channel dimensions, then respectively passes through a full connection layer, a ReLU and a full connection layer, then performs element addition and combination on the characteristic graphs obtained after respective processing, performs Sigmoid operation, multiplies the obtained characteristic graph by the original pedestrian target characteristic graph, and finally obtains a spatial attention weighted characteristic graph;

step 4), carrying out element addition and combination on the space attention weighted feature map, the channel attention weighted feature map and the original pedestrian target feature map to obtain an attention feature map;

and 5) generating 2048 multiplied by 1 feature vector by global average pooling of the attention feature map, and then generating 512 multiplied by 1 attention feature vector by reducing dimensions of the feature vector through 1 multiplied by 1 convolution, batch normalization and ReLU operation.

And 6, based on the global feature vector of the pedestrian target, the component shielding feature vector of the pedestrian target, the attention feature vector of the pedestrian target and the pedestrian label vector of the complete pedestrian target image, taking the linear addition of the cross entropy loss function and the triple loss function as a total objective function of network training, taking the trend of the total objective function value to be reduced as a target, and training to obtain the pedestrian re-identification model based on the shielding of the attention mechanism component.

The specific implementation manner of the step is as follows:

step 1), respectively calculating the triple loss and the cross entropy loss of the global feature vector and the pedestrian label vector of the pedestrian target, the triple loss and the cross entropy loss of the component shielding feature vector and the pedestrian label vector, and the triple loss and the cross entropy loss of the attention feature vector and the pedestrian label vector, linearly adding the three triple losses and the three cross entropy losses, and updating model parameters by using a back propagation algorithm;

and 2), setting the batch size to be 64 and the training epoch to be 400, optimizing the network by adopting an Adam algorithm, and evaluating the model performance by using Rank1 and mAP after the network training is finished.

And 7, inputting the new complete pedestrian target image into the trained pedestrian re-identification model to obtain a result whether the complete pedestrian target image has a specific pedestrian or not.

As can be seen from the above embodiments, the present invention follows the idea of pcb (part conditional basic) algorithm, including: extracting basic characteristics of the target through a Resnet50 network; respectively extracting local features, global features and attention features of the target by a global feature extractor and a component shielding feature extractor; in the component shielding characteristic branch, the last layer of characteristic layer is horizontally and uniformly divided into four component blocks, one component block is randomly shielded, and the component shielding characteristic is obtained through the rest of the components which are not shielded; in the global feature branch, inputting the feature image of the whole target into a global feature extractor to obtain global features; in the attention feature branch, inputting a complete feature map into a space attention module and a channel attention module to respectively obtain a space attention feature and a channel attention feature, and then fusing the two attention features into one feature; and obtaining a prediction characteristic vector of the pedestrian through the global characteristic, the component shielding characteristic and the attention characteristic, and carrying out end-to-end training on the whole network to obtain a pedestrian re-identification model.

The method fully utilizes the global characteristics and the component shielding characteristics of the pedestrian images, and simultaneously fuses the space attention mechanism and the channel attention mechanism, thereby effectively improving the discriminability of the predicted characteristics.

The present invention has been disclosed in terms of the preferred embodiment, but is not intended to be limited to the embodiment, and all technical solutions obtained by substituting or converting equivalents thereof fall within the scope of the present invention.

Claims

1. A pedestrian re-identification method based on attention mechanism part shielding is characterized by comprising the following steps:

2. The pedestrian re-identification method based on attention mechanism component occlusion according to claim 1, wherein the training method of the pedestrian re-identification model comprises the following steps:

acquiring a complete pedestrian target map carrying tag data;

3. The method of pedestrian re-identification based on attention mechanism component occlusion of claim 2, wherein the base feature extraction model is based on a ResNet-50 convolutional neural network pre-trained by ImageNet and with terminal downsampling and full connectivity layers removed; the basic feature extraction model compresses an input image into an image of a predetermined size before extracting image features.

4. The method of claim 2, wherein the global feature extractor extracts a global feature vector of a pedestrian target according to the complete pedestrian target feature map, and comprises:

5. The method of claim 2, wherein the extracting component occlusion feature vectors of the pedestrian target according to the complete pedestrian target feature map comprises:

6. The method of claim 2, wherein the extracting the attention feature vector of the pedestrian target according to the complete pedestrian target feature map comprises: