CN115830531A

CN115830531A - Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion

Info

Publication number: CN115830531A
Application number: CN202211481147.8A
Authority: CN
Inventors: 陈斌; 陈玉
Original assignee: Shenyang University of Chemical Technology
Current assignee: Shenyang University of Chemical Technology
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2023-03-21

Abstract

A pedestrian re-identification method based on residual multi-channel attention and multi-feature fusion. A residual dual-channel attention module is constructed, and a residual block is connected in series with this module to form a new residual attention module (RRMCA); RRMCA modules are used to build the backbone network that extracts image features. Three branch networks are formed and their features are fused. The model is jointly optimized with a Softmax loss, a triplet loss and a center loss. The attention mechanism enables the network to selectively strengthen key features and suppress useless ones, which improves the discrimination capability of the network and the expressive power of the model, while the residual structure effectively mitigates the global weakening caused by the attention mechanism. Fusing multi-scale features extracts key pedestrian information and yields discriminative global features, so that the accuracy of pedestrian re-identification is improved and the extracted features contain more detail and more distinguishing salient information.

Description

Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion

Technical Field

The invention relates to a pedestrian re-identification method, in particular to a pedestrian re-identification method based on residual multi-channel attention multi-feature fusion.

Background

Pedestrian re-identification uses computer vision to determine whether a particular pedestrian is present in an image or video sequence, i.e., to identify the same person across multiple surveillance images of pedestrians. With the spread of large-scale high-definition cameras and the development of high-speed communication networks, pedestrian re-identification is widely applied in intelligent person-search systems, intelligent security, automatic driving, criminal investigation by public security departments, public safety protection and similar fields. Pedestrian images captured under video surveillance exhibit illumination differences, pose changes, viewpoint changes, occlusion, background noise and the like, all of which make identity recognition challenging. Extracting robust pedestrian features for judging whether two images belong to the same individual has therefore become the key problem in pedestrian re-identification.

Traditional pedestrian re-identification judges whether images captured by different cameras show the same pedestrian by means of hand-crafted features and distance metrics. Because manual feature design is complicated and its feature extraction capacity is limited, this approach struggles with re-identification in complex surveillance scenes. With the development of deep learning in recent years, convolutional neural networks, which have strong feature extraction capability and can automatically extract features from raw image data according to the task, have achieved remarkable results in pedestrian re-identification.

At present, research on pedestrian re-identification is mainly based on two approaches: representation learning and metric learning. Representation learning does not directly consider the similarity between pictures when training the network; it treats pedestrian re-identification as a classification problem or a verification problem. Metric learning, by contrast, aims to learn the similarity of two pictures through the network: different pictures of the same pedestrian should be more similar to each other than pictures of different pedestrians. The loss function of the network therefore makes the distance between pictures of the same pedestrian (positive sample pairs) as small as possible and the distance between pictures of different pedestrians (negative sample pairs) as large as possible. Common metric learning losses include contrastive loss, triplet loss and quadruplet loss. Both representation learning and metric learning rely on the image features extracted by the network model.

With the continuous improvement of network models, combining local features with global features has achieved good results in pedestrian re-identification. Specifically, the target pedestrian image is divided horizontally into equal parts, the parts are fed into the model in sequence for training, and the part features are finally fused. The global and local features of the pedestrian are weighted through an attention mechanism so that salient pedestrian features are extracted and re-identification accuracy is improved.

Although these methods achieve very high accuracy, they have the following problems: (1) weighting through an attention mechanism can cause the features as a whole to be assigned smaller weights during network optimization, which reduces the feature differences between images of different identities; (2) during feature fusion, local and global features are simply aggregated, without considering the scale changes of pedestrian features caused by changes in pose and position, and pedestrian features at different scales are not well preserved, so feature extraction is more easily disturbed by information from irrelevant regions.

Therefore, how to make the finally extracted image features contain more detailed information and distinguishing salient features is a main problem in the pedestrian re-identification model.

Disclosure of Invention

The invention aims to provide a pedestrian re-identification method based on residual multi-channel attention and multi-feature fusion. It provides a network model capable of acquiring deeper detail information and more discriminative feature information from a target pedestrian image, and trains the network with a multi-task loss function to obtain richer features for representing pedestrians, so that the accuracy of pedestrian re-identification is improved and the extracted features contain more detail and more distinguishing salient information.

The purpose of the invention is realized by the following technical scheme:

A pedestrian re-identification method based on residual multi-channel attention multi-feature fusion comprises the following steps:

step 1: for the input data set, extracting image features using the constructed pedestrian re-identification network model;

step 2: constructing a residual dual-channel attention module to extract discriminative features of the pedestrian image;

step 3: fusing multi-scale features;

step 4: jointly training the pedestrian re-identification network model of step 1 with a Softmax loss function, a triplet loss function and a center loss function to obtain the optimal parameters of the model;

step 5: for the query set and the candidate set contained in a public pedestrian re-identification data set, calculating the Euclidean distance between a specified object in the query set and each object in the candidate set, and then sorting the calculated distances in ascending order to obtain the pedestrian re-identification ranking result.

In the pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 1 comprises the following sub-steps:

step 1.1: to enrich data diversity and improve the generalization capability of the model, preprocessing operations including normalization, random horizontal flipping, random cropping and random erasing are first applied to the picture data set input to the model; the preprocessed picture data set is then fed into the pedestrian re-identification network model;

step 1.2: the pedestrian re-identification network model is an RRMCA-ResNet50 network, formed by connecting a residual dual-channel attention module (RMCA) in series with each of the 4 improved residual blocks of ResNet-50; the RRMCA-ResNet50 backbone network comprises 1 convolutional layer, 1 pooling layer and 4 residual attention modules (RRMCA), and the feature maps extracted by RRMCA_1, RRMCA_3 and RRMCA_4 pass through three branch networks respectively to form a multi-branch structure.

In the pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 2 comprises the following sub-steps:

step 2.1: a residual dual-channel attention module (RMCA) is introduced into the network structure; combining the semantic information of the global context and the local context, two branches of different scales extract channel attention weights, wherein one branch extracts the channel attention weight of the global features through global average pooling followed by point convolution, and the other branch extracts the channel attention weight of the local features directly through point convolution, so as to obtain more detailed information of the image features; a residual structure is added to the module, and the resulting new feature map is added to the input feature map, which effectively mitigates the global weakening caused by the attention mechanism;

step 2.2: for the local branch, a 1 × 1 point convolution first reduces the number of channels of the input feature vector H(X) to 1/r of the original number, an activation function is then applied, and a second 1 × 1 point convolution finally raises the dimension so that the number of channels is restored to the original number; the channel attention L(X) of the local features is given by formula (3) of the detailed description, whose terms are the first 1 × 1 point convolution, the BatchNorm layer B, the activation function and the second 1 × 1 point convolution, where r is the channel scaling ratio and L(X) denotes the feature vector extracted by the local feature channel; the other branch computes the channel attention of the global features and differs from the local branch in that a global average pooling operation is first performed on the input features; the channel attention of the global features is given by formula (4) of the detailed description, which additionally contains the global average pooling layer and yields the feature vector extracted by the global feature channel;

step 2.3: the feature vectors obtained from the local branch and the global branch are fused and activated by an activation function to obtain a weight matrix; the weight matrix is applied to the feature map, and the resulting new feature map is added to the input feature map through the residual structure, as given by formula (5) of the detailed description.

In the pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 3 comprises the following sub-steps:

step 3.1: the shallow feature f₁ extracted by the RRMCA_1 module of the RRMCA-ResNet50 network, the intermediate-layer feature f₂ extracted by the RRMCA_3 module and the deep feature f₃ extracted by the global network have 256, 1024 and 2048 dimensions respectively; the three feature maps f₁, f₂ and f₃ of different scales are pooled by a generalized average pooling layer in the three branch networks, and the pooled outputs are normalized by a BN layer; the resulting features are f₁′, f₂′ and f₃′, all 2048-dimensional;

step 3.2: the features f₁′, f₂′ and f₃′ obtained after normalization in step 3.1 are concatenated, i.e., multi-scale feature fusion is performed by channel stacking: f′ = [f₁′, f₂′, f₃′], where f₁′ is a low-level feature containing more detailed image information, and f₂′ and f₃′ are high-level features carrying stronger semantic information; fusing high-level and low-level features to represent pedestrians effectively reduces the information loss caused by convolution and pooling and enhances the discriminability of the features.
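The channel stacking of step 3.2 can be illustrated with a minimal sketch. The function name `fuse_by_channel_stacking` and the constant toy vectors are hypothetical; only the three 2048-dimensional inputs and the concatenation f′ = [f₁′, f₂′, f₃′] come from the text.

```python
def fuse_by_channel_stacking(f1, f2, f3):
    """Concatenate three branch feature vectors end to end: f' = [f1', f2', f3']."""
    return list(f1) + list(f2) + list(f3)


# Toy stand-ins for the three normalized 2048-dimensional branch features.
dim = 2048
f1 = [0.1] * dim  # low-level branch (illustrative values only)
f2 = [0.2] * dim  # intermediate branch
f3 = [0.3] * dim  # deep branch

fused = fuse_by_channel_stacking(f1, f2, f3)
print(len(fused))  # 6144
```

Under this reading the fused descriptor has 3 × 2048 = 6144 dimensions, and each branch keeps its own contiguous slice of the vector.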

In the pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 4 comprises the following:

in the training stage of the model, the triplet loss and the center loss are computed on the features extracted after the generalized average pooling layer and optimize the Euclidean distance between image pairs, so that the distance between any target sample and its positive samples is minimized and the distance to its negative samples is maximized; the Softmax loss is computed on the features extracted after the fully connected layer and optimizes the cosine distance between image pairs; the Softmax loss, triplet loss and center loss jointly optimize the model so that it learns a feature space that is more compact within classes and more separated between classes; the loss function is given by formula (7) of the detailed description, in which the index denotes the i-th branch of the model, with three branches in total, and the coefficients denote the weights of the respective loss functions.

In the pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 5 comprises the following:

for the query set and the candidate set contained in a public pedestrian re-identification data set, the Euclidean distance between a specified object in the query set and each object in the candidate set is calculated, and the calculated distances are then sorted in ascending order to obtain the pedestrian re-identification ranking result; the trained model is evaluated with performance indexes such as the first hit rate (Rank-1) and the mean average precision (mAP).
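The ranking in step 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper names `euclidean` and `rank_gallery` and the two-dimensional toy features are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query_feature, gallery):
    """gallery: list of (image_id, feature). Returns ids sorted by ascending distance."""
    scored = [(euclidean(query_feature, feat), image_id) for image_id, feat in gallery]
    scored.sort(key=lambda pair: pair[0])
    return [image_id for _, image_id in scored]


query = [0.0, 1.0]
gallery = [("g1", [3.0, 4.0]), ("g2", [0.0, 1.5]), ("g3", [1.0, 1.0])]
ranking = rank_gallery(query, gallery)
print(ranking)  # ['g2', 'g3', 'g1']
```

The candidate closest to the query comes first, so a Rank-1 hit means the first returned id has the same identity as the query.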

The invention has the advantages and effects that:

According to the pedestrian re-identification method based on a residual dual-channel attention mechanism and multi-scale feature fusion, first, the four residual structures of an improved ResNet50 are each connected in series with a lightweight residual dual-channel attention module to form a new residual module (RRMCA). On the one hand, the global channel extracts discriminative information from the input features; on the other hand, the local channel extracts local fine-grained features, so that the re-identification task focuses on key pedestrian information and interference from irrelevant information is reduced. Second, through multi-scale feature fusion, fine-grained features of the pedestrian body are obtained from local features with a small receptive field, and global coarse-grained information is obtained from global features with a large receptive field. Finally, the model is trained with the three loss functions combined with the multi-branch network, so that it converges to a feature space that is compact within classes and separated between classes, which facilitates optimization and enhances generalization. Training and testing on the Market1501 and DukeMTMC-reID data sets show that the pedestrian re-identification network model designed by the invention achieves higher accuracy.

Drawings

FIG. 1 is a block diagram of a pedestrian re-identification function of the present invention;

FIG. 2 is a diagram of the RRMCA-ResNet50 network model architecture of the present invention;

FIG. 3 is a diagram of a modified residual layer structure according to the present invention;

FIG. 4 is a diagram of the residual two-channel attention network architecture of the present invention;

FIG. 5 is a block diagram of the RRMCA module of the present invention;

FIG. 6 is a flow chart of model training in accordance with the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

The invention provides a pedestrian re-identification technical method based on a residual multi-channel attention mechanism and multi-scale feature fusion. The following technical steps are adopted:

1. the Market1501 and DukeMTMC-reID pedestrian re-identification data sets are adopted;

2. the experimental platform runs the Windows 10 operating system with GPU acceleration; the software environment uses Python 3.8 and the PyTorch deep learning framework to construct the model, and an Adam optimizer is used to optimize the model parameters;

3. the network model introduces a residual attention mechanism on the basis of the ResNet50 residual blocks to form the RRMCA-ResNet50 pedestrian re-identification network model; because the ReLU activation function in the Bottleneck structure of ResNet50 outputs zero, and therefore has zero gradient, for negative inputs, the model replaces ReLU with the RReLU activation function, which overcomes the vanishing gradient and dead neurons of ReLU;

4. to enable the network to fully learn features of different scales and improve feature effectiveness, the feature maps output by RRMCA_1, RRMCA_3 and RRMCA_4 of the RRMCA-ResNet50 network are extracted; they correspond respectively to the shallow texture features, intermediate-layer transition features and deep semantic features extracted by the backbone network. The three feature maps form three different network branches, and the features obtained after branch processing are finally fused for prediction. Extracting image features at different scales fully captures key pedestrian information and yields discriminative global features;

5. the preprocessed training data set is used as the network input for training; the pre-trained weights of the network model are loaded, and the model is jointly optimized with the Softmax loss, triplet loss and center loss;

6. to measure the performance of the algorithm, the first hit rate (Rank-1) and the mean average precision (mAP) are evaluated on the Market1501 and DukeMTMC-reID pedestrian re-identification data sets.

Step 1 is implemented as follows:

The Market1501 data set was collected from 6 cameras on the Tsinghua University campus and contains 32668 pedestrian pictures of 1501 identities. The training set comprises 12936 pictures of 751 identities and the test set comprises 19732 pictures of 750 identities; the identities in the training set and the test set do not overlap. The images of the DukeMTMC-reID data set were collected from 8 different cameras on the Duke University campus; the training set contains 16522 images of 702 identities, 2228 images of another 702 identities are used as query pictures, and 17661 pictures of 1110 identities form the gallery to be searched.

The picture naming format in the data set is: 0001_c1s1_000151_00.jpg;

the size of the input pedestrian picture is 224 × 224, and the number of channels is 3;

Step 2 is implemented as follows:

importing the configuration environment required by the model;

the initial learning rate of Adam is set to 0.0001 and dropout = 0.5; P identities and K images are drawn from the pedestrian data set each time as training samples and fed into the network model, with P = 16, K = 4, batch size B = 32 and epoch = 100;

Step 3 is implemented as follows:

the invention designs a residual dual-channel attention module and connects it in series with each of the 4 residual modules of ResNet50, thereby constructing RRMCA-ResNet50, a pedestrian re-identification network model based on the fusion of residual dual-channel attention and multi-scale features. The RRMCA-ResNet50 network comprises a convolutional layer, a pooling layer, four RRMCA modules and three branch networks. While extracting the image features of the target pedestrian, the network selectively strengthens key features and suppresses useless ones, which improves the discrimination capability of the network and the expressive power of the model.

The original ResNet50 residual layer uses the ReLU activation function; the improved residual layer replaces it with the RReLU function, so that the gradient is not set to zero when x < 0 and effective information on the negative half-axis is retained as far as possible. The mathematical expression for RReLU is:

(1)

where x is the input of the neuron and, in the training phase, the parameter b follows a uniform distribution.
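The behaviour of RReLU can be sketched on a single scalar. The text does not state the bounds of the uniform distribution or whether the random parameter multiplies or divides the negative input; the sketch below assumes the multiplicative convention and the bounds 1/8 and 1/3, which are the defaults of PyTorch's `torch.nn.RReLU`. The function name `rrelu` is hypothetical.

```python
import random


def rrelu(x, lower=1 / 8, upper=1 / 3, training=True, rng=random):
    """Randomized leaky ReLU on a scalar.

    Non-negative inputs pass through unchanged. Negative inputs are scaled by
    a slope drawn uniformly from [lower, upper] during training, and by the
    fixed mean slope (lower + upper) / 2 at inference.
    """
    if x >= 0:
        return x
    slope = rng.uniform(lower, upper) if training else (lower + upper) / 2
    return slope * x


print(rrelu(2.0))                   # 2.0
print(rrelu(-2.0, training=False))  # -2.0 scaled by the mean slope
print(rrelu(-2.0, training=True))   # varies between -2/3 and -1/4
```

Because the slope on the negative side is never zero, a negative input still produces a non-zero output and gradient, which is the property the text relies on.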

ResNet50 contains 4 residual blocks, which have 3, 4, 6 and 3 residual layers respectively. In each residual layer, the input feature X passes through convolution, normalization and an activation layer to give F(X); on the other path, X is added to F(X) through an identity mapping, and the sum passes through an activation layer to output the feature H(X). The activation function is RReLU, and H(X) is given by:

H(X) = RReLU(F(X) + X)    (2)

where RReLU(·) is the RReLU activation function, X is the input feature of the residual layer, F(X) is the output after the first-layer linear transformation and activation, and H(X) is the output feature of the residual layer.

The residual dual-channel attention module combines the semantic information of the global context and the local context and uses two branches of different scales to extract channel attention weights. One branch extracts the attention of the global features through global average pooling followed by point convolution; the other extracts the channel attention of the local features directly through point convolution, so as to obtain more detailed feature information. A residual structure is added to the module, and the resulting new feature map is added to the input feature map, which effectively mitigates the global weakening caused by the attention mechanism. Point convolutions are used in place of convolution kernels of different sizes to keep the model as lightweight as possible.

For the local branch, a 1 × 1 point convolution first reduces the number of channels of the input feature vector H(X) to 1/r of the original number, an activation function is then applied, and a second 1 × 1 point convolution finally raises the dimension so that the number of channels is restored to the original number. The channel attention L(X) of the local features is calculated as follows:

(3)

where the terms of the formula are the first 1 × 1 point convolution, the BatchNorm layer B, the activation function and the second 1 × 1 point convolution; r is the channel scaling ratio, and L(X) denotes the feature vector extracted by the local feature channel. The other branch computes the channel attention of the global features; unlike the local branch, it first performs a global average pooling operation on the input features. The channel attention of the global features is calculated as follows:

(4)

where the additional term denotes the global average pooling layer, and the result is the feature vector extracted by the global feature channel.

The feature vector L(X) and the feature vector of the global branch are fused and activated by an activation function to obtain a weight matrix; the weight matrix is applied to the feature map, and the resulting new feature map is added to the input feature map through the residual structure, calculated as follows:

(5)

The residual dual-channel attention module designed by the invention is lightweight and improves both the accuracy and the generalization capability of the model.
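Formulas (3) to (5) are not reproduced in this text. One plausible form, consistent with the description above, is sketched here. The operator names PWConv₁ and PWConv₂, the activation symbol δ, the pooling symbol g, the global-branch symbol G(X), the sigmoid σ used to produce the weights, and the second BatchNorm after the channel-restoring convolution are all assumptions borrowed from common multi-scale channel attention notation, not taken from the source. X stands for the input of the attention module, which in the RRMCA module is the residual block output H(X).

```latex
\begin{align}
L(X) &= B\big(\mathrm{PWConv}_2\big(\delta\big(B(\mathrm{PWConv}_1(X))\big)\big)\big) \\
G(X) &= B\big(\mathrm{PWConv}_2\big(\delta\big(B(\mathrm{PWConv}_1(g(X)))\big)\big)\big) \\
X'   &= X + X \otimes \sigma\big(L(X) \oplus G(X)\big)
\end{align}
```

Here PWConv₁ maps C channels to C/r, PWConv₂ maps C/r back to C, ⊕ is a broadcast addition of the global and local branches, and ⊗ is element-wise multiplication. The leading X in the last line is the residual connection that the text credits with mitigating global weakening.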

Step 4 is implemented as follows:

the feature map information obtained by average pooling or max pooling may be incomplete; to avoid this as far as possible, the feature maps in the branch networks are pooled with generalized average pooling, after which the BN operation and feature fusion are performed in sequence. Generalized average pooling is calculated as follows:

(6)

where k is the number of channels in the feature map, the pooled term is the feature of the k-th channel of the feature map, and the remaining parameter is the pooling parameter.
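Generalized average (GeM) pooling of one channel can be sketched as below. The text does not give the value of the pooling parameter; the default `p = 3.0` and the clamp `eps` are assumptions commonly seen in GeM implementations, and the function name `gem_pool` is hypothetical.

```python
def gem_pool(channel_values, p=3.0, eps=1e-6):
    """Generalized-mean pooling of one channel's activations.

    Values are clamped to eps so that fractional powers stay well defined.
    p = 1 reduces to average pooling; large p approaches max pooling.
    """
    clamped = [max(v, eps) for v in channel_values]
    mean_of_powers = sum(v ** p for v in clamped) / len(clamped)
    return mean_of_powers ** (1.0 / p)


channel = [0.0, 1.0, 2.0, 5.0]
print(gem_pool(channel, p=1.0))   # close to the mean of the channel
print(gem_pool(channel, p=3.0))   # between the mean and the max
print(gem_pool(channel, p=50.0))  # approaches the max of the channel
```

The single parameter p therefore interpolates between average pooling and max pooling, which is why the text prefers it to either one alone.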

Step 5 is implemented as follows:

the model is jointly optimized with the Softmax loss, the triplet loss and the center loss, and the loss function is defined as follows:

(7)

where the index denotes the i-th branch of the model, with three branches in total, and the coefficients denote the weights of the respective loss functions.
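Formula (7) is not reproduced in this text. A form consistent with the description, a weighted sum of the three losses accumulated over the three branches, might look as follows. The symbols λ₁, λ₂, λ₃ and the superscript (i) for the branch index are assumed notation, and whether the weights are shared across branches is not stated in the source.

```latex
L_{\text{total}} = \sum_{i=1}^{3} \left( \lambda_{1}\, L_{\text{softmax}}^{(i)} + \lambda_{2}\, L_{\text{triplet}}^{(i)} + \lambda_{3}\, L_{\text{center}}^{(i)} \right)
```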

The Softmax loss function is expressed as follows:

(8)

where the terms are the batch size, namely 32, the number of classes, the feature vector of the i-th sample and the class it belongs to, the weight, and the bias. The triplet loss function is expressed as follows:

(9)

where the three feature terms denote the features of the anchor, the positive sample and the negative sample extracted by the model, and the remaining term denotes the distance margin of the triplet loss. The center loss function is expressed as follows:

(10)

where the center term is the cluster center of the class to which a deep feature belongs; the center loss compensates for the inability of the triplet loss to provide a globally optimal constraint.
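The two metric losses can be sketched on toy vectors. This is a minimal illustration under stated assumptions: the margin value 0.3, the use of unsquared Euclidean distance in the triplet loss, the factor 1/2 in the center loss, and the helper names are not taken from the source.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def center_loss(features, labels, centers):
    """Half the summed squared distance between each feature and its class center."""
    return 0.5 * sum(euclidean(f, centers[y]) ** 2 for f, y in zip(features, labels))


anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
print(triplet_loss(anchor, positive, negative))  # 0.0, the negative is already far enough

centers = {0: [0.0, 0.0], 1: [1.0, 1.0]}
print(center_loss([[0.0, 2.0], [1.0, 1.0]], [0, 1], centers))  # 2.0
```

The triplet loss only constrains relative distances within a sampled triplet, whereas the center loss pulls every feature toward its class center, which matches the complementary roles the text assigns to them.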

Step 6 is implemented as follows:

the preprocessed training set and test set are fed into the network for model training and testing respectively;

pedestrian re-identification aims to find, among pedestrian images captured by different cameras, the pedestrian most similar to the query target, and can therefore be regarded as a ranking problem. Rank-1 is the rate at which the first picture in the ranked list has the same identity as the query picture.

The mAP is obtained by summing the average precision values and taking their mean, calculated as follows:

(11)

where the summed term is the average precision of each class and the divisor is the total number of classes.
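Rank-1 and mAP can be computed from ranked lists as in the following sketch. It is a simplified illustration: the function names are hypothetical, the toy identities are invented, and the filtering of same-camera gallery images used by the standard benchmark protocols is omitted.

```python
def average_precision(ranked_ids, query_id):
    """AP of one ranked gallery list: mean of the precision at each correct match."""
    hits, precisions = 0, []
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def evaluate(rankings):
    """rankings: list of (query_id, ranked gallery ids). Returns (rank1, mAP)."""
    rank1 = sum(1 for q, ranked in rankings if ranked and ranked[0] == q) / len(rankings)
    mean_ap = sum(average_precision(ranked, q) for q, ranked in rankings) / len(rankings)
    return rank1, mean_ap


rankings = [
    (1, [1, 2, 1, 3]),  # matches at positions 1 and 3 -> AP = (1/1 + 2/3) / 2
    (2, [3, 2, 2, 1]),  # matches at positions 2 and 3 -> AP = (1/2 + 2/3) / 2
]
rank1, mean_ap = evaluate(rankings)
print(rank1, round(mean_ap, 4))  # 0.5 0.7083
```

Rank-1 looks only at the top result, while mAP rewards placing all correct matches early in the list, so the two indexes are reported together.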

The invention provides a pedestrian re-identification method based on residual dual-channel attention and multi-scale feature fusion, which addresses pedestrian pose changes and the global weakening caused by the attention mechanism in pedestrian re-identification. A complete pedestrian re-identification task requires the following steps: (1) inputting a pedestrian data set; (2) preprocessing the input pictures; (3) extracting image features through the feature extraction module of the network; (4) fusing the feature maps of different scales in the branch structure; (5) optimizing the model through the loss function; (6) identifying the pedestrian target image and outputting the result. The pedestrian re-identification function modules are shown in FIG. 1. The network architecture of each functional module is described in detail below on the basis of a complete recognition task.

A data set input module: 32 pedestrian images are randomly drawn from the pedestrian data set each time as training samples and fed into the training network.

An image preprocessing module: before image feature extraction, a preprocessing operation, namely data augmentation, is performed on the pictures, including normalization, random horizontal flipping, random cropping and random erasing. The probability of random horizontal flipping of each image is set to 0.5. Each image is decoded to 32-bit floating-point raw pixel values in [0, 1], and the three channels are normalized by subtracting 0.485, 0.456 and 0.406 and dividing by 0.229, 0.224 and 0.225 respectively, which improves the convergence speed of the model. After the data set has been preprocessed, the model is built and the operating environment is configured: PyCharm is used as the integrated development environment of the project, the model is constructed with the PyTorch deep learning framework, a conda environment is imported into the project, and a GPU is used for acceleration. Once the environment is configured, the parameters are set and the model is built.
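The normalization and flipping described above can be sketched without any framework. The per-channel constants and the flip probability 0.5 come from the text; the function names and the list-of-rows image layout are hypothetical, and random cropping and random erasing are left out for brevity.

```python
import random

MEAN = (0.485, 0.456, 0.406)  # per-channel means quoted in the text
STD = (0.229, 0.224, 0.225)   # per-channel divisors quoted in the text


def normalize_pixel(rgb):
    """rgb: three floats in [0, 1]. Returns the per-channel standardized values."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))


def random_horizontal_flip(image, prob=0.5, rng=random):
    """image: list of rows, each row a list of pixels. Mirrors rows with probability prob."""
    if rng.random() < prob:
        return [list(reversed(row)) for row in image]
    return image


print(normalize_pixel((0.485, 0.456, 0.406)))  # (0.0, 0.0, 0.0)
```

A pixel equal to the channel means maps to zero, so after normalization the inputs are roughly centered, which is the property the text associates with faster convergence.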

A feature extraction module: the pedestrian re-identification network model designed by the invention is an RRMCA-ResNet50 network, formed by connecting a residual dual-channel attention module (RMCA) in series with each of the 4 improved residual blocks of ResNet-50; the RRMCA-ResNet50 backbone network includes 1 convolutional layer, 1 pooling layer and 4 residual attention modules (RRMCA), and the network model architecture is shown in FIG. 2. One residual layer of a residual block in the improved ResNet-50 is shown in FIG. 3. The structure of the residual dual-channel attention network designed by the invention is shown in FIG. 4. The RRMCA module obtained by connecting the residual dual-channel attention module in series with the improved residual block is shown in FIG. 5. The pedestrian pictures output by the image preprocessing module are fed into the network model, and the salient features of the images are extracted sequentially by the convolutional layer, the pooling layer and the four RRMCA modules.

A feature fusion module: the feature maps output by RRMCA_1, RRMCA_3 and RRMCA_4 of the RRMCA-ResNet50 network are extracted and form three different network branches, and the features obtained after branch processing are finally fused for prediction. The feature maps output by the RRMCA_1, RRMCA_3 and RRMCA_4 modules have 256, 1024 and 2048 dimensions respectively, and 2048-dimensional feature maps are obtained through a generalized average pooling layer. A BN layer is added to facilitate optimization of the loss function. In the model testing stage, the feature vectors f₁′, f₂′ and f₃′ are fused by channel stacking, i.e., f′ = [f₁′, f₂′, f₃′].

A loss calculating module: in the training phase of the model, the features extracted by the generalized average pooling (GeM pooling) layer are used to calculate the triplet loss and the center loss, and the features output by the fully connected layer are used to calculate the Softmax loss. Whether the feature extraction module has converged is judged from the output of the loss function: if so, the feature vector is sent to the identification module; if not, the loss is back-propagated and the attention network parameters are updated until convergence, yielding the model weights for the system. The training process of the model is shown in FIG. 6.
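The joint use of the Softmax loss, the triplet loss and the center loss can be illustrated for a single sample and a single branch with the sketch below. The margin of 0.3 and the loss weights are placeholder assumptions, not values stated in this text, and the function names are hypothetical; summing over the three branches is left out.

```python
import math

def softmax_loss(logits, label):
    """Cross-entropy of a softmax over class logits for one sample."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def center_loss(feature, center):
    """Half the squared distance between a feature and its class center."""
    return 0.5 * sum((x - c) ** 2 for x, c in zip(feature, center))

def joint_loss(l_softmax, l_triplet, l_center, w_triplet=1.0, w_center=0.0005):
    """Weighted sum of the three terms; the weights here are placeholders."""
    return l_softmax + w_triplet * l_triplet + w_center * l_center

ls = softmax_loss([2.0, 0.0], label=0)
lt = triplet_loss([0.0, 0.0], [3.0, 4.0], [6.0, 8.0], margin=0.3)
lc = center_loss([1.0, 2.0], [0.0, 0.0])
total = joint_loss(ls, lt, lc)
```

In this toy case the negative sample is already farther from the anchor than the positive sample by more than the margin, so the triplet term contributes nothing and the total is driven by the Softmax and center terms.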

A test module: after the model has been trained on the input images by the above modules, the training result needs to be tested. In the test module, the Euclidean distance between the specified object in the query set and each object in the candidate set is calculated, and the calculated distances are sorted in ascending order to obtain the pedestrian re-identification ranking result. The model training result is generally evaluated with performance indexes such as the first hit rate (Rank-1) and the mean average precision (mAP).
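The ranking and evaluation procedure described above can be sketched in plain Python as follows. The function names, the toy feature vectors and the identity labels are hypothetical and serve only to illustrate the Euclidean ranking, the Rank-1 check and the average precision of one query.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(query, gallery):
    """Gallery indices sorted by ascending distance to the query feature."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))

def rank1_hit(ranking, gallery_ids, query_id):
    """True when the closest gallery item shares the query identity."""
    return gallery_ids[ranking[0]] == query_id

def average_precision(ranking, gallery_ids, query_id):
    """Mean of the precision values at each rank where a true match appears."""
    hits, precisions = 0, []
    for rank, idx in enumerate(ranking, start=1):
        if gallery_ids[idx] == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

gallery = [[0.0, 1.0], [5.0, 5.0], [0.0, 3.0], [2.0, 0.0]]
gallery_ids = ["a", "b", "a", "c"]
query, query_id = [0.0, 0.0], "a"
ranking = rank_gallery(query, gallery)
hit = rank1_hit(ranking, gallery_ids, query_id)
ap = average_precision(ranking, gallery_ids, query_id)
```

Rank-1 is the fraction of queries for which the hit is true, and mAP is the mean of the average precision over all queries.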

Claims

1. A pedestrian re-identification method based on residual multi-channel attention multi-feature fusion is characterized by comprising the following steps of:

step 3: fusing multi-scale features;

step 5: for the query set and the candidate set contained in a public pedestrian re-identification data set, calculating the Euclidean distance between the specified object in the query set and each object in the candidate set, and then sorting the calculated distances in ascending order to obtain the pedestrian re-identification ranking result.

2. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 1 comprises the following sub-steps:

step 1.1: in order to enrich the diversity of the data and improve the generalization capability of the model, preprocessing operations including normalization, random horizontal flipping, random cropping and random erasing are first performed on the picture data set input to the model; the preprocessed picture data set is then sent into the pedestrian re-identification network model;

3. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 2 comprises the following sub-steps:

step 2.1: a residual dual-channel attention module (RMCA) is introduced into the network structure to combine the semantic information of the global context and the local context; two branches of different scales are used to extract channel attention weights: one branch extracts the channel attention weights of the global features through global average pooling (Global Avg Pooling) followed by point convolution, and the other branch extracts the channel attention weights of the local features directly through point convolution, so as to obtain more detailed information about the image features; a residual structure is added to the module, and the resulting new feature map is added to the input feature map, which effectively solves the problem of global weakening caused by the attention mechanism;

wherein

Represents a 1×1 point convolution, B represents the BatchNorm layer,

represents the activation function,

represents a 1×1 point convolution, r is the channel scaling ratio,

represents the feature vector extracted by the local feature channel; the other branch computes the channel attention of the global features and, unlike the local branch, first applies a global average pooling operation to the input features; the calculation formula L(X) of the channel attention of the global features is as follows:

wherein

Represents the global average pooling layer,

representing feature vectors extracted by the global feature channel;

step 2.3: the obtained feature vector

and

.

4. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 3 comprises the following sub-steps:

step 3.1: the shallow feature f₁ extracted by the RRMCA_1 module in the RRMCA-ResNet50 network, the intermediate-layer feature f₂ extracted by the RRMCA_3 module and the deep feature f₃ extracted by the global network have 256, 1024 and 2048 dimensions respectively; the feature maps f₁, f₂ and f₃ at these three scales are pooled in the three branch networks by a generalized average pooling layer and then normalized by a BN layer; the output feature maps obtained in the three branch networks are f₁′, f₂′ and f₃′, all of which are 2048-dimensional;

step 3.2: the features f₁′, f₂′ and f₃′ obtained after normalization in step 3.1 are concatenated, and multi-scale feature fusion is performed by channel stacking: f′ = [f₁′, f₂′, f₃′];

where f₁′ is a low-level feature containing more detailed image information, and f₂′ and f₃′ are high-level features with stronger semantic information; fusing the high-level and low-level features to represent pedestrians effectively reduces the information loss caused by convolution and pooling and enhances the discriminability of the features.

5. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 4 comprises the following steps:

wherein

Is the first in the model

A total of three branches,

respectively represent the weights of the individual loss functions.

6. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 5 comprises the following steps:

for the query set and the candidate set contained in a public pedestrian re-identification data set, calculating the Euclidean distance between the specified object in the query set and each object in the candidate set, and then sorting the calculated distances in ascending order to obtain the pedestrian re-identification ranking result; and evaluating the model training result with performance indexes such as the first hit rate (Rank-1) and the mean average precision (mAP).