CN115830531A - Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion

Publication number
CN115830531A
CN115830531A CN202211481147.8A CN202211481147A CN115830531A CN 115830531 A CN115830531 A CN 115830531A CN 202211481147 A CN202211481147 A CN 202211481147A CN 115830531 A CN115830531 A CN 115830531A
Authority
CN
China
Prior art keywords
pedestrian
feature
features
residual
channel attention
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211481147.8A
Other languages
Chinese (zh)
Inventor
陈斌
陈玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang University of Chemical Technology
Original Assignee
Shenyang University of Chemical Technology

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Abstract

A pedestrian re-identification method based on residual multi-channel attention multi-feature fusion. A residual dual-channel attention module is constructed, and a residual block is connected in series with this module to form a new residual attention module (RRMCA). RRMCA modules are used to build the backbone network that extracts image features; three branch networks are formed and their features are fused; and the model is jointly optimized with Softmax loss, triplet loss and center loss. The attention mechanism lets the network selectively strengthen key features and suppress useless ones, which improves the discriminative ability of the network and the expressive power of the model, while the residual structure effectively alleviates the global weakening caused by the attention mechanism. Fusing multi-scale features extracts key pedestrian information and yields discriminative global features, so that the accuracy of pedestrian re-identification is improved and the extracted features contain more detail and more distinguishing salient features.

Description

Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion
Technical Field
The invention relates to a pedestrian re-identification method, in particular to a pedestrian re-identification method based on residual multi-channel attention multi-feature fusion.
Background
Pedestrian re-identification is a technique that uses computer vision to determine whether a particular pedestrian is present in an image or video sequence, i.e., to identify the same person across pedestrian images from multiple surveillance cameras. With the spread of large-scale high-definition cameras and the development of high-speed communication networks, pedestrian re-identification is widely applied in intelligent person-search systems, intelligent security, automatic driving, criminal investigation by public security departments, public safety and other fields. Pedestrian images captured under video surveillance exhibit illumination differences, pose changes, viewpoint changes, occlusion, background noise and similar phenomena, all of which make identity recognition challenging. Extracting robust pedestrian features for judging whether two images belong to the same individual has therefore become a key problem in pedestrian re-identification.
Traditional pedestrian re-identification judges whether images from different cameras show the same pedestrian by means of hand-crafted features and distance metrics. Because hand-crafted feature extraction is cumbersome and its representational capacity is limited, it struggles with re-identification in complex surveillance scenes. With the development of deep learning in recent years, convolutional neural networks, which have strong feature extraction capability and can automatically learn features from raw image data according to the task, have achieved remarkable results in pedestrian re-identification.
At present, research on pedestrian re-identification is mainly based on two approaches: representation learning and metric learning. Representation learning does not directly consider the similarity between pictures when training the network; it treats pedestrian re-identification as a classification problem or a verification problem. Metric learning, by contrast, aims to learn the similarity of two pictures through the network. Specifically, the similarity between different pictures of the same pedestrian should be greater than the similarity between pictures of different pedestrians. The loss function of the network therefore makes the distance between pictures of the same pedestrian (positive sample pairs) as small as possible and the distance between pictures of different pedestrians (negative sample pairs) as large as possible. Common metric learning losses include contrastive loss, triplet loss and quadruplet loss. Both representation learning and metric learning rely on the image features extracted by the network model.
With the continuous improvement of network models, combining local features with global features has achieved good results in pedestrian re-identification. Specifically, the target pedestrian image is divided horizontally into equal parts, the parts are fed into the model in sequence for training, and the part features are finally fused. The global and local features of the pedestrian are weighted through an attention mechanism, so that the salient features of the pedestrian are extracted and the accuracy of pedestrian re-identification is improved.
Although such methods have achieved very high accuracy, they have the following problems: (1) Weighting through an attention mechanism can cause the features as a whole to be assigned smaller weights during network optimization, which reduces the feature differences between images of different identities. (2) During feature fusion, local and global features are merely aggregated in a one-sided way. The change in scale of pedestrian features caused by changes in pedestrian pose and position is not considered, and pedestrian features of different scales are not well preserved, so feature extraction is more easily disturbed by information from irrelevant regions.
Therefore, how to make the finally extracted image features contain more detail and more distinguishing salient features is a main problem for pedestrian re-identification models.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method based on residual multi-channel attention multi-feature fusion. It provides a network model that can acquire deeper detail information and more discriminative feature information from the target pedestrian image, and it trains the network with a multi-task loss function to obtain richer features for representing pedestrians, so that the accuracy of pedestrian re-identification is improved and the extracted features contain more detail and more distinguishing salient features.
The purpose of the invention is realized by the following technical scheme:
A pedestrian re-identification method based on residual multi-channel attention multi-feature fusion comprises the following steps:
Step 1: for the input data set, extracting image features using the constructed pedestrian re-identification network model;
Step 2: constructing a residual dual-channel attention module to extract the discriminative features of the pedestrian image;
Step 3: fusing multi-scale features;
Step 4: jointly training the pedestrian re-identification network model of step 1 with a Softmax loss function, a triplet loss function and a center loss function to obtain the optimal parameters of the model;
Step 5: for the query set and the candidate set contained in a public pedestrian re-identification data set, calculating the Euclidean distance between a specified object in the query set and each object in the candidate set, and then sorting the calculated distances in ascending order to obtain the ranking result of the pedestrian re-identification.
In the above pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 1 comprises the following sub-steps:
Step 1.1: To enrich the diversity of the data and improve the generalization ability of the model, the picture data set fed to the model is first preprocessed, including normalization, random horizontal flipping, random cropping and random erasing. The preprocessed picture data set is then sent into the pedestrian re-identification network model;
Step 1.2: The pedestrian re-identification network model is an RRMCA-ResNet50 network, formed by connecting a residual dual-channel attention module (RMCA) in series with each of the 4 improved residual blocks of ResNet-50. The RRMCA-ResNet50 backbone network comprises 1 convolutional layer, 1 pooling layer and 4 residual attention modules (RRMCA). The feature maps extracted by RRMCA_1, RRMCA_3 and RRMCA_4 are passed through three branch networks respectively, forming a multi-branch structure.
In the above pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 2 comprises the following sub-steps:
Step 2.1: A residual dual-channel attention module (RMCA) is introduced into the network structure. Combining the semantic information of the global context and the local context, the module extracts channel attention weights with two branches of different scales: one branch extracts the channel attention weight of the global features through global average pooling followed by point convolution, and the other branch extracts the channel attention weight of the local features directly through point convolution, which captures more detail of the image features. A residual structure is added to the module so that the new feature map is added to the input feature map, which effectively alleviates the global weakening caused by the attention mechanism;
Step 2.2: For the local branch, the input feature H(X) first passes through a 1 × 1 point convolution that reduces the number of channels to 1/r of the original, then through an activation function, and finally through a second 1 × 1 point convolution that raises the dimension so that the number of channels is restored to the original. The channel attention of the local feature, L(X), is calculated as follows:
[Formula for the local channel attention L(X): image not reproduced in this text]
In this formula, the two convolution symbols denote the first and second 1 × 1 point convolutions, B denotes a BatchNorm layer, the activation symbol denotes the activation function, r is the channel scaling ratio, and L(X) denotes the feature vector extracted by the local feature channel. The other branch computes the channel attention of the global feature. It differs from the local branch in that a global average pooling operation is first applied to the input feature. The channel attention of the global feature is calculated as follows:
[Formula for the global channel attention: image not reproduced in this text]
In this formula, the pooling symbol denotes the global average pooling layer, and the result is the feature vector extracted by the global feature channel;
Step 2.3: The local feature vector L(X) and the global feature vector are fused, and an activation function is applied to the result to obtain a weight matrix. The weight matrix is applied to the feature map, and the resulting new feature map is added to the input feature map through the residual structure. The calculation formula is as follows:
[Formula for the residual attention output: image not reproduced in this text]
In the above pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 3 comprises the following sub-steps:
Step 3.1: The shallow feature f1 extracted by the RRMCA_1 module of the RRMCA-ResNet50 network, the intermediate feature f2 extracted by the RRMCA_3 module, and the deep feature f3 extracted by the global network have 256, 1024 and 2048 dimensions respectively. The three feature maps f1, f2 and f3 of different scales are each pooled by a generalized average pooling layer in one of the three branch networks, and the pooled outputs are normalized by a BN layer. The resulting features are f1′, f2′ and f3′, all 2048-dimensional;
Step 3.2: The normalized features f1′, f2′ and f3′ from step 3.1 are concatenated, performing multi-scale feature fusion by channel stacking: f′ = [f1′, f2′, f3′]. Here f1′ is a low-level feature that contains more detail of the image, while f2′ and f3′ are high-level features with stronger semantic information. Representing pedestrians with fused high-level and low-level features effectively reduces the information loss caused by convolution and pooling and enhances the discriminability of the features.
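The channel-stacking fusion of step 3.2 can be illustrated with a minimal sketch. It assumes each branch has already produced a 2048-dimensional vector, represented here as a plain Python list; the function name `fuse_by_concatenation` and the constant dummy values are hypothetical and only show how the fused dimensionality arises.

```python
def fuse_by_concatenation(branch_features):
    """Concatenate per-branch feature vectors along the channel axis."""
    fused = []
    for feature in branch_features:
        fused.extend(feature)
    return fused


# Dummy 2048-dimensional branch outputs f1', f2', f3' (values are placeholders).
f1 = [0.1] * 2048
f2 = [0.2] * 2048
f3 = [0.3] * 2048

f = fuse_by_concatenation([f1, f2, f3])
print(len(f))  # 3 * 2048 = 6144 channels after stacking
```

Under this reading, the fused descriptor keeps each branch's channels side by side rather than summing them, so low-level detail and high-level semantics remain separately addressable by later layers.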
In the above pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 4 comprises the following:
In the training stage of the model, the triplet loss function and the center loss function are computed on the features extracted after the generalized average pooling layer and optimize the Euclidean distance between image pairs, so that the distance between any target sample and a positive sample is minimized and the distance between any target sample and a negative sample is maximized. The Softmax loss is computed on the features extracted after the fully connected layer and optimizes the cosine distance between image pairs. Jointly optimizing the model with Softmax loss, triplet loss and center loss lets it learn a feature space that is more compact within each class and more separated between classes. The loss function is defined as follows:
[Formula for the joint loss: image not reproduced in this text]
In this formula, the index symbol denotes the i-th branch of the model, which has three branches in total, and the weight symbols denote the weights of the respective loss functions.
In the above pedestrian re-identification method based on residual multi-channel attention multi-feature fusion, step 5 comprises the following:
For the query set and the candidate set contained in a public pedestrian re-identification data set, the Euclidean distance between a specified object in the query set and each object in the candidate set is calculated, and the calculated distances are then sorted in ascending order to obtain the ranking result of the pedestrian re-identification. Performance indexes such as the first hit rate (Rank-1) and the mean average precision (mAP) are used to judge how well the model has been trained.
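The retrieval in step 5 amounts to a nearest-neighbour ranking. The sketch below is one minimal way to express it, assuming features are plain lists of floats; the names `euclidean_distance` and `rank_gallery` and the toy two-dimensional features are illustrative assumptions, not part of the described method.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query_feature, gallery_features):
    """Return gallery indices sorted by ascending distance to the query."""
    distances = [euclidean_distance(query_feature, g) for g in gallery_features]
    order = sorted(range(len(gallery_features)), key=lambda i: distances[i])
    return order, distances


# Toy 2-D features; real features would be the fused network output.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0], [-1.0, 0.0]]
order, dists = rank_gallery(query, gallery)
print(order)  # closest candidate first
```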
The invention has the following advantages and effects:
In the pedestrian re-identification method based on the residual dual-channel attention mechanism and multi-scale feature fusion, the four residual structures of an improved ResNet50 are first each connected in series with a lightweight residual dual-channel attention module to form a new residual module (RRMCA). On the one hand, the global channel extracts discriminative information from the input features; on the other hand, the local channel extracts local fine-grained features. The re-identification task thus focuses on key information about the pedestrian, and the interference of irrelevant information is reduced. Second, through multi-scale feature fusion, fine-grained features of the pedestrian's body are obtained from local features with a small receptive field, and global coarse-grained information is obtained from global features with a large receptive field. Finally, the model is trained with three loss functions combined with the multi-branch network, so that it converges to a feature space that is compact within classes and separated between classes, which aids model optimization and enhances generalization. Experimental results from training and testing on the Market1501 and DukeMTMC-reID data sets show that the pedestrian re-identification network model designed by the invention achieves higher accuracy.
Drawings
FIG. 1 is a block diagram of a pedestrian re-identification function of the present invention;
FIG. 2 is a diagram of the RRMCA-ResNet50 network model architecture of the present invention;
FIG. 3 is a diagram of a modified residual layer structure according to the present invention;
FIG. 4 is a diagram of the residual two-channel attention network architecture of the present invention;
FIG. 5 is a block diagram of the RRMCA module of the present invention;
FIG. 6 is a flow chart of model training in accordance with the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The invention provides a pedestrian re-identification method based on a residual multi-channel attention mechanism and multi-scale feature fusion. The following technical steps are adopted:
1. The Market1501 and DukeMTMC-reID pedestrian re-identification data sets are used;
2. The experimental computing platform runs the Windows 10 operating system with GPU acceleration. The software environment uses Python 3.8 and the PyTorch deep learning framework to construct the model, and an Adam optimizer is used to optimize the model parameters;
3. The network model introduces a residual attention mechanism on the basis of the ResNet50 residual blocks, giving the RRMCA-ResNet50 pedestrian re-identification network model. Because the ReLU activation function in the Bottleneck structure of ResNet50 outputs zero, with zero gradient, for negative inputs, the model replaces ReLU with the RReLU activation function, which overcomes the vanishing gradients and dead neurons associated with ReLU;
4. To let the network fully learn features of different scales and improve feature effectiveness, the invention extracts the feature maps output by RRMCA_1, RRMCA_3 and RRMCA_4 of the RRMCA-ResNet50 network, which correspond respectively to the shallow texture features, the intermediate transition features and the deep semantic features extracted by the backbone network. The three feature maps form three different network branches, and the features obtained after branch processing are finally fused for prediction. Extracting image features at different scales captures the key information of pedestrians and yields discriminative global features;
5. The preprocessed training data set is used as the input of the network, the pre-trained weights of the network model are loaded, and the model is jointly optimized with Softmax loss, triplet loss and center loss;
6. To measure the performance of the algorithm, two indexes, the first hit rate (Rank-1) and the mean average precision (mAP), are evaluated on the Market1501 and DukeMTMC-reID pedestrian re-identification data sets.
The specific implementation of step 1 is as follows:
The Market1501 data set was collected from 6 cameras on the Tsinghua University campus and contains 32668 pedestrian pictures of 1501 identities. The training set comprises 12936 pictures of 751 identities and the test set comprises 19732 pictures of 750 identities; the identities in the training set and the test set are disjoint. The images of the DukeMTMC-reID data set were collected from 8 different cameras on the Duke University campus. Its training set contains 16522 images of 702 identities, 2228 images of another 702 identities are used as query pictures, and 17661 pictures of 1110 identities are used as the gallery to be searched.
The picture naming format in the data set is: 0001_c1s1_000151_00.jpg;
The size of the input pedestrian picture is 224 × 224, and the number of channels is 3.
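The naming format above can be split into fields with a short parser. This sketch assumes the commonly documented Market-1501 convention, in which the fields are person ID, camera number, sequence number, frame number and bounding-box index; that interpretation and the helper name `parse_picture_name` are assumptions, since the text only gives one example name.

```python
import re

# Assumed layout: <person id>_c<camera>s<sequence>_<frame>_<box>.jpg
_NAME_PATTERN = re.compile(r"^(-?\d+)_c(\d+)s(\d+)_(\d+)_(\d+)\.jpg$", re.IGNORECASE)


def parse_picture_name(filename):
    """Split a picture name into its (assumed) integer fields."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected picture name: {filename!r}")
    person_id, camera, sequence, frame, box = (int(g) for g in match.groups())
    return {
        "person_id": person_id,
        "camera": camera,
        "sequence": sequence,
        "frame": frame,
        "box": box,
    }


info = parse_picture_name("0001_c1s1_000151_00.jpg")
print(info)
```

The person ID gives the identity label for training, and the camera number is typically what an evaluation protocol uses to exclude same-camera matches.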
The specific implementation of step 2 is as follows:
The configuration environment required by the model is imported;
The initial learning rate of Adam is set to 0.0001 and dropout to 0.5. In each iteration, K images of each of P identities are drawn from the pedestrian data set as training samples and sent to the network model, with P = 16 and K = 4. The batch size is set to B = 32 and the number of epochs to 100.
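Drawing P identities with K images each is usually done by an identity-balanced sampler. The following is a minimal sketch of that idea, assuming the data set is described by a list of identity labels indexed by image; the function name `sample_pk_batch`, the fallback to sampling with replacement for identities with fewer than K images, and the synthetic label list are all assumptions for illustration.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, p, k, rng):
    """Pick p identities and k image indices per identity."""
    images_by_identity = defaultdict(list)
    for index, identity in enumerate(labels):
        images_by_identity[identity].append(index)

    chosen_identities = rng.sample(sorted(images_by_identity), p)
    batch = []
    for identity in chosen_identities:
        pool = images_by_identity[identity]
        if len(pool) >= k:
            batch.extend(rng.sample(pool, k))      # without replacement
        else:
            batch.extend(rng.choices(pool, k=k))   # assumed fallback: with replacement
    return batch


# Synthetic labels: 20 identities with 6 images each.
labels = [identity for identity in range(20) for _ in range(6)]
batch = sample_pk_batch(labels, p=16, k=4, rng=random.Random(0))
```

Sampling this way guarantees that every anchor in the batch has at least K − 1 positives, which the triplet loss needs in order to form valid triplets.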
The specific implementation of step 3 is as follows:
The invention designs a residual dual-channel attention module and connects it in series with each of the 4 residual modules of ResNet50, thereby constructing the RRMCA-ResNet50 pedestrian re-identification network model based on the fusion of residual dual-channel attention and multi-scale features. The RRMCA-ResNet50 network comprises a convolutional layer, a pooling layer, four RRMCA modules and three branch networks. While extracting the image features of the target pedestrian, the network selectively strengthens key features and suppresses useless features, which improves the discriminative ability of the network and the expressive power of the model.
The original ResNet50 residual layer uses a ReLU activation function. The invention replaces ReLU with the RReLU function in the residual layer, so that the gradient is not set to zero when x is less than 0 and the effective information on the negative half-axis is kept as far as possible. The mathematical expression for RReLU is:
[Formula (1), RReLU activation: image not reproduced in this text]
where x is the input of the neuron and, in the training phase, the parameter b is drawn from a uniform distribution.
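A scalar sketch of a randomized leaky ReLU may make the behaviour concrete. It assumes that the uniformly distributed parameter acts as the negative-side slope, and it borrows the bounds 1/8 and 1/3 from the defaults of `torch.nn.RReLU`; the patent text does not state the bounds or the exact form, so both are assumptions.

```python
import random


def rrelu(x, lower=1.0 / 8, upper=1.0 / 3, training=True, rng=random):
    """Randomized leaky ReLU for a single value.

    Positive inputs pass through unchanged. Negative inputs are scaled by a
    slope sampled from U(lower, upper) during training, and by the mean of
    the two bounds at inference time (assumed convention).
    """
    if x >= 0:
        return x
    if training:
        slope = rng.uniform(lower, upper)
    else:
        slope = (lower + upper) / 2.0
    return slope * x


print(rrelu(2.0))                     # positive input is unchanged
print(rrelu(-4.0, training=False))    # deterministic slope at inference
print(rrelu(-1.0, rng=random.Random(0)))  # random slope during training
```

Because the slope is never zero, negative inputs still contribute a gradient, which is the property the text relies on when replacing ReLU.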
ResNet50 contains 4 residual blocks, which contain 3, 4, 6 and 3 residual layers respectively. In each residual layer, the input feature X is propagated forward through convolution, normalization and an activation layer to obtain F(X). On the other path, X is added to F(X) through an identity mapping, and the sum passes through an activation layer, whose activation function is RReLU, to give the output feature H(X). In other words, H(X) is the RReLU activation applied to F(X) + X. The formula of H(X) is as follows:
[Formula (2), residual layer output H(X): image not reproduced in this text]
where the activation symbol denotes the RReLU activation function, X is the input feature of the residual layer, F(X) is the output after the first-path linear transformation and activation, and H(X) is the output feature of one residual layer.
The residual dual-channel attention module combines the semantic information of the global context and the local context and extracts channel attention weights with two branches of different scales. One branch extracts the channel attention of the global features through global average pooling followed by point convolution, and the other branch extracts the channel attention of the local features directly through point convolution, which captures more feature detail. A residual structure is added to the module so that the new feature map is added to the input feature map, which effectively alleviates the global weakening caused by the attention mechanism. Point convolution is used in the module instead of convolution kernels of different sizes, keeping the model as lightweight as possible.
For the local branch, the input feature H(X) first passes through a 1 × 1 point convolution that reduces the number of channels to 1/r of the original, then through an activation function, and finally through a second 1 × 1 point convolution that raises the dimension so that the number of channels is restored to the original. The channel attention of the local feature, L(X), is calculated as follows:
[Formula (3), local channel attention L(X): image not reproduced in this text]
In formula (3), the two convolution symbols denote the first and second 1 × 1 point convolutions, B denotes the BatchNorm layer, the activation symbol denotes the activation function, r is the channel scaling ratio, and L(X) denotes the feature vector extracted by the local feature channel. The other branch computes the channel attention of the global feature. Unlike the local branch, it first applies a global average pooling operation to the input feature. The channel attention of the global feature is calculated as follows:
[Formula (4), global channel attention: image not reproduced in this text]
In formula (4), the pooling symbol denotes the global average pooling layer, and the result is the feature vector extracted by the global feature channel.
The local feature vector L(X) and the global feature vector are fused, and an activation function is applied to the result to obtain a weight matrix. The weight matrix is applied to the feature map, and the resulting new feature map is added to the input feature map through the residual structure. The calculation formula is as follows:
[Formula (5), residual attention output: image not reproduced in this text]
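Because formulas (3) to (5) survive only as image references, the following is a hedged reconstruction from the verbal description, not the patent's own notation. It assumes the names PWConv₁ and PWConv₂ for the two point convolutions, δ for the inner activation, GAP for global average pooling, G(X) for the global branch output, σ for the activation that produces the weight matrix, ⊕ for broadcast addition and ⊗ for element-wise multiplication. The placement of the BatchNorm layers B is also assumed. Here X stands for the module input, which the text calls H(X) when it comes from the preceding residual block.

```latex
\begin{aligned}
L(X) &= B\Big(\mathrm{PWConv}_2\big(\delta\big(B(\mathrm{PWConv}_1(X))\big)\big)\Big), \\
G(X) &= B\Big(\mathrm{PWConv}_2\big(\delta\big(B(\mathrm{PWConv}_1(\mathrm{GAP}(X)))\big)\big)\Big), \\
X'   &= X \otimes \sigma\big(L(X) \oplus G(X)\big) + X .
\end{aligned}
```

Under these assumptions, PWConv₁ maps C channels to C/r and PWConv₂ maps C/r back to C, so L(X) keeps the spatial size of X while G(X) has one value per channel and is broadcast over all spatial positions. The trailing "+ X" is the residual connection that the text credits with countering global weakening.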
The residual dual-channel attention module designed by the invention is lightweight, and it improves both the accuracy and the generalization ability of the model.
The specific implementation of step 4 is as follows:
Feature map information obtained by average pooling or max pooling may be incomplete. To avoid this as far as possible, the feature maps in the branch networks are pooled with generalized mean pooling, after which the BN operation and feature fusion are performed in sequence. Generalized mean pooling is calculated as follows:
[Formula (6), generalized mean pooling: image not reproduced in this text]
where k indexes the channels of the feature map, and the two remaining symbols denote the feature of the k-th channel and the pooling parameter, respectively.
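Generalized mean pooling can be sketched for a single channel as the p-th root of the mean of the p-th powers of its activations. The sketch below follows that standard definition; the default p = 3, the small clamp `eps` that keeps inputs positive, and the function names are assumptions rather than values given in the text.

```python
def gem_pool(channel_values, p=3.0, eps=1e-6):
    """Generalized mean of one channel's activations.

    p = 1 reduces to average pooling; large p approaches max pooling.
    """
    clamped = [max(v, eps) for v in channel_values]  # keep values positive
    mean_of_powers = sum(v ** p for v in clamped) / len(clamped)
    return mean_of_powers ** (1.0 / p)


def gem_pool_feature_map(feature_map, p=3.0):
    """Pool each channel (a flat list of activations) to a single value."""
    return [gem_pool(channel, p) for channel in feature_map]


activations = [1.0, 2.0, 3.0, 4.0]
print(gem_pool(activations, p=1.0))   # behaves like average pooling
print(gem_pool(activations, p=3.0))   # between the mean and the max
print(gem_pool(activations, p=50.0))  # close to the max
```

This interpolation between average and max pooling is why the pooled feature is less likely to lose information than either extreme on its own.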
The specific implementation of step 5 is as follows:
The model is jointly optimized with Softmax loss, triplet loss and center loss. The loss function is defined as follows:
[Formula (7), joint loss: image not reproduced in this text]
where the index symbol denotes the i-th branch of the model, which has three branches in total, and the weight symbols denote the weights of the respective loss functions.
The Softmax loss function is expressed as follows:
[Formula (8), Softmax loss: image not reproduced in this text]
where the symbols denote, in order: the batch size, namely 32; the number of classifications; the number of categories; the feature vector of the i-th sample in its class; the weight; and the bias. The triplet loss function is expressed as follows:
[Formula (9), triplet loss: image not reproduced in this text]
where the three feature symbols denote the features of the anchor, the positive sample and the negative sample extracted by the model, and the remaining symbol denotes the margin of the triplet loss. The center loss function is expressed as follows:
[Formula (10), center loss: image not reproduced in this text]
where the center symbol denotes the cluster center of the deep feature. The center loss compensates for the inability of the triplet loss to provide a globally optimal constraint.
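The three losses and their weighted combination can be sketched for single samples using their standard textbook forms. The margin of 0.3, the center-loss weight of 0.0005, the per-sample formulation and all function names are assumptions chosen for illustration; the patent's own formulas (7) to (10) are not reproduced in the text.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class, computed stably."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    d_ap = math.sqrt(squared_distance(anchor, positive))
    d_an = math.sqrt(squared_distance(anchor, negative))
    return max(d_ap - d_an + margin, 0.0)


def center_loss(feature, class_center):
    """Half the squared distance from a feature to its class center."""
    return 0.5 * squared_distance(feature, class_center)


def joint_loss(branch_losses, w_softmax=1.0, w_triplet=1.0, w_center=0.0005):
    """Weighted sum over branches of (softmax, triplet, center) losses."""
    return sum(
        w_softmax * ls + w_triplet * lt + w_center * lc
        for ls, lt, lc in branch_losses
    )


# One toy sample per loss, repeated for three branches.
ls = softmax_cross_entropy([2.0, 0.5, -1.0], label=0)
lt = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.2])
lc = center_loss([1.0, 2.0], [0.0, 0.0])
total = joint_loss([(ls, lt, lc)] * 3)
print(total)
```

The triplet term only constrains relative distances within a batch, whereas the center term pulls every feature toward a per-class center, which is the global constraint the text refers to.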
The specific implementation of step 6 is as follows:
The preprocessed training set and test set are sent into the network for model training and testing respectively;
The pedestrian re-identification algorithm aims to find, among pedestrian images captured by different cameras, the pedestrian most similar to the query target, and can be regarded as a ranking problem. Rank-1 is the proportion of queries for which the first picture in the sorted list has the same identity as the query picture.
The mAP is obtained by summing the average precision values and taking their mean. The calculation formula is as follows:
[Formula (11), mean average precision: image not reproduced in this text]
where the first symbol denotes the average precision of a class and the second symbol denotes the total number of classes.
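Rank-1 and mAP can both be computed from ranked match lists. In this sketch each query is represented by a list of booleans, ordered by ascending distance, that mark which gallery items share the query's identity. Averaging over queries, as is common in re-identification evaluation, is an assumption here, and the function names and toy lists are illustrative.

```python
def average_precision(ranked_matches):
    """Mean of the precision values at each rank where a match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank1_and_map(all_ranked_matches):
    """Rank-1 accuracy and mean average precision over all queries."""
    n = len(all_ranked_matches)
    rank1 = sum(1 for m in all_ranked_matches if m and m[0]) / n
    mean_ap = sum(average_precision(m) for m in all_ranked_matches) / n
    return rank1, mean_ap


queries = [
    [True, False, True, False],   # matches at ranks 1 and 3
    [False, True, False, False],  # single match at rank 2
]
rank1, mean_ap = rank1_and_map(queries)
print(rank1, mean_ap)
```

Rank-1 only looks at the top result, while mAP rewards placing every correct gallery image early in the list, so the two indexes complement each other.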
The invention provides a pedestrian re-identification method based on residual dual-channel attention and multi-scale feature fusion, which addresses the problems of pedestrian pose change and of global weakening caused by the attention mechanism in pedestrian re-identification. A complete pedestrian re-identification task requires the following steps: (1) inputting a pedestrian data set; (2) preprocessing the input data set pictures; (3) extracting image features through the feature extraction module of the network; (4) fusing the feature maps of different scales in the branch structure; (5) optimizing the model through the loss function; (6) identifying the target pedestrian image and outputting the result. The pedestrian re-identification function modules are shown in FIG. 1. The network architecture of each functional module is described in detail below on the basis of a complete recognition task.
A data set input module: 32 pedestrian images are randomly sampled from the pedestrian data set at a time as training samples and sent to the training network.
An image preprocessing module: before image feature extraction, the images must be preprocessed, i.e. augmented. The operations include normalization, random horizontal flipping, random cropping, and random erasing. The probability of randomly flipping each image horizontally is set to 0.5. Each image is decoded to 32-bit floating-point raw pixel values in [0, 1], and the data is normalized by subtracting 0.485, 0.456 and 0.406 and dividing by 0.229, 0.224 and 0.225, respectively, to speed up model convergence. After the data set has been preprocessed, the model is built and the running environment is configured: PyCharm is used as the integrated development environment of the project, the model is built with the PyTorch deep learning framework, a conda environment is imported into the project, and a GPU is used for acceleration. Once the environment is configured, the parameters are set and the model is built.
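The normalization and flipping steps of the preprocessing module can be sketched in plain Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the channel order is assumed to be R, G, B, and random cropping and random erasing are left out because the text gives no parameters for them.

```python
import random

# Per-channel constants quoted in the text; R, G, B order is an assumption.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize(image):
    """Normalize an H x W x 3 nested list whose values are floats in [0, 1]."""
    return [[[(px[c] - MEAN[c]) / STD[c] for c in range(3)] for px in row]
            for row in image]


def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror the image left to right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


def preprocess(image, rng=random):
    """Flip with probability 0.5 (as in the text), then normalize."""
    return normalize(random_horizontal_flip(image, p=0.5, rng=rng))
```

For reproducible experiments a seeded generator can be passed in, for example `preprocess(img, rng=random.Random(0))`.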
A feature extraction module: the pedestrian re-identification network model designed by the invention is the RRMCA-ResNet50 network, a network model formed by connecting a residual double-channel attention module (RMCA) in series with each of the 4 improved residual blocks in ResNet-50. The RRMCA-ResNet50 backbone network includes 1 convolutional layer, 1 pooling layer, and 4 residual attention modules (RRMCA); the network model architecture is shown in fig. 2. One residual layer of a residual block in the improved ResNet-50 is shown in fig. 3. The structure of the residual double-channel attention network designed by the invention is shown in fig. 4. The RRMCA module obtained by connecting the residual double-channel attention module in series with the improved residual block is shown in fig. 5. The pedestrian images output by the image preprocessing module are fed into the network model, and the salient features of the images are extracted by passing sequentially through the convolutional layer, the pooling layer and the four RRMCA modules.
A feature fusion module: the feature maps output by RRMCA_1, RRMCA_3 and RRMCA_4 of the RRMCA-ResNet50 network are extracted and used to form three separate network branches, and the features obtained from the branches are finally fused for prediction. The feature maps output by the RRMCA_1, RRMCA_3 and RRMCA_4 modules have 256, 1024 and 2048 dimensions respectively, and a 2048-dimensional feature is obtained in each branch through a generalized average pooling (GeM) layer. A BN layer is then added, which facilitates optimization of the loss function. In the model testing stage, the feature vectors $f_1'$, $f_2'$ and $f_3'$ are fused by channel stacking, i.e. $f' = [f_1', f_2', f_3']$.
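A minimal sketch of the pooling and fusion steps follows, assuming the common generalized-mean form of the pooling layer with exponent p = 3 (the exponent is not given in this passage). The function names are hypothetical. The passage does not say how the 256- and 1024-channel maps are brought to 2048 dimensions, and the BN layer is omitted, so the sketch simply pools whatever channels it receives and concatenates the results.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over spatial positions.

    feature_map: list of channels, each a flat list of H*W activations.
    p=3.0 and eps are assumed defaults, not values from the source.
    p=1 gives average pooling; large p approaches max pooling.
    """
    pooled = []
    for channel in feature_map:
        clamped = [max(v, eps) for v in channel]  # keep the power well defined
        mean_pow = sum(v ** p for v in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


def fuse(f1, f2, f3):
    """Channel stacking: f' = [f1', f2', f3']."""
    return list(f1) + list(f2) + list(f3)
```

With three 2048-dimensional branch outputs, `fuse` would return a 6144-dimensional descriptor for retrieval.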
A loss calculating module: in the training phase of the model, the features extracted by the GeM pooling layer are used to calculate the triplet loss and the center loss, and the features extracted by the fully connected layer are used to calculate the Softmax loss. Whether the feature extraction module has converged is judged from the output of the loss function: if so, the feature vector is sent to the identification module; if not, the loss is back-propagated and the attention network parameters are updated until convergence, yielding the model weights that meet the requirements of the system. The training process of the model is shown in fig. 6.
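The joint objective for a single branch can be sketched as below. The margin of 0.3 and the loss weights are illustrative assumptions, not values from the source, and all function names are hypothetical. The triplet and center terms take the pooled feature, while the Softmax term takes the logits of the fully connected layer, as described above.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softmax_loss(logits, label):
    """Cross-entropy of softmax(logits) against an integer identity label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull the positive closer than the negative by at least `margin` (assumed value)."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def center_loss(feature, center):
    """Half the squared distance between a feature and its class center."""
    return 0.5 * sum((x - c) ** 2 for x, c in zip(feature, center))


def joint_loss(logits, label, anchor, positive, negative, center,
               weights=(1.0, 1.0, 0.0005)):
    """Weighted sum of the three losses for one branch; weights are assumptions."""
    w_softmax, w_triplet, w_center = weights
    return (w_softmax * softmax_loss(logits, label)
            + w_triplet * triplet_loss(anchor, positive, negative)
            + w_center * center_loss(anchor, center))
```

Summing `joint_loss` over the three branches would give the total training loss.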
A test module: after the model has been trained through the above modules, the training result needs to be tested. In the test module, the Euclidean distance between the specified object in the query set and each object in the candidate set is calculated, and the distances are then sorted in ascending order to obtain the ranking result of pedestrian re-identification. Model performance is generally judged by indexes such as the first hit rate (Rank-1) and the mean average precision (mAP).
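The ranking and the two metrics can be sketched as follows. This is a simplified illustration with hypothetical function names: AP is averaged over queries, which is one common reading of the averaging in formula (11), and the same-camera filtering used by some benchmark protocols is not modeled.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query_feat, gallery_feats):
    """Candidate indices sorted by ascending Euclidean distance to the query."""
    return sorted(range(len(gallery_feats)),
                  key=lambda i: euclidean(query_feat, gallery_feats[i]))


def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each rank where a correct match occurs."""
    hits, precisions = 0, []
    for rank, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def evaluate(query_feats, query_ids, gallery_feats, gallery_ids):
    """Return (Rank-1, mAP) over all queries."""
    rank1_hits, aps = 0, []
    for feat, qid in zip(query_feats, query_ids):
        ranked_ids = [gallery_ids[i] for i in rank_gallery(feat, gallery_feats)]
        rank1_hits += int(ranked_ids[0] == qid)
        aps.append(average_precision(ranked_ids, qid))
    n = len(query_ids)
    return rank1_hits / n, sum(aps) / n
```

Here `query_feats` and `gallery_feats` would be the fused descriptors produced in the testing stage.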

Claims (6)

1. A pedestrian re-identification method based on residual multi-channel attention multi-feature fusion is characterized by comprising the following steps of:
step 1: aiming at the input data set, extracting image features by utilizing the constructed pedestrian re-identification network model;
step 2: constructing a residual double-channel attention module to extract the discriminative characteristics of the pedestrian image;
step 3: fusing multi-scale features;
step 4: performing joint training on the pedestrian re-identification network model of step 1 using a Softmax loss function, a triplet loss function and a center loss function to obtain the optimal parameters of the pedestrian re-identification network model;
step 5: for the query set and the candidate set contained in a public pedestrian re-identification data set, calculating the Euclidean distance between the specified object in the query set and each object in the candidate set, and then sorting the calculated distances in ascending order to obtain the ranking result of pedestrian re-identification.
2. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 1 comprises the following sub-steps:
step 1.1: in order to enrich the diversity of the data and improve the generalization capability of the model, preprocessing operations including normalization, random horizontal flipping, random cropping and random erasing are first performed on the image data set input to the model; the preprocessed image data set is then sent into the pedestrian re-identification network model;
step 1.2: the pedestrian re-identification network model is the RRMCA-ResNet50 network, a network model formed by connecting a residual double-channel attention module (RMCA) in series with each of the 4 improved residual blocks in ResNet-50; the RRMCA-ResNet50 backbone network comprises 1 convolutional layer, 1 pooling layer and 4 residual attention modules (RRMCA), wherein the feature maps extracted by RRMCA_1, RRMCA_3 and RRMCA_4 of the network model form a multi-branch structure through three branch networks respectively.
3. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 2 comprises the following sub-steps:
step 2.1: a residual double-channel attention module (RMCA) is introduced into the network structure, combining the semantic information of the global context and the local context; two branches of different scales are used to extract channel attention weights: one branch extracts the channel attention weights of the global features through global average pooling followed by point convolution, and the other branch extracts the channel attention weights of the local features directly through point convolution, so as to obtain more detailed information of the image features; a residual structure is added to the module, and the resulting new feature map is added to the input feature map, which effectively alleviates the global weakening caused by the attention mechanism;
step 2.2: for the local branch, the input feature vector $H(x)$, denoted $X$ below, first passes through a 1 × 1 point convolution that reduces the number of channels to 1/r of the original, then through an activation function, and finally through a 1 × 1 point convolution that raises the dimension so that the number of channels is restored to the original; the channel attention of the local feature $L(X)$ is calculated as follows:
$$L(X) = B\left(\mathrm{PWConv}_2\left(\delta\left(B\left(\mathrm{PWConv}_1(X)\right)\right)\right)\right)$$
where $\mathrm{PWConv}_1$ represents a 1 × 1 point convolution, $B$ represents the BatchNorm layer, $\delta$ represents the activation function, $\mathrm{PWConv}_2$ represents a 1 × 1 point convolution, $r$ is the channel scaling ratio, and $L(X)$ represents the feature vector extracted by the local feature channel; the other branch passes the input feature through the channel attention of the global feature and, compared with the local branch, first performs a global average pooling operation on the input feature; the channel attention of the global feature $G(X)$ is calculated as follows:
$$G(X) = B\left(\mathrm{PWConv}_2\left(\delta\left(B\left(\mathrm{PWConv}_1(\mathrm{GAP}(X))\right)\right)\right)\right)$$
where $\mathrm{GAP}$ represents the global average pooling layer and $G(X)$ represents the feature vector extracted by the global feature channel;
step 2.3: the obtained feature vectors $L(X)$ and $G(X)$ are fused and activated by an activation function to obtain a weight matrix, the weight matrix is applied to the feature map, and the resulting new feature map is added to the input feature map through the residual structure, calculated according to the following formula:
$$X' = X \otimes \sigma\left(L(X) \oplus G(X)\right) + X$$
where $\oplus$ denotes broadcast addition, $\sigma$ the activation function that produces the weight matrix, and $\otimes$ element-wise multiplication.
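The data flow of steps 2.1 to 2.3 can be sketched in plain Python as below. It rests on several assumptions: the BatchNorm layers are omitted for brevity, ReLU is assumed for the inner activation and a sigmoid for the gating activation (the text only says "activation function"), and all names are hypothetical. A feature map is a list of C channels, each a flat list of N spatial positions, and a 1 × 1 point convolution is a C_out × C_in weight matrix applied at every position.

```python
import math


def pointwise_conv(x, weight):
    """1 x 1 convolution: mixes channels independently at each position."""
    n = len(x[0])
    return [[sum(w * x[c][pos] for c, w in enumerate(row)) for pos in range(n)]
            for row in weight]


def relu(x):
    return [[max(v, 0.0) for v in ch] for ch in x]


def global_avg_pool(x):
    """Collapse each channel to a single averaged value (C x 1)."""
    return [[sum(ch) / len(ch)] for ch in x]


def bottleneck(x, w_down, w_up):
    """Reduce channels to C/r, activate, then restore to C channels."""
    return pointwise_conv(relu(pointwise_conv(x, w_down)), w_up)


def residual_dual_channel_attention(x, local_w, global_w):
    """local_w and global_w are (w_down, w_up) weight pairs for each branch."""
    local = bottleneck(x, *local_w)                    # C x N local weights
    glob = bottleneck(global_avg_pool(x), *global_w)   # C x 1 global weights
    out = []
    for c, ch in enumerate(x):
        row = []
        for pos, v in enumerate(ch):
            # Fuse both branches (global term broadcast over positions), then gate.
            gate = 1.0 / (1.0 + math.exp(-(local[c][pos] + glob[c][0])))
            row.append(v * gate + v)  # reweighted feature plus residual input
        out.append(row)
    return out
```

Because of the residual addition, the output never falls below the input in magnitude even when the gate is close to zero, which is the mechanism the claim credits with reducing global weakening.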
4. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 3 comprises the following sub-steps:
step 3.1: in the RRMCA-ResNet50 network, the shallow feature $f_1$ extracted by the RRMCA_1 module, the intermediate feature $f_2$ extracted by the RRMCA_3 module and the deep feature $f_3$ extracted through the global network have 256, 1024 and 2048 dimensions respectively; the three feature maps of different scales $f_1$, $f_2$ and $f_3$ are pooled in the three branch networks through a generalized average pooling layer and then normalized through a BN layer, and the output features obtained in the three branch networks, $f_1'$, $f_2'$ and $f_3'$, are all 2048-dimensional;
step 3.2: the features $f_1'$, $f_2'$ and $f_3'$ obtained after normalization in step 3.1 are concatenated, i.e. multi-scale feature fusion is performed by channel stacking: $f' = [f_1', f_2', f_3']$;
where $f_1'$ is a low-level feature containing more detailed information of the image, and $f_2'$ and $f_3'$ are high-level features carrying stronger semantic information; fusing the high-level and low-level features to represent a pedestrian effectively reduces the information loss caused by convolution and pooling and enhances the discriminability of the features.
5. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 4 comprises the following steps:
in the training stage of the model, the triplet loss function and the center loss function are calculated on the features extracted after the generalized average pooling layer and optimize the Euclidean distance between image pairs, so that the distance between any target sample and its positive samples is minimized and the distance between it and its negative samples is maximized; the Softmax loss is calculated on the features extracted after the fully connected layer and optimizes the cosine distance between image pairs; the Softmax loss, the triplet loss and the center loss jointly optimize the model so that it learns a feature space that is more compact within classes and more separated between classes; the loss function is defined as follows:
$$L = \sum_{i=1}^{3}\left(\lambda_1 L_{\mathrm{softmax}}^{i} + \lambda_2 L_{\mathrm{triplet}}^{i} + \lambda_3 L_{\mathrm{center}}^{i}\right)$$
where $i$ denotes the $i$-th branch in the model, of which there are three in total, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ represent the weights of the respective loss functions.
6. The pedestrian re-identification method based on residual multi-channel attention multi-feature fusion as claimed in claim 1, wherein the step 5 comprises the following steps:
for the query set and the candidate set contained in a public pedestrian re-identification data set, calculating the Euclidean distance between the specified object in the query set and each object in the candidate set, and then sorting the calculated distances in ascending order to obtain the ranking result of pedestrian re-identification; and evaluating the trained model using performance indexes such as the first hit rate (Rank-1) and the mean average precision (mAP).
CN202211481147.8A 2022-11-24 2022-11-24 Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion Pending CN115830531A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211481147.8A CN115830531A (en) 2022-11-24 2022-11-24 Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211481147.8A CN115830531A (en) 2022-11-24 2022-11-24 Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion

Publications (1)

Publication Number Publication Date
CN115830531A true CN115830531A (en) 2023-03-21

Family

ID=85531028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211481147.8A Pending CN115830531A (en) 2022-11-24 2022-11-24 Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion

Country Status (1)

Country Link
CN (1) CN115830531A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580428A (en) * 2023-07-11 2023-08-11 中国民用航空总局第二研究所 Pedestrian re-recognition method based on multi-scale channel attention mechanism
CN116665019A (en) * 2023-07-31 2023-08-29 山东交通学院 Multi-axis interaction multi-dimensional attention network for vehicle re-identification
CN116665019B (en) * 2023-07-31 2023-09-29 山东交通学院 Multi-axis interaction multi-dimensional attention network for vehicle re-identification
CN116894802A (en) * 2023-09-11 2023-10-17 苏州思谋智能科技有限公司 Image enhancement method, device, computer equipment and storage medium
CN116894802B (en) * 2023-09-11 2023-12-15 苏州思谋智能科技有限公司 Image enhancement method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN109543606B (en) Human face recognition method with attention mechanism
CN109657584B (en) Improved LeNet-5 fusion network traffic sign identification method for assisting driving
CN110084151B (en) Video abnormal behavior discrimination method based on non-local network deep learning
CN110414432A (en) Training method, object identifying method and the corresponding device of Object identifying model
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN115830531A (en) Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion
CN111414862B (en) Expression recognition method based on neural network fusion key point angle change
CN111652293B (en) Vehicle weight recognition method for multi-task joint discrimination learning
CN106415594A (en) A method and a system for face verification
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN111832615A (en) Sample expansion method and system based on foreground and background feature fusion
CN115995018A (en) Long tail distribution visual classification method based on sample perception distillation
CN115035361A (en) Target detection method and system based on attention mechanism and feature cross fusion
Cai et al. Multi-target pan-class intrinsic relevance driven model for improving semantic segmentation in autonomous driving
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN112163490A (en) Target detection method based on scene picture
CN116152658A (en) Forest fire smoke detection method based on domain countermeasure feature fusion network
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
Li et al. Efficient detection in aerial images for resource-limited satellites
CN117011883A (en) Pedestrian re-recognition method based on pyramid convolution and transducer double branches
CN111898400A (en) Fingerprint activity detection method based on multi-modal feature fusion
CN115331135A (en) Method for detecting Deepfake video based on multi-domain characteristic region standard score difference
CN109871835B (en) Face recognition method based on mutual exclusion regularization technology
Pang et al. Target tracking based on siamese convolution neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination