CN112836637B - Pedestrian re-identification method based on space reverse attention network - Google Patents
Classifications
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The invention discloses a pedestrian re-identification method based on a space reverse attention network, which comprises the following steps: collecting the shot pictures and dividing the pictures into a training set and a testing set; constructing a space reverse attention network model based on Resnet-50, training the convolutional neural network according to the training set, and adding CBAM-Pro; dividing the network into two branches according to the added CBAM-Pro, simultaneously executing forward learning and reverse attention, and extracting forward and reverse global features and local features; and connecting the extracted features according to the channel dimensions to obtain pedestrian identification features containing various types of features, and performing re-identification verification on the pedestrian identification features by using the test set to complete re-identification of pedestrians. The invention extracts various types of pedestrian identification features based on the space reverse attention network, and improves the effectiveness and reliability of re-identification.
Description
Technical Field
The invention relates to the technical field of intelligent security, in particular to a pedestrian re-identification method based on a space reverse attention network.
Background
Pedestrian re-identification is in great demand in the field of intelligent security. It aims to associate the same pedestrian across different times and places: given a query picture of a pedestrian, a trained model extracts the features of the query picture and of the pictures in the image library, and the library pictures are sorted by feature similarity so as to retrieve images of that pedestrian. In recent years the task of pedestrian re-identification has developed greatly, but because pedestrian images captured in open outdoor environments vary widely under interference such as pose, occlusion, clothing, background clutter and camera viewing angle, it remains a very challenging task.
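The retrieval step described above can be sketched in a few lines; the following is a minimal NumPy illustration of ranking an image library by feature similarity, where the function name and the choice of cosine similarity are illustrative assumptions rather than details taken from the patent:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery pictures by cosine similarity to the query feature,
    most similar first (illustrative sketch of the re-ID retrieval step)."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                 # cosine similarity of each gallery picture
    return np.argsort(-sims)     # gallery indices by descending similarity
```

In a real system the features would come from the trained model; here any fixed-length vectors work.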
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, the technical problem solved by the invention is as follows: the prior art cannot extract abundant pedestrian features, so that higher re-identification accuracy cannot be obtained.
In order to solve the technical problems, the invention provides the following technical scheme: collecting the shot pictures and dividing the pictures into a training set and a testing set; constructing a space reverse attention network model based on Resnet-50, training the convolutional neural network according to the training set, and adding CBAM-Pro; dividing the network into two branches according to the added CBAM-Pro, simultaneously executing forward learning and reverse attention, and extracting forward and reverse global features and local features; and connecting the extracted features according to the channel dimensions to obtain pedestrian identification features containing various types of features, and performing re-identification verification on the pedestrian identification features by using the test set to complete re-identification of pedestrians.
As a preferable scheme of the pedestrian re-identification method based on the space reverse attention network, the method comprises the following steps: the forward learning and reverse attention executing process comprises the steps that after the network passes through CBAM-Pro, the network is divided into two branches, one branch is normally trained, namely forward learning is carried out, then a gradient-guided space attention is utilized to obtain a reverse mask, and therefore reverse attention is carried out on the other branch.
As a preferable scheme of the pedestrian re-identification method based on the space reverse attention network described in the present invention: the generation of the spatial attention map comprises, given a feature map F ∈ R^(C×H×W) and its back-propagated gradient tensor G ∈ R^(C×H×W), where C is the number of channels of the feature map and H×W represents its size, first generating a weight vector W ∈ R^(C×1) from G using global average pooling, and then calculating the gradient-guided spatial attention map.
As a preferable scheme of the pedestrian re-identification method based on the space reverse attention network, the method comprises the following steps: the spatial attention map further comprises,
M = Σ_{i=1}^C w_i F^(i)
wherein w_i is the i-th element of W and F^(i) is the subgraph of the i-th channel of the feature map F; high values of M characterize the more interesting positions in the feature map.
As a preferable scheme of the pedestrian re-identification method based on the space reverse attention network, the method comprises the following steps: the spatial reverse mask comprises,
a_i = 0 if m_i > T, and a_i = 1 otherwise
wherein a_i and m_i denote the elements at pixel position i of A and M, respectively, and T denotes the set spatial attention threshold.
As a preferable scheme of the pedestrian re-identification method based on the space reverse attention network, the method comprises the following steps: the overall loss function for training the convolutional neural network includes,
L = β1·L_softmax + β2·L_triplet
wherein L_softmax represents the sum of the cross-entropy losses of all features, L_triplet represents the sum of the triplet losses, and β1, β2 are balancing parameters, set in the experiments to β1 = 2, β2 = 1.
As a preferable scheme of the pedestrian re-identification method based on the space reverse attention network, the method comprises the following steps: the sum of the cross-entropy losses of all the features includes,
L_softmax = -Σ_{i=1}^N log( exp(W_{y_i}^T f_i) / Σ_{c=1}^C exp(W_c^T f_i) )
wherein C represents the number of classes in the data set, W_c represents the weight vector of the corresponding class, N represents the experimental batch size, and f_i represents the features in each batch.
As a preferable scheme of the pedestrian re-identification method based on the space reverse attention network, the method comprises the following steps: the sum of the triplet losses comprises,
L_triplet = Σ max( α + d(f_a, f_p) - d(f_a, f_n), 0 )
wherein f_a, f_p and f_n respectively represent the features of the anchor identity, the positive sample and the negative sample, d(·,·) denotes the feature distance, and α represents the margin parameter of the triplet loss.
As a preferable scheme of the pedestrian re-identification method based on the space reverse attention network, the method comprises the following steps: CBAM-Pro denotes the improved convolutional block attention model, in which the efficient channel attention module of ECANet is used to improve CBAM; the channel weight feature vector comprises,
w = σ(C1D_k(C_avg) + C1D_k(C_max))
wherein σ represents the Sigmoid activation function and C1D_k represents a one-dimensional convolution operation with convolution kernel size k.
The invention has the beneficial effects that: the invention extracts various pedestrian identification characteristics based on the space reverse attention network, and improves the effectiveness and reliability of re-identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor. Wherein:
fig. 1 is a schematic basic flow chart of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a spatial reverse attention network model of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an experimental result of a channel attention neighborhood parameter k of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, the references herein to "one embodiment" or "an embodiment" refer to a particular feature, structure, or characteristic that may be included in at least one implementation of the present invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected" and "connected" in the present invention are to be construed broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
The pedestrian re-identification task aims to find the same pedestrian under different cameras, and although the development of deep learning brings great improvement to the pedestrian re-identification, the pedestrian re-identification task is still a challenging task. In recent years, attention mechanisms are widely verified to have excellent effects on the task of re-identifying pedestrians, but the combined use effects of different types of attention mechanisms (such as space attention, self-attention and the like) still need to be explored.
Referring to fig. 1 to 3, an embodiment of the present invention provides a pedestrian re-identification method based on a spatial reverse attention network, including:
s1: collecting the shot pictures and dividing the pictures into a training set and a testing set;
s2: constructing a space reverse attention network model based on Resnet-50, training a convolutional neural network according to a training set, and adding CBAM-Pro;
It should be noted that, the overall loss function for training the convolutional neural network includes,
L = β1·L_softmax + β2·L_triplet
wherein L_softmax represents the sum of the cross-entropy losses of all features, L_triplet represents the sum of the triplet losses, and β1, β2 are balancing parameters, set in the experiments to β1 = 2, β2 = 1.
Wherein the sum of the cross-entropy losses of all features comprises,
L_softmax = -Σ_{i=1}^N log( exp(W_{y_i}^T f_i) / Σ_{c=1}^C exp(W_c^T f_i) )
wherein C represents the number of classes in the data set, W_c represents the weight vector of the corresponding class, y_i is the label of f_i, N represents the experimental batch size, and f_i represents the features in each batch.
The sum of the triplet losses comprises,
L_triplet = Σ max( α + d(f_a, f_p) - d(f_a, f_n), 0 )
wherein f_a, f_p and f_n respectively represent the features of the anchor identity, the positive sample and the negative sample, d(·,·) denotes the feature distance, and α represents the margin parameter of the triplet loss.
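The combined loss above can be sketched as follows. This is a minimal NumPy version under the stated settings (β1 = 2, β2 = 1, triplet margin 1.2); the Euclidean distance and batch averaging are assumptions where the patent does not spell them out, and in practice the same computation would be done with autograd-enabled tensors:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Softmax cross-entropy averaged over the batch (numerically stable)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def triplet_loss(anchor, positive, negative, margin=1.2):
    """Hinge triplet loss with Euclidean distances; margin alpha = 1.2 as in the experiments."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(margin + d_ap - d_an, 0.0).mean()

def total_loss(logits, labels, anchor, positive, negative, beta1=2.0, beta2=1.0):
    """L = beta1 * L_softmax + beta2 * L_triplet with beta1 = 2, beta2 = 1."""
    return beta1 * cross_entropy(logits, labels) + beta2 * triplet_loss(anchor, positive, negative)
```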
Further, CBAM-Pro represents the improved convolutional block attention model, in which CBAM is improved by the efficient channel attention module of ECANet; the channel weight feature vector comprises,
w = σ(C1D_k(C_avg) + C1D_k(C_max))
wherein σ represents the Sigmoid activation function and C1D_k represents a one-dimensional convolution operation with convolution kernel size k.
Specifically, the invention obtains the improved convolutional block attention model CBAM-Pro by improving CBAM. Since CBAM considers both channel and spatial attention, the operation in the spatial dimension still follows CBAM, and the improvement focuses on CBAM's channel attention module. The multilayer perceptron used by CBAM for channel attention is in fact a Squeeze-and-Excitation module that distributes attention weights through two fully connected layers. Considering that ECANet has shown the Squeeze operation to have a negative effect on the prediction of channel attention, the invention introduces the efficient channel attention module of ECANet to improve CBAM.
Firstly, the feature map F is passed through two pooling layers to obtain C_avg and C_max, and the multilayer perceptron is then replaced with the efficient channel attention module to distribute attention weights. The efficient channel attention module captures information interaction between channels by attending to one channel and its k adjacent neighbors, so the weight distribution can be realized with a single one-dimensional convolution and achieves a better effect than compressing the channel domain with the Squeeze operation. As with the multilayer perceptron in CBAM, the efficient channel attention module here shares parameters for C_avg and C_max; the channel weight feature vector is then obtained by element-wise addition and a Sigmoid operation:
w = σ(C1D_k(C_avg) + C1D_k(C_max))
wherein σ represents the Sigmoid activation function and C1D_k represents a one-dimensional convolution operation with convolution kernel size k. CBAM-Pro not only retains the excellent characteristics of CBAM but also introduces ECANet's improvement of the Squeeze-and-Excitation module, giving better performance. The selection of the parameter k is particularly important for the cross-channel information interaction module; as shown in fig. 3, the neighborhood parameter k of CBAM-Pro is tested on the Market-1501 data set. To eliminate interference from other factors, a single-path global feature model is used as the baseline, and 6 groups of settings are compared in the experiment: the baseline, the baseline with CBAM added, and the baseline with CBAM-Pro added under different neighborhood parameters k. The figure shows that adding CBAM to the baseline improves the mAP/rank-1 indexes, and that adding CBAM-Pro to the baseline is superior to adding CBAM regardless of how the value of k changes, verifying the effectiveness of the CBAM-Pro improvement. In addition, mAP/rank-1 achieves the best results with k = 7. According to ECANet, the choice of k is related to the model and the number of feature map channels: ResNet-50 performs better with a larger k, the number of feature map channels in this model is 512, and the automatic k-calculation method proposed by ECANet gives k = 5. It is therefore reasonable that, under the combined action of these two factors, the model achieves the optimal result at k = 7.
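The channel weight computation of CBAM-Pro can be sketched as follows. This is a minimal NumPy version: the 1-D convolution with zero padding and the kernel shared between the two pooled descriptors are modeled explicitly, while a real model would learn the kernel (for example via a shared 1-D convolution layer):

```python
import numpy as np

def conv1d_same(x, kernel):
    """One-dimensional convolution with zero padding ('same'-length output)."""
    k = len(kernel)
    xp = np.pad(x, k // 2)
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

def cbam_pro_channel_weights(feat, kernel):
    """w = sigmoid(C1D_k(C_avg) + C1D_k(C_max)) with a single shared 1-D kernel,
    mirroring the parameter sharing of the MLP in the original CBAM."""
    c_avg = feat.mean(axis=(1, 2))   # global average pooling over H x W -> (C,)
    c_max = feat.max(axis=(1, 2))    # global max pooling over H x W -> (C,)
    s = conv1d_same(c_avg, kernel) + conv1d_same(c_max, kernel)
    return 1.0 / (1.0 + np.exp(-s))  # Sigmoid -> one weight per channel
```

Each channel's weight depends only on its k pooled neighbors, which is the cross-channel interaction ECANet exploits.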
S3: dividing the network into two branches according to the added CBAM-Pro, simultaneously executing forward learning and reverse attention, and extracting forward and reverse global features and local features;
it should be noted that the forward learning and reverse attention performing process includes:
after the network passes through CBAM-Pro, the network is divided into two branches, one branch is normally trained, namely forward learning is carried out, and then reverse masks are obtained by utilizing the spatial attention of gradient guidance, so that the other branch is reversely noticed.
Wherein the generation of the spatial attention map comprises the following steps:
given characteristic diagram F ∈ RC×H×WAnd its counter-propagating gradient tensor G ∈ RC×H×WWhere C is the number of channels in the feature map, H × W represents the size of the feature map, and first a global average pooling is used on G to generate a weight vector W ∈ RC×1Then, the gradient-guided spatial attention is calculated.
The spatial attention map is further given by
M = Σ_{i=1}^C w_i F^(i)
wherein w_i is the i-th element of W and F^(i) is the subgraph of the i-th channel of the feature map F; high values of M characterize the more interesting positions in the feature map.
Wherein the spatial reverse mask comprises,
a_i = 0 if m_i > T, and a_i = 1 otherwise
wherein a_i and m_i denote the elements at pixel position i of A and M, respectively, and T denotes the set spatial attention threshold.
Specifically, in the training phase of the convolutional neural network, the gradient of the feature map in the back-propagation operation characterizes how sensitive the prediction is to different positions of the feature map; that is, even a slight change at a position with a large gradient can strongly influence the prediction result, and the network update focuses more on that position. Based on this, the attention at each spatial pixel location is characterized by a gradient, thereby generating a visualized attention heat map. Given a feature map F ∈ R^(C×H×W) and its back-propagated gradient tensor G ∈ R^(C×H×W), where C denotes the number of channels of the feature map and H×W denotes its size, a weight vector W ∈ R^(C×1) is first generated from G using global average pooling, and the gradient-guided spatial attention map is then calculated:
M = Σ_{i=1}^C w_i F^(i)
wherein w_i is the i-th element of W and F^(i) is the subgraph of the i-th channel of the feature map F; high values of M characterize the more interesting positions in the feature map.
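The gradient-guided attention computation can be sketched as follows; the Grad-CAM-style channel-weighted sum is an assumption consistent with the definitions above (global average pooling of G gives the weight vector, which then weights the channel subgraphs of F):

```python
import numpy as np

def gradient_guided_attention(F, G):
    """Gradient-guided spatial attention map M.

    F, G: (C, H, W) feature map and its back-propagated gradient tensor.
    The weight vector is the per-channel global average of G; M is the
    channel-weighted sum of F's subgraphs."""
    w = G.mean(axis=(1, 2))                   # weight vector in R^(C x 1)
    M = (w[:, None, None] * F).sum(axis=0)    # (H, W) attention map
    return M
```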
During training, the gradient descent algorithm is forced to converge toward the most sensitive positions in the image, so for the recognition task many less sensitive positions are ignored. The gradient-guided attention map is therefore thresholded, and the sensitive positions in the original feature map are shielded with a reverse mask, forcing the network to recognize from the insensitive positions of the image. Given the gradient-guided spatial attention map M, the spatial reverse mask A can be obtained:
a_i = 0 if m_i > T, and a_i = 1 otherwise
wherein a_i and m_i represent the elements at pixel position i of A and M, respectively, and T represents a set spatial attention threshold; multiplying the mask A with the corresponding feature map forces the network to learn from positions where the feature map is insensitive.
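The thresholding, 0-1 negation and masking steps can be sketched as follows (function names are illustrative):

```python
import numpy as np

def reverse_mask(M, T):
    """Spatial reverse mask A: 0 where the attention m_i exceeds the threshold T,
    1 elsewhere (thresholding followed by 0-1 negation)."""
    return (M <= T).astype(M.dtype)

def apply_reverse_attention(F2, A):
    """Shield the high-attention positions of the second branch's feature map F2
    by element-wise multiplication with the broadcast (H, W) mask."""
    return F2 * A[None, :, :]
```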
Further, fig. 2 shows the network structure used in the invention. The network infrastructure adopts ResNet-50, the backbone most frequently used in the pedestrian re-identification task. CBAM-Pro is added after res_conv_3 for attention learning, and the network is divided into two branches after CBAM-Pro: one branch undergoes normal training (referred to as forward learning), and the gradient-guided spatial attention is then used to obtain a reverse mask, so that reverse attention is performed on the other branch.
Specifically, for the forward attention branch, the feature map F_1 of the res_conv_4 layer is obtained, the gradient-guided spatial attention map M is calculated by the gradient back-propagation algorithm, and M is thresholded and 0-1 negated to obtain the reverse mask A. On the reverse attention branch, A is multiplied element by element with the feature map F_2 output by the res_conv_4 layer, shielding the high-attention positions of F_2 and thereby forcing the network to perform attention learning on the low-sensitivity positions of the reverse attention branch. The global features of the two branches are processed by the remaining convolution layers, passed through global average pooling (GAP), and then through dimension reduction layers consisting of 1×1 convolution, batch normalization and ReLU activation, yielding two independent 256-dimensional attention global features. The reverse-mask calculation (the dotted part of fig. 2) is used only in the training stage.
In addition, a sub-branch is split off after the res_conv_4 layer of each of the two attention branches to extract local features. To keep the local features with suitable receptive fields, res_conv_5 of the local feature branches does not use a down-sampling operation. After res_conv_5, each branch's local feature map is divided, and global max pooling (GMP) is applied to each resulting local feature map; unlike the GAP used for the global feature maps, GMP is more beneficial for mining the most discriminative local features. Each 256-dimensional local feature is obtained through a corresponding dimension reduction layer, and finally the global features and local features are connected along the channel dimension to obtain the final pedestrian distinguishing feature containing multiple types of features.
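The pooling and channel-dimension concatenation that produce the final descriptor can be sketched as follows; the 1×1-conv/BN/ReLU dimension reduction layers are omitted, so this sketch only fixes the GAP-for-global / GMP-for-local choice described above:

```python
import numpy as np

def pedestrian_descriptor(global_maps, local_maps):
    """Pool each branch and concatenate along the channel dimension.

    Global feature maps use global average pooling (GAP); local stripe maps
    use global max pooling (GMP), which better mines the most discriminative
    local responses. Dimension reduction layers are omitted from the sketch."""
    gap = [f.mean(axis=(1, 2)) for f in global_maps]   # GAP on global maps
    gmp = [f.max(axis=(1, 2)) for f in local_maps]     # GMP on local stripes
    return np.concatenate(gap + gmp)                   # channel-dim concatenation
```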
S4: and connecting the extracted features according to the channel dimensions to obtain pedestrian identification features containing various types of features, and re-identifying and verifying the pedestrian identification features by using a test set to complete re-identification of the pedestrians.
The invention improves the convolution block attention model, combines the improved convolution block attention model and the space reverse attention network model on the basis to obtain a joint attention module capable of extracting different attention characteristics, and aims at the pedestrian re-identification task based on the joint attention module and introduces local branches.
Example 2
In order to verify the technical effects adopted in the method, the embodiment adopts the traditional technical scheme and the method of the invention to carry out comparison test, and compares the test results by means of scientific demonstration to verify the real effect of the method.
The embodiment performs experiments on the three data sets most frequently used for pedestrian re-identification, namely Market-1501, DukeMTMC-reID and CUHK03, and evaluates the experimental results with the first-match success probability (rank-1) and the mean average precision (mAP).
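The two evaluation indexes can be sketched as follows. This minimal NumPy version omits the same-camera junk filtering of the full Market-1501 evaluation protocol, so it is illustrative rather than a drop-in evaluator:

```python
import numpy as np

def rank1_and_map(sim, query_ids, gallery_ids):
    """rank-1 accuracy and mean average precision from a query-by-gallery
    similarity matrix (simplified protocol, no camera filtering)."""
    r1_hits, aps = [], []
    for i, q in enumerate(query_ids):
        order = np.argsort(-sim[i])            # gallery sorted by similarity
        matches = gallery_ids[order] == q      # True where identity matches
        r1_hits.append(bool(matches[0]))
        hit_ranks = np.flatnonzero(matches)
        precisions = (np.arange(len(hit_ranks)) + 1) / (hit_ranks + 1)
        aps.append(precisions.mean())
    return float(np.mean(r1_hits)), float(np.mean(aps))
```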
Market-1501 comprises 1501 pedestrians of different identities shot by 6 cameras; 32668 pictures each containing a single pedestrian were generated by a DPM detector and divided into non-overlapping training/testing sets. The training set contains 12936 pictures of 751 different pedestrian identities; the test set contains 3368 query pictures and 19732 gallery pictures from 750 different pedestrian identities, with the detection frames of the query pictures drawn manually to ensure the accuracy of the test results. DukeMTMC-reID is the pedestrian re-identification subset of the DukeMTMC data set, acquired with 8 cameras and including 36411 pictures of 1812 pedestrian identities. Of these, 1404 pedestrians appear under more than 2 cameras; their pictures are divided by random sampling into a training set and a test set, each containing 702 identities. The remaining 408 pedestrians appear under only 1 camera, and their pictures are added to the test gallery as interference. The training set includes 16522 pictures, the gallery 17661 pictures, and the query set 2228 pictures. The CUHK03 data set includes 14097 pictures of 1467 pedestrians, each identity captured by 2 different cameras, with 767 identities used for training and 700 for testing. The data set provides both manual and automatic detector labeling, and this embodiment is tested under both labeled sets.
The spatial reverse attention network adopts a direct division method to extract local features. In the training stage the input picture is resized to 384×128, then data enhancement is performed with random horizontal flipping, standardization and random erasing; in testing the picture is likewise resized to 384×128 with only standardization applied. The infrastructure of the network, up to res_conv_3, adopts a ResNet-50 model pre-trained on the ImageNet data set with parameters shared in the model, and CBAM-Pro is introduced into the model. All branches of the network are trained in parallel, and all branches after res_conv_3 are initialized with the pre-training weights of the corresponding layers after res_conv_3 of ResNet-50. The experimental batch size is set to 32: P = 8 identities are randomly sampled from the training set, and K = 4 pictures are sampled for each identity. The network is trained with the Adam optimizer at an initial learning rate of 3×10^-4 for a total of 250 epochs; when training reaches epochs 150 and 230, the learning rate is decreased to 3×10^-5 and 3×10^-6, respectively. The margin of the triplet loss is set to 1.2. The reverse attention operation is used only in the training stage; in the testing stage the two branches directly extract features.
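The sampling and learning-rate settings above can be sketched as follows; the helper names are illustrative, and a real pipeline would wrap this in a data-loader sampler and optimizer scheduler:

```python
import numpy as np

def pk_batch(labels, P=8, K=4, rng=None):
    """PK sampling: draw P identities and K pictures per identity, giving the
    experimental batch size P * K = 32."""
    rng = rng if rng is not None else np.random.default_rng()
    ids = rng.choice(np.unique(labels), size=P, replace=False)
    batch = []
    for pid in ids:
        idx = np.flatnonzero(labels == pid)
        # sample with replacement only if the identity has fewer than K pictures
        batch.extend(rng.choice(idx, size=K, replace=len(idx) < K))
    return np.array(batch)

def learning_rate(epoch):
    """Step schedule from the experiments: 3e-4, dropped to 3e-5 at epoch 150
    and to 3e-6 at epoch 230, over 250 epochs in total."""
    if epoch < 150:
        return 3e-4
    if epoch < 230:
        return 3e-5
    return 3e-6
```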
The embodiment compares the results of the method of the present invention on three reference data sets with the recent method, which includes a method using an attention mechanism, a method using a partition local feature, and other advanced methods, and in order to ensure the fairness of experimental comparison, the embodiment and the compared method do not use a reordering method, and the experimental results are shown in the following table.
Table 1: Market-1501 dataset experimental results performance comparison (%) table.
Table 2: DukeMTMC-reID dataset experimental results performance comparison (%) table.
Table 3: CUHK03 dataset experimental results performance comparison (%) table.
Performance comparisons on Market-1501 and DukeMTMC-reID are shown in Table 1 and Table 2, respectively. The method of the present invention is superior to other methods using an attention mechanism: compared with the strong-performing Auto-ReID, mAP is improved by 2.72%/3.63% and rank-1 by 0.80%/0.70% on the two datasets. Among methods based on divided local features, MGN remains strong; rank-1 on Market-1501 is only slightly below MGN, but the mAP on Market-1501 and both mAP and rank-1 on DukeMTMC-reID are better than MGN. For the mAP index in particular, the improvement over MGN is 0.92%/0.33% on the two datasets. Compared with other advanced methods, the results are also excellent: relative to BDB, mAP is improved by 1.12%/2.73% on the two datasets and rank-1 on DukeMTMC-reID by 0.20%. The performance comparison on CUHK03 is shown in Table 3. Existing methods such as Auto-ReID and BDB achieve excellent results on both annotation sets of CUHK03, and EANet performs well on the detected annotation set; under both annotation sets, the results of the method of the present invention are superior to these methods. Compared with the best-performing BDB, mAP/rank-1 are improved by 0.82%/0.10% on the manually labeled set and by 0.11%/0.67% on the detected set. Notably, MGN performs excellently on Market-1501 and DukeMTMC-reID but poorly on CUHK03, whereas the method of the present invention performs well on all three datasets.
It should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solution of the present invention without departing from its spirit and scope, all of which should be covered by the claims of the present invention.
Claims (1)
1. A pedestrian re-identification method based on a space reverse attention network is characterized by comprising the following steps:
collecting the shot pictures and dividing the pictures into a training set and a testing set;
constructing a space reverse attention network model based on Resnet-50, training a convolutional neural network according to the training set, and adding CBAM-Pro;
dividing the network into two branches according to the added CBAM-Pro, simultaneously executing forward learning and reverse attention, and extracting forward and reverse global features and local features;
obtaining pedestrian identification features containing various types of features based on the extracted features and connected according to channel dimensions, and performing re-identification verification on the pedestrian identification features by using the test set to complete re-identification of pedestrians;
The forward learning, reverse attention performing process includes,
after passing through CBAM-Pro, the network is divided into two branches; one branch is trained normally, that is, performs forward learning, and a gradient-guided spatial attention map is then used to obtain a spatial reverse mask so that the other branch performs reverse attention;
the generation of the spatial attention map includes,
given a feature map F ∈ R^(C×H×W) and its back-propagated gradient tensor G ∈ R^(C×H×W), where C is the number of channels of the feature map and H×W represents its spatial size, a weight vector w ∈ R^(C×1) is first generated from G using global average pooling, and the gradient-guided spatial attention map is then calculated;
the spatial attention map further includes,
M = ReLU( Σ_{i=1}^{C} w_i · F^(i) )
wherein w_i represents the i-th element of w, F^(i) represents the sub-map of the i-th channel of the feature map F, and higher values of M indicate positions of greater attention in the feature map;
the spatial reverse mask includes,
a_i = 0 if m_i > T, and a_i = 1 otherwise
wherein a_i and m_i respectively represent the elements of A and M at pixel position i, and T represents the set spatial attention threshold;
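The gradient-guided attention map and its reverse mask can be sketched in numpy as follows, assuming a Grad-CAM-style weighted sum and an illustrative threshold T; `reverse_attention_mask` is a hypothetical name and the normalization step is an assumption, not the patent's exact implementation.

```python
import numpy as np

def reverse_attention_mask(F, G, T=0.5):
    """Gradient-guided spatial attention map M and reverse mask A.

    F: feature map of shape (C, H, W); G: its back-propagated gradient,
    same shape. T is an assumed attention threshold in [0, 1].
    """
    w = G.mean(axis=(1, 2))                                  # GAP over gradients -> (C,)
    M = np.maximum((w[:, None, None] * F).sum(axis=0), 0.0)  # ReLU of weighted channel sum
    if M.max() > 0:
        M = M / M.max()                                      # normalize to [0, 1] (assumption)
    A = (M <= T).astype(F.dtype)                             # zero out highly attended positions
    return M, A
```

Multiplying the second branch's features by A suppresses exactly the regions the forward branch attends to, forcing that branch to learn complementary cues.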
the overall loss function for training the convolutional neural network includes,
L = β1·L_softmax + β2·L_triplet
wherein L_softmax represents the sum of the cross-entropy losses of all features, L_triplet represents the sum of the triplet losses, and β1, β2 represent balance parameters, set in the experiments to β1 = 2 and β2 = 1;
The sum of the cross-entropy losses of all the features includes,
L_softmax = −Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i) / Σ_{c=1}^{C} exp(W_c^T f_i) )
wherein C represents the number of classes in the dataset, W_c represents the weight vector of the corresponding class, N represents the experimental batch size, and f_i represents a feature in each batch;
the sum of the triplet losses includes,
L_triplet = Σ max( α + ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 , 0 )
wherein f_a, f_p and f_n respectively represent the features of the anchor identity, the positive sample and the negative sample, and α represents the margin parameter of the triplet loss;
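The combined loss with the balance parameters β1 = 2, β2 = 1 and margin α = 1.2 reported above can be sketched in numpy; batch averaging inside each term is an assumption for the sketch, and `total_loss` is an illustrative name.

```python
import numpy as np

def total_loss(logits, y, fa, fp, fn, alpha=1.2, beta1=2.0, beta2=1.0):
    """L = beta1 * L_softmax + beta2 * L_triplet.

    logits: (N, C) class scores; y: (N,) integer labels;
    fa/fp/fn: anchor, positive, negative features of shape (N, D).
    """
    # softmax cross-entropy, computed in a numerically stable way
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    l_softmax = -logp[np.arange(len(y)), y].mean()
    # triplet loss with margin alpha, hinged at zero
    d_ap = np.linalg.norm(fa - fp, axis=1)
    d_an = np.linalg.norm(fa - fn, axis=1)
    l_triplet = np.maximum(alpha + d_ap - d_an, 0.0).mean()
    return beta1 * l_softmax + beta2 * l_triplet
```

When the negative is far from the anchor the hinge term vanishes and only the classification loss remains; when the negative collapses onto the anchor the triplet term contributes the full margin α.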
the CBAM-Pro represents an improved convolutional block attention module, in which the channel attention of CBAM is improved with the efficient channel attention module of ECA-Net; the channel weight feature vector includes,
w = σ( C1D_k(C_avg) + C1D_k(C_max) )
wherein σ represents the Sigmoid activation function, C1D_k represents a one-dimensional convolution operation with a convolution kernel size of k, and C_avg and C_max represent the channel descriptors obtained by global average pooling and global max pooling, respectively.
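The ECA-style channel weighting in CBAM-Pro can be sketched in numpy as below. The fixed averaging kernel stands in for the learned 1-D convolution C1D_k, so this illustrates only the data flow (pool, convolve, sum, sigmoid), not trained behavior; `eca_channel_weights` is an illustrative name.

```python
import numpy as np

def eca_channel_weights(F, k=3):
    """ECA-style channel attention weights for a feature map F of shape (C, H, W).

    Average- and max-pooled channel descriptors each pass through a 1-D
    convolution of kernel size k (here an all-ones/k kernel for
    illustration), are summed, and squashed with a sigmoid.
    """
    c_avg = F.mean(axis=(1, 2))                  # (C,) global average pooling
    c_max = F.max(axis=(1, 2))                   # (C,) global max pooling
    kernel = np.full(k, 1.0 / k)                 # illustrative stand-in for C1D_k weights
    conv = lambda v: np.convolve(v, kernel, mode="same")
    w = 1.0 / (1.0 + np.exp(-(conv(c_avg) + conv(c_max))))  # sigmoid
    return w                                     # (C,) per-channel weights in (0, 1)
```

Multiplying each channel of F by its weight w then re-scales the feature map, exactly the role the channel branch of CBAM plays, but with k local cross-channel interactions instead of a bottleneck MLP.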
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110146335.4A CN112836637B (en) | 2021-02-03 | 2021-02-03 | Pedestrian re-identification method based on space reverse attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112836637A CN112836637A (en) | 2021-05-25 |
CN112836637B true CN112836637B (en) | 2022-06-14 |
Family
ID=75931804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110146335.4A Active CN112836637B (en) | 2021-02-03 | 2021-02-03 | Pedestrian re-identification method based on space reverse attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836637B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657355A (en) * | 2021-10-20 | 2021-11-16 | 之江实验室 | Global and local perception pedestrian re-identification method fusing segmentation information |
CN115393788B (en) * | 2022-08-03 | 2023-04-18 | 华中农业大学 | Multi-scale monitoring pedestrian re-identification method based on global information attention enhancement |
CN115862073B (en) * | 2023-02-27 | 2023-07-04 | 国网江西省电力有限公司电力科学研究院 | Substation hazard bird species target detection and identification method based on machine vision |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111368815A (en) * | 2020-05-28 | 2020-07-03 | 之江实验室 | Pedestrian re-identification method based on multi-component self-attention mechanism |
CN111539370A (en) * | 2020-04-30 | 2020-08-14 | 华中科技大学 | Image pedestrian re-identification method and system based on multi-attention joint learning |
CN111898736A (en) * | 2020-07-23 | 2020-11-06 | 武汉大学 | Efficient pedestrian re-identification method based on attribute perception |
CN111931624A (en) * | 2020-08-03 | 2020-11-13 | 重庆邮电大学 | Attention mechanism-based lightweight multi-branch pedestrian heavy identification method and system |
CN112183468A (en) * | 2020-10-27 | 2021-01-05 | 南京信息工程大学 | Pedestrian re-identification method based on multi-attention combined multi-level features |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11138469B2 (en) * | 2019-01-15 | 2021-10-05 | Naver Corporation | Training and using a convolutional neural network for person re-identification |
CN110070073A (en) * | 2019-05-07 | 2019-07-30 | 国家广播电视总局广播电视科学研究院 | Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism |
CN111325111A (en) * | 2020-01-23 | 2020-06-23 | 同济大学 | Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision |
CN111507217A (en) * | 2020-04-08 | 2020-08-07 | 南京邮电大学 | Pedestrian re-identification method based on local resolution feature fusion |
CN111881780A (en) * | 2020-07-08 | 2020-11-03 | 上海蠡图信息科技有限公司 | Pedestrian re-identification method based on multi-layer fusion and alignment division |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112836637B (en) | Pedestrian re-identification method based on space reverse attention network | |
Huang et al. | Instance-aware image and sentence matching with selective multimodal lstm | |
CN106919920B (en) | Scene recognition method based on convolution characteristics and space vision bag-of-words model | |
Tian et al. | A dual neural network for object detection in UAV images | |
CN111738143B (en) | Pedestrian re-identification method based on expectation maximization | |
CN114005096A (en) | Vehicle weight recognition method based on feature enhancement | |
Li et al. | HAR-Net: Joint learning of hybrid attention for single-stage object detection | |
Biswas et al. | One shot detection with laplacian object and fast matrix cosine similarity | |
CN108154133B (en) | Face portrait-photo recognition method based on asymmetric joint learning | |
CN111985538A (en) | Small sample picture classification model and method based on semantic auxiliary attention mechanism | |
CN105989336A (en) | Scene identification method based on deconvolution deep network learning with weight | |
CN108805102A (en) | A kind of video caption detection and recognition methods and system based on deep learning | |
Fan et al. | A hierarchical Dirichlet process mixture of generalized Dirichlet distributions for feature selection | |
Pratama et al. | Face recognition for presence system by using residual networks-50 architecture | |
Li et al. | Enhanced bird detection from low-resolution aerial image using deep neural networks | |
Kalliatakis et al. | Exploring object-centric and scene-centric CNN features and their complementarity for human rights violations recognition in images | |
CN115482508A (en) | Reloading pedestrian re-identification method, reloading pedestrian re-identification device, reloading pedestrian re-identification equipment and computer-storable medium | |
Zhou et al. | Exploiting visual context semantics for sound source localization | |
CN111582057B (en) | Face verification method based on local receptive field | |
Zhang et al. | Video action recognition with Key-detail Motion Capturing based on motion spectrum analysis and multiscale feature fusion | |
CN112347965A (en) | Video relation detection method and system based on space-time diagram | |
Gowada et al. | Unethical human action recognition using deep learning based hybrid model for video forensics | |
Dong et al. | Scene-oriented hierarchical classification of blurry and noisy images | |
CN116229580A (en) | Pedestrian re-identification method based on multi-granularity pyramid intersection network | |
Sathiyaprasad et al. | Content based video retrieval using Improved gray level Co-occurrence matrix with region-based pre convoluted neural network–RPCNN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||