CN112836637B - Pedestrian re-identification method based on space reverse attention network - Google Patents

Pedestrian re-identification method based on space reverse attention network

Info

Publication number
CN112836637B
CN112836637B · application CN202110146335.4A
Authority
CN
China
Prior art keywords
attention
features
identification
network
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110146335.4A
Other languages
Chinese (zh)
Other versions
CN112836637A (en)
Inventor
宋晓宁 (Song Xiaoning)
王鹏 (Wang Peng)
冯振华 (Feng Zhenhua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN202110146335.4A
Publication of CN112836637A
Application granted
Publication of CN112836637B
Legal status: Active

Classifications

    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention discloses a pedestrian re-identification method based on a spatial reverse attention network, comprising the following steps: collecting captured pictures and dividing them into a training set and a test set; constructing a spatial reverse attention network model based on ResNet-50, training the convolutional neural network on the training set, and adding CBAM-Pro; dividing the network into two branches after the added CBAM-Pro, performing forward learning and reverse attention simultaneously, and extracting forward and reverse global and local features; and concatenating the extracted features along the channel dimension to obtain pedestrian identification features containing multiple feature types, then performing re-identification verification on these features with the test set to complete pedestrian re-identification. The invention extracts multiple types of pedestrian identification features through the spatial reverse attention network, improving the effectiveness and reliability of re-identification.

Description

Pedestrian re-identification method based on space reverse attention network
Technical Field
The invention relates to the technical field of intelligent security, in particular to a pedestrian re-identification method based on a space reverse attention network.
Background
Pedestrian re-identification is in great demand in the field of intelligent security. It aims to associate the same pedestrian across different times and places: given a query picture of a pedestrian, a trained model extracts features from the query picture and from the pictures in an image gallery, and the gallery pictures are ranked by feature similarity to retrieve images of that pedestrian. The task has developed greatly in recent years, but in open outdoor environments pedestrian images vary widely due to interference such as pose, occlusion, clothing, background clutter, and camera viewpoint, so pedestrian re-identification remains a very challenging task.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, the technical problem solved by the invention is as follows: the prior art cannot extract sufficiently rich pedestrian features and therefore cannot achieve higher re-identification accuracy.
In order to solve the above technical problem, the invention provides the following technical scheme: collecting captured pictures and dividing them into a training set and a test set; constructing a spatial reverse attention network model based on ResNet-50, training the convolutional neural network on the training set, and adding CBAM-Pro; dividing the network into two branches after the added CBAM-Pro, performing forward learning and reverse attention simultaneously, and extracting forward and reverse global and local features; and concatenating the extracted features along the channel dimension to obtain pedestrian identification features containing multiple feature types, then performing re-identification verification on these features with the test set to complete pedestrian re-identification.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the forward learning and reverse attention process comprises the following: after the network passes through CBAM-Pro, it is divided into two branches; one branch is trained normally, i.e. forward learning, and a gradient-guided spatial attention map is then used to obtain a reverse mask, so that reverse attention is performed on the other branch.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the generation of the spatial attention map comprises the following: given a feature map F ∈ R^(C×H×W) and its back-propagated gradient tensor G ∈ R^(C×H×W), where C is the number of channels and H×W is the size of the feature map, first generate a weight vector W ∈ R^(C×1) from G using global average pooling, then calculate the gradient-guided spatial attention map.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the spatial attention map further comprises,

M = ReLU( ∑_{i=1}^{C} w_i · F^(i) )

where w_i denotes the i-th element of W and F^(i) denotes the sub-map of the i-th channel of F; high values of M characterize the more attended positions in the feature map.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the spatial reverse mask comprises,

a_i = 0 if m_i > T, and a_i = 1 otherwise

where a_i and m_i denote the elements at pixel position i of A and M respectively, and T denotes the preset spatial attention threshold.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the overall loss function for training the convolutional neural network comprises,

L = β_1 · L_softmax + β_2 · L_triplet

where L_softmax denotes the sum of the cross-entropy losses of all features, L_triplet denotes the sum of the triplet losses, and β_1, β_2 denote balance parameters, set in the experiments to β_1 = 2, β_2 = 1.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the sum of the cross-entropy losses of all features comprises,

L_softmax = − ∑_{i=1}^{N} log( exp(W_{y_i}^T f_i) / ∑_{c=1}^{C} exp(W_c^T f_i) )

where C denotes the number of classes in the dataset, W_c denotes the weight vector of class c, y_i denotes the label of sample i, N denotes the experimental batch size, and f_i denotes the features in each batch.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the sum of the triplet losses comprises,

L_triplet = ∑_{i=1}^{N} [ ‖f_i^a − f_i^p‖_2 − ‖f_i^a − f_i^n‖_2 + α ]_+

where f^a, f^p and f^n denote the features of the anchor identity, the positive sample and the negative sample respectively, and α denotes the margin parameter of the triplet loss.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: CBAM-Pro denotes the improved convolutional block attention model, which improves CBAM with the efficient channel attention module of ECANet; the channel weight feature vector comprises,

w = σ( C1D_k(C_avg) + C1D_k(C_max) )

where σ denotes the Sigmoid activation function and C1D_k denotes a one-dimensional convolution with kernel size k.
The invention has the beneficial effect that multiple types of pedestrian identification features are extracted through the spatial reverse attention network, improving the effectiveness and reliability of re-identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the invention, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without inventive labor. In the drawings:
fig. 1 is a schematic basic flow chart of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a spatial reverse attention network model of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an experimental result of a channel attention neighborhood parameter k of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, the references herein to "one embodiment" or "an embodiment" refer to a particular feature, structure, or characteristic that may be included in at least one implementation of the present invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected" and "connected" in the present invention are to be construed broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
The pedestrian re-identification task aims to find the same pedestrian under different cameras. Although the development of deep learning has brought great improvements to pedestrian re-identification, it remains a challenging task. In recent years, attention mechanisms have been widely verified to work well on pedestrian re-identification, but the effect of combining different types of attention (such as spatial attention and self-attention) still needs to be explored.
Referring to fig. 1 to 3, an embodiment of the present invention provides a pedestrian re-identification method based on a spatial reverse attention network, including:
s1: collecting the shot pictures and dividing the pictures into a training set and a testing set;
s2: constructing a space reverse attention network model based on Resnet-50, training a convolutional neural network according to a training set, and adding CBAM-Pro;
It should be noted that, the overall loss function for training the convolutional neural network includes,
L = β_1 · L_softmax + β_2 · L_triplet

where L_softmax denotes the sum of the cross-entropy losses of all features, L_triplet denotes the sum of the triplet losses, and β_1, β_2 denote balance parameters, set in the experiments to β_1 = 2, β_2 = 1.
The sum of the cross-entropy losses of all features comprises,

L_softmax = − ∑_{i=1}^{N} log( exp(W_{y_i}^T f_i) / ∑_{c=1}^{C} exp(W_c^T f_i) )

where C denotes the number of classes in the dataset, W_c denotes the weight vector of class c, y_i denotes the label of sample i, N denotes the experimental batch size, and f_i denotes the features in each batch.
The sum of the triplet losses comprises,

L_triplet = ∑_{i=1}^{N} [ ‖f_i^a − f_i^p‖_2 − ‖f_i^a − f_i^n‖_2 + α ]_+

where f^a, f^p and f^n denote the features of the anchor identity, the positive sample and the negative sample respectively, and α denotes the margin parameter of the triplet loss.
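For illustration, the following PyTorch sketch combines the two loss terms as L = β_1·L_softmax + β_2·L_triplet. The batch-hard strategy for picking the positive and negative samples within each batch is an assumption of ours; the description specifies only the loss terms, the balance parameters, and the margin α.

```python
import torch
import torch.nn.functional as F


def total_loss(logits, feats, labels, beta1=2.0, beta2=1.0, margin=1.2):
    # L_softmax: cross-entropy over the C identity classes
    l_softmax = F.cross_entropy(logits, labels)

    # L_triplet: batch-hard triplet loss over the embedding features
    dist = torch.cdist(feats, feats)                 # (N, N) pairwise distances
    same = labels[:, None] == labels[None, :]        # positive-pair mask
    d_pos = (dist * same.float()).max(dim=1).values  # hardest positive per anchor
    d_neg = dist.masked_fill(same, float("inf")).min(dim=1).values  # hardest negative
    l_triplet = F.relu(d_pos - d_neg + margin).mean()

    return beta1 * l_softmax + beta2 * l_triplet
```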
Further, CBAM-Pro denotes the improved convolutional block attention model, which improves CBAM with the efficient channel attention module of ECANet; the channel weight feature vector comprises,

w = σ( C1D_k(C_avg) + C1D_k(C_max) )

where σ denotes the Sigmoid activation function and C1D_k denotes a one-dimensional convolution with kernel size k.
Specifically, the invention obtains the improved convolutional block attention model CBAM-Pro by improving CBAM. Since CBAM considers both channel and spatial attention, the spatial-dimension operation still follows CBAM, and the improvement focuses on CBAM's channel attention module. In fact, the multilayer perceptron used by CBAM for channel attention is a Squeeze-and-Excitation module that assigns attention weights through two fully connected layers. Considering that ECANet has shown the Squeeze operation to negatively affect the prediction of channel attention, the invention introduces the efficient channel attention module of ECANet to improve CBAM.
First, the feature map F is passed through two pooling layers to obtain C_avg and C_max, and the multilayer perceptron is then replaced by the efficient channel attention module to assign attention weights. The efficient channel attention module captures cross-channel information interaction by attending to each channel together with its k adjacent neighbors, so it can be implemented easily with a single one-dimensional convolution and achieves a better effect than compressing the channel domain with the Squeeze operation. As with the multilayer perceptron in CBAM, the efficient channel attention module here shares parameters between C_avg and C_max; the channel weight feature vector is then obtained by element-wise addition and a Sigmoid operation:

w = σ( C1D_k(C_avg) + C1D_k(C_max) )
where σ denotes the Sigmoid activation function and C1D_k denotes a one-dimensional convolution with kernel size k. CBAM-Pro not only retains the excellent characteristics of CBAM but also adopts ECANet's improvement of the Squeeze-and-Excitation module, giving better performance. The choice of the parameter k is particularly important for the cross-channel interaction module. Fig. 3 shows the test of the neighborhood parameter k of CBAM-Pro on the Market-1501 dataset. To eliminate interference from other factors, a single-path global feature model is used as the baseline, and six settings are compared in the experiment: the baseline, the baseline with CBAM added, and the baseline with CBAM-Pro added under different neighborhood parameters k. The figure shows that adding CBAM improves the baseline model on the mAP/rank-1 indices. On this basis, adding CBAM-Pro with different values of k to the baseline shows that the model result is always superior to adding CBAM regardless of how k changes, verifying the effectiveness of the CBAM-Pro improvement. In addition, mAP/rank-1 achieve the best results when k = 7. According to ECANet, the choice of k is related to the model and the number of feature-map channels: ResNet-50 performs better with larger k, the number of feature-map channels where the module is used is 512, and the automatic k-value calculation proposed by ECANet yields k = 5. It is therefore reasonable that, under the combined action of these two factors, the model achieves its best result at k = 7.
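A hedged sketch of the channel-attention half of CBAM-Pro as described above: CBAM's shared multilayer perceptron is replaced by ECANet's one-dimensional convolution so that w = σ(C1D_k(C_avg) + C1D_k(C_max)). Class and variable names are ours, and the spatial-attention half of CBAM, which the patent keeps unchanged, is omitted.

```python
import torch
import torch.nn as nn


class EcaChannelAttention(nn.Module):
    def __init__(self, k: int = 7):
        super().__init__()
        # one shared 1-D conv (kernel k) plays the role of C1D_k for both paths
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        c_avg = x.mean(dim=(2, 3))   # (B, C) global average pooling
        c_max = x.amax(dim=(2, 3))   # (B, C) global max pooling
        # cross-channel interaction over each descriptor, parameters shared
        w = self.conv(c_avg.unsqueeze(1)) + self.conv(c_max.unsqueeze(1))
        w = torch.sigmoid(w).squeeze(1)          # (B, C) channel weights
        return x * w[:, :, None, None]           # reweight the feature map
```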
S3: dividing the network into two branches according to the added CBAM-Pro, simultaneously executing forward learning and reverse attention, and extracting forward and reverse global features and local features;
it should be noted that the forward learning and reverse attention performing process includes:
after the network passes through CBAM-Pro, it is divided into two branches; one branch is trained normally, i.e. forward learning, and a reverse mask is then obtained using gradient-guided spatial attention, so that reverse attention is performed on the other branch.
The generation of the spatial attention map comprises the following steps:
Given a feature map F ∈ R^(C×H×W) and its back-propagated gradient tensor G ∈ R^(C×H×W), where C is the number of channels and H×W is the size of the feature map, first generate a weight vector W ∈ R^(C×1) from G using global average pooling, then calculate the gradient-guided spatial attention map.
The spatial attention map is further given by

M = ReLU( ∑_{i=1}^{C} w_i · F^(i) )

where w_i denotes the i-th element of W and F^(i) denotes the sub-map of the i-th channel of F; high values of M characterize the more attended positions in the feature map.
The spatial reverse mask comprises

a_i = 0 if m_i > T, and a_i = 1 otherwise

where a_i and m_i denote the elements at pixel position i of A and M respectively, and T denotes the preset spatial attention threshold.
Specifically, in the training phase of the convolutional neural network, the gradient of the feature map in the back-propagation operation characterizes how sensitive different positions of the feature map are to the prediction: even a slight change at a highly sensitive position strongly influences the prediction result, so network updates focus on those positions. Based on this, the attention at each spatial pixel position is characterized by the gradient, producing a visualized attention heat map. Given a feature map F ∈ R^(C×H×W) and its back-propagated gradient tensor G ∈ R^(C×H×W), where C denotes the number of channels and H×W the size of the feature map, a weight vector W ∈ R^(C×1) is first generated from G by global average pooling, and the gradient-guided spatial attention map is then calculated:
M = ReLU( ∑_{i=1}^{C} w_i · F^(i) )

where w_i denotes the i-th element of W and F^(i) denotes the sub-map of the i-th channel of feature map F; high values of M characterize the more attended positions in the feature map.
In the training process, the gradient descent algorithm is forced to converge on the most sensitive positions in the image, so many less sensitive positions are ignored for the recognition task. The gradient-guided attention map is therefore thresholded, and a reverse mask is used to shield the sensitive positions of the original feature map, forcing the network to recognize from the insensitive positions. Given the gradient-guided spatial attention map M, the spatial reverse mask A can be obtained:
a_i = 0 if m_i > T, and a_i = 1 otherwise

where a_i and m_i denote the elements at pixel position i of A and M respectively, and T denotes the preset spatial attention threshold. Multiplying the mask A with the corresponding feature map forces the network to learn from positions where the feature map is insensitive.
Further, fig. 2 shows the network structure used by the invention. The backbone adopts ResNet-50, the network most frequently used in pedestrian re-identification. CBAM-Pro is added after res_conv_3 for attention learning, after which the network is divided into two branches: one branch is trained normally (referred to as forward learning), and a gradient-guided spatial attention map is then used to obtain a reverse mask, so that reverse attention is performed on the other branch.
Specifically, on the forward attention branch, the gradient-guided spatial attention map M is computed from the feature map F_1 of the res_conv_4 layer via gradient back-propagation. M is thresholded and negated (0-1 inversion) to obtain the reverse mask A, which is multiplied element-wise with the feature map F_2 of the res_conv_4 layer on the reverse attention branch, shielding the high-attention positions of F_2 and forcing the network to perform attention learning at low-sensitivity positions on the reverse branch. The global features of the two branches are processed by the remaining convolutional layers, then by global average pooling (GAP), and finally by a dimension-reduction layer consisting of a 1×1 convolution, batch normalization, and ReLU activation, yielding two independent 256-dimensional attention global features. The reverse-mask computation (the dotted part of fig. 2) is used only in the training stage.
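At the point where the reverse branch is masked, a minimal sketch (assuming the mask A was computed on the forward branch as above, and assuming gradients should not flow back through the mask) could be:

```python
import torch


def apply_reverse_attention(f2: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # f2:   (B, C, H, W) res_conv_4 output of the reverse-attention branch (F_2)
    # mask: (B, H, W) spatial reverse mask A, one per image in the batch
    # the element-wise product shields the high-attention positions of F_2
    return f2 * mask.detach().unsqueeze(1)
```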
In addition, a sub-branch is split off after the res_conv_4 layer of each attention branch to extract local features. To keep a suitable receptive field for the local features, the res_conv_5 of the local branches does not use down-sampling. After res_conv_5, each branch's feature map is divided into its local feature maps, and global max pooling (GMP) is applied to each. Unlike the GAP used on the global feature maps, GMP on the local feature maps is more conducive to mining the most discriminative local features. Each 256-dimensional local feature is obtained through a corresponding dimension-reduction layer, and finally the global and local features are concatenated along the channel dimension to obtain the final pedestrian discriminative feature containing multiple feature types.
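The dimension-reduction head and the final concatenation might look as follows in PyTorch; the 2048 input channels (ResNet-50's last stage) and the variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ReduceHead(nn.Module):
    """1x1 convolution + batch normalization + ReLU, down to 256 dimensions."""

    def __init__(self, in_ch: int = 2048, out_ch: int = 256):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (B, in_ch, 1, 1) from GAP (global) or GMP (local stripes)
        return self.reduce(pooled).flatten(1)  # (B, 256)


# final pedestrian descriptor, concatenated along the channel dimension:
# feature = torch.cat([g_forward, g_reverse, *local_features], dim=1)
```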
S4: and connecting the extracted features according to the channel dimensions to obtain pedestrian identification features containing various types of features, and re-identifying and verifying the pedestrian identification features by using a test set to complete re-identification of the pedestrians.
The invention improves the convolutional block attention model and, on this basis, combines it with the spatial reverse attention network model to obtain a joint attention module capable of extracting different attention features; local branches are introduced on top of this joint attention module for the pedestrian re-identification task.
Example 2
In order to verify the technical effects of the method, this embodiment carries out comparison tests between conventional technical schemes and the method of the invention, and compares the test results by means of scientific demonstration to verify the real effect of the method.
This embodiment performs experiments on the three datasets most frequently used for pedestrian re-identification, namely Market-1501, DukeMTMC-reID and CUHK03, and evaluates the experimental results using the first-match success probability (rank-1) and the mean average precision (mAP).
Market-1501 comprises 1501 pedestrian identities captured by 6 cameras; 32668 single-pedestrian pictures were generated by a DPM detector and divided into non-overlapping training/test sets. The training set contains 12936 pictures of 751 identities; the test set contains 3368 query pictures and 19732 gallery pictures from the other 750 identities, and the detection boxes of the query pictures were drawn manually to ensure the accuracy of the test results. DukeMTMC-reID is the pedestrian re-identification subset of the DukeMTMC dataset, acquired with 8 cameras and comprising 36411 pictures of 1812 pedestrian identities. The 1404 pedestrians appearing under more than 2 cameras are divided by random sampling into a training set and a test set of 702 identities each; the remaining 408 pedestrians appear under only 1 camera, and their pictures are added to the test gallery as distractors. The training set contains 16522 pictures, the gallery 17661 pictures, and the query set 2228 pictures. The CUHK03 dataset contains 14097 pictures of 1467 pedestrians, each identity captured by 2 different cameras; 767 identities are used for training and 700 for testing. The dataset provides both manual and automatic-detector labeling, and this embodiment is tested under both labeled sets.
The spatial reverse attention network extracts local features by direct division. In the training stage the input picture is resized to 384×128, and data augmentation is applied with random horizontal flipping, normalization, and random erasing; at test time the picture is likewise resized to 384×128 with only normalization applied. The network infrastructure adopts the layers of ResNet-50 up to res_conv_3, pre-trained on the ImageNet dataset, with parameters shared within the model, and CBAM-Pro is introduced into the model. All branches of the network are trained in parallel, where every branch after res_conv_3 is initialized with the pre-trained weights of the corresponding layers after res_conv_3 of ResNet-50. The experimental batch size is set to 32: P = 8 identities are randomly sampled from the training set and K = 4 pictures are sampled per identity. The network is trained with the Adam optimizer at an initial learning rate of 3×10⁻⁴ for 250 epochs in total; at epochs 150 and 230 the learning rate is decreased to 3×10⁻⁵ and 3×10⁻⁶ respectively. The margin of the triplet loss is set to 1.2. The reverse-attention operation is used only during training; in the test stage the two branches extract features directly.
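The training schedule above maps onto a few lines of PyTorch as follows; `model`, the P×K sampler `pk_loader`, and the reuse of the `total_loss` sketch from earlier are assumptions, while `gamma=0.1` matches the stated decay from 3×10⁻⁴ to 3×10⁻⁵ to 3×10⁻⁶.

```python
import torch

# assumed to exist: `model` (the spatial reverse attention network, returning
# (logits, feats)) and `pk_loader` (P = 8 identities x K = 4 images, batch 32)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 230], gamma=0.1)  # 3e-4 -> 3e-5 -> 3e-6

for epoch in range(250):
    for images, labels in pk_loader:
        optimizer.zero_grad()
        logits, feats = model(images)
        loss = total_loss(logits, feats, labels, margin=1.2)  # see loss sketch above
        loss.backward()
        optimizer.step()
    scheduler.step()
```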
This embodiment compares the results of the method of the invention on the three benchmark datasets against recent methods, including methods using attention mechanisms, methods using partitioned local features, and other advanced methods. To ensure fair comparison, neither this embodiment nor the compared methods use re-ranking. The experimental results are shown in the tables below.
Table 1: and (4) a Market-1501 data set experiment result performance comparison table.
Figure BDA0002930542490000101
Table 2: DukeMTMC-reID dataset experimental result performance comparison (%).
(The table is reproduced only as an image in the original publication.)
Table 3: CUHK03 dataset experimental result performance comparison (%).
(The table is reproduced only as an image in the original publication.)
The performance comparisons on Market-1501 and DukeMTMC-reID are shown in Table 1 and Table 2 respectively. The method of the invention outperforms other methods using attention mechanisms: against the strong Auto-ReID, mAP improves by 2.72%/3.63% and rank-1 by 0.80%/0.70% on the two datasets. Among methods based on partitioned local features, MGN remains strong; rank-1 on Market-1501 is only slightly below MGN, while both mAP and rank-1 on DukeMTMC-reID are better than MGN, and in particular mAP improves over MGN by 0.92%/0.33% on the two datasets. Compared with other advanced methods the results are also excellent: against BDB, mAP improves by 1.12%/2.73% on the two datasets and rank-1 on DukeMTMC-reID by 0.20%. The performance comparison on CUHK03 is shown in Table 3. Existing methods such as Auto-ReID and BDB obtain excellent results on both CUHK03 labeling sets, and EANet performs well on the detected labeling set; under both labeling sets the results of the method of the invention are superior to these methods. Compared with the best-performing BDB, mAP/rank-1 improve by 0.82%/0.10% on the manually labeled set and by 0.11%/0.67% on the detected labeling set. Notably, MGN performs excellently on Market-1501 and DukeMTMC-reID but poorly on CUHK03, whereas the method of the invention performs well across all three datasets.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (1)

1. A pedestrian re-identification method based on a space reverse attention network is characterized by comprising the following steps:
collecting the shot pictures and dividing the pictures into a training set and a testing set;
constructing a space reverse attention network model based on Resnet-50, training a convolutional neural network according to the training set, and adding CBAM-Pro;
dividing the network into two branches according to the added CBAM-Pro, simultaneously executing forward learning and reverse attention, and extracting forward and reverse global features and local features;
concatenating the extracted features along the channel dimension to obtain pedestrian identification features containing multiple feature types, and performing re-identification verification on the pedestrian identification features with the test set to complete pedestrian re-identification;
The forward learning, reverse attention performing process includes,
after the network passes through CBAM-Pro, it is divided into two branches; one branch is trained normally, i.e. forward learning, and a gradient-guided spatial attention map is then used to obtain a spatial reverse mask, so that reverse attention is performed on the other branch;
the generation of the spatial attention map includes,
given a feature map F ∈ R^(C×H×W) and its back-propagated gradient tensor G ∈ R^(C×H×W), where C is the number of channels and H×W is the size of the feature map, first generating a weight vector W ∈ R^(C×1) from G using global average pooling, then calculating the gradient-guided spatial attention map;
the spatial attention map further comprises,

M = ReLU( ∑_{i=1}^{C} w_i · F^(i) )

where w_i denotes the i-th element of W and F^(i) denotes the sub-map of the i-th channel of feature map F, and high values of M characterize the more attended positions in the feature map;
the spatial reverse mask comprises,

a_i = 0 if m_i > T, and a_i = 1 otherwise

where a_i and m_i denote the elements at pixel position i of A and M respectively, and T denotes the preset spatial attention threshold;
the overall loss function for training the convolutional neural network comprises,

L = β_1 · L_softmax + β_2 · L_triplet

where L_softmax denotes the sum of the cross-entropy losses of all features, L_triplet denotes the sum of the triplet losses, and β_1, β_2 denote balance parameters, set in the experiments to β_1 = 2, β_2 = 1;
the sum of the cross-entropy losses of all features comprises,

L_softmax = − ∑_{i=1}^{N} log( exp(W_{y_i}^T f_i) / ∑_{c=1}^{C} exp(W_c^T f_i) )

where C denotes the number of classes in the dataset, W_c denotes the weight vector of class c, y_i denotes the label of sample i, N denotes the experimental batch size, and f_i denotes the features in each batch;
the sum of the triplet losses comprises,

L_triplet = ∑_{i=1}^{N} [ ‖f_i^a − f_i^p‖_2 − ‖f_i^a − f_i^n‖_2 + α ]_+

where f^a, f^p and f^n denote the features of the anchor identity, the positive sample and the negative sample respectively, and α denotes the margin parameter of the triplet loss;
the CBAM-Pro representation modified convolution block attention model includes,
the CBAM is improved with an efficient channel attention module of ECANet, the channel weight feature vector comprising,
w=σ(CIDk(Cavg)+CIDk(Cmax))
wherein σ represents Sigmoid activation function, CIDkRepresenting a one-dimensional convolution operation with a convolution kernel size of k.
CN202110146335.4A 2021-02-03 2021-02-03 Pedestrian re-identification method based on space reverse attention network Active CN112836637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146335.4A CN112836637B (en) 2021-02-03 2021-02-03 Pedestrian re-identification method based on space reverse attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110146335.4A CN112836637B (en) 2021-02-03 2021-02-03 Pedestrian re-identification method based on space reverse attention network

Publications (2)

Publication Number Publication Date
CN112836637A CN112836637A (en) 2021-05-25
CN112836637B true CN112836637B (en) 2022-06-14

Family

ID=75931804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146335.4A Active CN112836637B (en) 2021-02-03 2021-02-03 Pedestrian re-identification method based on space reverse attention network

Country Status (1)

Country Link
CN (1) CN112836637B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657355A (en) * 2021-10-20 2021-11-16 之江实验室 Global and local perception pedestrian re-identification method fusing segmentation information
CN115393788B (en) * 2022-08-03 2023-04-18 华中农业大学 Multi-scale monitoring pedestrian re-identification method based on global information attention enhancement
CN115862073B (en) * 2023-02-27 2023-07-04 国网江西省电力有限公司电力科学研究院 Substation hazard bird species target detection and identification method based on machine vision

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368815A (en) * 2020-05-28 2020-07-03 之江实验室 Pedestrian re-identification method based on multi-component self-attention mechanism
CN111539370A (en) * 2020-04-30 2020-08-14 华中科技大学 Image pedestrian re-identification method and system based on multi-attention joint learning
CN111898736A (en) * 2020-07-23 2020-11-06 武汉大学 Efficient pedestrian re-identification method based on attribute perception
CN111931624A (en) * 2020-08-03 2020-11-13 重庆邮电大学 Attention mechanism-based lightweight multi-branch pedestrian heavy identification method and system
CN112183468A (en) * 2020-10-27 2021-01-05 南京信息工程大学 Pedestrian re-identification method based on multi-attention combined multi-level features

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138469B2 (en) * 2019-01-15 2021-10-05 Naver Corporation Training and using a convolutional neural network for person re-identification
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
CN111325111A (en) * 2020-01-23 2020-06-23 同济大学 Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
CN111507217A (en) * 2020-04-08 2020-08-07 南京邮电大学 Pedestrian re-identification method based on local resolution feature fusion
CN111881780A (en) * 2020-07-08 2020-11-03 上海蠡图信息科技有限公司 Pedestrian re-identification method based on multi-layer fusion and alignment division

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539370A (en) * 2020-04-30 2020-08-14 华中科技大学 Image pedestrian re-identification method and system based on multi-attention joint learning
CN111368815A (en) * 2020-05-28 2020-07-03 之江实验室 Pedestrian re-identification method based on multi-component self-attention mechanism
CN111898736A (en) * 2020-07-23 2020-11-06 武汉大学 Efficient pedestrian re-identification method based on attribute perception
CN111931624A (en) * 2020-08-03 2020-11-13 重庆邮电大学 Attention mechanism-based lightweight multi-branch pedestrian heavy identification method and system
CN112183468A (en) * 2020-10-27 2021-01-05 南京信息工程大学 Pedestrian re-identification method based on multi-attention combined multi-level features

Also Published As

Publication number Publication date
CN112836637A (en) 2021-05-25


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant