CN114882534B - Pedestrian re-identification method, system and medium based on counterfactual attention learning - Google Patents

Pedestrian re-identification method, system and medium based on counterfactual attention learning

Info

Publication number
CN114882534B
CN114882534B (application CN202210606376.1A)
Authority
CN
China
Prior art keywords
attention
image
pedestrian
class
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210606376.1A
Other languages
Chinese (zh)
Other versions
CN114882534A (en)
Inventor
Li Xiaohong
Dai Xuesong
Qi Meibin
Zhuang Shuo
Hao Shijie
Jiang Jianguo
Zhang Jingjing
Li Shilong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202210606376.1A priority Critical patent/CN114882534B/en
Publication of CN114882534A publication Critical patent/CN114882534A/en
Application granted granted Critical
Publication of CN114882534B publication Critical patent/CN114882534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method and system based on counterfactual attention learning, and a computer storage medium. The pedestrian re-identification method quantifies the quality of attention by comparing the effects of the fact (the learned attention) and the counterfactual (randomly generated false attention) on the final prediction, and maximizes this difference to encourage the network to learn more effective visual attention and to reduce the influence of a biased training set, thereby enhancing discriminative feature learning on target-domain pedestrian images and alleviating their occlusion and background-clutter problems.

Description

Pedestrian re-identification method, system and medium based on counterfactual attention learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a pedestrian re-identification method, a pedestrian re-identification system and a computer storage medium based on counterfactual attention learning.
Background
In recent years, unsupervised domain adaptation (UDA) has been a research hotspot in the field of deep learning: a model trained on a labeled source-domain dataset is applied to an unlabeled target-domain dataset that is related in task but differs in feature distribution. Existing unsupervised domain-adaptive pedestrian re-identification methods generally comprise three steps: feature pre-training with labeled source domain data, clustering-based pseudo-label prediction on target domain data, and feature representation learning with pseudo-label fine-tuning.
However, owing to the domain gap between the source domain data and the target domain data and the imperfection of clustering algorithms, the pseudo labels assigned by clustering often contain errors. Such noisy labels mislead feature learning and degrade domain adaptation performance. Mitigating the negative influence of noisy pseudo-label samples, and addressing the background clutter and occlusion in target-domain pedestrian images, is therefore crucial for improving the retrieval performance of a domain-adaptive model. To address the noisy-label problem, some domain-adaptive algorithms help the model learn a more robust feature representation by correcting incorrect samples online. Although these algorithms have made significant progress, their recognition accuracy remains unsatisfactory compared with supervised pedestrian re-identification. In addition, most traditional methods optimize attention by supervising only the final prediction, treating the whole network as a black box and ignoring how the learned attention affects the prediction. As a result, noisy pseudo labels readily exert a negative influence during pseudo-label fine-tuning under the clustering framework.
Disclosure of Invention
The invention aims to address the problems in the prior art by providing a pedestrian re-identification method based on counterfactual attention learning, which can effectively improve the accuracy of pedestrian re-identification.
The technical scheme is as follows: the invention discloses a pedestrian re-identification method based on counterfactual attention learning, which comprises the following steps:
s1, establishing a teacher model and a student model, wherein the teacher model and the student model have the same structure; the input of the teacher model is a pedestrian image, and the structure comprises a backbone network and a classifier, wherein the backbone network is used for extracting the characteristics of the pedestrian image, and the classifier is used for acquiring the category probability of the pedestrian image according to the characteristics of the pedestrian image;
pretraining a teacher model with a source domain dataset, the pretraining targeting minimization of a classification loss function L_{class}^{s}:
L_{class}^{s} = -\frac{1}{N} \sum_{n=1}^{N} \log p(y_{s,n} \mid x_{s,n})    (1)
where p(y_{s,n} \mid x_{s,n}) denotes the probability of input image x_{s,n} being identified as class y_{s,n}, y_{s,n} is the true label of x_{s,n} in the source domain dataset, and N is the number of samples per batch during pre-training;
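For illustration, the following is a minimal PyTorch sketch of one pre-training step under formula (1); the names backbone, classifier and optimizer are placeholders for the modules described in step S1, and the snippet assumes the backbone outputs pooled feature vectors.

```python
import torch.nn.functional as F

def pretrain_step(backbone, classifier, images, labels, optimizer):
    """One supervised pre-training batch on the labeled source domain."""
    feats = backbone(images)            # (N, c) pooled pedestrian features
    logits = classifier(feats)          # (N, P_s) class scores
    # formula (1): cross-entropy = -1/N * sum_n log p(y_{s,n} | x_{s,n})
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```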
initializing parameters of a student model into parameters of a pre-trained teacher model;
S2, inputting the images in the target domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the target domain images; clustering the features of the target domain images, generating pseudo labels for the target domain images according to the clustered categories, and splicing the feature vector of each cluster center into a target domain cluster center matrix R_t, where P_t is the number of categories after clustering the target domain image features, and c, h and w are respectively the dimension, height and width of the features extracted by the backbone network of the teacher model;
S3, inputting the images in the source domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the source domain images; according to the true identity labels of the source domain images, averaging the extracted feature vectors of source domain images belonging to the same identity to obtain the class center vector of each pedestrian identity, and splicing the feature vector of each class center into a source domain class center matrix R_s, where P_s is the number of true pedestrian identity categories in the source domain data;
splicing R_t and R_s into a reference center matrix R with K_r = P_t + P_s rows;
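A hedged sketch of steps S2-S3 follows: pseudo labels from clustering the target features, then the center matrices R_t and R_s and the reference matrix R. DBSCAN with a cosine metric and its eps/min_samples values are illustrative assumptions; this passage of the invention does not fix a particular clustering algorithm.

```python
import torch
from sklearn.cluster import DBSCAN

def build_reference_matrix(target_feats, source_feats, source_ids):
    """target_feats: (n_t, c); source_feats: (n_s, c); source_ids: (n_s,)."""
    labels = DBSCAN(eps=0.6, min_samples=4, metric="cosine").fit_predict(
        target_feats.numpy())
    pseudo = torch.as_tensor(labels)
    # R_t: mean feature of each target cluster (DBSCAN outliers, label -1, skipped)
    R_t = torch.stack([target_feats[pseudo == k].mean(dim=0)
                       for k in range(int(pseudo.max()) + 1)])
    # R_s: mean feature of each true source identity
    R_s = torch.stack([source_feats[source_ids == p].mean(dim=0)
                       for p in source_ids.unique()])
    R = torch.cat([R_t, R_s], dim=0)    # (K_r, c) with K_r = P_t + P_s
    return pseudo, R
```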
S4, keeping the parameters of the teacher model unchanged, and optimizing the parameters of the student model by iterative training; the goal of the iterative training optimization is to minimize the loss function, each batch of samples in training comprising N_t target domain images and N_s source domain images; the loss function L_total is the sum of the target classification loss L_class and the attention loss L_effect; the specific steps are as follows:
S4.1A, calculating the target classification loss L_class:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log p(\tilde{y}_i \mid x_i)    (2)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i;
S4.2A, calculating the influence of attention on the prediction for source domain image z_v, v = 1, 2, ..., N_s:
inputting z_v into the student model to obtain the first feature f_v of z_v, f_v ∈ R^{c×h×w};
extracting the attention feature maps F_v of f_v using the spatial attention module:
F_v = {F_{v,1}, F_{v,2}, ..., F_{v,m}, ..., F_{v,M}} = attention(f_v)    (4)
where attention(·) denotes the attention feature map extraction operation of the spatial attention module, F_{v,m} ∈ R^{h×w} is the attention feature map of the m-th region of interest in F_v, and M is the number of regions of interest;
weighting the first feature f_v with F_{v,m} and aggregating by a global average pooling operation to obtain the local attention Z_{v,m}: Z_{v,m} = GAP(f_v * F_{v,m}), where GAP denotes the global average pooling operation;
combining and normalizing the local attentions to obtain the global feature Z_v of source domain image z_v:
Z_v = normalize([Z_{v,1}, Z_{v,2}, ..., Z_{v,m}, ..., Z_{v,M}])    (5)
where normalize(·) denotes the normalization operation;
inputting the global feature Z_v sequentially into a batch normalization layer and a classification layer to obtain the spatial-attention-weighted prediction of source domain image z_v:
Y_v = C(BN(Z_v))    (6)
where BN(·) denotes the batch normalization layer and C(·) the classification layer;
generating a counterfactual attention feature map \bar{F}_v, which has the same size as F_v and whose every element value is a random number;
weighting f_v with the counterfactual attention feature map \bar{F}_v, performing the global average pooling operation, combining and normalizing, and inputting the result sequentially into the batch normalization layer and the classification layer to obtain the counterfactual-attention-weighted prediction \bar{Y}_v of source domain image z_v;
the influence of attention on the prediction is then:
Y_v^{effect} = Y_v - \bar{Y}_v    (7)
S4.3A, calculating the attention loss:
L_{effect} = -\frac{1}{N_s} \sum_{v=1}^{N_s} \log p(y_{s,v} \mid Y_v^{effect})    (8)
where y_{s,v} is the class label of source domain image z_v, and p(y_{s,v} \mid Y_v^{effect}) is the probability, given the influence of attention on the prediction Y_v^{effect}, of z_v being identified as class y_{s,v};
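The following PyTorch sketch illustrates steps S4.2A-S4.3A under the reconstruction above: factual attention maps from a convolution with ReLU, local attentions by weighted global average pooling, a random counterfactual attention map of the same size, and the attention loss of formula (8). The 1×1 kernel and module shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CounterfactualAttention(nn.Module):
    def __init__(self, c, M, num_classes):
        super().__init__()
        self.attention = nn.Sequential(nn.Conv2d(c, M, 1), nn.ReLU())  # formula (4)
        self.bn = nn.BatchNorm1d(M * c)
        self.classifier = nn.Linear(M * c, num_classes)

    def predict(self, f, maps):
        # Z_{v,m} = GAP(f * F_{v,m}) for all m at once, then combine + normalize
        Z = torch.einsum("bchw,bmhw->bmc", f, maps) / (f.size(2) * f.size(3))
        Z = F.normalize(Z.flatten(1), dim=1)          # formula (5)
        return self.classifier(self.bn(Z))            # formula (6)

    def forward(self, f, labels):
        F_fact = self.attention(f)                    # learned (factual) attention
        Y = self.predict(f, F_fact)
        F_cf = torch.rand_like(F_fact)                # counterfactual: random maps
        Y_cf = self.predict(f, F_cf)
        Y_effect = Y - Y_cf                           # formula (7)
        return F.cross_entropy(Y_effect, labels)      # formula (8)
```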
s5, carrying out weighted average update on parameters of the teacher model;
after the iterative training optimization is finished, carrying out weighted average update on the parameters of the teacher model according to the parameters of the student model; and inputting the target domain image into the updated teacher model, outputting class probability by the classifier, and selecting a class corresponding to the maximum value of the class probability as a recognition result of the input image.
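A minimal sketch of the weighted-average update of step S5, assuming a mean-teacher style exponential moving average; the smoothing coefficient 0.999 is an assumed typical value, not one fixed by the invention.

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.999):
    # theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)
```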
Further, the backbone network of the teacher model comprises a first convolution module, a first pooling module, a first attention module, a second convolution module, a second attention module, a third convolution module, a third attention module, a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module and a second pooling module which are sequentially connected.
Further, the first attention module, the second attention module, the third attention module, the fourth attention module and the fifth attention module have the same structure and comprise a cascaded 2D convolution layer and an activation unit.
Further, the uncertainty is calculated from the output difference between the teacher model and the student model, so as to obtain the credibility of the pseudo label of each target domain image, and the loss function is weighted by this credibility; specifically, the weighted target classification loss is calculated as follows:
S4.1B, calculating the uncertainty u_i of target domain image x_i, i = 1, 2, ..., N_t:
inputting x_i into the student model, the backbone network of the student model outputting the first feature f_i of x_i;
calculating the similarity between the first feature f_i and the reference center matrix R as the first soft multi-label L_i of x_i: L_i = Softmax(R·f_i);
inputting x_i into the teacher model, the backbone network of the teacher model outputting the second feature \tilde{f}_i of x_i;
calculating the similarity between the second feature \tilde{f}_i and the reference center matrix R as the second soft multi-label \tilde{L}_i of x_i: \tilde{L}_i = Softmax(R·\tilde{f}_i);
calculating the uncertainty u_i of x_i based on the KL divergence:
u_i = \sum_{k=1}^{K_r} L_{i,k} \log \frac{L_{i,k}}{\tilde{L}_{i,k}}    (9)
where L_{i,k} denotes the probability of the first soft multi-label L_i predicting the k-th identity corresponding to the reference center matrix R, and \tilde{L}_{i,k} denotes the probability of the second soft multi-label \tilde{L}_i predicting the k-th identity corresponding to the reference center matrix R;
S4.2B, calculating the credibility weight ω_i of target domain image x_i: ω_i = exp(−u_i);
S4.3B, calculating the weighted target domain classification loss L_class:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \omega_i \log p(\tilde{y}_i \mid x_i)    (10)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i.
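A sketch of S4.1B-S4.3B under the reconstruction above: soft multi-labels against the reference centers R, KL-based uncertainty, and the credibility-weighted classification loss of formula (10); the KL direction and the epsilon clamp are assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_class_loss(student_feats, teacher_feats, R, logits, pseudo_labels):
    """student_feats/teacher_feats: (N_t, c); R: (K_r, c); logits: (N_t, P_t)."""
    L_s = F.softmax(student_feats @ R.t(), dim=1)        # first soft multi-label
    L_t = F.softmax(teacher_feats @ R.t(), dim=1)        # second soft multi-label
    # formula (9): u_i = sum_k L_{i,k} * log(L_{i,k} / L~_{i,k})
    u = (L_s * (L_s.clamp_min(1e-8).log() - L_t.clamp_min(1e-8).log())).sum(1)
    w = torch.exp(-u)                                    # credibility weights
    ce = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (w * ce).mean()                               # formula (10)
```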
Further, the loss function also includes a triplet loss, whose calculation steps are:
S4.4B, constructing triplets from the target domain images: randomly selecting P categories from the N_t target domain images, each category containing K instances, and constructing triplets (x_{l,m}^{a}, x_{l,m}^{+}, x_{l,m}^{-}) with each selected target domain image as an anchor sample, where l = 1, 2, ..., P, m = 1, 2, ..., K, x_{l,m}^{a} is the m-th instance in class l, x_{l,m}^{+} is a positive sample of x_{l,m}^{a}, and x_{l,m}^{-} is a negative sample of x_{l,m}^{a};
calculating the uncertainties u_{l,m}^{a}, u_{l,m}^{+} and u_{l,m}^{-} of the anchor, positive and negative samples in each triplet;
calculating the credibility P_{l,m}^{+} of the positive sample with respect to the anchor sample and the credibility P_{l,m}^{-} of the negative sample with respect to the anchor sample, where:
P_{l,m}^{+} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{+})\right)    (11)
P_{l,m}^{-} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{-})\right)    (12)
S4.5B, calculating the credibility-weighted triplet loss:
L_{triplet} = \frac{1}{PK} \sum_{l=1}^{P} \sum_{m=1}^{K} \max\left(0,\; \alpha + P_{l,m}^{+}\, d(x_{l,m}^{a}, x_{l,m}^{+}) - P_{l,m}^{-}\, d(x_{l,m}^{a}, x_{l,m}^{-})\right)    (13)
where d(x_{l,m}^{a}, x_{l,m}^{+}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{+}, d(x_{l,m}^{a}, x_{l,m}^{-}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{-}, and α > 0 is a preset distance constant.
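A sketch of the credibility-weighted triplet loss under the margin form reconstructed in formula (13); the distance inputs are assumed to be precomputed Euclidean distances, and the default margin 0.3 is an assumed value.

```python
import torch

def weighted_triplet_loss(d_ap, d_an, u_a, u_p, u_n, alpha=0.3):
    """d_ap/d_an: anchor-positive / anchor-negative distances; u_*: uncertainties."""
    P_pos = (torch.exp(-u_a) + torch.exp(-u_p)) / 2      # formula (11)
    P_neg = (torch.exp(-u_a) + torch.exp(-u_n)) / 2      # formula (12)
    # formula (13): hinge with credibility-weighted distances and margin alpha
    return torch.clamp(alpha + P_pos * d_ap - P_neg * d_an, min=0).mean()
```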
Further, the loss function also includes a contrastive loss, whose calculation steps are:
S4.6B, establishing a memory bank from the training samples; taking target domain image x_i as a query sample, finding the samples in the memory bank having the same pseudo label as x_i as positive samples, and the samples having different pseudo labels as negative samples; calculating the contrastive loss:
L_{contrastive} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+})}{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+}) + \sum_{j=1}^{N_i^{-}} \omega_{i,j}^{-} \exp(s_{i,j}^{-})}    (14)
where N_i^{+} and N_i^{-} are respectively the numbers of positive and negative samples of x_i in the memory bank, s_{i,j}^{-} denotes the similarity between the j-th negative sample and x_i, ω_{i,j}^{-} denotes the credibility of the j-th negative sample with respect to x_i, s_{i,h}^{+} denotes the similarity between the h-th positive sample and x_i, and ω_{i,h}^{+} denotes the credibility of the h-th positive sample with respect to x_i.
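And a sketch of the memory-bank contrastive term for a single query, following the reconstructed formula (14); the softmax-style combination is an assumption, as the patent's own formula image is not reproduced here.

```python
import torch

def weighted_contrastive_loss(sim_pos, sim_neg, w_pos, w_neg):
    """sim_pos/w_pos: (N_i^+,) similarities and credibilities of positives;
    sim_neg/w_neg: (N_i^-,) the same for negatives, all w.r.t. query x_i."""
    pos = (w_pos * torch.exp(sim_pos)).sum()
    neg = (w_neg * torch.exp(sim_neg)).sum()
    return -torch.log(pos / (pos + neg))    # formula (14), one query term
```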
In another aspect, the invention also discloses a pedestrian re-identification system based on counterfactual attention learning, the system being the teacher model obtained by the above method.
The invention also discloses a computer storage medium on which computer instructions are stored, the instructions, when run, performing the above pedestrian re-identification method.
Beneficial effects: compared with the prior art, the pedestrian re-identification method and system based on counterfactual attention learning have the following advantages:
1. The invention provides a counterfactual attention learning mechanism for the background clutter and occlusion problems of pedestrian images in a dataset. Most existing methods learn visual attention in a weakly supervised manner: the attention module is supervised only by the final loss function, without a strong supervision signal to guide the training process. Such likelihood-based approaches explicitly supervise only the final prediction and ignore the causal relationship between prediction and attention. The invention provides a novel counterfactual attention learning method based on causal inference to enhance discriminative feature learning on target-domain pedestrian images and to address their occlusion and background clutter. The basic idea is to quantify the quality of attention by comparing the effects of the fact (the learned attention) and the counterfactual (false attention) on the final prediction, and to maximize this difference so as to encourage the network to learn more effective visual attention and to reduce the influence of a biased training set.
2. For the noisy-label problem, some domain-adaptive algorithms help the model learn a more robust feature representation by correcting incorrect samples online. Although these algorithms have made significant progress, their recognition accuracy remains unsatisfactory compared with supervised pedestrian re-identification. The invention takes the degree of inconsistency between the output features of two models (a student model and a teacher model based on the mean teacher method) as the uncertainty distribution of the target domain samples. By estimating and exploiting the uncertainty of the samples, the negative effects of noisy pseudo labels are mitigated, yielding a more effective domain adaptation algorithm.
Drawings
FIG. 1 is the training framework diagram in Example 1;
FIG. 2 is a schematic diagram of the structure of the counterfactual attention learning module;
FIG. 3 is the training framework diagram in Example 2.
Detailed Description
The invention is further elucidated below in connection with the drawings and the detailed description.
Example 1:
The invention discloses a pedestrian re-identification method based on counterfactual attention learning, which comprises the following steps:
S1, establishing a teacher model and a student model, wherein the teacher model and the student model have the same structure; the input of the teacher model is a pedestrian image, and the structure comprises a backbone network and a classifier, wherein the backbone network is used for extracting the features of the pedestrian image, and the classifier is used for obtaining the class probability of the pedestrian image from those features; the backbone network comprises a first convolution module, a first pooling module, a first attention module, a second convolution module, a second attention module, a third convolution module, a third attention module, a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module and a second pooling module which are sequentially connected. In this embodiment, the teacher model and the student model are based on a ResNet50 network to which 5 attention modules are added; the specific network is shown in Table 1.
Table 1 shows the structural parameters of the first convolution module Conv_1, the first pooling module MaxPool_1, the first attention module Att_1, the second convolution module Conv_2x, the second attention module Att_2, the third convolution module Conv_3x, the third attention module Att_3, the fourth convolution module Conv_4x, the fourth attention module Att_4, the fifth convolution module Conv_5x, the fifth attention module Att_5, the second pooling module MaxPool_2 and the classifier in this embodiment. The first attention module Att_1, the second attention module Att_2, the third attention module Att_3, the fourth attention module Att_4 and the fifth attention module Att_5 have the same structure, each comprising a cascaded 2D convolution layer and an activation unit, but the output channels of their convolution layers differ.
Table 1 network architecture parameters
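Since Table 1 itself is not reproduced here, the following PyTorch sketch shows one plausible assembly of the backbone: a standard ResNet50 with an attention module (a 2D convolution plus an activation unit) inserted after the stem pooling and after each residual stage. The sigmoid gating, the 1×1 kernels and the final global average pooling are assumptions; the exact parameters are those of Table 1.

```python
import torch.nn as nn
from torchvision.models import resnet50

class AttModule(nn.Module):
    """Attention module: cascaded 2D convolution layer and activation unit."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.Sigmoid()

    def forward(self, x):
        return x * self.act(self.conv(x))   # attention-reweighted feature map

def build_backbone():
    r = resnet50(weights="IMAGENET1K_V1")
    return nn.Sequential(
        r.conv1, r.bn1, r.relu,                 # Conv_1
        r.maxpool, AttModule(64),               # MaxPool_1, Att_1
        r.layer1, AttModule(256),               # Conv_2x,  Att_2
        r.layer2, AttModule(512),               # Conv_3x,  Att_3
        r.layer3, AttModule(1024),              # Conv_4x,  Att_4
        r.layer4, AttModule(2048),              # Conv_5x,  Att_5
        nn.AdaptiveAvgPool2d(1), nn.Flatten())  # second pooling module
```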
Pretraining the teacher model with the source domain dataset, the pretraining targeting minimization of the classification loss function L_{class}^{s}:
L_{class}^{s} = -\frac{1}{N} \sum_{n=1}^{N} \log p(y_{s,n} \mid x_{s,n})    (1)
where p(y_{s,n} \mid x_{s,n}) denotes the probability of input image x_{s,n} being identified as class y_{s,n}, y_{s,n} is the true label of x_{s,n} in the source domain dataset, and N is the number of samples per batch during pre-training. To improve the robustness of the model, before pre-training the images in the source domain dataset are perturbed by random erasing, random cropping, random flipping and the like, and the pre-training is performed with the perturbed images, reducing the sensitivity of the model to such variations.
Initializing parameters of a student model into parameters of a pre-trained teacher model;
S2, inputting the images in the target domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the target domain images; clustering the features of the target domain images, generating pseudo labels for the target domain images according to the clustered categories, and splicing the feature vector of each cluster center into a target domain cluster center matrix R_t, where P_t is the number of categories after clustering the target domain image features, and c, h and w are respectively the dimension, height and width of the features extracted by the backbone network of the teacher model;
S3, inputting the images in the source domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the source domain images; according to the true identity labels of the source domain images, averaging the extracted feature vectors of source domain images belonging to the same identity to obtain the class center vector of each pedestrian identity, and splicing the feature vector of each class center into a source domain class center matrix R_s, where P_s is the number of true pedestrian identity categories in the source domain data;
splicing R_t and R_s into a reference center matrix R with K_r = P_t + P_s rows;
S4, keeping the parameters of the teacher model unchanged, and optimizing the parameters of the student model by iterative training; the goal of the iterative training optimization is to minimize the loss function, each batch of samples in training comprising N_t target domain images and N_s source domain images; in this embodiment the loss function L_total is the sum of the target classification loss L_class and the attention loss L_effect:
L_{total} = L_{class} + \lambda_{eff} L_{effect}    (3)
where λ_eff is a preset attention loss coefficient;
the attention module is an important component part of the pedestrian re-identification framework, which is used for guiding the network to extract the most discriminative features of the pedestrian image. In order to learn better attention weights, the invention introduces a counterfactual attention mechanism, and the network learns better attention weights by measuring the influence of the network learned attention feature map and the randomly generated counterfactual attention feature map on the classification result. The training frame is shown in fig. 1, wherein a solid line in the figure is the flow direction of a source domain image, a dotted line is the flow direction of a target domain image, and the specific steps are as follows:
S4.1A, calculating the target classification loss L_class according to formula (2):
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log p(\tilde{y}_i \mid x_i)    (2)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i;
S4.2A, calculating the influence of attention on the prediction for source domain image z_v, v = 1, 2, ..., N_s:
inputting z_v into the student model to obtain the first feature f_v of z_v, f_v ∈ R^{c×h×w};
extracting the attention feature maps F_v of f_v using the spatial attention module:
F_v = {F_{v,1}, F_{v,2}, ..., F_{v,m}, ..., F_{v,M}} = attention(f_v)    (4)
where attention(·) denotes the attention feature map extraction operation of the spatial attention module, F_{v,m} ∈ R^{h×w} is the attention feature map of the m-th region of interest in F_v, and M is the number of regions of interest. In this embodiment M takes the empirical value 8, i.e. the attention feature maps are divided into 8 regions of interest, each of which may capture a different cue, such as a pedestrian's backpack, shoes or clothing. In this embodiment the spatial attention module follows the method in Rao Y, Chen G, Lu J, et al. Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [J]. 2021, and is implemented with a 2D convolution layer and a ReLU activation function;
weighting the first feature f_v with F_{v,m} and aggregating by a global average pooling operation to obtain the local attention Z_{v,m}: Z_{v,m} = GAP(f_v * F_{v,m}), where GAP denotes the global average pooling operation;
combining and normalizing the local attentions to obtain the global feature Z_v of source domain image z_v:
Z_v = normalize([Z_{v,1}, Z_{v,2}, ..., Z_{v,m}, ..., Z_{v,M}])    (5)
where normalize(·) denotes the normalization operation;
inputting the global feature Z_v sequentially into a batch normalization layer and a classification layer to obtain the spatial-attention-weighted prediction of source domain image z_v:
Y_v = C(BN(Z_v))    (6)
generating a counterfactual attention feature map \bar{F}_v, which has the same size as F_v and whose every element value is a random number;
weighting f_v with the counterfactual attention feature map \bar{F}_v, performing the global average pooling operation, combining and normalizing, and inputting the result sequentially into the batch normalization layer and the classification layer to obtain the counterfactual-attention-weighted prediction \bar{Y}_v of source domain image z_v;
the influence of attention on the prediction is then:
Y_v^{effect} = Y_v - \bar{Y}_v    (7)
the calculation of (2) is completed by a counterfacts attention mechanics learning module, and the structure of the counterfacts attention mechanics learning module is shown in figure 2.
S4.3A, calculating the attention loss:
L_{effect} = -\frac{1}{N_s} \sum_{v=1}^{N_s} \log p(y_{s,v} \mid Y_v^{effect})    (8)
where y_{s,v} is the class label of source domain image z_v, and p(y_{s,v} \mid Y_v^{effect}) is the probability, given the influence of attention on the prediction Y_v^{effect}, of z_v being identified as class y_{s,v};
s5, carrying out weighted average update on parameters of the teacher model;
and after the iterative training optimization is finished, carrying out weighted average updating on the parameters of the teacher model according to the parameters of the student model. The present example uses literature: the method in Tarvainen A, valpola H.mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results [ J ].2017 optimizes the parameters of the student model and updates the parameters of the teacher model with a weighted average.
And inputting the target domain image into the updated teacher model, outputting class probability by the classifier, and selecting a class corresponding to the maximum value of the class probability as a recognition result of the input image.
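For completeness, a minimal sketch of this final recognition step with the updated teacher model; backbone and classifier stand for the teacher's two parts as above.

```python
import torch

@torch.no_grad()
def recognize(backbone, classifier, image):
    """Returns the class index with maximal predicted probability."""
    probs = torch.softmax(classifier(backbone(image)), dim=1)
    return probs.argmax(dim=1)
```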
Example 2:
the difference between this embodiment and embodiment 1 is that the portion of the objective classification loss function is weighted with a confidence weight.
Incorrect pseudo labels can mislead feature learning during training and affect the performance of the domain-adaptive model. In this embodiment, the uncertainty distribution is estimated by evaluating the output difference between the teacher model and the student model, so as to assess the credibility of the pseudo label of each target domain image, and the loss function is weighted by this credibility to suppress the negative influence of falsely labeled samples on training and improve the domain adaptability of the model. The goal of the iterative training optimization is to minimize a loss function which, besides the attention loss, may include one or more of the classification loss, the triplet loss and the contrastive loss; in this embodiment the loss function is the sum of the attention loss and these three losses. The training framework is shown in FIG. 3; specifically, a training batch comprises the following steps:
S4.1B, calculating the uncertainty u_i of target domain image x_i, i = 1, 2, ..., N_t:
inputting x_i into the student model, the backbone network of the student model outputting the first feature f_i of x_i;
calculating the similarity between the first feature f_i and the reference center matrix R as the first soft multi-label L_i of x_i: L_i = Softmax(R·f_i);
inputting x_i into the teacher model, the backbone network of the teacher model outputting the second feature \tilde{f}_i of x_i;
calculating the similarity between the second feature \tilde{f}_i and the reference center matrix R as the second soft multi-label \tilde{L}_i of x_i: \tilde{L}_i = Softmax(R·\tilde{f}_i);
The invention uses the KL divergence to measure the difference between the probability distributions of the feature vectors extracted by the teacher model and the student model, namely calculating the uncertainty u_i of x_i based on the KL divergence:
u_i = \sum_{k=1}^{K_r} L_{i,k} \log \frac{L_{i,k}}{\tilde{L}_{i,k}}    (9)
where L_{i,k} denotes the probability of the first soft multi-label L_i predicting the k-th identity corresponding to the reference center matrix R, and \tilde{L}_{i,k} denotes the probability of the second soft multi-label \tilde{L}_i predicting the k-th identity corresponding to the reference center matrix R;
S4.2B, calculating the credibility weight ω_i of target domain image x_i: ω_i = exp(−u_i);
S4.3B, according to the calculation formula of the uncertainty, the larger the difference between the outputs of the teacher model and the student model for the same image, the higher the uncertainty and, correspondingly, the lower the credibility weight; the credibility-weighted target classification loss L_class is thus obtained:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \omega_i \log p(\tilde{y}_i \mid x_i)    (10)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i;
S4.4B, constructing triplets from the target domain images: randomly selecting P categories from the N_t target domain images, each category containing K instances, and constructing triplets (x_{l,m}^{a}, x_{l,m}^{+}, x_{l,m}^{-}) with each selected target domain image as an anchor sample, where l = 1, 2, ..., P, m = 1, 2, ..., K, x_{l,m}^{a} is the m-th instance in class l, x_{l,m}^{+} is a positive sample of x_{l,m}^{a}, i.e. a target domain image of the same category as x_{l,m}^{a}, and x_{l,m}^{-} is a negative sample of x_{l,m}^{a}, i.e. a target domain image of a different category from x_{l,m}^{a};
calculating, according to step S4.1B, the uncertainties u_{l,m}^{a}, u_{l,m}^{+} and u_{l,m}^{-} of the anchor sample x_{l,m}^{a}, the positive sample x_{l,m}^{+} and the negative sample x_{l,m}^{-} in each triplet;
calculating the credibility P_{l,m}^{+} of the positive sample with respect to the anchor sample and the credibility P_{l,m}^{-} of the negative sample with respect to the anchor sample; in this embodiment, the mean of the credibility of the positive sample and that of the anchor sample is taken as the credibility of the positive sample with respect to the anchor sample, namely:
P_{l,m}^{+} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{+})\right)    (11)
and in the same way,
P_{l,m}^{-} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{-})\right)    (12)
S4.5B, calculating the credibility-weighted triplet loss:
L_{triplet} = \frac{1}{PK} \sum_{l=1}^{P} \sum_{m=1}^{K} \max\left(0,\; \alpha + P_{l,m}^{+}\, d(x_{l,m}^{a}, x_{l,m}^{+}) - P_{l,m}^{-}\, d(x_{l,m}^{a}, x_{l,m}^{-})\right)    (13)
where d(x_{l,m}^{a}, x_{l,m}^{+}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{+}, and d(x_{l,m}^{a}, x_{l,m}^{-}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{-}; in this embodiment the Euclidean distance is used as the similarity measure; α > 0 is a preset distance constant, which pushes the distance between the anchor and the negative sample to be larger and the distance between the anchor and the positive sample to be smaller. Mathematically, the lower the credibility (the higher the uncertainty) of a sample pair, the smaller the weight on its similarity, hence the smaller its gradient during optimization, i.e. the smaller its contribution to the optimization.
S4.6B, establishing a memory bank from the training samples;
Memory-bank-based methods have been widely used in unsupervised representation learning, where they help introduce a contrastive loss for the general task: before training begins, the feature representations of the training set data extracted by the backbone network are stored, so that the network can be optimized while making better use of the similarities between samples and the samples in the global memory bank. The invention adopts the method in Wang X, Zhang H, Huang W, et al. Cross-Batch Memory for Embedding Learning [J]. 2019 to create and maintain the memory bank;
taking target domain image x_i as a query sample, finding the samples in the memory bank having the same pseudo label as x_i as positive samples, and the samples having different pseudo labels as negative samples; calculating the contrastive loss:
L_{contrastive} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+})}{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+}) + \sum_{j=1}^{N_i^{-}} \omega_{i,j}^{-} \exp(s_{i,j}^{-})}    (14)
where N_i^{+} and N_i^{-} are respectively the numbers of positive and negative samples of x_i in the memory bank, s_{i,j}^{-} denotes the similarity between the j-th negative sample and x_i, ω_{i,j}^{-} denotes the credibility of the j-th negative sample with respect to x_i, s_{i,h}^{+} denotes the similarity between the h-th positive sample and x_i, and ω_{i,h}^{+} denotes the credibility of the h-th positive sample with respect to x_i; here the similarity is again computed with the Euclidean distance, and the credibility is calculated according to formulas (9) and (11) in the same manner as in step S4.4B. The lower the credibility of a sample pair, the smaller its gradient and the smaller its contribution to the optimization. The credibility-weighted contrastive loss thus obtains an effective update of the network parameters by giving more weight to the negative samples most similar to the query sample and to the positive samples least similar to it.
In summary, the goal of the iterative training optimization in this embodiment is to minimize the loss function L_total:
L_{total} = L_{class} + L_{triplet} + \lambda_{ct} L_{contrastive} + \lambda_{eff} L_{effect}    (15)
where λ_ct is a preset contrastive loss coefficient, λ_eff is the preset source-domain attention loss coefficient, and the attention loss L_effect is the same as in Example 1.
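As a usage note, the four terms of formula (15) combine in one line per training batch; the coefficient values below are assumptions, since the embodiment does not state them.

```python
def total_loss(L_class, L_triplet, L_contrastive, L_effect,
               lambda_ct=0.5, lambda_eff=1.0):
    # formula (15)
    return L_class + L_triplet + lambda_ct * L_contrastive + lambda_eff * L_effect
```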
this example conducted comparative experiments on both the Market-1501 and DukeMTMC-ReID datasets, the results of which are shown in Table 2.
Table 2: testing results on Market-1501, dukeMTMC-ReID dataset
In Table 2, BUC denotes identification by the method in Lin Y, Dong X, Zheng L, et al. A Bottom-Up Clustering Approach to Unsupervised Person Re-Identification [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 8738-8745; ECN denotes the method in Zhong Z, Zheng L, Luo Z, et al. Invariance Matters: Exemplar Memory for Domain Adaptive Person Re-identification [J]. IEEE, 2019; MAR denotes the method in Yu H X, Zheng W S, Wu A, et al. Unsupervised Person Re-Identification by Soft Multilabel Learning [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019; PDA-Net denotes the method in Li Y J, Lin C S, Lin Y B, et al. Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation [J]. IEEE, 2019; PCB denotes the method in Zhang X, Cao J, Shen C, et al. Self-training with progressive augmentation for unsupervised cross-domain person re-identification [J]. IEEE, 2019; SSG denotes the method in Fu Y, Wei Y, Wang G, et al. Self-similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-identification [J]. 2018; MPLP denotes the method in Wang D, Zhang S. Unsupervised Person Re-identification via Multi-label Classification [J]. IEEE, 2020; AD-Cluster denotes the method in Zhai Y, Lu S, Ye Q, et al. AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification [J]. IEEE, 2020; MMT denotes the method in Liu J, Zha Z J, Chen D, et al. Adaptive Transfer Network for Cross-Domain Person Re-Identification [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019; NRMT denotes the method in Zhao F, Liao S, Xie G S, et al. Unsupervised Domain Adaptation with Noise Resistible Mutual-Training for Person Re-identification [C]// European Conference on Computer Vision. Springer, Cham, 2020; UNRN denotes the method in Zheng K, Lan C, Zeng W, et al. Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [J]. 2020; GLT denotes the method in Zheng K, Liu W, He L, et al. Group-aware Label Transfer for Domain Adaptive Person Re-identification [J]. 2021; and Ours denotes identification by the method in this embodiment.
As can be seen from Table 2, the Rank-1, Rank-5, Rank-10 and mAP indexes of the method are all superior, to a certain extent, to those of the state-of-the-art methods of the time, and the cross-domain experiments conducted on the two datasets also demonstrate that the method generalizes well across a variety of datasets.

Claims (8)

1. A pedestrian re-identification method based on counterfactual attention learning, comprising the steps of:
s1, establishing a teacher model and a student model, wherein the teacher model and the student model have the same structure; the input of the teacher model is a pedestrian image, and the structure comprises a backbone network and a classifier, wherein the backbone network is used for extracting the characteristics of the pedestrian image, and the classifier is used for acquiring the category probability of the pedestrian image according to the characteristics of the pedestrian image;
pretraining a teacher model with a source domain dataset, the pretraining targeting minimization of a classification loss function L_{class}^{s}:
L_{class}^{s} = -\frac{1}{N} \sum_{n=1}^{N} \log p(y_{s,n} \mid x_{s,n})    (1)
where p(y_{s,n} \mid x_{s,n}) denotes the probability of input image x_{s,n} being identified as class y_{s,n}, y_{s,n} is the true label of x_{s,n} in the source domain dataset, and N is the number of samples per batch during pre-training;
initializing parameters of a student model into parameters of a pre-trained teacher model;
S2, inputting the images in the target domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the target domain images; clustering the features of the target domain images, generating pseudo labels for the target domain images according to the clustered categories, and splicing the feature vector of each cluster center into a target domain cluster center matrix R_t, where P_t is the number of categories after clustering the target domain image features, and c, h and w are respectively the dimension, height and width of the features extracted by the backbone network of the teacher model;
S3, inputting the images in the source domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the source domain images; according to the true identity labels of the source domain images, averaging the extracted feature vectors of source domain images belonging to the same identity to obtain the class center vector of each pedestrian identity, and splicing the feature vector of each class center into a source domain class center matrix R_s, where P_s is the number of true pedestrian identity categories in the source domain data;
splicing R_t and R_s into a reference center matrix R with K_r = P_t + P_s rows;
S4, keeping the parameters of the teacher model unchanged, and optimizing the parameters of the student model by iterative training; the goal of the iterative training optimization is to minimize the loss function, each batch of samples in training comprising N_t target domain images and N_s source domain images; the loss function L_total is the sum of the target classification loss L_class and the attention loss L_effect; the specific steps are as follows:
S4.1A, calculating the target classification loss L_class:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log p(\tilde{y}_i \mid x_i)    (2)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i;
S4.2A, calculating the influence of attention on the prediction for source domain image z_v, v = 1, 2, ..., N_s:
inputting z_v into the student model to obtain the first feature f_v of z_v, f_v ∈ R^{c×h×w};
extracting the attention feature maps F_v of f_v using the spatial attention module:
F_v = {F_{v,1}, F_{v,2}, ..., F_{v,m}, ..., F_{v,M}} = attention(f_v)    (4)
where attention(·) denotes the attention feature map extraction operation of the spatial attention module, F_{v,m} ∈ R^{h×w} is the attention feature map of the m-th region of interest in F_v, and M is the number of regions of interest;
weighting the first feature f_v with F_{v,m} and aggregating by a global average pooling operation to obtain the local attention Z_{v,m}: Z_{v,m} = GAP(f_v * F_{v,m}), where GAP denotes the global average pooling operation;
combining and normalizing the local attentions to obtain the global feature Z_v of source domain image z_v:
Z_v = normalize([Z_{v,1}, Z_{v,2}, ..., Z_{v,m}, ..., Z_{v,M}])    (5)
where normalize(·) denotes the normalization operation;
inputting the global feature Z_v sequentially into a batch normalization layer and a classification layer to obtain the spatial-attention-weighted prediction of source domain image z_v:
Y_v = C(BN(Z_v))    (6)
generating a counterfactual attention feature map \bar{F}_v, which has the same size as F_v and whose every element value is a random number;
weighting f_v with the counterfactual attention feature map \bar{F}_v, performing the global average pooling operation, combining and normalizing, and inputting the result sequentially into the batch normalization layer and the classification layer to obtain the counterfactual-attention-weighted prediction \bar{Y}_v of source domain image z_v;
the influence of attention on the prediction is then:
Y_v^{effect} = Y_v - \bar{Y}_v    (7)
S4.3A, calculating the attention loss:
L_{effect} = -\frac{1}{N_s} \sum_{v=1}^{N_s} \log p(y_{s,v} \mid Y_v^{effect})    (8)
where y_{s,v} is the class label of source domain image z_v, and p(y_{s,v} \mid Y_v^{effect}) is the probability, given the influence of attention on the prediction Y_v^{effect}, of z_v being identified as class y_{s,v};
s5, carrying out weighted average update on parameters of the teacher model;
after the iterative training optimization is finished, carrying out weighted average update on the parameters of the teacher model according to the parameters of the student model; and inputting the target domain image into the updated teacher model, outputting class probability by the classifier, and selecting a class corresponding to the maximum value of the class probability as a recognition result of the input image.
2. The pedestrian re-identification method based on counterfactual attention learning of claim 1, wherein the backbone network of the teacher model comprises a first convolution module, a first pooling module, a first attention module, a second convolution module, a second attention module, a third convolution module, a third attention module, a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module and a second pooling module which are sequentially connected.
3. The pedestrian re-identification method based on counterfactual attention learning of claim 2, wherein the first attention module, the second attention module, the third attention module, the fourth attention module and the fifth attention module are identical in structure, each comprising a cascaded 2D convolution layer and an activation unit.
4. The pedestrian re-identification method based on counterfactual attention learning according to claim 1, wherein the uncertainty is calculated from the output difference between the teacher model and the student model so as to obtain the credibility of the pseudo label of each target domain image, and the loss function is weighted by this credibility; specifically, the weighted target classification loss is calculated as follows:
S4.1B, calculating the uncertainty u_i of target domain image x_i, i = 1, 2, ..., N_t:
inputting x_i into the student model, the backbone network of the student model outputting the first feature f_i of x_i;
calculating the similarity between the first feature f_i and the reference center matrix R as the first soft multi-label L_i of x_i: L_i = Softmax(R·f_i);
inputting x_i into the teacher model, the backbone network of the teacher model outputting the second feature \tilde{f}_i of x_i;
calculating the similarity between the second feature \tilde{f}_i and the reference center matrix R as the second soft multi-label \tilde{L}_i of x_i: \tilde{L}_i = Softmax(R·\tilde{f}_i);
calculating the uncertainty u_i of x_i based on the KL divergence:
u_i = \sum_{k=1}^{K_r} L_{i,k} \log \frac{L_{i,k}}{\tilde{L}_{i,k}}    (9)
where L_{i,k} denotes the probability of the first soft multi-label L_i predicting the k-th identity corresponding to the reference center matrix R, and \tilde{L}_{i,k} denotes the probability of the second soft multi-label \tilde{L}_i predicting the k-th identity corresponding to the reference center matrix R;
S4.2B, calculating the credibility weight ω_i of target domain image x_i: ω_i = exp(−u_i);
S4.3B, calculating the weighted target domain classification loss L_class:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \omega_i \log p(\tilde{y}_i \mid x_i)    (10)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i.
5. The pedestrian re-identification method based on counterfactual attention learning according to claim 1, wherein the loss function further includes a triplet loss, whose calculation steps are:
S4.4B, constructing triplets from the target domain images: randomly selecting P categories from the N_t target domain images, each category containing K instances, and constructing triplets (x_{l,m}^{a}, x_{l,m}^{+}, x_{l,m}^{-}) with each selected target domain image as an anchor sample, where l = 1, 2, ..., P, m = 1, 2, ..., K, x_{l,m}^{a} is the m-th instance in class l, x_{l,m}^{+} is a positive sample of x_{l,m}^{a}, and x_{l,m}^{-} is a negative sample of x_{l,m}^{a};
calculating the uncertainties u_{l,m}^{a}, u_{l,m}^{+} and u_{l,m}^{-} of the anchor, positive and negative samples in each triplet;
calculating the credibility P_{l,m}^{+} of the positive sample with respect to the anchor sample and the credibility P_{l,m}^{-} of the negative sample with respect to the anchor sample, where:
P_{l,m}^{+} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{+})\right)    (11)
P_{l,m}^{-} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{-})\right)    (12)
S4.5B, calculating the credibility-weighted triplet loss:
L_{triplet} = \frac{1}{PK} \sum_{l=1}^{P} \sum_{m=1}^{K} \max\left(0,\; \alpha + P_{l,m}^{+}\, d(x_{l,m}^{a}, x_{l,m}^{+}) - P_{l,m}^{-}\, d(x_{l,m}^{a}, x_{l,m}^{-})\right)    (13)
where d(x_{l,m}^{a}, x_{l,m}^{+}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{+}, d(x_{l,m}^{a}, x_{l,m}^{-}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{-}, and α > 0 is a preset distance constant.
6. The pedestrian re-identification method based on counterfactual attention learning of claim 1, wherein the loss function further includes a contrastive loss, whose calculation steps are:
S4.6B, establishing a memory bank from the training samples; taking target domain image x_i as a query sample, finding the samples in the memory bank having the same pseudo label as x_i as positive samples, and the samples having different pseudo labels as negative samples; calculating the contrastive loss:
L_{contrastive} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+})}{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+}) + \sum_{j=1}^{N_i^{-}} \omega_{i,j}^{-} \exp(s_{i,j}^{-})}    (14)
where N_i^{+} and N_i^{-} are respectively the numbers of positive and negative samples of x_i in the memory bank, s_{i,j}^{-} denotes the similarity between the j-th negative sample and x_i, ω_{i,j}^{-} denotes the credibility of the j-th negative sample with respect to x_i, s_{i,h}^{+} denotes the similarity between the h-th positive sample and x_i, and ω_{i,h}^{+} denotes the credibility of the h-th positive sample with respect to x_i.
7. A pedestrian re-identification system based on counterfactual attention learning, characterized in that the system is the teacher model obtained according to the method of any one of claims 1-6.
8. A computer storage medium having stored thereon computer instructions which, when run, perform the pedestrian re-identification method of any one of claims 1 to 6.
CN202210606376.1A 2022-05-31 2022-05-31 Pedestrian re-identification method, system and medium based on counterfactual attention learning Active CN114882534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210606376.1A CN114882534B (en) 2022-05-31 2022-05-31 Pedestrian re-identification method, system and medium based on counterfactual attention learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210606376.1A CN114882534B (en) 2022-05-31 2022-05-31 Pedestrian re-identification method, system and medium based on counterfactual attention learning

Publications (2)

Publication Number Publication Date
CN114882534A CN114882534A (en) 2022-08-09
CN114882534B true CN114882534B (en) 2024-03-26

Family

ID=82679768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210606376.1A Active CN114882534B (en) 2022-05-31 2022-05-31 Pedestrian re-identification method, system and medium based on counterfactual attention learning

Country Status (1)

Country Link
CN (1) CN114882534B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051948B (en) * 2023-03-08 2023-06-23 中国海洋大学 Fine granularity image recognition method based on attention interaction and anti-facts attention
CN117079142B (en) * 2023-10-13 2024-01-26 昆明理工大学 Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200121206A (en) * 2019-04-15 2020-10-23 계명대학교 산학협력단 Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
CN111881714A (en) * 2020-05-22 2020-11-03 北京交通大学 Unsupervised cross-domain pedestrian re-identification method
CN114092964A (en) * 2021-10-19 2022-02-25 杭州电子科技大学 Cross-domain pedestrian re-identification method based on attention guidance and multi-scale label generation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200121206A (en) * 2019-04-15 2020-10-23 계명대학교 산학협력단 Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
CN111881714A (en) * 2020-05-22 2020-11-03 北京交通大学 Unsupervised cross-domain pedestrian re-identification method
CN114092964A (en) * 2021-10-19 2022-02-25 杭州电子科技大学 Cross-domain pedestrian re-identification method based on attention guidance and multi-scale label generation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Person Re-identification in Weakly Supervised Scenarios; Qi Lei; Yu Peize; Gao Yang; Journal of Software; 2020-09-15 (09); full text *

Also Published As

Publication number Publication date
CN114882534A (en) 2022-08-09


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant