CN114882534B - Pedestrian re-identification method, system and medium based on counterfactual attention learning - Google Patents

Pedestrian re-identification method, system and medium based on counterfactual attention learning

Info

Publication number
CN114882534B
CN114882534B (application CN202210606376.1A)
Authority
CN
China
Prior art keywords
attention
image
pedestrian
class
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210606376.1A
Other languages
Chinese (zh)
Other versions
CN114882534A (en)
Inventor
Li Xiaohong
Dai Xuesong
Qi Meibin
Zhuang Shuo
Hao Shijie
Jiang Jianguo
Zhang Jingjing
Li Shilong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202210606376.1A priority Critical patent/CN114882534B/en
Publication of CN114882534A publication Critical patent/CN114882534A/en
Application granted granted Critical
Publication of CN114882534B publication Critical patent/CN114882534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method and system based on counterfactual attention learning, and a computer storage medium. The pedestrian re-identification method quantifies the quality of attention by comparing the effects of the fact (the learned attention) and the counterfactual (randomly generated false attention) on the final prediction, and maximizes this difference to encourage the network to learn more effective visual attention and to reduce the influence of a biased training set, thereby enhancing discriminative feature learning on target-domain pedestrian images and alleviating their occlusion and background-clutter problems.

Description

Pedestrian re-identification method, system and medium based on counterfactual attention learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a pedestrian re-identification method, a pedestrian re-identification system and a computer storage medium based on counterfactual attention learning.
Background
In recent years, unsupervised domain adaptation (UDA) has been a research hotspot in the field of deep learning: a model trained on a labeled source-domain dataset is applied to an unlabeled target-domain dataset that is related in task but differs in feature distribution. Existing unsupervised domain-adaptive pedestrian re-identification methods generally comprise three steps: feature pre-training with labeled source domain data, clustering-based pseudo-label prediction on target domain data, and feature representation learning with pseudo-label fine-tuning.
However, owing to the domain gap between the source domain data and the target domain data and the imperfection of clustering algorithms, the pseudo labels assigned by clustering often contain errors. Such noisy labels mislead feature learning and degrade domain adaptation performance. Mitigating the negative influence of noisy pseudo-label samples, and addressing the background clutter and occlusion in target-domain pedestrian images, is therefore crucial for improving the retrieval performance of a domain-adaptive model. To address the noisy-label problem, some domain-adaptive algorithms help the model learn a more robust feature representation by correcting incorrect samples online. Although these algorithms have made significant progress, their recognition accuracy remains unsatisfactory compared with supervised pedestrian re-identification. In addition, most traditional methods optimize attention by supervising only the final prediction, treating the whole network as a black box and ignoring how the learned attention affects the prediction. As a result, noisy pseudo labels readily exert a negative influence during pseudo-label fine-tuning under the clustering framework.
Disclosure of Invention
The invention aims to address the problems in the prior art by providing a pedestrian re-identification method based on counterfactual attention learning, which can effectively improve the accuracy of pedestrian re-identification.
The technical scheme is as follows: the invention discloses a pedestrian re-identification method based on counterfactual attention learning, which comprises the following steps:
s1, establishing a teacher model and a student model, wherein the teacher model and the student model have the same structure; the input of the teacher model is a pedestrian image, and the structure comprises a backbone network and a classifier, wherein the backbone network is used for extracting the characteristics of the pedestrian image, and the classifier is used for acquiring the category probability of the pedestrian image according to the characteristics of the pedestrian image;
pretraining a teacher model with a source domain dataset, the pretraining targeting minimization of a classification loss function L_{class}^{s}:
L_{class}^{s} = -\frac{1}{N} \sum_{n=1}^{N} \log p(y_{s,n} \mid x_{s,n})    (1)
where p(y_{s,n} \mid x_{s,n}) denotes the probability of input image x_{s,n} being identified as class y_{s,n}, y_{s,n} is the true label of x_{s,n} in the source domain dataset, and N is the number of samples per batch during pre-training;
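For illustration, the following is a minimal PyTorch sketch of one pre-training step under formula (1); the names backbone, classifier and optimizer are placeholders for the modules described in step S1, and the snippet assumes the backbone outputs pooled feature vectors.

```python
import torch.nn.functional as F

def pretrain_step(backbone, classifier, images, labels, optimizer):
    """One supervised pre-training batch on the labeled source domain."""
    feats = backbone(images)            # (N, c) pooled pedestrian features
    logits = classifier(feats)          # (N, P_s) class scores
    # formula (1): cross-entropy = -1/N * sum_n log p(y_{s,n} | x_{s,n})
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```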
initializing parameters of a student model into parameters of a pre-trained teacher model;
S2, inputting the images in the target domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the target domain images; clustering the features of the target domain images, generating pseudo labels for the target domain images according to the clustered categories, and splicing the feature vector of each cluster center into a target domain cluster center matrix R_t, where P_t is the number of categories after clustering the target domain image features, and c, h and w are respectively the dimension, height and width of the features extracted by the backbone network of the teacher model;
S3, inputting the images in the source domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the source domain images; according to the true identity labels of the source domain images, averaging the extracted feature vectors of source domain images belonging to the same identity to obtain the class center vector of each pedestrian identity, and splicing the feature vector of each class center into a source domain class center matrix R_s, where P_s is the number of true pedestrian identity categories in the source domain data;
splicing R_t and R_s into a reference center matrix R with K_r = P_t + P_s rows;
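A hedged sketch of steps S2-S3 follows: pseudo labels from clustering the target features, then the center matrices R_t and R_s and the reference matrix R. DBSCAN with a cosine metric and its eps/min_samples values are illustrative assumptions; this passage of the invention does not fix a particular clustering algorithm.

```python
import torch
from sklearn.cluster import DBSCAN

def build_reference_matrix(target_feats, source_feats, source_ids):
    """target_feats: (n_t, c); source_feats: (n_s, c); source_ids: (n_s,)."""
    labels = DBSCAN(eps=0.6, min_samples=4, metric="cosine").fit_predict(
        target_feats.numpy())
    pseudo = torch.as_tensor(labels)
    # R_t: mean feature of each target cluster (DBSCAN outliers, label -1, skipped)
    R_t = torch.stack([target_feats[pseudo == k].mean(dim=0)
                       for k in range(int(pseudo.max()) + 1)])
    # R_s: mean feature of each true source identity
    R_s = torch.stack([source_feats[source_ids == p].mean(dim=0)
                       for p in source_ids.unique()])
    R = torch.cat([R_t, R_s], dim=0)    # (K_r, c) with K_r = P_t + P_s
    return pseudo, R
```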
S4, keeping the parameters of the teacher model unchanged, and optimizing the parameters of the student model by iterative training; the goal of the iterative training optimization is to minimize the loss function, each batch of samples in training comprising N_t target domain images and N_s source domain images; the loss function L_total is the sum of the target classification loss L_class and the attention loss L_effect; the specific steps are as follows:
S4.1A, calculating the target classification loss L_class:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log p(\tilde{y}_i \mid x_i)    (2)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i;
S4.2A, calculating the influence of attention on the prediction for source domain image z_v, v = 1, 2, ..., N_s:
inputting z_v into the student model to obtain the first feature f_v of z_v, f_v ∈ R^{c×h×w};
extracting the attention feature maps F_v of f_v using the spatial attention module:
F_v = {F_{v,1}, F_{v,2}, ..., F_{v,m}, ..., F_{v,M}} = attention(f_v)    (4)
where attention(·) denotes the attention feature map extraction operation of the spatial attention module, F_{v,m} ∈ R^{h×w} is the attention feature map of the m-th region of interest in F_v, and M is the number of regions of interest;
weighting the first feature f_v with F_{v,m} and aggregating by a global average pooling operation to obtain the local attention Z_{v,m}: Z_{v,m} = GAP(f_v * F_{v,m}), where GAP denotes the global average pooling operation;
combining and normalizing the local attentions to obtain the global feature Z_v of source domain image z_v:
Z_v = normalize([Z_{v,1}, Z_{v,2}, ..., Z_{v,m}, ..., Z_{v,M}])    (5)
where normalize(·) denotes the normalization operation;
inputting the global feature Z_v sequentially into a batch normalization layer and a classification layer to obtain the spatial-attention-weighted prediction of source domain image z_v:
Y_v = C(BN(Z_v))    (6)
where BN(·) denotes the batch normalization layer and C(·) the classification layer;
generating a counterfactual attention feature map \bar{F}_v, which has the same size as F_v and whose every element value is a random number;
weighting f_v with the counterfactual attention feature map \bar{F}_v, performing the global average pooling operation, combining and normalizing, and inputting the result sequentially into the batch normalization layer and the classification layer to obtain the counterfactual-attention-weighted prediction \bar{Y}_v of source domain image z_v;
the influence of attention on the prediction is then:
Y_v^{effect} = Y_v - \bar{Y}_v    (7)
S4.3A, calculating the attention loss:
L_{effect} = -\frac{1}{N_s} \sum_{v=1}^{N_s} \log p(y_{s,v} \mid Y_v^{effect})    (8)
where y_{s,v} is the class label of source domain image z_v, and p(y_{s,v} \mid Y_v^{effect}) is the probability, given the influence of attention on the prediction Y_v^{effect}, of z_v being identified as class y_{s,v};
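The following PyTorch sketch illustrates steps S4.2A-S4.3A under the reconstruction above: factual attention maps from a convolution with ReLU, local attentions by weighted global average pooling, a random counterfactual attention map of the same size, and the attention loss of formula (8). The 1×1 kernel and module shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CounterfactualAttention(nn.Module):
    def __init__(self, c, M, num_classes):
        super().__init__()
        self.attention = nn.Sequential(nn.Conv2d(c, M, 1), nn.ReLU())  # formula (4)
        self.bn = nn.BatchNorm1d(M * c)
        self.classifier = nn.Linear(M * c, num_classes)

    def predict(self, f, maps):
        # Z_{v,m} = GAP(f * F_{v,m}) for all m at once, then combine + normalize
        Z = torch.einsum("bchw,bmhw->bmc", f, maps) / (f.size(2) * f.size(3))
        Z = F.normalize(Z.flatten(1), dim=1)          # formula (5)
        return self.classifier(self.bn(Z))            # formula (6)

    def forward(self, f, labels):
        F_fact = self.attention(f)                    # learned (factual) attention
        Y = self.predict(f, F_fact)
        F_cf = torch.rand_like(F_fact)                # counterfactual: random maps
        Y_cf = self.predict(f, F_cf)
        Y_effect = Y - Y_cf                           # formula (7)
        return F.cross_entropy(Y_effect, labels)      # formula (8)
```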
s5, carrying out weighted average update on parameters of the teacher model;
after the iterative training optimization is finished, carrying out weighted average update on the parameters of the teacher model according to the parameters of the student model; and inputting the target domain image into the updated teacher model, outputting class probability by the classifier, and selecting a class corresponding to the maximum value of the class probability as a recognition result of the input image.
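A minimal sketch of the weighted-average update of step S5, assuming a mean-teacher style exponential moving average; the smoothing coefficient 0.999 is an assumed typical value, not one fixed by the invention.

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.999):
    # theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)
```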
Further, the backbone network of the teacher model comprises a first convolution module, a first pooling module, a first attention module, a second convolution module, a second attention module, a third convolution module, a third attention module, a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module and a second pooling module which are sequentially connected.
Further, the first attention module, the second attention module, the third attention module, the fourth attention module and the fifth attention module have the same structure and comprise a cascaded 2D convolution layer and an activation unit.
Further, the uncertainty is calculated from the output difference between the teacher model and the student model, so as to obtain the credibility of the pseudo label of each target domain image, and the loss function is weighted by this credibility; specifically, the weighted target classification loss is calculated as follows:
S4.1B, calculating the uncertainty u_i of target domain image x_i, i = 1, 2, ..., N_t:
inputting x_i into the student model, the backbone network of the student model outputting the first feature f_i of x_i;
calculating the similarity between the first feature f_i and the reference center matrix R as the first soft multi-label L_i of x_i: L_i = Softmax(R·f_i);
inputting x_i into the teacher model, the backbone network of the teacher model outputting the second feature \tilde{f}_i of x_i;
calculating the similarity between the second feature \tilde{f}_i and the reference center matrix R as the second soft multi-label \tilde{L}_i of x_i: \tilde{L}_i = Softmax(R·\tilde{f}_i);
calculating the uncertainty u_i of x_i based on the KL divergence:
u_i = \sum_{k=1}^{K_r} L_{i,k} \log \frac{L_{i,k}}{\tilde{L}_{i,k}}    (9)
where L_{i,k} denotes the probability of the first soft multi-label L_i predicting the k-th identity corresponding to the reference center matrix R, and \tilde{L}_{i,k} denotes the probability of the second soft multi-label \tilde{L}_i predicting the k-th identity corresponding to the reference center matrix R;
S4.2B, calculating the credibility weight ω_i of target domain image x_i: ω_i = exp(−u_i);
S4.3B, calculating the weighted target domain classification loss L_class:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \omega_i \log p(\tilde{y}_i \mid x_i)    (10)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i.
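A sketch of S4.1B-S4.3B under the reconstruction above: soft multi-labels against the reference centers R, KL-based uncertainty, and the credibility-weighted classification loss of formula (10); the KL direction and the epsilon clamp are assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_class_loss(student_feats, teacher_feats, R, logits, pseudo_labels):
    """student_feats/teacher_feats: (N_t, c); R: (K_r, c); logits: (N_t, P_t)."""
    L_s = F.softmax(student_feats @ R.t(), dim=1)        # first soft multi-label
    L_t = F.softmax(teacher_feats @ R.t(), dim=1)        # second soft multi-label
    # formula (9): u_i = sum_k L_{i,k} * log(L_{i,k} / L~_{i,k})
    u = (L_s * (L_s.clamp_min(1e-8).log() - L_t.clamp_min(1e-8).log())).sum(1)
    w = torch.exp(-u)                                    # credibility weights
    ce = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (w * ce).mean()                               # formula (10)
```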
Further, the loss function also includes a triplet loss, whose calculation steps are:
S4.4B, constructing triplets from the target domain images: randomly selecting P categories from the N_t target domain images, each category containing K instances, and constructing triplets (x_{l,m}^{a}, x_{l,m}^{+}, x_{l,m}^{-}) with each selected target domain image as an anchor sample, where l = 1, 2, ..., P, m = 1, 2, ..., K, x_{l,m}^{a} is the m-th instance in class l, x_{l,m}^{+} is a positive sample of x_{l,m}^{a}, and x_{l,m}^{-} is a negative sample of x_{l,m}^{a};
calculating the uncertainties u_{l,m}^{a}, u_{l,m}^{+} and u_{l,m}^{-} of the anchor, positive and negative samples in each triplet;
calculating the credibility P_{l,m}^{+} of the positive sample with respect to the anchor sample and the credibility P_{l,m}^{-} of the negative sample with respect to the anchor sample, where:
P_{l,m}^{+} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{+})\right)    (11)
P_{l,m}^{-} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{-})\right)    (12)
S4.5B, calculating the credibility-weighted triplet loss:
L_{triplet} = \frac{1}{PK} \sum_{l=1}^{P} \sum_{m=1}^{K} \max\left(0,\; \alpha + P_{l,m}^{+}\, d(x_{l,m}^{a}, x_{l,m}^{+}) - P_{l,m}^{-}\, d(x_{l,m}^{a}, x_{l,m}^{-})\right)    (13)
where d(x_{l,m}^{a}, x_{l,m}^{+}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{+}, d(x_{l,m}^{a}, x_{l,m}^{-}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{-}, and α > 0 is a preset distance constant.
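A sketch of the credibility-weighted triplet loss under the margin form reconstructed in formula (13); the distance inputs are assumed to be precomputed Euclidean distances, and the default margin 0.3 is an assumed value.

```python
import torch

def weighted_triplet_loss(d_ap, d_an, u_a, u_p, u_n, alpha=0.3):
    """d_ap/d_an: anchor-positive / anchor-negative distances; u_*: uncertainties."""
    P_pos = (torch.exp(-u_a) + torch.exp(-u_p)) / 2      # formula (11)
    P_neg = (torch.exp(-u_a) + torch.exp(-u_n)) / 2      # formula (12)
    # formula (13): hinge with credibility-weighted distances and margin alpha
    return torch.clamp(alpha + P_pos * d_ap - P_neg * d_an, min=0).mean()
```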
Further, the loss function also includes a contrastive loss, whose calculation steps are:
S4.6B, establishing a memory bank from the training samples; taking target domain image x_i as a query sample, finding the samples in the memory bank having the same pseudo label as x_i as positive samples, and the samples having different pseudo labels as negative samples; calculating the contrastive loss:
L_{contrastive} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+})}{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+}) + \sum_{j=1}^{N_i^{-}} \omega_{i,j}^{-} \exp(s_{i,j}^{-})}    (14)
where N_i^{+} and N_i^{-} are respectively the numbers of positive and negative samples of x_i in the memory bank, s_{i,j}^{-} denotes the similarity between the j-th negative sample and x_i, ω_{i,j}^{-} denotes the credibility of the j-th negative sample with respect to x_i, s_{i,h}^{+} denotes the similarity between the h-th positive sample and x_i, and ω_{i,h}^{+} denotes the credibility of the h-th positive sample with respect to x_i.
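And a sketch of the memory-bank contrastive term for a single query, following the reconstructed formula (14); the softmax-style combination is an assumption, as the patent's own formula image is not reproduced here.

```python
import torch

def weighted_contrastive_loss(sim_pos, sim_neg, w_pos, w_neg):
    """sim_pos/w_pos: (N_i^+,) similarities and credibilities of positives;
    sim_neg/w_neg: (N_i^-,) the same for negatives, all w.r.t. query x_i."""
    pos = (w_pos * torch.exp(sim_pos)).sum()
    neg = (w_neg * torch.exp(sim_neg)).sum()
    return -torch.log(pos / (pos + neg))    # formula (14), one query term
```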
In another aspect, the invention also discloses a pedestrian re-identification system based on counterfactual attention learning, the system being the teacher model obtained by the above method.
The invention also discloses a computer storage medium on which computer instructions are stored, the instructions, when run, performing the above pedestrian re-identification method.
Beneficial effects: compared with the prior art, the pedestrian re-identification method and system based on counterfactual attention learning have the following advantages:
1. The invention provides a counterfactual attention learning mechanism for the background clutter and occlusion problems of pedestrian images in a dataset. Most existing methods learn visual attention in a weakly supervised manner: the attention module is supervised only by the final loss function, without a strong supervision signal to guide the training process. Such likelihood-based approaches explicitly supervise only the final prediction and ignore the causal relationship between prediction and attention. The invention provides a novel counterfactual attention learning method based on causal inference to enhance discriminative feature learning on target-domain pedestrian images and to address their occlusion and background clutter. The basic idea is to quantify the quality of attention by comparing the effects of the fact (the learned attention) and the counterfactual (false attention) on the final prediction, and to maximize this difference so as to encourage the network to learn more effective visual attention and to reduce the influence of a biased training set.
2. For the noisy-label problem, some domain-adaptive algorithms help the model learn a more robust feature representation by correcting incorrect samples online. Although these algorithms have made significant progress, their recognition accuracy remains unsatisfactory compared with supervised pedestrian re-identification. The invention takes the degree of inconsistency between the output features of two models (a student model and a teacher model based on the mean teacher method) as the uncertainty distribution of the target domain samples. By estimating and exploiting the uncertainty of the samples, the negative effects of noisy pseudo labels are mitigated, yielding a more effective domain adaptation algorithm.
Drawings
FIG. 1 is the training framework diagram in Example 1;
FIG. 2 is a schematic diagram of the structure of the counterfactual attention learning module;
FIG. 3 is the training framework diagram in Example 2.
Detailed Description
The invention is further elucidated below in connection with the drawings and the detailed description.
Example 1:
The invention discloses a pedestrian re-identification method based on counterfactual attention learning, which comprises the following steps:
S1, establishing a teacher model and a student model, wherein the teacher model and the student model have the same structure; the input of the teacher model is a pedestrian image, and the structure comprises a backbone network and a classifier, wherein the backbone network is used for extracting the features of the pedestrian image, and the classifier is used for obtaining the class probability of the pedestrian image from those features; the backbone network comprises a first convolution module, a first pooling module, a first attention module, a second convolution module, a second attention module, a third convolution module, a third attention module, a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module and a second pooling module which are sequentially connected. In this embodiment, the teacher model and the student model are based on a ResNet50 network to which 5 attention modules are added; the specific network is shown in Table 1.
Table 1 shows the structural parameters of the first convolution module Conv_1, the first pooling module MaxPool_1, the first attention module Att_1, the second convolution module Conv_2x, the second attention module Att_2, the third convolution module Conv_3x, the third attention module Att_3, the fourth convolution module Conv_4x, the fourth attention module Att_4, the fifth convolution module Conv_5x, the fifth attention module Att_5, the second pooling module MaxPool_2 and the classifier in this embodiment. The first attention module Att_1, the second attention module Att_2, the third attention module Att_3, the fourth attention module Att_4 and the fifth attention module Att_5 have the same structure, each comprising a cascaded 2D convolution layer and an activation unit, but the output channels of their convolution layers differ.
Table 1 network architecture parameters
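Since Table 1 itself is not reproduced here, the following PyTorch sketch shows one plausible assembly of the backbone: a standard ResNet50 with an attention module (a 2D convolution plus an activation unit) inserted after the stem pooling and after each residual stage. The sigmoid gating, the 1×1 kernels and the final global average pooling are assumptions; the exact parameters are those of Table 1.

```python
import torch.nn as nn
from torchvision.models import resnet50

class AttModule(nn.Module):
    """Attention module: cascaded 2D convolution layer and activation unit."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.Sigmoid()

    def forward(self, x):
        return x * self.act(self.conv(x))   # attention-reweighted feature map

def build_backbone():
    r = resnet50(weights="IMAGENET1K_V1")
    return nn.Sequential(
        r.conv1, r.bn1, r.relu,                 # Conv_1
        r.maxpool, AttModule(64),               # MaxPool_1, Att_1
        r.layer1, AttModule(256),               # Conv_2x,  Att_2
        r.layer2, AttModule(512),               # Conv_3x,  Att_3
        r.layer3, AttModule(1024),              # Conv_4x,  Att_4
        r.layer4, AttModule(2048),              # Conv_5x,  Att_5
        nn.AdaptiveAvgPool2d(1), nn.Flatten())  # second pooling module
```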
Pretraining the teacher model with the source domain dataset, the pretraining targeting minimization of the classification loss function L_{class}^{s}:
L_{class}^{s} = -\frac{1}{N} \sum_{n=1}^{N} \log p(y_{s,n} \mid x_{s,n})    (1)
where p(y_{s,n} \mid x_{s,n}) denotes the probability of input image x_{s,n} being identified as class y_{s,n}, y_{s,n} is the true label of x_{s,n} in the source domain dataset, and N is the number of samples per batch during pre-training. To improve the robustness of the model, before pre-training the images in the source domain dataset are perturbed by random erasing, random cropping, random flipping and the like, and the pre-training is performed with the perturbed images, reducing the sensitivity of the model to such variations.
Initializing parameters of a student model into parameters of a pre-trained teacher model;
S2, inputting the images in the target domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the target domain images; clustering the features of the target domain images, generating pseudo labels for the target domain images according to the clustered categories, and splicing the feature vector of each cluster center into a target domain cluster center matrix R_t, where P_t is the number of categories after clustering the target domain image features, and c, h and w are respectively the dimension, height and width of the features extracted by the backbone network of the teacher model;
S3, inputting the images in the source domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the source domain images; according to the true identity labels of the source domain images, averaging the extracted feature vectors of source domain images belonging to the same identity to obtain the class center vector of each pedestrian identity, and splicing the feature vector of each class center into a source domain class center matrix R_s, where P_s is the number of true pedestrian identity categories in the source domain data;
splicing R_t and R_s into a reference center matrix R with K_r = P_t + P_s rows;
S4, keeping the parameters of the teacher model unchanged, and optimizing the parameters of the student model by iterative training; the goal of the iterative training optimization is to minimize the loss function, each batch of samples in training comprising N_t target domain images and N_s source domain images; in this embodiment the loss function L_total is the sum of the target classification loss L_class and the attention loss L_effect:
L_{total} = L_{class} + \lambda_{eff} L_{effect}    (3)
where λ_eff is a preset attention loss coefficient;
the attention module is an important component part of the pedestrian re-identification framework, which is used for guiding the network to extract the most discriminative features of the pedestrian image. In order to learn better attention weights, the invention introduces a counterfactual attention mechanism, and the network learns better attention weights by measuring the influence of the network learned attention feature map and the randomly generated counterfactual attention feature map on the classification result. The training frame is shown in fig. 1, wherein a solid line in the figure is the flow direction of a source domain image, a dotted line is the flow direction of a target domain image, and the specific steps are as follows:
S4.1A, calculating the target classification loss L_class according to formula (2):
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log p(\tilde{y}_i \mid x_i)    (2)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i;
S4.2A, calculating the influence of attention on the prediction for source domain image z_v, v = 1, 2, ..., N_s:
inputting z_v into the student model to obtain the first feature f_v of z_v, f_v ∈ R^{c×h×w};
extracting the attention feature maps F_v of f_v using the spatial attention module:
F_v = {F_{v,1}, F_{v,2}, ..., F_{v,m}, ..., F_{v,M}} = attention(f_v)    (4)
where attention(·) denotes the attention feature map extraction operation of the spatial attention module, F_{v,m} ∈ R^{h×w} is the attention feature map of the m-th region of interest in F_v, and M is the number of regions of interest. In this embodiment M takes the empirical value 8, i.e. the attention feature maps are divided into 8 regions of interest, each of which may capture a different cue, such as a pedestrian's backpack, shoes or clothing. In this embodiment the spatial attention module follows the method in Rao Y, Chen G, Lu J, et al. Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [J]. 2021, and is implemented with a 2D convolution layer and a ReLU activation function;
weighting the first feature f_v with F_{v,m} and aggregating by a global average pooling operation to obtain the local attention Z_{v,m}: Z_{v,m} = GAP(f_v * F_{v,m}), where GAP denotes the global average pooling operation;
combining and normalizing the local attentions to obtain the global feature Z_v of source domain image z_v:
Z_v = normalize([Z_{v,1}, Z_{v,2}, ..., Z_{v,m}, ..., Z_{v,M}])    (5)
where normalize(·) denotes the normalization operation;
inputting the global feature Z_v sequentially into a batch normalization layer and a classification layer to obtain the spatial-attention-weighted prediction of source domain image z_v:
Y_v = C(BN(Z_v))    (6)
generating a counterfactual attention feature map \bar{F}_v, which has the same size as F_v and whose every element value is a random number;
weighting f_v with the counterfactual attention feature map \bar{F}_v, performing the global average pooling operation, combining and normalizing, and inputting the result sequentially into the batch normalization layer and the classification layer to obtain the counterfactual-attention-weighted prediction \bar{Y}_v of source domain image z_v;
the influence of attention on the prediction is then:
Y_v^{effect} = Y_v - \bar{Y}_v    (7)
the calculation of (2) is completed by a counterfacts attention mechanics learning module, and the structure of the counterfacts attention mechanics learning module is shown in figure 2.
S4.3A, calculating the attention loss:
L_{effect} = -\frac{1}{N_s} \sum_{v=1}^{N_s} \log p(y_{s,v} \mid Y_v^{effect})    (8)
where y_{s,v} is the class label of source domain image z_v, and p(y_{s,v} \mid Y_v^{effect}) is the probability, given the influence of attention on the prediction Y_v^{effect}, of z_v being identified as class y_{s,v};
s5, carrying out weighted average update on parameters of the teacher model;
and after the iterative training optimization is finished, carrying out weighted average updating on the parameters of the teacher model according to the parameters of the student model. The present example uses literature: the method in Tarvainen A, valpola H.mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results [ J ].2017 optimizes the parameters of the student model and updates the parameters of the teacher model with a weighted average.
And inputting the target domain image into the updated teacher model, outputting class probability by the classifier, and selecting a class corresponding to the maximum value of the class probability as a recognition result of the input image.
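For completeness, a minimal sketch of this final recognition step with the updated teacher model; backbone and classifier stand for the teacher's two parts as above.

```python
import torch

@torch.no_grad()
def recognize(backbone, classifier, image):
    """Returns the class index with maximal predicted probability."""
    probs = torch.softmax(classifier(backbone(image)), dim=1)
    return probs.argmax(dim=1)
```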
Example 2:
the difference between this embodiment and embodiment 1 is that the portion of the objective classification loss function is weighted with a confidence weight.
Incorrect pseudo labels can mislead feature learning during training and affect the performance of the domain-adaptive model. In this embodiment, the uncertainty distribution is estimated by evaluating the output difference between the teacher model and the student model, so as to assess the credibility of the pseudo label of each target domain image, and the loss function is weighted by this credibility to suppress the negative influence of falsely labeled samples on training and improve the domain adaptability of the model. The goal of the iterative training optimization is to minimize a loss function which, besides the attention loss, may include one or more of the classification loss, the triplet loss and the contrastive loss; in this embodiment the loss function is the sum of the attention loss and these three losses. The training framework is shown in FIG. 3; specifically, a training batch comprises the following steps:
S4.1B, calculating the uncertainty u_i of target domain image x_i, i = 1, 2, ..., N_t:
inputting x_i into the student model, the backbone network of the student model outputting the first feature f_i of x_i;
calculating the similarity between the first feature f_i and the reference center matrix R as the first soft multi-label L_i of x_i: L_i = Softmax(R·f_i);
inputting x_i into the teacher model, the backbone network of the teacher model outputting the second feature \tilde{f}_i of x_i;
calculating the similarity between the second feature \tilde{f}_i and the reference center matrix R as the second soft multi-label \tilde{L}_i of x_i: \tilde{L}_i = Softmax(R·\tilde{f}_i);
The invention uses the KL divergence to measure the difference between the probability distributions of the feature vectors extracted by the teacher model and the student model, namely calculating the uncertainty u_i of x_i based on the KL divergence:
u_i = \sum_{k=1}^{K_r} L_{i,k} \log \frac{L_{i,k}}{\tilde{L}_{i,k}}    (9)
where L_{i,k} denotes the probability of the first soft multi-label L_i predicting the k-th identity corresponding to the reference center matrix R, and \tilde{L}_{i,k} denotes the probability of the second soft multi-label \tilde{L}_i predicting the k-th identity corresponding to the reference center matrix R;
S4.2B, calculating the credibility weight ω_i of target domain image x_i: ω_i = exp(−u_i);
S4.3B, according to the calculation formula of the uncertainty, the larger the difference between the outputs of the teacher model and the student model for the same image, the higher the uncertainty and, correspondingly, the lower the credibility weight; the credibility-weighted target classification loss L_class is thus obtained:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \omega_i \log p(\tilde{y}_i \mid x_i)    (10)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i;
S4.4B, constructing triplets from the target domain images: randomly selecting P categories from the N_t target domain images, each category containing K instances, and constructing triplets (x_{l,m}^{a}, x_{l,m}^{+}, x_{l,m}^{-}) with each selected target domain image as an anchor sample, where l = 1, 2, ..., P, m = 1, 2, ..., K, x_{l,m}^{a} is the m-th instance in class l, x_{l,m}^{+} is a positive sample of x_{l,m}^{a}, i.e. a target domain image of the same category as x_{l,m}^{a}, and x_{l,m}^{-} is a negative sample of x_{l,m}^{a}, i.e. a target domain image of a different category from x_{l,m}^{a};
calculating, according to step S4.1B, the uncertainties u_{l,m}^{a}, u_{l,m}^{+} and u_{l,m}^{-} of the anchor sample x_{l,m}^{a}, the positive sample x_{l,m}^{+} and the negative sample x_{l,m}^{-} in each triplet;
calculating the credibility P_{l,m}^{+} of the positive sample with respect to the anchor sample and the credibility P_{l,m}^{-} of the negative sample with respect to the anchor sample; in this embodiment, the mean of the credibility of the positive sample and that of the anchor sample is taken as the credibility of the positive sample with respect to the anchor sample, namely:
P_{l,m}^{+} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{+})\right)    (11)
and in the same way,
P_{l,m}^{-} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{-})\right)    (12)
S4.5B, calculating the credibility-weighted triplet loss:
L_{triplet} = \frac{1}{PK} \sum_{l=1}^{P} \sum_{m=1}^{K} \max\left(0,\; \alpha + P_{l,m}^{+}\, d(x_{l,m}^{a}, x_{l,m}^{+}) - P_{l,m}^{-}\, d(x_{l,m}^{a}, x_{l,m}^{-})\right)    (13)
where d(x_{l,m}^{a}, x_{l,m}^{+}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{+}, and d(x_{l,m}^{a}, x_{l,m}^{-}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{-}; in this embodiment the Euclidean distance is used as the similarity measure; α > 0 is a preset distance constant, which pushes the distance between the anchor and the negative sample to be larger and the distance between the anchor and the positive sample to be smaller. Mathematically, the lower the credibility (the higher the uncertainty) of a sample pair, the smaller the weight on its similarity, hence the smaller its gradient during optimization, i.e. the smaller its contribution to the optimization.
S4.6B, establishing a memory bank from the training samples;
Memory-bank-based methods have been widely used in unsupervised representation learning, where they help introduce a contrastive loss for the general task: before training begins, the feature representations of the training set data extracted by the backbone network are stored, so that the network can be optimized while making better use of the similarities between samples and the samples in the global memory bank. The invention adopts the method in Wang X, Zhang H, Huang W, et al. Cross-Batch Memory for Embedding Learning [J]. 2019 to create and maintain the memory bank;
taking target domain image x_i as a query sample, finding the samples in the memory bank having the same pseudo label as x_i as positive samples, and the samples having different pseudo labels as negative samples; calculating the contrastive loss:
L_{contrastive} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+})}{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+}) + \sum_{j=1}^{N_i^{-}} \omega_{i,j}^{-} \exp(s_{i,j}^{-})}    (14)
where N_i^{+} and N_i^{-} are respectively the numbers of positive and negative samples of x_i in the memory bank, s_{i,j}^{-} denotes the similarity between the j-th negative sample and x_i, ω_{i,j}^{-} denotes the credibility of the j-th negative sample with respect to x_i, s_{i,h}^{+} denotes the similarity between the h-th positive sample and x_i, and ω_{i,h}^{+} denotes the credibility of the h-th positive sample with respect to x_i; here the similarity is again computed with the Euclidean distance, and the credibility is calculated according to formulas (9) and (11) in the same manner as in step S4.4B. The lower the credibility of a sample pair, the smaller its gradient and the smaller its contribution to the optimization. The credibility-weighted contrastive loss thus obtains an effective update of the network parameters by giving more weight to the negative samples most similar to the query sample and to the positive samples least similar to it.
In summary, the goal of the iterative training optimization in this embodiment is to minimize the loss function L_total:
L_{total} = L_{class} + L_{triplet} + \lambda_{ct} L_{contrastive} + \lambda_{eff} L_{effect}    (15)
where λ_ct is a preset contrastive loss coefficient, λ_eff is the preset source-domain attention loss coefficient, and the attention loss L_effect is the same as in Example 1.
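As a usage note, the four terms of formula (15) combine in one line per training batch; the coefficient values below are assumptions, since the embodiment does not state them.

```python
def total_loss(L_class, L_triplet, L_contrastive, L_effect,
               lambda_ct=0.5, lambda_eff=1.0):
    # formula (15)
    return L_class + L_triplet + lambda_ct * L_contrastive + lambda_eff * L_effect
```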
this example conducted comparative experiments on both the Market-1501 and DukeMTMC-ReID datasets, the results of which are shown in Table 2.
Table 2: testing results on Market-1501, dukeMTMC-ReID dataset
In Table 2, BUC denotes identification by the method in Lin Y, Dong X, Zheng L, et al. A Bottom-Up Clustering Approach to Unsupervised Person Re-Identification [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 8738-8745; ECN denotes the method in Zhong Z, Zheng L, Luo Z, et al. Invariance Matters: Exemplar Memory for Domain Adaptive Person Re-identification [J]. IEEE, 2019; MAR denotes the method in Yu H X, Zheng W S, Wu A, et al. Unsupervised Person Re-Identification by Soft Multilabel Learning [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019; PDA-Net denotes the method in Li Y J, Lin C S, Lin Y B, et al. Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation [J]. IEEE, 2019; PCB denotes the method in Zhang X, Cao J, Shen C, et al. Self-training with progressive augmentation for unsupervised cross-domain person re-identification [J]. IEEE, 2019; SSG denotes the method in Fu Y, Wei Y, Wang G, et al. Self-similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-identification [J]. 2018; MPLP denotes the method in Wang D, Zhang S. Unsupervised Person Re-identification via Multi-label Classification [J]. IEEE, 2020; AD-Cluster denotes the method in Zhai Y, Lu S, Ye Q, et al. AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification [J]. IEEE, 2020; MMT denotes the method in Liu J, Zha Z J, Chen D, et al. Adaptive Transfer Network for Cross-Domain Person Re-Identification [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019; NRMT denotes the method in Zhao F, Liao S, Xie G S, et al. Unsupervised Domain Adaptation with Noise Resistible Mutual-Training for Person Re-identification [C]// European Conference on Computer Vision. Springer, Cham, 2020; UNRN denotes the method in Zheng K, Lan C, Zeng W, et al. Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [J]. 2020; GLT denotes the method in Zheng K, Liu W, He L, et al. Group-aware Label Transfer for Domain Adaptive Person Re-identification [J]. 2021; and Ours denotes identification by the method in this embodiment.
As can be seen from Table 2, the Rank-1, Rank-5, Rank-10 and mAP indexes of the method are all superior, to a certain extent, to those of the state-of-the-art methods of the time, and the cross-domain experiments conducted on the two datasets also demonstrate that the method generalizes well across a variety of datasets.

Claims (8)

1. A pedestrian re-identification method based on counterfactual attention learning, comprising the steps of:
s1, establishing a teacher model and a student model, wherein the teacher model and the student model have the same structure; the input of the teacher model is a pedestrian image, and the structure comprises a backbone network and a classifier, wherein the backbone network is used for extracting the characteristics of the pedestrian image, and the classifier is used for acquiring the category probability of the pedestrian image according to the characteristics of the pedestrian image;
pretraining a teacher model with a source domain dataset, the pretraining targeting minimization of a classification loss function L_{class}^{s}:
L_{class}^{s} = -\frac{1}{N} \sum_{n=1}^{N} \log p(y_{s,n} \mid x_{s,n})    (1)
where p(y_{s,n} \mid x_{s,n}) denotes the probability of input image x_{s,n} being identified as class y_{s,n}, y_{s,n} is the true label of x_{s,n} in the source domain dataset, and N is the number of samples per batch during pre-training;
initializing parameters of a student model into parameters of a pre-trained teacher model;
S2, inputting the images in the target domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the target domain images; clustering the features of the target domain images, generating pseudo labels for the target domain images according to the clustered categories, and splicing the feature vector of each cluster center into a target domain cluster center matrix R_t, where P_t is the number of categories after clustering the target domain image features, and c, h and w are respectively the dimension, height and width of the features extracted by the backbone network of the teacher model;
S3, inputting the images in the source domain dataset into the teacher model, the backbone network of the teacher model outputting the features of the source domain images; according to the true identity labels of the source domain images, averaging the extracted feature vectors of source domain images belonging to the same identity to obtain the class center vector of each pedestrian identity, and splicing the feature vector of each class center into a source domain class center matrix R_s, where P_s is the number of true pedestrian identity categories in the source domain data;
splicing R_t and R_s into a reference center matrix R with K_r = P_t + P_s rows;
S4, keeping the parameters of the teacher model unchanged, and optimizing the parameters of the student model by iterative training; the goal of the iterative training optimization is to minimize the loss function, each batch of samples in training comprising N_t target domain images and N_s source domain images; the loss function L_total is the sum of the target classification loss L_class and the attention loss L_effect; the specific steps are as follows:
S4.1A, calculating the target classification loss L_class:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log p(\tilde{y}_i \mid x_i)    (2)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i;
S4.2A, calculating the influence of attention on the prediction for source domain image z_v, v = 1, 2, ..., N_s:
inputting z_v into the student model to obtain the first feature f_v of z_v, f_v ∈ R^{c×h×w};
extracting the attention feature maps F_v of f_v using the spatial attention module:
F_v = {F_{v,1}, F_{v,2}, ..., F_{v,m}, ..., F_{v,M}} = attention(f_v)    (4)
where attention(·) denotes the attention feature map extraction operation of the spatial attention module, F_{v,m} ∈ R^{h×w} is the attention feature map of the m-th region of interest in F_v, and M is the number of regions of interest;
weighting the first feature f_v with F_{v,m} and aggregating by a global average pooling operation to obtain the local attention Z_{v,m}: Z_{v,m} = GAP(f_v * F_{v,m}), where GAP denotes the global average pooling operation;
combining and normalizing the local attentions to obtain the global feature Z_v of source domain image z_v:
Z_v = normalize([Z_{v,1}, Z_{v,2}, ..., Z_{v,m}, ..., Z_{v,M}])    (5)
where normalize(·) denotes the normalization operation;
inputting the global feature Z_v sequentially into a batch normalization layer and a classification layer to obtain the spatial-attention-weighted prediction of source domain image z_v:
Y_v = C(BN(Z_v))    (6)
generating a counterfactual attention feature map \bar{F}_v, which has the same size as F_v and whose every element value is a random number;
weighting f_v with the counterfactual attention feature map \bar{F}_v, performing the global average pooling operation, combining and normalizing, and inputting the result sequentially into the batch normalization layer and the classification layer to obtain the counterfactual-attention-weighted prediction \bar{Y}_v of source domain image z_v;
the influence of attention on the prediction is then:
Y_v^{effect} = Y_v - \bar{Y}_v    (7)
S4.3A, calculating the attention loss:
L_{effect} = -\frac{1}{N_s} \sum_{v=1}^{N_s} \log p(y_{s,v} \mid Y_v^{effect})    (8)
where y_{s,v} is the class label of source domain image z_v, and p(y_{s,v} \mid Y_v^{effect}) is the probability, given the influence of attention on the prediction Y_v^{effect}, of z_v being identified as class y_{s,v};
s5, carrying out weighted average update on parameters of the teacher model;
after the iterative training optimization is finished, carrying out weighted average update on the parameters of the teacher model according to the parameters of the student model; and inputting the target domain image into the updated teacher model, outputting class probability by the classifier, and selecting a class corresponding to the maximum value of the class probability as a recognition result of the input image.
2. The pedestrian re-identification method based on counterfactual attention learning of claim 1, wherein the backbone network of the teacher model comprises a first convolution module, a first pooling module, a first attention module, a second convolution module, a second attention module, a third convolution module, a third attention module, a fourth convolution module, a fourth attention module, a fifth convolution module, a fifth attention module and a second pooling module which are sequentially connected.
3. The pedestrian re-identification method based on counterfactual attention learning of claim 2, wherein the first attention module, the second attention module, the third attention module, the fourth attention module and the fifth attention module are identical in structure, each comprising a cascaded 2D convolution layer and an activation unit.
4. The pedestrian re-identification method based on counterfactual attention learning according to claim 1, wherein the uncertainty is calculated from the output difference between the teacher model and the student model so as to obtain the credibility of the pseudo label of each target domain image, and the loss function is weighted by this credibility; specifically, the weighted target classification loss is calculated as follows:
S4.1B, calculating the uncertainty u_i of target domain image x_i, i = 1, 2, ..., N_t:
inputting x_i into the student model, the backbone network of the student model outputting the first feature f_i of x_i;
calculating the similarity between the first feature f_i and the reference center matrix R as the first soft multi-label L_i of x_i: L_i = Softmax(R·f_i);
inputting x_i into the teacher model, the backbone network of the teacher model outputting the second feature \tilde{f}_i of x_i;
calculating the similarity between the second feature \tilde{f}_i and the reference center matrix R as the second soft multi-label \tilde{L}_i of x_i: \tilde{L}_i = Softmax(R·\tilde{f}_i);
calculating the uncertainty u_i of x_i based on the KL divergence:
u_i = \sum_{k=1}^{K_r} L_{i,k} \log \frac{L_{i,k}}{\tilde{L}_{i,k}}    (9)
where L_{i,k} denotes the probability of the first soft multi-label L_i predicting the k-th identity corresponding to the reference center matrix R, and \tilde{L}_{i,k} denotes the probability of the second soft multi-label \tilde{L}_i predicting the k-th identity corresponding to the reference center matrix R;
S4.2B, calculating the credibility weight ω_i of target domain image x_i: ω_i = exp(−u_i);
S4.3B, calculating the weighted target domain classification loss L_class:
L_{class} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \omega_i \log p(\tilde{y}_i \mid x_i)    (10)
where \tilde{y}_i is the pseudo label of target domain image x_i, and p(\tilde{y}_i \mid x_i) is the probability of the student model identifying input image x_i as class \tilde{y}_i.
5. The pedestrian re-identification method based on counterfactual attention learning according to claim 1, wherein the loss function further includes a triplet loss, whose calculation steps are:
S4.4B, constructing triplets from the target domain images: randomly selecting P categories from the N_t target domain images, each category containing K instances, and constructing triplets (x_{l,m}^{a}, x_{l,m}^{+}, x_{l,m}^{-}) with each selected target domain image as an anchor sample, where l = 1, 2, ..., P, m = 1, 2, ..., K, x_{l,m}^{a} is the m-th instance in class l, x_{l,m}^{+} is a positive sample of x_{l,m}^{a}, and x_{l,m}^{-} is a negative sample of x_{l,m}^{a};
calculating the uncertainties u_{l,m}^{a}, u_{l,m}^{+} and u_{l,m}^{-} of the anchor, positive and negative samples in each triplet;
calculating the credibility P_{l,m}^{+} of the positive sample with respect to the anchor sample and the credibility P_{l,m}^{-} of the negative sample with respect to the anchor sample, where:
P_{l,m}^{+} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{+})\right)    (11)
P_{l,m}^{-} = \frac{1}{2}\left(\exp(-u_{l,m}^{a}) + \exp(-u_{l,m}^{-})\right)    (12)
S4.5B, calculating the credibility-weighted triplet loss:
L_{triplet} = \frac{1}{PK} \sum_{l=1}^{P} \sum_{m=1}^{K} \max\left(0,\; \alpha + P_{l,m}^{+}\, d(x_{l,m}^{a}, x_{l,m}^{+}) - P_{l,m}^{-}\, d(x_{l,m}^{a}, x_{l,m}^{-})\right)    (13)
where d(x_{l,m}^{a}, x_{l,m}^{+}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{+}, d(x_{l,m}^{a}, x_{l,m}^{-}) denotes the similarity between x_{l,m}^{a} and x_{l,m}^{-}, and α > 0 is a preset distance constant.
6. The pedestrian re-identification method based on counterfactual attention learning of claim 1, wherein the loss function further includes a contrastive loss, whose calculation steps are:
S4.6B, establishing a memory bank from the training samples; taking target domain image x_i as a query sample, finding the samples in the memory bank having the same pseudo label as x_i as positive samples, and the samples having different pseudo labels as negative samples; calculating the contrastive loss:
L_{contrastive} = -\frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+})}{\sum_{h=1}^{N_i^{+}} \omega_{i,h}^{+} \exp(s_{i,h}^{+}) + \sum_{j=1}^{N_i^{-}} \omega_{i,j}^{-} \exp(s_{i,j}^{-})}    (14)
where N_i^{+} and N_i^{-} are respectively the numbers of positive and negative samples of x_i in the memory bank, s_{i,j}^{-} denotes the similarity between the j-th negative sample and x_i, ω_{i,j}^{-} denotes the credibility of the j-th negative sample with respect to x_i, s_{i,h}^{+} denotes the similarity between the h-th positive sample and x_i, and ω_{i,h}^{+} denotes the credibility of the h-th positive sample with respect to x_i.
7. A pedestrian re-identification system based on counterfactual attention learning, characterized in that the system is the teacher model obtained according to the method of any one of claims 1-6.
8. A computer storage medium having stored thereon computer instructions which, when run, perform the pedestrian re-identification method of any one of claims 1 to 6.
CN202210606376.1A 2022-05-31 2022-05-31 Pedestrian re-identification method, system and medium based on counterfactual attention learning Active CN114882534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210606376.1A CN114882534B (en) 2022-05-31 2022-05-31 Pedestrian re-identification method, system and medium based on counterfactual attention learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210606376.1A CN114882534B (en) 2022-05-31 2022-05-31 Pedestrian re-identification method, system and medium based on counterfactual attention learning

Publications (2)

Publication Number Publication Date
CN114882534A CN114882534A (en) 2022-08-09
CN114882534B true CN114882534B (en) 2024-03-26

Family

ID=82679768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210606376.1A Active CN114882534B (en) 2022-05-31 2022-05-31 Pedestrian re-identification method, system and medium based on counterfactual attention learning

Country Status (1)

Country Link
CN (1) CN114882534B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051948B (en) * 2023-03-08 2023-06-23 中国海洋大学 Fine granularity image recognition method based on attention interaction and anti-facts attention
CN117079142B (en) * 2023-10-13 2024-01-26 昆明理工大学 Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200121206A (en) * 2019-04-15 2020-10-23 계명대학교 산학협력단 Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
CN111881714A (en) * 2020-05-22 2020-11-03 北京交通大学 Unsupervised cross-domain pedestrian re-identification method
CN114092964A (en) * 2021-10-19 2022-02-25 杭州电子科技大学 Cross-domain pedestrian re-identification method based on attention guidance and multi-scale label generation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200121206A (en) * 2019-04-15 2020-10-23 계명대학교 산학협력단 Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
CN111881714A (en) * 2020-05-22 2020-11-03 北京交通大学 Unsupervised cross-domain pedestrian re-identification method
CN114092964A (en) * 2021-10-19 2022-02-25 杭州电子科技大学 Cross-domain pedestrian re-identification method based on attention guidance and multi-scale label generation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Person Re-identification in Weakly Supervised Scenarios; Qi Lei; Yu Peize; Gao Yang; Journal of Software; 2020-09-15 (09); full text *

Also Published As

Publication number Publication date
CN114882534A (en) 2022-08-09


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant