CN114882531A - Cross-domain pedestrian re-identification method based on deep learning - Google Patents

Cross-domain pedestrian re-identification method based on deep learning Download PDF

Info

Publication number
CN114882531A
Authority
CN
China
Prior art keywords
domain
source domain
training
sample
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210554612.XA
Other languages
Chinese (zh)
Inventor
葛永新
张俊银
华博誉
徐玲
黄晟
洪明坚
王洪星
张小洪
杨丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210554612.XA priority Critical patent/CN114882531A/en
Publication of CN114882531A publication Critical patent/CN114882531A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a cross-domain pedestrian re-identification method based on deep learning, which comprises the following steps: selecting a public data set as a source domain and another as a target domain; selecting a ResNet-50 model M and initializing its parameters to obtain M'; taking the source domain and the target domain as the input of the initialized model M', calculating the corresponding losses, training the model M', and stopping training after the maximum number of training iterations is reached to obtain a trained model M'; and inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result for the pedestrian. The method can detect and identify a specific pedestrian more accurately.

Description

Cross-domain pedestrian re-identification method based on deep learning
Technical Field
The invention relates to the field of pedestrian re-identification, in particular to a cross-domain pedestrian re-identification method based on deep learning.
Background
The pedestrian re-identification task aims to retrieve a specific pedestrian across cameras. Owing to its important applications in intelligent surveillance, pedestrian re-identification has become one of the research hotspots in computer vision. Supervised pedestrian re-identification methods have achieved satisfactory performance in recent years. However, most supervised methods suffer a significant performance degradation when the training and test pedestrian samples come from different data sets. In the real world, labeling pedestrian data is expensive and time-consuming; therefore, the unsupervised cross-domain pedestrian re-identification task has attracted great attention from researchers.
The purpose of unsupervised cross-domain pedestrian re-identification is to migrate discriminative knowledge from the source domain to an unlabeled target domain, with the expectation that the test results of the model on the target domain are comparable to those of supervised methods. Because of the large inter-domain gap between the source domain and the target domain, this task remains highly challenging. To date, clustering-based cross-domain pedestrian re-identification methods have made great progress, and most state-of-the-art methods are clustering-based; these methods can generally be divided into two stages: 1) supervised pre-training of the model with the labeled source domain data; 2) assigning pseudo labels on the target domain with a clustering algorithm and iteratively fine-tuning the pre-trained model.
However, a pedestrian re-identification model that is iteratively trained in the fine-tuning stage may gradually forget the discriminative knowledge from the source domain, i.e., catastrophic forgetting. This phenomenon can be observed from two aspects: 1) as the number of fine-tuning iterations increases, the test results of the model on the source domain gradually decline; 2) simply removing the pre-training stage causes only a minor performance drop for most clustering-based methods. It can therefore be concluded that most existing clustering-based methods do not fully utilize the discriminative knowledge on the source domain, yet this knowledge is important for improving the performance of the model on the target domain.
Disclosure of Invention
Aiming at the problems in the prior art, the technical problem to be solved by the invention is as follows: in the prior art, the inability to fully exploit the domain-shared knowledge on the source domain results in poor discrimination on the unlabeled target domain.
In order to solve the technical problems, the invention adopts the following technical scheme:
a cross-domain pedestrian re-identification method based on deep learning comprises the following steps:
s100: selecting public data sets A and B, and using the data set A as a source domain D s The formula is as follows:
Figure BDA0003651892610000021
wherein the content of the first and second substances,
Figure BDA0003651892610000022
represents the ith source domain sample,
Figure BDA0003651892610000023
representing the corresponding real label, n, of the ith source domain sample s Represents the total number of source domain samples;
selecting partial data in the data set B as a target domain training set D T ,D T Watch (A)The expression is as follows:
Figure BDA0003651892610000024
wherein the content of the first and second substances,
Figure BDA0003651892610000025
denotes the jth target domain sample, n t Represents the total number of samples of the target domain;
s200: selecting a ResNet-50 model M, wherein the model M comprises two modules, namely an online feature encoder f (· | theta) t ),θ t The related parameters of the module one, and the momentum feature encoder of the module two
Figure BDA0003651892610000026
The parameters are related to the module II;
initializing parameters of the model M by using a data set ImageNet to obtain an initialized model M';
s300: calculating the loss of the initialization model M' by using a loss function;
s400: training the model M 'by taking the source domain and the target domain as the input of the initialization model M', updating the parameters in the model M 'according to the loss calculated in the step S300, and stopping training when the maximum training times is reached to obtain a trained model M';
s500: and inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result of the pedestrian.
Preferably, the loss function of the initialized model M' in S300 is obtained as follows:

S310: performing feature extraction on the data in D_T with the momentum feature encoder and storing the features in a memory feature bank N, and then clustering all the features in N by using the DBSCAN clustering algorithm to generate a pseudo label ỹ_j^t in one-to-one correspondence with each target domain sample x_j^t;
S320: calculating the training weight w_d(i) of the source domain for each iteration by using the time-series domain relation strategy, presetting the maximum training weight of each iteration of the source domain as t_1 and the minimum as t_2, wherein t_1 > t_2; the calculation is expressed as follows:

s(i) = (i % e) / e    (1)

w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2    (2)

wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the training weight applied to the source domain in the i-th iteration, and s(i) represents the position obtained by dividing the interval between t_1 and t_2 at equal steps;
s330: computing training weights for each source domain sample using a rank guided selection strategy
Figure BDA00036518926100000210
The method comprises the following specific steps:
s331: randomly selecting a source domain sample from the source domain Ds
Figure BDA00036518926100000211
And using a line feature encoder f (· | θ) t ) To pair
Figure BDA00036518926100000212
Extracting features, and then utilizing the class classifier of the target domain and the class classifier of the source domain respectively
Figure BDA00036518926100000213
Classifying and calculating respectively
Figure BDA00036518926100000214
Probability distribution of classification over target domain
Figure BDA00036518926100000215
And probability distribution of classification over source domain
Figure BDA00036518926100000216
The calculation expression is as follows:
Figure BDA00036518926100000217
Figure BDA00036518926100000218
wherein the content of the first and second substances,
Figure BDA0003651892610000031
to represent
Figure BDA0003651892610000032
Probability distribution of classification over target domain, C t Class classifier on the representation target domain, c p Representing the number of categories of pseudo labels on the target domain;
Figure BDA0003651892610000033
representative sample
Figure BDA0003651892610000034
Probability distribution of classification over source domain, c s Number of categories of real tags on source domain, C s Representing a class classifier on the target domain;
s332: computing
Figure BDA0003651892610000035
Similarity score with target domain, Si, expressed as follows:
Figure BDA0003651892610000036
wherein, c p Representing the number of categories of pseudo labels on the target domain;
s333: calculating similarity scores of all the source domain samples and the target domain to form a similarity score set
Figure BDA0003651892610000037
Then, all similarity scores are arranged in a descending order, and the source domain samples corresponding to the former k% of similarity scores are taken as a reliability sample set delta s The expression is as follows:
Figure BDA0003651892610000038
wherein, tau s Representing the similarity score of the kth% source domain sample;
s334: definition of
Figure BDA00036518926100000326
The maximum class probability and the second largest class probability at the source domain are respectively
Figure BDA00036518926100000310
And
Figure BDA00036518926100000311
computing
Figure BDA00036518926100000312
Uncertainty U over the source domain i The expression is as follows:
Figure BDA00036518926100000313
s335: calculating all source domain sample uncertainty values to form an uncertainty value set
Figure BDA00036518926100000314
Then, all uncertainty values are arranged in an ascending order, and the source domain sample corresponding to the top k% uncertainty value is taken as an uncertainty sample set delta u The expression is as follows:
Figure BDA00036518926100000315
s336: obtaining the training weight of each source domain sample by combining formula (6) and formula (8)
Figure BDA00036518926100000316
The expression is as follows:
Figure BDA00036518926100000317
s340: calculating the cross entropy loss of the source domain according to the source domain sample training weight obtained in S336
Figure BDA00036518926100000318
The specific expression is as follows:
Figure BDA00036518926100000319
wherein the content of the first and second substances,
Figure BDA00036518926100000320
representing source domain samples
Figure BDA00036518926100000321
Belong to the category
Figure BDA00036518926100000322
The probability of (d);
s350: calculating the triple loss of the source domain according to the training weight of the source domain sample obtained in the step S336
Figure BDA00036518926100000323
The method comprises the following specific steps:
s351: calculate the ith to
Figure BDA00036518926100000324
Lost for anchor tripletsThe weight is
Figure BDA00036518926100000325
The calculation expression is as follows:
Figure BDA0003651892610000041
wherein the content of the first and second substances,
Figure BDA0003651892610000042
is shown and
Figure BDA0003651892610000043
the source domain positive samples that are the farthest away,
Figure BDA0003651892610000044
is shown and
Figure BDA0003651892610000045
the nearest source domain negative examples;
s352: after calculating the triple loss of all the source domain samples, the triple loss of the source domain can be obtained
Figure BDA0003651892610000046
The specific expression is as follows:
Figure BDA0003651892610000047
wherein the content of the first and second substances,
Figure BDA0003651892610000048
and
Figure BDA0003651892610000049
respectively representing source domain samples
Figure BDA00036518926100000410
The distance between the farthest source domain positive sample and the nearest source domain negative sample, and m represents the interval size of the triplet;
s360: computing cross-entropy loss for target domains
Figure BDA00036518926100000411
And triple loss
Figure BDA00036518926100000412
The specific expression is as follows:
Figure BDA00036518926100000413
Figure BDA00036518926100000414
wherein the content of the first and second substances,
Figure BDA00036518926100000415
representing target domain samples
Figure BDA00036518926100000416
Belong to the category
Figure BDA00036518926100000417
The probability of (c).
Figure BDA00036518926100000418
And
Figure BDA00036518926100000419
respectively representing target domain samples
Figure BDA00036518926100000420
The distance between the positive sample of the farthest target domain and the negative sample of the nearest target domain, and m represents the interval size of the triad;
s370: the final loss function L of the initialized model M' can be obtained according to the formula (10), the formula (12), the formula (13) and the formula (14) total The expression is as follows:
Figure BDA00036518926100000421
wherein the content of the first and second substances,
Figure BDA00036518926100000422
representing the soft cross-entropy loss weight,
Figure BDA00036518926100000423
representing soft triplet loss weights.
Combining the cross-entropy loss with the triplet loss balances the different objectives and effectively reduces the influence of the noisy pseudo labels generated on the target domain on model training.
Preferably, the final loss function L_total in S370 is used to calculate the loss of M', the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters of f_m(·|θ_m) are updated by formula (16):

θ_m^(t) = α · θ_m^(t-1) + (1 - α) · θ_t^(t)    (16)

wherein α is the momentum factor and the superscript t denotes the t-th training round.
Compared with the prior art, the invention has at least the following advantages:
1. Aiming at the problem that prior-art methods may fail to fully utilize source knowledge during training, the invention provides a novel PKSD method, which effectively utilizes source domain knowledge throughout the whole training process and improves discrimination accuracy on the unlabeled target domain.
2. The invention provides a linearly varying time-series domain relation method, TDR, which reduces the influence of domain-specific samples in the source domain by gradually decreasing the training weight of the source domain.
3. The invention provides a ranking-guided sample selection method, RIS, which selects informative and reliable source domain samples by computing uncertainty and similarity indices for the source domain samples.
4. To alleviate the influence of catastrophic forgetting of the source domain, the pedestrian re-identification model is trained in a co-training manner: a single model is trained jointly on the ground-truth-labeled source domain samples and the pseudo-labeled target domain samples. Unlike most previous methods, the invention does not adopt a two-stage training strategy of pre-training followed by fine-tuning, but uses a single-stage co-training scheme. However, as the number of training rounds grows, the model tends to overfit to some domain-specific knowledge of the source domain, which can impair the performance of the model on the target domain when the domain gap between the source domain and the target domain is large.
Drawings
FIG. 1 shows the main structure of PKSD according to the method of the present invention.
FIG. 2 shows the validity verification results of the method of the present invention and other different methods.
Detailed Description
The present invention is described in further detail below.
The invention trains the pedestrian re-identification model in a co-training manner. Specifically, a single model is trained jointly on the ground-truth-labeled source domain samples and the pseudo-labeled target domain samples. Unlike most previous methods, the invention does not adopt a two-stage training strategy of pre-training followed by fine-tuning, but uses a single-stage co-training scheme. However, as the number of training rounds grows, the model tends to overfit to some domain-specific knowledge of the source domain, which can impair the performance of the model on the target domain when the domain gap between the source domain and the target domain is large.
To address the above problems, the invention proposes a novel cross-domain pedestrian re-identification method with Source Domain Knowledge Preservation (PKSD) to effectively utilize knowledge from the source domain throughout the whole training process. Unlike previous two-stage training paradigms, PKSD adopts a co-training strategy, i.e., it learns from source domain samples and target domain samples simultaneously. Specifically, in each iteration PKSD feeds not only the pseudo-labeled target domain data but also the ground-truth-labeled source domain data into the model and trains them jointly. While the source domain samples are fully utilized, the domain-specific knowledge present in the source domain is detrimental to the domain adaptation task. Therefore, a linear Time-series Domain Relation (TDR) method is proposed to gradually alleviate the influence of the source domain samples: as the number of training iterations increases, the training weight of the source domain is gradually decreased. At the same time, some informative and reliable domain-shared knowledge is helpful for improving the performance of the model on the target domain. Accordingly, a Ranking-guided sample Selection (RIS) method is proposed to evaluate the uncertainty and similarity of each sample from the source domain, select samples carrying informative and reliable domain-shared knowledge by ranking their uncertainty and similarity scores, and reassign their sample training weights. Overall, by controlling the source domain weight and the sample weights, the proposed PKSD effectively suppresses the influence of domain-specific knowledge from the source domain and improves the test performance of the model on the target domain. Experimental results show that the proposed method greatly surpasses the current state-of-the-art methods.
Referring to fig. 1, a cross-domain pedestrian re-identification method based on deep learning includes the following steps:
s100: selecting public data sets A and B, and using the data set A as a source domain D s The formula is as follows:
Figure BDA0003651892610000061
wherein the content of the first and second substances,
Figure BDA0003651892610000062
represents the ith source domain sample,
Figure BDA0003651892610000063
representing the corresponding real label, n, of the ith source domain sample s Representing a source domainTotal number of samples;
selecting partial data in the data set B as a target domain training set D T ,D T The expression of (a) is as follows:
Figure BDA0003651892610000064
wherein the content of the first and second substances,
Figure BDA0003651892610000065
denotes the jth target domain sample, n t Represents the total number of samples of the target domain;
s200: selecting a ResNet-50 model M, wherein the model M comprises two modules, namely an online feature encoder f (· | theta) t ),θ t The related parameters of the module one, and the momentum feature encoder of the module two
Figure BDA0003651892610000069
The parameters are related to the module II;
initializing parameters of the model M by using a data set ImageNet to obtain an initialized model M'; the ResNet-50 model is the prior art, the data set ImageNet is the existing public data set, and compared with other public data sets, the data set ImageNet has better accuracy of the given initialization parameters and cannot generate too large random errors;
s300: calculating the loss of the initialization model M' by using a loss function;
the loss function of the initialization model M' in S300 is as follows:
s310: using a momentum feature encoder pair D T The data in the database is subjected to feature extraction and is stored in a memory feature library N, and then all the features in the N are clustered and generated by using a DBSCAN clustering algorithm which is the prior art
Figure BDA0003651892610000067
One-to-one pseudo label
Figure BDA0003651892610000068
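A possible realization of step S310 is sketched below, assuming the momentum encoder from the previous sketch and a DataLoader that yields unlabeled target images; the DBSCAN parameters eps and min_samples are illustrative assumptions, and samples marked as noise by DBSCAN are simply discarded here.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

@torch.no_grad()
def assign_pseudo_labels(momentum_encoder, target_loader, device="cuda"):
    """Extract momentum features for D_T into a memory bank N and cluster them
    with DBSCAN to obtain one pseudo label per target sample (step S310)."""
    momentum_encoder.eval().to(device)
    feats = []
    for images in target_loader:                     # unlabeled target images
        f = momentum_encoder(images.to(device))
        feats.append(F.normalize(f, dim=1).cpu())
    memory_bank = torch.cat(feats, dim=0)            # memory feature bank N

    # Cosine distance on L2-normalized features.
    pseudo = DBSCAN(eps=0.6, min_samples=4, metric="cosine").fit_predict(
        memory_bank.numpy()
    )
    keep = pseudo != -1                              # drop DBSCAN noise points
    return memory_bank, torch.as_tensor(pseudo), keep
```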
S320: calculating the training weight w_d(i) of the source domain for each iteration by using the time-series domain relation strategy (an existing technique), presetting the maximum training weight of each iteration of the source domain as t_1 and the minimum as t_2, wherein t_1 > t_2; the calculation is expressed as follows:

s(i) = (i % e) / e    (1)

w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2    (2)

wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the training weight applied to the source domain in the i-th iteration, and s(i) represents the position obtained by dividing the interval between t_1 and t_2 at equal steps;
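Since formula (1) is reproduced only as an image in the original publication, the sketch below implements the TDR schedule under the stated assumption s(i) = (i % e) / e; t_1 = 1.0 and t_2 = 0.5 are illustrative values, not values given in the patent.

```python
def tdr_weight(i: int, e: int, t1: float = 1.0, t2: float = 0.5) -> float:
    """Time-series domain relation (TDR) weight for the i-th training iteration.

    Assumed form: s(i) = (i % e) / e grows linearly from 0 toward 1, so the
    source-domain weight decays linearly from t1 down to t2 (t1 > t2), Eq. (2).
    """
    s = (i % e) / e
    return (1.0 - s) * t1 + s * t2

# Example: over e = 40 training rounds the weight moves from t1 = 1.0 toward t2 = 0.5.
schedule = [tdr_weight(i, e=40) for i in range(40)]
```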
s330: computing training weights for each source domain sample using a rank guided selection strategy
Figure BDA0003651892610000072
The method comprises the following specific steps:
s331: from the source domain D s Randomly selecting a source domain sample
Figure BDA0003651892610000073
And using a line feature encoder f (· | θ) t ) To pair
Figure BDA0003651892610000074
Extracting features, and then utilizing the class classifier of the target domain and the class classifier of the source domain respectively
Figure BDA0003651892610000075
Classifying and calculating respectively
Figure BDA0003651892610000076
Probability distribution of classification over target domain
Figure BDA0003651892610000077
And probability distribution of classification over source domain
Figure BDA0003651892610000078
By class classifier C on the target domain t Classifying each source domain sample so as to measure the similarity between the source domain sample and the target domain; by class classifier C on the source domain s Classifying each source domain sample so as to measure the uncertainty of each source domain sample, wherein the category classifier of the target domain and the category classifier of the source domain both adopt the existing classifiers, and the calculation expression is as follows:
Figure BDA0003651892610000079
Figure BDA00036518926100000710
wherein the content of the first and second substances,
Figure BDA00036518926100000711
to represent
Figure BDA00036518926100000712
Probability distribution of classification over target domain, C t Class classifier on the representation target domain, c p Representing the number of categories of pseudo labels on the target domain;
Figure BDA00036518926100000713
representative sample
Figure BDA00036518926100000714
Probability distribution of classification over source domain, c s Number of categories of real tags on source domain, C s Representing a class classifier on the target domain;
s332: computing
Figure BDA00036518926100000725
Similarity score with target domain, Si, expressed as follows:
Figure BDA00036518926100000716
wherein, c p Representing the number of categories of pseudo labels on the target domain;
s333: calculating similarity scores of all the source domain samples and the target domain to form a similarity score set
Figure BDA00036518926100000726
Then, all similarity scores are arranged in a descending order, and the source domain samples corresponding to the former k% of similarity scores are taken as a reliability sample set delta s The expression is as follows:
Figure BDA00036518926100000718
wherein, tau s Representing the similarity score of the kth% source domain sample;
s334: definition of
Figure BDA00036518926100000719
The maximum class probability and the second largest class probability at the source domain are respectively
Figure BDA00036518926100000720
And
Figure BDA00036518926100000721
computing
Figure BDA00036518926100000722
Uncertainty U over the source domain i The expression is as follows:
Figure BDA00036518926100000723
s335: computing stationActive domain sample uncertainty values, component uncertainty value sets
Figure BDA00036518926100000724
Then, all uncertainty values are arranged in an ascending order, and the source domain sample corresponding to the top k% uncertainty value is taken as an uncertainty sample set delta u The expression is as follows:
Figure BDA0003651892610000081
s336: obtaining the training weight of each source domain sample by combining formula (6) and formula (8)
Figure BDA0003651892610000082
The expression is as follows:
Figure BDA0003651892610000083
the smaller the similarity between a sample selected from the source domain and the target domain is, the larger the difference in appearance information between the sample and the sample in the target domain is; conversely, if the source domain sample has greater similarity to the target domain, then the sample is more likely to have domain-shared knowledge with the target domain sample. For samples from the source domain
Figure BDA0003651892610000084
The sample has low similarity to the target domain, and the share of the model is gradually decreased (TDR) along with the increase of the number of training rounds; conversely, if the sample has a higher similarity to the target domain, his contribution will not be affected by the method from TDR.
If the source domain sample has larger uncertainty, the fact that the sample still has plenty of information for the model to learn is shown. By combining the methods proposed by the formula (6) and the formula (8), reliable and information-rich samples can be selected on the source domain, and by increasing the training weights of the samples, the domain sharing knowledge from the source domain can be effectively utilized, so that the performance of the model on the target domain can be further improved.
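The ranking-guided selection of steps S331 to S336 can be sketched as follows. This is a hedged reconstruction rather than the literal formulas of the original (which are reproduced as images): the similarity score is taken as the maximum target-classifier probability, the uncertainty as the margin between the two largest source-classifier probabilities, and the weight rule follows the reconstructed formula (9); the ratio k and the classifier heads (assumed to map features to logits) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ris_sample_weights(feats_s, target_classifier, source_classifier,
                       w_d: float, k: float = 0.3):
    """Ranking-guided selection (steps S331-S336), reconstructed form.

    feats_s: (n_s, d) online-encoder features of source samples.
    Returns one weight per sample: 1 for samples that are both reliable
    (high similarity to the target domain) and informative (small confidence
    margin on the source classifier), and w_d for all other samples.
    """
    p_st = F.softmax(target_classifier(feats_s), dim=1)   # Eq. (3)
    p_ss = F.softmax(source_classifier(feats_s), dim=1)   # Eq. (4)

    sim = p_st.max(dim=1).values                          # Eq. (5), assumed max-probability form
    top2 = p_ss.topk(2, dim=1).values
    unc = top2[:, 0] - top2[:, 1]                         # Eq. (7), confidence margin

    n_keep = max(1, int(k * feats_s.size(0)))
    reliable = torch.zeros_like(sim, dtype=torch.bool)
    reliable[sim.topk(n_keep).indices] = True             # Delta_s: top-k% similarity
    informative = torch.zeros_like(sim, dtype=torch.bool)
    informative[(-unc).topk(n_keep).indices] = True       # Delta_u: smallest margins

    weights = torch.full_like(sim, w_d)
    weights[reliable & informative] = 1.0                 # Eq. (9), assumed form
    return weights
```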
S340: calculating the cross-entropy loss L_src^ce of the source domain according to the source domain sample training weights obtained in S336, the specific expression being as follows:

L_src^ce = -(1/n_s) Σ_{i=1}^{n_s} w_i^s · log p(y_i^s | x_i^s)    (10)

wherein p(y_i^s | x_i^s) represents the probability that the source domain sample x_i^s belongs to the category y_i^s;
s350: calculating the triple loss of the source domain according to the training weight of the source domain sample obtained in the step S336
Figure BDA00036518926100000810
The method comprises the following specific steps:
s351: calculate the ith to
Figure BDA00036518926100000811
The weight lost for a triplet of anchor points is
Figure BDA00036518926100000812
The calculation expression is as follows:
Figure BDA00036518926100000813
wherein the content of the first and second substances,
Figure BDA00036518926100000814
is shown and
Figure BDA00036518926100000815
the source domain positive samples that are the farthest away,
Figure BDA00036518926100000816
is shown and
Figure BDA00036518926100000817
the nearest source domain negative examples;
s352: after calculating the triple loss of all the source domain samples, the triple loss of the source domain can be obtained
Figure BDA00036518926100000821
The specific expression is as follows:
Figure BDA00036518926100000819
wherein the content of the first and second substances,
Figure BDA0003651892610000091
and
Figure BDA0003651892610000092
respectively representing source domain samples
Figure BDA0003651892610000093
The distance between the farthest source domain positive sample and the nearest source domain negative sample, and m represents the interval size of the triplet; more precisely, m represents the minimum difference between the distance between the pair of positive sample features and the distance between the pair of negative sample features, where m is set to 0.5 based on empirical values; this is a hyper-parameter used in designing the loss function; the main effect is to pull the distance of the same type of sample feature pair close and push the distance of the different type of sample feature pair open by a threshold.
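A sketch of the weighted source-domain losses of steps S340 to S352 follows, assuming batch-hard mining within a mini-batch (farthest positive, nearest negative) and the reconstructed weighting of formulas (10) to (12); taking the anchor-triplet weight as the mean of the three sample weights is an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_source_losses(feats, logits, labels, sample_w, margin: float = 0.5):
    """Weighted cross-entropy (Eq. 10) and weighted batch-hard triplet loss (Eq. 12)
    on a mini-batch of source-domain samples.

    feats: (B, d) features, logits: (B, c_s) source-classifier outputs,
    labels: (B,) identity labels, sample_w: (B,) weights from RIS/TDR.
    """
    ce = (sample_w * F.cross_entropy(logits, labels, reduction="none")).mean()

    dist = torch.cdist(feats, feats)                       # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_mask = same.fill_diagonal_(False)                  # exclude the anchor itself
    neg_mask = labels.unsqueeze(0) != labels.unsqueeze(1)

    d_pos, p_idx = (dist * pos_mask).max(dim=1)            # farthest positive per anchor
    d_neg, n_idx = (dist + 1e6 * (~neg_mask)).min(dim=1)   # nearest negative per anchor

    tri_w = (sample_w + sample_w[p_idx] + sample_w[n_idx]) / 3.0   # Eq. (11), assumed
    tri = (tri_w * F.relu(d_pos - d_neg + margin)).mean()          # Eq. (12)
    return ce, tri
```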
S360: calculating the cross-entropy loss L_tgt^ce and the triplet loss L_tgt^tri of the target domain, the specific expressions being as follows:

L_tgt^ce = -(1/n_t) Σ_{j=1}^{n_t} log p(ỹ_j^t | x_j^t)    (13)

L_tgt^tri = (1/n_t) Σ_{j=1}^{n_t} max(0, d_{j,p}^t - d_{j,n}^t + m)    (14)

wherein p(ỹ_j^t | x_j^t) represents the probability that the target domain sample x_j^t belongs to the pseudo-label category ỹ_j^t, d_{j,p}^t and d_{j,n}^t respectively represent the distances from the target domain sample x_j^t to its farthest target domain positive sample and its nearest target domain negative sample, and m represents the triplet margin;

S370: according to formula (10), formula (12), formula (13) and formula (14), the final loss function L_total of the initialized model M' can be obtained, which is expressed as follows:

L_total = L_src^ce + L_src^tri + λ_ce · L_tgt^ce + λ_tri · L_tgt^tri    (15)

wherein λ_ce represents the soft cross-entropy loss weight and λ_tri represents the soft triplet loss weight.
The final loss function L_total in S370 is used to calculate the loss of M', the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters of f_m(·|θ_m) are updated by formula (16):

θ_m^(t) = α · θ_m^(t-1) + (1 - α) · θ_t^(t)    (16)

wherein α is the momentum factor and the superscript t denotes the t-th training round.
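Formula (16) is the usual exponential-moving-average update for a momentum encoder; the sketch below applies it parameter-wise with α = 0.999 as given in the experimental setup, and additionally copies the BatchNorm buffers, which is a common practical choice rather than something stated in the original.

```python
import torch

@torch.no_grad()
def update_momentum_encoder(online, momentum, alpha: float = 0.999):
    """theta_m <- alpha * theta_m + (1 - alpha) * theta_t   (Eq. 16)."""
    for p_m, p_o in zip(momentum.parameters(), online.parameters()):
        p_m.mul_(alpha).add_(p_o, alpha=1.0 - alpha)
    for b_m, b_o in zip(momentum.buffers(), online.buffers()):
        b_m.copy_(b_o)          # keep BatchNorm running statistics in sync
```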
S400: training the model M' by taking the source domain and the target domain as the input of the initialized model M', updating the parameters in the model M' according to the loss calculated in step S300, and stopping training when the maximum number of training iterations is reached to obtain a trained model M';

S500: inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result for the pedestrian.
Experimental design and results analysis
1. Introduction of the data sets used
The invention verifies the effectiveness of the proposed method on three widely used public data sets, namely Market1501, DukeMTMC-ReID and MSMT17. Market1501 contains 32,668 pedestrian images of 1,501 different identities captured by 6 cameras; among them, 12,936 images of 751 identities are used for training, and the remaining images are used for testing. DukeMTMC-ReID contains 16,522 training images, 2,228 query images and 17,661 gallery images of 702 different identities captured by 8 cameras. MSMT17 is a larger data set comprising 126,441 images of 4,101 different identities captured by 15 cameras; specifically, 32,621 pedestrian images of 1,041 different identities are used as the training set, and the remaining images are used as the test set. Two commonly used evaluation metrics, mean average precision (mAP) and the cumulative matching characteristic (CMC) curve, are adopted for evaluation. For convenience of description, Market1501, DukeMTMC-ReID and MSMT17 are hereinafter referred to as Market, Duke and MSMT respectively.
2. Experimental setup
In the experiments, the proposed method uses ResNet-50 as the feature encoder and loads parameters pre-trained on ImageNet. For the co-training setup, each mini-batch consists of 64 pedestrian images of 16 different identities from each of the source and target domains. The network is optimized with the Adam algorithm with a weight decay of 0.0005. The whole training process runs for 40 epochs; a warm-up strategy is adopted in the first 10 epochs, and the initial learning rate is set to 0.00035. At each training step, the parameters of the momentum feature encoder f_m(·|θ_m) are updated by a temporal moving average with momentum factor α = 0.999. Pseudo labels are re-assigned after every 400 iterations. All pedestrian images are resized to 256 × 128 as input to the network. During testing, the output of the last batch normalization (BN) layer is used as the final representation of a pedestrian image. All experiments are conducted on the PyTorch platform with three NVIDIA TITAN V GPUs. Note that only the momentum feature encoder f_m(·|θ_m) is used in the test phase.
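The hyper-parameters listed above can be gathered into a small configuration sketch; the values are copied from this section, while the dataclass itself and the optimizer helper are merely one possible way to organize them.

```python
from dataclasses import dataclass
import torch

@dataclass
class TrainConfig:
    epochs: int = 40                 # total training epochs
    warmup_epochs: int = 10          # warm-up period for the learning rate
    lr: float = 3.5e-4               # initial learning rate
    weight_decay: float = 5e-4       # Adam weight decay
    batch_size: int = 64             # 16 identities x 4 images per domain
    num_identities: int = 16
    momentum_alpha: float = 0.999    # EMA factor for the momentum encoder
    recluster_every: int = 400       # iterations between pseudo-label updates
    image_size: tuple = (256, 128)   # input resolution (height, width)

def build_optimizer(model: torch.nn.Module, cfg: TrainConfig):
    return torch.optim.Adam(model.parameters(), lr=cfg.lr,
                            weight_decay=cfg.weight_decay)
```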
Table 1 compares some of the latest methods on the Market and Duke data sets, respectively.
Table 2 compares some recent methods on MSMT datasets.
3. Ablation study
To verify the effectiveness of each proposed module, this section combines different modules and tests them under the setting where Duke and Market are used as the source domain and target domain respectively. As shown in fig. 2, the left side of fig. 2 shows the test results of the different methods on the source domain, and the right side shows the test results on the target domain. From the test results on the source domain, it can be seen that as the number of iterations increases, the performance of the model under the two-stage training strategy drops sharply, finally obtaining only 20% mAP. This indicates that the two-stage training strategy causes the fine-tuning-stage model to forget the source domain knowledge. In contrast, the co-training scheme provided by the invention overcomes the catastrophic forgetting of the source domain knowledge and preserves the final performance of the model on the source domain. From the test results on the target domain, it can be seen that, because the two-stage training strategy does not fully utilize the knowledge from the source domain, the test performance of the model on the target domain is limited. The method provided by the invention fully and effectively utilizes the source domain knowledge and significantly improves the test results of the model on the target domain. Specifically, the model under the two-stage training strategy obtains 74.7% mAP, while the baseline under the co-training scheme provided by the invention reaches 79.1% mAP. By introducing TDR to prevent overfitting of the model on the source domain, the test result of the model on the target domain gains a further 0.9% mAP. Combined with the RIS module, the model finally reaches 80.7% mAP. These experimental results show that the method provided by the invention effectively utilizes the knowledge from the source domain and further improves the test results of the model on the target domain.
4. Comparison of results
This section compares the PKSD method with existing mainstream cross-domain pedestrian re-identification methods. Note that the global average pooling (GAP) layer used for the final features is replaced by a generalized mean pooling (GeM) layer. The experimental results are shown in Table 1; it can be seen that the performance of PKSD under the co-training strategy greatly surpasses all state-of-the-art cross-domain pedestrian re-identification methods. Specifically, under the 'Duke to Market' setting, three generation-based methods, namely SPGAN, PTGAN and ATNet, are compared first. Compared with the best generation-based method, ATNet, PKSD improves mAP and Rank-1 by 58.5% and 37.8% respectively. Further, the mainstream methods represented by NRMT, MEB-Net, UNRN, GLT, IDM and PDA are compared. The proposed method achieves the best performance on both 'Market to Duke' and 'Duke to Market'. In particular, PKSD brings a 0.9% mAP improvement over PDA on 'Market to Duke', and a 1.9% mAP improvement over PDA on 'Duke to Market'.
The method has also been evaluated on the larger and more challenging MSMT data set. Some of the latest approaches, such as NRMT, UNRN, GLT, IDM and PDA, have demonstrated good performance on the MSMT data set. As shown in Table 2, PKSD achieves 63.8% Rank-1 and 36.5% mAP on 'Market to MSMT'. Similarly, under the 'Duke to MSMT' setting, PKSD reaches 63.8% Rank-1 and 36.7% mAP. Compared with the other methods, the proposed PKSD achieves the best test results. Overall, these experiments show that fully and effectively utilizing knowledge from the source domain can further improve the performance of the model on the target domain.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, all of which shall be covered by the claims of the present invention.

Claims (3)

1. A cross-domain pedestrian re-identification method based on deep learning, characterized by comprising the following steps:

S100: selecting public data sets A and B, and using the data set A as the source domain D_s, which is expressed as follows:

D_s = {(x_i^s, y_i^s)}_{i=1}^{n_s}

wherein x_i^s represents the i-th source domain sample, y_i^s represents the ground-truth label corresponding to the i-th source domain sample, and n_s represents the total number of source domain samples;

selecting part of the data in the data set B as the target domain training set D_T, which is expressed as follows:

D_T = {x_j^t}_{j=1}^{n_t}

wherein x_j^t denotes the j-th target domain sample and n_t represents the total number of target domain samples;

S200: selecting a ResNet-50 model M, wherein the model M comprises two modules, namely module one, an online feature encoder f(·|θ_t), with θ_t being the parameters of module one, and module two, a momentum feature encoder f_m(·|θ_m), with θ_m being the parameters of module two;

initializing the parameters of the model M with weights pre-trained on the ImageNet data set to obtain an initialized model M';

S300: calculating the loss of the initialized model M' by using a loss function;

S400: training the model M' by taking the source domain and the target domain as the input of the initialized model M', updating the parameters in the model M' according to the loss calculated in step S300, and stopping training when the maximum number of training iterations is reached to obtain a trained model M';

S500: inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result for the pedestrian.
2. The deep learning-based cross-domain pedestrian re-identification method according to claim 1, characterized in that: the loss function of the initialized model M' in S300 is obtained as follows:

S310: performing feature extraction on the data in D_T with the momentum feature encoder and storing the features in a memory feature bank N, and then clustering all the features in N by using the DBSCAN clustering algorithm to generate a pseudo label ỹ_j^t in one-to-one correspondence with each target domain sample x_j^t;

S320: calculating the training weight w_d(i) of the source domain for each iteration by using the time-series domain relation strategy, presetting the maximum training weight of each iteration of the source domain as t_1 and the minimum as t_2, wherein t_1 > t_2; the calculation is expressed as follows:

s(i) = (i % e) / e    (1)

w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2    (2)

wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the training weight applied to the source domain in the i-th iteration, and s(i) represents the position obtained by dividing the interval between t_1 and t_2 at equal steps;
s330: computing training weights for each source domain sample using a rank guided selection strategy
Figure FDA00036518926000000111
The method comprises the following specific steps:
s331: from the source domain D s Randomly selecting a source domain sample
Figure FDA00036518926000000112
And using a line feature encoder f (· | θ) t ) To pair
Figure FDA00036518926000000113
Extracting features, and then utilizing the class classifier of the target domain and the class classifier of the source domain respectively
Figure FDA00036518926000000114
Classifying and calculating respectively
Figure FDA0003651892600000021
Probability distribution of classification over target domain
Figure FDA0003651892600000022
And probability distribution of classification over source domain
Figure FDA0003651892600000023
The calculation expression is as follows:
Figure FDA0003651892600000024
Figure FDA0003651892600000025
wherein the content of the first and second substances,
Figure FDA0003651892600000026
to represent
Figure FDA0003651892600000027
Probability distribution of classification over target domain, C t Class classifier on the representation target domain, c p Representing the number of categories of pseudo labels on the target domain;
Figure FDA0003651892600000028
representative sample
Figure FDA0003651892600000029
Probability distribution of classification over source domain, c s Number of categories of real tags on source domain, C s Representing a class classifier on the target domain;
s332: calculating out
Figure FDA00036518926000000210
Similarity score with target Domain S i The expression is as follows:
Figure FDA00036518926000000211
wherein, c p Representing the number of categories of pseudo labels on the target domain;
s333: calculating similarity scores of all the source domain samples and the target domain to form a similarity score set
Figure FDA00036518926000000212
Then, all similarity scores are arranged in a descending order, and the source domain samples corresponding to the former k% of similarity scores are taken as a reliability sample set delta s The expression is as follows:
Figure FDA00036518926000000213
wherein, tau s Representing the similarity score of the kth% source domain sample;
s334: definition of
Figure FDA00036518926000000214
The maximum class probability and the second largest class probability at the source domain are respectively
Figure FDA00036518926000000215
And
Figure FDA00036518926000000216
computing
Figure FDA00036518926000000217
Uncertainty U over the source domain i The expression is as follows:
Figure FDA00036518926000000218
s335: calculating all source domain sample uncertainty values to form an uncertainty value set
Figure FDA00036518926000000219
Then, all uncertainty values are arranged in an ascending order, and the source domain sample corresponding to the top k% uncertainty value is taken as an uncertainty sample set delta u The expression is as follows:
Figure FDA00036518926000000220
s336: obtaining the training weight of each source domain sample by combining formula (6) and formula (8)
Figure FDA00036518926000000221
The expression is as follows:
Figure FDA00036518926000000222
s340: calculating the cross entropy loss of the source domain according to the training weight of the source domain sample obtained in S336
Figure FDA00036518926000000223
The specific expression is as follows:
Figure FDA00036518926000000224
wherein the content of the first and second substances,
Figure FDA00036518926000000225
representing source domain samples
Figure FDA00036518926000000226
Belong to the category
Figure FDA00036518926000000227
The probability of (d);
s350: calculating the triple loss of the source domain according to the training weight of the source domain sample obtained in the step S336
Figure FDA0003651892600000031
The method comprises the following specific steps:
s351: calculate the ith to
Figure FDA0003651892600000032
The weight lost for a triplet of anchor points is
Figure FDA0003651892600000033
The calculation expression is as follows:
Figure FDA0003651892600000034
wherein the content of the first and second substances,
Figure FDA0003651892600000035
is shown and
Figure FDA0003651892600000036
the source domain positive samples that are the farthest away,
Figure FDA0003651892600000037
is represented by
Figure FDA0003651892600000038
Nearest source domain negative examples;
s352: after calculating the triple losses of all the source domain samples, the triple losses of the source domain can be obtained
Figure FDA0003651892600000039
The specific expression is as follows:
Figure FDA00036518926000000310
wherein the content of the first and second substances,
Figure FDA00036518926000000311
and
Figure FDA00036518926000000312
respectively representing source domain samples
Figure FDA00036518926000000313
The distance between the farthest source domain positive sample and the nearest source domain negative sample, and m represents the interval size of the triplet;
s360: computing cross-entropy loss for target domains
Figure FDA00036518926000000314
And triple loss
Figure FDA00036518926000000315
The specific expression is as follows:
Figure FDA00036518926000000316
Figure FDA00036518926000000317
wherein the content of the first and second substances,
Figure FDA00036518926000000318
representing target domain samples
Figure FDA00036518926000000319
Belong to the category
Figure FDA00036518926000000320
The probability of (c).
Figure FDA00036518926000000321
And
Figure FDA00036518926000000322
respectively representing target domain samples
Figure FDA00036518926000000323
The distance between the positive sample of the farthest target domain and the negative sample of the nearest target domain, and m represents the interval size of the triad;
s370: the final loss function L of the initialized model M' can be obtained according to the formula (10), the formula (12), the formula (13) and the formula (14) total The expression is as follows:
Figure FDA00036518926000000324
wherein the content of the first and second substances,
Figure FDA00036518926000000325
representing the soft cross-entropy loss weight,
Figure FDA00036518926000000326
representing soft triplet loss weights.
3. The deep learning-based cross-domain pedestrian re-identification method according to claim 2, characterized in that: the final loss function L_total in S370 is used to calculate the loss of M', the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters of f_m(·|θ_m) are updated by formula (16):

θ_m^(t) = α · θ_m^(t-1) + (1 - α) · θ_t^(t)    (16)

wherein α is the momentum factor and the superscript t denotes the t-th training round.
CN202210554612.XA 2022-05-19 2022-05-19 Cross-domain pedestrian re-identification method based on deep learning Pending CN114882531A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210554612.XA CN114882531A (en) 2022-05-19 2022-05-19 Cross-domain pedestrian re-identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210554612.XA CN114882531A (en) 2022-05-19 2022-05-19 Cross-domain pedestrian re-identification method based on deep learning

Publications (1)

Publication Number Publication Date
CN114882531A true CN114882531A (en) 2022-08-09

Family

ID=82677958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210554612.XA Pending CN114882531A (en) 2022-05-19 2022-05-19 Cross-domain pedestrian re-identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN114882531A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205570A (en) * 2022-09-14 2022-10-18 中国海洋大学 Unsupervised cross-domain target re-identification method based on comparative learning
CN115205570B (en) * 2022-09-14 2022-12-20 中国海洋大学 Unsupervised cross-domain target re-identification method based on comparative learning
CN115482927A (en) * 2022-09-21 2022-12-16 浙江大学 Children pneumonia diagnostic system based on small sample
CN117892183A (en) * 2024-03-14 2024-04-16 南京邮电大学 Electroencephalogram signal identification method and system based on reliable transfer learning
CN117892183B (en) * 2024-03-14 2024-06-04 南京邮电大学 Electroencephalogram signal identification method and system based on reliable transfer learning

Similar Documents

Publication Publication Date Title
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
CN107515895B (en) Visual target retrieval method and system based on target detection
CN112069310B (en) Text classification method and system based on active learning strategy
US11210470B2 (en) Automatic text segmentation based on relevant context
US9053391B2 (en) Supervised and semi-supervised online boosting algorithm in machine learning framework
CN114882531A (en) Cross-domain pedestrian re-identification method based on deep learning
CN108846259A (en) A kind of gene sorting method and system based on cluster and random forests algorithm
CN112465040B (en) Software defect prediction method based on class unbalance learning algorithm
CN105760888B (en) A kind of neighborhood rough set integrated learning approach based on hierarchical cluster attribute
CN110942091B (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN101561805A (en) Document classifier generation method and system
CN113255573B (en) Pedestrian re-identification method based on mixed cluster center label learning and storage medium
CN108446334A (en) A kind of content-based image retrieval method of unsupervised dual training
CN114444600A (en) Small sample image classification method based on memory enhanced prototype network
CN114387473A (en) Small sample image classification method based on base class sample characteristic synthesis
Fan et al. Deep Hashing for Speaker Identification and Retrieval.
CN113505225A (en) Small sample medical relation classification method based on multilayer attention mechanism
CN116524960A (en) Speech emotion recognition system based on mixed entropy downsampling and integrated classifier
CN115063664A (en) Model learning method, training method and system for industrial vision detection
CN112819027B (en) Classification method based on machine learning and similarity scoring
CN112465054B (en) FCN-based multivariate time series data classification method
CN110162629B (en) Text classification method based on multi-base model framework
CN111222570B (en) Ensemble learning classification method based on difference privacy

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination