CN114882531A - Cross-domain pedestrian re-identification method based on deep learning - Google Patents
- Publication number
- CN114882531A (application CN202210554612.XA / CN202210554612A)
- Authority
- CN
- China
- Prior art keywords
- domain
- source domain
- training
- sample
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention relates to a cross-domain pedestrian re-identification method based on deep learning, comprising the following steps: selecting public data sets as the source domain and the target domain; selecting a ResNet-50 model M and initializing its parameters to obtain M'; taking the source domain and the target domain as inputs to the initialized model M', computing the corresponding losses, training M', and stopping when the maximum number of training iterations is reached, obtaining the trained model M'; and inputting an image of the pedestrian to be predicted into the trained model M' to obtain the pedestrian retrieval result. The method can detect and identify a specific pedestrian more accurately.
Description
Technical Field
The invention relates to the field of pedestrian re-identification, in particular to a cross-domain pedestrian re-identification method based on deep learning.
Background
The pedestrian re-identification task aims to retrieve a specific pedestrian across cameras. Owing to its important applications in intelligent surveillance, pedestrian re-identification has become one of the research hotspots in the field of computer vision. In recent years, supervised pedestrian re-identification methods have achieved satisfactory performance. However, most supervised methods suffer a significant performance drop when the training and test pedestrian samples come from different data sets. In the real world, labeling pedestrian data is expensive and time-consuming; therefore, the unsupervised cross-domain pedestrian re-identification task has attracted wide attention from researchers.
The purpose of unsupervised cross-domain pedestrian re-identification is to transfer discriminative knowledge from the source domain to the unlabeled target domain, with the expectation that the model's test results on the target domain are comparable to those of supervised methods. Because of the large inter-domain gap between the source domain and the target domain, this task remains very challenging. To date, clustering-based cross-domain pedestrian re-identification methods have made great progress, and most state-of-the-art existing methods are clustering-based; these methods can generally be divided into two stages: 1) supervised pre-training of a model on the labeled source-domain data; 2) assigning pseudo-labels on the target domain with a clustering algorithm and iteratively fine-tuning the pre-trained model.
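The two-stage pipeline described above can be illustrated with a miniature sketch. This is a toy stand-in, not any particular published method: the "features" are plain numbers, the pre-training stage is stubbed out, and the gap-based grouping merely stands in for a real clustering algorithm such as DBSCAN.

```python
# Toy sketch of the generic two-stage pipeline: (1) supervised pre-training
# on labeled source data, (2) clustering target features into pseudo-labels
# for fine-tuning. Everything here is a stand-in: features are scalars and
# the "clustering" is a simple gap threshold, not a real algorithm.

def pretrain(source_samples):
    """Stage 1 stub: 'learn' from (feature, label) pairs (here, a mean label)."""
    return sum(label for _, label in source_samples) / len(source_samples)

def assign_pseudo_labels(features, eps=1.0):
    """Stage 2: group sorted target features whose gap is below eps.
    Returns one pseudo-label per feature, in sorted-feature order."""
    ordered = sorted(features)
    labels, current = [], 0
    for i, f in enumerate(ordered):
        if i > 0 and f - ordered[i - 1] > eps:
            current += 1
        labels.append(current)
    return labels
```

In the real pipeline the fine-tuned model would re-extract features and the pseudo-labels would be reassigned every few hundred iterations, which is exactly where the forgetting problem discussed next arises.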
However, a pedestrian re-identification model that is iteratively trained in the fine-tuning stage may gradually forget the discriminative knowledge from the source domain, i.e., catastrophic forgetting. By observation, this phenomenon can be explained from two aspects: 1) as the number of fine-tuning iterations increases, the model's test results on the source domain gradually decline; 2) for most clustering methods, simply removing the pre-training stage causes only a minor performance drop. It can therefore be concluded that most existing clustering-based methods do not fully exploit the discriminative knowledge on the source domain, even though discriminative knowledge from the source domain is important for improving the model's performance on the target domain.
Disclosure of Invention
Aiming at the problems in the prior art, the technical problem to be solved by the invention is as follows: in the prior art, the inability to fully exploit the domain-shared knowledge on the source domain results in poor discrimination on the unlabeled target domain.
In order to solve the technical problems, the invention adopts the following technical scheme:
A cross-domain pedestrian re-identification method based on deep learning comprises the following steps:
S100: select public data sets A and B, and use data set A as the source domain D_s, formulated as
D_s = {(x_i^s, y_i^s)}, i = 1, …, n_s,
where x_i^s represents the i-th source-domain sample, y_i^s represents the real label corresponding to the i-th source-domain sample, and n_s represents the total number of source-domain samples;
select part of the data in data set B as the target-domain training set D_T, expressed as
D_T = {x_j^t}, j = 1, …, n_t,
where x_j^t denotes the j-th target-domain sample and n_t represents the total number of target-domain samples;
S200: select a ResNet-50 model M, where M comprises two modules: module one, an online feature encoder f(·|θ_t), with θ_t the parameters of module one; and module two, a momentum feature encoder f_m(·|θ_m), with θ_m the parameters of module two;
initialize the parameters of model M with the data set ImageNet to obtain the initialized model M';
s300: calculating the loss of the initialization model M' by using a loss function;
s400: training the model M 'by taking the source domain and the target domain as the input of the initialization model M', updating the parameters in the model M 'according to the loss calculated in the step S300, and stopping training when the maximum training times is reached to obtain a trained model M';
s500: and inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result of the pedestrian.
Preferably, the loss function of the initialized model M' in S300 is computed as follows:
S310: use the momentum feature encoder to extract features from the data in D_T and store them in a memory feature bank N; then cluster all the features in N with the DBSCAN clustering algorithm and generate pseudo-labels in one-to-one correspondence with the target-domain samples;
S320: compute the training weight w_d(i) of the source domain at each iteration using the time-sequence domain relation strategy; preset the maximum training weight of the source domain per iteration as t_1 and the minimum as t_2, where t_1 > t_2; the calculation expression is:
w_d(i) = (1 − s(i)) × t_1 + s(i) × t_2
where the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the source-domain training weight applied at the i-th iteration, and s(i) represents the length of each part after the interval between t_1 and t_2 is divided at equal intervals;
S330: compute the training weight w_i^s of each source-domain sample using the ranking-guided selection strategy; the specific steps are as follows:
S331: randomly select a source-domain sample x_i^s from the source domain D_s and extract its features with the online feature encoder f(·|θ_t); then classify x_i^s with the class classifier of the target domain and the class classifier of the source domain respectively, and compute the probability distribution p_i^t of x_i^s classified over the target domain and the probability distribution p_i^s of x_i^s classified over the source domain, calculated as
p_i^t = C_t(f(x_i^s|θ_t)), p_i^s = C_s(f(x_i^s|θ_t)),
where p_i^t represents the probability distribution of x_i^s classified over the target domain, C_t represents the class classifier on the target domain, and c_p represents the number of categories of pseudo-labels on the target domain; p_i^s represents the probability distribution of sample x_i^s classified over the source domain, c_s represents the number of categories of real labels on the source domain, and C_s represents the class classifier on the source domain;
where c_p represents the number of categories of pseudo-labels on the target domain;
S333: compute the similarity scores between all source-domain samples and the target domain to form a similarity score set; then sort all similarity scores in descending order and take the source-domain samples corresponding to the top k% of similarity scores as the reliable sample set Δ_s, where τ_s represents the similarity score of the k%-th source-domain sample;
S334: define the maximum class probability and the second-largest class probability of x_i^s on the source domain as p_i^(1) and p_i^(2) respectively, and compute the uncertainty U_i of x_i^s over the source domain, expressed as
U_i = p_i^(1) − p_i^(2);
S335: compute the uncertainty values of all source-domain samples to form an uncertainty value set; then sort all uncertainty values in ascending order and take the source-domain samples corresponding to the top k% of uncertainty values as the uncertainty sample set Δ_u;
S336: combine formula (6) and formula (8) to obtain the training weight w_i^s of each source-domain sample;
S340: compute the cross-entropy loss L_ce^s of the source domain according to the source-domain sample training weights obtained in S336, where p(y_i^s|x_i^s) represents the probability that source-domain sample x_i^s belongs to category y_i^s;
S350: compute the triplet loss L_tri^s of the source domain according to the source-domain sample training weights obtained in S336; the specific steps are as follows:
S351: compute the triplet loss with the i-th sample x_i^s as the anchor, weighted by w_i^s, where the positive sample used is the source-domain positive sample farthest from x_i^s and the negative sample used is the source-domain negative sample nearest to x_i^s;
S352: after computing the triplet losses of all source-domain samples, the triplet loss L_tri^s of the source domain is obtained, where the two distance terms respectively represent the distance from source-domain sample x_i^s to the farthest source-domain positive sample and to the nearest source-domain negative sample, and m represents the triplet margin;
S360: compute the cross-entropy loss L_ce^t and the triplet loss L_tri^t of the target domain, where p(ỹ_j^t|x_j^t) represents the probability that target-domain sample x_j^t belongs to its pseudo-label category, the two distance terms respectively represent the distance from target-domain sample x_j^t to the farthest target-domain positive sample and to the nearest target-domain negative sample, and m represents the triplet margin;
S370: the final loss function L_total of the initialized model M' is obtained from formula (10), formula (12), formula (13) and formula (14), where the two loss coefficients represent the soft cross-entropy loss weight and the soft triplet loss weight, respectively.
Computing the loss as a combination of cross-entropy loss and triplet loss achieves a weight-balancing effect and effectively reduces the influence of noisy pseudo-labels generated on the target domain on model training.
Preferably, the final loss function L_total in S370 is used to compute the loss of M'; the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters in f_m(·|θ_m) are updated by formula (16):
θ_m(t) = α·θ_m(t−1) + (1 − α)·θ_t(t),
where α is the momentum factor and t represents the number of training rounds.
Compared with the prior art, the invention has at least the following advantages:
1. Aiming at the problem that prior methods may fail to fully utilize source-domain knowledge during training, the invention proposes a novel PKSD method that effectively utilizes source-domain knowledge throughout the whole training process and improves the discrimination accuracy on the unlabeled target domain.
2. The invention proposes a linearly varying time-sequence domain relation method, TDR, which reduces the influence of domain-specific samples in the source domain by decreasing the training weight of the source domain.
3. The invention proposes a ranking-guided sample selection method, RIS, which selects information-rich and reliable source-domain samples by computing uncertainty and similarity indexes of the source-domain samples.
4. To alleviate the influence of catastrophic forgetting on the source domain, the invention trains the pedestrian re-identification model in a collaborative manner. Specifically, a single shared model is trained on both the truly labeled source-domain samples and the target-domain samples assigned pseudo-labels. Unlike most previous methods, the invention does not adopt a two-stage strategy of pre-training plus fine-tuning, but instead a single-stage collaborative training mode. However, as the number of training rounds grows, the model tends to overfit to some domain-specific knowledge of the source domain, which can impair the model's performance on the target domain when the domain gap between the source domain and the target domain is large.
Drawings
FIG. 1 shows the main structure of PKSD according to the method of the present invention.
FIG. 2 shows the validity verification results of the method of the present invention and other different methods.
Detailed Description
The present invention is described in further detail below.
The invention trains the pedestrian re-identification model in a collaborative manner. Specifically, a single shared model is trained on both the truly labeled source-domain samples and the target-domain samples assigned pseudo-labels. Unlike most previous methods, the invention does not adopt a two-stage strategy of pre-training plus fine-tuning, but instead a single-stage collaborative training mode. However, as the number of training rounds grows, the model tends to overfit to some domain-specific knowledge of the source domain, which can impair the model's performance on the target domain when the domain gap between the source domain and the target domain is large.
In order to solve the above problems, the invention proposes a novel cross-domain pedestrian re-identification method with Source Domain Knowledge Preservation (PKSD) to effectively utilize knowledge from the source domain throughout the whole training process. Unlike previous two-stage training schemes, PKSD employs a collaborative training strategy, i.e., learning from both source-domain samples and target-domain samples. Specifically, in each iteration, PKSD trains the model not only with the pseudo-labeled target-domain data as input, but also with the truly labeled source-domain data. While the source-domain samples are fully utilized, the domain-specific knowledge present in the source domain plays a detrimental role in the domain adaptation task. Therefore, a linear Time-sequence Domain Relationship (TDR) method is proposed to gradually alleviate the influence of the source-domain samples: as the number of training iterations increases, the training weight of the source domain is gradually decreased. Nevertheless, some information-rich and reliable domain-shared knowledge is helpful for improving the model's performance on the target domain. Further, a Ranking-guided Sample Selection (RIS) method is proposed to evaluate the uncertainty and similarity of each sample from the source domain, select samples with informative and reliable domain-shared knowledge by ranking the uncertainty and similarity scores, and reassign their sample training weights. In general, by controlling the source-domain weight and the sample weights, the proposed PKSD can effectively suppress the influence of domain-specific knowledge from the source domain and improve the test performance of the model on the target domain. Experimental results show that the proposed method greatly exceeds the current state-of-the-art methods.
Referring to fig. 1, a cross-domain pedestrian re-identification method based on deep learning includes the following steps:
S100: select public data sets A and B, and use data set A as the source domain D_s, formulated as
D_s = {(x_i^s, y_i^s)}, i = 1, …, n_s,
where x_i^s represents the i-th source-domain sample, y_i^s represents the real label corresponding to the i-th source-domain sample, and n_s represents the total number of source-domain samples;
select part of the data in data set B as the target-domain training set D_T, expressed as
D_T = {x_j^t}, j = 1, …, n_t,
where x_j^t denotes the j-th target-domain sample and n_t represents the total number of target-domain samples;
S200: select a ResNet-50 model M, where M comprises two modules: module one, an online feature encoder f(·|θ_t), with θ_t the parameters of module one; and module two, a momentum feature encoder f_m(·|θ_m), with θ_m the parameters of module two;
initialize the parameters of model M with the data set ImageNet to obtain the initialized model M'. The ResNet-50 model is prior art, and the data set ImageNet is an existing public data set; compared with other public data sets, the initialization parameters given by ImageNet are more accurate and do not introduce excessively large random errors;
s300: calculating the loss of the initialization model M' by using a loss function;
the loss function of the initialization model M' in S300 is as follows:
s310: using a momentum feature encoder pair D T The data in the database is subjected to feature extraction and is stored in a memory feature library N, and then all the features in the N are clustered and generated by using a DBSCAN clustering algorithm which is the prior artOne-to-one pseudo label
S320: computing a training weight w for each iteration of a source domain using a time-series domain relationship strategy d (i) The time sequence domain relation method is the prior art, and the maximum training weight of each iteration of the preset source domain is t 1 Minimum is t 2 Wherein t is 1 >t 2 The calculation expression is as follows:
w d (i)=(1-s(i))×t 1 +s(i)×t 2
wherein, the symbol% represents the operation of taking the remainder, i represents the ith training, e represents the maximum training time, w d (i) Representing the training weight of the source domain acting on the ith iteration training, s (i) representing t 1 And t 2 The length of each part after equal interval division;
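The TDR weight of S320 can be sketched as follows. The exact form of s(i) is not recoverable from the text (only that it involves the remainder operation and the maximum iteration count e), so (i % e) / e is an assumed reading, and the values of t_1 and t_2 are placeholder defaults.

```python
def tdr_weight(i, e, t1=1.0, t2=0.3):
    """Source-domain weight at iteration i: linearly interpolate from the
    preset maximum t1 down to the minimum t2 over e training iterations.
    s = (i % e) / e is an assumed reading of the remainder-based schedule;
    t1 and t2 are placeholder values, not figures from the patent."""
    s = (i % e) / e
    return (1 - s) * t1 + s * t2
```

With these defaults the weight decays linearly from t_1 toward t_2 as training progresses, which matches the stated goal of gradually reducing the influence of domain-specific source samples.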
s330: computing training weights for each source domain sample using a rank guided selection strategyThe method comprises the following specific steps:
s331: from the source domain D s Randomly selecting a source domain sampleAnd using a line feature encoder f (· | θ) t ) To pairExtracting features, and then utilizing the class classifier of the target domain and the class classifier of the source domain respectivelyClassifying and calculating respectivelyProbability distribution of classification over target domainAnd probability distribution of classification over source domainBy class classifier C on the target domain t Classifying each source domain sample so as to measure the similarity between the source domain sample and the target domain; by class classifier C on the source domain s Classifying each source domain sample so as to measure the uncertainty of each source domain sample, wherein the category classifier of the target domain and the category classifier of the source domain both adopt the existing classifiers, and the calculation expression is as follows:
wherein the content of the first and second substances,to representProbability distribution of classification over target domain, C t Class classifier on the representation target domain, c p Representing the number of categories of pseudo labels on the target domain;representative sampleProbability distribution of classification over source domain, c s Number of categories of real tags on source domain, C s Representing a class classifier on the target domain;
wherein, c p Representing the number of categories of pseudo labels on the target domain;
s333: calculating similarity scores of all the source domain samples and the target domain to form a similarity score setThen, all similarity scores are arranged in a descending order, and the source domain samples corresponding to the former k% of similarity scores are taken as a reliability sample set delta s The expression is as follows:
wherein, tau s Representing the similarity score of the kth% source domain sample;
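Step S333's descending sort and top-k% cut can be sketched as follows; the function and variable names are illustrative, and ties at the k% boundary are broken by index order.

```python
def reliable_set(scores, k=50):
    """Sort similarity scores in descending order and keep the indices of
    the top k percent of samples, mirroring step S333. tau_s is the score
    of the last kept sample (the k%-th one). Names are illustrative."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = max(1, len(scores) * k // 100)
    tau_s = scores[order[keep - 1]]
    return set(order[:keep]), tau_s
```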
s334: definition ofThe maximum class probability and the second largest class probability at the source domain are respectivelyAndcomputingUncertainty U over the source domain i The expression is as follows:
s335: computing stationActive domain sample uncertainty values, component uncertainty value setsThen, all uncertainty values are arranged in an ascending order, and the source domain sample corresponding to the top k% uncertainty value is taken as an uncertainty sample set delta u The expression is as follows:
s336: obtaining the training weight of each source domain sample by combining formula (6) and formula (8)The expression is as follows:
the smaller the similarity between a sample selected from the source domain and the target domain is, the larger the difference in appearance information between the sample and the sample in the target domain is; conversely, if the source domain sample has greater similarity to the target domain, then the sample is more likely to have domain-shared knowledge with the target domain sample. For samples from the source domainThe sample has low similarity to the target domain, and the share of the model is gradually decreased (TDR) along with the increase of the number of training rounds; conversely, if the sample has a higher similarity to the target domain, his contribution will not be affected by the method from TDR.
If a source-domain sample has larger uncertainty, the sample still holds abundant information for the model to learn. By combining the methods proposed in formula (6) and formula (8), reliable and information-rich samples can be selected on the source domain; by increasing the training weights of these samples, the domain-shared knowledge from the source domain can be effectively utilized, further improving the model's performance on the target domain.
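Combining the two selections into per-sample weights, as S336 prescribes, might look like the sketch below. The combination rule (intersection of the two sets, with a boosted weight for selected samples) is an assumption; the text only states that formulas (6) and (8) are combined.

```python
def ris_weights(sim_scores, uncertainties, k=50, base=0.0, boost=1.0):
    """Combine the reliability set (top k% by similarity, descending) and
    the uncertainty set (top k% by uncertainty value, ascending) and raise
    the training weight of samples appearing in both. The intersection
    rule and the base/boost values are assumptions, not the patented
    formula."""
    n = len(sim_scores)
    keep = max(1, n * k // 100)
    by_sim = sorted(range(n), key=lambda i: sim_scores[i], reverse=True)[:keep]
    by_unc = sorted(range(n), key=lambda i: uncertainties[i])[:keep]
    chosen = set(by_sim) & set(by_unc)
    return [boost if i in chosen else base for i in range(n)]
```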
S340: compute the cross-entropy loss L_ce^s of the source domain according to the source-domain sample training weights obtained in S336,
where p(y_i^s|x_i^s) represents the probability that source-domain sample x_i^s belongs to category y_i^s;
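The weighted source-domain cross-entropy of S340 can be sketched per batch as follows; averaging over the batch and using only the per-sample weight are assumptions, since the formula image is not reproduced in the text.

```python
import math

def weighted_ce(probs, weights):
    """Weighted cross-entropy over a batch of source samples: each term is
    the negative log-probability of the true class, scaled by the per-sample
    weight w_i^s from S336. The batch-mean convention is an assumption."""
    terms = [-w * math.log(p) for p, w in zip(probs, weights)]
    return sum(terms) / len(terms)
```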
s350: calculating the triple loss of the source domain according to the training weight of the source domain sample obtained in the step S336The method comprises the following specific steps:
s351: calculate the ith toThe weight lost for a triplet of anchor points isThe calculation expression is as follows:
wherein the content of the first and second substances,is shown andthe source domain positive samples that are the farthest away,is shown andthe nearest source domain negative examples;
s352: after calculating the triple loss of all the source domain samples, the triple loss of the source domain can be obtainedThe specific expression is as follows:
wherein the content of the first and second substances,andrespectively representing source domain samplesThe distance between the farthest source domain positive sample and the nearest source domain negative sample, and m represents the interval size of the triplet; more precisely, m represents the minimum difference between the distance between the pair of positive sample features and the distance between the pair of negative sample features, where m is set to 0.5 based on empirical values; this is a hyper-parameter used in designing the loss function; the main effect is to pull the distance of the same type of sample feature pair close and push the distance of the different type of sample feature pair open by a threshold.
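A per-anchor sketch of the weighted triplet term described in S351 and S352, using the empirical margin m = 0.5; the hinge form max(0, m + d_pos − d_neg) is the standard triplet loss and is assumed here, since the formula image is not reproduced.

```python
def weighted_triplet(w, d_pos, d_neg, m=0.5):
    """Hinge-form triplet term for one anchor: the hardest (farthest)
    positive distance d_pos must be at least m smaller than the hardest
    (nearest) negative distance d_neg; the term is scaled by the sample
    weight w from S336. m = 0.5 follows the empirical value in the text."""
    return w * max(0.0, m + d_pos - d_neg)
```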
S360: compute the cross-entropy loss L_ce^t and the triplet loss L_tri^t of the target domain,
where p(ỹ_j^t|x_j^t) represents the probability that target-domain sample x_j^t belongs to its pseudo-label category; the two distance terms respectively represent the distance from target-domain sample x_j^t to the farthest target-domain positive sample and to the nearest target-domain negative sample, and m represents the triplet margin;
s370: the final loss function L of the initialized model M' can be obtained according to the formula (10), the formula (12), the formula (13) and the formula (14) total The expression is as follows:
wherein the content of the first and second substances,representing the soft cross-entropy loss weight of the cross,representing soft triplet loss weights.
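One plausible reading of the final loss of S370 is sketched below; the exact placement of the TDR weight w_d and the two soft loss weights is not fully recoverable from the text, so this is an assumption-laden sketch rather than the patented formula.

```python
def total_loss(ce_s, tri_s, ce_t, tri_t, w_d=1.0, lam_ce=0.5, lam_tri=0.5):
    """Combine the four losses of formulas (10) and (12)-(14): source-domain
    terms are scaled by the TDR weight w_d, and the cross-entropy and
    triplet families are mixed by the soft weights lam_ce and lam_tri.
    Every coefficient here is a placeholder assumption."""
    ce = lam_ce * (w_d * ce_s + ce_t)
    tri = lam_tri * (w_d * tri_s + tri_t)
    return ce + tri
```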
The final loss function L_total in S370 is used to compute the loss of M'; the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters in f_m(·|θ_m) are updated by formula (16):
θ_m(t) = α·θ_m(t−1) + (1 − α)·θ_t(t),
where α is the momentum factor and t represents the number of training rounds.
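The momentum update of formula (16) matches the standard exponential moving average used with momentum encoders; a minimal sketch over flat parameter lists (the EMA form is an assumption inferred from the stated momentum factor α).

```python
def momentum_update(theta_m, theta_t, alpha=0.999):
    """Exponential moving average of the momentum-encoder parameters:
    theta_m <- alpha * theta_m + (1 - alpha) * theta_t. alpha = 0.999
    follows the experimental setup; the flat-list representation is a
    simplification of real tensor parameters."""
    return [alpha * m + (1 - alpha) * t for m, t in zip(theta_m, theta_t)]
```

A large α makes the momentum encoder evolve slowly, which stabilizes the features stored in the memory bank between pseudo-label reassignments.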
S400: training the model M 'by taking the source domain and the target domain as the input of the initialization model M', updating the parameters in the model M 'according to the loss calculated in the step S300, and stopping training when the maximum training times is reached to obtain a trained model M';
s500: and inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result of the pedestrian.
Experimental design and results analysis
Introduction of data sets used
The invention verifies the effectiveness of the proposed method on three widely used public data sets, namely Market1501, DukeMTMC-ReID and MSMT17. Market1501 contains 32,668 pedestrian images of 1,501 different identities captured by 6 cameras; of these, 12,936 pedestrian images of 751 identities are used for training, and the remaining images are used for testing. DukeMTMC-ReID contains 16,522 training images, 2,228 query images and 17,661 gallery images from 702 different identities captured by 8 cameras. MSMT17 is a larger data set comprising 126,441 images of 4,101 different identities captured by 15 cameras; specifically, 32,621 pedestrian images of 1,041 different identities are used as the training set, with the remaining images used as the test set. Evaluation is performed with two commonly used metrics, mean Average Precision (mAP) and the Cumulative Matching Characteristic (CMC) curve. For convenience of description, Market1501, DukeMTMC-ReID and MSMT17 are referred to below as Market, Duke and MSMT respectively, without further explanation.
2. Experimental setup
In the experiments, the proposed method uses ResNet-50 as the feature encoder, initialized with parameters pre-trained on ImageNet. For the co-training setup, each mini-batch on both the source and target domains contains 64 pedestrian images of 16 different identities. The network is optimized with the Adam algorithm using a weight decay of 0.0005. Training runs for 40 epochs; a warm-up strategy is adopted for the first 10 epochs, and the initial learning rate is set to 0.00035. At each training step, the parameters of the momentum feature encoder f_m(·|θ_m) are updated by a temporal moving average with momentum factor α = 0.999. Pseudo-labels are reassigned after every 400 iterations. All pedestrian images are resized to 256 × 128 as network input. During testing, the output of the last Batch Normalization (BN) layer is used as the final representation of a pedestrian image, and only the momentum feature encoder f_m(·|θ_m) is used. All experiments are implemented on the PyTorch platform with three NVIDIA TITAN V GPUs.
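The identity-balanced mini-batch construction described above (16 identities × 4 images = 64 images, often called PK sampling) can be sketched as follows. Function and variable names are illustrative assumptions, not taken from the patent:

```python
# Sketch of identity-balanced (PK) mini-batch sampling: P identities with
# K images each; P=16, K=4 gives the 64-image batches described above.
import random

def pk_batch(images_by_id, num_ids=16, imgs_per_id=4, rng=None):
    rng = rng or random.Random(0)
    ids = rng.sample(sorted(images_by_id), num_ids)   # pick P identities
    batch = []
    for pid in ids:
        pool = images_by_id[pid]
        # fall back to sampling with replacement for identities with few images
        picks = (rng.sample(pool, imgs_per_id) if len(pool) >= imgs_per_id
                 else [rng.choice(pool) for _ in range(imgs_per_id)])
        batch.extend((pid, img) for img in picks)
    return batch

# Toy dataset: 100 identities with 10 image paths each (names are made up).
dataset = {pid: [f"img_{pid}_{j}" for j in range(10)] for pid in range(100)}
batch = pk_batch(dataset)
```

This layout guarantees every batch contains positives and negatives for each anchor, which the triplet losses in S350/S360 require.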
Table 1 compares some of the latest methods on the Market and Duke data sets, respectively.
Table 2 compares some recent methods on MSMT datasets.
3. Ablation study
To verify the effectiveness of each proposed module, this section combines different method modules and tests them with Duke and Market as the source and target domains, respectively. As shown in Fig. 2, the left side shows the test results of the different variants on the source domain, and the right side shows their results on the target domain. From the source-domain results, it can be seen that as the number of iterations increases, model performance under the two-stage training strategy declines sharply, finally reaching only 20% mAP. This indicates that the fine-tuning stage of the two-stage strategy causes the model to forget the source-domain knowledge. In contrast, the proposed collaborative training mode overcomes this catastrophic forgetting and preserves the model's final performance on the source domain. The target-domain results show that the two-stage strategy does not fully exploit the source-domain knowledge, which limits target-domain performance, whereas the proposed method uses the source-domain knowledge fully and effectively and significantly improves the target-domain results. Specifically, the model under the two-stage training strategy obtains 74.7% mAP, while the baseline under the proposed collaborative training achieves 79.1% mAP. Introducing TDR to prevent overfitting on the source domain raises the target-domain mAP by a further 0.9%, and adding the RIS module brings the model to a final 80.7% mAP.
These results show that the proposed method effectively exploits knowledge from the source domain and thereby improves the model's test results on the target domain.
4. Comparison of results
The proposed PKSD method is compared with existing mainstream cross-domain pedestrian re-identification methods. Note that the Global Average Pooling (GAP) layer producing the final features is replaced by a Generalized Mean Pooling (GeM) layer. The experimental results in Table 1 show that PKSD under the collaborative training strategy substantially surpasses the state-of-the-art cross-domain re-identification methods. Specifically, under the 'Duke to Market' setting, three generation-based methods, SPGAN, PTGAN, and ATNet, are compared first; relative to the best of these, ATNet, PKSD improves mAP and Rank-1 by 58.5% and 37.8%, respectively. Mainstream methods such as NRMT, MEB-Net, UNRN, GLT, IDM, and PDA are compared further, and the proposed method achieves the best performance on both 'Market to Duke' and 'Duke to Market'. In particular, PKSD improves mAP over PDA by 0.9% on 'Market to Duke' and by 1.9% on 'Duke to Market'.
Experiments are also conducted on the larger and more challenging MSMT dataset, on which recent methods such as NRMT, UNRN, GLT, IDM, and PDA have demonstrated good performance. As shown in Table 2, PKSD achieves 63.8% Rank-1 and 36.5% mAP on 'Market to MSMT', and 63.8% Rank-1 and 36.7% mAP under the 'Duke to MSMT' setting; compared with the other methods, PKSD achieves the best test results. Overall, these experiments show that fully and effectively exploiting knowledge from the source domain can further improve model performance on the target domain.
Finally, the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions without departing from their spirit and scope, all of which should be covered by the claims of the present invention.
Claims (3)
1. A cross-domain pedestrian re-identification method based on deep learning, characterized by comprising the following steps:
S100: selecting public datasets A and B, and using dataset A as the source domain D_s, formulated as follows:
D_s = {(x_i^s, y_i^s)}, i = 1, ..., N_s
where x_i^s represents the i-th source domain sample, y_i^s represents the real label corresponding to the i-th source domain sample, and N_s represents the total number of source domain samples;
selecting part of the data in dataset B as the target domain training set D_T, expressed as follows:
D_T = {x_j^t}, j = 1, ..., N_t
where x_j^t denotes the j-th target domain sample and N_t represents the total number of target domain samples;
s200: selecting a ResNet-50 model M, wherein the model M comprises two modules, namely an online feature encoder f (· | theta) t ),θ t The related parameters of the module one, and the momentum feature encoder of the module two The parameters are related to the module II;
initializing parameters of the model M by using a data set ImageNet to obtain an initialized model M';
s300: calculating the loss of the initialization model M' by using a loss function;
s400: training the model M 'by taking the source domain and the target domain as the input of the initialization model M', updating the parameters in the model M 'according to the loss calculated in the step S300, and stopping training when the maximum training times is reached to obtain a trained model M';
s500: and inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result of the pedestrian.
2. The deep learning-based cross-domain pedestrian re-identification method according to claim 1, characterized in that the loss of the initialized model M' in S300 is calculated as follows:
S310: extracting features from the data in D_T with the momentum feature encoder and storing them in a memory feature bank N; then clustering all features in N with the DBSCAN clustering algorithm to generate a one-to-one pseudo-label for each feature;
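The pseudo-label generation of S310 can be sketched with a compact DBSCAN. Real re-ID pipelines typically cluster on re-ranked (Jaccard) distances; plain Euclidean distance is used here for brevity, and all names and parameter values are illustrative:

```python
# Compact DBSCAN over memory-bank features, as in S310. Points in no
# cluster get the label -1 (noise) and would be excluded from training.
import math

def dbscan(points, eps=0.5, min_pts=2):
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    neighbors = [[j for j in range(n) if j != i and dist(i, j) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) + 1 < min_pts:
            labels[i] = -1                 # noise (may be claimed later)
            continue
        cluster += 1                       # start a new pseudo-label
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster        # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) + 1 >= min_pts:
                frontier.extend(neighbors[j])   # expand from core points
    return labels

# Two tight groups of toy 2-D "features" plus one isolated outlier.
feats = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
pseudo = dbscan(feats, eps=0.5, min_pts=2)
```

Each cluster index then serves as a pseudo-identity label for the target-domain losses in S360.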
S320: computing the source domain training weight w_d(i) for each iteration using the time-series domain relationship (TDR) strategy, with a preset maximum training weight t_1 and minimum training weight t_2 for the source domain, where t_1 > t_2; the calculation expression is as follows:
where the symbol % denotes the remainder operation, i denotes the i-th training iteration, e denotes the maximum number of training iterations, w_d(i) denotes the training weight applied to the source domain in the i-th iteration, and s(i) denotes the length of each segment obtained by dividing the interval between t_1 and t_2 at equal intervals;
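The TDR formula itself appears only as an image in the source and cannot be recovered exactly; the sketch below is one plausible reading consistent with the symbols described (a weight stepping from t_1 toward t_2 in equal-length segments, with the remainder operation restarting the schedule). It is purely illustrative, not the patented formula:

```python
# One plausible TDR-style schedule: the source-domain weight starts at t1
# and steps down toward t2 in `segments` equal intervals of length s,
# restarting each epoch via the remainder (%) operation.

def tdr_weight(i, iters_per_epoch, t1=1.0, t2=0.5, segments=5):
    s = (t1 - t2) / segments                          # equal-interval length s(i)
    step = (i % iters_per_epoch) * segments // iters_per_epoch
    return t1 - step * s

w0 = tdr_weight(0, iters_per_epoch=100)       # start of epoch: full weight t1
w_late = tdr_weight(99, iters_per_epoch=100)  # end of epoch: close to t2
```

The intended effect either way is the one described in the ablation: gradually reducing the source-domain influence so the model does not overfit to the source domain.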
S330: computing the training weight of each source domain sample using the rank-guided selection (RIS) strategy, with the following specific steps:
S331: randomly selecting a source domain sample from the source domain D_s, extracting its features with the online feature encoder f(·|θ_t), then classifying it with the class classifier of the target domain and the class classifier of the source domain respectively, and computing its classification probability distribution over the target domain and over the source domain; the calculation expressions are as follows:
where the first distribution is the sample's classification probability over the target domain, C_t represents the class classifier on the target domain, and c_p represents the number of pseudo-label categories on the target domain; the second distribution is the sample's classification probability over the source domain, C_s represents the class classifier on the source domain, and c_s represents the number of real-label categories on the source domain;
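S331 can be sketched with simple linear classifiers followed by softmax. The classifier weights and feature values below are toy stand-ins (assumptions), chosen only to show one feature being scored against c_s source classes and c_p pseudo-label classes:

```python
# Sketch of S331: one source-sample feature is classified by both the
# source-domain classifier C_s and the target-domain classifier C_t;
# softmax turns the logits into the two probability distributions.
import math

def softmax(logits):
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, weight_rows):
    """Linear classifier: one weight row per class, then softmax."""
    logits = [sum(w * x for w, x in zip(row, feature)) for row in weight_rows]
    return softmax(logits)

feature = [1.0, 0.0]                         # toy 2-D feature
C_s = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # c_s = 3 source classes
C_t = [[0.0, 1.0], [3.0, 0.0]]               # c_p = 2 pseudo-label classes
p_src = classify(feature, C_s)
p_tgt = classify(feature, C_t)
```

The target-domain distribution `p_tgt` feeds the similarity score of S333, and the source-domain distribution `p_src` feeds the uncertainty of S334.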
S333: calculating the similarity scores of all source domain samples with respect to the target domain to form a similarity score set; then sorting all similarity scores in descending order and taking the source domain samples corresponding to the top k% of scores as the reliable sample set Δ_s, expressed as follows:
where τ_s represents the similarity score of the k%-th ranked source domain sample;
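The top-k% selection of S333 is sketched below. The score values are made up for illustration; only the ranking-and-threshold mechanics follow the text:

```python
# Sketch of S333: rank source samples by similarity to the target domain,
# keep the top k% as the reliable set, and record the cut-off score tau_s.

def reliable_set(scores, k_percent=50):
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = max(1, int(len(scores) * k_percent / 100))
    tau_s = scores[order[keep - 1]]     # similarity score of the k%-th sample
    return set(order[:keep]), tau_s

scores = [0.9, 0.1, 0.7, 0.4]                        # toy similarity scores
delta_s, tau_s = reliable_set(scores, k_percent=50)  # keeps indices {0, 2}
```

Samples in `delta_s` are the ones whose target-domain predictions look most transferable.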
S334: defining the maximum class probability and the second-largest class probability of the sample on the source domain, and computing the sample's uncertainty U_i on the source domain, expressed as follows:
S335: calculating the uncertainty values of all source domain samples to form an uncertainty value set; then sorting all uncertainty values in ascending order and taking the source domain samples corresponding to the smallest k% of uncertainty values as the sample set Δ_u, expressed as follows:
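The exact uncertainty formula of S334 is also an image in the source. A common margin-style choice consistent with the text (built from the largest and second-largest class probabilities) is the ratio of the two, sketched below; this exact form is an assumption, not taken from the patent:

```python
# One plausible uncertainty measure from the top-2 source-domain class
# probabilities: U = p2 / p1, small when the classifier is confident.

def uncertainty(probs):
    top = sorted(probs, reverse=True)
    p1, p2 = top[0], top[1]     # largest and second-largest class probability
    return p2 / p1

confident = uncertainty([0.90, 0.05, 0.05])   # clear winner -> small U
ambiguous = uncertainty([0.40, 0.38, 0.22])   # near tie -> U close to 1
```

Ranking samples by such a value in ascending order, as S335 does, keeps the samples the source classifier is most certain about.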
S336: obtaining the training weight of each source domain sample by combining formula (6) and formula (8), expressed as follows:
S340: calculating the source domain cross-entropy loss according to the source domain sample training weights obtained in S336; the specific expression is as follows:
where the probability term denotes the probability that the source domain sample belongs to its corresponding category;
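A per-sample-weighted cross-entropy of the kind S340 describes can be sketched as follows (the epoch-level TDR weight is folded into the per-sample weight here for brevity; all values are toy numbers):

```python
# Sketch of the weighted source-domain cross-entropy of S340: each
# sample's negative log-likelihood is scaled by its weight from S336.
import math

def weighted_cross_entropy(probs, labels, weights):
    losses = [-w * math.log(p[y]) for p, y, w in zip(probs, labels, weights)]
    return sum(losses) / len(losses)

probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # predicted class distributions
labels = [0, 1]                              # ground-truth source labels
weights = [1.0, 0.0]                         # second sample fully down-weighted
loss = weighted_cross_entropy(probs, labels, weights)
```

Setting a weight to zero removes that sample's gradient entirely, which is how unreliable source samples are suppressed.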
S350: calculating the source domain triplet loss according to the source domain sample training weights obtained in S336, with the following specific steps:
S351: calculating the weight of the triplet loss with the i-th sample as the anchor point; the calculation expression is as follows:
where the two terms denote the source domain positive sample farthest from the anchor and the source domain negative sample nearest to the anchor, respectively;
S352: after calculating the triplet losses of all source domain samples, the source domain triplet loss is obtained; the specific expression is as follows:
where the two distance terms represent the distance from the source domain sample to its farthest source domain positive sample and to its nearest source domain negative sample, respectively, and m represents the triplet margin;
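The hard-mining triplet loss of S351/S352 can be sketched as follows. Toy 1-D embeddings stand in for the real feature vectors, and the per-sample weights play the role of the S336 weights:

```python
# Sketch of the weighted source triplet loss: for each anchor, take the
# farthest positive and the nearest negative, apply margin m, and scale
# the hinge term by the anchor's weight. Distances are |a - b| on toy
# 1-D embeddings for clarity.

def weighted_triplet_loss(anchors, labels, weights, m=0.3):
    n = len(anchors)
    total = 0.0
    for i in range(n):
        pos = [abs(anchors[i] - anchors[j]) for j in range(n)
               if j != i and labels[j] == labels[i]]
        neg = [abs(anchors[i] - anchors[j]) for j in range(n)
               if labels[j] != labels[i]]
        d_pos, d_neg = max(pos), min(neg)   # hardest positive / negative
        total += weights[i] * max(0.0, d_pos - d_neg + m)
    return total / n

feats = [0.0, 0.1, 1.0, 1.1]        # two identities, already well separated
labels = [0, 0, 1, 1]
weights = [1.0, 1.0, 1.0, 1.0]
loss = weighted_triplet_loss(feats, labels, weights, m=0.3)
```

With the small margin the separated toy embedding incurs zero loss; enlarging m re-activates the hinge, which is the mechanism that pulls positives together and pushes negatives apart during training.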
S360: computing the cross-entropy loss and the triplet loss of the target domain; the specific expressions are as follows:
where the probability term denotes the probability that the target domain sample belongs to its pseudo-label category, the two distance terms represent the distance from the target domain sample to its farthest target domain positive sample and to its nearest target domain negative sample, respectively, and m represents the triplet margin;
S370: obtaining the final loss function L_total of the initialized model M' from formula (10), formula (12), formula (13), and formula (14); the expression is as follows:
3. The deep learning-based cross-domain pedestrian re-identification method according to claim 2, characterized in that the final loss function L_total in S370 is used to calculate the loss of M'; the parameters of the online feature encoder f(·|θ_t) are updated by gradient back-propagation, and the parameters of the momentum feature encoder f_m(·|θ_m) are updated by equation (16):
where α is the momentum factor and t represents the number of rounds of training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210554612.XA CN114882531A (en) | 2022-05-19 | 2022-05-19 | Cross-domain pedestrian re-identification method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210554612.XA CN114882531A (en) | 2022-05-19 | 2022-05-19 | Cross-domain pedestrian re-identification method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114882531A true CN114882531A (en) | 2022-08-09 |
Family
ID=82677958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210554612.XA Pending CN114882531A (en) | 2022-05-19 | 2022-05-19 | Cross-domain pedestrian re-identification method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114882531A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN115205570A (en) * | 2022-09-14 | 2022-10-18 | Ocean University of China | Unsupervised cross-domain target re-identification method based on contrastive learning
CN115205570B (en) * | 2022-09-14 | 2022-12-20 | Ocean University of China | Unsupervised cross-domain target re-identification method based on contrastive learning
CN115482927A (en) * | 2022-09-21 | 2022-12-16 | Zhejiang University | Small-sample-based pediatric pneumonia diagnosis system
CN117892183A (en) * | 2024-03-14 | 2024-04-16 | Nanjing University of Posts and Telecommunications | Electroencephalogram signal identification method and system based on reliable transfer learning
CN117892183B (en) * | 2024-03-14 | 2024-06-04 | Nanjing University of Posts and Telecommunications | Electroencephalogram signal identification method and system based on reliable transfer learning
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107480261B (en) | Fine-grained face image fast retrieval method based on deep learning | |
CN108564129B (en) | Trajectory data classification method based on generation countermeasure network | |
CN107515895B (en) | Visual target retrieval method and system based on target detection | |
CN112069310B (en) | Text classification method and system based on active learning strategy | |
US11210470B2 (en) | Automatic text segmentation based on relevant context | |
US9053391B2 (en) | Supervised and semi-supervised online boosting algorithm in machine learning framework | |
CN114882531A (en) | Cross-domain pedestrian re-identification method based on deep learning | |
CN108846259A (en) | A kind of gene sorting method and system based on cluster and random forests algorithm | |
CN112465040B (en) | Software defect prediction method based on class unbalance learning algorithm | |
CN105760888B (en) | A kind of neighborhood rough set integrated learning approach based on hierarchical cluster attribute | |
CN110942091B (en) | Semi-supervised few-sample image classification method for searching reliable abnormal data center | |
CN113326731A (en) | Cross-domain pedestrian re-identification algorithm based on momentum network guidance | |
CN101561805A (en) | Document classifier generation method and system | |
CN113255573B (en) | Pedestrian re-identification method based on mixed cluster center label learning and storage medium | |
CN108446334A (en) | A kind of content-based image retrieval method of unsupervised dual training | |
CN114444600A (en) | Small sample image classification method based on memory enhanced prototype network | |
CN114387473A (en) | Small sample image classification method based on base class sample characteristic synthesis | |
Fan et al. | Deep Hashing for Speaker Identification and Retrieval. | |
CN113505225A (en) | Small sample medical relation classification method based on multilayer attention mechanism | |
CN116524960A (en) | Speech emotion recognition system based on mixed entropy downsampling and integrated classifier | |
CN115063664A (en) | Model learning method, training method and system for industrial vision detection | |
CN112819027B (en) | Classification method based on machine learning and similarity scoring | |
CN112465054B (en) | FCN-based multivariate time series data classification method | |
CN110162629B (en) | Text classification method based on multi-base model framework | |
CN111222570B (en) | Ensemble learning classification method based on difference privacy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||