CN114882531A - Cross-domain pedestrian re-identification method based on deep learning - Google Patents

Cross-domain pedestrian re-identification method based on deep learning Download PDF

Info

Publication number
CN114882531A
Authority
CN
China
Prior art keywords
domain
source domain
training
sample
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210554612.XA
Other languages
Chinese (zh)
Inventor
葛永新
张俊银
华博誉
徐玲
黄晟
洪明坚
王洪星
张小洪
杨丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210554612.XA priority Critical patent/CN114882531A/en
Publication of CN114882531A publication Critical patent/CN114882531A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a cross-domain pedestrian re-identification method based on deep learning, which comprises the following steps: selecting a public data set as a source domain and another as a target domain; selecting a ResNet-50 model M and initializing its parameters to obtain M'; taking the source domain and the target domain as the input of the initialized model M', calculating the corresponding losses, training the model M', and stopping training after the maximum number of training iterations is reached to obtain a trained model M'; and inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result for the pedestrian. The method can detect and identify a specific pedestrian more accurately.

Description

Cross-domain pedestrian re-identification method based on deep learning
Technical Field
The invention relates to the field of pedestrian re-identification, in particular to a cross-domain pedestrian re-identification method based on deep learning.
Background
The pedestrian re-identification task aims to retrieve a specific pedestrian across cameras. Owing to its important applications in intelligent surveillance, pedestrian re-identification has become one of the research hotspots in computer vision. Supervised pedestrian re-identification methods have achieved satisfactory performance in recent years. However, most supervised methods suffer a significant performance degradation when the training and test pedestrian samples come from different data sets. In the real world, labeling pedestrian data is expensive and time-consuming; therefore, the unsupervised cross-domain pedestrian re-identification task has attracted great attention from researchers.
The purpose of unsupervised cross-domain pedestrian re-identification is to migrate discriminative knowledge from the source domain to an unlabeled target domain, with the expectation that the test results of the model on the target domain are comparable to those of supervised methods. Because of the large inter-domain gap between the source domain and the target domain, this task remains highly challenging. To date, clustering-based cross-domain pedestrian re-identification methods have made great progress, and most state-of-the-art methods are clustering-based; these methods can generally be divided into two stages: 1) supervised pre-training of the model with the labeled source domain data; 2) assigning pseudo labels on the target domain with a clustering algorithm and iteratively fine-tuning the pre-trained model.
However, a pedestrian re-identification model that is iteratively trained in the fine-tuning stage may gradually forget the discriminative knowledge from the source domain, i.e., catastrophic forgetting. This phenomenon can be observed from two aspects: 1) as the number of fine-tuning iterations increases, the test results of the model on the source domain gradually decline; 2) simply removing the pre-training stage causes only a minor performance drop for most clustering-based methods. It can therefore be concluded that most existing clustering-based methods do not fully utilize the discriminative knowledge on the source domain, yet this knowledge is important for improving the performance of the model on the target domain.
Disclosure of Invention
Aiming at the problems in the prior art, the technical problem to be solved by the invention is as follows: in the prior art, the inability to fully exploit the domain-shared knowledge on the source domain results in poor discrimination on the unlabeled target domain.
In order to solve the technical problems, the invention adopts the following technical scheme:
a cross-domain pedestrian re-identification method based on deep learning comprises the following steps:
s100: selecting public data sets A and B, and using the data set A as a source domain D s The formula is as follows:
Figure BDA0003651892610000021
wherein the content of the first and second substances,
Figure BDA0003651892610000022
represents the ith source domain sample,
Figure BDA0003651892610000023
representing the corresponding real label, n, of the ith source domain sample s Represents the total number of source domain samples;
selecting partial data in the data set B as a target domain training set D T ,D T Watch (A)The expression is as follows:
Figure BDA0003651892610000024
wherein the content of the first and second substances,
Figure BDA0003651892610000025
denotes the jth target domain sample, n t Represents the total number of samples of the target domain;
s200: selecting a ResNet-50 model M, wherein the model M comprises two modules, namely an online feature encoder f (· | theta) t ),θ t The related parameters of the module one, and the momentum feature encoder of the module two
Figure BDA0003651892610000026
The parameters are related to the module II;
initializing parameters of the model M by using a data set ImageNet to obtain an initialized model M';
s300: calculating the loss of the initialization model M' by using a loss function;
s400: training the model M 'by taking the source domain and the target domain as the input of the initialization model M', updating the parameters in the model M 'according to the loss calculated in the step S300, and stopping training when the maximum training times is reached to obtain a trained model M';
s500: and inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result of the pedestrian.
Preferably, the loss function of the initialized model M' in S300 is obtained as follows:

S310: performing feature extraction on the data in D_T with the momentum feature encoder and storing the features in a memory feature bank N, and then clustering all the features in N by using the DBSCAN clustering algorithm to generate a pseudo label ỹ_j^t in one-to-one correspondence with each target domain sample x_j^t;
S320: calculating the training weight w_d(i) of the source domain for each iteration by using the time-series domain relation strategy, presetting the maximum training weight of each iteration of the source domain as t_1 and the minimum as t_2, wherein t_1 > t_2; the calculation is expressed as follows:

s(i) = (i % e) / e    (1)

w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2    (2)

wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the training weight applied to the source domain in the i-th iteration, and s(i) represents the position obtained by dividing the interval between t_1 and t_2 at equal steps;
s330: computing training weights for each source domain sample using a rank guided selection strategy
Figure BDA00036518926100000210
The method comprises the following specific steps:
s331: randomly selecting a source domain sample from the source domain Ds
Figure BDA00036518926100000211
And using a line feature encoder f (· | θ) t ) To pair
Figure BDA00036518926100000212
Extracting features, and then utilizing the class classifier of the target domain and the class classifier of the source domain respectively
Figure BDA00036518926100000213
Classifying and calculating respectively
Figure BDA00036518926100000214
Probability distribution of classification over target domain
Figure BDA00036518926100000215
And probability distribution of classification over source domain
Figure BDA00036518926100000216
The calculation expression is as follows:
Figure BDA00036518926100000217
Figure BDA00036518926100000218
wherein the content of the first and second substances,
Figure BDA0003651892610000031
to represent
Figure BDA0003651892610000032
Probability distribution of classification over target domain, C t Class classifier on the representation target domain, c p Representing the number of categories of pseudo labels on the target domain;
Figure BDA0003651892610000033
representative sample
Figure BDA0003651892610000034
Probability distribution of classification over source domain, c s Number of categories of real tags on source domain, C s Representing a class classifier on the target domain;
s332: computing
Figure BDA0003651892610000035
Similarity score with target domain, Si, expressed as follows:
Figure BDA0003651892610000036
wherein, c p Representing the number of categories of pseudo labels on the target domain;
s333: calculating similarity scores of all the source domain samples and the target domain to form a similarity score set
Figure BDA0003651892610000037
Then, all similarity scores are arranged in a descending order, and the source domain samples corresponding to the former k% of similarity scores are taken as a reliability sample set delta s The expression is as follows:
Figure BDA0003651892610000038
wherein, tau s Representing the similarity score of the kth% source domain sample;
s334: definition of
Figure BDA00036518926100000326
The maximum class probability and the second largest class probability at the source domain are respectively
Figure BDA00036518926100000310
And
Figure BDA00036518926100000311
computing
Figure BDA00036518926100000312
Uncertainty U over the source domain i The expression is as follows:
Figure BDA00036518926100000313
s335: calculating all source domain sample uncertainty values to form an uncertainty value set
Figure BDA00036518926100000314
Then, all uncertainty values are arranged in an ascending order, and the source domain sample corresponding to the top k% uncertainty value is taken as an uncertainty sample set delta u The expression is as follows:
Figure BDA00036518926100000315
s336: obtaining the training weight of each source domain sample by combining formula (6) and formula (8)
Figure BDA00036518926100000316
The expression is as follows:
Figure BDA00036518926100000317
s340: calculating the cross entropy loss of the source domain according to the source domain sample training weight obtained in S336
Figure BDA00036518926100000318
The specific expression is as follows:
Figure BDA00036518926100000319
wherein the content of the first and second substances,
Figure BDA00036518926100000320
representing source domain samples
Figure BDA00036518926100000321
Belong to the category
Figure BDA00036518926100000322
The probability of (d);
s350: calculating the triple loss of the source domain according to the training weight of the source domain sample obtained in the step S336
Figure BDA00036518926100000323
The method comprises the following specific steps:
s351: calculate the ith to
Figure BDA00036518926100000324
Lost for anchor tripletsThe weight is
Figure BDA00036518926100000325
The calculation expression is as follows:
Figure BDA0003651892610000041
wherein the content of the first and second substances,
Figure BDA0003651892610000042
is shown and
Figure BDA0003651892610000043
the source domain positive samples that are the farthest away,
Figure BDA0003651892610000044
is shown and
Figure BDA0003651892610000045
the nearest source domain negative examples;
s352: after calculating the triple loss of all the source domain samples, the triple loss of the source domain can be obtained
Figure BDA0003651892610000046
The specific expression is as follows:
Figure BDA0003651892610000047
wherein the content of the first and second substances,
Figure BDA0003651892610000048
and
Figure BDA0003651892610000049
respectively representing source domain samples
Figure BDA00036518926100000410
The distance between the farthest source domain positive sample and the nearest source domain negative sample, and m represents the interval size of the triplet;
s360: computing cross-entropy loss for target domains
Figure BDA00036518926100000411
And triple loss
Figure BDA00036518926100000412
The specific expression is as follows:
Figure BDA00036518926100000413
Figure BDA00036518926100000414
wherein the content of the first and second substances,
Figure BDA00036518926100000415
representing target domain samples
Figure BDA00036518926100000416
Belong to the category
Figure BDA00036518926100000417
The probability of (c).
Figure BDA00036518926100000418
And
Figure BDA00036518926100000419
respectively representing target domain samples
Figure BDA00036518926100000420
The distance between the positive sample of the farthest target domain and the negative sample of the nearest target domain, and m represents the interval size of the triad;
s370: the final loss function L of the initialized model M' can be obtained according to the formula (10), the formula (12), the formula (13) and the formula (14) total The expression is as follows:
Figure BDA00036518926100000421
wherein the content of the first and second substances,
Figure BDA00036518926100000422
representing the soft cross-entropy loss weight,
Figure BDA00036518926100000423
representing soft triplet loss weights.
Combining the cross-entropy loss with the triplet loss balances the different objectives and effectively reduces the influence of the noisy pseudo labels generated on the target domain on model training.
Preferably, the final loss function L_total in S370 is used to calculate the loss of M', the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters of f_m(·|θ_m) are updated by formula (16):

θ_m^(t) = α · θ_m^(t-1) + (1 - α) · θ_t^(t)    (16)

wherein α is the momentum factor and the superscript t denotes the t-th training round.
Compared with the prior art, the invention has at least the following advantages:
1. Aiming at the problem that prior-art methods may fail to fully utilize source knowledge during training, the invention provides a novel PKSD method, which effectively utilizes source domain knowledge throughout the whole training process and improves discrimination accuracy on the unlabeled target domain.
2. The invention provides a linearly varying time-series domain relation method, TDR, which reduces the influence of domain-specific samples in the source domain by gradually decreasing the training weight of the source domain.
3. The invention provides a ranking-guided sample selection method, RIS, which selects informative and reliable source domain samples by computing uncertainty and similarity indices for the source domain samples.
4. To alleviate the influence of catastrophic forgetting of the source domain, the pedestrian re-identification model is trained in a co-training manner: a single model is trained jointly on the ground-truth-labeled source domain samples and the pseudo-labeled target domain samples. Unlike most previous methods, the invention does not adopt a two-stage training strategy of pre-training followed by fine-tuning, but uses a single-stage co-training scheme. However, as the number of training rounds grows, the model tends to overfit to some domain-specific knowledge of the source domain, which can impair the performance of the model on the target domain when the domain gap between the source domain and the target domain is large.
Drawings
FIG. 1 shows the main structure of PKSD according to the method of the present invention.
FIG. 2 shows the validity verification results of the method of the present invention and other different methods.
Detailed Description
The present invention is described in further detail below.
The invention trains the pedestrian re-identification model in a co-training manner. Specifically, a single model is trained jointly on the ground-truth-labeled source domain samples and the pseudo-labeled target domain samples. Unlike most previous methods, the invention does not adopt a two-stage training strategy of pre-training followed by fine-tuning, but uses a single-stage co-training scheme. However, as the number of training rounds grows, the model tends to overfit to some domain-specific knowledge of the source domain, which can impair the performance of the model on the target domain when the domain gap between the source domain and the target domain is large.
To address the above problems, the invention proposes a novel cross-domain pedestrian re-identification method with Source Domain Knowledge Preservation (PKSD) to effectively utilize knowledge from the source domain throughout the whole training process. Unlike previous two-stage training paradigms, PKSD adopts a co-training strategy, i.e., it learns from source domain samples and target domain samples simultaneously. Specifically, in each iteration PKSD feeds not only the pseudo-labeled target domain data but also the ground-truth-labeled source domain data into the model and trains them jointly. While the source domain samples are fully utilized, the domain-specific knowledge present in the source domain is detrimental to the domain adaptation task. Therefore, a linear Time-series Domain Relation (TDR) method is proposed to gradually alleviate the influence of the source domain samples: as the number of training iterations increases, the training weight of the source domain is gradually decreased. At the same time, some informative and reliable domain-shared knowledge is helpful for improving the performance of the model on the target domain. Accordingly, a Ranking-guided sample Selection (RIS) method is proposed to evaluate the uncertainty and similarity of each sample from the source domain, select samples carrying informative and reliable domain-shared knowledge by ranking their uncertainty and similarity scores, and reassign their sample training weights. Overall, by controlling the source domain weight and the sample weights, the proposed PKSD effectively suppresses the influence of domain-specific knowledge from the source domain and improves the test performance of the model on the target domain. Experimental results show that the proposed method greatly surpasses the current state-of-the-art methods.
Referring to fig. 1, a cross-domain pedestrian re-identification method based on deep learning includes the following steps:
s100: selecting public data sets A and B, and using the data set A as a source domain D s The formula is as follows:
Figure BDA0003651892610000061
wherein the content of the first and second substances,
Figure BDA0003651892610000062
represents the ith source domain sample,
Figure BDA0003651892610000063
representing the corresponding real label, n, of the ith source domain sample s Representing a source domainTotal number of samples;
selecting partial data in the data set B as a target domain training set D T ,D T The expression of (a) is as follows:
Figure BDA0003651892610000064
wherein the content of the first and second substances,
Figure BDA0003651892610000065
denotes the jth target domain sample, n t Represents the total number of samples of the target domain;
s200: selecting a ResNet-50 model M, wherein the model M comprises two modules, namely an online feature encoder f (· | theta) t ),θ t The related parameters of the module one, and the momentum feature encoder of the module two
Figure BDA0003651892610000069
The parameters are related to the module II;
initializing parameters of the model M by using a data set ImageNet to obtain an initialized model M'; the ResNet-50 model is the prior art, the data set ImageNet is the existing public data set, and compared with other public data sets, the data set ImageNet has better accuracy of the given initialization parameters and cannot generate too large random errors;
s300: calculating the loss of the initialization model M' by using a loss function;
the loss function of the initialization model M' in S300 is as follows:
s310: using a momentum feature encoder pair D T The data in the database is subjected to feature extraction and is stored in a memory feature library N, and then all the features in the N are clustered and generated by using a DBSCAN clustering algorithm which is the prior art
Figure BDA0003651892610000067
One-to-one pseudo label
Figure BDA0003651892610000068
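A possible realization of step S310 is sketched below, assuming the momentum encoder from the previous sketch and a DataLoader that yields unlabeled target images; the DBSCAN parameters eps and min_samples are illustrative assumptions, and samples marked as noise by DBSCAN are simply discarded here.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

@torch.no_grad()
def assign_pseudo_labels(momentum_encoder, target_loader, device="cuda"):
    """Extract momentum features for D_T into a memory bank N and cluster them
    with DBSCAN to obtain one pseudo label per target sample (step S310)."""
    momentum_encoder.eval().to(device)
    feats = []
    for images in target_loader:                     # unlabeled target images
        f = momentum_encoder(images.to(device))
        feats.append(F.normalize(f, dim=1).cpu())
    memory_bank = torch.cat(feats, dim=0)            # memory feature bank N

    # Cosine distance on L2-normalized features.
    pseudo = DBSCAN(eps=0.6, min_samples=4, metric="cosine").fit_predict(
        memory_bank.numpy()
    )
    keep = pseudo != -1                              # drop DBSCAN noise points
    return memory_bank, torch.as_tensor(pseudo), keep
```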
S320: calculating the training weight w_d(i) of the source domain for each iteration by using the time-series domain relation strategy (an existing technique), presetting the maximum training weight of each iteration of the source domain as t_1 and the minimum as t_2, wherein t_1 > t_2; the calculation is expressed as follows:

s(i) = (i % e) / e    (1)

w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2    (2)

wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the training weight applied to the source domain in the i-th iteration, and s(i) represents the position obtained by dividing the interval between t_1 and t_2 at equal steps;
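Since formula (1) is reproduced only as an image in the original publication, the sketch below implements the TDR schedule under the stated assumption s(i) = (i % e) / e; t_1 = 1.0 and t_2 = 0.5 are illustrative values, not values given in the patent.

```python
def tdr_weight(i: int, e: int, t1: float = 1.0, t2: float = 0.5) -> float:
    """Time-series domain relation (TDR) weight for the i-th training iteration.

    Assumed form: s(i) = (i % e) / e grows linearly from 0 toward 1, so the
    source-domain weight decays linearly from t1 down to t2 (t1 > t2), Eq. (2).
    """
    s = (i % e) / e
    return (1.0 - s) * t1 + s * t2

# Example: over e = 40 training rounds the weight moves from t1 = 1.0 toward t2 = 0.5.
schedule = [tdr_weight(i, e=40) for i in range(40)]
```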
s330: computing training weights for each source domain sample using a rank guided selection strategy
Figure BDA0003651892610000072
The method comprises the following specific steps:
s331: from the source domain D s Randomly selecting a source domain sample
Figure BDA0003651892610000073
And using a line feature encoder f (· | θ) t ) To pair
Figure BDA0003651892610000074
Extracting features, and then utilizing the class classifier of the target domain and the class classifier of the source domain respectively
Figure BDA0003651892610000075
Classifying and calculating respectively
Figure BDA0003651892610000076
Probability distribution of classification over target domain
Figure BDA0003651892610000077
And probability distribution of classification over source domain
Figure BDA0003651892610000078
By class classifier C on the target domain t Classifying each source domain sample so as to measure the similarity between the source domain sample and the target domain; by class classifier C on the source domain s Classifying each source domain sample so as to measure the uncertainty of each source domain sample, wherein the category classifier of the target domain and the category classifier of the source domain both adopt the existing classifiers, and the calculation expression is as follows:
Figure BDA0003651892610000079
Figure BDA00036518926100000710
wherein the content of the first and second substances,
Figure BDA00036518926100000711
to represent
Figure BDA00036518926100000712
Probability distribution of classification over target domain, C t Class classifier on the representation target domain, c p Representing the number of categories of pseudo labels on the target domain;
Figure BDA00036518926100000713
representative sample
Figure BDA00036518926100000714
Probability distribution of classification over source domain, c s Number of categories of real tags on source domain, C s Representing a class classifier on the target domain;
s332: computing
Figure BDA00036518926100000725
Similarity score with target domain, Si, expressed as follows:
Figure BDA00036518926100000716
wherein, c p Representing the number of categories of pseudo labels on the target domain;
s333: calculating similarity scores of all the source domain samples and the target domain to form a similarity score set
Figure BDA00036518926100000726
Then, all similarity scores are arranged in a descending order, and the source domain samples corresponding to the former k% of similarity scores are taken as a reliability sample set delta s The expression is as follows:
Figure BDA00036518926100000718
wherein, tau s Representing the similarity score of the kth% source domain sample;
s334: definition of
Figure BDA00036518926100000719
The maximum class probability and the second largest class probability at the source domain are respectively
Figure BDA00036518926100000720
And
Figure BDA00036518926100000721
computing
Figure BDA00036518926100000722
Uncertainty U over the source domain i The expression is as follows:
Figure BDA00036518926100000723
s335: computing stationActive domain sample uncertainty values, component uncertainty value sets
Figure BDA00036518926100000724
Then, all uncertainty values are arranged in an ascending order, and the source domain sample corresponding to the top k% uncertainty value is taken as an uncertainty sample set delta u The expression is as follows:
Figure BDA0003651892610000081
s336: obtaining the training weight of each source domain sample by combining formula (6) and formula (8)
Figure BDA0003651892610000082
The expression is as follows:
Figure BDA0003651892610000083
the smaller the similarity between a sample selected from the source domain and the target domain is, the larger the difference in appearance information between the sample and the sample in the target domain is; conversely, if the source domain sample has greater similarity to the target domain, then the sample is more likely to have domain-shared knowledge with the target domain sample. For samples from the source domain
Figure BDA0003651892610000084
The sample has low similarity to the target domain, and the share of the model is gradually decreased (TDR) along with the increase of the number of training rounds; conversely, if the sample has a higher similarity to the target domain, his contribution will not be affected by the method from TDR.
If the source domain sample has larger uncertainty, the fact that the sample still has plenty of information for the model to learn is shown. By combining the methods proposed by the formula (6) and the formula (8), reliable and information-rich samples can be selected on the source domain, and by increasing the training weights of the samples, the domain sharing knowledge from the source domain can be effectively utilized, so that the performance of the model on the target domain can be further improved.
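The ranking-guided selection of steps S331 to S336 can be sketched as follows. This is a hedged reconstruction rather than the literal formulas of the original (which are reproduced as images): the similarity score is taken as the maximum target-classifier probability, the uncertainty as the margin between the two largest source-classifier probabilities, and the weight rule follows the reconstructed formula (9); the ratio k and the classifier heads (assumed to map features to logits) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ris_sample_weights(feats_s, target_classifier, source_classifier,
                       w_d: float, k: float = 0.3):
    """Ranking-guided selection (steps S331-S336), reconstructed form.

    feats_s: (n_s, d) online-encoder features of source samples.
    Returns one weight per sample: 1 for samples that are both reliable
    (high similarity to the target domain) and informative (small confidence
    margin on the source classifier), and w_d for all other samples.
    """
    p_st = F.softmax(target_classifier(feats_s), dim=1)   # Eq. (3)
    p_ss = F.softmax(source_classifier(feats_s), dim=1)   # Eq. (4)

    sim = p_st.max(dim=1).values                          # Eq. (5), assumed max-probability form
    top2 = p_ss.topk(2, dim=1).values
    unc = top2[:, 0] - top2[:, 1]                         # Eq. (7), confidence margin

    n_keep = max(1, int(k * feats_s.size(0)))
    reliable = torch.zeros_like(sim, dtype=torch.bool)
    reliable[sim.topk(n_keep).indices] = True             # Delta_s: top-k% similarity
    informative = torch.zeros_like(sim, dtype=torch.bool)
    informative[(-unc).topk(n_keep).indices] = True       # Delta_u: smallest margins

    weights = torch.full_like(sim, w_d)
    weights[reliable & informative] = 1.0                 # Eq. (9), assumed form
    return weights
```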
S340: calculating the cross-entropy loss L_src^ce of the source domain according to the source domain sample training weights obtained in S336, the specific expression being as follows:

L_src^ce = -(1/n_s) Σ_{i=1}^{n_s} w_i^s · log p(y_i^s | x_i^s)    (10)

wherein p(y_i^s | x_i^s) represents the probability that the source domain sample x_i^s belongs to the category y_i^s;
s350: calculating the triple loss of the source domain according to the training weight of the source domain sample obtained in the step S336
Figure BDA00036518926100000810
The method comprises the following specific steps:
s351: calculate the ith to
Figure BDA00036518926100000811
The weight lost for a triplet of anchor points is
Figure BDA00036518926100000812
The calculation expression is as follows:
Figure BDA00036518926100000813
wherein the content of the first and second substances,
Figure BDA00036518926100000814
is shown and
Figure BDA00036518926100000815
the source domain positive samples that are the farthest away,
Figure BDA00036518926100000816
is shown and
Figure BDA00036518926100000817
the nearest source domain negative examples;
s352: after calculating the triple loss of all the source domain samples, the triple loss of the source domain can be obtained
Figure BDA00036518926100000821
The specific expression is as follows:
Figure BDA00036518926100000819
wherein the content of the first and second substances,
Figure BDA0003651892610000091
and
Figure BDA0003651892610000092
respectively representing source domain samples
Figure BDA0003651892610000093
The distance between the farthest source domain positive sample and the nearest source domain negative sample, and m represents the interval size of the triplet; more precisely, m represents the minimum difference between the distance between the pair of positive sample features and the distance between the pair of negative sample features, where m is set to 0.5 based on empirical values; this is a hyper-parameter used in designing the loss function; the main effect is to pull the distance of the same type of sample feature pair close and push the distance of the different type of sample feature pair open by a threshold.
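A sketch of the weighted source-domain losses of steps S340 to S352 follows, assuming batch-hard mining within a mini-batch (farthest positive, nearest negative) and the reconstructed weighting of formulas (10) to (12); taking the anchor-triplet weight as the mean of the three sample weights is an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_source_losses(feats, logits, labels, sample_w, margin: float = 0.5):
    """Weighted cross-entropy (Eq. 10) and weighted batch-hard triplet loss (Eq. 12)
    on a mini-batch of source-domain samples.

    feats: (B, d) features, logits: (B, c_s) source-classifier outputs,
    labels: (B,) identity labels, sample_w: (B,) weights from RIS/TDR.
    """
    ce = (sample_w * F.cross_entropy(logits, labels, reduction="none")).mean()

    dist = torch.cdist(feats, feats)                       # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_mask = same.fill_diagonal_(False)                  # exclude the anchor itself
    neg_mask = labels.unsqueeze(0) != labels.unsqueeze(1)

    d_pos, p_idx = (dist * pos_mask).max(dim=1)            # farthest positive per anchor
    d_neg, n_idx = (dist + 1e6 * (~neg_mask)).min(dim=1)   # nearest negative per anchor

    tri_w = (sample_w + sample_w[p_idx] + sample_w[n_idx]) / 3.0   # Eq. (11), assumed
    tri = (tri_w * F.relu(d_pos - d_neg + margin)).mean()          # Eq. (12)
    return ce, tri
```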
S360: calculating the cross-entropy loss L_tgt^ce and the triplet loss L_tgt^tri of the target domain, the specific expressions being as follows:

L_tgt^ce = -(1/n_t) Σ_{j=1}^{n_t} log p(ỹ_j^t | x_j^t)    (13)

L_tgt^tri = (1/n_t) Σ_{j=1}^{n_t} max(0, d_{j,p}^t - d_{j,n}^t + m)    (14)

wherein p(ỹ_j^t | x_j^t) represents the probability that the target domain sample x_j^t belongs to the pseudo-label category ỹ_j^t, d_{j,p}^t and d_{j,n}^t respectively represent the distances from the target domain sample x_j^t to its farthest target domain positive sample and its nearest target domain negative sample, and m represents the triplet margin;

S370: according to formula (10), formula (12), formula (13) and formula (14), the final loss function L_total of the initialized model M' can be obtained, which is expressed as follows:

L_total = L_src^ce + L_src^tri + λ_ce · L_tgt^ce + λ_tri · L_tgt^tri    (15)

wherein λ_ce represents the soft cross-entropy loss weight and λ_tri represents the soft triplet loss weight.
The final loss function L_total in S370 is used to calculate the loss of M', the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters of f_m(·|θ_m) are updated by formula (16):

θ_m^(t) = α · θ_m^(t-1) + (1 - α) · θ_t^(t)    (16)

wherein α is the momentum factor and the superscript t denotes the t-th training round.
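Formula (16) is the usual exponential-moving-average update for a momentum encoder; the sketch below applies it parameter-wise with α = 0.999 as given in the experimental setup, and additionally copies the BatchNorm buffers, which is a common practical choice rather than something stated in the original.

```python
import torch

@torch.no_grad()
def update_momentum_encoder(online, momentum, alpha: float = 0.999):
    """theta_m <- alpha * theta_m + (1 - alpha) * theta_t   (Eq. 16)."""
    for p_m, p_o in zip(momentum.parameters(), online.parameters()):
        p_m.mul_(alpha).add_(p_o, alpha=1.0 - alpha)
    for b_m, b_o in zip(momentum.buffers(), online.buffers()):
        b_m.copy_(b_o)          # keep BatchNorm running statistics in sync
```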
S400: training the model M' by taking the source domain and the target domain as the input of the initialized model M', updating the parameters in the model M' according to the loss calculated in step S300, and stopping training when the maximum number of training iterations is reached to obtain a trained model M';

S500: inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result for the pedestrian.
Experimental design and results analysis
1. Introduction of the data sets used
The invention verifies the effectiveness of the proposed method on three widely used public data sets, namely Market1501, DukeMTMC-ReID and MSMT17. Market1501 contains 32,668 pedestrian images of 1,501 different identities captured by 6 cameras; among them, 12,936 images of 751 identities are used for training, and the remaining images are used for testing. DukeMTMC-ReID contains 16,522 training images, 2,228 query images and 17,661 gallery images of 702 different identities captured by 8 cameras. MSMT17 is a larger data set comprising 126,441 images of 4,101 different identities captured by 15 cameras; specifically, 32,621 pedestrian images of 1,041 different identities are used as the training set, and the remaining images are used as the test set. Two commonly used evaluation metrics, mean average precision (mAP) and the cumulative matching characteristic (CMC) curve, are adopted for evaluation. For convenience of description, Market1501, DukeMTMC-ReID and MSMT17 are hereinafter referred to as Market, Duke and MSMT respectively.
2. Experimental setup
In the experiments, the proposed method uses ResNet-50 as the feature encoder and loads parameters pre-trained on ImageNet. For the co-training setup, each mini-batch consists of 64 pedestrian images of 16 different identities from each of the source and target domains. The network is optimized with the Adam algorithm with a weight decay of 0.0005. The whole training process runs for 40 epochs; a warm-up strategy is adopted in the first 10 epochs, and the initial learning rate is set to 0.00035. At each training step, the parameters of the momentum feature encoder f_m(·|θ_m) are updated by a temporal moving average with momentum factor α = 0.999. Pseudo labels are re-assigned after every 400 iterations. All pedestrian images are resized to 256 × 128 as input to the network. During testing, the output of the last batch normalization (BN) layer is used as the final representation of a pedestrian image. All experiments are conducted on the PyTorch platform with three NVIDIA TITAN V GPUs. Note that only the momentum feature encoder f_m(·|θ_m) is used in the test phase.
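The hyper-parameters listed above can be gathered into a small configuration sketch; the values are copied from this section, while the dataclass itself and the optimizer helper are merely one possible way to organize them.

```python
from dataclasses import dataclass
import torch

@dataclass
class TrainConfig:
    epochs: int = 40                 # total training epochs
    warmup_epochs: int = 10          # warm-up period for the learning rate
    lr: float = 3.5e-4               # initial learning rate
    weight_decay: float = 5e-4       # Adam weight decay
    batch_size: int = 64             # 16 identities x 4 images per domain
    num_identities: int = 16
    momentum_alpha: float = 0.999    # EMA factor for the momentum encoder
    recluster_every: int = 400       # iterations between pseudo-label updates
    image_size: tuple = (256, 128)   # input resolution (height, width)

def build_optimizer(model: torch.nn.Module, cfg: TrainConfig):
    return torch.optim.Adam(model.parameters(), lr=cfg.lr,
                            weight_decay=cfg.weight_decay)
```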
Table 1 compares some of the latest methods on the Market and Duke data sets, respectively.
Table 2 compares some recent methods on MSMT datasets.
3. Ablation study
To verify the effectiveness of each proposed module, this section combines different modules and tests them under the setting where Duke and Market are used as the source domain and target domain respectively. As shown in fig. 2, the left side of fig. 2 shows the test results of the different methods on the source domain, and the right side shows the test results on the target domain. From the test results on the source domain, it can be seen that as the number of iterations increases, the performance of the model under the two-stage training strategy drops sharply, finally obtaining only 20% mAP. This indicates that the two-stage training strategy causes the fine-tuning-stage model to forget the source domain knowledge. In contrast, the co-training scheme provided by the invention overcomes the catastrophic forgetting of the source domain knowledge and preserves the final performance of the model on the source domain. From the test results on the target domain, it can be seen that, because the two-stage training strategy does not fully utilize the knowledge from the source domain, the test performance of the model on the target domain is limited. The method provided by the invention fully and effectively utilizes the source domain knowledge and significantly improves the test results of the model on the target domain. Specifically, the model under the two-stage training strategy obtains 74.7% mAP, while the baseline under the co-training scheme provided by the invention reaches 79.1% mAP. By introducing TDR to prevent overfitting of the model on the source domain, the test result of the model on the target domain gains a further 0.9% mAP. Combined with the RIS module, the model finally reaches 80.7% mAP. These experimental results show that the method provided by the invention effectively utilizes the knowledge from the source domain and further improves the test results of the model on the target domain.
4. Comparison of results
This section compares the PKSD method with existing mainstream cross-domain pedestrian re-identification methods. Note that the global average pooling (GAP) layer used for the final features is replaced by a generalized mean pooling (GeM) layer. The experimental results are shown in Table 1; it can be seen that the performance of PKSD under the co-training strategy greatly surpasses all state-of-the-art cross-domain pedestrian re-identification methods. Specifically, under the 'Duke to Market' setting, three generation-based methods, namely SPGAN, PTGAN and ATNet, are compared first. Compared with the best generation-based method, ATNet, PKSD improves mAP and Rank-1 by 58.5% and 37.8% respectively. Further, the mainstream methods represented by NRMT, MEB-Net, UNRN, GLT, IDM and PDA are compared. The proposed method achieves the best performance on both 'Market to Duke' and 'Duke to Market'. In particular, PKSD brings a 0.9% mAP improvement over PDA on 'Market to Duke', and a 1.9% mAP improvement over PDA on 'Duke to Market'.
The method has also been evaluated on the larger and more challenging MSMT data set. Some of the latest approaches, such as NRMT, UNRN, GLT, IDM and PDA, have demonstrated good performance on the MSMT data set. As shown in Table 2, PKSD achieves 63.8% Rank-1 and 36.5% mAP on 'Market to MSMT'. Similarly, under the 'Duke to MSMT' setting, PKSD reaches 63.8% Rank-1 and 36.7% mAP. Compared with the other methods, the proposed PKSD achieves the best test results. Overall, these experiments show that fully and effectively utilizing knowledge from the source domain can further improve the performance of the model on the target domain.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, all of which shall be covered by the claims of the present invention.

Claims (3)

1. A cross-domain pedestrian re-identification method based on deep learning, characterized by comprising the following steps:

S100: selecting public data sets A and B, and using the data set A as the source domain D_s, which is expressed as follows:

D_s = {(x_i^s, y_i^s)}_{i=1}^{n_s}

wherein x_i^s represents the i-th source domain sample, y_i^s represents the ground-truth label corresponding to the i-th source domain sample, and n_s represents the total number of source domain samples;

selecting part of the data in the data set B as the target domain training set D_T, which is expressed as follows:

D_T = {x_j^t}_{j=1}^{n_t}

wherein x_j^t denotes the j-th target domain sample and n_t represents the total number of target domain samples;

S200: selecting a ResNet-50 model M, wherein the model M comprises two modules, namely module one, an online feature encoder f(·|θ_t), with θ_t being the parameters of module one, and module two, a momentum feature encoder f_m(·|θ_m), with θ_m being the parameters of module two;

initializing the parameters of the model M with weights pre-trained on the ImageNet data set to obtain an initialized model M';

S300: calculating the loss of the initialized model M' by using a loss function;

S400: training the model M' by taking the source domain and the target domain as the input of the initialized model M', updating the parameters in the model M' according to the loss calculated in step S300, and stopping training when the maximum number of training iterations is reached to obtain a trained model M';

S500: inputting the image of the pedestrian to be predicted into the trained model M' to obtain the retrieval result for the pedestrian.
2. The deep learning-based cross-domain pedestrian re-identification method according to claim 1, characterized in that: the loss function of the initialized model M' in S300 is obtained as follows:

S310: performing feature extraction on the data in D_T with the momentum feature encoder and storing the features in a memory feature bank N, and then clustering all the features in N by using the DBSCAN clustering algorithm to generate a pseudo label ỹ_j^t in one-to-one correspondence with each target domain sample x_j^t;

S320: calculating the training weight w_d(i) of the source domain for each iteration by using the time-series domain relation strategy, presetting the maximum training weight of each iteration of the source domain as t_1 and the minimum as t_2, wherein t_1 > t_2; the calculation is expressed as follows:

s(i) = (i % e) / e    (1)

w_d(i) = (1 - s(i)) × t_1 + s(i) × t_2    (2)

wherein the symbol % represents the remainder operation, i represents the i-th training iteration, e represents the maximum number of training iterations, w_d(i) represents the training weight applied to the source domain in the i-th iteration, and s(i) represents the position obtained by dividing the interval between t_1 and t_2 at equal steps;
s330: computing training weights for each source domain sample using a rank guided selection strategy
Figure FDA00036518926000000111
The method comprises the following specific steps:
s331: from the source domain D s Randomly selecting a source domain sample
Figure FDA00036518926000000112
And using a line feature encoder f (· | θ) t ) To pair
Figure FDA00036518926000000113
Extracting features, and then utilizing the class classifier of the target domain and the class classifier of the source domain respectively
Figure FDA00036518926000000114
Classifying and calculating respectively
Figure FDA0003651892600000021
Probability distribution of classification over target domain
Figure FDA0003651892600000022
And probability distribution of classification over source domain
Figure FDA0003651892600000023
The calculation expression is as follows:
Figure FDA0003651892600000024
Figure FDA0003651892600000025
wherein the content of the first and second substances,
Figure FDA0003651892600000026
to represent
Figure FDA0003651892600000027
Probability distribution of classification over target domain, C t Class classifier on the representation target domain, c p Representing the number of categories of pseudo labels on the target domain;
Figure FDA0003651892600000028
representative sample
Figure FDA0003651892600000029
Probability distribution of classification over source domain, c s Number of categories of real tags on source domain, C s Representing a class classifier on the target domain;
s332: calculating out
Figure FDA00036518926000000210
Similarity score with target Domain S i The expression is as follows:
Figure FDA00036518926000000211
wherein, c p Representing the number of categories of pseudo labels on the target domain;
s333: calculating similarity scores of all the source domain samples and the target domain to form a similarity score set
Figure FDA00036518926000000212
Then, all similarity scores are arranged in a descending order, and the source domain samples corresponding to the former k% of similarity scores are taken as a reliability sample set delta s The expression is as follows:
Figure FDA00036518926000000213
wherein, tau s Representing the similarity score of the kth% source domain sample;
s334: definition of
Figure FDA00036518926000000214
The maximum class probability and the second largest class probability at the source domain are respectively
Figure FDA00036518926000000215
And
Figure FDA00036518926000000216
computing
Figure FDA00036518926000000217
Uncertainty U over the source domain i The expression is as follows:
Figure FDA00036518926000000218
s335: calculating all source domain sample uncertainty values to form an uncertainty value set
Figure FDA00036518926000000219
Then, all uncertainty values are arranged in an ascending order, and the source domain sample corresponding to the top k% uncertainty value is taken as an uncertainty sample set delta u The expression is as follows:
Figure FDA00036518926000000220
s336: obtaining the training weight of each source domain sample by combining formula (6) and formula (8)
Figure FDA00036518926000000221
The expression is as follows:
Figure FDA00036518926000000222
s340: calculating the cross entropy loss of the source domain according to the training weight of the source domain sample obtained in S336
Figure FDA00036518926000000223
The specific expression is as follows:
Figure FDA00036518926000000224
wherein the content of the first and second substances,
Figure FDA00036518926000000225
representing source domain samples
Figure FDA00036518926000000226
Belong to the category
Figure FDA00036518926000000227
The probability of (d);
s350: calculating the triple loss of the source domain according to the training weight of the source domain sample obtained in the step S336
Figure FDA0003651892600000031
The method comprises the following specific steps:
s351: calculate the ith to
Figure FDA0003651892600000032
The weight lost for a triplet of anchor points is
Figure FDA0003651892600000033
The calculation expression is as follows:
Figure FDA0003651892600000034
wherein the content of the first and second substances,
Figure FDA0003651892600000035
is shown and
Figure FDA0003651892600000036
the source domain positive samples that are the farthest away,
Figure FDA0003651892600000037
is represented by
Figure FDA0003651892600000038
Nearest source domain negative examples;
s352: after calculating the triple losses of all the source domain samples, the triple losses of the source domain can be obtained
Figure FDA0003651892600000039
The specific expression is as follows:
Figure FDA00036518926000000310
wherein the content of the first and second substances,
Figure FDA00036518926000000311
and
Figure FDA00036518926000000312
respectively representing source domain samples
Figure FDA00036518926000000313
The distance between the farthest source domain positive sample and the nearest source domain negative sample, and m represents the interval size of the triplet;
s360: computing cross-entropy loss for target domains
Figure FDA00036518926000000314
And triple loss
Figure FDA00036518926000000315
The specific expression is as follows:
Figure FDA00036518926000000316
Figure FDA00036518926000000317
wherein the content of the first and second substances,
Figure FDA00036518926000000318
representing target domain samples
Figure FDA00036518926000000319
Belong to the category
Figure FDA00036518926000000320
The probability of (c).
Figure FDA00036518926000000321
And
Figure FDA00036518926000000322
respectively representing target domain samples
Figure FDA00036518926000000323
The distance between the positive sample of the farthest target domain and the negative sample of the nearest target domain, and m represents the interval size of the triad;
s370: the final loss function L of the initialized model M' can be obtained according to the formula (10), the formula (12), the formula (13) and the formula (14) total The expression is as follows:
Figure FDA00036518926000000324
wherein the content of the first and second substances,
Figure FDA00036518926000000325
representing the soft cross-entropy loss weight,
Figure FDA00036518926000000326
representing soft triplet loss weights.
3. The deep learning-based cross-domain pedestrian re-identification method according to claim 2, characterized in that: the final loss function L_total in S370 is used to calculate the loss of M', the parameters in f(·|θ_t) are updated by gradient back-propagation, and the parameters of f_m(·|θ_m) are updated by formula (16):

θ_m^(t) = α · θ_m^(t-1) + (1 - α) · θ_t^(t)    (16)

wherein α is the momentum factor and the superscript t denotes the t-th training round.
CN202210554612.XA 2022-05-19 2022-05-19 Cross-domain pedestrian re-identification method based on deep learning Pending CN114882531A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210554612.XA CN114882531A (en) 2022-05-19 2022-05-19 Cross-domain pedestrian re-identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210554612.XA CN114882531A (en) 2022-05-19 2022-05-19 Cross-domain pedestrian re-identification method based on deep learning

Publications (1)

Publication Number Publication Date
CN114882531A true CN114882531A (en) 2022-08-09

Family

ID=82677958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210554612.XA Pending CN114882531A (en) 2022-05-19 2022-05-19 Cross-domain pedestrian re-identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN114882531A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205570A (en) * 2022-09-14 2022-10-18 中国海洋大学 Unsupervised cross-domain target re-identification method based on comparative learning
CN115205570B (en) * 2022-09-14 2022-12-20 中国海洋大学 Unsupervised cross-domain target re-identification method based on comparative learning
CN115482927A (en) * 2022-09-21 2022-12-16 浙江大学 Children pneumonia diagnostic system based on small sample
CN117892183A (en) * 2024-03-14 2024-04-16 南京邮电大学 Electroencephalogram signal identification method and system based on reliable transfer learning
CN117892183B (en) * 2024-03-14 2024-06-04 南京邮电大学 Electroencephalogram signal identification method and system based on reliable transfer learning

Similar Documents

Publication Publication Date Title
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
CN107515895B (en) Visual target retrieval method and system based on target detection
CN112069310B (en) Text classification method and system based on active learning strategy
US11210470B2 (en) Automatic text segmentation based on relevant context
US9053391B2 (en) Supervised and semi-supervised online boosting algorithm in machine learning framework
CN114882531A (en) Cross-domain pedestrian re-identification method based on deep learning
CN108846259A (en) A kind of gene sorting method and system based on cluster and random forests algorithm
CN112465040B (en) Software defect prediction method based on class unbalance learning algorithm
CN105760888B (en) A kind of neighborhood rough set integrated learning approach based on hierarchical cluster attribute
CN110942091B (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN101561805A (en) Document classifier generation method and system
CN113255573B (en) Pedestrian re-identification method based on mixed cluster center label learning and storage medium
CN108446334A (en) A kind of content-based image retrieval method of unsupervised dual training
CN114444600A (en) Small sample image classification method based on memory enhanced prototype network
CN114387473A (en) Small sample image classification method based on base class sample characteristic synthesis
Fan et al. Deep Hashing for Speaker Identification and Retrieval.
CN113505225A (en) Small sample medical relation classification method based on multilayer attention mechanism
CN116524960A (en) Speech emotion recognition system based on mixed entropy downsampling and integrated classifier
CN115063664A (en) Model learning method, training method and system for industrial vision detection
CN112819027B (en) Classification method based on machine learning and similarity scoring
CN112465054B (en) FCN-based multivariate time series data classification method
CN110162629B (en) Text classification method based on multi-base model framework
CN111222570B (en) Ensemble learning classification method based on difference privacy

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination