CN113326731A

CN113326731A - Cross-domain pedestrian re-identification algorithm based on momentum network guidance

Info

Publication number: CN113326731A
Application number: CN202110436422.3A
Authority: CN
Inventors: 何爱清; 高阳; 李文斌
Original assignee: Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd; Nanjing University
Current assignee: Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd; Nanjing University
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2021-08-31
Anticipated expiration: 2041-04-22
Also published as: CN113326731B

Abstract

The invention provides a cross-domain pedestrian re-identification algorithm based on momentum network guidance, aimed at the pseudo label noise caused by domain shift in the cross-domain pedestrian re-identification task. The method comprises the following steps: step S1, initializing a backbone network with a model pre-trained on the ImageNet data set; step S2, fine-tuning the model with the labeled data of the source domain data set so as to make full use of the label information of the source domain; step S3, initializing the proposed momentum learning framework with models trained on the source domain data set under different random parameters, and clustering the features extracted by the model with a clustering algorithm to generate hard pseudo labels with a confidence of 1; step S4, designing new softened pseudo labels and loss functions and combining them with the traditional losses to train and optimize the model; and step S5, updating the hard pseudo labels before each iteration, updating the soft pseudo labels dynamically in real time, and iterating the generation and optimization of pseudo labels until the model converges.

Description

Cross-domain pedestrian re-identification algorithm based on momentum network guidance

Technical Field

The invention relates to the field of computer vision, in particular to a cross-domain pedestrian re-identification algorithm based on momentum network guidance.

Background

The task of pedestrian re-identification is, given a target image, to find the image or images closest to it in a pedestrian database. The release of many large-scale pedestrian data sets in recent years has driven research on pedestrian re-identification, and the quality and scale of the data sets are key to improving its performance. However, labeling a data set consumes a great deal of labor and resources, and environmental factors also hinder the collection of effective data to some extent. In practical applications, moreover, the robustness of the model directly determines its usefulness. Research shows that when a pedestrian re-identification model trained on a large-scale data set is deployed directly in a new monitoring system, its performance drops precipitously because of the difference in data distribution between the domains.

Researchers have made many efforts to solve this problem. Some early work addressed unsupervised pedestrian re-identification with hand-crafted features, but these features were generally not very discriminative and performed poorly. Image generation based methods usually set constraints to complete a style conversion from source domain images to target domain images: the source domain images are converted into images in the style of the target domain data while the label information of the source domain data is kept; the newly generated style data set is learned on its own, or is mixed with the target domain data to form a larger data set for training; and the model is then transferred to the target data set. Instance classification based methods generally first treat each pedestrian image as a separate category, start from the similarity between image features, and design an algorithm to find the similarity neighborhood of each image, thereby realizing image retrieval in the target domain; the key to these methods is how to match sample relevance effectively. Strict domain adaptation based methods seek to eliminate or reduce the difference between the source domain and the target domain so as to transfer discriminative information between the domains; some use adversarial learning across the two domains to "seek common ground while preserving differences", and some assume that a set of shared mid-level semantic attributes exists between the source domain and the target domain to improve the accuracy of pedestrian re-identification.

Among these many attempts, the most widely applied and best performing methods are those based on pseudo label generation. The clustering-based variant works as follows: first, a clustering algorithm clusters the pedestrians of the target domain data set and assigns pseudo labels; second, the pseudo labels are treated as a form of supervision, and the model is trained on the target domain in a supervised-like manner. During training, however, this method often suffers from pseudo label noise. The main causes of pseudo label noise are that the number of pedestrian identities in the target domain is unknown, that the clustering algorithm has its own limitations, and that domain shift limits the performance on the target domain of a network pre-trained on the source domain. If the initial pseudo labels have low reliability and high noise, the model is likely to collapse outright and deviate from the correct training trajectory, and existing methods do not solve this problem effectively.

The invention provides a clustering-based cross-domain pedestrian re-identification method guided by a momentum network. The method has a simple structure, and the output of the momentum network is used to guide the training of the backbone network. Meanwhile, data augmentation increases the randomness of the networks, and the designed soft labels and losses optimize the model thoroughly rather than restricting it to blind learning of the pseudo labels. Together with the traditional losses, they optimize the network from different angles, so that the interference of pseudo label noise is reduced and the robustness of the model is enhanced.

Disclosure of Invention

Purpose of the invention: the invention provides a cross-domain pedestrian re-identification method based on momentum network guidance. It addresses the pseudo label noise that arises in the unsupervised pedestrian re-identification task from domain shift, the unknown number of pedestrian identities in the target domain, the limitations of the clustering algorithm and similar factors, so as to reduce pseudo label noise in the cross-domain scenario and improve the retrieval performance of the unsupervised domain adaptive pedestrian re-identification model.

Technical scheme: the cross-domain pedestrian re-identification method based on momentum network guidance focuses on three aspects. The first is how to use the label information of the source domain data set to provide initial pseudo labels of better quality for subsequent training and keep the optimization from deviating from the correct track. The second is to soften the generated pseudo labels: because the number of pedestrian identities and the pedestrian IDs of the target domain are unknown during unsupervised training, training relies only on the generated pseudo labels, and those labels may be noisy. The third is how to keep the time and space complexity of the model under control. The technical scheme of the invention comprises the following steps:

step 1, initializing a backbone network of the invention by using a pre-trained model on a large-scale image data set ImageNet so as to transfer and utilize some prior image discriminant characteristics;

step 2, fine-tuning the initialized network on the source domain data set;

step 3, dividing the target domain data set into a training set and a testing set, and dividing the momentum learning framework into a backbone network and a momentum network; then initializing the two networks with models trained on the source domain data set under different random parameters, so as to make use of the label information of the source domain data set; and clustering the pedestrian images of the target domain with a clustering algorithm and assigning each pedestrian image a pseudo label;

step 4, softening the pseudo label, and cooperatively supervising the network by the new softened pseudo label and the original hard pseudo label, thereby enabling the network to train and learn more smoothly and stably;

step 5, updating the hard pseudo labels offline, updating the soft pseudo labels online, and iterating the generation and optimization of pseudo labels; after training is finished, the momentum network of the momentum framework is used for the subsequent testing process.

Steps 1 to 2 belong to a training part of a source domain data set, and steps 3 to 5 belong to an optimization part of a target domain data set. Wherein step 5 further comprises the operation of a test phase. The network architecture diagram of the present invention is shown in fig. 1. The deep learning pedestrian re-identification training stage framework is shown in fig. 2, and the testing stage framework is shown in fig. 3.

In step 1, the ResNet-50 network is modified to obtain the network structure of the invention: the final layer is removed and two additional FC layers are added, namely FC-1000 and FC-M_s. The first fully connected layer extracts the features used for the triplet loss; the second fully connected layer produces the probability prediction over the classes used for the classification loss.

In step 2, the source domain data set and the target domain data set are disjoint. They are usually collected at different times and places and under different cameras, and there is usually a large difference in data distribution between them. The source domain data set is expressed as:

$$D_s = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$$

where x represents a pedestrian image, y represents the label of the corresponding pedestrian, and N_s indicates the size of the data set.

Supervised training is carried out on a source domain data set, cross entropy classification loss and traditional threshold-based triplet loss are used, and the calculation forms are respectively expressed as follows:

where F denotes the feature encoder, C denotes the classifier, θ denotes the model parameters, m denotes the distance margin, usually taken to be 0.5, and the subscripts p and n denote the positive and negative samples of the original sample, respectively. In the source domain pre-training stage, given the labeled source domain data set, the network is trained jointly with the two loss functions to obtain the initialization model for target domain training.
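The formula images for the two source domain losses did not survive in this text. A standard form consistent with the symbols defined above would be the following sketch. It is an assumption rather than the patent's exact notation: the per-sample averaging, the cross-entropy symbol $\ell_{ce}$, the Euclidean norm and the superscript $s$ are all chosen here for illustration.

```latex
% Assumed reconstruction: cross-entropy classification loss on the source domain
L^{s}_{id}(\theta) = \frac{1}{N_s} \sum_{i=1}^{N_s}
    \ell_{ce}\bigl( C(F(x_i^s \mid \theta)),\; y_i^s \bigr)

% Assumed reconstruction: margin-based triplet loss with margin m,
% where x_{i,p}^s and x_{i,n}^s are the positive and negative samples of x_i^s
L^{s}_{tri}(\theta) = \frac{1}{N_s} \sum_{i=1}^{N_s}
    \max\bigl( 0,\;
        \lVert F(x_i^s \mid \theta) - F(x_{i,p}^s \mid \theta) \rVert
        + m
        - \lVert F(x_i^s \mid \theta) - F(x_{i,n}^s \mid \theta) \rVert
    \bigr)
```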

In step 3, the target domain data set without label information may be represented as:

$$D_t = \{x_i^t\}_{i=1}^{N_t}$$

where N_t denotes the number of target domain images.

Two models that perform well on the source domain are selected to initialize the two networks of the momentum framework, and the inputs of the two networks are obtained by applying different data augmentations to the same pedestrian image, which increases the randomness of the models. The simple and effective K-means algorithm is chosen for clustering; different cluster numbers K are set for the different tasks, and the chosen values typically cover cases both larger and smaller than the true number of identities in the target domain.
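The clustering step can be illustrated with a minimal pure-Python K-means sketch that turns feature vectors into hard pseudo labels. The function names and the farthest-point initialization are assumptions made for this example; the text does not say how K-means is initialized, and a real system would cluster high-dimensional features with an optimized library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pseudo_labels(features, k, iterations=20):
    """Cluster feature vectors and return one hard pseudo label per vector."""
    # Assumed initialization: start from the first sample, then repeatedly
    # add the sample farthest from the centroids chosen so far.
    centroids = [list(features[0])]
    while len(centroids) < k:
        farthest = max(
            features,
            key=lambda f: min(squared_distance(f, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: the nearest centroid gives the hard label
        # (a one-hot label, i.e. confidence 1).
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy usage: two well-separated groups of 2-D "features".
toy_features = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                [5.0, 5.0], [5.2, 4.9], [4.8, 5.3]]
toy_labels = kmeans_pseudo_labels(toy_features, k=2)
```

Because K is set by hand and may be larger or smaller than the true number of identities, some of these labels will be wrong, which is the noise the softened labels are meant to dampen.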

In step 4, the features output by the first FC layer of the momentum network and the classification predictions output by its second FC layer each provide soft labels for the backbone network. The soft class label is a probability vector that can be expressed as, for example, c_s = [0.10, 0.32, 0.21, 0.05, 0.13, 0.11]; the value of each dimension represents the probability that the input image belongs to that class, and there are M_t dimensions in total.
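To make the difference between a hard label and a soft class label concrete, the sketch below computes a cross-entropy against a one-hot pseudo label and against a probability vector from a teacher (here, the momentum network). The function names are hypothetical, and the soft cross-entropy form is an assumption, since the loss formulas are not reproduced in this text.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_total for z in logits]


def soft_cross_entropy(student_logits, teacher_probs):
    """Cross-entropy of the student prediction against a soft label vector."""
    log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(teacher_probs, log_probs))


def hard_cross_entropy(student_logits, pseudo_label):
    """Cross-entropy against a one-hot pseudo label (confidence 1)."""
    one_hot = [1.0 if i == pseudo_label else 0.0
               for i in range(len(student_logits))]
    return soft_cross_entropy(student_logits, one_hot)


# Toy usage: the same student logits scored against a hard and a soft label.
student_logits = [2.0, 0.5, 0.1]
hard_loss = hard_cross_entropy(student_logits, pseudo_label=1)
soft_loss = soft_cross_entropy(student_logits, [0.6, 0.3, 0.1])
```

With the hard label, a wrong cluster assignment is penalized at full confidence; with the soft label, the penalty is spread over the classes the teacher considers plausible.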

For the soft triple label, three samples are involved, namely an original sample, a positive sample and a negative sample; the definition of the positive and negative samples comes from pseudo labels generated by a clustering algorithm, wherein the same type is positive, and the different type is negative. The invention designs a new soft triple label by using Wasserstein distance to measure the distribution difference of characteristics among samples, which can be expressed as follows:

where Sim represents the distribution similarity measure and θ represents the parameters of the average (momentum) network. The soft triplet label takes values between 0 and 1, and using the output of the average network as the soft label makes it smoother and more stable.
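The exact soft triplet label formula is not recoverable from this text, so the following is only an illustrative sketch under stated assumptions: each feature vector is treated as an equal-size empirical one-dimensional sample, the Wasserstein-1 distance is computed from sorted values, and the label is a softmax-style ratio of the positive and negative distances, which keeps it between 0 and 1. The function names are hypothetical.

```python
import math


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two equal-size empirical 1-D samples.

    For equal-size samples it equals the mean absolute difference
    between the sorted values.
    """
    if len(u) != len(v):
        raise ValueError("samples must have the same length")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


def soft_triplet_label(anchor, positive, negative):
    """Assumed soft label: exp(d_neg) / (exp(d_pos) + exp(d_neg)).

    Written as a sigmoid of (d_neg - d_pos); the value lies strictly
    between 0 and 1 and grows as the negative moves farther away than
    the positive.
    """
    d_pos = wasserstein_1d(anchor, positive)
    d_neg = wasserstein_1d(anchor, negative)
    return 1.0 / (1.0 + math.exp(d_pos - d_neg))


# Toy usage with features from a (hypothetical) momentum network.
anchor_feat = [0.0, 1.0]
label = soft_triplet_label(anchor_feat, [0.0, 1.0], [1.0, 2.0])
```

In this sketch a label near 1 means the triplet is already well separated in distribution, while a label near 0.5 means the positive and negative samples are about equally far from the anchor.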

In step 5, the momentum network accumulates history. Unlike the backbone network, it is not updated by back-propagating the gradient of the loss. Instead, a weighted average balances the past state of the momentum network against the state of the backbone network immediately after its update, with a suitably chosen hyperparameter. The update is defined as:

$$E^{(T)}[\theta] = \alpha E^{(T-1)}[\theta] + (1-\alpha)\theta$$

where T denotes the iteration index and α denotes the weighting hyperparameter, typically taken to be 0.999.
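The update rule is an exponential moving average over the backbone parameters. A minimal sketch, assuming the parameters are flattened into plain lists of floats (the function name and this representation are illustrative only):

```python
def momentum_update(momentum_params, backbone_params, alpha=0.999):
    """Return E^(T)[theta] = alpha * E^(T-1)[theta] + (1 - alpha) * theta."""
    return [
        alpha * m + (1.0 - alpha) * b
        for m, b in zip(momentum_params, backbone_params)
    ]


# Toy usage: the momentum parameters drift slowly toward the backbone's.
momentum = [0.0]
backbone = [1.0]
for _ in range(10000):
    momentum = momentum_update(momentum, backbone, alpha=0.999)
```

With α = 0.999, each step moves the momentum network only 0.1% of the way toward the backbone, so a single noisy backbone update has little effect on the soft labels it produces.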

In the testing stage, two similarity measures between the query image and the gallery images are commonly used. One is the Euclidean distance:

the other is the cosine distance:

One of the two measures is selected to compare the feature vector of the query with that of each gallery image, and the gallery images are returned sorted from most similar to least similar.
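The retrieval step can be sketched as follows. The function names are hypothetical, and the cosine distance is assumed here to be one minus the cosine similarity, since the formula itself is not reproduced in this text.

```python
import math


def euclidean_distance(q, g):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, g)))


def cosine_distance(q, g):
    """Assumed form: 1 - cosine similarity (0 means same direction)."""
    dot = sum(a * b for a, b in zip(q, g))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_g = math.sqrt(sum(b * b for b in g))
    return 1.0 - dot / (norm_q * norm_g)


def rank_gallery(query, gallery, distance=euclidean_distance):
    """Return gallery indices sorted from most to least similar."""
    return sorted(range(len(gallery)), key=lambda i: distance(query, gallery[i]))


# Toy usage: the second gallery feature is closest to the query.
query_feat = [1.0, 0.0]
gallery_feats = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ranking_l2 = rank_gallery(query_feat, gallery_feats)
ranking_cos = rank_gallery(query_feat, gallery_feats, distance=cosine_distance)
```

Smaller distance means higher similarity, so sorting by ascending distance yields the most similar gallery images first.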

Advantages: the invention provides a cross-domain pedestrian re-identification algorithm based on momentum network guidance. It proposes a momentum learning framework that exploits the ability of neural networks to capture and learn data distributions, which reduces the interference of pseudo label noise during training and improves pedestrian re-identification retrieval performance. It also proposes a new way of learning feature distributions: the traditional L2-distance-based triplet loss and the newly proposed Wasserstein-distance-based triplet loss are trained jointly, so that features are pulled toward positive samples and pushed away from negative samples in terms of both straight-line distance and distribution distance, and a critical rather than blind trust in the pseudo labels is maintained in the unsupervised task. Extensive experiments on several cross-domain pedestrian re-identification tasks show a good performance improvement over the prior art, so the method has reference value.

Drawings

FIG. 1 is a network framework diagram of the present invention.

Fig. 2 is a training framework diagram of a deep learning pedestrian re-recognition task.

Fig. 3 is a deep learning pedestrian re-identification task testing framework diagram.

Fig. 4 is a schematic diagram of the influence of different values of the hyperparameters λ_id and λ_tri on performance.

Fig. 5 is an exemplary diagram of a pedestrian re-identification task data set image in the present invention.

Detailed Description

The invention is further illustrated below with reference to specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention.

A cross-domain pedestrian re-identification method based on momentum network guidance is disclosed; its network framework is shown in Fig. 1. The framework uses the output of the momentum network to guide the training of the backbone network and thereby reduce the interference of pseudo label noise. In the training stage, the same pedestrian image is processed by different data augmentations, such as random cropping and flipping, and the results are used as the inputs of the two networks. To make full use of the label information of the source data set, the invention initializes the two networks of the momentum learning framework with different model parameters obtained by training on the source data set under different settings such as different random seeds. Meanwhile, the K-means clustering algorithm clusters the pedestrian features extracted from the target domain to generate hard classification pseudo labels with a confidence of 1. The clustering algorithm can mine and exploit the latent connections among samples of the same class. However, because of pseudo label noise, the pseudo labels generated by the clustering algorithm are not always correct, and if the network learns a wrong pseudo label with a confidence of 1, the model is seriously disturbed. The invention therefore provides softened labels whose confidence is less than 1, which reduce the interference of pseudo label noise and let the model train smoothly and stably. On this basis, a new "soft" classification loss and a new "soft" triplet loss are proposed. The invention thus also indirectly provides a way of learning feature distributions from different angles. In the testing stage, only the momentum network is used for feature extraction, and no re-ranking method is adopted. The specific steps are as follows:

step 1, initializing the backbone network with a model pre-trained on the ImageNet data set;

step 2, carrying out supervised training on the source domain data set to finely tune the model, obtaining a plurality of models by using different random parameters, and selecting two models with better performance;

step 3, initializing a backbone network and a momentum network in the proposed momentum learning framework by two models selected from the source domain data set, extracting features on the target domain data set by the initial models, and generating a hard pseudo label with a confidence coefficient of 1 by using a K-Means clustering algorithm;

step 4, designing a new softened pseudo label and a loss function by utilizing the output of the momentum network, performing combined training with the traditional loss, and observing the error change of each round;

step 5, updating the hard pseudo labels offline, updating the soft pseudo labels online, updating the momentum network by linear weighting and the backbone network by gradient descent, until the model finally converges; then selecting the better-performing momentum network for testing and returning a ranked result for each pedestrian query image.

The specific settings of the invention are described below, and the experimental results are analyzed to verify the effectiveness of the invention.

The invention adopts a two-stage training mode:

Stage 1, source domain pre-training: supervised pre-training is performed on the source domain. The network model is first initialized with parameters pre-trained on ImageNet, and supervised training then uses the classification loss and the triplet loss in each mini-batch. This stage runs for 80 epochs with an initial learning rate of 0.00035; the learning rate is divided by 10 after the 40th and 70th epochs, and the margin m in the hard triplet loss is set to 0.5.
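The stage 1 learning rate schedule can be written as a small step function. This is a sketch under one assumption not stated in the text: epochs are counted from 0, so the two decays take effect at epoch indices 40 and 70. The function name is illustrative.

```python
def source_learning_rate(epoch, base_lr=0.00035, milestones=(40, 70), gamma=0.1):
    """Learning rate for a 0-indexed epoch of source domain pre-training."""
    # Count how many milestones have already been passed.
    decays = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** decays


# Learning rates over the 80 pre-training epochs.
schedule = [source_learning_rate(epoch) for epoch in range(80)]
```

Under this assumption, epochs 0 to 39 use 0.00035, epochs 40 to 69 use 0.000035, and epochs 70 to 79 use 0.0000035.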

Stage 2, target domain optimization: models with different parameter values, obtained by training with different random seeds, are used to initialize the backbone network and the momentum network of the momentum framework. The inputs of the two networks are differently augmented versions of the same pedestrian image. Training uses the total loss function, weighted by the hyperparameters λ_id and λ_tri; the weight α in the momentum network parameter update formula is set to 0.999, the number of training epochs at this stage is set to 50, and the learning rate is fixed at 0.0003. When the K-means clustering algorithm is used, the number of categories is set to 500, 700 and 900 in turn on the Market-1501 and DukeMTMC-reID data sets, and to 500, 1000, 1500 and 2000 on the MSMT17 data set, in keeping with the premise of the unsupervised domain adaptation task that the number of target domain categories is unknown. These numbers differ from the actual number of pedestrian identities in each data set.

The loss functions used in the target domain training phase are as follows:

hard classification loss function:

threshold-based hard triplet loss function:

soft classification loss function:

soft triplet loss based on feature distribution:

The total loss function can then be expressed as:
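The formula images for the four target domain losses and the total loss did not survive in this text. The ablation settings described later (λ_tri = 0 removes the soft triplet loss, λ_id = 1 removes the hard classification loss, and λ_tri = 1 removes the hard triplet loss) are consistent with a convex combination of each hard loss and its soft counterpart. A plausible form under that assumption, using the loss names that appear in Table 4, is:

```latex
% Assumed reconstruction of the total target domain loss.
% L^t_{id}:   hard classification loss      L^t_{sid}:  soft classification loss
% L^t_{tri}:  hard triplet loss             L^t_{stri}: soft triplet loss
L(\theta) = (1 - \lambda_{id})\, L^{t}_{id}(\theta)
          + \lambda_{id}\, L^{t}_{sid}(\theta)
          + (1 - \lambda_{tri})\, L^{t}_{tri}(\theta)
          + \lambda_{tri}\, L^{t}_{stri}(\theta)
```

This is a reading inferred from the ablation description, not a formula confirmed by the source.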

Three classical mainstream large-scale public pedestrian re-identification data sets were used in the experiments: Market-1501, DukeMTMC-reID and MSMT17. Their specific information is shown in Table 1. In the analysis of the experimental results, model performance is measured by the mean average precision (mAP) and by rank-1, rank-5 and rank-10 of the cumulative matching characteristic (CMC) curve. The unsupervised domain adaptive pedestrian re-identification task needs two data sets, a labeled source domain data set and an unlabeled target domain data set. Therefore, among Market-1501, DukeMTMC-reID and MSMT17, one data set is selected in turn as the source domain data set and another as the target domain data set, and the experiments denote each task in the form "source domain data set-target domain data set", such as Duke-Market, Market-Duke, Duke-MSMT and Market-MSMT. In testing, the evaluation indexes are computed under the single-query setting, in which each query is a single pedestrian image, and no post-processing such as re-ranking is used.
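The two kinds of evaluation index can be sketched as follows. Each query is represented by its ranked gallery list, reduced to booleans that mark whether the gallery image shares the query identity. The function names are hypothetical, and the sketch omits the filtering of same-camera gallery images that standard re-identification protocols apply.

```python
def average_precision(ranked_matches):
    """Average precision for one query from its ranked match flags."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def cmc_hit(ranked_matches, k):
    """1.0 if a correct match appears within the top k results, else 0.0."""
    return 1.0 if any(ranked_matches[:k]) else 0.0


def evaluate(all_ranked_matches, ranks=(1, 5, 10)):
    """Return (mAP, {k: rank-k accuracy}) averaged over all queries."""
    n = len(all_ranked_matches)
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    cmc = {k: sum(cmc_hit(r, k) for r in all_ranked_matches) / n for k in ranks}
    return mean_ap, cmc


# Toy usage with two queries over a three-image gallery.
mean_ap, cmc = evaluate([[True, False, True], [False, True, False]])
```

mAP rewards placing every correct gallery image high in the list, while rank-k only asks whether at least one correct image appears in the top k.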

In addition, each mini-batch in the experiments contains 16 pedestrians with 4 images each, for a total of 64 images. The initial hard pseudo labels are generated by clustering the pedestrian image features that the model pre-trained on the source domain extracts from the target domain, and the hard pseudo labels are regenerated after each epoch ends. The mini-batches of target domain images need to be reassembled after each round of hard pseudo label updates. All images are resized to 256 x 128 before being fed to the network.
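The mini-batch composition described above (16 identities with 4 images each) can be sketched as an identity-balanced sampler over the current pseudo labels. The function name is hypothetical, and sampling with replacement for clusters with fewer than 4 images is an assumption; the text does not say how small clusters are handled.

```python
import random


def sample_pk_batch(labels, num_ids=16, images_per_id=4, seed=None):
    """Return image indices for one batch of num_ids x images_per_id images."""
    rng = random.Random(seed)
    # Group image indices by their (pseudo) identity label.
    by_id = {}
    for index, label in enumerate(labels):
        by_id.setdefault(label, []).append(index)

    batch = []
    for pid in rng.sample(sorted(by_id), num_ids):
        pool = by_id[pid]
        if len(pool) >= images_per_id:
            batch.extend(rng.sample(pool, images_per_id))
        else:
            # Assumed fallback: sample with replacement from small clusters.
            batch.extend(rng.choices(pool, k=images_per_id))
    return batch


# Toy usage: 20 pseudo identities with 5 images each.
toy_pseudo_labels = [i // 5 for i in range(100)]
toy_batch = sample_pk_batch(toy_pseudo_labels, seed=0)
```

Because the grouping is rebuilt from the labels on every call, regenerating the hard pseudo labels after an epoch automatically changes how the next batches are assembled.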

TABLE 1 pedestrian re-identification public mainstream image dataset

The algorithm model provided by the invention is realized based on a deep learning framework-PyTorch, and the training and the testing of the model are completed on a Linux server with 4 GTX-1080TI GPUs.

The algorithms compared with the invention comprise 2 models, LOMO and BoW, which perform unsupervised learning with traditional hand-crafted features, and 4 categories of advanced deep learning based algorithms, specifically:

pseudo label based methods: TJ-AIDL, PCB-R-PAST, SSG, ACT, MMT, AD-Cluster and NRMT;

image generation based methods: HHL, PTGAN, SPGAN, CR-GAN and SDA;

instance classification based methods: ECN, MPLP and LAIM;

domain adaptation based method: UCDA.

The performance on the Duke-Market and Market-Duke tasks is shown in Table 2. Table 3 shows the comparison of the various methods on the Duke-MSMT and Market-MSMT tasks (the best result on each task is shown in bold). The tables show that, under the same settings, the invention is clearly superior to the other methods.

First, the invention is markedly superior to LOMO and BoW, which perform unsupervised pedestrian re-identification directly with hand-crafted features. For example, BoW obtains an mAP of 14.8% and a rank-1 accuracy of 35.8% on Duke-Market; the invention improves on BoW by 59.6 and 54.5 percentage points in mAP and rank-1 accuracy, respectively. BoW and LOMO also perform far worse than the other methods in the table. Methods based on hand-crafted feature extraction therefore differ greatly in performance from deep learning based methods: manually designed features alone cannot fully express the information in the data, which leads to poor performance.

The invention also greatly outperforms the instance classification based, image generation based and domain adaptation based methods. Image generation based methods usually focus only on transforming the style of the images and neglect the latent connections among samples of the same class, so the generated images lose important information. Instance classification based methods usually focus on how to establish relationships between samples, and how to match sample relevance effectively remains an urgent open problem. At a broader level, domain adaptation based methods reduce the difference between domains at the level of feature representation and thus address the lack of labels only implicitly. Instance classification based and pseudo label based methods learn explicitly, either by generating pseudo labels directly on the unlabeled data or by establishing associations between the samples. Overall, therefore, pseudo label based and instance classification based methods deal with the unsupervised pedestrian re-identification problem more directly than the other methods.

The mAP of the invention on the Duke-Market, Market-Duke, Duke-MSMT and Market-MSMT tasks reaches 74.4%, 66.0%, 25.3% and 24.7%, respectively. Compared with the mAP and rank-1 indexes of the best competing algorithm in the tables, the improvements are 2.7%, 0.9%, 2.0% and 1.8%, respectively. The performance of the model varies with the number of clusters, but the invention remains at a high level in the field.

The comparison of experimental results on the three standard pedestrian re-identification data sets supports the design of the invention: the framework has a simple structure and uses the output of the momentum network to guide the training of the backbone network. Meanwhile, data augmentation increases the randomness of the networks, and the designed soft labels and losses optimize the model thoroughly rather than restricting it to blind learning of the pseudo labels. Together with the traditional losses, they improve the network from different angles, which reduces the interference of pseudo label noise, strengthens the discriminability of the network and thereby improves the performance of the model.

TABLE 2 Performance on Duke-Market/Market-Duke tasks (%)

TABLE 3 Performance on Duke-MSMT/Market-MSMT tasks (%)

Table 4 shows ablation experiments performed on the Duke-Market and Market-Duke tasks to verify the effectiveness of the component modules of the invention. ResNet-50 is adopted as the backbone network in these experiments, and the number of categories is set to 700. The names used in the table have the following meanings. Supervised denotes the result obtained by training a deep network model with the true pedestrian identity labels of the data set known. Pre-trained denotes that the model pre-trained on the source domain, with only the classification loss and triplet loss, is applied directly to the unlabeled target domain data set. Baseline denotes that only the hard pseudo label supervision of the framework is kept, that is, the most basic pipeline of a clustering-based pseudo label method: hard pseudo labels are generated by K-means clustering, and the model is trained with the hard classification loss and the original hard triplet loss. Only indicates that training uses only the losses in parentheses; w/o indicates that all parts other than the module in parentheses are used to train the network. The numbers 500, 700, 900, 1000, 1500 and 2000 indicate the number of target domain categories set in K-means clustering.

Comparing the results of supervised learning with those of direct transfer shows that pedestrian re-identification has reached a satisfactory level when labels are available, but that when a model pre-trained on one data set is applied directly to another, its performance drops drastically because of the significant difference in data distribution between the two data sets. For example, a model trained on Market-1501 reaches an mAP of 83.7% on Market-1501 itself, but only 30.0% when applied directly to DukeMTMC-reID. A cross-domain pedestrian re-identification method is therefore needed to improve the generalization ability of the model.

A baseline that uses only hard pseudo label supervision was set up to verify that the clustering-based pseudo label approach suffers from pseudo label noise. The baseline improves greatly over direct transfer, but its mAP is 19.4% and 17.5% lower than that of the invention on the Duke-Market and Market-Duke tasks, respectively, which indicates that the baseline suffers from serious pseudo label noise interference. The invention therefore plays an important role in handling pseudo label noise.

To verify the validity of the proposed soft triplet loss, the hyperparameter λ_tri is set to 0, which gives the method Baseline+MN-700(w/o L^t_stri). As can be seen from the table, without the supervision of the "soft" triplet loss, the model performance degrades significantly. This shows that the "hard" pseudo labels and the conventional "hard" triplet loss cannot cope effectively with pseudo label noise, and indirectly reflects the value of the proposed "soft" triplet loss based on feature distribution.

To verify the design of the momentum network, it is replaced in an experiment by a second network with the same backbone; this method is Baseline+MN-700(E[θ]→θ’). That is, the output of each of the two networks is passed to the other as a form of supervision: two networks net₁ and net₂ are initialized, the output of net₁ supervises net₂, and the output of net₂ supervises net₁. This scheme has two defects. First, both networks update their parameters quickly by gradient back-propagation, so if the noise is large it is amplified quickly by both networks with serious effect, and this unstable form of supervision easily disturbs their learning. Second, because the outputs of the two networks are passed directly to each other, the networks quickly converge to similar states, so the complementarity of their outputs is greatly reduced and the structure becomes redundant. This approach is therefore not feasible. The experimental results confirm this: the mAP and rank-n of the model after replacement drop markedly, and on the Market-Duke task the mAP and rank-1 are 7.9% and 5.7% lower, respectively, than those of the proposed model.

Besides the proposed soft labels, soft losses, and momentum learning framework, which effectively improve performance, the hard pseudo labels are also of great significance for feature learning on the unlabeled target domain. Two experiments are designed to verify the effect of the hard classification loss and of the conventional hard triplet loss, respectively. The method verifying the hard classification loss, Baseline+MN-700(w/o L^t_id), sets the hyper-parameter λ_id = 1; the method verifying the conventional triplet loss, Baseline+MN-700(w/o L^t_tri), sets the hyper-parameter λ_tri = 1. The results in the table show that model performance drops to different degrees, and that removing the hard classification loss even pushes performance below that of the Pre-trained setting, which demonstrates the effectiveness of the hard classification loss. The reason is that the initial network cannot yet distinguish the identities on the target domain correctly, so it typically outputs relatively uniform probabilities over the identities, and these probabilities serve as the soft labels for the soft classification loss. If the network is trained directly with such smooth and noisy soft pseudo labels, the networks in the framework soon collapse because of the excessive deviation. A hard pseudo label with confidence 1 for the classification loss is therefore crucial for learning discriminative feature representations on the target domain.
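One simple way to see why near-uniform soft labels give little guidance at the start of training is to look at the gradient of the cross-entropy loss. For a softmax prediction p and a target distribution q, the gradient with respect to the logits is p − q. The sketch below compares a uniform soft target with a hard one-hot target when the network's own prediction is also uniform; the function names and the choice of three identities are hypothetical and for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, pred):
    """Cross-entropy H(target, pred); target may be soft or one-hot."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

def grad_wrt_logits(target, logits):
    """Gradient of the softmax cross-entropy w.r.t. the logits: p - q."""
    return [p - t for p, t in zip(softmax(logits), target)]

logits = [0.0, 0.0, 0.0]                 # untrained network: uniform prediction
soft_target = softmax([0.0, 0.0, 0.0])   # momentum network is equally unsure
hard_target = [0.0, 1.0, 0.0]            # cluster assignment with confidence 1

g_soft = grad_wrt_logits(soft_target, logits)
g_hard = grad_wrt_logits(hard_target, logits)
```

In this toy case the gradient under the uniform soft target vanishes, whereas the hard label yields a non-zero gradient that pushes probability toward the assigned identity.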

TABLE 4 comparative analysis of ablation experiments Duke-Market/Market-Duke (%)

Method	mAP	rank-1	rank-5	rank-10
Supervised(Duke)	72.0	90.2	96.3	96.9
Supervised(Market)	83.7	93.0	97.3	98.3
Pre-trained	32.0/30.0	62.1/46.7	76.5/61.8	82.4/67.5
Baseline	55.0/48.5	76.0/66.9	88.5/80.1	91.8/84.3
Baseline+MN-700(w/o L^t_id)	30.1/20.4	55.1/30.8	69.8/42.3	74.5/48.0
Baseline+MN-700(w/o L^t_tri)	73.2/64.0	89.3/76.7	94.5/87.8	96.1/92.9
Baseline+MN-700(w/o L^t_sid)	65.2/61.5	87.3/75.7	94.1/87.8	95.1/91.0
Baseline+MN-700(w/o L^t_stri)	68.2/62.5	87.8/75.7	93.9/87.8	95.1/92.5
Baseline+MN-700(E[θ]→θ′)	64.2/58.1	83.8/72.7	92.9/85.8	94.1/89.4
MN-Guided(ours)-700	74.4/66.0	90.3/78.4	95.3/90.2	97.0/94.0
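The gaps quoted in the discussion can be recomputed directly from Table 4. The snippet below is only an arithmetic check; the dictionary layout and the `gap` helper are hypothetical names, and each tuple holds the (Duke-Market, Market-Duke) values copied from the table.

```python
# (Duke-Market, Market-Duke) values copied from Table 4.
table4 = {
    "Baseline":            {"mAP": (55.0, 48.5), "rank-1": (76.0, 66.9)},
    "E[theta]->theta'":    {"mAP": (64.2, 58.1), "rank-1": (83.8, 72.7)},
    "MN-Guided(ours)-700": {"mAP": (74.4, 66.0), "rank-1": (90.3, 78.4)},
}

def gap(method, metric, ours="MN-Guided(ours)-700"):
    """Percentage-point gap between the full model and `method`."""
    return tuple(round(o - m, 1)
                 for o, m in zip(table4[ours][metric], table4[method][metric]))

baseline_map_gap = gap("Baseline", "mAP")          # (19.4, 17.5)
mutual_map_gap = gap("E[theta]->theta'", "mAP")    # (10.2, 7.9)
mutual_r1_gap = gap("E[theta]->theta'", "rank-1")  # (6.5, 5.7)
```

The baseline gap of 19.4 and 17.5 points in mAP and the Market-Duke gap of 7.9 (mAP) and 5.7 (rank-1) for the mutual-supervision variant match the figures given in the text.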

For the weighting hyper-parameters λ_id and λ_tri, experiments were carried out on the Duke-Market task with the number of clusters set to 700. The results are shown in FIG. 4. Performance is best when λ_id and λ_tri are 0.5 and 0.8, respectively. All experiments use these two values.
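How the two weights might enter the training objective can be sketched as below. The convex-combination form is an assumption inferred from the ablation settings in this section (setting λ_tri to 0 removes the soft triplet loss, and setting λ_id to 1 removes the hard classification loss); the function `total_loss` and the loss values passed to it are hypothetical placeholders.

```python
def total_loss(l_id, l_sid, l_tri, l_stri, lam_id=0.5, lam_tri=0.8):
    """Weighted sum of hard and soft losses (assumed form).

    l_id / l_tri   : hard classification / hard triplet loss
    l_sid / l_stri : soft classification / soft triplet loss
    """
    return ((1.0 - lam_id) * l_id + lam_id * l_sid
            + (1.0 - lam_tri) * l_tri + lam_tri * l_stri)

# Placeholder loss values, for illustration only.
losses = dict(l_id=2.0, l_sid=1.0, l_tri=0.5, l_stri=0.25)

full = total_loss(**losses)                       # reported best weights
no_soft_tri = total_loss(**losses, lam_tri=0.0)   # "w/o L_stri" ablation
no_hard_id = total_loss(**losses, lam_id=1.0)     # "w/o L_id" ablation
```

Under this assumed form, each weight trades a hard loss against its soft counterpart, so the ablation rows of Table 4 correspond to the end points of the two weights.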

This embodiment presents the core ideas, steps, and parameters of the cross-domain pedestrian re-identification method based on momentum network guidance. It is only a preferred embodiment of the invention; in practice, the parameter settings should be adjusted to the specific variables and data in order to achieve better results.

Claims

1. A cross-domain pedestrian re-identification algorithm based on momentum network guidance is characterized by comprising the following steps:

step S1, initializing a backbone network by using a model pre-trained on the ImageNet data set;

step S2, pre-fine-tuning the model with the labeled data of the source-domain data set so as to make full use of the label information of the source domain;

step S3, initializing the proposed momentum learning framework with models trained on the source-domain data set under different random parameters, and clustering the features extracted by the model with a clustering algorithm to generate hard pseudo labels with a confidence of 1;

step S4, designing new softened pseudo labels and loss functions, which are combined with the conventional losses to train and optimize the model;

step S5, updating the hard pseudo labels before each iteration, dynamically updating the soft pseudo labels in real time, iterating continuously to generate and refine the pseudo labels until the model converges, and using the feature-encoding part of the momentum network for testing.

2. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S1 further comprises:

step S101, using a ResNet-50 network as the backbone network, removing the last fully connected layer, and adding two additional FC layers named FC-1000 and FC-M_S;

step S102, initializing the corresponding parameters with a model pre-trained on the large-scale data set ImageNet.

3. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S2 further comprises:

step S201, resizing the pedestrian images to a uniform size of 256×128 and setting the network training parameters;

step S202, setting different random seeds and performing supervised training multiple times to obtain multiple pre-trained models, using a cross-entropy classification loss and a margin-based triplet loss during training.

4. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S3 further comprises:

step S301, initializing the backbone network and the momentum network in the momentum framework with the two better-performing models trained on the source-domain data set, and dividing the target domain into a training set and a test set;

step S302, specifying the number of categories M_t for the target domain, and having the clustering algorithm perform unsupervised clustering on the features that the model extracts from the target-domain images, so as to generate hard pseudo labels with a confidence of 1.

5. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S4 further comprises:

step S401, providing the backbone network with soft classification pseudo labels whose confidence is less than 1, using the classification predictions output by the momentum network, and designing triplet labels based on the Wasserstein distance between distributions, using the feature distributions output by the momentum network;

step S402, computing the soft losses, which include a soft cross-entropy loss based on the classification predictions and a soft triplet loss based on the feature distributions, and training the network jointly with the hard losses;

step S403, updating the backbone network by gradient back-propagation, and updating the momentum network as a weighted combination of its historical parameters and the real-time parameters of the backbone network.

6. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S5 further comprises:

step S501, after each epoch, reassigning the hard pseudo labels generated by the clustering algorithm according to the new features, while the soft pseudo labels provided by the momentum network are updated online continuously as the network is optimized;

step S502, after the hard pseudo labels are updated, re-forming the positive and negative examples in the corresponding triplet samples;

step S503, returning to step S302 and repeating the process, iterating continuously to generate and refine the pseudo labels until the model converges;

step S504, in the test stage, extracting the features of the query images and the gallery images in the target-domain test set with the feature-encoding part of the momentum network, measuring the similarity between each query image and the gallery images, returning the results ranked by similarity from high to low, and computing the evaluation metrics, thereby completing the pedestrian re-identification retrieval task.
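The test-stage retrieval described in the last step can be sketched as below. This is a simplified illustration under stated assumptions: cosine similarity is assumed as the similarity measure (the text does not name one), the features are toy 2-D vectors rather than outputs of the momentum network, and the helper names `cosine`, `rank_gallery`, and `average_precision` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank_gallery(query_feat, gallery_feats):
    """Gallery indices sorted by similarity to the query, highest first."""
    sims = [cosine(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: -sims[i])

def average_precision(ranking, gallery_ids, query_id):
    """AP for one query: mean precision at each correct match."""
    hits, precisions = 0, []
    for pos, idx in enumerate(ranking, start=1):
        if gallery_ids[idx] == query_id:
            hits += 1
            precisions.append(hits / pos)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy data: 2-D features and identity labels (illustrative only).
gallery = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.5, 0.5)]
gallery_ids = [7, 3, 3, 7]
query, query_id = (1.0, 0.05), 7

ranking = rank_gallery(query, gallery)
ap = average_precision(ranking, gallery_ids, query_id)
rank1_hit = gallery_ids[ranking[0]] == query_id
```

Averaging `ap` over all queries gives mAP, and the fraction of queries with a correct match among the top n ranked gallery images gives rank-n, the metrics reported in Table 4.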