CN113326731A - Cross-domain pedestrian re-identification algorithm based on momentum network guidance - Google Patents

Info

Publication number
CN113326731A
Authority
CN
China
Prior art keywords
network
momentum
model
domain
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110436422.3A
Other languages
Chinese (zh)
Other versions
CN113326731B (en
Inventor
何爱清
高阳
李文斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd
Nanjing University
Original Assignee
Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd, Nanjing University filed Critical Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd
Priority to CN202110436422.3A priority Critical patent/CN113326731B/en
Publication of CN113326731A publication Critical patent/CN113326731A/en
Application granted granted Critical
Publication of CN113326731B publication Critical patent/CN113326731B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a cross-domain pedestrian re-identification algorithm based on momentum network guidance, aimed at the pseudo-label noise caused by domain shift in cross-domain pedestrian re-identification tasks. The method comprises the following steps: step S1, initializing a backbone network with a model pre-trained on the ImageNet data set; step S2, pre-fine-tuning the model with the labeled data of the source domain data set so as to make full use of the source-domain label information; step S3, initializing the proposed momentum learning framework with models trained on the source domain data set under different random parameters, and clustering the features extracted by the model with a clustering algorithm to generate hard pseudo labels with a confidence of 1; step S4, designing new softened pseudo labels and loss functions, and combining them with the traditional losses to train and optimize the model; and step S5, updating the hard pseudo labels before each iteration, dynamically updating the soft pseudo labels in real time, and iteratively generating and refining the pseudo labels until the model converges.

Description

Cross-domain pedestrian re-identification algorithm based on momentum network guidance
Technical Field
The invention relates to the field of computer vision, in particular to a cross-domain pedestrian re-identification algorithm based on momentum network guidance.
Background
The task of pedestrian re-identification is, given a target image, to find the image or images closest to it in a pedestrian database. The release of many large-scale pedestrian data sets in recent years has spurred research on pedestrian re-identification, and the quality and scale of the data sets are key to improving its performance. However, labeling a data set consumes considerable labor and resources, and environmental factors also hinder the collection of useful data to some extent; in practical applications, the robustness of the model directly determines its practicality. Research shows that when a pedestrian re-identification model trained on a large-scale data set is deployed directly in a new monitoring system, its performance drops precipitously because of the difference in data distribution between the domains.
Researchers have made many efforts to solve this problem. Some early work on unsupervised pedestrian re-identification used manually designed features, but these features were generally not very discriminative and did not work well. Methods based on image generation usually set constraints to perform style transfer from source domain images to target domain images: the source domain images are converted into images in the style of the target domain data while the label information of the source domain data is preserved; the newly generated stylized data set, alone or mixed with the target domain data into a larger data set, is used for learning or training; and the model is then migrated to the target data set. Methods based on instance classification generally first treat each pedestrian image as a separate category, start from the similarity between image features, and design an algorithm to find the similarity neighborhood of each image, thereby realizing image retrieval on the target domain; the key to such methods is how to match sample relevance effectively. Methods based on strict domain adaptation seek to eliminate or reduce the difference between the source domain and the target domain so as to transfer discriminative information between the domains; some use adversarial learning across the two domains to seek common ground while preserving differences, and others assume that a set of shared mid-level semantic attributes exists between the source domain and the target domain to improve the accuracy of pedestrian re-identification.
Among these attempts, the most widely applied and best-performing approach is based on pseudo-label generation. The clustering-based pseudo-label method proceeds as follows: first, a clustering algorithm clusters the pedestrians of the target domain data set and assigns pseudo labels; second, the pseudo labels are treated as supervision, and training similar to supervised training is performed on the target domain. However, this method is often disturbed by pseudo-label noise during training. The pseudo-label noise mainly arises because the number of pedestrian classes in the target domain is unknown, the clustering algorithm has limitations, and domain shift limits the performance on the target domain of the network pre-trained on the source domain. If the initial pseudo labels have low reliability and high noise, the model is likely to collapse and deviate from the correct training trajectory, and existing methods do not effectively solve this problem.
The invention provides a cross-domain pedestrian re-identification method based on momentum network guidance based on a clustering algorithm. The method has a simple structure, and the output of the momentum network is used for guiding the training of the backbone network; meanwhile, the randomness of the network is enhanced by using a data enhancement mode, the designed soft label and loss can fully optimize the model without being limited to blind learning of the pseudo label, and the network is optimized from different angles together with the traditional loss, so that the interference of pseudo label noise is reduced, and the robustness of the model is enhanced.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides a cross-domain pedestrian re-identification method based on momentum network guidance. It addresses the pseudo-label noise caused in unsupervised pedestrian re-identification tasks by domain shift, the unknown number of pedestrian categories in the target domain, the limitations of the algorithm and the like, so as to reduce pseudo-label noise in cross-domain scenarios and improve the retrieval performance of unsupervised domain adaptive pedestrian re-identification models.
The technical scheme is as follows: a cross-domain pedestrian re-identification method based on momentum network guidance that focuses on three aspects. The first is how to use the label information of the source domain data set to provide better initial pseudo labels for subsequent training and keep the optimization from deviating from the correct trajectory. The second is to soften the generated pseudo labels: because the number of pedestrian categories and the pedestrian IDs of the target domain are unknown during unsupervised training, training relies only on the generated pseudo labels, which may be noisy. The third is how to keep the time and space complexity of the model under control. The technical scheme of the invention comprises the following steps:
step 1, initializing a backbone network of the invention by using a pre-trained model on a large-scale image data set ImageNet so as to transfer and utilize some prior image discriminant characteristics;
step 2, fine-tuning and optimizing the initialized network on the source domain data set;
step 3, dividing the target domain data set into a training set and a testing set, dividing a momentum learning framework into a backbone network and a momentum network, then initializing the two networks by respectively using models obtained by setting different random parameters on the source domain data set for training so as to realize the utilization of the label information of the source domain data set, classifying the pedestrian images of the target domain by a clustering algorithm and giving each pedestrian image a pseudo label;
step 4, softening the pseudo label, and cooperatively supervising the network by the new softened pseudo label and the original hard pseudo label, thereby enabling the network to train and learn more smoothly and stably;
step 5, updating the hard pseudo label off line, updating the soft pseudo label on line, and iterating to generate and optimize the pseudo label; and after the training is finished, using the part of the momentum network in the momentum frame for a subsequent testing process.
Steps 1 to 2 belong to a training part of a source domain data set, and steps 3 to 5 belong to an optimization part of a target domain data set. Wherein step 5 further comprises the operation of a test phase. The network architecture diagram of the present invention is shown in fig. 1. The deep learning pedestrian re-identification training stage framework is shown in fig. 2, and the testing stage framework is shown in fig. 3.
In step 1, the ResNet-50 network layers are modified to obtain the network structure of the invention: the final layer is removed and two additional FC layers are added, namely FC-1000 and FC-$M_s$. The first fully connected layer extracts features and is used for the triplet loss; the second fully connected layer generates the probability prediction for each class and is used for the classification loss.
In step 2, the source domain data set and the target domain data set are disjoint, and are usually collected under different time points, places and cameras, and there is usually a great data distribution difference between the two data sets, where the source domain data set is expressed as:
$D_s = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$
where $x$ represents a pedestrian image and $y$ represents the label of the corresponding pedestrian. $N_s$ denotes the size of the data set.
Supervised training is carried out on a source domain data set, cross entropy classification loss and traditional threshold-based triplet loss are used, and the calculation forms are respectively expressed as follows:
Figure BDA0003033280990000032
Figure BDA0003033280990000033
where F denotes the feature encoder, C denotes the classifier, θ denotes the model parameters, m denotes a distance threshold (margin), usually taken to be 0.5, and the subscripts p and n denote the positive and negative samples of the original sample, respectively. In the source domain pre-training stage, given the labeled source domain data set, the network is trained jointly with the two loss functions to obtain an initialization model for target domain training.
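The two source-domain losses can be illustrated for a single sample with the sketch below. It uses the standard textbook forms of cross-entropy and margin-based triplet loss, which match the symbol definitions above; the patent's own equation images are not reproduced in this text, so the exact formulation is an assumption, and the function names are hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one sample: -log softmax(logits)[label]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]

def euclidean(a, b):
    """L2 distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def margin_triplet_loss(anchor, positive, negative, m=0.5):
    """Hinge triplet loss: max(0, d(a, p) + m - d(a, n)); m = 0.5 as in the text."""
    return max(0.0, euclidean(anchor, positive) + m - euclidean(anchor, negative))
```

The triplet term is zero once the negative sample is at least m farther from the anchor than the positive sample, so only violating triplets contribute to the loss.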
In step 3, the target domain data set without label information may be represented as:
$D_t = \{x_i^t\}_{i=1}^{N_t}$
Two of the better-performing models on the source domain are selected to initialize the two networks of the momentum framework. The inputs of the two networks are obtained by applying different data augmentations to the same pedestrian image, which increases the randomness of the models. The clustering algorithm is the simple and effective K-means; different cluster numbers K are set for the different tasks, and the chosen values usually include cases both larger and smaller than the real number of categories in the target domain.
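As a minimal sketch of how K-means can turn target-domain features into hard pseudo labels with a confidence of 1, consider the pure-Python example below. The function names, the random initialization, and the fixed iteration count are illustrative assumptions; a practical system would more likely rely on an optimized clustering library.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pseudo_labels(features, k, iters=20, seed=0):
    """Cluster features with plain K-means and return one cluster index per sample."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, k)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: nearest center for every sample.
        labels = [min(range(k), key=lambda c: sq_dist(f, centers[c]))
                  for f in features]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def one_hot(label, k):
    """Hard pseudo label: probability 1 on the assigned cluster, 0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in range(k)]
```

For example, `one_hot(kmeans_pseudo_labels(features, k)[i], k)` gives the hard pseudo label of sample `i`; every such label puts all of its confidence on a single cluster, which is why a wrong cluster assignment is so harmful.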
In step 4, the features output by the first FC layer of the momentum network and the classification predictions output by the second FC layer each provide soft labels for the backbone network. The soft classification label is a probability vector that can be expressed, for example, as $C_s = [0.10, 0.32, 0.21, 0.05, 0.13, 0.11]$; the value of each dimension represents the probability that the input image belongs to that class, and there are $M_t$ dimensions in total.
For the soft triplet label, three samples are involved, namely an original sample, a positive sample and a negative sample. The positive and negative samples are defined by the pseudo labels generated by the clustering algorithm: samples of the same class are positive, and samples of different classes are negative. The invention designs a new soft triplet label that uses the Wasserstein distance to measure the difference in feature distribution between samples, which can be expressed as follows:
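A soft classification label can supervise the backbone network through a cross-entropy computed against the full probability vector rather than a single class index. The sketch below shows that general idea; it is an assumption about the form of the soft classification loss, since the patent's equation image is not reproduced here, and the function names are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_sum for z in logits]

def soft_cross_entropy(student_logits, teacher_probs):
    """-sum_k teacher_probs[k] * log softmax(student_logits)[k]."""
    return -sum(t * lp for t, lp in zip(teacher_probs, log_softmax(student_logits)))
```

With a one-hot `teacher_probs` this reduces to the ordinary hard cross-entropy, so the soft loss can be seen as a generalization in which the momentum network spreads its confidence over several classes.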
Figure BDA0003033280990000041
where Sim represents the distribution similarity measure and θ represents the parameters of the average network. The soft triplet label takes values between 0 and 1; using the output of the average network as the soft label makes it smoother and more stable.
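The sketch below illustrates one way a soft triplet label in the range 0 to 1 could be built from a Wasserstein distance. Two assumptions are made that the text does not confirm: each feature vector is treated as an empirical one-dimensional distribution of its component values, and the two distances are combined with a two-way softmax. The function names are hypothetical.

```python
import math

def wasserstein_1d(u, v):
    """1-Wasserstein distance between two equal-size empirical 1-D samples.

    For equal-size samples with uniform weights this equals the mean
    absolute difference of the sorted values.
    """
    if len(u) != len(v):
        raise ValueError("samples must have the same length")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

def soft_triplet_label(anchor, positive, negative):
    """Softmax over the two distances, written in a sigmoid form.

    Equals exp(d_an) / (exp(d_ap) + exp(d_an)); values near 1 mean the
    negative is much farther from the anchor than the positive.
    """
    d_ap = wasserstein_1d(anchor, positive)
    d_an = wasserstein_1d(anchor, negative)
    return 1.0 / (1.0 + math.exp(d_ap - d_an))
```

Under these assumptions the label is computed from the momentum network's features and then used as a target for the same quantity computed from the backbone network's features.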
In step 5, the momentum network accumulates history: unlike the backbone network, it is not updated by back-propagating gradients of the loss. Instead, with a suitably chosen hyperparameter, a weighted average balances the momentum network's own past state against the just-updated backbone network, defined as:
$E^{(T)}[\theta] = \alpha E^{(T-1)}[\theta] + (1-\alpha)\theta$
where $T$ denotes the iteration and $\alpha$ denotes the weighting hyperparameter, typically taken to be 0.999.
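The momentum update is an exponential moving average of the backbone parameters. A minimal sketch over flat lists of parameters follows; representing the parameters as plain lists and the function name are assumptions made for illustration.

```python
def momentum_update(momentum_params, backbone_params, alpha=0.999):
    """Return alpha * E[theta] + (1 - alpha) * theta for every parameter."""
    return [alpha * m + (1.0 - alpha) * b
            for m, b in zip(momentum_params, backbone_params)]
```

With alpha = 0.999 each backbone update moves the momentum network by only 0.1% of the gap between the two networks, which is what makes its outputs change slowly and smoothly.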
In the testing stage, two measures are generally used for the similarity between the query image and the gallery images. One is the Euclidean distance:
Figure BDA0003033280990000042
the other is the cosine distance:
Figure BDA0003033280990000043
One of these measures is selected to compute the similarity between the two feature vectors, and the results are returned sorted from most similar to least similar.
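Ranking a gallery against one query can be sketched as follows. The example uses the standard definitions of Euclidean distance and cosine similarity; the patent's own formulas are images that are not reproduced here, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """L2 distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_gallery(query, gallery, metric="euclidean"):
    """Return gallery indices ordered from most similar to least similar."""
    if metric == "euclidean":
        scores = [-euclidean(query, g) for g in gallery]  # negate: higher is better
    else:
        scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

The Euclidean distance is negated so that both measures can share the same descending sort.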
Advantageous effects: the invention provides a cross-domain pedestrian re-identification algorithm based on momentum network guidance. It proposes a momentum learning framework that exploits the ability of neural networks to capture and learn data distributions, which reduces the interference of pseudo-label noise during training and improves re-identification retrieval performance. It also proposes a new way of learning the feature distribution: the traditional $L_2$-distance-based triplet loss and the newly proposed Wasserstein-distance-based triplet loss are trained jointly, so that, in terms of both straight-line distance and distribution distance, features are pulled toward positive samples and pushed away from negative samples, and a critical trust in the pseudo labels can be maintained in the unsupervised task. Extensive experiments on several cross-domain pedestrian re-identification tasks show a good performance improvement over the prior art, so the method has reference value.
Drawings
FIG. 1 is a network framework diagram of the present invention.
Fig. 2 is a training framework diagram of a deep learning pedestrian re-recognition task.
Fig. 3 is a deep learning pedestrian re-identification task testing framework diagram.
FIG. 4 is a schematic diagram verifying the influence of different values of the hyperparameters $\lambda_{id}$ and $\lambda_{tri}$ on performance.
Fig. 5 is an exemplary diagram of a pedestrian re-identification task data set image in the present invention.
Detailed Description
The invention is further illustrated below with reference to specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention.
A cross-domain pedestrian re-identification method based on momentum network guidance is disclosed; its network framework is shown in FIG. 1. The framework uses the output of the momentum network to guide the training of the backbone network and thereby reduce pseudo-label noise. In the training stage, the same pedestrian image is processed with different data augmentations, such as random cropping and flipping, and the results are used as the inputs of the two networks. To make full use of the label information of the source data set, the invention initializes the two networks of the momentum learning framework with different model parameters obtained by training on the source data set under different settings such as random seeds. Meanwhile, the K-means clustering algorithm clusters the extracted pedestrian features of the target domain to generate hard classification pseudo labels with a confidence of 1. The clustering algorithm can mine and exploit the potential connections among samples of the same class. However, because of pseudo-label noise, the pseudo labels generated by the clustering algorithm are not always correct, and if the network learns a wrong pseudo label with a confidence of 1, the model is seriously disturbed. The invention therefore proposes softened labels whose confidence is less than 1, which reduce the interference of pseudo-label noise and let the model train smoothly and stably. On this basis, new "soft" classification and "soft" triplet losses are proposed. The invention also indirectly provides a way of learning the feature distribution from different angles. In the testing stage, only the momentum network is used for feature extraction, and no re-ranking method is adopted. The specific steps are as follows:
step 1, initializing the backbone network with a model pre-trained on the ImageNet data set;
step 2, fine-tuning the model by supervised training on the source domain data set, obtaining several models with different random parameters, and selecting two of the better-performing models;
step 3, initializing the backbone network and the momentum network of the proposed momentum learning framework with the two models selected on the source domain data set, extracting features on the target domain data set with the initial models, and generating hard pseudo labels with a confidence of 1 using the K-means clustering algorithm;
step 4, designing new softened pseudo labels and loss functions from the output of the momentum network, training jointly with the traditional losses, and observing the change in error in each round;
step 5, updating the hard pseudo labels offline and the soft pseudo labels online, updating the momentum network by linear weighting and the backbone network by gradients, until the model converges; then selecting the better-performing momentum network for testing, and returning a ranked result for each query pedestrian image.
The content and specific settings of the invention are presented below and analyzed together with the experimental results to verify the effectiveness of the invention.
The invention adopts a two-stage training mode:
Stage 1, source domain pre-training: supervised pre-training is performed on the source domain. The network model is first initialized with parameters pre-trained on ImageNet, and supervised training is then performed with the classification loss and the triplet loss in each mini-batch. This stage runs for 80 training epochs with an initial learning rate of 0.00035; the learning rate decays by a factor of 10 after the 40th and 70th epochs, and the margin m in the hard triplet loss is set to 0.5.
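The stage-1 learning-rate schedule can be written as a small step function. In this sketch the epoch index is taken to be the number of completed epochs (0-indexed), which is an assumption about how "after the 40th and 70th" epochs is counted; the function name is hypothetical.

```python
def source_lr(epoch, base_lr=0.00035, milestones=(40, 70), gamma=0.1):
    """Learning rate for a 0-indexed epoch: multiply by gamma at each passed milestone."""
    passed = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** passed
```

Over the 80 epochs this gives 0.00035 for epochs 0 to 39, roughly 0.000035 for epochs 40 to 69, and roughly 0.0000035 for the remaining epochs.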
Stage 2, target domain optimization: models with different parameter values, obtained by training with different random seeds, are used to initialize the backbone network and the momentum network of the momentum framework. The inputs of the two networks are differently augmented versions of the same pedestrian image. Training uses the total loss function with the hyperparameters $\lambda_{id}$ and $\lambda_{tri}$; the weight $\alpha$ in the momentum network parameter update formula is set to 0.999, the number of training epochs at this stage is set to 50, and the learning rate is fixed at 0.0003. When the K-means clustering algorithm is used, in order to respect the setting of the unsupervised domain adaptation task that the number of target-domain categories is unknown, the number of clusters is set to 500, 700 and 900 on the Market-1501 and DukeMTMC-reID data sets, and to 500, 1000, 1500 and 2000 on the MSMT17 data set. These numbers differ from the actual number of pedestrian categories in each data set.
The loss functions used in the target domain training phase are as follows:
hard classification loss function:
Figure BDA0003033280990000061
threshold-based hard triplet loss function:
Figure BDA0003033280990000062
soft classification loss function:
Figure BDA0003033280990000071
soft triplet loss based on feature distribution:
Figure BDA0003033280990000072
to this end, the total loss function can be expressed as:
Figure BDA0003033280990000073
Three classical, mainstream, large-scale public pedestrian re-identification data sets were used in the experiments: Market-1501, DukeMTMC-reID and MSMT17. Their details are shown in Table 1. In the analysis of the experimental results, model performance is measured by the mean average precision (mAP) and by rank-1, rank-5 and rank-10 of the cumulative matching characteristic (CMC) curve. The unsupervised domain adaptive pedestrian re-identification task requires two data sets, a labeled source domain data set and an unlabeled target domain data set. Therefore, from Market-1501, DukeMTMC-reID and MSMT17, one data set is selected in turn as the source domain and another as the target domain, and the experiments denote each task in the form source domain-target domain, for example Duke-Market, Market-Duke, Duke-MSMT and Market-MSMT. During testing, the evaluation indexes are computed under the single-query setting (retrieval with a single pedestrian image), and no post-processing such as re-ranking is used.
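The two evaluation measures can be sketched from per-query ranked match lists, where each list marks whether the gallery image at each rank shares the query's identity. This is a simplified illustration: standard re-identification protocols also filter out gallery images taken by the same camera as the query, which is omitted here, and the function names are hypothetical.

```python
def average_precision(ranked_matches):
    """AP of one query: mean of precision@rank over the ranks of true matches."""
    hits, total = 0, 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_ap(all_ranked_matches):
    """mAP: average of the per-query AP values."""
    return sum(average_precision(r) for r in all_ranked_matches) / len(all_ranked_matches)

def cmc_at_k(all_ranked_matches, k):
    """rank-k accuracy: fraction of queries with a true match in the top k."""
    return sum(any(r[:k]) for r in all_ranked_matches) / len(all_ranked_matches)
```

rank-1 therefore only asks whether the top result is correct, while mAP rewards placing all true matches of a query near the top of the list.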
In addition, each mini-batch in the experiments contains 16 pedestrians with 4 images each, for a total of 64 images. The hard pseudo labels are regenerated after each epoch; the initial hard pseudo labels are generated by clustering the pedestrian image features that the source-domain pre-trained model extracts on the target domain. The mini-batches of target domain images are reassembled after each round of hard pseudo-label updates. All images are resized to 256 x 128 before being fed to the network.
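Assembling a mini-batch of 16 identities with 4 images each from the current pseudo labels can be sketched as below. The sampler name, the seeding, and the choice to sample with replacement when a cluster has fewer than 4 images are assumptions made for illustration.

```python
import random
from collections import defaultdict

def sample_pk_batch(pseudo_labels, num_ids=16, num_instances=4, seed=0):
    """Return image indices for one batch: num_ids clusters x num_instances images."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        by_label[label].append(idx)
    batch = []
    for label in rng.sample(sorted(by_label), num_ids):
        pool = by_label[label]
        if len(pool) >= num_instances:
            batch.extend(rng.sample(pool, num_instances))
        else:
            # Small cluster: sample with replacement to keep the batch size fixed.
            batch.extend(rng.choices(pool, k=num_instances))
    return batch
```

Because the grouping depends on the pseudo labels, the batches have to be rebuilt whenever the hard pseudo labels are regenerated, which matches the reassembly step described above.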
TABLE 1 pedestrian re-identification public mainstream image dataset
Figure BDA0003033280990000074
Figure BDA0003033280990000081
The algorithm model provided by the invention is implemented with the PyTorch deep learning framework, and training and testing are carried out on a Linux server with 4 GTX-1080TI GPUs.
The algorithms compared with the invention comprise two models, LOMO and BoW, that perform unsupervised learning with traditional manually extracted features, and four categories of advanced deep-learning-based algorithms, specifically:
pseudo-label-based methods: TJ-AIDL, PCB-R-PAST, SSG, ACT, MMT, AD-Cluster, and NRMT;
image-generation-based methods: HHL, PTGAN, SPGAN, CR-GAN, and SDA;
instance-classification-based methods: ECN, MPLP, and LAIM;
domain-adaptation-based methods: UCDA.
The performance on the Duke-Market and Market-Duke tasks is shown in Table 2. Table 3 shows the comparison of the various methods on the Duke-MSMT and Market-MSMT tasks (the best result on each task is shown in bold). The tables show that, under the same settings, the method is clearly superior to the other methods.
First, the invention is markedly superior to LOMO and BoW, which perform unsupervised pedestrian re-identification directly with manual features. For example, the BoW algorithm obtains an mAP of 14.8% and a rank-1 accuracy of 35.8% on Duke-Market; relative to BoW, the invention improves mAP and rank-1 accuracy by 59.6 and 54.5 percentage points respectively, and BoW and LOMO perform far worse than the other methods in the table. Methods based on manual feature extraction therefore lag far behind deep-learning-based methods: manually designed and extracted features alone cannot fully express the information in the data, so these methods perform poorly.
The invention also greatly surpasses the methods based on instance classification, image generation and domain adaptation. Methods based on image generation usually focus only on style transfer of the images and neglect the potential connections among samples of the same class, so the generated images lose important information. Methods based on instance classification usually focus on how to establish relationships between samples, and how to match sample relevance effectively remains an open problem. At a macroscopic level, methods based on domain adaptation reduce the difference between domains at the feature representation level and thus address the lack of labels only implicitly. Methods based on instance classification or pseudo labels learn explicitly, by generating pseudo labels directly on unlabeled data or by establishing associations between the samples. As a whole, therefore, pseudo-label and instance-classification methods deal with the unsupervised pedestrian re-identification problem more directly than the other methods.
The mAP of the invention on the Duke-Market, Market-Duke, Duke-MSMT and Market-MSMT tasks reaches 74.4%, 66.0%, 25.3% and 24.7% respectively; compared with the mAP and rank-1 indexes of the best algorithm in the table, the improvements are 2.7%, 0.9%, 2.0% and 1.8% respectively. The model performs differently under different cluster numbers, but the invention remains at a high level in the field.
The comparison of the experimental results on the three pedestrian re-identification standard data sets verifies that the framework structure of the invention is simple, and the output of the momentum network is utilized to guide the training of the backbone network. Meanwhile, the randomness of the network is enhanced by using a data enhancement mode, the designed soft label and loss can fully optimize the model without being limited to blind learning of the pseudo label, and the network is promoted from different angles together with the traditional loss, so that the interference of the noise of the pseudo label is reduced, and the network discriminability is enhanced, thereby improving the performance of the model.
TABLE 2 Performance on Duke-Market/Market-Duke tasks (%)
Figure BDA0003033280990000091
Figure BDA0003033280990000101
TABLE 3 Performance (%)
Figure BDA0003033280990000102
Figure BDA0003033280990000111
Table 4 shows ablation experiments performed on the Duke-Market and Market-Duke tasks to verify the effectiveness of each component of the invention. ResNet-50 is used as the backbone network and the number of clusters is set to 700. The names used in the table are as follows. Supervised denotes the result obtained by training a deep network model with the ground-truth pedestrian identity labels of the data set. Pre-trained denotes directly applying a model pre-trained on the source domain, with only the classification loss and the triplet loss, to the unlabeled target-domain data set. Baseline keeps only the hard pseudo-label supervision part of the framework, that is, the most basic processing flow of a clustering-based pseudo-label method: hard pseudo-labels are generated by K-means clustering and the model is trained with the hard classification loss and the original hard triplet loss. "only" indicates that only the loss in parentheses is used for training; "w/o" indicates that the network is trained with everything except the module in parentheses. The numbers 500, 700, 900, 1000, 1500 and 2000 indicate the number of target-domain classes set in K-means clustering.
Comparing supervised learning with direct transfer shows that pedestrian re-identification already reaches a satisfactory level when labels are available, but when a model pre-trained on one data set is applied directly to another, its performance drops sharply because of the significant difference in data distribution between the two data sets. For example, when the model trained on the Market-1501 data set is applied directly to the DukeMTMC-reID data set, mAP falls from 83.7% to 30.0%. Cross-domain pedestrian re-identification methods are therefore needed to improve the generalization ability of the model.
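The mAP and rank-k figures quoted in this comparison can be computed from a query-to-gallery distance matrix as sketched below. This is a minimal illustration, not the evaluation code used for the experiments: the function name and argument layout are assumptions, and the standard re-identification protocol additionally discards gallery images taken by the same camera as the query, which is omitted here for brevity.

```python
import numpy as np

def rank_k_and_map(dist, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Compute rank-k accuracy and mAP.

    dist[i, j] is the distance between query i and gallery image j
    (smaller means more similar). Hypothetical helper for illustration.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    hits = np.zeros(len(ks))
    average_precisions = []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                  # most similar first
        matches = gallery_ids[order] == query_ids[i]
        if not matches.any():
            continue                                 # query has no true match
        first_hit = int(np.argmax(matches))          # 0-based rank of first match
        hits += np.array([first_hit < k for k in ks], dtype=float)
        hit_ranks = np.flatnonzero(matches) + 1      # 1-based ranks of all matches
        precisions = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        average_precisions.append(precisions.mean())
    if not average_precisions:
        raise ValueError("no query has a matching gallery image")
    n = len(average_precisions)
    cmc = {f"rank-{k}": hits[j] / n for j, k in enumerate(ks)}
    return cmc, float(np.mean(average_precisions))
```

Rank-1 only asks whether the top result is correct, whereas mAP rewards placing every correct gallery image near the top, which is why the two metrics can move by different amounts in the tables.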
The baseline, which uses only hard pseudo-label supervision, was set up to verify that clustering-based pseudo-label methods suffer from pseudo-label noise. The baseline improves greatly on direct transfer, but its mAP is still 19.4% and 17.5% lower than that of the invention on the Duke-Market and Market-Duke tasks, respectively, which indicates serious pseudo-label noise interference in the baseline. The invention thus plays an important role in handling pseudo-label noise.
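The hard pseudo-label step of the baseline can be sketched with a plain K-means over target-domain features. This is a minimal NumPy illustration rather than the implementation used in the experiments: the function name, the L2 normalization, the iteration count and the random initialization are all assumptions.

```python
import numpy as np

def kmeans_pseudo_labels(features, num_clusters, num_iters=20, seed=0):
    """Cluster features and return hard pseudo-labels (hypothetical helper).

    Returns the cluster index of every sample and the matching one-hot
    labels, i.e. pseudo-labels with a confidence of 1.
    """
    rng = np.random.default_rng(seed)
    feats = np.asarray(features, dtype=float)
    # Assumption: features are L2-normalized before clustering.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    init = rng.choice(len(feats), size=num_clusters, replace=False)
    centers = feats[init]                    # fancy indexing returns a copy
    for _ in range(num_iters):
        # Squared Euclidean distance of every sample to every center.
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for c in range(num_clusters):
            members = feats[labels == c]
            if len(members):                 # keep the old center if empty
                centers[c] = members.mean(axis=0)
    one_hot = np.eye(num_clusters)[labels]
    return labels, one_hot
```

Because every sample is forced into exactly one cluster with full confidence, any clustering mistake becomes a wrong training target, which is the pseudo-label noise discussed above.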
To verify the effectiveness of the proposed soft triplet loss, the hyper-parameter is set to $\lambda_{tri} = 0$. As can be seen from the table, the method Baseline+MN-700 (w/o $L^t_{stri}$), which is trained without the supervision of the soft triplet loss, shows a significant drop in model performance. This shows that the hard pseudo-labels and the conventional hard triplet loss cannot cope effectively with pseudo-label noise, and indirectly reflects the value of the proposed soft triplet loss based on feature distributions.
To verify the design of the momentum network, the momentum network was replaced with an ordinary network having the same backbone; this is the method Baseline+MN-700 (E[θ]→θ'). In this variant the output of each network is passed to the other network as supervision: two networks net_1 and net_2 are initialized, the output of net_1 supervises net_2, and the output of net_2 supervises net_1. This scheme has two defects. First, both networks update their parameters rapidly through gradient back-propagation, so if the noise is large it is amplified quickly in both networks; the supervision is unstable and easily disturbs learning. Second, because the outputs of the two networks are fed directly to each other, the two networks quickly converge to similar solutions, the complementarity of their outputs is greatly reduced, and the structure becomes redundant. This approach is therefore not viable. Experimentally, mAP and rank-n drop noticeably after the replacement: on the Market-Duke task, mAP and rank-1 are 7.9% and 5.7% lower, respectively, than for the proposed model, which confirms this reasoning.
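The contrast with a gradient-updated peer network can be made concrete with a sketch of a momentum (temporally averaged) parameter update. The exponential-moving-average form, the coefficient `alpha` and the dict-of-arrays parameter layout are assumptions for illustration; this passage does not state the exact update rule or its coefficient.

```python
import numpy as np

def momentum_update(momentum_params, backbone_params, alpha=0.999):
    """Blend the backbone's current parameters into the momentum network.

    Assumed rule: theta_m <- alpha * theta_m + (1 - alpha) * theta.
    The momentum network receives no gradients of its own.
    """
    for name, theta in backbone_params.items():
        momentum_params[name] = (
            alpha * momentum_params[name] + (1.0 - alpha) * theta
        )
    return momentum_params
```

With `alpha` close to 1, a single noisy gradient step on the backbone changes the momentum network only slightly, so its outputs are a steadier source of supervision than those of a second network trained by back-propagation.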
Besides the proposed soft labels, soft losses and momentum learning framework, which effectively improve experimental performance, the hard pseudo-labels are also of great significance for feature learning on the unlabeled target domain. Two experiments were designed to verify the effect of the hard classification loss and of the conventional hard triplet loss, respectively. To verify the hard classification loss, the method Baseline+MN-700 (w/o $L^t_{id}$) sets the hyper-parameter $\lambda_{id} = 1$; to verify the conventional triplet loss, the method Baseline+MN-700 (w/o $L^t_{tri}$) sets the hyper-parameter $\lambda_{tri} = 1$. The results in the table show that model performance drops to different degrees in both cases, and removing the hard classification loss even pushes performance below that of the Pre-trained setting, which demonstrates the effectiveness of the hard classification loss. The reason is that the initial network cannot yet distinguish the identities of the target domain correctly, so it typically outputs a nearly uniform probability for every identity. These probabilities serve as the soft labels of the soft classification loss, and if the network is trained directly with such smooth, noisy soft pseudo-labels, it quickly collapses because of the excessive bias. A hard pseudo-label with confidence 1 in the classification loss is therefore crucial for learning discriminative feature representations on the target domain.
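How a hard pseudo-label and a soft label from the momentum network might be combined in the classification loss is sketched below. The convex weighting by `lambda_id` is an assumption inferred from the ablation setting in which the hyper-parameter value 1 removes the hard classification loss; the function names are hypothetical and this is not presented as the patented loss.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def classification_loss(backbone_logits, momentum_logits, hard_labels,
                        lambda_id=0.5):
    """Assumed form: (1 - lambda_id) * hard CE + lambda_id * soft CE."""
    log_p = log_softmax(np.asarray(backbone_logits, dtype=float))
    n = log_p.shape[0]
    # Hard loss: cross entropy against cluster labels with confidence 1.
    hard = -log_p[np.arange(n), np.asarray(hard_labels)].mean()
    # Soft loss: cross entropy against the momentum network's probabilities.
    soft_targets = np.exp(log_softmax(np.asarray(momentum_logits, dtype=float)))
    soft = -(soft_targets * log_p).sum(axis=1).mean()
    return (1.0 - lambda_id) * hard + lambda_id * soft
```

If the momentum network outputs nearly uniform probabilities, as an untrained network tends to on the target domain, the soft term alone gives almost no signal about which identity is correct, which matches the collapse described when the hard term is removed.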
TABLE 4 Ablation experiments on the Duke-Market/Market-Duke tasks (%)
Method                                   mAP         rank-1      rank-5      rank-10
Supervised (Duke)                        72.0        90.2        96.3        96.9
Supervised (Market)                      83.7        93.0        97.3        98.3
Pre-trained                              32.0/30.0   62.1/46.7   76.5/61.8   82.4/67.5
Baseline                                 55.0/48.5   76.0/66.9   88.5/80.1   91.8/84.3
Baseline+MN-700 (w/o $L^t_{id}$)         30.1/20.4   55.1/30.8   69.8/42.3   74.5/48.0
Baseline+MN-700 (w/o $L^t_{tri}$)        73.2/64.0   89.3/76.7   94.5/87.8   96.1/92.9
Baseline+MN-700 (w/o $L^t_{sid}$)        65.2/61.5   87.3/75.7   94.1/87.8   95.1/91.0
Baseline+MN-700 (w/o $L^t_{stri}$)       68.2/62.5   87.8/75.7   93.9/87.8   95.1/92.5
Baseline+MN-700 (E[θ]→θ')                64.2/58.1   83.8/72.7   92.9/85.8   94.1/89.4
MN-Guided (ours)-700                     74.4/66.0   90.3/78.4   95.3/90.2   97.0/94.0
The sensitivity of the weight hyper-parameters $\lambda_{id}$ and $\lambda_{tri}$ was analyzed on the Duke-Market task with the number of clusters set to 700. The experimental results are shown in FIG. 4. Performance is best when $\lambda_{id}$ and $\lambda_{tri}$ are 0.5 and 0.8, respectively, and all experiments were performed with these two values.
This embodiment presents the core ideas, steps and parameters of the cross-domain pedestrian re-identification method based on momentum network guidance. It is only a preferred implementation of the invention; in practice the parameter settings need to be adjusted to the specific variables and data in order to achieve a better practical effect.
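The role of the two weights can be illustrated with one weighting that is consistent with the ablation settings (the value 0 for the triplet weight drops the soft triplet loss, and the value 1 for either weight drops the corresponding hard loss). The convex-combination form below is an assumption inferred from those settings, not a formula quoted from this passage; the loss symbols follow the row labels of Table 4.

```latex
L(\theta) = (1-\lambda_{id})\,L^{t}_{id} + \lambda_{id}\,L^{t}_{sid}
          + (1-\lambda_{tri})\,L^{t}_{tri} + \lambda_{tri}\,L^{t}_{stri}
% With the reported optimum \lambda_{id}=0.5,\ \lambda_{tri}=0.8:
L(\theta) = 0.5\,L^{t}_{id} + 0.5\,L^{t}_{sid} + 0.2\,L^{t}_{tri} + 0.8\,L^{t}_{stri}
```

Under this reading, the best setting weights the hard and soft classification losses equally while relying mostly on the soft triplet loss.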

Claims (6)

1. A cross-domain pedestrian re-identification algorithm based on momentum network guidance, characterized by comprising the following steps:
step S1, initializing a backbone network with a model pre-trained on the ImageNet data set;
step S2, pre-fine-tuning the model with the labeled data of the source-domain data set so as to make full use of the label information of the source domain;
step S3, initializing the proposed momentum learning framework with models trained on the source-domain data set under different random parameters, and clustering the features extracted by the model with a clustering algorithm to generate hard pseudo-labels with a confidence of 1;
step S4, designing new softened pseudo-labels and loss functions, and combining them with the conventional losses to train and optimize the model;
step S5, updating the hard pseudo-labels before each iteration and dynamically updating the soft pseudo-labels in real time, iterating pseudo-label generation and optimization until the model converges, and using the feature-encoding part of the momentum network for testing.
2. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S1 further comprises:
step S101, using a ResNet-50 network as the backbone network, removing its last fully connected layer, and adding two additional FC layers named FC-1000 and FC-MS;
step S102, initializing the shared parameters with a model pre-trained on the large-scale ImageNet data set.
3. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S2 further comprises:
step S201, resizing the pedestrian images to a uniform size of 256 × 128 and setting the network training parameters;
step S202, performing supervised training several times with different random seeds to obtain several pre-trained models, using a cross-entropy classification loss and a margin-based triplet loss during training.
4. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S3 further comprises:
step S301, initializing the backbone network and the momentum network of the momentum framework with the two better-performing models trained on the source-domain data set, and dividing the target domain into a training set and a test set;
step S302, specifying the number of classes $M_t$ to be formed on the target domain; the clustering algorithm then performs unsupervised clustering on the features that the model extracts from the target-domain images, generating hard pseudo-labels with a confidence of 1.
5. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S4 further comprises:
step S401, using the classification predictions output by the momentum network to provide the backbone network with soft classification pseudo-labels whose confidence is less than 1, and using the feature distributions output by the momentum network to design triplet labels based on the Wasserstein distribution distance;
step S402, computing the soft losses, comprising a soft cross-entropy loss based on the classification predictions and a soft triplet loss based on the feature distributions, and training the network jointly with the hard losses;
step S403, updating the backbone network by gradient back-propagation, and updating the momentum network as a weighted combination of its historical parameters and the current parameters of the backbone network.
6. The momentum network guidance-based cross-domain pedestrian re-identification algorithm according to claim 1, wherein the step S5 further comprises:
step S501, after each epoch, reassigning the hard pseudo-labels generated by the clustering algorithm according to the new features, while the soft pseudo-labels provided by the momentum network are updated online continuously as the network is optimized;
step S502, after the hard pseudo-labels are updated, re-forming the positive and negative examples of the corresponding triplet samples;
step S503, returning to step S302 and repeating the process, iterating pseudo-label generation and optimization until the model converges;
step S504, in the testing stage, extracting the features of the query images and the gallery images in the target-domain test set with the feature-encoding part of the momentum network, measuring the similarity between the query images and the gallery images, returning the results ranked from high to low similarity, and computing the evaluation metrics, thereby completing the pedestrian re-identification retrieval task.
CN202110436422.3A 2021-04-22 2021-04-22 Cross-domain pedestrian re-identification method based on momentum network guidance Active CN113326731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110436422.3A CN113326731B (en) 2021-04-22 2021-04-22 Cross-domain pedestrian re-identification method based on momentum network guidance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110436422.3A CN113326731B (en) 2021-04-22 2021-04-22 Cross-domain pedestrian re-identification method based on momentum network guidance

Publications (2)

Publication Number Publication Date
CN113326731A true CN113326731A (en) 2021-08-31
CN113326731B CN113326731B (en) 2024-04-19

Family

ID=77415041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110436422.3A Active CN113326731B (en) 2021-04-22 2021-04-22 Cross-domain pedestrian re-identification method based on momentum network guidance

Country Status (1)

Country Link
CN (1) CN113326731B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435546A (en) * 2021-08-26 2021-09-24 广东众聚人工智能科技有限公司 Migratable image recognition method and system based on differentiation confidence level
CN113642547A (en) * 2021-10-18 2021-11-12 中国海洋大学 Unsupervised domain adaptive character re-identification method and system based on density clustering
CN113723345A (en) * 2021-09-09 2021-11-30 河北工业大学 Domain-adaptive pedestrian re-identification method based on style conversion and joint learning network
CN113807420A (en) * 2021-09-06 2021-12-17 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN113822262A (en) * 2021-11-25 2021-12-21 之江实验室 Pedestrian re-identification method based on unsupervised learning
CN114333062A (en) * 2021-12-31 2022-04-12 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN114565799A (en) * 2022-04-27 2022-05-31 南京邮电大学 Comparison self-supervision learning method based on multi-network framework
CN114692732A (en) * 2022-03-11 2022-07-01 华南理工大学 Method, system, device and storage medium for updating online label
CN114863200A (en) * 2022-03-21 2022-08-05 北京航空航天大学 Gaze estimation method, device, and storage medium
CN114913372A (en) * 2022-05-10 2022-08-16 电子科技大学 Target recognition algorithm based on multi-mode data integration decision
CN114937289A (en) * 2022-07-06 2022-08-23 天津师范大学 Cross-domain pedestrian retrieval method based on heterogeneous pseudo label learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163117A (en) * 2019-04-28 2019-08-23 浙江大学 A kind of pedestrian's recognition methods again based on autoexcitation identification feature learning
CN110443174A (en) * 2019-07-26 2019-11-12 浙江大学 A kind of pedestrian's recognition methods again based on decoupling self-adaptive identification feature learning
US20200125897A1 (en) * 2018-10-18 2020-04-23 Deepnorth Inc. Semi-Supervised Person Re-Identification Using Multi-View Clustering
US20200226421A1 (en) * 2019-01-15 2020-07-16 Naver Corporation Training and using a convolutional neural network for person re-identification
CN111860678A (en) * 2020-07-29 2020-10-30 中国矿业大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN112232241A (en) * 2020-10-22 2021-01-15 华中科技大学 Pedestrian re-identification method and device, electronic equipment and readable storage medium
CN112232439A (en) * 2020-11-06 2021-01-15 四川云从天府人工智能科技有限公司 Method and system for updating pseudo label in unsupervised ReID
WO2021022752A1 (en) * 2019-08-07 2021-02-11 深圳先进技术研究院 Multimodal three-dimensional medical image fusion method and system, and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何爱清: ""基于伪标签优化的跨域行人重识别方法研究"", 《中国优秀硕士学位论文全文数据库》 *
杨昌东;余烨;徐珑刀;付源梓;路强;: "基于AT-PGGAN的增强数据车辆型号精细识别", 中国图象图形学报, no. 03 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435546A (en) * 2021-08-26 2021-09-24 广东众聚人工智能科技有限公司 Migratable image recognition method and system based on differentiation confidence level
CN113435546B (en) * 2021-08-26 2021-12-24 山东力聚机器人科技股份有限公司 Migratable image recognition method and system based on differentiation confidence level
CN113807420A (en) * 2021-09-06 2021-12-17 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN113807420B (en) * 2021-09-06 2024-03-19 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN113723345A (en) * 2021-09-09 2021-11-30 河北工业大学 Domain-adaptive pedestrian re-identification method based on style conversion and joint learning network
CN113723345B (en) * 2021-09-09 2023-11-14 河北工业大学 Domain self-adaptive pedestrian re-identification method based on style conversion and joint learning network
CN113642547A (en) * 2021-10-18 2021-11-12 中国海洋大学 Unsupervised domain adaptive character re-identification method and system based on density clustering
CN113642547B (en) * 2021-10-18 2022-02-11 中国海洋大学 Unsupervised domain adaptive character re-identification method and system based on density clustering
CN113822262A (en) * 2021-11-25 2021-12-21 之江实验室 Pedestrian re-identification method based on unsupervised learning
CN114333062A (en) * 2021-12-31 2022-04-12 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN114333062B (en) * 2021-12-31 2022-07-15 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN114692732A (en) * 2022-03-11 2022-07-01 华南理工大学 Method, system, device and storage medium for updating online label
CN114863200A (en) * 2022-03-21 2022-08-05 北京航空航天大学 Gaze estimation method, device, and storage medium
CN114565799B (en) * 2022-04-27 2022-07-08 南京邮电大学 Comparison self-supervision learning method based on multi-network framework
CN114565799A (en) * 2022-04-27 2022-05-31 南京邮电大学 Comparison self-supervision learning method based on multi-network framework
CN114913372A (en) * 2022-05-10 2022-08-16 电子科技大学 Target recognition algorithm based on multi-mode data integration decision
CN114913372B (en) * 2022-05-10 2023-05-26 电子科技大学 Target recognition method based on multi-mode data integration decision
CN114937289A (en) * 2022-07-06 2022-08-23 天津师范大学 Cross-domain pedestrian retrieval method based on heterogeneous pseudo label learning
CN114937289B (en) * 2022-07-06 2024-04-19 天津师范大学 Cross-domain pedestrian retrieval method based on heterogeneous pseudo tag learning

Also Published As

Publication number Publication date
CN113326731B (en) 2024-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant