CN114187655B - Unsupervised pedestrian re-recognition method based on joint training strategy - Google Patents
- Publication number: CN114187655B
- Application number: CN202111430274.0A
- Authority: CN (China)
- Prior art keywords: pedestrian, camera, features, centroid, feature
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/22 — Pattern recognition; analysing; matching criteria, e.g. proximity measures
- G06F18/2321 — Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06T7/66 — Image analysis; analysis of geometric attributes of image moments or centre of gravity
- Y02T10/40 — Climate change mitigation technologies related to transportation; engine management systems
Abstract
The invention belongs to the fields of artificial intelligence and pedestrian re-identification, and discloses an unsupervised pedestrian re-identification method based on a joint training strategy. To address the large inter-camera domain gap, a method for learning inter-camera invariance features is proposed, which aims to give the model the ability to distinguish invariant features under different cameras. The method comprises the following steps: extracting pedestrian image features; clustering and assigning pseudo-labels; computing pedestrian centroids and camera centroids; mining edge features and inter-camera invariance features; updating pedestrian instance features and camera centroids; and updating the model parameters with a contrastive loss. With this method, label noise can be effectively reduced, the inter-camera domain gap is narrowed, and pedestrian re-identification accuracy is significantly improved. The unsupervised pedestrian re-identification method based on the joint training strategy can be widely applied in the field of pedestrian re-identification.
Description
Technical Field
The invention belongs to the fields of artificial intelligence and pedestrian re-identification, and particularly relates to an unsupervised pedestrian re-identification method based on a joint training strategy.
Background
Pedestrian re-identification matches pedestrian images across cameras, retrieving the images that share the identity of a given query image. The technology plays a vital role in fields such as smart cities and intelligent security, and can be applied to tracking criminal suspects, searching for missing persons, pedestrian flow statistics, and similar tasks.
In recent years, supervised pedestrian re-identification has advanced greatly, but the demands of large-scale surveillance systems, the continual growth of surveillance data, and high annotation costs mean that dependence on large numbers of manual labels severely limits its application. Unsupervised pedestrian re-identification has therefore attracted increasing attention: it can learn directly from unlabeled data, scales better, and has great application value in industry.
Current research on unsupervised pedestrian re-identification generally falls into three categories: (1) unsupervised domain-adaptive methods, which align the feature distributions of a source domain and a target domain; (2) camera-aware methods, which teach the model to distinguish sample features under different cameras; (3) clustering methods, which generate pseudo-labels for training on the target domain by assigning the same pseudo-label to similar images. The first category treats unsupervised pedestrian re-identification as a transfer-learning task: it uses both source- and target-domain datasets and employs the labeled source-domain data to assist training. The latter two categories train re-identification models in a fully unsupervised manner. Compared with unsupervised domain-adaptive methods, fully unsupervised methods have greater application value. When the feature distributions of the source and target domains differ substantially, high-quality pseudo-labels are hard to obtain, and the resulting label noise degrades performance. Moreover, labeled samples are difficult to obtain in practice, which further limits the domain-adaptive approach. Fully unsupervised methods train the deep model using only unlabeled images, so they have more practical value in industry and wider applicability. The present invention targets fully unsupervised pedestrian re-identification and proposes an unsupervised pedestrian re-identification method based on a joint training strategy.
Popular unsupervised pedestrian re-identification methods in recent years mainly assign pseudo-labels to unlabeled samples with a clustering algorithm, update an instance feature memory, compute centroids, and finally optimize the model with a contrastive learning loss. Contrastive learning has shown good performance in this field. Ge et al. propose a self-paced contrastive learning framework that dynamically updates a hybrid feature memory containing source- and target-domain dataset features and then performs contrastive learning (Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, et al. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. In Advances in Neural Information Processing Systems, NeurIPS 2020: 11309–11321). The appearance of a person varies from camera view to camera view with changes in viewpoint, lighting conditions, background, and so on. In general, pedestrians of the same identity look highly similar within the same camera view but differ markedly in appearance across cameras, so reducing the domain gap introduced by cameras is one of the research hotspots of unsupervised pedestrian re-identification. Current research usually works at the training level to make the model learn inter-camera invariance features. Yang et al. propose a camera-aware learning scheme to mitigate the negative effects of noisy samples and learn inter-camera invariance features (Fengxiang Yang, Zhun Zhong, Zhiming Luo, et al. Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021: 4855–4864). While effective, existing approaches ignore two important factors.
(1) The influence of label noise. In each iteration, the instance features are updated continuously, which inevitably introduces label noise; accurately updating the instance features can therefore optimize their cluster distribution and reduce the influence of label noise.
(2) Learning of inter-camera invariance features. Inter-camera invariance is characterized by the hardest-to-distinguish samples of the same identity in each camera, where the inter-camera domain gap is large. A clustering algorithm struggles to group hard samples of the same identity from all cameras into the same cluster, and unsupervised pedestrian re-identification lacks ground-truth labels, so truly supervised learning is impossible and the model cannot effectively learn inter-camera invariance features. The present invention aims to solve these two key problems of unsupervised pedestrian re-identification.
For the first problem, this patent proposes a centralized momentum update strategy that optimizes the cluster distribution to reduce the impact of label noise.
For the second problem, the invention provides an inter-camera invariance feature learning method that teaches the model to distinguish invariant feature samples under different cameras, thereby reducing the inter-camera domain gap and improving the performance of unsupervised pedestrian re-identification.
Disclosure of Invention
To solve the above technical problems, the invention provides an unsupervised pedestrian re-identification method based on a joint training strategy. The method optimizes the global cluster distribution with a centralized momentum update strategy, and uses inter-camera invariance feature learning to teach the model to distinguish invariant feature samples under different cameras, thereby reducing the influence of label noise and the inter-camera domain gap and improving the performance of unsupervised pedestrian re-identification.
An unsupervised pedestrian re-identification method based on a joint training strategy comprises the following steps:
Step 1: divide the pedestrian images into a training set and a test set;
Step 2: extract the pedestrian features of the training set using a CNN loaded with a pre-trained model;
Step 3: compute the similarity between pedestrian features, cluster the features with a density clustering algorithm, and generate pseudo-labels;
Step 4: remove outlier features and construct a new pedestrian training set from the pseudo-labels and the corresponding pedestrian features;
Step 5: initialize the pedestrian centroids and camera centroids from the training set constructed in step 4. The arithmetic mean of the pedestrian features sharing the same pseudo-label is computed as the pedestrian centroid, and the arithmetic mean of the pedestrian features sharing the same camera ID is computed as the camera centroid.
Step 6: extract the pedestrian features of the training set constructed in step 4 with the ResNet network. Then execute the centralized momentum update strategy, whose specific content is: compute the similarity between the pedestrian features sharing a pseudo-label and the corresponding pedestrian centroid obtained in step 5, and take the feature with the lowest similarity as the edge feature; use the edge feature to update all pedestrian instance features under the same pseudo-label. Then perform inter-camera invariance feature learning, whose specific content is: first compute the similarity between the pedestrian features sharing a camera ID and the corresponding camera centroid obtained in step 5, take the feature with the highest similarity as the camera-invariance feature, and then use it to update all pedestrian instance features under the same pseudo-label and the camera centroid of the same camera ID. Finally, compute the pedestrian centroids from the updated pedestrian instance features.
Step 7: draw pedestrian query samples from the training set constructed in step 4, compute the contrastive learning loss between these samples and the pedestrian centroids obtained in step 6, and update the parameters of the model.
Step 8: input the test-set images into the best CNN model trained in step 7 to extract their pedestrian features, then compute the feature distances between the query-set and test-set pedestrian images to obtain the unsupervised pedestrian re-identification result.
Further, the specific process of step 2 is as follows: the chosen CNN is ResNet, loaded with the ImageNet pre-trained model and with its last classification layer deleted. All images of the training set are input into ResNet; denoting the pedestrian feature of the i-th image by φ(x_i), the features form a feature space V = {φ(x_1), φ(x_2), ..., φ(x_n)}.
Further, the specific process of step 3 is as follows:
The similarity between pedestrian features is computed with the Jaccard similarity formula:

J(s_i, s_j) = |s_i ∩ s_j| / |s_i ∪ s_j|    (1)

where s_i and s_j denote the i-th and j-th pedestrian features, J(s_i, s_j) denotes the similarity between the pedestrian features s_i and s_j, ∩ denotes set intersection, and ∪ denotes set union.
After obtaining the similarities between pedestrian features, density clustering assigns a pseudo-label from Y = {y_1, y_2, ..., y_M, y_{M+1}, ..., y_N} to each pedestrian feature in X = {x_1, x_2, ..., x_M, x_{M+1}, ..., x_N}.
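As an illustration of the set form of Jaccard similarity, a minimal numpy sketch is given below; the toy feature values, the choice k = 2, and the helper names (`knn_sets`, `jaccard_similarity`) are illustrative assumptions, not part of the patent, which may in practice operate on k-reciprocal neighbor sets.

```python
import numpy as np

def jaccard_similarity(neighbors_i, neighbors_j):
    """J(s_i, s_j) = |s_i ∩ s_j| / |s_i ∪ s_j| over neighbor index sets."""
    si, sj = set(neighbors_i), set(neighbors_j)
    return len(si & sj) / len(si | sj)

def knn_sets(features, k):
    """Index set of the k nearest neighbors (Euclidean) of each feature, self excluded."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    return [np.argsort(row)[1:k + 1] for row in d]  # position 0 is the point itself

# Toy features: three points near the origin, two near (5, 5)
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
sets_ = knn_sets(features, k=2)
print(round(jaccard_similarity(sets_[0], sets_[1]), 2))  # → 0.33
```

Features whose neighbor sets overlap heavily score close to 1 and end up in the same density cluster.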
Further, the specific process of step 4 is as follows:
Step 3 yields the pseudo-labels Y = {y_1, ..., y_M, y_{M+1}, ..., y_N} and the corresponding pedestrian features X = {x_1, ..., x_M, x_{M+1}, ..., x_N} of the training set, where {y_{M+1}, ..., y_N} are the outlier pseudo-labels produced by clustering and {x_{M+1}, ..., x_N} are the corresponding outlier features. The training set with outliers removed is X = {x_1, x_2, ..., x_M}, with corresponding pseudo-labels Y = {y_1, y_2, ..., y_M}.
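The outlier-removal step can be sketched as follows, assuming the density clustering marks outliers with the label -1 (the convention of DBSCAN-style algorithms); the toy data are illustrative.

```python
import numpy as np

# Pseudo-labels as produced by a density clustering step; -1 marks outliers
# (the -1 convention follows DBSCAN-style algorithms and is an assumption here).
pseudo_labels = np.array([0, 0, 1, -1, 1, -1, 2])
features = np.arange(14, dtype=float).reshape(7, 2)

keep = pseudo_labels != -1          # drop outlier features and their labels
train_features = features[keep]     # new pedestrian training set
train_labels = pseudo_labels[keep]  # corresponding pseudo-labels
print(train_labels.tolist())  # → [0, 0, 1, 1, 2]
```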
Further, the specific process of step 5 is as follows:
Step 4 yields the outlier-free training set X = {x_1, x_2, ..., x_M} and pseudo-labels Y = {y_1, y_2, ..., y_M}. The pedestrian features X_i = {x_1^i, x_2^i, ..., x_n^i} of the i-th pseudo-label are extracted, and the mean of this feature set is computed as the i-th pedestrian centroid:

V_i = (1 / |α_i|) Σ_{x_n^i ∈ α_i} x_n^i    (2)

where x_n^i is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set; α_i denotes all pedestrian instance features under the i-th cluster set; |·| denotes the number of pedestrian instance features in the cluster set; and V_i is the i-th pedestrian centroid.
The pedestrian features Y_k = {y_1^k, y_2^k, ..., y_m^k} under the k-th camera are extracted, and the mean of this feature set is computed as the k-th camera centroid:

C_k = (1 / |β_k|) Σ_{y_n^k ∈ β_k} y_n^k    (3)

where y_n^k is a d-dimensional vector, the n-th instance in the k-th camera set; β_k denotes all camera instance features under the k-th camera set; |·| denotes the number of camera instance features under the camera; and C_k is the k-th camera centroid.
Further, the specific process of step 6 is as follows:
Step 5 yields the pedestrian centroids V and camera centroids C, and step 4 yields the outlier-free pedestrian features X. The features X are input into the ResNet network for feature extraction, after which the centralized momentum update strategy is executed: first, the similarity between each extracted feature and the pedestrian centroid of its pseudo-label is computed, and the edge feature with the lowest similarity is selected to update all pedestrian instance features under that pseudo-label. Inter-camera invariance feature learning is then performed: the similarity between the pedestrian features under each camera ID and the corresponding camera centroid is computed, the feature with the highest similarity is selected as the camera-invariance feature, and this feature is used to update all pedestrian instance features under the same pseudo-label and the camera centroid of the same camera ID. Finally, the pedestrian centroids are recomputed from the updated instance features. The update formulas are as follows:

x_n^i ← m x_n^i + (1 - m) q_i    (4)
x_n^i ← m x_n^i + (1 - m) p_k    (5)
C_k ← m_c C_k + (1 - m_c) p_k    (6)
V_i = (1 / |α_i|) Σ_{x_n^i ∈ α_i} x_n^i

where V_i is the i-th pedestrian centroid and C_k is the k-th camera centroid; m is the momentum update parameter and m_c is the camera momentum update parameter; α_i denotes all pedestrian instance features under the i-th cluster set and β_k denotes all camera instance features under the k-th camera set; p_k denotes the invariance feature of the k-th camera set and q_i denotes the edge feature of the i-th cluster set; x_n^i is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set, i.e., any one of the features under that cluster set.
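The edge-feature mining with momentum update, and the camera centroid update, might be sketched as below; the use of cosine similarity, the momentum values, and the toy data are illustrative assumptions consistent with the step described above.

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def centralized_momentum_update(cluster_feats, V_i, m):
    """Mine the edge feature q_i (lowest similarity to the pedestrian centroid V_i)
    and pull every instance feature of the cluster toward it: x <- m*x + (1-m)*q_i."""
    sims = [cos(x, V_i) for x in cluster_feats]
    q = cluster_feats[int(np.argmin(sims))]   # edge feature q_i
    return m * cluster_feats + (1 - m) * q

def camera_centroid_update(cam_feats, C_k, m_c):
    """Mine the camera-invariance feature p_k (highest similarity to the camera
    centroid C_k) and update the centroid with momentum: C_k <- m_c*C_k + (1-m_c)*p_k."""
    sims = [cos(y, C_k) for y in cam_feats]
    p = cam_feats[int(np.argmax(sims))]       # invariance feature p_k
    return m_c * C_k + (1 - m_c) * p

cluster = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])  # last row is the edge
V = cluster.mean(axis=0)
updated = centralized_momentum_update(cluster, V, m=0.9)
```

After the update, every instance feature of the cluster has drifted slightly toward the hardest (edge) member, tightening the cluster against label noise.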
Further, the specific process of step 7 is as follows:
The contrastive learning loss is computed between the pedestrian query samples drawn from the training set X of step 4 and the pedestrian centroids obtained in step 6:

L = -log( exp(f · V_+ / τ) / Σ_{k=1}^{K} exp(f · V_k / τ) )    (7)

where τ is a temperature hyper-parameter, f is a pedestrian query sample, V_+ is the pedestrian centroid of the positive class, and K is the number of cluster categories. The objective of optimizing the model parameters is to increase the similarity between a pedestrian query instance and the pedestrian centroid of its own pseudo-label while decreasing its similarity to the centroids of other pseudo-labels.
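This centroid-based contrastive loss has the InfoNCE form; a numerically stable numpy sketch follows, with toy centroids and an assumed temperature τ = 0.05 (the value is illustrative, not taken from the patent).

```python
import numpy as np

def contrastive_loss(f, centroids, pos_idx, tau=0.05):
    """L = -log( exp(f·V+ / tau) / sum_k exp(f·V_k / tau) )  (InfoNCE form)."""
    logits = centroids @ f / tau
    logits -= logits.max()  # subtract max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[pos_idx]

f = np.array([1.0, 0.0])                              # pedestrian query feature
V = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])   # V[0] is the positive centroid
loss = contrastive_loss(f, V, pos_idx=0)              # near zero: query matches V+
```

Minimizing this loss pulls the query toward its own pseudo-label centroid and pushes it away from the others.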
The beneficial effects of the invention are as follows: the invention uses the pedestrian centroid to mine edge features and uses them to update the pedestrian instance features, which reduces the influence of label noise on the cluster distribution. The camera centroid is used to mine inter-camera invariance features, and these features update all pedestrian instance features under the same pseudo-label, which narrows the inter-camera domain gap of the cluster distribution and improves the model's ability to distinguish the same pedestrian identity under different cameras. Finally, the model parameters are optimized with the contrastive learning loss, effectively improving pedestrian re-identification accuracy.
Drawings
FIG. 1 is a flow chart of the unsupervised pedestrian re-identification method based on a joint training strategy of the present invention;
FIG. 2 is a flowchart of training steps according to an embodiment of the present invention;
FIG. 3 is a flow chart of testing steps according to an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and a specific embodiment; the invention includes but is not limited to the following embodiment.
As shown in FIG. 1, the invention provides an unsupervised pedestrian re-identification method based on a joint training strategy, implemented as follows:
1. Pedestrian image feature extraction
As shown in FIG. 2, the training-set features are extracted with a CNN: denoting the pedestrian feature of the i-th image by φ(x_i), the features form a feature space V = {φ(x_1), φ(x_2), ..., φ(x_n)}. The chosen CNN is ResNet, loaded with the ImageNet pre-trained model and with its last classification layer deleted.
2. Cluster allocation pseudo tag
The similarity between the pedestrian features extracted in step 2 is computed with the Jaccard similarity formula:

J(s_i, s_j) = |s_i ∩ s_j| / |s_i ∪ s_j|

where s_i and s_j denote the i-th and j-th pedestrian features, J(s_i, s_j) denotes the similarity between the pedestrian features s_i and s_j, ∩ denotes set intersection, and ∪ denotes set union.
After obtaining the similarities between pedestrian features, density clustering assigns a pseudo-label from Y = {y_1, y_2, ..., y_M, y_{M+1}, ..., y_N} to each pedestrian feature in X = {x_1, x_2, ..., x_M, x_{M+1}, ..., x_N}.
3. Generating a training set of pseudo tags
Step 3 yields the pseudo-labels Y = {y_1, ..., y_M, y_{M+1}, ..., y_N} and the corresponding pedestrian features X = {x_1, ..., x_M, x_{M+1}, ..., x_N}; the training set with outliers removed is X = {x_1, x_2, ..., x_M}, with corresponding pseudo-labels Y = {y_1, y_2, ..., y_M}.
4. Centroid initialization and updating of camera centroids and pedestrian instance features
From the pedestrian training set X = {x_1, x_2, ..., x_M} constructed in step 4, the pedestrian features X_i = {x_1^i, x_2^i, ..., x_n^i} under the i-th pseudo-label are extracted and the pedestrian centroids V = {V_1, V_2, ..., V_n} are initialized. The pedestrian features Y_k = {y_1^k, y_2^k, ..., y_m^k} under the k-th camera are extracted and the camera centroids C = {C_1, C_2, ..., C_n} are initialized. The centroid calculation formulas are as follows:

V_i = (1 / |α_i|) Σ_{x_n^i ∈ α_i} x_n^i
C_k = (1 / |β_k|) Σ_{y_n^k ∈ β_k} y_n^k

where x_n^i is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set; α_i denotes all pedestrian instance features under the i-th cluster set, and |·| denotes the number of instance features in the set; y_n^k is a d-dimensional vector, the n-th instance in the k-th camera set; β_k denotes all camera instance features under the k-th camera set; V_i is the i-th pedestrian centroid and C_k is the k-th camera centroid.
The pedestrian features obtained in step 4 are input into the ResNet network for feature extraction. The similarity between the pedestrian centroid V obtained in step 5 and the pedestrian features under the same pseudo-label is computed; the feature with the lowest similarity is taken as the edge feature and used to update all pedestrian instance features under that pseudo-label. At the same time, the similarity between the pedestrian features under the same camera ID and the camera centroid is computed; the feature with the highest similarity is taken as the camera-invariance feature and used to update all pedestrian instance features under the same pseudo-label and the camera centroid of the same camera ID. Finally, the pedestrian centroids are recomputed from the updated pedestrian instance features. The calculation formulas are as follows:

x_n^i ← m x_n^i + (1 - m) q_i
x_n^i ← m x_n^i + (1 - m) p_k
C_k ← m_c C_k + (1 - m_c) p_k    (14)
V_i = (1 / |α_i|) Σ_{x_n^i ∈ α_i} x_n^i

where V_i is the i-th pedestrian centroid and C_k is the k-th camera centroid; m is the momentum update parameter and m_c is the camera momentum update parameter; α_i denotes all pedestrian instance features under the i-th cluster set and β_k denotes all camera instance features under the k-th camera set; p_k denotes the invariance feature of the k-th camera set and q_i denotes the edge feature of the i-th cluster set; x_n^i is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set.
5. Contrast loss training network
The contrastive learning loss is computed between the pedestrian centroids updated in step 6 and the pedestrian query samples drawn from the training set X of step 4:

L = -log( exp(f · V_+ / τ) / Σ_{k=1}^{K} exp(f · V_k / τ) )

where τ is the temperature hyper-parameter, f is a pedestrian query sample, V_+ is the pedestrian centroid of the positive class, and K is the number of cluster categories. The loss function optimizes the model parameters so as to increase the similarity between a pedestrian query instance and the pedestrian centroid of its own pseudo-label while decreasing its similarity to the centroids of other pseudo-labels.
6. Test set pedestrian retrieval
As shown in FIG. 3, the test-set pedestrian images are input into the best ResNet obtained by the training of step 7 for feature extraction, and the unsupervised pedestrian re-identification result is obtained by computing the distance between the pedestrian features of the query set and the test set. For example, the Euclidean distance can measure the distance between pedestrian features: the smaller the Euclidean distance between two features, the more similar the two pedestrian images and the higher the probability that they depict the same pedestrian, which finally yields the unsupervised pedestrian re-identification result.
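Ranking the gallery (test set) by Euclidean distance to a query feature can be sketched as follows; the toy feature vectors and the helper name `rank_gallery` are illustrative.

```python
import numpy as np

def rank_gallery(query, gallery):
    """Return gallery indices sorted by Euclidean distance to the query feature
    (smallest distance first, i.e. most likely the same pedestrian identity)."""
    d = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(d)

query = np.array([0.0, 1.0])
gallery = np.array([[0.0, 0.9], [1.0, 1.0], [5.0, 5.0]])
print(rank_gallery(query, gallery).tolist())  # → [0, 1, 2]
```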
In summary, the invention discloses an unsupervised pedestrian re-identification method based on a joint training strategy. The camera centroid and the pedestrian centroid are used to jointly mine edge features and inter-camera invariance samples, the mined features update the pedestrian instance features, and the model parameters are then optimized through contrastive learning. This reduces the influence of label noise and inter-camera domain gaps on the cluster distribution, enables the model to distinguish invariant feature samples across cameras, and improves the performance of unsupervised pedestrian re-identification.
First, the pedestrian centroid is used to mine the features at the cluster edge, and these features update the pedestrian instance features, reducing label noise and improving the clustering.
Second, the camera centroid is used to mine inter-camera invariance features, which update all pedestrian instance features under the same pseudo-label; the model thus learns the distribution of inter-camera invariance features, narrowing the inter-camera domain gap.
Finally, by jointly updating the pedestrian instance features with the edge features and the inter-camera invariance features, the cluster distribution gradually approaches the global optimum, and contrastive learning lets the model learn more robust features, effectively improving pedestrian re-identification accuracy.
Claims (2)
1. An unsupervised pedestrian re-identification method based on a joint training strategy, characterized by comprising the following steps:
Step 1: divide the pedestrian images into a training set and a test set;
Step 2: extract the pedestrian features of the training set using a CNN loaded with a pre-trained model;
the chosen CNN is ResNet, loaded with the ImageNet pre-trained model and with its last classification layer deleted; all images of the training set are input into ResNet, and with the pedestrian feature of the i-th image denoted φ(x_i), a feature space V = {φ(x_1), φ(x_2), ..., φ(x_n)} is formed;
Step 3: compute the similarity between pedestrian features with the Jaccard similarity formula, cluster the features with a density clustering algorithm, and generate pseudo-labels;
the similarity between pedestrian features is computed with the Jaccard similarity formula:

J(s_i, s_j) = |s_i ∩ s_j| / |s_i ∪ s_j|

where s_i and s_j denote the i-th and j-th pedestrian features, J(s_i, s_j) denotes the similarity between the pedestrian features s_i and s_j, ∩ denotes set intersection, and ∪ denotes set union;
after obtaining the similarities between pedestrian features, density clustering assigns a pseudo-label from Y = {y_1, y_2, ..., y_M, y_{M+1}, ..., y_N} to each pedestrian feature in X = {x_1, x_2, ..., x_M, x_{M+1}, ..., x_N};
step 4: removing outlier features, and constructing a new pedestrian training set by using the pseudo tag and the corresponding pedestrian features;
Step 3 yields the pseudo tags Y = {y1, y2, ..., yM, yM+1, ..., yN} and the corresponding pedestrian features X = {x1, x2, ..., xM, xM+1, ..., xN} of the training set, where {yM+1, ..., yN} are the outlier pseudo tags produced by clustering and {xM+1, ..., xN} the corresponding outlier features; the training set with outliers removed is X = {x1, x2, ..., xM}, with corresponding pseudo tags Y = {y1, y2, ..., yM};
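Step 4's outlier removal can be sketched in a few lines, assuming the density clustering marks outliers with pseudo tag -1 (as DBSCAN does for noise); the variable names are illustrative, not from the patent.

```python
import numpy as np

def remove_outliers(features, pseudo_labels):
    """Keep only features whose pseudo tag is not the outlier marker -1."""
    keep = pseudo_labels != -1
    return features[keep], pseudo_labels[keep]

feats = np.arange(10, dtype=float).reshape(5, 2)
labels = np.array([0, -1, 1, 0, -1])
clean_feats, clean_labels = remove_outliers(feats, labels)
```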
Step 5: initializing the pedestrian centroids and camera centroids from the pedestrian training set constructed in step 4; the arithmetic mean of the pedestrian features sharing a pseudo tag is taken as a pedestrian centroid, and the arithmetic mean of the pedestrian features sharing a camera ID is taken as a camera centroid;
The training set X = {x1, x2, ..., xM} with outliers removed and the pseudo tags Y = {y1, y2, ..., yM} are obtained through step 4; the pedestrian features Xi = {x1, x2, ..., xn} of the i-th pseudo tag are extracted, and the mean of this feature set is calculated as the i-th pedestrian centroid:

Vi = (1/|αi|) Σ_{xn∈αi} xn (2)

where xn is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set; αi denotes all pedestrian instance features under the i-th cluster set, |·| denotes the number of pedestrian instance features in the set, and Vi is the i-th class pedestrian centroid;
The pedestrian features Yk = {y1, y2, ..., ym} under the k-th camera are extracted, and the mean of this feature set is calculated as the k-th camera centroid:

Ck = (1/|βk|) Σ_{yn∈βk} yn (3)

where yn is a d-dimensional vector, the n-th instance in the k-th camera set; βk denotes all camera instance features under the k-th camera set, |·| denotes the number of camera instance features under the camera, and Ck is the k-th camera centroid;
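Both centroid initializations of step 5 are plain per-group means, so one helper covers them; this is a minimal numpy sketch with illustrative data, not the patent's implementation.

```python
import numpy as np

def init_centroids(features, ids):
    """Arithmetic mean of the features sharing each ID value (pseudo tag or camera ID)."""
    return {i: features[ids == i].mean(axis=0) for i in np.unique(ids)}

feats = np.array([[1., 1.], [3., 3.], [0., 2.], [10., 10.]])
pseudo_labels = np.array([0, 0, 0, 1])
cam_ids = np.array([0, 1, 0, 1])

V = init_centroids(feats, pseudo_labels)   # pedestrian centroids, one per pseudo tag
C = init_centroids(feats, cam_ids)         # camera centroids, one per camera ID
```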
Step 6: extracting the pedestrian features of the training set constructed in step 4 with the ResNet network, then executing the centralized momentum update strategy, which is as follows: the similarity between the pedestrian features sharing a pseudo tag and the pedestrian centroid obtained in step 5 is calculated, and the pedestrian feature with the minimum similarity is taken as the edge feature; all pedestrian instance features under the same pseudo tag are updated with this edge feature; camera invariance feature learning is then performed, as follows: the similarity between the pedestrian features sharing a camera ID and the camera centroid obtained in step 5 is calculated, the pedestrian feature with the maximum similarity is taken as the camera invariance feature, and this feature is used to update all pedestrian instance features under the same pseudo tag as well as the camera centroid of the same camera ID; finally, the pedestrian centroid is recalculated from the updated pedestrian instance features;
Step 7: extracting a pedestrian query sample from the pedestrian training set constructed in step 4, calculating the contrastive learning loss between this sample and the pedestrian centroids obtained in step 6, and updating the parameters of the model;
The contrastive learning loss between the pedestrian centroids obtained in step 5 and a pedestrian query sample from the training set X is calculated as:

L = −log( exp(f·V+/τ) / Σ_{i=1}^{K} exp(f·Vi/τ) ) (4)

where τ is the temperature hyperparameter, f is the pedestrian query sample, V+ is the positive-sample pedestrian centroid, and K is the number of cluster categories; optimizing the model parameters with this loss function increases the similarity between a pedestrian query instance and the pedestrian centroid of its own pseudo tag while decreasing its similarity to the pedestrian centroids of the other pseudo tags;
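The loss of step 7 is an InfoNCE-style objective over cluster centroids; a minimal numpy sketch, assuming L2-normalized features and cosine similarity (details the patent leaves open):

```python
import numpy as np

def centroid_contrastive_loss(f, centroids, pos_idx, tau=0.05):
    """-log softmax similarity of query f against K centroids; pos_idx is V+."""
    f = f / np.linalg.norm(f)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = c @ f / tau          # similarity to each of the K centroids
    logits -= logits.max()        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[pos_idx])

centroids = np.array([[1., 0.], [0., 1.], [-1., 0.]])
query = np.array([1., 0.1])
loss_good = centroid_contrastive_loss(query, centroids, pos_idx=0)
loss_bad = centroid_contrastive_loss(query, centroids, pos_idx=2)
```

The loss is near zero when the query already lies next to its positive centroid and grows large when it is assigned to a distant one, which is exactly the gradient signal described above.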
Step 8: inputting the test-set images into the optimal CNN model obtained by the training of step 7 to extract their pedestrian features; the feature distances between the pedestrian images of the query set and of the test set are calculated to obtain the unsupervised pedestrian re-identification result.
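The retrieval in step 8 can be sketched as ranking the gallery (test set) for each query by feature distance; this sketch assumes cosine distance, one common choice the claim does not pin down.

```python
import numpy as np

def rank_gallery(query_feats, gallery_feats):
    """Per-query ranking of gallery indices by ascending cosine distance."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dist = 1.0 - q @ g.T              # cosine distance matrix (Q x G)
    return np.argsort(dist, axis=1)   # column 0 holds each query's best match

query = np.array([[1., 0.], [0., 1.]])
gallery = np.array([[0., 1.], [0.9, 0.1], [-1., 0.]])
ranks = rank_gallery(query, gallery)
```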
2. The unsupervised pedestrian re-identification method based on a joint training strategy according to claim 1, wherein in step 6, given the pedestrian centroids V and camera centroids C obtained in step 5 and the pedestrian features X with outliers removed obtained in step 4, the pedestrian features X are input into the ResNet network for feature extraction, and the centralized momentum update strategy is then executed: first, the similarity between the extracted features and the pedestrian centroid of the same pseudo tag is calculated, and the edge feature with the minimum similarity is selected to update all pedestrian instance features under that pseudo tag; next, inter-camera invariance feature learning is performed: the similarity between the pedestrian features under the same camera ID and the camera centroid is calculated, the pedestrian feature with the maximum similarity is selected as the camera invariance feature, and this feature is used to update all pedestrian instance features under the same pseudo tag and the camera centroid of the same camera ID; finally, the pedestrian centroid is recalculated from the updated pedestrian instance features; the update formulas are:

x̃ ← m·x̃ + (1−m)·ei (5)

Ck ← mc·Ck + (1−mc)·pk (6)

where Vi is the i-th class pedestrian centroid, Ck is the k-th camera centroid, m is the momentum update parameter, mc is the camera momentum update parameter, αi denotes all pedestrian instance features under the i-th cluster set, βk denotes all camera instance features under the k-th camera set, pk denotes the invariance feature of the k-th camera set, ei denotes the edge feature of the i-th cluster set, xn is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set, and x̃ denotes any pedestrian instance feature under the i-th cluster set; the updated pedestrian centroid is then Vi = (1/|αi|) Σ_{xn∈αi} xn.
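The centralized momentum update of claim 2 can be sketched as follows: pick the edge feature (the instance least similar to its pedestrian centroid), blend it into every instance feature of the cluster with momentum m, and update the camera centroid toward the camera-invariance feature pk with momentum mc, as in formula (6). This is a hedged illustration with toy values; similarity is taken as cosine, which the claim does not specify.

```python
import numpy as np

def momentum_update_cluster(instance_feats, centroid, m=0.9):
    """Blend the cluster's edge feature into every instance feature."""
    f = instance_feats / np.linalg.norm(instance_feats, axis=1, keepdims=True)
    c = centroid / np.linalg.norm(centroid)
    edge = instance_feats[np.argmin(f @ c)]     # least similar to the centroid
    return m * instance_feats + (1 - m) * edge

def momentum_update_camera(cam_centroid, cam_feats, m_c=0.9):
    """Move the camera centroid toward the camera-invariance feature p_k."""
    f = cam_feats / np.linalg.norm(cam_feats, axis=1, keepdims=True)
    c = cam_centroid / np.linalg.norm(cam_centroid)
    p_k = cam_feats[np.argmax(f @ c)]           # most similar to the centroid
    return m_c * cam_centroid + (1 - m_c) * p_k

cluster = np.array([[1., 0.], [0.9, 0.1], [0.1, 0.9]])
centroid = cluster.mean(axis=0)
updated = momentum_update_cluster(cluster, centroid)
new_V = updated.mean(axis=0)   # pedestrian centroid recomputed from updated instances
```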
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111430274.0A CN114187655B (en) | 2021-11-29 | 2021-11-29 | Unsupervised pedestrian re-recognition method based on joint training strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114187655A CN114187655A (en) | 2022-03-15 |
CN114187655B true CN114187655B (en) | 2024-08-13 |
Family
ID=80602835
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111430274.0A Active CN114187655B (en) | 2021-11-29 | 2021-11-29 | Unsupervised pedestrian re-recognition method based on joint training strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114187655B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476168A (en) * | 2020-04-08 | 2020-07-31 | 山东师范大学 | Cross-domain pedestrian re-identification method and system based on three stages |
CN112836675A (en) * | 2021-03-01 | 2021-05-25 | 中山大学 | Unsupervised pedestrian re-identification method and system based on clustering-generated pseudo label |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906606B (en) * | 2021-03-05 | 2024-04-02 | 南京航空航天大学 | Domain self-adaptive pedestrian re-identification method based on mutual divergence learning |
CN113065409A (en) * | 2021-03-09 | 2021-07-02 | 北京工业大学 | Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||