CN114187655B - Unsupervised pedestrian re-recognition method based on joint training strategy - Google Patents
- Publication number: CN114187655B
- Application number: CN202111430274.0A
- Authority: CN (China)
- Prior art keywords: pedestrian, camera, features, centroid, feature
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/22 — Pattern recognition; analysing; matching criteria, e.g. proximity measures
- G06F18/2321 — Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06T7/66 — Image analysis; analysis of geometric attributes of image moments or centre of gravity
- Y02T10/40 — Climate change mitigation technologies related to transportation; engine management systems
Abstract
The invention belongs to the fields of artificial intelligence and pedestrian re-identification, and discloses an unsupervised pedestrian re-identification method based on a joint training strategy. To address the large inter-camera domain gap, a method for learning inter-camera invariance features is proposed, which aims to give the model the ability to distinguish invariant features under different cameras. The method comprises the following steps: extracting pedestrian image features; clustering and assigning pseudo-labels; computing pedestrian centroids and camera centroids; mining edge features and inter-camera invariance features; updating pedestrian instance features and camera centroids; and updating the model parameters with a contrastive loss. With this method, label noise can be effectively reduced, the inter-camera domain gap is narrowed, and pedestrian re-identification accuracy is significantly improved. The unsupervised pedestrian re-identification method based on the joint training strategy can be widely applied in the field of pedestrian re-identification.
Description
Technical Field
The invention belongs to the fields of artificial intelligence and pedestrian re-identification, and particularly relates to an unsupervised pedestrian re-identification method based on a joint training strategy.
Background
Pedestrian re-identification matches pedestrian images across cameras, retrieving the images that share the identity of a given query image. The technology plays a vital role in fields such as smart cities and intelligent security, and can be applied to tracking criminal suspects, searching for missing persons, pedestrian flow statistics, and similar tasks.
In recent years, supervised pedestrian re-identification has advanced greatly, but the demands of large-scale surveillance systems, the continual growth of surveillance data, and high annotation costs mean that dependence on large numbers of manual labels severely limits its application. Unsupervised pedestrian re-identification has therefore attracted increasing attention: it can learn directly from unlabeled data, scales better, and has great application value in industry.
Current research on unsupervised pedestrian re-identification generally falls into three categories: (1) unsupervised domain-adaptive methods, which align the feature distributions of a source domain and a target domain; (2) camera-aware methods, which teach the model to distinguish sample features under different cameras; (3) clustering methods, which generate pseudo-labels for training on the target domain by assigning the same pseudo-label to similar images. The first category treats unsupervised pedestrian re-identification as a transfer-learning task: it uses both source- and target-domain datasets and employs the labeled source-domain data to assist training. The latter two categories train re-identification models in a fully unsupervised manner. Compared with unsupervised domain-adaptive methods, fully unsupervised methods have greater application value. When the feature distributions of the source and target domains differ substantially, high-quality pseudo-labels are hard to obtain, and the resulting label noise degrades performance. Moreover, labeled samples are difficult to obtain in practice, which further limits the domain-adaptive approach. Fully unsupervised methods train the deep model using only unlabeled images, so they have more practical value in industry and wider applicability. The present invention targets fully unsupervised pedestrian re-identification and proposes an unsupervised pedestrian re-identification method based on a joint training strategy.
Popular unsupervised pedestrian re-identification methods in recent years mainly assign pseudo-labels to unlabeled samples with a clustering algorithm, update an instance feature memory, compute centroids, and finally optimize the model with a contrastive learning loss. Contrastive learning has shown good performance in this field. Ge et al. propose a self-paced contrastive learning framework that dynamically updates a hybrid feature memory containing source- and target-domain dataset features and then performs contrastive learning (Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, et al. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. In Advances in Neural Information Processing Systems, NeurIPS 2020: 11309–11321). The appearance of a person varies from camera view to camera view with changes in viewpoint, lighting conditions, background, and so on. In general, pedestrians of the same identity look highly similar within the same camera view but differ markedly in appearance across cameras, so reducing the domain gap introduced by cameras is one of the research hotspots of unsupervised pedestrian re-identification. Current research usually works at the training level to make the model learn inter-camera invariance features. Yang et al. propose a camera-aware learning scheme to mitigate the negative effects of noisy samples and learn inter-camera invariance features (Fengxiang Yang, Zhun Zhong, Zhiming Luo, et al. Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021: 4855–4864). While effective, existing approaches ignore two important factors.
(1) The influence of label noise. In each iteration, the instance features are updated continuously, which inevitably introduces label noise; accurately updating the instance features can therefore optimize their cluster distribution and reduce the influence of label noise.
(2) Learning of inter-camera invariance features. Inter-camera invariance is characterized by the hardest-to-distinguish samples of the same identity in each camera, where the inter-camera domain gap is large. A clustering algorithm struggles to group hard samples of the same identity from all cameras into the same cluster, and unsupervised pedestrian re-identification lacks ground-truth labels, so truly supervised learning is impossible and the model cannot effectively learn inter-camera invariance features. The present invention aims to solve these two key problems of unsupervised pedestrian re-identification.
For the first problem, this patent proposes a centralized momentum update strategy that optimizes the cluster distribution to reduce the impact of label noise.
For the second problem, the invention provides an inter-camera invariance feature learning method that teaches the model to distinguish invariant feature samples under different cameras, thereby reducing the inter-camera domain gap and improving the performance of unsupervised pedestrian re-identification.
Disclosure of Invention
To solve the above technical problems, the invention provides an unsupervised pedestrian re-identification method based on a joint training strategy. The method optimizes the global cluster distribution with a centralized momentum update strategy, and uses inter-camera invariance feature learning to teach the model to distinguish invariant feature samples under different cameras, thereby reducing the influence of label noise and the inter-camera domain gap and improving the performance of unsupervised pedestrian re-identification.
An unsupervised pedestrian re-identification method based on a joint training strategy comprises the following steps:
Step 1: divide the pedestrian images into a training set and a test set;
Step 2: extract the pedestrian features of the training set using a CNN loaded with a pre-trained model;
Step 3: compute the similarity between pedestrian features, cluster the features with a density clustering algorithm, and generate pseudo-labels;
Step 4: remove outlier features and construct a new pedestrian training set from the pseudo-labels and the corresponding pedestrian features;
Step 5: initialize the pedestrian centroids and camera centroids from the training set constructed in step 4. The arithmetic mean of the pedestrian features sharing the same pseudo-label is computed as the pedestrian centroid, and the arithmetic mean of the pedestrian features sharing the same camera ID is computed as the camera centroid.
Step 6: extract the pedestrian features of the training set constructed in step 4 with the ResNet network. Then execute the centralized momentum update strategy, whose specific content is: compute the similarity between the pedestrian features sharing a pseudo-label and the corresponding pedestrian centroid obtained in step 5, and take the feature with the lowest similarity as the edge feature; use the edge feature to update all pedestrian instance features under the same pseudo-label. Then perform inter-camera invariance feature learning, whose specific content is: first compute the similarity between the pedestrian features sharing a camera ID and the corresponding camera centroid obtained in step 5, take the feature with the highest similarity as the camera-invariance feature, and then use it to update all pedestrian instance features under the same pseudo-label and the camera centroid of the same camera ID. Finally, compute the pedestrian centroids from the updated pedestrian instance features.
Step 7: draw pedestrian query samples from the training set constructed in step 4, compute the contrastive learning loss between these samples and the pedestrian centroids obtained in step 6, and update the parameters of the model.
Step 8: input the test-set images into the best CNN model trained in step 7 to extract their pedestrian features, then compute the feature distances between the query-set and test-set pedestrian images to obtain the unsupervised pedestrian re-identification result.
Further, the specific process of step 2 is as follows: the chosen CNN is ResNet, loaded with the ImageNet pre-trained model and with its last classification layer deleted. All images of the training set are input into ResNet; denoting the pedestrian feature of the i-th image by φ(x_i), the features form a feature space V = {φ(x_1), φ(x_2), ..., φ(x_n)}.
Further, the specific process of step 3 is as follows:
The similarity between pedestrian features is computed with the Jaccard similarity formula:

J(s_i, s_j) = |s_i ∩ s_j| / |s_i ∪ s_j|    (1)

where s_i and s_j denote the i-th and j-th pedestrian features, J(s_i, s_j) denotes the similarity between the pedestrian features s_i and s_j, ∩ denotes set intersection, and ∪ denotes set union.
After obtaining the similarities between pedestrian features, density clustering assigns a pseudo-label from Y = {y_1, y_2, ..., y_M, y_{M+1}, ..., y_N} to each pedestrian feature in X = {x_1, x_2, ..., x_M, x_{M+1}, ..., x_N}.
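As an illustration of the set form of Jaccard similarity, a minimal numpy sketch is given below; the toy feature values, the choice k = 2, and the helper names (`knn_sets`, `jaccard_similarity`) are illustrative assumptions, not part of the patent, which may in practice operate on k-reciprocal neighbor sets.

```python
import numpy as np

def jaccard_similarity(neighbors_i, neighbors_j):
    """J(s_i, s_j) = |s_i ∩ s_j| / |s_i ∪ s_j| over neighbor index sets."""
    si, sj = set(neighbors_i), set(neighbors_j)
    return len(si & sj) / len(si | sj)

def knn_sets(features, k):
    """Index set of the k nearest neighbors (Euclidean) of each feature, self excluded."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    return [np.argsort(row)[1:k + 1] for row in d]  # position 0 is the point itself

# Toy features: three points near the origin, two near (5, 5)
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
sets_ = knn_sets(features, k=2)
print(round(jaccard_similarity(sets_[0], sets_[1]), 2))  # → 0.33
```

Features whose neighbor sets overlap heavily score close to 1 and end up in the same density cluster.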
Further, the specific process of step 4 is as follows:
Step 3 yields the pseudo-labels Y = {y_1, ..., y_M, y_{M+1}, ..., y_N} and the corresponding pedestrian features X = {x_1, ..., x_M, x_{M+1}, ..., x_N} of the training set, where {y_{M+1}, ..., y_N} are the outlier pseudo-labels produced by clustering and {x_{M+1}, ..., x_N} are the corresponding outlier features. The training set with outliers removed is X = {x_1, x_2, ..., x_M}, with corresponding pseudo-labels Y = {y_1, y_2, ..., y_M}.
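The outlier-removal step can be sketched as follows, assuming the density clustering marks outliers with the label -1 (the convention of DBSCAN-style algorithms); the toy data are illustrative.

```python
import numpy as np

# Pseudo-labels as produced by a density clustering step; -1 marks outliers
# (the -1 convention follows DBSCAN-style algorithms and is an assumption here).
pseudo_labels = np.array([0, 0, 1, -1, 1, -1, 2])
features = np.arange(14, dtype=float).reshape(7, 2)

keep = pseudo_labels != -1          # drop outlier features and their labels
train_features = features[keep]     # new pedestrian training set
train_labels = pseudo_labels[keep]  # corresponding pseudo-labels
print(train_labels.tolist())  # → [0, 0, 1, 1, 2]
```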
Further, the specific process of step 5 is as follows:
Step 4 yields the outlier-free training set X = {x_1, x_2, ..., x_M} and pseudo-labels Y = {y_1, y_2, ..., y_M}. The pedestrian features X_i = {x_1^i, x_2^i, ..., x_n^i} of the i-th pseudo-label are extracted, and the mean of this feature set is computed as the i-th pedestrian centroid:

V_i = (1 / |α_i|) Σ_{x_n^i ∈ α_i} x_n^i    (2)

where x_n^i is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set; α_i denotes all pedestrian instance features under the i-th cluster set; |·| denotes the number of pedestrian instance features in the cluster set; and V_i is the i-th pedestrian centroid.
The pedestrian features Y_k = {y_1^k, y_2^k, ..., y_m^k} under the k-th camera are extracted, and the mean of this feature set is computed as the k-th camera centroid:

C_k = (1 / |β_k|) Σ_{y_n^k ∈ β_k} y_n^k    (3)

where y_n^k is a d-dimensional vector, the n-th instance in the k-th camera set; β_k denotes all camera instance features under the k-th camera set; |·| denotes the number of camera instance features under the camera; and C_k is the k-th camera centroid.
Further, the specific process of step 6 is as follows:
Step 5 yields the pedestrian centroids V and camera centroids C, and step 4 yields the outlier-free pedestrian features X. The features X are input into the ResNet network for feature extraction, after which the centralized momentum update strategy is executed: first, the similarity between each extracted feature and the pedestrian centroid of its pseudo-label is computed, and the edge feature with the lowest similarity is selected to update all pedestrian instance features under that pseudo-label. Inter-camera invariance feature learning is then performed: the similarity between the pedestrian features under each camera ID and the corresponding camera centroid is computed, the feature with the highest similarity is selected as the camera-invariance feature, and this feature is used to update all pedestrian instance features under the same pseudo-label and the camera centroid of the same camera ID. Finally, the pedestrian centroids are recomputed from the updated instance features. The update formulas are as follows:

x_n^i ← m x_n^i + (1 - m) q_i    (4)
x_n^i ← m x_n^i + (1 - m) p_k    (5)
C_k ← m_c C_k + (1 - m_c) p_k    (6)
V_i = (1 / |α_i|) Σ_{x_n^i ∈ α_i} x_n^i

where V_i is the i-th pedestrian centroid and C_k is the k-th camera centroid; m is the momentum update parameter and m_c is the camera momentum update parameter; α_i denotes all pedestrian instance features under the i-th cluster set and β_k denotes all camera instance features under the k-th camera set; p_k denotes the invariance feature of the k-th camera set and q_i denotes the edge feature of the i-th cluster set; x_n^i is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set, i.e., any one of the features under that cluster set.
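The edge-feature mining with momentum update, and the camera centroid update, might be sketched as below; the use of cosine similarity, the momentum values, and the toy data are illustrative assumptions consistent with the step described above.

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def centralized_momentum_update(cluster_feats, V_i, m):
    """Mine the edge feature q_i (lowest similarity to the pedestrian centroid V_i)
    and pull every instance feature of the cluster toward it: x <- m*x + (1-m)*q_i."""
    sims = [cos(x, V_i) for x in cluster_feats]
    q = cluster_feats[int(np.argmin(sims))]   # edge feature q_i
    return m * cluster_feats + (1 - m) * q

def camera_centroid_update(cam_feats, C_k, m_c):
    """Mine the camera-invariance feature p_k (highest similarity to the camera
    centroid C_k) and update the centroid with momentum: C_k <- m_c*C_k + (1-m_c)*p_k."""
    sims = [cos(y, C_k) for y in cam_feats]
    p = cam_feats[int(np.argmax(sims))]       # invariance feature p_k
    return m_c * C_k + (1 - m_c) * p

cluster = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])  # last row is the edge
V = cluster.mean(axis=0)
updated = centralized_momentum_update(cluster, V, m=0.9)
```

After the update, every instance feature of the cluster has drifted slightly toward the hardest (edge) member, tightening the cluster against label noise.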
Further, the specific process of step 7 is as follows:
The contrastive learning loss is computed between the pedestrian query samples drawn from the training set X of step 4 and the pedestrian centroids obtained in step 6:

L = -log( exp(f · V_+ / τ) / Σ_{k=1}^{K} exp(f · V_k / τ) )    (7)

where τ is a temperature hyper-parameter, f is a pedestrian query sample, V_+ is the pedestrian centroid of the positive class, and K is the number of cluster categories. The objective of optimizing the model parameters is to increase the similarity between a pedestrian query instance and the pedestrian centroid of its own pseudo-label while decreasing its similarity to the centroids of other pseudo-labels.
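This centroid-based contrastive loss has the InfoNCE form; a numerically stable numpy sketch follows, with toy centroids and an assumed temperature τ = 0.05 (the value is illustrative, not taken from the patent).

```python
import numpy as np

def contrastive_loss(f, centroids, pos_idx, tau=0.05):
    """L = -log( exp(f·V+ / tau) / sum_k exp(f·V_k / tau) )  (InfoNCE form)."""
    logits = centroids @ f / tau
    logits -= logits.max()  # subtract max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[pos_idx]

f = np.array([1.0, 0.0])                              # pedestrian query feature
V = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])   # V[0] is the positive centroid
loss = contrastive_loss(f, V, pos_idx=0)              # near zero: query matches V+
```

Minimizing this loss pulls the query toward its own pseudo-label centroid and pushes it away from the others.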
The beneficial effects of the invention are as follows: the invention uses the pedestrian centroid to mine edge features and uses them to update the pedestrian instance features, which reduces the influence of label noise on the cluster distribution. The camera centroid is used to mine inter-camera invariance features, and these features update all pedestrian instance features under the same pseudo-label, which narrows the inter-camera domain gap of the cluster distribution and improves the model's ability to distinguish the same pedestrian identity under different cameras. Finally, the model parameters are optimized with the contrastive learning loss, effectively improving pedestrian re-identification accuracy.
Drawings
FIG. 1 is a flow chart of the unsupervised pedestrian re-identification method based on a joint training strategy of the present invention;
FIG. 2 is a flowchart of training steps according to an embodiment of the present invention;
FIG. 3 is a flow chart of testing steps according to an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and a specific embodiment; the invention includes but is not limited to the following embodiment.
As shown in FIG. 1, the invention provides an unsupervised pedestrian re-identification method based on a joint training strategy, implemented as follows:
1. Pedestrian image feature extraction
As shown in FIG. 2, the training-set features are extracted with a CNN: denoting the pedestrian feature of the i-th image by φ(x_i), the features form a feature space V = {φ(x_1), φ(x_2), ..., φ(x_n)}. The chosen CNN is ResNet, loaded with the ImageNet pre-trained model and with its last classification layer deleted.
2. Cluster allocation pseudo tag
The similarity between the pedestrian features extracted in step 2 is computed with the Jaccard similarity formula:

J(s_i, s_j) = |s_i ∩ s_j| / |s_i ∪ s_j|

where s_i and s_j denote the i-th and j-th pedestrian features, J(s_i, s_j) denotes the similarity between the pedestrian features s_i and s_j, ∩ denotes set intersection, and ∪ denotes set union.
After obtaining the similarities between pedestrian features, density clustering assigns a pseudo-label from Y = {y_1, y_2, ..., y_M, y_{M+1}, ..., y_N} to each pedestrian feature in X = {x_1, x_2, ..., x_M, x_{M+1}, ..., x_N}.
3. Generating a training set of pseudo tags
Step 3 yields the pseudo-labels Y = {y_1, ..., y_M, y_{M+1}, ..., y_N} and the corresponding pedestrian features X = {x_1, ..., x_M, x_{M+1}, ..., x_N}; the training set with outliers removed is X = {x_1, x_2, ..., x_M}, with corresponding pseudo-labels Y = {y_1, y_2, ..., y_M}.
4. Centroid initialization and updating of camera centroids and pedestrian instance features
From the pedestrian training set X = {x_1, x_2, ..., x_M} constructed in step 4, the pedestrian features X_i = {x_1^i, x_2^i, ..., x_n^i} under the i-th pseudo-label are extracted and the pedestrian centroids V = {V_1, V_2, ..., V_n} are initialized. The pedestrian features Y_k = {y_1^k, y_2^k, ..., y_m^k} under the k-th camera are extracted and the camera centroids C = {C_1, C_2, ..., C_n} are initialized. The centroid calculation formulas are as follows:

V_i = (1 / |α_i|) Σ_{x_n^i ∈ α_i} x_n^i
C_k = (1 / |β_k|) Σ_{y_n^k ∈ β_k} y_n^k

where x_n^i is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set; α_i denotes all pedestrian instance features under the i-th cluster set, and |·| denotes the number of instance features in the set; y_n^k is a d-dimensional vector, the n-th instance in the k-th camera set; β_k denotes all camera instance features under the k-th camera set; V_i is the i-th pedestrian centroid and C_k is the k-th camera centroid.
The pedestrian features obtained in step 4 are input into the ResNet network for feature extraction. The similarity between the pedestrian centroid V obtained in step 5 and the pedestrian features under the same pseudo-label is computed; the feature with the lowest similarity is taken as the edge feature and used to update all pedestrian instance features under that pseudo-label. At the same time, the similarity between the pedestrian features under the same camera ID and the camera centroid is computed; the feature with the highest similarity is taken as the camera-invariance feature and used to update all pedestrian instance features under the same pseudo-label and the camera centroid of the same camera ID. Finally, the pedestrian centroids are recomputed from the updated pedestrian instance features. The calculation formulas are as follows:

x_n^i ← m x_n^i + (1 - m) q_i
x_n^i ← m x_n^i + (1 - m) p_k
C_k ← m_c C_k + (1 - m_c) p_k    (14)
V_i = (1 / |α_i|) Σ_{x_n^i ∈ α_i} x_n^i

where V_i is the i-th pedestrian centroid and C_k is the k-th camera centroid; m is the momentum update parameter and m_c is the camera momentum update parameter; α_i denotes all pedestrian instance features under the i-th cluster set and β_k denotes all camera instance features under the k-th camera set; p_k denotes the invariance feature of the k-th camera set and q_i denotes the edge feature of the i-th cluster set; x_n^i is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set.
5. Contrast loss training network
The contrastive learning loss is computed between the pedestrian centroids updated in step 6 and the pedestrian query samples drawn from the training set X of step 4:

L = -log( exp(f · V_+ / τ) / Σ_{k=1}^{K} exp(f · V_k / τ) )

where τ is the temperature hyper-parameter, f is a pedestrian query sample, V_+ is the pedestrian centroid of the positive class, and K is the number of cluster categories. The loss function optimizes the model parameters so as to increase the similarity between a pedestrian query instance and the pedestrian centroid of its own pseudo-label while decreasing its similarity to the centroids of other pseudo-labels.
6. Test set pedestrian retrieval
As shown in FIG. 3, the test-set pedestrian images are input into the best ResNet obtained by the training of step 7 for feature extraction, and the unsupervised pedestrian re-identification result is obtained by computing the distance between the pedestrian features of the query set and the test set. For example, the Euclidean distance can measure the distance between pedestrian features: the smaller the Euclidean distance between two features, the more similar the two pedestrian images and the higher the probability that they depict the same pedestrian, which finally yields the unsupervised pedestrian re-identification result.
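Ranking the gallery (test set) by Euclidean distance to a query feature can be sketched as follows; the toy feature vectors and the helper name `rank_gallery` are illustrative.

```python
import numpy as np

def rank_gallery(query, gallery):
    """Return gallery indices sorted by Euclidean distance to the query feature
    (smallest distance first, i.e. most likely the same pedestrian identity)."""
    d = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(d)

query = np.array([0.0, 1.0])
gallery = np.array([[0.0, 0.9], [1.0, 1.0], [5.0, 5.0]])
print(rank_gallery(query, gallery).tolist())  # → [0, 1, 2]
```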
In summary, the invention discloses an unsupervised pedestrian re-identification method based on a joint training strategy. The camera centroid and the pedestrian centroid are used to jointly mine edge features and inter-camera invariance samples, the mined features update the pedestrian instance features, and the model parameters are then optimized through contrastive learning. This reduces the influence of label noise and inter-camera domain gaps on the cluster distribution, enables the model to distinguish invariant feature samples across cameras, and improves the performance of unsupervised pedestrian re-identification.
First, the pedestrian centroid is used to mine the features at the cluster edge, and these features update the pedestrian instance features, reducing label noise and improving the clustering.
Second, the camera centroid is used to mine inter-camera invariance features, which update all pedestrian instance features under the same pseudo-label; the model thus learns the distribution of inter-camera invariance features, narrowing the inter-camera domain gap.
Finally, by jointly updating the pedestrian instance features with the edge features and the inter-camera invariance features, the cluster distribution gradually approaches the global optimum, and contrastive learning lets the model learn more robust features, effectively improving pedestrian re-identification accuracy.
Claims (2)
1. An unsupervised pedestrian re-identification method based on a joint training strategy, characterized by comprising the following steps:
Step 1: divide the pedestrian images into a training set and a test set;
Step 2: extract the pedestrian features of the training set using a CNN loaded with a pre-trained model;
the chosen CNN is ResNet, loaded with the ImageNet pre-trained model and with its last classification layer deleted; all images of the training set are input into ResNet, and with the pedestrian feature of the i-th image denoted φ(x_i), a feature space V = {φ(x_1), φ(x_2), ..., φ(x_n)} is formed;
Step 3: compute the similarity between pedestrian features with the Jaccard similarity formula, cluster the features with a density clustering algorithm, and generate pseudo-labels;
the similarity between pedestrian features is computed with the Jaccard similarity formula:

J(s_i, s_j) = |s_i ∩ s_j| / |s_i ∪ s_j|

where s_i and s_j denote the i-th and j-th pedestrian features, J(s_i, s_j) denotes the similarity between the pedestrian features s_i and s_j, ∩ denotes set intersection, and ∪ denotes set union;
after obtaining the similarities between pedestrian features, density clustering assigns a pseudo-label from Y = {y_1, y_2, ..., y_M, y_{M+1}, ..., y_N} to each pedestrian feature in X = {x_1, x_2, ..., x_M, x_{M+1}, ..., x_N};
step 4: removing outlier features, and constructing a new pedestrian training set by using the pseudo tag and the corresponding pedestrian features;
Step 3 yields the pseudo tags Y = {y1, y2, ..., yM, yM+1, ..., yN} and the corresponding pedestrian features X = {x1, x2, ..., xM, xM+1, ..., xN} of the training set, where {yM+1, ..., yN} are the outlier pseudo tags produced by clustering and {xM+1, ..., xN} the corresponding outlier features; the training set with outliers removed is X = {x1, x2, ..., xM}, with corresponding pseudo tags Y = {y1, y2, ..., yM};
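Step 4's outlier removal can be sketched in a few lines, assuming the density clustering marks outliers with pseudo tag -1 (as DBSCAN does for noise); the variable names are illustrative, not from the patent.

```python
import numpy as np

def remove_outliers(features, pseudo_labels):
    """Keep only features whose pseudo tag is not the outlier marker -1."""
    keep = pseudo_labels != -1
    return features[keep], pseudo_labels[keep]

feats = np.arange(10, dtype=float).reshape(5, 2)
labels = np.array([0, -1, 1, 0, -1])
clean_feats, clean_labels = remove_outliers(feats, labels)
```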
Step 5: initializing the pedestrian centroids and camera centroids from the pedestrian training set constructed in step 4; the arithmetic mean of the pedestrian features sharing a pseudo tag is taken as a pedestrian centroid, and the arithmetic mean of the pedestrian features sharing a camera ID is taken as a camera centroid;
The training set X = {x1, x2, ..., xM} with outliers removed and the pseudo tags Y = {y1, y2, ..., yM} are obtained through step 4; the pedestrian features Xi = {x1, x2, ..., xn} of the i-th pseudo tag are extracted, and the mean of this feature set is calculated as the i-th pedestrian centroid:

Vi = (1/|αi|) Σ_{xn∈αi} xn (2)

where xn is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set; αi denotes all pedestrian instance features under the i-th cluster set, |·| denotes the number of pedestrian instance features in the set, and Vi is the i-th class pedestrian centroid;
The pedestrian features Yk = {y1, y2, ..., ym} under the k-th camera are extracted, and the mean of this feature set is calculated as the k-th camera centroid:

Ck = (1/|βk|) Σ_{yn∈βk} yn (3)

where yn is a d-dimensional vector, the n-th instance in the k-th camera set; βk denotes all camera instance features under the k-th camera set, |·| denotes the number of camera instance features under the camera, and Ck is the k-th camera centroid;
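Both centroid initializations of step 5 are plain per-group means, so one helper covers them; this is a minimal numpy sketch with illustrative data, not the patent's implementation.

```python
import numpy as np

def init_centroids(features, ids):
    """Arithmetic mean of the features sharing each ID value (pseudo tag or camera ID)."""
    return {i: features[ids == i].mean(axis=0) for i in np.unique(ids)}

feats = np.array([[1., 1.], [3., 3.], [0., 2.], [10., 10.]])
pseudo_labels = np.array([0, 0, 0, 1])
cam_ids = np.array([0, 1, 0, 1])

V = init_centroids(feats, pseudo_labels)   # pedestrian centroids, one per pseudo tag
C = init_centroids(feats, cam_ids)         # camera centroids, one per camera ID
```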
Step 6: extracting the pedestrian features of the training set constructed in step 4 with the ResNet network, then executing the centralized momentum update strategy, which is as follows: the similarity between the pedestrian features sharing a pseudo tag and the pedestrian centroid obtained in step 5 is calculated, and the pedestrian feature with the minimum similarity is taken as the edge feature; all pedestrian instance features under the same pseudo tag are updated with this edge feature; camera invariance feature learning is then performed, as follows: the similarity between the pedestrian features sharing a camera ID and the camera centroid obtained in step 5 is calculated, the pedestrian feature with the maximum similarity is taken as the camera invariance feature, and this feature is used to update all pedestrian instance features under the same pseudo tag as well as the camera centroid of the same camera ID; finally, the pedestrian centroid is recalculated from the updated pedestrian instance features;
Step 7: extracting a pedestrian query sample from the pedestrian training set constructed in step 4, calculating the contrastive learning loss between this sample and the pedestrian centroids obtained in step 6, and updating the parameters of the model;
The contrastive learning loss between the pedestrian centroids obtained in step 5 and a pedestrian query sample from the training set X is calculated as:

L = −log( exp(f·V+/τ) / Σ_{i=1}^{K} exp(f·Vi/τ) ) (4)

where τ is the temperature hyperparameter, f is the pedestrian query sample, V+ is the positive-sample pedestrian centroid, and K is the number of cluster categories; optimizing the model parameters with this loss function increases the similarity between a pedestrian query instance and the pedestrian centroid of its own pseudo tag while decreasing its similarity to the pedestrian centroids of the other pseudo tags;
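The loss of step 7 is an InfoNCE-style objective over cluster centroids; a minimal numpy sketch, assuming L2-normalized features and cosine similarity (details the patent leaves open):

```python
import numpy as np

def centroid_contrastive_loss(f, centroids, pos_idx, tau=0.05):
    """-log softmax similarity of query f against K centroids; pos_idx is V+."""
    f = f / np.linalg.norm(f)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = c @ f / tau          # similarity to each of the K centroids
    logits -= logits.max()        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[pos_idx])

centroids = np.array([[1., 0.], [0., 1.], [-1., 0.]])
query = np.array([1., 0.1])
loss_good = centroid_contrastive_loss(query, centroids, pos_idx=0)
loss_bad = centroid_contrastive_loss(query, centroids, pos_idx=2)
```

The loss is near zero when the query already lies next to its positive centroid and grows large when it is assigned to a distant one, which is exactly the gradient signal described above.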
Step 8: inputting the test-set images into the optimal CNN model obtained by the training of step 7 to extract their pedestrian features; the feature distances between the pedestrian images of the query set and of the test set are calculated to obtain the unsupervised pedestrian re-identification result.
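The retrieval in step 8 can be sketched as ranking the gallery (test set) for each query by feature distance; this sketch assumes cosine distance, one common choice the claim does not pin down.

```python
import numpy as np

def rank_gallery(query_feats, gallery_feats):
    """Per-query ranking of gallery indices by ascending cosine distance."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dist = 1.0 - q @ g.T              # cosine distance matrix (Q x G)
    return np.argsort(dist, axis=1)   # column 0 holds each query's best match

query = np.array([[1., 0.], [0., 1.]])
gallery = np.array([[0., 1.], [0.9, 0.1], [-1., 0.]])
ranks = rank_gallery(query, gallery)
```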
2. The unsupervised pedestrian re-identification method based on a joint training strategy according to claim 1, wherein in step 6, given the pedestrian centroids V and camera centroids C obtained in step 5 and the pedestrian features X with outliers removed obtained in step 4, the pedestrian features X are input into the ResNet network for feature extraction, and the centralized momentum update strategy is then executed: first, the similarity between the extracted features and the pedestrian centroid of the same pseudo tag is calculated, and the edge feature with the minimum similarity is selected to update all pedestrian instance features under that pseudo tag; next, inter-camera invariance feature learning is performed: the similarity between the pedestrian features under the same camera ID and the camera centroid is calculated, the pedestrian feature with the maximum similarity is selected as the camera invariance feature, and this feature is used to update all pedestrian instance features under the same pseudo tag and the camera centroid of the same camera ID; finally, the pedestrian centroid is recalculated from the updated pedestrian instance features; the update formulas are:

x̃ ← m·x̃ + (1−m)·ei (5)

Ck ← mc·Ck + (1−mc)·pk (6)

where Vi is the i-th class pedestrian centroid, Ck is the k-th camera centroid, m is the momentum update parameter, mc is the camera momentum update parameter, αi denotes all pedestrian instance features under the i-th cluster set, βk denotes all camera instance features under the k-th camera set, pk denotes the invariance feature of the k-th camera set, ei denotes the edge feature of the i-th cluster set, xn is a d-dimensional vector, the n-th pedestrian instance feature in the i-th cluster set, and x̃ denotes any pedestrian instance feature under the i-th cluster set; the updated pedestrian centroid is then Vi = (1/|αi|) Σ_{xn∈αi} xn.
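The centralized momentum update of claim 2 can be sketched as follows: pick the edge feature (the instance least similar to its pedestrian centroid), blend it into every instance feature of the cluster with momentum m, and update the camera centroid toward the camera-invariance feature pk with momentum mc, as in formula (6). This is a hedged illustration with toy values; similarity is taken as cosine, which the claim does not specify.

```python
import numpy as np

def momentum_update_cluster(instance_feats, centroid, m=0.9):
    """Blend the cluster's edge feature into every instance feature."""
    f = instance_feats / np.linalg.norm(instance_feats, axis=1, keepdims=True)
    c = centroid / np.linalg.norm(centroid)
    edge = instance_feats[np.argmin(f @ c)]     # least similar to the centroid
    return m * instance_feats + (1 - m) * edge

def momentum_update_camera(cam_centroid, cam_feats, m_c=0.9):
    """Move the camera centroid toward the camera-invariance feature p_k."""
    f = cam_feats / np.linalg.norm(cam_feats, axis=1, keepdims=True)
    c = cam_centroid / np.linalg.norm(cam_centroid)
    p_k = cam_feats[np.argmax(f @ c)]           # most similar to the centroid
    return m_c * cam_centroid + (1 - m_c) * p_k

cluster = np.array([[1., 0.], [0.9, 0.1], [0.1, 0.9]])
centroid = cluster.mean(axis=0)
updated = momentum_update_cluster(cluster, centroid)
new_V = updated.mean(axis=0)   # pedestrian centroid recomputed from updated instances
```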
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111430274.0A CN114187655B (en) | 2021-11-29 | 2021-11-29 | Unsupervised pedestrian re-recognition method based on joint training strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114187655A CN114187655A (en) | 2022-03-15 |
CN114187655B true CN114187655B (en) | 2024-08-13 |
Family
ID=80602835
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111430274.0A Active CN114187655B (en) | 2021-11-29 | 2021-11-29 | Unsupervised pedestrian re-recognition method based on joint training strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114187655B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476168A (en) * | 2020-04-08 | 2020-07-31 | 山东师范大学 | Cross-domain pedestrian re-identification method and system based on three stages |
CN112836675A (en) * | 2021-03-01 | 2021-05-25 | 中山大学 | Unsupervised pedestrian re-identification method and system based on clustering-generated pseudo label |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906606B (en) * | 2021-03-05 | 2024-04-02 | 南京航空航天大学 | Domain self-adaptive pedestrian re-identification method based on mutual divergence learning |
CN113065409A (en) * | 2021-03-09 | 2021-07-02 | 北京工业大学 | Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||