CN114898406A - Unsupervised pedestrian re-identification method based on contrast clustering - Google Patents

Unsupervised pedestrian re-identification method based on contrast clustering

Info

Publication number
CN114898406A
CN114898406A
Authority
CN
China
Prior art keywords
clustering
feature
cluster
storage unit
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210664167.2A
Other languages
Chinese (zh)
Inventor
张远辉
冯化涛
刘康
朱俊江
付铎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Original Assignee
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University filed Critical China Jiliang University
Priority to CN202210664167.2A priority Critical patent/CN114898406A/en
Publication of CN114898406A publication Critical patent/CN114898406A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks


Abstract

The invention discloses an unsupervised pedestrian re-identification method based on contrast clustering, which comprises the following steps: perform forward computation on an unlabeled pedestrian image dataset with an initial feature encoder and initialize a feature storage unit with the encoded features; before each training epoch, cluster the features in the feature storage unit and screen the clustering results with a cluster independence criterion and a cluster compactness criterion; encode each mini-batch of training samples and update the network by back-propagating a unified contrastive loss; dynamically update the instance features in the feature storage unit with the encoded features in a momentum manner; and repeat the updates of the feature encoder and the feature storage unit for a preset number of training epochs until the pedestrian re-identification network converges. By means of contrast clustering, the method fully mines the information available in un-clustered outliers and improves the recognition accuracy of the unsupervised pedestrian re-identification model.

Description

Unsupervised pedestrian re-identification method based on contrast clustering
Technical Field
The invention relates to the field of computer vision and pedestrian re-identification, in particular to an unsupervised pedestrian re-identification method based on contrast clustering.
Background
Pedestrian re-identification, also known as person re-identification, is regarded as a sub-problem of image retrieval whose goal is to retrieve a specific pedestrian across multiple non-overlapping surveillance camera views. Pedestrian re-identification compensates for the limited field of view of a single fixed camera and, combined with pedestrian detection and tracking, is widely applied in security fields such as intelligent surveillance and video tracking. With the development of deep learning and the release of large-scale datasets, the performance of supervised pedestrian re-identification methods has improved greatly; however, algorithms based on supervised learning depend heavily on manually annotated ground-truth labels, which hinders the further development of the technology. On the other hand, large amounts of unlabeled pedestrian image data can easily be collected in practice, so studying how to train a more robust pedestrian re-identification model from large-scale unlabeled pedestrian images is of great research value. Therefore, an unsupervised pedestrian re-identification method that requires no annotation is proposed to solve the above problems.
Unsupervised pedestrian re-identification methods mainly include pseudo-label-based methods and image-generation-based methods, among which clustering-based pseudo-label methods have proven more effective and currently hold state-of-the-art accuracy. Most clustering-based pseudo-label methods train in two steps: first, encode the pedestrian images with an initial feature encoder; second, cluster the encoded features to obtain pseudo labels that supervise the training of the network. Although the quality of the pseudo labels can improve gradually as the model is optimized, training is often disturbed by unavoidable pseudo-label noise, and when the initial pseudo labels are very noisy the model runs a high risk of collapse. In addition, clustering-based pseudo-label methods usually do not use all of the unlabeled training data: density-based clustering algorithms inherently produce clustering outliers, which are generally discarded because no pseudo label can be assigned to them, so they are not used for model training. However, such clustering outliers are often exactly the hard training samples worth mining in a pedestrian dataset. Especially in the early stage of training, a large number of clustering outliers exist; simply discarding them greatly reduces the number of training samples and severely harms the performance of the model.
In recent years, contrastive learning has been widely applied to unsupervised representation learning. Under an unsupervised setting it enables a model to fully learn the similarity between samples of the same class and the differences between classes, treats each unlabeled sample as its own class, and learns discriminative instance-level representations by optimizing a contrastive loss. However, most existing contrastive losses operate at the sample-instance level and have difficulty correctly modeling intra-class relationships in a pedestrian image dataset.
Disclosure of Invention
To solve the above problems, the invention provides an unsupervised pedestrian re-identification method based on contrast clustering, which fully mines the hard training samples in the target-domain dataset and effectively models the intra-class relationships of pedestrians by performing joint contrastive learning on cluster centroids and un-clustered outliers.
Specifically, the technical solution of the unsupervised pedestrian re-identification method based on contrast clustering provided by the invention comprises the following steps:
Step 1: perform forward computation on the pedestrian images in the unlabeled training dataset with an initial feature encoder f_θ, and initialize a class-prototype-based feature storage unit with the encoded features;
Step 2: before each training epoch, cluster the encoded features in the feature storage unit with the DBSCAN clustering algorithm, and screen the clustering results according to a clustering reliability evaluation criterion;
Step 3: for each mini-batch of training samples, encode them with the encoder f_θ to obtain the mini-batch features f, compute the unified contrastive loss between f and the features in the feature storage unit, and update the network by back-propagation;
Step 4: during each training iteration, dynamically update the feature storage unit in a momentum manner with the encoded features obtained from the forward computation of the mini-batch training samples;
Step 5: repeat steps 2 to 4 for a preset number of training epochs until the pedestrian re-identification model converges.
Further, the initialization of the feature encoder and the feature storage unit in step 1 is as follows:
a ResNet-50 deep neural network is used as the feature encoder f_θ and is initialized with weights pre-trained on the ImageNet image dataset;
the feature encoder f_θ performs forward computation on the samples in the pedestrian image dataset to extract features, producing a feature set {v_1, …, v_n}, where n denotes the number of samples in the pedestrian image dataset; all features in the feature set are stored in the feature storage unit on a per-instance basis, so that the class prototypes in the feature storage unit can be continuously updated while the clusters and un-clustered outliers keep changing.
Further, the clustering and screening of the features in step 2 proceeds as follows:
first, the DBSCAN clustering algorithm is used to cluster the feature set {v_1, …, v_n} in the feature storage unit of step 1, and the class prototypes in the feature storage unit are further divided into cluster centroids {c_k | k = 1, …, n_c} and un-clustered outlier instances {v_k | k = 1, …, n_o}, where n_c denotes the number of clusters and n_o denotes the number of un-clustered outliers; the clustering results are screened according to the cluster independence and cluster compactness criteria, and the retrieval results are re-ordered with the k-reciprocal nearest-neighbor algorithm.
Both the clusters and the un-clustered instances in the feature storage unit are treated as equal, independent classes, so clustering reliability is crucial to the training; at the beginning of training the network has poor discriminative ability and the clustering noise is large, so a self-paced learning strategy is proposed to improve the clustering. Specifically, clustering is performed again before each training epoch starts; beginning from the most reliable clusters, reliable clusters are retained and the features in unreliable clusters are dissolved back into un-clustered outlier instances, so that the number of clusters grows gradually; the clustering criterion is alternately loosened and tightened by adjusting the ε-neighborhood distance threshold of the samples in the DBSCAN clustering algorithm to obtain more reliable clustering results.
The cluster independence criterion measures the inter-class distance and is expressed as the intersection-over-union between the feature set of a cluster and the corresponding feature set after the clustering criterion is loosened:

R_indep(f_i) = |I(f_i) ∩ I_loose(f_i)| / |I(f_i) ∪ I_loose(f_i)|

where |·| denotes the number of features in a set, I(f_i) denotes the set of samples in the same cluster as f_i, I_loose(f_i) denotes the set of samples in the same cluster as f_i after the clustering criterion is loosened, and R_indep(f_i) is the independence score of the cluster I(f_i);
the cluster compactness criterion measures the intra-class distance and is expressed as the intersection-over-union between the feature set of a cluster and the corresponding feature set after the clustering criterion is tightened:

R_comp(f_i) = |I(f_i) ∩ I_tight(f_i)| / |I(f_i) ∪ I_tight(f_i)|

where I_tight(f_i) denotes the set of samples in the same cluster as f_i after the clustering criterion is tightened, and R_comp(f_i) is the compactness score of the cluster I(f_i);
the clustering reliability evaluation criterion thus measures both the independence between clusters and the compactness between samples; its premise is that a reliable cluster should remain stable in a multi-scale clustering environment. Hyper-parameters α, β ∈ [0, 1] denote the independence and compactness thresholds: samples satisfying inter-class independence R_indep(f_i) > α and intra-class compactness R_comp(f_i) > β are kept in their clusters, and the remaining samples are divided into un-clustered outliers.
Further, the unified contrastive loss function in step 3 is constructed as follows:
given the unlabeled training samples X = {x_1, …, x_n}, all features are stored in the feature storage unit after being encoded by the feature encoder, and the self-paced learning strategy of step 2 divides the feature set into clustered features and un-clustered outlier features; the whole training dataset is thereby divided into a sample set with clustering pseudo labels X_c and a set of outlier instance samples X_o that do not belong to any cluster, with X = X_c ∪ X_o.
Given a training sample x ∈ X, each training sample is forward-computed by the feature encoder to obtain the encoded feature f, and the unified contrastive loss function is constructed as:

L_f = −log( exp(⟨f, z_+⟩/τ) / ( Σ_{k=1}^{n_c} exp(⟨f, c_k⟩/τ) + Σ_{k=1}^{n_o} exp(⟨f, v_k⟩/τ) ) )

where z_+ is the positive class prototype of the feature f, τ is the temperature coefficient, ⟨·,·⟩ denotes the vector inner product, c_k is the centroid of cluster k and represents the class prototype within that cluster, and v_k is the instance feature of the k-th un-clustered outlier and represents the class prototype of that outlier.
If f belongs to cluster k, then z_+ = c_k, the centroid of cluster k; if f belongs to an un-clustered outlier, then z_+ = v_k, the corresponding outlier instance feature. The above contrastive loss pulls the encoded feature toward its true class: after the features of a mini-batch are encoded, they are compared with the two kinds of class prototypes, so that each training sample is drawn close to the class it belongs to and pushed away from the other classes.
Further, the momentum update of the feature storage unit in step 4 proceeds as follows:
first, features are stored for all training samples on a per-instance basis; then, in each mini-batch, the features are accumulated into the corresponding instance features of the feature storage unit by index in a momentum-update manner;
in the feature storage unit, the cluster centroids {c_k | k = 1, …, n_c} are obtained by averaging the features belonging to the same cluster, while the instance features of the un-clustered outliers {v_k | k = 1, …, n_o} are taken directly from the feature storage unit; the centroid of the k-th cluster is expressed as:

c_k = (1 / |I_k|) Σ_{v_i ∈ I_k} v_i

where I_k denotes the set of feature vectors of the k-th cluster and |·| denotes the number of feature vectors in the set. The instance features {v} in the feature storage unit are initialized once by a forward pass of the network at the beginning and are then continuously updated during training so that more robust clustering can be performed;
in each training epoch, the class prototypes in the feature storage unit are updated in a momentum-update manner with the encoded features of the mini-batch samples; the instance feature v_i and the current sample feature f_i are combined by a momentum-weighted sum to obtain the updated instance feature:

v_i ← m·v_i + (1 − m)·f_i

where m ∈ [0, 1] is the momentum factor that controls the relative weight of the instance feature v_i and the sample feature f_i during the momentum update. Given the updated v_i, if f_i belongs to cluster k, the corresponding cluster centroid c_k is updated accordingly.
Further, the overall training process of the method in step 5 is as follows:
the unlabeled training dataset X = {x_1, …, x_n} is feature-encoded by the ResNet-50 feature encoder, and the resulting feature set is used to initialize the feature storage unit;
in each subsequent training epoch, the feature set {v} in the feature storage unit is clustered and screened according to the cluster independence and cluster compactness criteria, the training samples X are divided into clustered samples X_c and un-clustered outlier samples X_o, and the cluster centroids of X_c are computed;
for the training samples in each mini-batch, the feature encoder f_θ performs feature encoding, the contrastive loss is computed, and back-propagation is performed to update the encoder f_θ;
the feature set {v} in the feature storage unit is updated, and the cluster centroids c_k are updated according to the updated feature set {v};
the feature encoder f_θ and the feature storage unit are updated cyclically until the model converges well.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention provides a feature storage unit that stores all useful information in the unlabeled data domain for learning a fuller feature representation; by dynamically updating the feature storage unit it provides two kinds of supervision, cluster-level and un-clustered-outlier-level, and fully exploits the hard training samples in the unlabeled dataset;
(2) the invention screens the clustering results with a self-paced learning strategy: the network training starts from the most reliable clusters in the feature storage unit, and repeated feature clustering gradually moves more un-clustered outliers into new clusters to obtain more reliable clustering results; this progressively improves the discriminability of the feature representation, effectively alleviates the pseudo-label noise problem, and optimizes the representation learning process;
(3) the invention provides a multi-scale clustering reliability metric composed of the cluster independence and cluster compactness criteria; starting from the most reliable clusters, the number of clusters is increased step by step, and, combined with the easy-to-hard self-paced learning strategy, more reliable clusters are created progressively, realizing the dynamic optimization of the feature storage unit and the pedestrian re-identification network and greatly improving the performance of the unsupervised pedestrian re-identification model.
Drawings
FIG. 1 is a flow chart of an unsupervised pedestrian re-identification method based on contrast clustering according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system of an unsupervised pedestrian re-identification method based on contrast clustering according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a unified contrast loss calculation according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a feature storage unit momentum update according to an embodiment of the present invention.
Detailed description of the preferred embodiments
The following embodiments and accompanying drawings describe the present invention in further detail; they illustrate the invention but do not limit its scope.
System embodiment
Referring to fig. 1 to 4, the present embodiment provides an unsupervised pedestrian re-identification method based on contrast clustering, which includes the following steps:
Step 1: perform forward computation on the pedestrian images in the unlabeled training dataset with an initial feature encoder f_θ, and initialize a class-prototype-based feature storage unit with the encoded features;
Step 2: before each training epoch, cluster the encoded features in the feature storage unit with the DBSCAN clustering algorithm, and screen the clustering results according to a clustering reliability evaluation criterion;
Step 3: for each mini-batch of training samples, encode them with the encoder f_θ to obtain the mini-batch features f, compute the unified contrastive loss between f and the features in the feature storage unit, and update the network by back-propagation;
Step 4: during each training iteration, dynamically update the feature storage unit in a momentum manner with the encoded features obtained from the forward computation of the mini-batch training samples;
Step 5: repeat steps 2 to 4 for the preset number of training epochs until the pedestrian re-identification model converges.
The above steps can be summarized as the following processes: (1) initialization of the feature encoder and the feature storage unit; (2) clustering and screening of the features; (3) computation of the unified contrastive loss; (4) momentum update of the feature storage unit; (5) overall network training.
The following is a detailed description.
(1) Initialization of the feature encoder and the feature storage unit
A ResNet-50 deep neural network is used as the feature encoder f_θ, its parameters are initialized with weights pre-trained on the ImageNet image dataset, and the last fully connected layer is replaced with a batch normalization layer followed by an L2 normalization layer to suit the needs of the unsupervised task;
during training, each mini-batch contains 64 unlabeled pedestrian images belonging to at least 16 different classes; with clusters and un-clustered outliers both treated as independent classes, 4 pedestrian images are sampled for each selected cluster and a single pedestrian image for each selected un-clustered outlier. Before training, every pedestrian image in the dataset is resized to 256 × 128, and data augmentation is applied with random flipping, random cropping, random erasing, and similar operations.
The feature encoder f_θ performs forward computation on the samples in the pedestrian image dataset to extract features, producing a feature set {v_1, …, v_n}, where n denotes the number of samples in the pedestrian image dataset; all features in the feature set are stored in the feature storage unit on a per-instance basis, so that the class features in the feature storage unit can be continuously updated while the clusters and un-clustered outliers keep changing.
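A minimal sketch of the encoder and the one-off initialization of the feature storage unit described above, assuming PyTorch/torchvision; the class and function names, the torchvision weight identifier, and the use of a non-shuffled loader are assumptions of this sketch, not details from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class Encoder(nn.Module):
    """ResNet-50 backbone; the final FC layer is dropped and replaced by
    batch normalization followed by L2 normalization of the feature."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")  # ImageNet pre-training
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])   # keep up to global pooling
        self.bn = nn.BatchNorm1d(feat_dim)

    def forward(self, x):
        f = self.backbone(x).flatten(1)          # (B, 2048)
        return F.normalize(self.bn(f), dim=1)    # L2-normalized feature

@torch.no_grad()
def init_memory(encoder, loader, device="cuda"):
    """Forward the whole unlabeled set once and store one feature per instance
    (a sequential, non-shuffled loader is assumed so rows align with indices)."""
    encoder.eval()
    feats = [encoder(imgs.to(device)).cpu() for imgs, _ in loader]
    return torch.cat(feats)                      # (n, 2048) feature storage unit
```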
(2) Clustering and screening process of features
First, the DBSCAN clustering algorithm is used to cluster the feature set {v_1, …, v_n} in the feature storage unit from step 1, and the class prototypes in the feature storage unit are further divided into cluster centroids {c_k | k = 1, …, n_c} and un-clustered outlier instances {v_k | k = 1, …, n_o}, where n_c denotes the number of clusters and n_o denotes the number of un-clustered outliers; the clustering results are screened according to the cluster independence and cluster compactness criteria, and the retrieval results are re-ordered with the k-reciprocal nearest-neighbor algorithm.
In the DBSCAN algorithm, the ε-neighborhood distance threshold (the maximum neighborhood radius) is set to d = 0.6 and the minimum number of neighborhood points is set to 4, i.e., a cluster contains at least 4 samples. The clustering criterion is alternately loosened and tightened by adjusting the sample ε-neighborhood distance threshold to obtain more reliable clustering results: the criterion is loosened with a neighborhood distance of d = 0.62 and tightened with d = 0.58.
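A sketch of the per-epoch clustering under the parameters above (base ε = 0.6, loosened ε = 0.62, tightened ε = 0.58, at least 4 samples per cluster), assuming scikit-learn's DBSCAN on a precomputed pairwise distance matrix between the stored features (which may first be k-reciprocal re-ranked, as mentioned earlier); the function name is illustrative.

```python
from sklearn.cluster import DBSCAN

def cluster_three_scales(dist):
    """dist: (n, n) precomputed distance matrix between memory features.
    Returns DBSCAN labels under the tightened, base and loosened criteria;
    label -1 marks an un-clustered outlier."""
    labels = {}
    for name, eps in (("tight", 0.58), ("base", 0.60), ("loose", 0.62)):
        labels[name] = DBSCAN(eps=eps, min_samples=4,
                              metric="precomputed").fit_predict(dist)
    return labels
```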
Both the clusters and the un-clustered instances in the feature storage unit are treated as equal, independent classes, so clustering reliability is crucial to the training; at the beginning of training the network has poor discriminative ability and the clustering noise is large, so a self-paced learning strategy is proposed to improve the clustering. Specifically, clustering is performed again before each training epoch starts; beginning from the most reliable clusters, reliable clusters are retained and unreliable clusters are dissolved back into un-clustered outlier instances, so that the number of clusters grows gradually.
The cluster independence criterion measures the inter-class distance and is expressed as the intersection-over-union between the feature set of a cluster and the corresponding feature set after the clustering criterion is loosened:

R_indep(f_i) = |I(f_i) ∩ I_loose(f_i)| / |I(f_i) ∪ I_loose(f_i)|

where |·| denotes the number of features in a set, I(f_i) denotes the set of samples in the same cluster as f_i, I_loose(f_i) denotes the set of samples in the same cluster as f_i after the clustering criterion is loosened, and R_indep(f_i) is the independence score of the cluster I(f_i);
the cluster compactness criterion measures the intra-class distance and is expressed as the intersection-over-union between the feature set of a cluster and the corresponding feature set after the clustering criterion is tightened:

R_comp(f_i) = |I(f_i) ∩ I_tight(f_i)| / |I(f_i) ∪ I_tight(f_i)|

where I_tight(f_i) denotes the set of samples in the same cluster as f_i after the clustering criterion is tightened, and R_comp(f_i) is the compactness score of the cluster I(f_i);
the clustering reliability evaluation criterion thus measures both the independence between clusters and the compactness between samples; its premise is that a reliable cluster should remain stable in a multi-scale clustering environment. Hyper-parameters α, β ∈ [0, 1] denote the independence and compactness thresholds, where α is initialized to 0.9·R_indep-1st (R_indep-1st being the cluster independence value obtained in the first training epoch) and then kept fixed during subsequent training, and β is set to the maximum cluster compactness value R_comp-max over the whole training process so that the most compact samples in each cluster are fully retained. Samples satisfying inter-class independence R_indep(f_i) > α and intra-class compactness R_comp(f_i) > β are kept in their clusters, and the remaining samples are divided into un-clustered outliers.
(3) Unified contrast loss function calculation process
Given the unlabeled training samples X = {x_1, …, x_n}, all features are stored in the feature storage unit after being encoded by the feature encoder, and the self-paced learning strategy divides the feature set into clustered features and un-clustered outlier features; the whole training dataset is thereby divided into a sample set with clustering pseudo labels X_c and a set of outlier instance samples X_o that do not belong to any cluster, with X = X_c ∪ X_o.
The feature storage unit stores the positive class prototype of each sample feature: for a sample from the unlabeled dataset, if the sample lies in a cluster, its positive prototype is the centroid of that cluster; otherwise the sample is a clustering outlier and its positive prototype is the instance feature corresponding to that outlier.
As shown in the schematic diagram of the unified contrastive loss calculation in FIG. 3, given a training sample x ∈ X, each training sample is forward-computed by the feature encoder to obtain the encoded feature f, the similarity is computed as the vector dot product between the feature f and the corresponding class prototypes, and the unified contrastive loss function is constructed as:

L_f = −log( exp(⟨f, z_+⟩/τ) / ( Σ_{k=1}^{n_c} exp(⟨f, c_k⟩/τ) + Σ_{k=1}^{n_o} exp(⟨f, v_k⟩/τ) ) )

where z_+ denotes the positive class prototype of the feature f, τ denotes the temperature coefficient, empirically set to 0.05, ⟨·,·⟩ denotes the vector inner product, c_k is the centroid of cluster k and represents the class prototype within that cluster, and v_k is the instance feature of the k-th un-clustered outlier and represents the class prototype of that outlier.
If f belongs to cluster k, then z_+ = c_k, the centroid of cluster k; if f belongs to an un-clustered outlier, then z_+ = v_k. The above contrastive loss pulls the encoded feature toward its true class: after the features of a mini-batch of samples are encoded, they are compared with the two kinds of class prototypes, so that each training sample is drawn close to the class it belongs to and pushed away from the other classes.
(4) Feature storage unit momentum update procedure
As shown in the momentum-update schematic diagram of the feature storage unit in FIG. 4, features are first stored for all training samples on a per-instance basis; then, in each mini-batch, the features are accumulated into the corresponding instance features of the feature storage unit by index in a momentum-update manner. Momentum updating is widely used in deep-learning optimization algorithms: it retains part of the previous update direction while fine-tuning the final update direction with the features of the samples in the current mini-batch, i.e., the current feature is updated on top of the accumulated previous momentum.
In the feature storage unit, the cluster centroids {c_k | k = 1, …, n_c} are obtained by averaging the features belonging to the same cluster, while the instance features of the un-clustered outliers {v_k | k = 1, …, n_o} are taken directly from the feature storage unit. Without loss of generality, assume the un-clustered outlier features in the feature set {v} are indexed {1, …, n_o}; then the clustered features are indexed {n_o + 1, …, n}, and the centroid of the k-th cluster is expressed as:

c_k = (1 / |I_k|) Σ_{v_i ∈ I_k} v_i

where I_k denotes the set of feature vectors of the k-th cluster and |·| denotes the number of feature vectors in the set. The instance features {v} in the feature storage unit are initialized once by a forward pass of the network at the beginning and are then continuously updated during training so that more robust clustering can be performed;
in each training epoch, the class prototypes in the feature storage unit are updated in a momentum-update manner with the encoded features of the mini-batch samples; the instance feature v_i and the current sample feature f_i are combined by a momentum-weighted sum to obtain the updated instance feature:

v_i ← m·v_i + (1 − m)·f_i

where m ∈ [0, 1] is the momentum factor controlling the relative weight of the instance feature v_i and the sample feature f_i during the momentum update, empirically set to 0.2. Given the updated v_i, if f_i belongs to cluster k, the corresponding cluster centroid c_k is updated accordingly.
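A sketch of the momentum update above with m = 0.2, followed by the refresh of the cluster centroids; re-normalizing the updated instance feature is an extra assumption common to memory banks rather than something stated in the text, and cluster ids are assumed to be contiguous with no empty clusters.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_memory(memory, feats, indices, labels, m=0.2):
    """memory: (n, d) instance features; feats: (B, d) current batch features;
    indices: (B,) instance indices; labels: (n,) cluster id per instance (-1 = outlier)."""
    memory[indices] = m * memory[indices] + (1 - m) * feats     # v_i <- m*v_i + (1-m)*f_i
    memory[indices] = F.normalize(memory[indices], dim=1)       # assumed re-normalization
    n_clusters = int(labels.max().item()) + 1
    if n_clusters == 0:
        return memory.new_zeros((0, memory.size(1)))
    return torch.stack([memory[labels == k].mean(0)             # refreshed centroid c_k
                        for k in range(n_clusters)])
```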
(5) Overall network training process
The unlabeled training dataset X = {x_1, …, x_n} is feature-encoded by the ResNet-50 feature encoder and the resulting feature set is used to initialize the feature storage unit. The model parameters are updated with the Adam optimizer, the weight decay is set to 0.0005, and the number of training epochs is set to 70. Training uses a learning-rate schedule so that the optimization progress is responded to dynamically: the initial learning rate is set to 0.00035 and is divided by 10 every 20 training epochs.
In each subsequent training epoch, the feature set {v} in the feature storage unit is clustered and screened according to the cluster independence and cluster compactness criteria, the training samples X are divided into clustered samples X_c and un-clustered outlier samples X_o, and the cluster centroids of X_c are computed;
for the training samples in each mini-batch, the feature encoder f_θ performs feature encoding, the contrastive loss is computed, and back-propagation is performed to update the encoder f_θ;
the feature set {v} in the feature storage unit is updated, and the cluster centroids c_k are updated according to the updated feature set {v};
the feature encoder f_θ and the feature storage unit are updated cyclically until the model converges well; a sketch of the overall training loop is given at the end of this description.
Finally, it should be noted that this embodiment provides the theoretical premises, implementation steps, and parameter settings of the unsupervised pedestrian re-identification method based on contrast clustering. The embodiment is only a preferred implementation of the invention, and the parameter settings need to be adjusted according to the specific variables and data encountered in practice to achieve better practical results.
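As a closing illustration, a high-level sketch of the overall training loop with the hyper-parameters above (Adam, weight decay 0.0005, 70 epochs, initial learning rate 0.00035 divided by 10 every 20 epochs). The mini-batch loader, the distance computation, and the helper functions from the earlier sketches are assumed to be in scope; all names are illustrative, not the patent's own API.

```python
import torch

def prototype_index(indices, labels, n_clusters, outlier_pos):
    """Row of each batch sample's positive prototype in [centroids; outliers]."""
    rows = []
    for i in indices.tolist():
        k = int(labels[i])
        rows.append(k if k >= 0 else n_clusters + outlier_pos[int(i)])
    return torch.tensor(rows)

def train(encoder, memory, epoch_loader_fn, distance_fn, alpha, beta,
          epochs=70, device="cuda"):
    """memory: (n, d) tensor on `device`; epoch_loader_fn(labels) yields
    class-balanced (images, instance indices) mini-batches; distance_fn(memory)
    returns an (n, n) pairwise distance matrix (optionally re-ranked)."""
    optimizer = torch.optim.Adam(encoder.parameters(), lr=0.00035, weight_decay=0.0005)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    for _epoch in range(epochs):
        # step 2: re-cluster the feature storage unit and screen the result
        dist = distance_fn(memory).cpu().numpy()
        raw = screen_clusters(cluster_three_scales(dist), alpha, beta)
        uniq = sorted(set(int(l) for l in raw) - {-1})          # surviving cluster ids
        remap = {old: new for new, old in enumerate(uniq)}      # make ids contiguous
        labels = torch.tensor([remap.get(int(l), -1) for l in raw], device=device)
        n_clusters = len(uniq)
        outlier_ids = torch.where(labels == -1)[0]
        outlier_pos = {int(j): p for p, j in enumerate(outlier_ids.tolist())}
        centroids = (torch.stack([memory[labels == k].mean(0) for k in range(n_clusters)])
                     if n_clusters else memory.new_zeros((0, memory.size(1))))
        # steps 3 and 4: contrastive training and momentum update of the memory
        encoder.train()
        for imgs, indices in epoch_loader_fn(labels):
            feats = encoder(imgs.to(device))
            pos = prototype_index(indices, labels, n_clusters, outlier_pos).to(device)
            loss = unified_contrastive_loss(feats, pos, centroids, memory[outlier_ids])
            optimizer.zero_grad(); loss.backward(); optimizer.step()
            centroids = update_memory(memory, feats.detach(), indices.to(device), labels)
        scheduler.step()
```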

Claims (7)

1. An unsupervised pedestrian re-identification method based on contrast clustering, characterized by comprising the following steps:
step 1: performing forward computation on an unlabeled pedestrian image dataset with an initial feature encoder, and initializing a class-prototype-based feature storage unit with the encoded features;
step 2: clustering the encoded features in the feature storage unit before each training epoch, and screening the clustering results according to a clustering reliability evaluation criterion;
step 3: performing feature encoding on each mini-batch of training samples with the feature encoder, back-propagating a unified contrastive loss through the network, and updating the feature encoder;
step 4: dynamically updating the feature storage unit with the encoded features in a momentum manner;
step 5: repeating steps 2 to 4 according to the number of training epochs until the pedestrian re-identification network converges.
2. The unsupervised pedestrian re-identification method based on contrast clustering as claimed in claim 1, wherein step 1 comprises:
using a ResNet-50 deep neural network as the feature encoder and initializing it with pre-training weights on the ImageNet image dataset;
performing feature extraction on the samples in the pedestrian image dataset with the feature encoder to obtain a feature set {v_1, …, v_n}, and storing all sample features in the feature storage unit on a per-instance basis.
3. The unsupervised pedestrian re-identification method based on contrast clustering as claimed in claim 1, wherein step 2 comprises:
clustering the feature set {v_1, …, v_n} in the feature storage unit of step 1 with the DBSCAN clustering algorithm, and further dividing the class prototypes in the feature storage unit into cluster centroids {c_k | k = 1, …, n_c} and un-clustered outlier instances {v_k | k = 1, …, n_o}, where n_c denotes the number of clusters and n_o denotes the number of un-clustered outliers; and using a self-paced learning strategy combined with the cluster independence and cluster compactness criteria to retain reliable clusters and dissolve the features in unreliable clusters back into un-clustered outlier instances.
4. The method according to claim 3, wherein the self-paced learning strategy performs clustering again before each training epoch, gradually increases the number of clusters starting from the most reliable clusters, and alternately loosens and tightens the clustering criterion by adjusting the sample ε-neighborhood distance threshold of the DBSCAN clustering algorithm;
the cluster independence criterion measures the inter-class distance and is expressed as the intersection-over-union between the feature set of a cluster and the corresponding feature set after the clustering criterion is loosened:

R_indep(f_i) = |I(f_i) ∩ I_loose(f_i)| / |I(f_i) ∪ I_loose(f_i)|

where |·| denotes the number of features in a set, I(f_i) denotes the set of samples in the same cluster as f_i, I_loose(f_i) denotes the set of samples in the same cluster as f_i after the clustering criterion is loosened, and R_indep(f_i) is the independence score of the cluster I(f_i);
the cluster compactness criterion measures the intra-class distance and is expressed as the intersection-over-union between the feature set of a cluster and the corresponding feature set after the clustering criterion is tightened:

R_comp(f_i) = |I(f_i) ∩ I_tight(f_i)| / |I(f_i) ∪ I_tight(f_i)|

where I_tight(f_i) denotes the set of samples in the same cluster as f_i after the clustering criterion is tightened, and R_comp(f_i) is the compactness score of the cluster I(f_i);
the independence between clusters and the compactness of the data within clusters are measured through the clustering reliability evaluation criterion; α, β ∈ [0, 1] denote the independence and compactness thresholds, samples satisfying inter-class independence R_indep(f_i) > α and intra-class compactness R_comp(f_i) > β are kept in their clusters, and the remaining samples are divided into un-clustered outliers.
5. The unsupervised pedestrian re-identification method based on contrast clustering as claimed in claim 1, wherein step 3 comprises:
given the unlabeled training samples X = {x_1, …, x_n}, dividing them with the self-paced learning strategy of step 2 into a sample set with clustering pseudo labels X_c and a set of outlier instance samples X_o that do not belong to any cluster, with X = X_c ∪ X_o;
given a training sample x ∈ X, performing forward computation with the feature encoder to obtain the encoded feature f, and constructing the unified contrastive loss function:

L_f = −log( exp(⟨f, z_+⟩/τ) / ( Σ_{k=1}^{n_c} exp(⟨f, c_k⟩/τ) + Σ_{k=1}^{n_o} exp(⟨f, v_k⟩/τ) ) )

where z_+ is the positive class prototype of the feature f, τ is the temperature coefficient, ⟨·,·⟩ denotes the vector inner product, c_k is the centroid of the current cluster and represents the class prototype within that cluster, and v_k is the instance feature of the current clustering outlier and represents the class prototype of that outlier;
after the features of a mini-batch of samples are encoded, they are compared with the two kinds of class prototypes, so that each training sample is drawn close to the class it belongs to and pushed away from the other classes.
6. The unsupervised pedestrian re-identification method based on contrast clustering as claimed in claim 1, wherein step 4 comprises:
in the feature storage unit, obtaining the cluster centroids {c_k | k = 1, …, n_c} by averaging the features belonging to the same cluster, and taking the instance features of the un-clustered outliers {v_k | k = 1, …, n_o} directly from the feature storage unit, the centroid of the k-th cluster being expressed as:

c_k = (1 / |I_k|) Σ_{v_i ∈ I_k} v_i

where I_k denotes the set of feature vectors of the k-th cluster and |·| denotes the number of feature vectors in the set; the instance features {v} in the feature storage unit are initialized once by a forward pass of the network and are continuously updated during training;
initializing the feature storage unit with all training samples on a per-instance basis, accumulating the features of the current mini-batch into the corresponding instance features of the feature storage unit by index in each training epoch, and dynamically updating the class prototypes in the feature storage unit in a momentum-update manner with the encoded features of the mini-batch:

v_i ← m·v_i + (1 − m)·f_i

where m ∈ [0, 1] is the momentum factor; given the updated instance feature v_i, if f_i belongs to cluster k, the corresponding cluster centroid c_k is updated accordingly.
7. The unsupervised pedestrian re-identification method based on contrast clustering as claimed in claim 1, wherein step 5 comprises:
in each training epoch, clustering the features in the feature storage unit according to the cluster independence and cluster compactness criteria, dividing the training samples X into clustered samples X_c and un-clustered outlier samples X_o, and computing the cluster centroids of X_c;
for the training samples in each mini-batch, performing feature encoding with the feature encoder, computing the unified contrastive loss, and back-propagating to update the encoder;
updating the feature set in the feature storage unit in the class-prototype momentum-update manner, and updating the cluster centroids with the updated feature set;
and cyclically updating the feature encoder and the feature storage unit until the model converges well.
CN202210664167.2A 2022-06-13 2022-06-13 Unsupervised pedestrian re-identification method based on contrast clustering Pending CN114898406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210664167.2A CN114898406A (en) 2022-06-13 2022-06-13 Unsupervised pedestrian re-identification method based on contrast clustering


Publications (1)

Publication Number Publication Date
CN114898406A true CN114898406A (en) 2022-08-12

Family

ID=82728695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210664167.2A Pending CN114898406A (en) 2022-06-13 2022-06-13 Unsupervised pedestrian re-identification method based on contrast clustering

Country Status (1)

Country Link
CN (1) CN114898406A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030502A (en) * 2023-03-30 2023-04-28 之江实验室 Pedestrian re-recognition method and device based on unsupervised learning
CN118152826A (en) * 2024-05-09 2024-06-07 深圳市翔飞科技股份有限公司 Intelligent camera alarm system based on behavior analysis
CN118152826B (en) * 2024-05-09 2024-08-02 深圳市翔飞科技股份有限公司 Intelligent camera alarm system based on behavior analysis


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination