CN115620338A - Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images - Google Patents
Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images
- Publication number
- CN115620338A (application CN202211258905.XA)
- Authority
- CN
- China
- Prior art keywords
- clothes
- black
- branch
- image
- pedestrian
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 59
- 230000008859 change Effects 0.000 title claims abstract description 18
- 238000012549 training Methods 0.000 claims abstract description 40
- 230000008569 process Effects 0.000 claims abstract description 10
- 238000004458 analytical method Methods 0.000 claims description 10
- 238000009499 grossing Methods 0.000 claims description 7
- 238000013140 knowledge distillation Methods 0.000 claims description 7
- 230000004913 activation Effects 0.000 description 4
- 230000000694 effects Effects 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 239000013598 vector Substances 0.000 description 4
- 238000012545 processing Methods 0.000 description 3
- 239000003086 colorant Substances 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000003909 pattern recognition Methods 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 230000000576 supplementary effect Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000005021 gait Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and device for clothes-changing person re-identification guided by black-clothes and head images. The method comprises: first, processing the original image with a newly designed clothes-occlusion method to obtain the corresponding black-clothes image, and pre-training the black-clothes branch on the resulting black-clothes images; then jointly training the framework, feeding the original pedestrian image into the original branch while the pre-trained black-clothes branch guides the original branch's learning; at the same time, feeding the pedestrian head image into the head branch to obtain finer-grained pedestrian features. By occluding the clothes of all pedestrians, the invention obtains black-clothes images in which clothing color is uniform across pedestrians, so the model attends to regions other than clothing color, improving its robustness. The invention also makes effective use of the information in the original image, reducing the information lost while generating black-clothes images and improving the robustness of the features.
Description
Technical Field
The invention relates to the technical field of person re-identification, and in particular to a method and device for clothes-changing person re-identification guided by black-clothes and head images.
Background
Person re-identification aims to solve the problem of retrieving pedestrians under varying conditions, such as different cameras, different lighting, or different viewing angles. Research on person re-identification spans many sub-fields, such as lightweight networks, domain generalization, and unsupervised learning, all of which have achieved good results in recent years. However, these methods generally assume that a person's clothing stays the same over a long period of time.
In the real world, however, people's clothes do not stay the same. People wear different clothes over long periods, and some suspects may evade tracking by changing their clothes within a short time. A different version of the person re-identification problem has therefore been proposed, known as long-term clothes-changing person re-identification, which has become a topic of intense interest. The core of solving clothes-changing person re-identification is to extract discriminative features that are related only to identity. To remove the distraction of clothing, researchers generally adopt two strategies.
The first is the data strategy. A common approach is to build a large-scale dataset in which each person has many pictures in a large variety of clothes, and then force the model to learn clothes-independent features from these pictures. However, constructing such a clothes-changing dataset purely by human effort is extremely laborious, almost impossible, so some researchers use GANs or other means to extend the original dataset.
The second is the feature-separation strategy. A common operation is to separate clothing features from other identity features, so that features other than clothing can be used for identity judgment. For example, Yang et al. adopted pedestrian silhouettes as the query and gallery and exploited polar coordinates to better capture silhouette features. However, although learning from silhouettes yields clothes-independent features, it also discards a large portion of other clothes-independent cues (such as the head). In addition, Hong et al. proposed an appearance branch and a shape branch to extract fine-grained features. Such methods, however, are often affected by clothes of different colors and cannot extract more robust clothes-independent features.
The main problems of the prior art are:
1. Existing clothes-changing person re-identification methods require a large amount of image-generation work and a large amount of training time.
2. Existing clothes-changing person re-identification methods are often affected by the different colors of clothes and cannot extract more robust clothes-independent features.
3. Most existing clothes-changing person re-identification methods use traditional convolutional neural networks as the training network, which incur a certain loss of information due to downsampling and pooling.
4. Most existing clothes-changing person re-identification methods ignore the influence of head features on the overall judgment.
Summary of the Invention
To address the above problems in the background art, the present invention proposes a method and device for clothes-changing person re-identification guided by black-clothes and head images. It uses a non-GAN approach to extract clothes-independent features from the image; proposes a new clothes-occlusion strategy that makes all pedestrians' clothes tend toward uniformity and forces the model to learn robust clothes-independent features; adopts an improved Transformer as the training network; and designs a separate head branch to capture fine-grained head features of the original image.
To achieve the above object, the present invention adopts the following technical solutions:
On one hand, the present invention proposes a method for clothes-changing person re-identification guided by black-clothes and head images, comprising:
Step 1: remove the clothes features from the original pedestrian image to obtain a black-clothes pedestrian image;
Step 2: process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image from the original pedestrian image;
Step 3: build a clothes-changing person re-identification network consisting of three branches, namely the original branch, the black-clothes branch, and the head branch, which learn the original pedestrian image features, the black-clothes pedestrian image features, and the pedestrian head image features, respectively; the three branches have the same backbone structure but do not share parameters;
Step 4: feed the black-clothes pedestrian image obtained in step 1 into the black-clothes branch for training, yielding the pre-trained black-clothes branch;
Step 5: train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but not to the clothes;
Step 6: feed the pedestrian head image obtained in step 2 into the head branch for learning, and combine the learned head image features with the pedestrian-related but clothes-unrelated features obtained in step 5 to obtain overall features that are related to the pedestrian but not to the clothes, completing the training of the clothes-changing person re-identification network;
Step 7: perform clothes-changing person re-identification based on the trained clothes-changing person re-identification network.
Further, step 1 comprises:
using a pre-trained human parsing model to obtain images of each body part of the pedestrian in the original pedestrian image, and recombining the resulting body-part images into six parts: background, head, top, pants, arms, and legs; the pixels of the top and pants images are extracted from these to form the clothes region, and all pixels of the clothes region are set to zero, yielding the black-clothes pedestrian image.
Further, the backbone of all three branches is the imViT network. In each imViT, two outputs are used to extract global features and local features, respectively. The backbone is optimized by a triplet loss and an ID loss on the global and local features, where the ID loss is a cross-entropy loss without label smoothing.
Further, step 5 comprises:
following the knowledge-distillation algorithm, training the original-image branch under the guidance of the pre-trained black-clothes branch, and using a mean-square-error loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features.
On the other hand, the present invention proposes a device for clothes-changing person re-identification guided by black-clothes and head images, comprising:
a black-clothes image module, configured to remove the clothes features from the original pedestrian image to obtain a black-clothes pedestrian image;
a head image module, configured to process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image from the original pedestrian image;
a network construction module, configured to build the clothes-changing person re-identification network, which consists of three branches, namely the original branch, the black-clothes branch, and the head branch, learning the original pedestrian image features, the black-clothes pedestrian image features, and the pedestrian head image features, respectively; the three branches have the same backbone structure but do not share parameters;
a black-clothes branch training module, configured to feed the black-clothes pedestrian image obtained by the black-clothes image module into the black-clothes branch for training, yielding the pre-trained black-clothes branch;
an original branch training module, configured to train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but not to the clothes;
a head branch training module, configured to feed the pedestrian head image obtained by the head image module into the head branch for learning, and to combine the learned head image features with the pedestrian-related but clothes-unrelated features obtained by the original branch training module into overall features that are related to the pedestrian but not to the clothes, completing the training of the clothes-changing person re-identification network;
a re-identification module, configured to perform clothes-changing person re-identification based on the trained clothes-changing person re-identification network.
Further, the black-clothes image module is specifically configured for:
using a pre-trained human parsing model to obtain images of each body part of the pedestrian in the original pedestrian image, and recombining the resulting body-part images into six parts: background, head, top, pants, arms, and legs; the pixels of the top and pants images are extracted from these to form the clothes region, and all pixels of the clothes region are set to zero, yielding the black-clothes pedestrian image.
Further, the backbone of all three branches is the imViT network. In each imViT, two outputs are used to extract global features and local features, respectively. The backbone is optimized by a triplet loss and an ID loss on the global and local features, where the ID loss is a cross-entropy loss without label smoothing.
Further, the original branch training module is specifically configured for:
following the knowledge-distillation algorithm, training the original-image branch under the guidance of the pre-trained black-clothes branch, and using a mean-square-error loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features.
Compared with the prior art, the present invention has the following beneficial effects:
1. Compared with methods that use a GAN to generate large numbers of clothes-changing images to extend the dataset, the present invention uses a non-GAN approach. The top and pants regions of the body are obtained with a human semantic parsing model and occluded with the proposed method. In this way, the model can learn clothes-independent features in a more concentrated manner without extending the dataset, saving both space and time.
2. Compared with other methods that separate clothing features from identity features, the present invention occludes the clothes of all pedestrians to obtain black-clothes images. This makes the clothing color uniform across pedestrians, so the model attends to regions other than clothing color, improving its robustness.
3. Compared with learning features directly from clothes-removed images, the present invention uses the black-clothes branch to guide the original branch to learn clothes-independent identity features directly from the original RGB image. This makes effective use of the information in the original image, reduces the information lost while generating black-clothes images, and improves the robustness of the features.
4. Compared with methods that extract clothes-independent features directly from the original image, the present invention adds dedicated processing of the head image patch, so more discriminative fine-grained features can be extracted, which complement the global features well.
5. Test results on the PRCC dataset show that the proposed method achieves excellent clothes-changing person re-identification performance.
Brief Description of the Drawings
Fig. 1 is the basic flowchart of a clothes-changing person re-identification method guided by black-clothes and head images according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the clothes-changing person re-identification network architecture constructed in an embodiment of the present invention;
Fig. 3 is a comparison of feature activation maps obtained in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a clothes-changing person re-identification device guided by black-clothes and head images according to an embodiment of the present invention.
Detailed Description
The present invention is further explained below with reference to the accompanying drawings and specific embodiments:
As shown in Fig. 1, a clothes-changing person re-identification method guided by black-clothes and head images comprises:
Step 1: remove the clothes features from the original pedestrian image to obtain a black-clothes pedestrian image (black-clothes image for short);
Step 2: process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image from the original pedestrian image;
Step 3: build a clothes-changing person re-identification network consisting of three branches, namely the original branch, the black-clothes branch, and the head branch, as shown in Fig. 2, which learn the original pedestrian image features, the black-clothes pedestrian image features, and the pedestrian head image features, respectively; the three branches have the same backbone structure but do not share parameters, so as to better exploit the discriminative feature space;
Step 4: feed the black-clothes pedestrian image obtained in step 1 into the black-clothes branch for training, yielding the pre-trained black-clothes branch;
Step 5: train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but not to the clothes;
Step 6: feed the pedestrian head image obtained in step 2 into the head branch for learning, and combine the learned head image features with the pedestrian-related but clothes-unrelated features obtained in step 5 to obtain overall features that are related to the pedestrian but not to the clothes, completing the training of the clothes-changing person re-identification network;
Step 7: perform clothes-changing person re-identification based on the trained clothes-changing person re-identification network.
Further, step 1 comprises:
using a pre-trained human parsing model to obtain images of each body part of the pedestrian in the original pedestrian image, and recombining the resulting body-part images into six parts: background, head, top, pants, arms, and legs; the pixels of the top and pants images are extracted from these to form the clothes region, and all pixels of the clothes region are set to zero, yielding the black-clothes pedestrian image.
Further, the backbone of all three branches is the imViT network. In each imViT, two outputs are used to extract global features and local features, respectively. The backbone is optimized by a triplet loss and an ID loss on the global and local features, where the ID loss is a cross-entropy loss without label smoothing.
Further, step 5 comprises:
following the knowledge-distillation algorithm, training the original-image branch under the guidance of the pre-trained black-clothes branch, and using a mean-square-error loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features.
Specifically, the method includes:
1. Obtaining the black-clothes image
To eliminate the influence of clothes on feature extraction, we remove the clothes features from the original image and obtain black-clothes pedestrian images. First, we use HRNet, a pre-trained human parsing model, to obtain images of the body parts. The model's prediction divides the body into 20 parts, which we recombine into six: background, head, top, pants, arms, and legs. From these we use only the top and pants, which together form the clothes region.
First, consider a batch of input samples $x_i$, $i = 1, \ldots, B$, where $B$ is the batch size and each $x_i$ is an image, $x_i \in \mathbb{R}^{H \times W \times C}$, with $H$, $W$, and $C$ denoting its height, width, and number of channels. We denote the semantic map obtained for $x_i$ by the human parsing model as $s_i$, $i = 1, \ldots, B$, $s_i \in \mathbb{R}^{1 \times H \times W}$. Each pixel of $s_i$ takes a value in $\{0, 1, 2, 3, 4, 5\}$, the six values representing the six body parts.
Second, we obtain the pixels of the top and pants. Each pixel of $x_i$ is a vector $v_j$ with $C$ values, and each input sample $x_i$ contains $W \times H$ pixel vectors in total. We denote the set of all top and pants pixel vectors in $x_i$ as:
$$B_{\text{top,pants}} = \{ v_j \mid v_j = x_i[s_i = 2 \lor s_i = 3],\ i \in [1, B],\ j \in [1, N] \} \quad (1)$$
where $N$ is the total number of top and pants pixels in each $x_i$ (it differs from image to image), $j$ indexes the $j$-th pixel vector, $s_i$ is the semantic segmentation map, 2 is the label of the top, 3 is the label of the pants, and $x_i[s_i = 2 \lor s_i = 3]$ selects the top and pants pixel vectors of $x_i$. The original image can therefore be written as $x_i = [v_1, v_2, \ldots, v_{c_1}, \ldots, v_{c_n}, \ldots, v_{\text{last}}]$, where $[v_{c_1}, \ldots, v_{c_n}]$ are the top and pants pixels given by Eq. (1).
Finally, the black-clothes image is obtained by setting the top and pants pixels to zero. Specifically, we set $[v_{c_1}, \ldots, v_{c_n}] = 0$ and obtain $x_i' = [v_1, v_2, \ldots, 0, \ldots, v_{\text{last}}]$, which is the black-clothes image corresponding to $x_i$.
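As an illustration, here is a minimal PyTorch sketch of this masking step. It assumes the parsing map already uses the six-part label convention above (0 = background, 1 = head, 2 = top, 3 = pants, 4 = arms, 5 = legs); in practice the map would come from the pre-trained HRNet, and the function name is ours, not part of the invention.

```python
import torch

def mask_clothes(x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """Zero out the top (label 2) and pants (label 3) pixels.

    x: batch of images, shape (B, C, H, W)
    s: semantic parsing maps, shape (B, 1, H, W), integer labels in {0..5}
    """
    clothes = (s == 2) | (s == 3)          # boolean clothes region, (B, 1, H, W)
    return x * (~clothes).to(x.dtype)      # broadcasts over channels -> black clothes

# Tiny demo on random data:
x = torch.rand(4, 3, 256, 128)             # B=4, C=3, H=256, W=128
s = torch.randint(0, 6, (4, 1, 256, 128))  # stand-in parsing labels
x_black = mask_clothes(x, s)
assert (x_black[((s == 2) | (s == 3)).expand_as(x_black)] == 0).all()
```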
2. Backbone network of each branch
We choose imViT (see [He, Shuting, et al., "TransReID: Transformer-based object re-identification," Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021]) as the backbone of each branch. In each imViT, two outputs are used to extract global and local features. For the local output, the number of local sub-features that make up the local feature can be chosen; in our experiments we use 4. Thus, for the $i$-th branch ($i \in \{0, 1, 2\}$), we obtain the local feature $F_{li}$ and the global feature $F_{gi}$, where $F_{li} = [F_{li1}, F_{li2}, F_{li3}, F_{li4}]$, $i \in \{0, 1, 2\}$.
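As a rough illustration of this dual-output structure only (the actual imViT follows TransReID and is more elaborate), one could take the class token of a ViT-style token sequence as the global feature and pool four groups of patch tokens as local sub-features:

```python
import torch

def split_global_local(tokens: torch.Tensor, num_parts: int = 4):
    """tokens: (B, 1 + N, D), with the class token first and N divisible by num_parts."""
    f_g = tokens[:, 0]                        # global feature, shape (B, D)
    patches = tokens[:, 1:]                   # patch tokens, shape (B, N, D)
    parts = patches.chunk(num_parts, dim=1)   # num_parts groups of patch tokens
    f_l = [p.mean(dim=1) for p in parts]      # local sub-features, each (B, D)
    return f_g, f_l

tokens = torch.rand(8, 1 + 128, 768)          # B=8, N=128 patch tokens, D=768
f_g, f_l = split_global_local(tokens)
print(f_g.shape, [f.shape for f in f_l])      # torch.Size([8, 768]) and four of the same
```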
The backbone is optimized by a triplet loss and an ID loss on the global and local features, respectively. The ID loss is a cross-entropy loss without label smoothing. The triplet loss $L_{tri}$ is:
$$L_{tri} = \log\left(1 + \exp\left(\lVert f_a - f_p \rVert_2 - \lVert f_a - f_n \rVert_2\right)\right)$$
where $f_a$ denotes the anchor, $f_p$ the positive sample, and $f_n$ the negative sample. Each branch of our framework therefore has two groups of loss functions, $L_{tri\text{-}gi}$, $L_{id\text{-}gi}$ and $L_{tri\text{-}li}$, $L_{id\text{-}li}$, where $L_{tri\text{-}gi}$ and $L_{id\text{-}gi}$ are the triplet loss and ID loss on the global feature $F_{gi}$, and $L_{tri\text{-}li}$ and $L_{id\text{-}li}$ are the triplet loss and ID loss on the local feature $F_{li}$.
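This soft-margin triplet loss translates directly into code. The sketch below reads the subscript 2 as the Euclidean norm and assumes that (anchor, positive, negative) triplets have already been mined within the batch:

```python
import torch
import torch.nn.functional as F

def soft_margin_triplet_loss(f_a: torch.Tensor, f_p: torch.Tensor,
                             f_n: torch.Tensor) -> torch.Tensor:
    d_ap = (f_a - f_p).norm(p=2, dim=1)    # anchor-positive distances
    d_an = (f_a - f_n).norm(p=2, dim=1)    # anchor-negative distances
    return F.softplus(d_ap - d_an).mean()  # softplus(z) = log(1 + exp(z)), numerically stable

f_a, f_p, f_n = (torch.rand(16, 768) for _ in range(3))
print(soft_margin_triplet_loss(f_a, f_p, f_n))
```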
3. The original branch guided by the black-clothes branch
The black-clothes branch can learn clothes-independent features, but some important discriminative information hidden in the original image may be discarded while the black-clothes images are generated, and extracting clothes-independent identity features directly from the original image is not feasible. Following the knowledge-distillation algorithm, we train the original-image branch under the guidance of the pre-trained black-clothes branch. Specifically, we adopt a mean-square-error (MSE) loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features. The MSE loss $L_{mse}$ is defined as
$$L_{mse} = L_{mse\text{-}opg} + L_{mse\text{-}opl}$$
where $F_{l1}$ and $F_{g1}$ denote the local and global features produced by the black-clothes branch, $F_{l2}$ and $F_{g2}$ denote the local and global features produced by the original branch guided by the black-clothes branch, and $L_{mse\text{-}opg}$ and $L_{mse\text{-}opl}$ denote the MSE losses on the global feature pair $F_{opg} = [F_{g1}, F_{g2}]$ and the local feature pair $F_{opl} = [F_{l1}, F_{l2}]$, respectively.
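A minimal sketch of this distillation term follows, with the black-clothes branch's features detached to reflect that the branch is frozen during joint training; the feature shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_mse(f_g1, f_l1, f_g2, f_l2) -> torch.Tensor:
    """L_mse = L_mse-opg + L_mse-opl; teacher features are detached (frozen branch)."""
    l_opg = F.mse_loss(f_g2, f_g1.detach())   # global pair F_opg = [F_g1, F_g2]
    l_opl = F.mse_loss(f_l2, f_l1.detach())   # local pair  F_opl = [F_l1, F_l2]
    return l_opg + l_opl

f_g1, f_g2 = torch.rand(8, 768), torch.rand(8, 768, requires_grad=True)
f_l1, f_l2 = torch.rand(8, 4, 768), torch.rand(8, 4, 768, requires_grad=True)
print(distillation_mse(f_g1, f_l1, f_g2, f_l2))
```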
4. Extraction of head features
For the input original image, we use the pre-trained HRNet to obtain the head image region. The head image is fed into imViT3 to obtain the merged local feature $F_{l3}$ and the global feature $F_{g3}$, where the former is merged from four local sub-features $F_{l31}$, $F_{l32}$, $F_{l33}$, $F_{l34}$. Likewise, the original branch guided by the black-clothes branch yields the concatenated local feature $F_{l2}$ and the global feature $F_{g2}$. To obtain overall features that are less tied to clothes, we take element-wise weighted sums of $F_{l2}$ and $F_{l3}$, and of $F_{g2}$ and $F_{g3}$, defined as follows:
$$F_{ohg} = w F_{g2} + (1 - w) F_{g3}$$
$$F_{ohl} = w F_{l2} + (1 - w) F_{l3}$$
where $F_{ohg}$ and $F_{ohl}$ denote the global and local overall features that are related to the pedestrian but not to the clothes, and $w \in (0, 1)$ is a weighting coefficient.
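The fusion itself is a one-liner; the sketch below treats $w$ as a fixed scalar, since the text does not state whether $w$ is a fixed hyper-parameter or learned:

```python
import torch

def fuse(f2: torch.Tensor, f3: torch.Tensor, w: float = 0.5) -> torch.Tensor:
    """F_oh = w * F_2 + (1 - w) * F_3, for global or local features of matching shape."""
    return w * f2 + (1.0 - w) * f3

f_g2, f_g3 = torch.rand(8, 768), torch.rand(8, 768)
f_ohg = fuse(f_g2, f_g3)                       # global overall feature
f_l2, f_l3 = torch.rand(8, 4, 768), torch.rand(8, 4, 768)
f_ohl = fuse(f_l2, f_l3)                       # local overall feature
print(f_ohg.shape, f_ohl.shape)
```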
Triplet and ID losses are then applied to these features: $L_{id\text{-}ohg}$, $L_{tri\text{-}ohg}$, $L_{id\text{-}ohl}$, and $L_{tri\text{-}ohl}$, where $L_{id\text{-}ohg}$ and $L_{tri\text{-}ohg}$ are the ID loss and triplet loss on $F_{ohg}$, and $L_{id\text{-}ohl}$ and $L_{tri\text{-}ohl}$ are those on $F_{ohl}$.
5. Joint training
Training proceeds in two stages. First, we feed the black-clothes images into the black-clothes branch and train it on its own with the ID loss and the triplet loss on the global and local features (for brevity, these two losses are omitted from the black-clothes branch in Fig. 1). We thereby obtain the pre-trained black-clothes branch.
Second, we fix the learned weights of the black-clothes branch and jointly train the other branches. The total loss function is therefore defined as:
$$L_{total}(\theta) = l_1 L_{id}(\theta) + l_2 L_{tri}(\theta) + l_3 L_{mse}(\theta)$$
where $L_{id}(\theta)$ denotes the ID loss, $L_{tri}(\theta)$ the triplet loss, and $L_{mse}(\theta)$ the MSE loss, and $l_1$, $l_2$, $l_3$ are trade-off parameters balancing the contribution of each loss. In our experiments, $l_1$, $l_2$, and $l_3$ are set to 0.25, 0.25, and 0.5, respectively.
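Putting the weights together, the stage-two objective can be sketched as follows, with scalar tensors standing in for the aggregated ID, triplet, and MSE terms described above:

```python
import torch

def total_loss(l_id: torch.Tensor, l_tri: torch.Tensor, l_mse: torch.Tensor,
               l1: float = 0.25, l2: float = 0.25, l3: float = 0.5) -> torch.Tensor:
    return l1 * l_id + l2 * l_tri + l3 * l_mse

# With the weights above, e.g. L_id = 1.2, L_tri = 0.8, L_mse = 0.4 gives 0.7:
print(total_loss(torch.tensor(1.2), torch.tensor(0.8), torch.tensor(0.4)))
```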
To verify the effect of the present invention, the following experiments are conducted:
We conduct experiments on the PRCC dataset (PRCC contains 33,698 images of 221 identities from 3 different views, and also provides contour sketch images of the persons, which facilitates extracting silhouette information) under both the cross-clothes setting and the same-clothes setting. In the cross-clothes setting, the images from camera A serve as the gallery and the images from camera C as the query. In the same-clothes setting, the gallery images also come from camera A, but the query images come from camera B. The experimental results are shown in Table 1.
Table 1: Experimental results
We compare our method with several state-of-the-art methods on the PRCC dataset, including representative conventional person re-identification methods (PCB [Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, "Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 480–496], Zheng et al.'s method [Z. Zheng, L. Zheng, and Y. Yang, "A discriminatively learned CNN embedding for person re-identification," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 14, no. 1, pp. 1–20, 2017], HPM [Y. Fu, Y. Wei, Y. Zhou, H. Shi, G. Huang, X. Wang, Z. Yao, and T. Huang, "Horizontal pyramid matching for person re-identification," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8295–8302], HACNN [W. Li, X. Zhu, and S. Gong, "Harmonious attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2285–2294]) and clothes-changing person re-identification methods (PRCC (sketch) [Q. Yang, A. Wu, and W.-S. Zheng, "Person re-identification by contour sketch under moderate clothing change," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 6, pp. 2029–2046, 2019], GI-ReID (OSNet) [X. Jin, T. He, K. Zheng, Z. Yin, X. Shen, Z. Huang, R. Feng, J. Huang, Z. Chen, and X.-S. Hua, "Cloth-changing person re-identification from a single image with gait prediction and regularization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14278–14287], LightMBN [F. Herzog, X. Ji, T. Teepe, S. Hörmann, J. Gilg, and G. Rigoll, "Lightweight multi-branch network for person re-identification," in 2021 IEEE International Conference on Image Processing (ICIP), IEEE, 2021, pp. 1129–1133]). The results of all compared methods are taken from their published papers.
As can be seen from Table 1, the proposed method achieves very good performance. In the same-clothes setting, although it is slightly inferior to LightMBN, it surpasses almost all other methods, which indicates better generalization for same-clothes person re-identification. In the cross-clothes setting, the proposed method performs markedly better: it exceeds LightMBN by 4.7% in mAP (mAP evaluates the overall effect of a re-identification algorithm; AP is the average precision on a single query sample, and mAP is the mean of AP over all query samples) and by 9.6% in rank-1 accuracy (R@1 in Table 1). This is likely because the proposed method extracts more clothes-independent identity features.
For an intuitive analysis, we randomly sample 3 images from PRCC and show the corresponding activation maps (CAMs) captured by the baseline and our method in Fig. 3. From the first row, the baseline's activated regions concentrate on the clothes and the background; many background and clothes regions are activated and used to identify a person, which may confuse the recognition result, and the baseline pays little attention to the head. From the second row, the activation points of our method are more concentrated and cover fewer background and clothes regions, which mitigates the influence of background and clothes on recognition. We also note that our model pays more attention to the shapes of the head and body.
On the basis of the above embodiments, as shown in Fig. 4, the present invention further proposes a device for clothes-changing person re-identification guided by black-clothes and head images, comprising:
a black-clothes image module, configured to remove the clothes features from the original pedestrian image to obtain a black-clothes pedestrian image;
a head image module, configured to process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image from the original pedestrian image;
a network construction module, configured to build the clothes-changing person re-identification network, which consists of three branches, namely the original branch, the black-clothes branch, and the head branch, learning the original pedestrian image features, the black-clothes pedestrian image features, and the pedestrian head image features, respectively; the three branches have the same backbone structure but do not share parameters;
a black-clothes branch training module, configured to feed the black-clothes pedestrian image obtained by the black-clothes image module into the black-clothes branch for training, yielding the pre-trained black-clothes branch;
an original branch training module, configured to train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but not to the clothes;
a head branch training module, configured to feed the pedestrian head image obtained by the head image module into the head branch for learning, and to combine the learned head image features with the pedestrian-related but clothes-unrelated features obtained by the original branch training module into overall features that are related to the pedestrian but not to the clothes, completing the training of the clothes-changing person re-identification network;
a re-identification module, configured to perform clothes-changing person re-identification based on the trained clothes-changing person re-identification network.
Further, the black-clothes image module is specifically configured for:
using a pre-trained human parsing model to obtain images of each body part of the pedestrian in the original pedestrian image, and recombining the resulting body-part images into six parts: background, head, top, pants, arms, and legs; the pixels of the top and pants images are extracted from these to form the clothes region, and all pixels of the clothes region are set to zero, yielding the black-clothes pedestrian image.
Further, the backbone of all three branches is the imViT network. In each imViT, two outputs are used to extract global features and local features, respectively. The backbone is optimized by a triplet loss and an ID loss on the global and local features, where the ID loss is a cross-entropy loss without label smoothing.
Further, the original branch training module is specifically configured for:
following the knowledge-distillation algorithm, training the original-image branch under the guidance of the pre-trained black-clothes branch, and using a mean-square-error loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features.
In summary, compared with methods that use a GAN to generate large numbers of clothes-changing images to extend the dataset, the present invention uses a non-GAN approach: the top and pants regions are obtained with a human semantic parsing model and occluded with the proposed method, so the model can learn clothes-independent features in a more concentrated manner without extending the dataset, saving both space and time. Compared with other methods that separate clothing features from identity features, the present invention occludes the clothes of all pedestrians to obtain black-clothes images, making the clothing color uniform across pedestrians, so the model attends to regions other than clothing color and becomes more robust. Compared with learning features directly from clothes-removed images, the present invention uses the black-clothes branch to guide the original branch to learn clothes-independent identity features directly from the original RGB image, making effective use of the information in the original image, reducing the information lost while generating black-clothes images, and improving feature robustness. Compared with extracting clothes-independent features directly from the original image, the present invention adds dedicated processing of the head image patch, extracting more discriminative fine-grained features that complement the global features well. Test results on the PRCC dataset show that the proposed method achieves excellent clothes-changing person re-identification performance.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211258905.XA CN115620338A (en) | 2022-10-14 | 2022-10-14 | Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115620338A (en) | 2023-01-17 |
Family
ID=84861950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211258905.XA (CN115620338A, pending) | Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images | 2022-10-14 | 2022-10-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115620338A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116129473A (en) * | 2023-04-17 | 2023-05-16 | 山东省人工智能研究院 | Identity-guided joint learning method and system for re-identification of pedestrians who change clothes |
CN116129473B (en) * | 2023-04-17 | 2023-07-14 | 山东省人工智能研究院 | Identity-guided joint learning method and system for re-identification of pedestrians who change clothes |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109508663A (en) | A kind of pedestrian's recognition methods again based on multi-level supervision network | |
CN111582095B (en) | Light-weight rapid detection method for abnormal behaviors of pedestrians | |
CN111274921A (en) | A method for human action recognition using pose mask | |
He et al. | Enhancing face recognition with self-supervised 3d reconstruction | |
Chen et al. | Learning discriminative and generalizable representations by spatial-channel partition for person re-identification | |
CN116704611B (en) | A cross-view gait recognition method based on motion feature mixing and fine-grained multi-stage feature extraction | |
CN112131970A (en) | Identity recognition method based on multi-channel space-time network and joint optimization loss | |
CN113158739B (en) | Method for solving re-identification of replacement person by twin network based on attention mechanism | |
CN114299542A (en) | Video pedestrian re-identification method based on multi-scale feature fusion | |
CN115620338A (en) | Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images | |
CN115830652A (en) | Deep palm print recognition device and method | |
Yu et al. | Pedestrian detection based on improved Faster RCNN algorithm | |
Wei et al. | A survey of facial expression recognition based on deep learning | |
Wu et al. | Spatio-Temporal Associative Representation for Video Person Re-Identification. | |
Liu et al. | SCSA-Net: Presentation of two-view reliable correspondence learning via spatial-channel self-attention | |
Zhou et al. | LRDNN: Local-refining based Deep Neural Network for Person Re-Identification with Attribute Discerning. | |
Hong et al. | Camera-specific informative data augmentation module for unbalanced person re-identification | |
Liu et al. | Pose-guided attention learning for cloth-changing person re-identification | |
Liu et al. | Similarity preserved camera-to-camera GAN for person re-identification | |
Zheng et al. | A mask-pooling model with local-level triplet loss for person re-identification | |
Cao et al. | Few-shot person re-identification based on meta-learning with a compression and stimulation module | |
CN115393950A (en) | Gesture segmentation network device and method based on multi-branch cascade Transformer | |
Liu et al. | Multi-Scale Feature Fusion Network for Video-Based Person Re-Identification | |
Ding et al. | Key frame extraction based on frame difference and cluster for person re-identification | |
Guan et al. | Cdtnet: Cross-domain transformer based on attributes for person re-identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||