CN115620338A - Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images - Google Patents
Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images
- Publication number
- CN115620338A (application CN202211258905.XA)
- Authority
- CN
- China
- Prior art keywords
- clothes
- black
- branch
- image
- pedestrian
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 59
- 230000008859 change Effects 0.000 title claims abstract description 18
- 238000012549 training Methods 0.000 claims abstract description 40
- 230000008569 process Effects 0.000 claims abstract description 10
- 238000004458 analytical method Methods 0.000 claims description 10
- 238000009499 grossing Methods 0.000 claims description 7
- 238000013140 knowledge distillation Methods 0.000 claims description 7
- 230000004913 activation Effects 0.000 description 4
- 230000000694 effects Effects 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 239000013598 vector Substances 0.000 description 4
- 238000012545 processing Methods 0.000 description 3
- 239000003086 colorant Substances 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000003909 pattern recognition Methods 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 230000000576 supplementary effect Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000005021 gait Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and device for clothes-changing person re-identification guided by black-clothes and head images. The method comprises: first, processing the original image with a newly designed clothes-occlusion method to obtain the corresponding black-clothes image, and pre-training the black-clothes branch on the resulting black-clothes images; then jointly training the framework, feeding the original pedestrian image into the original branch while the pre-trained black-clothes branch guides the original branch's learning; at the same time, feeding the pedestrian head image into the head branch to obtain finer-grained pedestrian features. By occluding the clothes of all pedestrians, the invention obtains black-clothes images in which clothing color is uniform across pedestrians, so the model attends to regions other than clothing color, improving its robustness. The invention also makes effective use of the information in the original image, reducing the information lost while generating black-clothes images and improving the robustness of the features.
Description
Technical Field
The invention relates to the technical field of person re-identification, and in particular to a method and device for clothes-changing person re-identification guided by black-clothes and head images.
Background
Person re-identification aims to solve the problem of retrieving pedestrians under varying conditions, such as different cameras, different lighting, or different viewing angles. Research on person re-identification spans many sub-fields, such as lightweight networks, domain generalization, and unsupervised learning, all of which have achieved good results in recent years. However, these methods generally assume that a person's clothing stays the same over a long period of time.
In the real world, however, people's clothes do not stay the same. People wear different clothes over long periods, and some suspects may evade tracking by changing their clothes within a short time. A different version of the person re-identification problem has therefore been proposed, known as long-term clothes-changing person re-identification, which has become a topic of intense interest. The core of solving clothes-changing person re-identification is to extract discriminative features that are related only to identity. To remove the distraction of clothing, researchers generally adopt two strategies.
The first is the data strategy. A common approach is to build a large-scale dataset in which each person has many pictures in a large variety of clothes, and then force the model to learn clothes-independent features from these pictures. However, constructing such a clothes-changing dataset purely by human effort is extremely laborious, almost impossible, so some researchers use GANs or other means to extend the original dataset.
The second is the feature-separation strategy. A common operation is to separate clothing features from other identity features, so that features other than clothing can be used for identity judgment. For example, Yang et al. adopted pedestrian silhouettes as the query and gallery and exploited polar coordinates to better capture silhouette features. However, although learning from silhouettes yields clothes-independent features, it also discards a large portion of other clothes-independent cues (such as the head). In addition, Hong et al. proposed an appearance branch and a shape branch to extract fine-grained features. Such methods, however, are often affected by clothes of different colors and cannot extract more robust clothes-independent features.
The main problems of the prior art are:
1. Existing clothes-changing person re-identification methods require a large amount of image-generation work and a large amount of training time.
2. Existing clothes-changing person re-identification methods are often affected by the different colors of clothes and cannot extract more robust clothes-independent features.
3. Most existing clothes-changing person re-identification methods use traditional convolutional neural networks as the training network, which incur a certain loss of information due to downsampling and pooling.
4. Most existing clothes-changing person re-identification methods ignore the influence of head features on the overall judgment.
Summary of the Invention
To address the above problems in the background art, the present invention proposes a method and device for clothes-changing person re-identification guided by black-clothes and head images. It uses a non-GAN approach to extract clothes-independent features from the image; proposes a new clothes-occlusion strategy that makes all pedestrians' clothes tend toward uniformity and forces the model to learn robust clothes-independent features; adopts an improved Transformer as the training network; and designs a separate head branch to capture fine-grained head features of the original image.
To achieve the above object, the present invention adopts the following technical solutions:
On one hand, the present invention proposes a method for clothes-changing person re-identification guided by black-clothes and head images, comprising:
Step 1: remove the clothes features from the original pedestrian image to obtain a black-clothes pedestrian image;
Step 2: process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image from the original pedestrian image;
Step 3: build a clothes-changing person re-identification network consisting of three branches, namely the original branch, the black-clothes branch, and the head branch, which learn the original pedestrian image features, the black-clothes pedestrian image features, and the pedestrian head image features, respectively; the three branches have the same backbone structure but do not share parameters;
Step 4: feed the black-clothes pedestrian image obtained in step 1 into the black-clothes branch for training, yielding the pre-trained black-clothes branch;
Step 5: train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but not to the clothes;
Step 6: feed the pedestrian head image obtained in step 2 into the head branch for learning, and combine the learned head image features with the pedestrian-related but clothes-unrelated features obtained in step 5 to obtain overall features that are related to the pedestrian but not to the clothes, completing the training of the clothes-changing person re-identification network;
Step 7: perform clothes-changing person re-identification based on the trained clothes-changing person re-identification network.
Further, step 1 comprises:
using a pre-trained human parsing model to obtain images of each body part of the pedestrian in the original pedestrian image, and recombining the resulting body-part images into six parts: background, head, top, pants, arms, and legs; the pixels of the top and pants images are extracted from these to form the clothes region, and all pixels of the clothes region are set to zero, yielding the black-clothes pedestrian image.
Further, the backbone of all three branches is the imViT network. In each imViT, two outputs are used to extract global features and local features, respectively. The backbone is optimized by a triplet loss and an ID loss on the global and local features, where the ID loss is a cross-entropy loss without label smoothing.
Further, step 5 comprises:
following the knowledge-distillation algorithm, training the original-image branch under the guidance of the pre-trained black-clothes branch, and using a mean-square-error loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features.
On the other hand, the present invention proposes a device for clothes-changing person re-identification guided by black-clothes and head images, comprising:
a black-clothes image module, configured to remove the clothes features from the original pedestrian image to obtain a black-clothes pedestrian image;
a head image module, configured to process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image from the original pedestrian image;
a network construction module, configured to build the clothes-changing person re-identification network, which consists of three branches, namely the original branch, the black-clothes branch, and the head branch, learning the original pedestrian image features, the black-clothes pedestrian image features, and the pedestrian head image features, respectively; the three branches have the same backbone structure but do not share parameters;
a black-clothes branch training module, configured to feed the black-clothes pedestrian image obtained by the black-clothes image module into the black-clothes branch for training, yielding the pre-trained black-clothes branch;
an original branch training module, configured to train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but not to the clothes;
a head branch training module, configured to feed the pedestrian head image obtained by the head image module into the head branch for learning, and to combine the learned head image features with the pedestrian-related but clothes-unrelated features obtained by the original branch training module into overall features that are related to the pedestrian but not to the clothes, completing the training of the clothes-changing person re-identification network;
a re-identification module, configured to perform clothes-changing person re-identification based on the trained clothes-changing person re-identification network.
Further, the black-clothes image module is specifically configured for:
using a pre-trained human parsing model to obtain images of each body part of the pedestrian in the original pedestrian image, and recombining the resulting body-part images into six parts: background, head, top, pants, arms, and legs; the pixels of the top and pants images are extracted from these to form the clothes region, and all pixels of the clothes region are set to zero, yielding the black-clothes pedestrian image.
Further, the backbone of all three branches is the imViT network. In each imViT, two outputs are used to extract global features and local features, respectively. The backbone is optimized by a triplet loss and an ID loss on the global and local features, where the ID loss is a cross-entropy loss without label smoothing.
Further, the original branch training module is specifically configured for:
following the knowledge-distillation algorithm, training the original-image branch under the guidance of the pre-trained black-clothes branch, and using a mean-square-error loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features.
Compared with the prior art, the present invention has the following beneficial effects:
1. Compared with methods that use a GAN to generate large numbers of clothes-changing images to extend the dataset, the present invention uses a non-GAN approach. The top and pants regions of the body are obtained with a human semantic parsing model and occluded with the proposed method. In this way, the model can learn clothes-independent features in a more concentrated manner without extending the dataset, saving both space and time.
2. Compared with other methods that separate clothing features from identity features, the present invention occludes the clothes of all pedestrians to obtain black-clothes images. This makes the clothing color uniform across pedestrians, so the model attends to regions other than clothing color, improving its robustness.
3. Compared with learning features directly from clothes-removed images, the present invention uses the black-clothes branch to guide the original branch to learn clothes-independent identity features directly from the original RGB image. This makes effective use of the information in the original image, reduces the information lost while generating black-clothes images, and improves the robustness of the features.
4. Compared with methods that extract clothes-independent features directly from the original image, the present invention adds dedicated processing of the head image patch, so more discriminative fine-grained features can be extracted, which complement the global features well.
5. Test results on the PRCC dataset show that the proposed method achieves excellent clothes-changing person re-identification performance.
Brief Description of the Drawings
Fig. 1 is the basic flowchart of a clothes-changing person re-identification method guided by black-clothes and head images according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the clothes-changing person re-identification network architecture constructed in an embodiment of the present invention;
Fig. 3 is a comparison of feature activation maps obtained in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a clothes-changing person re-identification device guided by black-clothes and head images according to an embodiment of the present invention.
Detailed Description
The present invention is further explained below with reference to the accompanying drawings and specific embodiments:
As shown in Fig. 1, a clothes-changing person re-identification method guided by black-clothes and head images comprises:
Step 1: remove the clothes features from the original pedestrian image to obtain a black-clothes pedestrian image (black-clothes image for short);
Step 2: process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image from the original pedestrian image;
Step 3: build a clothes-changing person re-identification network consisting of three branches, namely the original branch, the black-clothes branch, and the head branch, as shown in Fig. 2, which learn the original pedestrian image features, the black-clothes pedestrian image features, and the pedestrian head image features, respectively; the three branches have the same backbone structure but do not share parameters, so as to better exploit the discriminative feature space;
Step 4: feed the black-clothes pedestrian image obtained in step 1 into the black-clothes branch for training, yielding the pre-trained black-clothes branch;
Step 5: train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but not to the clothes;
Step 6: feed the pedestrian head image obtained in step 2 into the head branch for learning, and combine the learned head image features with the pedestrian-related but clothes-unrelated features obtained in step 5 to obtain overall features that are related to the pedestrian but not to the clothes, completing the training of the clothes-changing person re-identification network;
Step 7: perform clothes-changing person re-identification based on the trained clothes-changing person re-identification network.
Further, step 1 comprises:
using a pre-trained human parsing model to obtain images of each body part of the pedestrian in the original pedestrian image, and recombining the resulting body-part images into six parts: background, head, top, pants, arms, and legs; the pixels of the top and pants images are extracted from these to form the clothes region, and all pixels of the clothes region are set to zero, yielding the black-clothes pedestrian image.
Further, the backbone of all three branches is the imViT network. In each imViT, two outputs are used to extract global features and local features, respectively. The backbone is optimized by a triplet loss and an ID loss on the global and local features, where the ID loss is a cross-entropy loss without label smoothing.
Further, step 5 comprises:
following the knowledge-distillation algorithm, training the original-image branch under the guidance of the pre-trained black-clothes branch, and using a mean-square-error loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features.
Specifically, the method includes:
1. Obtaining the black-clothes image
To eliminate the influence of clothes on feature extraction, we remove the clothes features from the original image and obtain black-clothes pedestrian images. First, we use HRNet, a pre-trained human parsing model, to obtain images of the body parts. The model's prediction divides the body into 20 parts, which we recombine into six: background, head, top, pants, arms, and legs. From these we use only the top and pants, which together form the clothes region.
First, consider a batch of input samples $x_i$, $i = 1, \ldots, B$, where $B$ is the batch size and each $x_i$ is an image, $x_i \in \mathbb{R}^{H \times W \times C}$, with $H$, $W$, and $C$ denoting its height, width, and number of channels. We denote the semantic map obtained for $x_i$ by the human parsing model as $s_i$, $i = 1, \ldots, B$, $s_i \in \mathbb{R}^{1 \times H \times W}$. Each pixel of $s_i$ takes a value in $\{0, 1, 2, 3, 4, 5\}$, the six values representing the six body parts.
Second, we obtain the pixels of the top and pants. Each pixel of $x_i$ is a vector $v_j$ with $C$ values, and each input sample $x_i$ contains $W \times H$ pixel vectors in total. We denote the set of all top and pants pixel vectors in $x_i$ as:
$$B_{\text{top,pants}} = \{ v_j \mid v_j = x_i[s_i = 2 \lor s_i = 3],\ i \in [1, B],\ j \in [1, N] \} \quad (1)$$
where $N$ is the total number of top and pants pixels in each $x_i$ (it differs from image to image), $j$ indexes the $j$-th pixel vector, $s_i$ is the semantic segmentation map, 2 is the label of the top, 3 is the label of the pants, and $x_i[s_i = 2 \lor s_i = 3]$ selects the top and pants pixel vectors of $x_i$. The original image can therefore be written as $x_i = [v_1, v_2, \ldots, v_{c_1}, \ldots, v_{c_n}, \ldots, v_{\text{last}}]$, where $[v_{c_1}, \ldots, v_{c_n}]$ are the top and pants pixels given by Eq. (1).
Finally, the black-clothes image is obtained by setting the top and pants pixels to zero. Specifically, we set $[v_{c_1}, \ldots, v_{c_n}] = 0$ and obtain $x_i' = [v_1, v_2, \ldots, 0, \ldots, v_{\text{last}}]$, which is the black-clothes image corresponding to $x_i$.
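As an illustration, here is a minimal PyTorch sketch of this masking step. It assumes the parsing map already uses the six-part label convention above (0 = background, 1 = head, 2 = top, 3 = pants, 4 = arms, 5 = legs); in practice the map would come from the pre-trained HRNet, and the function name is ours, not part of the invention.

```python
import torch

def mask_clothes(x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """Zero out the top (label 2) and pants (label 3) pixels.

    x: batch of images, shape (B, C, H, W)
    s: semantic parsing maps, shape (B, 1, H, W), integer labels in {0..5}
    """
    clothes = (s == 2) | (s == 3)          # boolean clothes region, (B, 1, H, W)
    return x * (~clothes).to(x.dtype)      # broadcasts over channels -> black clothes

# Tiny demo on random data:
x = torch.rand(4, 3, 256, 128)             # B=4, C=3, H=256, W=128
s = torch.randint(0, 6, (4, 1, 256, 128))  # stand-in parsing labels
x_black = mask_clothes(x, s)
assert (x_black[((s == 2) | (s == 3)).expand_as(x_black)] == 0).all()
```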
2. Backbone network of each branch
We choose imViT (see [He, Shuting, et al., "TransReID: Transformer-based object re-identification," Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021]) as the backbone of each branch. In each imViT, two outputs are used to extract global and local features. For the local output, the number of local sub-features that make up the local feature can be chosen; in our experiments we use 4. Thus, for the $i$-th branch ($i \in \{0, 1, 2\}$), we obtain the local feature $F_{li}$ and the global feature $F_{gi}$, where $F_{li} = [F_{li1}, F_{li2}, F_{li3}, F_{li4}]$, $i \in \{0, 1, 2\}$.
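As a rough illustration of this dual-output structure only (the actual imViT follows TransReID and is more elaborate), one could take the class token of a ViT-style token sequence as the global feature and pool four groups of patch tokens as local sub-features:

```python
import torch

def split_global_local(tokens: torch.Tensor, num_parts: int = 4):
    """tokens: (B, 1 + N, D), with the class token first and N divisible by num_parts."""
    f_g = tokens[:, 0]                        # global feature, shape (B, D)
    patches = tokens[:, 1:]                   # patch tokens, shape (B, N, D)
    parts = patches.chunk(num_parts, dim=1)   # num_parts groups of patch tokens
    f_l = [p.mean(dim=1) for p in parts]      # local sub-features, each (B, D)
    return f_g, f_l

tokens = torch.rand(8, 1 + 128, 768)          # B=8, N=128 patch tokens, D=768
f_g, f_l = split_global_local(tokens)
print(f_g.shape, [f.shape for f in f_l])      # torch.Size([8, 768]) and four of the same
```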
The backbone is optimized by a triplet loss and an ID loss on the global and local features, respectively. The ID loss is a cross-entropy loss without label smoothing. The triplet loss $L_{tri}$ is:
$$L_{tri} = \log\left(1 + \exp\left(\lVert f_a - f_p \rVert_2 - \lVert f_a - f_n \rVert_2\right)\right)$$
where $f_a$ denotes the anchor, $f_p$ the positive sample, and $f_n$ the negative sample. Each branch of our framework therefore has two groups of loss functions, $L_{tri\text{-}gi}$, $L_{id\text{-}gi}$ and $L_{tri\text{-}li}$, $L_{id\text{-}li}$, where $L_{tri\text{-}gi}$ and $L_{id\text{-}gi}$ are the triplet loss and ID loss on the global feature $F_{gi}$, and $L_{tri\text{-}li}$ and $L_{id\text{-}li}$ are the triplet loss and ID loss on the local feature $F_{li}$.
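This soft-margin triplet loss translates directly into code. The sketch below reads the subscript 2 as the Euclidean norm and assumes that (anchor, positive, negative) triplets have already been mined within the batch:

```python
import torch
import torch.nn.functional as F

def soft_margin_triplet_loss(f_a: torch.Tensor, f_p: torch.Tensor,
                             f_n: torch.Tensor) -> torch.Tensor:
    d_ap = (f_a - f_p).norm(p=2, dim=1)    # anchor-positive distances
    d_an = (f_a - f_n).norm(p=2, dim=1)    # anchor-negative distances
    return F.softplus(d_ap - d_an).mean()  # softplus(z) = log(1 + exp(z)), numerically stable

f_a, f_p, f_n = (torch.rand(16, 768) for _ in range(3))
print(soft_margin_triplet_loss(f_a, f_p, f_n))
```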
3. The original branch guided by the black-clothes branch
The black-clothes branch can learn clothes-independent features, but some important discriminative information hidden in the original image may be discarded while the black-clothes images are generated, and extracting clothes-independent identity features directly from the original image is not feasible. Following the knowledge-distillation algorithm, we train the original-image branch under the guidance of the pre-trained black-clothes branch. Specifically, we adopt a mean-square-error (MSE) loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features. The MSE loss $L_{mse}$ is defined as
$$L_{mse} = L_{mse\text{-}opg} + L_{mse\text{-}opl}$$
where $F_{l1}$ and $F_{g1}$ denote the local and global features produced by the black-clothes branch, $F_{l2}$ and $F_{g2}$ denote the local and global features produced by the original branch guided by the black-clothes branch, and $L_{mse\text{-}opg}$ and $L_{mse\text{-}opl}$ denote the MSE losses on the global feature pair $F_{opg} = [F_{g1}, F_{g2}]$ and the local feature pair $F_{opl} = [F_{l1}, F_{l2}]$, respectively.
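A minimal sketch of this distillation term follows, with the black-clothes branch's features detached to reflect that the branch is frozen during joint training; the feature shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_mse(f_g1, f_l1, f_g2, f_l2) -> torch.Tensor:
    """L_mse = L_mse-opg + L_mse-opl; teacher features are detached (frozen branch)."""
    l_opg = F.mse_loss(f_g2, f_g1.detach())   # global pair F_opg = [F_g1, F_g2]
    l_opl = F.mse_loss(f_l2, f_l1.detach())   # local pair  F_opl = [F_l1, F_l2]
    return l_opg + l_opl

f_g1, f_g2 = torch.rand(8, 768), torch.rand(8, 768, requires_grad=True)
f_l1, f_l2 = torch.rand(8, 4, 768), torch.rand(8, 4, 768, requires_grad=True)
print(distillation_mse(f_g1, f_l1, f_g2, f_l2))
```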
4. Extraction of head features
For the input original image, we use the pre-trained HRNet to obtain the head image region. The head image is fed into imViT3 to obtain the merged local feature $F_{l3}$ and the global feature $F_{g3}$, where the former is merged from four local sub-features $F_{l31}$, $F_{l32}$, $F_{l33}$, $F_{l34}$. Likewise, the original branch guided by the black-clothes branch yields the concatenated local feature $F_{l2}$ and the global feature $F_{g2}$. To obtain overall features that are less tied to clothes, we take element-wise weighted sums of $F_{l2}$ and $F_{l3}$, and of $F_{g2}$ and $F_{g3}$, defined as follows:
$$F_{ohg} = w F_{g2} + (1 - w) F_{g3}$$
$$F_{ohl} = w F_{l2} + (1 - w) F_{l3}$$
where $F_{ohg}$ and $F_{ohl}$ denote the global and local overall features that are related to the pedestrian but not to the clothes, and $w \in (0, 1)$ is a weighting coefficient.
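The fusion itself is a one-liner; the sketch below treats $w$ as a fixed scalar, since the text does not state whether $w$ is a fixed hyper-parameter or learned:

```python
import torch

def fuse(f2: torch.Tensor, f3: torch.Tensor, w: float = 0.5) -> torch.Tensor:
    """F_oh = w * F_2 + (1 - w) * F_3, for global or local features of matching shape."""
    return w * f2 + (1.0 - w) * f3

f_g2, f_g3 = torch.rand(8, 768), torch.rand(8, 768)
f_ohg = fuse(f_g2, f_g3)                       # global overall feature
f_l2, f_l3 = torch.rand(8, 4, 768), torch.rand(8, 4, 768)
f_ohl = fuse(f_l2, f_l3)                       # local overall feature
print(f_ohg.shape, f_ohl.shape)
```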
Triplet and ID losses are then applied to these features: $L_{id\text{-}ohg}$, $L_{tri\text{-}ohg}$, $L_{id\text{-}ohl}$, and $L_{tri\text{-}ohl}$, where $L_{id\text{-}ohg}$ and $L_{tri\text{-}ohg}$ are the ID loss and triplet loss on $F_{ohg}$, and $L_{id\text{-}ohl}$ and $L_{tri\text{-}ohl}$ are those on $F_{ohl}$.
5. Joint training
Training proceeds in two stages. First, we feed the black-clothes images into the black-clothes branch and train it on its own with the ID loss and the triplet loss on the global and local features (for brevity, these two losses are omitted from the black-clothes branch in Fig. 1). We thereby obtain the pre-trained black-clothes branch.
Second, we fix the learned weights of the black-clothes branch and jointly train the other branches. The total loss function is therefore defined as:
$$L_{total}(\theta) = l_1 L_{id}(\theta) + l_2 L_{tri}(\theta) + l_3 L_{mse}(\theta)$$
where $L_{id}(\theta)$ denotes the ID loss, $L_{tri}(\theta)$ the triplet loss, and $L_{mse}(\theta)$ the MSE loss, and $l_1$, $l_2$, $l_3$ are trade-off parameters balancing the contribution of each loss. In our experiments, $l_1$, $l_2$, and $l_3$ are set to 0.25, 0.25, and 0.5, respectively.
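Putting the weights together, the stage-two objective can be sketched as follows, with scalar tensors standing in for the aggregated ID, triplet, and MSE terms described above:

```python
import torch

def total_loss(l_id: torch.Tensor, l_tri: torch.Tensor, l_mse: torch.Tensor,
               l1: float = 0.25, l2: float = 0.25, l3: float = 0.5) -> torch.Tensor:
    return l1 * l_id + l2 * l_tri + l3 * l_mse

# With the weights above, e.g. L_id = 1.2, L_tri = 0.8, L_mse = 0.4 gives 0.7:
print(total_loss(torch.tensor(1.2), torch.tensor(0.8), torch.tensor(0.4)))
```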
To verify the effect of the present invention, the following experiments are conducted:
We conduct experiments on the PRCC dataset (PRCC contains 33,698 images of 221 identities from 3 different views, and also provides contour sketch images of the persons, which facilitates extracting silhouette information) under both the cross-clothes setting and the same-clothes setting. In the cross-clothes setting, the images from camera A serve as the gallery and the images from camera C as the query. In the same-clothes setting, the gallery images also come from camera A, but the query images come from camera B. The experimental results are shown in Table 1.
Table 1: Experimental results
We compare our method with several state-of-the-art methods on the PRCC dataset, including representative conventional person re-identification methods (PCB [Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, "Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 480–496], Zheng et al.'s method [Z. Zheng, L. Zheng, and Y. Yang, "A discriminatively learned CNN embedding for person re-identification," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 14, no. 1, pp. 1–20, 2017], HPM [Y. Fu, Y. Wei, Y. Zhou, H. Shi, G. Huang, X. Wang, Z. Yao, and T. Huang, "Horizontal pyramid matching for person re-identification," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8295–8302], HACNN [W. Li, X. Zhu, and S. Gong, "Harmonious attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2285–2294]) and clothes-changing person re-identification methods (PRCC (sketch) [Q. Yang, A. Wu, and W.-S. Zheng, "Person re-identification by contour sketch under moderate clothing change," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 6, pp. 2029–2046, 2019], GI-ReID (OSNet) [X. Jin, T. He, K. Zheng, Z. Yin, X. Shen, Z. Huang, R. Feng, J. Huang, Z. Chen, and X.-S. Hua, "Cloth-changing person re-identification from a single image with gait prediction and regularization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14278–14287], LightMBN [F. Herzog, X. Ji, T. Teepe, S. Hörmann, J. Gilg, and G. Rigoll, "Lightweight multi-branch network for person re-identification," in 2021 IEEE International Conference on Image Processing (ICIP), IEEE, 2021, pp. 1129–1133]). The results of all compared methods are taken from their published papers.
As can be seen from Table 1, the proposed method achieves very good performance. In the same-clothes setting, although it is slightly inferior to LightMBN, it surpasses almost all other methods, which indicates better generalization for same-clothes person re-identification. In the cross-clothes setting, the proposed method performs markedly better: it exceeds LightMBN by 4.7% in mAP (mAP evaluates the overall effect of a re-identification algorithm; AP is the average precision on a single query sample, and mAP is the mean of AP over all query samples) and by 9.6% in rank-1 accuracy (R@1 in Table 1). This is likely because the proposed method extracts more clothes-independent identity features.
For an intuitive analysis, we randomly sample 3 images from PRCC and show the corresponding activation maps (CAMs) captured by the baseline and our method in Fig. 3. From the first row, the baseline's activated regions concentrate on the clothes and the background; many background and clothes regions are activated and used to identify a person, which may confuse the recognition result, and the baseline pays little attention to the head. From the second row, the activation points of our method are more concentrated and cover fewer background and clothes regions, which mitigates the influence of background and clothes on recognition. We also note that our model pays more attention to the shapes of the head and body.
On the basis of the above embodiments, as shown in Fig. 4, the present invention further proposes a device for clothes-changing person re-identification guided by black-clothes and head images, comprising:
a black-clothes image module, configured to remove the clothes features from the original pedestrian image to obtain a black-clothes pedestrian image;
a head image module, configured to process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image from the original pedestrian image;
a network construction module, configured to build the clothes-changing person re-identification network, which consists of three branches, namely the original branch, the black-clothes branch, and the head branch, learning the original pedestrian image features, the black-clothes pedestrian image features, and the pedestrian head image features, respectively; the three branches have the same backbone structure but do not share parameters;
a black-clothes branch training module, configured to feed the black-clothes pedestrian image obtained by the black-clothes image module into the black-clothes branch for training, yielding the pre-trained black-clothes branch;
an original branch training module, configured to train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but not to the clothes;
a head branch training module, configured to feed the pedestrian head image obtained by the head image module into the head branch for learning, and to combine the learned head image features with the pedestrian-related but clothes-unrelated features obtained by the original branch training module into overall features that are related to the pedestrian but not to the clothes, completing the training of the clothes-changing person re-identification network;
a re-identification module, configured to perform clothes-changing person re-identification based on the trained clothes-changing person re-identification network.
Further, the black-clothes image module is specifically configured for:
using a pre-trained human parsing model to obtain images of each body part of the pedestrian in the original pedestrian image, and recombining the resulting body-part images into six parts: background, head, top, pants, arms, and legs; the pixels of the top and pants images are extracted from these to form the clothes region, and all pixels of the clothes region are set to zero, yielding the black-clothes pedestrian image.
Further, the backbone of all three branches is the imViT network. In each imViT, two outputs are used to extract global features and local features, respectively. The backbone is optimized by a triplet loss and an ID loss on the global and local features, where the ID loss is a cross-entropy loss without label smoothing.
Further, the original branch training module is specifically configured for:
following the knowledge-distillation algorithm, training the original-image branch under the guidance of the pre-trained black-clothes branch, and using a mean-square-error loss to regularize the training of the original branch so that it learns more identity-related but clothes-unrelated features.
In summary, compared with methods that use a GAN to generate large numbers of clothes-changing images to extend the dataset, the present invention uses a non-GAN approach: the top and pants regions are obtained with a human semantic parsing model and occluded with the proposed method, so the model can learn clothes-independent features in a more concentrated manner without extending the dataset, saving both space and time. Compared with other methods that separate clothing features from identity features, the present invention occludes the clothes of all pedestrians to obtain black-clothes images, making the clothing color uniform across pedestrians, so the model attends to regions other than clothing color and becomes more robust. Compared with learning features directly from clothes-removed images, the present invention uses the black-clothes branch to guide the original branch to learn clothes-independent identity features directly from the original RGB image, making effective use of the information in the original image, reducing the information lost while generating black-clothes images, and improving feature robustness. Compared with extracting clothes-independent features directly from the original image, the present invention adds dedicated processing of the head image patch, extracting more discriminative fine-grained features that complement the global features well. Test results on the PRCC dataset show that the proposed method achieves excellent clothes-changing person re-identification performance.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211258905.XA CN115620338A (en) | 2022-10-14 | 2022-10-14 | Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115620338A (en) | 2023-01-17 |
Family
ID=84861950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211258905.XA (CN115620338A, pending) | Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images | 2022-10-14 | 2022-10-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115620338A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116129473A (en) * | 2023-04-17 | 2023-05-16 | 山东省人工智能研究院 | Identity-guided joint learning method and system for re-identification of pedestrians who change clothes |
CN116129473B (en) * | 2023-04-17 | 2023-07-14 | 山东省人工智能研究院 | Identity-guided joint learning method and system for re-identification of pedestrians who change clothes |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109508663A (en) | A kind of pedestrian's recognition methods again based on multi-level supervision network | |
CN111582095B (en) | Light-weight rapid detection method for abnormal behaviors of pedestrians | |
CN111274921A (en) | A method for human action recognition using pose mask | |
He et al. | Enhancing face recognition with self-supervised 3d reconstruction | |
Chen et al. | Learning discriminative and generalizable representations by spatial-channel partition for person re-identification | |
CN116704611B (en) | A cross-view gait recognition method based on motion feature mixing and fine-grained multi-stage feature extraction | |
CN112131970A (en) | Identity recognition method based on multi-channel space-time network and joint optimization loss | |
CN113158739B (en) | Method for solving re-identification of replacement person by twin network based on attention mechanism | |
CN114299542A (en) | Video pedestrian re-identification method based on multi-scale feature fusion | |
CN115620338A (en) | Method and device for re-identifying pedestrians who change clothes guided by black clothes and head images | |
CN115830652A (en) | Deep palm print recognition device and method | |
Yu et al. | Pedestrian detection based on improved Faster RCNN algorithm | |
Wei et al. | A survey of facial expression recognition based on deep learning | |
Wu et al. | Spatio-Temporal Associative Representation for Video Person Re-Identification. | |
Liu et al. | SCSA-Net: Presentation of two-view reliable correspondence learning via spatial-channel self-attention | |
Zhou et al. | LRDNN: Local-refining based Deep Neural Network for Person Re-Identification with Attribute Discerning. | |
Hong et al. | Camera-specific informative data augmentation module for unbalanced person re-identification | |
Liu et al. | Pose-guided attention learning for cloth-changing person re-identification | |
Liu et al. | Similarity preserved camera-to-camera GAN for person re-identification | |
Zheng et al. | A mask-pooling model with local-level triplet loss for person re-identification | |
Cao et al. | Few-shot person re-identification based on meta-learning with a compression and stimulation module | |
CN115393950A (en) | Gesture segmentation network device and method based on multi-branch cascade Transformer | |
Liu et al. | Multi-Scale Feature Fusion Network for Video-Based Person Re-Identification | |
Ding et al. | Key frame extraction based on frame difference and cluster for person re-identification | |
Guan et al. | Cdtnet: Cross-domain transformer based on attributes for person re-identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||