CN110110689B - A Pedestrian Re-identification Method - Google Patents

A Pedestrian Re-identification Method

Info

Publication number
CN110110689B
CN110110689B (application CN201910403777.5A)
Authority
CN
China
Prior art keywords
pedestrian
feature map
channel
feature
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910403777.5A
Other languages
Chinese (zh)
Other versions
CN110110689A (en)
Inventor
张云洲
刘双伟
齐林
朱尚栋
徐文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201910403777.5A priority Critical patent/CN110110689B/en
Publication of CN110110689A publication Critical patent/CN110110689A/en
Application granted granted Critical
Publication of CN110110689B publication Critical patent/CN110110689B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure relate to a pedestrian re-identification method, which comprises the following steps: extracting pedestrian CNN feature maps from a plurality of images; performing model training by means of adversarial erasing learning, which simulates occlusion of the discriminative regions of the pedestrian CNN feature maps, to obtain a trained model; and performing pedestrian re-identification with the trained model on a target pedestrian image and pedestrian images to be recognized, to obtain a pedestrian re-identification result. The method provides a feature-level data augmentation strategy: the input feature map of an auxiliary classifier is partially erased, which increases the variation of pedestrian features, counteracts occlusion of pedestrians, and improves the generalization ability of the deep pedestrian re-identification model.

Description

A Pedestrian Re-identification Method

Technical Field

The present disclosure relates to the technical field of computer vision, and in particular to a pedestrian re-identification method.

Background

Pedestrian re-identification matches and identifies pedestrian identities across non-overlapping multi-camera surveillance systems, and plays an important role in intelligent video surveillance, crime prevention, and the maintenance of public order. However, when human attributes such as pose, gait, and clothing or environmental factors such as illumination and background change, the appearance of the same pedestrian can differ markedly across surveillance videos, while the appearances of different pedestrians can be rather similar in some circumstances.

In recent years, deep learning methods have been widely applied and can achieve better performance than traditional hand-crafted methods. However, deep pedestrian re-identification models usually contain a large number of network parameters yet are optimized on limited data sets, which increases the risk of over-fitting and reduces generalization ability. Improving the generalization ability of the model is therefore a meaningful and important problem for deep pedestrian re-identification.

To improve the generalization ability of a deep convolutional neural network, one can increase the variants in the training data set and collect a large number of pedestrian images containing occlusions, but this achieves only image-level data augmentation and fails to provide data augmentation beyond the image level for improving the generalization ability of the deep convolutional neural network.

The above drawbacks are what those skilled in the art would wish to overcome.

Summary of the Invention

(1) Technical Problem to Be Solved

To solve the above problems in the prior art, the present disclosure provides a pedestrian re-identification method that performs data augmentation at the feature level, so as to improve the generalization ability of a deep convolutional neural network.

(2) Technical Solution

To achieve the above object, the main technical solutions adopted in the present disclosure include the following.

An embodiment of the present disclosure provides a pedestrian re-identification method, which includes:

extracting a pedestrian CNN feature map from a plurality of images;

performing model training by means of adversarial erasing learning, which simulates occlusion of discriminative regions of the pedestrian CNN feature map, to obtain a trained model;

performing pedestrian re-identification with the trained model in combination with a target pedestrian image and pedestrian images to be recognized, to obtain a pedestrian re-identification result.

In one embodiment of the present disclosure, extracting the pedestrian CNN feature map from the plurality of images includes:

randomly selecting the plurality of images from a training data set;

inputting the plurality of images into a plurality of different semantic layers of a ResNet50 model for extraction, to obtain feature maps of a plurality of channels;

processing the feature maps of the plurality of channels with a channel attention module, to obtain a channel-processed feature map;

processing spatial context information of the channel-processed feature map at different positions with a spatial attention module, to obtain the pedestrian CNN feature map.

In one embodiment of the present disclosure, processing the feature maps of the plurality of channels with the channel attention module to obtain the channel-processed feature map includes:

obtaining a channel feature descriptor from the feature map of each channel among the feature maps of the plurality of channels;

applying an activation-function operation to the channel feature descriptor, to obtain a channel attention feature map;

multiplying the channel attention feature map by the aggregated feature map, to obtain the channel-processed feature map.

In one embodiment of the present disclosure, the feature descriptor includes the statistics of the plurality of channels, the feature descriptor being:

s = [s1, s2, …, sN]

with the statistic of each channel being:

sn = (1/(A×B)) Σ_{a=1..A} Σ_{b=1..B} Sn(a, b)

where N is the number of channels, n is the channel index, and A and B are the height and width of the feature map, respectively;

the channel attention feature map is:

e = σ(W2 δ(W1 s))

where σ and δ denote the Sigmoid activation function and the ReLU activation function respectively, W1 ∈ R^((N/r)×N) is the weight of the first fully connected layer Fc1, W2 ∈ R^(N×(N/r)) is the weight of the second fully connected layer Fc2, and r is the reduction ratio.

In one embodiment of the present disclosure, processing the spatial context information of the channel-processed feature map at different positions with the spatial attention module to obtain the pedestrian CNN feature map includes:

performing a 1×1 convolution on the channel-processed feature map, to obtain a first spatial-information feature map T and a second spatial-information feature map U;

performing matrix multiplication between the transpose of the first spatial-information feature map T and the second spatial-information feature map U, to obtain a spatial attention feature map;

performing a 1×1 convolution on the channel-processed feature map, to obtain a third spatial-information feature map V;

performing matrix multiplication between the third spatial-information feature map V and the transpose of the spatial attention feature map, to obtain a spatially processed feature map;

obtaining the pedestrian CNN feature map from the channel processing and the spatial processing.

In one embodiment of the present disclosure, performing model training by means of adversarial erasing learning, which simulates occlusion of discriminative regions of the pedestrian CNN feature map, to obtain the trained model includes:

inputting the pedestrian CNN feature map into a main classifier and an auxiliary classifier respectively for classification training, and outputting pedestrian-class-specific feature maps from the main classifier and the auxiliary classifier;

performing partial erasing in the auxiliary classifier, to obtain an erased feature map;

calculating loss values with a loss function for the pedestrian-class-specific feature map output by the main classifier and for the erased feature map output by the auxiliary classifier, respectively;

updating the parameters of the training model according to the loss values.

In one embodiment of the present disclosure, the main classifier and the auxiliary classifier contain the same number of convolutional layers and global average pooling layers; the number of channels of the convolutional layer equals the number of pedestrian classes in the training data set; and each channel of the pedestrian-class-specific feature map represents the body response heat map of a pedestrian image for a different class.

In one embodiment of the present disclosure, performing partial erasing in the auxiliary classifier includes:

determining, as a discriminative region, the region of the body response heat map whose heat map values are higher than a set adversarial erasing threshold;

erasing, in an adversarial manner, the portion of the pedestrian-class-specific feature map output by the auxiliary classifier corresponding to the discriminative region, by replacing its response values with 0.

In one embodiment of the present disclosure, performing pedestrian re-identification with the trained model in combination with the target pedestrian image and the pedestrian images to be recognized to obtain the pedestrian re-identification result includes:

inputting the target pedestrian image and the pedestrian images to be recognized into the trained model, to obtain the corresponding deep features respectively;

calculating a cosine distance from the deep feature of the target pedestrian image and the deep feature of each pedestrian image to be recognized;

determining the similarity between the target pedestrian image and each pedestrian image to be recognized according to the magnitude of the cosine distance, wherein the pedestrian image to be recognized with the highest similarity is the pedestrian re-identification result.

In one embodiment of the present disclosure, the cosine distance is calculated from the deep feature of the target pedestrian image and the deep feature of the pedestrian image to be recognized as:

dist(feat1, feat2) = 1 - (feat1 · feat2) / (‖feat1‖ ‖feat2‖)

where feat1 is the deep feature of the target pedestrian image, and feat2 is the deep feature of the pedestrian image to be recognized.

(3) Beneficial Effects

The beneficial effects of the present disclosure are as follows. The pedestrian re-identification method provided by the embodiments of the present disclosure provides a feature-level data augmentation strategy in which the input feature map of the auxiliary classifier is partially erased, increasing the variants of pedestrian features, counteracting occlusion of pedestrians, and improving the generalization ability of the deep pedestrian re-identification model.

Brief Description of the Drawings

FIG. 1 is a flowchart of a pedestrian re-identification method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a network structure implementing the method of FIG. 1 in an embodiment of the present disclosure;

FIG. 3 is a flowchart of step S110 of FIG. 1 according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of step S303 of FIG. 3 according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of channel attention in an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of spatial attention in an embodiment of the present disclosure;

FIG. 7 is a flowchart of step S304 of FIG. 3 according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of adversarial erasing learning in an embodiment of the present disclosure;

FIG. 9 is a flowchart of step S120 of FIG. 1 according to an embodiment of the present disclosure;

FIG. 10 is a flowchart of step S130 of FIG. 1 according to an embodiment of the present disclosure.

Detailed Description

To better explain the present disclosure and facilitate understanding, the present disclosure is described in detail below through specific embodiments with reference to the accompanying drawings.

All technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present disclosure belongs. The terms used in the specification of the present disclosure are for the purpose of describing specific embodiments only and are not intended to limit the present disclosure. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

In other embodiments of the present disclosure, increasing the variants of the training data set is an effective way to improve the generalization ability of a deep convolutional neural network. However, unlike object-recognition vision tasks, pedestrian re-identification requires image data collected across cameras, and pedestrian annotation is very difficult; building a sufficiently large data set for pedestrian re-identification often requires a high cost investment, so existing data sets contain only small amounts of annotated pedestrians. To address this, data augmentation can increase the variants of training samples using only the current data set, at no further cost. Recent data augmentation research uses Generative Adversarial Networks (GANs) to generate pedestrian images with different human poses and camera styles, but this approach suffers from long training times, difficult convergence, and low quality of the generated images. Besides explicitly generating new images, a common alternative is to augment the data by jittering pixel values, randomly cropping, or flipping the original training images.

In addition, occlusion is another important factor affecting the generalization ability of convolutional neural networks. Collecting a large number of pedestrian images containing occlusions can effectively address the occlusion problem, but it too requires a high cost investment. A more reasonable method is to simulate the situations in which pedestrians are occluded. For example, a rectangle of random size and random position can be drawn on a training image to occlude it, with the pixel values in the rectangular region replaced by random values, thereby imitating occlusion and increasing the variants of the data set. In that approach, however, the occluded region is chosen at random. Alternatively, a pedestrian re-identification classification model can be trained, the discriminative regions of an image found with the aid of network visualization and multiple classifiers, those discriminative regions occluded on the original image to produce new samples, and finally the new samples added to the original data set to retrain the pedestrian re-identification model.

Both of the above methods add sample variants by occluding the original pedestrian image and are therefore image-level data augmentation methods, whereas the present application provides a feature-level one.

FIG. 1 is a flowchart of a pedestrian re-identification method provided by an embodiment of the present disclosure. As shown in FIG. 1, the method includes the following steps.

As shown in FIG. 1, in step S110, a pedestrian CNN feature map is extracted from a plurality of images.

As shown in FIG. 1, in step S120, model training is performed by means of adversarial erasing learning, which simulates occlusion of discriminative regions of the pedestrian CNN feature map, to obtain a trained model.

As shown in FIG. 1, in step S130, pedestrian re-identification is performed with the trained model in combination with a target pedestrian image and pedestrian images to be recognized, to obtain a pedestrian re-identification result.

The specific implementation of each step of the embodiment shown in FIG. 1 is described in detail below.

In conjunction with the flow shown in FIG. 1, FIG. 2 is a schematic diagram of the network structure implementing the method of FIG. 1 in an embodiment of the present disclosure. As shown in FIG. 2, each branch applies the complementary combination of channel attention and spatial attention during processing, after which adversarial erasing learning and Softmax loss computation are performed on the resulting feature maps. In addition, as shown in FIG. 2, the three branches consist of two mid-level semantic branches and one high-level semantic branch.

In step S110, a pedestrian CNN feature map is extracted from a plurality of images.

FIG. 3 is a flowchart of step S110 of FIG. 1 according to an embodiment of the present disclosure, which specifically includes the following steps.

As shown in FIG. 3, in step S301, the plurality of images are randomly selected from the training data set.

As shown in FIG. 3, in step S302, the plurality of images are input into a plurality of different semantic layers of the ResNet50 model for extraction, to obtain feature maps of a plurality of channels.

In one embodiment of the present disclosure, this step first takes input images by randomly selecting a batch of images from the training data set, that is, a plurality of images. The images are then all resized to 384×128 and fed into different semantic layers of the backbone network ResNet50 (res_conv5a, res_conv5b, and res_conv5c, as shown in FIG. 2, where res_conv5a and res_conv5b correspond to the mid-level semantic branches and res_conv5c corresponds to the high-level semantic branch) to extract pedestrian CNN feature maps.
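For concreteness, the following is a minimal PyTorch sketch of this multi-branch extraction step. The exact wiring (a shared stem up to res_conv4 and sequential taps on the three bottleneck blocks of res_conv5) is an assumption for illustration; the patent does not specify it:

```python
import torch
import torchvision

class MultiBranchBackbone(torch.nn.Module):
    """Shares ResNet50 up to res_conv4 and taps the three blocks of
    res_conv5 as the two mid-level branches and one high-level branch."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)  # or pretrained weights
        self.stem = torch.nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
            resnet.layer1, resnet.layer2, resnet.layer3)
        # layer4 holds the three bottleneck blocks res_conv5a/5b/5c.
        self.conv5a, self.conv5b, self.conv5c = resnet.layer4

    def forward(self, x):          # x: (batch, 3, 384, 128)
        x = self.stem(x)           # (batch, 1024, 24, 8)
        fa = self.conv5a(x)        # mid-level branch 1: (batch, 2048, 12, 4)
        fb = self.conv5b(fa)       # mid-level branch 2
        fc = self.conv5c(fb)       # high-level branch
        return fa, fb, fc

fa, fb, fc = MultiBranchBackbone()(torch.randn(4, 3, 384, 128))
```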

As shown in FIG. 3, in step S303, the channel attention module is used to process the feature maps of the plurality of channels, to obtain the channel-processed feature map.

FIG. 4 is a flowchart of step S303 of FIG. 3 according to an embodiment of the present disclosure, which specifically includes the following steps.

As shown in FIG. 4, in step S401, a channel feature descriptor is obtained from the feature map of each channel among the feature maps of the plurality of channels.

As shown in FIG. 4, in step S402, an activation-function operation is applied to the channel feature descriptor, to obtain a channel attention feature map.

As shown in FIG. 4, in step S403, the channel attention feature map is multiplied by the aggregated feature map, to obtain the channel-processed feature map.

In one embodiment of the present disclosure, step S303 may use the channel attention module to explore the relationships between the channels of the pedestrian CNN feature map and to capture and describe the discriminative regions of the input image.

FIG. 5 is a schematic diagram of channel attention in an embodiment of the present disclosure. As shown in FIG. 5, for the feature map extracted for each channel, A and B are the height and width of the feature map, n is the channel index, and N is the number of channels.

First, a GAP operation aggregates the spatial information of every channel of the feature map S ∈ R^(N×A×B), producing the channel feature descriptor s ∈ R^N. As can be seen, the feature descriptor consists of the statistics of the plurality of channels, the statistic of each channel being:

sn = (1/(A×B)) Σ_{a=1..A} Σ_{b=1..B} Sn(a, b)    Formula (1)

Next, s is passed through a thresholding-mechanism module to obtain the channel attention feature map e ∈ R^N:

e = σ(W2 δ(W1 s))    Formula (2)

where σ and δ denote the Sigmoid activation function and the ReLU activation function respectively, W1 ∈ R^((N/r)×N) is the weight of the first fully connected layer Fc1, W2 ∈ R^(N×(N/r)) is the weight of the second fully connected layer Fc2, and r is the reduction ratio.

Finally, the channel attention e is multiplied with the original input feature map S to obtain the corrected feature map S' ∈ R^(N×A×B):

S'n = en · Sn, n = 1, …, N    Formula (3)

Because the channel attention feature map e encodes the dependencies and relative importance among the channel feature maps, the neural network dynamically updates e so as to learn the important types of feature maps and ignore the less important ones.
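The channel attention module of Formulas (1)-(3) can be sketched as follows; this is a minimal illustration assuming the squeeze-and-excitation style structure described above, with the default reduction ratio r = 16 being an assumption:

```python
import torch

class ChannelAttention(torch.nn.Module):
    def __init__(self, channels, r=16):                      # r: reduction ratio
        super().__init__()
        self.fc1 = torch.nn.Linear(channels, channels // r)  # W1
        self.fc2 = torch.nn.Linear(channels // r, channels)  # W2

    def forward(self, S):                    # S: (batch, N, A, B)
        s = S.mean(dim=(2, 3))               # Formula (1): GAP over space
        e = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))  # Formula (2)
        return S * e[:, :, None, None]       # Formula (3): rescale channels
```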

As shown in FIG. 3, in step S304, the spatial attention module is used to process the spatial context information of the channel-processed feature map at different positions, to obtain the pedestrian CNN feature map.

In one embodiment of the present disclosure, step S304 may use the spatial attention module to integrate the spatial context information at different positions of the feature map into the local pedestrian features, enhancing the spatial correlation of local pedestrian regions. FIG. 6 is a schematic diagram of spatial attention in an embodiment of the present disclosure. As shown in FIG. 6, convolutions are applied to the channel-processed feature map to obtain the first spatial-information feature map T, the second spatial-information feature map U, and the third spatial-information feature map V; the transpose of T is multiplied with U to obtain D; D is multiplied with V to obtain X; and X, after being scaled by a certain factor, is added to the channel-processed feature map, realizing the spatial processing of the feature map and yielding the final pedestrian CNN feature map.

FIG. 7 is a flowchart of step S304 of FIG. 3 according to an embodiment of the present disclosure, which specifically includes the following steps.

As shown in FIG. 7, in step S701, a 1×1 convolution is performed on the channel-processed feature map to obtain the first spatial-information feature map T and the second spatial-information feature map U. The channel-attention-corrected feature map (i.e., the channel-processed feature map) S' ∈ R^(N×A×B) is fed into the 1×1 convolutions f_key and f_query, yielding the two feature maps T and U.

As shown in FIG. 7, in step S702, matrix multiplication is performed between the transpose of the first spatial-information feature map T and the second spatial-information feature map U, to obtain the spatial attention feature map. T and U are reshaped so that their spatial dimensions are flattened to Z = A×B columns, Z being the number of positions; T is then transposed and matrix-multiplied with U, and a Softmax function is applied along the row direction to obtain the spatial attention feature map D ∈ R^(Z×Z), each element of which is:

d(j,i) = exp(Ti · Uj) / Σ_{i=1..Z} exp(Ti · Uj)    Formula (4)

where d(j,i) represents the correlation of the i-th position to the feature at the j-th position; the more similar the feature expressions at the two positions, the higher the correlation between them.

As shown in FIG. 7, in step S703, a 1×1 convolution is performed on the channel-processed feature map, to obtain the third spatial-information feature map V.

The channel-processed feature map S' is fed into the 1×1 convolutional layer f_value, giving a new feature map V ∈ R^(N×A×B), which is then reshaped to R^(N×Z).

As shown in FIG. 7, in step S704, matrix multiplication is performed between the third spatial-information feature map V and the transpose of the spatial attention feature map, to obtain the spatially processed feature map.

In this step, V is first matrix-multiplied with the transpose of D, the result is reshaped back to R^(N×A×B), and it is passed through the 1×1 convolution f_up to obtain the feature map X ∈ R^(N×A×B).

As shown in FIG. 7, in step S705, the pedestrian CNN feature map is obtained from the channel processing and the spatial processing, where the channel processing refers to steps S401-S403 above and the spatial processing refers to steps S701-S704 above.

In this step, X is multiplied by the scaling parameter α and added element-wise to the channel-processed feature map S' to obtain the feature map S'' ∈ R^(N×A×B), that is:

S'' = α·X + S'    Formula (5)

On this basis, the element at each position of the feature map S'' can be expressed as:

S''j = α Σ_{i=1..Z} d(j,i)·Vi + S'j    Formula (6)

where α is a learnable parameter that is initialized to 0 and can gradually learn larger weights. As Formula (6) shows, the feature S''j at each position of S'' is the weighted sum of the features at all positions plus the channel-processed feature S'j, so it has a global receptive field and can, according to the magnitude of each element d(j,i) of the spatial attention feature map D, selectively aggregate the spatial context information of the related local regions Vi, thereby strengthening the connections between different local features of a pedestrian.
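Steps S701-S704 and Formulas (4)-(6) can be sketched as the following minimal PyTorch module; the reduced channel width of f_key and f_query is an assumption, since the patent does not state it:

```python
import torch

class SpatialAttention(torch.nn.Module):
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 8               # assumed reduction
        self.f_key = torch.nn.Conv2d(channels, reduced, 1)
        self.f_query = torch.nn.Conv2d(channels, reduced, 1)
        self.f_value = torch.nn.Conv2d(channels, channels, 1)
        self.f_up = torch.nn.Conv2d(channels, channels, 1)
        self.alpha = torch.nn.Parameter(torch.zeros(1))  # learnable, init 0

    def forward(self, Sp):                    # Sp = S': (batch, N, A, B)
        batch, N, A, B = Sp.shape
        Z = A * B                             # number of spatial positions
        T = self.f_key(Sp).reshape(batch, -1, Z)
        U = self.f_query(Sp).reshape(batch, -1, Z)
        V = self.f_value(Sp).reshape(batch, N, Z)
        # Formula (4): D[j, i] = softmax over i of (T_i . U_j)
        D = torch.softmax(U.transpose(1, 2) @ T, dim=-1)
        # X_j = sum_i d(j,i) * V_i, i.e. V times the transpose of D.
        X = (V @ D.transpose(1, 2)).reshape(batch, N, A, B)
        X = self.f_up(X)
        return self.alpha * X + Sp            # Formulas (5)-(6)
```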

Building on the foregoing steps, channel attention and spatial attention are applied in series to correct the CNN feature map, letting the neural network automatically learn which types of features and which positions are more effective. The channel attention module and the spatial attention module are therefore used jointly in the present disclosure so that both can take full effect. As shown in FIG. 5, the feature map S of the present disclosure first passes through the channel attention module and then through the spatial attention module, realizing the complementary attention correction:

S' = Mc(S)
S'' = Ms(S')    Formula (7)

In step S120, model training is performed by means of adversarial erasing learning, which simulates occlusion of discriminative regions of the pedestrian CNN feature map, to obtain a trained model.

FIG. 8 is a schematic diagram of adversarial erasing learning in an embodiment of the present disclosure. As shown in FIG. 8, the pedestrian CNN feature map is processed by the main classifier and the auxiliary classifier respectively through convolution, GAP, and the Softmax loss function.

FIG. 9 is a flowchart of step S120 of FIG. 1 according to an embodiment of the present disclosure, which specifically includes the following steps.

As shown in FIG. 9, in step S901, the pedestrian CNN feature map is input into the main classifier and the auxiliary classifier respectively for classification training, and pedestrian-class-specific feature maps are output from the main classifier and the auxiliary classifier.

The main classifier and the auxiliary classifier contain the same number of convolutional layers and global average pooling (GAP) layers; the number of channels of the convolutional layer equals the number of pedestrian classes in the training data set; and each channel of the pedestrian-class-specific feature map represents the body response heat map of a pedestrian image for a different class.

In this step, the fully connected layer of the classification model is replaced with a 1×1 convolutional layer, forming a classification model based on a fully convolutional network. Feeding the corrected feature map (i.e., the pedestrian CNN feature map) into the 1×1 convolutional layer directly yields the pedestrian-class-specific feature map: since the number of channels of the convolutional layer equals the number of pedestrian classes in the training set, each channel of this feature map represents the body response heat map of the pedestrian image for a different class. During training, the class label of a pedestrian image is available, and indexing the channel of the feature map corresponding to that label yields the class-specific feature map, i.e., the body response heat map of that pedestrian image.
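A minimal sketch of this fully convolutional classifier is given below, assuming a single 1×1 convolution head per classifier as described (layer sizes are illustrative):

```python
import torch

class FullyConvClassifier(torch.nn.Module):
    """1x1 convolution with one output channel per pedestrian class,
    followed by GAP; class heat maps fall out of the forward pass."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv = torch.nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feat, labels):
        maps = self.conv(feat)              # (batch, num_classes, A, B)
        logits = maps.mean(dim=(2, 3))      # GAP -> (batch, num_classes)
        # Index each sample's own class channel: its body response heat map.
        heat = maps[torch.arange(maps.size(0)), labels]  # (batch, A, B)
        return logits, heat
```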

As shown in FIG. 9, in step S902, partial erasing is performed in the auxiliary classifier, to obtain an erased feature map.

First, the region of the body response heat map whose values are higher than the set adversarial erasing threshold is determined as the discriminative region; then, the portion of the pedestrian-class-specific feature map output by the auxiliary classifier corresponding to the discriminative region is erased in an adversarial manner by replacing its response values with 0.

In this step, the input feature map of the auxiliary classifier is partially erased: the main classifier produces the pedestrian-class-specific feature map in step S901, the parts of the body response heat map whose values are higher than the adversarial erasing threshold are set as discriminative regions, and the corresponding regions in the input feature map of the auxiliary classifier are erased in an adversarial manner by replacing their response values with 0. The feature map input to the auxiliary classifier is thus partially erased, which increases the variants of the feature map while simulating occlusion of pedestrians.
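The erasing itself can be sketched as below, assuming the heat map from the main classifier guides which positions of the auxiliary classifier's input are zeroed; the per-sample normalization and the threshold value 0.7 are illustrative assumptions, as the patent does not fix them:

```python
import torch

def adversarial_erase(feat, heat, threshold=0.7):
    """feat: (batch, N, A, B), input of the auxiliary classifier;
    heat: (batch, A, B), body response heat map from the main classifier."""
    flat = heat.flatten(1)
    lo = flat.min(dim=1).values[:, None, None]
    hi = flat.max(dim=1).values[:, None, None]
    norm = (heat - lo) / (hi - lo + 1e-6)            # scale each sample to [0, 1]
    keep = (norm <= threshold).unsqueeze(1).float()  # 0 on discriminative regions
    return feat * keep                               # erased feature map
```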

As shown in FIG. 9, in step S903, loss values are calculated with a loss function for the pedestrian-class-specific feature map output by the main classifier and for the erased feature map output by the auxiliary classifier, respectively.

As shown in FIG. 9, in step S904, the parameters of the training model are updated according to the loss values.

In this step, both the main-classifier branch and the auxiliary-classifier branch update their parameters under the supervision of the Softmax loss function, whose expression is:

L = -(1/P) Σ_{p=1..P} Σ_{m=1..M} Σ_{k=1..K} λk · log( exp(x^(p,m,k)_lp) / Σ_{c=1..C} exp(x^(p,m,k)_c) )    Formula (8)

where P denotes the batch size, M the number of branches, K the number of classifiers in adversarial erasing learning (2 in this embodiment), and C the number of classes; x^(p,m,k)_lp denotes the lp-th Softmax input node value of the k-th classifier of the m-th branch for the p-th sample when the fully convolutional classification network is used, lp being the class of the p-th sample. The first classifier of each branch is the main classifier and the second is the auxiliary classifier; the parameter λk is the weight assigned to the loss of each classifier, with λ1 = 1 for the main classifier and λ2 = 0.5 for the auxiliary classifier.

In step S130, pedestrian re-identification is performed with the trained model in combination with a target pedestrian image and pedestrian images to be recognized, to obtain a pedestrian re-identification result.

FIG. 10 is a flowchart of step S130 of FIG. 1 according to an embodiment of the present disclosure, which specifically includes the following steps.

As shown in FIG. 10, in step S1001, the target pedestrian image and the pedestrian images to be recognized are input into the trained model, and the corresponding deep features are obtained respectively. In this step, the target pedestrian image and the pedestrian images to be recognized are fed into the CNN model trained in step S120 to extract image features; specifically, the features of the different semantic levels in FIG. 2 (res_conv5a, res_conv5b, res_conv5c) are concatenated as the final feature descriptor.

As shown in FIG. 10, in step S1002, the cosine distance is calculated from the deep feature of the target pedestrian image and the deep feature of each pedestrian image to be recognized, with the formula:

dist(feat1, feat2) = 1 - (feat1 · feat2) / (‖feat1‖ ‖feat2‖)

where feat1 is the deep feature of the target pedestrian image, and feat2 is the deep feature of the pedestrian image to be recognized.

As shown in FIG. 10, in step S1003, the similarity between the target pedestrian image and each pedestrian image to be recognized is determined according to the magnitude of the cosine distance, and the pedestrian image to be recognized with the highest similarity is the pedestrian re-identification result.

Since the similarity between the target pedestrian image and a pedestrian image to be recognized is negatively and linearly correlated with the feature cosine distance, the smaller the feature cosine distance, the higher the similarity of the image pair. On this basis, the cosine distances can be sorted in ascending order, i.e., the image pairs sorted in descending order of similarity, and the pedestrian image to be recognized with the highest similarity is taken as the pedestrian re-identification result.
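The ranking step can be sketched as follows; feature extraction is omitted, and the tensor interface is an assumption:

```python
import torch

def rank_gallery(query_feat, gallery_feats):
    """query_feat: (D,) deep feature of the target pedestrian;
    gallery_feats: (G, D) deep features of pedestrians to be recognized.
    Returns gallery indices sorted by ascending cosine distance,
    i.e. descending similarity; index 0 is the re-identification result."""
    q = query_feat / query_feat.norm()
    g = gallery_feats / gallery_feats.norm(dim=1, keepdim=True)
    dist = 1.0 - g @ q                 # cosine distance per gallery image
    return torch.argsort(dist)
```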

In summary, with the pedestrian re-identification method provided by the embodiments of the present disclosure, on the one hand, a feature-level data augmentation strategy is provided in which the input feature map of the auxiliary classifier is partially erased, increasing the variants of pedestrian features, counteracting occlusion of pedestrians, and improving the generalization ability of the deep pedestrian re-identification model. On the other hand, the spatial attention model of the present disclosure integrates spatial context information into local pedestrian features and enhances the spatial correlation between different positions of a pedestrian, while forming a complementary attention model together with the channel attention model; used jointly, the two correct the feature map along both the channel and spatial directions and can better capture discriminative regions. The present disclosure further proposes a classification model based on a fully convolutional network, which directly yields the body response heat map during forward propagation, guides the erasure of discriminative body regions, and realizes feature-level data augmentation.

It should be noted that although several modules or units of the device for action execution are mentioned in the detailed description above, this division is not mandatory. Indeed, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit, and conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.

Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solutions according to the embodiments of the present disclosure may therefore be embodied in the form of a software product, which may be stored in a non-volatile storage medium (a CD-ROM, USB flash drive, removable hard disk, etc.) or on a network, and which includes instructions that cause a computing device (a personal computer, server, touch terminal, network device, etc.) to execute the method according to the embodiments of the present disclosure.

Other embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed herein. The specification and examples are to be considered exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise constructions described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (6)

1.一种行人重识别方法,其特征在于,其包括:1. A pedestrian re-identification method, characterized in that it comprises: 从多个图片中提取得到行人CNN特征图,包括:The pedestrian CNN feature map is extracted from multiple pictures, including: 从训练数据集中随机选择所述多个图片;randomly selecting the plurality of pictures from the training data set; 将所述多个图片输入到ResNet50模型的多个不同语义层进行提取,得到多个通道的特征图;The plurality of pictures are input to a plurality of different semantic layers of the ResNet50 model to extract, and obtain feature maps of a plurality of channels; 利用通道注意力模块对所述多个通道的特征图进行处理,得到经通道处理的特征图;Using a channel attention module to process the feature maps of the plurality of channels to obtain channel-processed feature maps; 利用空间注意力模块对所述经通道处理的特征图在不同位置的空间上下文信息进行处理,得到所述行人CNN特征图;采用对抗擦除学习的方式模仿对所述行人CNN特征图的判别性区域被遮挡的情形进行模型训练,得到训练模型,包括:Use the spatial attention module to process the spatial context information of the channel-processed feature map at different positions to obtain the pedestrian CNN feature map; use the method of confrontational erasure learning to imitate the discriminativeness of the pedestrian CNN feature map The model training is carried out in the case where the area is occluded, and the training model is obtained, including: 将所述行人CNN特征图分别输入到主分类器和辅分类器进行分类训练,从所述主分类器和所述辅分类器输出行人类别专属的特征图;The pedestrian CNN feature map is input to the main classifier and the auxiliary classifier respectively for classification training, and the pedestrian category-specific feature map is output from the main classifier and the auxiliary classifier; 所述辅分类器为resnet50的基础上增加的辅助分类器;The auxiliary classifier is an auxiliary classifier added on the basis of resnet50; 所述主分类器与所述辅分类器包含相同数量的卷积层和全局平均池化层,且所述卷积层的通道的数目与所述训练数据集中行人类别的数目相同,所述行人类别专属的特征图的每一通道代表行人图像属于不同类别时的身体响应热度图;在所述辅分类器进行部分擦除,得到擦除后特征图;The main classifier and the auxiliary classifier include the same number of convolutional layers and global average pooling layers, and the number of channels of the convolutional layer is the same as the number of pedestrian categories in the training data set, and the pedestrian Each channel of the category-specific feature map represents the body response heat map when the pedestrian image belongs to different categories; partial erasing is performed in the auxiliary classifier to obtain the feature map after erasing; 所述在所述辅分类器进行部分擦除包括:The partial erasing performed in the auxiliary classifier includes: 将所述身体响应热度图中热度图数值高于设定的对抗擦除阈值的区域确定为判别性区域;Determining the area where the heat map value in the body response heat map is higher than the set resistance erasure threshold as a discriminative area; 对所述辅分类器输出的所述行人类别专属的特征图中对应所述判别性区域的部分通过响应值被取代为0的对抗方式被擦除掉;The portion corresponding to the discriminative region in the pedestrian category-specific feature map output by the auxiliary classifier is erased in an adversarial manner in which the response value is replaced with 0; 对所述主分类器输出的所述行人类别专属的特征图与所述辅分类器输出的所述擦除后特征图,分别通过损失函数进行计算,得到损失值;The pedestrian category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier are respectively calculated through a loss function to obtain a loss value; 根据所述损失值对所述训练模型进行参数更新;updating parameters of the training model according to the loss value; 利用所述训练模型结合目标行人图像和待识别行人图像进行行人重识别,得到行人重识别结果。The pedestrian re-identification is performed by using the training model combined with the target pedestrian image and the pedestrian image to be recognized, and the pedestrian re-identification result is obtained. 2.如权利要求1所述的行人重识别方法,其特征在于,所述利用通道注意力模块对所述多个通道的特征图进行处理,得到经通道处理的特征图包括:2. 
pedestrian re-identification method as claimed in claim 1, is characterized in that, described utilize channel attention module to process the feature map of described multiple channels, obtain the feature map through channel processing comprising: 根据所述多个通道的特征图中的每一通道的特征图,得到通道特征描述子;Obtaining a channel feature descriptor according to the feature map of each channel in the feature map of the plurality of channels; 对所述通道特征描述子通过激活函数运算,得到通道注意力特征图;Obtaining a channel attention feature map through an activation function operation on the channel feature descriptor; 将所述通道注意力特征图与所述特征图聚合的特征图相乘,得到所述经通道处理的特征图。The channel-attention feature map is multiplied by the feature map aggregated by the feature map to obtain the channel-processed feature map. 3.如权利要求2所述的行人重识别方法,其特征在于,所述特征描述子包括所述多个通道的统计值,所述特征描述子为:3. pedestrian re-identification method as claimed in claim 2, is characterized in that, described feature descriptor comprises the statistical value of described multiple channels, and described feature descriptor is:
Figure FDA0004176651120000021
Figure FDA0004176651120000021
每一通道的统计值为:The statistical value of each channel is:
Figure FDA0004176651120000022
Figure FDA0004176651120000022
其中N为通道的数量,n为通道的编号,A和B分别为所述特征图的长和宽;Where N is the number of channels, n is the number of channels, and A and B are the length and width of the feature map, respectively; 所述通道注意力特征图为:The channel attention feature map is: e=σ(W2δ(W1(s)))e=σ(W 2 δ(W 1 (s))) 其中σ,δ分别代表Sigmod激活函数和ReLU激活函数,
Figure FDA0004176651120000023
是第一全连接层Fc1的权重,/>
Figure FDA0004176651120000024
是第二全连接层Fc2的权重,r是衰减的倍数。
Where σ and δ represent the Sigmod activation function and the ReLU activation function respectively,
Figure FDA0004176651120000023
is the weight of the first fully connected layer Fc1, />
Figure FDA0004176651120000024
is the weight of the second fully connected layer Fc2, and r is the multiple of attenuation.
4.如权利要求1所述的行人重识别方法,其特征在于,所述利用空间注意力模块对所述经通道处理的特征图在不同位置的空间上下文信息进行处理,得到所述行人CNN特征图包括:4. The pedestrian re-identification method according to claim 1, wherein the spatial context information of the channel-processed feature map at different positions is processed by using the spatial attention module to obtain the pedestrian CNN feature Figures include: 对所述经通道处理的特征图进行1×1的卷积运算,得到第一空间信息特征图T和第二空间信息特征图U;performing a 1×1 convolution operation on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U; 将所述第一空间信息特征图T的转置与所述第二空间信息特征图U进行矩阵乘法运算,得到空间注意力特征图;Perform matrix multiplication with the transposition of the first spatial information feature map T and the second spatial information feature map U to obtain a spatial attention feature map; 对所述经通道处理的特征图进行1×1的卷积运算,得到第三空间信息特征图V;performing a 1×1 convolution operation on the channel-processed feature map to obtain a third spatial information feature map V; 将所述第三空间信息特征图V与所述空间注意力特征图的转置进行矩阵乘法运算,得到经空间处理的特征图;Perform matrix multiplication operation with the transposition of the third spatial information feature map V and the spatial attention feature map to obtain a spatially processed feature map; 根据所述通道处理和所述空间处理得到所述行人CNN特征图。Obtaining the pedestrian CNN feature map according to the channel processing and the spatial processing. 5.如权利要求2所述的行人重识别方法,其特征在于,所述利用所述训练模型结合目标行人图像和待识别行人图像进行行人重识别,得到行人重识别结果包括:5. pedestrian re-identification method as claimed in claim 2, is characterized in that, described using described training model to carry out pedestrian re-identification in combination with target pedestrian image and pedestrian image to be identified, obtaining pedestrian re-identification result comprises: 根据所述目标行人图像和所述待识别行人图像输入到所述训练模型中进行训练,分别得到对应的深度特征;Inputting the image of the target pedestrian and the image of the pedestrian to be identified into the training model for training to obtain corresponding depth features; 根据所述目标行人图像的深度特征与所述待识别行人图像的深度特征计算余弦距离;calculating a cosine distance according to the depth feature of the target pedestrian image and the depth feature of the pedestrian image to be identified; 根据余弦距离的大小确定所述目标行人图像和所述待识别行人图像之间的相似度,其中相似度最大的待识别行人图像为所述行人重识别结果。Determine the similarity between the target pedestrian image and the pedestrian image to be recognized according to the size of the cosine distance, wherein the pedestrian image to be recognized with the largest similarity is the pedestrian re-identification result. 6.如权利要求5所述的行人重识别方法,其特征在于,根据所述目标行人图像的深度特征与所述待识别行人图像的深度特征计算余弦距离的计算公式为:6. The pedestrian re-identification method according to claim 5, wherein the calculation formula for calculating the cosine distance according to the depth feature of the target pedestrian image and the depth feature of the image of the pedestrian to be identified is:
cosdist = (feat1 · feat2) / (‖feat1‖ × ‖feat2‖)
where feat1 is the deep feature of the target pedestrian image, and feat2 is the deep feature of the pedestrian image to be identified.
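A sketch of the spatial attention of claim 4 together with the cosine-distance matching of claims 5 and 6, in the same assumed PyTorch setting; the softmax normalization of the attention map and the reduced channel width of T and U are assumptions (the claim specifies only the 1×1 convolutions and the two matrix products):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Spatial attention per claim 4: 1x1 convolutions produce T, U, V;
    T^T x U forms the spatial attention map, and V x (attention map)^T
    gives the spatially processed feature map."""
    def __init__(self, channels: int, reduced: int = 32):
        super().__init__()
        self.conv_t = nn.Conv2d(channels, reduced, kernel_size=1)
        self.conv_u = nn.Conv2d(channels, reduced, kernel_size=1)
        self.conv_v = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        t = self.conv_t(x).flatten(2)              # (b, reduced, h*w)
        u = self.conv_u(x).flatten(2)              # (b, reduced, h*w)
        v = self.conv_v(x).flatten(2)              # (b, c, h*w)
        attn = torch.bmm(t.transpose(1, 2), u)     # T^T x U: (b, h*w, h*w)
        attn = F.softmax(attn, dim=-1)             # assumed normalization
        out = torch.bmm(v, attn.transpose(1, 2))   # V x attn^T: (b, c, h*w)
        return out.view(b, c, h, w)

# Claims 5-6: rank candidates by cosine similarity of their deep features
feat1 = torch.randn(2048)        # deep feature of the target pedestrian
feat2 = torch.randn(5, 2048)     # deep features of pedestrians to be identified
sims = F.cosine_similarity(feat1.unsqueeze(0), feat2, dim=1)
best = int(sims.argmax())        # the most similar image is the re-id result
```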
CN201910403777.5A 2019-05-15 2019-05-15 A Pedestrian Re-identification Method Active CN110110689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910403777.5A CN110110689B (en) 2019-05-15 2019-05-15 A Pedestrian Re-identification Method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910403777.5A CN110110689B (en) 2019-05-15 2019-05-15 A Pedestrian Re-identification Method

Publications (2)

Publication Number Publication Date
CN110110689A (en) 2019-08-09
CN110110689B (en) 2023-05-26

Family

ID=67490255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910403777.5A Active CN110110689B (en) 2019-05-15 2019-05-15 A Pedestrian Re-identification Method

Country Status (1)

Country Link
CN (1) CN110110689B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516603B (en) * 2019-08-28 2022-03-18 北京百度网讯科技有限公司 Information processing method and device
CN112633459B (en) * 2019-09-24 2024-09-20 华为技术有限公司 Method for training neural network, data processing method and related device
CN112784648B (en) * 2019-11-07 2022-09-06 中国科学技术大学 Method and device for optimizing feature extraction of pedestrian re-identification system of video
CN111160096A (en) * 2019-11-26 2020-05-15 北京海益同展信息科技有限公司 Method, device and system for identifying poultry egg abnormality, storage medium and electronic device
CN111198964B (en) * 2020-01-10 2023-04-25 中国科学院自动化研究所 Image retrieval method and system
CN111461038B (en) * 2020-04-07 2022-08-05 中北大学 Pedestrian re-identification method based on layered multi-mode attention mechanism
CN111582587B (en) * 2020-05-11 2021-06-04 深圳赋乐科技有限公司 Prediction method and prediction system for video public sentiment
CN111814618B (en) * 2020-06-28 2023-09-01 浙江大华技术股份有限公司 Pedestrian re-recognition method, gait recognition network training method and related devices
CN112131943B (en) * 2020-08-20 2023-07-11 深圳大学 A video behavior recognition method and system based on a dual attention model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359559B (en) * 2018-09-27 2021-11-12 天津师范大学 Pedestrian re-identification method based on dynamic occlusion samples
CN109583502B (en) * 2018-11-30 2022-11-18 天津师范大学 Pedestrian re-identification method based on anti-erasure attention mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT201600068348A1 (en) * 2016-07-01 2018-01-01 Octo Telematics Spa Procedure for determining the status of a vehicle by detecting the vehicle's battery voltage.
WO2018153322A1 (en) * 2017-02-23 2018-08-30 北京市商汤科技开发有限公司 Key point detection method, neural network training method, apparatus and electronic device
CN107563313A (en) * 2017-08-18 2018-01-09 北京航空航天大学 Multi-target pedestrian detection and tracking based on deep learning
CN107679483A (en) * 2017-09-27 2018-02-09 北京小米移动软件有限公司 License plate recognition method and device
CN107992882A (en) * 2017-11-20 2018-05-04 电子科技大学 An occupancy statistics method based on WiFi channel state information and support vector machines
CN109583379A (en) * 2018-11-30 2019-04-05 常州大学 A pedestrian re-identification method based on a selective-erasing pedestrian alignment network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pedestrian re-identification algorithm based on sparse learning; Zhang Wenwen et al.; Journal of Data Acquisition and Processing; Vol. 33, No. 5; pp. 855-864 *

Also Published As

Publication number Publication date
CN110110689A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN110110689B (en) A Pedestrian Re-identification Method
Cong et al. Global-and-local collaborative learning for co-salient object detection
CN109740419B (en) A Video Action Recognition Method Based on Attention-LSTM Network
CN109389055B (en) Video Classification Method Based on Hybrid Convolution and Attention Mechanism
Ge et al. An attention mechanism based convolutional LSTM network for video action recognition
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN108229338B (en) Video behavior identification method based on deep convolution characteristics
CN107506712B (en) A method for human action recognition based on 3D deep convolutional network
CN103226708B (en) A multi-model fusion video hand segmentation method based on Kinect
CN106951867A (en) Face recognition method, device, system and equipment based on convolutional neural network
CN114639042A (en) Video target detection algorithm based on improved CenterNet backbone network
CN106778796A (en) Human motion recognition method and system based on hybrid cooperative model training
Jiang et al. An efficient attention module for 3d convolutional neural networks in action recognition
CN110889375A (en) Hidden Two-Stream Collaborative Learning Network and Method for Behavior Recognition
CN111814705B (en) Pedestrian re-identification method based on batch blocking shielding network
CN112801182B (en) A RGBT Target Tracking Method Based on Difficult Sample Perception
CN112597324A (en) Image hash index construction method, system and equipment based on correlation filtering
CN108830254A (en) A fine-grained vehicle detection and recognition method based on a data balancing strategy and a dense attention network
CN110956158A (en) Pedestrian shielding re-identification method based on teacher and student learning frame
CN116798070A (en) A cross-modal person re-identification method based on spectral perception and attention mechanism
CN114882351A (en) Multi-target detection and tracking method based on improved YOLO-V5s
CN112115780A (en) Semi-supervised pedestrian re-identification method based on deep multi-model cooperation
CN117252904A (en) Target tracking method and system based on long-range space perception and channel enhancement
CN118279934A (en) A pedestrian recognition method robust to appearance changes with semantic aggregation and fine-grained enhancement
CN114821258B (en) Class activation mapping method and device based on feature map fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant