CN113761995A - A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking - Google Patents
A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking
- Publication number
- CN113761995A (application CN202010814790.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- visible light
- infrared
- pedestrian
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 20
- 230000000903 blocking effect Effects 0.000 title claims abstract description 4
- 230000009466 transformation Effects 0.000 claims abstract description 29
- 238000005070 sampling Methods 0.000 claims abstract description 28
- 238000012549 training Methods 0.000 claims abstract description 11
- 230000011218 segmentation Effects 0.000 claims description 12
- 238000000605 extraction Methods 0.000 claims description 7
- 230000004927 fusion Effects 0.000 claims description 6
- 230000003044 adaptive effect Effects 0.000 claims description 5
- 238000004364 calculation method Methods 0.000 claims description 4
- 239000013598 vector Substances 0.000 claims description 4
- 239000011159 matrix material Substances 0.000 claims description 2
- 238000006243 chemical reaction Methods 0.000 claims 2
- 238000013507 mapping Methods 0.000 claims 2
- 238000012544 monitoring process Methods 0.000 abstract description 2
- 239000000523 sample Substances 0.000 description 13
- 238000002679 ablation Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 230000009977 dual effect Effects 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000001228 spectrum Methods 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Traffic Control Systems (AREA)
- Image Processing (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
Description
Technical Field
The invention relates to a cross-modal pedestrian re-identification method based on double transform alignment and segmentation, and to a new network model, DTASN (Dual Transform Alignment and Segmentation Network). It addresses the cross-modal pedestrian re-identification problem in intelligent video surveillance and belongs to the field of computer vision and intelligent information processing.
Background
Person re-identification (ReID) is a computer vision technique whose goal is to retrieve persons of interest across multiple non-overlapping cameras; it is usually regarded as a sub-problem of image retrieval. An efficient ReID algorithm can relieve the tedium of manually reviewing video and speed up investigations. The broad application prospects of pedestrian re-identification in video surveillance, intelligent security, and related fields have attracted extensive attention from both academia and industry, making it a valuable and challenging research hotspot in computer vision.
Currently, most research focuses on the RGB-RGB (single-modality) re-identification problem, in which both probe and gallery pedestrians are captured by visible-light cameras. However, visible-light cameras may fail to capture appearance information under changing illumination, especially when lighting is insufficient (e.g., at night or in dark environments). Thanks to advances in technology, most current-generation cameras can automatically switch between visible and infrared modes depending on lighting conditions. It is therefore necessary to develop methods for the visible-infrared cross-modal ReID problem. Unlike traditional pedestrian re-identification, visible-infrared cross-modal re-identification matches pedestrian images of different spectra captured by visible-light and infrared cameras; this task, VI-ReID (visible and infrared person re-identification), mainly addresses cross-modal image matching. VI-ReID typically uses a visible-light (or infrared) pedestrian image to search for infrared (or visible-light) pedestrian images across the whole camera network.
Pedestrian images (cropped pedestrians) are usually obtained by automatic detectors or trackers. Because detection and tracking results are imperfect, image misalignment is usually unavoidable: partial occlusion, missing parts (only part of the body visible), excessive background, and similar semantic misalignment errors occur. To address semantic misalignment in ReID, some works attempt to improve pedestrian matching accuracy by reducing the cross-modal discrepancy of heterogeneous data; other methods focus on correcting pedestrian misalignment, which also reduces the modality gap to some extent. Beyond these difficulties, pedestrian appearance varies greatly with pose and viewpoint. Many practical factors cause spatial semantic misalignment between images, i.e., the content at the same spatial location in two matched images has different semantics, which limits the robustness and effectiveness of person re-identification. It is therefore important to develop a strongly discriminative model that handles cross-modal variation: one that reduces the cross-modal discrepancy of heterogeneous data while also alleviating the image differences caused by intra-modality misalignment, thereby improving the accuracy of cross-modal pedestrian re-identification.
Summary of the Invention
The invention proposes a cross-modal pedestrian re-identification method based on double transform alignment and segmentation, and designs a multi-path double transform alignment and segmentation network structure, DTASN. The sampling strategy for each training batch is: randomly select P pedestrians from the training set, then for each pedestrian randomly select K visible-light pedestrian images and K infrared pedestrian images, forming a training batch of 2PK pedestrian images, which is fed into the network for training (a minimal sampler sketch follows below). Under the supervision of label information, the self-learning ability of the convolutional neural network is used to adaptively align and correct the severely misaligned visible-light and infrared images, and the aligned, corrected images are split horizontally into local block images, thereby improving cross-modal pedestrian re-identification accuracy.
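A minimal sketch of this per-batch sampling strategy, assuming the dataset has been pre-grouped into dictionaries mapping identity to image paths; the dictionary names and the defaults P=8, K=4 are illustrative, not taken from the patent:

```python
import random

def sample_batch(visible_by_id, infrared_by_id, P=8, K=4):
    """Draw one training batch: P random identities, K visible-light and
    K infrared images per identity, i.e. 2*P*K images in total.
    Assumes every identity has at least K images in each modality."""
    ids = random.sample(sorted(visible_by_id), P)
    batch_visible, batch_infrared, labels = [], [], []
    for pid in ids:
        batch_visible += random.sample(visible_by_id[pid], K)
        batch_infrared += random.sample(infrared_by_id[pid], K)
        labels += [pid] * K
    return batch_visible, batch_infrared, labels
```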
A cross-modal pedestrian re-identification method based on double transform alignment and blocking comprises the following steps:
(1) Extract the features of the visible-light pedestrian image $x_v$ with the visible-light base branch network, obtaining $\phi(x_v)$; extract the features of the infrared pedestrian image $x_t$ with the infrared base branch network, obtaining $\phi(x_t)$.
(2) Take the fifth-residual-block (conv_5x) feature from the visible-light base branch and feed it into the grid network of the visible-light image spatial transformation module, which linearly regresses a set of affine transformation parameters and generates the visible-light image transformation grid; a bilinear sampler then produces a new visible-light pedestrian image $\tilde{x}_v$. Features are extracted from $\tilde{x}_v$ to obtain the transformed visible-light global pedestrian feature $g_v$.
(3) Take the fifth-residual-block (conv_5x) feature from the infrared base branch and feed it into the grid network of the infrared image spatial transformation module, which linearly regresses a set of affine transformation parameters and generates the infrared image transformation grid; a bilinear sampler then produces a new aligned infrared pedestrian image $\tilde{x}_t$. Features are extracted from $\tilde{x}_t$ to obtain the global feature $g_t$.
(4) Split the new visible-light pedestrian image $\tilde{x}_v$ horizontally into three non-overlapping blocks (top, middle, bottom); extract the features of the three blocks, obtaining $p_v^1$, $p_v^2$ and $p_v^3$; finally sum the aligned global feature $g_v$ and the three block features to obtain the total feature of the visible-light transform alignment and segmentation branch, $T_v = g_v + p_v^1 + p_v^2 + p_v^3$.
(5) Split the new infrared pedestrian image $\tilde{x}_t$ horizontally into three non-overlapping blocks (top, middle, bottom); extract the features of the three blocks, obtaining $p_t^1$, $p_t^2$ and $p_t^3$; finally sum the aligned global feature $g_t$ and the three block features to obtain the total feature of the infrared transform alignment and segmentation branch, $T_t = g_t + p_t^1 + p_t^2 + p_t^3$.
(6) Fuse $T_v$ with the feature $\phi(x_v)$ extracted by the visible-light base branch by weighted addition to obtain the total visible-light feature $F_v$; fuse $T_t$ with the feature $\phi(x_t)$ extracted by the infrared base branch by weighted addition to obtain the total infrared feature $F_t$. Then map the visible-light features $F_v$ and the infrared features $F_t$ into the same feature embedding space, and train with the identity loss function combined with the weighted hardest-batch sampling loss function, ultimately improving cross-modal pedestrian re-identification accuracy.
Description of the Drawings
Figure 1 is a block diagram of the cross-modal pedestrian re-identification method based on double transform alignment and blocking of the present invention;
Figure 2 is a structural diagram of the visible-light transform alignment and blocking branch of the present invention;
Figure 3 is a structural diagram of the infrared transform alignment and blocking branch of the present invention.
Detailed Description of the Embodiments
The present invention is further described below with reference to Figure 1, Figure 2 and Figure 3:
The network structure and principle of the DTASN model are as follows:
The framework learns feature representations and distance metrics end to end through a multi-path double alignment and blocking network while maintaining high discriminability. It comprises three components: (1) a feature extraction module, (2) a feature embedding module, and (3) a loss computation module. The backbone of every path is the deep residual network ResNet50. Because available data are scarce, the invention initializes the network with a pretrained ResNet50 model to speed up the convergence of training. To strengthen attention to local features, a position attention module is applied on each path (a common formulation is sketched below).
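The patent does not spell out the position attention module internally; the sketch below follows the common DANet-style formulation (Fu et al.), with the channel-reduction factor of 8 as an assumption:

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Position attention module (PAM): every spatial position attends
    to every other position, and the result is added back residually."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)      # B x HW x C/8
        k = self.key(x).flatten(2)                        # B x C/8 x HW
        attn = torch.softmax(q @ k, dim=-1)               # B x HW x HW
        v = self.value(x).flatten(2)                      # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w) # attention-weighted sum
        return self.gamma * out + x
```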
For visible-infrared cross-modal pedestrian re-identification, the similarity between modalities lies in the achromatic information of pedestrian contours and textures, and the significant difference lies in the imaging spectrum. The invention therefore designs a Siamese-style network model to extract the visual features of infrared and visible pedestrian images. As shown in Figure 1, two networks with identical structure extract the feature representations of visible-light and infrared images; note that weights are not shared between them. The feature extraction module contains two main networks for processing visible-light and infrared data: the base branch network and the alignment and segmentation network.
(1) Base branch network:
It consists of two identical sub-networks whose weights are not shared; the backbone of each is ResNet50. The input images are three-channel images of height 288 and width 144. Let the input images of the visible-light and infrared base branches be $x_v$ and $x_t$ respectively, and let the base branch feature extractor be $\phi(\cdot)$. Then $\phi(x_v)$ denotes the deep feature of the visible-light image extracted by the visible-light base branch, and $\phi(x_t)$ the deep feature of the infrared image extracted by the infrared base branch; all output feature vectors have length 2048. (A sketch of such a branch follows.)
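A minimal sketch of one base branch, assuming the standard torchvision ResNet50 with its classifier head removed so that global average pooling yields the 2048-dimensional feature:

```python
import torch.nn as nn
from torchvision.models import resnet50

def make_base_branch():
    """One base branch: ImageNet-pretrained ResNet50 whose final
    classifier is dropped; a 3x288x144 input yields a 2048-d vector."""
    net = resnet50(pretrained=True)  # newer torchvision: resnet50(weights="IMAGENET1K_V1")
    net.fc = nn.Identity()           # keep everything up to global average pooling
    return net

# Identical structure, unshared weights:
phi_v = make_base_branch()  # visible-light branch, f_v = phi_v(x_v)
phi_t = make_base_branch()  # infrared branch,      f_t = phi_t(x_t)
```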
(2) Spatial transformation module
Visible-light and infrared transform alignment principle: the fifth-residual-block feature conv_5x of the visible-light and infrared base branches is used to linearly regress a set of affine transformation parameters $\theta^v$ and $\theta^t$. The coordinate correspondence between the images before and after the affine transformation is then established by Eq. (1):

$$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \qquad (1)$$

where $(x_i^t, y_i^t)$ is the $i$-th target coordinate on the regular grid of the target image, $(x_i^s, y_i^s)$ is the source coordinate of the sampling point in the input image, and the $2 \times 3$ parameter matrix is the affine transformation matrix, in which $\theta_{13}$ and $\theta_{23}$ control the translation of the transformed image while $\theta_{11}$, $\theta_{12}$, $\theta_{21}$ and $\theta_{22}$ control its scale and rotation. Bilinear sampling is used to sample the image grid during the affine transformation. Let $x_v$ and $x_t$ be the input images of the bilinear sampler, and let the new visible-light and infrared images output by the spatial transformation be $\tilde{x}_v$ and $\tilde{x}_t$; writing $U$ for the source image and $V$ for the target image of either modality, the correspondence between them is

$$V^{c}_{(m,n)} = \sum_{h=1}^{H} \sum_{w=1}^{W} U^{c}_{(h,w)} \, \max\!\big(0,\, 1 - |x^{s}_{(m,n)} - w|\big) \, \max\!\big(0,\, 1 - |y^{s}_{(m,n)} - h|\big)$$

where $V^{c}_{(m,n)}$ denotes the pixel value at coordinate $(m,n)$ of channel $c$ in the target image, $U^{c}_{(h,w)}$ the pixel value at coordinate $(h,w)$ of channel $c$ in the source image, and $H$ and $W$ the height and width of the target (or source) image. Bilinear sampling is continuously differentiable, so the equations above are differentiable and allow gradient back-propagation, enabling adaptive pedestrian alignment. The global features of the aligned images are denoted $g_v$ and $g_t$. In addition, to learn more discriminative features, the invention splits each transformed image horizontally into three non-overlapping fixed blocks.
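A sketch of one spatial transformation module in PyTorch; the localisation head (global pooling plus a single linear layer) is an assumption, while the grid generation and bilinear sampling of Eq. (1) and the sampling equation above map directly onto F.affine_grid and F.grid_sample:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransform(nn.Module):
    """Regress the six affine parameters theta from the conv_5x feature
    map, then resample the input image with a bilinear sampler."""
    def __init__(self, in_channels=2048):
        super().__init__()
        self.localization = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_channels, 6))
        # start from the identity transform so early training is stable
        nn.init.zeros_(self.localization[-1].weight)
        self.localization[-1].bias.data.copy_(
            torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, image, conv5_feature):
        theta = self.localization(conv5_feature).view(-1, 2, 3)   # affine matrix
        grid = F.affine_grid(theta, image.size(), align_corners=False)
        return F.grid_sample(image, grid, align_corners=False)    # bilinear sampler
```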
(3) Visible-light transform alignment and blocking branch
As shown in Figure 2, the transform-aligned visible-light image $\tilde{x}_v$ is first split horizontally into three non-overlapping blocks (top, middle, bottom): the first block covers rows 1-96, the second rows 97-192, and the third rows 193-288, each block being 144 pixels wide. Each of the three block images is then copied into the corresponding position of a newly defined 288×144 sub-image whose pixel values are all zero. Next, four residual networks separately extract the transformed global feature and the three block sub-image features, yielding $g_v$, $p_v^1$, $p_v^2$ and $p_v^3$. The invention sums the global feature and the three block features directly, obtaining the total feature of the transformed image, $T_v = g_v + p_v^1 + p_v^2 + p_v^3$.
Finally, $T_v$ is fused with the visible-light base branch feature $\phi(x_v)$ by weighted addition to obtain the final feature of the visible-light image, i.e. $F_v = \lambda T_v + (1-\lambda)\,\phi(x_v)$, where $\lambda$ is a predefined trade-off parameter in the interval 0 to 1. (A sketch of the split-and-repaste step follows.)
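A sketch of the split-and-repaste step for one aligned image tensor of shape (C, 288, 144); the all-zero canvases preserve each block's original vertical position, as described above:

```python
import torch

def split_into_padded_blocks(img):
    """Cut an aligned 288x144 image into top/middle/bottom thirds
    (rows 1-96, 97-192, 193-288) and paste each third into a fresh
    all-zero 288x144 canvas at its original position."""
    blocks = []
    for row0 in (0, 96, 192):
        canvas = torch.zeros_like(img)                       # (C, 288, 144)
        canvas[:, row0:row0 + 96, :] = img[:, row0:row0 + 96, :]
        blocks.append(canvas)
    return blocks  # each block is fed to its own residual network

# Branch total and fusion then reduce to elementwise sums:
#   T_v = g_v + p_v1 + p_v2 + p_v3
#   F_v = lam * T_v + (1 - lam) * f_v
```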
(4) Infrared transform alignment and blocking branch
As shown in Figure 3, the transform-aligned infrared image $\tilde{x}_t$ is first split horizontally into three non-overlapping blocks (top, middle, bottom): the first block covers rows 1-96, the second rows 97-192, and the third rows 193-288, each block being 144 pixels wide. Each of the three block images is then copied into the corresponding position of a newly defined 288×144 sub-image whose pixel values are all zero. Next, four residual networks separately extract the transformed global feature and the three block sub-image features, yielding $g_t$, $p_t^1$, $p_t^2$ and $p_t^3$. The invention sums the global feature and the three block sub-image features directly, obtaining the total feature of the transformed image, $T_t = g_t + p_t^1 + p_t^2 + p_t^3$.
Finally, $T_t$ is fused with the infrared base branch feature $\phi(x_t)$ by weighted addition to obtain the final feature of the infrared image, i.e. $F_t = \lambda T_t + (1-\lambda)\,\phi(x_t)$, where $\lambda$ is a predefined trade-off parameter in the interval 0 to 1 that balances the contributions of the two features.
(5) Feature embedding and loss computation
To reduce the cross-modal discrepancy between infrared and visible-light images, the visible-light features $F_v$ and the infrared features $F_t$ are mapped into the same feature space through a shared nested function $f_\theta$, which is essentially a fully connected layer (with parameters $\theta$), giving the embedded features $f_\theta(F_v)$ and $f_\theta(F_t)$, abbreviated $z_v$ and $z_t$; each is a one-dimensional feature vector of length 512. To simplify notation, $x_v^{i,j}$ denotes the $j$-th image of the $i$-th person in a batch of visible-light images, and likewise $x_t^{i,j}$ for a batch of infrared images.
Identity loss function:
Let $p(k \mid x_v^{i,j})$ and $p(k \mid x_t^{i,j})$ denote the predicted probabilities that the input pedestrians $x_v^{i,j}$ and $x_t^{i,j}$ have identity $k$; for example, $p(k \mid x_v^{i,j})$ is the predicted probability that the input visible-light image $x_v^{i,j}$ has identity $k$. Let $q_v^{i,j}$ and $q_t^{i,j}$ denote the one-hot label vectors of input images whose true identity is $i$. The identity loss that predicts identities within one batch using cross-entropy is then defined as

$$L_{id} = -\frac{1}{2PK} \sum_{i=1}^{P} \sum_{j=1}^{K} \Big[ \log p\big(i \mid x_v^{i,j}\big) + \log p\big(i \mid x_t^{i,j}\big) \Big]$$
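A sketch of the shared embedding and identity loss over the fused 2048-d features of one batch; `num_ids` and all module names are illustrative assumptions, not names from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_ids = 395                          # e.g. training identities in SYSU-MM01
f_theta = nn.Linear(2048, 512)         # shared nested function f_theta
classifier = nn.Linear(512, num_ids)   # identity prediction head

def identity_loss(F_v, F_t, labels):
    """Cross-entropy identity loss over the 2PK images of one batch.
    F_v, F_t: (PK, 2048) fused features; labels: (PK,) identity indices."""
    z = f_theta(torch.cat([F_v, F_t], dim=0))  # both modalities, one 512-d space
    logits = classifier(z)
    y = torch.cat([labels, labels], dim=0)     # same identities, both modalities
    return F.cross_entropy(logits, y)
```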
Weighted hardest-batch sampling loss function:
Since $L_{id}$ only considers the identity of each input sample, it does not enforce that visible-light and infrared inputs of the same identity be matched. To further alleviate the cross-modal discrepancy between infrared and visible-light images, and because TriHard loss (hardest triplet sampling loss) considers only the most extreme samples, producing very large local gradients that can make training collapse, the invention instead uses a single-batch adaptively weighted hardest-triplet sampling loss. Its core idea is: for each infrared image sample $x_t^{i,j}$ in a batch, compute its distances to the positive visible-light samples of the same identity in that batch; for positive pairs, the larger the Euclidean distance in the embedding space, the larger the assigned weight. Likewise, compute its distances to the negative visible-light samples of different identities; for negative pairs, the larger the Euclidean distance in the embedding space, the smaller the assigned weight. Different distances (i.e., different degrees of difficulty) therefore receive different weights. The weighted hardest-triplet sampling loss inherits the advantage of optimizing the relative distance between positive and negative pairs while avoiding any extra parameters, making it flexible and adaptable. For each visible-light anchor sample $x_v^{i,j}$ in a batch, the weighted hardest-triplet sampling loss is computed as

$$\ell\big(x_v^{i,j}\big) = \log\Big(1 + \exp\Big(\sum_{p \in \mathcal{P}} W^p d^p - \sum_{n \in \mathcal{N}} W^n d^n\Big)\Big), \qquad W^p = \frac{\exp(d^p)}{\sum_{p' \in \mathcal{P}} \exp(d^{p'})}, \quad W^n = \frac{\exp(-d^n)}{\sum_{n' \in \mathcal{N}} \exp(-d^{n'})}$$

where $\mathcal{P}$ is the corresponding positive sample set, $\mathcal{N}$ the negative set, $d^p$ and $d^n$ the Euclidean distances to a positive and a negative sample, $W^p$ the positive-sample distance weights, and $W^n$ the negative-sample distance weights. Likewise, for each infrared anchor sample $x_t^{i,j}$ in a batch, the weighted hardest-triplet sampling loss $\ell(x_t^{i,j})$ is computed in the same way. The overall weighted hardest-triplet sampling loss is therefore

$$L_{c\_wrt} = \frac{1}{2PK} \sum_{i=1}^{P} \sum_{j=1}^{K} \Big[ \ell\big(x_v^{i,j}\big) + \ell\big(x_t^{i,j}\big) \Big]$$
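A sketch of the adaptively weighted triplet loss over the embedded batch. For simplicity it weighs positives and negatives over the whole batch, whereas the patent forms cross-modal pairs (anchors in one modality, positives and negatives in the other); restricting the masks by modality would recover that behaviour:

```python
import torch
import torch.nn.functional as F

def weighted_triplet_loss(emb, labels):
    """Per anchor row: positive distances get softmax weights (far
    positives count more), negative distances get softmax weights over
    their negation (near negatives count more), then
    loss = log(1 + exp(weighted_pos - weighted_neg)).
    Assumes every identity appears at least twice in the batch,
    which the 2PK sampling strategy guarantees."""
    dist = torch.cdist(emb, emb)                      # (B, B) Euclidean distances
    same = labels[:, None] == labels[None, :]
    eye = torch.eye(len(emb), dtype=torch.bool, device=emb.device)
    pos_mask, neg_mask = same & ~eye, ~same
    w_p = F.softmax(dist.masked_fill(~pos_mask, float('-inf')), dim=1)
    w_n = F.softmax((-dist).masked_fill(~neg_mask, float('-inf')), dim=1)
    d_p = (w_p * dist).sum(dim=1)                     # weighted positive distance
    d_n = (w_n * dist).sum(dim=1)                     # weighted negative distance
    return F.softplus(d_p - d_n).mean()               # log(1 + exp(.))

# total objective, Eq. (14): loss = identity_loss(...) + lam * weighted_triplet_loss(...)
```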
Finally, the total loss function is defined as:
$$L_{wrt} = L_{id} + \lambda\, L_{c\_wrt} \qquad (14)$$
where $\lambda$ is a predefined parameter that balances the contributions of the identity loss $L_{id}$ and the weighted hardest-triplet sampling loss $L_{c\_wrt}$.
Network-structure ablation studies were conducted on the RegDB and SYSU-MM01 datasets, where Baseline denotes the benchmark network, $L_{id}$ the identification loss, $L_{c\_wrt}$ the weighted hardest-triplet sampling loss, RE random erasing, PA the position attention module (PAM), ST the STN spatial transformation network, and HDB horizontal blocking. The method was also compared with several mainstream algorithms under the single-query evaluation setting, using Rank-1, Rank-5, Rank-10 and mAP (mean average precision) as evaluation metrics. As shown in Tables 1, 2, 3 and 4, accuracy improves substantially over both the benchmark network and the compared algorithms.
Table 1. Ablation study of the network structure on the RegDB dataset
Table 2. Ablation study of the network structure on the SYSU-MM01 dataset
Table 3. Comparison with mainstream algorithms on the RegDB dataset
Table 4. Comparison with mainstream algorithms on the SYSU-MM01 dataset
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010814790.2A CN113761995A (en) | 2020-08-13 | 2020-08-13 | A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010814790.2A CN113761995A (en) | 2020-08-13 | 2020-08-13 | A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113761995A true CN113761995A (en) | 2021-12-07 |
Family
ID=78785620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010814790.2A Pending CN113761995A (en) | 2020-08-13 | 2020-08-13 | A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113761995A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114612937A (en) * | 2022-03-15 | 2022-06-10 | 西安电子科技大学 | Single-mode enhancement-based infrared and visible light fusion pedestrian detection method |
CN116071369A (en) * | 2022-12-13 | 2023-05-05 | 哈尔滨理工大学 | An infrared image processing method and device |
WO2023231233A1 (en) * | 2022-05-31 | 2023-12-07 | 浪潮电子信息产业股份有限公司 | Cross-modal target re-identification method and apparatus, device, and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480178A (en) * | 2017-07-01 | 2017-12-15 | 广州深域信息科技有限公司 | A kind of pedestrian's recognition methods again compared based on image and video cross-module state |
US10176405B1 (en) * | 2018-06-18 | 2019-01-08 | Inception Institute Of Artificial Intelligence | Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations |
CN111325115A (en) * | 2020-02-05 | 2020-06-23 | 山东师范大学 | Countermeasures cross-modal pedestrian re-identification method and system with triple constraint loss |
- 2020-08-13: application CN202010814790.2A filed in CN; published as CN113761995A (en); status: pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480178A (en) * | 2017-07-01 | 2017-12-15 | 广州深域信息科技有限公司 | A kind of pedestrian's recognition methods again compared based on image and video cross-module state |
US10176405B1 (en) * | 2018-06-18 | 2019-01-08 | Inception Institute Of Artificial Intelligence | Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations |
CN111325115A (en) * | 2020-02-05 | 2020-06-23 | 山东师范大学 | Countermeasures cross-modal pedestrian re-identification method and system with triple constraint loss |
Non-Patent Citations (4)
Title |
---|
Bo Li et al., "Visible Infrared Cross-Modality Person Re-Identification Network Based on Adaptive Pedestrian Alignment" |
Mang Ye et al., "Deep Learning for Person Re-identification: A Survey and Outlook" |
Mang Ye et al., "Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking" |
Hao Luo et al., "Research Progress of Person Re-identification Based on Deep Learning" |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114612937A (en) * | 2022-03-15 | 2022-06-10 | 西安电子科技大学 | Single-mode enhancement-based infrared and visible light fusion pedestrian detection method |
WO2023231233A1 (en) * | 2022-05-31 | 2023-12-07 | 浪潮电子信息产业股份有限公司 | Cross-modal target re-identification method and apparatus, device, and medium |
CN116071369A (en) * | 2022-12-13 | 2023-05-05 | 哈尔滨理工大学 | An infrared image processing method and device |
CN116071369B (en) * | 2022-12-13 | 2023-07-14 | 哈尔滨理工大学 | An infrared image processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107832672B (en) | Pedestrian re-identification method for designing multi-loss function by utilizing attitude information | |
CN111666843B (en) | A Pedestrian Re-Identification Method Based on Global Feature and Local Feature Splicing | |
CN112651262B (en) | A Cross-modal Pedestrian Re-identification Method Based on Adaptive Pedestrian Alignment | |
CN110135375A (en) | Multi-Person Pose Estimation Method Based on Global Information Integration | |
CN111259786A (en) | Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video | |
CN113989851B (en) | Cross-modal pedestrian re-identification method based on heterogeneous fusion graph convolution network | |
CN109508663A (en) | A kind of pedestrian's recognition methods again based on multi-level supervision network | |
CN103268480A (en) | A visual tracking system and method | |
Shen et al. | MCCG: A ConvNeXt-based multiple-classifier method for cross-view geo-localization | |
CN113761995A (en) | A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking | |
CN116597267B (en) | Image recognition method, device, computer equipment and storage medium | |
CN113011359B (en) | Method for simultaneously detecting plane structure and generating plane description based on image and application | |
CN112434654A (en) | Cross-modal pedestrian re-identification method based on symmetric convolutional neural network | |
CN114495170A (en) | A method and system for pedestrian re-identification based on local suppression of self-attention | |
Zhang et al. | Combining depth-skeleton feature with sparse coding for action recognition | |
CN117274627A (en) | Multi-temporal snow remote sensing image matching method and system based on image conversion | |
CN110543817A (en) | Pedestrian Re-Identification Method Based on Pose-Guided Feature Learning | |
Chen et al. | Self-supervised feature learning for long-term metric visual localization | |
Li et al. | Development and challenges of object detection: A survey | |
Zhang et al. | Two-stage domain adaptation for infrared ship target segmentation | |
Gao et al. | Occluded person re-identification based on feature fusion and sparse reconstruction | |
CN103903269B (en) | The description method and system of ball machine monitor video | |
CN107730535B (en) | Visible light infrared cascade video tracking method | |
CN114154576B (en) | Feature selection model training method and system based on hybrid supervision | |
CN109740405A (en) | A kind of non-alignment similar vehicle front window different information detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20211207 |