CN113761995A - A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking

Info

Publication number: CN113761995A
Application number: CN202010814790.2A
Authority: CN (China)
Prior art keywords: image, visible light, infrared, pedestrian
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 陈洪刚, 刘强, 滕奇志, 何小海, 卿粼波, 吴晓红
Current and original assignee: Sichuan University
Application filed by Sichuan University
Priority to CN202010814790.2A
Publication of CN113761995A


Classifications

    • G06F18/214 — Physics; computing; electric digital data processing; pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253 — Physics; computing; electric digital data processing; pattern recognition; analysing; fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Traffic Control Systems (AREA)
  • Image Processing (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention provides a cross-modal pedestrian re-identification method based on dual-transform alignment and partitioning. First, a base branch network extracts features from the input infrared and visible-light pedestrian images; a set of affine transformation parameters is linearly regressed from the high-level features of each image, and aligned images are generated from these parameters, which effectively alleviates the modality discrepancy caused by misalignment. The aligned images are then horizontally divided into three blocks; the features of the three blocks are extracted and fused with the aligned global features and the original image features to obtain the overall features of the visible-light and infrared images. Next, the overall features of the infrared and visible-light images are mapped into the same embedding space. Finally, joint training with an identity loss and a weighted hardest-batch sampling loss improves recognition accuracy. The invention is mainly applied in intelligent video-surveillance analysis systems and has broad application prospects in image retrieval, intelligent security and related fields.

Description

A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking

Technical Field

The invention relates to a cross-modal pedestrian re-identification method based on dual-transform alignment and partitioning, and to a new network model, DTASN (Dual Transform Alignment and Segmentation Network). It addresses the cross-modal pedestrian re-identification problem in intelligent video surveillance and belongs to the fields of computer vision and intelligent information processing.

Background Art

Person re-identification (ReID) is a computer-vision technique whose goal is to retrieve persons of interest across multiple non-overlapping cameras; it is usually regarded as a sub-problem of image retrieval. An efficient ReID algorithm can relieve the burden of manually reviewing video and speed up investigations. Its broad application prospects in video surveillance, intelligent security and related fields have attracted wide attention from academia and industry, making person re-identification a valuable and challenging research topic in computer vision.

Currently, most research focuses on the RGB-RGB (single-modality) re-identification problem, in which both the probe and gallery pedestrians are captured by visible-light cameras. However, visible-light cameras may fail to capture appearance information under changing illumination, especially when lighting is insufficient (for example, at night or in dark environments). Thanks to advances in imaging technology, most current-generation cameras can automatically switch between visible-light and infrared modes according to lighting conditions. It is therefore necessary to develop methods for the cross-modal ReID problem between visible-light and infrared images. Unlike conventional re-identification, visible-infrared cross-modal re-identification (VI-ReID, Visible and Infrared person Re-IDentification) matches pedestrian images captured in different spectra by visible-light and infrared cameras. VI-ReID typically uses a visible-light (or infrared) pedestrian image to search for infrared (or visible-light) pedestrian images across the camera network.

Pedestrian images (cropped pedestrians) are usually obtained by automatic detectors or trackers. Because detection and tracking results are imperfect, image misalignment is usually unavoidable: semantic misalignment errors such as partial occlusion, missing parts (only part of the body is visible) and excessive background occur frequently. To address semantic misalignment in ReID, some works try to improve matching accuracy by reducing the cross-modal discrepancy of heterogeneous data, while other methods focus on correcting pedestrian misalignment, which also reduces the modality gap to some extent. Beyond these difficulties, pedestrian appearance varies greatly with pose and viewpoint. Many practical factors cause spatial semantic misalignment between images, meaning that the content at the same spatial location in two matched images has different semantics, which limits the robustness and effectiveness of person re-identification. It is therefore important to develop a model with strong discriminative ability that handles cross-modal variation, reducing the cross-modal discrepancy of heterogeneous data while also alleviating the image differences caused by misalignment within each modality, thereby improving the accuracy of cross-modal person re-identification.

Summary of the Invention

The invention proposes a cross-modal pedestrian re-identification method based on dual-transform alignment and partitioning and designs a multi-path dual-transform alignment and segmentation network, DTASN. The sampling strategy for each training batch is: randomly select P pedestrians from the training set, then for each pedestrian randomly select K visible-light pedestrian images and K infrared pedestrian images, forming a training batch of 2PK pedestrian images that is fed into the network, as sketched below. Under the supervision of the label information, the self-learning ability of the convolutional neural network is used to adaptively align and correct severely misaligned visible-light and infrared images, and the aligned images are horizontally partitioned into local blocks, so as to improve the accuracy of cross-modal pedestrian re-identification.
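
A minimal sketch of the batch construction described above (P identities, K visible-light and K infrared images per identity, 2PK images per batch), assuming the dataset is organized as per-identity lists of image paths; all variable names are illustrative only.

```python
import random

def sample_batch(visible_by_id, infrared_by_id, P=8, K=4):
    """Build one training batch: P identities, K visible + K infrared images each (2*P*K total).

    visible_by_id / infrared_by_id: dict mapping person id -> list of image paths (assumed layout).
    """
    ids = random.sample(list(visible_by_id.keys()), P)
    batch = []
    for pid in ids:
        vis = random.choices(visible_by_id[pid], k=K)   # with replacement if fewer than K images
        ir = random.choices(infrared_by_id[pid], k=K)
        batch += [(p, pid, 'visible') for p in vis]
        batch += [(p, pid, 'infrared') for p in ir]
    return batch  # 2*P*K (path, identity, modality) tuples fed to the network
```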

A cross-modal pedestrian re-identification method based on dual-transform alignment and partitioning comprises the following steps:

(1) Use the visible-light base branch network to extract features from the visible-light pedestrian image x_v, obtaining f_v; use the infrared base branch network to extract features from the infrared pedestrian image x_t, obtaining f_t.

(2) Take the fifth-residual-block (conv_5x) feature from the visible-light base branch network and feed it into the grid network of the visible-light image spatial-transformation module, linearly regress a set of affine transformation parameters θ_v, and generate a visible-light image transformation grid; then generate a new aligned visible-light pedestrian image x'_v through a bilinear sampler, and extract features from x'_v to obtain the transformed visible-light global feature g_v.

(3) Take the fifth-residual-block (conv_5x) feature from the infrared base branch network and feed it into the grid network of the infrared image spatial-transformation module, linearly regress a set of affine transformation parameters θ_t, and generate an infrared image transformation grid; then generate a new aligned infrared pedestrian image x'_t through a bilinear sampler, and extract features from x'_t to obtain the transformed infrared global feature g_t.

(4) Horizontally partition the new visible-light pedestrian image x'_v into upper, middle and lower non-overlapping blocks; extract the features of these three blocks, obtaining p_v1, p_v2 and p_v3; finally, sum the aligned global feature g_v and the three block features to obtain the total feature s_v of the visible-light transform-alignment-and-partition network.

(5) Horizontally partition the new infrared pedestrian image x'_t into upper, middle and lower non-overlapping blocks; extract the features of these three blocks, obtaining p_t1, p_t2 and p_t3; finally, sum the aligned global feature g_t and the three block features to obtain the total feature s_t of the infrared transform-alignment-and-partition network.

(6) Fuse s_v with the feature f_v extracted by the visible-light base branch network by weighted addition to obtain the total feature F_v of the visible-light branch; fuse s_t with the feature f_t extracted by the infrared base branch network by weighted addition to obtain the total feature F_t of the infrared branch. Then map the visible-light feature F_v and the infrared feature F_t into the same feature embedding space, and train the network with an identity loss function combined with a weighted hardest-batch sampling loss function, thereby improving the accuracy of cross-modal pedestrian re-identification.

Brief Description of the Drawings

Fig. 1 is a block diagram of the cross-modal pedestrian re-identification method based on dual-transform alignment and partitioning according to the invention;

Fig. 2 is a structural diagram of the visible-light transform-alignment-and-partition branch according to the invention.

Fig. 3 is a structural diagram of the infrared transform-alignment-and-partition branch according to the invention.

Detailed Description of Embodiments

The invention is further described below with reference to Fig. 1, Fig. 2 and Fig. 3.

The network structure and principle of the DTASN model are as follows:

The framework learns feature representations and distance metrics in an end-to-end manner through a multi-path dual-alignment-and-partition network while maintaining high discriminability. It consists of three components: (1) a feature extraction module, (2) a feature embedding module, and (3) a loss computation module. The backbone of every path is the deep residual network ResNet-50. Because available data are scarce, and to speed up convergence of training, the invention initializes the network with a pre-trained ResNet-50 model. To strengthen attention to local features, a position attention module is applied on each path.

For visible-infrared cross-modal pedestrian re-identification, the similarity between modalities lies in the achromatic information of pedestrian contours and textures, while the significant difference lies in the imaging spectrum. The invention therefore designs a Siamese network model to extract visual features of infrared and visible-light pedestrian images. As shown in Fig. 1, two networks with the same structure are used to extract the feature representations of visible-light and infrared images; note that weights are not shared between them. The feature extraction module contains two main networks for processing visible-light and infrared data: the base branch network and the alignment-and-partition network.
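
To make the two-branch structure concrete, the sketch below instantiates two structurally identical ResNet-50 backbones with independent (non-shared) weights, each mapping a 3×288×144 input to a 2048-dimensional feature as described in the base branch network below; the use of torchvision and the omission of the position attention module are assumptions of this sketch, not statements about the patent's exact implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_backbone():
    # ResNet-50 pre-trained on ImageNet; drop the classifier, keep global average pooling.
    resnet = models.resnet50(pretrained=True)
    return nn.Sequential(*list(resnet.children())[:-1])  # output: (N, 2048, 1, 1)

visible_branch = make_backbone()    # weights are NOT shared between the two branches
infrared_branch = make_backbone()

x_v = torch.randn(2, 3, 288, 144)   # visible-light images
x_t = torch.randn(2, 3, 288, 144)   # infrared images (single channel replicated to 3)
f_v = visible_branch(x_v).flatten(1)   # (2, 2048) base-branch visible features
f_t = infrared_branch(x_t).flatten(1)  # (2, 2048) base-branch infrared features
```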

(1) Base branch network:

The base branch consists of two identical sub-networks whose weights are not shared; the backbone of each is ResNet-50. All input images are three-channel images of height 288 and width 144. Denote the input images of the visible-light and infrared base branch networks by x_v and x_t, and the base branch feature extractor by φ(·). Then f_v = φ(x_v) denotes the deep feature of the visible-light image x_v extracted by the visible-light base branch, and f_t = φ(x_t) denotes the deep feature of the infrared image x_t extracted by the infrared base branch. All output feature vectors have length 2048.

(2) Spatial transformation module

Principle of visible-light and infrared transform alignment: the fifth-residual-block (conv_5x) features of the visible-light and infrared base branches are used to linearly regress two sets of affine transformation parameters, θ_v and θ_t. The coordinate correspondence between the images before and after the affine transformation is then established by formula (1):

(x_i^s, y_i^s)^T = A_θ · (x_i^t, y_i^t, 1)^T,   A_θ = [θ_11 θ_12 θ_13; θ_21 θ_22 θ_23]   (1)

where (x_i^t, y_i^t) is the i-th target coordinate in the regular grid of the target image, (x_i^s, y_i^s) is the source coordinate of the corresponding sampling point in the input image, and A_θ is the affine transformation matrix, in which θ_13 and θ_23 control the translation of the transformed image while θ_11, θ_12, θ_21 and θ_22 control its scaling and rotation. Bilinear sampling is used to sample the image grid during the affine transformation. Let x_v and x_t be the input images of the bilinear sampler and let x'_v and x'_t be the new visible-light and infrared images output by the spatial transformation; the correspondence between them is

V_c^v(m, n) = Σ_{h=1}^{H} Σ_{w=1}^{W} U_c^v(h, w) · max(0, 1 − |x_i^s − w|) · max(0, 1 − |y_i^s − h|)   (2)

V_c^t(m, n) = Σ_{h=1}^{H} Σ_{w=1}^{W} U_c^t(h, w) · max(0, 1 − |x_i^s − w|) · max(0, 1 − |y_i^s − h|)   (3)

where V_c^v(m, n) and V_c^t(m, n) denote the pixel value at position (m, n) in channel c of the target image, U_c^v(h, w) and U_c^t(h, w) denote the pixel values in channel c of the source image, and H and W denote the height and width of the target (or source) image. Bilinear sampling is continuously differentiable, so the above equations are differentiable and allow gradient back-propagation, which enables adaptive pedestrian alignment. The global features of the aligned images are denoted g_v and g_t. In addition, to learn more discriminative features, the invention horizontally divides each transformed image into three non-overlapping fixed blocks.
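
A minimal sketch of this spatial-transformation step, using the standard affine-grid and bilinear-sampling operations available in PyTorch; the small localization head that regresses the six parameters from the conv_5x feature is an illustrative assumption, not the patent's exact layer configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransform(nn.Module):
    """Regress a 2x3 affine matrix from a backbone feature map and resample the input image."""

    def __init__(self, in_channels=2048):
        super().__init__()
        self.loc = nn.Linear(in_channels, 6)          # linear regression of theta (illustrative)
        # start from the identity transform so training begins with an unchanged image
        self.loc.weight.data.zero_()
        self.loc.bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, conv5_feat, image):
        pooled = F.adaptive_avg_pool2d(conv5_feat, 1).flatten(1)   # (N, C)
        theta = self.loc(pooled).view(-1, 2, 3)                    # affine matrix per image
        grid = F.affine_grid(theta, image.size(), align_corners=False)
        aligned = F.grid_sample(image, grid, mode='bilinear', align_corners=False)
        return aligned, theta
```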

(3) Visible-light transform-alignment-and-partition branch

As shown in Fig. 2, the transformed and aligned visible-light image is first horizontally partitioned into three non-overlapping blocks (upper, middle and lower). The first block covers rows 1–96, the second block rows 97–192 and the third block rows 193–288; all three blocks are 144 pixels wide. Each of the three block images is then copied to the corresponding position of a newly defined 288×144 sub-image whose pixel values are all 0. Next, four residual networks separately extract the transformed global feature and the three block sub-image features, yielding g_v, p_v1, p_v2 and p_v3. The invention directly sums the global feature and the three block features to obtain the total feature of the transformed image:

s_v = g_v + p_v1 + p_v2 + p_v3

Finally, s_v is fused with the feature f_v of the visible-light base branch network by weighted addition to obtain the final feature F_v of the visible-light image, where λ is a predefined trade-off parameter in the interval 0 to 1.
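
A sketch of this partition-and-fusion step, assuming 288×144 aligned images: the three strips (rows 1–96, 97–192, 193–288) are copied into all-zero 288×144 canvases, each part image is passed through its own feature extractor, and the part features are summed with the aligned global feature; `extractors` stands for the four residual networks of Fig. 2 and is an assumption of this sketch.

```python
import torch

def partition_three(aligned):                 # aligned: (N, 3, 288, 144)
    """Copy the upper/middle/lower strips into zero-padded full-size sub-images."""
    parts = []
    for r0, r1 in [(0, 96), (96, 192), (192, 288)]:
        canvas = torch.zeros_like(aligned)
        canvas[:, :, r0:r1, :] = aligned[:, :, r0:r1, :]
        parts.append(canvas)
    return parts

def branch_feature(aligned, extractors):
    """Sum the aligned global feature and the three part features (total branch feature)."""
    global_f = extractors[0](aligned).flatten(1)
    part_fs = [extractors[k + 1](p).flatten(1) for k, p in enumerate(partition_three(aligned))]
    return global_f + part_fs[0] + part_fs[1] + part_fs[2]
```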

(4) Infrared transform-alignment-and-partition branch

As shown in Fig. 3, the transformed and aligned infrared image is first horizontally partitioned into three non-overlapping blocks (upper, middle and lower). The first block covers rows 1–96, the second block rows 97–192 and the third block rows 193–288; all three blocks are 144 pixels wide. Each of the three block images is then copied to the corresponding position of a newly defined 288×144 sub-image whose pixel values are all 0. Next, four residual networks separately extract the transformed global feature and the three block sub-image features, yielding g_t, p_t1, p_t2 and p_t3. The invention directly sums the global feature and the three block features to obtain the total feature of the transformed image:

s_t = g_t + p_t1 + p_t2 + p_t3

Finally, s_t is fused with the feature f_t of the infrared base branch network by weighted addition to obtain the final feature F_t of the infrared image, where λ is a predefined trade-off parameter in the interval 0 to 1 that balances the contributions of the two features.

(5) Feature embedding and loss computation

To reduce the cross-modal discrepancy between infrared and visible-light images, the visible-light feature F_v and the infrared feature F_t are mapped into the same feature space through the same embedding function f_θ, which is essentially a fully connected layer with parameters θ, giving the embedded features z_v = f_θ(F_v) and z_t = f_θ(F_t); each is a one-dimensional feature vector of length 512. To simplify notation, z_{i,j}^v denotes the embedded feature of the j-th image of the i-th person in a visible-light image batch, and the same notation z_{i,j}^t is used for a batch of infrared images.
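
A sketch of this embedding step: the same fully connected layer f_θ maps both the 2048-dimensional visible-light and infrared features into a shared 512-dimensional space. The layer sizes follow the text above; the batch-normalization layer is a common practice assumed for the sketch, not stated in the text.

```python
import torch.nn as nn

class SharedEmbedding(nn.Module):
    """One fully connected layer applied to both modalities to reduce the cross-modal gap."""

    def __init__(self, in_dim=2048, out_dim=512):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)   # assumption: not specified in the description

    def forward(self, feat_v, feat_t):
        z_v = self.bn(self.fc(feat_v))      # visible-light embeddings, length 512
        z_t = self.bn(self.fc(feat_t))      # infrared embeddings, length 512
        return z_v, z_t
```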

Identity loss function:

Let p_v(k | z_{i,j}^v) and p_t(k | z_{i,j}^t) denote the identity prediction probabilities of the input pedestrians; for example, p_v(k | z_{i,j}^v) is the predicted probability that the input visible-light image has identity k. The label of an input image whose true identity is i is one-hot: it assigns 1 to class i and 0 to every other class. The identity loss over one batch, which predicts identities with a cross-entropy loss, is then defined as

L_id = − (1 / 2PK) · Σ_{i=1}^{P} Σ_{j=1}^{K} [ log p_v(i | z_{i,j}^v) + log p_t(i | z_{i,j}^t) ]
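
A sketch of the identity loss above: cross-entropy over identity predictions for the 2PK images of one batch; the classifier head that produces the logits is assumed, not shown.

```python
import torch.nn.functional as F

def identity_loss(logits_v, logits_t, labels_v, labels_t):
    """Cross-entropy identity loss averaged over the visible-light and infrared images of a batch."""
    return 0.5 * (F.cross_entropy(logits_v, labels_v) + F.cross_entropy(logits_t, labels_t))
```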

Weighted hardest-batch sampling loss function:

Since L_id only considers the identity of each input sample, it does not emphasize whether a visible-light input and an infrared input belong to the same identity. To further alleviate the cross-modal discrepancy between infrared and visible-light images, and because the TriHard loss (hardest-triplet sampling loss) considers only the information of extreme samples, producing excessively large local gradients that can cause training to collapse, the invention instead uses a per-batch adaptively weighted hardest-triplet sampling loss. The core idea is: for each infrared image sample z_i^t in a batch, compute its distances to the visible-light positive samples z_j^v in the batch that share its identity; for positive pairs, the larger the Euclidean distance in the embedding space, the larger the assigned weight. Likewise, compute its distances to the visible-light negative samples z_k^v whose identities differ; for negative pairs, the larger the Euclidean distance in the embedding space, the smaller the assigned weight. Different distances (i.e., different degrees of difficulty) therefore receive different weights. The weighted hardest-triplet sampling loss thus inherits the advantage of optimizing the relative distances between positive and negative pairs while avoiding any extra parameters, making it more flexible and adaptable. For each visible-light anchor sample z_i^v in a batch, the weighted hardest-triplet sampling loss L_wrt^v(i) is computed as

L_wrt^v(i) = log( 1 + exp( Σ_{j∈P_i} W_{ij}^p · d(z_i^v, z_j^t) − Σ_{k∈N_i} W_{ik}^n · d(z_i^v, z_k^t) ) )

W_{ij}^p = exp(d(z_i^v, z_j^t)) / Σ_{j'∈P_i} exp(d(z_i^v, z_{j'}^t))

W_{ik}^n = exp(−d(z_i^v, z_k^t)) / Σ_{k'∈N_i} exp(−d(z_i^v, z_{k'}^t))

where P_i is the corresponding positive sample set, N_i is the negative sample set, d(·, ·) is the Euclidean distance in the embedding space, W_{ij}^p is the distance weight of a positive sample and W_{ik}^n is the distance weight of a negative sample. Likewise, for each infrared anchor sample z_i^t in a batch, the weighted hardest-triplet sampling loss L_wrt^t(i) is computed as

L_wrt^t(i) = log( 1 + exp( Σ_{j∈P_i} W_{ij}^p · d(z_i^t, z_j^v) − Σ_{k∈N_i} W_{ik}^n · d(z_i^t, z_k^v) ) )

W_{ij}^p = exp(d(z_i^t, z_j^v)) / Σ_{j'∈P_i} exp(d(z_i^t, z_{j'}^v))

W_{ik}^n = exp(−d(z_i^t, z_k^v)) / Σ_{k'∈N_i} exp(−d(z_i^t, z_{k'}^v))

The overall weighted hardest-triplet sampling loss is therefore

L_c_wrt = (1 / 2PK) · [ Σ_i L_wrt^v(i) + Σ_i L_wrt^t(i) ]

Finally, the total loss function is defined as:

L_wrt = L_id + λ · L_c_wrt   (14)

where λ is a predefined parameter that balances the contributions of the identity loss L_id and the weighted hardest-triplet sampling loss L_c_wrt.
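
A sketch of the adaptively weighted triplet loss in the spirit of the formulas above, applied to one anchor modality against the other modality in a batch: distances to cross-modal positives and negatives are softly weighted, with far positives weighted up and far negatives weighted down. The softmax-based weighting follows the qualitative rule stated in the text and is an assumption of the sketch rather than the patent's verbatim formulation.

```python
import torch
import torch.nn.functional as F

def weighted_triplet_loss(anchor, gallery, anchor_ids, gallery_ids):
    """Soft-weighted triplet loss for (N, d) anchor embeddings against (M, d) gallery embeddings."""
    dist = torch.cdist(anchor, gallery)                       # pairwise Euclidean distances
    is_pos = anchor_ids.unsqueeze(1).eq(gallery_ids.unsqueeze(0)).float()
    is_neg = 1.0 - is_pos

    # larger-distance positives get larger weights; larger-distance negatives get smaller weights
    w_pos = F.softmax(dist * is_pos - 1e9 * is_neg, dim=1)    # negatives masked out
    w_neg = F.softmax(-dist * is_neg - 1e9 * is_pos, dim=1)   # positives masked out

    pos_term = (w_pos * dist).sum(dim=1)
    neg_term = (w_neg * dist).sum(dim=1)
    return F.softplus(pos_term - neg_term).mean()             # log(1 + exp(.)), averaged over anchors
```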

The invention conducted network-structure ablation studies on the RegDB and SYSU-MM01 datasets, where Baseline denotes the baseline network, L_id the identification loss, L_c_wrt the weighted hardest-triplet sampling loss, RE random erasing, PA the position attention module (PAM), ST the STN spatial transformation network, and HDB horizontal partitioning. The method was also compared with several mainstream algorithms, evaluated in the single-query setting using Rank-1, Rank-5, Rank-10 and mAP (mean average precision) as metrics. The experimental results are shown in Tables 1, 2, 3 and 4; the accuracy is considerably higher than that of the baseline network and the other compared algorithms.
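
For reference, a minimal single-query evaluation sketch for the Rank-1/5/10 and mAP metrics mentioned above, given a query-gallery distance matrix; this is a generic protocol implementation assumed for illustration and is not taken from the patent.

```python
import numpy as np

def evaluate(dist, q_ids, g_ids):
    """Compute Rank-1/5/10 and mAP from a (num_query, num_gallery) distance matrix."""
    ranks = np.zeros(3)
    aps = []
    order = np.argsort(dist, axis=1)                  # gallery indices sorted by distance
    for i in range(dist.shape[0]):
        matches = (g_ids[order[i]] == q_ids[i]).astype(np.int32)
        if matches.sum() == 0:
            continue
        first_hit = np.argmax(matches)                # rank of the first correct match
        for j, k in enumerate([1, 5, 10]):
            ranks[j] += first_hit < k
        hits = np.cumsum(matches)                     # average precision for this query
        precision = hits / (np.arange(len(matches)) + 1)
        aps.append((precision * matches).sum() / matches.sum())
    n = dist.shape[0]
    return {'Rank-1': ranks[0] / n, 'Rank-5': ranks[1] / n, 'Rank-10': ranks[2] / n,
            'mAP': float(np.mean(aps))}
```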

Table 1. Ablation study of the network structure on the RegDB dataset (tabular results are provided only as images in the original publication).

Table 2. Ablation study of the network structure on the SYSU-MM01 dataset.

Table 3. Comparison with mainstream algorithms on the RegDB dataset.

Table 4. Comparison with mainstream algorithms on the SYSU-MM01 dataset.

Claims (6)

1. A cross-modal pedestrian re-identification method based on dual-transform alignment and partitioning, characterized by comprising the following steps:
(1) using a visible-light base branch network to extract features from a visible-light pedestrian image x_v, obtaining f_v; using an infrared base branch network to extract features from an infrared pedestrian image x_t, obtaining f_t;
(2) taking the fifth-residual-block (conv_5x) feature from the visible-light base branch network and inputting it into the grid network of the visible-light image spatial-transformation module, linearly regressing a set of affine transformation parameters θ_v and generating a visible-light image transformation grid, then generating a new aligned visible-light pedestrian image x'_v through a bilinear sampler, and extracting features from x'_v to obtain the global feature g_v;
(3) taking the fifth-residual-block (conv_5x) feature from the infrared base branch network and inputting it into the grid network of the infrared image spatial-transformation module, linearly regressing a set of affine transformation parameters θ_t and generating an infrared image transformation grid, then generating a new aligned infrared pedestrian image x'_t through a bilinear sampler, and extracting features from x'_t to obtain the global feature g_t;
(4) horizontally partitioning the new visible-light pedestrian image x'_v into upper, middle and lower non-overlapping blocks, extracting the features of the three blocks to obtain p_v1, p_v2 and p_v3, and finally summing the aligned global feature g_v and the three block features to obtain the total feature s_v of the visible-light transform-alignment-and-partition network;
(5) horizontally partitioning the new infrared pedestrian image x'_t into upper, middle and lower non-overlapping blocks, extracting the features of the three blocks to obtain p_t1, p_t2 and p_t3, and finally summing the aligned global feature g_t and the three block features to obtain the total feature s_t of the infrared transform-alignment-and-partition network;
(6) fusing s_v with the feature f_v extracted by the visible-light base branch network by weighted addition to obtain the total feature F_v of the visible-light branch; fusing s_t with the feature f_t extracted by the infrared base branch network by weighted addition to obtain the total feature F_t of the infrared branch; then mapping the visible-light feature F_v and the infrared feature F_t into the same feature embedding space, and training with an identity loss function combined with a weighted hardest-batch sampling loss function, thereby improving the accuracy of cross-modal pedestrian re-identification.
2. The method of claim 1, wherein the sampling strategy for each training batch in step (1) is: randomly selecting P pedestrians from the training set, then randomly selecting K visible-light pedestrian images and K infrared pedestrian images for each pedestrian to form a training batch of 2PK pedestrian images, and finally feeding the 2PK pedestrian images into the network for training; f_v denotes the deep feature of the visible-light image x_v extracted by the visible-light base branch network, and f_t denotes the deep feature of the infrared image x_t extracted by the infrared base branch network; all output feature vectors have length 2048.
3. The method of claim 1, wherein in steps (2) and (3) the transform alignment uses the fifth-residual-block feature conv_5x extracted by the visible-light base branch (respectively, the infrared base branch) to linearly regress a set of affine transformation parameters θ_v and θ_t, and the coordinate correspondence between the images before and after the affine transformation is then established by formula (1):

(x_i^s, y_i^s)^T = A_θ · (x_i^t, y_i^t, 1)^T,   A_θ = [θ_11 θ_12 θ_13; θ_21 θ_22 θ_23]   (1)

where (x_i^t, y_i^t) is the i-th target coordinate in the regular grid of the target image, (x_i^s, y_i^s) is the source coordinate of the corresponding sampling point in the input image, and A_θ is the affine transformation matrix, in which θ_13 and θ_23 control the translation of the transformed image while θ_11, θ_12, θ_21 and θ_22 control its scaling and rotation; bilinear sampling is used to sample the image grid during the affine transformation; x_v and x_t are the input images of the bilinear sampler, and the new visible-light and infrared images output by the spatial transformation are x'_v and x'_t, whose correspondence is

V_c^v(m, n) = Σ_{h=1}^{H} Σ_{w=1}^{W} U_c^v(h, w) · max(0, 1 − |x_i^s − w|) · max(0, 1 − |y_i^s − h|)

V_c^t(m, n) = Σ_{h=1}^{H} Σ_{w=1}^{W} U_c^t(h, w) · max(0, 1 − |x_i^s − w|) · max(0, 1 − |y_i^s − h|)

where V_c^v(m, n) and V_c^t(m, n) denote the pixel value at position (m, n) in channel c of the target image, U_c^v(h, w) and U_c^t(h, w) denote the pixel values in channel c of the source image, and H and W denote the height and width of the target (or source) image; bilinear sampling is continuously differentiable, so the above equations are differentiable and allow gradient back-propagation, enabling adaptive pedestrian alignment; the global features of the aligned images are denoted g_v and g_t; in addition, to learn more discriminative features, each transformed image is horizontally divided into three non-overlapping fixed blocks.
4. The method of claim 1, wherein in step (4) the transformed and aligned image is first horizontally partitioned into upper, middle and lower blocks; the first block covers rows 1–96, the second rows 97–192 and the third rows 193–288, and all three blocks are 144 pixels wide; the three block images are then copied to the corresponding positions of three newly defined 288×144 sub-images whose pixel values are all 0; next, the transformed global feature and the three block sub-image features are extracted by four ResNet-50 residual networks, yielding g_v, p_v1, p_v2 and p_v3; the global feature and the three block sub-image features are directly summed to obtain the total feature s_v of the transformed image; finally, s_v is fused with the base-branch feature f_v of step (1) by weighted addition to obtain the final feature F_v of the visible-light image, where λ is a predefined trade-off parameter in the interval 0 to 1 that balances the contributions of the two features.
5. The method of claim 1, wherein in step (5) the transformed and aligned image is first horizontally partitioned into upper, middle and lower blocks; the first block covers rows 1–96, the second rows 97–192 and the third rows 193–288, and all three blocks are 144 pixels wide; the three block images are then copied to the corresponding positions of three newly defined 288×144 sub-images whose pixel values are all 0; next, the transformed global feature and the three block sub-image features are extracted by four ResNet-50 residual networks, yielding g_t, p_t1, p_t2 and p_t3; the global feature and the three block sub-image features are directly summed to obtain the total feature s_t of the transformed image; finally, s_t is fused with the base-branch feature f_t of step (1) by weighted addition to obtain the final feature F_t of the infrared image, where λ is a predefined trade-off parameter in the interval 0 to 1 that balances the contributions of the two features.
6. The method of claim 1, wherein in step (6), to reduce the cross-modal discrepancy between infrared and visible-light images, the visible-light feature F_v and the infrared feature F_t are mapped into the same feature space through the same embedding function f_θ, which is essentially a fully connected layer with parameters θ, to obtain the embedded features z_v = f_θ(F_v) and z_t = f_θ(F_t), each a one-dimensional feature vector of length 512; to simplify notation, z_{i,j}^v denotes the embedded feature of the j-th image of the i-th person in a visible-light image batch, and the same notation z_{i,j}^t is used for a batch of infrared images; let p_v(k | z_{i,j}^v) and p_t(k | z_{i,j}^t) denote the identity prediction probabilities of the input pedestrians, for example p_v(k | z_{i,j}^v) is the predicted probability that the input visible-light image has identity k, and the label of an input image with true identity i is one-hot, assigning 1 to class i and 0 elsewhere; the identity loss over one batch, using cross-entropy to predict identities, is then defined as

L_id = − (1 / 2PK) · Σ_{i=1}^{P} Σ_{j=1}^{K} [ log p_v(i | z_{i,j}^v) + log p_t(i | z_{i,j}^t) ]

since L_id only considers the identity of each input sample and does not emphasize whether a visible-light input and an infrared input belong to the same identity, and since the TriHard loss (hardest-triplet sampling loss) considers only extreme samples, producing excessively large local gradients that can cause training to collapse, the invention uses a per-batch adaptively weighted hardest-triplet sampling loss to further alleviate the cross-modal discrepancy; the core idea is that for each infrared image sample z_i^t in a batch, the distances to the visible-light positive samples z_j^v sharing its identity are computed, and for positive pairs a larger Euclidean distance in the embedding space receives a larger weight; likewise, the distances to the visible-light negative samples z_k^v with different identities are computed, and for negative pairs a larger Euclidean distance receives a smaller weight; different distances (i.e., different degrees of difficulty) are thus assigned different weights, so the weighted hardest-triplet sampling loss inherits the advantage of optimizing relative distances between positive and negative pairs while introducing no extra parameters, making it more flexible and adaptable; therefore, for each visible-light anchor sample z_i^v in a batch, the weighted hardest-triplet sampling loss L_wrt^v(i) is computed as

L_wrt^v(i) = log( 1 + exp( Σ_{j∈P_i} W_{ij}^p · d(z_i^v, z_j^t) − Σ_{k∈N_i} W_{ik}^n · d(z_i^v, z_k^t) ) )

W_{ij}^p = exp(d(z_i^v, z_j^t)) / Σ_{j'∈P_i} exp(d(z_i^v, z_{j'}^t)),   W_{ik}^n = exp(−d(z_i^v, z_k^t)) / Σ_{k'∈N_i} exp(−d(z_i^v, z_{k'}^t))

where P_i is the corresponding positive sample set, N_i is the negative sample set, W_{ij}^p is the distance weight of a positive sample and W_{ik}^n is the distance weight of a negative sample; similarly, for each infrared anchor sample z_i^t in a batch, the weighted hardest-triplet sampling loss L_wrt^t(i) is computed as

L_wrt^t(i) = log( 1 + exp( Σ_{j∈P_i} W_{ij}^p · d(z_i^t, z_j^v) − Σ_{k∈N_i} W_{ik}^n · d(z_i^t, z_k^v) ) )

W_{ij}^p = exp(d(z_i^t, z_j^v)) / Σ_{j'∈P_i} exp(d(z_i^t, z_{j'}^v)),   W_{ik}^n = exp(−d(z_i^t, z_k^v)) / Σ_{k'∈N_i} exp(−d(z_i^t, z_{k'}^v))

the overall weighted hardest-triplet sampling loss is therefore

L_c_wrt = (1 / 2PK) · [ Σ_i L_wrt^v(i) + Σ_i L_wrt^t(i) ]

and finally the total loss function is defined as

L_wrt = L_id + λ · L_c_wrt   (11)

where λ is a predefined parameter that balances the contributions of the identity loss L_id and the weighted hardest-triplet sampling loss L_c_wrt.
CN202010814790.2A 2020-08-13 2020-08-13 A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking Pending CN113761995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010814790.2A CN113761995A (en) 2020-08-13 2020-08-13 A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking


Publications (1)

Publication Number Publication Date
CN113761995A true CN113761995A (en) 2021-12-07

Family

ID=78785620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010814790.2A Pending CN113761995A (en) 2020-08-13 2020-08-13 A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking

Country Status (1)

Country Link
CN (1) CN113761995A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480178A (en) * 2017-07-01 2017-12-15 广州深域信息科技有限公司 A kind of pedestrian's recognition methods again compared based on image and video cross-module state
US10176405B1 (en) * 2018-06-18 2019-01-08 Inception Institute Of Artificial Intelligence Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations
CN111325115A (en) * 2020-02-05 2020-06-23 山东师范大学 Countermeasures cross-modal pedestrian re-identification method and system with triple constraint loss

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BO LI ET AL.: "Visible Infrared Cross-Modality Person Re-Identification Network Based on Adaptive Pedestrian Alignment" *
MANG YE ET AL.: "Deep Learning for Person Re-identification: A Survey and Outlook" *
MANG YE ET AL.: "Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking" *
罗浩 (LUO Hao) ET AL.: "Research Progress of Person Re-identification Based on Deep Learning" *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612937A (en) * 2022-03-15 2022-06-10 西安电子科技大学 Single-mode enhancement-based infrared and visible light fusion pedestrian detection method
WO2023231233A1 (en) * 2022-05-31 2023-12-07 浪潮电子信息产业股份有限公司 Cross-modal target re-identification method and apparatus, device, and medium
CN116071369A (en) * 2022-12-13 2023-05-05 哈尔滨理工大学 An infrared image processing method and device
CN116071369B (en) * 2022-12-13 2023-07-14 哈尔滨理工大学 An infrared image processing method and device

Similar Documents

Publication Publication Date Title
CN107832672B (en) Pedestrian re-identification method for designing multi-loss function by utilizing attitude information
CN111666843B (en) A Pedestrian Re-Identification Method Based on Global Feature and Local Feature Splicing
CN112651262B (en) A Cross-modal Pedestrian Re-identification Method Based on Adaptive Pedestrian Alignment
CN110135375A (en) Multi-Person Pose Estimation Method Based on Global Information Integration
CN111259786A (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN113989851B (en) Cross-modal pedestrian re-identification method based on heterogeneous fusion graph convolution network
CN109508663A (en) A kind of pedestrian's recognition methods again based on multi-level supervision network
CN103268480A (en) A visual tracking system and method
Shen et al. MCCG: A ConvNeXt-based multiple-classifier method for cross-view geo-localization
CN113761995A (en) A Cross-modal Pedestrian Re-identification Method Based on Double Transform Alignment and Blocking
CN116597267B (en) Image recognition method, device, computer equipment and storage medium
CN113011359B (en) Method for simultaneously detecting plane structure and generating plane description based on image and application
CN112434654A (en) Cross-modal pedestrian re-identification method based on symmetric convolutional neural network
CN114495170A (en) A method and system for pedestrian re-identification based on local suppression of self-attention
Zhang et al. Combining depth-skeleton feature with sparse coding for action recognition
CN117274627A (en) Multi-temporal snow remote sensing image matching method and system based on image conversion
CN110543817A (en) Pedestrian Re-Identification Method Based on Pose-Guided Feature Learning
Chen et al. Self-supervised feature learning for long-term metric visual localization
Li et al. Development and challenges of object detection: A survey
Zhang et al. Two-stage domain adaptation for infrared ship target segmentation
Gao et al. Occluded person re-identification based on feature fusion and sparse reconstruction
CN103903269B (en) The description method and system of ball machine monitor video
CN107730535B (en) Visible light infrared cascade video tracking method
CN114154576B (en) Feature selection model training method and system based on hybrid supervision
CN109740405A (en) A kind of non-alignment similar vehicle front window different information detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211207