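WO2021077785A1 - Pedestrian search method based on pedestrian re-identification-driven positioning adjustment - Google Patents
Pedestrian search method based on pedestrian re-identification-driven positioning adjustment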
- Publication number
- WO2021077785A1 (PCT/CN2020/097623, CN2020097623W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pedestrian
- identification
- positioning adjustment
- person
- loss
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/2193—Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/95—Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- The present invention belongs to the technical field of computer vision and, more specifically, relates to a pedestrian search method based on pedestrian re-identification-driven positioning adjustment.
- Pedestrian search (Person Search)
- Pedestrian search means that, given a picture containing a pedestrian to be queried, that pedestrian is detected and identified in an image library. It comprises two subtasks: pedestrian detection and pedestrian re-identification. Compared with pedestrian re-identification, which works directly on cropped pedestrian pictures, pedestrian search is closer to real-world scenarios.
- Existing pedestrian search methods fall mainly into two categories:
- One category trains jointly by sharing some features between the pedestrian detection network and the pedestrian re-identification network, for example the first pedestrian search paper: "Joint detection and identification feature learning for person search, Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017: 3376-3385."
- This category ignores that pedestrian detection is a binary classification task whereas pedestrian re-identification is a multi-class classification task; sharing features directly creates a conflict, so its accuracy is generally low.
- The other category performs detection and re-identification separately, so the two tasks cannot be jointly optimized well. The detection frames output by the detection network often contain background interference and are poorly suited to pedestrian search, so the accuracy of this category is also generally low; in addition, this approach cannot perform end-to-end detection, which makes pedestrian search relatively inefficient.
- In summary, existing pedestrian search methods jointly train the pedestrian detection network and the pedestrian re-identification network by sharing some features; the conflict between the shared features causes the technical problem of low pedestrian search accuracy.
- In view of these defects, the present invention provides a pedestrian search method based on pedestrian re-identification-driven positioning adjustment. Its purpose is to solve the technical problem that existing pedestrian search methods, which jointly train the pedestrian detection network and the pedestrian re-identification network by sharing some features, achieve low accuracy because the shared features conflict.
- To achieve the above purpose, the present invention provides a pedestrian search method based on pedestrian re-identification-driven positioning adjustment, including:
- The pedestrian re-identification-driven positioning adjustment model includes a detection module, a region of interest conversion module, and a pedestrian re-identification module;
- The detection module is used to detect pedestrians in the input image to obtain the detection frame coordinates corresponding to each pedestrian position;
- The region of interest conversion module is used to calculate, from the detection frame coordinates, the affine transformation parameters from the input image to the detection frame coordinates, and to extract the region of interest in the input image according to the affine transformation parameters and bilinear sampling;
- The pedestrian re-identification module is used to extract deep features of the region of interest;
- The original picture is used as the input of the pedestrian re-identification-driven positioning adjustment model, and the probability value of the identity label corresponding to the pedestrian in the original picture is used as the expected output after the model's output features are classified, so as to train the pedestrian re-identification-driven positioning adjustment model;
- Cross-entropy loss and triplet proxy loss are used to supervise the pedestrian re-identification model.
- The method for supervising the pedestrian re-identification module with the triplet proxy loss is specifically as follows:
- The loss function of the pedestrian re-identification module supervises the detection frame coordinates output by the detection module.
- The detection module uses Faster R-CNN as the network backbone.
- The Faster R-CNN includes the classification loss but not the regression loss.
- The anchor frame aspect ratio adopted by the Faster R-CNN is less than 1.
- The pedestrian re-identification module uses ResNet50 as the network backbone.
- The ResNet50 uses a batch normalization layer to replace the last fully connected layer of the network.
- The present invention effectively realizes joint optimization of the pedestrian detection network and pedestrian re-identification by designing a region of interest conversion module. On the one hand, the module converts the original input image into a small image corresponding to the region of interest, which avoids the conflict that arises when the pedestrian re-identification network and the detection network share some features. On the other hand, the loss of the pedestrian re-identification network can be back-propagated to the detection network through the gradient of the region of interest conversion module, so that the re-identification loss supervises the detection frames output by the detection network. The adjusted detection frames effectively remove background interference and contain more useful attribute information, which makes them better suited to pedestrian search and greatly improves its accuracy.
- The present invention designs a triplet proxy loss. A triplet proxy table stores the features of all categories and is updated in each iteration. Therefore, even when the number of training samples in a batch is too small to construct a conventional triplet loss in the pedestrian search task, triplets can still be constructed by drawing proxy samples from the triplet proxy table. This pulls samples of the same category closer together and pushes samples of different categories farther apart, improving the accuracy of pedestrian search.
- FIG. 1 is a flowchart of a pedestrian search method based on pedestrian re-identification driving positioning adjustment according to an embodiment of the present invention.
- Fig. 2 is a structural diagram of a pedestrian re-identification driving positioning adjustment model according to an embodiment of the present invention.
- An embodiment of the present invention provides a pedestrian search method based on pedestrian re-identification-driven positioning adjustment, including:
- The pedestrian re-identification-driven positioning adjustment model includes a detection module, a region of interest conversion module, and a pedestrian re-identification module. The detection module detects pedestrians in the input image to obtain the detection frame coordinates corresponding to each pedestrian position; the region of interest conversion module calculates, from the detection frame coordinates, the affine transformation parameters from the input image to the detection frame coordinates, and extracts the region of interest in the input image according to the affine transformation parameters and bilinear sampling; the pedestrian re-identification module extracts deep features of the region of interest;
- The detection module provided by the embodiment uses Faster R-CNN as the network backbone. Since the detection target is a pedestrian, the anchor aspect ratio in Faster R-CNN should be modified to be less than 1 to better fit human body proportions.
- The embodiment changes the anchor aspect ratios in Faster R-CNN from 1:1, 1:2, 2:1 to 1:1, 1:2, 1:3. At the same time, so that the re-identification loss can dominate the generation of the detection frame, rather than merely pushing the detection frame toward the ground-truth frame, the invention retains only the classification loss of the original Faster R-CNN and removes the regression loss of the original network.
- Owing to the region of interest conversion module, the loss of the re-identification network can be back-propagated as a gradient to the detection network, thereby supervising the detected coordinates.
- A formula is used to calculate, from the detection frame coordinates, the affine transformation parameters θ from the input image to the detection frame coordinates.
- According to the affine transformation parameters θ and bilinear sampling, the small image of the region of interest corresponding to the detection frame can be obtained, and gradient back-propagation of the loss function is realized;
- The small image of the region of interest is calculated as:
- $V = B(P_S, U)$
- The pedestrian re-identification module uses ResNet50 as the network backbone.
- The invention removes the last fully connected layer of ResNet50 to obtain a modified residual network and adds a batch normalization layer after the modified network.
- The original picture is used as the input of the pedestrian re-identification-driven positioning adjustment model, and the probability value of the identity label corresponding to the pedestrian in the original picture is used as the expected output after the model's output features are classified, so as to train the pedestrian re-identification-driven positioning adjustment model;
- The invention uses cross-entropy loss and triplet proxy loss to supervise the pedestrian re-identification model. Triplet loss is a commonly used metric loss in the field of pedestrian re-identification; it pulls samples of the same category closer together and pushes samples of different categories farther apart.
- The invention designs a triplet proxy loss: a triplet proxy table stores the features of all categories and is updated in each iteration, so that even if the training samples in a batch are not enough to form a triplet, triplets can be constructed by drawing proxy samples from the triplet proxy table; hence the name triplet proxy loss;
- m denotes the margin by which the distance of the negative sample pair must exceed the distance of the positive sample pair
- $f_i^a$, $f_i^p$, and $f_i^n$ respectively denote the features of the anchor sample, the positive sample, and the negative sample in the triplet
- D denotes the Euclidean distance
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Biodiversity & Conservation Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
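The present invention discloses a pedestrian search method based on pedestrian re-identification-driven positioning adjustment, which belongs to the technical field of computer vision. By designing a region of interest conversion module, the invention effectively realizes joint optimization of the pedestrian detection network and pedestrian re-identification. On the one hand, the region of interest conversion module converts the original input image into a small image corresponding to the region of interest, which avoids the conflict that arises when the pedestrian re-identification network and the detection network share some features. On the other hand, the loss of the pedestrian re-identification network can be back-propagated to the detection network through the gradient of the region of interest conversion module, so that the re-identification loss supervises the detection frames output by the detection network. The adjusted detection frames effectively remove background interference and contain more useful attribute information, which makes them better suited to pedestrian search and greatly improves its accuracy.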
Description
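The present invention belongs to the technical field of computer vision and, more specifically, relates to a pedestrian search method based on pedestrian re-identification-driven positioning adjustment.
At present, surveillance cameras are installed in crowded public places, government departments, enterprises and institutions, residential communities, and even many residents' homes, providing reliable video surveillance resources for maintaining public order and protecting people's lives and property. In video surveillance, parameters such as camera resolution and shooting angle vary widely, so it is difficult to capture high-quality face images reliably, which makes target tracking based on face recognition unstable. By comparison, pedestrian search (Person Search) can provide a more robust target tracking solution for video surveillance. Pedestrian search means that, given a picture containing a pedestrian to be queried, that pedestrian is detected and identified in an image library; it comprises two subtasks, pedestrian detection and pedestrian re-identification. Compared with pedestrian re-identification, which works directly on cropped pedestrian pictures, pedestrian search is closer to real-world scenarios.
Existing pedestrian search methods fall mainly into two categories. One category trains jointly by sharing some features between the pedestrian detection and pedestrian re-identification networks, for example the first pedestrian search paper: "Joint detection and identification feature learning for person search, Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017: 3376-3385." However, this category ignores that pedestrian detection is a binary classification task whereas pedestrian re-identification is a multi-class classification task; sharing features directly creates a conflict, so its accuracy is generally low. The other category performs detection and re-identification separately; because the two tasks are decoupled, they cannot be jointly optimized well, and the detection frames output by the detection network often contain background interference and are poorly suited to pedestrian search, so the accuracy of this category is also generally low. In addition, this approach cannot perform end-to-end detection, which makes pedestrian search relatively inefficient.
In summary, existing pedestrian search methods jointly train the pedestrian detection network and the pedestrian re-identification network by sharing some features; the conflict between the shared features causes the technical problem of low pedestrian search accuracy.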
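[Summary of the Invention]
In view of the above defects of, or improvement needs in, the prior art, the present invention provides a pedestrian search method based on pedestrian re-identification-driven positioning adjustment. Its purpose is to solve the technical problem that existing pedestrian search methods, which jointly train the pedestrian detection network and the pedestrian re-identification network by sharing some features, achieve low accuracy because the shared features conflict.
To achieve the above purpose, the present invention provides a pedestrian search method based on pedestrian re-identification-driven positioning adjustment, including:
(1) Constructing a pedestrian re-identification-driven positioning adjustment model; the model includes a detection module, a region of interest conversion module, and a pedestrian re-identification module;
The detection module is used to detect pedestrians in the input image to obtain the detection frame coordinates corresponding to each pedestrian position; the region of interest conversion module is used to calculate, from the detection frame coordinates, the affine transformation parameters from the input image to the detection frame coordinates, and to extract the region of interest in the input image according to the affine transformation parameters and bilinear sampling; the pedestrian re-identification module is used to extract deep features of the region of interest;
(2) Using the original picture as the input of the pedestrian re-identification-driven positioning adjustment model, and the probability value of the identity label corresponding to the pedestrian in the original picture as the expected output after the model's output features are classified, so as to train the pedestrian re-identification-driven positioning adjustment model;
(3) Inputting the image to be searched and the target image separately into the trained pedestrian re-identification-driven positioning adjustment model to obtain the pedestrian features of the image to be searched and the pedestrian features of the target image, calculating the similarity between the two sets of features, and obtaining the matching result for the image to be searched.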
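Further, cross-entropy loss and triplet proxy loss are used to supervise the pedestrian re-identification model.
Further, the method for supervising the pedestrian re-identification module with the triplet proxy loss is specifically as follows:
(01) Initialize a triplet proxy table $T \in \mathbb{R}^{N \times K}$ used to store the feature values of each category, where N is the total number of sample categories and K is the number of features stored per category;
(02) During forward propagation, compute the triplet proxy loss value so that samples of the same category are pulled closer together and samples of different categories are pushed farther apart;
(03) During back-propagation, update the features stored in the triplet proxy table for the category of the current sample, replacing existing features on a first-in, first-out basis.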
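Steps (01) and (03) describe a per-category store with first-in, first-out replacement. The following is a minimal sketch of that data structure only, not the patent's implementation: the class name `TripletProxyTable`, its method names, and the use of plain Python lists for feature vectors are assumptions made for illustration.

```python
from collections import deque
from typing import List


class TripletProxyTable:
    """Per-category FIFO store of recent feature vectors (illustrative sketch)."""

    def __init__(self, num_classes: int, k: int):
        # One bounded queue per category (N queues, K slots each).
        # deque(maxlen=k) drops the oldest entry when a (k+1)-th is appended.
        self.slots = [deque(maxlen=k) for _ in range(num_classes)]

    def update(self, label: int, feature: List[float]) -> None:
        """Store the newest feature for `label`, evicting the oldest if full."""
        self.slots[label].append(list(feature))

    def proxies(self, label: int) -> List[List[float]]:
        """Return the stored features for `label`, oldest first."""
        return [list(f) for f in self.slots[label]]


table = TripletProxyTable(num_classes=3, k=2)
table.update(0, [1.0, 0.0])
table.update(0, [0.9, 0.1])
table.update(0, [0.8, 0.2])  # table is full for category 0: [1.0, 0.0] is evicted
```

A bounded `deque` is used here because it gives the first-in, first-out replacement without extra bookkeeping; a fixed tensor with a write pointer per category would be an equally plausible layout.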
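Further, the loss function of the pedestrian re-identification module supervises the detection frame coordinates output by the detection module.
Further, the detection module uses Faster R-CNN as the network backbone.
Further, the Faster R-CNN includes the classification loss but not the regression loss.
Further, the anchor frame aspect ratio adopted by the Faster R-CNN is less than 1.
Further, the pedestrian re-identification module uses ResNet50 as the network backbone.
Further, the ResNet50 uses a batch normalization layer to replace the last fully connected layer of the network.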
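In general, compared with the prior art, the above technical solutions conceived by the present invention can achieve the following beneficial effects:
(1) The present invention effectively realizes joint optimization of the pedestrian detection network and pedestrian re-identification by designing a region of interest conversion module. On the one hand, the module converts the original input image into a small image corresponding to the region of interest, which avoids the conflict that arises when the pedestrian re-identification network and the detection network share some features. On the other hand, the loss of the pedestrian re-identification network can be back-propagated to the detection network through the gradient of the region of interest conversion module, so that the re-identification loss supervises the detection frames output by the detection network. The adjusted detection frames effectively remove background interference and contain more useful attribute information, which makes them better suited to pedestrian search and greatly improves its accuracy.
(2) The present invention designs a triplet proxy loss. A triplet proxy table stores the features of all categories and is updated in each iteration. Therefore, even when the number of training samples in a batch is too small to construct a conventional triplet loss in the pedestrian search task, triplets can still be constructed by drawing proxy samples from the triplet proxy table, which pulls samples of the same category closer together and pushes samples of different categories farther apart and improves the precision of pedestrian search.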
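Effect (1) depends on every stage between the re-identification loss and the detection frame coordinates being differentiable. One way to read it is as a chain rule, sketched below. The symbols $L_{\text{reid}}$ (re-identification loss) and $b = (x_1, y_1, x_2, y_2)$ (detection frame coordinates) are notation assumed here for illustration; $V$, $P_S$ and $\theta$ are the small image, sampling points and affine transformation parameters used in the description.

```latex
\frac{\partial L_{\text{reid}}}{\partial b}
  = \frac{\partial L_{\text{reid}}}{\partial V}
    \cdot \frac{\partial V}{\partial P_S}
    \cdot \frac{\partial P_S}{\partial \theta}
    \cdot \frac{\partial \theta}{\partial b}
```

The factor $\partial V / \partial P_S$ exists because bilinear sampling is piecewise differentiable in the sampling positions, and the last two factors exist because the sampling positions are an affine function of $\theta$, which is itself computed from $b$.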
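FIG. 1 is a flowchart of a pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to an embodiment of the present invention.
FIG. 2 is a structural diagram of the pedestrian re-identification-driven positioning adjustment model according to an embodiment of the present invention.
To make the purpose, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.
As shown in FIG. 1, an embodiment of the present invention provides a pedestrian search method based on pedestrian re-identification-driven positioning adjustment, including:
(1) Constructing a pedestrian re-identification-driven positioning adjustment model. As shown in FIG. 2, the model includes a detection module, a region of interest conversion module, and a pedestrian re-identification module. The detection module is used to detect pedestrians in the input image to obtain the detection frame coordinates corresponding to each pedestrian position; the region of interest conversion module is used to calculate, from the detection frame coordinates, the affine transformation parameters from the input image to the detection frame coordinates, and to extract the region of interest in the input image according to the affine transformation parameters and bilinear sampling; the pedestrian re-identification module is used to extract deep features of the region of interest;
Specifically, the detection module provided by the embodiment uses Faster R-CNN as the network backbone. Because the detection target is a pedestrian, the anchor aspect ratio in Faster R-CNN should be modified to be less than 1 to better fit human body proportions; the embodiment changes the anchor aspect ratios in Faster R-CNN from 1:1, 1:2, 2:1 to 1:1, 1:2, 1:3. At the same time, so that the re-identification loss can dominate the generation of the detection frame, rather than merely pushing the detection frame toward the ground-truth frame, the invention retains only the classification loss of the original Faster R-CNN and removes the regression loss of the original network.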
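To make the anchor change concrete, the sketch below generates anchor shapes for the ratios 1:1, 1:2 and 1:3. It assumes the ratios are read as width:height (so 1:3 is a tall box, matching a standing person) and that every anchor keeps the same area at a given scale; the function name `make_anchors` and the values `base_size=16` and `scales=[8]` are hypothetical choices for illustration, not values stated in the patent.

```python
import math


def make_anchors(base_size, scales, ratios_w_to_h):
    """Return (width, height) anchor shapes with area (base_size * scale) ** 2."""
    anchors = []
    for scale in scales:
        area = float(base_size * scale) ** 2
        for rw, rh in ratios_w_to_h:
            # Solve w / h = rw / rh together with w * h = area.
            w = math.sqrt(area * rw / rh)
            h = math.sqrt(area * rh / rw)
            anchors.append((w, h))
    return anchors


# Width:height ratios after the modification described above (assumed reading).
PEDESTRIAN_RATIOS = [(1, 1), (1, 2), (1, 3)]
anchors = make_anchors(base_size=16, scales=[8], ratios_w_to_h=PEDESTRIAN_RATIOS)
```

With these ratios no anchor is wider than it is tall, which is the practical effect of dropping the 2:1 shape.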
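Owing to the region of interest conversion module, the loss of the re-identification network can be back-propagated as a gradient to the detection network, thereby supervising the detected coordinates. Specifically, the following formula is used to calculate, from the detection frame coordinates, the affine transformation parameters θ from the input image to the detection frame coordinates:
According to the affine transformation parameters θ and bilinear sampling, the small image of the region of interest corresponding to the detection frame can be obtained, and gradient back-propagation of the loss function is realized; the small image of the region of interest is calculated as:
$V = B(P_S, U)$
where B denotes bilinear sampling, U and V denote the original input image and the small image of the region of interest respectively, and $P_S$ denotes the pixel points in the original image obtained from the small image through the affine transformation.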
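The relation $V = B(P_S, U)$ can be illustrated with a small pure-Python sketch. The formula for θ is not reproduced in this text, so the matrix built by `box_to_theta` below is an assumption: it maps patch coordinates (u, v) in [0, 1] to image pixel coordinates for a box (x1, y1, x2, y2), which is one common convention and not necessarily the one in the patent. All function names are hypothetical.

```python
import math


def box_to_theta(box):
    """Affine matrix mapping patch coords (u, v) in [0, 1]^2 to image pixels."""
    x1, y1, x2, y2 = box
    return [[x2 - x1, 0.0, x1],
            [0.0, y2 - y1, y1]]


def bilinear(image, x, y):
    """Sample `image` (list of rows) at real-valued (x, y), clamped to the border."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0  # fractional offsets act as interpolation weights
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def crop_roi(image, box, out_h, out_w):
    """V = B(P_S, U): map each patch pixel into the image, then sample bilinearly."""
    theta = box_to_theta(box)
    patch = []
    for i in range(out_h):
        v = i / (out_h - 1) if out_h > 1 else 0.5
        row = []
        for j in range(out_w):
            u = j / (out_w - 1) if out_w > 1 else 0.5
            xs = theta[0][0] * u + theta[0][1] * v + theta[0][2]
            ys = theta[1][0] * u + theta[1][1] * v + theta[1][2]
            row.append(bilinear(image, xs, ys))
        patch.append(row)
    return patch


U = [[float(10 * r + c) for c in range(5)] for r in range(4)]  # toy 4x5 image
V = crop_roi(U, box=(1.0, 1.0, 3.0, 2.0), out_h=2, out_w=3)
```

Because each output value is a weighted sum whose weights vary continuously with the box coordinates, a loss computed on `V` has a gradient with respect to the box, which is what lets the re-identification loss supervise the detection frame.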
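The pedestrian re-identification module uses ResNet50 as the network backbone. To keep the number of categories used in training consistent with the number of categories in the training set, the invention removes the last fully connected layer of ResNet50 to obtain a modified residual network and adds a batch normalization layer after the modified residual network.
(2) Using the original picture as the input of the pedestrian re-identification-driven positioning adjustment model, and the probability value of the identity label corresponding to the pedestrian in the original picture as the expected output after the model's output features are classified, so as to train the pedestrian re-identification-driven positioning adjustment model;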
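Specifically, the invention uses cross-entropy loss and triplet proxy loss to supervise the pedestrian re-identification model. Triplet loss is a commonly used metric loss in the field of pedestrian re-identification; it pulls samples of the same category closer together and pushes samples of different categories farther apart. In the pedestrian search task, however, the number of training samples in a batch is too small to construct a conventional triplet loss. The invention therefore designs a triplet proxy loss: a triplet proxy table stores the features of all categories and is updated in each iteration, so that even if the training samples in a batch are not enough to form a triplet, triplets can be constructed by drawing proxy samples from the triplet proxy table; hence the name triplet proxy loss. The method for supervising the pedestrian re-identification module with the triplet proxy loss is specifically as follows:
(01) Initialize a triplet proxy table $T \in \mathbb{R}^{N \times K}$ used to store the feature values of each category, where N is the total number of sample categories and K is the number of features stored per category; the embodiment takes K = 2;
(02) During forward propagation, compute the triplet proxy loss value L so that samples of the same category are pulled closer together and samples of different categories are pushed farther apart:
where m denotes the margin by which the distance of the negative sample pair must exceed the distance of the positive sample pair, $f_i^a$, $f_i^p$ and $f_i^n$ respectively denote the features of the anchor sample, the positive sample and the negative sample in the triplet, and D denotes the Euclidean distance;
(03) During back-propagation, update the features stored in the triplet proxy table for the category of the current sample, replacing existing features on a first-in, first-out basis.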
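The forward pass in step (02) can be sketched as follows. The loss formula itself is not reproduced in this text, so the code assumes the standard hinge form of triplet loss, $\max(0,\ m + D(f^a, f^p) - D(f^a, f^n))$, which is consistent with the stated roles of m and D but is an assumption. The function names, the dictionary layout of the proxy table, the summation over all proxy pairs, and `margin=0.5` are likewise hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance D between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_proxy_loss(features, labels, proxy_table, margin=0.5):
    """Sum hinge losses over (anchor, positive proxy, negative proxy) triplets.

    `proxy_table` maps a category id to the list of features stored for it.
    """
    loss = 0.0
    for anchor, label in zip(features, labels):
        positives = proxy_table.get(label, [])
        negatives = [p for cls, feats in proxy_table.items()
                     if cls != label for p in feats]
        for pos in positives:
            for neg in negatives:
                loss += max(0.0, margin + euclidean(anchor, pos)
                            - euclidean(anchor, neg))
    return loss


proxy_table = {0: [[1.0, 0.0]], 1: [[0.0, 1.0]]}
# Anchor sits on its own category's proxy: the margin is already satisfied.
easy = triplet_proxy_loss([[1.0, 0.0]], [0], proxy_table)
# Anchor sits on the other category's proxy: the hinge is active.
hard = triplet_proxy_loss([[0.0, 1.0]], [0], proxy_table)
```

Because positives and negatives come from the table rather than from the current batch, a batch containing a single image of one identity still yields usable triplets.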
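(3) Inputting the image to be searched and the target image separately into the trained pedestrian re-identification-driven positioning adjustment model to obtain the pedestrian features of the image to be searched and the pedestrian features of the target image, calculating the similarity between the two sets of features, and obtaining the matching result for the image to be searched.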
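Step (3) does not say which similarity measure is used. As an illustration only, the sketch below ranks pedestrian features from the images to be searched against one target feature using cosine similarity, which is an assumed choice; the function names and the toy feature vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_gallery(target_feature, gallery_features):
    """Return gallery indices sorted from most to least similar to the target."""
    scores = [cosine_similarity(target_feature, g) for g in gallery_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


target = [1.0, 0.0]
gallery = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]
order = rank_gallery(target, gallery)
```

The first index in `order` is the best match; a threshold on the score could be added if a yes/no match decision is needed rather than a ranking.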
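Those skilled in the art will readily understand that the above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.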
Claims (9)
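- A pedestrian search method based on pedestrian re-identification-driven positioning adjustment, characterized by comprising: (1) constructing a pedestrian re-identification-driven positioning adjustment model, the model including a detection module, a region of interest conversion module, and a pedestrian re-identification module; the detection module being used to detect pedestrians in the input image to obtain the detection frame coordinates corresponding to each pedestrian position; the region of interest conversion module being used to calculate, from the detection frame coordinates, the affine transformation parameters from the input image to the detection frame coordinates, and to extract the region of interest in the input image according to the affine transformation parameters and bilinear sampling; the pedestrian re-identification module being used to extract deep features of the region of interest; (2) using the original picture as the input of the model, and the probability value of the identity label corresponding to the pedestrian in the original picture as the expected output after the model's output features are classified, so as to train the model; (3) inputting the image to be searched and the target image separately into the trained model to obtain the pedestrian features of the image to be searched and the pedestrian features of the target image, calculating the similarity between the two sets of features, and obtaining the matching result for the image to be searched.
- The pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to claim 1, characterized in that cross-entropy loss and triplet proxy loss are used to supervise the pedestrian re-identification model.
- The pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to claim 2, characterized in that the method for supervising the pedestrian re-identification module with the triplet proxy loss is specifically: (01) initializing a triplet proxy table $T \in \mathbb{R}^{N \times K}$ used to store the feature values of each category, where N is the total number of sample categories and K is the number of features stored per category; (02) during forward propagation, computing the triplet proxy loss value so that samples of the same category are pulled closer together and samples of different categories are pushed farther apart; (03) during back-propagation, updating the features stored in the triplet proxy table for the category of the current sample, replacing existing features on a first-in, first-out basis.
- The pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to any one of claims 1-3, characterized in that the loss function of the pedestrian re-identification module supervises the detection frame coordinates output by the detection module.
- The pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to any one of claims 1-4, characterized in that the detection module uses Faster R-CNN as the network backbone.
- The pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to claim 5, characterized in that the Faster R-CNN includes the classification loss but not the regression loss.
- The pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to claim 6, characterized in that the anchor frame aspect ratio adopted by the Faster R-CNN is less than 1.
- The pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to any one of claims 1-7, characterized in that the pedestrian re-identification module uses ResNet50 as the network backbone.
- The pedestrian search method based on pedestrian re-identification-driven positioning adjustment according to claim 8, characterized in that the ResNet50 uses a batch normalization layer to replace the last fully connected layer of the network.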
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/253,124 US11263491B2 (en) | 2019-10-21 | 2020-06-23 | Person search method based on person re-identification driven localization refinement |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910998178.2 | 2019-10-21 | ||
CN201910998178.2A CN110826424B (zh) | 2019-10-21 | 2019-10-21 | Pedestrian search method based on pedestrian re-identification-driven positioning adjustment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021077785A1 (zh) | 2021-04-29 |
Family
ID=69549837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/097623 WO2021077785A1 (zh) | Pedestrian search method based on pedestrian re-identification-driven positioning adjustment | 2019-10-21 | 2020-06-23 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11263491B2 (zh) |
CN (1) | CN110826424B (zh) |
WO (1) | WO2021077785A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
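CN113743251A (zh) * | 2021-08-17 | 2021-12-03 | Huazhong University of Science and Technology | Target search method and apparatus based on weakly supervised scenes |
CN116229381A (zh) * | 2023-05-11 | 2023-06-06 | Nanchang Institute of Technology | Ship-face recognition method for river and lake sand-mining vessels |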
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
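CN110826424B (zh) | 2019-10-21 | 2021-07-27 | Huazhong University of Science and Technology | Pedestrian search method based on pedestrian re-identification-driven positioning adjustment |
CN112085119A (zh) * | 2020-09-17 | 2020-12-15 | Shanghai Eye Control Technology Co., Ltd. | Data processing method, apparatus, device and storage medium |
CN112232203B (zh) * | 2020-10-15 | 2024-05-28 | Ping An Technology (Shenzhen) Co., Ltd. | Pedestrian recognition method, apparatus, electronic device and storage medium |
CN113191338B (zh) * | 2021-06-29 | 2021-09-17 | Suzhou Inspur Intelligent Technology Co., Ltd. | Pedestrian re-identification method, apparatus, device and readable storage medium |
CN115424086A (zh) * | 2022-07-26 | 2022-12-02 | Beijing University of Posts and Telecommunications | Multi-view fine-grained recognition method, apparatus, electronic device and medium |
EP4365782A1 (en) * | 2022-11-01 | 2024-05-08 | Tata Consultancy Services Limited | Method and system for contradiction avoided learning for multi-class multi-label classification |
CN116863313B (zh) * | 2023-09-05 | 2024-01-12 | Hubei University | Target re-identification method and system based on incremental label refinement and symmetric scoring |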
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190156154A1 (en) * | 2017-11-21 | 2019-05-23 | Nvidia Corporation | Training a neural network to predict superpixels using segmentation-aware affinity loss |
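CN109902573A (zh) * | 2019-01-24 | 2019-06-18 | China University of Mining and Technology | Multi-camera unlabeled pedestrian re-identification method for underground mine video surveillance |
CN110334687A (zh) * | 2019-07-16 | 2019-10-15 | Hefei University of Technology | Pedestrian retrieval enhancement method based on pedestrian detection, attribute learning and pedestrian recognition |
CN110826424A (zh) * | 2019-10-21 | 2020-02-21 | Huazhong University of Science and Technology | Pedestrian search method based on pedestrian re-identification-driven positioning adjustment |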
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9158971B2 (en) * | 2014-03-03 | 2015-10-13 | Xerox Corporation | Self-learning object detectors for unlabeled videos using multi-task learning |
US10579911B2 (en) * | 2017-05-23 | 2020-03-03 | The United States Of America, As Represented By The Secretary Of The Navy | Systems and related methods employing directed energy and machine learning operable for enabling or protecting from non-destructive degradation or disruption of electro-optic(s) or sensors |
US20190205608A1 (en) * | 2017-12-29 | 2019-07-04 | Deep Innovations Ltd | Method and apparatus for safety monitoring of a body of water |
US11504607B2 (en) * | 2019-02-05 | 2022-11-22 | Deep Innovations Ltd. | System and method for using a camera unit for the pool cleaning robot for safety monitoring and augmented reality games |
US11363416B2 (en) * | 2019-10-04 | 2022-06-14 | Samsung Electronics Co., Ltd. | System and method for WiFi-based indoor localization via unsupervised domain adaptation |
2019
- 2019-10-21 CN CN201910998178.2A (patent CN110826424B, zh): not active, Expired - Fee Related
2020
- 2020-06-23 US US17/253,124 (patent US11263491B2, en): Active
- 2020-06-23 WO PCT/CN2020/097623 (patent WO2021077785A1, zh): Application Filing
Non-Patent Citations (1)
Title |
---|
ZHANG, JIAN-MING ET AL.: "Summary of research on progress of person re-identification", INFORMATION TECHNOLOGY, vol. 10, 31 December 2017 (2017-12-31), pages 172 - 176, XP055805588 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
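CN113743251A (zh) * | 2021-08-17 | 2021-12-03 | Huazhong University of Science and Technology | Target search method and apparatus based on weakly supervised scenes |
CN113743251B (zh) * | 2021-08-17 | 2024-02-13 | Huazhong University of Science and Technology | Target search method and apparatus based on weakly supervised scenes |
CN116229381A (zh) * | 2023-05-11 | 2023-06-06 | Nanchang Institute of Technology | Ship-face recognition method for river and lake sand-mining vessels |
CN116229381B (zh) * | 2023-05-11 | 2023-07-07 | Nanchang Institute of Technology | Ship-face recognition method for river and lake sand-mining vessels |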
Also Published As
Publication number | Publication date |
---|---|
US20210365743A1 (en) | 2021-11-25 |
CN110826424B (zh) | 2021-07-27 |
US11263491B2 (en) | 2022-03-01 |
CN110826424A (zh) | 2020-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
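WO2021077785A1 (zh) | Pedestrian search method based on pedestrian re-identification-driven positioning adjustment | |
CN111126360B (zh) | Cross-domain pedestrian re-identification method based on an unsupervised joint multi-loss model | |
WO2020098158A1 (zh) | Pedestrian re-identification method, apparatus and computer-readable storage medium | |
Kawewong et al. | Online and incremental appearance-based SLAM in highly dynamic environments | |
WO2016131300A1 (zh) | Adaptive cross-camera multi-target tracking method and system | |
CN110135249B (zh) | Human behavior recognition method based on a temporal attention mechanism and LSTM | |
WO2023082882A1 (zh) | Pedestrian fall action recognition method and device based on pose estimation | |
CN111539370A (zh) | Image pedestrian re-identification method and system based on multi-attention joint learning | |
CN114973317B (zh) | Pedestrian re-identification method based on multi-scale adjacent interaction features | |
US12087028B2 (en) | Lifted semantic graph embedding for omnidirectional place recognition | |
CN113762009B (zh) | Crowd counting method based on multi-scale feature fusion and a dual attention mechanism | |
CN103984955B (zh) | Multi-camera target recognition method based on salient features and transfer incremental learning | |
Wang et al. | Head pose estimation with combined 2D SIFT and 3D HOG features | |
CN108363771B (zh) | Image retrieval method for public security investigation applications | |
Wang et al. | An overview of 3d object detection | |
Tan et al. | A multiple object tracking algorithm based on YOLO detection | |
Janku et al. | Fire detection in video stream by using simple artificial neural network | |
CN110516533A (zh) | Pedestrian re-identification method based on deep metric learning | |
Wang et al. | Embedding metric learning into set-based face recognition for video surveillance | |
Yin | Object Detection Based on Deep Learning: A Brief Review | |
CN110866426A (zh) | Pedestrian recognition method based on a light field camera and deep learning | |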
Cai et al. | A target tracking method based on KCF for omnidirectional vision | |
Sunderrajan et al. | Multiple view discriminative appearance modeling with IMCMC for distributed tracking | |
Fan et al. | Siamese graph convolution network for face sketch recognition: an application using graph structure for face photo-sketch recognition | |
Liu et al. | Reliable Cross-camera Learning in Random Camera Person Re-identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20880001 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20880001 Country of ref document: EP Kind code of ref document: A1 |