WO2023272801A1 - Pedestrian re-identification method and system fusing context information - Google Patents
Pedestrian re-identification method and system fusing context information Download PDF Info
- Publication number
- WO2023272801A1 · PCT/CN2021/106989 · CN2021106989W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pedestrian
- features
- identification
- context information
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- The invention relates to the technical field of computer vision, and in particular to a pedestrian re-identification method and system that fuse context information.
- Pedestrian re-identification is an image retrieval technology across different cameras: given a pedestrian image from one surveillance camera, the task is to retrieve all images of that pedestrian captured by the other cameras.
- This technology is widely used in intelligent video surveillance, security, criminal investigation, and other fields, and is a current research hotspot in computer vision.
- Existing feature-learning-based person re-identification methods can be divided into three categories: methods based on global features, methods based on local features, and methods based on auxiliary information. Both global- and local-feature-based methods learn features from a single pedestrian image, which limits the expressive power of those features to a certain extent.
- Methods based on auxiliary information require additional data, such as a textual description of the pedestrian, or pseudo-data generated by a GAN (generative adversarial network), to improve the robustness of the re-identification model; producing such information is often costly.
- Specifically, global-feature methods simply take a complete pedestrian image as the model input and, because of the limited information, cannot effectively handle pedestrian images affected by occlusion or lighting problems.
- Local-feature methods divide a pedestrian image horizontally into multiple parts, extract multiple local features, and then compare the local features, which improves model accuracy to a certain extent.
- However, as with global features, a single pedestrian image carries too little information, so problems such as occlusion remain difficult to solve.
- Auxiliary-information methods supplement the pedestrian image with additional data, which not only increases the computational cost of the model but also depends on auxiliary information that is often hard to obtain, so they do not fit the problems faced in practice.
- The purpose of the present invention is to provide a pedestrian re-identification method and system that fuse context information, solving the information shortage of existing feature-learning methods and the excessive cost of acquiring auxiliary information, so that the pedestrian re-identification model improves its accuracy without adding extra information.
- The present invention provides a pedestrian re-identification method that fuses context information, comprising the following steps:
- S1: Select a pedestrian re-identification data set and extract all pedestrian features in it;
- S2: Select the context information of each pedestrian, and construct a graph structure composed of each pedestrian feature and its corresponding context information;
- S3: Update the node information of the graph structure constructed in S2;
- S4: Apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information;
- S5: Concatenate the pooled pedestrian features with the corresponding original pedestrian features from step S1 to obtain the final pedestrian classification features, and build the pedestrian re-identification model;
- S6: Input the image of the pedestrian to be identified into the pedestrian re-identification model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
- The pedestrian features in S1 are extracted by a trained convolutional neural network.
- The method in S2 for constructing, for a single pedestrian feature, the graph structure composed of that feature and its corresponding context information specifically includes the following steps:
- S21: For a pedestrian feature P, select k pedestrian features from the data set with a nearest-neighbour algorithm as the context information of P, obtaining k+1 pedestrian features as the nodes X of the graph;
- S22: Construct the edges of the graph on the principle that P is connected to the other k pedestrian features and that the larger the cosine distance between two of the k features, the more likely they are to be connected; the similarity between features serves as the edge weight, yielding the adjacency matrix A of the graph.
- The method of constructing the edges of the graph in S22 specifically includes the following steps:
- S221: For the feature P of a single pedestrian, compute the similarity between P and the remaining features using the cosine distance; the maximum similarity is denoted σ_p;
- S222: Node X is connected to the remaining k context nodes; if the similarity between two of the k context nodes is greater than σ_p, those nodes are connected to form edges of the graph, and the similarity between nodes is used as the edge weight.
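As a rough illustration of steps S21-S22, the following NumPy sketch builds the context graph for one feature. The function name, the toy data, and the use of plain cosine similarity for "cosine distance" are assumptions for illustration, not the patent's actual implementation:

```python
import numpy as np

def build_context_graph(features, p_idx, k=3):
    """Pick the k nearest neighbours of feature P as its context (S21),
    then connect context nodes whose mutual similarity exceeds sigma_p,
    the largest similarity to P, with similarities as edge weights (S22)."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats[p_idx]                  # cosine similarity to P
    sims[p_idx] = -np.inf                        # exclude P itself
    context = np.argsort(-sims)[:k]              # k nearest neighbours
    sigma_p = sims[context[0]]                   # maximum similarity to P
    nodes = np.concatenate(([p_idx], context))   # the k+1 graph nodes, P first
    node_feats = feats[nodes]
    S = node_feats @ node_feats.T                # pairwise similarities
    A = np.zeros((k + 1, k + 1))
    A[0, 1:] = S[0, 1:]                          # P is connected to all k
    A[1:, 0] = S[1:, 0]
    for i in range(1, k + 1):                    # context-context edges
        for j in range(i + 1, k + 1):
            if S[i, j] > sigma_p:
                A[i, j] = A[j, i] = S[i, j]
    return nodes, A
```

Because sigma_p is the largest similarity to P, context-context edges only appear between nodes that are even closer to each other than the closest neighbour is to P, which keeps the graph sparse.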
- Updating the node information of the graph structure constructed in S2 specifically includes the following steps:
- S31: Perform the first information update on the nodes with a message-passing process;
- S32: Perform the second information update on the nodes with a self-attention mechanism;
- S33: Perform the third information update on the nodes with a nonlinear function.
- After the first information update in S31, the original nodes are concatenated to the result for the next update.
- The data set includes a training set and a test set.
- The training set is used to iteratively train the pedestrian re-identification model; each pedestrian picture to be identified in the test set is input into the model and compared, by similarity, with all final pedestrian classification features in the training set.
- The picture with the greatest similarity is selected and considered to show the same pedestrian as the picture to be identified.
- A pedestrian re-identification system that fuses context information, comprising:
- an extraction module, used to select a pedestrian re-identification data set and extract all pedestrian features in it;
- a graph-construction module, used to select the context information of each pedestrian and construct a graph structure composed of each pedestrian feature and its corresponding context information;
- an update module, used to update the node information of the constructed graph structure;
- a pooling module, used to apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information;
- a model-construction module, used to concatenate the pooled pedestrian features with the corresponding original pedestrian features to obtain the final pedestrian classification features and build the pedestrian re-identification model;
- an identification module, used to input the image of the pedestrian to be identified into the model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
- A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the above pedestrian re-identification method fusing context information are realized.
- A computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the above pedestrian re-identification method fusing context information are realized.
- Beneficial effects of the present invention: compared with global- and local-feature-based pedestrian re-identification methods, the present invention uses not only the information of a single pedestrian picture but also information from other pictures, jointly improving the robustness of the pedestrian's features. Compared with re-identification methods based on auxiliary information, the present invention adds no extra information, which reduces the cost of information acquisition and computation. It thus solves the information shortage of previous feature-learning methods and the excessive cost of acquiring auxiliary information, allowing the re-identification model to improve its accuracy without additional information.
- Fig. 1 is a schematic flowchart of the method of the present invention;
- Fig. 2 is a schematic flowchart of a specific implementation of the present invention.
- An embodiment of the present invention provides a pedestrian re-identification method that fuses context information, including the following steps:
- S1: Select a pedestrian re-identification data set and extract all pedestrian features in it;
- S2: Select the context information of each pedestrian, and construct a graph structure composed of each pedestrian feature and its corresponding context information;
- S3: Update the node information of the graph structure constructed in S2;
- S4: Apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information;
- S5: Concatenate the pooled pedestrian features with the corresponding original pedestrian features from step S1 to obtain the final pedestrian classification features, and build the pedestrian re-identification model;
- S6: Input the image of the pedestrian to be identified into the pedestrian re-identification model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
- Specifically, a pedestrian re-identification data set containing a training set and a test set is selected, and a trained convolutional neural network extracts the features of all pedestrian pictures in the data set (both training and test sets).
- For each extracted pedestrian feature, taking feature P as an example, k pedestrian features are selected from the data set (the training set during training, the test set during testing) with a nearest-neighbour algorithm as the context information of P.
- These k+1 pedestrian features serve as the nodes X of the constructed graph. Feature P is connected to the other k pedestrian features, and the larger the cosine distance between two of the k features, the more likely they are to be connected, which constructs the edges.
- The similarity between features is used as the edge weight, yielding the adjacency matrix A of the graph.
- The graph structure composed of pedestrian feature P and its context information is thus constructed, and the other pedestrian features are handled analogously. The node information of the constructed graph is then updated: a message-passing process performs the first information update on the nodes.
- To prevent the over-smoothing problem, a self-attention mechanism performs the second information update, and a nonlinear function then performs the third. After a weighted pooling operation on the updated graph structure, pedestrian features that incorporate context information are obtained.
- To avoid losing information through the graph updates, the pooled features are concatenated (concat) with the original features to obtain the final feature representation of a pedestrian.
- This feature is used for pedestrian classification, and the pedestrian re-identification model is trained iteratively on the training set.
- Compared with global- and local-feature-based person re-identification methods, this method uses not only the information of a single pedestrian picture but also information from other pictures, so the input of the re-identification model is not limited to one pedestrian image; multiple pedestrian images jointly improve the feature robustness of the pedestrian. Compared with re-identification methods based on auxiliary information, this method adds no extra information, i.e., the model improves its accuracy without additional information.
- An embodiment of the present invention provides a pedestrian re-identification method fused with context information. Based on Embodiment 1, and taking the pedestrian re-identification data set DukeMTMC as an example, it specifically includes the following steps:
- Step (1): The DukeMTMC data set contains a training set and a test set. Assuming the training set has N pictures in total, the features of these N pedestrian pictures are extracted by a trained convolutional neural network (CNN). Each feature is d-dimensional, where the specific value of d depends on the structure of the CNN;
- Step (2): For the feature f_p of a single pedestrian, compute the similarity between f_p and the remaining features using the cosine distance, sort the similarities in descending order (the maximum is denoted σ_p), and take the top k features as the context information of f_p. These k+1 features then serve as the nodes X_p = {x_0, x_1, x_2, …, x_k} of the constructed graph; analysis and experimental verification indicate that k = 3 is an optimal choice;
- The cosine distance formula is as follows:
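The formula itself does not survive in this text extraction; the standard cosine similarity between two d-dimensional features f_p and f_q, presumably what is intended here, is:

```latex
\cos(f_p, f_q)
  = \frac{f_p \cdot f_q}{\lVert f_p \rVert\,\lVert f_q \rVert}
  = \frac{\sum_{i=1}^{d} f_{p,i}\, f_{q,i}}
         {\sqrt{\sum_{i=1}^{d} f_{p,i}^{2}}\;\sqrt{\sum_{i=1}^{d} f_{q,i}^{2}}}
```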
- Further, x_0 (i.e., f_p) is connected to the remaining k context nodes, and if the similarity between two of the k context nodes is greater than σ_p, those nodes are connected, constructing the edges of the graph.
- The similarity between nodes is used as the weight on the edges, yielding the adjacency matrix A_p of the constructed graph;
- Step (4): To incorporate context information, the nodes of the graph constructed in step (3) are updated several times:
- First node update: a message-passing mechanism (A_p X_p) is used; to give the features better expressive ability, a concatenation operation (concat) splices the original features back in after the message passing:
- Second node update: because the nodes are highly similar, the classic multi-head self-attention mechanism is used to update the nodes and avoid the over-smoothing problem:
- Third node update: two nonlinear projections are applied:
- Step (5): A weighted pooling operation produces the feature x_g representing the graph, i.e., the feature that combines f_p with its context information. Finally, the original feature x_0 of picture P and the context-combined feature x_g are concatenated (concat) to obtain the final feature f_p of picture P, which is used for the classification of pedestrian re-identification; the training set is used to train the model as described above;
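The three node updates of step (4) and the weighted pooling of step (5) can be sketched as below. Single-head attention and random matrices stand in for the patent's learned multi-head attention and projection parameters, so this is an illustrative sketch under those assumptions, not the actual model:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def update_and_pool(X, A, rng):
    """X: (k+1, d) node features, row 0 = x_0 (picture P); A: adjacency."""
    k1, d = X.shape
    # Update 1: message passing A @ X, concatenated with the original nodes
    X1 = np.concatenate([A @ X, X], axis=1)                        # (k+1, 2d)
    # Update 2: self-attention (single head here) to counter over-smoothing
    Wq, Wk, Wv = (rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
                  for _ in range(3))
    Q, K, V = X1 @ Wq, X1 @ Wk, X1 @ Wv
    X2 = softmax(Q @ K.T / np.sqrt(d)) @ V                         # (k+1, d)
    # Update 3: two nonlinear projections (ReLU twice)
    W1, W2 = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(2))
    X3 = np.maximum(0, np.maximum(0, X2 @ W1) @ W2)
    # Step (5): weighted pooling -> graph feature x_g, then concat with x_0
    w = softmax(X3.sum(axis=1))                                    # node weights
    x_g = w @ X3
    return np.concatenate([X[0], x_g])                             # final f_p
```

In the real model the attention and projection weights would be trained jointly with the CNN backbone under the classification loss.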
- Step (6): To verify the accuracy of the method, it is tested on the test set.
- The test set consists of a query picture set and a gallery picture set.
- Given a query picture, the goal is to find the pictures in the gallery set that show the same pedestrian as the query.
- The test procedure passes the query picture and all gallery pictures through the trained model to extract each pedestrian's features, compares the similarity between the query feature and all gallery features, and returns the top few gallery pictures with the greatest similarity; these are considered to show the same pedestrian as the query picture.
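The query-gallery protocol above reduces to ranking gallery features by similarity to the query feature; a minimal sketch (function name and toy vectors are hypothetical):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats, top=5):
    """Rank gallery pictures by cosine similarity to the query feature
    and return the indices and scores of the `top` best matches."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity to the query
    order = np.argsort(-sims)          # descending by similarity
    return order[:top], sims[order[:top]]
```

Evaluation metrics such as rank-1 accuracy or mAP would then be computed from these ranked lists.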
- This embodiment provides a pedestrian re-identification system fused with context information; its problem-solving principle is similar to that of the pedestrian re-identification method described above, so repeated details are omitted.
- A pedestrian re-identification system that fuses context information, comprising:
- an extraction module, used to select a pedestrian re-identification data set and extract all pedestrian features in it;
- a graph-construction module, used to select the context information of each pedestrian and construct a graph structure composed of each pedestrian feature and its corresponding context information;
- an update module, used to update the node information of the constructed graph structure;
- a pooling module, used to apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information;
- a model-construction module, used to concatenate the pooled pedestrian features with the corresponding original pedestrian features to obtain the final pedestrian classification features and build the pedestrian re-identification model;
- an identification module, used to input the image of the pedestrian to be identified into the model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
- the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Abstract
The invention discloses a pedestrian re-identification method and system that fuse context information, comprising the following steps: selecting a pedestrian re-identification data set and extracting all pedestrian features in it; selecting the context information of each pedestrian and constructing a graph structure; updating the node information of the constructed graph structure; applying a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information; concatenating the pooled pedestrian features with the corresponding original pedestrian features to build the pedestrian re-identification model; and inputting the picture of the pedestrian to be identified into the model and comparing its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification. The invention solves the information shortage of existing feature-learning methods and the excessive cost of acquiring auxiliary information, allowing the re-identification model to improve its accuracy without adding extra information.
Description
The invention relates to the technical field of computer vision, and in particular to a pedestrian re-identification method and system that fuse context information.
Pedestrian re-identification is an image retrieval technology across different cameras: given a pedestrian image from one surveillance camera, the task is to retrieve all images of that pedestrian captured by the other cameras. This technology is widely used in intelligent video surveillance, security, criminal investigation, and other fields, and is a current research hotspot in computer vision.
Existing feature-learning-based pedestrian re-identification methods can mainly be divided into re-identification based on global features, re-identification based on local features, and re-identification based on auxiliary information. Both global- and local-feature-based methods learn features from a single pedestrian image only, which limits the expressive power of those features to a certain extent. Methods based on auxiliary information require additional data, such as a textual description of the pedestrian, or pseudo-data generated by a GAN (generative adversarial network), to improve the robustness of the re-identification model; producing such information is often costly.
That is, global-feature methods simply take a complete pedestrian image as the model input and, because of the limited information, cannot effectively handle pedestrian images affected by occlusion or lighting problems. Local-feature methods divide a pedestrian image horizontally into multiple parts, extract multiple local features, and then compare them, which improves model accuracy to a certain extent; however, as with global features, a single pedestrian image carries too little information, so problems such as occlusion remain difficult to solve. Auxiliary-information methods use additional data to supplement the pedestrian image information, which not only increases the computational cost of the model but also depends on auxiliary information that is often hard to obtain, so they do not fit the problems faced in practice.
Summary of the Invention
The purpose of the present invention is to provide a pedestrian re-identification method and system that fuse context information, solving the information shortage of existing feature-learning methods and the excessive cost of acquiring auxiliary information, so that the pedestrian re-identification model improves its accuracy without adding extra information.
In order to solve the above technical problem, the present invention provides a pedestrian re-identification method that fuses context information, comprising the following steps:
S1: Select a pedestrian re-identification data set and extract all pedestrian features in it;
S2: Select the context information of each pedestrian, and construct a graph structure composed of each pedestrian feature and its corresponding context information;
S3: Update the node information of the graph structure constructed in S2;
S4: Apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information;
S5: Concatenate the pooled pedestrian features with the corresponding original pedestrian features from step S1 to obtain the final pedestrian classification features, and build the pedestrian re-identification model;
S6: Input the picture of the pedestrian to be identified into the pedestrian re-identification model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
As a further improvement of the present invention, the pedestrian features in S1 are extracted by a trained convolutional neural network.
As a further improvement of the present invention, the method in S2 for constructing, for a single pedestrian feature, the graph structure composed of that feature and its corresponding context information specifically includes the following steps:
S21: For a pedestrian feature P, select k pedestrian features from the data set with a nearest-neighbour algorithm as the context information of P, obtaining k+1 pedestrian features as the nodes X of the graph;
S22: Construct the edges of the graph on the principle that P is connected to the other k pedestrian features and that the larger the cosine distance between two of the k features, the more likely they are to be connected; the similarity between features serves as the edge weight, yielding the adjacency matrix A of the graph.
As a further improvement of the present invention, the method of constructing the edges of the graph in S22 specifically includes the following steps:
S221: For the feature P of a single pedestrian, compute the similarity between P and the remaining features using the cosine distance; the maximum similarity is denoted σ_p;
S222: Node X is connected to the remaining k context nodes; if the similarity between two of the k context nodes is greater than σ_p, those nodes are connected to form edges of the graph, and the similarity between nodes serves as the edge weight.
As a further improvement of the present invention, updating the node information of the graph structure constructed in S2 in step S3 specifically includes the following steps:
S31: Perform the first information update on the nodes with a message-passing process;
S32: Perform the second information update on the nodes with a self-attention mechanism;
S33: Perform the third information update on the nodes with a nonlinear function.
As a further improvement of the present invention, after the first information update on the nodes in S31, the original nodes are concatenated to the result for the next update.
As a further improvement of the present invention, the data set includes a training set and a test set. The training set is used to iteratively train the pedestrian re-identification model; each pedestrian picture to be identified in the test set is input into the model and compared, by similarity, with all final pedestrian classification features in the training set, and the picture with the greatest similarity is selected and considered to show the same pedestrian as the picture to be identified.
A pedestrian re-identification system that fuses context information, comprising:
an extraction module, used to select a pedestrian re-identification data set and extract all pedestrian features in it;
a graph-construction module, used to select the context information of each pedestrian and construct a graph structure composed of each pedestrian feature and its corresponding context information;
an update module, used to update the node information of the constructed graph structure;
a pooling module, used to apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information;
a model-construction module, used to concatenate the pooled pedestrian features with the corresponding original pedestrian features to obtain the final pedestrian classification features and build the pedestrian re-identification model;
an identification module, used to input the picture of the pedestrian to be identified into the model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the above pedestrian re-identification method fusing context information are realized.
A computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the above pedestrian re-identification method fusing context information are realized.
Beneficial effects of the present invention: compared with global- and local-feature-based pedestrian re-identification methods, the present invention uses not only the information of a single pedestrian picture but also information from other pictures, jointly improving the robustness of the pedestrian's features. Compared with re-identification methods based on auxiliary information, the present invention adds no extra information, which reduces the cost of information acquisition and computation, solves the information shortage of previous feature-learning methods and the excessive cost of acquiring auxiliary information, and allows the re-identification model to improve its accuracy without additional information.
Fig. 1 is a schematic flowchart of the method of the present invention;
Fig. 2 is a schematic flowchart of a specific implementation of the present invention.
The present invention is further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement it; the embodiments given are not intended to limit the present invention.
Embodiment 1
Referring to Fig. 1, an embodiment of the present invention provides a pedestrian re-identification method that fuses context information, comprising the following steps:
S1: Select a pedestrian re-identification data set and extract all pedestrian features in it;
S2: Select the context information of each pedestrian, and construct a graph structure composed of each pedestrian feature and its corresponding context information;
S3: Update the node information of the graph structure constructed in S2;
S4: Apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information;
S5: Concatenate the pooled pedestrian features with the corresponding original pedestrian features from step S1 to obtain the final pedestrian classification features, and build the pedestrian re-identification model;
S6: Input the picture of the pedestrian to be identified into the pedestrian re-identification model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
Specifically, a pedestrian re-identification data set containing a training set and a test set is selected, and a trained convolutional neural network extracts the features of all pedestrian pictures in the data set (both training and test sets). For each extracted pedestrian feature, taking feature P as an example, k pedestrian features are selected from the data set (the training set during training, the test set during testing) with a nearest-neighbour algorithm as the context information of P. These k+1 pedestrian features serve as the nodes X of the constructed graph; the edges are then constructed on the principle that P is connected to the other k pedestrian features and that the larger the cosine distance between two of the k features, the more likely they are to be connected, with the similarity between features serving as the edge weight, yielding the adjacency matrix A of the graph. The graph structure composed of pedestrian feature P and its context information is thus constructed, and the other pedestrian features are handled analogously. The node information of the constructed graph is then updated: a message-passing process performs the first information update on the nodes; to prevent the over-smoothing problem, a self-attention mechanism performs the second update, followed immediately by a nonlinear function for the third. After a weighted pooling operation on the updated graph structure, pedestrian features that incorporate context information are obtained. To avoid losing information through the graph updates, the pooled features are concatenated (concat) with the original features to obtain the final feature representation of a pedestrian; this feature is used for pedestrian classification, and the re-identification model is trained iteratively on the training set. Finally, each pedestrian picture to be identified in the test set is input into the re-identification model built in S5 and compared, by similarity, with the pedestrian features in the candidate pedestrian library to obtain the matching result. Compared with global- and local-feature-based re-identification methods, this method uses not only the information of a single pedestrian picture but also information from other pictures, so the input of the model is not limited to one pedestrian image; multiple pedestrian images jointly improve the feature robustness of the pedestrian. Compared with methods based on auxiliary information, this method adds no extra information, i.e., the model improves its accuracy without additional information.
Embodiment 2
Referring to Fig. 1 and Fig. 2, an embodiment of the present invention provides a pedestrian re-identification method fused with context information. Based on Embodiment 1, and taking the pedestrian re-identification data set DukeMTMC as an example, it specifically includes the following steps:
Step (1): The DukeMTMC data set contains a training set and a test set. Assuming the training set has N pictures in total, the features of these N pedestrian pictures are extracted by a trained convolutional neural network (CNN). Each feature is d-dimensional, where the specific value of d depends on the structure of the CNN;
Step (2): For the feature f_p of a single pedestrian, compute the similarity between f_p and the remaining features using the cosine distance, sort the similarities in descending order (the maximum is denoted σ_p), and take the top k features as the context information of f_p. These k+1 features then serve as the nodes X_p = {x_0, x_1, x_2, …, x_k} of the constructed graph; analysis and experimental verification indicate that k = 3 is an optimal choice;
The cosine distance formula is as follows:
Further, x_0 (i.e., f_p) is connected to the remaining k context nodes, and if the similarity between two of the k context nodes is greater than σ_p, those nodes are connected, which constructs the edges of the graph. The similarity between nodes serves as the weight on the edges, yielding the adjacency matrix A_p of the constructed graph;
Step (4): To incorporate context information, the nodes of the graph constructed in step (3) are updated several times:
First node update: a message-passing mechanism (A_p X_p) is used; to give the features better expressive ability, a concatenation operation (concat) splices the original features back in after the message passing:
Second node update: because the nodes are highly similar, the classic multi-head self-attention mechanism is used to update the nodes and avoid the over-smoothing problem:
Third node update: two nonlinear projections are applied:
Step (5): A weighted pooling operation produces the feature x_g representing the graph, i.e., the feature that combines f_p with its context information. Finally, the original feature x_0 of picture P and the context-combined feature x_g are concatenated (concat) to obtain the final feature f_p of picture P, which is used for the classification of pedestrian re-identification; the training set is used to train the model as described above;
Step (6): To verify the accuracy of the method, it is tested on the test set. The test set consists of a query picture set and a gallery picture set; given a query picture, the goal is to find the pictures in the gallery set that show the same pedestrian as the query. The test procedure passes the query picture and all gallery pictures through the trained model to extract each pedestrian's features, compares the similarity between the query feature and all gallery features, and returns the top few gallery pictures with the greatest similarity; these pictures are considered to show the same pedestrian as the query picture.
Embodiment 3
Based on the same inventive concept, this embodiment provides a pedestrian re-identification system fused with context information; its problem-solving principle is similar to that of the pedestrian re-identification method fusing context information, so repeated details are omitted.
A pedestrian re-identification system that fuses context information, comprising:
an extraction module, used to select a pedestrian re-identification data set and extract all pedestrian features in it;
a graph-construction module, used to select the context information of each pedestrian and construct a graph structure composed of each pedestrian feature and its corresponding context information;
an update module, used to update the node information of the constructed graph structure;
a pooling module, used to apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information;
a model-construction module, used to concatenate the pooled pedestrian features with the corresponding original pedestrian features to obtain the final pedestrian classification features and build the pedestrian re-identification model;
an identification module, used to input the picture of the pedestrian to be identified into the model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data-processing device to produce a machine, so that the instructions executed by the processor produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or another programmable data-processing device, so that a series of operational steps are executed on it to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The embodiments described above are only preferred embodiments given to fully illustrate the present invention, and the protection scope of the present invention is not limited to them. Equivalent substitutions or transformations made by those skilled in the art on the basis of the present invention all fall within its protection scope. The protection scope of the present invention is defined by the claims.
Claims (10)
- 1. A pedestrian re-identification method fusing context information, characterized by comprising the following steps: S1: selecting a pedestrian re-identification data set and extracting all pedestrian features in it; S2: selecting the context information of each pedestrian and constructing a graph structure composed of each pedestrian feature and its corresponding context information; S3: updating the node information of the graph structure constructed in S2; S4: applying a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information; S5: concatenating the pooled pedestrian features with the corresponding original pedestrian features from step S1 to obtain the final pedestrian classification features and building a pedestrian re-identification model; S6: inputting the picture of the pedestrian to be identified into the pedestrian re-identification model and comparing its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
- 2. The pedestrian re-identification method fusing context information according to claim 1, characterized in that the pedestrian features in S1 are extracted by a trained convolutional neural network.
- 3. The pedestrian re-identification method fusing context information according to claim 1, characterized in that the method in S2 for constructing, for a single pedestrian feature, the graph structure composed of that feature and its corresponding context information specifically includes the following steps: S21: for a pedestrian feature P, selecting k pedestrian features from the data set with a nearest-neighbour algorithm as the context information of P, obtaining k+1 pedestrian features as the nodes X of the graph; S22: constructing the edges of the graph on the principle that P is connected to the other k pedestrian features and that the larger the cosine distance between two of the k features, the more likely they are to be connected, with the similarity between features serving as the edge weight, yielding the adjacency matrix A of the graph.
- 4. The pedestrian re-identification method fusing context information according to claim 3, characterized in that the method of constructing the edges of the graph in S22 specifically includes the following steps: S221: for the feature P of a single pedestrian, computing the similarity between P and the remaining features using the cosine distance, the maximum similarity being denoted σ_p; S222: connecting node X to the remaining k context nodes, and if the similarity between two of the k context nodes is greater than σ_p, connecting those nodes to form edges of the graph, with the similarity between nodes serving as the edge weight.
- 5. The pedestrian re-identification method fusing context information according to claim 1, characterized in that updating the node information of the graph structure constructed in S2 in step S3 specifically includes the following steps: S31: performing the first information update on the nodes with a message-passing process; S32: performing the second information update on the nodes with a self-attention mechanism; S33: performing the third information update on the nodes with a nonlinear function.
- 6. The pedestrian re-identification method fusing context information according to claim 5, characterized in that after the first information update on the nodes in S31, the original nodes are concatenated to the result for the next update.
- 7. The pedestrian re-identification method fusing context information according to claim 1, characterized in that the data set includes a training set and a test set; the training set is used to iteratively train the pedestrian re-identification model, each pedestrian picture to be identified in the test set is input into the model and compared, by similarity, with all final pedestrian classification features in the training set, and the picture with the greatest similarity is selected and considered to show the same pedestrian as the picture to be identified.
- 8. A pedestrian re-identification system fusing context information, characterized by comprising: an extraction module, used to select a pedestrian re-identification data set and extract all pedestrian features in it; a graph-construction module, used to select the context information of each pedestrian and construct a graph structure composed of each pedestrian feature and its corresponding context information; an update module, used to update the node information of the constructed graph structure; a pooling module, used to apply a weighted pooling operation to the updated graph structure to obtain pedestrian features that incorporate context information; a model-construction module, used to concatenate the pooled pedestrian features with the corresponding original pedestrian features to obtain the final pedestrian classification features and build the pedestrian re-identification model; and an identification module, used to input the picture of the pedestrian to be identified into the model and compare its similarity with all final pedestrian classification features to obtain the matching result of the pedestrian re-identification.
- 9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, the steps of the pedestrian re-identification method fusing context information according to any one of claims 1 to 7 are realized.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the steps of the pedestrian re-identification method fusing context information according to any one of claims 1 to 7 are realized.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110722073.1 | 2021-06-28 | ||
CN202110722073.1A CN113283394B (zh) | 2021-06-28 | 2021-06-28 | Pedestrian re-identification method and system fusing context information
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023272801A1 true WO2023272801A1 (zh) | 2023-01-05 |
Family
ID=77285898
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/106989 WO2023272801A1 (zh) | 2021-06-28 | 2021-07-19 | Pedestrian re-identification method and system fusing context information
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113283394B (zh) |
WO (1) | WO2023272801A1 (zh) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472191A (zh) * | 2018-09-17 | 2019-03-15 | 西安电子科技大学 | Pedestrian re-identification and tracking method based on spatio-temporal context
CN111950372A (zh) * | 2020-07-13 | 2020-11-17 | 南京航空航天大学 | Unsupervised pedestrian re-identification method based on graph convolutional networks
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316031B (zh) * | 2017-07-04 | 2020-07-10 | 北京大学深圳研究生院 | Image feature extraction method for pedestrian re-identification
CN110532884B (zh) * | 2019-07-30 | 2024-04-09 | 平安科技(深圳)有限公司 | Pedestrian re-identification method and device, and computer-readable storage medium
CN110738146B (zh) * | 2019-09-27 | 2020-11-17 | 华中科技大学 | Target re-identification neural network, and construction method and application thereof
CN112347995B (zh) * | 2020-11-30 | 2022-09-23 | 中国科学院自动化研究所 | Unsupervised pedestrian re-identification method based on fused pixel and feature transfer
-
2021
- 2021-06-28 CN CN202110722073.1A patent/CN113283394B/zh active Active
- 2021-07-19 WO PCT/CN2021/106989 patent/WO2023272801A1/zh active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472191A (zh) * | 2018-09-17 | 2019-03-15 | 西安电子科技大学 | Pedestrian re-identification and tracking method based on spatio-temporal context
CN111950372A (zh) * | 2020-07-13 | 2020-11-17 | 南京航空航天大学 | Unsupervised pedestrian re-identification method based on graph convolutional networks
Non-Patent Citations (3)
Title |
---|
CAO MIN; CHEN CHEN; HU XIYUAN; PENG SILONG: "From Groups to Co-Traveler Sets: Pair Matching Based Person Re-identification Framework", 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), IEEE, 22 October 2017 (2017-10-22), pages 2573 - 2582, XP033303729, DOI: 10.1109/ICCVW.2017.302 * |
YAN YICHAO, QIN JIE, NI BINGBING, CHEN JIAXIN, LIU LI, ZHU FAN, ZHENG WEI-SHI, YANG XIAOKANG, SHAO LING: "Learning Multi-Attention Context Graph for Group-Based Re-Identification", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE COMPUTER SOCIETY., USA, vol. 1, 20 October 2020 (2020-10-20), USA , pages 1 - 18, XP093018688, ISSN: 0162-8828, DOI: 10.1109/TPAMI.2020.3032542 * |
YICHAO YAN; QIANG ZHANG; BINGBING NI; WENDONG ZHANG; MINGHAO XU; XIAOKANG YANG: "Learning Context Graph for Person Search", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 3 April 2019 (2019-04-03), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081164291 * |
Also Published As
Publication number | Publication date |
---|---|
CN113283394A (zh) | 2021-08-20 |
CN113283394B (zh) | 2023-04-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344787B (zh) | Specific-target tracking method based on face recognition and pedestrian re-identification | |
CN111582409B (zh) | Training method for an image label classification network, image label classification method, and device | |
Arietta et al. | City forensics: Using visual elements to predict non-visual city attributes | |
CN110866140A (zh) | Image feature extraction model training method, image search method, and computer device | |
CN110503076B (zh) | Artificial-intelligence-based video classification method, apparatus, device, and medium | |
WO2023179429A1 (zh) | Video data processing method and apparatus, electronic device, and storage medium | |
CN112183468A (zh) | Pedestrian re-identification method based on multi-attention joint multi-level features | |
CN112949534B (zh) | Pedestrian re-identification method, intelligent terminal, and computer-readable storage medium | |
CN112507912B (zh) | Method and device for identifying images that violate regulations | |
US11908222B1 (en) | Occluded pedestrian re-identification method based on pose estimation and background suppression | |
Jaiswal et al. | Aird: Adversarial learning framework for image repurposing detection | |
WO2023185074A1 (zh) | Group behavior recognition method based on complementary spatio-temporal information modeling | |
CN113033507A (zh) | Scene recognition method and device, computer equipment, and storage medium | |
CN113065409A (zh) | Unsupervised pedestrian re-identification method based on alignment constraints for camera distribution differences | |
CN113591758A (zh) | Human behavior recognition model training method, device, and computer equipment | |
CN114332893A (zh) | Table structure recognition method and apparatus, computer equipment, and storage medium | |
CN112801138A (zh) | Multi-person pose estimation method based on human-body topology alignment | |
CN117079310A (zh) | Pedestrian re-identification method with image-text multimodal fusion | |
CN115359492A (zh) | Text-image matching model training method, picture annotation method, apparatus, and device | |
CN112241470B (zh) | Video classification method and system | |
WO2023272801A1 (zh) | Pedestrian re-identification method and system fusing context information | |
Nguyen et al. | Fusion schemes for image-to-video person re-identification | |
Zheng et al. | Query attack via opposite-direction feature: Towards robust image retrieval | |
Vacchetti et al. | Cinematographic shot classification through deep learning | |
Mao et al. | An image authentication technology based on depth residual network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21947744 Country of ref document: EP Kind code of ref document: A1 |