CN112906605A - Cross-modal pedestrian re-identification method with high accuracy - Google Patents

Info

Publication number: CN112906605A (application CN202110243887.7A; granted as CN112906605B)
Authority: CN (China)
Prior art keywords: pedestrian, cross, modal, sample, double
Other languages: Chinese (zh)
Inventors: 张立言, 杜国栋, 徐旭
Applicant and assignee: Nanjing University of Aeronautics and Astronautics
Legal status: Active (granted)

Classifications

    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/048: Activation functions
    • G06N3/08: Learning methods
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis
    • Y02T10/40: Engine management systems


Abstract

The invention discloses a high-accuracy cross-modal pedestrian re-identification method comprising the following steps: acquire real pedestrian video footage recorded in a surveillance environment from a dataset and process it into pedestrian image samples with identity labels; build a multi-scale joint dual-stream cross-modal deep network, initialize its parameters, and train it under supervision using the pedestrian image samples and identity labels as supervision information; finally, feed a query image of the pedestrian of interest to the trained dual-stream cross-modal deep network, which returns a list of pedestrian targets ranked by similarity to the query. The method can process heterogeneous samples of the two modalities simultaneously, extract the modality-shared features of the samples, and fuse global-scale and local-scale features into a more discriminative representation.

Description

Cross-modal pedestrian re-identification method with high accuracy
Technical Field
The invention discloses a method that applies deep learning and prior knowledge to achieve high-accuracy cross-modality, cross-camera pedestrian matching, and belongs to the field of computer vision.
Background
With the development of society, road surveillance systems have become increasingly widespread. Owing to the limited performance of surveillance cameras and the changing environmental conditions of surveillance, face recognition does not work well for cross-camera pedestrian tracking, so the topic of pedestrian re-identification (Re-ID) has grown increasingly prominent [1]. Pedestrian re-identification aims to retrieve and screen the same person across cameras and thereby determine that person's trajectory [1]. Moreover, because cameras at night mostly capture infrared (IR) images, which differ from the RGB images captured in daytime, conventional Re-ID methods struggle to overcome the gap between the two modalities; cross-modal Re-ID was proposed to solve this problem [3].
With the wide deployment of surveillance equipment and the arrival of the big-data era, pedestrian retrieval and matching have become increasingly significant in the field of public safety. However, owing to installation constraints and the limited performance of monitoring equipment, conventional face recognition does not work well under road surveillance [1]. Research into pedestrian re-identification techniques that suit the road-surveillance task environment and can track a specific person across cameras is therefore receiving ever more attention.
Current research on pedestrian re-identification focuses mainly on three directions: single-modal Re-ID, unsupervised Re-ID, and cross-modal Re-ID, alongside smaller topics such as occluded Re-ID, dense crowds, and cross-resolution matching. Single-modal Re-ID was proposed earliest and is therefore the most mature; it laid the foundation for the other research directions. Unsupervised Re-ID arose from the difficulty of obtaining labels in practical applications and is arguably an indispensable step toward real-world deployment of Re-ID. Cross-modal Re-ID is motivated by the practical needs of tracking and detection: most criminal activity occurs at night, so matching pedestrian images captured at night against those captured in daytime has become an increasingly important subject.
Single-modal Re-ID has achieved very high matching accuracy on the existing Re-ID datasets, and a number of strong baselines have been proposed. Single-modal Re-ID is supervised training on RGB images with manual labels; its methods mainly aim to mine discriminative details in pedestrian samples to form sample features and thereby improve matching accuracy. Among these methods, self-attention and part-level-feature approaches are particularly effective. Self-attention in pedestrian re-identification partitions a pedestrian image into blocks and reweights the content of each block using the relations between blocks and each block's weight in the final matching, so that the reweighted blocks provide stronger discriminative power. Part-level-feature methods are more direct: the representative PCB model [18] slices the image horizontally, and each resulting slice represents the whole original pedestrian sample, forcing the model to attend to detailed image regions. The slicing is based on prior knowledge about the human body: PCB splits the image into six stripes, while some methods split it into three stripes corresponding to the head, upper body, and lower body.
Cross-modal Re-ID must overcome far more problems than single-modal Re-ID. In addition to extracting well-characterized sample features, modality differences must be overcome. Most current papers [3][10] use a two-stream network structure to process the two modalities separately and then a shared layer to extract modality-shared characteristics, thereby obtaining reliable features. Other methods exploit the role of modality-specific features; one typical approach fills in the features a modality lacks using labeled samples of the other modality, achieving feature balance between the samples. A similar effect can be achieved with GANs: such methods retain the content of the original sample while transferring the style of the other modality, which augments the dataset and balances the samples across modalities [2].
Reference documents:
[1]Ye,Mang.(2020).Deep Learning for Person Re-identification:A Survey and Outlook.
[2]Choi,Seokeon&Lee,Sumin&Kim,Youngeun&Kim,Taekyung&Kim,Changick.(2020).Hi-CMD:Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification.10254-10263.10.1109/CVPR42600.2020.01027.
[3]Ye,Mang&Shen,Jianbing&Crandall,David&Shao,Ling&Luo,Jiebo.(2020).Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification.
[4]Wang,Guan-An&Zhang,Tianzhu&Yang,Yang&Cheng,Jian&Chang,Jianlong&Liang,Xu&Hou,Zeng-Guang.(2020).Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification.Proceedings of the AAAI Conference on Artificial Intelligence.34.12144-12151.10.1609/aaai.v34i07.6894.
[5]Y.Lu et al.,"Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer,"2020IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR),Seattle,WA,USA,2020,pp.13376-13386,doi:10.1109/CVPR42600.2020.01339.
[6]Jia,Mengxi&Zhai,Yunpeng&Lu,Shijian&Ma,Siwei&Zhang,Jian.(2020).A Similarity Inference Metric for RGB-Infrared Cross-Modality Person Re-identification.
[7]Fan,Xing&Luo,Hao&Zhang,Chi&Jiang,Wei.(2020).Cross-Spectrum Dual-Subspace Pairing for RGB-infrared Cross-Modality Person Re-Identification.
[8]Zhang,Ziyue&Jiang,Shuai&Huang,Congzhentao&Li,Yang&Xu,Richard.(2020).RGB-IR Cross-modality Person ReID based on Teacher-Student GAN Model.
[9]Wang,Guanan&Zhang,Tianzhu&Cheng,Jian&Liu,Si&Yang,Yang&Hou,Zengguang.(2019).RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment.
[10]Zhu,Yuanxin&Yang,Zhao&Wang,Li&Zhao,Sai&Hu,Xiao&Tao,Dapeng.(2019).Hetero-Center Loss for Cross-Modality Person Re-Identification.Neurocomputing.386.10.1016/j.neucom.2019.12.100.
[11]Wang,Zhixiang&Wang,Zheng&Zheng,Yinqiang&Chuang,Yung-Yu&Satoh,Shin'ich.(2019).Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification.618-626.10.1109/CVPR.2019.00071.
[12]Hao,Yi&Wang,Nannan&Gao,Xinbo&Li,Jie&Wang,Xiaoyu.(2019).Dual-alignment Feature Embedding for Cross-modality Person Re-identification.57-65.10.1145/3343031.3351006.
[13]Ye,Mang&Lan,Xiangyuan&Leng,Qingming.(2019).Modality-aware Collaborative Learning for Visible Thermal Person Re-Identification.347-355.10.1145/3343031.3351043.
[14]Liu,Haijun&Cheng,Jian.(2019).Enhancing the Discriminative Feature Learning for Visible-Thermal Cross-Modality Person Re-Identification.
[15]Basaran,Emrah&Gökmen,Muhittin&Kamasak,Mustafa.(2020).An efficient framework for visible-infrared cross modality person re-identification.Signal Processing:Image Communication.87.115933.10.1016/j.image.2020.115933.
[16]Pingyang,Dai&Ji,Rongrong&Wang,Haibin&Wu,Qiong&Huang,Yuyu.(2018).Cross-Modality Person Re-Identification with Generative Adversarial Training.677-683.10.24963/ijcai.2018/94.
[17]Wang,Guanan&Zhang,Tianzhu&Cheng,Jian&Liu,Si&Yang,Yang&Hou,Zengguang.(2019).RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment.
[18]Sun,Y.,Zheng,L.,Yang,Y.,Tian,Q.,&Wang,S.(2018).Beyond Part Models:Person Retrieval with Refined Part Pooling.ArXiv,abs/1711.09349.
[19]Wang,Guanshuo&Yuan,Yufeng&Chen,Xiong&Li,Jiwei&Zhou,Xi.(2018).Learning Discriminative Features with Multiple Granularities for Person Re-Identification.
[20]Ge,Y.,Chen,D.,&Li,H.(2020).Mutual Mean-Teaching:Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.ArXiv,abs/2001.01526.
Disclosure of the Invention
To remedy the defects of the prior art, a more effective method is needed to handle the difference between the two modalities, RGB and IR, so as to form a more reasonable feature space that facilitates subsequent detection and matching. Achieving this requires handling not only the similarity relations between different-identity samples within a single modality but also the similarity relations between same-identity samples across modalities, so that a query in one modality can ultimately retrieve the same person in the other modality.
The invention provides a multi-scale joint dual-stream network structure that processes heterogeneous samples of the two modalities simultaneously, extracts the modality-shared features of the samples, and fuses global-scale and local-scale features into a more discriminative representation; the design is reasonable, meets the modeling requirements, and works well.
In order to achieve the purpose, the invention adopts the technical scheme that:
a cross-modal pedestrian re-identification method with high accuracy comprises the following steps:
step 1, acquiring pedestrian video information under a real monitoring environment from a data set, preprocessing the whole segment of pedestrian video information to obtain a pedestrian image sample, intercepting key pedestrian images in the video and marking corresponding pedestrian identity information;
step 2, building a multi-scale combined double-current cross-modal depth network, initializing network parameters, using the pedestrian image sample obtained in the step 1 and pedestrian identity information as supervision information, performing supervised training on the double-current cross-modal depth network, after the training is finished, finely adjusting hyper-parameters in the double-current cross-modal depth network according to a final training effect, and fixing the network parameters;
and 3, taking the interested pedestrian target query as the input of the double-flow cross-modal depth network, giving a pedestrian target list with higher similarity with the query target by the double-flow cross-modal depth network, and searching the pedestrian target with the same identity according to the pedestrian target list by an operator from high similarity to low similarity to track the pedestrian.
In step 1, each preprocessed pedestrian image sample comprises an image showing the pedestrian's physical appearance, the identity label corresponding to the image sample, and the information of the original video sequence of the sample.
In step 1, the dataset is the SYSU-MM01 real-world dataset.
In step 2, the dual-stream cross-modal deep network uses ResNet-50 pre-trained on ImageNet, taken from the PyTorch model library, as its backbone network; the network is divided into a local branch and a global branch, each branch containing a dual-stream structure for processing samples of the two modalities.
In step 2, within the global branch, the layer0 part of ResNet-50 forms the dual-stream structure, while layers 1 to 4 share parameters between the two streams. The dual-stream part does not share parameters; it extracts the features of the two modalities separately and retains some modality-specific information, while the parameter-sharing part extracts the modality-shared features of the two modalities' samples, which are used for the subsequent optimization. The subsequent optimization comprises: reducing the dimensionality of the extracted features with a linear layer, which cuts the parameter count and computational load, and then optimizing the feature space on the final features with a hard-mining triplet loss and a cross-entropy loss, wherein the cross-entropy loss optimizes the intra-modality sample relations and the triplet loss optimizes the inter-modality sample relations.
The local branch contains two sub-branches that split the samples into three and six stripes, respectively. In the local branch's dual-stream structure, none of the ResNet-50 layers share parameters, which retains more modality-specific features. After the backbone, the sample features are split horizontally into three stripes corresponding to the head, upper body, and lower body, or into six stripes corresponding to finer body parts. The features of the two modalities' samples are concatenated after splitting and reduced in dimensionality by a parameter-sharing linear layer; each split feature block is then optimized against its objective functions independently. A cross-entropy function optimizes the intra-modality feature space, and a cluster-center-based objective function is introduced that uses the cluster centers of same-identity samples to pull same-identity samples closer and push different-identity samples apart.
In the final matching stage, the features of all local branches are concatenated with the global-branch features to form a more discriminative descriptor.
The invention has the beneficial effects that:
(1) Unlike existing cross-modal pedestrian re-identification methods, the method is a multi-scale joint feature-characterization method: it retains the coarse-grained representation of traditional methods and additionally fuses the corresponding fine-grained local features of the samples, improving the discriminative power of the representation.
(2) The invention introduces a cluster-center-based objective function that handles the inter-modality feature space more effectively; the new objective suits the optimization of local features without introducing corresponding negative effects, greatly improving the accuracy of the method. It handles modality differences effectively: the objective uses Euclidean distance as its metric and the mean of same-identity sample features as the cluster center. Minimizing the distance between centers pulls same-identity samples of the two modalities together, while a hyperparameter margin pushes different-identity samples of the two modalities apart, forming a good feature space.
(3) The invention introduces a heterogeneous dual-stream network structure: a network with more shared layers processes global information, and a network with fewer shared layers processes local information, each meeting the requirements of the objective functions of its branch, which further improves the cross-modal processing capability.
(4) The invention combines global coarse-grained features with local fine-grained features. The global features extracted by the global branch provide comprehensive sample information, while the local branch provides fine local features on the basis of feature splitting, compensating for the detailed information the global features lack and improving the discriminative capability of the model.
Drawings
Fig. 1 is a schematic structural diagram of a dual-stream cross-modal depth network.
Detailed Description
The present invention is further described below.
The high-accuracy cross-modal pedestrian re-identification method of the invention comprises the following steps.
Step 1, data preparation and formal definition: the raw data for pedestrian re-identification is usually surveillance video, from which the key pedestrian content must be cropped manually or algorithmically. Since the method is an image-based cross-modal pedestrian re-identification algorithm, an image detection-and-cropping algorithm serves as the front end: it crops pedestrian images from the video clips and labels the corresponding pedestrian identities to distinguish different pedestrian targets. The invention uses the SYSU-MM01 real-world dataset, which has not been deeply hand-curated, contains some noisy labels, and matches practical application scenarios. The full pedestrian video is preprocessed into pedestrian image samples: key pedestrian images are cropped from the video and labeled with the corresponding identities. Each preprocessed pedestrian image sample comprises an image showing the pedestrian's physical appearance, the identity label corresponding to the image sample, and the information of the original video sequence of the sample.
Step 2, building the multi-scale joint cross-modal dual-stream pedestrian re-identification network: build the network structure according to the model schematic in FIG. 1. Using the pedestrian image samples and identity labels obtained in step 1 as supervision information, train the dual-stream cross-modal deep network under supervision; after training, fine-tune the network's hyperparameters according to the final training results until a satisfactory result is reached, then fix the network parameters.
ResNet-50 pre-trained on ImageNet, from the PyTorch model library, is used as the backbone network of the model. The model is divided into local and global branches, each containing a dual-stream structure to handle samples of the two modalities. The dual-stream network consists of two parallel networks with differing structures and differing degrees of parameter sharing; handling the cross-modal task with a dual-stream structure preserves some modality-specific features, which benefits the subsequent optimization.
In the global branch, the layer0 part of ResNet-50 forms the dual-stream structure, and layers 1 to 4 share parameters between the streams. The dual-stream part does not share parameters; it extracts the features of the two modalities separately and retains some modality-specific information, while the parameter-sharing part aims to extract the modality-shared features of the two modalities' samples and to use these shared features for the subsequent optimization. A linear layer then reduces the dimensionality of the extracted features, cutting the parameter count and computational load; the final features are optimized with a hard-mining triplet loss and a cross-entropy loss, wherein the cross-entropy loss aims to optimize the intra-modality sample relations and the triplet loss aims to optimize the inter-modality sample relations. The batch-hard triplet loss is expressed as:
$$L_{tri}=\sum_{i=1}^{P}\sum_{a=1}^{K}\left[\,m+\max_{p=1\ldots K} D\!\left(f(x_a^i),f(x_p^i)\right)-\min_{\substack{j=1\ldots P,\;n=1\ldots K\\ j\neq i}} D\!\left(f(x_a^i),f(x_n^j)\right)\right]_{+}$$

where m ≥ 0 is the margin hyperparameter and [·]₊ = max(·, 0).
Here P is the number of pedestrian identity labels randomly selected per mini-batch and K the number of pedestrian samples selected per identity, so each mini-batch contains P × K samples in total. f denotes the feature-extraction operation performed by the model, and D is the metric: the Euclidean distance is used to judge the distance between two samples.
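The batch-hard triplet loss above can be sketched in plain NumPy for clarity; the function name and default margin are illustrative, and a real implementation would operate on GPU tensors inside the training loop.

```python
import numpy as np

def batch_hard_triplet_loss(feats, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor, take its hardest
    (farthest) positive and hardest (closest) negative in the batch."""
    # Pairwise Euclidean distances D(f_i, f_j)
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    same = labels[:, None] == labels[None, :]
    loss = 0.0
    for a in range(len(feats)):
        hardest_pos = dist[a][same[a]].max()    # farthest same-identity sample
        hardest_neg = dist[a][~same[a]].min()   # closest different-identity sample
        loss += max(0.0, margin + hardest_pos - hardest_neg)
    return loss / len(feats)
```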
The cross entropy loss is expressed as:
$$L_{id}=-\sum_{i=1}^{P\times K}\log\frac{\exp\!\left(W_{y_i}^{\top} f_i\right)}{\sum_{k=1}^{C}\exp\!\left(W_{k}^{\top} f_i\right)}$$

where f_i denotes the feature of sample i extracted by the network, y_i is its identity label, C is the number of identities, and W_k is the corresponding weight vector for identity k, of the same dimension as f.
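A minimal NumPy sketch of this identity cross-entropy follows, assuming one classifier weight vector W_k per identity; the function name is illustrative.

```python
import numpy as np

def id_cross_entropy(feats, labels, W):
    """Identity (softmax) cross-entropy: the logit for class k is W_k^T f."""
    logits = feats @ W.T                          # (N, C)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```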
The local branch contains two sub-branches that split the samples into three and six stripes, respectively. Unlike the global branch, the local branch's dual-stream structure shares no ResNet-50 parameters between the streams, retaining more modality-specific features for the subsequent optimization. After the backbone, the sample features are split horizontally into three stripes corresponding to the head, upper body, and lower body, or into six stripes corresponding to finer body parts. The features of the two modalities' samples are concatenated after splitting and reduced in dimensionality by a parameter-sharing linear module; each split feature block is then optimized against its objective functions independently. The linear module performs three operations: linear-layer dimensionality reduction, ReLU re-activation, and regularization. As before, the cross-entropy function mainly optimizes the intra-modality feature space. A cluster-center-based objective function is introduced; its purpose is to use the cluster centers of same-identity samples to pull same-identity samples closer and push different-identity samples apart. The cluster-center-based objective is expressed as:
$$c_i^{V}=\frac{1}{K}\sum_{k=1}^{K} f\!\left(v_k^{i}\right),\qquad c_i^{I}=\frac{1}{K}\sum_{k=1}^{K} f\!\left(t_k^{i}\right)$$

$$L_{cc}=\sum_{i=1}^{P}\left\|c_i^{V}-c_i^{I}\right\|_2+\sum_{i=1}^{P}\sum_{\substack{j=1\\ j\neq i}}^{P}\left[\,\mu-\left\|c_i^{V}-c_j^{I}\right\|_2\right]_{+}$$

where v_k^i and t_k^i denote the k-th RGB and IR samples of identity i, respectively.
Here ‖·‖ denotes the Euclidean distance: summing and averaging the sample features yields the center of each same-identity sample cluster, and same-identity clusters are pulled together, while different-identity cluster centers are gradually pushed apart under the action of the hyperparameter μ.
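Since the original equation survives only as an image placeholder, the following NumPy sketch is one plausible reading of the cluster-center objective described in words above: same-identity RGB and IR centers are pulled together, and different-identity centers are pushed at least μ apart. The function name and the exact pairing of centers are assumptions.

```python
import numpy as np

def cluster_center_loss(feats_v, feats_i, labels, mu=0.5):
    """Cluster-center objective (sketch): pull the RGB and IR centers of
    the same identity together; push centers of different identities
    apart until they are at least `mu` away."""
    ids = np.unique(labels)
    cv = np.stack([feats_v[labels == i].mean(0) for i in ids])  # RGB centers
    ci = np.stack([feats_i[labels == i].mean(0) for i in ids])  # IR centers
    pull = np.linalg.norm(cv - ci, axis=1).sum()   # same identity, cross-modal
    push = 0.0
    for a in range(len(ids)):
        for b in range(len(ids)):
            if a != b:
                push += max(0.0, mu - np.linalg.norm(cv[a] - ci[b]))
    return pull + push
```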
In the final matching stage, the features of all local branches are concatenated with the global-branch features to form a more discriminative descriptor.
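The descriptor construction can be sketched as follows; the stripe counts follow the three- and six-block split described above, while the choice of average pooling and the function name are assumptions made here for illustration.

```python
import numpy as np

def build_descriptor(feat_map, num_stripes=(3, 6)):
    """Sketch of the final descriptor: average-pool the backbone feature
    map globally and over horizontal stripes, then concatenate."""
    C, H, W = feat_map.shape
    parts = [feat_map.mean(axis=(1, 2))]            # global coarse feature (C,)
    for n in num_stripes:
        for s in np.array_split(np.arange(H), n):   # horizontal stripes
            parts.append(feat_map[:, s, :].mean(axis=(1, 2)))
    return np.concatenate(parts)                    # (C * (1 + 3 + 6),)
```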
In the invention, the model is trained on the labeled pedestrian sample data for 80 epochs, with the learning rate initialized to 0.01 and decayed to 0.1 of its value every 20 epochs as training progresses. After the full training schedule, the trained model parameters are saved to facilitate the subsequent detection processing.
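The stated schedule (base rate 0.01, multiplied by 0.1 every 20 epochs) corresponds to a simple step decay, sketched here as a standalone function:

```python
def learning_rate(epoch, base_lr=0.01, step=20, gamma=0.1):
    """Step schedule from the training setup: lr = base_lr * gamma^(epoch // step)."""
    return base_lr * (gamma ** (epoch // step))
```

In PyTorch this is typically expressed as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)`.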
Step 3: sort the pedestrian target images to be detected and select images with richer characteristics as query images for the trained model; the pedestrian sample images generated from all the surveillance videos serve as the gallery (detection set) for detection and matching. The model outputs the best-matching pedestrian samples, arranged from high to low similarity, and an operator finds the same-identity pedestrian targets in this list to track the pedestrian.
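Ranking the gallery by similarity to the query, as in step 3, reduces to sorting Euclidean distances between descriptors. A minimal sketch (function name illustrative):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery samples by ascending Euclidean distance to the query,
    i.e. from highest to lowest similarity."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    order = np.argsort(dists)   # gallery indices, most similar first
    return order, dists[order]
```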

Claims (5)

1. A high-accuracy cross-modal pedestrian re-identification method, characterized in that the method comprises the following steps:
step 1, acquiring pedestrian video recorded in a real surveillance environment from a dataset, preprocessing the full video into pedestrian image samples, cropping the key pedestrian images from the video, and labeling the corresponding pedestrian identities;
step 2, building a multi-scale joint dual-stream cross-modal deep network, initializing its parameters, performing supervised training of the dual-stream cross-modal deep network using the pedestrian image samples and identity labels obtained in step 1 as supervision information, and, after training, fine-tuning the network's hyperparameters according to the final training results and fixing the network parameters;
and step 3, feeding a query image of the pedestrian of interest to the dual-stream cross-modal deep network, the dual-stream cross-modal deep network returning a list of pedestrian targets ranked by similarity to the query, and an operator searching the list from high to low similarity for the same-identity pedestrian target to track the pedestrian.
2. The method of claim 1, wherein the cross-modal pedestrian re-identification method with high accuracy is characterized in that: in the first step, the preprocessed pedestrian image sample comprises an image containing the physical appearance characteristics of the pedestrian, pedestrian identity information corresponding to the image sample, and original video sequence information of the pedestrian image sample.
3. The high-accuracy cross-modal pedestrian re-identification method according to claim 1, characterized in that: in step 1, the data set is the SYSU-MM01 real-world data set.
4. The high-accuracy cross-modal pedestrian re-identification method according to claim 1, characterized in that: in step 2, the dual-stream cross-modal deep network uses ResNet-50, pre-trained on ImageNet and taken from the PyTorch model library, as the backbone network; the dual-stream cross-modal deep network is divided into a local branch and a global branch, and each branch contains a dual-stream structure for processing the sample features of the two modalities.
5. The high-accuracy cross-modal pedestrian re-identification method according to claim 4, characterized in that: in step 2, in the global branch, the layer0 part of ResNet-50 serves as the dual-stream structure, and the subsequent layer1 to layer4 serve as the parameter-sharing network structure; the dual-stream part does not share parameters, extracts the features of the two modalities separately, and retains part of the modality-specific feature information, while the parameter-sharing part extracts modality-shared features from the samples of the two different modalities, and these shared features are used for the subsequent optimization operations; the subsequent optimization operations comprise: reducing the dimension of the extracted features with a linear layer to reduce the number of model parameters and the computational load, and, after the final features are obtained, optimizing the feature space with a hard-sample triplet loss and a cross-entropy loss, wherein the cross-entropy loss optimizes the sample relations within each modality and the triplet loss optimizes the sample relations between modalities;
the local branch comprises two sub-branches, which split the samples into three blocks and six blocks respectively; in the dual-stream structure of the local branch, none of the layers of ResNet-50 share parameters, so more modality-specific features are retained; after the backbone, the sample features are horizontally divided: three blocks correspond to the head, upper body and lower body, while six blocks correspond to finer human body parts; the features of the two modal samples are concatenated after this cutting and fed into a parameter-sharing linear layer for dimension reduction, after which each cut feature block optimizes its objective function independently; a cross-entropy function is adopted to optimize the feature space within each modality, and an objective function based on cluster centers is introduced, using the cluster centers of same-identity samples to pull same-identity samples closer while pushing different-identity samples farther apart;
in the final matching and detection stage, the features of all local branches are concatenated with the features of the global branch to form a more discriminative descriptor.
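The dual-stream idea in claims 4 and 5 — a modality-specific stem per input stream feeding a parameter-sharing trunk — can be illustrated with a deliberately miniature sketch. The layer sizes and weight matrices below are hypothetical stand-ins: the two stems play the role of ResNet-50's non-shared layer0 copies, and the single trunk plays the role of the shared layer1–layer4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical miniature analogue of the patent's dual-stream global branch:
# each modality has its own (non-shared) stem, while one trunk is shared.
W_stem_rgb = rng.standard_normal((8, 4))   # stem for visible-light samples
W_stem_ir  = rng.standard_normal((8, 4))   # stem for infrared samples
W_trunk    = rng.standard_normal((3, 8))   # parameters shared by both modalities

def embed(x: np.ndarray, modality: str) -> np.ndarray:
    """Map a raw sample to a modality-shared feature space."""
    stem = W_stem_rgb if modality == "rgb" else W_stem_ir
    h = np.maximum(stem @ x, 0.0)          # modality-specific features (ReLU)
    return W_trunk @ h                     # projection by the shared trunk

x = np.ones(4)                             # the same raw input, fed to each stream
f_rgb = embed(x, "rgb")
f_ir = embed(x, "ir")
```

Because the stems are not shared, the same input produces different intermediate features per modality, while the shared trunk keeps both embeddings in one comparable feature space — which is what makes cross-modal matching possible.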
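The hard-sample triplet loss named in claim 5 is commonly implemented as a "batch-hard" objective: for each anchor, only the farthest same-identity sample and the nearest different-identity sample are penalised. The sketch below is a plain-numpy illustration of that general technique, not the patent's exact formulation; the margin value and toy features are assumptions.

```python
import numpy as np

def batch_hard_triplet_loss(feats: np.ndarray, labels, margin: float = 0.3) -> float:
    """Batch-hard triplet loss over one mini-batch of feature vectors.

    For each anchor, select its hardest positive (farthest same-identity
    sample) and hardest negative (nearest different-identity sample), and
    penalise when the positive is not closer by at least `margin`.
    """
    labels = np.asarray(labels)
    # (N, N) pairwise Euclidean distances between all features
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    n = len(labels)
    losses = []
    for i in range(n):
        pos = d[i][same[i] & (np.arange(n) != i)]  # same identity, excluding self
        neg = d[i][~same[i]]                       # all other identities
        losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses))

labels = [0, 0, 1, 1]
# identities well separated -> hinge never activates
feats_good = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]])
# identities interleaved -> every anchor is penalised
feats_bad = np.array([[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [1.1, 0.0]])
loss_good = batch_hard_triplet_loss(feats_good, labels)
loss_bad = batch_hard_triplet_loss(feats_bad, labels)
```

In a cross-modal batch the anchor and its hardest positive/negative can come from different modalities, which is how this loss "optimizes the sample relations between modalities" as the claim states.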
CN202110243887.7A 2021-03-05 2021-03-05 Cross-mode pedestrian re-identification method with high accuracy Active CN112906605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110243887.7A CN112906605B (en) 2021-03-05 2021-03-05 Cross-mode pedestrian re-identification method with high accuracy


Publications (2)

Publication Number Publication Date
CN112906605A true CN112906605A (en) 2021-06-04
CN112906605B CN112906605B (en) 2024-02-20

Family

ID=76108318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110243887.7A Active CN112906605B (en) 2021-03-05 2021-03-05 Cross-mode pedestrian re-identification method with high accuracy

Country Status (1)

Country Link
CN (1) CN112906605B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610071A (en) * 2021-10-11 2021-11-05 深圳市一心视觉科技有限公司 Face living body detection method and device, electronic equipment and storage medium
CN113705489A (en) * 2021-08-31 2021-11-26 中国电子科技集团公司第二十八研究所 Remote sensing image fine-grained airplane identification method based on priori regional knowledge guidance
CN113887382A (en) * 2021-09-29 2022-01-04 合肥工业大学 Cross-modal pedestrian re-identification method based on RGB-D, storage medium and equipment
CN113963150A (en) * 2021-11-16 2022-01-21 北京中电兴发科技有限公司 Pedestrian re-identification method based on multi-scale twin cascade network
CN114998925A (en) * 2022-04-22 2022-09-02 四川大学 Robust cross-modal pedestrian re-identification method facing twin noise label
CN115859175A (en) * 2023-02-16 2023-03-28 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Hydraulic shock absorber equipment abnormity detection method based on cross-mode generative learning
CN113705489B (en) * 2021-08-31 2024-06-07 中国电子科技集团公司第二十八研究所 Remote sensing image fine-granularity airplane identification method based on priori regional knowledge guidance

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200285896A1 (en) * 2019-03-09 2020-09-10 Tongji University Method for person re-identification based on deep model with multi-loss fusion training strategy
CN111931637A (en) * 2020-08-07 2020-11-13 华南理工大学 Cross-modal pedestrian re-identification method and system based on double-current convolutional neural network



Also Published As

Publication number Publication date
CN112906605B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN112906605B (en) Cross-mode pedestrian re-identification method with high accuracy
CN109961051B (en) Pedestrian re-identification method based on clustering and block feature extraction
Lin et al. RSCM: Region selection and concurrency model for multi-class weather recognition
Wang et al. A survey of vehicle re-identification based on deep learning
Huang et al. Multi-graph fusion and learning for RGBT image saliency detection
CN111666851B (en) Cross domain self-adaptive pedestrian re-identification method based on multi-granularity label
CN105528794A (en) Moving object detection method based on Gaussian mixture model and superpixel segmentation
Tang et al. Multi-modal metric learning for vehicle re-identification in traffic surveillance environment
Ren et al. A novel squeeze YOLO-based real-time people counting approach
Zheng et al. Robust multi-modality person re-identification
Liu et al. An end-to-end deep model with discriminative facial features for facial expression recognition
CN112801019B (en) Method and system for eliminating re-identification deviation of unsupervised vehicle based on synthetic data
Ren et al. Parallel RCNN: A deep learning method for people detection using RGB-D images
CN114511878A (en) Visible light infrared pedestrian re-identification method based on multi-modal relational polymerization
Shi et al. An underground abnormal behavior recognition method based on an optimized alphapose-st-gcn
CN103577804A (en) Abnormal human behavior identification method based on SIFT flow and hidden conditional random fields
Yin Object Detection Based on Deep Learning: A Brief Review
Muchtar et al. A unified smart surveillance system incorporating adaptive foreground extraction and deep learning-based classification
Mursalin et al. Deep learning for 3D ear detection: A complete pipeline from data generation to segmentation
Lu et al. Multimode gesture recognition algorithm based on convolutional long short-term memory network
Sarker et al. Transformer-Based Person Re-Identification: A Comprehensive Review
Zou et al. Research on human movement target recognition algorithm in complex traffic environment
Zhang et al. Two-stage domain adaptation for infrared ship target segmentation
Song et al. Application and evaluation of image-based information acquisition in railway transportation
Zhu et al. Learning camera invariant deep features for semi-supervised person re-identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant