CN116523969B - MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method - Google Patents
- Publication number
- CN116523969B (application CN202310772990.XA)
- Authority
- CN
- China
- Legal status: Active
Classifications
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06V10/143 — Sensing or illuminating at different wavelengths
- G06V10/44 — Local feature extraction by analysis of parts of the pattern
- G06V10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/20 — Movements or behaviour, e.g. gesture recognition
- G06T2207/10048 — Infrared image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- Y02T10/40 — Engine management systems
Abstract
The invention relates to an infrared-visible light cross-modality pedestrian re-identification method based on MSCFM and MGFE. The method comprises the following steps: constructing a visible light modality image set and an infrared modality image set; inputting images of the same pedestrian in the two modalities into a convolutional network simultaneously to extract feature maps; splicing the visible light modality feature map and the infrared modality feature map; mining the shared features of the same pedestrian across the two modalities; performing global context modeling on the pedestrian structure; splicing the resulting self-attention-enhanced feature maps in the batch dimension and feeding them into a shared feature embedding network for encoding; using a multi-granularity feature enhancement module to perform blocking operations of different granularities on the feature map, guiding the network to attend to the discriminative fine-grained information it contains, and performing iterative training to obtain the final model; and acquiring the image data to be identified and identifying it with the resulting pedestrian re-identification model.
Description
Technical Field
The invention relates to an infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE, and belongs to the technical field of image identification.
Background
Pedestrian re-identification technology aims to solve the problem of retrieving pedestrians across cameras. In recent years it has produced many research results and plays an important role in practical applications. However, most pedestrian re-identification research so far retrieves pedestrians only in the visible light modality and cannot achieve good recognition performance in special environments such as insufficient illumination and night scenes. With the continuing spread of dual-mode surveillance systems, pedestrian video information can be acquired effectively through the visible light and infrared modes of such systems under both normal and insufficient illumination. To meet the requirements of specific scenes and environments, researchers have in recent years conducted a great deal of work on infrared-visible cross-modality pedestrian re-identification (VI-ReID). In VI-ReID, given a query image of a pedestrian in one modality, features are extracted by a Re-ID model and a pedestrian target with the same identity is then retrieved from a gallery image set in the other modality, completing the cross-modality pedestrian retrieval task. Because visible light and infrared images are formed by different imaging processes, pedestrian features differ greatly between the two modalities. In addition, within the same modality, the appearance of a pedestrian varies considerably under uncontrollable factors such as shooting viewpoint, posture change and occlusion. How to overcome these problems and enable the network to effectively extract discriminative pedestrian features in both modalities is therefore the key to infrared-visible cross-modality pedestrian re-identification research.
Summarizing existing work, methods for effectively extracting discriminative pedestrian features in the two modalities fall mainly into adversarial-generation-based methods, "intermediate modality"-guided methods, high-order semantic information mining methods, attention-mechanism-based methods, fine-grained pedestrian feature mining methods, and hybrids of the above. Adversarial-generation-based methods transfer modality information through an adversarial game between a generator and a discriminator, gradually eliminating the modality discrepancy. However, such methods depend heavily on the quality of the generated images and easily introduce extra modality information that aggravates the discrepancy, limiting further performance gains. "Intermediate modality"-guided methods reduce the gap between the infrared and visible modalities by introducing "intermediate modality" information as a bridge, and are a common VI-ReID approach. However, introducing "intermediate modality" information inevitably brings in interference information as well as additional computational cost.
High-order semantic information mining methods escape the influence of modality information by mining the high-order relations contained among pedestrian features of different modalities, such as structural relations and high-frequency information. These methods typically introduce additional auxiliary models when extracting high-order feature relations, for example using a keypoint model to obtain structural relation features. However, the auxiliary model is highly susceptible to real-world conditions in the samples such as occlusion and low resolution, so its detection accuracy is limited, which not only increases algorithmic complexity but also reduces the efficiency of training the model.
Attention-mechanism-based methods use an attention mechanism to guide the feature map to highlight important discriminative information and suppress unimportant information, thereby extracting modality-consistent features. However, in this process, features with weak attention scores and little discriminative power cause serious interference when extracting salient discriminative features.
Fine-grained pedestrian feature mining methods aim to solve the problem that uncontrollable factors within the same modality, such as shooting viewpoint, posture change and occlusion, prevent the model from extracting discriminative pedestrian information. Many works guide the network to focus on the discriminative information of each local feature by blocking the feature map. However, when the blocking scheme is too uniform, it is not conducive to extracting discriminative pedestrian features.
Disclosure of Invention
To solve the above problems, the invention provides an infrared-visible light cross-modality pedestrian re-identification method based on MSCFM and MGFE, which identifies pedestrians more accurately and improves recognition accuracy.
The technical scheme of the invention is as follows: an infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE comprises the following steps:
step 1: constructing a visible light mode image set and an infrared mode image set;
step 2: images of the same pedestrian in two modes are simultaneously input into a convolution network to extract feature images;
step 3: splicing a visible light mode characteristic diagram and an infrared mode characteristic diagram;
step 4: using K-largest-value masked cross-attention to mine the shared features of the same pedestrian across the two modalities;
step 5: using a self-attention method to model the global context of the pedestrian structure within each modality;
step 6: splicing the obtained self-attention-enhanced feature graphs in batch dimensions, and sending the spliced feature graphs into a shared feature embedding network for encoding;
step 7: using the multi-granularity feature enhancement module to perform blocking operations of different granularities on the feature map, guiding the network to attend to the discriminative fine-grained information it contains, and obtaining the final pedestrian re-identification model;
step 8: and acquiring the image data to be identified, and identifying the image data to be identified by utilizing the finally obtained pedestrian re-identification model.
Further, in step 2, images of the same pedestrian in the two modalities must be input into the network together for processing; that is, the identities of the pedestrian images processed by the two convolutional branches at each step correspond one to one.
Further, in step 3, the query feature map Q_vis of the visible light modality and the key feature map K_ir of the infrared modality are combined by matrix multiplication to obtain the semantic relationship attention map A_cross:

A_cross = σ(Q_vis ⊗ (K_ir)^T)

where σ is the sigmoid activation function, (·)^T is the matrix transposition operation, ⊗ is the matrix multiplication operation, and each element of A_cross represents the semantic relationship between pixels of the query modality feature map and the queried modality feature map.
Further, in step 4, a top-K mask operation is performed on the element values of each row of the attention map; that is, the K largest semantic relation values in each row are retained and the values below the K-th largest are set to zero, so as to extract the salient cross-modality shared information of the pedestrian.
Further, in step 5, the enhanced cross-modality feature map output by the K-largest-value mask cross-attention module is sent into a self-attention module. This attention module obtains the relationships between pixel points in the feature map and applies the relationship weights back to the feature map, enhancing its discriminative features from a global perspective. For the enhanced visible light modality feature map F_vis, F_vis is input to a value convolution layer W_V, a query convolution layer W_Q and a key convolution layer W_K to obtain the value feature map V_vis, the query feature map Q_vis and the key feature map K_vis of the visible light modality. The query feature map Q_vis and the key feature map K_vis are matrix-multiplied to obtain the self-attention relationship map A_vis:

A_vis = σ(Q_vis ⊗ (K_vis)^T)

where σ is the sigmoid activation function, (·)^T is the matrix transposition operation, ⊗ is the matrix multiplication operation, and each row of A_vis represents the semantic relationships between the pixel points of the input feature map F_vis.

The self-attention relationship map A_vis is matrix-multiplied with the value feature map V_vis, the result is input into a convolution layer W_O to restore the original channel size, and a residual operation with the value feature map V_vis is then performed, finally yielding the self-attention-enhanced feature map F'_vis:

F'_vis = W_O(A_vis ⊗ V_vis) + V_vis

The infrared modality feature map F'_ir is obtained in the same way.
Further, in step 6, the modality salient consistency feature mining module (MSCFM) splices the mined visible light modality feature map F'_vis and infrared modality feature map F'_ir in the batch dimension, sends the result into a shared feature embedding network E for encoding, obtains the feature vector through a global average pooling layer GAP and a batch normalization layer BN, and applies an identity constraint with a cross-entropy loss:

L_mscfm = −(1/n) Σ_{i=1}^{n} y_i · log(p_i),  p_i = C(BN(GAP(E(x_i))))

where y_i is the true identity label corresponding to the i-th sample, p_i is the probability score obtained by the classifier for the i-th sample, C is the identity classifier of the MSCFM module, BN is the batch normalization layer and GAP is the global average pooling layer.
Further, in step 7, in order to extract comprehensive pedestrian discriminative features and alleviate intra-class differences within the same modality, the method provides a multi-granularity feature enhancement module (MGFE). The MGFE module takes the feature map encoded by the shared feature embedding network E and performs horizontal blocking operations of different granularities along the height dimension.
The beneficial effects of the invention are as follows:
1. A modality salient consistency feature mining module is provided. By performing a mask operation on the attention scores, it eliminates the adverse effects of weakly discriminative and non-discriminative information on cross-modality feature extraction, improves the salience of discriminative features in the attention scores, and realizes pedestrian feature extraction with salient cross-modality consistency;
2. A multi-granularity feature enhancement module is provided. It extracts multi-granularity local features with a diversified blocking scheme and adds an identity constraint for each local feature, eliminating the adverse effects of local semantic misalignment caused by occlusion, low resolution and similar problems and improving the identity discriminability of the features, thereby enhancing the discriminative features of pedestrians;
3. An effective inter-modality re-identification framework is constructed, realizing mutual re-identification between visible light and infrared pedestrian images;
4. The invention can be applied to pedestrian recognition and used in fields such as video surveillance, intelligent security and site management. By introducing the modality salient consistency feature mining module and the multi-granularity feature enhancement module, pedestrians can be identified more accurately and recognition accuracy is improved; compared with other existing pedestrian re-identification methods, the method achieves higher accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a general frame diagram of the present invention;
FIG. 3 is a flow chart of a mode salient consistency feature mining module of the present invention;
FIG. 4 is a flow chart of a multi-granularity feature enhancement module of the present invention;
FIG. 5 compares the distance distributions of positive and negative sample pairs for the baseline and the proposed method; the left side shows the experimental effect of the baseline and the right side that of the proposed method; the abscissa is the distance between the samples of a pair and the ordinate is the number of sample pairs;
FIG. 6 is a graph comparing performance of the proposed method on pedestrian retrieval results;
FIG. 7 is a class activation map comparison between the baseline and the proposed method.
Detailed Description
Example 1: as shown in fig. 1-7, an infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE comprises the following steps:
step 1: constructing a visible light mode image set and an infrared mode image set; the picture data in the specific embodiment is taken from the common data sets SYSU-MM01 and RegDB;
SYSU-MM01 is a large public dataset proposed for infrared-visible cross-modality pedestrian re-identification research. The dataset is divided into a training set and a test set to train and evaluate the proposed cross-modality model. The training set comprises 22258 visible light images captured by 4 visible light cameras and 11909 infrared images captured by 2 infrared cameras, covering 395 pedestrians; the test set covers 96 pedestrians. In the test set, the query set consists of 3803 infrared images, and the gallery set is sampled randomly ten times, each gallery set containing 301 visible images. In the evaluation phase, the dataset has two test modes, all-search and indoor-search. In the all-search mode, the gallery images are taken by both indoor and outdoor cameras; in the indoor-search mode, the gallery uses only images taken by indoor cameras. The final performance on this dataset is the average over 10 test experiments.
The RegDB dataset consists of 8240 images of 254 female and 158 male identities, each pedestrian having 10 infrared and 10 visible light images. In the training phase, a randomly selected set of 206 identities is used for training and the remaining 206 identities are used for testing. In the evaluation phase, 10 random experiments are repeated and the average of the 10 runs is taken as the final performance.
Step 2: images of the same pedestrian in the two modalities are input into the convolutional network simultaneously to extract feature maps. The method adopts a dual-stream network framework, so in step 2 images of the same pedestrian in the two modalities must be input into the network together; that is, the identities of the pedestrian images processed by the two convolutional branches at each step correspond one to one.
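The identity-aligned pairing described above can be sketched as a simple batch sampler. The helper below is hypothetical (the patent gives no code); NumPy arrays stand in for images, and `vis_by_id`/`ir_by_id` are assumed to map each identity to its stack of images:

```python
import numpy as np

def build_paired_batch(vis_by_id, ir_by_id, ids, n_per_id, rng):
    """Sample a batch in which the visible and infrared streams carry the
    same identities in one-to-one correspondence (hypothetical helper)."""
    vis_batch, ir_batch, labels = [], [], []
    for pid in ids:
        v_idx = rng.choice(len(vis_by_id[pid]), n_per_id, replace=True)
        r_idx = rng.choice(len(ir_by_id[pid]), n_per_id, replace=True)
        vis_batch.extend(vis_by_id[pid][i] for i in v_idx)
        ir_batch.extend(ir_by_id[pid][i] for i in r_idx)
        labels.extend([pid] * n_per_id)
    # both streams share the same label vector, so identities line up
    return np.stack(vis_batch), np.stack(ir_batch), np.array(labels)
```

Feeding both returned stacks through the two convolutional branches keeps the per-position identities identical, as step 2 requires.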
Step 3: splicing a visible light mode characteristic diagram and an infrared mode characteristic diagram;
in step 3, the query feature map Q_vis of the visible light modality and the key feature map K_ir of the infrared modality are combined by matrix multiplication to obtain the semantic relationship attention map A_cross:

A_cross = σ(Q_vis ⊗ (K_ir)^T)

where σ is the sigmoid activation function, (·)^T is the matrix transposition operation, ⊗ is the matrix multiplication operation, and each element of A_cross represents the semantic relationship between pixels of the query modality feature map and the queried modality feature map.
Step 4: the K large-value mask cross attention is utilized to mine sharing characteristics of the same pedestrian under different modes;
In step 4, a top-K mask operation is performed on the element values of each row; that is, the K largest semantic relation values in each row are retained and the remaining values are set to zero, extracting the salient cross-modality shared information of the pedestrian. During training, the K value ranges from 0 to 2592.
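A minimal NumPy sketch of the K-largest-value mask described above, assuming flattened feature-map pixels as rows and a sigmoid relation map; all function and variable names are illustrative, not from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def topk_mask_cross_attention(q_vis, k_ir, v_ir, k):
    """K-largest-value masked cross-attention (illustrative sketch).
    q_vis: (N, C) query pixels from the visible feature map;
    k_ir, v_ir: (M, C) key/value pixels from the infrared feature map."""
    attn = sigmoid(q_vis @ k_ir.T)               # semantic relation map, (N, M)
    kth = -np.sort(-attn, axis=1)[:, k - 1:k]    # k-th largest value per row
    masked = np.where(attn >= kth, attn, 0.0)    # keep top-k, zero the rest
    return masked, masked @ v_ir                 # mask and aggregated features
```

Zeroing everything below the k-th largest relation value per row is exactly the "retain the top K, set the rest to zero" rule of step 4.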
Step 5: using a self-attention method to model the global context relation of the pedestrian structure under the same mode;
Further, in step 5, the enhanced cross-modality feature map output by the K-largest-value mask cross-attention module is sent into a self-attention module. This attention module obtains the relationships between pixel points in the feature map and applies the relationship weights back to the feature map, enhancing its discriminative features from a global perspective. Taking the enhanced visible light modality feature map F_vis as an example, F_vis is input to a value convolution layer W_V, a query convolution layer W_Q and a key convolution layer W_K to obtain the value feature map V_vis, the query feature map Q_vis and the key feature map K_vis of the visible light modality. The query feature map Q_vis and the key feature map K_vis are matrix-multiplied to obtain the self-attention relationship map A_vis:

A_vis = σ(Q_vis ⊗ (K_vis)^T)

where σ is the sigmoid activation function, (·)^T is the matrix transposition operation, ⊗ is the matrix multiplication operation, and each row of A_vis represents the semantic relationships between the pixel points of the input feature map F_vis.

The self-attention relationship map A_vis is matrix-multiplied with the value feature map V_vis, the result is input into a convolution layer W_O to restore the original channel size, and a residual operation with the value feature map V_vis is then performed, finally yielding the self-attention-enhanced feature map F'_vis:

F'_vis = W_O(A_vis ⊗ V_vis) + V_vis

Similarly, the infrared modality yields:

F'_ir = W_O(A_ir ⊗ V_ir) + V_ir

where V_ir is the value feature map in the infrared modality and A_ir is the self-attention relationship map in the infrared modality.
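The self-attention step above can be sketched with plain weight matrices standing in for the 1×1 convolution layers; this is a hypothetical NumPy illustration under those assumptions, not the patent's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_attention_enhance(x, w_q, w_k, w_v, w_out):
    """Global self-attention with a residual to the value map, following the
    description above. x: (N, C) flattened feature-map pixels; w_*: (C, C)
    projections standing in for the query/key/value/output conv layers."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    a = sigmoid(q @ k.T)          # pixel-to-pixel self-attention relation map
    out = (a @ v) @ w_out         # restore the original channel size
    return out + v                # residual operation with the value map
```

Applying the same function to the infrared feature map with its own projections gives the enhanced infrared map.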
step 6: splicing the obtained self-attention-enhanced feature graphs in batch dimensions, and sending the spliced feature graphs into a shared feature embedding network for encoding;
Further, in step 6, the modality salient consistency feature mining module (MSCFM) splices the mined visible light modality feature map F'_vis and infrared modality feature map F'_ir in the batch dimension, sends the result into a shared feature embedding network E for encoding, obtains the feature vector through a global average pooling layer GAP and a batch normalization layer BN, and applies an identity constraint with a cross-entropy loss:

L_mscfm = −(1/n) Σ_{i=1}^{n} y_i · log(p_i),  p_i = C(BN(GAP(E(x_i))))

where y_i is the true identity label corresponding to the i-th sample, p_i is the probability score obtained by the classifier for the i-th sample, C is the identity classifier of the MSCFM module, BN is the batch normalization layer and GAP is the global average pooling layer.
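A small NumPy sketch of the GAP / BN / cross-entropy chain used for the identity constraint; the exact layer order, the absence of affine BN parameters, and the classifier form are assumptions, since the text only names the components:

```python
import numpy as np

def gap(feat):
    """Global average pooling over spatial dims: (B, C, H, W) -> (B, C)."""
    return feat.mean(axis=(2, 3))

def batchnorm(x, eps=1e-5):
    """Batch normalization without affine parameters (simplified)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def cross_entropy(logits, labels):
    """Identity cross-entropy loss averaged over the batch."""
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()
```

With a linear identity classifier `C`, the MSCFM loss would be `cross_entropy(C(batchnorm(gap(feat))), labels)`.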
Step 7: in order to extract comprehensive pedestrian discriminative features and alleviate intra-class differences within the same modality, the method provides a multi-granularity feature enhancement module (MGFE). The module performs blocking operations of different granularities on the feature map to guide the network to attend to the discriminative fine-grained information it contains, and the final pedestrian re-identification model is obtained.
The multi-granularity feature enhancement module MGFE takes the feature map F encoded by the shared feature embedding network E and performs horizontal blocking operations of different granularities along the height dimension, obtaining f^l, f^m_j and f^s_j, where l, m and s indicate dividing the feature map F horizontally into 1, 3 and 6 blocks respectively and j is the block index.
The blocked-feature convolution channel of the MGFE module is set to 256 and the batch size is set to 96, composed of pedestrian images in the two modalities, each modality containing 48 images of 6 pedestrians. In addition, all network parameters in the experiments are optimized with an SGD optimizer combined with a warmup strategy, where momentum is set to 0.9 and weight_decay to 5×10^-4; training lasts 60 epochs in total.
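The horizontal blocking at granularities 1, 3 and 6 can be illustrated as follows; this NumPy sketch pools each strip by averaging, which is an assumption (the patent does not state the pooling used per block):

```python
import numpy as np

def horizontal_blocks(feat, n_blocks):
    """Split a (B, C, H, W) feature map into n_blocks horizontal strips and
    average-pool each strip (H is assumed divisible by n_blocks)."""
    b, c, h, w = feat.shape
    strips = feat.reshape(b, c, n_blocks, h // n_blocks, w)
    return strips.mean(axis=(3, 4))              # (B, C, n_blocks)

def multi_granularity(feat):
    """Blocking at the three granularities 1, 3 and 6 described for MGFE."""
    return {n: horizontal_blocks(feat, n) for n in (1, 3, 6)}
```

Each of the resulting 1 + 3 + 6 strip features would then receive its own identity constraint, as the loss description below lists.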
Step 8: and acquiring the image data to be identified, and identifying the image data to be identified by utilizing the finally obtained pedestrian re-identification model.
The overall loss function of the invention is:

L_total = λ_0·L_mscfm + λ_1·L_id^1 + λ_2·L_id^3 + λ_3·L_id^6 + L_tri^1 + L_tri^6

where λ_0, λ_1, λ_2 and λ_3 are the weights of the identity losses of the respective modules, L_mscfm is the cross-entropy loss constraining the shared feature vector, L_id^1, L_id^3 and L_id^6 are the cross-entropy losses when the number of horizontal blocks is 1, 3 and 6 respectively, and L_tri^1 and L_tri^6 are the triplet losses when the number of horizontal blocks is 1 and 6 respectively. Experimental tests showed that a triplet loss on the 3-block partition has little influence on the result, so it is omitted.
In the present invention, two common pedestrian re-identification indicators are used to evaluate experimental performance, namely the Cumulative Matching Curve (CMC) and the mean average precision (mAP).
The Cumulative Matching Curve (CMC), also known as the Rank-K curve, measures the hit rate within the top K retrieval results. For example, Rank-1 reflects the probability that the top-ranked result is the search target, and Rank-5 reflects the probability that the search target is contained in the first 5 retrieval results.
The mean average precision (mAP) is the mean of the average precision over all query images and evaluates the overall effect of the pedestrian re-identification algorithm: AP denotes the average precision of one query sample and measures the model's effect on that sample, while mAP is the mean of the APs of all query samples and measures the model's overall effect on all queries.
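The two metrics above can be computed as follows. This is a minimal sketch of the standard Rank-K and mAP definitions; the function names and the list-of-rankings input format are assumptions for illustration, where each ranking is one query's gallery identity list ordered by predicted similarity.

```python
# Hedged sketch of the evaluation metrics: Rank-K hit rate (points on the
# CMC curve) and mean average precision (mAP).

def rank_k(rankings, true_ids, k):
    """Fraction of queries whose true identity appears in the top K results."""
    hits = sum(1 for r, t in zip(rankings, true_ids) if t in r[:k])
    return hits / len(rankings)

def average_precision(ranking, true_id):
    """AP for one query: mean precision at each position holding a hit."""
    hits, precisions = 0, []
    for pos, gallery_id in enumerate(ranking, start=1):
        if gallery_id == true_id:
            hits += 1
            precisions.append(hits / pos)
    return sum(precisions) / hits if hits else 0.0

def mean_ap(rankings, true_ids):
    """mAP: mean of the per-query APs."""
    aps = [average_precision(r, t) for r, t in zip(rankings, true_ids)]
    return sum(aps) / len(aps)
```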
Table 1 shows the comparison with other methods on the SYSU-MM01 dataset
As shown in Table 1, where (19), (20), (21) and (22) in the left column denote the years 2019, 2020, 2021 and 2022 respectively, the performance of the proposed method exceeds that of the other methods in both the all-search and indoor-search modes. Specifically, in the all-search mode the proposed method achieves 64.40% Rank-1 and 60.70% mAP, exceeding the suboptimal method DTRM by 1.37% and 2.07% respectively. In the indoor-search mode it achieves 68.08% on the Rank-1 indicator and 72.26% on the mAP indicator, higher than the suboptimal method DTRM by 1.73% and 0.5% respectively.
To verify the performance of the proposed method more fully, it is also compared with other methods on the RegDB dataset.
Table 2 shows the comparison with other methods on the RegDB dataset
Overall, the method of the present invention achieves significant advantages under both the "visible light querying infrared" (Visible To Infrared) and "infrared querying visible light" (Infrared To Visible) experimental settings. Under the Visible To Infrared setting, the method reaches 95.45% on Rank-1 and 99.02% on mAP, exceeding the suboptimal method DTRM by 13.22% and 20.57% respectively. Under the Infrared To Visible setting, it obtains 94.20% and 98.81% recognition precision on Rank-1 and mAP respectively, exceeding the suboptimal method DTRM by 15.27% and 23.23%.
FIG. 5 compares the feature distance distributions of positive and negative sample pairs for the baseline (left) and the proposed method (right). As shown in FIG. 5, the overlap between the positive-pair and negative-pair distance distributions is smaller for the proposed method, indicating that it extracts discriminative, modality-consistent pedestrian features.
FIG. 6 compares pedestrian retrieval results of the proposed method. In FIGS. 6 and 7, the face regions are intentionally blurred to avoid showing clear faces; this blurring is not an effect of applying the invention and should not be taken as a judgment of its performance. As shown in FIG. 6, given the same pedestrian query image, after adding part6 and part3 the top-5 results of "Base+MGFE" are more accurate than those of the Base method; after further adding the MSCFM module on top of Base+MGFE, the top 4 results are all hits. Here part6 and part3 denote the cases where the multi-granularity feature enhancement module (MGFE) splits the feature map horizontally into 6 and 3 blocks respectively.
FIG. 7 compares the class activation maps of the baseline and the proposed method; the method visualizes the pedestrian regions attended to by the model through class activation mapping. As shown in FIG. 7, compared with the baseline, the proposed method prompts the network to attend to more comprehensive pedestrian identification features in both the visible light and infrared modes, whereas the baseline network is disturbed by the background and ultimately fails to extract discriminative pedestrian features.
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (5)
1. The infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE, characterized by comprising the following steps:
step 1: constructing a visible light mode image set and an infrared mode image set;
step 2: images of the same pedestrian in two modes are simultaneously input into a convolution network to extract feature images;
step 3: splicing a visible light mode characteristic diagram and an infrared mode characteristic diagram;
step 4: the K large-value mask cross attention is utilized to mine sharing characteristics of the same pedestrian under different modes;
step 5: using a self-attention method to model the global context relation of the pedestrian structure under the same mode;
step 6: splicing the obtained self-attention-enhanced feature graphs in batch dimensions, and sending the spliced feature graphs into a shared feature embedding network for encoding;
step 7: using a multi-granularity feature enhancement module to perform blocking operations of different granularities on the feature map, guiding the network to attend to the discriminative fine-grained information contained in the feature map, and obtaining a final pedestrian re-identification model;
step 8: acquiring image data to be identified, and identifying the image data to be identified by utilizing the finally obtained pedestrian re-identification model;
in the step 3, the query feature map Q_v of the visible light mode and the key feature map K_i of the infrared mode are subjected to matrix multiplication to obtain a semantic relationship attention map A, the formula being as follows:

A = sigmoid(Q_v · K_i^T);

wherein sigmoid is the sigmoid activation function, T denotes the matrix transposition operation, and · denotes matrix multiplication; each row of A represents the semantic relation between the pixel points of the query-mode feature map and those of the queried-mode feature map;
in the step 4, a K-value masking operation is performed on the element values of each row: the semantic relation values of the top K elements in each row are retained and the values below the K-th largest are set to zero, so as to extract the salient cross-modal shared information of pedestrians; during training, the K value ranges from 0 to 2592.
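The per-row K-large-value masking of step 4 can be sketched as below. This is an illustrative toy on a small list-based attention map; the function names and the tie-handling choice (keeping all values tied with the K-th largest) are assumptions.

```python
# Hedged sketch of K-large-value masking: in each row of the semantic
# relationship attention map, keep the K largest values and zero the rest,
# so only salient cross-modal correspondences survive.

def topk_mask_row(row, k):
    if k >= len(row):
        return list(row)
    threshold = sorted(row, reverse=True)[k - 1]
    return [v if v >= threshold else 0.0 for v in row]

def topk_mask(attention_map, k):
    return [topk_mask_row(row, k) for row in attention_map]

attn = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]
print(topk_mask(attn, 2))  # [[0.9, 0.0, 0.5], [0.0, 0.8, 0.3]]
```

In the patent, K is a tunable hyperparameter with a range of 0 to 2592 during training; here it is simply a function argument.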
2. The infrared-visible cross-modal pedestrian re-identification method based on MSCFM and MGFE of claim 1, wherein: in the step 2, the images of the same pedestrian in two modes need to be input into the network together for processing, namely, the identities of the pedestrian images processed by the two-path convolution network each time are in one-to-one correspondence.
3. The infrared-visible cross-modal pedestrian re-identification method based on MSCFM and MGFE of claim 1, wherein: in the step 5, the enhanced cross-modal feature map output by the K-value mask cross-attention module is sent into a self-attention module, which acquires the relations between pixel points in the feature map and applies the relation weights to the feature map, thereby enhancing its discriminative features from a global perspective; for the enhanced visible light modal feature map F_v, F_v is input into a value convolution layer W_v, a query convolution layer W_q and a key convolution layer W_k to obtain a value feature map V_v, a query feature map Q_v and a key feature map K_v; the query feature map Q_v and the key feature map K_v of the visible light mode are subjected to matrix multiplication to obtain the self-attention relationship map S_v, the formula being as follows:

S_v = sigmoid(Q_v · K_v^T);

wherein sigmoid is the sigmoid activation function, T denotes the matrix transposition operation, and · denotes matrix multiplication; each row of S_v represents the semantic relations between the pixel points of the input feature map F_v;

the self-attention relationship map S_v and the value feature map V_v are subjected to matrix multiplication, the result is input into a convolution layer to restore the original channel size, and a residual operation is then performed with the value feature map V_v, finally yielding the self-attention-enhanced feature map F'_v; similarly, F'_i is obtained for the infrared mode.
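The sigmoid-activated relation map of claim 3 can be sketched in miniature as follows. This is an illustrative pure-Python toy: real feature maps would be projected by 1x1 convolutions, whereas here the query and key matrices are given directly (rows = pixels, columns = channels), and all names are assumptions.

```python
import math

# Hedged sketch of the self-attention relation map: multiply query and key
# matrices and pass the result through a sigmoid, giving pairwise semantic
# relations between pixel positions within one modality.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matmul_t(q, k):
    # Q (n x c) times K^T (c x n) -> n x n relation matrix
    return [[sum(qi * ki for qi, ki in zip(qrow, krow)) for krow in k]
            for qrow in q]

def self_attention_map(q, k):
    return [[sigmoid(v) for v in row] for row in matmul_t(q, k)]

# Two "pixels" with orthogonal 2-channel features relate strongly to
# themselves (sigmoid(1)) and neutrally to each other (sigmoid(0) = 0.5).
S = self_attention_map([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```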
4. The infrared-visible cross-modal pedestrian re-identification method based on MSCFM and MGFE of claim 1, wherein: in the step 6, the modal salient consistency feature mining module splices the feature map F'_v of the visible light mode and the feature map F'_i of the infrared light mode in the batch dimension and sends the result into the shared feature embedding network E for encoding; the feature vector f is obtained through a batch normalization layer BN and a global average pooling layer GAP, and identity constraint is applied with a cross-entropy loss, the loss function being:

L_id = -(1/N) · Σ_{i=1}^{N} y_i · log(p_i), with p_i = W(BN(GAP(f_i)));

wherein y_i is the true identity label corresponding to the i-th sample, p_i denotes the probability score of the i-th sample through the classifier, W is the identity classifier of the MSCFM module, BN denotes the batch normalization layer, and GAP denotes the global average pooling layer.
5. The infrared-visible cross-modal pedestrian re-identification method based on MSCFM and MGFE of claim 1, wherein: in the step 7, the multi-granularity feature enhancement module MGFE performs horizontal blocking operations of different granularities, in the height dimension, on the feature map obtained by encoding with the shared feature embedding network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310772990.XA CN116523969B (en) | 2023-06-28 | 2023-06-28 | MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116523969A CN116523969A (en) | 2023-08-01 |
CN116523969B true CN116523969B (en) | 2023-10-03 |
Family
ID=87406666
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310772990.XA Active CN116523969B (en) | 2023-06-28 | 2023-06-28 | MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116523969B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112446305A (en) * | 2020-11-10 | 2021-03-05 | 云南联合视觉科技有限公司 | Pedestrian re-identification method based on classification weight equidistant distribution loss model |
CN114220124A (en) * | 2021-12-16 | 2022-03-22 | 华南农业大学 | Near-infrared-visible light cross-modal double-flow pedestrian re-identification method and system |
CN114550315A (en) * | 2022-01-24 | 2022-05-27 | 云南联合视觉科技有限公司 | Identity comparison and identification method and device and terminal equipment |
CN115100678A (en) * | 2022-06-10 | 2022-09-23 | 河南大学 | Cross-modal pedestrian re-identification method based on channel recombination and attention mechanism |
CN115331112A (en) * | 2022-08-30 | 2022-11-11 | 中国电子科技集团公司第三十八研究所 | Infrared and visible light image fusion method and system based on multi-granularity word elements |
CN116311364A (en) * | 2023-03-07 | 2023-06-23 | 西安电子科技大学 | Multispectral pedestrian detection method based on cross-modal feature enhancement and confidence fusion |
Non-Patent Citations (3)
Title |
---|
AFT: Adaptive Fusion Transformer for visible and infrared images; Chang ZH et al.; IEEE Transactions on Image Processing; 2077–2092 *
Infrared dim and small targets detection via self-attention mechanism and pipeline correlator; Yong Lan et al.; Digital Signal Processing; 1–10 *
Occluded Visible-Infrared Person Re-Identification; Yujian Feng et al.; IEEE Transactions on Multimedia; 1401–1413 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110717411A (en) | Pedestrian re-identification method based on deep layer feature fusion | |
CN110598543B (en) | Model training method based on attribute mining and reasoning and pedestrian re-identification method | |
CN112597866B (en) | Knowledge distillation-based visible light-infrared cross-modal pedestrian re-identification method | |
WO2021082168A1 (en) | Method for matching specific target object in scene image | |
CN114067444A (en) | Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature | |
CN114973317A (en) | Pedestrian re-identification method based on multi-scale adjacent interaction features | |
CN115841683B (en) | Lightweight pedestrian re-identification method combining multi-level features | |
CN112084895B (en) | Pedestrian re-identification method based on deep learning | |
CN116523969B (en) | MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method | |
CN111241943B (en) | Scene recognition and loopback detection method based on background target and triple loss | |
CN115830637B (en) | Method for re-identifying blocked pedestrians based on attitude estimation and background suppression | |
CN115830643A (en) | Light-weight pedestrian re-identification method for posture-guided alignment | |
CN115050044B (en) | Cross-modal pedestrian re-identification method based on MLP-Mixer | |
CN116246305A (en) | Pedestrian retrieval method based on hybrid component transformation network | |
CN107358200B (en) | Multi-camera non-overlapping vision field pedestrian matching method based on sparse learning | |
CN112417961B (en) | Sea surface target detection method based on scene prior knowledge | |
CN114972146A (en) | Image fusion method and device based on generation countermeasure type double-channel weight distribution | |
Wu et al. | Research on license plate detection algorithm based on ssd | |
CN115690669A (en) | Cross-modal re-identification method based on feature separation and causal comparison loss | |
CN112487927A (en) | Indoor scene recognition implementation method and system based on object associated attention | |
Niu et al. | Real-time recognition and location of indoor objects | |
CN117078967B (en) | Efficient and lightweight multi-scale pedestrian re-identification method | |
CN114581984B (en) | Mask face recognition algorithm based on low-rank attention mechanism | |
CN117351518B (en) | Method and system for identifying unsupervised cross-modal pedestrian based on level difference | |
CN117351533A (en) | Attention knowledge distillation-based lightweight pedestrian re-identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||