CN116523969B - MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method - Google Patents

MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method

Info

Publication number
CN116523969B
Authority
CN
China
Prior art keywords
pedestrian
feature
mode
feature map
attention
Prior art date
Legal status
Active
Application number
CN202310772990.XA
Other languages
Chinese (zh)
Other versions
CN116523969A (en)
Inventor
齐冲冲
林旭
许乐
尚永乐
Current Assignee
Yunnan United Visual Technology Co ltd
Original Assignee
Yunnan United Visual Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Yunnan United Visual Technology Co ltd filed Critical Yunnan United Visual Technology Co ltd
Priority to CN202310772990.XA priority Critical patent/CN116523969B/en
Publication of CN116523969A publication Critical patent/CN116523969A/en
Application granted granted Critical
Publication of CN116523969B publication Critical patent/CN116523969B/en


Classifications

    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06V 10/143: Sensing or illuminating at different wavelengths
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion, i.e. combining data from various sources at the feature extraction level, of extracted features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06T 2207/10048: Infrared image
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • Y02T 10/40: Engine management systems

Abstract

The invention relates to an infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE. The method comprises the following steps: constructing a visible light mode image set and an infrared mode image set; inputting images of the same pedestrian in the two modes simultaneously into a convolution network to extract feature maps; splicing the visible light mode feature map and the infrared mode feature map; mining the shared features of the same pedestrian in different modes; performing global context modeling of the pedestrian structure; splicing the obtained self-attention-enhanced feature maps in the batch dimension and sending the spliced feature maps into a shared feature embedding network for encoding; using a multi-granularity feature enhancement module to perform partitioning operations of different granularities on the feature map, guiding the network to attend to the discriminative fine-grained information the feature map contains, and performing iterative training to obtain the final model; and acquiring the image data to be identified and identifying it with the finally obtained pedestrian re-identification model.

Description

MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method
Technical Field
The invention relates to an infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE, and belongs to the technical field of image identification.
Background
Pedestrian re-identification technology aims to solve the problem of retrieving pedestrians across cameras. In recent years the technology has produced many research results and plays an important role in practical applications. However, most current pedestrian re-identification research searches for pedestrians in the visible light mode and cannot obtain good recognition results in special environments such as insufficient illumination and night. With the continuing spread of dual-mode monitoring systems, pedestrian video information can be effectively acquired through such systems' visible light and infrared modes under both normal and insufficient illumination. To meet the requirements of specific scenes and environments, researchers have in recent years conducted extensive research on infrared-visible light cross-mode pedestrian re-identification (VI-ReID). In VI-ReID, given a query image of a pedestrian in one mode, features are extracted by a Re-ID model and a pedestrian target with the same identity is then found in a gallery image set of the other mode, completing the cross-mode pedestrian retrieval task. Since visible light and infrared images are formed by different imaging processes, pedestrian characteristics differ greatly between the modes. In addition, within the same mode, the appearance characteristics of a pedestrian vary greatly under uncontrollable factors such as shooting angle, posture change and occlusion. Therefore, overcoming these problems so that the network can effectively extract discriminative pedestrian features in both modes is the key to infrared-visible light cross-mode pedestrian re-identification research. Summarizing current research, methods for effectively extracting discriminative pedestrian features in the two modes mainly fall into methods based on adversarial generation, methods guided by an "intermediate mode", methods mining high-order semantic information, methods based on attention mechanisms, methods mining fine-grained pedestrian features, and hybrid methods combining the above. Adversarial-generation methods realize modal information migration through the adversarial game between a generator and a discriminator, gradually eliminating modal differences. However, such methods depend heavily on the quality of the generated images and easily introduce extra modal information that aggravates modal differences, limiting further improvement of model performance. "Intermediate mode" guided methods reduce the gap between the infrared and visible light modes by introducing "intermediate mode" information as a "bridge", and are a common VI-ReID approach. However, introducing "intermediate mode" information inevitably introduces interference information as well as extra computational cost.
Methods based on mining high-order semantic information escape the influence of modal information by mining the high-order relationships contained among pedestrian features of different modes, such as structural relationships and high-frequency information. Such methods usually introduce additional auxiliary models when extracting high-order feature relationships, for example using a key-point model to obtain structural relationship features. However, the auxiliary model is highly susceptible to practical conditions in the samples, such as occlusion and low resolution, so its detection accuracy is low, which not only increases algorithm complexity but also reduces the recognition efficiency of the trained model.
Attention-based methods adopt an attention mechanism to guide the feature map to highlight important discriminative information and suppress unimportant information, completing the extraction of mode-consistent features. However, in this process, features that have weak attention scores and no strong discriminative power seriously interfere with the extraction of salient discriminative features.
Methods based on mining fine-grained pedestrian features aim to solve the problem that the model cannot extract discriminative pedestrian information because pedestrian appearance within the same mode is affected by uncontrollable factors such as shooting angle, posture change and occlusion. Many works guide the network to focus on the discriminative information of each local feature by partitioning the feature map into blocks. However, an overly uniform partitioning manner is not conducive to extracting discriminative pedestrian features.
Disclosure of Invention
In order to solve the above problems, the invention provides an infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE, which identifies pedestrians more accurately and improves recognition accuracy.
The technical scheme of the invention is as follows: an infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE comprises the following steps:
step 1: constructing a visible light mode image set and an infrared mode image set;
step 2: images of the same pedestrian in the two modes are simultaneously input into a convolution network to extract feature maps;
step 3: splicing the visible light mode feature map and the infrared mode feature map;
step 4: K-large-value mask cross attention is used to mine the shared features of the same pedestrian in different modes;
step 5: a self-attention method is used to model the global context of the pedestrian structure within the same mode;
step 6: splicing the obtained self-attention-enhanced feature maps in the batch dimension, and sending the spliced feature maps into a shared feature embedding network for encoding;
step 7: a multi-granularity feature enhancement module performs partitioning operations of different granularities on the feature map to guide the network to attend to the discriminative fine-grained information contained in the feature map, yielding the final pedestrian re-identification model;
step 8: acquiring the image data to be identified, and identifying it with the finally obtained pedestrian re-identification model.
Further, in step 2, the images of the same pedestrian in the two modes must be input into the network together for processing; that is, the identities of the pedestrian images processed each time by the two-path convolution network are in one-to-one correspondence.
Further, in step 3, the query feature map $Q_{vis}$ of the visible light mode and the key feature map $K_{ir}$ of the infrared mode are obtained, and the semantic relationship attention map $A_{c}$ is obtained through matrix multiplication, with the formula:

$$A_{c} = \sigma\left(Q_{vis} \otimes K_{ir}^{T}\right)$$

where $\sigma$ is the sigmoid activation function, $(\cdot)^{T}$ is the matrix transposition operation, $\otimes$ is the matrix multiplication operation, and each row of $A_{c}$ represents the semantic relationship between the pixel points of the query-mode feature map and those of the queried-mode feature map.
Further, in step 4, a K-large-value mask operation is performed on the element values of each row of the attention map: the K largest semantic relationship values in each row are retained and the values below the K-th largest are set to zero, so as to extract the salient cross-mode shared information of the pedestrian.
Further, in step 5, the enhanced cross-mode feature map output by the K-large-value mask cross-attention module is sent to a self-attention module; the attention module obtains the relationships between pixel points in the feature map and applies the relationship weights to the feature map, so as to enhance the discriminative features of the feature map from a global perspective. For the enhanced visible light mode feature map $F_{vis}$, $F_{vis}$ is input to a value convolution layer $W_{V}$, a query convolution layer $W_{Q}$ and a key convolution layer $W_{K}$, obtaining the value feature map $V_{vis}$, query feature map $Q_{vis}$ and key feature map $K_{vis}$ of the visible light mode; the query feature map $Q_{vis}$ and the key feature map $K_{vis}$ undergo a matrix multiplication operation to obtain the self-attention relationship map $A_{vis}$, with the formula:

$$A_{vis} = \sigma\left(Q_{vis} \otimes K_{vis}^{T}\right)$$

where $\sigma$ is the sigmoid activation function, $(\cdot)^{T}$ is the matrix transposition operation, $\otimes$ is the matrix multiplication operation, and each row of $A_{vis}$ represents the semantic relationships between the pixel points of the input feature map $F_{vis}$;

the self-attention relationship map $A_{vis}$ and the value feature map $V_{vis}$ undergo a matrix multiplication operation, the result is input to a convolution layer $W_{O}$ to restore the original channel size, and a residual operation with the value feature map $V_{vis}$ finally yields the self-attention-enhanced feature map $F_{vis}^{SA}$; similarly, the infrared light mode $F_{ir}^{SA}$ is obtained.
Further, in step 6, the mode salient consistency feature mining module (MSCFM) splices the mined visible light mode feature map $F_{vis}^{SA}$ and infrared light mode feature map $F_{ir}^{SA}$ in the batch dimension, sends the result into the shared feature embedding network $E_{sh}$ for encoding, obtains the feature vector $f_{i} = \mathrm{GAP}\left(\mathrm{BN}\left(E_{sh}\left(x_{i}\right)\right)\right)$ through a batch normalization layer BN and a global average pooling layer GAP, and applies an identity constraint with a cross-entropy loss; the loss function is:

$$L_{MSCFM} = -\frac{1}{N}\sum_{i=1}^{N}\log p_{i}\left(y_{i}\right), \qquad p_{i} = C_{sh}\left(f_{i}\right)$$

where $y_{i}$ is the true identity label corresponding to the $i$-th sample, $p_{i}$ is the probability score obtained by the classifier for the $i$-th sample, $C_{sh}$ is the identity classifier of the MSCFM module, BN denotes the batch normalization layer, and GAP denotes the global average pooling layer.
Further, in step 7, in order to extract comprehensive pedestrian identification features and relieve intra-class differences within the same mode, the method provides a multi-granularity feature enhancement module (MGFE); the MGFE module performs horizontal partitioning operations of different granularities, in the height dimension, on the feature map encoded by the shared feature embedding network $E_{sh}$.
The beneficial effects of the invention are as follows:
1. a mode salient consistency feature mining module is provided; by performing a mask operation on the attention scores, the module eliminates the adverse influence of weakly discriminative and non-discriminative information on cross-mode feature extraction, improves the salience of discriminative features in the attention scores, and realizes pedestrian feature extraction with mode salient consistency;
2. a multi-granularity feature enhancement module is provided; it extracts multi-granularity local features through diversified partitioning, adds an identity constraint for each local feature, eliminates the adverse influence of local semantic misalignment caused by occlusion, low resolution and similar problems, and improves the identity discrimination of the features, thereby enhancing the discriminative features of pedestrians;
3. an effective inter-mode re-identification framework is constructed, realizing mutual re-identification between visible light and infrared pedestrian pictures;
4. the invention can be applied to pedestrian recognition and used in fields such as video monitoring, intelligent security and site management. By introducing the mode salient consistency feature mining module and the multi-granularity feature enhancement module, pedestrians can be identified more accurately and recognition accuracy is improved; compared with other existing pedestrian re-identification methods, the method achieves higher accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a general frame diagram of the present invention;
FIG. 3 is a flow chart of a mode salient consistency feature mining module of the present invention;
FIG. 4 is a flow chart of a multi-granularity feature enhancement module of the present invention;
FIG. 5 is a graph of the distance distributions of positive and negative sample pairs for another method (the baseline) and for the proposed method; the left side is the experimental effect of the baseline and the right side is the experimental effect of the proposed method; the abscissa is the feature distance between sample pairs and the ordinate is the number of sample pairs;
FIG. 6 is a graph comparing performance of the proposed method on pedestrian retrieval results;
FIG. 7 is a comparison of class activation maps for the baseline and the proposed method.
Detailed Description
Example 1: as shown in fig. 1-7, an infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE comprises the following steps:
step 1: constructing a visible light mode image set and an infrared mode image set; the picture data in the specific embodiment is taken from the common data sets SYSU-MM01 and RegDB;
SYSU-MM01 is a large public data set proposed for research on the infrared-visible cross-mode pedestrian re-identification task. The data set is divided into a training set and a test set to train and evaluate the proposed cross-mode model. The training set comprises 22258 visible light pedestrian images of 395 pedestrians taken by 4 visible light cameras, and the test set comprises 11909 infrared pedestrian images of 96 pedestrians taken by 2 infrared cameras. In the test set, the query set consists of 3803 infrared images, and the gallery set consists of ten groups of randomly sampled visible light images, each group containing 301 images. In the evaluation phase, the data set has two test modes: all-search and indoor-search. In the all-search test mode, the gallery images consist of images taken by indoor and outdoor cameras. In the indoor-search test mode, the gallery set uses only images taken by indoor cameras. The final performance on this data set is the average over 10 test experiments.
The RegDB data set consists of 8240 images of 254 female identities and 158 male identities, where each pedestrian has 10 infrared images and 10 visible light images. During the training phase, a randomly selected set of pedestrian images of 206 identities is used as the training set, and the images of the remaining 206 identities are used for testing. In the evaluation phase, 10 random experiments are repeated and the average of the 10 experiments is taken as the final performance.
Step 2: images of the same pedestrian in the two modes are simultaneously input into a convolution network to extract feature maps; the method adopts a dual-stream network framework, and in step 2 the images of the same pedestrian in the two modes must be input into the network together for processing, that is, the identities of the pedestrian images processed each time by the two-path convolution network are in one-to-one correspondence.
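For illustration, a minimal PyTorch sketch of such a dual-stream backbone follows; the ResNet-50 architecture and the split between modality-specific stems and shared deeper stages are assumptions made for this sketch, since the patent specifies only a two-path convolution network with identity-aligned inputs.

```python
import torch.nn as nn
from torchvision.models import resnet50

class DualStreamBackbone(nn.Module):
    """Two-path feature extractor: one shallow stem per modality,
    deeper stages shared (the split point is an assumption)."""
    def __init__(self):
        super().__init__()
        def stem():
            r = resnet50(weights=None)
            return nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)
        self.vis_stem, self.ir_stem = stem(), stem()
        r = resnet50(weights=None)
        self.shared = nn.Sequential(r.layer2, r.layer3)  # shared deeper stages

    def forward(self, x_vis, x_ir):
        # x_vis and x_ir hold identity-aligned pedestrian image pairs
        return self.shared(self.vis_stem(x_vis)), self.shared(self.ir_stem(x_ir))
```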
Step 3: splicing the visible light mode feature map and the infrared mode feature map;
in step 3, the query feature map $Q_{vis}$ of the visible light mode and the key feature map $K_{ir}$ of the infrared mode are obtained, and the semantic relationship attention map $A_{c}$ is obtained through matrix multiplication, with the formula:

$$A_{c} = \sigma\left(Q_{vis} \otimes K_{ir}^{T}\right)$$

where $\sigma$ is the sigmoid activation function, $(\cdot)^{T}$ is the matrix transposition operation, $\otimes$ is the matrix multiplication operation, and each row of $A_{c}$ represents the semantic relationship between the pixel points of the query-mode feature map and those of the queried-mode feature map.
Step 4: K-large-value mask cross attention is used to mine the shared features of the same pedestrian in different modes;
in step 4, a K-large-value mask operation is performed on the element values of each row of the attention map: the K largest semantic relationship values in each row are retained and the values below the K-th largest are set to zero, so as to extract the salient cross-mode shared information of the pedestrian. During training, the value of K ranges from 0 to 2592.
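The following is a minimal sketch of the K-large-value mask cross attention, assuming flattened spatial features of shape (B, N, C) and that the masked relation map is used to aggregate the queried mode's values; the function name and shapes are illustrative (the stated upper bound of 2592 suggests N = 2592 spatial positions per feature map, though the patent does not say so explicitly).

```python
import torch

def k_large_value_mask_cross_attention(q_vis, k_ir, v_ir, K=512):
    """Keep only the K largest semantic-relation values per row of the
    cross-modal attention map and zero the rest (a sketch; shapes assumed).
    q_vis: (B, N, C) queries from the visible branch
    k_ir, v_ir: (B, N, C) keys/values from the infrared branch; K <= N"""
    # semantic relationship attention map: sigmoid(Q @ K^T), shape (B, N, N)
    attn = torch.sigmoid(q_vis @ k_ir.transpose(1, 2))
    # retain the K largest values in each row, set the remainder to zero
    topk_vals, topk_idx = attn.topk(K, dim=-1)
    masked = torch.zeros_like(attn).scatter_(-1, topk_idx, topk_vals)
    # aggregate the infrared values under the masked relation map
    return masked @ v_ir
```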
Step 5: a self-attention method is used to model the global context of the pedestrian structure within the same mode;
further, in step 5, the enhanced cross-mode feature map output by the K-large-value mask cross-attention module is sent to a self-attention module; the attention module obtains the relationships between pixel points in the feature map and applies the relationship weights to the feature map, so as to enhance the discriminative features of the feature map from a global perspective. Taking the enhanced visible light mode feature map $F_{vis}$ as an example, $F_{vis}$ is input to a value convolution layer $W_{V}$, a query convolution layer $W_{Q}$ and a key convolution layer $W_{K}$, obtaining the value feature map $V_{vis}$, query feature map $Q_{vis}$ and key feature map $K_{vis}$ of the visible light mode; the query feature map $Q_{vis}$ and the key feature map $K_{vis}$ undergo a matrix multiplication operation to obtain the self-attention relationship map $A_{vis}$, with the formula:

$$A_{vis} = \sigma\left(Q_{vis} \otimes K_{vis}^{T}\right)$$

where $\sigma$ is the sigmoid activation function, $(\cdot)^{T}$ is the matrix transposition operation, $\otimes$ is the matrix multiplication operation, and each row of $A_{vis}$ represents the semantic relationships between the pixel points of the input feature map $F_{vis}$;

the self-attention relationship map $A_{vis}$ and the value feature map $V_{vis}$ undergo a matrix multiplication operation, the result is input to a convolution layer $W_{O}$ to restore the original channel size, and a residual operation with the value feature map $V_{vis}$ finally yields the self-attention-enhanced feature map

$$F_{vis}^{SA} = W_{O}\left(A_{vis} \otimes V_{vis}\right) + V_{vis}$$

similarly, the infrared light mode yields

$$F_{ir}^{SA} = W_{O}\left(A_{ir} \otimes V_{ir}\right) + V_{ir}$$

where $V_{ir}$ is the value feature map in the infrared light mode and $A_{ir}$ is the self-attention relationship map in the infrared light mode;
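A minimal sketch of this self-attention enhancement follows; the 1×1 convolutions and the reduced channel width are assumptions, as the patent does not specify kernel or channel sizes.

```python
import torch
import torch.nn as nn

class SelfAttentionEnhance(nn.Module):
    """Value/query/key convolutions, a sigmoid relation map over pixel
    pairs, an output convolution restoring the channel size, and a
    residual connection with the value map (a sketch; sizes assumed)."""
    def __init__(self, channels, reduced=256):
        super().__init__()
        self.value = nn.Conv2d(channels, reduced, 1)
        self.query = nn.Conv2d(channels, reduced, 1)
        self.key = nn.Conv2d(channels, reduced, 1)
        self.out = nn.Conv2d(reduced, reduced, 1)  # restores the channel size

    def forward(self, x):                      # x: (B, C, H, W)
        b, _, h, w = x.shape
        v = self.value(x).flatten(2)           # (B, C', H*W)
        q = self.query(x).flatten(2)
        k = self.key(x).flatten(2)
        # relation map between all pixel pairs: (B, H*W, H*W)
        rel = torch.sigmoid(q.transpose(1, 2) @ k)
        out = (v @ rel).view(b, -1, h, w)      # re-weight the value map
        return self.out(out) + v.view(b, -1, h, w)  # residual with values
```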
step 6: splicing the obtained self-attention-enhanced feature graphs in batch dimensions, and sending the spliced feature graphs into a shared feature embedding network for encoding;
further, in step 6, the mode salient consistency feature mining module (MSCFM) splices the mined visible light mode feature map $F_{vis}^{SA}$ and infrared light mode feature map $F_{ir}^{SA}$ in the batch dimension, sends the result into the shared feature embedding network $E_{sh}$ for encoding, obtains the feature vector $f_{i} = \mathrm{GAP}\left(\mathrm{BN}\left(E_{sh}\left(x_{i}\right)\right)\right)$ through a batch normalization layer BN and a global average pooling layer GAP, and applies an identity constraint with a cross-entropy loss; the loss function is:

$$L_{MSCFM} = -\frac{1}{N}\sum_{i=1}^{N}\log p_{i}\left(y_{i}\right), \qquad p_{i} = C_{sh}\left(f_{i}\right)$$

where $y_{i}$ is the true identity label corresponding to the $i$-th sample, $p_{i}$ is the probability score obtained by the classifier for the $i$-th sample, $C_{sh}$ is the identity classifier of the MSCFM module, BN denotes the batch normalization layer, and GAP denotes the global average pooling layer.
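As a concrete illustration of this step, a sketch follows; the BN-before-GAP ordering and all module names are assumptions taken from the description above, not a definitive implementation.

```python
import torch
import torch.nn.functional as F

def mscfm_identity_loss(f_vis, f_ir, labels, shared_encoder, bn, classifier):
    """Splice the two enhanced feature maps in the batch dimension, encode
    with the shared embedding network, obtain a feature vector via BN and
    global average pooling, and apply an identity cross-entropy constraint."""
    x = torch.cat([f_vis, f_ir], dim=0)             # splice in batch dimension
    x = shared_encoder(x)                           # shared embedding network
    f = F.adaptive_avg_pool2d(bn(x), 1).flatten(1)  # BN then GAP -> vector
    logits = classifier(f)                          # identity classifier
    y = torch.cat([labels, labels], dim=0)          # same identities, both modes
    return F.cross_entropy(logits, y)
```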
Step 7: in order to extract comprehensive pedestrian identification features and relieve intra-class differences within the same mode, the method provides a multi-granularity feature enhancement module (MGFE); the module performs partitioning operations of different granularities on the feature map to guide the network to attend to the discriminative fine-grained information contained in the feature map, yielding the final pedestrian re-identification model;
the multi-granularity feature enhancement module MGFE will be embedded into the network by the shared featureThe characteristic diagram obtained by coding is respectively subjected to horizontal blocking operations with different granularities in the height dimension to obtainAnd. Wherein l, m, s respectively represent the feature mapThe horizontal division of 1,3,6 blocks,is a block index.
The block-feature convolution channel of the MGFE module is set to 256 and the batch size is set to 96; each batch is composed of pedestrian images in the two modes, each mode containing 48 images of 6 pedestrians. In addition, all network parameters in the experiment are optimized with an SGD optimizer combined with a warmup strategy, where momentum is set to 0.9 and weight_decay is set to 5×10^-4; the total training process lasts 60 epochs.
Step 8: acquiring the image data to be identified, and identifying it with the finally obtained pedestrian re-identification model.
The loss function of the invention is:

$$L = L_{MSCFM} + \alpha\left(L_{id}^{p1} + L_{id}^{p3} + L_{id}^{p6}\right) + \beta\left(L_{tri}^{p1} + L_{tri}^{p6}\right)$$

where $\alpha$ and $\beta$ are the identity-loss weights of the respective modules, $L_{MSCFM}$ is the cross-entropy loss constraining the shared feature vector, $L_{id}^{p1}$, $L_{id}^{p3}$ and $L_{id}^{p6}$ are the cross-entropy losses when the number of horizontal blocks is 1, 3 and 6 respectively, and $L_{tri}^{p1}$ and $L_{tri}^{p6}$ are the triplet losses when the number of horizontal blocks is 1 and 6 respectively. Experimental tests show that the triplet loss for the 3-block branch has little influence on the result, so it is not used.
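A sketch of combining these loss terms follows; the weight symbols alpha and beta and the dictionary indexing are illustrative assumptions, as the patent does not report the weight values.

```python
def total_loss(l_mscfm, l_id, l_tri, alpha=1.0, beta=1.0):
    """Overall objective: the MSCFM identity loss, identity losses for the
    1/3/6-block branches, and triplet losses for the 1- and 6-block
    branches (the 3-block triplet term is omitted, as its effect is
    reported to be negligible)."""
    return (l_mscfm
            + alpha * (l_id[1] + l_id[3] + l_id[6])
            + beta * (l_tri[1] + l_tri[6]))
```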
In the present invention, two common pedestrian re-identification indicators are used to evaluate experimental performance: the Cumulative Matching Curve (CMC) and the mean Average Precision (mAP).
The Cumulative Matching Curve (CMC), also known as the Rank-K curve, calculates the hit ratio among the top K test results. For example, Rank-1 reflects the probability that the top-ranked retrieval result is the search target, and Rank-5 reflects the probability that the search target is included in the first 5 retrieval results.
The mean Average Precision (mAP) is the mean of the average prediction precision over all query images and is used to evaluate the overall effect of a pedestrian re-identification algorithm: AP is the average precision of one query sample and represents the model's effect on that sample, while mAP is the mean of the APs of all query samples and represents the model's overall effect on all queries.
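For reference, a straightforward NumPy sketch of the two metrics on a query-gallery distance matrix; the camera-aware filtering used in the SYSU-MM01 protocol is omitted here.

```python
import numpy as np

def cmc_and_map(dist, q_ids, g_ids, topk=(1, 5, 10)):
    """Compute Rank-K (from the Cumulative Matching Curve) and mAP.
    dist: (num_query, num_gallery) distance matrix;
    q_ids, g_ids: identity labels of queries and gallery images."""
    ranks = np.zeros(max(topk))
    aps = []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                  # gallery by distance
        matches = (g_ids[order] == q_ids[i]).astype(np.int32)
        if matches.any():
            first_hit = int(np.argmax(matches))      # position of first hit
            ranks[first_hit:] += 1                   # credits all Rank-K, K >= hit
            hits = np.cumsum(matches)
            hit_pos = np.where(matches == 1)[0]
            precisions = hits[hit_pos] / (hit_pos + 1)
            aps.append(precisions.mean())            # AP of this query
    n_q = dist.shape[0]
    cmc = {f"Rank-{k}": ranks[k - 1] / n_q for k in topk}
    return cmc, float(np.mean(aps))                  # CMC dict and mAP
```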
Table 1 compares the proposed method with other methods on the SYSU-MM01 dataset.
As shown in Table 1, where 19, 20, 21 and 22 in brackets in the left column denote the years 2019, 2020, 2021 and 2022 respectively, the performance of the proposed method exceeds that of the other methods in both the all-search mode and the indoor-search mode. Specifically, in the all-search mode, the proposed method reaches 64.40% Rank-1 and 60.70% mAP, exceeding the second-best method DTRM by 1.37% and 2.07% respectively. In the indoor-search mode, the proposed method obtains 68.08% on the Rank-1 indicator and 72.26% on the mAP indicator, respectively 1.73% and 0.5% higher than the second-best method DTRM.
In order to more fully verify the performance of the proposed method, it is also compared with other methods on the RegDB dataset.
Table 2 compares the proposed method with other methods on the RegDB dataset.
Overall, the method of the invention achieves significant advantages under both the "visible light image retrieving infrared image" (Visible To Infrared) and the "infrared image retrieving visible light image" (Infrared To Visible) experimental settings. Under the Visible To Infrared setting, the method reaches 95.45% Rank-1 and 99.02% mAP, exceeding the second-best method DTRM by 13.22% and 20.57% respectively. Under the Infrared To Visible setting, the method obtains 94.20% and 98.81% recognition precision on Rank-1 and mAP, exceeding the second-best method DTRM by 15.27% and 23.23% respectively.
FIG. 5 shows the distance distributions of positive and negative sample pairs for another method (the baseline) and for the proposed method; the left side is the experimental effect of the baseline and the right side is the experimental effect of the proposed method. Specifically, as shown in FIG. 5, the left side of the figure shows the distance distribution of the baseline's positive and negative sample pairs, and the right side shows the feature distance distribution of the positive and negative sample pairs of the method of the invention. It can be seen that the overlap between the feature distance distributions of positive and negative sample pairs is smaller for the proposed method, which shows that the method can extract discriminative, mode-consistent pedestrian features.
FIG. 6 compares the retrieval results of the proposed method. In FIG. 6 and FIG. 7, to prevent clear faces from appearing, the face regions are intentionally blurred; this blurring is not an effect of applying the invention and should not be taken as a judgment of its performance. As shown in FIG. 6, which visualizes pedestrian retrieval, given the same pedestrian query image, after adding part6 and part3 the first 5 retrieval results of "Base+MGFE" are more accurate than those of the Base method. After the MSCFM module is added on top of Base+MGFE, the first 4 retrieval results are all hits. Here part6 and part3 denote the cases where the multi-granularity feature enhancement module (MGFE) partitions horizontally into 6 blocks and 3 blocks, respectively.
FIG. 7 compares the class activation maps of the baseline and the proposed method; the method of the invention visualizes the pedestrian regions attended to by the model through class activation mapping. As shown in FIG. 7, compared with the baseline, the method of the invention prompts the network to attend to more comprehensive pedestrian identification features in both the visible light mode and the infrared light mode, whereas the baseline network is disturbed by the background and ultimately fails to extract the pedestrian's identification features.
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (5)

1. An infrared-visible light cross-mode pedestrian re-identification method based on MSCFM and MGFE, characterized by comprising the following steps:
step 1: constructing a visible light mode image set and an infrared mode image set;
step 2: images of the same pedestrian in the two modes are simultaneously input into a convolution network to extract feature maps;
step 3: splicing the visible light mode feature map and the infrared mode feature map;
step 4: K-large-value mask cross attention is used to mine the shared features of the same pedestrian in different modes;
step 5: a self-attention method is used to model the global context of the pedestrian structure within the same mode;
step 6: splicing the obtained self-attention-enhanced feature maps in the batch dimension, and sending the spliced feature maps into a shared feature embedding network for encoding;
step 7: a multi-granularity feature enhancement module performs partitioning operations of different granularities on the feature map to guide the network to attend to the discriminative fine-grained information contained in the feature map, yielding the final pedestrian re-identification model;
step 8: acquiring the image data to be identified, and identifying it with the finally obtained pedestrian re-identification model;
in step 3, the query feature map $Q_{vis}$ of the visible light mode and the key feature map $K_{ir}$ of the infrared mode are obtained, and the semantic relationship attention map $A_{c}$ is obtained through matrix multiplication, with the formula:

$$A_{c} = \sigma\left(Q_{vis} \otimes K_{ir}^{T}\right)$$

where $\sigma$ is the sigmoid activation function, $(\cdot)^{T}$ is the matrix transposition operation, $\otimes$ is the matrix multiplication operation, and each row of $A_{c}$ represents the semantic relationship between the pixel points of the query-mode feature map and those of the queried-mode feature map;
in step 4, a K-large-value mask operation is performed on the element values of each row of the attention map: the K largest semantic relationship values in each row are retained and the values below the K-th largest are set to zero, so as to extract the salient cross-mode shared information of the pedestrian; during training, the value of K ranges from 0 to 2592.
2. The infrared-visible cross-modal pedestrian re-identification method based on MSCFM and MGFE of claim 1, wherein: in step 2, the images of the same pedestrian in the two modes must be input into the network together for processing, that is, the identities of the pedestrian images processed each time by the two-path convolution network are in one-to-one correspondence.
3. The infrared-visible cross-modal pedestrian re-identification method based on MSCFM and MGFE of claim 1, wherein: in step 5, the enhanced cross-mode feature map output by the K-large-value mask cross-attention module is sent to a self-attention module; the attention module obtains the relationships between pixel points in the feature map and applies the relationship weights to the feature map, so as to enhance the discriminative features of the feature map from a global perspective; for the enhanced visible light mode feature map $F_{vis}$, $F_{vis}$ is input to a value convolution layer $W_{V}$, a query convolution layer $W_{Q}$ and a key convolution layer $W_{K}$, obtaining the value feature map $V_{vis}$, query feature map $Q_{vis}$ and key feature map $K_{vis}$ of the visible light mode; the query feature map $Q_{vis}$ and the key feature map $K_{vis}$ undergo a matrix multiplication operation to obtain the self-attention relationship map $A_{vis}$, with the formula:

$$A_{vis} = \sigma\left(Q_{vis} \otimes K_{vis}^{T}\right)$$

where $\sigma$ is the sigmoid activation function, $(\cdot)^{T}$ is the matrix transposition operation, $\otimes$ is the matrix multiplication operation, and each row of $A_{vis}$ represents the semantic relationships between the pixel points of the input feature map $F_{vis}$;

the self-attention relationship map $A_{vis}$ and the value feature map $V_{vis}$ undergo a matrix multiplication operation, the result is input to a convolution layer $W_{O}$ to restore the original channel size, and a residual operation with the value feature map $V_{vis}$ finally yields the self-attention-enhanced feature map $F_{vis}^{SA} = W_{O}\left(A_{vis} \otimes V_{vis}\right) + V_{vis}$; similarly, the infrared light mode $F_{ir}^{SA}$ is obtained.
4. The infrared-visible cross-modal pedestrian re-identification method based on MSCFM and MGFE of claim 1, wherein: in step 6, the mode salient consistency feature mining module splices the mined visible light mode feature map $F_{vis}^{SA}$ and infrared light mode feature map $F_{ir}^{SA}$ in the batch dimension, sends the result into the shared feature embedding network $E_{sh}$ for encoding, obtains the feature vector $f_{i} = \mathrm{GAP}\left(\mathrm{BN}\left(E_{sh}\left(x_{i}\right)\right)\right)$ through a batch normalization layer BN and a global average pooling layer GAP, and applies an identity constraint with a cross-entropy loss; the loss function is:

$$L_{MSCFM} = -\frac{1}{N}\sum_{i=1}^{N}\log p_{i}\left(y_{i}\right), \qquad p_{i} = C_{sh}\left(f_{i}\right)$$

where $y_{i}$ is the true identity label corresponding to the $i$-th sample, $p_{i}$ is the probability score obtained by the classifier for the $i$-th sample, $C_{sh}$ is the identity classifier of the MSCFM module, BN denotes the batch normalization layer, and GAP denotes the global average pooling layer.
5. The infrared-visible cross-modal pedestrian re-identification method based on MSCFM and MGFE of claim 1, wherein: in step 7, the multi-granularity feature enhancement module MGFE performs horizontal partitioning operations of different granularities, in the height dimension, on the feature map encoded by the shared feature embedding network $E_{sh}$.
CN202310772990.XA 2023-06-28 2023-06-28 MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method Active CN116523969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310772990.XA CN116523969B (en) 2023-06-28 2023-06-28 MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method


Publications (2)

Publication Number Publication Date
CN116523969A CN116523969A (en) 2023-08-01
CN116523969B (en) 2023-10-03

Family

ID=87406666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310772990.XA Active CN116523969B (en) 2023-06-28 2023-06-28 MSCFM and MGFE-based infrared-visible light cross-mode pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN116523969B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446305A (en) * 2020-11-10 2021-03-05 云南联合视觉科技有限公司 Pedestrian re-identification method based on classification weight equidistant distribution loss model
CN114220124A (en) * 2021-12-16 2022-03-22 华南农业大学 Near-infrared-visible light cross-modal double-flow pedestrian re-identification method and system
CN114550315A (en) * 2022-01-24 2022-05-27 云南联合视觉科技有限公司 Identity comparison and identification method and device and terminal equipment
CN115100678A (en) * 2022-06-10 2022-09-23 河南大学 Cross-modal pedestrian re-identification method based on channel recombination and attention mechanism
CN115331112A (en) * 2022-08-30 2022-11-11 中国电子科技集团公司第三十八研究所 Infrared and visible light image fusion method and system based on multi-granularity word elements
CN116311364A (en) * 2023-03-07 2023-06-23 西安电子科技大学 Multispectral pedestrian detection method based on cross-modal feature enhancement and confidence fusion


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Chang Z. H. et al., "AFT: Adaptive Fusion Transformer for visible and infrared images", IEEE Transactions on Image Processing, 2077-2092 *
Yong Lan et al., "Infrared dim and small targets detection via self-attention mechanism and pipeline correlator", Digital Signal Processing, 1-10 *
Yujian Feng et al., "Occluded Visible-Infrared Person Re-Identification", IEEE Transactions on Multimedia, 1401-1413 *

Also Published As

Publication number Publication date
CN116523969A (en) 2023-08-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant