CN117994821A - Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning - Google Patents
Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning
- Publication number: CN117994821A (application CN202410406090.8A)
- Authority: CN (China)
- Prior art keywords: mode, visible light, infrared, contrast, pedestrian
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F16/583: Retrieval characterised by using metadata automatically derived from the content
- G06N3/0455: Auto-encoder networks; Encoder-decoder networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/048: Activation functions
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06V10/143: Sensing or illuminating at different wavelengths
- G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/761: Proximity, similarity or dissimilarity measures
- G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Abstract
The invention belongs to the field of computer vision and pattern recognition and is applied to the field of intelligent security; in particular, it relates to a visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning. The mixed-mode contrast learning loss function designed by the invention trains the contrast learning code mapping so that the visible light contrast codes, the visible light intermediate mode contrast codes, the infrared contrast codes and the infrared intermediate mode contrast codes generated by the network maximize the mutual information between the visible light and infrared contrast codes, fully enabling the network to mine feature information that is beneficial to improving identity recognition capability.
Description
Technical Field
The invention belongs to the field of computer vision and pattern recognition and is applied to the field of intelligent security; in particular, it relates to a visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning.
Background
Pedestrian re-identification technology is mainly applied in the security field and is used for matching pedestrian images consistent with the identity of a target pedestrian across non-overlapping cameras. The common pedestrian re-identification task is based on a set of visible light images acquired by visible light cameras. Because visible light cameras depend on good illumination conditions, the application of pedestrian re-identification technology in the security field is limited. Infrared cameras are specially designed to solve the imaging problem under dark conditions and have become a supplementary scheme to visible light cameras in security monitoring. Therefore, the combined visible light-infrared camera scheme is widely applied in the front-end construction of modern security systems and provides a sufficient facility foundation for the development of visible light-infrared cross-mode pedestrian re-identification.
The visible light-infrared camera defaults to the visible light acquisition mode and automatically switches to the infrared mode at night or when the light is dim, so visible light and infrared images cannot be acquired at the same time, i.e. matched cross-mode image pairs are lacking. Therefore, the cross-mode pedestrian re-identification task must solve not only the complex intra-class variations caused by pedestrian posture, occlusion, camera angles, variable lighting conditions and the like, but also the more complex modal differences caused by the different imaging principles. Current mainstream research methods mainly learn discriminative features and reduce modal differences as much as possible by designing different network structures or loss functions, but the huge modal differences and the lack of cross-modal image pairs make network learning very challenging.
Disclosure of Invention
The technical solution of the invention is as follows: a visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning is provided to overcome the defects of the prior art. By maximizing mutual information among positive samples, high-level semantic modality invariance is fully utilized, and features with higher cross-mode matching performance are finally generated. Two kinds of feature-level information compensation are realized, namely discriminative information compensation and cross-mode content-invariant information compensation; on this basis, the designed mixed-mode contrast learning fully mines high-level semantically consistent features, so that features with higher cross-mode matching performance can be generated.
The technical scheme of the invention is as follows:
A visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning comprises the following steps:
In the first step, a visible light mode pedestrian snapshot is obtained through a visible light camera, and an infrared mode pedestrian snapshot is obtained through an infrared camera;
In the second step, the visible light mode pedestrian snapshot obtained in the first step is converted into a visible light intermediate mode snapshot, and the infrared mode pedestrian snapshot obtained in the first step is converted into an infrared intermediate mode snapshot;
In the third step, the visible light mode pedestrian snapshot obtained in the first step is mapped into a unified feature space to generate a visible light mode embedding;
the infrared mode pedestrian snapshot obtained in the first step is mapped into the unified feature space to generate an infrared mode embedding;
the visible light intermediate mode snapshot obtained in the second step is mapped into the unified feature space to generate a visible light intermediate mode embedding;
the infrared intermediate mode snapshot obtained in the second step is mapped into the unified feature space to generate an infrared intermediate mode embedding;
In the fourth step, the visible light mode embedding and the visible light intermediate mode embedding in the unified feature space generated in the third step are fused by concatenation to generate a visible light enhanced feature after discriminative information compensation;
the infrared mode embedding and the infrared intermediate mode embedding in the unified feature space generated in the third step are fused by concatenation to generate an infrared enhanced feature after discriminative information compensation;
In the fifth step, the visible light enhanced features and the infrared enhanced features generated in the fourth step are trained jointly with a cross entropy loss function and a triplet loss function;
In the sixth step, the visible light enhanced feature trained in the fifth step is decoupled into a visible light mode feature and a visible light intermediate mode feature;
the infrared enhanced feature trained in the fifth step is decoupled into an infrared mode feature and an infrared intermediate mode feature;
In the seventh step, the visible light mode feature generated in the sixth step is input into a contrast learning code mapping network to generate a visible light mode contrast code;
the visible light intermediate mode feature generated in the sixth step is input into the contrast learning code mapping network to generate a visible light intermediate mode contrast code;
the infrared mode feature generated in the sixth step is input into the contrast learning code mapping network to generate an infrared mode contrast code;
the infrared intermediate mode feature generated in the sixth step is input into the contrast learning code mapping network to generate an infrared intermediate mode contrast code;
In the eighth step, the visible light mode contrast code, the visible light intermediate mode contrast code, the infrared mode contrast code and the infrared intermediate mode contrast code generated in the seventh step are trained with a mixed-mode contrast learning loss function;
In the ninth step, the cosine similarity between the target pedestrian and each pedestrian feature in the pedestrian retrieval library is calculated using the visible light mode contrast code and the infrared mode contrast code trained in the eighth step, the calculation results are sorted in descending order, and the obtained Rank-1 is taken as the best matching result.
In the second step, the visible light mode pedestrian snapshot is converted into the visible light intermediate mode snapshot and the infrared mode pedestrian snapshot is converted into the infrared intermediate mode snapshot by an intermediate mode construction module;
The intermediate mode construction module comprises a preprocessing module, a mode encoder and a mode decoder;
The preprocessing module is used for converting the visible light pedestrian snapshot into a grayscale image and converting the infrared pedestrian snapshot into a single-channel image;
The mode encoder comprises a visible light mode encoder and an infrared mode encoder; each encoder comprises a 1×1 convolution layer and a ReLU layer, and the parameters of the two encoders (namely the visible light mode encoder and the infrared mode encoder) are independent;
The mode decoder comprises a visible light mode decoder and an infrared mode decoder; each decoder comprises a 1×3 fully connected layer and a ReLU layer, and the parameters of the two decoders are shared;
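A minimal sketch of such an intermediate mode construction module is given below, assuming PyTorch. The grayscale preprocessing and the 1×1 convolution encoders follow the description above, while the hidden channel width and the use of a 1×1 convolution as the shared per-pixel decoder (in place of the stated "1×3 fully connected layer") are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IntermediateModeModule(nn.Module):
    """Maps visible (RGB) and infrared snapshots into a shared intermediate modality.

    Sketch only: channel widths and the decoder form are assumptions, not the patented values.
    """
    def __init__(self, hidden_channels: int = 16):
        super().__init__()
        # Modality-specific encoders: 1x1 convolution + ReLU, parameters NOT shared
        self.vis_encoder = nn.Sequential(nn.Conv2d(1, hidden_channels, kernel_size=1), nn.ReLU(inplace=True))
        self.ir_encoder = nn.Sequential(nn.Conv2d(1, hidden_channels, kernel_size=1), nn.ReLU(inplace=True))
        # Shared decoder: per-pixel linear map back to a 3-channel intermediate-mode image
        self.shared_decoder = nn.Sequential(nn.Conv2d(hidden_channels, 3, kernel_size=1), nn.ReLU(inplace=True))

    @staticmethod
    def to_gray(rgb: torch.Tensor) -> torch.Tensor:
        # Preprocessing: RGB snapshot (B, 3, H, W) -> single-channel grayscale (B, 1, H, W)
        r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
        return 0.299 * r + 0.587 * g + 0.114 * b

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        vis_mid = self.shared_decoder(self.vis_encoder(self.to_gray(rgb)))  # RGB-M
        ir_mid = self.shared_decoder(self.ir_encoder(ir[:, 0:1]))           # IR-M (single channel kept)
        return vis_mid, ir_mid
```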
In the third step, the snapshots are mapped into the unified feature space through a three-branch network structure;
the visible light mode pedestrian snapshot is used as one branch input;
the infrared mode pedestrian snapshot is used as one branch input;
the visible light intermediate mode snapshot and the infrared intermediate mode snapshot are used together as one branch input;
the three-branch network structure comprises a shallow network and a deep network;
the shallow network comprises a residual block, and its parameters are independent for the three branch inputs;
the deep network adopts a ResNet pretrained on the ImageNet dataset, and its parameters are shared by the three branch inputs;
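A minimal sketch of the three-branch structure, assuming a PyTorch ResNet-50 backbone pretrained on ImageNet. Treating conv1 through layer1 as the modality-specific shallow stage and layer2 through layer4 as the shared deep stage is an assumption, since the description only states that the shallow network contains a residual block.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ThreeBranchEncoder(nn.Module):
    """Three-branch encoder: independent shallow stages per branch, shared deep ResNet stages."""
    def __init__(self):
        super().__init__()
        def shallow_stage() -> nn.Sequential:
            # Independent shallow stage (conv1 .. layer1) for one branch
            b = resnet50(weights="IMAGENET1K_V1")
            return nn.Sequential(b.conv1, b.bn1, b.relu, b.maxpool, b.layer1)
        self.shallow_vis = shallow_stage()   # visible light branch
        self.shallow_ir = shallow_stage()    # infrared branch
        self.shallow_mid = shallow_stage()   # intermediate-mode branch (RGB-M and IR-M)
        b = resnet50(weights="IMAGENET1K_V1")
        # Shared deep stages (layer2 .. layer4) followed by global pooling -> 2048-d embedding
        self.deep = nn.Sequential(b.layer2, b.layer3, b.layer4,
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x_vis: torch.Tensor, x_ir: torch.Tensor, x_mid: torch.Tensor):
        f_vis = self.deep(self.shallow_vis(x_vis))
        f_ir = self.deep(self.shallow_ir(x_ir))
        f_mid = self.deep(self.shallow_mid(x_mid))
        return f_vis, f_ir, f_mid
```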
In the fourth step, the fusion by concatenation of the visible light mode embedding and the visible light intermediate mode embedding is formulated as:
$f_V^{aug} = [\, f_V \,;\, f_{VM} \,]$
the fusion by concatenation of the infrared mode embedding and the infrared intermediate mode embedding is formulated as:
$f_I^{aug} = [\, f_I \,;\, f_{IM} \,]$
wherein $f_V^{aug}$ represents the visible light enhanced feature, $f_I^{aug}$ represents the infrared enhanced feature, $[\,\cdot\,;\,\cdot\,]$ represents feature concatenation, $f_V$ represents the visible light mode embedding, $f_{VM}$ represents the visible light intermediate mode embedding, $f_I$ represents the infrared mode embedding, and $f_{IM}$ represents the infrared intermediate mode embedding;
In the sixth step, the decoupling of the visible light enhanced feature into the visible light mode feature and the visible light intermediate mode feature is formulated as:
$(f_V,\; f_{VM}) = \mathrm{decouple}(f_V^{aug})$
the decoupling of the infrared enhanced feature into the infrared mode feature and the infrared intermediate mode feature is formulated as:
$(f_I,\; f_{IM}) = \mathrm{decouple}(f_I^{aug})$
wherein $f_V$ represents the visible light mode feature, $f_{VM}$ represents the visible light intermediate mode feature, $f_I$ represents the infrared mode feature, and $f_{IM}$ represents the infrared intermediate mode feature;
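Since the enhanced feature is a plain concatenation, the decoupling in the sixth step reduces to a split along the feature dimension. A minimal sketch assuming PyTorch and 2048-dimensional embeddings (the width is taken from the contrast-code mapping described in the seventh step):

```python
import torch

def fuse(f_mode: torch.Tensor, f_mid: torch.Tensor) -> torch.Tensor:
    """Discriminative information compensation: concatenate the mode embedding with
    its intermediate-mode embedding, e.g. f_V^aug = [f_V ; f_VM]."""
    return torch.cat([f_mode, f_mid], dim=1)

def decouple(f_aug: torch.Tensor, dim_mode: int = 2048):
    """Split the enhanced feature back into (mode feature, intermediate-mode feature)."""
    return f_aug[:, :dim_mode], f_aug[:, dim_mode:]

# usage sketch
f_v, f_vm = torch.randn(8, 2048), torch.randn(8, 2048)
f_v_aug = fuse(f_v, f_vm)                 # shape (8, 4096)
f_v_back, f_vm_back = decouple(f_v_aug)   # each shape (8, 2048)
```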
In the seventh step, the contrast learning code mapping network comprises a fully connected layer and a ReLU layer, and it converts the feature dimension from 2048 dimensions into a 512-dimensional mixed-mode contrast code, formulated as:
$z_V = \mathrm{Project}(f_V), \quad z_{VM} = \mathrm{Project}(f_{VM}), \quad z_I = \mathrm{Project}(f_I), \quad z_{IM} = \mathrm{Project}(f_{IM})$
wherein $\mathrm{Project}(\cdot)$ denotes the contrast learning code mapping network, $z_V$ represents the visible light mode contrast code, $z_{VM}$ represents the visible light intermediate mode contrast code, $z_I$ represents the infrared mode contrast code, and $z_{IM}$ represents the infrared intermediate mode contrast code;
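A minimal sketch of such a contrast learning code mapping network, assuming PyTorch; the added L2 normalisation before the contrastive loss is an assumption, common for cosine-similarity-based objectives but not stated in the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastCodeProjector(nn.Module):
    """Fully connected layer + ReLU mapping a 2048-d feature to a 512-d contrast code."""
    def __init__(self, in_dim: int = 2048, out_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        z = F.relu(self.fc(feat))
        return F.normalize(z, dim=1)  # assumed L2 normalisation of the contrast code
```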
In the eighth step, the mixed-mode contrast learning loss function is:
$\mathcal{L}_{mcl} = \sum_{i=1}^{N} \dfrac{-1}{|P(i)|} \sum_{p \in P(i)} \log \dfrac{\exp\!\left(z_i \cdot z_p / \tau\right)}{\sum_{a \in A(i)} \exp\!\left(z_i \cdot z_a / \tau\right)}$
wherein $N$ represents the total number of samples of the visible light mode contrast codes, the visible light intermediate mode contrast codes, the infrared mode contrast codes and the infrared intermediate mode contrast codes, and $z_i$ represents the $i$-th sample among them.
$y_i$ represents the label of the $i$-th sample, and $P(i)$ represents the positive samples participating in contrast learning, namely all samples among the above four kinds of contrast codes that share the identity $y_i$.
$A(i)$ represents all samples other than $z_i$ itself.
The parameter $\tau$ is a scaling factor for the similarity measure between samples and is used to adjust the sensitivity to differences between similar samples.
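Read together, the definitions of $N$, $P(i)$, $A(i)$ and $\tau$ describe a supervised contrastive objective over the pooled set of the four kinds of contrast codes. A minimal PyTorch sketch under that reading (the temperature value, the normalisation and the mean reduction over anchors are assumptions):

```python
import torch
import torch.nn.functional as F

def mixed_mode_contrastive_loss(codes: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over mixed-mode contrast codes.

    codes:  (N, D) pooled contrast codes z_V, z_VM, z_I, z_IM
    labels: (N,) pedestrian identity of each code
    tau:    temperature / scaling factor
    """
    codes = F.normalize(codes, dim=1)
    sim = codes @ codes.t() / tau                                   # pairwise similarities
    n = codes.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=codes.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask   # P(i)
    # log-softmax over A(i) = all samples except the anchor itself
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)                 # avoid -inf * 0 on the diagonal
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count            # average over positives per anchor
    return loss.mean()
```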
Advantageous effects
The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning can fully mine cross-mode high-level semantic consistency and improve cross-mode matching performance.
The intermediate mode construction method designed by the invention maps the original mode data into images with a consistent modality through simple mode encoding and decoding.
The invention designs two kinds of feature-level information compensation based on the intermediate mode, namely discriminative information compensation and cross-mode content-invariant information compensation; the discriminative information of the visible light and infrared features can be equally enhanced, and the intermediate-mode-based information compensation effectively improves the discriminative power of the modality-invariant features.
The mixed-mode contrast learning loss function designed by the invention trains the contrast learning code mapping so that the visible light contrast codes, the visible light intermediate mode contrast codes, the infrared contrast codes and the infrared intermediate mode contrast codes generated by the network maximize the mutual information between the visible light and infrared contrast codes, fully enabling the network to mine feature information that is beneficial to improving identity recognition capability.
Drawings
FIG. 1 is a schematic diagram of a three-branch network structure according to the present invention;
FIG. 2 is a schematic diagram of the intermediate mode construction module;
FIG. 3 is a schematic diagram of the generated mixed-mode positive and negative samples;
FIG. 4 is a schematic diagram of the mixed-mode contrast learning loss function.
Detailed Description
The invention is further described below with reference to the drawings and examples.
Examples
A visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning comprises the following steps:
In the first step, a visible light mode pedestrian snapshot is obtained through a visible light camera, and an infrared mode pedestrian snapshot is obtained through an infrared camera;
In the second step, the visible light mode pedestrian snapshot and the infrared mode pedestrian snapshot obtained in the first step are converted into intermediate mode snapshots through the intermediate mode construction module shown in fig. 2;
In the third step, the visible light mode pedestrian snapshot, the intermediate mode snapshots and the infrared mode pedestrian snapshot obtained in the first and second steps are mapped into a unified feature space through the three-branch network structure shown in fig. 1, generating the visible light mode embedding, the intermediate mode embeddings and the infrared mode embedding;
In the fourth step, the visible light intermediate mode part of the intermediate mode embeddings generated in the third step is fused by concatenation with the visible light mode embedding to generate the visible light enhanced feature after discriminative enhancement; the infrared intermediate mode part of the intermediate mode embeddings generated in the third step is fused by concatenation with the infrared mode embedding to generate the infrared enhanced feature after discriminative enhancement;
In the fifth step, the visible light enhanced features and the infrared enhanced features generated in the fourth step are trained with a cross entropy loss function and a triplet loss function;
In the sixth step, the visible light enhanced feature trained in the fifth step is decoupled into the visible light mode feature and the visible light intermediate mode feature; the infrared enhanced feature trained in the fifth step is decoupled into the infrared mode feature and the infrared intermediate mode feature;
In the seventh step, the visible light mode feature, the visible light intermediate mode feature, the infrared mode feature and the infrared intermediate mode feature generated in the sixth step are input into the contrast learning code mapping network, generating the visible light mode contrast code, the visible light intermediate mode contrast code, the infrared mode contrast code and the infrared intermediate mode contrast code; the positive and negative samples of the four generated mixed-mode contrast codes are shown in fig. 3;
In the eighth step, as shown in fig. 4, the four mixed-mode contrast codes generated in the seventh step are trained with the mixed-mode contrast learning loss function;
In the ninth step, the cosine similarity between the target pedestrian and each pedestrian feature in the pedestrian retrieval library is calculated using the visible light mode contrast code and the infrared mode contrast code trained in the eighth step, the calculation results are sorted in descending order, and the obtained Rank-1 is taken as the best matching result.
The effects of the present invention are described below with an experiment on measured data. To evaluate the performance of the proposed method, experiments were performed using the public dataset SYSU-MM01.
Training process:
Input: each training batch contains 4 pedestrians; for each pedestrian, 4 visible light mode snapshots and 4 infrared mode snapshots are randomly selected;
Output: the trained optimal model $M^{*}$;
Initialization: the intermediate mode construction module $G$; the encoder $E$; the contrast learning code mapping network $\mathrm{Project}$;
Step 1: using the intermediate mode construction module $G$, convert the visible light mode pedestrian snapshots RGB and the infrared mode pedestrian snapshots IR into the intermediate mode, generating the visible light intermediate mode pedestrian snapshots RGB-M and the infrared intermediate mode pedestrian snapshots IR-M respectively;
Step 2: input RGB, IR and (RGB-M, IR-M) into the three-branch network encoder $E$ to generate the visible light mode embedding $f_V$, the infrared mode embedding $f_I$ and (the visible light intermediate mode embedding $f_{VM}$, the infrared intermediate mode embedding $f_{IM}$);
Step 3: concatenate $f_V$ with $f_{VM}$ and concatenate $f_I$ with $f_{IM}$ to generate the visible light enhanced feature $f_V^{aug}$ and the infrared enhanced feature $f_I^{aug}$;
Step 4: for the visible light enhanced feature $f_V^{aug}$ and the infrared enhanced feature $f_I^{aug}$, calculate the cross entropy loss and the triplet loss $\mathcal{L}_{id}$;
Step 5: decouple the visible light enhanced feature $f_V^{aug}$ into the visible light mode feature $f_V$ and the visible light intermediate mode feature $f_{VM}$; decouple the infrared enhanced feature $f_I^{aug}$ into the infrared mode feature $f_I$ and the infrared intermediate mode feature $f_{IM}$;
Step 6: input the visible light mode feature $f_V$, the visible light intermediate mode feature $f_{VM}$, the infrared mode feature $f_I$ and the infrared intermediate mode feature $f_{IM}$ into the contrast learning code mapping network $\mathrm{Project}$ to generate the mixed-mode contrast codes $z_V$, $z_{VM}$, $z_I$, $z_{IM}$;
Step 7: for the mixed-mode contrast codes, calculate the mixed-mode contrast learning loss $\mathcal{L}_{mcl}$;
Step 8: calculate the total loss from $\mathcal{L}_{id}$ and $\mathcal{L}_{mcl}$, and update the model parameters through back propagation and optimization;
The above steps are repeated, and after 200 iterations the model parameters with the best effect are saved as the optimal model $M^{*}$; a condensed code sketch of one training iteration follows.
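The sketch below condenses one training iteration of the above listing, assuming PyTorch. The encode_* helpers, the linear classifier head and the unweighted sum of the two losses are assumed names and choices, not taken from the patent; the fuse, projection and loss components correspond to the sketches given earlier.

```python
import torch
import torch.nn.functional as F

def train_step(G, E, project, classifier, optimizer, rgb, ir, labels,
               triplet_loss, mixed_mode_contrastive_loss):
    """One training iteration; G, E and project follow the listing above."""
    # Step 1: build intermediate mode snapshots RGB-M and IR-M
    rgb_m, ir_m = G(rgb, ir)
    # Step 2: three-branch encoding into the unified feature space (hypothetical helper names)
    f_v = E.encode_visible(rgb)
    f_i = E.encode_infrared(ir)
    f_vm = E.encode_intermediate(rgb_m)
    f_im = E.encode_intermediate(ir_m)
    # Step 3: discriminative information compensation by concatenation
    f_v_aug = torch.cat([f_v, f_vm], dim=1)
    f_i_aug = torch.cat([f_i, f_im], dim=1)
    feats = torch.cat([f_v_aug, f_i_aug], dim=0)
    ids = torch.cat([labels, labels], dim=0)
    # Step 4: identity losses (cross entropy + triplet) on the enhanced features
    loss_id = F.cross_entropy(classifier(feats), ids) + triplet_loss(feats, ids)
    # Steps 5-6: decoupling recovers f_v, f_vm, f_i, f_im, which are projected to contrast codes
    codes = torch.cat([project(f) for f in (f_v, f_vm, f_i, f_im)], dim=0)
    code_ids = labels.repeat(4)
    # Step 7: mixed-mode contrast learning loss
    loss_mcl = mixed_mode_contrastive_loss(codes, code_ids)
    # Step 8: total loss (unweighted sum is an assumption), back propagation and update
    loss = loss_id + loss_mcl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```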
The testing process comprises the following steps:
Step 1: input a pedestrian snapshot acquired by the infrared camera into the optimal model $M^{*}$;
Step 2: the model $M^{*}$ outputs the infrared mode contrast code $z_I^{q}$ of the query;
Step 3: input all visible light pictures in the test retrieval library into the model $M^{*}$ to generate the visible light mode contrast codes $z_V^{i}$, where $z_V^{i}$ is the visible light mode contrast code of the $i$-th visible light picture sample in the retrieval library;
Step 4: calculate the cosine similarity between $z_I^{q}$ and each visible light sample $z_V^{i}$ and sort in descending order; Rank-1 is the best matching result.
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning is characterized by comprising the following steps:
in the first step, a visible light mode pedestrian snapshot is obtained through a visible light camera, and an infrared mode pedestrian snapshot is obtained through an infrared camera;
in the second step, the visible light mode pedestrian snapshot obtained in the first step is converted into a visible light intermediate mode snapshot, and the infrared mode pedestrian snapshot obtained in the first step is converted into an infrared intermediate mode snapshot;
in the third step, the visible light mode pedestrian snapshot obtained in the first step is mapped into a unified feature space to generate a visible light mode embedding;
the infrared mode pedestrian snapshot obtained in the first step is mapped into the unified feature space to generate an infrared mode embedding;
the visible light intermediate mode snapshot obtained in the second step is mapped into the unified feature space to generate a visible light intermediate mode embedding;
the infrared intermediate mode snapshot obtained in the second step is mapped into the unified feature space to generate an infrared intermediate mode embedding;
in the fourth step, the visible light mode embedding and the visible light intermediate mode embedding in the unified feature space generated in the third step are fused by concatenation to generate a visible light enhanced feature after discriminative information compensation;
the infrared mode embedding and the infrared intermediate mode embedding in the unified feature space generated in the third step are fused by concatenation to generate an infrared enhanced feature after discriminative information compensation;
in the fifth step, the visible light enhanced features and the infrared enhanced features generated in the fourth step are trained jointly with a cross entropy loss function and a triplet loss function;
in the sixth step, the visible light enhanced feature trained in the fifth step is decoupled into a visible light mode feature and a visible light intermediate mode feature;
the infrared enhanced feature trained in the fifth step is decoupled into an infrared mode feature and an infrared intermediate mode feature;
in the seventh step, the visible light mode feature generated in the sixth step is input into a contrast learning code mapping network to generate a visible light mode contrast code;
the visible light intermediate mode feature generated in the sixth step is input into the contrast learning code mapping network to generate a visible light intermediate mode contrast code;
the infrared mode feature generated in the sixth step is input into the contrast learning code mapping network to generate an infrared mode contrast code;
the infrared intermediate mode feature generated in the sixth step is input into the contrast learning code mapping network to generate an infrared intermediate mode contrast code;
the visible light intermediate mode contrast code is used for realizing cross-mode content-invariant information compensation;
the infrared intermediate mode contrast code is used for realizing cross-mode content-invariant information compensation;
in the eighth step, the visible light mode contrast code, the visible light intermediate mode contrast code, the infrared mode contrast code and the infrared intermediate mode contrast code generated in the seventh step are trained with a mixed-mode contrast learning loss function;
in the ninth step, the cosine similarity between the target pedestrian and each pedestrian feature in the pedestrian retrieval library is calculated using the visible light mode contrast code and the infrared mode contrast code trained in the eighth step, the calculation results are sorted in descending order, and the obtained Rank-1 is taken as the best matching result.
2. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 1, characterized in that:
in the second step, the visible light mode pedestrian snapshot is converted into the visible light intermediate mode snapshot and the infrared mode pedestrian snapshot is converted into the infrared intermediate mode snapshot by an intermediate mode construction module.
3. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
the intermediate mode construction module comprises a preprocessing module, a mode encoder and a mode decoder;
the preprocessing module is used for converting the visible light pedestrian snapshot into a grayscale image and converting the infrared pedestrian snapshot into a single-channel image;
the mode encoder comprises a visible light mode encoder and an infrared mode encoder; each encoder comprises a 1×1 convolution layer and a ReLU layer, and the parameters of the two encoders are independent;
the mode decoder comprises a visible light mode decoder and an infrared mode decoder; each decoder comprises a 1×3 fully connected layer and a ReLU layer, and the parameters of the two decoders are shared.
4. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the third step, the snapshots are mapped into the unified feature space through a three-branch network structure;
the visible light mode pedestrian snapshot is used as one branch input;
the infrared mode pedestrian snapshot is used as one branch input;
the visible light intermediate mode snapshot and the infrared intermediate mode snapshot are used together as one branch input;
the three-branch network structure comprises a shallow network and a deep network;
the shallow network comprises a residual block, and its parameters are independent for the three branch inputs;
the deep network adopts a ResNet pretrained on the ImageNet dataset, and its parameters are shared by the three branch inputs.
5. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the fourth step, the fusion by concatenation of the visible light mode embedding and the visible light intermediate mode embedding is formulated as:
$f_V^{aug} = [\, f_V \,;\, f_{VM} \,]$
the fusion by concatenation of the infrared mode embedding and the infrared intermediate mode embedding is formulated as:
$f_I^{aug} = [\, f_I \,;\, f_{IM} \,]$
wherein $f_V^{aug}$ represents the visible light enhanced feature, $f_I^{aug}$ represents the infrared enhanced feature, $[\,\cdot\,;\,\cdot\,]$ represents feature concatenation, $f_V$ represents the visible light mode embedding, $f_{VM}$ represents the visible light intermediate mode embedding, $f_I$ represents the infrared mode embedding, and $f_{IM}$ represents the infrared intermediate mode embedding.
6. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the sixth step, the decoupling of the visible light enhanced feature into the visible light mode feature and the visible light intermediate mode feature is formulated as:
$(f_V,\; f_{VM}) = \mathrm{decouple}(f_V^{aug})$
the decoupling of the infrared enhanced feature into the infrared mode feature and the infrared intermediate mode feature is formulated as:
$(f_I,\; f_{IM}) = \mathrm{decouple}(f_I^{aug})$
wherein $f_V$ represents the visible light mode feature, $f_{VM}$ represents the visible light intermediate mode feature, $f_I$ represents the infrared mode feature, and $f_{IM}$ represents the infrared intermediate mode feature.
7. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the seventh step, the contrast learning code mapping network comprises a fully connected layer and a ReLU layer, and it converts the feature dimension from 2048 dimensions into a 512-dimensional mixed-mode contrast code, formulated as:
$z_V = \mathrm{Project}(f_V), \quad z_{VM} = \mathrm{Project}(f_{VM}), \quad z_I = \mathrm{Project}(f_I), \quad z_{IM} = \mathrm{Project}(f_{IM})$
wherein $\mathrm{Project}(\cdot)$ denotes the contrast learning code mapping network, $z_V$ represents the visible light mode contrast code, $z_{VM}$ represents the visible light intermediate mode contrast code, $z_I$ represents the infrared mode contrast code, and $z_{IM}$ represents the infrared intermediate mode contrast code.
8. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the eighth step, the mixed-mode contrast learning loss function is:
$\mathcal{L}_{mcl} = \sum_{i=1}^{N} \dfrac{-1}{|P(i)|} \sum_{p \in P(i)} \log \dfrac{\exp\!\left(z_i \cdot z_p / \tau\right)}{\sum_{a \in A(i)} \exp\!\left(z_i \cdot z_a / \tau\right)}$
wherein $N$ represents the total number of samples of the visible light mode contrast codes, the visible light intermediate mode contrast codes, the infrared mode contrast codes and the infrared intermediate mode contrast codes, and $z_i$ represents the $i$-th sample among them;
$y_i$ represents the label of the $i$-th sample, and $P(i)$ represents the positive samples participating in contrast learning, namely all samples among the above four kinds of contrast codes that share the identity $y_i$;
$A(i)$ represents all samples other than $z_i$ itself;
the parameter $\tau$ is a scaling factor for the similarity measure between samples and is used to adjust the sensitivity to differences between similar samples.
Priority Applications / Applications Claiming Priority (1)
- Application CN202410406090.8A, granted as CN117994821B (en); priority date 2024-04-07; filing date 2024-04-07; title: Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning
Publications (2)
- CN117994821A, published 2024-05-07
- CN117994821B, published 2024-07-26
Family
- ID=90901045
Family Applications (1)
- CN202410406090.8A, filed 2024-04-07, status: Active; title: Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning
Country Status (1)
- CN: CN117994821B (en)
Citations (6)
- US20190180467A1 (Beijing Didi Infinity Technology And Development Co., Ltd.), priority 2017-12-11, published 2019-06-13: Systems and methods for identifying and positioning objects around a vehicle
- CN115862064A, priority 2022-11-30, published 2023-03-28: Visible light-infrared cross-modal pedestrian re-identification method and system
- CN116311384A, priority 2023-05-16, published 2023-06-23: Cross-modal pedestrian re-recognition method and device based on intermediate mode and characterization learning
- CN117351518A, priority 2023-09-26, published 2024-01-05: Method and system for identifying unsupervised cross-modal pedestrian based on level difference
- CN117576729A, priority 2023-11-27, published 2024-02-20: Visible light-infrared pedestrian re-identification method based on multi-stage auxiliary learning
- CN117746467A, priority 2024-01-05, published 2024-03-22: Modal enhancement and compensation cross-modal pedestrian re-recognition method
Also Published As
- CN117994821B (en), published 2024-07-26
Similar Documents
- CN108520216B: Gait image-based identity recognition method
- CN111539255A: Cross-modal pedestrian re-identification method based on multi-modal image style conversion
- CN109635726B: Landslide identification method based on combination of symmetric deep network and multi-scale pooling
- CN112651940B: Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
- CN113743544A: Cross-modal neural network construction method, pedestrian retrieval method and system
- CN116798070A: Cross-mode pedestrian re-recognition method based on spectrum sensing and attention mechanism
- CN118116035B: Modal imbalance characteristic conversion cross-modal pedestrian re-identification method
- CN114898429B: Thermal infrared-visible light cross-modal face recognition method
- CN112766217A: Cross-modal pedestrian re-identification method based on disentanglement and feature level difference learning
- CN117333908A: Cross-modal pedestrian re-recognition method based on attitude feature alignment
- CN116452805A: Transformer-based RGB-D semantic segmentation method of cross-modal fusion network
- CN118115947A: Cross-mode pedestrian re-identification method based on random color conversion and multi-scale feature fusion
- CN116863223A: Method for classifying remote sensing image scenes by embedding semantic attention features into Swin Transformer network
- CN112836605B: Near-infrared and visible light cross-modal face recognition method based on modal augmentation
- CN117576729A: Visible light-infrared pedestrian re-identification method based on multi-stage auxiliary learning
- CN117994821B: Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning
- CN112330562A: Heterogeneous remote sensing image transformation method and system
- CN117173595A: Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7
- CN116168418A: Multi-mode target perception and re-identification method for image
- Niu et al.: Real-time recognition and location of indoor objects
- CN117994822B: Cross-mode pedestrian re-identification method based on auxiliary mode enhancement and multi-scale feature fusion
- Song et al.: A Semantic Segmentation Method for Road Environment Images Based on Hybrid Convolutional Auto-Encoder
- CN117912099A: Visible light-infrared cross-mode pedestrian re-identification method based on mode invariant feature enhancement
- CN117409268A: Scene recognition method and system based on mutual attention fusion and distillation mechanism
- CN116343122A: Cross-modal pedestrian re-recognition method based on multi-modal common feature space exploration
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant