CN117994821A - Visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning


Info

Publication number
CN117994821A
Authority
CN
China
Prior art keywords
mode
visible light
infrared
contrast
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410406090.8A
Other languages
Chinese (zh)
Other versions
CN117994821B (en)
Inventor
张腊
孙健
王钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology (BIT)
Priority to CN202410406090.8A
Publication of CN117994821A
Application granted
Publication of CN117994821B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/0455 - Auto-encoder networks; Encoder-decoder networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/10 - Image acquisition
    • G06V 10/12 - Details of acquisition arrangements; Constructional details thereof
    • G06V 10/14 - Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V 10/143 - Sensing or illuminating at different wavelengths
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions


Abstract

The invention belongs to the field of computer vision and pattern recognition and is applied to intelligent security; in particular, it relates to a visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning. The hybrid-modality contrastive learning loss function designed by the invention trains the contrast-code mapping network on the visible contrast codes, visible intermediate-modality contrast codes, infrared contrast codes and infrared intermediate-modality contrast codes generated by the network, maximizing the mutual information between the visible and infrared contrast codes and fully driving the network to mine feature information that benefits identity recognition.

Description

Visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning
Technical Field
The invention belongs to the field of computer vision and pattern recognition and is applied to intelligent security; in particular, it relates to a visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning.
Background
Pedestrian re-identification is mainly applied in the security field to match pedestrian images with the same identity as a target pedestrian across non-overlapping cameras. The conventional pedestrian re-identification task is based on sets of visible-light images captured by visible-light cameras. Because visible-light cameras depend on good illumination conditions, the application of pedestrian re-identification in the security field is limited. Infrared cameras are designed to image under dim lighting and have become a complementary scheme to visible-light cameras in security surveillance. The combined visible light-infrared camera scheme is therefore widely applied in the front-end construction of modern security systems, providing a sufficient facility foundation for the development of visible light-infrared cross-modal pedestrian re-identification.
A visible light-infrared camera defaults to visible-light capture and switches automatically to infrared mode at night or when the light is dim, so visible and infrared images cannot be captured at the same time; that is, matched cross-modal image pairs are lacking. The cross-modal pedestrian re-identification task must therefore solve not only the complex intra-class variation caused by pedestrian pose, occlusion, camera viewpoint and changing illumination, but also the more severe modality discrepancy caused by the different imaging principles. Current mainstream research methods mainly learn discriminative features and reduce the modality discrepancy as much as possible by designing different network structures or loss functions, but the large modality discrepancy and the lack of cross-modal image pairs make network learning very challenging.
Disclosure of Invention
The technical problem solved by the invention is as follows: a visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning is provided to overcome the defects of the prior art. By maximizing the mutual information among positive samples, the modality invariance of high-level semantics is fully exploited, ultimately generating features with better cross-modal matching performance. Two kinds of feature-level information compensation are realized, namely discrimination information compensation and cross-modal content-invariant information compensation; on this basis, hybrid-modality contrastive learning is designed to fully mine semantically consistent high-level features, so that features with better cross-modal matching performance can be generated.
The technical scheme of the invention is as follows:
A visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning comprises the following steps:
In the first step, a visible-modality pedestrian snapshot is captured by a visible-light camera and an infrared-modality pedestrian snapshot is captured by an infrared camera.
In the second step, the visible-modality pedestrian snapshot obtained in the first step is converted into a visible intermediate-modality snapshot, and the infrared-modality pedestrian snapshot obtained in the first step is converted into an infrared intermediate-modality snapshot.
In the third step, the visible-modality pedestrian snapshot obtained in the first step is mapped into a unified feature space to generate a visible-modality embedding;
the infrared-modality pedestrian snapshot obtained in the first step is mapped into the unified feature space to generate an infrared-modality embedding;
the visible intermediate-modality snapshot obtained in the second step is mapped into the unified feature space to generate a visible intermediate-modality embedding;
the infrared intermediate-modality snapshot obtained in the second step is mapped into the unified feature space to generate an infrared intermediate-modality embedding.
In the fourth step, the visible-modality embedding and the visible intermediate-modality embedding generated in the third step are fused by concatenation in the unified feature space to generate a discrimination-compensated visible enhanced feature;
the infrared-modality embedding and the infrared intermediate-modality embedding generated in the third step are fused by concatenation to generate a discrimination-compensated infrared enhanced feature.
In the fifth step, the visible enhanced feature and the infrared enhanced feature generated in the fourth step are trained jointly with a cross-entropy loss function and a triplet loss function.
In the sixth step, the visible enhanced feature trained in the fifth step is decoupled into a visible-modality feature and a visible intermediate-modality feature;
the infrared enhanced feature trained in the fifth step is decoupled into an infrared-modality feature and an infrared intermediate-modality feature.
In the seventh step, the visible-modality feature generated in the sixth step is input into a contrast-code mapping network to generate a visible-modality contrast code;
the visible intermediate-modality feature generated in the sixth step is input into the contrast-code mapping network to generate a visible intermediate-modality contrast code;
the infrared-modality feature generated in the sixth step is input into the contrast-code mapping network to generate an infrared-modality contrast code;
the infrared intermediate-modality feature generated in the sixth step is input into the contrast-code mapping network to generate an infrared intermediate-modality contrast code.
In the eighth step, the visible-modality contrast code, the visible intermediate-modality contrast code, the infrared-modality contrast code and the infrared intermediate-modality contrast code generated in the seventh step are trained with a hybrid-modality contrastive learning loss function.
In the ninth step, using the visible-modality and infrared-modality contrast codes trained in the eighth step, the cosine similarity between the target pedestrian and each pedestrian feature in the retrieval gallery is computed and the results are sorted in descending order; the resulting Rank-1 is taken as the best matching result.
In the second step, the visible-modality pedestrian snapshot is converted into the visible intermediate-modality snapshot and the infrared-modality pedestrian snapshot into the infrared intermediate-modality snapshot by an intermediate-modality construction module;
the intermediate-modality construction module comprises a preprocessing module, a modality encoder and a modality decoder;
the preprocessing module converts the visible pedestrian snapshot into a grayscale image and the infrared pedestrian snapshot into a single-channel image;
the modality encoder comprises a visible-modality encoder and an infrared-modality encoder; each encoder comprises a 1×1 convolution layer and a ReLU layer, and the two encoders have independent parameters;
the modality decoder comprises a visible-modality decoder and an infrared-modality decoder; each decoder comprises a 1×3 fully connected layer and a ReLU layer, and the two decoders share parameters;
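For illustration, the intermediate-modality construction module described above could be sketched in PyTorch as follows; this sketch is not part of the patent text, and the module name, the hidden channel width, and the use of a 1×1 convolution as a stand-in for the 1×3 fully connected decoder layer are assumptions:

import torch
import torch.nn as nn

class IntermediateModalityModule(nn.Module):
    # Sketch: preprocessing to a single channel, modality-specific
    # 1x1 convolution + ReLU encoders with independent parameters,
    # and a decoder shared by both modalities.
    def __init__(self, hidden_ch=16):
        super().__init__()
        self.enc_vis = nn.Sequential(nn.Conv2d(1, hidden_ch, 1), nn.ReLU())
        self.enc_ir = nn.Sequential(nn.Conv2d(1, hidden_ch, 1), nn.ReLU())
        # Shared decoder; the patent specifies a 1x3 fully connected
        # layer + ReLU, approximated here by a 1x1 convolution + ReLU.
        self.dec = nn.Sequential(nn.Conv2d(hidden_ch, 3, 1), nn.ReLU())

    def forward(self, rgb, ir):
        gray = rgb.mean(dim=1, keepdim=True)   # visible snapshot -> grayscale (channel average)
        ir_1ch = ir.mean(dim=1, keepdim=True)  # infrared snapshot -> single channel
        rgb_m = self.dec(self.enc_vis(gray))   # visible intermediate-modality snapshot
        ir_m = self.dec(self.enc_ir(ir_1ch))   # infrared intermediate-modality snapshot
        return rgb_m, ir_m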
In the third step, the snapshots are mapped into the unified feature space by a three-branch network structure;
the visible-modality pedestrian snapshot is fed to one branch;
the infrared-modality pedestrian snapshot is fed to a second branch;
the visible intermediate-modality snapshot and the infrared intermediate-modality snapshot are fed together to a third branch;
the three-branch network structure comprises a shallow network and a deep network;
the shallow network comprises a residual block, with independent parameters for each of the three branch inputs;
the deep network adopts a ResNet pretrained on the ImageNet dataset, with parameters shared across the three branch inputs;
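A minimal PyTorch sketch of such a three-branch encoder follows; the choice of ResNet-50 and of layer1 as the branch-specific shallow stage is an assumption, since the patent only specifies a residual block with independent parameters per branch and a shared ImageNet-pretrained ResNet trunk:

import torch.nn as nn
from torchvision import models

class ThreeBranchEncoder(nn.Module):
    # Sketch: three branch-specific shallow stages (independent parameters;
    # both intermediate modalities share the third branch) followed by a
    # shared ImageNet-pretrained deep trunk producing 2048-d embeddings.
    def __init__(self):
        super().__init__()
        def shallow_stage():
            r = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
            return nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)
        self.shallow_vis = shallow_stage()  # visible branch
        self.shallow_ir = shallow_stage()   # infrared branch
        self.shallow_mid = shallow_stage()  # shared branch for both intermediate modalities
        r = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.deep = nn.Sequential(r.layer2, r.layer3, r.layer4,
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, rgb, ir, rgb_m, ir_m):
        f_v = self.deep(self.shallow_vis(rgb))     # visible-modality embedding
        f_i = self.deep(self.shallow_ir(ir))       # infrared-modality embedding
        f_vm = self.deep(self.shallow_mid(rgb_m))  # visible intermediate-modality embedding
        f_im = self.deep(self.shallow_mid(ir_m))   # infrared intermediate-modality embedding
        return f_v, f_i, f_vm, f_im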
In the fourth step, the visible-modality embedding and the visible intermediate-modality embedding are fused by concatenation as

$\tilde{f}_V = \left[ f_V \, ; \, f_{VM} \right],$

and the infrared-modality embedding and the infrared intermediate-modality embedding are fused by concatenation as

$\tilde{f}_I = \left[ f_I \, ; \, f_{IM} \right],$

where $\tilde{f}_V$ denotes the visible enhanced feature, $\tilde{f}_I$ the infrared enhanced feature, $[\,\cdot\,;\,\cdot\,]$ feature concatenation, $f_V$ the visible-modality embedding, $f_{VM}$ the visible intermediate-modality embedding, $f_I$ the infrared-modality embedding, and $f_{IM}$ the infrared intermediate-modality embedding;
In the sixth step, the visible enhanced feature is decoupled into the visible-modality feature and the visible intermediate-modality feature as

$\left( f_V , f_{VM} \right) = \operatorname{split}\left( \tilde{f}_V \right),$

and the infrared enhanced feature is decoupled into the infrared-modality feature and the infrared intermediate-modality feature as

$\left( f_I , f_{IM} \right) = \operatorname{split}\left( \tilde{f}_I \right),$

where $f_V$ denotes the visible-modality feature, $f_{VM}$ the visible intermediate-modality feature, $f_I$ the infrared-modality feature, $f_{IM}$ the infrared intermediate-modality feature, and $\operatorname{split}(\cdot)$ inverts the concatenation;
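In code, the fusion and decoupling above amount to channel-wise concatenation and the inverse split; the helper names below are illustrative:

import torch

def fuse(f_mod, f_mid):
    # Discrimination information compensation: concatenate a modality
    # embedding with its intermediate-modality embedding.
    return torch.cat([f_mod, f_mid], dim=1)

def decouple(f_aug):
    # Split an enhanced feature back into its modality part and its
    # intermediate-modality part (the two halves of the concatenation).
    half = f_aug.shape[1] // 2
    return f_aug[:, :half], f_aug[:, half:]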
In the seventh step, the contrast-code mapping network comprises a fully connected layer and a ReLU layer and converts the 2048-dimensional features into 512-dimensional hybrid-modality contrast codes:

$z_V = \operatorname{Proj}\left( f_V \right), \quad z_{VM} = \operatorname{Proj}\left( f_{VM} \right), \quad z_I = \operatorname{Proj}\left( f_I \right), \quad z_{IM} = \operatorname{Proj}\left( f_{IM} \right),$

where $z_V$ denotes the visible-modality contrast code, $z_{VM}$ the visible intermediate-modality contrast code, $z_I$ the infrared-modality contrast code, and $z_{IM}$ the infrared intermediate-modality contrast code;
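A sketch of such a projection head, assuming the stated 2048-to-512 dimensionality:

import torch.nn as nn

# Contrast-code mapping network Proj: fully connected layer + ReLU,
# mapping 2048-d features to 512-d hybrid-modality contrast codes.
proj = nn.Sequential(nn.Linear(2048, 512), nn.ReLU())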
In the eighth step, the hybrid-modality contrastive learning loss function is

$\mathcal{L}_{hmc} = - \sum_{i=1}^{N} \frac{1}{\left| P(i) \right|} \sum_{p \in P(i)} \log \frac{\exp\left( z_i \cdot z_p / \tau \right)}{\sum_{a \in A(i)} \exp\left( z_i \cdot z_a / \tau \right)},$

where $N$ denotes the total number of samples pooled over the visible-modality, visible intermediate-modality, infrared-modality and infrared intermediate-modality contrast codes, and $z_i$ denotes the $i$-th sample among them.
With $y_i$ denoting the label of the $i$-th sample, $P(i)$ denotes the positive set participating in contrastive learning, comprising all samples among the four kinds of contrast codes that share the identity $y_i$.
$A(i)$ denotes all samples other than $i$ itself.
The parameter $\tau$ is a scaling factor for the similarity measure between samples, used to adjust the sensitivity to differences between similar samples.
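The loss defined above has the form of a supervised contrastive loss over the pooled contrast codes; a PyTorch sketch under that reading follows (the function name, the L2 normalization of the codes, and the default temperature value are assumptions):

import torch
import torch.nn.functional as F

def hybrid_modality_contrastive_loss(codes, labels, tau=0.1):
    # codes: (N, 512) contrast codes pooled over the four kinds of codes;
    # labels: (N,) identity labels; tau: temperature/scaling factor.
    z = F.normalize(codes, dim=1)
    sim = z @ z.t() / tau
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # A(i): everything but i itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask  # P(i)
    # Average the log-probability over each anchor's positive set.
    loss = -(log_prob.masked_fill(~pos, 0.0)).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()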
Advantageous effects
The visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning can fully mine cross-modal high-level semantic consistency and improves cross-modal matching performance.
The intermediate-modality construction method designed by the invention maps the original modality data into modality-consistent images through simple modality encoding and decoding.
The invention designs two kinds of intermediate-modality-based feature-level information compensation, namely discrimination information compensation and cross-modal content-invariant information compensation; the discrimination information of the visible and infrared features is enhanced equally, and the intermediate-modality-based information compensation effectively improves the discriminability of the modality-invariant features.
The hybrid-modality contrastive learning loss function designed by the invention trains the contrast-code mapping network on the visible contrast codes, visible intermediate-modality contrast codes, infrared contrast codes and infrared intermediate-modality contrast codes generated by the network, maximizing the mutual information between the visible and infrared contrast codes and fully driving the network to mine feature information that benefits identity recognition.
Drawings
FIG. 1 is a schematic diagram of the three-branch network structure of the invention;
FIG. 2 is a schematic diagram of the intermediate-modality construction module;
FIG. 3 is a schematic diagram of the positive and negative samples of the generated hybrid-modality contrast codes;
FIG. 4 is a schematic diagram of the contrastive learning loss function.
Detailed Description
The invention is further described below with reference to the drawings and examples.
Examples
A visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning comprises the following steps:
In the first step, a visible-modality pedestrian snapshot is captured by a visible-light camera and an infrared-modality pedestrian snapshot is captured by an infrared camera.
In the second step, the visible-modality and infrared-modality pedestrian snapshots obtained in the first step are converted into intermediate-modality snapshots by the intermediate-modality construction module shown in FIG. 2.
In the third step, the visible-modality, intermediate-modality and infrared-modality pedestrian snapshots obtained in the first and second steps are mapped into a unified feature space by the three-branch network structure shown in FIG. 1, generating the visible-modality, intermediate-modality and infrared-modality embeddings.
In the fourth step, the visible intermediate-modality part of the intermediate-modality embedding in the unified feature space generated in the third step is fused by concatenation with the visible-modality embedding to generate the discrimination-enhanced visible enhanced feature; the infrared intermediate-modality part of the intermediate-modality embedding is fused by concatenation with the infrared-modality embedding to generate the discrimination-enhanced infrared enhanced feature.
In the fifth step, the visible enhanced feature and the infrared enhanced feature generated in the fourth step are trained with a cross-entropy loss function and a triplet loss function.
In the sixth step, the visible enhanced feature trained in the fifth step is decoupled into the visible-modality feature and the visible intermediate-modality feature; the infrared enhanced feature trained in the fifth step is decoupled into the infrared-modality feature and the infrared intermediate-modality feature.
In the seventh step, the visible-modality and visible intermediate-modality features and the infrared-modality and infrared intermediate-modality features generated in the sixth step are input into the contrast-code mapping network, generating the visible-modality, visible intermediate-modality, infrared-modality and infrared intermediate-modality contrast codes; the positive and negative samples of the four generated hybrid-modality contrast codes are shown in FIG. 3.
In the eighth step, as shown in FIG. 4, the four hybrid-modality contrast codes generated in the seventh step are trained with the hybrid-modality contrastive learning loss function.
In the ninth step, using the visible-modality and infrared-modality contrast codes trained in the eighth step, the cosine similarity between the target pedestrian and each pedestrian feature in the retrieval gallery is computed and the results are sorted in descending order; Rank-1 is taken as the best matching result.
The effects of the invention are described below with measured-data experiments. To evaluate the performance of the proposed method, experiments were performed on the public dataset SYSU-MM01.
Training process:
Input: each training batch contains 4 pedestrian identities; for each identity, 4 visible-modality snapshots and 4 infrared-modality snapshots are randomly selected;
Output: the trained optimal model $\theta^{*}$;
Initialization: intermediate-modality construction module $G$; encoder $E$; contrast-code mapping network $\operatorname{Proj}$;
Step 1: use the intermediate-modality construction module $G$ to convert the visible-modality pedestrian snapshots RGB and the infrared-modality pedestrian snapshots IR into intermediate modalities, generating the visible intermediate-modality pedestrian snapshots RGB-M and the infrared intermediate-modality pedestrian snapshots IR-M;
Step 2: input RGB, IR and (RGB-M, IR-M) into the three-branch network encoder $E$ to generate the visible-modality embedding $f_V$, the infrared-modality embedding $f_I$, and the intermediate-modality embeddings ($f_{VM}$, $f_{IM}$);
Step 3: concatenate $f_V$ with $f_{VM}$ and $f_I$ with $f_{IM}$ to generate the visible enhanced feature $\tilde{f}_V$ and the infrared enhanced feature $\tilde{f}_I$;
Step 4: compute the cross-entropy loss $\mathcal{L}_{ce}$ and the triplet loss $\mathcal{L}_{tri}$ on the visible enhanced feature $\tilde{f}_V$ and the infrared enhanced feature $\tilde{f}_I$;
Step 5: decouple $\tilde{f}_V$ into the visible-modality feature $f_V$ and the visible intermediate-modality feature $f_{VM}$, and decouple $\tilde{f}_I$ into the infrared-modality feature $f_I$ and the infrared intermediate-modality feature $f_{IM}$;
Step 6: input $f_V$, $f_{VM}$, $f_I$ and $f_{IM}$ into the contrast-code mapping network $\operatorname{Proj}$ to generate the hybrid-modality codes $Z = \{ z_V, z_{VM}, z_I, z_{IM} \}$;
Step 7: compute the hybrid-modality contrastive learning loss $\mathcal{L}_{hmc}$ on the hybrid-modality codes $Z$;
Step 8: compute the total loss from $\mathcal{L}_{ce}$, $\mathcal{L}_{tri}$ and $\mathcal{L}_{hmc}$ and update the model parameters $\theta$ by backpropagation and optimization;
The above steps are repeated, and after 200 iterations the best-performing model parameters are saved as the optimal model $\theta^{*}$.
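For illustration only, one optimization step of the pipeline above could be sketched as follows, reusing the fuse, decouple and hybrid_modality_contrastive_loss sketches given earlier; the names G, E, proj, cls_head and triplet_loss, and the unweighted summation of the three losses, are assumptions (the patent does not specify the loss weighting):

import torch
import torch.nn.functional as F

def train_step(batch, G, E, proj, cls_head, triplet_loss, optimizer):
    rgb, ir, labels = batch                          # one visible and one infrared batch
    rgb_m, ir_m = G(rgb, ir)                         # Step 1: intermediate modalities
    f_v, f_i, f_vm, f_im = E(rgb, ir, rgb_m, ir_m)   # Step 2: unified feature space
    aug_v, aug_i = fuse(f_v, f_vm), fuse(f_i, f_im)  # Step 3: enhanced features
    feats = torch.cat([aug_v, aug_i], dim=0)
    ids = labels.repeat(2)
    loss = F.cross_entropy(cls_head(feats), ids) + triplet_loss(feats, ids)  # Step 4
    parts = [*decouple(aug_v), *decouple(aug_i)]     # Step 5: f_v, f_vm, f_i, f_im
    codes = torch.cat([proj(p) for p in parts], dim=0)                       # Step 6
    loss = loss + hybrid_modality_contrastive_loss(codes, labels.repeat(4))  # Step 7
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                 # Step 8: update parameters
    return loss.item()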
The testing process comprises the following steps:
Step 1: input the pedestrian snapshot captured by the infrared camera into the optimal model $\theta^{*}$;
Step 2: the model $\theta^{*}$ outputs the infrared-modality contrast code $z_{I}^{q}$;
Step 3: input all visible-light pictures of the test gallery into the model $\theta^{*}$ to generate the visible-modality contrast codes $z_{V}^{i}$, where $z_{V}^{i}$ is the visible-modality contrast code of the $i$-th visible-light picture sample in the gallery;
Step 4: compute the cosine similarity between $z_{I}^{q}$ and each $z_{V}^{i}$ and sort in descending order; Rank-1 is the best matching result.
In summary, the above embodiments are only preferred embodiments of the invention and are not intended to limit its protection scope. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall be included in the protection scope of the invention.

Claims (8)

1. A visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning, characterized by comprising the following steps:
in the first step, a visible-modality pedestrian snapshot is captured by a visible-light camera and an infrared-modality pedestrian snapshot is captured by an infrared camera;
in the second step, the visible-modality pedestrian snapshot obtained in the first step is converted into a visible intermediate-modality snapshot, and the infrared-modality pedestrian snapshot obtained in the first step is converted into an infrared intermediate-modality snapshot;
in the third step, the visible-modality pedestrian snapshot obtained in the first step is mapped into a unified feature space to generate a visible-modality embedding;
the infrared-modality pedestrian snapshot obtained in the first step is mapped into the unified feature space to generate an infrared-modality embedding;
the visible intermediate-modality snapshot obtained in the second step is mapped into the unified feature space to generate a visible intermediate-modality embedding;
the infrared intermediate-modality snapshot obtained in the second step is mapped into the unified feature space to generate an infrared intermediate-modality embedding;
in the fourth step, the visible-modality embedding and the visible intermediate-modality embedding generated in the third step are fused by concatenation in the unified feature space to generate a discrimination-compensated visible enhanced feature;
the infrared-modality embedding and the infrared intermediate-modality embedding generated in the third step are fused by concatenation to generate a discrimination-compensated infrared enhanced feature;
in the fifth step, the visible enhanced feature and the infrared enhanced feature generated in the fourth step are trained jointly with a cross-entropy loss function and a triplet loss function;
in the sixth step, the visible enhanced feature trained in the fifth step is decoupled into a visible-modality feature and a visible intermediate-modality feature;
the infrared enhanced feature trained in the fifth step is decoupled into an infrared-modality feature and an infrared intermediate-modality feature;
in the seventh step, the visible-modality feature generated in the sixth step is input into a contrast-code mapping network to generate a visible-modality contrast code;
the visible intermediate-modality feature generated in the sixth step is input into the contrast-code mapping network to generate a visible intermediate-modality contrast code;
the infrared-modality feature generated in the sixth step is input into the contrast-code mapping network to generate an infrared-modality contrast code;
the infrared intermediate-modality feature generated in the sixth step is input into the contrast-code mapping network to generate an infrared intermediate-modality contrast code;
the visible intermediate-modality contrast code is used to realize cross-modal content-invariant information compensation;
the infrared intermediate-modality contrast code is used to realize cross-modal content-invariant information compensation;
in the eighth step, the visible-modality contrast code, the visible intermediate-modality contrast code, the infrared-modality contrast code and the infrared intermediate-modality contrast code generated in the seventh step are trained with a hybrid-modality contrastive learning loss function;
in the ninth step, using the visible-modality and infrared-modality contrast codes trained in the eighth step, the cosine similarity between the target pedestrian and each pedestrian feature in the retrieval gallery is computed and the results are sorted in descending order; the resulting Rank-1 is taken as the best matching result.
2. The visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning of claim 1, characterized in that:
in the second step, the visible-modality pedestrian snapshot is converted into the visible intermediate-modality snapshot and the infrared-modality pedestrian snapshot into the infrared intermediate-modality snapshot by an intermediate-modality construction module.
3. The visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning of claim 2, characterized in that:
the intermediate-modality construction module comprises a preprocessing module, a modality encoder and a modality decoder;
the preprocessing module converts the visible pedestrian snapshot into a grayscale image and the infrared pedestrian snapshot into a single-channel image;
the modality encoder comprises a visible-modality encoder and an infrared-modality encoder; each encoder comprises a 1×1 convolution layer and a ReLU layer, and the two encoders have independent parameters;
the modality decoder comprises a visible-modality decoder and an infrared-modality decoder; each decoder comprises a 1×3 fully connected layer and a ReLU layer, and the two decoders share parameters.
4. The visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning of claim 2, characterized in that:
in the third step, the snapshots are mapped into the unified feature space by a three-branch network structure;
the visible-modality pedestrian snapshot is fed to one branch;
the infrared-modality pedestrian snapshot is fed to a second branch;
the visible intermediate-modality snapshot and the infrared intermediate-modality snapshot are fed together to a third branch;
the three-branch network structure comprises a shallow network and a deep network;
the shallow network comprises a residual block, with independent parameters for each of the three branch inputs;
the deep network adopts a ResNet pretrained on the ImageNet dataset, with parameters shared across the three branch inputs.
5. The visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning of claim 2, characterized in that:
in the fourth step, the visible-modality embedding and the visible intermediate-modality embedding are fused by concatenation as

$\tilde{f}_V = \left[ f_V \, ; \, f_{VM} \right],$

and the infrared-modality embedding and the infrared intermediate-modality embedding are fused by concatenation as

$\tilde{f}_I = \left[ f_I \, ; \, f_{IM} \right],$

where $\tilde{f}_V$ denotes the visible enhanced feature, $\tilde{f}_I$ the infrared enhanced feature, $[\,\cdot\,;\,\cdot\,]$ feature concatenation, $f_V$ the visible-modality embedding, $f_{VM}$ the visible intermediate-modality embedding, $f_I$ the infrared-modality embedding, and $f_{IM}$ the infrared intermediate-modality embedding.
6. The visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning of claim 2, characterized in that:
in the sixth step, the visible enhanced feature is decoupled into the visible-modality feature and the visible intermediate-modality feature as

$\left( f_V , f_{VM} \right) = \operatorname{split}\left( \tilde{f}_V \right),$

and the infrared enhanced feature is decoupled into the infrared-modality feature and the infrared intermediate-modality feature as

$\left( f_I , f_{IM} \right) = \operatorname{split}\left( \tilde{f}_I \right),$

where $f_V$ denotes the visible-modality feature, $f_{VM}$ the visible intermediate-modality feature, $f_I$ the infrared-modality feature, $f_{IM}$ the infrared intermediate-modality feature, and $\operatorname{split}(\cdot)$ inverts the concatenation.
7. The visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning of claim 2, characterized in that:
in the seventh step, the contrast-code mapping network comprises a fully connected layer and a ReLU layer and converts the 2048-dimensional features into 512-dimensional hybrid-modality contrast codes:

$z_V = \operatorname{Proj}\left( f_V \right), \quad z_{VM} = \operatorname{Proj}\left( f_{VM} \right), \quad z_I = \operatorname{Proj}\left( f_I \right), \quad z_{IM} = \operatorname{Proj}\left( f_{IM} \right),$

where $z_V$ denotes the visible-modality contrast code, $z_{VM}$ the visible intermediate-modality contrast code, $z_I$ the infrared-modality contrast code, and $z_{IM}$ the infrared intermediate-modality contrast code.
8. The visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning of claim 2, characterized in that:
in the eighth step, the hybrid-modality contrastive learning loss function is

$\mathcal{L}_{hmc} = - \sum_{i=1}^{N} \frac{1}{\left| P(i) \right|} \sum_{p \in P(i)} \log \frac{\exp\left( z_i \cdot z_p / \tau \right)}{\sum_{a \in A(i)} \exp\left( z_i \cdot z_a / \tau \right)},$

where $N$ denotes the total number of samples pooled over the visible-modality, visible intermediate-modality, infrared-modality and infrared intermediate-modality contrast codes, and $z_i$ denotes the $i$-th sample among them;
with $y_i$ denoting the label of the $i$-th sample, $P(i)$ denotes the positive set participating in contrastive learning, comprising all samples among the four kinds of contrast codes that share the identity $y_i$;
$A(i)$ denotes all samples other than $i$ itself;
the parameter $\tau$ is a scaling factor for the similarity measure between samples, used to adjust the sensitivity to differences between similar samples.
CN202410406090.8A 2024-04-07 2024-04-07 Visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning Active CN117994821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410406090.8A CN117994821B (en) 2024-04-07 2024-04-07 Visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410406090.8A CN117994821B (en) 2024-04-07 2024-04-07 Visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning

Publications (2)

Publication Number Publication Date
CN117994821A 2024-05-07
CN117994821B CN117994821B (en) 2024-07-26

Family

ID=90901045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410406090.8A Active CN117994821B (en) 2024-04-07 2024-04-07 Visible light-infrared cross-modal pedestrian re-identification method based on information compensation contrastive learning

Country Status (1)

Country Link
CN (1) CN117994821B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190180467A1 (en) * 2017-12-11 2019-06-13 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for identifying and positioning objects around a vehicle
CN115862064A (en) * 2022-11-30 2023-03-28 中国人民公安大学 Visible light-infrared cross-modal pedestrian re-identification method and system
CN116311384A (en) * 2023-05-16 2023-06-23 西安科技大学 Cross-modal pedestrian re-recognition method and device based on intermediate mode and characterization learning
CN117351518A (en) * 2023-09-26 2024-01-05 武汉大学 Method and system for identifying unsupervised cross-modal pedestrian based on level difference
CN117576729A (en) * 2023-11-27 2024-02-20 新疆大学 Visible light-infrared pedestrian re-identification method based on multi-stage auxiliary learning
CN117746467A (en) * 2024-01-05 2024-03-22 南京信息工程大学 Modal enhancement and compensation cross-modal pedestrian re-recognition method


Also Published As

Publication number Publication date
CN117994821B (en) 2024-07-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant