CN117994821A - Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning - Google Patents
Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning
- Publication number: CN117994821A (application CN202410406090.8A)
- Authority: CN (China)
- Prior art keywords: mode, visible light, infrared, contrast, pedestrian
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F16/583: Retrieval characterised by using metadata automatically derived from the content
- G06N3/0455: Auto-encoder networks; Encoder-decoder networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/048: Activation functions
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06V10/143: Sensing or illuminating at different wavelengths
- G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/761: Proximity, similarity or dissimilarity measures
- G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Abstract
The invention belongs to the field of computer vision and pattern recognition and is applied to the field of intelligent security; in particular, it relates to a visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning. The mixed-mode contrast learning loss function designed by the invention trains the contrast learning code mapping so that the visible light contrast codes, the visible light intermediate mode contrast codes, the infrared contrast codes and the infrared intermediate mode contrast codes generated by the network maximize the mutual information between the visible light and infrared contrast codes, fully enabling the network to mine feature information that is beneficial to improving identity recognition capability.
Description
Technical Field
The invention belongs to the field of computer vision and pattern recognition and is applied to the field of intelligent security; in particular, it relates to a visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning.
Background
Pedestrian re-identification technology is mainly applied in the security field and is used for matching pedestrian images consistent with the identity of a target pedestrian across non-overlapping cameras. The common pedestrian re-identification task is based on a set of visible light images acquired by visible light cameras. Because visible light cameras depend on good illumination conditions, the application of pedestrian re-identification technology in the security field is limited. Infrared cameras are specially designed to solve the imaging problem under dark conditions and have become a supplementary scheme to visible light cameras in security monitoring. Therefore, the combined visible light-infrared camera scheme is widely applied in the front-end construction of modern security systems and provides a sufficient facility foundation for the development of visible light-infrared cross-mode pedestrian re-identification.
The visible light-infrared camera defaults to the visible light acquisition mode and automatically switches to the infrared mode at night or when the light is dim, so visible light and infrared images cannot be acquired at the same time, i.e. matched cross-mode image pairs are lacking. Therefore, the cross-mode pedestrian re-identification task must solve not only the complex intra-class variations caused by pedestrian posture, occlusion, camera angles, variable lighting conditions and the like, but also the more complex modal differences caused by the different imaging principles. Current mainstream research methods mainly learn discriminative features and reduce modal differences as much as possible by designing different network structures or loss functions, but the huge modal differences and the lack of cross-modal image pairs make network learning very challenging.
Disclosure of Invention
The technical solution of the invention is as follows: a visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning is provided to overcome the defects of the prior art. By maximizing mutual information among positive samples, high-level semantic modality invariance is fully utilized, and features with higher cross-mode matching performance are finally generated. Two kinds of feature-level information compensation are realized, namely discriminative information compensation and cross-mode content-invariant information compensation; on this basis, the designed mixed-mode contrast learning fully mines high-level semantically consistent features, so that features with higher cross-mode matching performance can be generated.
The technical scheme of the invention is as follows:
A visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning comprises the following steps:
In the first step, a visible light mode pedestrian snapshot is obtained through a visible light camera, and an infrared mode pedestrian snapshot is obtained through an infrared camera;
In the second step, the visible light mode pedestrian snapshot obtained in the first step is converted into a visible light intermediate mode snapshot, and the infrared mode pedestrian snapshot obtained in the first step is converted into an infrared intermediate mode snapshot;
In the third step, the visible light mode pedestrian snapshot obtained in the first step is mapped into a unified feature space to generate a visible light mode embedding;
the infrared mode pedestrian snapshot obtained in the first step is mapped into the unified feature space to generate an infrared mode embedding;
the visible light intermediate mode snapshot obtained in the second step is mapped into the unified feature space to generate a visible light intermediate mode embedding;
the infrared intermediate mode snapshot obtained in the second step is mapped into the unified feature space to generate an infrared intermediate mode embedding;
In the fourth step, the visible light mode embedding and the visible light intermediate mode embedding in the unified feature space generated in the third step are fused by concatenation to generate a visible light enhanced feature after discriminative information compensation;
the infrared mode embedding and the infrared intermediate mode embedding in the unified feature space generated in the third step are fused by concatenation to generate an infrared enhanced feature after discriminative information compensation;
In the fifth step, the visible light enhanced features and the infrared enhanced features generated in the fourth step are trained jointly with a cross entropy loss function and a triplet loss function;
In the sixth step, the visible light enhanced feature trained in the fifth step is decoupled into a visible light mode feature and a visible light intermediate mode feature;
the infrared enhanced feature trained in the fifth step is decoupled into an infrared mode feature and an infrared intermediate mode feature;
In the seventh step, the visible light mode feature generated in the sixth step is input into a contrast learning code mapping network to generate a visible light mode contrast code;
the visible light intermediate mode feature generated in the sixth step is input into the contrast learning code mapping network to generate a visible light intermediate mode contrast code;
the infrared mode feature generated in the sixth step is input into the contrast learning code mapping network to generate an infrared mode contrast code;
the infrared intermediate mode feature generated in the sixth step is input into the contrast learning code mapping network to generate an infrared intermediate mode contrast code;
In the eighth step, the visible light mode contrast code, the visible light intermediate mode contrast code, the infrared mode contrast code and the infrared intermediate mode contrast code generated in the seventh step are trained with a mixed-mode contrast learning loss function;
In the ninth step, the cosine similarity between the target pedestrian and each pedestrian feature in the pedestrian retrieval library is calculated using the visible light mode contrast code and the infrared mode contrast code trained in the eighth step, the calculation results are sorted in descending order, and the obtained Rank-1 is taken as the best matching result.
In the second step, the visible light mode pedestrian snapshot is converted into the visible light intermediate mode snapshot and the infrared mode pedestrian snapshot is converted into the infrared intermediate mode snapshot by an intermediate mode construction module;
The intermediate mode construction module comprises a preprocessing module, a mode encoder and a mode decoder;
The preprocessing module is used for converting the visible light pedestrian snapshot into a grayscale image and converting the infrared pedestrian snapshot into a single-channel image;
The mode encoder comprises a visible light mode encoder and an infrared mode encoder; each encoder comprises a 1×1 convolution layer and a ReLU layer, and the parameters of the two encoders (namely the visible light mode encoder and the infrared mode encoder) are independent;
The mode decoder comprises a visible light mode decoder and an infrared mode decoder; each decoder comprises a 1×3 fully connected layer and a ReLU layer, and the parameters of the two decoders are shared;
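A minimal sketch of such an intermediate mode construction module is given below, assuming PyTorch. The grayscale preprocessing and the 1×1 convolution encoders follow the description above, while the hidden channel width and the use of a 1×1 convolution as the shared per-pixel decoder (in place of the stated "1×3 fully connected layer") are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IntermediateModeModule(nn.Module):
    """Maps visible (RGB) and infrared snapshots into a shared intermediate modality.

    Sketch only: channel widths and the decoder form are assumptions, not the patented values.
    """
    def __init__(self, hidden_channels: int = 16):
        super().__init__()
        # Modality-specific encoders: 1x1 convolution + ReLU, parameters NOT shared
        self.vis_encoder = nn.Sequential(nn.Conv2d(1, hidden_channels, kernel_size=1), nn.ReLU(inplace=True))
        self.ir_encoder = nn.Sequential(nn.Conv2d(1, hidden_channels, kernel_size=1), nn.ReLU(inplace=True))
        # Shared decoder: per-pixel linear map back to a 3-channel intermediate-mode image
        self.shared_decoder = nn.Sequential(nn.Conv2d(hidden_channels, 3, kernel_size=1), nn.ReLU(inplace=True))

    @staticmethod
    def to_gray(rgb: torch.Tensor) -> torch.Tensor:
        # Preprocessing: RGB snapshot (B, 3, H, W) -> single-channel grayscale (B, 1, H, W)
        r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
        return 0.299 * r + 0.587 * g + 0.114 * b

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        vis_mid = self.shared_decoder(self.vis_encoder(self.to_gray(rgb)))  # RGB-M
        ir_mid = self.shared_decoder(self.ir_encoder(ir[:, 0:1]))           # IR-M (single channel kept)
        return vis_mid, ir_mid
```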
In the third step, the snapshots are mapped into the unified feature space through a three-branch network structure;
the visible light mode pedestrian snapshot is used as one branch input;
the infrared mode pedestrian snapshot is used as one branch input;
the visible light intermediate mode snapshot and the infrared intermediate mode snapshot are used together as one branch input;
the three-branch network structure comprises a shallow network and a deep network;
the shallow network comprises a residual block, and its parameters are independent for the three branch inputs;
the deep network adopts a ResNet pretrained on the ImageNet dataset, and its parameters are shared by the three branch inputs;
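A minimal sketch of the three-branch structure, assuming a PyTorch ResNet-50 backbone pretrained on ImageNet. Treating conv1 through layer1 as the modality-specific shallow stage and layer2 through layer4 as the shared deep stage is an assumption, since the description only states that the shallow network contains a residual block.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ThreeBranchEncoder(nn.Module):
    """Three-branch encoder: independent shallow stages per branch, shared deep ResNet stages."""
    def __init__(self):
        super().__init__()
        def shallow_stage() -> nn.Sequential:
            # Independent shallow stage (conv1 .. layer1) for one branch
            b = resnet50(weights="IMAGENET1K_V1")
            return nn.Sequential(b.conv1, b.bn1, b.relu, b.maxpool, b.layer1)
        self.shallow_vis = shallow_stage()   # visible light branch
        self.shallow_ir = shallow_stage()    # infrared branch
        self.shallow_mid = shallow_stage()   # intermediate-mode branch (RGB-M and IR-M)
        b = resnet50(weights="IMAGENET1K_V1")
        # Shared deep stages (layer2 .. layer4) followed by global pooling -> 2048-d embedding
        self.deep = nn.Sequential(b.layer2, b.layer3, b.layer4,
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x_vis: torch.Tensor, x_ir: torch.Tensor, x_mid: torch.Tensor):
        f_vis = self.deep(self.shallow_vis(x_vis))
        f_ir = self.deep(self.shallow_ir(x_ir))
        f_mid = self.deep(self.shallow_mid(x_mid))
        return f_vis, f_ir, f_mid
```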
In the fourth step, the fusion by concatenation of the visible light mode embedding and the visible light intermediate mode embedding is formulated as:
$f_V^{aug} = [\, f_V \,;\, f_{VM} \,]$
the fusion by concatenation of the infrared mode embedding and the infrared intermediate mode embedding is formulated as:
$f_I^{aug} = [\, f_I \,;\, f_{IM} \,]$
wherein $f_V^{aug}$ represents the visible light enhanced feature, $f_I^{aug}$ represents the infrared enhanced feature, $[\,\cdot\,;\,\cdot\,]$ represents feature concatenation, $f_V$ represents the visible light mode embedding, $f_{VM}$ represents the visible light intermediate mode embedding, $f_I$ represents the infrared mode embedding, and $f_{IM}$ represents the infrared intermediate mode embedding;
In the sixth step, the decoupling of the visible light enhanced feature into the visible light mode feature and the visible light intermediate mode feature is formulated as:
$(f_V,\; f_{VM}) = \mathrm{decouple}(f_V^{aug})$
the decoupling of the infrared enhanced feature into the infrared mode feature and the infrared intermediate mode feature is formulated as:
$(f_I,\; f_{IM}) = \mathrm{decouple}(f_I^{aug})$
wherein $f_V$ represents the visible light mode feature, $f_{VM}$ represents the visible light intermediate mode feature, $f_I$ represents the infrared mode feature, and $f_{IM}$ represents the infrared intermediate mode feature;
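Since the enhanced feature is a plain concatenation, the decoupling in the sixth step reduces to a split along the feature dimension. A minimal sketch assuming PyTorch and 2048-dimensional embeddings (the width is taken from the contrast-code mapping described in the seventh step):

```python
import torch

def fuse(f_mode: torch.Tensor, f_mid: torch.Tensor) -> torch.Tensor:
    """Discriminative information compensation: concatenate the mode embedding with
    its intermediate-mode embedding, e.g. f_V^aug = [f_V ; f_VM]."""
    return torch.cat([f_mode, f_mid], dim=1)

def decouple(f_aug: torch.Tensor, dim_mode: int = 2048):
    """Split the enhanced feature back into (mode feature, intermediate-mode feature)."""
    return f_aug[:, :dim_mode], f_aug[:, dim_mode:]

# usage sketch
f_v, f_vm = torch.randn(8, 2048), torch.randn(8, 2048)
f_v_aug = fuse(f_v, f_vm)                 # shape (8, 4096)
f_v_back, f_vm_back = decouple(f_v_aug)   # each shape (8, 2048)
```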
In the seventh step, the contrast learning code mapping network comprises a fully connected layer and a ReLU layer, and it converts the feature dimension from 2048 dimensions into a 512-dimensional mixed-mode contrast code, formulated as:
$z_V = \mathrm{Project}(f_V), \quad z_{VM} = \mathrm{Project}(f_{VM}), \quad z_I = \mathrm{Project}(f_I), \quad z_{IM} = \mathrm{Project}(f_{IM})$
wherein $\mathrm{Project}(\cdot)$ denotes the contrast learning code mapping network, $z_V$ represents the visible light mode contrast code, $z_{VM}$ represents the visible light intermediate mode contrast code, $z_I$ represents the infrared mode contrast code, and $z_{IM}$ represents the infrared intermediate mode contrast code;
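A minimal sketch of such a contrast learning code mapping network, assuming PyTorch; the added L2 normalisation before the contrastive loss is an assumption, common for cosine-similarity-based objectives but not stated in the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastCodeProjector(nn.Module):
    """Fully connected layer + ReLU mapping a 2048-d feature to a 512-d contrast code."""
    def __init__(self, in_dim: int = 2048, out_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        z = F.relu(self.fc(feat))
        return F.normalize(z, dim=1)  # assumed L2 normalisation of the contrast code
```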
In the eighth step, the mixed-mode contrast learning loss function is:
$\mathcal{L}_{mcl} = \sum_{i=1}^{N} \dfrac{-1}{|P(i)|} \sum_{p \in P(i)} \log \dfrac{\exp\!\left(z_i \cdot z_p / \tau\right)}{\sum_{a \in A(i)} \exp\!\left(z_i \cdot z_a / \tau\right)}$
wherein $N$ represents the total number of samples of the visible light mode contrast codes, the visible light intermediate mode contrast codes, the infrared mode contrast codes and the infrared intermediate mode contrast codes, and $z_i$ represents the $i$-th sample among them.
$y_i$ represents the label of the $i$-th sample, and $P(i)$ represents the positive samples participating in contrast learning, namely all samples among the above four kinds of contrast codes that share the identity $y_i$.
$A(i)$ represents all samples other than $z_i$ itself.
The parameter $\tau$ is a scaling factor for the similarity measure between samples and is used to adjust the sensitivity to differences between similar samples.
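Read together, the definitions of $N$, $P(i)$, $A(i)$ and $\tau$ describe a supervised contrastive objective over the pooled set of the four kinds of contrast codes. A minimal PyTorch sketch under that reading (the temperature value, the normalisation and the mean reduction over anchors are assumptions):

```python
import torch
import torch.nn.functional as F

def mixed_mode_contrastive_loss(codes: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over mixed-mode contrast codes.

    codes:  (N, D) pooled contrast codes z_V, z_VM, z_I, z_IM
    labels: (N,) pedestrian identity of each code
    tau:    temperature / scaling factor
    """
    codes = F.normalize(codes, dim=1)
    sim = codes @ codes.t() / tau                                   # pairwise similarities
    n = codes.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=codes.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask   # P(i)
    # log-softmax over A(i) = all samples except the anchor itself
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)                 # avoid -inf * 0 on the diagonal
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count            # average over positives per anchor
    return loss.mean()
```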
Advantageous effects
The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning can fully mine cross-mode high-level semantic consistency and improve cross-mode matching performance.
The intermediate mode construction method designed by the invention maps the original mode data into images with a consistent modality through simple mode encoding and decoding.
The invention designs two kinds of feature-level information compensation based on the intermediate mode, namely discriminative information compensation and cross-mode content-invariant information compensation; the discriminative information of the visible light and infrared features can be equally enhanced, and the intermediate-mode-based information compensation effectively improves the discriminative power of the modality-invariant features.
The mixed-mode contrast learning loss function designed by the invention trains the contrast learning code mapping so that the visible light contrast codes, the visible light intermediate mode contrast codes, the infrared contrast codes and the infrared intermediate mode contrast codes generated by the network maximize the mutual information between the visible light and infrared contrast codes, fully enabling the network to mine feature information that is beneficial to improving identity recognition capability.
Drawings
FIG. 1 is a schematic diagram of a three-branch network structure according to the present invention;
FIG. 2 is a schematic diagram of the intermediate mode construction module;
FIG. 3 is a schematic diagram of the generated mixed-mode positive and negative samples;
FIG. 4 is a schematic diagram of the mixed-mode contrast learning loss function.
Detailed Description
The invention is further described below with reference to the drawings and examples.
Examples
A visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning comprises the following steps:
In the first step, a visible light mode pedestrian snapshot is obtained through a visible light camera, and an infrared mode pedestrian snapshot is obtained through an infrared camera;
In the second step, the visible light mode pedestrian snapshot and the infrared mode pedestrian snapshot obtained in the first step are converted into intermediate mode snapshots through the intermediate mode construction module shown in fig. 2;
In the third step, the visible light mode pedestrian snapshot, the intermediate mode snapshots and the infrared mode pedestrian snapshot obtained in the first and second steps are mapped into a unified feature space through the three-branch network structure shown in fig. 1, generating the visible light mode embedding, the intermediate mode embeddings and the infrared mode embedding;
In the fourth step, the visible light intermediate mode part of the intermediate mode embeddings generated in the third step is fused by concatenation with the visible light mode embedding to generate the visible light enhanced feature after discriminative enhancement; the infrared intermediate mode part of the intermediate mode embeddings generated in the third step is fused by concatenation with the infrared mode embedding to generate the infrared enhanced feature after discriminative enhancement;
In the fifth step, the visible light enhanced features and the infrared enhanced features generated in the fourth step are trained with a cross entropy loss function and a triplet loss function;
In the sixth step, the visible light enhanced feature trained in the fifth step is decoupled into the visible light mode feature and the visible light intermediate mode feature; the infrared enhanced feature trained in the fifth step is decoupled into the infrared mode feature and the infrared intermediate mode feature;
In the seventh step, the visible light mode feature, the visible light intermediate mode feature, the infrared mode feature and the infrared intermediate mode feature generated in the sixth step are input into the contrast learning code mapping network, generating the visible light mode contrast code, the visible light intermediate mode contrast code, the infrared mode contrast code and the infrared intermediate mode contrast code; the positive and negative samples of the four generated mixed-mode contrast codes are shown in fig. 3;
In the eighth step, as shown in fig. 4, the four mixed-mode contrast codes generated in the seventh step are trained with the mixed-mode contrast learning loss function;
In the ninth step, the cosine similarity between the target pedestrian and each pedestrian feature in the pedestrian retrieval library is calculated using the visible light mode contrast code and the infrared mode contrast code trained in the eighth step, the calculation results are sorted in descending order, and the obtained Rank-1 is taken as the best matching result.
The effects of the present invention are described below with an experiment on measured data. To evaluate the performance of the proposed method, experiments were performed using the public dataset SYSU-MM01.
Training process:
Input: each training batch contains 4 pedestrians; for each pedestrian, 4 visible light mode snapshots and 4 infrared mode snapshots are randomly selected;
Output: the trained optimal model $M^{*}$;
Initialization: the intermediate mode construction module $G$; the encoder $E$; the contrast learning code mapping network $\mathrm{Project}$;
Step 1: using the intermediate mode construction module $G$, convert the visible light mode pedestrian snapshots RGB and the infrared mode pedestrian snapshots IR into the intermediate mode, generating the visible light intermediate mode pedestrian snapshots RGB-M and the infrared intermediate mode pedestrian snapshots IR-M respectively;
Step 2: input RGB, IR and (RGB-M, IR-M) into the three-branch network encoder $E$ to generate the visible light mode embedding $f_V$, the infrared mode embedding $f_I$ and (the visible light intermediate mode embedding $f_{VM}$, the infrared intermediate mode embedding $f_{IM}$);
Step 3: concatenate $f_V$ with $f_{VM}$ and concatenate $f_I$ with $f_{IM}$ to generate the visible light enhanced feature $f_V^{aug}$ and the infrared enhanced feature $f_I^{aug}$;
Step 4: for the visible light enhanced feature $f_V^{aug}$ and the infrared enhanced feature $f_I^{aug}$, calculate the cross entropy loss and the triplet loss $\mathcal{L}_{id}$;
Step 5: decouple the visible light enhanced feature $f_V^{aug}$ into the visible light mode feature $f_V$ and the visible light intermediate mode feature $f_{VM}$; decouple the infrared enhanced feature $f_I^{aug}$ into the infrared mode feature $f_I$ and the infrared intermediate mode feature $f_{IM}$;
Step 6: input the visible light mode feature $f_V$, the visible light intermediate mode feature $f_{VM}$, the infrared mode feature $f_I$ and the infrared intermediate mode feature $f_{IM}$ into the contrast learning code mapping network $\mathrm{Project}$ to generate the mixed-mode contrast codes $z_V$, $z_{VM}$, $z_I$, $z_{IM}$;
Step 7: for the mixed-mode contrast codes, calculate the mixed-mode contrast learning loss $\mathcal{L}_{mcl}$;
Step 8: calculate the total loss from $\mathcal{L}_{id}$ and $\mathcal{L}_{mcl}$, and update the model parameters through back propagation and optimization;
The above steps are repeated, and after 200 iterations the model parameters with the best effect are saved as the optimal model $M^{*}$; a condensed code sketch of one training iteration follows.
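The sketch below condenses one training iteration of the above listing, assuming PyTorch. The encode_* helpers, the linear classifier head and the unweighted sum of the two losses are assumed names and choices, not taken from the patent; the fuse, projection and loss components correspond to the sketches given earlier.

```python
import torch
import torch.nn.functional as F

def train_step(G, E, project, classifier, optimizer, rgb, ir, labels,
               triplet_loss, mixed_mode_contrastive_loss):
    """One training iteration; G, E and project follow the listing above."""
    # Step 1: build intermediate mode snapshots RGB-M and IR-M
    rgb_m, ir_m = G(rgb, ir)
    # Step 2: three-branch encoding into the unified feature space (hypothetical helper names)
    f_v = E.encode_visible(rgb)
    f_i = E.encode_infrared(ir)
    f_vm = E.encode_intermediate(rgb_m)
    f_im = E.encode_intermediate(ir_m)
    # Step 3: discriminative information compensation by concatenation
    f_v_aug = torch.cat([f_v, f_vm], dim=1)
    f_i_aug = torch.cat([f_i, f_im], dim=1)
    feats = torch.cat([f_v_aug, f_i_aug], dim=0)
    ids = torch.cat([labels, labels], dim=0)
    # Step 4: identity losses (cross entropy + triplet) on the enhanced features
    loss_id = F.cross_entropy(classifier(feats), ids) + triplet_loss(feats, ids)
    # Steps 5-6: decoupling recovers f_v, f_vm, f_i, f_im, which are projected to contrast codes
    codes = torch.cat([project(f) for f in (f_v, f_vm, f_i, f_im)], dim=0)
    code_ids = labels.repeat(4)
    # Step 7: mixed-mode contrast learning loss
    loss_mcl = mixed_mode_contrastive_loss(codes, code_ids)
    # Step 8: total loss (unweighted sum is an assumption), back propagation and update
    loss = loss_id + loss_mcl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```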
The testing process comprises the following steps:
Step 1: input a pedestrian snapshot acquired by the infrared camera into the optimal model $M^{*}$;
Step 2: the model $M^{*}$ outputs the infrared mode contrast code $z_I^{q}$ of the query;
Step 3: input all visible light pictures in the test retrieval library into the model $M^{*}$ to generate the visible light mode contrast codes $z_V^{i}$, where $z_V^{i}$ is the visible light mode contrast code of the $i$-th visible light picture sample in the retrieval library;
Step 4: calculate the cosine similarity between $z_I^{q}$ and each visible light sample $z_V^{i}$ and sort in descending order; Rank-1 is the best matching result.
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning is characterized by comprising the following steps:
in the first step, a visible light mode pedestrian snapshot is obtained through a visible light camera, and an infrared mode pedestrian snapshot is obtained through an infrared camera;
in the second step, the visible light mode pedestrian snapshot obtained in the first step is converted into a visible light intermediate mode snapshot, and the infrared mode pedestrian snapshot obtained in the first step is converted into an infrared intermediate mode snapshot;
in the third step, the visible light mode pedestrian snapshot obtained in the first step is mapped into a unified feature space to generate a visible light mode embedding;
the infrared mode pedestrian snapshot obtained in the first step is mapped into the unified feature space to generate an infrared mode embedding;
the visible light intermediate mode snapshot obtained in the second step is mapped into the unified feature space to generate a visible light intermediate mode embedding;
the infrared intermediate mode snapshot obtained in the second step is mapped into the unified feature space to generate an infrared intermediate mode embedding;
in the fourth step, the visible light mode embedding and the visible light intermediate mode embedding in the unified feature space generated in the third step are fused by concatenation to generate a visible light enhanced feature after discriminative information compensation;
the infrared mode embedding and the infrared intermediate mode embedding in the unified feature space generated in the third step are fused by concatenation to generate an infrared enhanced feature after discriminative information compensation;
in the fifth step, the visible light enhanced features and the infrared enhanced features generated in the fourth step are trained jointly with a cross entropy loss function and a triplet loss function;
in the sixth step, the visible light enhanced feature trained in the fifth step is decoupled into a visible light mode feature and a visible light intermediate mode feature;
the infrared enhanced feature trained in the fifth step is decoupled into an infrared mode feature and an infrared intermediate mode feature;
in the seventh step, the visible light mode feature generated in the sixth step is input into a contrast learning code mapping network to generate a visible light mode contrast code;
the visible light intermediate mode feature generated in the sixth step is input into the contrast learning code mapping network to generate a visible light intermediate mode contrast code;
the infrared mode feature generated in the sixth step is input into the contrast learning code mapping network to generate an infrared mode contrast code;
the infrared intermediate mode feature generated in the sixth step is input into the contrast learning code mapping network to generate an infrared intermediate mode contrast code;
the visible light intermediate mode contrast code is used for realizing cross-mode content-invariant information compensation;
the infrared intermediate mode contrast code is used for realizing cross-mode content-invariant information compensation;
in the eighth step, the visible light mode contrast code, the visible light intermediate mode contrast code, the infrared mode contrast code and the infrared intermediate mode contrast code generated in the seventh step are trained with a mixed-mode contrast learning loss function;
in the ninth step, the cosine similarity between the target pedestrian and each pedestrian feature in the pedestrian retrieval library is calculated using the visible light mode contrast code and the infrared mode contrast code trained in the eighth step, the calculation results are sorted in descending order, and the obtained Rank-1 is taken as the best matching result.
2. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 1, characterized in that:
in the second step, the visible light mode pedestrian snapshot is converted into the visible light intermediate mode snapshot and the infrared mode pedestrian snapshot is converted into the infrared intermediate mode snapshot by an intermediate mode construction module.
3. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
the intermediate mode construction module comprises a preprocessing module, a mode encoder and a mode decoder;
the preprocessing module is used for converting the visible light pedestrian snapshot into a grayscale image and converting the infrared pedestrian snapshot into a single-channel image;
the mode encoder comprises a visible light mode encoder and an infrared mode encoder; each encoder comprises a 1×1 convolution layer and a ReLU layer, and the parameters of the two encoders are independent;
the mode decoder comprises a visible light mode decoder and an infrared mode decoder; each decoder comprises a 1×3 fully connected layer and a ReLU layer, and the parameters of the two decoders are shared.
4. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the third step, the snapshots are mapped into the unified feature space through a three-branch network structure;
the visible light mode pedestrian snapshot is used as one branch input;
the infrared mode pedestrian snapshot is used as one branch input;
the visible light intermediate mode snapshot and the infrared intermediate mode snapshot are used together as one branch input;
the three-branch network structure comprises a shallow network and a deep network;
the shallow network comprises a residual block, and its parameters are independent for the three branch inputs;
the deep network adopts a ResNet pretrained on the ImageNet dataset, and its parameters are shared by the three branch inputs.
5. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the fourth step, the fusion by concatenation of the visible light mode embedding and the visible light intermediate mode embedding is formulated as:
$f_V^{aug} = [\, f_V \,;\, f_{VM} \,]$
the fusion by concatenation of the infrared mode embedding and the infrared intermediate mode embedding is formulated as:
$f_I^{aug} = [\, f_I \,;\, f_{IM} \,]$
wherein $f_V^{aug}$ represents the visible light enhanced feature, $f_I^{aug}$ represents the infrared enhanced feature, $[\,\cdot\,;\,\cdot\,]$ represents feature concatenation, $f_V$ represents the visible light mode embedding, $f_{VM}$ represents the visible light intermediate mode embedding, $f_I$ represents the infrared mode embedding, and $f_{IM}$ represents the infrared intermediate mode embedding.
6. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the sixth step, the decoupling of the visible light enhanced feature into the visible light mode feature and the visible light intermediate mode feature is formulated as:
$(f_V,\; f_{VM}) = \mathrm{decouple}(f_V^{aug})$
the decoupling of the infrared enhanced feature into the infrared mode feature and the infrared intermediate mode feature is formulated as:
$(f_I,\; f_{IM}) = \mathrm{decouple}(f_I^{aug})$
wherein $f_V$ represents the visible light mode feature, $f_{VM}$ represents the visible light intermediate mode feature, $f_I$ represents the infrared mode feature, and $f_{IM}$ represents the infrared intermediate mode feature.
7. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the seventh step, the contrast learning code mapping network comprises a fully connected layer and a ReLU layer, and it converts the feature dimension from 2048 dimensions into a 512-dimensional mixed-mode contrast code, formulated as:
$z_V = \mathrm{Project}(f_V), \quad z_{VM} = \mathrm{Project}(f_{VM}), \quad z_I = \mathrm{Project}(f_I), \quad z_{IM} = \mathrm{Project}(f_{IM})$
wherein $\mathrm{Project}(\cdot)$ denotes the contrast learning code mapping network, $z_V$ represents the visible light mode contrast code, $z_{VM}$ represents the visible light intermediate mode contrast code, $z_I$ represents the infrared mode contrast code, and $z_{IM}$ represents the infrared intermediate mode contrast code.
8. The visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning of claim 2, characterized in that:
in the eighth step, the mixed-mode contrast learning loss function is:
$\mathcal{L}_{mcl} = \sum_{i=1}^{N} \dfrac{-1}{|P(i)|} \sum_{p \in P(i)} \log \dfrac{\exp\!\left(z_i \cdot z_p / \tau\right)}{\sum_{a \in A(i)} \exp\!\left(z_i \cdot z_a / \tau\right)}$
wherein $N$ represents the total number of samples of the visible light mode contrast codes, the visible light intermediate mode contrast codes, the infrared mode contrast codes and the infrared intermediate mode contrast codes, and $z_i$ represents the $i$-th sample among them;
$y_i$ represents the label of the $i$-th sample, and $P(i)$ represents the positive samples participating in contrast learning, namely all samples among the above four kinds of contrast codes that share the identity $y_i$;
$A(i)$ represents all samples other than $z_i$ itself;
the parameter $\tau$ is a scaling factor for the similarity measure between samples and is used to adjust the sensitivity to differences between similar samples.
Priority Applications / Applications Claiming Priority (1)
- Application CN202410406090.8A, granted as CN117994821B (en); priority date 2024-04-07; filing date 2024-04-07; title: Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning
Publications (2)
- CN117994821A, published 2024-05-07
- CN117994821B, published 2024-07-26
Family
- ID=90901045
Family Applications (1)
- CN202410406090.8A, filed 2024-04-07, status: Active; title: Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning
Country Status (1)
- CN: CN117994821B (en)
Citations (6)
- US20190180467A1 (Beijing Didi Infinity Technology And Development Co., Ltd.), priority 2017-12-11, published 2019-06-13: Systems and methods for identifying and positioning objects around a vehicle
- CN115862064A, priority 2022-11-30, published 2023-03-28: Visible light-infrared cross-modal pedestrian re-identification method and system
- CN116311384A, priority 2023-05-16, published 2023-06-23: Cross-modal pedestrian re-recognition method and device based on intermediate mode and characterization learning
- CN117351518A, priority 2023-09-26, published 2024-01-05: Method and system for identifying unsupervised cross-modal pedestrian based on level difference
- CN117576729A, priority 2023-11-27, published 2024-02-20: Visible light-infrared pedestrian re-identification method based on multi-stage auxiliary learning
- CN117746467A, priority 2024-01-05, published 2024-03-22: Modal enhancement and compensation cross-modal pedestrian re-recognition method
Also Published As
- CN117994821B (en), published 2024-07-26
Similar Documents
- CN108520216B: Gait image-based identity recognition method
- CN111539255A: Cross-modal pedestrian re-identification method based on multi-modal image style conversion
- CN109635726B: Landslide identification method based on combination of symmetric deep network and multi-scale pooling
- CN112651940B: Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
- CN113743544A: Cross-modal neural network construction method, pedestrian retrieval method and system
- CN116798070A: Cross-mode pedestrian re-recognition method based on spectrum sensing and attention mechanism
- CN118116035B: Modal imbalance characteristic conversion cross-modal pedestrian re-identification method
- CN114898429B: Thermal infrared-visible light cross-modal face recognition method
- CN112766217A: Cross-modal pedestrian re-identification method based on disentanglement and feature level difference learning
- CN117333908A: Cross-modal pedestrian re-recognition method based on attitude feature alignment
- CN116452805A: Transformer-based RGB-D semantic segmentation method of cross-modal fusion network
- CN118115947A: Cross-mode pedestrian re-identification method based on random color conversion and multi-scale feature fusion
- CN116863223A: Method for classifying remote sensing image scenes by embedding semantic attention features into Swin Transformer network
- CN112836605B: Near-infrared and visible light cross-modal face recognition method based on modal augmentation
- CN117576729A: Visible light-infrared pedestrian re-identification method based on multi-stage auxiliary learning
- CN117994821B: Visible light-infrared cross-mode pedestrian re-identification method based on information compensation contrast learning
- CN112330562A: Heterogeneous remote sensing image transformation method and system
- CN117173595A: Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7
- CN116168418A: Multi-mode target perception and re-identification method for image
- Niu et al.: Real-time recognition and location of indoor objects
- CN117994822B: Cross-mode pedestrian re-identification method based on auxiliary mode enhancement and multi-scale feature fusion
- Song et al.: A Semantic Segmentation Method for Road Environment Images Based on Hybrid Convolutional Auto-Encoder
- CN117912099A: Visible light-infrared cross-mode pedestrian re-identification method based on mode invariant feature enhancement
- CN117409268A: Scene recognition method and system based on mutual attention fusion and distillation mechanism
- CN116343122A: Cross-modal pedestrian re-recognition method based on multi-modal common feature space exploration
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant