CN115147774A - Pedestrian re-identification method in degradation environment based on feature alignment

Pedestrian re-identification method in degradation environment based on feature alignment

Info

Publication number
CN115147774A
Authority
CN
China
Prior art keywords
pedestrian
module
network
feature alignment
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210792619.5A
Other languages
Chinese (zh)
Other versions
CN115147774B (en)
Inventor
查正军
刘嘉威
王堃宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202210792619.5A
Publication of CN115147774A
Application granted
Publication of CN115147774B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method for degraded environments based on feature alignment, comprising the following steps: 1. constructing a new neural network model for pedestrian re-identification in degraded environments; 2. processing input data with the model; 3. computing each loss term of the model to obtain a total loss function; 4. iteratively optimizing the model according to the total loss function. The feature alignment module provided by the invention is plug-and-play and can be combined with existing pedestrian re-identification models. It improves performance in degraded environments without sacrificing performance in clean environments, runs efficiently in both normal and degraded environments, and thereby achieves high-accuracy pedestrian re-identification.

Description

Pedestrian re-identification method in degradation environment based on feature alignment
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to pedestrian re-identification in degraded environments; it provides a pedestrian re-identification algorithm for degraded environments based on a plug-and-play feature alignment module.
Background
Pedestrian re-identification aims at open-set pedestrian retrieval across a network of non-overlapping cameras. In practical applications, however, pedestrian images may be degraded to different degrees by illumination, resolution and weather. For example, surveillance-camera images (i.e., the pictures to be queried in pedestrian re-identification, the query set) usually have low resolution due to hardware limitations, whereas the gallery images they are matched against usually have higher resolution. As a result, pedestrian re-identification models trained on clean pictures do not perform well in the degraded environments that are widespread in reality. Moreover, because collecting large-scale labeled degraded images for the many degradation scenarios found in practice is extremely difficult, retraining a supervised pedestrian re-identification model for every degraded environment is not feasible.
Currently there are two main approaches to this dilemma, each with its own drawbacks. (1) Methods based on unsupervised domain adaptation. The premise of this strategy is that a deep neural network can align the marginal distributions of low-quality and high-quality images in the learned feature space; once the gap between these marginal distributions is reduced, the re-id network will also perform well on low-quality images. Although unsupervised domain-adaptive pedestrian re-identification can improve performance in degraded environments, it also changes the mapping rules for clean pictures and thereby harms pedestrian re-identification performance on clean pictures, which is undesirable for real-world applications. (2) Pre-processing the degraded image with off-the-shelf image restoration or enhancement methods, which does not affect performance on clean images and can remove the negative impact of the degraded environment on pedestrian re-identification; for example, low-light image enhancement can improve the visual quality of pedestrian images taken at night. This image-preprocessing solution, also called the two-stage approach, can handle various degradation scenarios by integrating different image restoration modules. However, image restoration and enhancement methods aim at a subjectively pleasing visual effect and pay little attention to pedestrian re-identification performance, so the improvement the two-stage approach brings on degraded images is limited.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a feature-alignment-based pedestrian re-identification method for degraded environments, which improves the model's performance on degraded pictures as much as possible without sacrificing its pedestrian re-identification performance on clean pictures, thereby achieving high-accuracy pedestrian re-identification.
In order to solve the technical problems, the invention adopts the following technical scheme:
The invention relates to a pedestrian re-identification method in a degraded environment based on feature alignment, characterized by comprising the following steps:
Step 1, acquire a pedestrian image data set (X_1, X_2, ..., X_i, ..., X_N) taken in a normal environment, wherein X_i represents the i-th normal pedestrian image and N represents the total number of images; acquire a pedestrian image data set (Y_1, Y_2, ..., Y_j, ..., Y_M) taken in a degraded environment, wherein Y_j represents the j-th degraded pedestrian image and M represents the total number of images;
Step 2, construct a feature-alignment-based deep learning model for pedestrian re-identification in the degraded environment, comprising: a pedestrian re-identification model F, two feature alignment modules G_c2d and G_d2c, and two discriminator networks D_c and D_d;
Step 2.1, the pedestrian re-identification model F consists of a backbone network and a classification network, wherein the backbone network is based on a ResNet-50 network; pre-train the pedestrian re-identification model F on the pedestrian image data set taken in the normal environment to obtain a pre-trained pedestrian re-identification model F*, and freeze the pre-trained weights;
Step 2.2, the feature alignment modules G_c2d and G_d2c have the same network structure, each comprising m residual convolution modules;
each residual convolution module consists, in sequence, of a convolution layer, a batch normalization layer and a ReLU activation function, wherein the convolution kernel of the convolution layer has size k x k and stride j; the input of the residual convolution module is concatenated with its output to form the final output of the module;
Step 2.3, the discriminator networks D_c and D_d have the same network structure, each comprising a feature extraction module and a classification module;
the structure of the feature extraction module is the same as that of the backbone network, and the pre-trained weights are loaded as its network parameters; the classification module consists, in sequence, of a global average pooling layer, two fully connected layers, a batch normalization layer and a Leaky ReLU activation function;
Step 3, train the feature-alignment-based deep learning model for pedestrian re-identification in the degraded environment:
Step 3.1, input the i-th normal pedestrian image X_i and the j-th degraded pedestrian image Y_j into the backbone network of the pre-trained pedestrian re-identification model F* for feature extraction, obtaining the corresponding pedestrian features f_c^i and f_d^j;
Step 3.2, input the pedestrian feature f_c^i into the feature alignment module G_c2d to obtain the aligned pedestrian feature f_c2d^i = G_c2d(f_c^i); input the pedestrian feature f_d^j into the feature alignment module G_d2c to obtain the aligned pedestrian feature f_d2c^j = G_d2c(f_d^j);
input the pedestrian features f_c^i and f_d2c^j into the discriminator network D_c and correspondingly obtain the probabilities D_c(f_c^i) and D_c(f_d2c^j) under the normal environment; input the pedestrian features f_d^j and f_c2d^i into the discriminator network D_d and correspondingly obtain the probabilities D_d(f_d^j) and D_d(f_c2d^i) under the degraded environment;
construct the adversarial losses L_adv^c2d and L_adv^d2c of the pedestrian images X_i and Y_j using equation (1) and equation (2), respectively:

L_adv^c2d = E[log D_d(f_d^j)] + E[log(1 - D_d(f_c2d^i))]   (1)

L_adv^d2c = E[log D_c(f_c^i)] + E[log(1 - D_c(f_d2c^j))]   (2)

In equations (1) and (2), E denotes the expectation;
Step 3.3, input the aligned pedestrian feature f_c2d^i into the feature alignment module G_d2c to obtain the reconstructed pedestrian feature G_d2c(f_c2d^i); input the aligned pedestrian feature f_d2c^j into the feature alignment module G_c2d to obtain the reconstructed pedestrian feature G_c2d(f_d2c^j); construct the cycle consistency losses L_cyc^c and L_cyc^d of the pedestrian images X_i and Y_j using equations (3) and (4):

L_cyc^c = E[||G_d2c(G_c2d(f_c^i)) - f_c^i||_1]   (3)

L_cyc^d = E[||G_c2d(G_d2c(f_d^j)) - f_d^j||_1]   (4)
Step 3.4, input the pedestrian feature f_c^i into the feature alignment module G_d2c to obtain the individual-preserving feature G_d2c(f_c^i); input the pedestrian feature f_d^j into the feature alignment module G_c2d to obtain the individual-preserving feature G_c2d(f_d^j); construct the individual maintenance losses L_idt^c and L_idt^d of the pedestrian images X_i and Y_j using equations (5) and (6):

L_idt^c = E[||G_d2c(f_c^i) - f_c^i||_1]   (5)

L_idt^d = E[||G_c2d(f_d^j) - f_d^j||_1]   (6)
Step 3.5, construct the degradation residual consistency loss L_res of the pedestrian images X_i and Y_j using equation (7):

L_res = E[||(G_c2d(f_c^i) - f_c^i) + (G_d2c(f_d^j) - f_d^j)||_1]   (7)
Step 3.6, establishing a global loss function L by using the formula (8) total
Figure BDA00037308974800000318
In formula (8), λ 1 、λ 2 、λ 3 、λ 4 4 hyper-parameters of the global loss function respectively;
Step 3.7, optimize the two feature alignment modules G_c2d and G_d2c and the two discriminator networks D_c and D_d by stochastic gradient descent: compute the global loss function L_total and perform gradient back-propagation until L_total converges, thereby obtaining the trained feature alignment modules G*_c2d and G*_d2c and the trained discriminator networks D*_c and D*_d;
Step 4, connect the trained feature alignment module G*_d2c into the pre-trained pedestrian re-identification model F* to obtain the final pedestrian re-identification model for identifying pedestrian pictures in the degraded environment.
The invention also relates to an electronic device comprising a memory and a processor, characterized in that the memory is used for storing a program that supports the processor in executing the above feature-alignment-based pedestrian re-identification method for degraded environments, and the processor is configured to execute the program stored in the memory.
The invention further relates to a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the steps of the above feature-alignment-based pedestrian re-identification method for degraded environments.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a new structural paradigm for the pedestrian re-identification task in degraded environments, in which the feature alignment module serves as a plug-and-play component. This plug-and-play design is structurally flexible, can be combined with any available pedestrian re-identification model, and introduces few network parameters, so the pedestrian re-identification network still runs efficiently after the module is inserted.
2. The method inserts a feature alignment module into the network and uses an unsupervised adversarial training strategy to learn feature alignment between degraded and clean images under the guidance of a pedestrian re-identification model trained on clean images.
Drawings
FIG. 1 is a flow chart of a pedestrian re-identification algorithm in a degraded environment based on feature alignment of the present invention;
FIG. 2 is a graph comparing the pedestrian re-identification performance of the invention with several best-performing two-stage methods in a fog degradation environment;
FIG. 3 is a graph comparing the pedestrian re-identification performance of the invention with several best-performing two-stage methods in a low-light degradation environment;
FIG. 4 is a graph comparing the pedestrian re-identification performance of the invention with several best-performing two-stage methods in a mixed degradation environment.
Detailed Description
In this embodiment, the idea of image preprocessing is applied in the feature space: degraded features are aligned to clean features by unsupervised and self-supervised learning to suppress the influence of image degradation, and a plug-and-play feature alignment module is provided to improve pedestrian re-identification performance in degraded environments. Specifically, as shown in FIG. 1, the method includes the following steps:
Step 1, acquire a pedestrian image data set (X_1, X_2, ..., X_i, ..., X_N) taken in a normal environment, wherein X_i represents the i-th normal pedestrian image and N represents the total number of images; acquire a pedestrian image data set (Y_1, Y_2, ..., Y_j, ..., Y_M) taken in a degraded environment, wherein Y_j represents the j-th degraded pedestrian image and M represents the total number of images;
Step 2, construct a feature-alignment-based deep learning model for pedestrian re-identification in the degraded environment, comprising: a pedestrian re-identification model F, two feature alignment modules G_c2d and G_d2c, and two discriminator networks D_c and D_d;
Step 2.1, the pedestrian re-identification model F consists of a backbone network and a classification network, wherein the backbone network is based on a ResNet-50 network; pre-train the pedestrian re-identification model F on the pedestrian image data set taken in the normal environment to obtain a pre-trained pedestrian re-identification model F*, and freeze the pre-trained weights;
Step 2.2, the feature alignment modules G_c2d and G_d2c have the same network structure, each comprising m residual convolution modules;
each residual convolution module consists, in sequence, of a convolution layer, a batch normalization layer and a ReLU activation function, wherein the convolution kernel of the convolution layer has size k x k and stride j; the input of the residual convolution module is concatenated with its output to form the final output of the module;
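For concreteness, the following is a minimal PyTorch sketch of one residual convolution module and of a feature alignment module (G_c2d or G_d2c) built from m such modules. It is an illustration only: the channel width, the values of m and k, stride 1 (assumed so that the input and output have matching spatial size for concatenation), and the final 1x1 fusion convolution that restores the backbone channel count after concatenation are all assumptions not fixed by the text.

import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    # Conv(k x k) -> BatchNorm -> ReLU; the block's input is concatenated
    # with its output along the channel axis, as described in step 2.2.
    def __init__(self, channels, k=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=k, stride=1,
                      padding=k // 2),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)

class FeatureAlignModule(nn.Module):
    # m residual convolution blocks; a 1x1 convolution (an assumption)
    # maps the concatenated channels back to the backbone width so the
    # module can be chained with the other alignment module and the
    # re-id head.
    def __init__(self, channels=2048, m=2):
        super().__init__()
        blocks, c = [], channels
        for _ in range(m):
            blocks.append(ResidualConvBlock(c))
            c *= 2  # concatenation doubles the channel count
        self.blocks = nn.Sequential(*blocks)
        self.fuse = nn.Conv2d(c, channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(self.blocks(x))

With channels=2048 this drops in after a ResNet-50 backbone: g_d2c = FeatureAlignModule(); g_d2c(torch.randn(4, 2048, 16, 8)) returns a tensor of the same shape, which is what lets the module be plug-and-play.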
Step 2.3, the discriminator networks D_c and D_d have the same network structure, each comprising a feature extraction module and a classification module;
the structure of the feature extraction module is kept the same as that of the backbone network, and the pre-trained weights are loaded as its network parameters; by adopting the same structure and training parameters, the discriminator extracts features from the pedestrian re-identification perspective and thus pays more attention to the pedestrian re-identification task; the classification module consists, in sequence, of a global average pooling layer, two fully connected layers, a batch normalization layer and a Leaky ReLU activation function;
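A corresponding sketch of the discriminator classification head follows; the hidden width, the Leaky ReLU slope, the exact position of the batch normalization between the two fully connected layers, and the sigmoid output are assumptions. The feature extraction module in front of this head is, per the text, a copy of the ResNet-50 backbone loaded with the frozen pre-trained weights.

import torch
import torch.nn as nn

class DiscriminatorHead(nn.Module):
    # Global average pooling, two fully connected layers with batch
    # normalization and Leaky ReLU, ending in a probability in [0, 1].
    def __init__(self, in_channels=2048, hidden=512):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(in_channels, hidden)
        self.bn = nn.BatchNorm1d(hidden)
        self.act = nn.LeakyReLU(0.2, inplace=True)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, feat):
        x = self.pool(feat).flatten(1)      # (B, C, H, W) -> (B, C)
        x = self.act(self.bn(self.fc1(x)))
        return torch.sigmoid(self.fc2(x))   # probability of "real"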
Step 3, train the feature-alignment-based deep learning model for pedestrian re-identification in the degraded environment:
Step 3.1, input the i-th normal pedestrian image X_i and the j-th degraded pedestrian image Y_j into the backbone network of the pre-trained pedestrian re-identification model F* for feature extraction, obtaining the corresponding pedestrian features f_c^i and f_d^j;
Step 3.2, input the pedestrian feature f_c^i into the feature alignment module G_c2d to obtain the aligned pedestrian feature f_c2d^i = G_c2d(f_c^i); input the pedestrian feature f_d^j into the feature alignment module G_d2c to obtain the aligned pedestrian feature f_d2c^j = G_d2c(f_d^j);
input the pedestrian features f_c^i and f_d2c^j into the discriminator network D_c, and obtain from D_c the probabilities D_c(f_c^i) and D_c(f_d2c^j) that these pedestrian features were extracted from pedestrian pictures taken in the normal environment; input the pedestrian features f_d^j and f_c2d^i into the discriminator network D_d, and obtain from D_d the probabilities D_d(f_d^j) and D_d(f_c2d^i) that these pedestrian features were extracted from pedestrian pictures taken in the degraded environment;
construct the adversarial losses L_adv^c2d and L_adv^d2c of the pedestrian images X_i and Y_j using equation (1) and equation (2), respectively:

L_adv^c2d = E[log D_d(f_d^j)] + E[log(1 - D_d(f_c2d^i))]   (1)

L_adv^d2c = E[log D_c(f_c^i)] + E[log(1 - D_c(f_d2c^j))]   (2)
In equations (1) and (2), E denotes the expectation. Through adversarial training, the adversarial losses align clean features to degraded features and degraded features to clean features, making the aligned features more similar to the real features;
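As a sketch, the adversarial objective of equations (1) and (2) can be written with binary cross-entropy, split in the usual way into a discriminator term and an alignment-module (generator) term. The function and variable names are illustrative, with f_c and f_d denoting backbone features of a clean and a degraded image, and g_c2d, g_d2c, d_c, d_d following the sketches above.

import torch
import torch.nn.functional as F

def discriminator_loss(f_c, f_d, g_c2d, g_d2c, d_c, d_d):
    # D_d should accept real degraded features and reject aligned clean
    # ones; D_c is symmetric. detach() keeps generator gradients out.
    bce = F.binary_cross_entropy
    p_real_d, p_fake_d = d_d(f_d), d_d(g_c2d(f_c).detach())
    p_real_c, p_fake_c = d_c(f_c), d_c(g_d2c(f_d).detach())
    return (bce(p_real_d, torch.ones_like(p_real_d))
            + bce(p_fake_d, torch.zeros_like(p_fake_d))
            + bce(p_real_c, torch.ones_like(p_real_c))
            + bce(p_fake_c, torch.zeros_like(p_fake_c)))

def generator_adv_loss(f_c, f_d, g_c2d, g_d2c, d_c, d_d):
    # The alignment modules try to make aligned features look real.
    bce = F.binary_cross_entropy
    p_fake_d = d_d(g_c2d(f_c))
    p_fake_c = d_c(g_d2c(f_d))
    return (bce(p_fake_d, torch.ones_like(p_fake_d))
            + bce(p_fake_c, torch.ones_like(p_fake_c)))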
Step 3.3, input the aligned pedestrian feature f_c2d^i into the feature alignment module G_d2c to obtain the reconstructed pedestrian feature G_d2c(f_c2d^i); input the aligned pedestrian feature f_d2c^j into the feature alignment module G_c2d to obtain the reconstructed pedestrian feature G_c2d(f_d2c^j); construct the cycle consistency losses L_cyc^c and L_cyc^d of the pedestrian images X_i and Y_j using equations (3) and (4):

L_cyc^c = E[||G_d2c(G_c2d(f_c^i)) - f_c^i||_1]   (3)

L_cyc^d = E[||G_c2d(G_d2c(f_d^j)) - f_d^j||_1]   (4)
The cycle consistency loss requires that an aligned feature, after being passed back through the opposite alignment module, is converted back to the original pedestrian feature, thereby ensuring the consistency of the content information;
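A minimal sketch of the cycle consistency losses (3) and (4), assuming the L1 distance used in the reconstruction above:

import torch.nn.functional as F

def cycle_losses(f_c, f_d, g_c2d, g_d2c):
    # Clean -> degraded -> clean and degraded -> clean -> degraded round
    # trips should reproduce the original features.
    rec_c = g_d2c(g_c2d(f_c))
    rec_d = g_c2d(g_d2c(f_d))
    return F.l1_loss(rec_c, f_c), F.l1_loss(rec_d, f_d)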
Step 3.4, input the pedestrian feature f_c^i into the feature alignment module G_d2c to obtain the individual-preserving feature G_d2c(f_c^i); input the pedestrian feature f_d^j into the feature alignment module G_c2d to obtain the individual-preserving feature G_c2d(f_d^j); construct the individual maintenance losses L_idt^c and L_idt^d of the pedestrian images X_i and Y_j using equations (5) and (6):

L_idt^c = E[||G_d2c(f_c^i) - f_c^i||_1]   (5)

L_idt^d = E[||G_c2d(f_d^j) - f_d^j||_1]   (6)
The individual maintenance loss encourages the alignment modules G_d2c and G_c2d to pay more attention to the degradation information in the features, so as to further protect the content information in the features;
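The individual maintenance losses (5) and (6) admit an equally small sketch; as with the cycle losses, the L1 distance is an assumption:

import torch.nn.functional as F

def identity_losses(f_c, f_d, g_c2d, g_d2c):
    # A feature that already lies in a module's output domain should
    # pass through that module nearly unchanged.
    idt_c = g_d2c(f_c)   # clean feature through the degraded-to-clean module
    idt_d = g_c2d(f_d)   # degraded feature through the clean-to-degraded module
    return F.l1_loss(idt_c, f_c), F.l1_loss(idt_d, f_d)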
Step 3.5, construct the degradation residual consistency loss L_res of the pedestrian images X_i and Y_j using equation (7):

L_res = E[||(G_c2d(f_c^i) - f_c^i) + (G_d2c(f_d^j) - f_d^j)||_1]   (7)
Because the invention adopts an unsupervised mode to train the network, stronger constraint can be applied to the network by adopting degradation consistency loss so as to ensure the stability of network training;
Step 3.6, establish the global loss function L_total using equation (8):

L_total = λ_1(L_adv^c2d + L_adv^d2c) + λ_2(L_cyc^c + L_cyc^d) + λ_3(L_idt^c + L_idt^d) + λ_4 L_res   (8)

In equation (8), λ_1, λ_2, λ_3 and λ_4 are the hyper-parameters of the global loss function; in this embodiment they are fixed as λ_1 = 1, λ_2 = 5, λ_3 = 10 and λ_4 = 1;
Step 3.7, optimize the two feature alignment modules G_c2d and G_d2c and the two discriminator networks D_c and D_d by stochastic gradient descent: compute the global loss function L_total and perform gradient back-propagation until L_total converges, thereby obtaining the trained feature alignment modules G*_c2d and G*_d2c and the trained discriminator networks D*_c and D*_d;
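Putting steps 3.5 to 3.7 together, the following sketch performs one stochastic-gradient-descent training step, using the helper functions from the sketches above. Three hedges apply: the form of residual_consistency follows the reconstruction given for equation (7); the alternating discriminator/alignment-module updates are the standard way of realizing adversarial optimization, which the text summarizes simply as optimizing all four networks; and the pairing of λ_1..λ_4 with the four loss groups follows the order in which the losses are constructed.

import itertools
import torch

def residual_consistency(f_c, f_d, g_c2d, g_d2c):
    # One reading of loss (7): the residual added when degrading clean
    # features should cancel the residual removed when cleaning degraded
    # features (the published equation is an image; this is an assumption).
    r_c = g_c2d(f_c) - f_c
    r_d = g_d2c(f_d) - f_d
    return (r_c + r_d).abs().mean()

def train_step(f_c, f_d, g_c2d, g_d2c, d_c, d_d, opt_g, opt_d,
               lambdas=(1.0, 5.0, 10.0, 1.0)):
    l1, l2, l3, l4 = lambdas       # lambda_1..lambda_4 from this embodiment
    # 1) Discriminator update.
    opt_d.zero_grad()
    d_loss = discriminator_loss(f_c, f_d, g_c2d, g_d2c, d_c, d_d)
    d_loss.backward()
    opt_d.step()
    # 2) Alignment-module update with the full objective (8).
    opt_g.zero_grad()
    cyc_c, cyc_d = cycle_losses(f_c, f_d, g_c2d, g_d2c)
    idt_c, idt_d = identity_losses(f_c, f_d, g_c2d, g_d2c)
    g_loss = (l1 * generator_adv_loss(f_c, f_d, g_c2d, g_d2c, d_c, d_d)
              + l2 * (cyc_c + cyc_d)
              + l3 * (idt_c + idt_d)
              + l4 * residual_consistency(f_c, f_d, g_c2d, g_d2c))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Example optimizers (learning rates are placeholders):
# opt_g = torch.optim.SGD(itertools.chain(g_c2d.parameters(),
#                                         g_d2c.parameters()), lr=1e-3)
# opt_d = torch.optim.SGD(itertools.chain(d_c.parameters(),
#                                         d_d.parameters()), lr=1e-3)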
Step 4, connect the trained feature alignment module G*_d2c into the pre-trained pedestrian re-identification model F* to obtain the final pedestrian re-identification model for identifying pedestrian pictures in the degraded environment.
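Finally, a sketch of the inference-time wiring of step 4. Since training operates on backbone features, the natural reading is that the trained G*_d2c sits between the frozen backbone and the classification network of F*; backbone and head below are stand-ins for those two halves, and all names are illustrative.

import torch.nn as nn

class DegradationRobustReID(nn.Module):
    # Degraded probe images are mapped into the clean feature space by
    # G_d2c before the unchanged re-id head computes the embedding.
    def __init__(self, backbone, g_d2c, head):
        super().__init__()
        self.backbone = backbone   # frozen, pre-trained on clean images
        self.g_d2c = g_d2c         # trained degraded-to-clean alignment
        self.head = head           # unchanged re-id classifier/embedding

    def forward(self, image):
        feat = self.backbone(image)
        return self.head(self.g_d2c(feat))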
In this embodiment, an electronic device includes a memory and a processor, the memory storing a program that supports the processor in executing the feature-alignment-based pedestrian re-identification method for degraded environments, and the processor being configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program which, when executed by a processor, performs the steps of the feature-alignment-based pedestrian re-identification method for degraded environments.
In order to quantitatively evaluate the effect of the invention and verify its effectiveness, the method is compared with several best-performing two-stage methods in a fog degradation environment, a low-light degradation environment and a mixed degradation environment, and three performance indexes are selected as evaluation metrics: CMC-k (cumulative matching characteristics, i.e., Rank-k matching accuracy), mAP and mINP;
FIG. 2 compares the pedestrian re-identification performance of the invention with several best-performing two-stage methods in the fog degradation environment; FIG. 3 shows the comparison in the low-light degradation environment; FIG. 4 shows the comparison in the mixed degradation environment. The results clearly show that the method achieves the best pedestrian re-identification performance in all three degradation environments, with a large improvement over the two-stage methods.

Claims (3)

1. A pedestrian re-identification method in a degraded environment based on feature alignment, characterized by comprising the following steps:
Step 1, acquire a pedestrian image data set (X_1, X_2, ..., X_i, ..., X_N) taken in a normal environment, wherein X_i represents the i-th normal pedestrian image and N represents the total number of images; acquire a pedestrian image data set (Y_1, Y_2, ..., Y_j, ..., Y_M) taken in a degraded environment, wherein Y_j represents the j-th degraded pedestrian image and M represents the total number of images;
Step 2, construct a feature-alignment-based deep learning model for pedestrian re-identification in the degraded environment, comprising: a pedestrian re-identification model F, two feature alignment modules G_c2d and G_d2c, and two discriminator networks D_c and D_d;
Step 2.1, the pedestrian re-identification model F consists of a backbone network and a classification network, wherein the backbone network is based on a ResNet-50 network; pre-train the pedestrian re-identification model F on the pedestrian image data set taken in the normal environment to obtain a pre-trained pedestrian re-identification model F*, and freeze the pre-trained weights;
Step 2.2, the feature alignment modules G_c2d and G_d2c have the same network structure, each comprising m residual convolution modules;
each residual convolution module consists, in sequence, of a convolution layer, a batch normalization layer and a ReLU activation function, wherein the convolution kernel of the convolution layer has size k x k and stride j; the input of the residual convolution module is concatenated with its output to form the final output of the module;
Step 2.3, the discriminator networks D_c and D_d have the same network structure, each comprising a feature extraction module and a classification module;
the structure of the feature extraction module is the same as that of the backbone network, and the pre-trained weights are loaded as its network parameters; the classification module consists, in sequence, of a global average pooling layer, two fully connected layers, a batch normalization layer and a Leaky ReLU activation function;
Step 3, train the feature-alignment-based deep learning model for pedestrian re-identification in the degraded environment:
Step 3.1, input the i-th normal pedestrian image X_i and the j-th degraded pedestrian image Y_j into the backbone network of the pre-trained pedestrian re-identification model F* for feature extraction, obtaining the corresponding pedestrian features f_c^i and f_d^j;
Step 3.2, input the pedestrian feature f_c^i into the feature alignment module G_c2d to obtain the aligned pedestrian feature f_c2d^i = G_c2d(f_c^i); input the pedestrian feature f_d^j into the feature alignment module G_d2c to obtain the aligned pedestrian feature f_d2c^j = G_d2c(f_d^j);
input the pedestrian features f_c^i and f_d2c^j into the discriminator network D_c and correspondingly obtain the probabilities D_c(f_c^i) and D_c(f_d2c^j) under the normal environment; input the pedestrian features f_d^j and f_c2d^i into the discriminator network D_d and correspondingly obtain the probabilities D_d(f_d^j) and D_d(f_c2d^i) under the degraded environment;
construct the adversarial losses L_adv^c2d and L_adv^d2c of the pedestrian images X_i and Y_j using equation (1) and equation (2), respectively:

L_adv^c2d = E[log D_d(f_d^j)] + E[log(1 - D_d(f_c2d^i))]   (1)

L_adv^d2c = E[log D_c(f_c^i)] + E[log(1 - D_c(f_d2c^j))]   (2)

In equations (1) and (2), E denotes the expectation;
Step 3.3, input the aligned pedestrian feature f_c2d^i into the feature alignment module G_d2c to obtain the reconstructed pedestrian feature G_d2c(f_c2d^i); input the aligned pedestrian feature f_d2c^j into the feature alignment module G_c2d to obtain the reconstructed pedestrian feature G_c2d(f_d2c^j); construct the cycle consistency losses L_cyc^c and L_cyc^d of the pedestrian images X_i and Y_j using equations (3) and (4):

L_cyc^c = E[||G_d2c(G_c2d(f_c^i)) - f_c^i||_1]   (3)

L_cyc^d = E[||G_c2d(G_d2c(f_d^j)) - f_d^j||_1]   (4)
Step 3.4, input the pedestrian feature f_c^i into the feature alignment module G_d2c to obtain the individual-preserving feature G_d2c(f_c^i); input the pedestrian feature f_d^j into the feature alignment module G_c2d to obtain the individual-preserving feature G_c2d(f_d^j); construct the individual maintenance losses L_idt^c and L_idt^d of the pedestrian images X_i and Y_j using equations (5) and (6):

L_idt^c = E[||G_d2c(f_c^i) - f_c^i||_1]   (5)

L_idt^d = E[||G_c2d(f_d^j) - f_d^j||_1]   (6)
Step 3.5, construct the degradation residual consistency loss L_res of the pedestrian images X_i and Y_j using equation (7):

L_res = E[||(G_c2d(f_c^i) - f_c^i) + (G_d2c(f_d^j) - f_d^j)||_1]   (7)
Step 3.6, establishing a global loss function L by using the formula (8) total
Figure FDA00037308974700000220
In formula (8), λ 1 、λ 2 、λ 3 、λ 4 4 hyper-parameters of the global loss function respectively;
Step 3.7, optimize the two feature alignment modules G_c2d and G_d2c and the two discriminator networks D_c and D_d by stochastic gradient descent: compute the global loss function L_total and perform gradient back-propagation until L_total converges, thereby obtaining the trained feature alignment modules G*_c2d and G*_d2c and the trained discriminator networks D*_c and D*_d;
Step 4, connect the trained feature alignment module G*_d2c into the pre-trained pedestrian re-identification model F* to obtain the final pedestrian re-identification model for identifying pedestrian pictures in the degraded environment.
2. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program enabling the processor to perform the method of claim 1, and the processor is configured to execute the program stored in the memory.
3. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method as claimed in claim 1.
CN202210792619.5A 2022-07-05 2022-07-05 Pedestrian re-identification method based on characteristic alignment in degradation environment Active CN115147774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210792619.5A CN115147774B (en) 2022-07-05 2022-07-05 Pedestrian re-identification method based on characteristic alignment in degradation environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210792619.5A CN115147774B (en) 2022-07-05 2022-07-05 Pedestrian re-identification method based on characteristic alignment in degradation environment

Publications (2)

Publication Number Publication Date
CN115147774A true CN115147774A (en) 2022-10-04
CN115147774B CN115147774B (en) 2024-04-02

Family

ID=83413157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210792619.5A Active CN115147774B (en) 2022-07-05 2022-07-05 Pedestrian re-identification method based on characteristic alignment in degradation environment

Country Status (1)

Country Link
CN (1) CN115147774B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126360A (en) * 2019-11-15 2020-05-08 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
WO2021203801A1 (en) * 2020-04-08 2021-10-14 苏州浪潮智能科技有限公司 Person re-identification method and apparatus, electronic device, and storage medium
CN111783736A (en) * 2020-07-23 2020-10-16 上海高重信息科技有限公司 Pedestrian re-identification method, device and system based on human body semantic alignment
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
CN114627496A (en) * 2022-03-01 2022-06-14 中国科学技术大学 Robust pedestrian re-identification method based on depolarization batch normalization of Gaussian process

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
熊炜; 熊子婕; 杨荻椿; 童磊; 刘敏; 曾春艳: "Person re-identification method based on deep-layer feature fusion" (基于深层特征融合的行人重识别方法), Computer Engineering and Science (计算机工程与科学), no. 02, 15 February 2020 *

Also Published As

Publication number Publication date
CN115147774B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN112861720B (en) Remote sensing image small sample target detection method based on prototype convolutional neural network
CN110163110B (en) Pedestrian re-recognition method based on transfer learning and depth feature fusion
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
Thai et al. Image classification using support vector machine and artificial neural network
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
CN111460980B (en) Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion
CN112215119B (en) Small target identification method, device and medium based on super-resolution reconstruction
CN110837846A (en) Image recognition model construction method, image recognition method and device
CN112862690B (en) Transformers-based low-resolution image super-resolution method and system
CN113269224B (en) Scene image classification method, system and storage medium
CN109146944A (en) A kind of space or depth perception estimation method based on the revoluble long-pending neural network of depth
CN110555461A (en) scene classification method and system based on multi-structure convolutional neural network feature fusion
CN113065516B (en) Sample separation-based unsupervised pedestrian re-identification system and method
CN112634171A (en) Image defogging method based on Bayes convolutional neural network and storage medium
CN111126155B (en) Pedestrian re-identification method for generating countermeasure network based on semantic constraint
CN113205103A (en) Lightweight tattoo detection method
Zhou et al. MSAR‐DefogNet: Lightweight cloud removal network for high resolution remote sensing images based on multi scale convolution
CN113283320B (en) Pedestrian re-identification method based on channel feature aggregation
CN111191704A (en) Foundation cloud classification method based on task graph convolutional network
CN117197451A (en) Remote sensing image semantic segmentation method and device based on domain self-adaption
CN115147774A (en) Pedestrian re-identification method in degradation environment based on feature alignment
CN116503896A (en) Fish image classification method, device and equipment
CN115861997A (en) License plate detection and identification method for guiding knowledge distillation by key foreground features
CN115830401A (en) Small sample image classification method
CN113673629A (en) Open set domain adaptive remote sensing image small sample classification method based on multi-graph convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant