CN116343294A

CN116343294A - Pedestrian re-identification method suitable for domain generalization

Info

Publication number: CN116343294A
Application number: CN202310191111.4A
Authority: CN
Inventors: 周雪; 丁金; 邹见效; 徐红兵
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2023-03-02
Filing date: 2023-03-02
Publication date: 2023-06-27

Abstract

The invention discloses a pedestrian re-identification method suitable for domain generalization, belonging to the technical field of computer vision and machine learning. The invention constructs a domain normalization module DN by combining the distribution information of the instance normalization (IN) layer and the batch normalization (BN) layer, and uses the DN module to replace the BN normalization modules in the deep network structure of a deep learning network model, thereby constructing a strong baseline model. In addition, an implicit distribution space is constructed from the distribution information of all source domains, and distribution alignment is achieved by projecting the distributions of the source domains and the target domain into this implicit distribution space. Meanwhile, an inverse normalization module AN is designed: each source domain is regarded as a "simulated target domain" for the other source domains, and, after style information and identity-discriminative information have been removed by the IN operation, the distribution information of the source domains is used as the inverse normalization parameters, constraining the target domain to project into the implicit distribution space and further improving the recognition accuracy of the model in the target domain.

Description

Pedestrian re-identification method suitable for domain generalization

Technical Field

The invention belongs to the technical fields of computer vision and machine learning, and particularly relates to an unsupervised domain-generalization pedestrian re-identification method based on deep learning.

Background

Pedestrian re-identification is an important and challenging research problem in computer vision. By associating a specific pedestrian captured by cameras at different times and physical locations, images of that pedestrian under different viewing angles can be retrieved quickly and accurately, laying a foundation for subsequent high-level applications such as pedestrian tracking, pedestrian attribute recognition and pedestrian behavior analysis. Pedestrian re-identification technology has shown considerable development prospects and economic benefit in fields such as intelligent video surveillance, intelligent urban traffic, smart photo albums and smart retail.

Deep-learning-based pedestrian re-identification can be roughly divided into supervised and unsupervised methods, according to whether pedestrian identity label information is used. Supervised methods that learn from identity labels have reached very high performance. However, because acquiring identity labels is time-consuming and laborious and raises personal privacy concerns, unsupervised methods have received much attention. These methods typically use a clustering algorithm or a similarity measure to obtain pseudo-labels for the input images, and then train the model in a supervised manner.

Although unsupervised pedestrian re-identification methods have attracted significant attention, most of them assume that the training and test data come from the same dataset (domain), i.e., that both have similar distributions. In real scenes there are large differences (domain differences) between pedestrian re-identification datasets, such as the captured scenes, the illumination at the capture points, camera resolution and pedestrians' clothing styles. As a result, these unsupervised methods can only be applied to test sets whose distribution is similar to that of the training set; they do not generalize to other datasets, where recognition performance degrades greatly.

To address the domain difference problem, many researchers in recent years have studied how to train a model with better generalization using only labeled source domain data, so that it achieves good recognition performance in an unlabeled target domain, i.e., unsupervised cross-domain pedestrian re-identification. According to whether target domain data can be used for training, unsupervised cross-domain pedestrian re-identification methods fall mainly into two types: domain adaptation methods and domain generalization methods. The goal of domain adaptation is to adaptively transfer knowledge learned in the source domain to a target domain that has a different domain style but whose data are known. Because target domain data can be used for training, these methods focus on exploring the relationship between source domain images and target domain images, or directly use the target domain data for unsupervised learning. Although domain adaptation methods alleviate the domain difference problem to some extent, they rely heavily on training data from the target domain and cannot be tested directly on a new target domain. Domain generalization methods address this limitation: they aim to learn a generalizable representation of pedestrians from a limited number of source domains, so that good recognition performance can be achieved in target domains with different domain styles and unknown data. The biggest difference from domain adaptation is that target domain data are not available for training.

At present, unsupervised domain-generalization pedestrian re-identification methods can be divided mainly into four types: methods based on representation learning, on distribution alignment, on meta-learning, and on data augmentation.

(1) Representation-learning-based methods aim to learn domain-invariant pedestrian features, thereby improving the robustness of the model. For example, Jin et al. propose a style normalization and restitution model that filters out interference unrelated to pedestrian identity while disentangling identity-discriminative pedestrian features.

(2) Distribution-alignment-based methods assume that the data distributions of both the source domain and the target domain follow a multivariate Gaussian distribution, and aim to fit the data distribution of the target domain using the distribution information, i.e., the mean and variance, of each source domain dataset. For example, Xu et al. propose a mimic embedding method based on adaptive aggregation, which considers the correlation between unknown target samples and the source domain datasets and designs an aggregation module that adaptively integrates the distribution information of multiple source domains to simulate the distribution of the target domain.

(3) Meta-learning-based methods focus on formulating an effective learning strategy to improve the robustness of the algorithm. These methods simulate the domain differences between source and target domains during training so that the model learns to eliminate them.

(4) Data-augmentation-based methods typically perform augmentation at the feature level, aiming to generate more diverse features and thereby enhance the generalization of the model to the target domain. For example, Ang et al. propose a domain feature augmentation method that improves coverage of the domain manifold by implicitly projecting feature points along the directions of the source domain distribution, thereby obtaining augmented features.

In recent years, influenced by style transfer networks, distribution alignment methods that add an instance normalization (IN) layer to the network structure have gained favor with more and more researchers. Such methods normalize each sample with an IN layer to remove the style information of the source domain, and use the batch normalization (BN) layers in the network to normalize the whole source domain dataset to a multivariate Gaussian distribution with mean 0 and variance 1, so as to extract identity-discriminative pedestrian information in the source domain that is not affected by domain differences. However, although the IN layer removes some of the style information that causes domain differences, it inevitably also loses identity-discriminative information in the pedestrian image. And although the BN layer extracts the distribution information of the source domain and achieves good recognition performance there, the target domain data are not available for training, so the model is directly affected by domain information specific to the source domain when tested on a target domain with a completely different distribution.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a pedestrian re-identification method suitable for domain generalization. A domain normalization (DN) module is constructed by combining the distribution information of the IN layer and the BN layer, and the DN module replaces the BN normalization modules in the deep network structure of a deep learning network model to construct a strong baseline model. The domain normalization module balances the roles of the IN layer and the BN layer, mitigates the influence of domain differences, and can extract high-level semantic information that discriminates pedestrian identity. Because the BN layer still extracts the distribution information of the source domain, combining the different distribution information only by linear interpolation cannot completely eliminate the influence of domain differences. Therefore, on the basis of the strong baseline model, the invention constructs an implicit distribution space from the distribution information of all source domains, and achieves distribution alignment by projecting the distributions of the source domains and the target domain into this implicit distribution space. Meanwhile, an inverse normalization (AN) module is designed: each source domain is regarded as a "simulated target domain" for the other source domains, and, after style information and identity-discriminative information have been removed by the IN operation, the target domain uses the distribution information of the source domains as the inverse normalization parameters, so that the target domain is constrained to project into the implicit distribution space and the recognition accuracy of the model in the target domain is further improved.

In order to achieve the above purpose, the invention adopts the following technical scheme:

A pedestrian re-identification method suitable for domain generalization, characterized by comprising the following steps:

S1, establishing a strong baseline model.

S2, training the strong baseline model using at least two source domains to obtain a trained strong baseline model.

S3, inputting the target domain image to be identified into the trained strong baseline model to complete the re-identification task.

Specifically, in step S1, the strong baseline model uses a deep learning network model as the backbone network, and the normalization modules in the deep network structure of the deep learning network model are replaced with the domain normalization module DN to form the strong baseline model.

The calculation formula of the domain normalization module DN is as follows:

[formula image missing]

wherein the left-hand side denotes the feature map obtained after the input feature map x passes through the i-th DN module; a₁ and a₂ are two learnable parameters that adaptively balance the roles of the IN layer and the BN layer; γ and β are a learnable scaling parameter and a learnable translation parameter, respectively; ε is a constant that prevents the denominator from being 0. The remaining symbols denote, respectively, the mean and variance of the IN layer and the mean and variance of the BN layer for the i-th source domain, calculated as follows:

[formula image missing]

where n ∈ {1, 2, 3, …, N}, h ∈ {1, 2, 3, …, H}, w ∈ {1, 2, 3, …, W}, and N, C, H and W represent, respectively, the number of batch samples, the number of channels, the height and the width of the input feature map x ∈ R^(N×C×H×W).
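As a hedged illustration of the statistics described above, the sketch below computes the IN statistics (per sample and channel, over height and width) and the BN statistics (per channel, over batch, height and width) with NumPy. The exact DN formula is not reproduced in this text, so the way a₁ and a₂ combine the two sets of statistics is an assumption: a linear interpolation, suggested by the description's later remark that the strong baseline combines the IN and BN distribution information by linear interpolation. The function names are hypothetical.

```python
import numpy as np

def in_stats(x):
    """Instance-norm statistics: per sample and channel, over H and W."""
    mu = x.mean(axis=(2, 3), keepdims=True)     # shape (N, C, 1, 1)
    var = x.var(axis=(2, 3), keepdims=True)     # biased variance
    return mu, var

def bn_stats(x):
    """Batch-norm statistics: per channel, over N, H and W."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return mu, var

def domain_norm(x, a1, a2, gamma, beta, eps=1e-5):
    """ASSUMED form of DN: a1 interpolates the means, a2 the variances."""
    mu_in, var_in = in_stats(x)
    mu_bn, var_bn = bn_stats(x)
    mu = a1 * mu_in + (1.0 - a1) * mu_bn
    var = a2 * var_in + (1.0 - a2) * var_bn
    # eps keeps the denominator away from zero, as in the description.
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 3, 8, 8))        # N, C, H, W
y_in = domain_norm(x, a1=1.0, a2=1.0, gamma=1.0, beta=0.0)   # reduces to IN
y_bn = domain_norm(x, a1=0.0, a2=0.0, gamma=1.0, beta=0.0)   # reduces to BN
```

Under this assumed form, setting a₁ = a₂ = 1 gives pure instance normalization and a₁ = a₂ = 0 gives pure batch normalization, so the learnable parameters move the module between the two behaviours.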

Preferably, on the basis of the strong baseline model, a second deep network structure is added that is parallel to the deep network structure and has the same structure, except that the domain normalization module DN in the second deep network structure is replaced by an inverse normalization module AN. A storage module is also added; the means and the variances of the source domains stored in the storage module are each averaged to obtain the mean and variance of the implicit distribution, thereby constructing the implicit distribution space.

The calculation formula of the inverse normalization module AN is as follows:

[formula image missing]

wherein the left-hand side denotes the feature map obtained after the input feature map x passes through the i-th AN module. For the input i-th source domain D_i, D_i is regarded as the "simulated target domain" of the remaining source domains; m_j is the learnable parameter of the j-th source domain D_j, with j ∈ {1, 2, 3, …, k}, j ≠ i, where k is the number of source domains. The average mean and average variance of the remaining source domains in the deep network structure are used as the inverse normalization parameters, and the second deep network structure constrains the distribution of the inverse-normalized source domain D_i to align with the implicit distribution space constructed by the storage module. The specific alignment is as follows:

[formula image missing]

wherein μ and σ² denote the mean and variance of the implicit distribution, respectively, and ‖·‖² denotes the square of the L₂ norm.

The average mean and average variance are calculated as follows:

[formula image missing]

wherein T denotes the number of iterations and α is the exponential moving average hyper-parameter.

Preferably, a KL divergence loss L_kl is used to constrain each sample of the "simulated target domain", calculated as follows:

[formula image missing]

wherein the two per-sample statistics are, respectively, the mean and the variance of the last-layer feature map of the n-th sample in a batch after normalization over height and width; both have size R^(N×C).

The beneficial effects of the invention are as follows:

1. In the strong baseline model established by the invention, the domain normalization module DN is built by combining the distribution information of the IN layer and the BN layer, so that the roles of the two layers are balanced, the influence of domain differences is mitigated, and identity-discriminative pedestrian information is extracted at the same time.

2. On the basis of the strong baseline model, the invention further provides an implicit distribution space. The mean and variance of the implicit distribution are calculated from the distribution information of the source domains, and the distributions of all source domain data are projected into the implicit distribution space. Meanwhile, the inverse normalization module AN uses the distribution information of the source domains as the inverse normalization parameters to constrain the target domain to project into the implicit distribution space. The distributions of the source domains and the target domain are thus indirectly aligned, which reduces domain differences and further improves the accuracy of domain-generalization pedestrian re-identification.

Drawings

FIG. 1 is a schematic diagram of a strong baseline model suitable for the domain-generalization pedestrian re-identification task.

FIG. 2 is a schematic diagram of the domain-generalization pedestrian re-identification method based on implicit distribution alignment.

Detailed Description

The technical scheme of the invention is described in detail below with reference to the attached drawings and specific embodiments.

Embodiment one:

This embodiment provides a domain-generalization pedestrian re-identification method based on a strong baseline model, comprising the following steps:

S1, establishing a strong baseline model.

In this embodiment, a ResNet50 network with BN layers is used as the backbone network and is divided into 4 stages according to its bottleneck structure. Because stage1 and stage2 lie in the shallow part of the network and mostly extract shallow semantic information from the pedestrian image, the structure of these two stages is left unchanged. In the deep network structures stage3 and stage4, all batch normalization (BN) layers are replaced by domain normalization modules DN to form the strong baseline model, as shown in FIG. 1.

For the input feature map x ∈ R^(N×C×H×W), where N, C, H and W represent, respectively, the number of batch samples, the number of channels, the height and the width of the input feature map, the domain normalization module DN is calculated as follows:

[formula image missing]

wherein a₁ and a₂ are two learnable parameters that adaptively balance the roles of the IN layer and the BN layer, and the a₁ and a₂ of each DN module are unique to it and not shared with other DN modules; γ and β are a learnable scaling parameter and a learnable translation parameter, respectively; ε is a constant that prevents the denominator from being 0. The remaining symbols denote, respectively, the mean and variance of the IN layer and of the BN layer for the i-th source domain, calculated as follows:

[formula image missing]

In the strong baseline model, the BN positions in the deep network structures stage3 and stage4 are replaced by the DN modules of each corresponding source domain, while all other structures are shared. This greatly reduces the number of network parameters, lets the model extract the identity-discriminative information common to pedestrians in all source domains, and, through the use of different DN modules, enhances the generalization of the network to a certain extent.

During training, only one source domain is randomly selected as input at each iteration, which prevents the model from being biased toward a source domain that is easier to fit, as could happen if multiple source domains updated the model together. For the input i-th source domain D_i, i ∈ {1, 2, 3, …, k}, where k is the number of source domains, the data passes through the shared shallow structure of the network and the deep network structure containing the DN modules unique to D_i, yielding the output feature f and the prediction probability p of that source domain's classification layer. The training loss function comprises the classification loss L_cls and the triplet loss L_tri, calculated as follows:

[formula image missing]

where n ∈ {1, 2, 3, …, N}, id ∈ {1, 2, 3, …, ID}; ID is the number of pedestrian identities in the i-th source domain, N is the number of samples of the i-th source domain, y is the true pedestrian identity label, p_j is the probability predicted by the model that the sample belongs to pedestrian identity label j, ε is the label-smoothing constant, and q_j is the "soft label" after label smoothing.

[formula image missing]

where ‖·‖ denotes the L₂ norm, the two feature symbols denote the features of the positive sample and the negative sample of the input sample feature f_n within one batch, and margin is a boundary constant, set to 0.35 in this embodiment.
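The two losses can be sketched as follows, under stated assumptions. The formulas themselves are not reproduced in this text, so the sketch uses the common label-smoothing form (1 − ε + ε/ID on the true identity, ε/ID elsewhere) and a batch-hard triplet loss (farthest positive, closest negative); both choices, the value ε = 0.1, and all function names are assumptions. Only the margin of 0.35 comes from the description.

```python
import numpy as np

def smoothed_labels(y, num_ids, eps=0.1):
    """Soft labels q: 1 - eps + eps/num_ids on the true class, eps/num_ids elsewhere."""
    q = np.full((len(y), num_ids), eps / num_ids)
    q[np.arange(len(y)), y] += 1.0 - eps
    return q

def classification_loss(logits, y, eps=0.1):
    """Cross-entropy between softmax predictions p and the smoothed labels q."""
    z = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    q = smoothed_labels(y, logits.shape[1], eps)
    return float(-(q * log_p).sum(axis=1).mean())

def triplet_loss(f, y, margin=0.35):
    """Batch-hard triplet loss (the hard mining is an assumption of this sketch)."""
    d = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=2)  # pairwise L2 distances
    same = y[:, None] == y[None, :]
    d_pos = np.where(same, d, -np.inf).max(axis=1)   # farthest sample with the same identity
    d_neg = np.where(same, np.inf, d).min(axis=1)    # closest sample with another identity
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

# Toy batch: two identities, two well-separated clusters in feature space.
y = np.array([0, 0, 1, 1])
f = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.0, 0.1]])
logits = np.array([[5.0, 0.0, 0.0], [5.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0], [0.0, 5.0, 0.0]])
l_cls = classification_loss(logits, y)
l_tri = triplet_loss(f, y)   # clusters are farther apart than the margin
```

With the toy features above, every positive pair is closer than every negative pair by more than 0.35, so the triplet term vanishes; a larger margin makes it positive again.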

S2, training the strong baseline model on the input source domains using formula (4) and formula (5) to obtain a trained strong baseline model.

S3, inputting the target domain image into the trained strong baseline model to complete the re-identification task.

In the testing process, a target domain sample is input into the trained strong baseline model and passed through the DN modules unique to each source domain, yielding k output features. The mean of the k output features is taken as the final feature for similarity measurement, and feature matching is performed to complete the re-identification task.

In each DN module, the average mean and the average variance are used in place of the mean and the variance when testing the strong baseline model, calculated as follows:

[formula image missing]
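A minimal sketch of this test-time behaviour follows, simplified to a single normalization layer. It assumes that the stored average statistics stand in for the batch (BN) statistics, that the IN statistics are still computed from the test sample itself, and that DN interpolates the two linearly; none of these details are confirmed by the text, and the function and parameter names are hypothetical.

```python
import numpy as np

def dn_inference(x, a1, a2, gamma, beta, run_mu, run_var, eps=1e-5):
    """Test-time DN (assumed interpolation form) with stored average statistics."""
    mu_in = x.mean(axis=(2, 3), keepdims=True)      # IN statistics from the sample
    var_in = x.var(axis=(2, 3), keepdims=True)
    mu = a1 * mu_in + (1.0 - a1) * run_mu.reshape(1, -1, 1, 1)
    var = a2 * var_in + (1.0 - a2) * run_var.reshape(1, -1, 1, 1)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def averaged_feature(x, domain_params):
    """Run the k source-domain DN modules and average their k outputs."""
    outs = [dn_inference(x, **p) for p in domain_params]  # k arrays of shape (N, C, H, W)
    return np.mean(outs, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))
params = [  # one parameter set per source domain (k = 2 here)
    dict(a1=0.5, a2=0.5, gamma=1.0, beta=0.0, run_mu=np.zeros(3), run_var=np.ones(3)),
    dict(a1=0.5, a2=0.5, gamma=1.0, beta=0.0, run_mu=np.ones(3), run_var=np.ones(3)),
]
feat = averaged_feature(x, params)
```

In the described model the averaging is applied to the final output features of the k domain-specific branches; the single-layer version here only illustrates the mechanics.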

Embodiment two:

This embodiment provides a domain-generalization pedestrian re-identification method based on implicit distribution alignment. On the basis of the strong baseline model of Embodiment one, a second deep network structure is added that is parallel to the deep network structure and has the same structure, except that the domain normalization module DN in the second deep network structure is replaced by an inverse normalization module AN. A storage module is also added; the means and the variances of the source domains stored in the storage module are each averaged to obtain the mean and variance of the implicit distribution, thereby constructing the implicit distribution space. Specifically:

Because the BN layer in the strong baseline model still extracts the distribution information of the source domain, combining the distribution information of the IN layer and the BN layer only by linear interpolation cannot completely eliminate the influence of domain differences. Therefore, on the basis of the strong baseline model, an implicit distribution space is constructed from the distribution information of all source domains, and the distributions of all source domain data are projected into this space. To address the fact that the target domain data are unknown, an inverse normalization (AN) module is designed: each source domain is regarded as a "simulated target domain" for the other source domains, and, after style information and identity-discriminative information have been removed by the IN operation, the target domain uses the distribution information of the source domains as the inverse normalization parameters, so that the target domain is constrained to project into the implicit distribution space.

As shown in FIG. 2, the domain-generalization pedestrian re-identification method based on implicit distribution alignment has two main features.

First, a storage (memory) module M is added to store the mean and variance of each source domain. Its size is R^(2×k×C), where k is the number of source domains and C is the number of channels of the last-layer feature map. During training, for one batch of data from the i-th source domain, M is updated as follows:

[formula image missing]

where δ is a hyper-parameter, set to 0.2 in this embodiment. Averaging the storage module over its source-domain dimension (dimension 1) gives the mean μ ∈ R^(1×C) and variance σ² ∈ R^(1×C) of the implicit distribution space formed by the k source domains.
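The storage module can be sketched as a small class, under stated assumptions. The update rule is not reproduced in this text, so the sketch assumes a moving-average blend in which δ weights the new batch statistics; the initial state (zero mean, unit variance) and the class and method names are also hypothetical. The shape (2, k, C), the value δ = 0.2 and the averaging over the source-domain dimension come from the description.

```python
import numpy as np

class DistributionMemory:
    """Hypothetical storage module M of shape (2, k, C): M[0] means, M[1] variances."""

    def __init__(self, k, c, delta=0.2):
        self.delta = delta
        self.m = np.zeros((2, k, c))
        self.m[1] = 1.0  # assumed initial state: zero mean, unit variance

    def update(self, i, x):
        """Blend the batch statistics of source domain i into its slot (assumed form)."""
        mu = x.mean(axis=(0, 2, 3))    # per-channel mean of the batch, shape (C,)
        var = x.var(axis=(0, 2, 3))    # per-channel variance of the batch
        self.m[0, i] = (1.0 - self.delta) * self.m[0, i] + self.delta * mu
        self.m[1, i] = (1.0 - self.delta) * self.m[1, i] + self.delta * var

    def implicit(self):
        """Average over the k source domains: mean and variance of shape (1, C)."""
        mu, var = self.m.mean(axis=1)
        return mu[None, :], var[None, :]

mem = DistributionMemory(k=2, c=3)
mem.update(0, np.full((4, 3, 2, 2), 5.0))   # constant batch: mean 5, variance 0
mu, var = mem.implicit()
```

Because only the slot of the sampled source domain is updated at each iteration, the implicit mean and variance drift slowly toward the average of the per-domain statistics rather than following any single batch.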

Second, network structures stage5 and stage6 are added. Their structure is the same as that of stage3 and stage4, except that the domain normalization module DN is replaced by the inverse normalization module AN, which is used to project target domain samples into the implicit distribution space. The inverse normalization module AN is calculated as follows:

[formula image missing]

wherein m_j is a learnable parameter. For the input i-th source domain D_i, AN first removes its style information using the IN operation and regards D_i as the "simulated target domain" of the remaining source domains. The means and variances of the remaining source domains D_j, j ∈ {1, 2, 3, …, k}, j ≠ i, in stage3 and stage4 serve as the inverse normalization parameters, initialized from D_j, and stage5 and stage6 constrain the distribution of the inverse-normalized D_i to align with the implicit distribution constructed by the storage module. Because D_i does not use its own distribution information during inverse normalization, the method generalizes well to a real target domain.
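The idea can be sketched as follows. The AN formula is not reproduced in this text, so the combination below is an assumption: the sample is first instance-normalized, then re-scaled and re-shifted with the statistics of every other source domain D_j, and the results are mixed with the weights m_j. The function name and argument layout are hypothetical.

```python
import numpy as np

def inverse_norm(x, i, mus, variances, m, eps=1e-5):
    """ASSUMED form of AN for the 'simulated target domain' D_i."""
    # IN step: strip the per-sample, per-channel style statistics.
    mu_in = x.mean(axis=(2, 3), keepdims=True)
    var_in = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mu_in) / np.sqrt(var_in + eps)
    out = np.zeros_like(x)
    for j in range(len(mus)):
        if j == i:
            continue  # D_i never uses its own distribution information
        scale = np.sqrt(variances[j] + eps).reshape(1, -1, 1, 1)
        shift = mus[j].reshape(1, -1, 1, 1)
        out += m[j] * (scale * x_hat + shift)   # de-normalize with D_j's statistics
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))
mus = [np.full(3, 9.0), np.full(3, 2.0)]        # per-domain channel means
variances = [np.full(3, 1.0), np.full(3, 4.0)]  # per-domain channel variances
m = [0.0, 1.0]                                  # mixing weights m_j
y = inverse_norm(x, i=0, mus=mus, variances=variances, m=m)
```

With two source domains and i = 0, the output carries the statistics of domain 1 only (mean 2, standard deviation about 2 per channel), whatever the statistics of domain 0 are.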

During training, one source domain is still randomly selected from all source domains as input. The input i-th source domain D_i first passes through stage1 and stage2, which are shared by all source domains, and then through stage3 and stage4; the mean and variance of that source domain in the storage module are updated, and the classification loss L_cls and the triplet loss L_tri are calculated as in Embodiment one. Meanwhile, to constrain the distribution of the batch to project into the implicit distribution space, this embodiment uses a mean square error loss L_mse to keep the mean and variance of the batch consistent with the mean and variance of the implicit distribution space, calculated as follows:

[formula image missing]
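A minimal sketch of this alignment term, assuming it is the sum of the squared L₂ distances between the batch statistics and the implicit statistics, with equal weight on the mean and variance terms (the exact weighting and reduction are not given in this text; the function name is hypothetical):

```python
import numpy as np

def mse_alignment(batch_mu, batch_var, implicit_mu, implicit_var):
    """Squared L2 distance between batch and implicit-distribution statistics."""
    d_mu = np.sum((np.asarray(batch_mu) - np.asarray(implicit_mu)) ** 2)
    d_var = np.sum((np.asarray(batch_var) - np.asarray(implicit_var)) ** 2)
    return float(d_mu + d_var)

# Two channels: the mean differs by 1 in channel 0, the variance by 2 in channel 1.
loss = mse_alignment([1.0, 2.0], [1.0, 1.0], [0.0, 2.0], [1.0, 3.0])
```

The loss is zero exactly when the batch mean and variance coincide with those of the implicit distribution, which is the projection the description asks for.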

The shared network structure is then frozen, and the source domain data are input again and passed through stage5 and stage6 to obtain, for each sample in the batch, the mean and variance of its last-layer feature map normalized over height and width. The distribution of each sample is then constrained to align with the implicit distribution. Because some samples have unique properties, forcing every sample to match the implicit distribution exactly can prevent the network from converging, so a KL divergence loss L_kl is used as the constraint, calculated as follows:

[formula image missing]
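As a sketch under stated assumptions, the per-sample constraint can be written as the KL divergence between two diagonal Gaussians: the per-sample distribution N(μ_n, σ_n²) and the implicit distribution N(μ, σ²). The direction of the divergence, the sum over channels, the mean over the batch and the function names are assumptions; the text only states that a KL divergence loss is used and that the per-sample statistics have size N×C.

```python
import numpy as np

def sample_stats(x):
    """Per-sample mean and variance over height and width: shape (N, C) each."""
    return x.mean(axis=(2, 3)), x.var(axis=(2, 3))

def kl_to_implicit(mu_n, var_n, mu, var, eps=1e-5):
    """Batch mean of KL( N(mu_n, var_n) || N(mu, var) ) for diagonal Gaussians."""
    var_n = var_n + eps   # keep both variances strictly positive
    var = var + eps
    kl = 0.5 * (np.log(var / var_n) + (var_n + (mu_n - mu) ** 2) / var - 1.0)
    return float(kl.sum(axis=1).mean())   # sum over channels, mean over samples

# Identical distributions give zero; shifting the sample mean by 1 gives about 0.5.
zero = kl_to_implicit(np.zeros((2, 3)), np.ones((2, 3)), np.zeros((1, 3)), np.ones((1, 3)))
half = kl_to_implicit(np.ones((2, 1)), np.ones((2, 1)), np.zeros((1, 1)), np.ones((1, 1)))
```

Unlike a hard equality constraint, the divergence grows smoothly with the mismatch, which fits the stated motivation of not forcing atypical samples to match the implicit distribution exactly.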

Finally, the total loss function L_total during training is:

L_total = L_cls + L_tri + L_mse + L_kl    (13)

In the testing process, the target domain data passes through the stage3-stage4 branch and the stage5-stage6 branch separately, and for each branch the mean of the k output features is computed. The mean features of the two branches are then concatenated along the channel dimension to form the final feature for similarity measurement, and feature matching is performed to complete the re-identification task.
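The fusion step can be sketched as below. The averaging over the k outputs and the concatenation along the channel dimension follow the description; the use of cosine similarity for matching and the function names are assumptions, since the text does not name the similarity measure.

```python
import numpy as np

def final_feature(feats_dn, feats_an):
    """feats_dn, feats_an: (k, N, D) outputs of the DN branch and the AN branch."""
    # Average over the k source-domain outputs, then concatenate on the channel axis.
    return np.concatenate([feats_dn.mean(axis=0), feats_an.mean(axis=0)], axis=1)

def rank_gallery(query, gallery):
    """Gallery indices sorted by cosine similarity (the metric is an assumption)."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(q @ g.T), axis=1)   # most similar gallery item first

fused = final_feature(np.ones((3, 2, 4)), 2.0 * np.ones((3, 2, 4)))   # shape (2, 8)
order = rank_gallery(np.array([[1.0, 0.0]]),
                     np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]))
```

Concatenation doubles the feature dimension, so query and gallery features must both be produced with the same two-branch procedure before they are compared.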

Cross-domain training is performed on large datasets, and the effectiveness of the method is verified experimentally.

The cross-domain experiments use 4 large pedestrian re-identification datasets. The experimental settings are M+MS+CS→C3, M+CS+C3→MS and MS+CS+C3→M, where the left side of "→" denotes the source domains and the right side the target domain. Only the training sets of the source domains are used for training in the cross-domain experiments.

TABLE 1 comparison with other unsupervised domain generalized pedestrian re-identification methods

As shown in Table 1, bold numbers indicate the best result and underlined numbers the second best. Domain-generalization pedestrian re-identification methods based on distribution alignment achieve good performance. For example, the MetaBIN method combines the features of source domain data after the IN layer and the BN layer, and uses a meta-learning training strategy to optimize how the model is updated; the META method also treats source domains as "simulated target domains", computes the distance between the distribution of each target domain sample and the distributions of all source domains, uses this as the weight of the network output features, and aggregates the output features of all source domain networks. Unlike these methods, the present invention combines the distribution information of the IN layer and the BN layer so that the network adaptively balances the roles of the two layers, and constructs a strong baseline model, listed as Strong Baseline in the table. In the M+MS+CS→C3 cross-domain experiment, the mAP and Rank-1 of the strong baseline model exceed all existing state-of-the-art methods, demonstrating the effectiveness of the invention. In addition, the invention designs an inverse normalization module and constructs a domain-generalization pedestrian re-identification method based on implicit distribution alignment, which aligns the distributions of the source domains and the target domain to an implicit distribution and achieves better re-identification performance in that distribution space. Finally, the method achieves the best mAP in all cross-domain experimental settings, and its average performance improves on the META method by 0.6% in mAP and 1.0% in Rank-1.

In summary, the invention has the following characteristics and advantages: (1) A domain normalization module is established by combining the means and variances of the IN layer and the BN layer. The module balances the roles of the IN layer and the BN layer, mitigating the influence of domain differences while extracting high-level semantic information that discriminates pedestrian identity. The module is added to the network structure to construct a strong baseline model suitable for domain-generalization pedestrian re-identification. (2) On the basis of the strong baseline model, a domain-generalization pedestrian re-identification method based on implicit distribution alignment is provided. An implicit distribution space is constructed from the distribution information of all source domains, and the distributions of all source domain data are projected into this space. An inverse normalization module is also designed so that the target domain data use the distribution information of the source domains as the inverse normalization parameters, constraining the target domain to project into the implicit distribution space. By projecting both the source domains and the target domain into the implicit distribution space, their distributions are indirectly aligned and the domain difference between them is reduced. (3) Domain generalization experiments on 4 large pedestrian re-identification datasets show that the proposed method reaches an advanced level in average performance.

Claims

1. A pedestrian re-identification method suitable for domain generalization, characterized by comprising the following steps:

S1, establishing a strong baseline model;

S2, training the strong baseline model using at least two source domains to obtain a trained strong baseline model;

S3, inputting the target domain image to be identified into the trained strong baseline model to complete the re-identification task;

specifically, in step S1, the strong baseline model uses a deep learning network model as the backbone network, and the normalization modules in the deep network structure of the deep learning network model are replaced with the domain normalization module DN to form the strong baseline model;

the calculation formula of the domain normalization module DN is as follows:

[formula image missing]

2. The pedestrian re-identification method suitable for domain generalization as claimed in claim 1, wherein, on the basis of the strong baseline model, a second deep network structure is added that is parallel to the deep network structure and has the same structure, and the domain normalization module DN therein is replaced by an inverse normalization module AN; a storage module is also added, and the means and the variances of the source domains stored in the storage module are each averaged to obtain the mean and variance of the implicit distribution, thereby constructing an implicit distribution space;

the calculation formula of the inverse normalization module AN is as follows:

[formula image missing]

wherein μ and σ² denote the mean and variance of the implicit distribution, respectively, and ‖·‖² denotes the square of the L₂ norm;

the average mean and average variance are calculated as follows:

[formula image missing]

3. The pedestrian re-identification method suitable for domain generalization as claimed in claim 2, wherein a KL divergence loss L_kl is used to constrain each sample of the "simulated target domain", calculated as follows:

[formula image missing]

wherein the per-sample mean and the per-sample variance both have size R^(N×C).