CN116343294A - Pedestrian re-identification method suitable for generalization of field - Google Patents

Pedestrian re-identification method suitable for generalization of field

Info

Publication number
CN116343294A
Authority
CN
China
Prior art keywords
domain
source
distribution
layer
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310191111.4A
Other languages
Chinese (zh)
Inventor
周雪
丁金
邹见效
徐红兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202310191111.4A priority Critical patent/CN116343294A/en
Publication of CN116343294A publication Critical patent/CN116343294A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G06V40/173 Classification, e.g. identification face re-identification, e.g. recognising unknown faces across different face tracks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method suitable for domain generalization, belonging to the technical field of computer vision and machine learning. The invention constructs a domain normalization module (DN) by combining the distribution information of the IN layer and the BN layer, and replaces the BN normalization modules in the deep network structure of a deep learning network model with DN modules to build a strong baseline model. In addition, an implicit distribution space is constructed from the distribution information of all source domains, and distribution alignment is achieved by projecting the distributions of the source domains and the target domain into this space. Meanwhile, an inverse normalization module (AN) is designed: each source domain is treated as a "simulated target domain" for the other source domains and, after the IN operation has removed its style information and identity-discriminative information, the distribution information of the source domains is used as the inverse normalization parameters. This constrains the target domain to project into the implicit distribution space and further improves the recognition accuracy of the model in the target domain.

Description

Pedestrian re-identification method suitable for generalization of field
Technical Field
The invention belongs to the technical fields of computer vision and machine learning, and particularly relates to an unsupervised domain-generalized pedestrian re-identification method based on deep learning.
Background
Pedestrian re-identification is an important and challenging research problem in computer vision. By associating a specific pedestrian captured by cameras at different times and physical locations, images of that pedestrian from different viewpoints can be retrieved quickly and accurately, laying a foundation for subsequent high-level applications such as pedestrian tracking, pedestrian attribute recognition and pedestrian behavior analysis. At present, pedestrian re-identification technology shows great development prospects and economic benefits in fields such as intelligent video surveillance, smart urban traffic, smart photo albums and smart retail.
Deep-learning-based pedestrian re-identification can be roughly divided into supervised and unsupervised methods, according to whether pedestrian identity label information is used. Supervised methods that learn from identity labels have already reached very high performance. However, because acquiring identity labels is time-consuming and laborious and raises personal privacy concerns, unsupervised pedestrian re-identification methods have received much attention. These methods typically use a clustering algorithm or a similarity measure to obtain pseudo-labels for the input images, and then train the model in a supervised manner.
Although unsupervised pedestrian re-identification methods have attracted significant attention, most of them assume that the training and test data come from the same dataset (domain), i.e., that both have similar distributions. In real scenarios there are large differences (domain differences) between pedestrian re-identification datasets, such as the captured scenes, the illumination at the capture locations, the camera resolution and the clothing styles of pedestrians. As a result, these unsupervised methods can only be applied to test sets whose distribution is similar to that of the training set; they do not generalize to other datasets, where recognition performance drops sharply.
To solve the domain difference problem, many researchers in recent years have studied how to train a model with better generalization using only labeled source domain data, so that it achieves good recognition in an unlabeled target domain, i.e., unsupervised cross-domain pedestrian re-identification. According to whether target domain data can be used for training, unsupervised cross-domain pedestrian re-identification methods fall mainly into two types: domain adaptation methods and domain generalization methods. The goal of domain adaptation is to transfer the knowledge learned from the source domain to a target domain that has a different domain style but whose data are known. Because the target domain data can be used for training, these methods focus on exploring the relationship between source domain images and target domain images, or on using the target domain data directly for unsupervised learning. Although domain adaptation methods alleviate the domain difference problem to some extent, they rely heavily on training data from the target domain and cannot be tested directly on the target domain. Domain generalization addresses this problem: it aims to learn a generalizable pedestrian representation from a limited number of source domains, so that good recognition performance can be achieved in target domains with different domain styles and unknown data. The biggest difference from domain adaptation is that target domain data are not available for training.
At present, unsupervised domain-generalized pedestrian re-identification methods fall mainly into four types: methods based on representation learning, distribution alignment, meta-learning, and data augmentation. (1) Methods based on representation learning aim to learn domain-invariant pedestrian features, thereby improving the robustness of the model. For example, Jin et al. propose a style normalization and restitution model that aims to filter out interference factors unrelated to pedestrian identity while disentangling the discriminative identity features. (2) Methods based on distribution alignment assume that the data distributions of both the source domain and the target domain follow a multivariate Gaussian distribution, and aim to fit the data distribution of the target domain using the distribution information, i.e., the mean and variance, of each source domain dataset. For example, Xu et al. propose an adaptive-aggregation-based simulation embedding method that considers the correlation between unknown target samples and the source domain datasets, and design an aggregation module that adaptively integrates the distribution information of multiple source domains to simulate the distribution of the target domain. (3) Methods based on meta-learning focus on formulating an effective learning strategy to improve the robustness of the algorithm. These methods simulate the domain differences between source and target domains during training, so that the model learns to eliminate them. (4) Methods based on data augmentation typically perform augmentation at the feature level, aiming to generate more diverse features and thereby enhance the generalization of the model to the target domain. For example, Ang et al. propose a domain representation augmentation method that improves the coverage of the domain manifold by implicitly projecting feature points along the direction of the source domain distribution, thereby obtaining augmented features.
In recent years, influenced by style transfer networks, distribution alignment methods that add an instance normalization (Instance Normalization, IN) layer to the network structure have gained favor with more and more researchers. These methods normalize each sample with the IN layer to remove the style information of the source domain, and use the batch normalization (Batch Normalization, BN) layers in the network to normalize the whole source domain dataset to a multivariate Gaussian distribution with mean 0 and variance 1, so as to extract discriminative pedestrian identity information in the source domain that is unaffected by domain differences. However, although the IN layer can remove some of the style information that causes domain differences, it inevitably also discards discriminative identity information in the pedestrian image. And although the BN layer can capture the distribution information of the source domain and achieve good recognition performance there, the target domain data are not available for training, so the model is directly affected by information specific to the source domain when it is tested on a target domain with a completely different distribution.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a pedestrian re-identification method suitable for domain generalization. A domain normalization module (Domain Normalization, DN) is constructed by combining the distribution information of the IN layer and the BN layer, and the BN normalization modules in the deep network structure of a deep learning network model are replaced with DN modules to build a strong baseline model. The domain normalization module balances the roles of the IN layer and the BN layer, mitigating the influence of domain differences while still extracting high-level semantic information that is discriminative of pedestrian identity. Because the BN layer still captures the distribution information of the source domain, combining the two kinds of distribution information by linear interpolation alone cannot completely eliminate the influence of domain differences. Therefore, on the basis of the strong baseline model, the invention constructs an implicit distribution space from the distribution information of all source domains and achieves distribution alignment by projecting the distributions of the source domains and the target domain into this space. Meanwhile, an inverse normalization module (AN) is designed: each source domain is treated as a "simulated target domain" for the other source domains and, after the IN operation has removed its style information and identity-discriminative information, the distribution information of the source domains is used as the inverse normalization parameters, constraining the target domain to project into the implicit distribution space and further improving the recognition accuracy of the model in the target domain.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A pedestrian re-identification method suitable for domain generalization, characterized by comprising the following steps:
S1, establishing a strong baseline model.
S2, training the strong baseline model by adopting at least two source domains to obtain a trained strong baseline model.
S3, inputting the target domain image to be identified into the trained strong baseline model, and completing the re-identification task.
Specifically, in step S1, the strong baseline model uses a deep learning network model as its backbone network, and the normalization modules in the deep network structure of that model are replaced with domain normalization modules DN to form the strong baseline model.
The domain normalization module DN is computed as follows:
[Formula image BDA0004105493410000051]
where the left-hand side (formula image BDA0004105493410000052) denotes the feature map obtained after the input feature map $x$ passes through the $i$-th DN; $a_1$ and $a_2$ are two learnable parameters that adaptively balance the roles of the IN layer and the BN layer; $\gamma$ and $\beta$ are a learnable scaling parameter and a learnable shift parameter, respectively; $\epsilon$ is a constant that prevents the denominator from being 0. The mean and variance in the IN layer of the $i$-th source domain (formula image BDA0004105493410000053) and the mean and variance in the BN layer of the $i$-th source domain (formula image BDA0004105493410000054) are computed as follows:
[Formula image BDA0004105493410000055]
[Formula image BDA0004105493410000056]
where $n \in \{1,2,3,\dots,N\}$, $h \in \{1,2,3,\dots,H\}$, $w \in \{1,2,3,\dots,W\}$, and $N$, $C$, $H$ and $W$ denote the number of samples in the batch, the number of channels, the height and the width of the input feature map $x \in \mathbb{R}^{N \times C \times H \times W}$, respectively.
Preferably, on the basis of the strong baseline model, a second deep network structure is added in parallel with the deep network structure and with the same structure, in which the domain normalization modules DN are replaced by inverse normalization modules AN. A storage module is also added; the means and the variances of the source domains stored in it are each averaged to obtain the mean and variance of the implicit distribution, thereby constructing the implicit distribution space.
The inverse normalization module AN is computed as follows:
[Formula image BDA0004105493410000061]
where the left-hand side (formula image BDA0004105493410000062) denotes the feature map obtained after the input feature map $x$ passes through the $i$-th AN. For the input $i$-th source domain $D_i$, $D_i$ is treated as the "simulated target domain" of the remaining source domains; $m_j$ is the learnable parameter of the $j$-th source domain $D_j$, with $j \in \{1,2,3,\dots,k\}$, $j \neq i$, where $k$ is the number of source domains. The moving-average mean (formula image BDA0004105493410000063) and moving-average variance (formula image BDA0004105493410000064) of the remaining source domains in the deep network structure serve as the inverse normalization parameters, and the second deep network structure constrains the distribution of $D_i$ after inverse normalization to align with the implicit distribution space constructed by the storage module. The alignment is performed as follows:
[Formula image BDA0004105493410000065]
where $\mu$ and $\sigma^2$ denote the mean and variance of the implicit distribution, respectively, and $\|\cdot\|^2$ denotes the square of the $L_2$ norm.
The moving-average mean (formula image BDA0004105493410000066) and moving-average variance (formula image BDA0004105493410000067) are computed as follows:
[Formula image BDA0004105493410000068]
[Formula image BDA0004105493410000069]
where $T$ denotes the iteration number and $\alpha$ is the exponential moving average hyperparameter.
Preferably, a KL divergence loss $L_{kl}$ is used to constrain each sample of the "simulated target domain", computed as follows:
[Formula image BDA0004105493410000071]
where the first quantity (formula image BDA0004105493410000072) is the mean, computed over height and width, of the last-layer feature map of the $n$-th sample in the batch, and the second quantity (formula image BDA0004105493410000073) is the corresponding variance, computed over height and width, of the last-layer feature map of the $n$-th sample in the batch; both quantities (formula images BDA0004105493410000074 and BDA0004105493410000075) have size $\mathbb{R}^{N \times C}$.
The beneficial effects of the invention are as follows:
1. In the strong baseline model established by the invention, the domain normalization module DN is built by combining the distribution information of the IN layer and the BN layer, which balances the roles of the two layers, mitigates the influence of domain differences, and at the same time extracts discriminative pedestrian identity information.
2. On the basis of the strong baseline model, the invention further introduces an implicit distribution space. The mean and variance of the implicit distribution are computed from the distribution information of the source domains, and the distributions of all source domain data are projected into this space. Meanwhile, the inverse normalization module AN uses the distribution information of the source domains as inverse normalization parameters to constrain the target domain to project into the implicit distribution space. The distributions of the source domains and the target domain are thereby indirectly aligned, which reduces domain differences and further improves the accuracy of domain-generalized pedestrian re-identification.
Drawings
FIG. 1 is a schematic diagram of a strong baseline model suitable for a domain generalized pedestrian re-recognition task.
Fig. 2 is a schematic diagram of a domain generalized pedestrian re-recognition method based on implicit distribution alignment.
Detailed Description
The technical scheme of the invention is described in detail below with reference to the attached drawings and specific embodiments.
Embodiment one:
This embodiment provides a domain-generalized pedestrian re-identification method based on a strong baseline model, comprising the following steps:
S1, establishing a strong baseline model.
In this embodiment, a ResNet50 network with BN layers is used as the backbone network and is divided into 4 stages according to its bottleneck structure. Stage1 and stage2 lie in the shallow part of the network and mostly extract low-level semantic information of the pedestrian image, so their structure is left unchanged. For the deep network structure, stage3 and stage4, all batch normalization layers BN are replaced by domain normalization modules DN to form the strong baseline model, as shown in FIG. 1.
For the input feature map $x \in \mathbb{R}^{N \times C \times H \times W}$, where $N$, $C$, $H$ and $W$ denote the number of samples in the batch, the number of channels, the height and the width of the input feature map, respectively, the domain normalization module DN is computed as follows:
[Formula (1): formula image BDA0004105493410000081]
where $a_1$ and $a_2$ are two learnable parameters that adaptively balance the roles of the IN layer and the BN layer; the $a_1$ and $a_2$ of each DN module are unique to it and are not shared with other DN modules. $\gamma$ and $\beta$ are a learnable scaling parameter and a learnable shift parameter, respectively; $\epsilon$ is a constant that prevents the denominator from being 0. The means and variances in the IN layer and the BN layer of the $i$-th source domain (formula images BDA0004105493410000082 and BDA0004105493410000083) are computed as follows:
[Formula (2): formula image BDA0004105493410000084]
[Formula (3): formula image BDA0004105493410000085]
In the strong baseline model, each BN position in the deep network structure (stage3 and stage4) is replaced by a separate DN module for each source domain, while all other structures are shared. This greatly reduces the number of network parameters and lets the model extract the discriminative identity information common to pedestrians in all source domains, while the use of different DN modules enhances the generalization of the network to some extent.
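The DN formula itself survives here only as an image reference, so the following NumPy sketch shows just one plausible reading of it. It assumes that `a1` linearly interpolates the IN and BN means and `a2` the IN and BN variances; that interpolation form, the function names and the `eps` default are assumptions for illustration, not the patent's confirmed definition.

```python
import numpy as np

def in_stats(x):
    """Instance-norm statistics: per sample and channel, over H and W."""
    mu = x.mean(axis=(2, 3), keepdims=True)     # shape (N, C, 1, 1)
    var = x.var(axis=(2, 3), keepdims=True)
    return mu, var

def bn_stats(x):
    """Batch-norm statistics: per channel, over N, H and W."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return mu, var

def domain_norm(x, a1, a2, gamma, beta, eps=1e-5):
    """ASSUMED form: a1 mixes the IN/BN means, a2 mixes the IN/BN variances."""
    mu_in, var_in = in_stats(x)
    mu_bn, var_bn = bn_stats(x)
    mu = a1 * mu_in + (1.0 - a1) * mu_bn
    var = a2 * var_in + (1.0 - a2) * var_bn
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 6, 5))  # (N, C, H, W)
gamma = np.ones((1, 8, 1, 1))
beta = np.zeros((1, 8, 1, 1))

y_in = domain_norm(x, 1.0, 1.0, gamma, beta)   # a1 = a2 = 1 reduces to IN
y_bn = domain_norm(x, 0.0, 0.0, gamma, beta)   # a1 = a2 = 0 reduces to BN
y_mix = domain_norm(x, 0.5, 0.5, gamma, beta)  # an even blend of the two
```

Under this reading, the two end points recover plain IN and plain BN, which is one way to see how learnable `a1` and `a2` could balance the two layers.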
S2, training the strong baseline model by adopting at least two source domains to obtain a trained strong baseline model.
During training, only one source domain is randomly selected as input at each iteration; this prevents the model from being biased toward the source domain that is easiest to fit, as can happen when several source domains update the model together. For the input $i$-th source domain $D_i$, $i \in \{1,2,3,\dots,k\}$, where $k$ is the number of source domains, the data pass through the shared shallow structure of the network and the deep network structure containing that domain's own DN modules, yielding the output feature $f$ and the prediction probability $p$ produced by that source domain's classification layer. The training loss consists of a classification loss $L_{cls}$ and a triplet loss $L_{tri}$, computed as follows:
[Formula (4): formula image BDA0004105493410000091]
where $n \in \{1,2,3,\dots,N\}$, $id \in \{1,2,3,\dots,ID\}$, $ID$ is the number of pedestrian identities in the $i$-th source domain, $N$ is the number of samples of the $i$-th source domain, $y$ is the ground-truth pedestrian identity label, $p_j$ is the probability predicted by the model that the sample belongs to pedestrian identity label $j$, $\varepsilon$ is a constant used for label smoothing, and $q_j$ is the "soft label" after label smoothing.
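As an illustration of a label-smoothed classification loss of the kind described above, here is a minimal NumPy sketch. The exact definition of the soft label $q_j$ is not recoverable from the text, so the common convention used below (`eps / ID` on every identity plus `1 - eps` on the true one) is an assumption, as are the function name and the default `eps`.

```python
import numpy as np

def label_smoothing_ce(logits, labels, eps=0.1):
    """Cross-entropy against smoothed targets q, averaged over the batch."""
    n, num_ids = logits.shape
    # Numerically stable log-softmax gives log p_j for every identity j.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # ASSUMED soft label: eps / ID everywhere, plus (1 - eps) on the true identity.
    q = np.full((n, num_ids), eps / num_ids)
    q[np.arange(n), labels] += 1.0 - eps
    return float(-(q * log_p).sum(axis=1).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 5))           # 6 samples, 5 identities
labels = np.array([0, 1, 2, 3, 4, 0])
loss_smooth = label_smoothing_ce(logits, labels, eps=0.1)
loss_plain = label_smoothing_ce(logits, labels, eps=0.0)  # ordinary cross-entropy
loss_uniform = label_smoothing_ce(np.zeros((3, 5)), np.array([0, 2, 4]))
```

With `eps=0` the function reduces to ordinary cross-entropy, and for uniform predictions the loss equals the log of the number of identities regardless of `eps`, because each soft label sums to 1.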
[Formula (5): formula image BDA0004105493410000092]
where $\|\cdot\|$ denotes the $L_2$ norm, the two feature symbols (formula images BDA0004105493410000093 and BDA0004105493410000094) denote the positive sample and the negative sample of the input sample feature $f_n$ within one batch, respectively, and margin is a margin constant, set to 0.35 in this embodiment.
For the input source domain, the strong baseline model is trained with formula (4) and formula (5) to obtain the trained strong baseline model.
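A triplet loss with the stated margin of 0.35 can be sketched as follows. The text does not say how the positive and negative samples are picked within a batch, so the batch-hard mining used here (farthest positive, closest negative) is an assumption, and the function name is hypothetical.

```python
import numpy as np

def batch_hard_triplet(features, labels, margin=0.35):
    """Triplet loss using the hardest positive and negative within the batch."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))        # pairwise L2 distances, (N, N)
    same = labels[:, None] == labels[None, :]
    # ASSUMED mining: farthest sample that shares the anchor's identity ...
    d_pos = np.where(same, dist, -np.inf).max(axis=1)
    # ... and closest sample with a different identity.
    d_neg = np.where(~same, dist, np.inf).min(axis=1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

labels = np.array([0, 0, 1, 1])
# Identities far apart: every triplet already satisfies the margin.
separated = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
loss_sep = batch_hard_triplet(separated, labels)
# Identities interleaved on a line: every anchor violates the margin.
mixed = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
loss_mix = batch_hard_triplet(mixed, labels)
```

In the second example each anchor has a hardest positive at distance 2 and a hardest negative at distance 1, so every term contributes 2 - 1 + 0.35.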
S3, inputting the target domain image into the trained strong baseline model to finish the re-identification task.
During testing, a target domain sample is input into the trained strong baseline model, and $k$ output features are obtained through the DN modules specific to each source domain. Their average is taken as the final feature used for similarity measurement and feature matching, completing the re-identification task.
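The test-time procedure above can be sketched as follows: average the $k$ per-branch features, then rank gallery images by similarity to the query. The similarity measure is not specified in the text, so cosine similarity is an assumption here, and both function names are hypothetical.

```python
import numpy as np

def average_branch_features(branch_features):
    """Average the k features produced by the k source-domain DN branches."""
    return np.mean(np.stack(branch_features, axis=0), axis=0)

def rank_gallery(query, gallery):
    """Rank gallery entries by cosine similarity to each query (ASSUMED metric)."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sim = q @ g.T
    return np.argsort(-sim, axis=1)   # most similar gallery index first

# One query image seen through k = 3 branches (toy 2-D features).
branches = [np.array([[1.0, 0.0]]),
            np.array([[0.8, 0.2]]),
            np.array([[0.9, 0.1]])]
query_feature = average_branch_features(branches)
gallery = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
ranking = rank_gallery(query_feature, gallery)
```

Here `ranking[0]` lists gallery indices from best to worst match for the single query.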
In each DN module, the moving-average mean (formula image BDA0004105493410000101) and moving-average variance (formula image BDA0004105493410000102) replace the mean (formula image BDA0004105493410000103) and variance (formula image BDA0004105493410000104) when the strong baseline model is tested. They are computed as follows:
[Formula (6): formula image BDA0004105493410000105]
[Formula (7): formula image BDA0004105493410000106]
where $T$ denotes the iteration number and $\alpha$ is the exponential moving average hyperparameter.
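An exponential moving average of the per-batch statistics can be sketched as below. Which of the two terms $\alpha$ weights is not recoverable from the text, so the convention here (`alpha` on the old estimate, `1 - alpha` on the new batch value) is an assumption, as are the function name and the value 0.9.

```python
import numpy as np

def ema_update(running, batch_value, alpha=0.9):
    """ASSUMED convention: keep alpha of the old estimate, add (1 - alpha) of the new."""
    return alpha * running + (1.0 - alpha) * batch_value

num_channels = 4
running_mean = np.zeros(num_channels)
batch_mean = np.full(num_channels, 2.0)   # pretend every batch has mean 2.0

history = []
for _ in range(3):                        # iterations T = 1, 2, 3
    running_mean = ema_update(running_mean, batch_mean, alpha=0.9)
    history.append(float(running_mean[0]))
```

With a constant batch statistic the running estimate approaches that value geometrically, which is why the stored statistics can stand in for batch statistics at test time.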
Embodiment two:
This embodiment provides a domain-generalized pedestrian re-identification method based on implicit distribution alignment. On the basis of the strong baseline model of Embodiment One, a second deep network structure is added in parallel with the deep network structure and with the same structure, in which the domain normalization modules DN are replaced by inverse normalization modules AN. A storage module is also added; the means and variances of the source domains stored in it are averaged to obtain the mean and variance of the implicit distribution, thereby constructing the implicit distribution space. Specifically:
Since the BN layer in the strong baseline model still captures the distribution information of the source domain, combining the distribution information of the IN layer and the BN layer by linear interpolation alone cannot completely eliminate the influence of domain differences. Therefore, on the basis of the strong baseline model, an implicit distribution space is constructed from the distribution information of all source domains, and the distributions of all source domain data are projected into this space. To deal with the fact that the target domain data are unknown, an inverse normalization module (AN) is designed: each source domain is treated as a "simulated target domain" for the other source domains and, after the IN operation has removed its style information and identity-discriminative information, the distribution information of the source domains is used as the inverse normalization parameters, constraining the target domain to project into the implicit distribution space.
As shown in FIG. 2, the domain-generalized pedestrian re-identification method based on implicit distribution alignment has two main features.
First, a storage module (Memory, M) of size $\mathbb{R}^{2 \times k \times C}$ is added to store the mean and variance of each source domain, where $k$ is the number of source domains and $C$ is the number of channels of the last-layer feature map. During training, for one batch of data from the $i$-th source domain, M is updated as follows:
[Formula (8): formula image BDA0004105493410000111]
[Formula (9): formula image BDA0004105493410000112]
where $\delta$ is a hyperparameter, set to 0.2 in this embodiment. Averaging the storage module over dimension 1 (the source-domain dimension of size $k$) gives the mean $\mu \in \mathbb{R}^{1 \times C}$ and variance $\sigma^2 \in \mathbb{R}^{1 \times C}$ of the implicit distribution space formed by the $k$ source domains.
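The storage module can be pictured as a small `2 x k x C` array that is nudged toward each batch's statistics and then averaged over the source domains. In the sketch below, the update rule (move a fraction `delta` toward the batch statistics), the zero/one initialization and the class name are all assumptions; only the array shape, `delta = 0.2` and the averaging over the source-domain axis come from the text.

```python
import numpy as np

class DistributionMemory:
    """Toy storage module: bank[0] holds means, bank[1] holds variances."""

    def __init__(self, num_domains, num_channels, delta=0.2):
        self.bank = np.zeros((2, num_domains, num_channels))
        self.bank[1] = 1.0            # ASSUMED initial variance of 1
        self.delta = delta

    def update(self, domain_idx, batch_mean, batch_var):
        # ASSUMED rule: move the stored statistics a fraction delta toward the batch.
        d = self.delta
        self.bank[0, domain_idx] = (1 - d) * self.bank[0, domain_idx] + d * batch_mean
        self.bank[1, domain_idx] = (1 - d) * self.bank[1, domain_idx] + d * batch_var

    def implicit_distribution(self):
        # Average over the source-domain axis (axis 1 of the 2 x k x C bank).
        mu, var = self.bank.mean(axis=1, keepdims=True)
        return mu, var                # each of shape (1, C)

memory = DistributionMemory(num_domains=3, num_channels=4)
memory.update(0, batch_mean=np.full(4, 5.0), batch_var=np.full(4, 2.0))
implicit_mu, implicit_var = memory.implicit_distribution()
```

After this single update, domain 0 stores a mean of 1.0 and a variance of 1.2, and the implicit statistics are the averages of the three stored rows.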
Second, stage5 and stage6 are added to the network. Their structures are identical to those of stage3 and stage4, except that the domain normalization module DN is replaced by the inverse normalization module AN, which is used to project target domain samples into the implicit distribution space. The inverse normalization module AN is computed as follows:
[Formula (10): formula image BDA0004105493410000113]
where $m_j$ is a learnable parameter. For the input $i$-th source domain $D_i$, AN first removes its style information with the IN operation and treats $D_i$ as the "simulated target domain" of the remaining source domains. The moving-average mean (formula image BDA0004105493410000121) and variance (formula image BDA0004105493410000122) of each remaining source domain $D_j$, $j \in \{1,2,3,\dots,k\}$, $j \neq i$, in stage3 and stage4 are used as the inverse normalization parameters, initialized from $D_j$, and stage5 and stage6 constrain the distribution of $D_i$ after inverse normalization to align with the implicit distribution built by the storage module. Because the distribution information of $D_i$ itself is not used during inverse normalization, the model can generalize well to a real target domain.
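The AN formula is likewise only available as an image reference, so the sketch below is one plausible reading rather than the patent's definition. It assumes that AN instance-normalizes the input and then re-scales and re-shifts it with the stored statistics of every other source domain, combined with the learnable weights `m[j]`; the weighted-sum form, the function name and the toy values are all assumptions.

```python
import numpy as np

def inverse_norm(x, i, running_mean, running_var, m, eps=1e-5):
    """ASSUMED form: IN on the input, then de-normalize with the other domains' stats."""
    mu_in = x.mean(axis=(2, 3), keepdims=True)
    var_in = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mu_in) / np.sqrt(var_in + eps)    # IN removes the sample's own style
    out = np.zeros_like(x)
    for j in range(running_mean.shape[0]):
        if j == i:                                 # D_i's own statistics are never used
            continue
        mu_j = running_mean[j].reshape(1, -1, 1, 1)
        std_j = np.sqrt(running_var[j] + eps).reshape(1, -1, 1, 1)
        out += m[j] * (std_j * x_hat + mu_j)
    return out

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=0.5, size=(4, 2, 5, 5))   # batch from source domain i = 0
running_mean = np.array([[7.0, 7.0], [2.0, 2.0], [4.0, 4.0]])   # k = 3 domains, C = 2
running_var = np.full((3, 2), 9.0)
m = np.array([0.0, 0.5, 0.5])                     # learnable weights, fixed here
out = inverse_norm(x, 0, running_mean, running_var, m)

# Changing domain 0's stored statistics must not change the output.
altered = running_mean.copy()
altered[0] = 100.0
out_altered = inverse_norm(x, 0, altered, running_var, m)
```

Under this reading, the output of a batch from domain 0 carries the blended mean and scale of domains 1 and 2, which mirrors the idea of treating $D_i$ as a "simulated target domain".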
During training, one source domain is still randomly selected from all source domains as input. The input i-th source domain $D_i$ first passes through stage1 and stage2, which are shared by all source domains, and then through stage3 and stage4; the mean and variance of this source domain in the storage module are updated, and the classification loss $L_{cls}$ and the triplet loss $L_{tri}$ are calculated as in the first embodiment. Meanwhile, to constrain the distribution of the batch data to project into the implicit distribution space, this embodiment uses the mean square error loss $L_{mse}$ to constrain the mean of the batch data
Figure BDA0004105493410000123
and variance
Figure BDA0004105493410000124
to be consistent with the mean and variance of the implicit distribution space, calculated as follows:
Figure BDA0004105493410000125
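For orientation, one form of this loss that is consistent with the description and with the squared $L_2$ norm mentioned in claim 2 is written out below. It is a sketch rather than a transcription of the equation image: the symbols $\mu_B$ and $\sigma_B^2$ are placeholder names for the batch mean and variance, and any normalizing constant in the original formula is not recoverable from the text.

```latex
% Assumed form: squared L2 distance between the batch statistics
% (placeholder symbols \mu_B, \sigma_B^2) and the implicit distribution.
L_{mse} = \lVert \mu_B - \mu \rVert_2^2 + \lVert \sigma_B^2 - \sigma^2 \rVert_2^2
```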
The shared network structure is then frozen, and the source domain data is input again and passed through stage5 and stage6 to obtain, for each sample in the batch, the mean of the normalized last-layer feature map
Figure BDA0004105493410000126
and variance
Figure BDA0004105493410000127
The distribution of each sample is then constrained to align with the implicit distribution. Because some samples have unique properties, forcing every sample to match the implicit distribution exactly can prevent the network from converging, so the KL divergence loss $L_{kl}$ is used as the constraint, calculated as follows:
Figure BDA0004105493410000128
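The per-sample constraint can be sketched with the closed-form KL divergence between two diagonal Gaussians. The closed form itself is standard; the direction of the divergence (sample distribution relative to the implicit distribution), the averaging over samples and channels, and the function names are assumptions, since the patent's formula is only available as an image.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q, eps=1e-8):
    """Element-wise KL( N(mu_p, var_p) || N(mu_q, var_q) ) for diagonal Gaussians."""
    var_p = var_p + eps  # guard against zero variance
    var_q = var_q + eps
    return 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)

def kl_loss(sample_mu, sample_var, mu, sigma2):
    """Mean KL over N samples and C channels.

    sample_mu, sample_var: (N, C) per-sample statistics
    mu, sigma2:            (1, C) implicit distribution statistics
    """
    return kl_diag_gauss(sample_mu, sample_var, mu, sigma2).mean()
```

Unlike a hard equality constraint, this loss only penalizes samples in proportion to how far their statistics drift from the implicit distribution.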
Finally, the total loss function $L_{total}$ during training is:
$L_{total} = L_{cls} + L_{tri} + L_{mse} + L_{kl}$ (13)
During testing, target domain data passes through both the stage3/stage4 branch and the stage5/stage6 branch, and in each branch the k output features are averaged. The averaged features of the two branches are then concatenated along the channel dimension and used as the final feature for similarity measurement and feature matching, completing the re-identification task.
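The test-time fusion described above can be sketched as follows. The array layout (k outputs per branch stacked on the first axis) and the use of cosine similarity for matching are assumptions made for illustration; the patent only states that the fused feature is used for similarity measurement.

```python
import numpy as np

def fuse_features(dn_outputs, an_outputs):
    """Average the k outputs of each branch, then concatenate along channels.

    dn_outputs, an_outputs: (k, N, D) features from the two branches
    returns:                (N, 2 * D) fused features
    """
    dn_avg = dn_outputs.mean(axis=0)
    an_avg = an_outputs.mean(axis=0)
    return np.concatenate([dn_avg, an_avg], axis=1)

def cosine_similarity(query, gallery):
    """Pairwise cosine similarity between query (Q, D) and gallery (G, D) features."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T
```

Ranking the gallery by each row of the similarity matrix then gives the re-identification result for the corresponding query.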
Cross-domain training is performed on large datasets, and the effectiveness of the method is verified experimentally.
The cross-domain experiments use 4 large pedestrian re-identification datasets. The specific experimental settings include M+MS+CS→C3, M+CS+C3→MS and MS+CS+C3→M, where the left side of "→" denotes the source domains and the right side denotes the target domain. Only the training sets of the source domains are used as the training set of each cross-domain experiment.
Table 1. Comparison with other unsupervised domain generalization pedestrian re-identification methods
Figure BDA0004105493410000131
In Table 1, bold numbers indicate the best result and underlined numbers the second best. Domain generalization pedestrian re-identification methods based on distribution alignment achieve good performance. For example, MetaBIN combines the features of source domain data after the IN layer and the BN layer, and uses a meta-learning training strategy to optimize how the model is updated. META also treats a source domain as a "simulated target domain": it calculates the distance between the distribution of each target domain sample and the distributions of all source domains, uses these distances as weights on the network output features, and aggregates the output features of all source domain networks. Unlike these methods, the present invention combines the distribution information in the IN layer and the BN layer so that the network adaptively balances the roles of the two layers, and builds a strong baseline model, listed as Strong Baseline in the table. The mAP and Rank-1 of the strong baseline model exceed all existing state-of-the-art methods in the M+MS+CS→C3 cross-domain experiment, which demonstrates the effectiveness of the invention. In addition, the method designs an inverse normalization module and builds a domain generalization pedestrian re-identification method based on implicit distribution alignment, which aligns the distributions of the source domains and the target domain to an implicit distribution and achieves better re-identification performance in that distribution space. Finally, the method achieves the best mAP in all cross-domain experimental settings, and its average performance improves on META by 0.6% in mAP and 1.0% in Rank-1.
In summary, the invention has the following characteristics and advantages:
(1) A domain normalization module is built by combining the means and variances of the IN layer and the BN layer. The module balances the roles of the IN layer and the BN layer, which alleviates the influence of domain differences while extracting high-level semantic information that is discriminative for pedestrian identity. The module is added to the network structure to construct a strong baseline model suitable for domain generalization pedestrian re-identification.
(2) On the basis of the strong baseline model, a domain generalization pedestrian re-identification method based on implicit distribution alignment is provided. An implicit distribution space is constructed from the distribution information of all source domains, and the distributions of all source domain data are projected into this space. An inverse normalization module is also designed, so that target domain data uses the distribution information of the source domains as inverse normalization parameters, constraining the target domain to project into the implicit distribution space. By projecting both the source domains and the target domain into the implicit distribution space, their distributions are aligned indirectly and the domain difference between them is reduced.
(3) Domain generalization experiments on 4 large pedestrian re-identification datasets show that the method provided by the invention reaches an advanced level in average performance.

Claims (3)

1. A pedestrian re-identification method suitable for domain generalization, characterized by comprising the following steps:
S1, establishing a strong baseline model;
S2, training the strong baseline model with at least two source domains to obtain a trained strong baseline model;
S3, inputting the target domain image to be identified into the trained strong baseline model to complete the re-identification task;
specifically, in step S1, the strong baseline model uses a deep learning network model as the backbone network, and the normalization modules in the deep network structure of the deep learning network model are replaced with the domain normalization module DN to form the strong baseline model;
the calculation formula of the domain normalization module DN is as follows:
Figure FDA0004105493390000011
wherein
Figure FDA0004105493390000012
represents the feature map output by the i-th DN for the input feature map x; $a_1$ and $a_2$ are two learnable parameters that adaptively balance the roles of the IN layer and the BN layer; γ and β are the learnable scaling and translation parameters, respectively; ε is a constant that prevents the denominator from being 0;
Figure FDA0004105493390000013
represent the mean and variance in the IN layer of the i-th source domain, respectively, and
Figure FDA0004105493390000014
represent the mean and variance in the BN layer of the i-th source domain, respectively, calculated as follows:
Figure FDA0004105493390000015
Figure FDA0004105493390000016
where $n \in \{1,2,3,\dots,N\}$, $h \in \{1,2,3,\dots,H\}$, $w \in \{1,2,3,\dots,W\}$, and N, C, H and W respectively represent the batch size, number of channels, height and width of the input feature map $x \in R^{N \times C \times H \times W}$.
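The statistics named in this claim can be illustrated with a short sketch. The IN and BN statistics below follow their standard definitions (IN averages over height and width for each sample and channel; BN averages over batch, height and width for each channel). The way `domain_norm` blends them, as a convex combination weighted by `a1` and `a2`, is an assumption made for illustration, because the DN formula itself is only available as an equation image.

```python
import numpy as np

def in_stats(x):
    """IN statistics: per sample and per channel, shape (N, C, 1, 1)."""
    return (x.mean(axis=(2, 3), keepdims=True),
            x.var(axis=(2, 3), keepdims=True))

def bn_stats(x):
    """BN statistics: per channel over N, H and W, shape (1, C, 1, 1)."""
    return (x.mean(axis=(0, 2, 3), keepdims=True),
            x.var(axis=(0, 2, 3), keepdims=True))

def domain_norm(x, a1, a2, gamma, beta, eps=1e-5):
    """Hypothetical DN sketch: blend IN and BN statistics (assumed form)."""
    mu_in, var_in = in_stats(x)
    mu_bn, var_bn = bn_stats(x)
    mu = a1 * mu_in + (1.0 - a1) * mu_bn      # blended mean
    var = a2 * var_in + (1.0 - a2) * var_bn   # blended variance
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```

Under this assumed form, `a1 = a2 = 1` reduces to instance normalization and `a1 = a2 = 0` reduces to batch normalization, which is one way the two learnable parameters could balance the two layers.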
2. The pedestrian re-identification method suitable for domain generalization according to claim 1, characterized in that a second deep network structure, parallel to the deep network structure and identical to it in structure, is added on the basis of the strong baseline model, with the domain normalization module DN replaced by an inverse normalization module AN; meanwhile, a storage module is added, and the source domain means and variances stored in the storage module are respectively averaged to obtain the mean and variance of the implicit distribution, thereby constructing an implicit distribution space;
the calculation formula of the inverse normalization module AN is as follows:
Figure FDA0004105493390000021
wherein
Figure FDA0004105493390000022
represents the feature map output by the i-th AN for the input feature map x; the input i-th source domain $D_i$ is treated as the "simulated target domain" of the remaining source domains; $m_j$ is the learnable parameter of the j-th source domain $D_j$, $j \in \{1,2,3,\dots,k\}$, $j \neq i$, where k is the number of source domains; the average mean of the remaining source domains in the deep network structure
Figure FDA0004105493390000023
and average variance
Figure FDA0004105493390000024
are used as the inverse normalization parameters, and the second deep network structure constrains the distribution of the source domain $D_i$ after inverse normalization to align with the implicit distribution space constructed by the storage module; the specific alignment is as follows:
Figure FDA0004105493390000025
wherein μ and $\sigma^2$ are the mean and variance of the implicit distribution, respectively, and $\lVert \cdot \rVert_2^2$ denotes the squared $L_2$ norm;
the average mean
Figure FDA0004105493390000026
and average variance
Figure FDA0004105493390000027
are calculated as follows:
Figure FDA0004105493390000028
Figure FDA0004105493390000029
wherein T represents the iteration number, and α is the exponential moving average hyperparameter.
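An exponential moving average update of the stored statistics can be sketched as follows. The update formulas are only available as equation images, so the convention used here (α weights the previously stored value and 1 − α weights the current batch statistic) is an assumption, as are the function name and the numeric values.

```python
import numpy as np

def ema_update(stored, batch_stat, alpha):
    """One EMA step; alpha is the weight kept on the stored value (assumed convention)."""
    return alpha * stored + (1.0 - alpha) * batch_stat

# Two iterations of updating a stored per-channel mean (C = 4).
stored_mean = np.zeros(4)
for batch_mean in (np.ones(4), np.ones(4)):
    stored_mean = ema_update(stored_mean, batch_mean, alpha=0.9)
```

The same update would apply to the stored variance, with the batch variance in place of the batch mean.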
3. The pedestrian re-identification method suitable for domain generalization according to claim 2, characterized in that a KL divergence loss $L_{kl}$ is used to constrain each sample of the "simulated target domain", calculated as follows:
Figure FDA00041054933900000210
wherein
Figure FDA00041054933900000211
represents the mean of the last-layer feature map of the n-th sample in the batch after normalization over height and width,
Figure FDA00041054933900000212
represents the variance of the last-layer feature map of the n-th sample in the batch after normalization over height and width, and
Figure FDA0004105493390000031
and
Figure FDA0004105493390000032
are both of size N×C, i.e. elements of $R^{N \times C}$.
CN202310191111.4A 2023-03-02 2023-03-02 Pedestrian re-identification method suitable for generalization of field Pending CN116343294A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310191111.4A CN116343294A (en) 2023-03-02 2023-03-02 Pedestrian re-identification method suitable for generalization of field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310191111.4A CN116343294A (en) 2023-03-02 2023-03-02 Pedestrian re-identification method suitable for generalization of field

Publications (1)

Publication Number Publication Date
CN116343294A true CN116343294A (en) 2023-06-27

Family

ID=86890741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310191111.4A Pending CN116343294A (en) 2023-03-02 2023-03-02 Pedestrian re-identification method suitable for generalization of field

Country Status (1)

Country Link
CN (1) CN116343294A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173476A (en) * 2023-09-05 2023-12-05 北京交通大学 Single-source domain generalized pedestrian re-identification method
CN117173476B (en) * 2023-09-05 2024-05-24 北京交通大学 Single-source domain generalized pedestrian re-identification method

Similar Documents

Publication Publication Date Title
Fu et al. Self-similarity grouping: A simple unsupervised cross domain adaptation approach for person re-identification
Bai et al. Group-sensitive triplet embedding for vehicle reidentification
Deng et al. Variational prototype learning for deep face recognition
Wu et al. Pseudo-pair based self-similarity learning for unsupervised person re-identification
CN110163117B (en) Pedestrian re-identification method based on self-excitation discriminant feature learning
Liang et al. M2m-gan: Many-to-many generative adversarial transfer learning for person re-identification
CN112488229A (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN110097033A (en) A kind of single sample face recognition method expanded based on feature
CN114299542A (en) Video pedestrian re-identification method based on multi-scale feature fusion
CN114692741A (en) Generalized face counterfeiting detection method based on domain invariant features
Wang et al. Viewpoint adaptation learning with cross-view distance metric for robust vehicle re-identification
CN116343294A (en) Pedestrian re-identification method suitable for generalization of field
CN111126155B (en) Pedestrian re-identification method for generating countermeasure network based on semantic constraint
CN112766378A (en) Cross-domain small sample image classification model method focusing on fine-grained identification
An Pedestrian Re‐Recognition Algorithm Based on Optimization Deep Learning‐Sequence Memory Model
CN111428650A (en) Pedestrian re-identification method based on SP-PGGAN style migration
Chen et al. Part alignment network for vehicle re-identification
Zhang et al. Beyond triplet loss: Meta prototypical N-tuple loss for person re-identification
Wei et al. Reinforced domain adaptation with attention and adversarial learning for unsupervised person Re-ID
Ahmad et al. Deep convolutional neural network using triplet loss to distinguish the identical twins
CN110968735B (en) Unsupervised pedestrian re-identification method based on spherical similarity hierarchical clustering
CN113051962B (en) Pedestrian re-identification method based on twin Margin-Softmax network combined attention machine
Yu et al. Reference-oriented loss for person re-identification
Chen et al. Dual Attention Network for Unsupervised Domain Adaptive Person Re-identification
Dhamija et al. An approach to enhance performance of age invariant face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination