CN112069920A - Cross-domain pedestrian re-identification method based on attribute feature driven clustering - Google Patents

Cross-domain pedestrian re-identification method based on attribute feature driven clustering

Info

Publication number
CN112069920A
Authority
CN
China
Prior art keywords
pedestrian
attribute
domain
cross
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010828757.5A
Other languages
Chinese (zh)
Other versions
CN112069920B (en)
Inventor
种衍文
章郴
潘少明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010828757.5A priority Critical patent/CN112069920B/en
Publication of CN112069920A publication Critical patent/CN112069920A/en
Application granted granted Critical
Publication of CN112069920B publication Critical patent/CN112069920B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention provides a cross-domain pedestrian re-identification method based on attribute-feature-driven clustering. A network architecture is built with the PyTorch framework: an ID classification module based on global pedestrian features serves as the main module, assisted by an ID classification module based on local pedestrian features, a pedestrian attribute classification module and an iteration-based unsupervised domain adaptation module. Through multi-task joint learning, semantic information of different granularities is introduced and the attribute recognition process is refined, yielding an effective cross-domain pedestrian re-identification network. Compared with similar methods, the invention innovatively incorporates attribute information into the clustering features, so that the closed-set attribute classification task assists the open-set pedestrian re-identification task in cross-domain unsupervised domain adaptation. This produces more accurate clustering results and more reliable pseudo-label data during iterative training, greatly enhances the generalization ability of the model, and meets the requirements of deployment in practical scenarios.

Description

Cross-domain pedestrian re-identification method based on attribute feature driven clustering
Technical Field
The method can be applied to the field of pedestrian retrieval/pedestrian re-identification. The model framework is built on PyTorch, and a multi-task joint learning structure is constructed to realize pedestrian retrieval.
Background
Pedestrian re-identification has long been a research hotspot in the field of image retrieval and can be widely applied to scenarios such as security and video surveillance systems. Like vehicle re-identification, Amur tiger re-identification and fine-grained image retrieval, it is a subtask of image retrieval. Traditional pedestrian re-identification algorithms extract simple low-level features such as color features, Haar features and LOMO features as the pedestrian representation, so their discriminability and generalization are often not strong enough, which brings considerable inconvenience to later application.
In recent years, with the rapid progress of hardware and the rise of deep learning, convolutional neural networks (CNNs) have been widely applied to image processing and have made breakthroughs in different fields. The features extracted by CNNs are highly discriminative, which makes them particularly suitable for image classification/recognition/retrieval tasks and opens the way for their wide application in the field of pedestrian re-identification.
Generally, there are two main approaches to deep-learning-based pedestrian re-identification. The first treats pedestrian re-identification as a special classification task and exploits the strong feature-representation ability of CNNs to extract pedestrian features for classification; such models are collectively referred to as ID Embedding (IDE) models. The second starts from the setting of pedestrian re-identification: unlike a traditional closed-set classification task (e.g. MNIST handwritten digit classification, where the training and test sets share the same ten digit classes 0-9), the IDs of the training set and the test set do not overlap (an open set), and the number of IDs in reality is often extremely large, so the problem should be regarded as a retrieval task. This second approach designs different metric learning methods to pull intra-class samples closer together and push inter-class samples farther apart.
In addition, research on pedestrian re-identification falls into two directions. One improves the representation ability of pedestrian features on a single domain: although the IDs do not overlap, the training set and the test set come from the same data distribution and have similar style characteristics, and such research is usually devoted to designing more complex network structures to improve model discrimination. The other, motivated by practical application, considers cross-domain pedestrian re-identification: the training set and the test set not only have non-overlapping IDs but also come from different datasets with different data distributions, and differ in style, illumination, occlusion and other image characteristics, i.e. there is a large domain bias. Research in this direction aims to improve the generalization performance of the model, and effective unsupervised domain adaptation is necessary to improve cross-domain performance (generalization). Common unsupervised domain adaptation work falls into methods based on generative adversarial networks (GANs), which simulate samples of different distributions by generating more target-domain samples, and clustering fine-tuning methods, which generate "pseudo label" data by clustering and use them to fine-tune the network; with such pseudo labels the model inevitably settles on a sub-optimal solution and deviates from the globally optimal one. The open-set nature of pedestrian re-identification hinders the direct application of these two types of unsupervised domain adaptation methods, which were designed for closed-set tasks, in the cross-domain pedestrian re-identification field.
In summary, pedestrian re-identification, as an open-set task, remains highly challenging whether in the single-domain or the cross-domain setting. Starting from the cross-domain pedestrian re-identification problem, the invention designs a more reasonable method to improve the generalization ability of the model under the cross-domain setting, with a view to applying the technique in real life.
Disclosure of Invention
In view of the problems and shortcomings of existing pedestrian re-identification methods, the invention provides a cross-domain pedestrian re-identification method based on attribute-feature-driven clustering, which can effectively solve these problems. The key point of the method is to creatively introduce an attribute recognition task (closed set) into the pedestrian re-identification task (open set) to address the cross-domain deviation, thereby converting this challenging problem into a common unsupervised domain adaptation problem.
The technical scheme of the invention is a cross-domain pedestrian re-identification method based on attribute-feature-driven clustering, which constructs a cross-domain pedestrian re-identification network in a multi-task joint learning manner. The network takes an ID classification module based on global pedestrian features as the main module, assisted by an ID classification module based on local pedestrian features, a pedestrian attribute classification module and an iteration-based unsupervised domain adaptation module;
the ID classification module based on global pedestrian features is used for acquiring the global feature e of the pedestrian;
the ID classification module based on local pedestrian features is used for acquiring local feature information of the pedestrian beyond the global feature information, including an upper-body feature e_up and a lower-body feature e_down;
the pedestrian attribute classification module serves as an auxiliary task: after the global feature e and the local features e_up and e_down have been obtained for the input image x_a, a selection module selects preset features for specific attribute classification to obtain the pedestrian attribute feature e_att;
the iteration-based unsupervised domain adaptation module consists of two parts: clustering the unlabeled samples of the target domain and generating "pseudo labels"; and fine-tuning the cross-domain pedestrian re-identification network with the generated pseudo-label data in a self-learning manner;
the specific steps of performing cross-domain pedestrian re-identification with the cross-domain pedestrian re-identification network are as follows:
(1) firstly, training the cross-domain pedestrian re-identification network with source-domain data irrelevant to the target-domain data to obtain the corresponding network model and weights;
(2) loading the network model and weights to initialize the network, and then extracting the pedestrian features and attribute features of the target-domain training set, wherein the pedestrian features comprise the global feature e, the upper-body feature e_up and the lower-body feature e_down, and the attribute feature e_att is obtained by the pedestrian attribute classification module;
(3) concatenating the pedestrian features and attribute features extracted in step (2), sending them to the iteration-based unsupervised domain adaptation module for clustering, assigning "pseudo labels" and generating target-domain "training samples";
(4) feeding the "pseudo label" samples generated in step (3) to the cross-domain pedestrian re-identification network initialized with the source-domain data for fine-tuning;
(5) repeating steps (2) to (4) until the network model converges;
(6) extracting the global pedestrian feature from an input target-domain query image, calculating the Euclidean distance between it and the global pedestrian feature extracted from each image in the target-domain gallery, and sorting the distances from small to large to obtain the retrieval result.
Further, the ID classification module based on global pedestrian features outputs the global pedestrian feature e at the avg pool layer of the backbone network, a residual network ResNet-50, and then sequentially applies batch normalization, a C-way classification layer and normalization to the probability p, where C is the number of IDs in the training set. The specific training process of the ID classification module based on global pedestrian features is as follows:
constructing triplets by PK sampling, namely randomly selecting P pedestrians and selecting K images for each pedestrian; for each anchor image x_a, the image x_p with the same ID but the farthest distance is selected as the positive sample, and the image x_n with a different ID but the closest distance is selected as the negative sample, forming a triplet; the loss supervision is formulated as follows:
$$L_g = L_{xent} + L_{tri},\qquad L_{xent} = -\sum_{i=1}^{P\times K}\log p_{y_i}\!\left(x_a^i\right),\qquad L_{tri} = \sum_{i=1}^{P\times K}\Big[m + D\big(e(x_a^i),\, e(x_p^i)\big) - D\big(e(x_a^i),\, e(x_n^j)\big)\Big]_+ \tag{1}$$

where $p_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e(\cdot)$ denotes the feature extracted by the ID classification module based on global pedestrian features; $[\cdot]_+$ denotes the hinge loss; $L_{xent}$ is the cross-entropy loss and $L_{tri}$ is the triplet loss based on hard samples; and $L_g$ denotes the supervision loss of the global branch, used to update the ID classification module based on global pedestrian features.
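A minimal PyTorch sketch of this supervision for one PK-sampled batch (the function and variable names are illustrative placeholders, not the patent's code; the margin default follows the 0.3 used in the embodiment below):

import torch
import torch.nn.functional as F

def global_branch_loss(logits, embeddings, labels, margin=0.3):
    """Cross-entropy + batch-hard triplet loss for one PK-sampled batch.

    logits:      (N, C) class scores from the BN + classification layer
    embeddings:  (N, d) global features e used for the triplet term
    labels:      (N,)   pedestrian ID labels in [0, C)
    """
    # ID (cross-entropy) loss on the classification output
    l_xent = F.cross_entropy(logits, labels)

    # Pairwise Euclidean distances between all embeddings in the batch
    dist = torch.cdist(embeddings, embeddings, p=2)                 # (N, N)
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)            # (N, N) bool
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Hardest positive: same ID, largest distance (excluding the anchor itself)
    d_pos = dist.masked_fill(~same_id | eye, float('-inf')).max(dim=1).values
    # Hardest negative: different ID, smallest distance
    d_neg = dist.masked_fill(same_id, float('inf')).min(dim=1).values

    # Hinge on the triplet margin
    l_tri = F.relu(margin + d_pos - d_neg).mean()
    return l_xent + l_tri

With PK sampling (e.g. 16 IDs with 4 images each, as in the embodiment below), every anchor in the batch has at least one positive and many negatives for this mining.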
Further, the specific process of the ID classification module based on local pedestrian features is as follows:
for an input image x_a, two spatial transformer networks are first used to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output of the ID classification module based on global pedestrian features; the results are then reshaped to obtain the local pedestrian features e_up and e_down, which are each passed through batch normalization and a C-way classification layer to obtain the probabilities p_up and p_down. Specifically, each spatial transformer network consists of a localization network and a sampling network: the localization network consists of a convolutional layer with 2048 3×3 convolution kernels, batch normalization, ReLU activation, global pooling, a fully connected layer of 512 neurons, ReLU activation and a fully connected layer of 6 neurons, and the sampling network uses the 6 parameters learned by the localization network, i.e. its output, to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output feature map. The ID classification module based on local pedestrian features is likewise optimized with the cross-entropy loss and the hard triplet loss, formulated as follows:
$$L_l = L_{xent}^{\,l} + L_{tri}^{\,l},\qquad L_{xent}^{\,l} = -\sum_{i=1}^{P\times K}\log p^{\,l}_{y_i}\!\left(x_a^i\right),\qquad L_{tri}^{\,l} = \sum_{i=1}^{P\times K}\Big[m + D\big(e^{\,l}(x_a^i),\, e^{\,l}(x_p^i)\big) - D\big(e^{\,l}(x_a^i),\, e^{\,l}(x_n^j)\big)\Big]_+ \tag{2}$$

where $p^{\,l}_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$, and the superscript l denotes the local branch, i.e. l ∈ {up, down}; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e^{\,l}(\cdot)$ denotes the corresponding local feature extracted by the module; $[\cdot]_+$ denotes the hinge loss; $L_{xent}^{\,l}$ is the cross-entropy loss and $L_{tri}^{\,l}$ is the triplet loss based on hard samples. The resulting losses $L_{up}$ and $L_{down}$ supervise the classification of the pedestrian's upper body and lower body respectively.
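A minimal PyTorch sketch of one such spatial transformer, assuming it is applied to the 2048-channel backbone feature map; the module name is a placeholder and the identity initialization of the affine parameters is an added assumption for stable training:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartSTN(nn.Module):
    """Spatial transformer that samples one body-part region from a feature map."""

    def __init__(self, in_channels=2048):
        super().__init__()
        # Localization network: 3x3 conv (2048 kernels) -> BN -> ReLU -> global pool
        #                       -> FC(512) -> ReLU -> FC(6)
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 2048, kernel_size=3, padding=1),
            nn.BatchNorm2d(2048),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Sequential(
            nn.Linear(2048, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 6),
        )
        # Start from the identity affine transform
        nn.init.zeros_(self.fc[-1].weight)
        self.fc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, feat_map):
        # feat_map: (N, 2048, H, W) feature map from the global branch
        theta = self.fc(self.conv(feat_map).flatten(1)).view(-1, 2, 3)  # 6 affine params
        grid = F.affine_grid(theta, feat_map.size(), align_corners=False)
        # Sampling network: bilinear sampling of the predicted part region
        return F.grid_sample(feat_map, grid, align_corners=False)

Two such modules, one for the upper body and one for the lower body, would feed the reshaped features e_up and e_down into their own batch-normalization and classification layers.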
Furthermore, the pedestrian attribute classification module comprises M pedestrian attribute classification tasks, which are implemented as follows:
the M pedestrian attribute classification tasks are realized with M branches of the same structure, where each branch consists of batch normalization, a 512-dimensional feature embedding and a classification layer; the final output probability p_att is used for a binary attribute classification, i.e. whether the attribute is present or absent, formulated as follows:
$$L_{att} = \sum_{j=1}^{M} L_{xent}^{\,j},\qquad L_{xent}^{\,j} = -\sum_{i}\log p_{att}^{\,j}\!\left(y_{att}^{\,j,i}\mid x_a^i\right) \tag{3}$$

where $L_{xent}^{\,j}$ denotes the cross-entropy loss of the j-th attribute, used to supervise the classification process of the j-th attribute, and $L_{att}$ is the overall loss over the M attributes; $p_{att}^{\,j}(y_{att}^{\,j,i}\mid x_a^i)$ denotes the probability $p_{att}$ that sample $x_a^i$ belongs to category $y_{att}^{\,j,i}$; $p_{att}$ is obtained by transforming the feature $e_{att}$ through a 512-dimensional fully connected layer and normalizing with softmax, with $e_{att}\in\{e, e_{up}, e_{down}\}$.
Further, the HDBSCAN clustering method is adopted in the iteration-based unsupervised domain adaptation module to realize clustering.
Further, in step (1), before training, the images in the source-domain data are resized to 256 × 128 × 3, the resized images are converted into tensor data that the PyTorch framework can process, and each pixel is normalized with the ImageNet mean and variance.
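A minimal torchvision-style sketch of this preprocessing, covering only the resize, tensor conversion and ImageNet normalization named above; any further augmentation is unspecified here:

from torchvision import transforms

# ImageNet statistics commonly used for per-channel normalization
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

preprocess = transforms.Compose([
    transforms.Resize((256, 128)),                      # H x W = 256 x 128, 3 channels
    transforms.ToTensor(),                              # PIL image -> float tensor in [0, 1]
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),  # subtract mean, divide by std
])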
The invention relates to a method applied to pedestrian re-identification/pedestrian retrieval, which has the following advantages compared with the prior art:
(1) Aiming at the difficulty that existing unsupervised domain adaptation methods can hardly improve the generalization of pedestrian features in the cross-domain pedestrian re-identification task, the invention proposes a multi-task joint learning scheme that introduces attribute features into the clustering information, improves the quality of the clustered samples, and provides more reliable "pseudo label" training samples for fine-tuning.
(2) The invention designs a more reasonable classification scheme for the attribute recognition model, namely recognizing specific attributes at designated body parts, which refines the attribute recognition task and makes it more targeted.
(3) With an extremely small increase in computational cost, the invention greatly improves model performance on the cross-domain pedestrian re-identification task, achieves competitive results on the two commonly used large-scale datasets Market1501 and DukeMTMC-reID, and meets the requirements of deployment in practical scenarios.
Drawings
Fig. 1 is a structural diagram of the backbone network (residual network ResNet-50) used in the invention.
Fig. 2 is a structural diagram of the spatial transformer network (STN) used in the invention.
Fig. 3 is an explanatory diagram of performing different attribute classifications on different regions, as used in the invention.
Fig. 4 is a diagram of the complete model architecture used in the invention.
Fig. 5 shows the retrieval results of the invention on Market1501.
Detailed Description
The following describes a detailed pedestrian re-identification process with reference to an example and the accompanying drawings.
As shown in fig. 4, the present invention provides a cross-domain pedestrian re-identification method based on attribute-feature-driven clustering, which constructs a cross-domain pedestrian re-identification network in a multi-task joint learning manner. The network takes the ID classification module based on global pedestrian features as the main module, assisted by the ID classification module based on local pedestrian features, the pedestrian attribute classification module and the iteration-based unsupervised domain adaptation module, and introduces semantic information of different granularities, thereby effectively addressing the problems in the cross-domain pedestrian re-identification field.
The ID classification module based on global pedestrian features is used for classifying pedestrians of different classes (IDs) in the training set. The avg pool layer of the backbone network (fig. 1) outputs the global pedestrian feature e, which then sequentially passes through batch normalization, a C-way classification layer and normalization to the probability p, where C is the number of IDs in the training set. Considering that pedestrian re-identification is both a classification task and a retrieval task, the cross-entropy loss and the triplet loss based on hard samples are used to optimize the neural network. To make the network converge faster and more stably, we choose to optimize these two loss functions at different layers.
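One common way to realize "optimizing the two losses at different layers" is a BN-neck-style head, sketched below in PyTorch under that assumption (class name and dimensions are placeholders): the triplet loss is applied to the pooled feature e before batch normalization, while the cross-entropy loss is applied to the classifier output computed after batch normalization.

import torch.nn as nn

class GlobalIDHead(nn.Module):
    """Global branch head: pooled feature -> BN -> C-way classifier."""

    def __init__(self, feat_dim=2048, num_ids=751):  # e.g. C = 751 training IDs on Market1501
        super().__init__()
        self.bn = nn.BatchNorm1d(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_ids, bias=False)

    def forward(self, e):
        # e: (N, feat_dim) globally pooled pedestrian feature
        e_bn = self.bn(e)                # feature after batch normalization
        logits = self.classifier(e_bn)   # C-way ID scores (softmax applied in the loss)
        return e, logits                 # triplet loss uses e, cross-entropy uses logits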
We use PK sampling to construct triplets, i.e. P pedestrians are randomly selected and K images are chosen for each pedestrian. For each anchor image x_a, the image x_p with the same ID but the farthest distance is selected as the positive sample and the image x_n with a different ID but the closest distance is selected as the negative sample, forming a triplet. The loss supervision is formulated as follows:
$$L_g = L_{xent} + L_{tri},\qquad L_{xent} = -\sum_{i=1}^{P\times K}\log p_{y_i}\!\left(x_a^i\right),\qquad L_{tri} = \sum_{i=1}^{P\times K}\Big[m + D\big(e(x_a^i),\, e(x_p^i)\big) - D\big(e(x_a^i),\, e(x_n^j)\big)\Big]_+ \tag{1}$$

where $p_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e(\cdot)$ denotes the feature extracted by the network; $[\cdot]_+$ denotes the hinge loss; $L_{xent}$ is the cross-entropy loss and $L_{tri}$ is the triplet loss based on hard samples; and $L_g$ denotes the supervision loss of the global branch, used to update the model.
As an auxiliary task, the ID classification module based on local pedestrian features serves three main purposes: 1. it improves the discriminative ability of the network when training with source-domain data unrelated to the target domain, so that the extracted pedestrian features have stronger representation ability; 2. during target-domain adaptation, it forms part of the clustering features and makes the clustering results more accurate, thereby enhancing the generalization ability of the model in the target domain; 3. in both the source domain and the target domain, it provides the auxiliary attribute recognition task with local (upper-body and lower-body, body-level granularity) information in addition to the whole-body (global-level granularity) pedestrian information, and some specific pedestrian attributes are recognized on the corresponding body region (for example, "hat" is better recognized from the upper body rather than the lower body or the whole body, while "age" needs to be recognized from whole-body information), thus introducing pedestrian attribute information of finer granularity. For an input image x_a, two spatial transformer networks are first applied (each composed of a localization network and a sampling network, as shown in fig. 2; the localization network consists of a convolutional layer with 2048 3×3 convolution kernels, batch normalization, ReLU activation, global pooling, a fully connected layer of 512 neurons, ReLU activation and a fully connected layer of 6 neurons, and the sampling network performs an affine transformation and bilinear interpolation on the output feature map of the avg pool layer using the 6 parameters learned by the localization network, i.e. its output) to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output feature map of the avg pool layer. The results are then reshaped to obtain the local pedestrian features e_up and e_down, which are each passed through batch normalization and a C-way classification layer to obtain the probabilities p_up and p_down; the cross-entropy loss and the hard triplet loss are likewise used to optimize the neural network. The formulation is as follows:
$$L_l = L_{xent}^{\,l} + L_{tri}^{\,l} \tag{2}$$

Most of the definitions are the same as in equation (1); the superscript l denotes the local branch, i.e. l ∈ {up, down}. The resulting losses $L_{up}$ and $L_{down}$ supervise the classification of the pedestrian's upper body and lower body respectively.
The pedestrian attribute classification module serves as an auxiliary task: after the global feature e and the local features e_up and e_down have been obtained for the input image x_a, a selection module selects preset features for specific attribute classification. On the Market1501 dataset, the attributes whether a backpack is carried, hair length, whether a hat is worn, jacket color (blue, green, gray, purple, red, white, black, yellow) and jacket type are obtained from the pedestrian's upper body e_up; whether a bag is carried, whether a handbag is carried and lower-garment color (black, blue, brown, gray, green, pink, purple, white, yellow) are obtained from the pedestrian's lower body e_down; and age and gender are obtained from the pedestrian's whole body e. On the DukeMTMC-reID dataset, whether a hat is worn, jacket type and jacket color (black, white, red, pink, gray, blue, green, brown) are obtained from the pedestrian's upper body; whether a belt is worn, whether a handbag is carried and lower-garment color (black, white, red, brown, blue, green, gray) are obtained from the pedestrian's lower-body region; and gender is obtained from the pedestrian's whole-body information (see fig. 3 for details). The M pedestrian attribute classification tasks are realized with M branches of the same structure (but different supervision information), where each branch consists of batch normalization, a 512-dimensional feature embedding and a classification layer. The final output probability p_att is used for a binary attribute classification (the attribute is present or absent; the M branches together form a classification task over M attributes). The formulation is as follows:
$$L_{att} = \sum_{j=1}^{M} L_{xent}^{\,j},\qquad L_{xent}^{\,j} = -\sum_{i}\log p_{att}^{\,j}\!\left(y_{att}^{\,j,i}\mid x_a^i\right) \tag{3}$$

where $L_{xent}^{\,j}$ denotes the cross-entropy loss of the j-th attribute, used to supervise the classification process of the j-th attribute, and $L_{att}$ is the overall loss over the M attributes; $p_{att}^{\,j}(y_{att}^{\,j,i}\mid x_a^i)$ denotes the probability $p_{att}$ that sample $x_a^i$ belongs to category $y_{att}^{\,j,i}$ (i.e. the probability of possessing the attribute); $p_{att}$ is obtained by transforming the feature $e_{att}$ through a 512-dimensional fully connected layer and normalizing with softmax, with $e_{att}\in\{e, e_{up}, e_{down}\}$. The specific selection rule is shown in fig. 3.
It can be seen that, unlike conventional methods, we have carefully designed the rules for attribute recognition. Intuitively, if a person wants to recognize the color of someone's jacket, they tend to focus only on the pedestrian's upper body; attending to the lower body or whole-body information does not help the classification. The same holds for a machine; moreover, since a machine does not have the global visual perception of a human, attending to the lower body or the whole body of the pedestrian may instead add computational burden and adverse effects. Specifically, different regions are selected for classifying different attributes, and the detailed correspondence rules are shown in fig. 3.
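To make the selection rule concrete, the following sketch encodes such a routing table as a plain Python dictionary; the attribute names and grouping are illustrative placeholders following the Market1501 description above, not an exact reproduction of fig. 3:

# Hypothetical attribute-to-region routing table (Market1501-style attributes)
ATTRIBUTE_REGION = {
    "backpack":    "up",      # recognized from the upper-body feature e_up
    "hair_length": "up",
    "hat":         "up",
    "upper_color": "up",
    "upper_type":  "up",
    "bag":         "down",    # recognized from the lower-body feature e_down
    "handbag":     "down",
    "lower_color": "down",
    "age":         "global",  # recognized from the whole-body feature e
    "gender":      "global",
}

def select_feature(attribute, e_global, e_up, e_down):
    """Route an attribute to the feature its classifier should consume."""
    region = ATTRIBUTE_REGION[attribute]
    return {"global": e_global, "up": e_up, "down": e_down}[region]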
The iteration-based unsupervised domain adaptation module mines the similarity between target-domain samples in a self-learning manner, which effectively improves the generalization ability of the network. The module consists of two parts: clustering the unlabeled samples of the target domain and generating "pseudo labels"; and fine-tuning the network with the generated pseudo-label data in a self-learning (self-training) manner. The clustering of unlabeled samples is the core of the whole problem: the clustering result directly determines the quality of the pseudo-label data, which in turn affects the second-stage fine-tuning and ultimately the performance of the model. It should be noted that a network trained with supervision on the training-set images inevitably tends to extract pedestrian features whose distribution resembles that of the training set; since pedestrian re-identification is an open-set task, the features extracted from test-set images also carry training-set characteristics. This phenomenon is more serious for the cross-domain task, because the training set and the test set in cross-domain pedestrian re-identification come from different distributions that often differ greatly, which is unfavorable for the re-identification task. Attribute recognition, by contrast, is a closed-set classification task and transfers well across domains. We therefore concatenate the extracted attribute feature e_att with the pedestrian features (the global pedestrian feature e and the local pedestrian features e_up, e_down) and introduce them into the clustering features for HDBSCAN clustering, so as to capture as much commonality between source-domain and target-domain samples as possible (if more than 20 attributes of two images match completely, there is a high probability that they belong to the same ID; if many attributes of the two images do not match, they are unlikely to belong to the same person).
The method comprises the following operation steps:
(1) firstly, the pedestrian re-identification network is trained with source-domain data to obtain the corresponding network model and weights;
(2) the model weights are loaded to initialize the network, and then the pedestrian features (including the global feature e, the upper-body feature e_up and the lower-body feature e_down) and the attribute feature e_att of the target-domain training set are extracted;
(3) the pedestrian features and attribute features extracted in step (2) are concatenated and sent to the HDBSCAN clustering module for clustering; "pseudo labels" are assigned and target-domain training samples are generated;
(4) the "pseudo label" samples generated in step (3) are fed to the pedestrian re-identification network initialized with source-domain data for fine-tuning;
(5) steps (2) to (4) are repeated until the model converges.
(6) after the model converges, the global pedestrian feature is extracted from the input target-domain query image, the Euclidean distance between it and the global pedestrian feature extracted from each image in the target-domain gallery is calculated, and the distances are sorted from small to large to obtain the retrieval result.
Taking training on the Market1501 dataset and testing on the DukeMTMC-reID dataset as an example, the main steps are as follows:
1) training the initial network model, namely the cross-domain pedestrian re-identification network, with source-domain (Market1501) data that is independent of the target-domain data;
firstly, the images of the Market1501 dataset are resized to 256 × 128 × 3, the resized images are converted into tensor data that the PyTorch framework can process, and each pixel is normalized with the ImageNet mean and variance. We use the Adam optimizer with the initial learning rate set to 3.5 × 10^-4; the learning rate is decayed by a factor of 10 at epochs 40 and 70, respectively, until the model converges or training ends at epoch 120. During training, the batch size is set to 64, comprising 16 IDs with 4 images per ID; the cross-entropy loss and the triplet loss (margin parameter set to 0.3) supervise the pedestrian re-identification branches, and the cross-entropy loss supervises the attribute recognition branches, as described in equations (1) and (3). The model and network weights are then saved.
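A sketch of this training schedule in PyTorch, where model, train_loader and compute_losses are placeholders for the network, the PK-sampled data loader and the combined loss of equations (1) to (3):

import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

def train_on_source(model, train_loader, compute_losses, num_epochs=120):
    """Source-domain training loop following the schedule described above."""
    optimizer = Adam(model.parameters(), lr=3.5e-4)
    # Decay the learning rate by 10x at epochs 40 and 70
    scheduler = MultiStepLR(optimizer, milestones=[40, 70], gamma=0.1)

    for epoch in range(num_epochs):
        for images, pid_labels, attr_labels in train_loader:  # PK batches: 16 IDs x 4 images = 64
            optimizer.zero_grad()
            outputs = model(images)
            # compute_losses combines cross-entropy + triplet (margin 0.3) on the re-ID
            # branches and cross-entropy on the attribute branches
            loss = compute_losses(outputs, pid_labels, attr_labels)
            loss.backward()
            optimizer.step()
        scheduler.step()

    torch.save(model.state_dict(), "source_pretrained.pth")
    return model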
2) Clustering the unlabeled training samples of the target domain (DukeMTMC-reID) and assigning "pseudo labels"
Build the model and load the model weights obtained from the previous training to initialize the network; then extract the pedestrian features (global and local) and the attribute features of the target-domain training samples, concatenate the two kinds of features, and feed them to the HDBSCAN module for clustering. The minimum number of samples per cluster for HDBSCAN is set to 5. After the clustering result is obtained, each cluster is assigned a "pseudo label" ID of 0, 1, 2, 3, ... to serve as the "labeled" training samples of the next stage; samples that are not clustered are assigned the ID -1 and do not participate in the next stage of training (in later iterations, some of these samples are clustered and become useful "pseudo label" data).
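A minimal sketch of this clustering step with the hdbscan Python package, assuming the features have already been extracted as NumPy arrays and reading the "minimum number of samples per cluster" as min_cluster_size:

import numpy as np
import hdbscan

def cluster_and_label(feats_global, feats_up, feats_down, feats_attr, min_cluster_size=5):
    """Concatenate pedestrian and attribute features and cluster them with HDBSCAN."""
    cluster_feats = np.concatenate(
        [feats_global, feats_up, feats_down, feats_attr], axis=1
    )
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    pseudo_labels = clusterer.fit_predict(cluster_feats)  # clusters get IDs 0, 1, 2, ...
    return pseudo_labels                                  # noise points are labeled -1

Samples labeled -1 are simply skipped in the current round of fine-tuning, matching the -1 assignment described above.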
3) Fine-tuning the model with the target-domain "pseudo label" samples
Load the model weights obtained from the training in 1) to initialize the network, resize the "pseudo label" samples obtained in the previous step to 256 × 128 × 3, and further fine-tune the model with these samples, following the same settings and operations as in 1), until the model converges.
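Putting steps 2) and 3) together, the iterative adaptation can be organized as the following schematic loop, in which every callable is a placeholder for the corresponding operation described above and the fixed number of rounds stands in for "until the model converges":

def adapt_to_target_domain(model, target_images, extract_features,
                           cluster_and_label, fine_tune, num_rounds=10):
    """Iterate: extract features -> cluster -> assign pseudo labels -> fine-tune."""
    for _ in range(num_rounds):                           # or stop once performance stabilizes
        feats = extract_features(model, target_images)    # e, e_up, e_down, e_att per image
        pseudo_labels = cluster_and_label(*feats)          # -1 marks unclustered samples
        labeled = [(img, pid) for img, pid in zip(target_images, pseudo_labels)
                   if pid != -1]
        fine_tune(model, labeled)                          # same settings/losses as source training
    return model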
4) Extracting and retrieving the features of the target domain test set
After loading the model weights obtained from the previous training, resize the target-domain test-set samples to 256 × 128 × 3, disable model parameter updates, extract the global pedestrian features of the samples, then compute the Euclidean distances between the features, sort them from small to large, and return the retrieval results.
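A minimal sketch of this retrieval step, assuming the global features of the query and gallery images have already been extracted as tensors:

import torch

@torch.no_grad()                      # parameter updates are disabled at test time
def rank_gallery(query_feats, gallery_feats):
    """Rank gallery images for each query by Euclidean distance on global features."""
    # query_feats: (Q, d), gallery_feats: (G, d)
    dist = torch.cdist(query_feats, gallery_feats, p=2)   # pairwise Euclidean distances (Q, G)
    return dist.argsort(dim=1)                            # ascending: best match first per query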
The visualization results of the invention are described below with reference to table 1 and fig. 5. Our method achieves competitive results; the cross-domain performance on two commonly used large-scale datasets in the pedestrian re-identification field is shown in table 1. Fig. 5 shows retrieval results on Market1501: in each group of images, the leftmost image is the query image (probe, the pedestrian to be retrieved) and the images on the right are the model's retrieval results, where an image surrounded by a thick rectangular box indicates a wrong retrieval result and the others indicate correct results. It can be seen that the model captures the identifying information in most cases, accurately matches the query image, and achieves an effect sufficient for deployment in the real world.
TABLE 1 Performance comparison of different methods on the Market1501 and DukeMTMC-reID datasets
Reference documents:
[1] Luo H, Gu Y, Liao X, et al. Bag of tricks and a strong baseline for deep person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2019: 0-0.
[2] Fan H, Zheng L, Yan C, et al. Unsupervised person re-identification: Clustering and fine-tuning[J]. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2018, 14(4): 1-18.
[3] Wang J, Zhu X, Gong S, et al. Transferable joint attribute-identity deep learning for unsupervised person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 2275-2284.
[4] Zhong Z, Zheng L, Luo Z, et al. Invariance matters: Exemplar memory for domain adaptive person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 598-607.
[5] Song L, Wang C, Zhang L, et al. Unsupervised domain adaptive re-identification: Theory and practice[J]. Pattern Recognition, 2020, 102: 107173.

Claims (6)

1. a cross-domain pedestrian re-identification method based on attribute feature driven clustering is characterized by comprising the following steps: constructing a cross-domain pedestrian re-identification network in a multi-task combined learning mode, wherein the network takes an ID classification module based on global pedestrian characteristics as a main module, and is assisted by an ID classification module based on local pedestrian characteristics, a pedestrian attribute classification module and an unsupervised domain adaptation module based on iteration;
the ID classification module based on the global pedestrian features is used for acquiring the global features e of pedestrians;
the ID classification module based on local pedestrian features is used for acquiring local feature information of the pedestrian beyond the global feature information, including an upper-body feature e_up and a lower-body feature e_down;
the pedestrian attribute classification module serves as an auxiliary task: after the global feature e and the local features e_up and e_down have been obtained for the input image x_a, a selection module selects preset features for specific attribute classification and obtains the pedestrian attribute feature e_att;
the iteration-based unsupervised domain adaptation module consists of two parts: clustering the unlabeled samples of the target domain and generating "pseudo labels"; and fine-tuning the cross-domain pedestrian re-identification network with the generated pseudo-label data in a self-learning manner;
the specific steps of performing cross-domain pedestrian re-identification with the cross-domain pedestrian re-identification network are as follows:
(1) firstly, training the cross-domain pedestrian re-identification network with source-domain data irrelevant to the target-domain data to obtain the corresponding network model and weights;
(2) loading the network model and weights to initialize the network, and then extracting the pedestrian features and attribute features of the target-domain training set, wherein the pedestrian features comprise the global feature e, the upper-body feature e_up and the lower-body feature e_down, and the attribute feature e_att is obtained by the pedestrian attribute classification module;
(3) concatenating the pedestrian features and attribute features extracted in step (2), sending them to the iteration-based unsupervised domain adaptation module for clustering, assigning "pseudo labels" and generating target-domain "training samples";
(4) feeding the "pseudo label" samples generated in step (3) to the cross-domain pedestrian re-identification network initialized with the source-domain data for fine-tuning;
(5) repeating steps (2) to (4) until the network model converges;
(6) extracting the global pedestrian feature from an input target-domain query image, calculating the Euclidean distance between it and the global pedestrian feature extracted from each image in the target-domain gallery, and sorting the distances from small to large to obtain the retrieval result.
2. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: the ID classification module based on global pedestrian features outputs the global pedestrian feature e at the avg pool layer of the backbone network, a residual network ResNet-50, and then sequentially applies batch normalization, a C-way classification layer and normalization to the probability p, where C is the number of IDs in the training set; the specific training process is as follows,
constructing triplets by PK sampling, namely randomly selecting P pedestrians and selecting K images for each pedestrian; for each anchor image x_a, the image x_p with the same ID but the farthest distance is selected as the positive sample, and the image x_n with a different ID but the closest distance is selected as the negative sample, forming a triplet; the loss supervision is formulated as follows:
$$L_g = L_{xent} + L_{tri},\qquad L_{xent} = -\sum_{i=1}^{P\times K}\log p_{y_i}\!\left(x_a^i\right),\qquad L_{tri} = \sum_{i=1}^{P\times K}\Big[m + D\big(e(x_a^i),\, e(x_p^i)\big) - D\big(e(x_a^i),\, e(x_n^j)\big)\Big]_+$$

where $p_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e(\cdot)$ denotes the feature extracted by the ID classification module based on global pedestrian features; $[\cdot]_+$ denotes the hinge loss; $L_{xent}$ is the cross-entropy loss and $L_{tri}$ is the triplet loss based on hard samples; and $L_g$ denotes the supervision loss of the global branch, used to update the ID classification module based on global pedestrian features.
3. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: the specific process of the ID classification module based on local pedestrian features is as follows,
for an input image x_a, two spatial transformer networks are first used to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output of the ID classification module based on global pedestrian features; the results are then reshaped to obtain the local pedestrian features e_up and e_down, which are each passed through batch normalization and a C-way classification layer to obtain the probabilities p_up and p_down. Specifically, each spatial transformer network consists of a localization network and a sampling network: the localization network consists of a convolutional layer with 2048 3×3 convolution kernels, batch normalization, ReLU activation, global pooling, a fully connected layer of 512 neurons, ReLU activation and a fully connected layer of 6 neurons, and the sampling network uses the 6 parameters learned by the localization network, i.e. its output, to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output feature map. The ID classification module based on local pedestrian features is likewise optimized with the cross-entropy loss and the hard triplet loss, formulated as follows:
$$L_l = L_{xent}^{\,l} + L_{tri}^{\,l},\qquad L_{xent}^{\,l} = -\sum_{i=1}^{P\times K}\log p^{\,l}_{y_i}\!\left(x_a^i\right),\qquad L_{tri}^{\,l} = \sum_{i=1}^{P\times K}\Big[m + D\big(e^{\,l}(x_a^i),\, e^{\,l}(x_p^i)\big) - D\big(e^{\,l}(x_a^i),\, e^{\,l}(x_n^j)\big)\Big]_+$$

where $p^{\,l}_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$, and the superscript l denotes the local branch, i.e. l ∈ {up, down}; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e^{\,l}(\cdot)$ denotes the corresponding local feature extracted by the module; $[\cdot]_+$ denotes the hinge loss; $L_{xent}^{\,l}$ is the cross-entropy loss and $L_{tri}^{\,l}$ is the triplet loss based on hard samples. The resulting losses $L_{up}$ and $L_{down}$ supervise the classification of the pedestrian's upper body and lower body respectively.
4. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: the pedestrian attribute classification module comprises M pedestrian attribute classification tasks, which are implemented as follows,
the M pedestrian attribute classification tasks are realized with M branches of the same structure, where each branch consists of batch normalization, a 512-dimensional feature embedding and a classification layer; the final output probability p_att is used for a binary attribute classification, i.e. whether the attribute is present or absent, formulated as follows:
$$L_{att} = \sum_{j=1}^{M} L_{xent}^{\,j},\qquad L_{xent}^{\,j} = -\sum_{i}\log p_{att}^{\,j}\!\left(y_{att}^{\,j,i}\mid x_a^i\right)$$

where $L_{xent}^{\,j}$ denotes the cross-entropy loss of the j-th attribute, used to supervise the classification process of the j-th attribute, and $L_{att}$ is the overall loss over the M attributes; $p_{att}^{\,j}(y_{att}^{\,j,i}\mid x_a^i)$ denotes the probability $p_{att}$ that sample $x_a^i$ belongs to category $y_{att}^{\,j,i}$; $p_{att}$ is obtained by transforming the feature $e_{att}$ through a 512-dimensional fully connected layer and normalizing with softmax, with $e_{att}\in\{e, e_{up}, e_{down}\}$.
5. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: the HDBSCAN clustering method is adopted in the iteration-based unsupervised domain adaptation module to realize clustering.
6. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: in step (1), before training, the images in the source-domain data are resized to 256 × 128 × 3, the resized images are converted into tensor data that the PyTorch framework can process, and each pixel is normalized with the ImageNet mean and variance.
CN202010828757.5A 2020-08-18 2020-08-18 Cross-domain pedestrian re-identification method based on attribute feature driven clustering Active CN112069920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010828757.5A CN112069920B (en) 2020-08-18 2020-08-18 Cross-domain pedestrian re-identification method based on attribute feature driven clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010828757.5A CN112069920B (en) 2020-08-18 2020-08-18 Cross-domain pedestrian re-identification method based on attribute feature driven clustering

Publications (2)

Publication Number Publication Date
CN112069920A true CN112069920A (en) 2020-12-11
CN112069920B CN112069920B (en) 2022-03-15

Family

ID=73661871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010828757.5A Active CN112069920B (en) 2020-08-18 2020-08-18 Cross-domain pedestrian re-identification method based on attribute feature driven clustering

Country Status (1)

Country Link
CN (1) CN112069920B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686135A (en) * 2020-12-29 2021-04-20 中南大学 Generalized pedestrian re-identification method based on distribution fitting
CN112733695A (en) * 2021-01-04 2021-04-30 电子科技大学 Unsupervised key frame selection method in pedestrian re-identification field
CN113052017A (en) * 2021-03-09 2021-06-29 北京工业大学 Unsupervised pedestrian re-identification method based on multi-granularity feature representation and domain adaptive learning
CN113065409A (en) * 2021-03-09 2021-07-02 北京工业大学 Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
CN113095221A (en) * 2021-04-13 2021-07-09 电子科技大学 Cross-domain pedestrian re-identification method based on attribute feature and identity feature fusion
CN113221656A (en) * 2021-04-13 2021-08-06 电子科技大学 Cross-domain pedestrian re-identification model based on domain invariant features and method thereof
CN113221770A (en) * 2021-05-18 2021-08-06 青岛根尖智能科技有限公司 Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN113255604A (en) * 2021-06-29 2021-08-13 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN113378632A (en) * 2021-04-28 2021-09-10 南京大学 Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN113392786A (en) * 2021-06-21 2021-09-14 电子科技大学 Cross-domain pedestrian re-identification method based on normalization and feature enhancement
CN113392740A (en) * 2021-06-03 2021-09-14 吉林大学 Pedestrian re-identification system based on dual attention mechanism
CN113570644A (en) * 2021-09-27 2021-10-29 中国民用航空总局第二研究所 Airport passenger positioning method, airport passenger positioning device, electronic equipment and medium
CN114067356A (en) * 2021-10-21 2022-02-18 电子科技大学 Pedestrian re-identification method based on joint local guidance and attribute clustering
CN114333062A (en) * 2021-12-31 2022-04-12 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN116403015A (en) * 2023-03-13 2023-07-07 武汉大学 Unsupervised target re-identification method and system based on perception-aided learning transducer model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942025A (en) * 2019-11-26 2020-03-31 河海大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN111259786A (en) * 2020-01-14 2020-06-09 浙江大学 Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
US20200218888A1 (en) * 2017-07-18 2020-07-09 Vision Semantics Limited Target Re-Identification
CN111476168A (en) * 2020-04-08 2020-07-31 山东师范大学 Cross-domain pedestrian re-identification method and system based on three stages

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200218888A1 (en) * 2017-07-18 2020-07-09 Vision Semantics Limited Target Re-Identification
CN110942025A (en) * 2019-11-26 2020-03-31 河海大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN111259786A (en) * 2020-01-14 2020-06-09 浙江大学 Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111476168A (en) * 2020-04-08 2020-07-31 山东师范大学 Cross-domain pedestrian re-identification method and system based on three stages

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PENG WANG ET AL.: "Local-Global Extraction Unit for Person Re-identification", 《BICS 2018》 *
YANWEN CHONG ET AL.: "Unsupervised Cross-Domain Person Re-identification Based on Style Transfer", 《ICIC 2019》 *
张晓伟 et al.: "Cross-domain person re-identification based on local semantic feature invariance", 《北京航空航天大学学报》 (Journal of Beijing University of Aeronautics and Astronautics) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686135A (en) * 2020-12-29 2021-04-20 中南大学 Generalized pedestrian re-identification method based on distribution fitting
CN112733695A (en) * 2021-01-04 2021-04-30 电子科技大学 Unsupervised key frame selection method in pedestrian re-identification field
CN112733695B (en) * 2021-01-04 2023-04-25 电子科技大学 Unsupervised keyframe selection method in pedestrian re-identification field
CN113052017A (en) * 2021-03-09 2021-06-29 北京工业大学 Unsupervised pedestrian re-identification method based on multi-granularity feature representation and domain adaptive learning
CN113065409A (en) * 2021-03-09 2021-07-02 北京工业大学 Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
CN113095221A (en) * 2021-04-13 2021-07-09 电子科技大学 Cross-domain pedestrian re-identification method based on attribute feature and identity feature fusion
CN113221656A (en) * 2021-04-13 2021-08-06 电子科技大学 Cross-domain pedestrian re-identification model based on domain invariant features and method thereof
CN113378632B (en) * 2021-04-28 2024-04-12 南京大学 Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN113378632A (en) * 2021-04-28 2021-09-10 南京大学 Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN113221770A (en) * 2021-05-18 2021-08-06 青岛根尖智能科技有限公司 Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN113392740A (en) * 2021-06-03 2021-09-14 吉林大学 Pedestrian re-identification system based on dual attention mechanism
CN113392740B (en) * 2021-06-03 2022-06-28 吉林大学 Pedestrian heavy identification system based on dual attention mechanism
CN113392786A (en) * 2021-06-21 2021-09-14 电子科技大学 Cross-domain pedestrian re-identification method based on normalization and feature enhancement
CN113255604B (en) * 2021-06-29 2021-10-15 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
US11810388B1 (en) 2021-06-29 2023-11-07 Inspur Suzhou Intelligent Technology Co., Ltd. Person re-identification method and apparatus based on deep learning network, device, and medium
CN113255604A (en) * 2021-06-29 2021-08-13 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN113570644A (en) * 2021-09-27 2021-10-29 中国民用航空总局第二研究所 Airport passenger positioning method, airport passenger positioning device, electronic equipment and medium
CN113570644B (en) * 2021-09-27 2021-11-30 中国民用航空总局第二研究所 Airport passenger positioning method, airport passenger positioning device, electronic equipment and medium
CN114067356A (en) * 2021-10-21 2022-02-18 电子科技大学 Pedestrian re-identification method based on joint local guidance and attribute clustering
CN114333062A (en) * 2021-12-31 2022-04-12 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN114333062B (en) * 2021-12-31 2022-07-15 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN116403015A (en) * 2023-03-13 2023-07-07 武汉大学 Unsupervised target re-identification method and system based on perception-aided learning transducer model
CN116403015B (en) * 2023-03-13 2024-05-03 武汉大学 Unsupervised target re-identification method and system based on perception-aided learning transducer model

Also Published As

Publication number Publication date
CN112069920B (en) 2022-03-15

Similar Documents

Publication Publication Date Title
CN112069920B (en) Cross-domain pedestrian re-identification method based on attribute feature driven clustering
Zhao et al. Object detection with deep learning: A review
Cui et al. Fine-grained categorization and dataset bootstrapping using deep metric learning with humans in the loop
Yi et al. Shared representation learning for heterogenous face recognition
CN110942025A (en) Unsupervised cross-domain pedestrian re-identification method based on clustering
CN112307995B (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
Gutta et al. Face recognition using hybrid classifier systems
Shen et al. Category-aware spatial constraint for weakly supervised detection
Hsu et al. Weakly supervised salient object detection by learning a classifier-driven map generator
Ni et al. Transfer model collaborating metric learning and dictionary learning for cross-domain facial expression recognition
CN112784728A (en) Multi-granularity clothes changing pedestrian re-identification method based on clothing desensitization network
Han et al. Weakly supervised person search with region siamese networks
CN113222072A (en) Lung X-ray image classification method based on K-means clustering and GAN
Ren et al. Discriminative residual analysis for image set classification with posture and age variations
Liu et al. Bilaterally normalized scale-consistent sinkhorn distance for few-shot image classification
Lu et al. Mask-aware pseudo label denoising for unsupervised vehicle re-identification
Zhao et al. Visible-infrared person re-identification based on frequency-domain simulated multispectral modality for dual-mode cameras
He et al. Spatial and Temporal Dual-Attention for Unsupervised Person Re-Identification
Nikhal et al. Multi-context grouped attention for unsupervised person re-identification
Bianchi et al. An interpretable graph-based image classifier
Wang et al. Deep metric learning on the SPD manifold for image set classification
Dou et al. Learning global and local consistent representations for unsupervised image retrieval via deep graph diffusion networks
Li et al. Cross-modal distribution alignment embedding network for generalized zero-shot learning
Huang et al. Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition
Li et al. Criminal investigation image classification based on spatial cnn features and elm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant