CN112069920A - Cross-domain pedestrian re-identification method based on attribute feature driven clustering - Google Patents

Cross-domain pedestrian re-identification method based on attribute feature driven clustering

Info

Publication number
CN112069920A
Authority
CN
China
Prior art keywords
pedestrian
attribute
domain
cross
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010828757.5A
Other languages
Chinese (zh)
Other versions
CN112069920B (en)
Inventor
种衍文
章郴
潘少明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010828757.5A priority Critical patent/CN112069920B/en
Publication of CN112069920A publication Critical patent/CN112069920A/en
Application granted granted Critical
Publication of CN112069920B publication Critical patent/CN112069920B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention provides a cross-domain pedestrian re-identification method based on attribute-feature-driven clustering. A network architecture is built with the PyTorch framework: an ID classification module based on global pedestrian features serves as the main module, assisted by an ID classification module based on local pedestrian features, a pedestrian attribute classification module and an iteration-based unsupervised domain adaptation module. Through multi-task joint learning, semantic information of different granularities is introduced and the attribute recognition process is refined, yielding an effective cross-domain pedestrian re-identification network. Compared with similar methods, the invention innovatively incorporates attribute information into the clustering features, so that the closed-set attribute classification task assists the open-set pedestrian re-identification task in cross-domain unsupervised domain adaptation. This produces more accurate clustering results and more reliable pseudo-label data during iterative training, greatly enhances the generalization ability of the model, and meets the requirements of deployment in practical scenarios.

Description

Cross-domain pedestrian re-identification method based on attribute feature driven clustering
Technical Field
The method can be applied to the field of pedestrian retrieval/pedestrian re-identification. The model framework is built on PyTorch, and a multi-task joint learning structure is constructed to realize pedestrian retrieval.
Background
Pedestrian re-identification has long been a research hotspot in the field of image retrieval and can be widely applied to scenarios such as security and video surveillance systems. Like vehicle re-identification, Amur tiger re-identification and fine-grained image retrieval, it is a subtask of image retrieval. Traditional pedestrian re-identification algorithms extract simple low-level features such as color features, Haar features and LOMO features as the pedestrian representation, so their discriminability and generalization are often not strong enough, which brings considerable inconvenience to later application.
In recent years, with the rapid progress of hardware and the rise of deep learning, convolutional neural networks (CNNs) have been widely applied to image processing and have made breakthroughs in different fields. The features extracted by CNNs are highly discriminative, which makes them particularly suitable for image classification/recognition/retrieval tasks and opens the way for their wide application in the field of pedestrian re-identification.
Generally, there are two main approaches to deep-learning-based pedestrian re-identification. The first treats pedestrian re-identification as a special classification task and exploits the strong feature-representation ability of CNNs to extract pedestrian features for classification; such models are collectively referred to as ID Embedding (IDE) models. The second starts from the setting of pedestrian re-identification: unlike a traditional closed-set classification task (e.g. MNIST handwritten digit classification, where the training and test sets share the same ten digit classes 0-9), the IDs of the training set and the test set do not overlap (an open set), and the number of IDs in reality is often extremely large, so the problem should be regarded as a retrieval task. This second approach designs different metric learning methods to pull intra-class samples closer together and push inter-class samples farther apart.
In addition, research on pedestrian re-identification falls into two directions. One improves the representation ability of pedestrian features on a single domain: although the IDs do not overlap, the training set and the test set come from the same data distribution and have similar style characteristics, and such research is usually devoted to designing more complex network structures to improve model discrimination. The other, motivated by practical application, considers cross-domain pedestrian re-identification: the training set and the test set not only have non-overlapping IDs but also come from different datasets with different data distributions, and differ in style, illumination, occlusion and other image characteristics, i.e. there is a large domain bias. Research in this direction aims to improve the generalization performance of the model, and effective unsupervised domain adaptation is necessary to improve cross-domain performance (generalization). Common unsupervised domain adaptation work falls into methods based on generative adversarial networks (GANs), which simulate samples of different distributions by generating more target-domain samples, and clustering fine-tuning methods, which generate "pseudo label" data by clustering and use them to fine-tune the network; with such pseudo labels the model inevitably settles on a sub-optimal solution and deviates from the globally optimal one. The open-set nature of pedestrian re-identification hinders the direct application of these two types of unsupervised domain adaptation methods, which were designed for closed-set tasks, in the cross-domain pedestrian re-identification field.
In summary, pedestrian re-identification, as an open-set task, remains highly challenging whether in the single-domain or the cross-domain setting. Starting from the cross-domain pedestrian re-identification problem, the invention designs a more reasonable method to improve the generalization ability of the model under the cross-domain setting, with a view to applying the technique in real life.
Disclosure of Invention
In view of the problems and shortcomings of existing pedestrian re-identification methods, the invention provides a cross-domain pedestrian re-identification method based on attribute-feature-driven clustering, which can effectively solve these problems. The key point of the method is to creatively introduce an attribute recognition task (closed set) into the pedestrian re-identification task (open set) to address the cross-domain deviation, thereby converting this challenging problem into a common unsupervised domain adaptation problem.
The technical scheme of the invention is a cross-domain pedestrian re-identification method based on attribute-feature-driven clustering, which constructs a cross-domain pedestrian re-identification network in a multi-task joint learning manner. The network takes an ID classification module based on global pedestrian features as the main module, assisted by an ID classification module based on local pedestrian features, a pedestrian attribute classification module and an iteration-based unsupervised domain adaptation module;
the ID classification module based on global pedestrian features is used for acquiring the global feature e of the pedestrian;
the ID classification module based on local pedestrian features is used for acquiring local feature information of the pedestrian beyond the global feature information, including an upper-body feature e_up and a lower-body feature e_down;
the pedestrian attribute classification module serves as an auxiliary task: after the global feature e and the local features e_up and e_down have been obtained for the input image x_a, a selection module selects preset features for specific attribute classification to obtain the pedestrian attribute feature e_att;
the iteration-based unsupervised domain adaptation module consists of two parts: clustering the unlabeled samples of the target domain and generating "pseudo labels"; and fine-tuning the cross-domain pedestrian re-identification network with the generated pseudo-label data in a self-learning manner;
the specific steps of performing cross-domain pedestrian re-identification with the cross-domain pedestrian re-identification network are as follows:
(1) firstly, training the cross-domain pedestrian re-identification network with source-domain data irrelevant to the target-domain data to obtain the corresponding network model and weights;
(2) loading the network model and weights to initialize the network, and then extracting the pedestrian features and attribute features of the target-domain training set, wherein the pedestrian features comprise the global feature e, the upper-body feature e_up and the lower-body feature e_down, and the attribute feature e_att is obtained by the pedestrian attribute classification module;
(3) concatenating the pedestrian features and attribute features extracted in step (2), sending them to the iteration-based unsupervised domain adaptation module for clustering, assigning "pseudo labels" and generating target-domain "training samples";
(4) feeding the "pseudo label" samples generated in step (3) to the cross-domain pedestrian re-identification network initialized with the source-domain data for fine-tuning;
(5) repeating steps (2) to (4) until the network model converges;
(6) extracting the global pedestrian feature from an input target-domain query image, calculating the Euclidean distance between it and the global pedestrian feature extracted from each image in the target-domain gallery, and sorting the distances from small to large to obtain the retrieval result.
Further, the ID classification module based on global pedestrian features outputs the global pedestrian feature e at the avg pool layer of the backbone network, a residual network ResNet-50, and then sequentially applies batch normalization, a C-way classification layer and normalization to the probability p, where C is the number of IDs in the training set. The specific training process of the ID classification module based on global pedestrian features is as follows:
constructing triplets by PK sampling, namely randomly selecting P pedestrians and selecting K images for each pedestrian; for each anchor image x_a, the image x_p with the same ID but the farthest distance is selected as the positive sample, and the image x_n with a different ID but the closest distance is selected as the negative sample, forming a triplet; the loss supervision is formulated as follows:
$$L_g = L_{xent} + L_{tri},\qquad L_{xent} = -\sum_{i=1}^{P\times K}\log p_{y_i}\!\left(x_a^i\right),\qquad L_{tri} = \sum_{i=1}^{P\times K}\Big[m + D\big(e(x_a^i),\, e(x_p^i)\big) - D\big(e(x_a^i),\, e(x_n^j)\big)\Big]_+ \tag{1}$$

where $p_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e(\cdot)$ denotes the feature extracted by the ID classification module based on global pedestrian features; $[\cdot]_+$ denotes the hinge loss; $L_{xent}$ is the cross-entropy loss and $L_{tri}$ is the triplet loss based on hard samples; and $L_g$ denotes the supervision loss of the global branch, used to update the ID classification module based on global pedestrian features.
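A minimal PyTorch sketch of this supervision for one PK-sampled batch (the function and variable names are illustrative placeholders, not the patent's code; the margin default follows the 0.3 used in the embodiment below):

import torch
import torch.nn.functional as F

def global_branch_loss(logits, embeddings, labels, margin=0.3):
    """Cross-entropy + batch-hard triplet loss for one PK-sampled batch.

    logits:      (N, C) class scores from the BN + classification layer
    embeddings:  (N, d) global features e used for the triplet term
    labels:      (N,)   pedestrian ID labels in [0, C)
    """
    # ID (cross-entropy) loss on the classification output
    l_xent = F.cross_entropy(logits, labels)

    # Pairwise Euclidean distances between all embeddings in the batch
    dist = torch.cdist(embeddings, embeddings, p=2)                 # (N, N)
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)            # (N, N) bool
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Hardest positive: same ID, largest distance (excluding the anchor itself)
    d_pos = dist.masked_fill(~same_id | eye, float('-inf')).max(dim=1).values
    # Hardest negative: different ID, smallest distance
    d_neg = dist.masked_fill(same_id, float('inf')).min(dim=1).values

    # Hinge on the triplet margin
    l_tri = F.relu(margin + d_pos - d_neg).mean()
    return l_xent + l_tri

With PK sampling (e.g. 16 IDs with 4 images each, as in the embodiment below), every anchor in the batch has at least one positive and many negatives for this mining.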
Further, the specific process of the ID classification module based on local pedestrian features is as follows:
for an input image x_a, two spatial transformer networks are first used to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output of the ID classification module based on global pedestrian features; the results are then reshaped to obtain the local pedestrian features e_up and e_down, which are each passed through batch normalization and a C-way classification layer to obtain the probabilities p_up and p_down. Specifically, each spatial transformer network consists of a localization network and a sampling network: the localization network consists of a convolutional layer with 2048 3×3 convolution kernels, batch normalization, ReLU activation, global pooling, a fully connected layer of 512 neurons, ReLU activation and a fully connected layer of 6 neurons, and the sampling network uses the 6 parameters learned by the localization network, i.e. its output, to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output feature map. The ID classification module based on local pedestrian features is likewise optimized with the cross-entropy loss and the hard triplet loss, formulated as follows:
$$L_l = L_{xent}^{\,l} + L_{tri}^{\,l},\qquad L_{xent}^{\,l} = -\sum_{i=1}^{P\times K}\log p^{\,l}_{y_i}\!\left(x_a^i\right),\qquad L_{tri}^{\,l} = \sum_{i=1}^{P\times K}\Big[m + D\big(e^{\,l}(x_a^i),\, e^{\,l}(x_p^i)\big) - D\big(e^{\,l}(x_a^i),\, e^{\,l}(x_n^j)\big)\Big]_+ \tag{2}$$

where $p^{\,l}_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$, and the superscript l denotes the local branch, i.e. l ∈ {up, down}; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e^{\,l}(\cdot)$ denotes the corresponding local feature extracted by the module; $[\cdot]_+$ denotes the hinge loss; $L_{xent}^{\,l}$ is the cross-entropy loss and $L_{tri}^{\,l}$ is the triplet loss based on hard samples. The resulting losses $L_{up}$ and $L_{down}$ supervise the classification of the pedestrian's upper body and lower body respectively.
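A minimal PyTorch sketch of one such spatial transformer, assuming it is applied to the 2048-channel backbone feature map; the module name is a placeholder and the identity initialization of the affine parameters is an added assumption for stable training:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartSTN(nn.Module):
    """Spatial transformer that samples one body-part region from a feature map."""

    def __init__(self, in_channels=2048):
        super().__init__()
        # Localization network: 3x3 conv (2048 kernels) -> BN -> ReLU -> global pool
        #                       -> FC(512) -> ReLU -> FC(6)
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 2048, kernel_size=3, padding=1),
            nn.BatchNorm2d(2048),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Sequential(
            nn.Linear(2048, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 6),
        )
        # Start from the identity affine transform
        nn.init.zeros_(self.fc[-1].weight)
        self.fc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, feat_map):
        # feat_map: (N, 2048, H, W) feature map from the global branch
        theta = self.fc(self.conv(feat_map).flatten(1)).view(-1, 2, 3)  # 6 affine params
        grid = F.affine_grid(theta, feat_map.size(), align_corners=False)
        # Sampling network: bilinear sampling of the predicted part region
        return F.grid_sample(feat_map, grid, align_corners=False)

Two such modules, one for the upper body and one for the lower body, would feed the reshaped features e_up and e_down into their own batch-normalization and classification layers.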
Furthermore, the pedestrian attribute classification module comprises M pedestrian attribute classification tasks, which are implemented as follows:
the M pedestrian attribute classification tasks are realized with M branches of the same structure, where each branch consists of batch normalization, a 512-dimensional feature embedding and a classification layer; the final output probability p_att is used for a binary attribute classification, i.e. whether the attribute is present or absent, formulated as follows:
$$L_{att} = \sum_{j=1}^{M} L_{xent}^{\,j},\qquad L_{xent}^{\,j} = -\sum_{i}\log p_{att}^{\,j}\!\left(y_{att}^{\,j,i}\mid x_a^i\right) \tag{3}$$

where $L_{xent}^{\,j}$ denotes the cross-entropy loss of the j-th attribute, used to supervise the classification process of the j-th attribute, and $L_{att}$ is the overall loss over the M attributes; $p_{att}^{\,j}(y_{att}^{\,j,i}\mid x_a^i)$ denotes the probability $p_{att}$ that sample $x_a^i$ belongs to category $y_{att}^{\,j,i}$; $p_{att}$ is obtained by transforming the feature $e_{att}$ through a 512-dimensional fully connected layer and normalizing with softmax, with $e_{att}\in\{e, e_{up}, e_{down}\}$.
Further, the HDBSCAN clustering method is adopted in the iteration-based unsupervised domain adaptation module to realize clustering.
Further, in step (1), before training, the images in the source-domain data are resized to 256 × 128 × 3, the resized images are converted into tensor data that the PyTorch framework can process, and each pixel is normalized with the ImageNet mean and variance.
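A minimal torchvision-style sketch of this preprocessing, covering only the resize, tensor conversion and ImageNet normalization named above; any further augmentation is unspecified here:

from torchvision import transforms

# ImageNet statistics commonly used for per-channel normalization
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

preprocess = transforms.Compose([
    transforms.Resize((256, 128)),                      # H x W = 256 x 128, 3 channels
    transforms.ToTensor(),                              # PIL image -> float tensor in [0, 1]
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),  # subtract mean, divide by std
])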
The invention relates to a method applied to pedestrian re-identification/pedestrian retrieval, which has the following advantages compared with the prior art:
(1) Aiming at the difficulty that existing unsupervised domain adaptation methods can hardly improve the generalization of pedestrian features in the cross-domain pedestrian re-identification task, the invention proposes a multi-task joint learning scheme that introduces attribute features into the clustering information, improves the quality of the clustered samples, and provides more reliable "pseudo label" training samples for fine-tuning.
(2) The invention designs a more reasonable classification scheme for the attribute recognition model, namely recognizing specific attributes at designated body parts, which refines the attribute recognition task and makes it more targeted.
(3) With an extremely small increase in computational cost, the invention greatly improves model performance on the cross-domain pedestrian re-identification task, achieves competitive results on the two commonly used large-scale datasets Market1501 and DukeMTMC-reID, and meets the requirements of deployment in practical scenarios.
Drawings
Fig. 1 is a structural diagram of the backbone network (residual network ResNet-50) used in the invention.
Fig. 2 is a structural diagram of the spatial transformer network (STN) used in the invention.
Fig. 3 is an explanatory diagram of performing different attribute classifications on different regions, as used in the invention.
Fig. 4 is a diagram of the complete model architecture used in the invention.
Fig. 5 shows the retrieval results of the invention on Market1501.
Detailed Description
The following describes a detailed pedestrian re-identification process with reference to an example and the accompanying drawings.
As shown in fig. 4, the present invention provides a cross-domain pedestrian re-identification method based on attribute-feature-driven clustering, which constructs a cross-domain pedestrian re-identification network in a multi-task joint learning manner. The network takes the ID classification module based on global pedestrian features as the main module, assisted by the ID classification module based on local pedestrian features, the pedestrian attribute classification module and the iteration-based unsupervised domain adaptation module, and introduces semantic information of different granularities, thereby effectively addressing the problems in the cross-domain pedestrian re-identification field.
The ID classification module based on global pedestrian features is used for classifying pedestrians of different classes (IDs) in the training set. The avg pool layer of the backbone network (fig. 1) outputs the global pedestrian feature e, which then sequentially passes through batch normalization, a C-way classification layer and normalization to the probability p, where C is the number of IDs in the training set. Considering that pedestrian re-identification is both a classification task and a retrieval task, the cross-entropy loss and the triplet loss based on hard samples are used to optimize the neural network. To make the network converge faster and more stably, we choose to optimize these two loss functions at different layers.
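One common way to realize "optimizing the two losses at different layers" is a BN-neck-style head, sketched below in PyTorch under that assumption (class name and dimensions are placeholders): the triplet loss is applied to the pooled feature e before batch normalization, while the cross-entropy loss is applied to the classifier output computed after batch normalization.

import torch.nn as nn

class GlobalIDHead(nn.Module):
    """Global branch head: pooled feature -> BN -> C-way classifier."""

    def __init__(self, feat_dim=2048, num_ids=751):  # e.g. C = 751 training IDs on Market1501
        super().__init__()
        self.bn = nn.BatchNorm1d(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_ids, bias=False)

    def forward(self, e):
        # e: (N, feat_dim) globally pooled pedestrian feature
        e_bn = self.bn(e)                # feature after batch normalization
        logits = self.classifier(e_bn)   # C-way ID scores (softmax applied in the loss)
        return e, logits                 # triplet loss uses e, cross-entropy uses logits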
We use PK sampling to construct triplets, i.e. P pedestrians are randomly selected and K images are chosen for each pedestrian. For each anchor image x_a, the image x_p with the same ID but the farthest distance is selected as the positive sample and the image x_n with a different ID but the closest distance is selected as the negative sample, forming a triplet. The loss supervision is formulated as follows:
$$L_g = L_{xent} + L_{tri},\qquad L_{xent} = -\sum_{i=1}^{P\times K}\log p_{y_i}\!\left(x_a^i\right),\qquad L_{tri} = \sum_{i=1}^{P\times K}\Big[m + D\big(e(x_a^i),\, e(x_p^i)\big) - D\big(e(x_a^i),\, e(x_n^j)\big)\Big]_+ \tag{1}$$

where $p_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e(\cdot)$ denotes the feature extracted by the network; $[\cdot]_+$ denotes the hinge loss; $L_{xent}$ is the cross-entropy loss and $L_{tri}$ is the triplet loss based on hard samples; and $L_g$ denotes the supervision loss of the global branch, used to update the model.
As an auxiliary task, the ID classification module based on local pedestrian features serves three main purposes: 1. it improves the discriminative ability of the network when training with source-domain data unrelated to the target domain, so that the extracted pedestrian features have stronger representation ability; 2. during target-domain adaptation, it forms part of the clustering features and makes the clustering results more accurate, thereby enhancing the generalization ability of the model in the target domain; 3. in both the source domain and the target domain, it provides the auxiliary attribute recognition task with local (upper-body and lower-body, body-level granularity) information in addition to the whole-body (global-level granularity) pedestrian information, and some specific pedestrian attributes are recognized on the corresponding body region (for example, "hat" is better recognized from the upper body rather than the lower body or the whole body, while "age" needs to be recognized from whole-body information), thus introducing pedestrian attribute information of finer granularity. For an input image x_a, two spatial transformer networks are first applied (each composed of a localization network and a sampling network, as shown in fig. 2; the localization network consists of a convolutional layer with 2048 3×3 convolution kernels, batch normalization, ReLU activation, global pooling, a fully connected layer of 512 neurons, ReLU activation and a fully connected layer of 6 neurons, and the sampling network performs an affine transformation and bilinear interpolation on the output feature map of the avg pool layer using the 6 parameters learned by the localization network, i.e. its output) to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output feature map of the avg pool layer. The results are then reshaped to obtain the local pedestrian features e_up and e_down, which are each passed through batch normalization and a C-way classification layer to obtain the probabilities p_up and p_down; the cross-entropy loss and the hard triplet loss are likewise used to optimize the neural network. The formulation is as follows:
$$L_l = L_{xent}^{\,l} + L_{tri}^{\,l} \tag{2}$$

Most of the definitions are the same as in equation (1); the superscript l denotes the local branch, i.e. l ∈ {up, down}. The resulting losses $L_{up}$ and $L_{down}$ supervise the classification of the pedestrian's upper body and lower body respectively.
The pedestrian attribute classification module serves as an auxiliary task: after the global feature e and the local features e_up and e_down have been obtained for the input image x_a, a selection module selects preset features for specific attribute classification. On the Market1501 dataset, the attributes whether a backpack is carried, hair length, whether a hat is worn, jacket color (blue, green, gray, purple, red, white, black, yellow) and jacket type are obtained from the pedestrian's upper body e_up; whether a bag is carried, whether a handbag is carried and lower-garment color (black, blue, brown, gray, green, pink, purple, white, yellow) are obtained from the pedestrian's lower body e_down; and age and gender are obtained from the pedestrian's whole body e. On the DukeMTMC-reID dataset, whether a hat is worn, jacket type and jacket color (black, white, red, pink, gray, blue, green, brown) are obtained from the pedestrian's upper body; whether a belt is worn, whether a handbag is carried and lower-garment color (black, white, red, brown, blue, green, gray) are obtained from the pedestrian's lower-body region; and gender is obtained from the pedestrian's whole-body information (see fig. 3 for details). The M pedestrian attribute classification tasks are realized with M branches of the same structure (but different supervision information), where each branch consists of batch normalization, a 512-dimensional feature embedding and a classification layer. The final output probability p_att is used for a binary attribute classification (the attribute is present or absent; the M branches together form a classification task over M attributes). The formulation is as follows:
$$L_{att} = \sum_{j=1}^{M} L_{xent}^{\,j},\qquad L_{xent}^{\,j} = -\sum_{i}\log p_{att}^{\,j}\!\left(y_{att}^{\,j,i}\mid x_a^i\right) \tag{3}$$

where $L_{xent}^{\,j}$ denotes the cross-entropy loss of the j-th attribute, used to supervise the classification process of the j-th attribute, and $L_{att}$ is the overall loss over the M attributes; $p_{att}^{\,j}(y_{att}^{\,j,i}\mid x_a^i)$ denotes the probability $p_{att}$ that sample $x_a^i$ belongs to category $y_{att}^{\,j,i}$ (i.e. the probability of possessing the attribute); $p_{att}$ is obtained by transforming the feature $e_{att}$ through a 512-dimensional fully connected layer and normalizing with softmax, with $e_{att}\in\{e, e_{up}, e_{down}\}$. The specific selection rule is shown in fig. 3.
It can be seen that, unlike conventional methods, we have carefully designed the rules for attribute recognition. Intuitively, if a person wants to recognize the color of someone's jacket, they tend to focus only on the pedestrian's upper body; attending to the lower body or whole-body information does not help the classification. The same holds for a machine; moreover, since a machine does not have the global visual perception of a human, attending to the lower body or the whole body of the pedestrian may instead add computational burden and adverse effects. Specifically, different regions are selected for classifying different attributes, and the detailed correspondence rules are shown in fig. 3.
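To make the selection rule concrete, the following sketch encodes such a routing table as a plain Python dictionary; the attribute names and grouping are illustrative placeholders following the Market1501 description above, not an exact reproduction of fig. 3:

# Hypothetical attribute-to-region routing table (Market1501-style attributes)
ATTRIBUTE_REGION = {
    "backpack":    "up",      # recognized from the upper-body feature e_up
    "hair_length": "up",
    "hat":         "up",
    "upper_color": "up",
    "upper_type":  "up",
    "bag":         "down",    # recognized from the lower-body feature e_down
    "handbag":     "down",
    "lower_color": "down",
    "age":         "global",  # recognized from the whole-body feature e
    "gender":      "global",
}

def select_feature(attribute, e_global, e_up, e_down):
    """Route an attribute to the feature its classifier should consume."""
    region = ATTRIBUTE_REGION[attribute]
    return {"global": e_global, "up": e_up, "down": e_down}[region]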
The iteration-based unsupervised domain adaptation module mines the similarity between target-domain samples in a self-learning manner, which effectively improves the generalization ability of the network. The module consists of two parts: clustering the unlabeled samples of the target domain and generating "pseudo labels"; and fine-tuning the network with the generated pseudo-label data in a self-learning (self-training) manner. The clustering of unlabeled samples is the core of the whole problem: the clustering result directly determines the quality of the pseudo-label data, which in turn affects the second-stage fine-tuning and ultimately the performance of the model. It should be noted that a network trained with supervision on the training-set images inevitably tends to extract pedestrian features whose distribution resembles that of the training set; since pedestrian re-identification is an open-set task, the features extracted from test-set images also carry training-set characteristics. This phenomenon is more serious for the cross-domain task, because the training set and the test set in cross-domain pedestrian re-identification come from different distributions that often differ greatly, which is unfavorable for the re-identification task. Attribute recognition, by contrast, is a closed-set classification task and transfers well across domains. We therefore concatenate the extracted attribute feature e_att with the pedestrian features (the global pedestrian feature e and the local pedestrian features e_up, e_down) and introduce them into the clustering features for HDBSCAN clustering, so as to capture as much commonality between source-domain and target-domain samples as possible (if more than 20 attributes of two images match completely, there is a high probability that they belong to the same ID; if many attributes of the two images do not match, they are unlikely to belong to the same person).
The method comprises the following operation steps:
(1) firstly, the pedestrian re-identification network is trained with source-domain data to obtain the corresponding network model and weights;
(2) the model weights are loaded to initialize the network, and then the pedestrian features (including the global feature e, the upper-body feature e_up and the lower-body feature e_down) and the attribute feature e_att of the target-domain training set are extracted;
(3) the pedestrian features and attribute features extracted in step (2) are concatenated and sent to the HDBSCAN clustering module for clustering; "pseudo labels" are assigned and target-domain training samples are generated;
(4) the "pseudo label" samples generated in step (3) are fed to the pedestrian re-identification network initialized with source-domain data for fine-tuning;
(5) steps (2) to (4) are repeated until the model converges.
(6) after the model converges, the global pedestrian feature is extracted from the input target-domain query image, the Euclidean distance between it and the global pedestrian feature extracted from each image in the target-domain gallery is calculated, and the distances are sorted from small to large to obtain the retrieval result.
Taking training on the Market1501 dataset and testing on the DukeMTMC-reID dataset as an example, the main steps are as follows:
1) training the initial network model, namely the cross-domain pedestrian re-identification network, with source-domain (Market1501) data that is independent of the target-domain data;
firstly, the images of the Market1501 dataset are resized to 256 × 128 × 3, the resized images are converted into tensor data that the PyTorch framework can process, and each pixel is normalized with the ImageNet mean and variance. We use the Adam optimizer with the initial learning rate set to 3.5 × 10^-4; the learning rate is decayed by a factor of 10 at epochs 40 and 70, respectively, until the model converges or training ends at epoch 120. During training, the batch size is set to 64, comprising 16 IDs with 4 images per ID; the cross-entropy loss and the triplet loss (margin parameter set to 0.3) supervise the pedestrian re-identification branches, and the cross-entropy loss supervises the attribute recognition branches, as described in equations (1) and (3). The model and network weights are then saved.
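A sketch of this training schedule in PyTorch, where model, train_loader and compute_losses are placeholders for the network, the PK-sampled data loader and the combined loss of equations (1) to (3):

import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

def train_on_source(model, train_loader, compute_losses, num_epochs=120):
    """Source-domain training loop following the schedule described above."""
    optimizer = Adam(model.parameters(), lr=3.5e-4)
    # Decay the learning rate by 10x at epochs 40 and 70
    scheduler = MultiStepLR(optimizer, milestones=[40, 70], gamma=0.1)

    for epoch in range(num_epochs):
        for images, pid_labels, attr_labels in train_loader:  # PK batches: 16 IDs x 4 images = 64
            optimizer.zero_grad()
            outputs = model(images)
            # compute_losses combines cross-entropy + triplet (margin 0.3) on the re-ID
            # branches and cross-entropy on the attribute branches
            loss = compute_losses(outputs, pid_labels, attr_labels)
            loss.backward()
            optimizer.step()
        scheduler.step()

    torch.save(model.state_dict(), "source_pretrained.pth")
    return model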
2) Clustering the unlabeled training samples of the target domain (DukeMTMC-reID) and assigning "pseudo labels"
Build the model and load the model weights obtained from the previous training to initialize the network; then extract the pedestrian features (global and local) and the attribute features of the target-domain training samples, concatenate the two kinds of features, and feed them to the HDBSCAN module for clustering. The minimum number of samples per cluster for HDBSCAN is set to 5. After the clustering result is obtained, each cluster is assigned a "pseudo label" ID of 0, 1, 2, 3, ... to serve as the "labeled" training samples of the next stage; samples that are not clustered are assigned the ID -1 and do not participate in the next stage of training (in later iterations, some of these samples are clustered and become useful "pseudo label" data).
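A minimal sketch of this clustering step with the hdbscan Python package, assuming the features have already been extracted as NumPy arrays and reading the "minimum number of samples per cluster" as min_cluster_size:

import numpy as np
import hdbscan

def cluster_and_label(feats_global, feats_up, feats_down, feats_attr, min_cluster_size=5):
    """Concatenate pedestrian and attribute features and cluster them with HDBSCAN."""
    cluster_feats = np.concatenate(
        [feats_global, feats_up, feats_down, feats_attr], axis=1
    )
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    pseudo_labels = clusterer.fit_predict(cluster_feats)  # clusters get IDs 0, 1, 2, ...
    return pseudo_labels                                  # noise points are labeled -1

Samples labeled -1 are simply skipped in the current round of fine-tuning, matching the -1 assignment described above.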
3) Fine-tuning the model with the target-domain "pseudo label" samples
Load the model weights obtained from the training in 1) to initialize the network, resize the "pseudo label" samples obtained in the previous step to 256 × 128 × 3, and further fine-tune the model with these samples, following the same settings and operations as in 1), until the model converges.
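Putting steps 2) and 3) together, the iterative adaptation can be organized as the following schematic loop, in which every callable is a placeholder for the corresponding operation described above and the fixed number of rounds stands in for "until the model converges":

def adapt_to_target_domain(model, target_images, extract_features,
                           cluster_and_label, fine_tune, num_rounds=10):
    """Iterate: extract features -> cluster -> assign pseudo labels -> fine-tune."""
    for _ in range(num_rounds):                           # or stop once performance stabilizes
        feats = extract_features(model, target_images)    # e, e_up, e_down, e_att per image
        pseudo_labels = cluster_and_label(*feats)          # -1 marks unclustered samples
        labeled = [(img, pid) for img, pid in zip(target_images, pseudo_labels)
                   if pid != -1]
        fine_tune(model, labeled)                          # same settings/losses as source training
    return model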
4) Extracting and retrieving the features of the target domain test set
After loading the model weights obtained from the previous training, resize the target-domain test-set samples to 256 × 128 × 3, disable model parameter updates, extract the global pedestrian features of the samples, then compute the Euclidean distances between the features, sort them from small to large, and return the retrieval results.
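A minimal sketch of this retrieval step, assuming the global features of the query and gallery images have already been extracted as tensors:

import torch

@torch.no_grad()                      # parameter updates are disabled at test time
def rank_gallery(query_feats, gallery_feats):
    """Rank gallery images for each query by Euclidean distance on global features."""
    # query_feats: (Q, d), gallery_feats: (G, d)
    dist = torch.cdist(query_feats, gallery_feats, p=2)   # pairwise Euclidean distances (Q, G)
    return dist.argsort(dim=1)                            # ascending: best match first per query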
The visualization results of the invention are described below with reference to table 1 and fig. 5. Our method achieves competitive results; the cross-domain performance on two commonly used large-scale datasets in the pedestrian re-identification field is shown in table 1. Fig. 5 shows retrieval results on Market1501: in each group of images, the leftmost image is the query image (probe, the pedestrian to be retrieved) and the images on the right are the model's retrieval results, where an image surrounded by a thick rectangular box indicates a wrong retrieval result and the others indicate correct results. It can be seen that the model captures the identifying information in most cases, accurately matches the query image, and achieves an effect sufficient for deployment in the real world.
TABLE 1 Performance comparison of different methods on the Market1501 and DukeMTMC-reID datasets
Reference documents:
[1] Luo H, Gu Y, Liao X, et al. Bag of tricks and a strong baseline for deep person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2019: 0-0.
[2] Fan H, Zheng L, Yan C, et al. Unsupervised person re-identification: Clustering and fine-tuning[J]. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2018, 14(4): 1-18.
[3] Wang J, Zhu X, Gong S, et al. Transferable joint attribute-identity deep learning for unsupervised person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 2275-2284.
[4] Zhong Z, Zheng L, Luo Z, et al. Invariance matters: Exemplar memory for domain adaptive person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 598-607.
[5] Song L, Wang C, Zhang L, et al. Unsupervised domain adaptive re-identification: Theory and practice[J]. Pattern Recognition, 2020, 102: 107173.

Claims (6)

1. a cross-domain pedestrian re-identification method based on attribute feature driven clustering is characterized by comprising the following steps: constructing a cross-domain pedestrian re-identification network in a multi-task combined learning mode, wherein the network takes an ID classification module based on global pedestrian characteristics as a main module, and is assisted by an ID classification module based on local pedestrian characteristics, a pedestrian attribute classification module and an unsupervised domain adaptation module based on iteration;
the ID classification module based on the global pedestrian features is used for acquiring the global features e of pedestrians;
the ID classification module based on local pedestrian features is used for acquiring local feature information of the pedestrian beyond the global feature information, including an upper-body feature e_up and a lower-body feature e_down;
the pedestrian attribute classification module serves as an auxiliary task: after the global feature e and the local features e_up and e_down have been obtained for the input image x_a, a selection module selects preset features for specific attribute classification and obtains the pedestrian attribute feature e_att;
the iteration-based unsupervised domain adaptation module consists of two parts: clustering the unlabeled samples of the target domain and generating "pseudo labels"; and fine-tuning the cross-domain pedestrian re-identification network with the generated pseudo-label data in a self-learning manner;
the specific steps of performing cross-domain pedestrian re-identification with the cross-domain pedestrian re-identification network are as follows:
(1) firstly, training the cross-domain pedestrian re-identification network with source-domain data irrelevant to the target-domain data to obtain the corresponding network model and weights;
(2) loading the network model and weights to initialize the network, and then extracting the pedestrian features and attribute features of the target-domain training set, wherein the pedestrian features comprise the global feature e, the upper-body feature e_up and the lower-body feature e_down, and the attribute feature e_att is obtained by the pedestrian attribute classification module;
(3) concatenating the pedestrian features and attribute features extracted in step (2), sending them to the iteration-based unsupervised domain adaptation module for clustering, assigning "pseudo labels" and generating target-domain "training samples";
(4) feeding the "pseudo label" samples generated in step (3) to the cross-domain pedestrian re-identification network initialized with the source-domain data for fine-tuning;
(5) repeating steps (2) to (4) until the network model converges;
(6) extracting the global pedestrian feature from an input target-domain query image, calculating the Euclidean distance between it and the global pedestrian feature extracted from each image in the target-domain gallery, and sorting the distances from small to large to obtain the retrieval result.
2. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: the ID classification module based on global pedestrian features outputs the global pedestrian feature e at the avg pool layer of the backbone network, a residual network ResNet-50, and then sequentially applies batch normalization, a C-way classification layer and normalization to the probability p, where C is the number of IDs in the training set; the specific training process is as follows,
constructing triplets by PK sampling, namely randomly selecting P pedestrians and selecting K images for each pedestrian; for each anchor image x_a, the image x_p with the same ID but the farthest distance is selected as the positive sample, and the image x_n with a different ID but the closest distance is selected as the negative sample, forming a triplet; the loss supervision is formulated as follows:
$$L_g = L_{xent} + L_{tri},\qquad L_{xent} = -\sum_{i=1}^{P\times K}\log p_{y_i}\!\left(x_a^i\right),\qquad L_{tri} = \sum_{i=1}^{P\times K}\Big[m + D\big(e(x_a^i),\, e(x_p^i)\big) - D\big(e(x_a^i),\, e(x_n^j)\big)\Big]_+$$

where $p_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e(\cdot)$ denotes the feature extracted by the ID classification module based on global pedestrian features; $[\cdot]_+$ denotes the hinge loss; $L_{xent}$ is the cross-entropy loss and $L_{tri}$ is the triplet loss based on hard samples; and $L_g$ denotes the supervision loss of the global branch, used to update the ID classification module based on global pedestrian features.
3. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: the specific process of the ID classification module based on local pedestrian features is as follows,
for an input image x_a, two spatial transformer networks are first used to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output of the ID classification module based on global pedestrian features; the results are then reshaped to obtain the local pedestrian features e_up and e_down, which are each passed through batch normalization and a C-way classification layer to obtain the probabilities p_up and p_down. Specifically, each spatial transformer network consists of a localization network and a sampling network: the localization network consists of a convolutional layer with 2048 3×3 convolution kernels, batch normalization, ReLU activation, global pooling, a fully connected layer of 512 neurons, ReLU activation and a fully connected layer of 6 neurons, and the sampling network uses the 6 parameters learned by the localization network, i.e. its output, to adaptively obtain the upper-body and lower-body feature maps of the pedestrian from the output feature map. The ID classification module based on local pedestrian features is likewise optimized with the cross-entropy loss and the hard triplet loss, formulated as follows:
$$L_l = L_{xent}^{\,l} + L_{tri}^{\,l},\qquad L_{xent}^{\,l} = -\sum_{i=1}^{P\times K}\log p^{\,l}_{y_i}\!\left(x_a^i\right),\qquad L_{tri}^{\,l} = \sum_{i=1}^{P\times K}\Big[m + D\big(e^{\,l}(x_a^i),\, e^{\,l}(x_p^i)\big) - D\big(e^{\,l}(x_a^i),\, e^{\,l}(x_n^j)\big)\Big]_+$$

where $p^{\,l}_{y_i}(x_a^i)$ denotes the probability p that image $x_a^i$ belongs to class $y_i$, and the superscript l denotes the local branch, i.e. l ∈ {up, down}; $x_a^i$, $x_p^i$ and $x_n^j$ denote the anchor, the positive sample and the negative sample respectively, with the superscripts i and j indicating the ID class of the image; m denotes the margin parameter of the triplet loss; $D(\cdot,\cdot)$ denotes the Euclidean distance between two features; $e^{\,l}(\cdot)$ denotes the corresponding local feature extracted by the module; $[\cdot]_+$ denotes the hinge loss; $L_{xent}^{\,l}$ is the cross-entropy loss and $L_{tri}^{\,l}$ is the triplet loss based on hard samples. The resulting losses $L_{up}$ and $L_{down}$ supervise the classification of the pedestrian's upper body and lower body respectively.
4. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: the pedestrian attribute classification module comprises M pedestrian attribute classification tasks, which are implemented as follows,
the M pedestrian attribute classification tasks are realized with M branches of the same structure, where each branch consists of batch normalization, a 512-dimensional feature embedding and a classification layer; the final output probability p_att is used for a binary attribute classification, i.e. whether the attribute is present or absent, formulated as follows:
$$L_{att} = \sum_{j=1}^{M} L_{xent}^{\,j},\qquad L_{xent}^{\,j} = -\sum_{i}\log p_{att}^{\,j}\!\left(y_{att}^{\,j,i}\mid x_a^i\right)$$

where $L_{xent}^{\,j}$ denotes the cross-entropy loss of the j-th attribute, used to supervise the classification process of the j-th attribute, and $L_{att}$ is the overall loss over the M attributes; $p_{att}^{\,j}(y_{att}^{\,j,i}\mid x_a^i)$ denotes the probability $p_{att}$ that sample $x_a^i$ belongs to category $y_{att}^{\,j,i}$; $p_{att}$ is obtained by transforming the feature $e_{att}$ through a 512-dimensional fully connected layer and normalizing with softmax, with $e_{att}\in\{e, e_{up}, e_{down}\}$.
5. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: the HDBSCAN clustering method is adopted in the iteration-based unsupervised domain adaptation module to realize clustering.
6. The cross-domain pedestrian re-identification method based on attribute feature driven clustering as claimed in claim 1, wherein: in step (1), before training, the images in the source-domain data are resized to 256 × 128 × 3, the resized images are converted into tensor data that the PyTorch framework can process, and each pixel is normalized with the ImageNet mean and variance.
CN202010828757.5A 2020-08-18 2020-08-18 Cross-domain pedestrian re-identification method based on attribute feature driven clustering Active CN112069920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010828757.5A CN112069920B (en) 2020-08-18 2020-08-18 Cross-domain pedestrian re-identification method based on attribute feature driven clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010828757.5A CN112069920B (en) 2020-08-18 2020-08-18 Cross-domain pedestrian re-identification method based on attribute feature driven clustering

Publications (2)

Publication Number Publication Date
CN112069920A true CN112069920A (en) 2020-12-11
CN112069920B CN112069920B (en) 2022-03-15

Family

ID=73661871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010828757.5A Active CN112069920B (en) 2020-08-18 2020-08-18 Cross-domain pedestrian re-identification method based on attribute feature driven clustering

Country Status (1)

Country Link
CN (1) CN112069920B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686135A (en) * 2020-12-29 2021-04-20 中南大学 Generalized pedestrian re-identification method based on distribution fitting
CN112733695A (en) * 2021-01-04 2021-04-30 电子科技大学 Unsupervised key frame selection method in pedestrian re-identification field
CN113052017A (en) * 2021-03-09 2021-06-29 北京工业大学 Unsupervised pedestrian re-identification method based on multi-granularity feature representation and domain adaptive learning
CN113065409A (en) * 2021-03-09 2021-07-02 北京工业大学 Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
CN113095221A (en) * 2021-04-13 2021-07-09 电子科技大学 Cross-domain pedestrian re-identification method based on attribute feature and identity feature fusion
CN113221656A (en) * 2021-04-13 2021-08-06 电子科技大学 Cross-domain pedestrian re-identification model based on domain invariant features and method thereof
CN113221770A (en) * 2021-05-18 2021-08-06 青岛根尖智能科技有限公司 Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN113255604A (en) * 2021-06-29 2021-08-13 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN113378632A (en) * 2021-04-28 2021-09-10 南京大学 Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN113392786A (en) * 2021-06-21 2021-09-14 电子科技大学 Cross-domain pedestrian re-identification method based on normalization and feature enhancement
CN113392740A (en) * 2021-06-03 2021-09-14 吉林大学 Pedestrian re-identification system based on dual attention mechanism
CN113570644A (en) * 2021-09-27 2021-10-29 中国民用航空总局第二研究所 Airport passenger positioning method, airport passenger positioning device, electronic equipment and medium
CN114067356A (en) * 2021-10-21 2022-02-18 电子科技大学 Pedestrian re-identification method based on joint local guidance and attribute clustering
CN114333062A (en) * 2021-12-31 2022-04-12 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN116403015A (en) * 2023-03-13 2023-07-07 武汉大学 Unsupervised target re-identification method and system based on perception-aided learning transducer model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942025A (en) * 2019-11-26 2020-03-31 河海大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN111259786A (en) * 2020-01-14 2020-06-09 浙江大学 Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
US20200218888A1 (en) * 2017-07-18 2020-07-09 Vision Semantics Limited Target Re-Identification
CN111476168A (en) * 2020-04-08 2020-07-31 山东师范大学 Cross-domain pedestrian re-identification method and system based on three stages

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200218888A1 (en) * 2017-07-18 2020-07-09 Vision Semantics Limited Target Re-Identification
CN110942025A (en) * 2019-11-26 2020-03-31 河海大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN111259786A (en) * 2020-01-14 2020-06-09 浙江大学 Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111476168A (en) * 2020-04-08 2020-07-31 山东师范大学 Cross-domain pedestrian re-identification method and system based on three stages

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PENG WANG ET AL.: "Local-Global Extraction Unit for Person Re-identification", 《BICS 2018》 *
YANWEN CHONG ET AL.: "Unsupervised Cross-Domain Person Re-identification Based on Style Transfer", 《ICIC 2019》 *
张晓伟 et al.: "Cross-domain person re-identification based on local semantic feature invariance", 《北京航空航天大学学报》 (Journal of Beijing University of Aeronautics and Astronautics) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686135A (en) * 2020-12-29 2021-04-20 中南大学 Generalized pedestrian re-identification method based on distribution fitting
CN112733695A (en) * 2021-01-04 2021-04-30 电子科技大学 Unsupervised key frame selection method in pedestrian re-identification field
CN112733695B (en) * 2021-01-04 2023-04-25 电子科技大学 Unsupervised keyframe selection method in pedestrian re-identification field
CN113052017A (en) * 2021-03-09 2021-06-29 北京工业大学 Unsupervised pedestrian re-identification method based on multi-granularity feature representation and domain adaptive learning
CN113065409A (en) * 2021-03-09 2021-07-02 北京工业大学 Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
CN113095221A (en) * 2021-04-13 2021-07-09 电子科技大学 Cross-domain pedestrian re-identification method based on attribute feature and identity feature fusion
CN113221656A (en) * 2021-04-13 2021-08-06 电子科技大学 Cross-domain pedestrian re-identification model based on domain invariant features and method thereof
CN113378632B (en) * 2021-04-28 2024-04-12 南京大学 Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN113378632A (en) * 2021-04-28 2021-09-10 南京大学 Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN113221770A (en) * 2021-05-18 2021-08-06 青岛根尖智能科技有限公司 Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN113392740A (en) * 2021-06-03 2021-09-14 吉林大学 Pedestrian re-identification system based on dual attention mechanism
CN113392740B (en) * 2021-06-03 2022-06-28 吉林大学 Pedestrian heavy identification system based on dual attention mechanism
CN113392786A (en) * 2021-06-21 2021-09-14 电子科技大学 Cross-domain pedestrian re-identification method based on normalization and feature enhancement
CN113255604B (en) * 2021-06-29 2021-10-15 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
US11810388B1 (en) 2021-06-29 2023-11-07 Inspur Suzhou Intelligent Technology Co., Ltd. Person re-identification method and apparatus based on deep learning network, device, and medium
CN113255604A (en) * 2021-06-29 2021-08-13 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN113570644A (en) * 2021-09-27 2021-10-29 中国民用航空总局第二研究所 Airport passenger positioning method, airport passenger positioning device, electronic equipment and medium
CN113570644B (en) * 2021-09-27 2021-11-30 中国民用航空总局第二研究所 Airport passenger positioning method, airport passenger positioning device, electronic equipment and medium
CN114067356A (en) * 2021-10-21 2022-02-18 电子科技大学 Pedestrian re-identification method based on joint local guidance and attribute clustering
CN114333062A (en) * 2021-12-31 2022-04-12 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN114333062B (en) * 2021-12-31 2022-07-15 江南大学 Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN116403015A (en) * 2023-03-13 2023-07-07 武汉大学 Unsupervised target re-identification method and system based on perception-aided learning transducer model
CN116403015B (en) * 2023-03-13 2024-05-03 武汉大学 Unsupervised target re-identification method and system based on perception-aided learning transducer model

Also Published As

Publication number Publication date
CN112069920B (en) 2022-03-15

Similar Documents

Publication Publication Date Title
CN112069920B (en) Cross-domain pedestrian re-identification method based on attribute feature driven clustering
Zhao et al. Object detection with deep learning: A review
Cui et al. Fine-grained categorization and dataset bootstrapping using deep metric learning with humans in the loop
Yi et al. Shared representation learning for heterogenous face recognition
CN110942025A (en) Unsupervised cross-domain pedestrian re-identification method based on clustering
CN112307995B (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
Gutta et al. Face recognition using hybrid classifier systems
Shen et al. Category-aware spatial constraint for weakly supervised detection
Hsu et al. Weakly supervised salient object detection by learning a classifier-driven map generator
Ni et al. Transfer model collaborating metric learning and dictionary learning for cross-domain facial expression recognition
CN112784728A (en) Multi-granularity clothes changing pedestrian re-identification method based on clothing desensitization network
Han et al. Weakly supervised person search with region siamese networks
CN113222072A (en) Lung X-ray image classification method based on K-means clustering and GAN
Ren et al. Discriminative residual analysis for image set classification with posture and age variations
Liu et al. Bilaterally normalized scale-consistent sinkhorn distance for few-shot image classification
Lu et al. Mask-aware pseudo label denoising for unsupervised vehicle re-identification
Zhao et al. Visible-infrared person re-identification based on frequency-domain simulated multispectral modality for dual-mode cameras
He et al. Spatial and Temporal Dual-Attention for Unsupervised Person Re-Identification
Nikhal et al. Multi-context grouped attention for unsupervised person re-identification
Bianchi et al. An interpretable graph-based image classifier
Wang et al. Deep metric learning on the SPD manifold for image set classification
Dou et al. Learning global and local consistent representations for unsupervised image retrieval via deep graph diffusion networks
Li et al. Cross-modal distribution alignment embedding network for generalized zero-shot learning
Huang et al. Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition
Li et al. Criminal investigation image classification based on spatial cnn features and elm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant