CN111832514A

CN111832514A - Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels

Info

Publication number: CN111832514A
Application number: CN202010705108.6A
Authority: CN
Inventors: 张宝华; 朱思雨; 张宗义; 李建军; 关海芳; 邬可; 张晓艳
Original assignee: Inner Mongolia University of Science and Technology
Current assignee: Inner Mongolia University of Science and Technology
Priority date: 2020-07-21
Filing date: 2020-07-21
Publication date: 2020-10-27
Anticipated expiration: 2040-07-21
Also published as: CN111832514B

Abstract

The invention provides an unsupervised pedestrian re-identification method and device based on soft multi-labels. The method comprises the following steps. Pre-training step: first, a deep network with an attention mechanism is pre-trained to obtain a feature map for each auxiliary reference sample; second, a loss function is calculated and optimized with an optimization algorithm until the loss corresponding to each auxiliary reference feature vector is minimized, yielding a pre-trained model. Target set training step: based on the pre-trained model, a deep network with added channel attention and spatial attention is trained to obtain target feature maps. Target set optimization step: the corresponding loss functions are calculated and optimized in order to mine latent discriminative information, distinguish visually similar pairs of different targets, and solve the problem of cross-view label consistency. Target set ranking step: target data whose output exceeds a preset threshold are taken as the matching objects for unsupervised pedestrian re-identification. The invention can improve training speed and accuracy.

Description

Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels

Technical Field

The invention relates to the technical field of pedestrian re-identification, in particular to an unsupervised pedestrian re-identification method and device based on soft multi-labels.

Background

Driven by ever-increasing public safety needs, large numbers of high-quality, inexpensive video cameras are deployed in airports, subways, train stations, roads, schools, shopping malls, parking lots, theaters, and similar places. The large-scale camera networks covering these areas provide vast amounts of video data for detecting anomalies or events of interest, target tracking, forensics, and so on. Because the volume of video is so large, finding an object of interest in a camera network by human effort alone is time-consuming, labor-intensive, and inefficient; analyzing the video automatically with modern computer vision techniques allows the data to be processed more quickly and can significantly improve monitoring quality. In a surveillance network, however, camera fields of view do not fully overlap, buildings and other objects cause occlusion, and pedestrians move unpredictably, so a pedestrian's trajectory through the camera network is interrupted. When the pedestrian reappears, the new observation must be associated with the earlier ones again, which is why pedestrian re-identification technology is needed.

Pedestrian re-identification is mainly used to track pedestrians across non-overlapping areas captured by different cameras: given an image of a pedestrian of interest taken by one camera, images of the same pedestrian are retrieved from the other cameras. Using this technology to search a pedestrian database for images of a suspect can save a great deal of time and labor. It has good application prospects in intelligent security, criminal investigation, searching for missing persons, image retrieval, and the like.

Pedestrian re-identification methods can be divided into supervised learning and unsupervised learning. In supervised learning, changes across camera views and the high similarity between different pedestrians reduce labeling accuracy, so the related methods scale poorly. Unsupervised learning can solve this scalability problem of supervised models, but its recognition accuracy is currently low, and the absence of label correspondences between images from different cameras limits unsupervised pedestrian re-identification.

In unsupervised pedestrian re-identification, typical approaches are pseudo-label-based methods and domain adaptation methods. Models based on pseudo-label learning assign pseudo-labels by directly comparing visual features (e.g., by K-means clustering) without considering latent discriminative information. Unsupervised domain adaptation methods focus only on transferring or adapting discriminative information from the source domain and ignore mining discriminative label information in the unlabeled target domain; moreover, even after adaptation, the source-domain discriminative information is less effective in the target domain.

Disclosure of Invention

In view of this, the invention provides an unsupervised pedestrian re-identification method and device based on soft multi-labels, so as to improve the training speed and precision.

In one aspect, the invention provides an unsupervised pedestrian re-identification method based on soft multi-labels, which comprises the following steps:

a pre-training step: first, an auxiliary reference data set is fed into a residual network with channel and spatial attention mechanisms to obtain a feature map for each auxiliary reference sample, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian samples; second, corresponding loss functions, including a soft multi-label loss function, are calculated from each auxiliary reference feature map; third, the loss functions are optimized with an optimization algorithm until the losses corresponding to the auxiliary reference feature vectors are minimized, and the resulting pre-trained weight file is used as the pre-training model;

a target set training step: based on the pre-training model, a target data set comprising a plurality of unlabeled target pedestrian samples is passed through the residual network with channel and spatial attention mechanisms to obtain a feature map for each target sample;

a target set optimization step: soft multi-labels and loss functions are calculated from each target feature map and iteratively optimized with an optimization algorithm, the loss functions comprising a soft multi-label loss function, a reference agent loss function, and a cross-view label consistency loss function;

and a target set ranking step: the target pedestrians are ranked according to the loss function results, target data whose result exceeds a preset threshold are taken as the matching objects for unsupervised pedestrian re-identification, and the matching accuracy and precision are finally calculated.

Further, the loss function L is $L = L_1 + \lambda_1 L_2 + \lambda_2 L_3$, where $L_1$ is the soft multi-label loss function, and $\lambda_1$ and $\lambda_2$ are the weights of the cross-view label consistency loss function $L_2$ and the reference agent learning loss function $L_3$, respectively.

Further, let $(x_i, x_j)$ be a pair of unlabeled target samples. The soft multi-label loss function is calculated over the following pair sets:

$m = \{(i,j) \mid f(x_i)^{\top} f(x_j) \ge S,\ L(y_i, y_j) \ge T\}, \quad n = \{(\tau,\omega) \mid f(x_\tau)^{\top} f(x_\omega) \ge S,\ L(y_\tau, y_\omega) < T\}$

where m is the set of all positive pairs; n is the set of all hard negative pairs; S is the feature similarity threshold; T is the soft multi-label similarity threshold; L(·,·) is the soft multi-label agreement based on the L1 distance; y is the soft multi-label function, a mapping to be learned; X is the unlabeled target data set; f(·) is the discriminative deep feature embedding to be learned; and {a_i} are the introduced reference agents.
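The pair sets m and n can be illustrated with a small pure-Python sketch. It is an illustration under stated assumptions, not the patented implementation: the soft multi-label y is assumed to be a softmax over the similarities between a sample's embedding and each reference agent a_i, the L1-based agreement is assumed to be L(y_i, y_j) = 1 - ||y_i - y_j||_1 / 2 (which lies in [0, 1] for probability vectors), and the embeddings are assumed to be L2-normalized. All function names are hypothetical.

```python
import math
from itertools import combinations


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_multilabel(feature, agents):
    """Assumed form: softmax over the similarities between one embedding and every agent."""
    scores = [dot(feature, a) for a in agents]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_agreement(y_i, y_j):
    """Assumed L1-based agreement 1 - ||y_i - y_j||_1 / 2, in [0, 1] for probability vectors."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(y_i, y_j))


def mine_pairs(features, agents, S, T):
    """Return (m, n): positive pairs and hard negative pairs among visually similar pairs."""
    feats = [l2_normalize(f) for f in features]
    labels = [soft_multilabel(f, agents) for f in feats]
    positives, hard_negatives = [], []
    for i, j in combinations(range(len(feats)), 2):
        if dot(feats[i], feats[j]) < S:
            continue  # not visually similar: belongs to neither set
        if label_agreement(labels[i], labels[j]) >= T:
            positives.append((i, j))
        else:
            hard_negatives.append((i, j))
    return positives, hard_negatives
```

Only pairs whose feature similarity reaches S are considered; the soft multi-label agreement then decides whether such a pair is treated as a positive pair or as a hard negative pair.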

Further, the cross-view label consistency loss function is calculated in the following manner:

where S(y) is the soft multi-label distribution of the data set X; S_θ(y) is the soft multi-label distribution in the θ-th camera view of the data set X; W(·,·) is the distance between the two distributions, for which a simplified 2-Wasserstein distance is used; μ and σ are the mean and variance vectors of the logarithm of the soft multi-labels; and μ_θ and σ_θ are the mean and variance vectors of the logarithm of the soft multi-labels in the θ-th camera view.
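The following pure-Python sketch shows one way to evaluate the cross-view term. Beyond the text, it assumes that the simplified 2-Wasserstein distance takes the usual closed form for diagonal Gaussians, W = ||μ - μ_θ||² + ||σ - σ_θ||², that σ is the per-dimension standard deviation (the summary of the invention speaks of the standard deviation), and that the per-view distances are summed over all camera views θ. Function names are hypothetical.

```python
import math


def mean_std(vectors):
    """Per-dimension mean and (population) standard deviation of a list of vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    mu = [sum(v[d] for v in vectors) / n for d in range(dims)]
    sigma = [math.sqrt(sum((v[d] - mu[d]) ** 2 for v in vectors) / n) for d in range(dims)]
    return mu, sigma


def simplified_w2(mu_a, sigma_a, mu_b, sigma_b):
    """Assumed simplified (squared) 2-Wasserstein distance between two diagonal Gaussians."""
    return (sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
            + sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b)))


def cross_view_consistency_loss(soft_labels, camera_ids, eps=1e-12):
    """Sum over camera views theta of W(S(y), S_theta(y)), computed on log soft multi-labels."""
    logs = [[math.log(p + eps) for p in y] for y in soft_labels]
    mu, sigma = mean_std(logs)  # statistics of the whole data set X
    loss = 0.0
    for cam in sorted(set(camera_ids)):
        view = [l for l, c in zip(logs, camera_ids) if c == cam]
        mu_t, sigma_t = mean_std(view)  # statistics of camera view theta
        loss += simplified_w2(mu, sigma, mu_t, sigma_t)
    return loss
```

When every camera view produces the same soft multi-label statistics, the loss is zero; it grows as the label distribution of any single camera drifts away from that of the whole data set.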

Further, the reference agent learning loss function is calculated as follows:

where the labeled auxiliary reference data set consists of the reference persons z_i (i = 1, …, N_z);

λ_i is the label corresponding to each reference person z_i; z_τ is the τ-th person in the auxiliary data set, with label λ_τ; a_{λ_τ} is the feature representation of the person labeled λ_τ; {a_i} are the introduced reference agents;

a set of mined data is associated with the i-th agent a_i; [·]₊ is the hinge function; and β is a parameter that balances the magnitude of the loss.
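Because the formula for the reference agent learning loss is not reproduced here, the sketch below is only one plausible reading of the symbols defined above, and every modeling choice in it is an assumption. An agent-classification term treats the agents {a_i} as class weights and applies a softmax cross-entropy so that each reference feature f(z_τ) is pulled toward its own agent a_{λ_τ}; a hinge term [margin - ||a_i - f(x)||²]₊ pushes the target features mined for agent a_i away from it and is weighted by β. The margin value and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agent_classification_loss(ref_features, ref_labels, agents):
    """Assumed softmax cross-entropy pulling each f(z_tau) toward its agent a_{lambda_tau}."""
    total = 0.0
    for f, lam in zip(ref_features, ref_labels):
        scores = [dot(a, f) for a in agents]
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))  # log-sum-exp
        total += log_norm - scores[lam]
    return total / len(ref_features)


def agent_rejection_loss(mined, agents, margin):
    """Assumed hinge [margin - ||a_i - f(x)||^2]_+ over the features mined for each agent a_i."""
    total = 0.0
    for i, feats in mined.items():
        for f in feats:
            total += max(0.0, margin - sq_dist(agents[i], f))
    return total


def reference_agent_loss(ref_features, ref_labels, agents, mined, beta, margin=1.0):
    """Hypothetical combination: classification term plus beta times the hinge term."""
    return (agent_classification_loss(ref_features, ref_labels, agents)
            + beta * agent_rejection_loss(mined, agents, margin))
```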

Further, the optimization algorithm is an adaptive moment estimation optimization algorithm, and/or the residual network has 50 layers.

Further, the channel attention is calculated as follows:

first, the spatial information of the feature map is aggregated by an average pooling operation and a max pooling operation;

then, the average-pooled and max-pooled results are each fed into a shared network to generate channel attention features, the shared network being a multi-layer perceptron with one hidden layer;

finally, the two outputs are summed element-wise and merged to form the channel attention feature vector.
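The three steps can be sketched in pure Python on a feature map stored as nested lists (C channels, each an H × W grid). The sketch assumes, as in the CBAM attention module, a ReLU hidden layer in the shared perceptron and a sigmoid applied to the element-wise sum to obtain per-channel weights; neither detail is stated above, and the weight matrices W0 and W1 are hypothetical inputs.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def shared_mlp(v, W0, W1):
    """Perceptron with one hidden layer, shared by both pooled descriptors (ReLU assumed)."""
    hidden = [max(0.0, h) for h in matvec(W0, v)]
    return matvec(W1, hidden)


def channel_attention(fmap, W0, W1):
    """fmap: C channels, each an H x W grid. Returns one attention weight per channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]  # average pooling
    mx = [max(map(max, ch)) for ch in fmap]                             # max pooling
    summed = [a + b for a, b in zip(shared_mlp(avg, W0, W1), shared_mlp(mx, W0, W1))]
    return [1.0 / (1.0 + math.exp(-s)) for s in summed]  # sigmoid (assumed, as in CBAM)


def apply_channel_attention(fmap, weights):
    """Rescale every channel of the feature map by its attention weight."""
    return [[[w * x for x in row] for row in ch] for ch, w in zip(fmap, weights)]
```

Sharing one perceptron between the two pooled descriptors keeps the parameter count small; the hidden layer is normally narrower than C.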

Further, the spatial attention is calculated as follows:

first, average pooling and max pooling are applied along the channel axis to the features refined by the channel attention feature vector, generating an average-pooled feature and a max-pooled feature;

second, the average-pooled feature and the max-pooled feature are concatenated and convolved by a standard convolution layer to generate the spatial attention feature vector.
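A matching pure-Python sketch of the spatial branch follows. It assumes "same" zero padding, an odd kernel size k (CBAM commonly uses 7 × 7), and a sigmoid on the convolution output, none of which is specified above; the kernel weights and function names are hypothetical.

```python
import math


def spatial_attention(fmap, kernel):
    """fmap: C x H x W feature map (already refined by channel attention).
    kernel: 2 x k x k weights for the (average, max) maps; odd k, zero 'same' padding."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pooled = [avg, mx]  # concatenated along the channel axis: 2 x H x W
    k = len(kernel[0])
    pad = k // 2
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            s = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:
                            s += kernel[p][di][dj] * pooled[p][ii][jj]
            row.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid (assumed, as in CBAM)
        out.append(row)
    return out


def apply_spatial_attention(fmap, att):
    """Rescale every spatial position of every channel by the attention map."""
    return [[[x * att[i][j] for j, x in enumerate(row)] for i, row in enumerate(ch)]
            for ch in fmap]
```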

In another aspect, the invention provides an unsupervised pedestrian re-identification device based on soft multi-labels, the device comprising:

a pre-training module, configured to first feed the auxiliary reference data set into a residual network with channel and spatial attention mechanisms to obtain a feature map for each auxiliary reference sample, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian samples; second, to calculate the corresponding loss functions, including a soft multi-label loss function, from each auxiliary reference feature map; and third, to optimize with an optimization algorithm until the losses corresponding to the auxiliary reference feature vectors are minimized, using the resulting pre-trained weight file as the pre-training model;

a target set training module, configured to pass a target data set comprising a plurality of unlabeled target pedestrian samples through the residual network with channel and spatial attention mechanisms, based on the pre-training model, to obtain a feature map for each target sample;

a target set optimization module, configured to calculate soft multi-labels and loss functions from each target feature map and to optimize them iteratively with an optimization algorithm, the loss functions comprising a soft multi-label loss function, a reference agent loss function, and a cross-view label consistency loss function;

and a target set ranking module, configured to rank the target pedestrians according to the loss function results, take the target data whose result exceeds the preset threshold as the matching objects for unsupervised pedestrian re-identification, and finally calculate the matching accuracy and precision.

In yet another aspect, the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the unsupervised pedestrian re-identification method based on soft multi-labels.

In the unsupervised pedestrian re-identification method and device based on soft multi-labels of the invention, the concepts of reference agents and soft multi-labels are introduced into a soft multi-label unsupervised pedestrian re-identification method built on an attention model and an optimization algorithm. The reference data set is pre-trained with a loss function based on the soft multi-label function, and a mapping model between the pre-training and training results is constructed. The mean and standard deviation of the soft multi-labels in each camera view are calculated, and a loss function is obtained by minimizing the expected distance between the label distributions of the same target under different views, namely the simplified 2-Wasserstein distance; this solves the problem of cross-view label consistency. The attention mechanism mines important information along the channel and spatial dimensions, and the optimization algorithm controls the variance of the adaptive learning rate, which ultimately improves training speed and accuracy.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of an unsupervised pedestrian re-identification method based on soft multi-labels according to a first exemplary embodiment of the present invention;

Fig. 2 is a schematic diagram of an unsupervised pedestrian re-identification method based on soft multi-labels according to a second exemplary embodiment of the present invention;

Fig. 3 is a block diagram of an unsupervised pedestrian re-identification device based on soft multi-labels according to a third exemplary embodiment of the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

It should be noted that, in the case of no conflict, the features in the following embodiments and examples may be combined with each other; moreover, all other embodiments that can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort fall within the scope of the present disclosure.

It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.

As shown in Fig. 1, an unsupervised pedestrian re-identification method based on soft multi-labels comprises:

101: pre-training step. First, an auxiliary reference data set is fed into a residual network with channel and spatial attention mechanisms to obtain a feature map for each auxiliary reference sample, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian samples; second, corresponding loss functions, including a soft multi-label loss function, are calculated from each auxiliary reference feature map; third, the loss functions are optimized with an optimization algorithm until the losses corresponding to the auxiliary reference feature vectors are minimized, and the resulting pre-trained weight file is used as the pre-training model;

102: target set training step. Based on the pre-training model, the target data set, comprising a plurality of unlabeled target pedestrian samples, is passed through the residual network with channel and spatial attention mechanisms to obtain a feature map for each target sample;

103: target set optimization step. Soft multi-labels and loss functions are calculated from each target feature map and iteratively optimized with an optimization algorithm, the loss functions comprising a soft multi-label loss function, a reference agent loss function, and a cross-view label consistency loss function. The soft multi-label loss function reduces intra-class distances and increases inter-class distances, which helps extract latent discriminative information and distinguish pairs of different targets that are visually highly similar. The cross-view soft multi-label consistency loss function reduces the soft multi-label differences between images captured by different cameras. The reference agent loss function reduces the difference between reference persons and reference agents, mines cross-domain information, and addresses the distribution mismatch between the target domain and the reference domain;

104: target set ranking step. The target pedestrians are ranked according to the loss function results, target data whose result exceeds the preset threshold are taken as the matching objects for unsupervised pedestrian re-identification, and the matching accuracy and precision are finally calculated.
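As an illustration of the ranking and thresholding in step 104, the sketch below scores each gallery (target) embedding against a query embedding and keeps those above the preset threshold, best match first. It assumes the score is the cosine similarity between embeddings, which the text does not specify; the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_gallery(query, gallery, threshold):
    """Return (index, score) for gallery items scoring above threshold, best match first."""
    scored = [(idx, cosine(query, g)) for idx, g in enumerate(gallery)]
    kept = [(idx, s) for idx, s in scored if s > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```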

In this embodiment, the soft multi-label unsupervised pedestrian re-identification method based on an attention model and an optimization algorithm introduces the concepts of reference agents and soft multi-labels, pre-trains the reference data set with a loss function based on the soft multi-label function, and constructs a mapping model between the pre-training and training results. The mean and standard deviation of the soft multi-labels in each camera view are calculated, and a loss function is obtained by minimizing the expected distance between the label distributions of the same target under different views, namely the simplified 2-Wasserstein distance; this solves the problem of cross-view label consistency. The attention mechanism mines important information along the channel and spatial dimensions, and the optimization algorithm controls the variance of the adaptive learning rate, which ultimately improves training speed and accuracy.

Fig. 2 is a schematic diagram of an unsupervised pedestrian re-identification method based on soft multi-labels, which is a preferred embodiment of the method shown in Fig. 1. Its steps are explained in detail as follows:

First step (a preferred implementation of step 101): the Market-1501 and DukeMTMC-reID data sets are used as the target data sets, and MSMT17 is used as the auxiliary reference data set. The auxiliary reference data set is pre-trained with a 50-layer residual network (ResNet-50) augmented with channel and spatial attention mechanisms, using the adaptive moment estimation optimization algorithm.

Second step (a preferred implementation of step 102): the network is trained on the target data set. Each input batch consists of one half randomly selected unlabeled target images and one half randomly selected labeled reference samples, and features are extracted by the residual network augmented with channel and spatial attention mechanisms.
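A minimal sketch of this batch composition follows, assuming each half is drawn by sampling without replacement with Python's `random.sample`; the function name and the (sample, label) representation of reference data are hypothetical.

```python
import random


def mixed_batch(target_images, reference_samples, batch_size, rng=random):
    """Draw half of the batch from unlabeled target images, half from labeled reference samples."""
    half = batch_size // 2
    targets = rng.sample(target_images, half)                      # unlabeled target data
    references = rng.sample(reference_samples, batch_size - half)  # (sample, label) pairs
    return targets, references
```

Passing a seeded `random.Random` instance as `rng` makes the batches reproducible.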

Specifically, the channel and spatial attention mechanisms are calculated as follows:

To achieve fine-grained classification, an attention module is added after each block of the residual network, fusing cross-channel and spatial information to represent target features. The complementary attention is computed first by the channel attention module and then by the spatial attention module. By learning which information to emphasize or suppress, the attention module helps the network extract information effectively and improves recognition accuracy.

Channel attention:

(1) first, the spatial information of the feature map is aggregated by an average pooling operation and a max pooling operation;

(2) then, the average-pooled and max-pooled results are each fed into a shared network to generate channel attention features, the shared network being a multi-layer perceptron with one hidden layer;

(3) finally, the two outputs are summed element-wise and merged to form the channel attention feature vector.

Spatial attention:

(1) first, average pooling and max pooling are applied along the channel axis to the features refined by the channel attention feature vector, generating an average-pooled feature and a max-pooled feature;

(2) second, the average-pooled feature and the max-pooled feature are concatenated and convolved by a standard convolution layer to generate the spatial attention feature vector.

Third step (a preferred implementation of step 103): the features of target pairs are compared and the loss function is calculated; optimization is performed with the rectified adaptive moment estimation (Rectified Adam) optimization algorithm so that the soft multi-labels come closer to the true labels and negative-sample information is mined, thereby achieving cross-domain label consistency and related goals.

Specifically, the loss function is formulated as follows:

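the loss function is $L = L_1 + \lambda_1 L_2 + \lambda_2 L_3$, where $L_1$ is the soft multi-label loss function, and $\lambda_1$ and $\lambda_2$ are the weights of the cross-view label consistency loss function $L_2$ and the reference agent learning loss function $L_3$, respectively.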

where

m is the set of all positive pairs; n is the set of all hard negative pairs; S is the feature similarity threshold; T is the soft multi-label similarity threshold; L(·,·) is the soft multi-label agreement based on the L1 distance; y is the soft multi-label function;

and {a_i} are the introduced reference agents.

where the labeled auxiliary reference data set consists of the reference persons z_i;

a set of mined data is associated with the i-th agent a_i; [·]₊ is the hinge function; and β is a parameter that balances the magnitude of the loss.

The adaptive moment estimation optimization algorithm is as follows:

Let the adaptive learning rate function ψ(·) and the adaptive momentum function φ(·) be:

where t ∈ {1, …, T};

the stochastic gradients {g₁, …, g_t} up to time step t are assumed to follow a normal distribution;

f_t(θ) is the stochastic objective function; θ is the parameter; β₁ and β₂ are the decay rates; and ε = 1 × 10⁻⁸ is a small constant.
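The rectified Adam update these quantities describe can be sketched for a flat list of parameters as follows. The sketch follows the published Rectified Adam algorithm as we understand it, including the rectification test ρ_t > 4 from the original algorithm (some library implementations use a threshold of 5); the learning rate and the default decay rates are illustrative assumptions, while ε = 1 × 10⁻⁸ matches the text. The function name is hypothetical.

```python
import math


def radam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Rectified Adam update on a flat list of parameters; t counts from 1."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias-corrected first moment: the adaptive momentum phi(g_1, ..., g_t).
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    # Maximum length of the approximated simple moving average, and its value at step t.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t > 4.0:
        # The variance of the adaptive learning rate is tractable: rectify it by r_t.
        r_t = math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
        # 1 / (sqrt(v_hat) + eps) plays the role of the adaptive learning rate psi(...).
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        theta = [p - lr * r_t * mh / (math.sqrt(vh) + eps)
                 for p, mh, vh in zip(theta, m_hat, v_hat)]
    else:
        # Early steps: the variance is intractable, so fall back to SGD with momentum.
        theta = [p - lr * mh for p, mh in zip(theta, m_hat)]
    return theta, m, v
```

The fallback branch during the first few steps is what gives RAdam its built-in warm-up: the adaptive learning rate is only used once enough gradient history exists to estimate its variance.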

Fourth step (a preferred implementation of step 104): a threshold is set, and the positive samples above the threshold are ranked.

Performance is evaluated in Table 1 using Rank-1 accuracy, Rank-5 accuracy, and mean average precision (mAP). Compared with related classical algorithms, on the two public data sets the Rank-1 accuracy is improved by at least 3.9%, the Rank-5 accuracy by at least 2.7%, and the mAP by at least 4.7%.

Table 1. Comparison of unsupervised pedestrian re-identification results with related methods
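For reference, Rank-k accuracy (the cumulative matching characteristic at rank k) and mAP can be computed from ranked retrieval lists as below. This is the common textbook definition, sketched in pure Python; it does not reproduce the exact evaluation protocol behind Table 1 (for example, any filtering of same-camera gallery images), and the function name and input layout are hypothetical.

```python
def evaluate(ranked_lists, relevant_sets, ks=(1, 5)):
    """ranked_lists[q]: gallery ids sorted by decreasing similarity to query q.
    relevant_sets[q]: gallery ids that share query q's identity.
    Returns ({"Rank1": ..., "Rank5": ...}, mAP)."""
    hits = {k: 0 for k in ks}
    ap_total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        # Rank-k: the query counts as a hit if any true match appears in the top k.
        for k in ks:
            if any(g in relevant for g in ranking[:k]):
                hits[k] += 1
        # Average precision: mean of the precision values at each true match.
        found, precision_sum = 0, 0.0
        for rank, g in enumerate(ranking, start=1):
            if g in relevant:
                found += 1
                precision_sum += found / rank
        ap_total += (precision_sum / len(relevant)) if relevant else 0.0
    n = len(ranked_lists)
    return {f"Rank{k}": hits[k] / n for k in ks}, ap_total / n
```

Rank-k only asks whether one correct match appears near the top, while mAP rewards placing all correct matches early, which is why both are usually reported together.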

The invention provides a soft multi-label unsupervised pedestrian re-identification model based on channel and spatial attention mechanisms and the adaptive moment estimation optimization algorithm, and the model performs well on the two data sets commonly used for pedestrian re-identification. The results show that the soft multi-label design is highly effective for unsupervised pedestrian re-identification, with accuracy well above that of the other unsupervised methods compared; adding the attention model mines important channel and spatial information; and the adaptive moment estimation optimization algorithm gives better convergence and a more stable training process.

Fig. 3 shows the structure of the unsupervised pedestrian re-identification device based on soft multi-labels, which corresponds to the method embodiments shown in Figs. 1 and 2. As shown in Fig. 3, the device comprises:

a pre-training module 301, configured to first feed the auxiliary reference data set into the residual network with channel and spatial attention mechanisms to obtain a feature map for each auxiliary reference sample, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian samples; second, to calculate the corresponding loss functions, including the soft multi-label loss function, from each auxiliary reference feature map; and third, to optimize with an optimization algorithm until the losses corresponding to the auxiliary reference feature vectors are minimized, using the resulting pre-trained weight file as the pre-training model;

a target set training module 302, configured to pass the target data set, comprising a plurality of unlabeled target pedestrian samples, through the residual network with channel and spatial attention mechanisms, based on the pre-training model, to obtain a feature map for each target sample;

a target set optimization module 303, configured to calculate soft multi-labels and loss functions from the target feature maps and to optimize them iteratively with an optimization algorithm, the loss functions comprising a soft multi-label loss function, a reference agent loss function, and a cross-view label consistency loss function. The soft multi-label loss function reduces intra-class distances and increases inter-class distances, which helps extract latent discriminative information and distinguish pairs of different targets that are visually highly similar. The cross-view soft multi-label consistency loss function reduces the soft multi-label differences between images captured by different cameras. The reference agent loss function reduces the difference between reference persons and reference agents, mines cross-domain information, and addresses the distribution mismatch between the target domain and the reference domain;

and a target set ranking module 304, configured to rank the target pedestrians according to the loss function results, take the target data whose result exceeds the preset threshold as the matching objects for unsupervised pedestrian re-identification, and finally calculate the matching accuracy and precision.

The operating principle of the unsupervised pedestrian re-identification device is briefly described as follows:

1. The reference data set is pre-trained with a residual network, retaining only the relevant part of the reference data set during pre-training. Channel and spatial attention mechanisms are added to the network, the adaptive moment estimation optimization algorithm is used for pre-training, and the resulting weight file is used as the pre-trained model.

2. Equal amounts of target data and reference data are fed into the network, and deep features and fine-grained features are extracted by the residual network with added channel and spatial attention models.

3. The loss function is calculated: soft multi-labels are learned for the unlabeled targets, intra-class differences are reduced and inter-class differences enlarged, the soft multi-label differences of targets under different cameras are reduced, and latent information is mined so that the soft multi-labels approach the true labels.

4. The adaptive moment estimation optimization algorithm makes the optimizer more stable in the early stage of training and speeds up training. The recognition results are ranked using a set threshold, and the accuracy is calculated.

To address the shortcomings of existing unsupervised pedestrian re-identification techniques, this embodiment applies an attention mechanism and an adaptive moment estimation optimization algorithm to soft multi-label unsupervised pedestrian re-identification. Combining attention-based feature refinement with the channel and spatial modules greatly improves performance at a small computational cost: richer high-level features can be extracted, important spatial regions identified, and latent discriminative information mined, thereby reducing intra-class gaps and enlarging inter-class gaps. In this embodiment, the adaptive moment estimation optimization algorithm replaces stochastic gradient descent. It gives the optimizer a better starting point: a dynamic rectifier adjusts the adaptive momentum according to the variance, stabilizing the variance and providing an efficient automatic warm-up. This effectively prevents the model from falling into a local optimum without requiring a separate warm-up phase, achieves fast convergence, and ultimately yields better convergence and a more stable training process. Finally, soft multi-labels address the problem that the target data set for unsupervised pedestrian re-identification is unlabeled, and the loss functions associated with the soft multi-labels improve their accuracy and address cross-camera label consistency and the correction of cross-domain distribution misalignment. Together, these measures improve the identification accuracy of unsupervised pedestrian re-identification.

The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the unsupervised pedestrian re-identification method based on soft multi-labels.

The above description covers only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution readily conceivable by those skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. An unsupervised pedestrian re-identification method based on soft multi-labels, characterized in that the method comprises:

a pre-training step: first, an auxiliary reference data set is fed into a residual network with channel and spatial attention mechanisms to obtain a feature map for each auxiliary reference sample, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian samples; second, corresponding loss functions, including a soft multi-label loss function, are calculated from each auxiliary reference feature map; third, the loss functions are optimized with an optimization algorithm until the losses corresponding to the auxiliary reference feature vectors are minimized, and the resulting pre-trained weight file is used as the pre-training model;

2. The method of claim 1,

the loss function L is $L = L_1 + \lambda_1 L_2 + \lambda_2 L_3$, where $L_1$ is the soft multi-label loss function, and $\lambda_1$ and $\lambda_2$ are the weights of the cross-view label consistency loss function $L_2$ and the reference agent learning loss function $L_3$, respectively.

3. The method of claim 2, wherein, letting $(x_i, x_j)$ be a pair of unlabeled target samples, the soft multi-label loss function is calculated as follows:

$m = \{(i,j) \mid f(x_i)^{\top} f(x_j) \ge S,\ L(y_i, y_j) \ge T\}, \quad n = \{(\tau,\omega) \mid f(x_\tau)^{\top} f(x_\omega) \ge S,\ L(y_\tau, y_\omega) < T\}$

wherein

{a_i} are the introduced reference agents.

4. The method of claim 3, wherein the cross-view label consistency loss function is calculated as follows:

5. The method of claim 2, wherein the reference agent learning loss function is calculated as follows:

wherein the labeled auxiliary reference data set consists of the reference persons z_i (i = 1, …, N_z);

λ_i is the label corresponding to each reference person z_i; z_τ is the τ-th person in the auxiliary data set, with label λ_τ; a_{λ_τ} is the feature representation of the person labeled λ_τ; and {a_i} are the introduced reference agents.

6. The method according to any one of claims 1-5, wherein the optimization algorithm is an adaptive moment estimation optimization algorithm, and/or the residual network has 50 layers.

7. The method of claim 6, wherein the channel attention is calculated by:

8. The method of claim 6, wherein the spatial attention is calculated by:

9. An unsupervised pedestrian re-identification device based on soft multi-labels, the device comprising:

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the unsupervised pedestrian re-identification method based on soft multi-labels according to any one of claims 1 to 8.