CN111832514A - Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels - Google Patents

Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels

Publication number
CN111832514A
Authority
CN
China
Prior art keywords
label
target
soft multi
loss function
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010705108.6A
Other languages
Chinese (zh)
Other versions
CN111832514B (en
Inventor
张宝华
朱思雨
张宗义
李建军
关海芳
邬可
张晓艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Science and Technology
Original Assignee
Inner Mongolia University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Science and Technology filed Critical Inner Mongolia University of Science and Technology
Priority to CN202010705108.6A priority Critical patent/CN111832514B/en
Publication of CN111832514A publication Critical patent/CN111832514A/en
Application granted granted Critical
Publication of CN111832514B publication Critical patent/CN111832514B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an unsupervised pedestrian re-identification method and device based on soft multi-labels. The method comprises: a pre-training step, in which a deep network with an attention mechanism is first applied to an auxiliary reference data set to obtain a feature map for each item of auxiliary reference data, and a loss function is then calculated and minimized by an optimization algorithm over the auxiliary reference feature vectors to obtain a pre-trained model; a target set training step, in which, starting from the pre-trained model, a deep network with added channel attention and spatial attention is trained to obtain target feature maps; a target set optimization step, in which the corresponding loss functions are calculated in order to mine latent discriminative information, distinguish visually similar pairs of different targets, and enforce cross-view label consistency; and a target set ranking step, in which target data whose output exceeds a preset threshold are taken as matching objects for unsupervised pedestrian re-identification. The invention can improve training speed and accuracy.

Description

Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to an unsupervised pedestrian re-identification method and device based on soft multi-labels.
Background
Owing to ever-increasing public safety needs, large numbers of high-quality, inexpensive video cameras are deployed in airports, subways, train stations, roads, schools, shopping malls, parking lots, theaters, and similar places. The large camera networks covering these areas provide huge amounts of video data for the detection of anomalies or events of interest, target tracking, forensics, and so on. Because of the sheer volume of video, finding an object of interest in a camera network by human effort alone is time-consuming, laborious, and inefficient; analyzing the video automatically with modern computer vision techniques processes the data faster and significantly improves monitoring quality. In a surveillance network the camera fields of view do not overlap, buildings and other objects cause occlusion, and pedestrians change position unpredictably, so the trajectory of a pedestrian across the camera network is interrupted. When the pedestrian reappears, the association must be established again, which is why pedestrian re-identification is needed.
Pedestrian re-identification is mainly used to track pedestrians across the non-overlapping areas covered by different cameras: a pedestrian of interest is selected in an image captured by one camera, and targets similar to that image are then retrieved from the other cameras. Using this technique to search a pedestrian database for images of a suspect saves a great deal of time and labor. It has good application prospects in intelligent security, criminal investigation, missing-person search, image retrieval, and related areas.
Pedestrian re-identification methods can be divided into supervised and unsupervised learning. In supervised learning, problems such as cross-view variation and the high similarity between different pedestrians reduce labeling accuracy, so the related methods scale poorly. Unsupervised learning can solve the scalability problem of supervised models, but its identification accuracy is currently low, and the absence of label correspondences between cross-camera images limits unsupervised pedestrian re-identification.
In the field of unsupervised pedestrian re-identification, the typical approaches are pseudo-label-based methods and domain adaptation methods. Models based on pseudo-label learning, however, assign pseudo-labels by directly comparing visual features (e.g., by K-means clustering) without considering latent discriminative information. Unsupervised domain adaptation methods focus only on transferring or adapting discriminative information from the source domain and ignore the mining of discriminative label information in the unlabeled target domain; even after adaptation, the discriminative information from the source domain is less effective in the target domain.
Disclosure of Invention
In view of this, the invention provides an unsupervised pedestrian re-identification method and device based on soft multi-labels, so as to improve the training speed and precision.
In one aspect, the invention provides an unsupervised pedestrian re-identification method based on soft multi-labels, which comprises the following steps:
a pre-training step: first, inputting an auxiliary reference data set into a residual network with channel and spatial attention mechanisms to obtain a feature map for each item of auxiliary reference data, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian data; second, calculating the corresponding loss functions from the auxiliary reference feature maps, the loss functions including a soft multi-label loss function; third, optimizing with an optimization algorithm until the loss function corresponding to each auxiliary reference feature vector is minimized, obtaining a pre-trained weight file, and taking the pre-trained weight file as the pre-trained model;
a target set training step: based on the pre-trained model, passing a target data set through the residual network with channel and spatial attention mechanisms to obtain a feature map for each item of target data, the target data set comprising a plurality of unlabeled target pedestrian data;
a target set optimization step: calculating the soft multi-labels and the loss functions from the target feature maps and performing iterative optimization with an optimization algorithm, the loss functions comprising a soft multi-label loss function, a reference agent loss function, and a cross-view label consistency loss function;
and a target set ranking step: ranking the target pedestrians according to the loss function results, taking target data whose result exceeds a preset threshold as matching objects for unsupervised pedestrian re-identification, and finally calculating the matching accuracy and precision.
Further, the loss function L is: $L = L_1 + \lambda_1 L_2 + \lambda_2 L_3$, where $\lambda_1$ and $\lambda_2$ are the weights of the cross-view label consistency loss function $L_2$ and the reference agent learning loss function $L_3$, respectively, and $L_1$ is the soft multi-label loss function.
Further, suppose $(x_i, x_j)$ is an unlabeled target pair; the soft multi-label loss function is calculated as follows:
Figure BDA0002594433660000021
$m = \{(i, j) \mid f(x_i)^T f(x_j) \ge S,\ L(y_i, y_j) \ge T\}$, $n = \{(\tau, \omega) \mid f(x_\tau)^T f(x_\omega) \ge S,\ L(y_\tau, y_\omega) < T\}$
Figure BDA0002594433660000022
where m is the set of all positive pairs; n is the set of all hard negative pairs; S is the feature similarity threshold; T is the soft multi-label similarity threshold; $L(\cdot,\cdot)$ is the soft multi-label agreement based on the L1 distance; y is the soft multi-label function;
Figure BDA0002594433660000023
is the unlabeled target data set; f(·) is the mapping function to be learned for the soft multi-label; f(·) is the discriminative deep feature embedding to be learned;
Figure BDA0002594433660000031
is the introduced reference agent.
Further, the cross-view label consistency loss function is calculated in the following manner:
Figure BDA0002594433660000032
where $S(y)$ is the soft multi-label distribution over the data set X; $S_\theta(y)$ is the soft multi-label distribution in the $\theta$-th camera view of X; $W(\cdot,\cdot)$ is the distance between the two distributions, for which the simplified 2-Wasserstein distance is used; $\mu$ and $\sigma$ are the mean and standard deviation vectors of the log soft multi-labels; and $\mu_\theta$ and $\sigma_\theta$ are the mean and standard deviation vectors of the log soft multi-labels in the $\theta$-th camera view.
Further, the reference agent learning loss function is calculated as follows:
Figure BDA0002594433660000033
wherein the labeled auxiliary reference data set is
Figure BDA0002594433660000034
with a corresponding label for each reference person $z_i$; $z_\tau$ is the $\tau$-th person in the auxiliary data set, with label $\lambda_\tau$; $a_{\lambda_\tau}$ is the feature representation of the person labeled $\lambda_\tau$; $\{a_i\}$ are the introduced reference agents; wherein
Figure BDA0002594433660000035
represents the mined data associated with the i-th agent $a_i$; $[\cdot]_+$ is the hinge function; $\beta$ is a parameter that balances the magnitudes of the loss terms.
Further, the optimization algorithm is an adaptive moment estimation optimization algorithm, and/or the residual network has 50 layers.
Further, the channel attention is calculated as follows:
first, the spatial information of the data features is aggregated using an average pooling operation and a max pooling operation;
then, the average-pooled and max-pooled results are each fed into a shared network to generate channel attention features, the shared network comprising a multi-layer perceptron with one hidden layer;
and finally, the outputs are merged by element-wise summation to give the channel attention feature vector.
Further, the spatial attention is calculated as follows:
first, average pooling and max pooling are performed along the channel axis on the features merged with the channel attention feature vector, generating an average-pooled feature and a max-pooled feature;
second, the average-pooled feature and the max-pooled feature are concatenated and convolved by a standard convolution layer to generate the spatial attention feature vector.
In another aspect, the invention provides an unsupervised pedestrian re-identification apparatus based on soft multi-labels, the apparatus comprising:
a pre-training module, configured to: first, input an auxiliary reference data set into a residual network with channel and spatial attention mechanisms to obtain a feature map for each item of auxiliary reference data, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian data; second, calculate the corresponding loss functions from the auxiliary reference feature maps, the loss functions including a soft multi-label loss function; third, optimize with an optimization algorithm until the loss function corresponding to each auxiliary reference feature vector is minimized, obtain a pre-trained weight file, and take the pre-trained weight file as the pre-trained model;
a target set training module, configured to pass a target data set, based on the pre-trained model, through the residual network with channel and spatial attention mechanisms to obtain a feature map for each item of target data, the target data set comprising a plurality of unlabeled target pedestrian data;
a target set optimization module, configured to calculate the soft multi-labels and the loss functions from the target feature maps and perform iterative optimization with an optimization algorithm, the loss functions comprising a soft multi-label loss function, a reference agent loss function, and a cross-view label consistency loss function;
and a target set ranking module, configured to rank the target pedestrians according to the loss function results, take target data whose result exceeds a preset threshold as matching objects for unsupervised pedestrian re-identification, and finally calculate the matching accuracy and precision.
In yet another aspect, the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the soft multi-label based unsupervised pedestrian re-identification method.
The invention relates to a soft multi-label based unsupervised pedestrian re-identification method and device. It introduces the concepts of reference agents and soft multi-labels into an unsupervised pedestrian re-identification method built on an attention model and an optimization algorithm, pre-trains on a reference data set using a soft multi-label loss function, and constructs a model that maps the pre-training results to the training results. The mean and standard deviation of the soft multi-labels in each camera view are calculated, and a loss function is obtained by minimizing the expected distance, namely the simplified 2-Wasserstein distance, between the label distributions of the same target under different views, which addresses cross-view label consistency. An attention mechanism mines important information along the channel and spatial dimensions, and the optimization algorithm controls the variance of the adaptive learning rate, ultimately improving training speed and accuracy.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described here show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a soft multi-label based unsupervised pedestrian re-identification method according to an exemplary first embodiment of the present invention;
Fig. 2 is a schematic diagram of a soft multi-label based unsupervised pedestrian re-identification method according to an exemplary second embodiment of the present invention;
Fig. 3 is a block diagram of a soft multi-label based unsupervised pedestrian re-identification apparatus according to an exemplary third embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be noted that, in the case of no conflict, the features in the following embodiments and examples may be combined with each other; moreover, all other embodiments that can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort fall within the scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
As shown in Fig. 1, a soft multi-label based unsupervised pedestrian re-identification method includes:
101, a pre-training step: first, inputting an auxiliary reference data set into a residual network with channel and spatial attention mechanisms to obtain a feature map for each item of auxiliary reference data, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian data; second, calculating the corresponding loss functions from the auxiliary reference feature maps, the loss functions including a soft multi-label loss function; third, optimizing with an optimization algorithm until the loss function corresponding to each auxiliary reference feature vector is minimized, obtaining a pre-trained weight file, and taking the pre-trained weight file as the pre-trained model;
102, a target set training step: based on the pre-trained model, passing the target data set through the residual network with channel and spatial attention mechanisms to obtain a feature map for each item of target data, the target data set comprising a plurality of unlabeled target pedestrian data;
103, a target set optimization step: calculating the soft multi-labels and the loss functions from the target feature maps and performing iterative optimization with an optimization algorithm, the loss functions comprising a soft multi-label loss function, a reference agent loss function, and a cross-view label consistency loss function. The soft multi-label loss function reduces intra-class distances and increases inter-class distances, which helps to extract latent discriminative information and to distinguish pairs of different targets that look highly similar. The cross-view soft multi-label consistency loss function reduces the soft multi-label differences between images from different cameras. The reference agent loss function reduces the difference between a reference person and the corresponding reference agent, mines cross-domain information, and addresses the cross-domain distribution gap between the target domain and the reference domain;
and 104, a target set ranking step: ranking the target pedestrians according to the loss function results, taking target data whose result exceeds the preset threshold as matching objects for unsupervised pedestrian re-identification, and finally calculating the matching accuracy and precision.
In this embodiment, the soft multi-label unsupervised pedestrian re-identification method, built on an attention model and an optimization algorithm, introduces the concepts of reference agents and soft multi-labels, pre-trains on a reference data set using a soft multi-label loss function, and constructs a model that maps the pre-training results to the training results. The mean and standard deviation of the soft multi-labels in each camera view are calculated, and a loss function is obtained by minimizing the expected distance, namely the simplified 2-Wasserstein distance, between the label distributions of the same target under different views, which addresses cross-view label consistency. An attention mechanism mines important information along the channel and spatial dimensions, and the optimization algorithm controls the variance of the adaptive learning rate, ultimately improving training speed and accuracy.
Fig. 2 is a schematic diagram of the soft multi-label based unsupervised pedestrian re-identification method and shows a preferred embodiment of the method of Fig. 1. The steps are explained in detail below:
First step (as a preferred form of step 101): the Market dataset and the Duke dataset are used as target datasets, and MSMT17 is used as the auxiliary reference dataset. The auxiliary reference dataset is used for pre-training with a 50-layer residual network augmented with channel and spatial attention mechanisms and with an adaptive moment estimation optimization algorithm.
Second step (as a preferred form of step 102): the target dataset is trained on. Each input batch consists half of randomly sampled unlabeled target images and half of randomly sampled labeled reference samples, and features are extracted with the residual network augmented with channel and spatial attention mechanisms.
Specifically, the channel and spatial attention mechanisms are calculated as follows:
To achieve fine-grained classification, an attention module is added after each block of the residual network, fusing cross-channel and spatial information to represent the target features. The complementary attention is computed first by the channel attention module and then by the spatial attention module. By learning which information to emphasize or suppress, the attention module helps the network extract information and improves recognition accuracy.
Channel attention:
(1) first, the spatial information of the data features is aggregated using an average pooling operation and a max pooling operation;
(2) then, the average-pooled and max-pooled results are each fed into a shared network to generate channel attention features, the shared network comprising a multi-layer perceptron with one hidden layer;
(3) and finally, the outputs are merged by element-wise summation to give the channel attention feature vector.
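The three channel attention steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the function name `channel_attention`, the weight shapes, the ReLU in the hidden layer, and the final sigmoid are borrowed from common channel-attention designs and are not specified in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Per-channel attention weights for a feature map x of shape (C, H, W).

    w1 (hidden, C) and w2 (C, hidden) are the weights of the shared
    one-hidden-layer perceptron (hypothetical shapes, biases omitted).
    """
    # Step 1: aggregate spatial information with average and max pooling.
    avg_desc = x.mean(axis=(1, 2))
    max_desc = x.max(axis=(1, 2))

    # Step 2: pass both descriptors through the same (shared) MLP.
    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU is an assumption

    # Step 3: merge by element-wise summation; the sigmoid squashing
    # to (0, 1) is an assumption, not stated in the text.
    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 4, 4))
w1 = rng.normal(size=(2, 8)) * 0.1
w2 = rng.normal(size=(8, 2)) * 0.1
weights = channel_attention(feature_map, w1, w2)
refined = feature_map * weights[:, None, None]  # reweight each channel
```

The resulting vector has one weight per channel; multiplying it back onto the feature map emphasizes or suppresses whole channels.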
Spatial attention:
(1) first, average pooling and max pooling are performed along the channel axis on the features merged with the channel attention feature vector, generating an average-pooled feature and a max-pooled feature;
(2) second, the average-pooled feature and the max-pooled feature are concatenated and convolved by a standard convolution layer to generate the spatial attention feature vector.
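The two spatial attention steps can be sketched in the same way. Again this is only an illustration: the function name `spatial_attention`, the odd square kernel with zero padding, and the final sigmoid are assumptions, and the convolution is written as an explicit loop for clarity rather than speed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(x, kernel):
    """Spatial attention map (H, W) for a feature map x of shape (C, H, W).

    kernel has shape (2, k, k) with odd k (hypothetical); it convolves the
    two pooled maps into a single-channel output with zero padding.
    """
    # Step 1: pool along the channel axis.
    avg_map = x.mean(axis=0)
    max_map = x.max(axis=0)

    # Step 2: concatenate the two maps and apply a standard convolution.
    stacked = np.stack([avg_map, max_map])            # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    height, width = avg_map.shape
    out = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # sigmoid is an assumption, not stated in the text

ones = np.ones((3, 4, 4))
attention = spatial_attention(ones, np.ones((2, 3, 3)))
refined = ones * attention[None, :, :]  # reweight each spatial location
```

With an all-ones input and kernel, interior locations see 18 non-padded entries and corners see 8, so the map is larger in the interior than at the border.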
Third step (as a preferred form of step 103): the target pair features are compared and the loss function is calculated, then optimized with the adaptive moment estimation (Rectified Adam) optimization algorithm so that the soft multi-labels approach the true labels, negative sample information is mined, and cross-domain label consistency, among other goals, is achieved.
Specifically, the loss function is formulated as follows:
the loss function is: $L = L_1 + \lambda_1 L_2 + \lambda_2 L_3$, where $\lambda_1$ and $\lambda_2$ are the weights of the cross-view label consistency loss function $L_2$ and the reference agent learning loss function $L_3$, respectively, and $L_1$ is the soft multi-label loss function.
Further, suppose $(x_i, x_j)$ is an unlabeled target pair; the soft multi-label loss function is calculated as follows:
Figure BDA0002594433660000071
wherein
m is the set of all positive pairs; n is the set of all hard negative pairs; S is the feature similarity threshold; T is the soft multi-label similarity threshold; $L(\cdot,\cdot)$ is the soft multi-label agreement based on the L1 distance; y is the soft multi-label function;
Figure BDA0002594433660000072
is the unlabeled target data set; f(·) is the mapping function to be learned for the soft multi-label; f(·) is the discriminative deep feature embedding to be learned;
Figure BDA0002594433660000073
is the introduced reference agent.
$m = \{(i, j) \mid f(x_i)^T f(x_j) \ge S,\ L(y_i, y_j) \ge T\}$, $n = \{(\tau, \omega) \mid f(x_\tau)^T f(x_\omega) \ge S,\ L(y_\tau, y_\omega) < T\}$
Figure BDA0002594433660000074
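The definitions of the positive set m and the hard negative set n can be illustrated with a short NumPy sketch. Since the formulas for the soft multi-label and its agreement are only available as images, the sketch makes two labeled assumptions: the soft multi-label is taken to be a softmax over inner products between a feature and the reference agents, and the L1-based agreement is taken to be $1 - \tfrac{1}{2}\lVert y_i - y_j\rVert_1$. All function names are hypothetical.

```python
import numpy as np

def soft_multilabel(features, agents):
    """Assumed form: softmax over feature-agent inner products."""
    logits = features @ agents.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def label_agreement(y):
    """Assumed L1-based agreement in [0, 1]: 1 - 0.5 * ||y_i - y_j||_1."""
    l1 = np.abs(y[:, None, :] - y[None, :, :]).sum(axis=2)
    return 1.0 - 0.5 * l1

def mine_pairs(features, agents, S, T):
    """Split visually similar pairs (similarity >= S) into positives
    (agreement >= T) and hard negatives (agreement < T)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    agree = label_agreement(soft_multilabel(f, agents))
    positives, hard_negatives = [], []
    for i, j in zip(*np.triu_indices(len(f), k=1)):
        if sim[i, j] >= S:
            bucket = positives if agree[i, j] >= T else hard_negatives
            bucket.append((int(i), int(j)))
    return positives, hard_negatives

features = np.array([[1.0, 0.10], [1.0, -0.10], [1.0, 0.12]])
agents = np.array([[0.0, 50.0], [0.0, -50.0]])
positives, hard_negatives = mine_pairs(features, agents, S=0.9, T=0.5)
```

In this toy example all three features are visually similar, but the second one is assigned to a different reference agent, so every pair containing it is mined as a hard negative.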
Further, the cross-view label consistency loss function is calculated in the following manner:
Figure BDA0002594433660000075
where $S(y)$ is the soft multi-label distribution over the data set X; $S_\theta(y)$ is the soft multi-label distribution in the $\theta$-th camera view of X; $W(\cdot,\cdot)$ is the distance between the two distributions, for which the simplified 2-Wasserstein distance is used; $\mu$ and $\sigma$ are the mean and standard deviation vectors of the log soft multi-labels; and $\mu_\theta$ and $\sigma_\theta$ are the mean and standard deviation vectors of the log soft multi-labels in the $\theta$-th camera view.
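A possible reading of this loss is sketched below. The exact formula is only available as an image, so the sketch assumes the simplified 2-Wasserstein distance takes the diagonal-Gaussian form $\lVert\mu - \mu_\theta\rVert_2^2 + \lVert\sigma - \sigma_\theta\rVert_2^2$ and that the loss sums this distance over the camera views; the function names and the small `eps` inside the logarithm are hypothetical.

```python
import numpy as np

def simplified_w2(mu_a, sigma_a, mu_b, sigma_b):
    """Squared 2-Wasserstein distance between diagonal Gaussians."""
    return float(np.sum((mu_a - mu_b) ** 2) + np.sum((sigma_a - sigma_b) ** 2))

def cross_view_consistency_loss(soft_labels, camera_ids, eps=1e-12):
    """Sum over camera views of the distance between the per-view and the
    overall log soft multi-label statistics (assumed aggregation)."""
    log_y = np.log(soft_labels + eps)
    mu, sigma = log_y.mean(axis=0), log_y.std(axis=0)
    total = 0.0
    for cam in np.unique(camera_ids):
        view = log_y[camera_ids == cam]
        total += simplified_w2(mu, sigma, view.mean(axis=0), view.std(axis=0))
    return total

cams = np.array([0, 0, 1, 1])
consistent = np.array([[0.7, 0.3], [0.2, 0.8], [0.7, 0.3], [0.2, 0.8]])
shifted = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
loss_consistent = cross_view_consistency_loss(consistent, cams)
loss_shifted = cross_view_consistency_loss(shifted, cams)
```

When both cameras produce the same soft multi-label statistics the loss is zero; when one camera systematically favors a different reference agent the loss becomes positive, which is the discrepancy the term is meant to penalize.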
Further, the reference agent learning loss function is calculated as follows:
Figure BDA0002594433660000081
wherein the labeled auxiliary reference data set is
Figure BDA0002594433660000082
with a corresponding label for each reference person $z_i$; $z_\tau$ is the $\tau$-th person in the auxiliary data set, with label $\lambda_\tau$; $a_{\lambda_\tau}$ is the feature representation of the person labeled $\lambda_\tau$; $\{a_i\}$ are the introduced reference agents; wherein
Figure BDA0002594433660000083
represents the mined data associated with the i-th agent $a_i$; $[\cdot]_+$ is the hinge function; $\beta$ is a parameter that balances the magnitudes of the loss terms.
The adaptive moment estimation optimization algorithm is as follows:
let the adaptive learning rate function $\psi(\cdot)$ and the adaptive momentum function $\phi(\cdot)$ be:
Figure BDA0002594433660000084
Figure BDA0002594433660000085
where $t \in \{1, \dots, T\}$,
Figure BDA0002594433660000086
is the gradient of the stochastic objective at time step t; $\{g_1, \dots, g_t\}$ follows the normal distribution
Figure BDA0002594433660000087
$f_t(\theta)$ is the stochastic objective function; $\theta$ is the parameter; $\beta_1$ and $\beta_2$ are the decay rates; $\epsilon = 1 \times 10^{-8}$ is a small constant.
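Because the update formulas are only available as images, the following NumPy sketch follows the publicly described Rectified Adam update rather than the patent's exact equations; treating the two as equivalent is an assumption. The function name `radam_minimize`, the learning rate, and the toy objective $f(\theta) = \theta^2$ are hypothetical.

```python
import math
import numpy as np

def radam_minimize(grad_fn, theta, lr=0.1, beta1=0.9, beta2=0.999,
                   eps=1e-8, steps=2000):
    """Minimize an objective given its gradient, using a Rectified Adam
    style update (sketch; not the patent's exact formulas)."""
    theta = np.asarray(theta, dtype=float).copy()
    m = np.zeros_like(theta)  # first-moment (momentum) estimate
    v = np.zeros_like(theta)  # second-moment estimate
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias-corrected momentum
        rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
        if rho_t > 4.0:
            # Variance of the adaptive learning rate is tractable:
            # apply the rectified adaptive step.
            v_hat = np.sqrt(v / (1.0 - beta2 ** t))
            r_t = math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                            / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
            theta -= lr * r_t * m_hat / (v_hat + eps)
        else:
            # Early steps: fall back to a plain momentum update.
            theta -= lr * m_hat
    return theta

# Toy objective f(theta) = theta^2 with gradient 2 * theta.
theta_final = radam_minimize(lambda th: 2.0 * th, theta=[5.0])
```

The rectification factor `r_t` starts small and grows toward 1, so the adaptive step is damped while the second-moment estimate is still unreliable; this plays the role of a built-in warm-up.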
Fourth step (as a preferred form of step 104): a threshold is set, and the positive samples that exceed the threshold are ranked.
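The thresholding and ranking in this step might look like the following sketch. The text does not say which quantity is thresholded, so the per-candidate match score used here (higher means more similar) and the function name `rank_matches` are assumptions.

```python
import numpy as np

def rank_matches(scores, threshold):
    """Return indices of candidates whose score exceeds the threshold,
    ordered from best to worst score."""
    scores = np.asarray(scores, dtype=float)
    keep = np.flatnonzero(scores > threshold)
    return keep[np.argsort(-scores[keep], kind="stable")].tolist()

ranked = rank_matches([0.2, 0.9, 0.55, 0.7], threshold=0.5)
```

Candidates at or below the threshold are discarded before sorting, so the returned list contains only the accepted matches.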
Table 1 evaluates performance by rank-1 accuracy (Rank1), rank-5 accuracy (Rank5), and mean average precision (mAP). Compared with the related classical algorithms, on the two public datasets the rank-1 accuracy improves by at least 3.9%, the rank-5 accuracy by at least 2.7%, and the mean average precision by at least 4.7%.
Table 1. Comparison of unsupervised pedestrian re-identification results with related methods
Figure BDA0002594433660000088
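For reference, the three metrics reported in Table 1 can be computed from a query-gallery distance matrix roughly as follows. This is a simplified sketch: standard re-identification protocols also exclude gallery images taken by the same camera as the query, which is omitted here, and the function name and toy data are hypothetical.

```python
import numpy as np

def rank_k_and_map(dist, query_ids, gallery_ids, ks=(1, 5)):
    """Rank-k accuracy and mean average precision from a distance matrix
    of shape (num_queries, num_gallery); smaller distance = better match."""
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(dist, axis=1)                   # best match first
    matches = gallery_ids[order] == query_ids[:, None]
    # Rank-k: fraction of queries with a correct match in the top k.
    rank_k = {k: float(np.mean(matches[:, :k].any(axis=1))) for k in ks}
    # Average precision per query, averaged over queries with a match.
    average_precisions = []
    for row in matches:
        hits = np.flatnonzero(row)
        if hits.size == 0:
            continue
        precisions = np.arange(1, hits.size + 1) / (hits + 1)
        average_precisions.append(precisions.mean())
    return rank_k, float(np.mean(average_precisions))

dist = np.array([[0.1, 0.5, 0.9],
                 [0.4, 0.2, 0.8]])
rank_k, mean_ap = rank_k_and_map(dist, query_ids=[0, 1], gallery_ids=[0, 1, 0])
```

In the toy example the first query has correct matches at ranks 1 and 3 (average precision 5/6) and the second at rank 1 (average precision 1), so the mean average precision is 11/12.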
The invention provides a soft multi-label unsupervised pedestrian re-identification model based on a channel and space attention mechanism and an adaptive moment estimation optimization algorithm, and the model has a good effect on two data sets commonly adopted by pedestrian re-identification. From the research results, the design of the soft multi-label has a great effect in the unsupervised pedestrian re-identification, the precision of the soft multi-label is far higher than that of other unsupervised pedestrian re-identification methods, important information in the aspects of channels and spaces can be mined by adding an attention model, and the adaptive moment estimation optimization algorithm has a better convergence effect and a more stable training process.
As shown in fig. 3, the structure diagram of the unsupervised pedestrian re-identification device based on soft multi-tag corresponds to the device embodiment shown in fig. 1 and 2, as shown in fig. 3, the unsupervised pedestrian re-identification device comprises:
the pre-training module 301 firstly adds the auxiliary reference data set to a residual error network of a channel and a space attention mechanism to obtain characteristic diagrams of each auxiliary reference data; the auxiliary reference data set comprises a plurality of tagged auxiliary reference pedestrian data; secondly, calculating corresponding loss functions according to each auxiliary reference data characteristic diagram, wherein the loss functions comprise soft multi-label loss functions; optimizing through an optimization algorithm again until loss functions corresponding to the characteristic vectors of the auxiliary reference data are minimum, obtaining a pre-trained weight file, and taking the pre-trained weight file as a pre-training model;
a target set training module 302, configured to obtain each target data feature map by passing a target data set through a residual error network of the channel and spatial attention mechanism based on a pre-training model, where the target data set includes multiple unlabeled target pedestrian data;
and the target set optimization module 303 calculates soft multi-labels and loss functions according to the target data feature maps, and performs iterative optimization by using an optimization algorithm, where the loss functions include a soft multi-label loss function, a reference proxy loss function, and a cross-view label consistent loss function, and the soft multi-label loss function can reduce inter-class distances and increase intra-class distances, and is helpful to extract potential discrimination information and distinguish non-similar target pairs with high similarity. The cross-view soft multi-label consistent loss function can reduce the soft multi-label difference of images under different cameras. The reference agent loss function can reduce the difference between a reference person and a reference agent, excavate cross-domain information and solve the cross-domain distribution problem between a target domain and a reference domain;
and the target set sorting module 304 ranks the target pedestrians according to the loss function result, takes the target data whose result is greater than a preset threshold as the unsupervised pedestrian re-identification matching objects, and finally calculates the matching accuracy and precision.
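As a rough illustration of the sorting module, the sketch below ranks gallery items by a score and keeps only those above a preset threshold. The text ranks by "the loss function result" without specifying the score, so cosine similarity between feature vectors is used here purely as an assumed stand-in; `cosine_similarity`, `rank_matches`, and the dictionary layout of the gallery are hypothetical names, not part of the patent.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_matches(query, gallery, threshold):
    """Return (gallery_id, score) pairs with score > threshold, best first.

    query:     feature vector of the query pedestrian (assumed input)
    gallery:   dict mapping a gallery id to its feature vector (assumed layout)
    threshold: the preset threshold mentioned in the text
    """
    scored = [(gid, cosine_similarity(query, feat)) for gid, feat in gallery.items()]
    kept = [(gid, score) for gid, score in scored if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Any other score, such as a distance derived from the trained model, could replace the similarity function as long as the comparison direction against the threshold is adjusted accordingly.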
The operation principle of the unsupervised pedestrian re-identification device is briefly described as follows:
1. Pre-train on the reference data set with a residual network, keeping only the relevant part of the reference data set during pre-training. Channel and spatial attention mechanisms are added to the network, pre-training uses an adaptive moment estimation optimization algorithm, and the weight file obtained after pre-training serves as the pre-training model.
2. Input equal amounts of target data and reference data into the network, and extract deep features and fine-grained features with the residual network augmented by the channel and spatial attention model.
3. Calculate the loss functions and learn soft multi-labels for the unlabeled targets, reducing intra-class differences, enlarging inter-class differences, reducing the soft multi-label differences of targets under different cameras, and mining latent information so that the soft multi-labels approach the real labels.
4. Use the adaptive moment estimation optimization algorithm, so that the optimizer is more stable in the initial training stage and training is faster. Sort the recognition results by setting a threshold, and calculate the accuracy.
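For readers unfamiliar with adaptive moment estimation, the following is a minimal sketch of one standard Adam update on a flat list of parameters. The hyperparameter defaults and the function name `adam_step` are assumptions chosen for illustration; the patent does not state its settings, and this plain form does not include any variance-based rectification or warm-up logic.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return (new_params, new_m, new_v).

    params, grads: lists of floats of equal length
    m, v:          running first and second moment estimates (start at zeros)
    t:             1-based step counter, used for bias correction
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first moment (mean of gradients)
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second moment (mean of squared gradients)
        m_hat = m_i / (1.0 - beta1 ** t)             # bias-corrected first moment
        v_hat = v_i / (1.0 - beta2 ** t)             # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

Because each step is normalized by the running second moment, the very first update has magnitude close to `lr` regardless of the gradient scale, which is one reason the early training phase tends to be less sensitive to gradient magnitude than plain stochastic gradient descent.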
To address the shortcomings of existing unsupervised pedestrian re-identification techniques, this embodiment applies an attention mechanism and an adaptive moment estimation optimization algorithm to soft multi-label unsupervised pedestrian re-identification. Attention-based feature refinement combines the channel and spatial modules, which improves performance considerably at a small computational cost: richer high-level features can be extracted, important spatial regions can be located, and latent discriminative information is mined, so that intra-class gaps shrink and inter-class gaps grow. The embodiment uses the adaptive moment estimation optimization algorithm in place of stochastic gradient descent. This algorithm gives the optimizer a better initial value; a dynamic rectifier adjusts the adaptive momentum according to the variance, which stabilizes the variance and provides an efficient automatic warm-up. The model is thus kept from falling into local optima without a separate warm-up phase, converges quickly, and ends with better convergence and a more stable training process. The soft multi-labels address the lack of labels in the target data set in the unsupervised setting, and the loss functions associated with the soft multi-labels improve the accuracy of the soft multi-labels while enforcing cross-camera label consistency and correcting the cross-domain distribution mismatch. Together, these measures improve the identification accuracy of unsupervised pedestrian re-identification.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the soft multi-label based unsupervised pedestrian re-identification method.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein falls within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An unsupervised pedestrian re-identification method based on soft multi-labels, characterized in that the method comprises:
a pre-training step: first, feeding an auxiliary reference data set into a residual network with channel and spatial attention mechanisms to obtain a feature map for each auxiliary reference datum, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian data; second, calculating the corresponding loss functions from each auxiliary reference feature map, the loss functions comprising a soft multi-label loss function; third, optimizing with an optimization algorithm until the loss functions corresponding to the feature vectors of the auxiliary reference data are minimized, obtaining a pre-trained weight file, and taking the pre-trained weight file as a pre-training model;
a target set training step: based on the pre-training model, passing a target data set through the residual network with channel and spatial attention mechanisms to obtain a feature map for each target datum, wherein the target data set comprises a plurality of unlabeled target pedestrian data;
a target set optimization step: calculating soft multi-labels and loss functions from each target data feature map and performing iterative optimization with an optimization algorithm, wherein the loss functions comprise a soft multi-label loss function, a reference proxy loss function, and a cross-view label consistency loss function;
and a target set sorting step: ranking the target pedestrians according to the loss function result, taking the target data whose result is greater than a preset threshold as the unsupervised pedestrian re-identification matching objects, and finally calculating the matching accuracy and precision.
2. The method of claim 1, wherein the loss function $L$ is:
$$L = L_1 + \lambda_1 L_2 + \lambda_2 L_3$$
where $\lambda_1$ and $\lambda_2$ are the weights of the cross-view label consistency loss function $L_2$ and the reference agent learning loss function $L_3$, respectively, and $L_1$ is the soft multi-label loss function.
3. The method of claim 2, wherein, assuming $(x_i, x_j)$ is an unlabeled target pair, the soft multi-label loss function is calculated as follows:
Figure FDA0002594433650000011
$$m = \{(i, j) \mid f(x_i)^{T} f(x_j) \ge S,\ L(y_i, y_j) \ge T\}$$
$$n = \{(\tau, \omega) \mid f(x_\tau)^{T} f(x_\omega) \ge S,\ L(y_\tau, y_\omega) < T\}$$
Figure FDA0002594433650000012
wherein m is the set of all positive pairs; n is the set of all hard negative pairs; S is the feature similarity threshold; T is the soft multi-label similarity threshold; L(·,·) is the soft multi-label consistency based on the L1 distance; y is the soft multi-label function;
Figure FDA0002594433650000013
is the unlabeled target data set; f(·) is the mapping function to be learned for the soft multi-label; f(·) is the discriminative deep feature embedding to be learned;
Figure FDA0002594433650000025
is the introduced reference proxy.
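The definitions of the sets m and n can be sketched in code. The example below assumes unit-norm feature vectors, so the inner product acts as a similarity, and soft multi-labels that are probability vectors. It also assumes one concrete L1-based form of the consistency measure, namely 1 minus half the L1 distance; the claim only states that the measure is based on the L1 distance, so this form and the names `soft_label_agreement` and `mine_pairs` are illustrative assumptions.

```python
def inner_product(u, v):
    """Inner product f(x_i)^T f(x_j) of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_label_agreement(y_i, y_j):
    """Assumed L1-based consistency: 1 - ||y_i - y_j||_1 / 2.

    For probability vectors this lies in [0, 1]; 1 means identical labels.
    """
    return 1.0 - sum(abs(a - b) for a, b in zip(y_i, y_j)) / 2.0


def mine_pairs(features, soft_labels, S, T):
    """Split visually similar pairs into positives (m) and hard negatives (n).

    A pair enters either set only if its feature similarity is at least S.
    It is positive when the soft-label agreement is at least T, and a hard
    negative otherwise.
    """
    positives, hard_negatives = [], []
    count = len(features)
    for i in range(count):
        for j in range(i + 1, count):
            if inner_product(features[i], features[j]) < S:
                continue  # not similar enough to be informative
            if soft_label_agreement(soft_labels[i], soft_labels[j]) >= T:
                positives.append((i, j))
            else:
                hard_negatives.append((i, j))
    return positives, hard_negatives
```

The hard negatives are the informative cases: pairs that look alike in feature space but whose soft multi-labels disagree, which is where the soft multi-label loss is meant to push features apart.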
4. The method of claim 3, wherein the cross-view label consistency loss function is calculated by:
Figure FDA0002594433650000021
where $S(y)$ is the soft multi-label distribution of the data set $X$; $S_\theta(y)$ is the soft multi-label distribution in the $\theta$-th camera view of $X$; $W(\cdot,\cdot)$ is the distance between two distributions, for which a simplified 2-Wasserstein distance is used; $\mu$ and $\sigma$ are the mean and variance vectors of the log soft multi-labels; and $\mu_\theta$ and $\sigma_\theta$ are the mean and variance vectors of the log soft multi-labels in the $\theta$-th camera view.
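One common simplification of the 2-Wasserstein distance treats both distributions as Gaussians with diagonal covariance, in which case the squared distance reduces to the squared difference of the means plus the squared difference of the per-dimension spreads. The sketch below follows that reading and sums the distance over camera views. The formula in the claim is only available as an image, so the Gaussian assumption, the use of standard deviations as the spread vector (the claim wording says "variance"), the summation over views, and all function names are assumptions.

```python
import math


def mean_and_std(vectors):
    """Per-dimension mean and (population) standard deviation."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(vec[k] for vec in vectors) / n for k in range(dim)]
    std = [math.sqrt(sum((vec[k] - mean[k]) ** 2 for vec in vectors) / n)
           for k in range(dim)]
    return mean, std


def simplified_w2(stats_a, stats_b):
    """Squared 2-Wasserstein distance between two diagonal Gaussians."""
    (mu_a, sd_a), (mu_b, sd_b) = stats_a, stats_b
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    spread_term = sum((a - b) ** 2 for a, b in zip(sd_a, sd_b))
    return mean_term + spread_term


def cross_view_consistency(log_labels_by_camera):
    """Sum of distances between each camera's statistics and the overall ones.

    log_labels_by_camera: dict mapping a camera id to a list of log soft
    multi-label vectors observed in that camera view (assumed layout).
    """
    pooled = [vec for vecs in log_labels_by_camera.values() for vec in vecs]
    overall = mean_and_std(pooled)
    return sum(simplified_w2(overall, mean_and_std(vecs))
               for vecs in log_labels_by_camera.values())
```

The value is zero when every camera view has the same label statistics as the whole data set and grows as individual views drift away from them.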
5. The method of claim 2, wherein the reference agent learning loss function is calculated by:
Figure FDA0002594433650000022
wherein the labeled auxiliary reference data set is
Figure FDA0002594433650000023
$\lambda_i = 1, \ldots, N_z$ is the label corresponding to each reference person $z_i$; $z_\tau$ is the $\tau$-th person in the auxiliary data set, with label $\lambda_\tau$; $\alpha_{\lambda_\tau}$ is the feature representation of the person labeled $\lambda_\tau$; $\{a_i\}$ are the introduced reference proxies; wherein
Figure FDA0002594433650000024
represents the mined data associated with the $i$-th agent $a_i$; $[\cdot]_+$ is the hinge function; and $\beta$ is a parameter that balances the magnitudes of the loss terms.
6. The method according to any of claims 1-5, wherein the optimization algorithm is an adaptive moment estimation optimization algorithm, and/or the residual network has 50 layers.
7. The method of claim 6, wherein the channel attention is calculated by:
first, aggregating the spatial information of the data features with an average pooling operation and a maximum pooling operation;
then, feeding the average-pooled and max-pooled results into a shared network to generate channel attention feature vectors, wherein the shared network comprises a multi-layer perceptron with one hidden layer;
and finally, merging the results by element-wise summation to obtain the channel attention feature vector.
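The three steps of claim 7 can be sketched as follows for a feature map stored as nested lists of shape channels × height × width. The ReLU in the hidden layer, the omission of biases, and the final sigmoid that squashes the summed output into (0, 1) are assumptions added for illustration; the claim itself only specifies the two pooling operations, the shared perceptron with one hidden layer, and the element-wise sum. The names `shared_mlp` and `channel_attention` are hypothetical.

```python
import math


def shared_mlp(vec, w1, w2):
    """Perceptron with one hidden layer; ReLU and missing biases are assumptions.

    w1: hidden_size x channels weights, w2: channels x hidden_size weights.
    """
    hidden = [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """Return one attention weight per channel of fmap (C x H x W lists)."""
    # Step 1: aggregate spatial information per channel by average and max pooling.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(row) for row in ch) for ch in fmap]
    # Step 2: pass both descriptors through the same (shared) perceptron.
    out_avg = shared_mlp(avg, w1, w2)
    out_max = shared_mlp(mx, w1, w2)
    # Step 3: merge by element-wise summation; the sigmoid is an assumed squashing step.
    return [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(out_avg, out_max)]
```

Each returned weight would then scale its channel of the feature map, so channels that the perceptron scores highly are emphasized.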
8. The method of claim 6, wherein the spatial attention is calculated by:
first, performing average pooling and maximum pooling along the channel axis on the features merged with the channel attention feature vector, to generate an average-pooled feature and a max-pooled feature;
second, concatenating the average-pooled feature and the max-pooled feature and convolving them with a standard convolution layer to generate the spatial attention feature vector.
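A minimal sketch of the two steps of claim 8, again on nested lists of shape channels × height × width. The kernel size, zero padding, absence of a bias term, and the final sigmoid are assumptions for illustration; the claim only specifies pooling along the channel axis, concatenation, and a standard convolution. The function name `spatial_attention` is hypothetical.

```python
import math


def spatial_attention(fmap, kernel):
    """Return an H x W attention map for fmap (C x H x W nested lists).

    kernel: 2 x k x k convolution weights (k odd), one k x k slice for the
    average-pooled map and one for the max-pooled map.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Step 1: pool along the channel axis, giving two H x W maps.
    avg = [[sum(fmap[c][i][j] for c in range(channels)) / channels
            for j in range(width)] for i in range(height)]
    mx = [[max(fmap[c][i][j] for c in range(channels))
           for j in range(width)] for i in range(height)]
    stacked = [avg, mx]  # concatenation along a new 2-channel axis
    # Step 2: convolve the 2-channel map with zero padding (assumed), then squash.
    k = len(kernel[0])
    pad = k // 2
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            acc = 0.0
            for ch in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < height and 0 <= x < width:
                            acc += kernel[ch][di][dj] * stacked[ch][y][x]
            row.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid is an assumption
        out.append(row)
    return out
```

The resulting map has one weight per spatial position and would be multiplied into every channel of the feature map to emphasize the important regions.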
9. An unsupervised pedestrian re-identification device based on soft multi-labels, the device comprising:
a pre-training module, configured to first feed an auxiliary reference data set into a residual network with channel and spatial attention mechanisms to obtain a feature map for each auxiliary reference datum, the auxiliary reference data set comprising a plurality of labeled auxiliary reference pedestrian data; second, calculate the corresponding loss functions from each auxiliary reference feature map, the loss functions comprising a soft multi-label loss function; and third, optimize with an optimization algorithm until the loss functions corresponding to the feature vectors of the auxiliary reference data are minimized, obtain a pre-trained weight file, and take the pre-trained weight file as a pre-training model;
a target set training module, configured to pass a target data set, based on the pre-training model, through the residual network with channel and spatial attention mechanisms to obtain a feature map for each target datum, wherein the target data set comprises a plurality of unlabeled target pedestrian data;
a target set optimization module, configured to calculate soft multi-labels and loss functions from each target data feature map and to perform iterative optimization with an optimization algorithm, wherein the loss functions comprise a soft multi-label loss function, a reference proxy loss function, and a cross-view label consistency loss function;
and a target set sorting module, configured to rank the target pedestrians according to the loss function result, take the target data whose result is greater than a preset threshold as the unsupervised pedestrian re-identification matching objects, and finally calculate the matching accuracy and precision.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the soft multi-label based unsupervised pedestrian re-identification method according to any one of claims 1 to 8.
CN202010705108.6A 2020-07-21 2020-07-21 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels Active CN111832514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010705108.6A CN111832514B (en) 2020-07-21 2020-07-21 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010705108.6A CN111832514B (en) 2020-07-21 2020-07-21 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels

Publications (2)

Publication Number Publication Date
CN111832514A true CN111832514A (en) 2020-10-27
CN111832514B CN111832514B (en) 2023-02-28

Family

ID=72924557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010705108.6A Active CN111832514B (en) 2020-07-21 2020-07-21 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels

Country Status (1)

Country Link
CN (1) CN111832514B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451545A (en) * 2017-07-15 2017-12-08 西安电子科技大学 The face identification method of Non-negative Matrix Factorization is differentiated based on multichannel under soft label
US10108850B1 (en) * 2017-04-24 2018-10-23 Intel Corporation Recognition, reidentification and security enhancements using autonomous machines
CN110807434A (en) * 2019-11-06 2020-02-18 威海若维信息科技有限公司 Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes
JP2020038343A (en) * 2018-08-30 2020-03-12 国立研究開発法人情報通信研究機構 Method and device for training language identification model, and computer program for it
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111339983A (en) * 2020-03-05 2020-06-26 四川长虹电器股份有限公司 Method for fine-tuning face recognition model
CN111428073A (en) * 2020-03-31 2020-07-17 新疆大学 Image retrieval method of depth supervision quantization hash

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONG-XING YU等: "《Unsupervised Person Re-identification by Soft Multilabel Learning》", 《2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
沈庆等: "《基于图卷积属性增强的行人再识别方法》", 《信息科技》 *
郑少飞等: "《基于改进损失函数的多阶段行人属性识别方法》", 《信息科技》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347977A (en) * 2020-11-23 2021-02-09 深圳大学 Automatic detection method, storage medium and device for induced pluripotent stem cells
CN112507941A (en) * 2020-12-17 2021-03-16 中国矿业大学 Cross-vision field pedestrian re-identification method and device for mine AI video analysis
CN112507941B (en) * 2020-12-17 2024-05-10 中国矿业大学 Cross-view pedestrian re-identification method and device for mine AI video analysis
CN112733695A (en) * 2021-01-04 2021-04-30 电子科技大学 Unsupervised key frame selection method in pedestrian re-identification field
CN112733695B (en) * 2021-01-04 2023-04-25 电子科技大学 Unsupervised keyframe selection method in pedestrian re-identification field
CN113011920A (en) * 2021-03-15 2021-06-22 北京百度网讯科技有限公司 Conversion rate estimation model training method and device and electronic equipment
CN113011920B (en) * 2021-03-15 2024-02-13 北京百度网讯科技有限公司 Training method and device for conversion rate estimation model and electronic equipment
CN113221656B (en) * 2021-04-13 2022-07-22 电子科技大学 Cross-domain pedestrian re-identification device and method based on domain invariant features
CN113221656A (en) * 2021-04-13 2021-08-06 电子科技大学 Cross-domain pedestrian re-identification model based on domain invariant features and method thereof
CN113112005A (en) * 2021-04-27 2021-07-13 南京大学 Domain self-adaption method based on attention mechanism
CN113139381B (en) * 2021-04-29 2023-11-28 平安国际智慧城市科技股份有限公司 Unbalanced sample classification method, unbalanced sample classification device, electronic equipment and storage medium
CN113139381A (en) * 2021-04-29 2021-07-20 平安国际智慧城市科技股份有限公司 Unbalanced sample classification method and device, electronic equipment and storage medium
CN113283320A (en) * 2021-05-13 2021-08-20 桂林安维科技有限公司 Pedestrian re-identification method based on channel feature aggregation
CN113392786A (en) * 2021-06-21 2021-09-14 电子科技大学 Cross-domain pedestrian re-identification method based on normalization and feature enhancement
CN113435329B (en) * 2021-06-25 2022-06-21 湖南大学 Unsupervised pedestrian re-identification method based on video track feature association learning
CN113435329A (en) * 2021-06-25 2021-09-24 湖南大学 Unsupervised pedestrian re-identification method based on video track feature association learning
CN113920472A (en) * 2021-10-15 2022-01-11 中国海洋大学 Unsupervised target re-identification method and system based on attention mechanism
CN113920472B (en) * 2021-10-15 2024-05-24 中国海洋大学 Attention mechanism-based unsupervised target re-identification method and system

Also Published As

Publication number Publication date
CN111832514B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
CN111832514B (en) Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels
CN111126360B (en) Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
Wu et al. Deep learning-based methods for person re-identification: A comprehensive review
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
Wang et al. A survey of vehicle re-identification based on deep learning
CN108960184B (en) Pedestrian re-identification method based on heterogeneous component deep neural network
Li et al. Adaptive deep convolutional neural networks for scene-specific object detection
Li et al. A method of cross-layer fusion multi-object detection and recognition based on improved faster R-CNN model in complex traffic environment
Jia et al. Obstacle detection in single images with deep neural networks
Tang et al. Multi-modal metric learning for vehicle re-identification in traffic surveillance environment
CN112906606B (en) Domain self-adaptive pedestrian re-identification method based on mutual divergence learning
Balaska et al. Unsupervised semantic clustering and localization for mobile robotics tasks
CN110728216A (en) Unsupervised pedestrian re-identification method based on pedestrian attribute adaptive learning
Zhang et al. Uncertain motion tracking based on convolutional net with semantics estimation and region proposals
CN112464775A (en) Video target re-identification method based on multi-branch network
Wang et al. Multiple pedestrian tracking with graph attention map on urban road scene
Papapetros et al. Visual loop-closure detection via prominent feature tracking
Liao et al. Multi-scale saliency features fusion model for person re-identification
CN111783738A (en) Abnormal motion trajectory detection method for communication radiation source
CN112052722A (en) Pedestrian identity re-identification method and storage medium
CN114359493B (en) Method and system for generating three-dimensional semantic map for unmanned ship
CN115082854A (en) Pedestrian searching method oriented to security monitoring video
Li et al. Pedestrian Motion Path Detection Method Based on Deep Learning and Foreground Detection
Huberman-Spiegelglas et al. Single image object counting and localizing using active-learning
Yang et al. Lightweight lane line detection based on learnable cluster segmentation with self‐attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant