CN115439887A

CN115439887A - Pedestrian re-identification method and system based on pseudo label optimization and storage medium

Info

Publication number: CN115439887A
Application number: CN202211033566.5A
Authority: CN
Inventors: 韩崇; 徐龙华; 严军荣; 赵忠
Original assignee: Sunwave Communications Co Ltd
Current assignee: Sunwave Communications Co Ltd
Priority date: 2022-08-26
Filing date: 2022-08-26
Publication date: 2022-12-06

Abstract

The invention discloses a pedestrian re-identification method, system and storage medium based on pseudo label optimization. The method comprises the following steps: extracting features of the pedestrian training data and clustering them; assigning pseudo labels according to the clustering result; dividing and optimizing the pseudo labels; calculating the clustering feature loss according to the similarity between the most difficult query instance and the clustering features, and updating the pseudo labels according to that loss; and identifying the pedestrian category according to the pseudo labels. The method addresses the problem in the related art that inaccurate pseudo labels generated by clustering unlabeled data lead to low accuracy in unsupervised pedestrian re-identification.

Description

Pedestrian re-identification method and system based on pseudo label optimization and storage medium

Technical Field

The invention belongs to the technical field of intelligent security, and particularly relates to a pedestrian re-identification method and system based on pseudo label optimization and a storage medium.

Background

In modern society, intelligent and automatic analysis of surveillance video makes it possible to issue early warnings in time, eliminate potential safety hazards, and track targets quickly and efficiently. Pedestrian re-identification (ReID) is a key component of video surveillance and intelligent analysis. It uses computer vision to determine whether a target person is present in images or video sequences from non-overlapping camera views.

In practical application scenarios the environment is complex and changeable. For example, differences in frame rate, position and resolution among cameras lead to large differences in monitoring quality; the target person is often occluded by objects or other pedestrians, so that parts of the body are missing; and bad weather or insufficient illumination degrades image sharpness, making it difficult to extract effective pedestrian features during identification. Supervised pedestrian re-identification models and methods were therefore proposed. However, such methods require a large amount of manually labeled data, which entails a heavy workload and high cost. Moreover, because of domain differences, a re-identification model trained on data from one camera network does not generalize well to a new camera network. This limits the adaptability of supervised pedestrian re-identification methods in real scenes. Semi-supervised and unsupervised learning techniques based on pseudo labels have also been proposed, but in these techniques the update process of the clustering features is not uniform, the accuracy of the pseudo labels is low, the recognition success rate is insufficient, and the techniques do not generalize to practical application scenarios.

In order to solve the problem that inaccurate pseudo labels generated by clustering data without manual labels lead to low accuracy in unsupervised pedestrian re-identification, a pedestrian re-identification method and system based on pseudo label optimization and a storage medium are provided.

Disclosure of Invention

The embodiment of the invention provides a pedestrian re-identification method, a system and a storage medium based on pseudo label optimization, and at least solves the problem in the related art that inaccurate pseudo labels generated by clustering unlabeled data lead to low accuracy in unsupervised pedestrian re-identification.

According to one embodiment of the invention, an unsupervised pedestrian re-identification method based on pseudo label optimization is provided, and comprises the following steps:

extracting the characteristics of the pedestrian training data and clustering;

distributing pseudo labels according to the clustering result;

partitioning and optimizing pseudo labels;

calculating the loss of the clustering characteristics according to the similarity between the most difficult query examples and the clustering characteristics, and updating the pseudo labels according to the loss of the clustering characteristics;

and identifying the pedestrian category according to the pseudo label.

In an exemplary embodiment, the extracting and clustering the features of the pedestrian training data includes the steps of:

acquiring training data of pedestrians in different states, and recording the training data as a sample set $D = (x_1, x_2, x_3, \ldots, x_m)$; the different states of the pedestrian comprise any one or more of normal walking, turning, stopping and traversing; the training data comprises video data or image data;

extracting data characteristics, namely extracting characteristic vectors of training data through a convolutional neural network;

and dividing the clustering clusters according to the data characteristics.

In an exemplary embodiment, the dividing the cluster according to the data characteristics includes the steps of:

initializing the core object set $\Omega = \varnothing$, the cluster number $s = 0$, the unvisited sample set $\Gamma = D$, and the cluster partition $C = \varnothing$;

obtaining, by means of a distance measure, the ε-neighborhood subsample set $N_\varepsilon(x_j)$ of sample $x_j$;

if the number of samples in the subsample set satisfies $|N_\varepsilon(x_j)| \ge \text{Minpts}$, adding sample $x_j$ to the core object set: $\Omega = \Omega \cup \{x_j\}$;

if the core object set $\Omega = \varnothing$, finishing clustering; otherwise, randomly selecting a core object $o$ from the core object set $\Omega$, initializing the current cluster core object queue $\Omega_{cur} = \{o\}$, updating the class index $s = s + 1$, initializing the current cluster sample set $C_i = \{o\}$, and updating the unvisited sample set $\Gamma = \Gamma - \{o\}$;

if the current cluster core object queue $\Omega_{cur} = \varnothing$, the current cluster $C_i$ is complete: updating the cluster partition $C = \{C_1, C_2, \ldots, C_N\}$ and the core object set $\Omega = \Omega - C_N$; otherwise, updating the core object set $\Omega = \Omega - C_i$;

taking a core object $o'$ out of the current cluster core object queue $\Omega_{cur}$, finding its ε-neighborhood subsample set $N_\varepsilon(o')$ using the neighborhood distance threshold ε, letting $\Delta = N_\varepsilon(o') \cap \Gamma$, updating the current cluster sample set $C_i = C_i \cup \Delta$, updating the unvisited sample set $\Gamma = \Gamma - \Delta$, and updating $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - o'$;

repeating the above steps until cluster generation is finished, the cluster partition being $C = \{C_1, C_2, \ldots, C_N\}$.

In an exemplary embodiment, the assigning the pseudo label according to the clustering result includes:

dividing the training data into groups according to the clusters, wherein each cluster represents a pedestrian category;

assigning a pseudo label to each cluster, the training data set being represented as:

$D = \{(x_i, y_i)\}_{i=1}^{m}$

where D represents the set of training data, m represents the number of training data, $x_i$ denotes the i-th picture, and $y_i$ denotes the pseudo label of the i-th picture.

In one exemplary embodiment, the partitioning and optimizing the pseudo tag includes the steps of:

dividing the pseudo labels into a trusted label part and a part containing noisy labels; the trusted label part is represented as the set $X = \{(x_b, y_b) : b \in (1, \ldots, B)\}$, the part containing noisy labels is represented as the set $U = \{u_b : b \in (1, \ldots, B)\}$, and $D = X \cup U$;

Partitioning the pseudo labels based on a confidence policy and/or a metric policy;

pseudo labels are optimized using label smoothing and semi-supervised learning methods.

In an exemplary embodiment, the dividing the pseudo label based on the confidence policy and/or the measurement policy includes any one of dividing the pseudo label based on the confidence policy, dividing the pseudo label based on the measurement policy, and dividing the pseudo label by combining the confidence policy and the measurement policy;

dividing the pseudo labels based on the confidence strategy uses a confidence strategy based on an unsupervised classifier: for a training sample $(x, y) \in D$, when the confidence score of the pseudo label y is greater than the set threshold $\Gamma_1$, the corresponding picture and pseudo label are added to the set X; otherwise they are added to the set U;

dividing the pseudo labels based on the measurement strategy uses an additional embedded network $h_\psi$: for a training sample $(x, y) \in D$, $y' = \text{k-NN}(h_\psi(x))$ is calculated by the k-nearest-neighbor classification method; when $\arg\max(y) = \arg\max(y')$, the current pseudo label is judged consistent with the classification result of $h_\psi$ and is added to the set X; otherwise it is added to the set U;

dividing the pseudo labels by combining the confidence strategy and the measurement strategy means that a pseudo label is added to the set X only when both the confidence strategy and the measurement strategy judge that it belongs in X; otherwise it is added to the set U.

In an exemplary embodiment, the optimizing the pseudo tag using the tag smoothing and semi-supervised learning method includes:

performing label smoothing operation on each pseudo label of the two data sets;

expanding two data sets by using a MixMatch method;

introducing a collaborative training network, performing parallel training on the two networks, and exchanging the prediction of the two networks according to a collaborative refinement label method;

training the independent loss of the two classification prediction networks by using a semi-supervised learning model and calculating a final loss function according to the independent loss;

updating the set X and the set U; namely, the noise samples are updated at the end of each training, and if the confidence of the samples exceeds a threshold value, the pseudo labels of the corresponding samples are updated by the predicted value of the network and the updated pseudo labels are added into the set X.
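The update of the sets X and U described in the last step can be sketched as follows. This is a minimal illustration, not the patented implementation: `update_sets`, the `predict` callable (standing in for the trained prediction network) and the representation of samples are all hypothetical names introduced here.

```python
def update_sets(trusted, noisy, predict, threshold):
    """Move confidently predicted noisy samples into the trusted set.

    trusted   -- list of (sample, pseudo_label) pairs (the set X)
    noisy     -- list of samples whose labels are treated as noise (the set U)
    predict   -- hypothetical callable: sample -> (label, confidence)
    threshold -- confidence a prediction must exceed to be trusted
    """
    new_trusted = list(trusted)
    still_noisy = []
    for sample in noisy:
        label, confidence = predict(sample)
        if confidence > threshold:
            # Replace the old pseudo label by the network prediction.
            new_trusted.append((sample, label))
        else:
            still_noisy.append(sample)
    return new_trusted, still_noisy
```

Calling this at the end of each training round shrinks U only when the network has become confident about a sample, which matches the gradual reduction of noisy labels described in the text.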

In an exemplary embodiment, the calculating a loss of the cluster feature according to the similarity between the most difficult query instance and the cluster feature and updating the pseudo label according to the loss of the cluster feature comprises:

initializing cluster characteristics by using characteristics of random instances in the cluster;

selecting the sample in the training data set that is most difficult to identify as the most difficult query instance q, and updating the clustering feature vector according to that sample;

and calculating a contrast loss function according to the similarity between the most difficult query instance q and the current clustering features of all clusters as follows:

wherein $c^{+}$ is the positive clustering feature vector of the query instance q and τ is a preset parameter; the loss function value decreases as the similarity between the most difficult query instance q and the positive clustering feature $c^{+}$ increases, and increases with the similarity between q and all other clustering features $c_i$;

and circularly training the network until the loss function is converged, thereby obtaining the updated pseudo label.
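One loss with the properties described above is the InfoNCE form $L_q = -\log \frac{\exp(q \cdot c^{+}/\tau)}{\sum_i \exp(q \cdot c_i/\tau)}$. The sketch below assumes that form purely for illustration; the function names and the default value of τ are hypothetical and are not taken from the text.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cluster_contrast_loss(q, cluster_feats, pos_index, tau=0.05):
    """InfoNCE-style loss of query q against all cluster features.

    q             -- query feature vector
    cluster_feats -- list of cluster feature vectors c_i
    pos_index     -- index of the positive cluster feature c+
    tau           -- temperature (preset parameter)
    """
    logits = [dot(q, c) / tau for c in cluster_feats]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_denominator - logits[pos_index]
```

The loss falls as q becomes more similar to $c^{+}$ and rises as q becomes more similar to any other cluster feature, which is the behaviour the text requires.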

According to yet another embodiment of the present invention, there is also provided a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the above-mentioned method.

According to another embodiment of the present invention, there is also provided an unsupervised pedestrian re-identification system based on pseudo tag optimization, including:

a data acquisition unit;

a processor;

a memory;

and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs causing the computer to perform the above-described methods.

The pedestrian re-identification method, system and storage medium based on pseudo label optimization have the advantages that:

(1) By adopting an unsupervised pedestrian re-identification method, data does not need to be labeled in advance, so the cost of labeling data is effectively reduced and scene adaptability is improved.

(2) The similarity of the data is calculated according to the time, the position, the color characteristic and the speed change characteristic of the data, and the clustering algorithm is combined to obtain the clustering cluster, so that the clustering accuracy can be effectively improved, and the complexity of subsequent training and the time required by training are reduced.

(3) By adopting a hard batch sampling method and an updating process of unifying clustering characteristics, not only is the occupied operation memory effectively reduced, but also the network can calculate the contrast loss by using more stable clustering characteristics, and the prediction stability is improved.

(4) After the training data are clustered and the pseudo labels are distributed, the pseudo labels are divided, and are continuously optimized in a label smoothing mode and a training loss mode until a loss function is sufficiently converged, so that wrong pseudo labels can be effectively corrected, the problem of excessive confidence in the prediction process is relieved, and the accuracy of network prediction is improved.

Drawings

FIG. 1 is a flowchart of an unsupervised pedestrian re-identification method based on pseudo label optimization according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method of step S01 according to an embodiment of the present invention;

FIG. 3 is a flowchart of sub-step S013 according to an embodiment of the present invention;

FIG. 4 is a flowchart of a method of step S02 according to an embodiment of the present invention;

FIG. 5 is a flowchart of a method of step S03 according to an embodiment of the present invention;

FIG. 6 is a flowchart of sub-step S033 according to an embodiment of the present invention;

FIG. 7 is a flowchart of a method of step S04 according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an unsupervised pedestrian re-identification system based on pseudo label optimization according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.

In the pedestrian re-identification method, system and storage medium based on pseudo label optimization, all training data are recorded as a set X, and the data features $X^{key}$ of the images are extracted through a Resnet50 network. Based on the extracted features, the data are clustered by the DBScan method and pseudo labels are assigned. The pseudo labels are divided into a trusted part and a part containing noisy labels, the data set is expanded according to the MixMatch method, and label smoothing is applied so that the model is less over-confident in the presence of noise, which improves the reliability of prediction. Two networks then repeatedly predict labels for the unlabeled data; as the network predictions become more accurate, fewer and fewer labels contain noise, and once the loss function has sufficiently converged, high-quality pseudo labels are obtained.

The feature labels of the clusters are denoted $C_1, C_2, C_3, \ldots, C_N$, and the clustering feature vectors are stored in a memory dictionary in memory. Sampling then uses the P×K method: data of P pedestrians are drawn, with K pictures per pedestrian, so that each mini-batch contains P×K pictures.
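The P×K sampling described above can be sketched as follows. This is an illustrative sketch only: `pk_batch` is a hypothetical helper, and drawing with replacement when a pedestrian has fewer than K pictures is an assumption that the text does not specify.

```python
import random


def pk_batch(labels, p, k, seed=0):
    """Return indices of one mini-batch with p identities and k pictures each.

    labels -- pseudo label of every training picture, indexed by position
    """
    rng = random.Random(seed)
    by_identity = {}
    for index, label in enumerate(labels):
        by_identity.setdefault(label, []).append(index)
    # Pick p distinct identities, then k pictures for each of them.
    chosen = rng.sample(sorted(by_identity), p)
    batch = []
    for label in chosen:
        pool = by_identity[label]
        if len(pool) >= k:
            batch.extend(rng.sample(pool, k))
        else:
            # Assumption: pad small identities by sampling with replacement.
            batch.extend(rng.choices(pool, k=k))
    return batch
```

For example, with P = 2 and K = 3 each call yields 6 picture indices covering exactly two pseudo labels.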

A hard sample sampling mode is used: for each pedestrian, the picture that is most difficult to represent is selected as the sample, and the clustering feature $C_i$ is updated dynamically, which improves the features extracted by the network and makes the clustering more stable. This process therefore selects P pedestrian pictures, that is, P query instances, and updates the corresponding P clustering feature vectors. Each query instance is compared with all clustering features C using the cluster contrast loss; once all clustering features have been updated, the P×K sampling process is repeated. The whole process loops until the loss function converges, at which point the network is trained and the model is saved. The test data are then passed through the trained network and model to obtain the final prediction result.
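A minimal sketch of hard sample selection and the cluster feature update follows. Two assumptions are made that the text does not state: the "most difficult" picture is taken to be the one with the lowest cosine similarity to its own cluster feature, and the update is a momentum average followed by normalization. All names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def hardest_instance(instance_feats, cluster_feat):
    """Index of the instance least similar to its cluster feature."""
    return min(range(len(instance_feats)),
               key=lambda i: cosine(instance_feats[i], cluster_feat))


def update_cluster_feature(cluster_feat, query_feat, momentum=0.9):
    """Momentum update of one cluster feature, renormalized to unit length."""
    mixed = [momentum * c + (1 - momentum) * q
             for c, q in zip(cluster_feat, query_feat)]
    norm = math.sqrt(sum(v * v for v in mixed)) or 1.0
    return [v / norm for v in mixed]
```

Because only one query instance per pedestrian updates each cluster feature, every cluster is updated the same number of times per batch regardless of cluster size.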

The embodiment of the invention provides an unsupervised pedestrian re-identification method based on pseudo label optimization, a flow chart of which is shown in figure 1, and the method comprises the following steps:

S01, extracting features of the pedestrian training data and clustering;

S02, assigning pseudo labels according to the clustering result;

S03, dividing and optimizing the pseudo labels;

S04, calculating the clustering feature loss according to the similarity between the most difficult query instance and the clustering features, and updating the pseudo labels according to the clustering feature loss;

S05, identifying the pedestrian category according to the pseudo labels.

In an exemplary embodiment, the step S01 of extracting features of pedestrian training data and clustering, as shown in fig. 2, includes the steps of:

step S011, acquiring training data of pedestrians in different states, and recording the training data as a sample set $D = (x_1, x_2, x_3, \ldots, x_m)$; the different states of the pedestrian comprise any one or more of normal walking, turning, stopping and traversing; the training data comprises video data or image data;

step S012, extracting data features, namely extracting feature vectors of the training data through a convolutional neural network;

step S013, dividing the clusters according to the data features.

In this embodiment, the pedestrian re-identification data were collected in different rooms. The test subjects walked in any possible direction within the room, each person was recorded separately, and 5 people took part. In the first phase, the 5 people were recorded walking randomly in one room for 5 minutes; two weeks later, the same 5 people were recorded walking in another room for 15 minutes. The training data set therefore contains 20 minutes of video and image information for each person; the test set contains 5 further people recorded under the same conditions. Actions other than ordinary walking were also recorded, including turning, short pauses and unexpected movements; data were collected in multiple rooms over multiple days, and the influence of different clothes, shoes and other environmental factors was taken into account. The samples are divided into training data and test data, and the pedestrians in the two do not overlap.

After the training data pass through a convolutional neural network (such as a resnet50 network), the feature vectors are output. The resnet50 network may be replaced with other convolutional neural networks here.

Input of the cluster division: the sample set $D = (x_1, x_2, x_3, \ldots, x_m)$, the neighborhood parameters (ε, Minpts), and a sample distance metric. ε is the maximum radius of the neighborhood, that is, the distance used to decide whether two points are similar or belong to the same class; a larger ε produces larger clusters (containing more data points), and a smaller ε produces smaller clusters. Minpts is the minimum number of points within a neighborhood of radius ε required to form a cluster; for example, Minpts = 4 means that any 4 or more points within a neighborhood of radius ε are regarded as a cluster.

Output of the cluster division: the cluster partition C.

In an exemplary embodiment, before the step S013 of dividing the clusters according to the data features, the method further includes the step of calculating the relevance of the pedestrian data features.

Calculating the relevance of the pedestrian data features means calculating the relevance between different pedestrian data according to the time-position association degree of the pedestrian data and/or the color distribution similarity of the pedestrian data and/or the speed change consistency of the pedestrian data.

The time-position association degree of the pedestrian data, denoted by the variable a, is calculated such that it is positively correlated with the position similarity of different pedestrian data at adjacent times;

the color distribution similarity of the pedestrian data, denoted by the variable d, is calculated such that it is negatively correlated with the difference in color pixel value distribution between different pedestrian data (the difference of the pixel means or the difference of the pixel distribution variances);

the speed change consistency of the pedestrian data, denoted by the variable w, is calculated such that it is negatively correlated with the difference in instantaneous speed change between different pedestrian data (the difference between the means of the instantaneous speed changes);

the relevance between pedestrian data, denoted by the variable e, is calculated such that it is positively correlated with the time-position association degree and/or the color distribution similarity and/or the speed change consistency of the pedestrian data.

A1 to A7 in Table A show different embodiments of calculating the relevance between pedestrian data, in which the time-position association degree a, the color distribution similarity d and the speed change consistency w referred to in Table A are obtained using the formulas in the above-described embodiments.

Table A: Different embodiments for calculating the relevance between pedestrian data

In an exemplary embodiment, in step S013, the clustering clusters are divided according to the data characteristics, and the flowchart is shown in fig. 3 and includes the steps of:

step S0131, cluster initialization, that is, initializing the core object set $\Omega = \varnothing$, the cluster number $s = 0$, the unvisited sample set $\Gamma = D$, and the cluster partition $C = \varnothing$;

step S0132, obtaining, by means of a distance measure, the ε-neighborhood subsample set $N_\varepsilon(x_j)$ of sample $x_j$;

step S0133, if the number of samples in the subsample set satisfies $|N_\varepsilon(x_j)| \ge \text{Minpts}$, adding sample $x_j$ to the core object set: $\Omega = \Omega \cup \{x_j\}$;

step S0134, if the core object set $\Omega = \varnothing$, finishing clustering; otherwise, randomly selecting a core object $o$ from the core object set $\Omega$, initializing the current cluster core object queue $\Omega_{cur} = \{o\}$, updating the class index $s = s + 1$, initializing the current cluster sample set $C_i = \{o\}$, and updating the unvisited sample set $\Gamma = \Gamma - \{o\}$;

step S0135, if the current cluster core object queue $\Omega_{cur} = \varnothing$, the current cluster $C_i$ is complete: updating the cluster partition $C = \{C_1, C_2, \ldots, C_N\}$ and the core object set $\Omega = \Omega - C_N$; otherwise, updating the core object set $\Omega = \Omega - C_i$;

step S0136, taking a core object $o'$ out of the current cluster core object queue $\Omega_{cur}$, finding its ε-neighborhood subsample set $N_\varepsilon(o')$ using the neighborhood distance threshold ε, letting $\Delta = N_\varepsilon(o') \cap \Gamma$, updating the current cluster sample set $C_i = C_i \cup \Delta$, updating the unvisited sample set $\Gamma = \Gamma - \Delta$, and updating $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - o'$;

step S0137, repeating the above steps until cluster generation is finished, the cluster partition being $C = \{C_1, C_2, \ldots, C_N\}$. Cluster generation is considered finished when any one of the following holds: no new cluster is generated, all data have been assigned to clusters, or the remaining data cannot form a cluster.

In this embodiment, the core object set is initialized as $\Omega = \varnothing$.

For $j = 1, 2, \ldots, m$, all core objects are found as follows:

a) Obtain, by means of a distance measure, the ε-neighborhood subsample set $N_\varepsilon(x_j)$ of sample $x_j$.

b) If the number of samples in the subsample set satisfies $|N_\varepsilon(x_j)| \ge \text{Minpts}$ (the minimum number of samples), add sample $x_j$ to the core object set: $\Omega = \Omega \cup \{x_j\}$.

If the core object set $\Omega = \varnothing$, the algorithm ends. Otherwise, a core object $o$ is randomly selected from the core object set $\Omega$, the current cluster core object queue is initialized as $\Omega_{cur} = \{o\}$, the class index is updated as $s = s + 1$, the current cluster sample set is initialized as $C_i = \{o\}$, and the unvisited sample set is updated as $\Gamma = \Gamma - \{o\}$.

If the current cluster core object queue $\Omega_{cur} = \varnothing$, the current cluster $C_i$ is complete: the cluster partition is updated as $C = \{C_1, C_2, \ldots, C_N\}$, the core object set is updated as $\Omega = \Omega - C_N$, and division of the next cluster continues. Otherwise, the core object set is updated as $\Omega = \Omega - C_i$.

A core object $o'$ is taken out of the current cluster core object queue $\Omega_{cur}$, its ε-neighborhood subsample set $N_\varepsilon(o')$ is found using the neighborhood distance threshold ε, and with $\Delta = N_\varepsilon(o') \cap \Gamma$ the current cluster sample set is updated as $C_i = C_i \cup \Delta$, the unvisited sample set as $\Gamma = \Gamma - \Delta$, and the queue as $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - o'$.

The output of these steps is the cluster partition $C = \{C_1, C_2, \ldots, C_N\}$.
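The cluster division steps above follow the usual DBSCAN procedure and can be sketched as follows. This is a minimal pure-Python sketch assuming Euclidean distance as the distance measure; `region_query`, `dbscan` and the `seed` argument are hypothetical names introduced for illustration.

```python
import math
import random


def region_query(data, j, eps):
    """Indices of all samples within distance eps of data[j] (j included)."""
    return {k for k, x in enumerate(data) if math.dist(data[j], x) <= eps}


def dbscan(data, eps, minpts, seed=0):
    """Return the cluster partition C as a list of sets of sample indices."""
    rng = random.Random(seed)
    neighbours = [region_query(data, j, eps) for j in range(len(data))]
    # Core object set: samples whose eps-neighbourhood has >= minpts samples.
    omega = {j for j, n in enumerate(neighbours) if len(n) >= minpts}
    unvisited = set(range(len(data)))  # the unvisited sample set
    clusters = []                      # the cluster partition C
    while omega:
        o = rng.choice(sorted(omega))  # randomly selected core object
        queue = [o]                    # current cluster core object queue
        current = {o}                  # current cluster sample set
        unvisited.discard(o)
        while queue:
            o2 = queue.pop()
            delta = neighbours[o2] & unvisited
            current |= delta
            unvisited -= delta
            queue.extend(delta & omega)  # only core objects expand further
        clusters.append(current)
        omega -= current
    return clusters
```

Samples that end up in no cluster (for example an isolated point) are simply absent from the returned partition, which corresponds to the case where the remaining data cannot form a cluster.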

In another exemplary embodiment, the step in the above embodiment of obtaining the ε-neighborhood subsamples of sample $x_j$ by means of a distance measure further comprises a relevance judging step, namely:

a data relevance threshold E = 0.7 is preset; the relevance e between any two pedestrian data is calculated according to any one of the embodiments in Table A (for example, A7); if e < E, the pedestrian data are judged to be unrelated and are not taken as neighborhood subsamples.
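The relevance judging step can be sketched as a filter on the ε-neighborhood. The sketch assumes Euclidean distance and takes the relevance e as a caller-supplied function, since the formulas of Table A are not reproduced here; `eps_neighbourhood` and `relevance` are hypothetical names.

```python
import math


def eps_neighbourhood(data, j, eps, relevance, threshold=0.7):
    """Indices within eps of data[j] whose relevance to j is at least threshold.

    relevance -- hypothetical callable (j, k) -> e, the relevance between
                 pedestrian data j and k
    """
    result = set()
    for k in range(len(data)):
        if math.dist(data[j], data[k]) > eps:
            continue
        # A sample always belongs to its own neighbourhood; any other sample
        # with e < E is judged unrelated and excluded.
        if k == j or relevance(j, k) >= threshold:
            result.add(k)
    return result
```

With this filter, two samples that are close in feature space but unrelated in time, position, color or speed no longer count toward each other's Minpts total.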

In another exemplary embodiment, a neighborhood weight value is calculated from the product or weighted sum of the relevance e between pedestrian data and the distance, and the ε-neighborhood subsamples of sample $x_j$ are found according to the neighborhood weight value.

Compared with existing clustering methods, this method takes into account the match of time and position and the relation between color complexity and person category, which gives unlabeled data a basis for pre-classification. The data are pre-classified according to their similarity and, combined with the clusters obtained by the clustering algorithm, this effectively reduces the complexity of subsequent training and the time training requires.

In an exemplary embodiment, in the step S02, the pseudo label is assigned according to the clustering result, and a flowchart is shown in fig. 4, and includes the steps of:

step S021, dividing the training data into groups according to the clusters, wherein each cluster represents a pedestrian category;

step S022, assigning a pseudo label to each cluster, the training data set being represented as $D = \{(x_i, y_i)\}_{i=1}^{m}$, where $x_i$ denotes the i-th picture and $y_i$ its pseudo label.

In this embodiment, the training data are divided according to the clusters, each cluster represents a pedestrian category, and a pseudo label is assigned to each cluster to form the training data set D.
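Assigning one pseudo label per cluster can be sketched as follows. The helper names are hypothetical, and dropping samples that belong to no cluster from the training set is an assumption made for this sketch; the text does not say how such samples are handled.

```python
def assign_pseudo_labels(num_samples, clusters):
    """Map each sample index to the index of its cluster, or -1 if unclustered."""
    labels = [-1] * num_samples
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels


def build_dataset(samples, labels):
    """Pair every clustered sample with its pseudo label: D = {(x_i, y_i)}."""
    # Assumption: samples without a cluster (label -1) are left out.
    return [(x, y) for x, y in zip(samples, labels) if y >= 0]
```

Each cluster index then plays the role of a pedestrian category during training.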

In an exemplary embodiment, the step S03, dividing and optimizing the pseudo tag, and the flowchart is shown in fig. 5, and includes the steps of:

step S031, dividing the pseudo labels into a trusted label part and a part containing noisy labels; the trusted label part is represented as the set $X = \{(x_b, y_b) : b \in (1, \ldots, B)\}$, the part containing noisy labels is represented as the set $U = \{u_b : b \in (1, \ldots, B)\}$, and $D = X \cup U$;

Step S032, dividing pseudo labels based on a confidence strategy and/or a measurement strategy;

step S033, optimizes the pseudo label using label smoothing and semi-supervised learning methods.

In this embodiment, the pseudo labels are divided into trusted labels and noisy labels, so that the pseudo labels of the data set form two disjoint sets $X = \{(x_b, y_b) : b \in (1, \ldots, B)\}$ and $U = \{u_b : b \in (1, \ldots, B)\}$, that is, $D = X \cup U$. The pseudo labels are divided a second time based on the confidence strategy, the measurement strategy, or a combination of the two, and an optimization algorithm is executed on the re-divided pseudo labels. The optimization algorithm refers to a label smoothing algorithm and a semi-supervised learning algorithm.

In an exemplary embodiment, the dividing the pseudo tags based on the confidence policy and/or the metric policy of step S032 includes dividing the pseudo tags based on the confidence policy, dividing the pseudo tags based on the metric policy, or dividing the pseudo tags based on the confidence policy and the metric policy;

the step of dividing the pseudo labels based on the confidence degree strategy is to use the confidence degree strategy based on an unsupervised classifier, for a training sample (x, y) epsilon D, when the confidence degree score of the pseudo label y is greater than a set threshold value Gamma ₁ If so, adding the corresponding picture and the corresponding pseudo label into the X, otherwise, adding the picture and the pseudo label into the U; in this embodiment, the threshold r is preset for the confidence score ₁ All being greater than the threshold valueThe pseudo tags in (b) are placed in a trusted set of tags X and the remaining pseudo tags are placed in a noisy set of tags U.

The pseudo label is divided based on the measurement strategy by adopting an additional embedded network h _ψ For the training sample (x, y) ∈ D, y' = k-NN (h) is calculated according to the k nearest neighbor classification method _ψ (x) When argmax (y) = argmax (y'), determine the current pseudo tag and h _ψ If the classification results are consistent, adding the pseudo label into the set X, otherwise, adding the pseudo label into the set U; in this embodiment, a new embedded network function is introduced, a proximity classification algorithm is used to determine whether a current pseudo tag matches an embedded network classification result, if yes, the pseudo tag is placed in a trusted tag set X, otherwise, the pseudo tag is placed in a tag set U containing noise, and the above operations are sequentially performed on all pseudo tags until all pseudo tags are classified into the set X or the set U.

Dividing the pseudo labels by combining the confidence strategy and the metric strategy means that a pseudo label is added to the set X only when both the confidence strategy and the metric strategy judge that it belongs in the set X; otherwise it is added to the set U. In this embodiment, a threshold $\Gamma_1$ is preset for the confidence score and a new embedding network function is introduced. When the confidence score of a pseudo label is greater than the threshold $\Gamma_1$ and the pseudo label matches the classification result of the embedding network, the pseudo label is placed in the trusted label set X; otherwise it is placed in the noisy label set U. This operation is performed on all pseudo labels in turn until every pseudo label has been assigned to the set X or the set U.
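The three division strategies can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the dictionary layout of a sample, the Euclidean distance, the default k and the choice to exclude a sample from its own neighbor list are all assumptions made for illustration. The embeddings stand in for the outputs of the embedding network.

```python
import math
from collections import Counter


def knn_label(query, embeddings, labels, k):
    """Majority pseudo label among the k nearest embeddings (Euclidean distance)."""
    order = sorted(range(len(embeddings)),
                   key=lambda i: math.dist(query, embeddings[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def split_pseudo_labels(samples, gamma_1, k=3, mode="both"):
    """Split sample indices into a trusted set X and a noisy set U.

    samples: list of dicts with keys 'embedding', 'label', 'confidence'
             (hypothetical layout).
    mode:    'confidence', 'metric', or 'both'.
    """
    X, U = [], []
    for i, s in enumerate(samples):
        # Confidence strategy: score must exceed the preset threshold.
        conf_ok = s["confidence"] > gamma_1
        # Metric strategy: k-NN vote over all other samples (assumption:
        # the sample itself is left out of its own neighbor list).
        others = [j for j in range(len(samples)) if j != i]
        predicted = knn_label(s["embedding"],
                              [samples[j]["embedding"] for j in others],
                              [samples[j]["label"] for j in others], k)
        metric_ok = predicted == s["label"]
        keep = {"confidence": conf_ok,
                "metric": metric_ok,
                "both": conf_ok and metric_ok}[mode]
        (X if keep else U).append(i)
    return X, U
```

In the combined mode a sample reaches X only if it passes both checks, so the combined X is the intersection of the two single-strategy results.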

In an exemplary embodiment, the step S033 optimizes the pseudo label by using label smoothing and semi-supervised learning method, and a flowchart is shown in fig. 6, and includes:

step S0331, performing a label smoothing operation on each pseudo label in the divided set X and set U;

step S0332, expanding two data sets by using a MixMatch method;

step S0333, a collaborative training network is introduced to carry out parallel training on the two classification prediction networks, and a reliable label is generated by combining the predictions of the two classification prediction networks;

step S0334, training the individual loss of the two classification prediction networks by using a semi-supervised learning model, and calculating a final loss function according to the individual loss;

step S0335, update set X and set U. That is, if the confidence of a sample exceeds the threshold, the pseudo label of the corresponding sample is updated by the predicted value of the network and the updated pseudo label is added to the set X.

In this embodiment, label smoothing and semi-supervised learning are used, and uniform noise is added, to keep the model from becoming over-confident in noisy predictions. To verify that the network predictions are valid, the predicted labels should remain consistent after strong and weak augmentation of a picture. $\phi_A(x)$ and $\phi_a(x)$ denote the strongly augmented and weakly augmented sample pictures, respectively. Weak augmentation of a picture is performed M times, $m \in \{1, \dots, M\}$, i.e. $x_{b,m} = \phi_a(x_b)$ and $u_{b,m} = \phi_a(u_b)$. The Resnet50 model is used for the classification prediction networks $f_{\theta^{(1)}}$ and $f_{\theta^{(2)}}$, and the accuracy of the pseudo labels is ensured by training the prediction networks. Specifically:

Step S0331, label smoothing. A label smoothing operation is performed on each original label, as shown in formula (1):

wherein C represents the clustering quantity, and epsilon represents a parameter of uniform noise;
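As an illustration, label smoothing with uniform noise is commonly written as a mixture of the one-hot label and a uniform distribution over the C clusters. The Python sketch below assumes that common form, which may differ in detail from formula (1); the function names are hypothetical.

```python
import math


def smooth_label(cluster_index, num_clusters, epsilon):
    """Soft label: (1 - epsilon) * one_hot + epsilon / C (assumed common form)."""
    soft = [epsilon / num_clusters] * num_clusters
    soft[cluster_index] += 1.0 - epsilon
    return soft


def cross_entropy(soft_label, predicted_probs):
    """Cross entropy between a soft label and predicted class probabilities."""
    return -sum(t * math.log(p)
                for t, p in zip(soft_label, predicted_probs) if t > 0)
```

With epsilon = 0 the soft label reduces to the original one-hot pseudo label, so epsilon directly controls how much probability mass is moved away from a possibly wrong cluster assignment.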

The cross entropy between the soft label and the prediction for the strongly augmented sample is then calculated, as shown in equation (2):

When the cross-entropy function converges, the effect of noise samples is minimized.

Step S0332, extending the data set by the MixMatch method, which is expressed by formula (3):

to ensure that the data set augmented by the MixMatch method is sufficiently similar to the real data set, the loss functions shown in equation (4) and equation (5) are calculated:
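For orientation, the mixing operation at the core of MixMatch can be sketched as follows. This is a minimal Python illustration of the published MixMatch mixing step, not a reproduction of formulas (3) to (5); the function name, the plain-list representation of pictures and labels, and the default value of alpha are assumptions.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.75, rng=random):
    """Mix two (input, soft label) pairs with a Beta-distributed weight.

    The weight is clamped to at least 0.5 so that the mixed sample stays
    closer to the first pair, as done in MixMatch.
    """
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the mixed label is a convex combination of two probability vectors, it remains a valid probability distribution.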

Step S0333, label refinement. A single network trained on its own tends to over-fit incorrect pseudo labels, because the initial errors of the network are fed back into training and accumulate. To avoid this, a co-training module is introduced in which two networks $f_{\theta^{(1)}}$ and $f_{\theta^{(2)}}$ are trained in parallel and exchange their predictions to guide each other. This co-refinement step is a label refinement process that aims to produce reliable labels by merging the predictions of the two networks. Taking the training of the first network as an example, the co-refined labels are given by equations (6) to (9):

Here, the equations use the confidence of the network for picture x, and T is the sharpening parameter. For the noisy-label data, the ensemble of the predictions of the two networks is used to guess the label of each unlabeled data sample $u_b$.
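The roles of the confidence weight and of the sharpening parameter T can be illustrated with a short Python sketch. It assumes a DivideMix-style formulation (a confidence-weighted mix of the current label and a network prediction, followed by temperature sharpening, and an averaged two-network guess for unlabeled samples); equations (6) to (9) may differ in detail, and all function and parameter names are hypothetical.

```python
def sharpen(probs, T):
    """Temperature sharpening: raise to the power 1/T and renormalise."""
    powered = [p ** (1.0 / T) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def co_refine(label, net_pred, w, T):
    """Mix the current label with a network prediction using confidence w,
    then sharpen the result (assumed form)."""
    mixed = [w * y + (1.0 - w) * p for y, p in zip(label, net_pred)]
    return sharpen(mixed, T)


def co_guess(preds_net1, preds_net2, T):
    """Average the predictions of both networks over all augmentations of
    one unlabeled sample, then sharpen (assumed form)."""
    all_preds = list(preds_net1) + list(preds_net2)
    mean = [sum(col) / len(all_preds) for col in zip(*all_preds)]
    return sharpen(mean, T)
```

A value of T below 1 pushes the distribution toward its most likely class, while T = 1 leaves it unchanged.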

After updating the above labels, the update set X is as shown in equations (10) and (11):

the update set U is:

b∈(1,…,B)}。

the overall process of refining the tags can be represented by equations (12) and (13):

Step S0334, training loss. The final loss function is:

where the two component loss terms are the two separate losses obtained after applying MixMatch. They are trained using a semi-supervised learning model and ensure that, after the data set is expanded, the predictions remain consistent with those before expansion, with the aim of minimizing noise by means of the strongly augmented samples.

$\lambda_u$ controls the influence of the MixMatch loss. The training process is shown in equation (14):

Step S0335, updating the two data sets. The noisy samples are updated at the end of each training epoch: if the confidence of the network for a given noisy sample exceeds the threshold $\tau_2$, the label of that sample is replaced with the prediction of the network. The updated label is considered clean and the sample is added to the clean labeled set X, as shown in equations (15) and (16):

$$p = f_{\theta^{(k)}}(u), \quad \text{where } k = \arg\max_{k'} \big( \max f_{\theta^{(k')}}(u) \big) \tag{15}$$

$$X \leftarrow X \cup \{ (u, 1_p) \mid \max(p) > \tau_2 \} \tag{16}$$

where $1_p$ denotes the one-hot vector whose $i$-th element is 1, with $i = \arg\max(p)$.
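Equations (15) and (16) can be read as the following procedure, shown as a minimal Python sketch. The two networks are represented by arbitrary callables that return a list of class probabilities; the function name and the returned list of remaining noisy samples are illustrative assumptions.

```python
def promote_confident(noisy_samples, net1, net2, tau_2):
    """Move confidently predicted noisy samples into the clean set.

    For each sample u, the prediction p of the more confident network is
    used (equation 15). If max(p) exceeds tau_2, the pair (u, one_hot(p))
    is promoted (equation 16); otherwise u stays in the noisy set.
    """
    promoted, remaining = [], []
    for u in noisy_samples:
        p1, p2 = net1(u), net2(u)
        p = p1 if max(p1) >= max(p2) else p2
        if max(p) > tau_2:
            i = p.index(max(p))
            one_hot = [1 if j == i else 0 for j in range(len(p))]
            promoted.append((u, one_hot))
        else:
            remaining.append(u)
    return promoted, remaining
```

The promoted pairs would then be merged into X, and the remaining samples would form the updated set U for the next epoch.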

The above steps are repeated until the loss function converges, at which point all predicted labels generated by the network are considered correct, i.e. the optimization of the pseudo labels is complete.

In an exemplary embodiment, after the optimization of the pseudo labels is finished, each cluster is assigned its optimized pseudo label. The features of the clusters are denoted $\{c_1, c_2, \dots, c_N\}$ and stored in an in-memory dictionary; since the clusters and pseudo labels are continually updated, the number N also changes. A cluster feature is sampled for each cluster by random sampling, and the optimized pseudo label is assigned to it.

In an exemplary embodiment, the step S04 calculates a cluster feature loss according to the similarity between the most difficult query instance and the cluster feature, and updates the pseudo label accordingly, and the flowchart is shown in fig. 7 and includes:

step S041, initializing the cluster features using the features of random instances in each cluster;

step S042, selecting the hardest-to-recognize sample in the training data set as the most difficult query instance q and updating the cluster feature vector accordingly;

step S043, calculating a contrast loss function according to the similarity of the most difficult query instance q and the current clustering features of all clusters as follows:

where $c^+$ is the positive cluster feature vector of the query instance q and $\tau$ is a preset parameter; the higher the similarity between the most difficult query instance q and the positive cluster feature $c^+$, the smaller the loss, and the higher the similarity between q and all other cluster features $c_i$, the larger the loss;

step S044, training the network cyclically until the loss function converges, so as to obtain the updated pseudo labels.

In this embodiment, the features of the clusters $\{c_1, c_2, c_3, \dots, c_N\}$ are stored in the memory dictionary. The clustering algorithm is run at each stage, so the number of clusters N changes as the model is trained. The feature of a random instance in each cluster is used to initialize the cluster feature, as shown in formula (17):

$$c_i \leftarrow \mathcal{U}(X_i) \tag{17}$$

where $\mathcal{U}(\cdot)$ is a uniform sampling function and $X_i$ denotes the $i$-th cluster set, containing all samples in cluster $i$.

During training, P pedestrians are sampled, each with a fixed number K of instances, so that each mini-batch contains P × K query pictures. From these P × K pictures, the P hardest-to-recognize samples (one per pedestrian) are selected as the difficult query instances, and the corresponding cluster feature vectors are updated. For a given cluster i, the most difficult query instance is selected as shown in formula (18):

$$q_{hard} \leftarrow \arg\min_{q} \; q \cdot c_i, \quad q \in Q_i \tag{18}$$

where $q_{hard}$ is the instance with the smallest similarity to the cluster feature $c_i$ (i.e. the most difficult query instance), and similarity is measured by the dot product.

The feature vector is updated as shown in formula (19):

$$c_i \leftarrow m \cdot c_i + (1 - m) \cdot q_{hard} \tag{19}$$

where $m$ is a preset parameter and $Q_i$ denotes the features of the instances labeled with cluster $i$ in the current batch.
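Formulas (17) to (19) together describe a small cluster memory: uniform initialization, hardest-instance selection by minimum dot product, and a momentum update. The Python sketch below illustrates them with plain lists as feature vectors; the function names and the dictionary layout of the memory are assumptions.

```python
import random


def dot(a, b):
    """Dot-product similarity between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def init_memory(cluster_features, rng=random):
    """Formula (17): initialise each cluster feature with the feature of a
    uniformly sampled instance of that cluster."""
    return {cid: list(rng.choice(feats))
            for cid, feats in cluster_features.items()}


def hardest_query(batch_features, cluster_feature):
    """Formula (18): the batch instance least similar to its cluster feature."""
    return min(batch_features, key=lambda q: dot(q, cluster_feature))


def momentum_update(cluster_feature, q_hard, m):
    """Formula (19): c_i <- m * c_i + (1 - m) * q_hard."""
    return [m * c + (1.0 - m) * q for c, q in zip(cluster_feature, q_hard)]
```

A value of m close to 1 changes the stored cluster feature slowly, while a smaller m lets the hardest instance dominate the update.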

The instance $q_{hard}$ above, i.e. the most difficult query instance q, is compared with all current cluster features C at the cluster level, and the resulting contrastive loss function is shown in formula (20):

where $c^+$ is the positive cluster feature vector of the query instance q and $\tau$ is a preset parameter. The higher the similarity between the most difficult query instance q and the positive cluster feature $c^+$, the smaller the loss; the higher the similarity between q and all other cluster features $c_i$, the larger the loss.

The network is trained cyclically until the loss function $L_q$ converges, yielding the updated pseudo labels.
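A contrastive loss with the properties described above is commonly written in InfoNCE form, i.e. the negative log of a softmax over the similarities between q and all cluster features, scaled by the parameter τ. The Python sketch below assumes that form, which may differ in detail from formula (20); the function name and the dictionary layout of the cluster memory are hypothetical.

```python
import math


def cluster_contrastive_loss(q, memory, positive_id, tau):
    """InfoNCE-style loss (assumed form):
    -log( exp(q.c+ / tau) / sum_i exp(q.c_i / tau) )."""
    logits = {cid: sum(a * b for a, b in zip(q, c)) / tau
              for cid, c in memory.items()}
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits.values())
    log_denominator = peak + math.log(
        sum(math.exp(v - peak) for v in logits.values()))
    return log_denominator - logits[positive_id]
```

The loss shrinks as q moves toward its positive cluster feature and grows as q becomes more similar to the other cluster features, matching the behaviour stated in the text.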

In an exemplary embodiment, step S05 of identifying the pedestrian category according to the pseudo label is as follows: the pedestrian video or image data are input to the algorithm of this embodiment; the final pseudo label of the pedestrian is obtained through the steps of data clustering, pseudo label assignment, pseudo label division, pseudo label optimization, loss function calculation, pseudo label updating and so on; and the category of the current pedestrian is identified according to the preset category corresponding to that pseudo label.

In another exemplary embodiment, step S05 of identifying the pedestrian category according to the pseudo label is as follows: a feature vector is calculated for the currently input pedestrian video or image data; the pseudo label corresponding to this feature vector is determined from the correspondence between the feature vectors of the training data and the final pseudo labels; and the category of the current pedestrian is identified according to the preset category corresponding to that pseudo label.
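One simple way to realise such a correspondence is to assign the pseudo label of the most similar stored cluster feature. The Python sketch below assumes dot-product similarity and a dictionary from pseudo label to cluster feature; both the matching rule and the names are illustrative assumptions, since the text does not fix how the correspondence is evaluated.

```python
def assign_pseudo_label(feature, cluster_memory):
    """Return the pseudo label whose cluster feature is most similar to the
    query feature (dot-product similarity; assumed matching rule)."""
    def similarity(label):
        return sum(a * b for a, b in zip(feature, cluster_memory[label]))
    return max(cluster_memory, key=similarity)


def identify_category(feature, cluster_memory, label_to_category):
    """Map the assigned pseudo label to its preset pedestrian category."""
    return label_to_category[assign_pseudo_label(feature, cluster_memory)]
```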

All training data are denoted as a set X. Data features $X^{key}$ are extracted from the images using a Resnet50 network, the data are clustered with the DBSCAN method according to these features, and pseudo labels are assigned. The resulting pseudo labels are divided into a trusted label part and a noisy label part, the data set is expanded by the MixMatch method, and label smoothing is used to keep the model from becoming over-confident in noisy predictions, thereby improving the quality of the predictions. The two networks are then used to repeatedly predict the unlabeled data. As the network predictions become more accurate, fewer and fewer noisy labels remain, and once the loss function has sufficiently converged, high-quality pseudo labels are obtained.
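The clustering and pseudo label assignment step mentioned above can be sketched with a small density-based clustering routine in Python. It follows the usual DBSCAN procedure (core objects, ε-neighborhoods, queue-based expansion). The function name, the Euclidean distance, the label -1 for unclustered samples and the deterministic choice of the next core object (the usual description picks one at random) are assumptions made so that the example is reproducible.

```python
import math


def dbscan_pseudo_labels(features, eps, min_pts):
    """Return one pseudo label per feature; -1 marks unclustered samples."""
    n = len(features)
    # Epsilon-neighbourhood of every sample (the sample itself included).
    neighbors = [[j for j in range(n)
                  if math.dist(features[i], features[j]) <= eps]
                 for i in range(n)]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    labels = [-1] * n
    unvisited = set(range(n))
    remaining_core = set(core)
    cluster_id = 0
    while remaining_core:
        seed = min(remaining_core)   # deterministic stand-in for a random pick
        queue = [seed]
        unvisited.discard(seed)
        members = {seed}
        while queue:
            o = queue.pop()
            delta = set(neighbors[o]) & unvisited
            members |= delta
            unvisited -= delta
            queue.extend(delta & core)   # only core objects expand the cluster
        for i in members:
            labels[i] = cluster_id
        remaining_core -= members
        cluster_id += 1
    return labels
```

Each cluster index then serves as the pseudo label of all samples in that cluster.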

The features of the clusters are denoted $c_1, c_2, c_3, \dots, c_N$, and the cluster feature vectors are stored in a memory dictionary. Sampling is then performed with the P × K sampling method: data of P pedestrians are drawn, with K pictures for each pedestrian, so that each mini-batch contains P × K pictures.
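P × K sampling can be sketched in Python as follows. The function name, the dictionary from pseudo label to picture indices, and the fallback to sampling with replacement when an identity has fewer than K pictures are assumptions for illustration.

```python
import random


def pk_sample(indices_by_identity, P, K, rng=random):
    """Draw P identities and K pictures per identity.

    indices_by_identity: dict mapping a pseudo label to a list of picture
    indices. Returns a list of (identity, picture_index) pairs of length P * K.
    """
    identities = rng.sample(sorted(indices_by_identity), P)
    batch = []
    for pid in identities:
        pool = indices_by_identity[pid]
        if len(pool) >= K:
            picks = rng.sample(pool, K)
        else:
            # Assumption: sample with replacement when too few pictures exist.
            picks = [rng.choice(pool) for _ in range(K)]
        batch.extend((pid, idx) for idx in picks)
    return batch
```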

A hard-sample mining scheme is used: for each pedestrian, the picture that is hardest to represent is selected as the sample, and the cluster feature $c_i$ is dynamically updated with it, which improves the feature extraction of the network and makes the clustering more stable. This process therefore selects P pedestrian pictures, i.e. P query instances, and updates the corresponding P cluster feature vectors. Each query instance is compared with all cluster features C using the cluster contrastive loss; once all cluster features have been updated, the P × K sampling process is repeated. The whole process is looped until the loss function converges, which indicates that the network has been trained, and the model is saved. The test data are then passed through the trained network and model to obtain the final prediction result.

A computer-readable storage medium of an embodiment of the invention stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method of any of the above embodiments.

The unsupervised pedestrian re-identification system based on pseudo label optimization according to an embodiment of the invention, whose structure is shown schematically in fig. 8, comprises:

a data acquisition unit;

a processor;

a memory;

and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor of the source node, the programs causing the computer to perform the method of any of the embodiments described above.

Of course, those skilled in the art should realize that the above embodiments are only used for illustrating the present invention, and not as a limitation of the present invention, and that changes and modifications to the above embodiments are within the scope of the present invention.

Claims

1. A pedestrian re-identification method based on pseudo tag optimization is characterized by comprising the following steps:

extracting the characteristics of the pedestrian training data and clustering;

distributing pseudo labels according to the clustering result;

dividing and optimizing pseudo labels;

and identifying the pedestrian category according to the pseudo label.

2. The pedestrian re-identification method based on pseudo tag optimization according to claim 1, wherein the step of extracting and clustering the features of the pedestrian training data comprises the steps of:

extracting data features, namely extracting feature vectors from training data through a convolutional neural network;

and dividing the clustering clusters according to the data characteristics.

3. The pedestrian re-identification method based on the pseudo label optimization according to claim 2, wherein the clustering is divided according to the data characteristics, and the method comprises the following steps:

initializing the core object set $\Omega$ as an empty set;

obtaining the $\varepsilon$-neighborhood subsample set $N_\varepsilon(x_j)$ of each sample $x_j$ by means of a distance measure;

if the number of samples in the subsample set satisfies $|N_\varepsilon(x_j)| \ge MinPts$, adding sample $x_j$ to the core object set: $\Omega = \Omega \cup \{x_j\}$;

if the core object set $\Omega$ is empty, finishing the clustering; otherwise randomly selecting a core object $o$ from the core object set $\Omega$, initializing the current cluster core object queue $\Omega_{cur} = \{o\}$, initializing the class index $s = s + 1$, initializing the current cluster sample set $C_i = \{o\}$, and updating the unvisited sample set $\Gamma = \Gamma - \{o\}$;

if the current cluster core object queue $\Omega_{cur}$ is empty, the generation of the current cluster $C_i$ is complete, the cluster division is updated to $C = \{C_1, C_2, \dots, C_N\}$, and the core object set is updated to $\Omega = \Omega - C_N$; otherwise, the core object set is updated to $\Omega = \Omega - C_i$;

taking a core object $o'$ out of the current cluster core object queue $\Omega_{cur}$, finding its $\varepsilon$-neighborhood subsample set $N_\varepsilon(o')$ by means of the neighborhood distance threshold $\varepsilon$, letting $\Delta = N_\varepsilon(o') \cap \Gamma$, updating the current cluster sample set $C_i = C_i \cup \Delta$, updating the unvisited sample set $\Gamma = \Gamma - \Delta$, and updating $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - o'$.

4. The pedestrian re-identification method based on pseudo tag optimization according to claim 3, wherein the pseudo tag is assigned according to the clustering result, comprising the steps of:

each cluster is assigned a pseudo label, and the training data set is represented as:

5. The pedestrian re-identification method based on pseudo tag optimization according to claim 4, wherein the dividing and optimizing the pseudo tags comprises the steps of:

dividing the pseudo labels into a trusted label part and a noisy label part; the trusted label part is represented as the set $X = \{(x_b, y_b) : b \in (1, \dots, B)\}$, the noisy label part is represented as the set $U = \{u_b : b \in (1, \dots, B)\}$, and $D = X \cup U$;

Partitioning the pseudo-tags based on a confidence policy and/or a metric policy;

6. The pedestrian re-identification method based on the pseudo-label optimization according to claim 5, wherein the dividing of the pseudo labels based on the confidence strategy and/or the metric strategy comprises any one of dividing the pseudo labels based on the confidence strategy, dividing the pseudo labels based on the metric strategy, and dividing the pseudo labels by combining the confidence strategy and the metric strategy;

dividing the pseudo labels based on the confidence strategy means using a confidence strategy based on an unsupervised classifier: for a training sample $(x, y) \in D$, when the confidence score of the pseudo label $y$ is greater than a set threshold $\Gamma_1$, the corresponding picture and pseudo label are added to the set X; otherwise they are added to the set U.

7. The pedestrian re-identification method based on pseudo label optimization according to claim 6, wherein the pseudo label is optimized by using a label smoothing and semi-supervised learning method, comprising the steps of:

performing label smoothing operation on each pseudo label in the divided set X and set U;

expanding two data sets by using a MixMatch method;

a collaborative training network is introduced to carry out parallel training on the two classification prediction networks, and a reliable label is generated by combining the predictions of the two classification prediction networks;

set X and set U are updated.

8. The pedestrian re-identification method based on pseudo tag optimization according to claim 7, wherein the step of calculating cluster feature loss according to the similarity between the most difficult query instance and the cluster feature and updating the pseudo tag according to the cluster feature loss comprises the following steps:

and calculating a comparison loss function according to the similarity of the most difficult query example q and the current clustering characteristics of all clusters as follows:

9. A computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-8.

10. A pedestrian re-identification system based on pseudo tag optimization, comprising:

a data acquisition unit;

a processor;

a memory;

and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs causing the computer to perform the method of any of claims 1-8.