CN111027421A

CN111027421A - Graph-based direct-push type semi-supervised pedestrian re-identification method

Info

Publication number: CN111027421A
Application number: CN201911173132.3A
Authority: CN
Inventors: 常新远; 龚怡宏; 魏星; 洪晓鹏; 马智恒
Original assignee: Xi'an Honggui Electronic Technology Co Ltd
Current assignee: Xi'an Honggui Electronic Technology Co Ltd
Priority date: 2019-11-26
Filing date: 2019-11-26
Publication date: 2020-04-17

Abstract

The invention discloses a graph-based direct-push (transductive) semi-supervised pedestrian re-identification method, and belongs to the technical field of pedestrian re-identification in computer vision. First, a two-channel model is trained on labeled pedestrian data to obtain a base model. The base model extracts features from unlabeled pedestrian data, a graph model is built over the extracted features, pseudo labels are assigned to the unlabeled pedestrian data according to the graph model, and positive and negative sample pairs are constructed from the labeled pedestrian data and the pseudo-labeled unlabeled pedestrian data. The graph model is then used to assign a confidence to each positive and negative sample pair, and the pairs are used jointly to fine-tune the base model. The difficulty and confidence of the positive and negative sample pairs are gradually increased, and the base model is trained with a curriculum learning method until it fully converges. The resulting final model performs feature extraction and feature matching on the verification set data, and pedestrian re-identification is completed according to the matching result. The method reduces the negative influence of incorrect pseudo labels, improves the robustness of the model, and further improves the accuracy of pedestrian re-identification.

Description

Graph-based direct-push type semi-supervised pedestrian re-identification method

Technical Field

The invention belongs to the technical field of computer vision pedestrian re-identification, and particularly relates to a direct-push type semi-supervised pedestrian re-identification method based on a graph.

Background

Pedestrian re-identification is a technique that uses computer vision to determine whether a particular pedestrian is present in an image or video sequence. It is widely regarded as a sub-problem of image retrieval: given an image of a monitored pedestrian, images of the same pedestrian are retrieved across devices. It aims to compensate for the visual limitations of existing fixed cameras, can be combined with pedestrian detection and pedestrian tracking technology, and can be widely applied in fields such as intelligent video surveillance and intelligent security. Pedestrian re-identification requires the machine to identify all images of a particular person taken by different cameras. Specifically, it is a person comparison technique based on the overall appearance of pedestrians: given one or more pictures of a person (query images), it finds the pictures belonging to that person among multiple pictures (gallery images). The technology has high application value in public security and criminal investigation work, image retrieval and other scenarios. It can also help mobile phone users cluster their photo albums, and help retailers and supermarket operators obtain effective customer trajectories and mine their commercial value. However, the accuracy of existing pedestrian re-identification technology is not high, and much of the work still depends on a large amount of manual effort.

Pedestrian re-identification is an important and challenging task. Images are captured at arbitrary times and places, so lighting, viewing angle and posture vary, and pedestrians are also affected by factors such as detection accuracy and occlusion; as a result, research on pedestrian re-identification faces considerable difficulties in practical applications. Most pedestrian re-identification algorithms employ fully supervised convolutional neural networks. However, a neural network with good generalization performance often requires tens of thousands of labeled training samples. Unlike classification or face recognition datasets, whose size has grown to millions of samples, most pedestrian re-identification datasets contain fewer than 2000 people, with tens of images per person. Because labeled pedestrian samples are expensive to obtain, pedestrian re-identification techniques based on semi-supervised learning are more valuable in practical applications.

Research in the field of pedestrian re-identification mainly follows two approaches: feature representation methods, which extract more robust discriminative features to represent pedestrians, and distance metric learning methods, which learn a discriminative distance metric function so that the distance between images of the same person is smaller than the distance between images of different pedestrians. The core goal of image-based pedestrian re-identification is, for a specified pedestrian image, to find the most similar pedestrian image in a candidate set of N pedestrian images. To distinguish pedestrians with different identities, pedestrian re-identification needs to extract discriminative pedestrian feature descriptors. In daily life, humans usually judge whether two images show the same pedestrian by clothing, whereas in intelligent multi-camera surveillance systems the appearance of a pedestrian often changes dramatically with lighting, walking pose and camera view. Extracting robust descriptors under such severe appearance changes is a technical difficulty that still needs to be solved, and it greatly limits existing methods in practical applications.

Disclosure of Invention

In order to solve the above problems, an object of the present invention is to provide a graph-based direct-push semi-supervised pedestrian re-identification method, which reduces negative effects caused by false labels, improves robustness of a model, and further improves accuracy of pedestrian re-identification.

The invention is realized by the following technical scheme:

a graph-based direct-push type semi-supervised pedestrian re-identification method comprises the following steps:

step 1: training a two-channel model by using the labeled pedestrian data to obtain a base model;

step 2: performing feature extraction on the unlabeled pedestrian data by using the base model, establishing a graph model for the extracted unlabeled pedestrian data features, assigning pseudo labels to the unlabeled pedestrian data according to the graph model, and constructing positive and negative sample pairs by using the labeled pedestrian data and the unlabeled pedestrian data with pseudo labels;

step 3: assigning confidences to the positive and negative sample pairs by using the graph model, and fine-tuning the base model obtained in step 1 by using the positive and negative sample pairs with confidences;

step 4: repeating steps 2 and 3, gradually increasing the difficulty and confidence of the positive and negative sample pairs, and training the base model by using a curriculum learning method until it fully converges, to obtain a final model;

step 5: performing feature extraction and feature matching on the verification set data by using the final model, and completing pedestrian re-identification according to the matching result.

Preferably, step 1 is specifically: the labeled pedestrian data is X^L and the unlabeled pedestrian data is X^U; sample pairs belonging to the same label are defined as positive sample pairs, and sample pairs not belonging to the same label are defined as negative sample pairs; in the two-channel model, one channel is a ResNet50 whose parameters are obtained by learning, and it is designated the "student" model; the other channel is a ResNet50 whose parameters are computed from the "student" model by exponential moving average, and it is designated the "teacher" model; the calculation formula is as follows:

θ′_t = αθ′_{t-1} + (1-α)θ_t

in the formula, θ_t is the "student" model parameter, θ′_t is the "teacher" model parameter, and α is the smoothing coefficient; the loss function used by the two-channel model comprises three parts, namely a feature-based consistency loss function, a triplet loss function and a cross entropy loss function, wherein:

for the consistency loss function L^CL, N is the number of samples, the distance is measured by the square of the L2 norm, and η_i and η_i′ are two different noises;

for the triplet loss function, N is the number of triplets, f_θ(·) is the feature extracted from a pedestrian image by the "student" model, θ is the parameter of the "student" model, and α is the margin parameter in the triplet loss function;

for the cross entropy loss function, σ is the standard softmax cross entropy loss function, N is the number of labeled pedestrian data in the current training batch, y_i is the label of the labeled pedestrian data in the current training batch, and ω is the parameter of the last fully connected layer in the "student" model;

the triplet loss function and the cross entropy loss function are combined by using a hyperparameter λ₀ to obtain a fully supervised loss function L^SL;

the fully supervised loss function L^SL and the consistency learning loss function L^CL are combined by using a hyperparameter λ₁ to obtain the final loss function of the labeled pedestrian data:

L^SL-CL = L^SL + λ₁L^CL

the final loss function of the labeled pedestrian data is used as the constraint, and the two-channel model is trained with the labeled pedestrian data to obtain the base model.
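The teacher update θ′_t = αθ′_{t-1} + (1-α)θ_t can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the invention: parameters are plain lists of floats, and the function name `ema_update` and the value `alpha=0.9` are assumptions chosen for the example (the text does not state a value for α).

```python
def ema_update(teacher, student, alpha):
    """Return new teacher parameters: alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


# Toy example: each model is a flat list of parameters (an assumption).
student = [1.0, 2.0]
teacher = [0.0, 0.0]

# One update step; alpha = 0.9 is an illustrative value only.
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher)
```

With α close to 1 the teacher changes slowly and averages the student over many steps; with α = 0 it simply copies the student.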

Preferably, step 2 is specifically: performing feature extraction on the unlabeled pedestrian data by using the base model, and constructing a directed KNN graph G(V, E) from the features of the unlabeled pedestrian data, wherein in G:

V = {v_i = f_θ(x_i) | x_i ∈ X^U}

E = {e_ij = P(v_i, v_j) | v_j ∈ N_k(v_i)}

in the formula, N_k(v_i) is the set of k nearest neighbors of vertex v_i in the KNN graph, and P(v_i, v_j) is the directed edge from v_i to v_j; in a closed loop formed by several edges in the KNN graph, all pairwise combinations of the unlabeled pedestrian data are selected as positive sample pairs C_t, where t is the number of edges of the closed loop; for an anchor sample in the labeled pedestrian data, positive samples are obtained from the pedestrian data with the same label, and negative samples are obtained from the pedestrian data with different labels and from the unlabeled pedestrian data; for an anchor sample in the unlabeled pedestrian data, positive sample pairs are obtained from C_t, and negative sample pairs are obtained from the labeled pedestrian data; the difficulty of the mined negative samples is raised as follows:

where min(·) selects the sample pair with the smallest Euclidean distance; the symbols in the formula denote the i-th unlabeled pedestrian data in C_t, the labeled pedestrian data selected from X^L, and the unlabeled pedestrian data selected from C_t in the current training batch; D(·) is the Euclidean distance; for an anchor sample belonging to C_t, N_i is the negative sample selected from X^L, and a further negative sample is selected from C_t; c is a constant used to control the confidence of the negative sample pairs.

Preferably, the difficulty of mining negative samples is boosted by gradually decreasing the constant c.

Preferably, step 3 is specifically: after the positive and negative sample pairs are obtained, a triplet loss is constructed from them:

wherein N represents the number of triplets in the current training batch, and each batch contains equal numbers of unlabeled data and labeled data; triplets of unlabeled data are sampled from C_t, N_i and the unlabeled pedestrian data selected from C_t; triplets of labeled data are sampled by using a standard triplet sampling strategy; the graph model is used to assign triplet confidences, where s_i is the confidence of the i-th triplet; for triplets of labeled data, the confidence s_i is set to the constant 1; for triplets of unlabeled data, D_i denotes the number of times the i-th sample pair is repeated in the graph model, and the largest repetition count in the graph model is D_max = max({D_i}); for a negative sample pair sampled from the unlabeled pedestrian data selected from C_t, the constant c that controls the confidence of negative sample pairs is used as its confidence; for a negative sample pair sampled from N_i, c is used as its confidence; the final triplet confidence for unlabeled data is defined as:

wherein α represents the lowest confidence of the positive sample pairs of unlabeled data selected from the graph model; a direct-push metric learning loss function L^TSML is defined as:

using the hyperparameter λ₁, the direct-push metric learning loss function is combined with the consistency learning loss function, and the final loss function is:

L^TSML-CL = L^TSML + λ₁L^CL

based on this loss function, the base model obtained in step 1 is jointly fine-tuned by using the positive and negative sample pairs with confidences, improving the performance of the model.
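How a per-triplet confidence s_i can enter a triplet loss is sketched below in Python. This is an illustration under stated assumptions, not the loss of the invention: it assumes that s_i simply scales the standard hinge term max(0, margin + D(a, p) - D(a, n)) and that the batch is averaged; the function name `weighted_triplet_loss`, the toy features and the margin value are all hypothetical.

```python
import math


def weighted_triplet_loss(triplets, confidences, margin):
    """Mean over the batch of s_i * max(0, margin + D(a, p) - D(a, n)).

    triplets: list of (anchor, positive, negative) feature tuples.
    confidences: one confidence s_i per triplet (1.0 for labeled data).
    """
    total = 0.0
    for (anchor, positive, negative), s in zip(triplets, confidences):
        d_ap = math.dist(anchor, positive)   # Euclidean distance D(a, p)
        d_an = math.dist(anchor, negative)   # Euclidean distance D(a, n)
        total += s * max(0.0, margin + d_ap - d_an)
    return total / len(triplets)


# Toy batch: one labeled triplet (confidence 1) and one pseudo-labeled
# triplet with confidence 0.8 (illustrative values only).
batch = [
    ((0.0, 0.0), (0.0, 1.0), (0.0, 3.0)),  # already satisfied: hinge is 0
    ((0.0, 0.0), (3.0, 4.0), (0.0, 3.0)),  # violated: hinge is 0.5 + 5 - 3
]
loss = weighted_triplet_loss(batch, confidences=[1.0, 0.8], margin=0.5)
print(loss)
```

Under this assumption a low-confidence pseudo-labeled triplet contributes proportionally less gradient, which matches the stated aim of limiting the influence of unreliable pseudo labels.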

Further preferably, the value range of the constant c is 0.5-1.

Compared with the prior art, the invention has the following beneficial technical effects:

the invention relates to a graph-based direct-push type semi-supervised pedestrian re-identification method which, in combination with deep convolutional neural network technology, first constructs a graph model for the unlabeled data, performs hard sample mining on the unlabeled data by using the graph model, assigns confidences to the hard samples by using the graph model, and optimizes the model by using pseudo labels with confidences. Compared with traditional algorithms such as KNN and K-means, the graph-based method obtains more accurate pseudo labels. The introduction of confidence also makes the pseudo labels more stable in use; for example, a sample with lower confidence has its influence on model optimization reduced accordingly. The hard sample mining method can mine harder positive and negative sample pairs, which maximizes the feature expression capability of the model when used in metric learning. Traditional methods are only suitable for mining labeled hard samples, whereas the hard sample mining method provided by the invention is also suitable for unlabeled data, and it improves on traditional hard sample mining by using prior knowledge of the data distribution, so that the positive and negative sample pairs are more plentiful and the performance of the model can be further improved.

Meanwhile, consistency learning is introduced and improved: the model is optimized under the assumption that the same data with different noises added should have consistent features. For the unlabeled data, the feature consistency assumption constrains the optimization of the model, so that the "teacher" model and the "student" model learn from each other to optimize their parameters, convergence is accelerated, and the performance of both models is improved. A curriculum learning method is also introduced, in which simple knowledge (samples) is learned first and harder knowledge (samples) is gradually added. First, the base model is fine-tuned by using high-confidence unlabeled data together with the labeled data; after the performance of the model improves, the confidences of the unlabeled data are updated, and the updated confidence information is more reliable than before; the model is then fine-tuned again by using the unlabeled data with updated confidences together with the labeled data, and this is repeated until the model converges, to obtain the final model. A meaningful ordering of the training data maximizes the improvement in model performance. The method reduces the negative influence of incorrect pseudo labels, improves the robustness of the model, and further improves the accuracy of pedestrian re-identification.

Drawings

FIG. 1 is a flow chart of the present invention;

fig. 2 is a schematic diagram of a direct-push metric learning network structure according to the present invention.

Detailed Description

The invention will be described in further detail with reference to the following drawings and examples, which are given by way of illustration and not by way of limitation.

FIG. 1 is a logic block diagram of the flow of the present invention. The graph-based direct-push type semi-supervised pedestrian re-identification method of the present invention includes the following steps:

step 1: constructing triplets by using the labeled pedestrian data;

step 2: training a two-channel model, as shown in FIG. 2, to obtain a base model;

step 3: performing feature extraction on the unlabeled pedestrian data by using the base model;

step 4: establishing a graph model for the extracted unlabeled pedestrian data features;

step 5: assigning pseudo labels with confidences to the unlabeled pedestrian data according to the graph model, and constructing triplets for the unlabeled pedestrian data;

step 6: jointly fine-tuning the model by using the labeled pedestrian data triplets and the pseudo-labeled unlabeled pedestrian data triplets;

step 7: repeating steps 2, 3, 4, 5 and 6, and training the model until it fully converges, to obtain a final model;

step 8: performing feature extraction and feature matching on the verification set data by using the final model, and completing pedestrian re-identification according to the matching result.
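The final matching step can be illustrated with a short Python sketch that ranks gallery features by their distance to a query feature. The text does not name the matching metric; Euclidean distance is assumed here because it is the distance used elsewhere in the method, and the function name `rank_gallery` and the toy feature vectors are hypothetical.

```python
import math


def rank_gallery(query_feature, gallery_features):
    """Return gallery indices sorted from nearest to farthest from the query."""
    return sorted(
        range(len(gallery_features)),
        key=lambda i: math.dist(query_feature, gallery_features[i]),
    )


# Toy 2-D features; real features would be 2048-dimensional model outputs.
query = (1.0, 0.0)
gallery = [(0.0, 1.0), (0.9, 0.1), (5.0, 5.0)]

ranking = rank_gallery(query, gallery)
print(ranking)  # gallery image 1 is the closest match
```

The top-ranked gallery images are then taken as the re-identification result for the query.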

Specifically, the labeled pedestrian data is X^L and the unlabeled pedestrian data is X^U; sample pairs belonging to the same label are defined as positive sample pairs, and sample pairs not belonging to the same label are defined as negative sample pairs; in the two-channel model, one channel is a ResNet50 whose parameters are obtained by learning, and it is designated the "student" model; the other channel is a ResNet50 whose parameters are computed from the "student" model by Exponential Moving Average, and it is designated the "teacher" model; the calculation formula is as follows:

θ′_t = αθ′_{t-1} + (1-α)θ_t

in the formula, θ_t is the "student" model parameter, θ′_t is the "teacher" model parameter, and α is the smoothing coefficient; the model framework is shown in FIG. 2; the loss function used by the two-channel model includes three parts, namely a feature-based consistency loss function, a triplet loss function and a cross entropy loss function, shown in FIG. 2 as consistence Loss, triple Loss and class Loss respectively; wherein:

the consistency loss function uses the square of the L2 norm, and η_i and η_i′ are two different noises;

α is the margin parameter in the triplet loss function;

the triplet loss function and the cross entropy loss function are combined by using a hyperparameter λ₀, which is set to 0.1 during model training, to obtain a fully supervised loss function L^SL; the final loss function of the labeled pedestrian data is:

L^SL-CL = L^SL + λ₁L^CL

the final loss function of the labeled pedestrian data is used as the constraint, and the two-channel model is trained with the labeled pedestrian data to obtain the base model; this corresponds to the labeled-data triplet construction and model training stages shown in FIG. 1.
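For reference, the three loss terms described above can be written in their standard forms. This is a sketch under assumptions, not the formulas of the invention: the symbols L^{Tri}, L^{CE}, x_i^a, x_i^p and x_i^n are introduced here for illustration, the noise η_i is assumed to be applied on the "student" side and η_i′ on the "teacher" side, and D(·,·) is taken to be the Euclidean distance. How λ₀ weights the two supervised terms inside L^{SL} is not specified beyond the text, so that combination is not written out.

```latex
% Feature-based consistency loss (mean squared L2 distance between
% student features f_theta and teacher features f_theta' under different noise)
L^{CL} = \frac{1}{N} \sum_{i=1}^{N}
  \left\| f_{\theta}(x_i, \eta_i) - f_{\theta'}(x_i, \eta_i') \right\|_2^2

% Triplet loss with margin alpha over N triplets (anchor a, positive p, negative n)
L^{Tri} = \frac{1}{N} \sum_{i=1}^{N}
  \max\!\left(0,\; \alpha
    + D\!\left(f_{\theta}(x_i^a), f_{\theta}(x_i^p)\right)
    - D\!\left(f_{\theta}(x_i^a), f_{\theta}(x_i^n)\right)\right)

% Softmax cross-entropy loss sigma over N labeled samples,
% with omega the last fully connected layer
L^{CE} = \frac{1}{N} \sum_{i=1}^{N}
  \sigma\!\left(\omega^{\top} f_{\theta}(x_i),\; y_i\right)

% Final loss on labeled data, as stated in the text
L^{SL\text{-}CL} = L^{SL} + \lambda_1 L^{CL}
```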

Feature extraction is performed on the unlabeled pedestrian data by using the base model, and a directed KNN graph G(V, E) is constructed from the features of the unlabeled pedestrian data, with K = 4 as the number of neighbors. In G:

V = {v_i = f_θ(x_i) | x_i ∈ X^U}

E = {e_ij = P(v_i, v_j) | v_j ∈ N_k(v_i)}

in the formula, N_k(v_i) is the set of k nearest neighbors of vertex v_i in the KNN graph, and P(v_i, v_j) is the directed edge from v_i to v_j. If a vertex can be connected back to itself through t directed edges, for example e_ij→e_jk→e_kl→e_li, the vertices involved are said to form a "ring", and t is the order of the "ring". Any combination of two samples in a ring is referred to as a positive sample pair generated by the ring. Samples in a "ring" obtained from the KNN graph have a higher probability of belonging to the same label. By comparison, neighbor-based methods can only find positive sample pairs with small intra-class variance, whereas the positive sample pairs provided by a "ring" have a larger intra-class variance. In a closed loop formed by several edges in the KNN graph, all pairwise combinations of the unlabeled pedestrian data are selected as positive sample pairs C_t, where t is the number of edges of the closed loop; for an anchor sample in the labeled pedestrian data, positive samples are obtained from the pedestrian data with the same label, and negative samples are obtained from the pedestrian data with different labels and from the unlabeled pedestrian data; for an anchor sample in the unlabeled pedestrian data, positive sample pairs are obtained from C_t, and negative sample pairs are obtained from the labeled pedestrian data; the difficulty of the mined negative samples is raised as follows:

the symbols in the formula denote the i-th unlabeled pedestrian data in C_t, the labeled pedestrian data selected from X^L, and the data selected from C_t; c is the constant used to control the confidence of the negative sample pairs; c = 0.7, c = 0.8 and c = 0.9 are typically used to create more difficult negative sample pairs.
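The "ring" construction described above can be sketched in plain Python: build a directed KNN graph, enumerate closed loops of exactly t edges, and take every pairwise combination inside a loop as a positive pair. This is an illustrative sketch only. The helper names `knn_graph`, `find_rings` and `positive_pairs` are hypothetical, the brute-force search is meant for toy inputs, and the example uses k = 2 on six 2-D points rather than K = 4 on 2048-dimensional features.

```python
import itertools
import math


def knn_graph(features, k):
    """Directed KNN graph: vertex i points to its k nearest other vertices."""
    graph = {}
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


def find_rings(graph, t):
    """All closed loops of exactly t directed edges, as sorted vertex tuples."""
    found = set()

    def walk(start, node, path):
        if len(path) == t:
            # The loop closes if the last vertex points back to the start.
            if start in graph[node]:
                found.add(tuple(sorted(path)))
            return
        for nxt in graph[node]:
            if nxt not in path:
                walk(start, nxt, path + [nxt])

    for v in graph:
        walk(v, v, [v])
    return found


def positive_pairs(ring_set):
    """Every pairwise combination of vertices inside each ring."""
    pairs = set()
    for ring in ring_set:
        pairs.update(itertools.combinations(ring, 2))
    return pairs


# Two well-separated toy clusters standing in for two identities.
features = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
graph = knn_graph(features, k=2)
ring_set = find_rings(graph, t=3)
pairs = positive_pairs(ring_set)
print(sorted(pairs))
```

In this toy case every ring stays inside one cluster, so no positive pair mixes the two identities; a pair that appears in several rings could be counted to obtain the repetition count used for confidence.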

After the positive and negative sample pairs are obtained, a triplet loss is constructed from them:

for a negative sample pair sampled from the unlabeled pedestrian data selected from C_t, the constant c that controls the confidence of negative sample pairs is used as its confidence; for a negative sample pair sampled from N_i, c = 1 is used as its confidence; the final triplet confidence for unlabeled data is defined as:

where α represents the lowest confidence of a positive sample pair of unlabeled data selected from the graph model, and α is set to 0.8; a direct-push metric learning loss function L^TSML is defined and combined with the consistency learning loss function, and the final loss function is:

L^TSML-CL = L^TSML + λ₁L^CL

After the performance of the model is improved, feature extraction and graph model construction are repeated, the positive and negative sample pairs and their confidences are continuously updated, and the base model is trained with a curriculum learning method until it fully converges, to obtain the final model. The final model is used to perform feature extraction and feature matching on the verification set data, and pedestrian re-identification is completed according to the matching result.

The invention is further illustrated by the following specific examples:

the method is implemented with a ResNet50 convolutional neural network model; the input image size is 128 × 384, the dimension of the feature layer is 2048, the PyTorch framework and the Adam optimizer are used, the initial learning rate is 0.0002, the weight decay is 0.0005, each batch contains 32 pedestrian IDs (identities) with 4 images per pedestrian ID, and the model is pre-trained on the ImageNet data set.
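The batch composition stated above (32 pedestrian IDs with 4 images each, 128 images per batch) can be sketched as a simple identity-balanced sampler. This is a hedged illustration, not the sampler used by the invention: the function name `sample_pk_batch`, the file names, and the choice to sample with replacement when an identity has fewer than 4 images are all assumptions.

```python
import random


def sample_pk_batch(images_by_id, num_ids=32, images_per_id=4, rng=random):
    """Pick num_ids identities, then images_per_id images from each identity."""
    chosen_ids = rng.sample(sorted(images_by_id), num_ids)
    batch = []
    for pid in chosen_ids:
        images = images_by_id[pid]
        if len(images) >= images_per_id:
            picks = rng.sample(images, images_per_id)
        else:
            # Assumption: repeat images when an identity has too few.
            picks = [rng.choice(images) for _ in range(images_per_id)]
        batch.extend((pid, img) for img in picks)
    return batch


# Hypothetical toy dataset: 40 identities with 6 image names each.
images_by_id = {
    pid: [f"id{pid:03d}_img{j}.jpg" for j in range(6)] for pid in range(40)
}
batch = sample_pk_batch(images_by_id, rng=random.Random(0))
print(len(batch))  # 32 identities x 4 images = 128
```

Sampling several images per identity guarantees that every anchor in the batch has at least one positive and many negatives, which the triplet loss requires.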

Table 1 compares the classification accuracy of the pedestrian re-identification method of the present invention with that of other methods on public pedestrian re-identification data sets (Market1501, DukeMTMC-reID); it can be seen that the model obtained by the present method achieves higher accuracy.

TABLE 1

[1] X. Xin, J. Wang, R. Xie, S. Zhou, W. Huang, N. Zheng, Semi-supervised person re-identification using multi-view clustering, Pattern Recognition 88 (2019) 285–297.

[2] X. Xin, X. Wu, Y. Wang, J. Wang, Deep self-paced learning for semi-supervised person re-identification using multi-view self-paced clustering, in: 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 2631–2635. doi:10.1109/ICIP.2019.8803290.

It should be noted that the above description is only a part of the embodiments of the present invention, and all equivalent changes made according to the present invention are included in the protection scope of the present invention. Those skilled in the art to which the invention relates may substitute similar embodiments for the specific examples described, all falling within the scope of the invention, without thereby departing from the invention or exceeding the scope of the claims defined thereby.

Claims

1. A graph-based direct-push semi-supervised pedestrian re-identification method is characterized by comprising the following steps:

2. The graph-based direct-push semi-supervised pedestrian re-identification method according to claim 1, wherein the step 1 is specifically as follows: the labeled pedestrian data is X^L and the unlabeled pedestrian data is X^U; sample pairs belonging to the same label are defined as positive sample pairs, and sample pairs not belonging to the same label are defined as negative sample pairs; in the two-channel model, one channel is a ResNet50 whose parameters are obtained by learning, and it is designated the "student" model; the other channel is a ResNet50 whose parameters are computed from the "student" model by exponential moving average, and it is designated the "teacher" model; the calculation formula is as follows:

θ′_t = αθ′_{t-1} + (1-α)θ_t

the consistency loss function uses the square of the L2 norm, and η_i and η_i′ are two different noises;

each triplet consists of an anchor sample, a positive sample and a negative sample, and α is the margin parameter in the triplet loss function;

the final loss function of the labeled pedestrian data is:

L^SL-CL = L^SL + λ₁L^CL

3. The graph-based direct-push semi-supervised pedestrian re-identification method according to claim 1, wherein the step 2 is specifically as follows: performing feature extraction on the unlabeled pedestrian data by using a base model, and constructing a directed KNN graph G (V, E) by using the features of the unlabeled pedestrian data, wherein in G:

V＝{v_i＝f_θ(x_i)|x_i∈X^U}

E＝{e_ij＝P(v_i,v_j)|v_j∈N_k(v_i)}

the symbols in the formula denote the i-th unlabeled pedestrian data in C_t, the labeled pedestrian data selected from X^L, and the unlabeled pedestrian data selected from C_t in the current training batch; D(·) is the Euclidean distance; for an anchor sample belonging to C_t, N_i is the negative sample selected from X^L.

4. The graph-based direct-push semi-supervised pedestrian re-identification method of claim 1, wherein the difficulty of mining negative examples is boosted by gradually decreasing the constant c.

5. The graph-based direct-push semi-supervised pedestrian re-identification method according to claim 1, wherein the step 3 is specifically as follows: and after positive and negative sample pairs are obtained, constructing triple loss by using the positive and negative sample pairs:

for a negative sample pair sampled from the unlabeled pedestrian data selected from C_t, the constant c that controls the confidence of negative sample pairs is used as its confidence; for a negative sample pair sampled from N_i, c is used as its confidence; the final triplet confidence for unlabeled data is defined, and the final loss function is:

L^TSML-CL = L^TSML + λ₁L^CL

6. The graph-based direct-push semi-supervised pedestrian re-identification method as claimed in any one of claims 3 to 5, wherein the constant c has a value in a range of 0.5 to 1.