CN112395997A - Weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning - Google Patents
Weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning
- Publication number
- CN112395997A CN112395997A CN202011303629.5A CN202011303629A CN112395997A CN 112395997 A CN112395997 A CN 112395997A CN 202011303629 A CN202011303629 A CN 202011303629A CN 112395997 A CN112395997 A CN 112395997A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- label
- model
- bag
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning. Pedestrian pictures are first grouped into bags according to their shooting time period, and each bag is assigned a bag-level category label. The dependency relationships among the pictures in each bag are then captured to generate a reliable pseudo pedestrian-category label for every picture in the bag; these pseudo labels serve as the supervision signal for training the re-identification model. The re-identification model and the graph model are trained jointly: a linear combination of the graph-model loss and the re-identification loss is used as the total loss function, and the parameters of all network layers are updated by back-propagation. The method achieves state-of-the-art model performance without heavy manual labeling cost and with almost no increase in computational complexity.
Description
Technical Field
The invention relates to the technical field of machine vision, and in particular to a weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning.
Background
At present, pedestrian re-identification is mainly realized in three ways: (1) extracting discriminative features; (2) learning a stable metric or subspace for matching; (3) a combination of the two. However, most implementations require strongly supervised training labels, i.e. manual annotation of every picture in the dataset. Unsupervised pedestrian re-identification methods that avoid manual labeling rely on local saliency matching or clustering models, but the pronounced appearance differences across camera views are difficult to model, so high accuracy is hard to achieve. In contrast, the weakly supervised pedestrian re-identification method provided by the invention achieves high accuracy without expensive manual labeling.
Weakly supervised learning: although training deep neural networks with weak supervision is challenging, it has been studied for tasks such as image classification, semantic segmentation and object detection. Like those studies, the present invention is based on pseudo-label generation, but weakly supervised pedestrian re-identification has two distinctive characteristics: (1) no single representative image exists for each pedestrian, since people may change clothes within a short time, so the labels are ambiguous; (2) the entropy is larger than in other tasks; for example, the pixels of an image in weakly supervised semantic segmentation are relatively stable, whereas pedestrians in re-identification are far more disordered and irregular. These two characteristics increase the difficulty of weakly supervised pedestrian re-identification.
Learning with uncertain labels: single-sample (one-shot) pedestrian re-identification is the setting most relevant to the present invention, but there are two differences: (1) one-shot re-identification requires at least one picture instance per pedestrian category, whereas the dataset of the present method needs no exact pedestrian-category labels; (2) the present method introduces the bag category label to constrain and guide the estimation of pseudo pedestrian-category labels, which makes pseudo-label generation more reliable than in one-shot re-identification.
Pedestrian search: this combines pedestrian detection with pedestrian re-identification. The present invention differs from it in two main respects: (1) the invention is concerned only with visual feature matching, because the capability of current pedestrian detectors is adequate; (2) the present method benefits from low-cost weak labels, whereas every training picture in pedestrian search still requires strong labels.
Application No. 201710487019.7 discloses an image quality scoring method using a deep generative machine-learning model, which builds a generative model of expected good-quality images for scoring images from a medical scanner. The deviation of the input image from the generative model is used as the input feature vector of a discriminative model, which may also operate on additional feature vectors derived from the input image. However, that patent cannot express graph learning as a loss function that is differentiable with respect to the network parameters, so its graph learning cannot be optimized by stochastic gradient descent, and the integrated training of a graph model and a pedestrian re-identification model cannot be realized.
Disclosure of Invention
The invention provides a weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning, in which a module that automatically generates training labels is added to the pedestrian re-identification deep neural network and trained jointly with it, thereby reducing algorithmic complexity.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
A weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning comprises the following steps:
S1: group the pedestrian pictures into bags according to their shooting time period and assign bag category labels;
S2: capture the dependency relationships among the pictures in each bag to generate a reliable pseudo pedestrian-category label for each picture in the bag, the label serving as supervision information for training the pedestrian re-identification model;
S3: carry out integrated training of the pedestrian re-identification model and the graph model;
S4: take the linear combination of the graph-model loss and the re-identification loss as the total loss function, and update the parameters of all network layers with the back-propagation algorithm.
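Step S1 can be sketched in plain Python as follows. The 600-second window length, the `(timestamp, image_id)` picture representation and the function name are illustrative assumptions; the patent fixes neither a concrete period length nor a data format, and the bag-level label (the identities known to appear in each window) would be attached separately.

```python
from collections import defaultdict

def group_into_bags(pictures, period_seconds=600):
    """Group (timestamp, image_id) pairs into bags by shooting time period.

    Hypothetical sketch of step S1: pictures whose timestamps fall in the
    same fixed-length window join the same bag.
    """
    bags = defaultdict(list)
    for timestamp, image_id in pictures:
        window = int(timestamp // period_seconds)  # index of the time window
        bags[window].append(image_id)
    return dict(bags)

pics = [(5, "a"), (30, "b"), (700, "c"), (1300, "d")]
bags = group_into_bags(pics, period_seconds=600)
# pictures taken within the same 600 s window share a bag
```
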
Further, the specific process of step S1 is:
Denote by b a bag containing p pictures, i.e. b = {x_1, x_2, …, x_j, …, x_p}, with corresponding labels y = {y_1, y_2, …, y_j, …, y_p}; the bag category label is denoted by l.
Further, the process of step S2 is:
In weakly supervised pedestrian re-identification only the bag category label l is available, so a pseudo pedestrian-category label must be estimated for each picture, represented by a probability vector Y. Assuming that the bag under label l contains n pedestrian categories while the whole training set contains m pedestrian categories, and using the bag category label to constrain Y, the probability vector of the pedestrian-category label of each picture x_j is:
Y_j[k] = 1/n if category k ∈ l, and 0 otherwise (1)
further, the process of step S3 is:
A directed graph is defined in which each node represents a picture x_i in a bag and each edge represents the relationship between two pictures. The energy function of assigning pedestrian-category labels y to the nodes x of the graph is:
E(y|x) = Σ_{i∈U} Φ(y_i|x_i) + Σ_{(i,j)∈V} Ψ(y_i, y_j|x_i, x_j) (2)
where U and V denote the node and edge sets respectively, Φ(y_i|x_i) is the unary term computing the cost of assigning label y_i to picture x_i, and Ψ(y_i, y_j|x_i, x_j) is the pairwise term computing the cost of assigning a label pair to the picture pair (x_i, x_j); minimizing formula (2) eliminates erroneous pseudo labels generated by weakly supervised learning;
the unary term in formula (2) is defined as:
Φ(y_i|x_i) = -log([Y_i ⊙ P_i][y_i]) (3)
where P_i is the probability of the pedestrian-category label computed by the neural network for picture x_i, Y_i is the bag constraint expressed by formula (1), ⊙ denotes the element-wise product, and [·] denotes vector indexing;
since the unary terms of different pictures are output independently of one another, they are unstable and need to be smoothed by the pairwise term:
Ψ(y_i, y_j|x_i, x_j) = ζ(y_i, y_j) · exp(-‖f_i − f_j‖² / (2σ²)) (4)
The appearance similarity is computed with a Gaussian kernel on RGB-color features f, the kernel width is controlled by the hyper-parameter σ, and pictures with similar appearance are constrained to share the same label; the label compatibility ζ(y_i, y_j) is expressed by the Potts model:
ζ(y_i, y_j) = 1 if y_i ≠ y_j, and 0 otherwise (5)
Further, the bag category label contains additional information that improves pseudo-label generation: the estimated pseudo label is corrected to the pedestrian category with the highest prediction score in the bag, and some pictures may be assigned to pedestrian categories that were not predicted. The pseudo pedestrian-category label of each picture is obtained by minimizing formula (2):
ŷ = argmin_{y ∈ {1,2,…,m}^p} E(y|x) (6)
where {1, 2, 3, …, m} denotes all pedestrian categories of the training set.
Further, in step S3, before performing the integrated training of the pedestrian re-identification model and the graph model, the graph model needs to be miniaturized, and the specific process is as follows:
obtaining a pseudo-pedestrian category label by using an external graph model for supervising training of a pedestrian to re-identify a deep neural network, wherein the calculation of obtaining the pseudo label by minimizing the formula (2) is not trivial, so that the graph model is not compatible with the deep neural network, and therefore the relaxation formula (2) is required to be:
the discrete Φ and Ψ are serialized:
the difference between the formula (8) and the formula (3) is that in the non-micrographic model, all possible y needs to be input into the energy function, and the y with the lowest energy is taken as the optimal solution; in a micro-map model, directly inputting a picture x into a deep neural network to obtain the prediction of y; the difference between equation (9) and equation (4) is that the cross-entropy term- (Y) is usediPi)T log(YjPj) Approximation of the irreducible term ζ (y) in equation (4)i,yj)YiYj。
Further, in the step S4, the graph model loss LDrawing (A)And classification/re-identification loss LClassification,LClassificationIs a pseudo tagAs a supervised normalized exponential cross entropy loss function:
whereinShow thatConversion into a function of the unique heat vector, n representing the number of pictures in a bag, PiThe probability representing the pedestrian class calculated by the neural network is a normalized exponential function of the logarithm of the network output z:
where m represents the number of pedestrian classes of the training set, the total loss function L is a linear combination of the two loss functions:
L=wclassificationLClassification+wDrawing (A)LDrawing (A) (12)
Wherein wClassificationAnd wDrawing (A)The weights representing the two losses, respectively, are set to 1 and 0.5, respectively.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The invention combines differentiable graph learning with weakly supervised learning: a module that automatically generates training labels is added to the pedestrian re-identification deep neural network and trained jointly with it.
Drawings
FIG. 1 shows the graph model that generates pseudo pedestrian-category labels for a bag of pictures;
FIG. 2 is a training flow diagram of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
A weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning comprises the following steps:
1. From supervised pedestrian re-identification to weakly supervised pedestrian re-identification
Denote by b a bag containing p pictures, i.e. b = {x_1, x_2, …, x_j, …, x_p}, with corresponding labels y = {y_1, y_2, …, y_j, …, y_p}; the bag category label is denoted by l. Supervised pedestrian re-identification uses the pedestrian-category labels y to supervise the classification predictions of the model; weakly supervised pedestrian re-identification uses only the bag category label l, so a pseudo pedestrian-category label must be estimated for each picture, represented by a probability vector Y. Assuming that l contains n pedestrian categories while the whole training set contains m categories, and constraining Y with the bag category label, the probability vector of the pedestrian-category label of each picture x_j is:
Y_j[k] = 1/n if category k ∈ l, and 0 otherwise (1)
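The bag constraint and its interaction with the network prediction can be sketched as follows. The uniform 1/n mass over the bag's categories is an assumed reading of formula (1), and the renormalization helper is an illustrative addition, not part of the patent's stated method.

```python
def bag_constraint(bag_classes, m):
    """Vector Y restricting a picture's label to the classes in its bag:
    entry k is nonzero only when class k appears in the bag label l."""
    n = len(bag_classes)
    return [1.0 / n if k in bag_classes else 0.0 for k in range(m)]

def constrained_probs(P, Y):
    """Element-wise product Y ⊙ P followed by renormalization, so that
    probability mass outside the bag's categories is removed."""
    masked = [p * y for p, y in zip(P, Y)]
    s = sum(masked)
    return [v / s for v in masked] if s > 0 else masked

Y = bag_constraint({0, 2}, m=4)   # the bag contains categories 0 and 2
P = [0.1, 0.6, 0.2, 0.1]          # network prediction over m = 4 categories
Q = constrained_probs(P, Y)       # category 1, outside the bag, is zeroed out
```
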
2. Weakly supervised pedestrian re-identification based on differentiable graph learning
Graph model pedestrian re-identification
As shown in FIG. 1, a directed graph is defined in which each node represents a picture x_i in a bag and each edge represents the relationship between two pictures. The energy function of assigning pedestrian-category labels y to the nodes x of the graph is:
E(y|x) = Σ_{i∈U} Φ(y_i|x_i) + Σ_{(i,j)∈V} Ψ(y_i, y_j|x_i, x_j) (2)
where U and V denote the node and edge sets respectively, Φ(y_i|x_i) is the unary term computing the cost of assigning label y_i to picture x_i, and Ψ(y_i, y_j|x_i, x_j) is the pairwise term computing the cost of assigning a label pair to the picture pair (x_i, x_j). Minimizing formula (2) eliminates erroneous pseudo labels generated by weakly supervised learning.
Unary term
The unary term in formula (2) is defined as:
Φ(y_i|x_i) = -log([Y_i ⊙ P_i][y_i]) (3)
where P_i is the probability of the pedestrian-category label computed by the neural network for picture x_i, Y_i is the bag constraint expressed by formula (1), ⊙ denotes the element-wise product, and [·] denotes vector indexing.
Pairwise term
Since the output of the unary terms of different pictures are independent of each other, the unary terms are unstable and need to be smoothed by the pairwise terms:
calculating the appearance similarity by using a Gaussian kernel based on RGB colors, controlling the size of the Gaussian kernel by using a super-parameter sigma, and limiting pictures with similar appearances to have the same label; tag compatibility ζ (y)i,yj) Expressed by the glass model:
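A minimal sketch of the pairwise term: Potts compatibility times a Gaussian kernel on appearance. Representing each picture by its mean RGB color and the σ = 25 kernel width are illustrative assumptions; the patent only states that the kernel is based on RGB colors with a hyper-parameter σ.

```python
import math

def potts(yi, yj):
    """Potts label compatibility: cost 1 for disagreeing labels, 0 otherwise."""
    return 1.0 if yi != yj else 0.0

def appearance_similarity(rgb_i, rgb_j, sigma=25.0):
    """Gaussian kernel on (assumed) mean-RGB features; sigma sets the width."""
    d2 = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def pairwise_term(yi, yj, rgb_i, rgb_j, sigma=25.0):
    """Cost of a label pair: high when similar-looking pictures disagree."""
    return potts(yi, yj) * appearance_similarity(rgb_i, rgb_j, sigma)

same = pairwise_term(3, 3, (120, 80, 60), (121, 82, 61))  # agreement: zero cost
diff = pairwise_term(3, 7, (120, 80, 60), (121, 82, 61))  # similar looks, different labels
```
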
Bag constraint
Indeed, the bag category label contains additional information that improves pseudo-label generation: the estimated pseudo label is corrected to the pedestrian category with the highest prediction score in the bag, and some pictures may be assigned to pedestrian categories that were not predicted.
Inference of pseudo pedestrian-category labels
The pseudo pedestrian-category label of each picture is obtained by minimizing formula (2):
ŷ = argmin_{y ∈ {1,2,…,m}^p} E(y|x) (6)
where {1, 2, 3, …, m} denotes all pedestrian categories of the training set.
Making the graph model differentiable
The weakly supervised pedestrian re-identification method described so far cannot be trained end to end, because an external graph model would first be needed to obtain pseudo pedestrian-category labels for supervising the training of the re-identification deep neural network. The minimization of formula (2) is not differentiable, which makes the graph model incompatible with the deep neural network; formula (2) is therefore relaxed to:
Ẽ(x) = Σ_{i∈U} Φ̃(x_i) + Σ_{(i,j)∈V} Ψ̃(x_i, x_j) (7)
and the discrete Φ and Ψ are made continuous:
Φ̃(x_i) = -log([Y_i ⊙ P_i][ŷ_i]) (8)
Ψ̃(x_i, x_j) = -exp(-‖f_i − f_j‖² / (2σ²)) · (Y_i ⊙ P_i)^T log(Y_j ⊙ P_j) (9)
The difference between formula (8) and formula (3) is that the non-differentiable graph model must feed every possible y into the energy function and take the y with the lowest energy as the optimal solution, whereas the differentiable graph model feeds the picture x directly into the deep neural network to obtain the prediction of y. The difference between formula (9) and formula (4) is that the cross-entropy term -(Y_i ⊙ P_i)^T log(Y_j ⊙ P_j) approximates the non-differentiable term ζ(y_i, y_j)Y_iY_j of formula (4).
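The continuous surrogate for the Potts pair term can be sketched as follows. It follows the cross-entropy form stated for formula (9), with the appearance similarity folded into a scalar weight `sim`; treating the inputs as already bag-masked and renormalized probabilities, and the epsilon for numerical safety, are illustrative assumptions.

```python
import math

def smooth_pairwise(Qi, Qj, sim):
    """Continuous surrogate for the Potts pair term: the cross-entropy
    between two pictures' (bag-masked) class distributions, weighted by
    their appearance similarity sim."""
    eps = 1e-12  # avoid log(0)
    ce = -sum(qi * math.log(qj + eps) for qi, qj in zip(Qi, Qj))
    return sim * ce

agree    = smooth_pairwise([0.9, 0.1], [0.9, 0.1], sim=1.0)
disagree = smooth_pairwise([0.9, 0.1], [0.1, 0.9], sim=1.0)
# disagreeing predictions on similar-looking pictures incur a larger loss,
# and the expression is differentiable in the network outputs
```
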
3. Overall neural network structure
FIG. 2 shows the network structure for training and inference; dashed lines represent the training data flow and solid lines the inference data flow, and the graph model participates only in the training phase. The overall structure comprises three main modules:
feature extraction module
Referring to FIG. 2(a), ResNet-50 is used as the backbone network; the last layer of the original ResNet-50 is removed and replaced by a fully connected layer with 512-dimensional output, batch normalization, a leaky rectified linear unit (leaky ReLU) and dropout.
Coarse pedestrian re-identification module
As shown in FIG. 2(b), a fully connected layer whose output dimension equals the number of pedestrian categories is added on top of the feature extraction module, followed by a softmax (normalized exponential) cross-entropy loss function. The pedestrian-category prediction score serves as the coarse re-identification estimate and represents the pedestrian-category probability of each picture in bag b.
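The softmax cross-entropy used by this module can be sketched with stdlib Python; the concrete logit values are illustrative, since the real module operates on the 512-dimensional ResNet-50 features.

```python
import math

def softmax(z):
    """Normalized exponential of the logits z."""
    mx = max(z)
    e = [math.exp(v - mx) for v in z]  # subtract the max for numerical stability
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(z, label):
    """Softmax cross-entropy of logits z against a one-hot target class."""
    return -math.log(softmax(z)[label])

logits = [2.0, 0.5, -1.0]
loss_good = cross_entropy(logits, 0)  # confident and correct: small loss
loss_bad  = cross_entropy(logits, 2)  # wrong class: large loss
```
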
Refined pedestrian re-identification module
As shown in FIG. 2(c), the coarse re-identification scores, the appearance features and the bag constraint are fed into the graph model according to formulas (8) and (9); the pseudo labels produced by the graph model can then update the network parameters just like manually annotated ground-truth labels.
4. Optimization
Once the pseudo pedestrian-category labels are obtained, the gradient of the overall loss with respect to the deep-neural-network parameters can be computed and propagated back to all layers of the network, realizing integrated training of all parameters of the weakly supervised model.
Loss function
The optimization objective of the method comprises the graph-model loss L_graph and the classification (re-identification) loss L_cls; L_cls is a softmax (normalized exponential) cross-entropy loss supervised by the pseudo label ŷ:
L_cls = -(1/n) Σ_{i=1}^n onehot(ŷ_i)^T · log(P_i) (10)
where onehot(·) converts the pseudo label ŷ_i into a one-hot vector, n is the number of pictures in a bag, and P_i is the pedestrian-category probability computed by the neural network as the softmax of the output logits z:
P_i[k] = exp(z_k) / Σ_{j=1}^m exp(z_j) (11)
where m is the number of pedestrian categories of the training set.
The total loss function L is a linear combination of these two losses:
L = w_cls·L_cls + w_graph·L_graph (12)
where w_cls and w_graph are the weights of the two losses, set to 1 and 0.5 respectively.
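The total loss of formula (12) is a plain weighted sum; a one-line sketch with the weights stated in the text (the input loss values are illustrative):

```python
def total_loss(l_cls, l_graph, w_cls=1.0, w_graph=0.5):
    """Linear combination of the re-identification (classification) loss and
    the graph-model loss, with default weights 1 and 0.5 as stated."""
    return w_cls * l_cls + w_graph * l_graph

L = total_loss(0.8, 0.4)  # 1.0 * 0.8 + 0.5 * 0.4
```
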
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (10)
1. A weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning, characterized by comprising the following steps:
S1: group the pedestrian pictures into bags according to their shooting time period and assign bag category labels;
S2: capture the dependency relationships among the pictures in each bag to generate a reliable pseudo pedestrian-category label for each picture in the bag, the label serving as supervision information for training the pedestrian re-identification model;
S3: carry out integrated training of the pedestrian re-identification model and the graph model;
S4: take the linear combination of the graph-model loss and the re-identification loss as the total loss function, and update the parameters of all network layers with the back-propagation algorithm.
2. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning of claim 1, wherein the specific process of step S1 is:
Denote by b a bag containing p pictures, i.e. b = {x_1, x_2, …, x_j, …, x_p}, with corresponding labels y = {y_1, y_2, …, y_j, …, y_p}; the bag category label is denoted by l.
3. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning of claim 1, wherein the process of step S2 is:
In weakly supervised pedestrian re-identification only the bag category label l is available, so a pseudo pedestrian-category label must be estimated for each picture, represented by a probability vector Y; assuming that the bag under label l contains n pedestrian categories while the whole training set contains m pedestrian categories, and using the bag category label to constrain Y, the probability vector of the pedestrian-category label of each picture x_j is:
Y_j[k] = 1/n if category k ∈ l, and 0 otherwise (1)
4. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning of claim 3, wherein the process of step S3 is:
A directed graph is defined in which each node represents a picture x_i in a bag and each edge represents the relationship between two pictures; the energy function of assigning pedestrian-category labels y to the nodes x of the graph is:
E(y|x) = Σ_{i∈U} Φ(y_i|x_i) + Σ_{(i,j)∈V} Ψ(y_i, y_j|x_i, x_j) (2)
where U and V denote the node and edge sets respectively, Φ(y_i|x_i) is the unary term computing the cost of assigning label y_i to picture x_i, and Ψ(y_i, y_j|x_i, x_j) is the pairwise term computing the cost of assigning a label pair to the picture pair (x_i, x_j); minimizing formula (2) eliminates erroneous pseudo labels generated by weakly supervised learning;
the unary term in formula (2) is defined as:
Φ(y_i|x_i) = -log([Y_i ⊙ P_i][y_i]) (3)
where P_i is the probability of the pedestrian-category label computed by the neural network for picture x_i, Y_i is the bag constraint expressed by formula (1), ⊙ denotes the element-wise product, and [·] denotes vector indexing;
since the unary terms of different pictures are output independently of one another, they are unstable and need to be smoothed by the pairwise term:
Ψ(y_i, y_j|x_i, x_j) = ζ(y_i, y_j) · exp(-‖f_i − f_j‖² / (2σ²)) (4)
the appearance similarity is computed with a Gaussian kernel on RGB-color features f, the kernel width is controlled by the hyper-parameter σ, and pictures with similar appearance are constrained to share the same label; the label compatibility ζ(y_i, y_j) is expressed by the Potts model:
ζ(y_i, y_j) = 1 if y_i ≠ y_j, and 0 otherwise (5)
5. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning of claim 4, wherein the bag category label contains additional information that improves pseudo-label generation: the estimated pseudo label is corrected to the pedestrian category with the highest prediction score in the bag, and some pictures may be assigned to pedestrian categories that were not predicted.
6. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning of claim 5, wherein the pseudo pedestrian-category label of each picture is obtained by minimizing formula (2):
ŷ = argmin_{y ∈ {1,2,…,m}^p} E(y|x) (6)
where {1, 2, 3, …, m} denotes all pedestrian categories of the training set.
7. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning of claim 6, wherein in step S3, before the integrated training of the pedestrian re-identification model and the graph model, the graph model must be made differentiable, the specific process being:
if an external graph model were used to obtain pseudo pedestrian-category labels for supervising the training of the re-identification deep neural network, the minimization of formula (2) would not be differentiable, making the graph model incompatible with the deep neural network; formula (2) is therefore relaxed to:
Ẽ(x) = Σ_{i∈U} Φ̃(x_i) + Σ_{(i,j)∈V} Ψ̃(x_i, x_j) (7)
and the discrete Φ and Ψ are made continuous:
Φ̃(x_i) = -log([Y_i ⊙ P_i][ŷ_i]) (8)
Ψ̃(x_i, x_j) = -exp(-‖f_i − f_j‖² / (2σ²)) · (Y_i ⊙ P_i)^T log(Y_j ⊙ P_j) (9)
the difference between formula (8) and formula (3) is that the non-differentiable graph model must feed every possible y into the energy function and take the y with the lowest energy as the optimal solution, whereas the differentiable graph model feeds the picture x directly into the deep neural network to obtain the prediction of y; the difference between formula (9) and formula (4) is that the cross-entropy term -(Y_i ⊙ P_i)^T log(Y_j ⊙ P_j) approximates the non-differentiable term ζ(y_i, y_j)Y_iY_j of formula (4).
8. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning of claim 7, wherein in step S4 the optimization objective comprises the graph-model loss L_graph and the classification (re-identification) loss L_cls; L_cls is a softmax (normalized exponential) cross-entropy loss supervised by the pseudo label ŷ:
L_cls = -(1/n) Σ_{i=1}^n onehot(ŷ_i)^T · log(P_i) (10)
where onehot(·) converts the pseudo label ŷ_i into a one-hot vector, n is the number of pictures in a bag, and P_i is the pedestrian-category probability computed by the neural network as the softmax of the output logits z:
P_i[k] = exp(z_k) / Σ_{j=1}^m exp(z_j) (11)
where m is the number of pedestrian categories of the training set; the total loss function L is a linear combination of the two losses:
L = w_cls·L_cls + w_graph·L_graph (12)
where w_cls and w_graph denote the weights of the two losses respectively.
9. The method of claim 8, wherein w_cls is set to 1.
10. The method of claim 8, wherein w_graph is set to 0.5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011303629.5A CN112395997B (en) | 2020-11-19 | 2020-11-19 | Weak supervision training method based on pedestrian re-recognition model capable of micro-graph learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112395997A true CN112395997A (en) | 2021-02-23 |
CN112395997B CN112395997B (en) | 2023-11-24 |
Family
ID=74605913
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011303629.5A Active CN112395997B (en) | 2020-11-19 | 2020-11-19 | Weak supervision training method based on pedestrian re-recognition model capable of micro-graph learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112395997B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949630A (en) * | 2021-03-01 | 2021-06-11 | 北京交通大学 | Weak supervision target detection method based on frame classification screening |
CN113128410A (en) * | 2021-04-21 | 2021-07-16 | 湖南大学 | Weak supervision pedestrian re-identification method based on track association learning |
CN113688781A (en) * | 2021-09-08 | 2021-11-23 | 北京邮电大学 | Pedestrian re-identification anti-attack method with blocking elasticity |
CN114913472A (en) * | 2022-02-23 | 2022-08-16 | 北京航空航天大学 | Infrared video pedestrian significance detection method combining graph learning and probability propagation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019136946A1 (en) * | 2018-01-15 | 2019-07-18 | 中山大学 | Deep learning-based weakly supervised salient object detection method and system |
CN110942025A (en) * | 2019-11-26 | 2020-03-31 | 河海大学 | Unsupervised cross-domain pedestrian re-identification method based on clustering |
CN111445488A (en) * | 2020-04-22 | 2020-07-24 | 南京大学 | Method for automatically identifying and segmenting salt body through weak supervised learning |
CN111723645A (en) * | 2020-04-24 | 2020-09-29 | 浙江大学 | Multi-camera high-precision pedestrian re-identification method for in-phase built-in supervised scene |
CN111860678A (en) * | 2020-07-29 | 2020-10-30 | 中国矿业大学 | Unsupervised cross-domain pedestrian re-identification method based on clustering |
- 2020-11-19: CN202011303629.5A patent CN112395997B (en), status active
Non-Patent Citations (1)
Title |
---|
Zheng Baoyu; Wang Yu; Wu Jinwen; Zhou Quan: "Weakly supervised image semantic segmentation based on deep convolutional neural networks", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), no. 05, pages 5-16 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949630A (en) * | 2021-03-01 | 2021-06-11 | 北京交通大学 | Weak supervision target detection method based on frame classification screening |
CN112949630B (en) * | 2021-03-01 | 2024-03-19 | 北京交通大学 | Weak supervision target detection method based on frame hierarchical screening |
CN113128410A (en) * | 2021-04-21 | 2021-07-16 | 湖南大学 | Weak supervision pedestrian re-identification method based on track association learning |
CN113688781A (en) * | 2021-09-08 | 2021-11-23 | 北京邮电大学 | Pedestrian re-identification anti-attack method with blocking elasticity |
CN113688781B (en) * | 2021-09-08 | 2023-09-15 | 北京邮电大学 | Pedestrian re-identification anti-attack method capable of shielding elasticity |
CN114913472A (en) * | 2022-02-23 | 2022-08-16 | 北京航空航天大学 | Infrared video pedestrian significance detection method combining graph learning and probability propagation |
CN114913472B (en) * | 2022-02-23 | 2024-06-25 | 北京航空航天大学 | Infrared video pedestrian significance detection method combining graph learning and probability propagation |
Also Published As
Publication number | Publication date |
---|---|
CN112395997B (en) | 2023-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112395997A (en) | Weak supervision training method of pedestrian re-recognition model based on micrographic learning | |
Aly et al. | DeepArSLR: A novel signer-independent deep learning framework for isolated arabic sign language gestures recognition | |
Kong et al. | Interactive phrases: Semantic descriptions for human interaction recognition | |
CN112182166B (en) | Text matching method and device, electronic equipment and storage medium | |
CN107203753B (en) | Action recognition method based on fuzzy neural network and graph model reasoning | |
Mao et al. | Hierarchical Bayesian theme models for multipose facial expression recognition | |
Felzenszwalb et al. | Object detection grammars. | |
CN110309306A (en) | A kind of Document Modeling classification method based on WSD level memory network | |
Wang et al. | Ppp: Joint pointwise and pairwise image label prediction | |
Ding et al. | Inferring social relations from visual concepts | |
Singla et al. | Discovery of social relationships in consumer photo collections using markov logic | |
CN113761887A (en) | Matching method and device based on text processing, computer equipment and storage medium | |
Katkade et al. | Advances in real-time object detection and information retrieval: A review | |
Joodi et al. | Increasing validation accuracy of a face mask detection by new deep learning model-based classification | |
CN113627237A (en) | Late-stage fusion face image clustering method and system based on local maximum alignment | |
Chergui et al. | Investigating deep cnns models applied in kinship verification through facial images | |
Srininvas et al. | A framework to recognize the sign language system for deaf and dumb using mining techniques | |
Yun et al. | Head pose classification by multi-class AdaBoost with fusion of RGB and depth images | |
de Souza et al. | Building semantic understanding beyond deep learning from sound and vision | |
Yadaiah et al. | A Fuzzy logic based soft computing approach in CBIR system using incremental filtering feature selection to identify patterns | |
Dohnálek et al. | Application and comparison of modified classifiers for human activity recognition | |
Jain et al. | Unsupervised temporal segmentation of human action using community detection | |
Lagunes-Fortiz et al. | Centroids Triplet Network and Temporally-Consistent Embeddings for In-Situ Object Recognition | |
Balfaqih et al. | An Intelligent Movies Recommendation System Based Facial Attributes Using Machine Learning | |
Pereira et al. | Real-Time Multi-Stage Deep Learning Pipeline for Facial Recognition by Service Robots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Nie Lin; Zhang Jiqi; Lin Jing; Wang Guangrun; Wang Guangcong. Inventor before: Zhang Jiqi; Lin Jing; Nie Lin; Wang Guangrun; Wang Guangcong ||
GR01 | Patent grant | ||