CN112395997B - Weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning - Google Patents
Weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning
- Publication number
- CN112395997B (application CN202011303629.5A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- bag
- label
- model
- pseudo
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning. Pedestrian images are first grouped into bags according to shooting time periods, and each bag is assigned a bag category label. The dependency relationships among the pictures in each bag are then captured to generate a reliable pseudo pedestrian category label for each picture, and these pseudo labels serve as supervision information for training the re-identification model. The re-identification model and the graph model are then trained jointly: a linear combination of the graph-model loss and the re-identification loss is used as the total loss function, and the parameters of all network layers are updated with the back-propagation algorithm. The invention achieves leading model performance without heavy manual labeling cost and with almost no increase in computational complexity.
Description
Technical Field
The invention relates to the technical field of machine vision, and in particular to a weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning.
Background
At present, there are three main approaches to the pedestrian re-identification problem: (1) extracting discriminative features; (2) learning a stable metric or subspace for matching; (3) combining the two. However, most implementations require strongly supervised training labels, i.e., manual annotation of every picture in the dataset. There are also unsupervised pedestrian re-identification methods that avoid manual labeling by using local saliency matching or clustering models, but they struggle to model the large appearance differences across camera views and therefore have difficulty reaching high accuracy. In contrast, the weakly supervised pedestrian re-identification method provided by the invention achieves high accuracy without expensive manual labeling.
Weakly supervised learning: although training deep neural networks with weakly supervised methods is challenging, it has been studied for tasks such as image classification, semantic segmentation, and object detection. Like those studies, the invention is based on generating pseudo labels, but weakly supervised pedestrian re-identification has two distinctive characteristics: (1) no representative image exists for each individual pedestrian, and labels are ambiguous because people can change clothes within a short time; (2) the label entropy is higher than in other tasks; for example, the pixels of an image in weakly supervised semantic segmentation are relatively stable, whereas pedestrians in re-identification are far more disordered and irregular. Both characteristics increase the difficulty of weakly supervised pedestrian re-identification.
Learning with uncertain labels: single-sample (one-shot) pedestrian re-identification is the most closely related setting, but there are two differences: (1) single-sample re-identification requires at least one labeled picture instance per pedestrian category, whereas the dataset of the present method needs no exact pedestrian category labels; (2) the invention introduces the bag category label as a constraint to guide the estimation of pseudo pedestrian category labels, making pseudo-label generation more reliable than in single-sample re-identification.
Pedestrian search: combines pedestrian detection and pedestrian re-identification. The invention differs from it in two main respects: (1) the invention focuses only on visual feature matching, since the capability of current person detectors is adequate; (2) the invention benefits from low-cost weak labels, whereas pedestrian search still requires strong labels for every training picture.
Application number 201710487019.7 discloses an image quality scoring method using a deep generative machine learning model: deep machine learning creates a generative model of an expected good-quality image, which is used to score the quality of images from a medical scanner. The deviation of the input image from the generative model serves as an input feature vector for a discriminative model, which may also operate on further feature vectors derived from the input image. However, that patent does not express graph learning directly as a loss function differentiable with respect to the network parameters, so the loss cannot be optimized by stochastic gradient descent, and integrated training of the graph model and the re-identification model is not achieved.
Disclosure of Invention
The invention provides a weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning: a module that automatically generates training labels is added to the pedestrian re-identification deep neural network and trained jointly with it, reducing algorithmic complexity.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
A weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning comprises the following steps:
s1: grouping pedestrian pictures into bags according to shooting time periods and assigning bag category labels;
s2: capturing the dependency relationships among the pictures in each bag to generate a reliable pseudo pedestrian category label for each picture in the bag, used as supervision information for training the pedestrian re-identification model;
s3: performing integrated training of the pedestrian re-identification model and the graph model;
s4: taking a linear combination of the graph-model loss and the re-identification loss as the total loss function and updating the parameters of all network layers with the back-propagation algorithm.
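Step S1 can be sketched on its own before the model is involved. The sketch below groups pictures into bags by fixed-length shooting periods; the fixed-length period rule and the name `group_into_bags` are illustrative assumptions, not part of the patent.

```python
def group_into_bags(pictures, timestamps, period=60.0):
    """S1 sketch: pictures shot inside the same time period share a bag.
    The fixed-length period is an assumed grouping rule."""
    bags = {}
    for pic, t in zip(pictures, timestamps):
        bags.setdefault(int(t // period), []).append(pic)
    # one bag per occupied period, in order of first appearance
    return list(bags.values())

# toy usage: five picture ids shot at various times (seconds)
bags = group_into_bags([0, 1, 2, 3, 4], [3.0, 10.0, 65.0, 70.0, 130.0])
```

In a real system the bag category label would then be assigned per bag (e.g., the set of identities known to appear in that period).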
Further, the specific process of step S1 is:
Let b denote a bag containing p pictures, i.e., b = {x_1, x_2, …, x_j, …, x_p}; let y = {y_1, y_2, …, y_j, …, y_p} be the corresponding pedestrian category labels, and let l denote the bag category label.
Further, the process of step S2 is:
In weakly supervised pedestrian re-identification only the bag category label l is available, so a pseudo pedestrian category label must be estimated for each picture and is represented by a probability vector Y. Assuming the bag with label l contains n pedestrian categories while the whole training set has m pedestrian categories, the bag category label constrains Y, and the probability vector of the pedestrian category label of each picture x_j is:
Y_j[k] = 1/n if category k appears in l, and Y_j[k] = 0 otherwise, for k = 1, …, m (1)
Further, the process of step S3 is:
A directed graph is defined in which each node represents a picture x_i in a bag and each edge represents the relationship between two pictures. The energy function for assigning pedestrian category labels y to the nodes x of the graph is:
E(y|x) = Σ_{i∈U} Φ(y_i|x_i) + Σ_{(i,j)∈V} Ψ(y_i, y_j|x_i, x_j) (2)
where U and V denote the nodes and edges respectively, Φ(y_i|x_i) is a unary term giving the cost of assigning label y_i to picture x_i, and Ψ(y_i, y_j|x_i, x_j) is a pairwise term giving the penalty for the labels assigned to the picture pair (x_i, x_j); minimizing Eq. (2) suppresses the false labels produced by weakly supervised learning.
The unary term in Eq. (2) is defined as:
Φ(y_i|x_i) = -log(Y_i[y_i]), where Y_i = Y_i ⊙ P_i (3)
where P_i is the pedestrian-category probability computed by the neural network for picture x_i, Y_i is the bag constraint expressed by Eq. (1), ⊙ denotes the element-wise product, and [·] denotes vector indexing.
Because the unary terms of different pictures are computed independently, they are unstable on their own and need to be smoothed by a pairwise term:
Ψ(y_i, y_j|x_i, x_j) = ζ(y_i, y_j) · Y_i[y_i] · Y_j[y_j] · exp(-||x_i - x_j||² / (2σ²)) (4)
where the appearance similarity is computed with a Gaussian kernel over RGB colors, the hyperparameter σ controls the kernel bandwidth, and pictures with similar appearance are constrained to share the same label. The label compatibility ζ(y_i, y_j) is expressed with the Potts model:
ζ(y_i, y_j) = 1 if y_i ≠ y_j, and 0 otherwise (5)
Further, the bag category label contains additional information that improves pseudo-label generation: the estimated pseudo labels are corrected to the pedestrian categories with the highest prediction scores inside the bag, which prevents pictures from being assigned to pedestrian categories that do not occur in the bag. The pseudo pedestrian category label of each picture is then obtained by minimizing Eq. (2):
ŷ = argmin_{y ∈ {1,2,3,…,m}^p} E(y|x) (6)
where {1, 2, 3, …, m} denotes all pedestrian categories in the training set.
Further, in step S3, before the integrated training of the pedestrian re-recognition model and the graph model is performed, the graph model needs to be miniaturized, which specifically includes:
obtaining pseudo pedestrian class labels by using an external graph model for supervising training of a pedestrian re-recognition deep neural network, wherein the calculation for obtaining the pseudo labels by minimizing the formula (2) is not tiny, so that the graph model is incompatible with the deep neural network, and therefore, a relaxation formula (2) is needed to be:
the discrete Φ and ψ are serialized:
the difference between the formula (8) and the formula (3) is that in the non-microchargable model, all possible y needs to be input to the energy function, and the y with the lowest energy is taken as the optimal solution; in the micro-image model, directly inputting a picture x into a deep neural network to obtain a prediction of y; the difference between equation (9) and equation (4) is that the cross entropy term- (Y) i P i ) T log(Y j P j ) Approximation of the non-differentiable term ζ (y) in equation (4) i ,y j )Y i Y j 。
Further, in step S4 the optimization objective comprises the graph-model loss L_graph and the classification (re-identification) loss L_cls. L_cls is a softmax cross-entropy loss supervised by the pseudo labels ŷ:
L_cls = -(1/n) Σ_{i=1}^{n} onehot(ŷ_i)^T log(P_i) (10)
where onehot(·) converts ŷ_i into a one-hot vector, n is the number of pictures in a bag, and P_i is the pedestrian-category probability computed by the neural network, a softmax (normalized exponential) of the network output logits z:
P_i[k] = exp(z_k) / Σ_{c=1}^{m} exp(z_c) (11)
where m is the number of pedestrian categories in the training set. The total loss function L is a linear combination of the two losses:
L = w_cls · L_cls + w_graph · L_graph (12)
where w_cls and w_graph are the weights of the two losses, set to 1 and 0.5 respectively.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The method combines differentiable graph learning with weakly supervised learning: a module that automatically generates training labels is added to the pedestrian re-identification deep neural network and trained jointly with it. Compared with common pedestrian re-identification methods, the method achieves leading model performance without heavy manual labeling cost and with almost no increase in computational complexity.
Drawings
FIG. 1 is a diagram of the graph model that generates pseudo pedestrian category labels for a bag of pictures;
FIG. 2 is a training flow chart of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
A weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning comprises the following steps:
1. From supervised pedestrian re-identification to weakly supervised pedestrian re-identification
Let b denote a bag containing p pictures, i.e., b = {x_1, x_2, …, x_j, …, x_p}; let y = {y_1, y_2, …, y_j, …, y_p} be the pedestrian category labels, and let l denote the bag category label. Supervised pedestrian re-identification uses the pedestrian category labels y to supervise the classification predictions of the model. In weakly supervised re-identification only the bag category label l is available, so a pseudo pedestrian category label must first be estimated for each picture and is represented by a probability vector Y. Assuming l contains n pedestrian categories while the whole training set has m pedestrian categories, the bag category label constrains Y, and the probability vector of the pedestrian category label of each picture x_j is:
Y_j[k] = 1/n if category k appears in l, and Y_j[k] = 0 otherwise, for k = 1, …, m (1)
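The bag constraint of Eq. (1) can be sketched numerically. The sketch below assumes the reconstruction above (uniform mass over the n in-bag classes); `bag_constrained_prior` is a hypothetical helper name.

```python
import numpy as np

def bag_constrained_prior(bag_classes, m):
    """Probability vector Y_j of Eq. (1): uniform over the n pedestrian
    classes listed in the bag label l, zero for all other classes."""
    Y = np.zeros(m)
    idx = sorted(bag_classes)
    Y[idx] = 1.0 / len(idx)
    return Y

# a bag whose label contains classes {1, 4} out of m = 6 training classes
Y = bag_constrained_prior({1, 4}, m=6)
```

Every picture in the same bag starts from the same constrained prior; the network predictions then sharpen it per picture.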
2. Weakly supervised pedestrian re-identification based on differentiable graph learning
Graph-model pedestrian re-identification
As shown in FIG. 1, a directed graph is defined in which each node represents a picture x_i in a bag and each edge represents the relationship between two pictures. The energy function for assigning pedestrian category labels y to the nodes x of the graph is:
E(y|x) = Σ_{i∈U} Φ(y_i|x_i) + Σ_{(i,j)∈V} Ψ(y_i, y_j|x_i, x_j) (2)
where U and V denote the nodes and edges respectively, Φ(y_i|x_i) is a unary term giving the cost of assigning label y_i to picture x_i, and Ψ(y_i, y_j|x_i, x_j) is a pairwise term giving the penalty for the labels assigned to the picture pair (x_i, x_j). Minimizing Eq. (2) suppresses the false labels produced by weakly supervised learning.
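Given precomputed unary and pairwise costs, the energy of a labeling is a direct sum, which the sketch below makes concrete. The cost tables and function names are hypothetical illustrations of Eq. (2), not the patent's implementation.

```python
def graph_energy(labels, unary, pairwise, edges):
    """E(y|x) of Eq. (2): unary cost Phi summed over the nodes U plus
    pairwise cost Psi summed over the directed edges V."""
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    e += sum(pairwise(labels[i], labels[j]) for i, j in edges)
    return e

# two pictures, two candidate labels; Potts-style pairwise penalty
unary = [[0.0, 1.0], [1.0, 0.0]]
potts_pair = lambda yi, yj: 0.5 if yi != yj else 0.0
e_diff = graph_energy([0, 1], unary, potts_pair, edges=[(0, 1)])  # 0.0 + 0.0 + 0.5
e_same = graph_energy([0, 0], unary, potts_pair, edges=[(0, 1)])  # 0.0 + 1.0 + 0.0
```

Here the labeling [0, 1] wins because its lower unary cost outweighs the disagreement penalty.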
Unary term
The unary term in Eq. (2) is defined as:
Φ(y_i|x_i) = -log(Y_i[y_i]), where Y_i = Y_i ⊙ P_i (3)
where P_i is the pedestrian-category probability computed by the neural network for picture x_i, Y_i is the bag constraint expressed by Eq. (1), ⊙ denotes the element-wise product, and [·] denotes vector indexing.
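A numerical sketch of the unary term in Eq. (3) follows. Renormalizing the element-wise product to sum to one, and the `eps` guard, are added assumptions; the text only specifies the product itself.

```python
import numpy as np

def unary_cost(Y_bag, P_net, y, eps=1e-12):
    """Phi(y|x_i) = -log(Y_i[y]) with Y_i = Y_bag ⊙ P_i, per Eq. (3).
    Renormalization of Y_i is an assumption added for illustration."""
    Yi = Y_bag * P_net          # element-wise product: bag prior x network
    Yi = Yi / (Yi.sum() + eps)  # assumed renormalization
    return float(-np.log(Yi[y] + eps))

Y_bag = np.array([0.5, 0.5, 0.0])      # bag constraint from Eq. (1)
P_net = np.array([0.2, 0.6, 0.2])      # network class probabilities
cost = unary_cost(Y_bag, P_net, y=1)   # Yi = [0.25, 0.75, 0.0]
```

Note how the bag prior zeroes out class 2 entirely: labels outside the bag get infinite cost regardless of the network score.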
Pairwise term
Because the unary terms of different pictures are computed independently, they are unstable on their own and need to be smoothed by a pairwise term:
Ψ(y_i, y_j|x_i, x_j) = ζ(y_i, y_j) · Y_i[y_i] · Y_j[y_j] · exp(-||x_i - x_j||² / (2σ²)) (4)
where the appearance similarity is computed with a Gaussian kernel over RGB colors, the hyperparameter σ controls the kernel bandwidth, and pictures with similar appearance are constrained to share the same label; Y_i[y_i] Y_j[y_j] expresses the bag constraint. The label compatibility ζ(y_i, y_j) is expressed with the Potts model:
ζ(y_i, y_j) = 1 if y_i ≠ y_j, and 0 otherwise (5)
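The two ingredients of the pairwise term, Potts label compatibility and Gaussian-kernel appearance similarity, can each be sketched in a few lines. The stand-in feature vectors for RGB appearance are an assumption for illustration.

```python
import numpy as np

def potts(yi, yj):
    """Label compatibility of Eq. (5): penalize only disagreement."""
    return 1.0 if yi != yj else 0.0

def appearance_similarity(xi, xj, sigma=1.0):
    """Gaussian kernel over appearance features; the hyperparameter
    sigma controls the kernel bandwidth."""
    return float(np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2)))

a = np.array([0.9, 0.1, 0.1])  # stand-in RGB features of two pictures
b = np.array([0.9, 0.1, 0.1])
sim = appearance_similarity(a, b)  # identical appearance
```

Multiplying the two (together with the bag-constraint factor) recovers the pairwise penalty of Eq. (4): visually similar pictures pay the full Potts penalty when their labels disagree.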
bag restraint
In effect, the bag category label contains additional information to improve the generation of the pseudo label: correcting the estimated pseudo tag to be the pedestrian classification with the highest predictive score in the bag; causing a partial picture to be assigned to a pedestrian category that is not predicted.
Inference of pseudo pedestrian category labels
The pseudo pedestrian category label of each picture is obtained by minimizing Eq. (2):
ŷ = argmin_{y ∈ {1,2,3,…,m}^p} E(y|x) (6)
where {1, 2, 3, …, m} denotes all pedestrian categories in the training set.
Differentiable graph learning
Existing weakly supervised pedestrian re-identification methods are not trained end-to-end, because an external graph model is needed to obtain pseudo pedestrian category labels for supervising the training of the re-identification deep neural network. The minimization of Eq. (2) that yields the pseudo labels is not differentiable, making the graph model incompatible with the deep neural network; Eq. (2) therefore needs to be relaxed, and the discrete Φ and Ψ are replaced by continuous counterparts in Eqs. (8) and (9). The difference between Eq. (8) and Eq. (3) is that in the non-differentiable model every possible labeling y must be fed into the energy function and the lowest-energy y taken as the optimal solution, whereas in the differentiable graph model the picture x is fed directly into the deep neural network to obtain a prediction of y. The difference between Eq. (9) and Eq. (4) is that the cross-entropy term -(Y_i P_i)^T log(Y_j P_j) approximates the non-differentiable term ζ(y_i, y_j) Y_i Y_j in Eq. (4).
3. Overall neural network structure
FIG. 2 shows the network structure for training and inference, with dashed lines representing the training data flow and solid lines the inference data flow; the graph model participates only in the training phase. The overall structure comprises three main modules:
feature extraction module
As shown in FIG. 2 (a), the last layer of the original ResNet-50 is removed using ResNet-50 as the backbone network, and replaced with a fully connected layer with 512-dimensional output, a batch normalization, a leaky linear rectification function, and a dropout.
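The replacement head just described (512-d fully connected layer, batch normalization, leaky ReLU, dropout) can be sketched in plain NumPy. The input shapes, the 0.01 leaky slope, and inverted dropout are assumptions; in the real model this head sits on top of the ResNet-50 backbone features.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_head(feats, W, b, gamma, beta, p_drop=0.5, train=False, eps=1e-5):
    """FC -> batch norm -> leaky ReLU -> dropout, as in FIG. 2(a)."""
    z = feats @ W + b                              # fully connected, 512-d out
    mu, var = z.mean(axis=0), z.var(axis=0)        # batch statistics
    z = gamma * (z - mu) / np.sqrt(var + eps) + beta
    z = np.where(z > 0.0, z, 0.01 * z)             # leaky ReLU, assumed slope
    if train:                                      # inverted dropout
        keep = rng.random(z.shape) >= p_drop
        z = z * keep / (1.0 - p_drop)
    return z

feats = rng.normal(size=(4, 8))                    # stand-in backbone features
W = rng.normal(size=(8, 512)) * 0.1
out = feature_head(feats, W, np.zeros(512), np.ones(512), np.zeros(512))
```

In inference mode (`train=False`) the dropout branch is skipped, matching the usual train/test distinction.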
Coarse pedestrian re-identification module
As shown in FIG. 2(b), a fully connected layer whose output dimension equals the number of pedestrian categories is added on top of the feature extraction module, with softmax cross entropy as the loss function. The pedestrian category prediction score serves as a coarse re-identification estimate, representing the pedestrian-category probabilities of the pictures in bag b.
Refined pedestrian re-identification module
As shown in FIG. 2(c), the coarse re-identification scores, the appearance similarity, and the bag constraint are fed into the graph model according to Eqs. (8) and (9); the pseudo labels generated by the graph model can then be used to update the network parameters just like manually annotated real labels.
4. Optimization
Once the pseudo pedestrian category labels are obtained, the gradient of the total loss with respect to the deep-neural-network parameters can be computed and propagated back through all layers of the network, realizing integrated training of all parameters of the weakly supervised model.
Loss function
The optimization objective of the method comprises the graph-model loss L_graph and the classification (re-identification) loss L_cls. L_cls is a softmax cross-entropy loss supervised by the pseudo labels ŷ:
L_cls = -(1/n) Σ_{i=1}^{n} onehot(ŷ_i)^T log(P_i) (10)
where onehot(·) converts ŷ_i into a one-hot vector, n is the number of pictures in a bag, and P_i is the pedestrian-category probability computed by the neural network, a softmax (normalized exponential) of the network output logits z:
P_i[k] = exp(z_k) / Σ_{c=1}^{m} exp(z_c) (11)
where m is the number of pedestrian categories in the training set.
The total loss function L is a linear combination of these two loss functions:
L = w_cls · L_cls + w_graph · L_graph (12)
where w_cls and w_graph are the weights of the two losses, set in this method to 1 and 0.5 respectively.
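The loss computation of Eqs. (10) to (12) can be sketched in NumPy: softmax over the m class logits, cross entropy against one-hot pseudo labels, and the weighted combination. The graph-loss value used here is a stand-in scalar, since computing the real L_graph requires the full graph model.

```python
import numpy as np

def softmax(z):
    """Eq. (11): normalized exponential of the network logits z."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for stability
    return e / e.sum(axis=-1, keepdims=True)

def classification_loss(logits, pseudo_labels):
    """Eq. (10): cross entropy against one-hot pseudo labels, averaged
    over the n pictures of a bag."""
    P = softmax(logits)
    n = len(pseudo_labels)
    return float(-np.mean(np.log(P[np.arange(n), pseudo_labels])))

def total_loss(cls_loss, graph_loss, w_cls=1.0, w_graph=0.5):
    """Eq. (12): linear combination with the weights stated in the text."""
    return w_cls * cls_loss + w_graph * graph_loss

logits = np.array([[2.0, 0.0], [0.0, 2.0]])   # n = 2 pictures, m = 2 classes
L_cls = classification_loss(logits, [0, 1])
L = total_loss(L_cls, graph_loss=0.2)         # 0.2 is a stand-in graph loss
```

With both pictures confidently predicting their pseudo labels, L_cls reduces to log(1 + e^-2) per picture, and the total loss adds half the graph loss on top.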
The same or similar reference numerals correspond to the same or similar components;
the positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent;
it is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.
Claims (8)
1. A weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning, characterized by comprising the following steps:
s1: grouping pedestrian pictures into bags according to shooting time periods and assigning bag category labels;
s2: capturing the dependency relationships among the pictures in each bag to generate a pseudo pedestrian category label for each picture in the bag, used as supervision information for training the pedestrian re-identification model;
s3: performing integrated training of the pedestrian re-identification model and the graph model;
s4: taking a linear combination of the graph-model loss and the re-identification loss as the total loss function and updating the parameters of all network layers with the back-propagation algorithm;
the process of the step S2 is as follows:
if only the bag type label l is available for the weak supervision pedestrian re-identification, estimating a pseudo pedestrian type label for each picture, and representing the pseudo pedestrian type label by a probability vector Y; assuming that the bag under the class label comprises n pedestrian classes, the whole training set has m pedestrian classes, and the bag class label is used for limiting Y, each picture x j The probability vector for a pedestrian category label is:
the process of the step S3 is as follows:
defining a directed graph, each node representing a picture x in a bag i Each edge represents a relationship between pictures, and the energy function of assigning the pedestrian category label y to the node x on the graph is as follows:
wherein U and V represent nodes and edges, respectively, Φ (y i |x i ) Is calculated as picture x i Dispensing label y i Is a term of the cost of (a), ψ (y i ,y j |x i ;x j ) Is calculated as a picture pair (x i ,x j ) Assigning a penalty pair term for the tag, equation (2) eliminates false tags generated by weak supervised learning;
the univariate term in equation (2) is defined as:
Φ(y i |x i )=-log(Y i [y i ]) Wherein y=y i ⊙P i (3)
Wherein P is i Is the neural network as the picture x i The calculated probability of the pedestrian category label represents network output; y is Y i Is the bag constraint expressed by formula (1), and by which is the element-wise product, []Representing a vector index;
because the output of the univariate items of different pictures are independent, the univariate items are unstable, and the smoothing of paired items is needed:
the method comprises the steps of calculating appearance similarity by using a Gaussian kernel based on RGB colors, controlling the size of the Gaussian kernel by using a super parameter sigma, and limiting pictures with similar appearance to have the same label; y is Y i [y i ]Y j [y j ]Indicating that the bag is to be restrained,representing appearance similarity;
label compatibility ζ (y) i ,y j ) Expressed in the glass's model:
2. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning according to claim 1, wherein the specific process of step S1 is:
let b denote a bag containing p pictures, i.e., b = {x_1, x_2, …, x_j, …, x_p}; y = {y_1, y_2, …, y_j, …, y_p} are the pedestrian category labels, and l denotes the bag category label.
3. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning according to claim 1, wherein the bag category label contains additional information that improves pseudo-label generation: the estimated pseudo labels are corrected to the pedestrian categories with the highest prediction scores inside the bag, which prevents pictures from being assigned to pedestrian categories that do not occur in the bag.
4. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning according to claim 3, wherein the pseudo pedestrian category label of each picture is obtained by minimizing Eq. (2):
ŷ = argmin_{y ∈ {1,2,3,…,m}^p} E(y|x) (6)
where {1, 2, 3, …, m} denotes all pedestrian categories in the training set.
5. The weakly supervised training method for a pedestrian re-identification model based on differentiable graph learning according to claim 4, wherein in step S3, before the integrated training of the re-identification model and the graph model, the graph model is made differentiable, specifically:
using an external graph model to obtain pseudo pedestrian category labels for supervising the training of the re-identification deep neural network is problematic because the minimization of Eq. (2) is not differentiable, making the graph model incompatible with the deep neural network; Eq. (2) therefore needs to be relaxed, and the discrete Φ and Ψ are replaced by continuous counterparts in Eqs. (8) and (9); the difference between Eq. (8) and Eq. (3) is that in the non-differentiable model every possible labeling y must be fed into the energy function and the lowest-energy y taken as the optimal solution, whereas in the differentiable graph model the picture x is fed directly into the deep neural network to obtain a prediction of y; the difference between Eq. (9) and Eq. (4) is that the cross-entropy term -(Y_i P_i)^T log(Y_j P_j) approximates the non-differentiable term ζ(y_i, y_j) Y_i Y_j in Eq. (4).
6. The weak supervision training method based on the micro-image learning pedestrian re-recognition model according to claim 5, wherein in the step S4, the image model loses L Drawing of the figure And a loss of classification/re-identification L Classification ,L Classification Is a pseudo tagAs a supervised normalized exponential cross entropy loss function:
wherein the method comprises the steps ofThe representation will->Function converted into a single thermal vector, n representing the number of pictures in a bag, P i The probability representing the pedestrian category calculated by the neural network is a normalized exponential function of the logarithm of the network output z:
where m represents the number of pedestrian categories in the training set; the total loss function L is a linear combination of the two loss functions:
$L = w_{cls} L_{cls} + w_{graph} L_{graph}$ (12)
where $w_{cls}$ and $w_{graph}$ represent the weights of the two losses, respectively.
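The pseudo-label classification loss of claim 6 can be sketched as follows, assuming integer pseudo labels per picture; the helper names (`onehot`, `classification_loss`) are illustrative, not the patent's:

```python
import numpy as np

def softmax(z):
    # numerically stable normalized exponential over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def onehot(labels, m):
    # convert integer pseudo labels into one-hot vectors over m categories
    out = np.zeros((len(labels), m))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def classification_loss(z, pseudo_labels, m):
    """Softmax cross-entropy supervised by pseudo pedestrian-category labels,
    averaged over the n pictures in a bag.

    z: (n, m) network logits; pseudo_labels: length-n integer labels.
    """
    P = softmax(z)                # (n, m) category probabilities
    Y = onehot(pseudo_labels, m)  # (n, m) one-hot pseudo labels
    n = z.shape[0]
    return -float((Y * np.log(P + 1e-12)).sum()) / n
```

The loss is small when the network's highest-probability category matches the pseudo label for each picture in the bag, and grows as predictions disagree with the pseudo labels.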
7. The weakly supervised training method based on the differentiable-graph-learning pedestrian re-identification model according to claim 6, wherein $w_{cls}$ is set to 1.
8. The weakly supervised training method based on the differentiable-graph-learning pedestrian re-identification model according to claim 7, wherein $w_{graph}$ is set to 0.5.
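With the weight values from claims 7 and 8, equation (12) reduces to a simple weighted sum; a minimal sketch (function and parameter names assumed):

```python
def total_loss(loss_cls, loss_graph, w_cls=1.0, w_graph=0.5):
    # L = w_cls * L_cls + w_graph * L_graph, per equation (12);
    # defaults follow claim 7 (w_cls = 1) and claim 8 (w_graph = 0.5)
    return w_cls * loss_cls + w_graph * loss_graph
```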
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011303629.5A CN112395997B (en) | 2020-11-19 | 2020-11-19 | Weak supervision training method based on pedestrian re-recognition model capable of micro-graph learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112395997A CN112395997A (en) | 2021-02-23 |
CN112395997B true CN112395997B (en) | 2023-11-24 |
Family
ID=74605913
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112395997B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949630B (en) * | 2021-03-01 | 2024-03-19 | 北京交通大学 | Weak supervision target detection method based on frame hierarchical screening |
CN113128410A (en) * | 2021-04-21 | 2021-07-16 | 湖南大学 | Weak supervision pedestrian re-identification method based on track association learning |
CN113688781B (en) * | 2021-09-08 | 2023-09-15 | 北京邮电大学 | Pedestrian re-identification anti-attack method capable of shielding elasticity |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019136946A1 (en) * | 2018-01-15 | 2019-07-18 | 中山大学 | Deep learning-based weakly supervised salient object detection method and system |
CN110942025A (en) * | 2019-11-26 | 2020-03-31 | 河海大学 | Unsupervised cross-domain pedestrian re-identification method based on clustering |
CN111445488A (en) * | 2020-04-22 | 2020-07-24 | 南京大学 | Method for automatically identifying and segmenting salt body through weak supervised learning |
CN111723645A (en) * | 2020-04-24 | 2020-09-29 | 浙江大学 | Multi-camera high-precision pedestrian re-identification method for in-phase built-in supervised scene |
CN111860678A (en) * | 2020-07-29 | 2020-10-30 | 中国矿业大学 | Unsupervised cross-domain pedestrian re-identification method based on clustering |
Non-Patent Citations (1)
Title |
---|
Weakly Supervised Image Semantic Segmentation Based on Deep Convolutional Neural Networks; Zheng Baoyu; Wang Yu; Wu Jinwen; Zhou Quan; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), No. 05, pp. 5-16 * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB03 | Change of inventor or designer information | Inventors after: Nie Lin; Zhang Jiqi; Lin Jing; Wang Guangrun; Wang Guangcong. Inventors before: Zhang Jiqi; Lin Jing; Nie Lin; Wang Guangrun; Wang Guangcong. |
| GR01 | Patent grant | |