CN113128456B - Pedestrian re-identification method based on combined picture generation - Google Patents

Pedestrian re-identification method based on combined picture generation

Info

Publication number
CN113128456B
CN113128456B (application CN202110485010.9A)
Authority
CN
China
Prior art keywords
pedestrian
picture
model
characteristic information
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110485010.9A
Other languages
Chinese (zh)
Other versions
CN113128456A (en
Inventor
苏迪
张�成
王少博
邱语聃
冀瑞静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110485010.9A priority Critical patent/CN113128456B/en
Publication of CN113128456A publication Critical patent/CN113128456A/en
Application granted granted Critical
Publication of CN113128456B publication Critical patent/CN113128456B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a pedestrian re-identification method based on joint picture generation. The method re-identifies pedestrian pictures based on a generative adversarial network (GAN). During training of the GAN, an unsupervised-style learning scheme is used: a teacher model predicts the identity (feature information) of pedestrian pictures and guides the GAN's feature encoder, so that the information in generated samples is used more fully, the quality of the model is improved, and a higher-accuracy student model is obtained. In addition, the invention uses the idea of triplet loss to augment the data, reducing intra-class differences and increasing inter-class differences, which further improves the identification accuracy.

Description

Pedestrian re-identification method based on joint picture generation
Technical Field
The invention relates to the technical field of computer vision, and in particular to a pedestrian re-identification method based on joint picture generation.
Background
Pedestrian re-identification means retrieving, given a pedestrian picture to be searched for, other pictures of the same pedestrian from a database of pictures captured by non-overlapping cameras; it is generally treated as a metric learning problem. The difficulty is that pictures taken by different cameras show severe intra-class differences in pedestrian appearance, pose, background and other information, while different pedestrians can show a certain inter-class similarity under particular viewing angles. With the development of deep convolutional networks in recent years, pedestrian re-identification has progressed greatly by representing pedestrian pictures more robustly and with stronger discriminative ability, even outperforming human recognition on some datasets.
To reduce the effect of intra-class differences, some methods segment the pedestrian picture into body parts and then either match the parts or embed the part features directly into the pedestrian feature representation. Although robust metric learning strategies mitigate the problem of insufficient data to some extent, introducing additional pedestrian pictures can directly increase the accuracy of the model. With the development of generative adversarial networks, generative models are widely used to augment data, and some methods expand the original dataset by generating new samples. However, sample-generation methods suffer from problems such as generated samples carrying little discriminative information, low identity fidelity of the samples, and separation between the generative model and the re-identification model.
In the prior art, pose or semantic information is usually introduced to generate samples. For example, FD-GAN (Pose-guided Feature Distilling GAN for Robust Person Re-identification), published at the computer vision conference NeurIPS 2018, extracts identity information from a pedestrian picture and uses a GAN, combined with the human pose, to generate pedestrian pictures with the corresponding pose. Joint Discriminative and Generative Learning for Person Re-identification, published at the computer vision conference CVPR 2019, discloses a joint learning framework in which a pedestrian identity feature encoder is embedded into the pedestrian re-identification model, while a pre-trained pedestrian re-identification model serves as a teacher model to assign pseudo-labels to generated samples. In practical applications, however, a pre-trained model with sufficiently high recognition accuracy does not necessarily exist.
Disclosure of Invention
In view of this, the invention provides a pedestrian re-identification method with joint picture generation, which generates unlabeled samples with an encoder-decoder structure based on a generative adversarial network (GAN) and introduces a teacher model to predict the identities of the generated samples. It requires no additional annotation information or pre-trained model, can improve identification accuracy, and improves the applicability of the pedestrian re-identification algorithm in different scenes.
The pedestrian re-identification method with joint picture generation uses a generative adversarial network to perform picture generation and re-identification, the generative adversarial network comprising a feature encoder, a structure encoder, a generator and a discriminator. The method further comprises a teacher model for identifying the feature information of an input pedestrian picture. The feature encoder serves as the student model, and the structure of the teacher model is identical to that of the feature encoder. The parameters of the teacher model are updated as follows: the teacher model's parameters at the current time are the weighted sum of the student model's parameters at the current time and the teacher model's parameters at the previous time, and the teacher model's parameters are initialized to the initial parameter values of the feature encoder;
the loss function of the generative adversarial network additionally includes a consistency loss between the feature information identified by the teacher model at the current time and the feature information identified by the feature encoder at the current time;
and, for a pedestrian picture to be identified, the feature encoder in the generative adversarial network extracts the picture's feature information, which is compared with the sample set data to complete identification of the pedestrian.
Preferably, the teacher model's parameters at the current time are an exponentially weighted moving average of the teacher model's parameters at the previous time and the feature encoder's parameters at the current time.
Preferably, the loss function of the generative adversarial network further includes a triplet loss $L_{trip}$:

$L_{trip} = \mathbb{E}\big[\max\big(0,\ d(E_f(\hat{x}_j^i), E_f(x_i)) - d(E_f(\hat{x}_j^i), E_f(x_j)) + m\big)\big]$

where $\hat{x}_j^i$ is the picture generated by reconstructing the feature information $f_i$ of pedestrian picture $x_i$ with the structure code $s_j$ of a different-identity pedestrian picture $x_j$; $E_f$ is the feature encoder; $d$ is a distance; and $m$ is a set value.
Preferably, after training of the generative adversarial network is complete, the feature information of the pedestrian picture to be identified is obtained directly by the teacher model and compared with the sample set data to complete identification of the pedestrian.
Advantages:
(1) The method re-identifies pedestrian pictures based on a generative adversarial network (GAN). During training of the GAN, an unsupervised-style learning scheme is used: a teacher model predicts the identity (feature information) of pedestrian pictures and guides the GAN's feature encoder, so that the information in generated samples is used more fully, the quality of the model is improved, and a higher-accuracy student model is obtained.
(2) The teacher model updates its parameters by weighted averaging rather than iterative training, which saves training time and reduces the complexity of the whole model.
(3) The invention uses the idea of triplet loss to augment the data, reducing intra-class differences and increasing inter-class differences, which further improves the identification accuracy.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic view of the overall model structure of the present invention;
FIG. 3 is a schematic structural diagram of the average teacher model of the present invention.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention provides a pedestrian re-identification method with joint picture generation, whose flow chart is shown in FIG. 1. The method comprises the following steps:
Step 1. Read data from the sample dataset: obtain each pedestrian picture $x_i$ ($x_i \in X$, $i = 1, 2, \dots, N$, with N the total number of pictures in the sample dataset) and its labeled identity $y_k$ ($y_k \in Y$, $k = 1, 2, \dots, K$, with K the total number of identity categories in the sample dataset).
Step 2. Construct the generative adversarial network (GAN):
The GAN comprises an encoder E, a generator G and a discriminator D, where the encoder consists of a structure encoder $E_s$ and a feature encoder $E_f$; the overall structure of the network is shown in FIG. 2.
The labeled pedestrian picture $x_i$ read in Step 1 is input to the structure encoder $E_s$ and the feature encoder $E_f$, which respectively extract the picture's structure information $s_i$ and feature information $f_i$. The structure information covers the picture background, the pedestrian pose and the like; the feature information covers information relevant to identity recognition. The generator G then reconstructs pictures; the generator's picture reconstruction comprises self-reconstruction and cross reconstruction:
(1) Self-reconstruction:
pedestrian picture x i Self-reconstruction is achieved by two methods:
(1-1) The feature information $f_i$ and structure information $s_i$ of pedestrian picture $x_i$ are recombined to generate the reconstructed picture $\hat{x}_i = G(f_i, s_i)$, as shown in the second row of pictures in FIG. 2. The constraint on this reconstruction is that the reconstructed picture keeps the identity of the original pedestrian picture and is consistent with it pixel-wise, so the loss can be computed by pixel-level consistency, i.e. the reconstruction uses the $\ell_1$ loss:

$L_{recon}^{img1} = \mathbb{E}\big[\| x_i - G(f_i, s_i) \|_1\big]$  (1)

where $G$ denotes the generator, $\mathbb{E}$ the expectation, and $\|\cdot\|_1$ the $\ell_1$ loss calculation.
(1-2) Using a picture $x_k$ that has the same pedestrian identity as $x_i$ but different structure information, the structure information $s_i$ of $x_i$ and the feature information $f_k$ of $x_k$ are recombined through $G$ to generate the picture $\hat{x}_i^k = G(f_k, s_i)$ with feature information $f_k$ and structure information $s_i$, as illustrated in the third row of pictures in FIG. 2. Similarly, the constraint is that the generated picture keeps the same identity as the original pedestrian picture and is consistent with it pixel-wise, so the loss again uses the $\ell_1$ method:

$L_{recon}^{img2} = \mathbb{E}\big[\| x_i - G(f_k, s_i) \|_1\big]$  (2)
the above two reconstruction methods serve as main constraints on the generator G, which facilitates aggregation of feature information of the same identity and reduction of differences in features within classes.
Meanwhile, the invention introduces an identity loss to keep the feature information of different identities separated:

$L_{id} = \mathbb{E}\big[-\log p(y_i \mid x_i)\big]$  (3)

where $p(y_i \mid x_i)$ is the predicted probability, based on the feature information, that picture $x_i$ has its true identity $y_i$. All logarithms here and below are base e.
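The identity loss (3) is an ordinary negative log-likelihood over the predicted identity distribution. A small sketch under stated assumptions (the probability vector is hypothetical; natural log as noted above):

```python
import numpy as np

def identity_loss(probs: np.ndarray, true_id: int) -> float:
    """Identity loss of Eq. (3): -log p(y_i | x_i), where probs is the
    predicted identity probability distribution and log is base e."""
    return float(-np.log(probs[true_id]))

# A confident correct prediction gives a small loss:
print(identity_loss(np.array([0.9, 0.05, 0.05]), 0))  # ~0.105
```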
(2) Cross reconstruction:
Cross reconstruction mixes two pedestrian pictures of different identities to generate one picture: pedestrian picture $x_i$ provides the feature information $f_i$ encoded by the feature encoder $E_f$, and a different-identity pedestrian picture $x_j$ provides the structure information $s_j$ encoded by the structure encoder $E_s$; the generated picture is $\hat{x}_j^i = G(f_i, s_j)$, i.e. the feature information $f_i$ of $x_i$ is kept on the structure information $s_j$ of $x_j$, as illustrated in the first row of pictures in FIG. 2. Since no ground-truth picture exists, the loss cannot be computed by pixel-wise consistency. Instead, the feature information and structure information of the generated picture are re-encoded, with the constraint that they stay consistent with the original inputs, so that the generated picture shares partial identity characteristics with the pedestrian picture its feature information came from. The losses on the feature information and the structure information of the reconstructed picture $\hat{x}_j^i$ are computed respectively as:

$L_{recon}^{f} = \mathbb{E}\big[\| f_i - E_f(G(f_i, s_j)) \|_1\big]$  (4)

$L_{recon}^{s} = \mathbb{E}\big[\| s_j - E_s(G(f_i, s_j)) \|_1\big]$  (5)

Likewise, an identity loss is used to keep the feature information separated by identity:

$L_{id}^{gen} = \mathbb{E}\big[-\log p(y_i \mid \hat{x}_j^i)\big]$  (6)

where $p(y_i \mid \hat{x}_j^i)$ is the predicted probability, based on the feature information, that the generated picture $\hat{x}_j^i$ has the true identity $y_i$ of the feature source.
Finally, the discriminator D judges the authenticity of the generated picture. Since generating pictures with a GAN requires the generated pictures to look real, an adversarial loss is introduced:

L_adv = E[log D(x_i) + log(1 - D(G(f_i, s_j)))]  (7)
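Eq. (7) is the standard GAN value function evaluated on discriminator scores. A numeric sketch, where the score arrays are made-up stand-ins for D's outputs in (0, 1):

```python
import numpy as np

def adversarial_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Adversarial loss of Eq. (7): E[log D(x_i) + log(1 - D(G(f_i, s_j)))].
    d_real holds discriminator scores on real pictures, d_fake on generated
    ones; both are assumed to lie strictly in (0, 1)."""
    return float(np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

# At the classic equilibrium D(.) = 0.5 the value equals -2*log(2) ~ -1.386.
print(adversarial_loss(np.array([0.5]), np.array([0.5])))
```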
for cross reconstruction, because the identity information of the generated cross identity picture is difficult to define, the invention not only assumes the identity source is consistent with the identity source of the feature information, but also predicts the identity by using a teacher model and guides the recognition of a feature encoder in the GAN.
The feature encoder in the GAN is taken as the student model, and a teacher model with the same structure as the feature encoder is constructed; the teacher model's initial parameters are those of the feature encoder. During GAN training, the teacher model's parameters are updated alongside the student model's, but not by loss-function optimization as the student model's are: each teacher update is the weighted sum of the teacher model's parameters at the previous time and the student model's parameters at the current time.
In this embodiment, the teacher model parameter $\theta'_t$ at the current time is computed from the student model parameter $\theta_t$ at the current time and the teacher model parameter $\theta'_{t-1}$ at the previous time, i.e.:
θ′ t =αθ′ t-1 +(1-α)θ t (8)
where α is a smoothing coefficient.
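The update (8) is an exponential moving average: no gradients flow into the teacher. A sketch on flat parameter vectors; the value α = 0.999 is an illustrative default, not one stated in the patent:

```python
import numpy as np

def ema_update(theta_teacher: np.ndarray, theta_student: np.ndarray,
               alpha: float = 0.999) -> np.ndarray:
    """Teacher parameter update of Eq. (8):
    theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t."""
    return alpha * theta_teacher + (1.0 - alpha) * theta_student

# The teacher drifts slowly toward the student:
print(ema_update(np.array([1.0]), np.array([0.0]), alpha=0.9))  # [0.9]
```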
The average-weight teacher model can replace a pre-trained re-identification model for labeling the identities of generated pictures; since the teacher model is not updated through a loss function, the complexity of the whole model is reduced. The structure of the average-weight teacher model is shown in FIG. 3.
The invention considers that, early in training, the teacher model has higher pedestrian re-identification accuracy than the student model. To make the student model's predictions consistent with the teacher model's, an $\ell_2$ loss forms a consistency loss between the two models' predictions:

$L_{cons} = \big\| p_t(\hat{x}_j^i) - p_s(\hat{x}_j^i) \big\|_2^2$  (9)

where $p_t$ and $p_s$ are the identity probability distributions predicted by the teacher model and the student model, respectively.
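The consistency loss (9) is a squared $\ell_2$ distance between the two predicted identity distributions. A minimal sketch with hypothetical distributions:

```python
import numpy as np

def consistency_loss(p_teacher: np.ndarray, p_student: np.ndarray) -> float:
    """Consistency loss of Eq. (9): squared l2 distance between the teacher's
    and the student's predicted identity probability distributions."""
    return float(np.sum((p_teacher - p_student) ** 2))

# Identical predictions give zero loss; opposite one-hot predictions give 2.
print(consistency_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 2.0
```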
In addition, during cross reconstruction, this embodiment also applies a triplet loss to the feature information of pictures $x_i$, $x_j$ and $\hat{x}_j^i$. Because the generated picture $\hat{x}_j^i$ is expected to keep the feature information of $x_i$ under the structure of $x_j$, it is inevitably somewhat similar to $x_j$ in features. To strengthen the identity discrimination of the generated picture $\hat{x}_j^i$, a feature separation loss is introduced: in a triplet-like manner, the margin between the generated picture's feature distances to the two pedestrian identity pictures is required to exceed m, forming the feature separation loss function.
Specifically, taking the three pictures as a triplet and the feature encoder as the mapping function, the feature triplet loss is obtained:

$L_{trip} = \mathbb{E}\big[\max\big(0,\ d(E_f(\hat{x}_j^i), E_f(x_i)) - d(E_f(\hat{x}_j^i), E_f(x_j)) + m\big)\big]$  (10)

where $d(E_f(\hat{x}_j^i), E_f(x_i))$ is the distance between the feature information of $\hat{x}_j^i$ and $x_i$, $d(E_f(\hat{x}_j^i), E_f(x_j))$ is the distance between the feature information of $\hat{x}_j^i$ and $x_j$, and m is a feature distance constant.
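The feature triplet loss (10) treats the generated picture as the anchor, the feature-source picture $x_i$ as the positive and the structure-source picture $x_j$ as the negative. A sketch with Euclidean distance; the margin value and the feature vectors below are illustrative, not values from the patent:

```python
import numpy as np

def feature_triplet_loss(f_gen: np.ndarray, f_pos: np.ndarray,
                         f_neg: np.ndarray, m: float = 0.3) -> float:
    """Triplet-style feature separation loss of Eq. (10): keep the generated
    picture's features closer (by margin m) to the feature-source picture
    than to the structure-source picture."""
    d_pos = np.linalg.norm(f_gen - f_pos)  # distance to x_i's features
    d_neg = np.linalg.norm(f_gen - f_neg)  # distance to x_j's features
    return max(0.0, d_pos - d_neg + m)

# Already well-separated features incur no loss:
print(feature_triplet_loss(np.zeros(2), np.zeros(2), np.array([1.0, 0.0])))  # 0.0
```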
The total loss function is obtained by weighted addition of the loss functions above (Eqs. (1)-(7) and Eqs. (9)-(10)):

$L_{total} = \lambda_i\,(L_{recon}^{img1} + L_{recon}^{img2} + L_{recon}^{f} + L_{recon}^{s}) + \lambda_{id}\,(L_{id} + L_{id}^{gen}) + L_{adv} + \lambda_{cons} L_{cons} + L_{trip}$  (11)

where $L_{recon}^{img1}$ and $L_{recon}^{img2}$ are the self-reconstruction loss functions, $L_{recon}^{f}$ and $L_{recon}^{s}$ are the feature- and structure-information reconstruction losses of cross reconstruction, and $\lambda_i$, $\lambda_{id}$, $\lambda_{cons}$ are weighting coefficients that control the importance of each loss term.
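The total objective (11) is simply a weighted sum of the individual terms. A sketch in which the term names, values and weights are all illustrative placeholders:

```python
def total_loss(losses: dict, weights: dict) -> float:
    """Weighted sum of individual loss terms, as in Eq. (11). Terms without
    an explicit weight default to 1.0."""
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())

# Example with made-up values for the loss terms:
terms = {"recon": 0.8, "id": 1.2, "adv": -1.4, "cons": 0.1, "trip": 0.3}
print(total_loss(terms, {"id": 0.5, "cons": 5.0}))
```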
The feature encoder $E_f$, structure encoder $E_s$, generator G and discriminator D are optimized using the sample set data, finally yielding a well-trained generative adversarial network.
For a pedestrian picture to be identified, the feature encoder extracts its feature information, which is then compared with the sample set data to complete identification of the pedestrian.
To demonstrate the effectiveness of the method of the invention, comparative experiments were performed on the Market1501 and DukeMTMC-reID datasets. The results in Table 1 cover the base network, using only the average-teacher loss, using only the triplet loss, and using both. The experiments show that the method yields higher accuracy for the pedestrian re-identification model.
TABLE 1
[Table 1 appears only as an image in the source; its numeric values are not recoverable.]
The invention is also compared with published methods that report strong results; the comparison is shown in Table 2. The upper half lists supervised methods that do not use generated samples, the lower half lists methods that do; the feature encoder $E_f$ and the teacher model $E_t$ are tested separately. The experiments show that the method of the invention achieves higher accuracy among these challenging baselines.
TABLE 2
[Table 2 appears only as images in the source; its numeric values are not recoverable.]
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (2)

1. A pedestrian re-identification method with joint picture generation, which uses a generative adversarial network to perform picture generation and re-identification, the generative adversarial network comprising a feature encoder, a structure encoder, a generator and a discriminator; characterized by further comprising a teacher model for identifying the feature information of an input pedestrian picture; the feature encoder serves as a student model, and the structure of the teacher model is identical to that of the feature encoder; the parameters of the teacher model are updated as follows: the teacher model's parameters at the current time are the weighted sum of the student model's parameters at the current time and the teacher model's parameters at the previous time, and the teacher model's parameters are initialized to the initial parameter values of the feature encoder;
the loss function of the generative adversarial network further comprises a consistency loss between the feature information identified by the teacher model at the current time and the feature information identified by the feature encoder at the current time;
and, for a pedestrian picture to be identified, the feature encoder in the generative adversarial network obtains the feature information of the pedestrian picture and compares it with the sample set data to complete identification of the pedestrian.
2. The pedestrian re-identification method with joint picture generation as claimed in claim 1, wherein the loss function of the generative adversarial network further comprises a triplet loss $L_{trip}$:

$L_{trip} = \mathbb{E}\big[\max\big(0,\ d(E_f(\hat{x}_j^i), E_f(x_i)) - d(E_f(\hat{x}_j^i), E_f(x_j)) + m\big)\big]$

wherein $\hat{x}_j^i$ is the picture generated by reconstructing the feature information $f_i$ of pedestrian picture $x_i$ with the structure code $s_j$ of a different-identity pedestrian picture $x_j$; $E_f$ is the feature encoder; $d$ is a distance; and $m$ is a set value.
CN202110485010.9A 2021-04-30 2021-04-30 Pedestrian re-identification method based on combined picture generation Active CN113128456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110485010.9A CN113128456B (en) 2021-04-30 2021-04-30 Pedestrian re-identification method based on combined picture generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110485010.9A CN113128456B (en) 2021-04-30 2021-04-30 Pedestrian re-identification method based on combined picture generation

Publications (2)

Publication Number Publication Date
CN113128456A CN113128456A (en) 2021-07-16
CN113128456B true CN113128456B (en) 2023-04-07

Family

ID=76780750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110485010.9A Active CN113128456B (en) 2021-04-30 2021-04-30 Pedestrian re-identification method based on combined picture generation

Country Status (1)

Country Link
CN (1) CN113128456B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934117A (en) * 2019-02-18 2019-06-25 北京联合大学 Based on the pedestrian's weight recognition detection method for generating confrontation network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth
CN110427813B (en) * 2019-06-24 2023-06-09 中国矿业大学 Pedestrian re-recognition method of twin generation type countermeasure network based on gesture guidance pedestrian image generation
KR102225022B1 (en) * 2019-08-27 2021-03-08 연세대학교 산학협력단 Person re-identification apparatus and method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934117A (en) * 2019-02-18 2019-06-25 北京联合大学 Based on the pedestrian's weight recognition detection method for generating confrontation network

Also Published As

Publication number Publication date
CN113128456A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN109815826B (en) Method and device for generating face attribute model
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
AU2014368997B2 (en) System and method for identifying faces in unconstrained media
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
WO2022052530A1 (en) Method and apparatus for training face correction model, electronic device, and storage medium
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
CN111931908B (en) Face image automatic generation method based on face contour
CN114220154A (en) Micro-expression feature extraction and identification method based on deep learning
CN117611932B (en) Image classification method and system based on double pseudo tag refinement and sample re-weighting
CN113822790B (en) Image processing method, device, equipment and computer readable storage medium
WO2022166840A1 (en) Face attribute editing model training method, face attribute editing method and device
CN112380374B (en) Zero sample image classification method based on semantic expansion
CN114387641A (en) False video detection method and system based on multi-scale convolutional network and ViT
CN116311483B (en) Micro-expression recognition method based on local facial area reconstruction and memory contrast learning
CN112364791A (en) Pedestrian re-identification method and system based on generation of confrontation network
CN111126155B (en) Pedestrian re-identification method for generating countermeasure network based on semantic constraint
CN112784783A (en) Pedestrian re-identification method based on virtual sample
CN117351550A (en) Grid self-attention facial expression recognition method based on supervised contrast learning
CN114360051A (en) Fine-grained behavior identification method based on progressive hierarchical weighted attention network
CN114240811A (en) Method for generating new image based on multiple images
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN116935438A (en) Pedestrian image re-recognition method based on autonomous evolution of model structure
CN111461061A (en) Pedestrian re-identification method based on camera style adaptation
CN113128456B (en) Pedestrian re-identification method based on combined picture generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant