CN115205903A

CN115205903A - Pedestrian re-identification method based on an identity-migration generative adversarial network

Info

Publication number: CN115205903A
Application number: CN202210890765.1A
Authority: CN
Inventors: 朱容波; 吴天; 张�浩; 李松泉
Original assignee: Huazhong Agricultural University
Current assignee: Huazhong Agricultural University
Priority date: 2022-07-27
Filing date: 2022-07-27
Publication date: 2022-10-18
Anticipated expiration: 2042-07-27
Also published as: CN115205903B

Abstract

The invention discloses a pedestrian re-identification method based on an identity-migration generative adversarial network, comprising the following steps: acquiring a pedestrian image data set and generating the semantic map corresponding to each pedestrian image through a human parsing model; constructing an overall pedestrian re-identification model comprising a generator, a discriminator and a pedestrian re-identification network, wherein the generator and the discriminator form a generative adversarial network based on semantic-map identity migration and are trained adversarially; constructing a gradient enhancement method based on a local quality attention mechanism to improve the generative adversarial network; establishing a joint training scheme for the generative adversarial network and the pedestrian re-identification network; and inputting a pedestrian image to be identified and outputting the re-identification result through the trained pedestrian re-identification network. The invention increases the diversity of the pedestrian re-identification data set, effectively improves the quality of the generated images, and improves the identification accuracy of the pedestrian re-identification model.

Description

Pedestrian re-identification method based on an identity-migration generative adversarial network

Technical Field

The invention relates to the technical field of computer vision, in particular to a pedestrian re-identification method based on an identity-migration generative adversarial network.

Background

Pedestrian re-identification is an important task in computer vision that aims to associate pedestrian identities across cameras. It is widely used in video surveillance, security and related fields: given a query image, it retrieves images containing the person of interest from non-overlapping cameras. However, images captured by different cameras differ greatly in background, viewing angle and pose, which makes finding a target pedestrian across cameras very challenging. To cope with these differences, a model must learn feature representations from the training data that are as discriminative as possible. With the development of deep learning, many works exploit the strong representation capability of convolutional neural networks and train them with deep metric learning or classification learning, which has greatly improved identification accuracy. To further learn local features in the image, many works align pedestrian features using local information such as horizontal partitions or pose skeletons, which strengthens the representation capability of the model.

Improving the model structure is one way to raise re-identification accuracy. Another reason that re-identification models struggle to learn representations robust to differences in background, viewing angle and pose is that the data sets lack diversity and are small in scale. Pedestrians' poses vary and their backgrounds are cluttered as they move, and it is impractical to collect images of pedestrians under every condition in real scenes, so data sets can hardly cover all such variations and the diversity of pedestrian image data is insufficient. In addition, more data means higher labeling cost, which makes large-scale data sets difficult to build. With the development of generative models, in particular generative adversarial networks, more and more research augments training data sets with generative models. Some researchers synthesize new pedestrian images from random noise or pose key points to expand the re-identification data set and increase the diversity of pedestrian poses in it. However, random noise and pose key points carry too little prior information to guide the generation of pedestrian features accurately, so the generated images contain blur and artifacts and their identity features are inaccurate. Such low-quality generated images mislead feature learning during training of the pedestrian re-identification network and hinder improvement of the model's identification accuracy.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the defects in the prior art, a pedestrian re-identification method based on an identity-migration generative adversarial network.

The technical scheme adopted by the invention for solving the technical problem is as follows:

the invention provides a pedestrian re-identification method based on an identity-migration generative adversarial network, which comprises the following steps:

step 1, acquiring a pedestrian image data set, generating the semantic map corresponding to each pedestrian image through a human parsing model, which assigns a semantic category to each pixel in the pedestrian image, and dividing the pedestrian images together with their semantic maps into a training set and a test set;

step 2, constructing an overall pedestrian re-identification model comprising a generator G, a discriminator D and a pedestrian re-identification network R; the generator G comprises a structure encoder $E_s$, an identity information extractor $E_{id}$ and a decoder $G_{dec}$; the generator G and the discriminator D form a generative adversarial network based on semantic-map identity migration and are trained adversarially;

step 3, constructing a gradient enhancement method based on a local quality attention mechanism to improve the generative adversarial network;

step 4, establishing a joint training scheme for the generative adversarial network and the pedestrian re-identification network: the training set is input, the generative adversarial network outputs new generated images, and the generated images together with the pedestrian images in the training set are used to train the pedestrian re-identification network, yielding a trained overall model that is tested on the test set;

step 5, inputting a pedestrian image to be identified, and outputting the pedestrian re-identification result through the trained pedestrian re-identification network.

Further, the method in step 1 of the present invention comprises:

acquiring a pedestrian image data set in which each pedestrian has a pedestrian label, and dividing the data set by pedestrian label into a training set and a test set that share no pedestrian labels; generating the semantic map corresponding to each pedestrian image through a human parsing model, which assigns a semantic category to each pixel in the image, the generated semantic map comprising 20 semantic categories: background, hat, hair, gloves, sunglasses, jacket, one-piece dress, coat, socks, trousers, jumpsuit, scarf, skirt, face, left arm, right arm, left leg, right leg, left shoe and right shoe; grouping all semantic categories into 5 parts, namely head, upper body, lower body, shoes and background, according to their spatial positions; using the semantic map to extract the features of each part independently, thereby achieving fine-grained feature extraction; and scaling all images uniformly to a fixed pixel size before training.
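The grouping of the 20 categories into 5 parts can be sketched as a lookup table followed by mask construction. This is a minimal sketch: the text names the categories and the parts but does not list which category belongs to which part, so the assignment in `PART_OF` (for example, placing one-piece dress and jumpsuit under upper body) is an assumption, as are the names `CATEGORIES`, `PART_OF`, `PARTS` and `part_masks`.

```python
# Category names follow the order given in the text; index 0 is background.
CATEGORIES = [
    "background", "hat", "hair", "gloves", "sunglasses", "jacket",
    "one-piece dress", "coat", "socks", "trousers", "jumpsuit", "scarf",
    "skirt", "face", "left arm", "right arm", "left leg", "right leg",
    "left shoe", "right shoe",
]

PARTS = ["head", "upper body", "lower body", "shoes", "background"]

# Hypothetical assignment by rough spatial position (not given in the source).
PART_OF = {
    "background": "background",
    "hat": "head", "hair": "head", "sunglasses": "head", "face": "head",
    "gloves": "upper body", "jacket": "upper body",
    "one-piece dress": "upper body", "coat": "upper body",
    "jumpsuit": "upper body", "scarf": "upper body",
    "left arm": "upper body", "right arm": "upper body",
    "socks": "lower body", "trousers": "lower body", "skirt": "lower body",
    "left leg": "lower body", "right leg": "lower body",
    "left shoe": "shoes", "right shoe": "shoes",
}


def part_masks(semantic_map):
    """Turn an H x W map of category indices into five binary H x W masks."""
    masks = {part: [[0] * len(row) for row in semantic_map] for part in PARTS}
    for r, row in enumerate(semantic_map):
        for c, category_index in enumerate(row):
            part = PART_OF[CATEGORIES[category_index]]
            masks[part][r][c] = 1
    return masks
```

Each pixel ends up in exactly one of the five masks, which is what allows the features of each part to be extracted independently.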

Further, the method in step 2 of the present invention comprises:

the semantic-map-based identity-migration generative adversarial network consists of a structure encoder $E_s$, an identity information extractor $E_{id}$, a decoder $G_{dec}$ and a discriminator D, where $E_s$, $E_{id}$ and $G_{dec}$ together form the generator G, which forms a generative adversarial network with the discriminator D and is trained with an adversarial loss;

the training set is defined as $\{(x_n, y_n, s_n)\}_{n=1}^{N}$; each training sample consists of a pedestrian image $x_n$ of height $H$ and width $W$, the identity label $y_n \in [1, K]$ of the image, and the pedestrian's semantic map $s_n$ with $C$ semantic categories, where N represents the number of images in the data set, K the number of identities in the data set, C the number of semantic label categories, and H and W the height and width of the images, respectively;

during training of the generative adversarial network, two real samples $(x_a, y_a, s_a)$ and $(x_b, y_b, s_b)$ are randomly drawn from the training set, where $a \in [1, N]$ and $b \in [1, N]$; to migrate the identity features of image $x_a$ onto image $x_b$, the generator G first uses the identity information extractor $E_{id}$ to extract the identity information $I_a$ of image $x_a$, then uses the structure encoder $E_s$ to encode image $x_b$ and its corresponding semantic map $s_b$ into the structural feature $F_b$, and finally uses the decoder $G_{dec}$ to decode $I_a$ and $F_b$ into a new pedestrian image $\hat{x} = G_{dec}(I_a, F_b)$; that is, the generated image $\hat{x}$ has the structural features of pedestrian $y_b$ and the identity features of pedestrian $y_a$.

Further, the method for performing identity feature migration in step 2 specifically includes:

in migrating the identity features of image $x_a$ onto image $x_b$, the semantic map $s_a$ corresponding to image $x_a$ is first preprocessed; the semantic map $s_a$ contains the semantic information of pedestrian $y_a$, and all of this semantic information is grouped into 5 parts (head, upper body, lower body, shoes and background) according to spatial position, represented by the part masks $m_a^i$, $i = 1, \dots, 5$; the identity feature extraction network $E_{id}$ then extracts the identity features of each part of the pedestrian, calculated as follows:

$$(\gamma_a^i, \beta_a^i) = E_{id}(x_a \odot m_a^i)$$

during the calculation $m_a^i$ is automatically expanded to 3 dimensions, and $\odot$ denotes element-wise multiplication; $\gamma_a^i$ and $\beta_a^i$ are the affine parameters containing the identity information of each semantic part; the identity information is injected into the pedestrian image through the adaptive instance normalization operation, defined as follows:

$$\mathrm{AdaIN}(F, \gamma, \beta) = \gamma \, \frac{F - \mu(F)}{\sigma(F)} + \beta$$

where $F$ is the feature being normalized, $\mu(\cdot)$ takes the mean and $\sigma(\cdot)$ takes the standard deviation; adaptive instance normalization replaces the affine parameters of instance normalization with conditional style information in order to achieve style transformation;
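The adaptive instance normalization step can be illustrated on a single feature channel. This is a minimal sketch under stated assumptions: the channel is a flat list of activations, `gamma` and `beta` stand for the identity-conditioned affine parameters, and the small constant `eps` that guards against division by zero is an addition not mentioned in the text; the function name `adain_channel` is hypothetical.

```python
import statistics


def adain_channel(feature, gamma, beta, eps=1e-5):
    """Normalize one channel to zero mean and unit spread, then rescale and shift.

    feature: flat list of activations of one channel
    gamma, beta: affine parameters carrying the identity information
    eps: numerical-stability constant (an assumption, not from the source)
    """
    mean = statistics.fmean(feature)
    std = statistics.pstdev(feature)
    return [gamma * (value - mean) / (std + eps) + beta for value in feature]


# Usage: after the operation the channel statistics are set by gamma and beta.
styled = adain_channel([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
```

Because the channel's own mean and standard deviation are removed first, the output statistics depend only on the injected parameters, which is what lets the identity information override the original appearance.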

there are two cases of identity migration:

when the identity labels satisfy $y_a \neq y_b$, the generation is cross-identity; otherwise it is same-identity; in same-identity generation, the generated image has a corresponding real image in the training set; so that the generated image $\hat{x}$ not only acquires the identity features of pedestrian $y_a$ but also keeps clear structural features, a reconstruction loss is used to supervise training on the generated image: when the identity labels satisfy $y_a = y_b$, images $x_a$ and $x_b$ show the same pedestrian, so the generated image can be reconstructed under supervision, and the generator thereby learns complete structural information.

Further, the specific method of adversarial training in step 2 of the present invention includes:

the generator G and the discriminator D are trained adversarially so that the generated image $\hat{x}$ is more visually realistic; the adversarial loss between the generator G and the discriminator D is optimized with WGAN-GP during training, which makes the training process more stable.

Further, the method for constructing the gradient enhancement based on a local quality attention mechanism in step 3 specifically includes:

in the local quality attention mechanism, the no-reference image quality assessment model BIECON scores the non-overlapping patches of a generated image; after evaluation, each non-overlapping patch of the generated image has a score in [0, 1], where a score closer to 0 means worse quality and a score closer to 1 means better quality; the quality score of each patch is taken as the quality score of every pixel in that patch, giving a quality score matrix Q of the same size as the input; finally, the local quality attention mechanism is realized by:

$$M = 1 - Q$$

the larger a value in the attention matrix M, the worse the quality of that pixel, and the more the generator should focus on that region;

in the gradient back-propagation stage, the gradient $\Delta_D$ of the discriminator is computed from the adversarial loss and the parameters of the discriminator, and the gradient $\Delta_{\hat{x}}$ of the generated sample $\hat{x}$ is then computed from $\Delta_D$; in a standard generative adversarial network the gradient of the generated sample is used directly to update the parameters of the generator, whereas the gradient enhancement method based on local quality attention uses the attention matrix M to modify the gradient $\Delta_{\hat{x}}$ of the generated sample through an element-wise product:

$$\Delta'_{\hat{x}} = \Delta_{\hat{x}} + \alpha \, \Delta_{\hat{x}} \odot M$$

where $\alpha$ is a hyperparameter that adjusts the weight; the generator updates the parameters of the model using the modified gradient.
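The two parts of this step, expanding patch scores to a per-pixel quality matrix Q and amplifying the gradient where M = 1 - Q is large, can be sketched as follows. This is a sketch under stated assumptions: the patch scores are given directly rather than produced by BIECON, nested lists stand in for a single-channel tensor, and the update form `grad + alpha * grad * M` is assumed from the description (an element-wise product that increases the gradient of poor-quality regions, with alpha = 0.2 as in the embodiment). The names `pixel_quality` and `enhance_gradient` are hypothetical.

```python
def pixel_quality(patch_scores, patch_size):
    """Expand a grid of per-patch scores in [0, 1] into a per-pixel matrix Q."""
    q = []
    for score_row in patch_scores:
        # Repeat every patch score across the width of its patch.
        pixel_row = [score for score in score_row for _ in range(patch_size)]
        # Repeat the expanded row down the height of the patch.
        q.extend(list(pixel_row) for _ in range(patch_size))
    return q


def enhance_gradient(grad, q, alpha=0.2):
    """Return grad + alpha * grad * M element-wise, where M = 1 - Q."""
    return [
        [g + alpha * g * (1.0 - score) for g, score in zip(grad_row, q_row)]
        for grad_row, q_row in zip(grad, q)
    ]


# Usage: one poor patch (score 0.0) next to one good patch (score 1.0).
q = pixel_quality([[0.0, 1.0]], patch_size=2)
boosted = enhance_gradient([[1.0] * 4, [1.0] * 4], q)
```

In this toy input the gradient over the poor patch is scaled by 1 + alpha while the gradient over the good patch is left unchanged, so the generator's update is weighted toward the region that was scored as low quality.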

Further, the method for performing joint training in step 4 of the present invention includes:

different loss functions are adopted for the generated images and the real images; the triplet loss function is applied to training on the generated images and is defined as follows:

$$L_{tri} = \sum_{i=1}^{B} \sum_{a=1}^{E} \Big[ \gamma + \max_{p} \lVert f_a - f_p \rVert_2 - \min_{n} \lVert f_a - f_n \rVert_2 \Big]_+$$

where B and E represent the number of identities and the number of instances per identity in the mini-batch, respectively; $f_a$, $f_p$ and $f_n$ respectively represent the feature vectors of the anchor sample, the positive sample and the negative sample extracted by the pedestrian re-identification network, and $\gamma$ is the margin hyperparameter between the intra-class distance and the inter-class distance; the triplet loss pulls the anchor sample and the positive sample closer together and pushes the negative sample away from the anchor sample, thereby learning discriminative feature representations; for real images, learning uses the ID loss $\mathbb{E}[-\log p(y \mid x)]$, where x represents a real image in the training data set and $p(y \mid x)$ represents the probability that x is predicted as its true identity label y;
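The triplet loss with hard-sample mining can be sketched over a small batch of feature vectors. This is a minimal sketch, assuming Euclidean distance and averaging over anchors (the text specifies neither), with the margin defaulting to the value 0.3 reported in the embodiment; the function name `batch_hard_triplet_loss` is hypothetical.

```python
import math


def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Mean over anchors of max(0, margin + hardest-positive - hardest-negative)."""
    losses = []
    for i, (f_a, y_a) in enumerate(zip(features, labels)):
        # Distances to other samples of the same identity (positives).
        positives = [
            math.dist(f_a, f)
            for j, (f, y) in enumerate(zip(features, labels))
            if y == y_a and j != i
        ]
        # Distances to samples of every other identity (negatives).
        negatives = [
            math.dist(f_a, f) for f, y in zip(features, labels) if y != y_a
        ]
        if positives and negatives:
            losses.append(max(0.0, margin + max(positives) - min(negatives)))
    return sum(losses) / len(losses) if losses else 0.0


# Usage: two identities whose features interleave, so every anchor is violated.
loss = batch_hard_triplet_loss(
    [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (3.0, 0.0)], [0, 0, 1, 1]
)
```

Using only the farthest positive and the nearest negative for each anchor concentrates the loss on the samples that currently violate the margin most.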

the generative adversarial network and the pedestrian re-identification network are trained jointly by optimizing an overall objective that is the weighted sum of the above losses, in which the adversarial loss ensures that the generator produces visually realistic images, and $\lambda_{id}$, $\lambda_{rec}$ and $\lambda_{tri}$ are hyperparameters that balance the corresponding loss terms.

Further, the method in step 4 of the present invention further includes:

because the generative adversarial network cannot generate new identities when generating images, a two-stage training scheme is adopted for the pedestrian re-identification model to prevent it from over-fitting; the first stage performs joint training with the overall objective, and the second stage introduces the LSRO method to fine-tune the model further; the LSRO method reduces the likelihood of over-fitting by assigning a uniformly distributed label to each generated image, defined as follows:

$$q_{LSRO}(k \mid \hat{x}) = \frac{1}{K}$$

where $\hat{x}$ denotes a generated image and $k \in [1, K]$, so $q_{LSRO}(k \mid \hat{x})$ states that the generated image $\hat{x}$ belongs to each identity class with probability 1/K; both real images and generated images are trained with the ID loss, and the losses for real images and generated images are unified as follows:

$$L_{LSRO} = -(1 - Z) \log p(y) - \frac{Z}{K} \sum_{k=1}^{K} \log p(k)$$

where $p(k)$ is the predicted probability of identity class k and y is the true identity label; for real images, Z = 0; for generated images, Z = 1.
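The unified loss for real and generated images can be sketched as a single function switched by Z. This is a minimal sketch assuming the standard LSRO form, in which a real image is scored against its one true label and a generated image against a uniform distribution over all K identities; labels are 0-based here for list indexing, whereas the text counts identities from 1, and the names `softmax` and `lsro_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Convert classifier logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lsro_loss(logits, label, is_generated):
    """-(1 - Z) * log p(y) - (Z / K) * sum_k log p(k), with Z = 1 if generated."""
    probs = softmax(logits)
    z = 1.0 if is_generated else 0.0
    num_classes = len(probs)
    real_term = -(1.0 - z) * math.log(probs[label])
    generated_term = -(z / num_classes) * sum(math.log(p) for p in probs)
    return real_term + generated_term


# Usage: the label argument is ignored for generated images (Z = 1).
real = lsro_loss([10.0, 0.0, 0.0, 0.0], label=0, is_generated=False)
generated = lsro_loss([10.0, 0.0, 0.0, 0.0], label=0, is_generated=True)
```

A confident prediction gives a low loss on a real image but a high loss on a generated one, which is how the uniform label discourages the network from over-committing to generated samples.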

The invention has the following beneficial effects:

(1) To address the problem that random noise and pose key points cannot accurately guide the generation of pedestrian features, semantic maps are introduced into the generation of pedestrian images, and a semantic-map-guided identity-migration generative adversarial network is proposed. Because the semantic map divides the different regions of a pedestrian accurately, pedestrian images can be edited precisely and their generation quality improves. The identity-migration generative adversarial network migrates the identity in one pedestrian image onto different pedestrian images, which increases the diversity of the pedestrian re-identification data set and thereby improves the model's robustness to differences in background, viewing angle, pose and the like.

(2) To address the imbalance in generation quality across local regions in generative adversarial networks, a gradient enhancement method based on a local quality attention mechanism is proposed, so that the generative adversarial network can adjust image quality not only globally but also locally.

(3) To let the pedestrian re-identification network make better use of the generated images, a joint training scheme for the generative adversarial network and the pedestrian re-identification network is proposed: on the one hand, the pedestrian re-identification network classifies the images produced by the generative adversarial network, which strengthens the latter's identity migration capability; on the other hand, the pedestrian re-identification network learns more discriminative feature representations from the images produced by the generative adversarial network.

Drawings

The invention will be further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is an overall structure of a model of an embodiment of the invention;

FIG. 2 shows same-identity migration in an embodiment of the present invention;

FIG. 3 is a two-stage pedestrian re-identification network training of an embodiment of the present invention;

FIG. 4 shows the gradient enhancement method based on a local quality attention mechanism according to an embodiment of the present invention;

FIG. 5 shows the identity migration results of the model on the Market-1501 data set in accordance with an embodiment of the present invention;

FIG. 6 is a flowchart of the overall training of the model according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

The first embodiment is as follows:

the pedestrian re-identification method based on an identity-migration generative adversarial network comprises the following steps:

(1) Constructing the semantic-map-based identity-migration generative adversarial network model.

The semantic-map-based identity-migration generative adversarial network consists of a structure encoder $E_s$, an identity information extractor $E_{id}$, a decoder $G_{dec}$ and a discriminator D, where $E_s$, $E_{id}$ and $G_{dec}$ together form the generator G, which forms a generative adversarial network with the discriminator D and is trained with an adversarial loss. The training data set is defined as $\{(x_n, y_n, s_n)\}_{n=1}^{N}$. Each training sample consists of a pedestrian image $x_n$ of height $H$ and width $W$, the identity label $y_n \in [1, K]$ of the image, and the pedestrian's semantic map $s_n$ with $C$ semantic categories, where N represents the number of images in the data set, K the number of identities in the data set, C the number of semantic label categories, and H and W the height and width of the images, respectively. During training of the generative adversarial network, two real samples $(x_a, y_a, s_a)$ and $(x_b, y_b, s_b)$ are randomly drawn from the training data set, where $a \in [1, N]$ and $b \in [1, N]$. To migrate the identity features of image $x_a$ onto image $x_b$, the generator G first uses the identity information extractor $E_{id}$ to extract the identity information $I_a$ of image $x_a$, then uses the structure encoder $E_s$ to encode image $x_b$ and its corresponding semantic map $s_b$ into the structural feature $F_b$. Finally, the decoder $G_{dec}$ decodes $I_a$ and $F_b$ into a new pedestrian image $\hat{x} = G_{dec}(I_a, F_b)$, which should have the structural features of pedestrian $y_b$ and the identity features of pedestrian $y_a$.

Specifically, in migrating the identity features of image $x_a$ onto image $x_b$, the semantic map $s_a$ corresponding to image $x_a$ first needs to be preprocessed. The semantic map $s_a$ contains the semantic information of pedestrian $y_a$; all of this semantic information is roughly grouped into 5 parts (head, upper body, lower body, shoes and background) according to spatial position, represented by the part masks $m_a^i$, $i = 1, \dots, 5$. The identity feature extraction network $E_{id}$ then extracts the identity features of each part of the pedestrian, calculated as follows:

$$(\gamma_a^i, \beta_a^i) = E_{id}(x_a \odot m_a^i)$$

During the calculation $m_a^i$ is automatically expanded to 3 dimensions, and $\odot$ denotes element-wise multiplication. $\gamma_a^i$ and $\beta_a^i$ are the affine parameters containing the identity information of each semantic part. The identity information is injected into the pedestrian image through the adaptive instance normalization operation, defined as follows:

$$\mathrm{AdaIN}(F, \gamma, \beta) = \gamma \, \frac{F - \mu(F)}{\sigma(F)} + \beta$$

where $F$ is the feature being normalized, $\mu(\cdot)$ takes the mean and $\sigma(\cdot)$ takes the standard deviation. Adaptive instance normalization replaces the affine parameters of instance normalization with conditional style information in order to achieve style transformation.

By using the semantic labels, the identity features contain accurate feature information for each semantic part of the pedestrian image, and the style-transfer capability of adaptive instance normalization migrates the identity information accurately onto the target image, giving the generator G a more accurate identity feature migration capability.
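The masking that precedes per-part identity extraction, in which a two-dimensional part mask is expanded across the image channels and multiplied element-wise with the image, can be sketched as below. This is a sketch under stated assumptions: the image is a C x H x W nested list, the mask is an H x W binary list, and the function name `mask_image` is hypothetical; the extractor $E_{id}$ that would consume the result is not modeled.

```python
def mask_image(image, mask):
    """Multiply every channel of a C x H x W image by an H x W binary mask.

    Reusing the same 2-D mask for each channel plays the role of expanding
    the mask to three dimensions before the element-wise product.
    """
    return [
        [
            [pixel * keep for pixel, keep in zip(image_row, mask_row)]
            for image_row, mask_row in zip(channel, mask)
        ]
        for channel in image
    ]


# Usage: keep only the pixels of one semantic part in a 3-channel 2 x 2 image.
image = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
head_only = mask_image(image, [[1, 0], [0, 1]])
```

Pixels outside the part are zeroed in every channel, so whatever is extracted from the masked image describes that part alone.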

There are two cases of identity migration: when the identity labels satisfy $y_a \neq y_b$, the generation is cross-identity; otherwise it is same-identity. In same-identity generation, the generated image has a corresponding real image in the training data set, and a reconstruction loss is used to supervise training on the generated image $\hat{x}$.

The generated image $\hat{x}$ should correctly acquire the identity features of pedestrian $y_a$; to this end, the pedestrian re-identification network is used to constrain the identity of the generated image $\hat{x}$. The pedestrian re-identification network classifies the generated image $\hat{x}$, and the generated image is constrained with the identity loss $\mathbb{E}[-\log p(y_a \mid \hat{x})]$, where $p(y_a \mid \hat{x})$ represents the probability that $\hat{x}$ is predicted as the class label $y_a$ of image $x_a$. By minimizing this identity loss, the generator learns the identity feature knowledge of the pedestrian re-identification network.

The generator and the discriminator are trained adversarially so that the generated image $\hat{x}$ is more visually realistic; this is driven by the adversarial loss between the generator and the discriminator.

(2) Constructing a gradient enhancement method based on a local quality attention mechanism.

The generator and the discriminator are trained adversarially: the generator should generate images that are as realistic as possible to confuse the discriminator, while the discriminator must distinguish generated images from real images. In the training phase of the generator, the discriminator takes the generated image as input and predicts its authenticity. A loss value is then computed from the prediction, and the discriminator ultimately uses it to provide feedback to the generator. The generator updates its parameters with this feedback, which improves its generation capability and makes the generated images more visually realistic. From this analysis it can be observed that the feedback provided by the discriminator is computed only from a single value representing the authenticity of the whole image, ignoring the imbalance in generation quality across local regions of the image. This imbalance shows up as artifacts, blur and similar defects in local regions of the generated image, which in turn affect how the pedestrian re-identification network discriminates the identity of the generated image.

The proposed method consists of two parts: a local quality attention mechanism and gradient enhancement. The local quality attention mechanism finds the poorly generated local regions of the generated image so that the generator pays more attention to them. The no-reference image quality assessment model BIECON scores the non-overlapping patches of the generated image; after evaluation, each non-overlapping patch has a score in [0, 1], where a score closer to 0 means worse quality and a score closer to 1 means better quality. The quality score of each patch is taken as the quality score of every pixel in that patch, giving a quality score matrix Q of the same size as the input. Finally, the local quality attention mechanism is realized by:

$$M = 1 - Q \tag{8}$$

The larger a value in the attention matrix M, the worse the quality of that pixel, and the more the generator should focus on that region. In the gradient back-propagation stage, the gradient $\Delta_D$ of the discriminator is computed from the adversarial loss and the parameters of the discriminator, and the gradient $\Delta_{\hat{x}}$ of the generated sample $\hat{x}$ is then computed from $\Delta_D$. The attention matrix M is used to modify this gradient through an element-wise product:

$$\Delta'_{\hat{x}} = \Delta_{\hat{x}} + \alpha \, \Delta_{\hat{x}} \odot M$$

where $\alpha$ is a hyperparameter that adjusts the weight; following XAI-GAN, $\alpha = 0.2$ is set. The generator updates the parameters of the model with the modified gradient. Intuitively, by increasing the gradient of poor-quality regions, the attention matrix guides the generator to pay more attention to how local regions are generated, so the model not only improves the overall quality of the image but also further optimizes image quality locally.

(3) Establishing a joint training scheme for the generative adversarial network and the pedestrian re-identification network.

Training of the pedestrian re-identification network is combined with the generative adversarial network: the new pedestrian images produced by the generative adversarial network are used, together with the real images in the training data set, to train the pedestrian re-identification network. The identity information of a generated image comes from the image that provides the identity features, so ideally the identity label of the generated image should match that of the image providing the identity features. However, training a generative adversarial network is a gradual process; early in training, the generated images are of imperfect quality and accurate identity migration cannot yet be achieved. Directly applying identity labels to generated images would therefore mislead the pedestrian re-identification network's learning of identity features, further degrade the accuracy of identity migration, and make training unstable or even cause it to collapse. To avoid this problem, different loss functions are used for generated images and real images. The triplet loss with hard-sample mining is applied to training on generated images and is defined as follows:

$$L_{tri} = \sum_{i=1}^{B} \sum_{a=1}^{E} \Big[ \gamma + \max_{p} \lVert f_a - f_p \rVert_2 - \min_{n} \lVert f_a - f_n \rVert_2 \Big]_+$$

where B and E represent the number of identities and the number of instances per identity in the mini-batch, respectively. $f_a$, $f_p$ and $f_n$ respectively represent the feature vectors of the anchor sample, the positive sample and the negative sample extracted by the pedestrian re-identification network, and $\gamma$ is the margin hyperparameter between the intra-class distance and the inter-class distance, set to 0.3 in the experiments. The triplet loss learns discriminative feature representations by pulling the anchor sample and the positive sample closer together and pushing the negative sample away from the anchor sample. For real images, learning uses the ID loss $\mathbb{E}[-\log p(y \mid x)]$, where x represents a real image in the training data set and $p(y \mid x)$ represents the probability that x is predicted as its true identity label y.

The generative adversarial network and the pedestrian re-identification network are trained jointly by optimizing an overall objective consisting of the weighted sum of losses (4), (5), (6), (7), (10) and (11), in which the adversarial loss ensures that the generator produces visually realistic images, and $\lambda_{id}$, $\lambda_{rec}$ and $\lambda_{tri}$ are hyperparameters that balance the corresponding loss terms.

Since the generative adversarial network does not generate new identities when generating images, a two-stage training scheme, shown in FIG. 3, is adopted for the pedestrian re-identification model to prevent it from over-fitting. The first stage performs joint training with the overall objective above, and the second stage introduces the LSRO method to fine-tune the model further. The LSRO method reduces the likelihood of over-fitting by assigning a uniformly distributed label to each generated image, defined as follows:

$$q_{LSRO}(k \mid \hat{x}) = \frac{1}{K}$$

where $\hat{x}$ denotes a generated image and $k \in [1, K]$, so $q_{LSRO}(k \mid \hat{x})$ states that the generated image $\hat{x}$ belongs to each identity class with probability 1/K. Both real images and generated images are trained with the ID loss; combining formula (5), the losses for real images and generated images are unified as follows:

$$L_{LSRO} = -(1 - Z) \log p(y) - \frac{Z}{K} \sum_{k=1}^{K} \log p(k)$$

where $p(k)$ is the predicted probability of identity class k and y is the true identity label. For real images, Z = 0. For generated images, Z = 1.

Example two:

(1) Training data set preparation

The Market-1501 data set is used; it was collected from 6 cameras on the Tsinghua University campus and annotates 1501 pedestrians in total. Of these, 751 pedestrians are used for the training set and 750 for the test set, and the training set and the test set share no pedestrian labels. The semantic map corresponding to each pedestrian image is generated by a human parsing model (Self-Correction for Human Parsing), which assigns a semantic category to each pixel in the image; the generated semantic map comprises 20 semantic categories: background, hat, hair, gloves, sunglasses, jacket, one-piece dress, coat, socks, trousers, jumpsuit, scarf, skirt, face, left arm, right arm, left leg, right leg, left shoe and right shoe. All semantic categories are roughly grouped into 5 parts, namely head, upper body, lower body, shoes and background, according to their spatial positions. During identity migration, the semantic map is used to extract the features of each part independently, achieving fine-grained feature extraction; the features are then injected separately into the generative adversarial network to generate pedestrian images with more accurate features. All input images are uniformly scaled to 256 × 128 pixels before training.

(2) Model construction

All models are implemented in the deep learning framework PyTorch. The overall structure of the model is shown in Fig. 1 and comprises a generator G, a discriminator D and a pedestrian re-identification network R. The generator G adopts an encoder-decoder architecture: the structure encoder E_s is a shallow network of three convolutional layers and, correspondingly, the decoder G_dec is a network of three transposed convolution layers. The identity information extractor E_id uses five convolutional layers, with global average pooling at the last layer to obtain the adaptive instance normalization parameters I; the network parameters are shared across all E_id. The generator G uses five residual blocks to inject the identity information of the different semantic regions into the structural feature F; following MUNIT, each residual block contains two adaptive instance normalization layers. The discriminator D follows the widely used PatchGAN structure. The pedestrian re-identification network R is based on ResNet50 and is initialized with parameters pre-trained on ImageNet; the dimension of its fully connected layer is changed to K, where K is the number of identities in the training dataset.
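The adaptive instance normalization layers mentioned above can be illustrated with a minimal sketch in plain Python. It shows only the generic operation (normalize each channel by its own mean and standard deviation, then apply an externally supplied scale and shift); the per-region injection driven by the semantic map is not reproduced here, and the function name `adain`, the flat channel layout and the `eps` term are assumptions for illustration.

```python
import math

def adain(feature, gamma, beta, eps=1e-5):
    """Adaptive instance normalization on a list of channels.

    feature: list of C channels, each a flat list of spatial activations
    gamma:   per-channel scale taken from the identity code (assumed shape C)
    beta:    per-channel shift taken from the identity code (assumed shape C)
    eps:     small constant to avoid division by zero (an assumption)
    """
    out = []
    for channel, g, b in zip(feature, gamma, beta):
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(var + eps)
        # Remove the channel's own statistics, then impose the new ones.
        out.append([g * (v - mean) / std + b for v in channel])
    return out
```

After the call, each output channel has mean close to `beta` and standard deviation close to `gamma`, which is how the identity code overrides the statistics of the structural feature.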

(3) Joint training of the generative adversarial network and the pedestrian re-identification network

During training, both the generative adversarial network and the pedestrian re-identification network are trained with the Adam optimizer, with parameters β₁ = 0.5 and β₂ = 0.999. The weights in the total loss are set to λ_id = 1, λ_rec = 10 and λ_tri = 1. In the first stage, the generative adversarial network and the pedestrian re-identification network are trained jointly; the learning rate of the generator and the discriminator is set to 0.0001, and the learning rate of the pedestrian re-identification network is set to 0.00035. The batch size is set to 32, with the number of identities B set to 8 and the number of instances E set to 4 in each batch. In the second stage, training of the generative adversarial network is stopped, and the pedestrian re-identification network is fine-tuned using the LSRO loss. Throughout the experiment, all input images are resized to 256 × 128; to remove the effect of the original identity information, the input of the structure encoder E_s is converted into a grayscale image.
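A batch of 32 images made of B = 8 identities with E = 4 instances each can be drawn as in the sketch below. The text gives only the batch composition, not the sampling procedure, so the function `sample_pk_batch` and its fallback of sampling with replacement for identities that have too few images are assumptions.

```python
import random

def sample_pk_batch(ids_to_images, num_ids=8, num_instances=4, rng=random):
    """Draw one batch of num_ids identities with num_instances images each.

    ids_to_images: dict mapping an identity label to a list of image paths
    rng:           the random module or a random.Random instance
    """
    chosen = rng.sample(sorted(ids_to_images), num_ids)
    batch = []
    for pid in chosen:
        images = ids_to_images[pid]
        if len(images) >= num_instances:
            picks = rng.sample(images, num_instances)
        else:
            # Too few images for this identity: sample with replacement.
            picks = [rng.choice(images) for _ in range(num_instances)]
        batch.extend((img, pid) for img in picks)
    return batch
```

Batches built this way always contain several images of the same identity, so positive and negative pairs are both available within a single batch.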

(4) Analysis of experiments

The model is evaluated in two respects: image generation and pedestrian re-identification. For image generation, the generative adversarial network is used to migrate the identity of a pedestrian image onto different images; the results are shown in Fig. 5. In Fig. 5, the first column contains the identity source images and the first row contains the target images of the identity migration, which provide the structural information. The remaining images in Fig. 5 are the images after identity migration. They retain the structural information of the target images well and accurately carry over the identity information, showing that the identity migration generative adversarial network of the invention has good image generation and identity migration capability. The pedestrian re-identification evaluation metrics are: (1) Rank-n, the probability that at least one of the top n images in the ranked query results is a correct match; (2) mAP (mean average precision), which reflects the extent to which all correct images of the queried person in the gallery are ranked at the front of the query results. On the Market-1501 test set, the pedestrian re-identification network achieves a Rank-1 accuracy of 93.9% and an mAP of 83.5%. Migrating the identities of pedestrian images onto different images with the generative adversarial network effectively expands the diversity of the training dataset and improves the robustness of the pedestrian re-identification network to differences in background, viewpoint, pose and the like.
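The two re-identification metrics can be made concrete with a short sketch. Each query is represented by its ranked gallery results as a list of booleans (True where the gallery image shows the queried person). The function names are illustrative, and the sketch omits the benchmark convention of excluding gallery images taken by the same camera as the query.

```python
def rank_n_hit(ranked_matches, n):
    """Return 1 if any of the top-n results is a correct match, else 0."""
    return int(any(ranked_matches[:n]))

def average_precision(ranked_matches):
    """Average precision for one query given its ranked match flags."""
    hits, precisions = 0, []
    for position, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0

def evaluate(all_ranked_matches, n=1):
    """Return (Rank-n, mAP) averaged over all queries."""
    num = len(all_ranked_matches)
    rank_n = sum(rank_n_hit(r, n) for r in all_ranked_matches) / num
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / num
    return rank_n, mean_ap
```

Rank-n only asks whether a correct match appears early, whereas mAP rewards placing every correct gallery image near the top, which is why the two figures reported for the same model can differ considerably.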

It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims

1. A pedestrian re-identification method based on an identity migration generative adversarial network, characterized by comprising the following steps:

step 5, inputting a pedestrian image to be identified, and outputting the pedestrian re-identification result through the trained pedestrian re-identification network.

2. The pedestrian re-identification method based on an identity migration generative adversarial network according to claim 1, wherein step 1 comprises:

acquiring a pedestrian image dataset in which each pedestrian has a pedestrian label, and dividing the dataset into a training set and a test set that share no pedestrian labels; generating the semantic map corresponding to each pedestrian image through a human parsing model, wherein the human parsing model assigns a semantic category to every pixel in the image, and the generated semantic map contains 20 semantic categories, namely background, hat, hair, gloves, sunglasses, jacket, one-piece dress, coat, socks, trousers, jumpsuits, scarf, skirt, face, left arm, right arm, left leg, right leg, left shoe and right shoe; dividing all semantic categories into 5 parts, namely head, upper body, lower body, shoes and background, according to their spatial positions; using the semantic map to extract the features of each part separately, thereby achieving fine-grained feature extraction; and uniformly scaling all images to a fixed pixel size before training.

3. The pedestrian re-identification method based on an identity migration generative adversarial network according to claim 1, wherein step 2 comprises:

defining the training set as {(x_n, y_n, s_n)}, n ∈ [1, N], wherein each training sample consists of a pedestrian image x_n, the identity label of the image y_n ∈ [1, K], and the semantic map s_n of the pedestrian;

in the process of training the generative adversarial network, two real samples (x_a, y_a, s_a) and (x_b, y_b, s_b) are randomly drawn from the training set, wherein a ∈ [1, N] and b ∈ [1, N]; to migrate the identity features of image x_a onto image x_b, the generator G first uses the identity extractor E_id to extract the identity information I_a of image x_a, then uses the structure encoder E_s to encode image x_b and its corresponding semantic map s_b into the structural feature F_b; finally, the decoder G_dec decodes I_a and F_b into a new pedestrian image, namely the generated image.

4. The pedestrian re-identification method based on an identity migration generative adversarial network according to claim 3, wherein the identity feature migration in step 2 specifically comprises:

in the process of migrating the identity features of image x_a to image x_b, the semantic map s_a corresponding to image x_a is first preprocessed; the semantic map s_a contains the semantic information of pedestrian y_a, and all the semantic information is divided into 5 parts, namely head, upper body, lower body, shoes and background, according to its spatial positions

in the process of calculation

is automatically expanded to 3 dimensions, and ⊙ denotes element-wise multiplication; wherein

And

are affine parameters containing the identity information of each semantic part; the identity information is injected into the pedestrian image through the adaptive instance normalization operation, which is defined as follows:

wherein μ(·) is the mean operation and σ(·) is the standard deviation operation; adaptive instance normalization builds on instance normalization by replacing the affine parameters with conditional style information, thereby achieving style transfer;

there are two cases of identity migration:

not only acquires the identity of pedestrian y_a but also maintains clear structural features; the generated image is supervised during training with the L1 loss:

when the identity labels satisfy y_a = y_b, image x_a and image x_b show the same pedestrian, so reconstruction of the generated image can be learned under supervision, allowing the generator to learn complete structural information.

5. The pedestrian re-identification method based on an identity migration generative adversarial network according to claim 4, wherein the training by adversarial learning in step 2 specifically comprises:

the generator G and the discriminator D are trained against each other by adversarial learning, so that the generated image

6. The pedestrian re-identification method based on an identity migration generative adversarial network according to claim 1, wherein constructing the gradient enhancement method based on the local quality attention mechanism in step 3 specifically comprises:

M＝1-Q

in the gradient back-propagation stage, from the loss

and the parameters of the discriminator, the gradient Δ_D of the discriminator is computed; then, from the discriminator gradient Δ_D, the gradient of the generated sample

is computed

7. The pedestrian re-identification method based on an identity migration generative adversarial network according to claim 1, wherein the joint training in step 4 comprises:

jointly training the generative adversarial network and the pedestrian re-identification network by optimizing an overall objective that is the weighted sum of the losses:

wherein

8. The pedestrian re-identification method based on an identity migration generative adversarial network according to claim 7, wherein step 4 further comprises:

because the generative adversarial network does not create new identities when generating images, a two-stage training scheme is adopted for the pedestrian re-identification model to prevent overfitting; in the first stage, the overall objective is used for joint training, and in the second stage, the LSRO method is introduced to further fine-tune the model; the LSRO method reduces the likelihood of overfitting by assigning a uniformly distributed label to each generated image, defined as follows:

$$q_{LSRO}(k) = \frac{1}{K}$$

wherein k ∈ [1, K] indexes the identity classes, that is, the probability that a generated image belongs to each identity class is 1/K; both the real images and the generated images are trained with the ID loss, and the losses of the real images and the generated images are unified as follows:

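$$L_{LSRO} = -(1 - Z)\log p(y) - \frac{Z}{K}\sum_{k=1}^{K}\log p(k)$$

wherein p(k) is the predicted probability of identity class k and y is the ground-truth identity label; for real images, Z = 0; for generated images, Z = 1.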