CN115205903B - Pedestrian re-recognition method based on identity migration generation countermeasure network
- Publication number
- CN115205903B; CN202210890765.1A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- image
- identity
- training
- semantic
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a pedestrian re-identification method based on an identity-migration generative adversarial network. The method comprises the following steps. First, a pedestrian image dataset is acquired, and a human semantic parsing model generates the semantic map corresponding to each pedestrian image. Second, an overall pedestrian re-identification model is constructed, consisting of a generator, a discriminator and a pedestrian re-identification network. The generator and the discriminator form a generative adversarial network that performs identity migration guided by semantic maps, and the two are trained against each other by adversarial learning. Third, a gradient enhancement method based on a local quality attention mechanism is constructed to improve the generative adversarial network. Fourth, a joint training scheme is established for the generative adversarial network and the pedestrian re-identification network. Finally, a pedestrian image to be identified is input, and the trained pedestrian re-identification network outputs the re-identification result. The invention increases the diversity of the pedestrian re-identification dataset, effectively improves the quality of the generated images, and improves the recognition accuracy of the pedestrian re-identification model.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a pedestrian re-identification method based on an identity-migration generative adversarial network.
Background
Pedestrian re-identification is an important task in computer vision whose purpose is to associate the identity of a pedestrian across cameras. Given a query image, it retrieves images of the person of interest from non-overlapping cameras, and it is widely used in video surveillance and security. However, images captured by different cameras differ greatly in background, viewpoint and pose, which makes searching for a target pedestrian across cameras very challenging. To cope with these differences, a model must learn discriminative feature representations from the training data as far as possible. With the development of deep learning, many works exploit the strong representational capability of convolutional neural networks and train them with deep metric learning or classification learning, which has greatly improved recognition accuracy. To further learn local features, many works align pedestrian features using local information such as horizontal stripes or pose skeletons, which strengthens the representational capability of the model.
Improving the model structure is one way to raise re-identification accuracy. A second obstacle is that re-identification models struggle to learn representations that are robust to differences in background, viewpoint, pose and so on. The cause is that existing datasets lack diversity and are small in scale. Pedestrians change pose as they move and appear against cluttered backgrounds. Collecting images of pedestrians under every possible condition in a real scene is impractical, so datasets rarely cover the full range of variation and pedestrian image data lack diversity. In addition, more data means higher labeling cost, which makes large-scale datasets difficult to build. As generative models have developed, in particular generative adversarial networks, augmenting training datasets with generated images has been increasingly explored. Some researchers synthesize new pedestrian images from random noise or pose keypoints, which expands the re-identification dataset and increases the diversity of pedestrian poses in it. However, random noise and pose keypoints carry too little prior information to guide the generation of pedestrian features accurately. As a result, the generated images contain blurring and artifacts, and their identity features are inaccurate. When the re-identification network is trained on such low-quality generated images, they can mislead the features the model learns, which limits the gain in recognition accuracy and hampers training.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, a pedestrian re-identification method based on an identity-migration generative adversarial network.
The technical solution adopted to solve this technical problem is as follows:
the invention provides a pedestrian re-identification method based on an identity-migration generative adversarial network, comprising the following steps:
step 1, acquiring a pedestrian image dataset and generating the semantic map corresponding to each pedestrian image with a human semantic parsing model, which assigns a semantic category to every pixel of the image; then dividing the pedestrian images, together with their semantic maps, into a training set and a test set;
step 2, constructing the overall pedestrian re-identification model, which comprises a generator $G$, a discriminator $D$ and a pedestrian re-identification network $R$. The generator $G$ comprises a structure encoder $E_s$, an identity information extractor $E_{id}$ and a decoder $G_{dec}$. $G$ and $D$ form a generative adversarial network that performs identity migration guided by semantic maps, and they are trained against each other by adversarial learning;
step 3, constructing a gradient enhancement method based on a local quality attention mechanism to improve the generative adversarial network;
step 4, establishing a joint training scheme for the generative adversarial network and the pedestrian re-identification network. The training set is input, and the generative adversarial network outputs newly generated images. The generated images and the pedestrian images in the training set are used together to train the re-identification network, which yields the trained overall model; the model is then evaluated on the test set;
step 5, inputting a pedestrian image to be identified and outputting the re-identification result through the trained pedestrian re-identification network.
Further, the method in step 1 of the present invention includes:
acquiring a pedestrian image dataset in which every pedestrian in an image carries a pedestrian label, and dividing the images into a training set and a test set that share no pedestrian labels. The semantic map corresponding to each pedestrian image is generated with a human semantic parsing model, which assigns a semantic category to every pixel of the image. The generated semantic map contains 20 semantic categories: background, hat, hair, gloves, sunglasses, upper clothes, dress, coat, socks, trousers, jumpsuit, scarf, skirt, face, left arm, right arm, left leg, right leg, left shoe and right shoe. According to their spatial positions, all semantic categories are grouped into 5 parts: head, upper body, lower body, shoes and background. The features of each part are extracted with the help of the semantic map, which enables fine-grained feature extraction. All images are uniformly scaled to a fixed pixel size before training.
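The following minimal sketch shows the grouping step. It maps a per-pixel label map that uses the 20 categories listed above onto 5 binary part masks, with the categories indexed 0 to 19 in the order given. The exact assignment of each category to a part is an assumption; for example, the sketch puts gloves, scarf and arms in the upper body and socks in the lower body. The text only states that the grouping follows spatial position. `PART_OF_CATEGORY` and `group_parts` are hypothetical names.

```python
import numpy as np

# Category ids follow the order listed in the text:
# 0 background, 1 hat, 2 hair, 3 gloves, 4 sunglasses, 5 upper clothes,
# 6 dress, 7 coat, 8 socks, 9 trousers, 10 jumpsuit, 11 scarf, 12 skirt,
# 13 face, 14 left arm, 15 right arm, 16 left leg, 17 right leg,
# 18 left shoe, 19 right shoe.
# Part ids: 0 head, 1 upper body, 2 lower body, 3 shoes, 4 background.
# ASSUMPTION: this category-to-part assignment is illustrative only.
PART_OF_CATEGORY = np.array([
    4,           # background
    0, 0, 1, 0,  # hat, hair, gloves, sunglasses
    1, 1, 1,     # upper clothes, dress, coat
    2, 2, 2,     # socks, trousers, jumpsuit
    1, 2,        # scarf, skirt
    0,           # face
    1, 1,        # left arm, right arm
    2, 2,        # left leg, right leg
    3, 3,        # left shoe, right shoe
])

def group_parts(label_map: np.ndarray, num_parts: int = 5) -> np.ndarray:
    """Turn an (H, W) map of category ids into (num_parts, H, W) binary masks."""
    part_map = PART_OF_CATEGORY[label_map]          # (H, W) part ids
    parts = np.arange(num_parts)[:, None, None]     # (num_parts, 1, 1)
    return (part_map[None, :, :] == parts).astype(np.float32)

labels = np.array([[1, 13],    # hat, face
                   [5, 19]])   # upper clothes, right shoe
masks = group_parts(labels)
print(masks.shape)        # (5, 2, 2)
print(masks.sum(axis=0))  # every pixel belongs to exactly one part
```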
Further, the method in step 2 of the present invention includes:
the semantic-map-based identity-migration generative adversarial network consists of a structure encoder $E_s$, an identity information extractor $E_{id}$, a decoder $G_{dec}$ and a discriminator $D$. $E_s$, $E_{id}$ and $G_{dec}$ together form the generator $G$. The generator and the discriminator $D$ form a generative adversarial network, which is trained with an adversarial loss;
the training set is defined as $\mathcal{D} = \{(x_n, y_n, s_n)\}_{n=1}^{N}$. Each training sample consists of three parts: a pedestrian image $x_n \in \mathbb{R}^{3 \times H \times W}$, its identity label $y_n \in [1, K]$ and the pedestrian's semantic map $s_n \in \mathbb{R}^{C \times H \times W}$. Here $N$ is the number of images in the dataset, $K$ is the number of identities, $C$ is the number of semantic label categories, and $H$ and $W$ are the height and width of the images;
while training the generative adversarial network, two real samples $(x_a, y_a, s_a)$ and $(x_b, y_b, s_b)$ are drawn at random from the training set, with $a \in [1, N]$ and $b \in [1, N]$. To migrate the identity features of image $x_a$ onto image $x_b$, the generator $G$ first uses the identity extractor $E_{id}$ to extract the identity information $I_a$ of $x_a$. It then uses the structure encoder $E_s$ to encode $x_b$ and its semantic map $s_b$ into the structural feature $F_b$. Finally, the decoder $G_{dec}$ decodes $I_a$ and $F_b$ into a new pedestrian image $\hat{x}_{ab}$, the generated image. The generated image has the structural features of pedestrian $y_b$ and the identity features of pedestrian $y_a$.
Further, the identity feature migration in step 2 of the present invention specifically includes:
when migrating the identity features of image $x_a$ onto image $x_b$, the semantic map $s_a$ corresponding to $x_a$ is first preprocessed. $s_a$ contains the semantic information of pedestrian $y_a$. According to its spatial layout, all semantic information is grouped into 5 parts (head, upper body, lower body, shoes and background), denoted $\{s_a^k\}_{k=1}^{5}$. The identity feature extraction network $E_{id}$ then extracts the identity features of each part of the pedestrian:
$$(\gamma_a^k, \beta_a^k) = E_{id}(x_a \odot s_a^k), \quad k = 1, \dots, 5, \qquad I_a = \{(\gamma_a^k, \beta_a^k)\}_{k=1}^{5}$$
during the computation, $s_a^k$ is automatically broadcast to the shape of $x_a$, so that $\odot$ is an element-wise product. $\gamma_a^k$ and $\beta_a^k$ are the affine parameters that carry the identity information of each semantic part. The identity information is injected into the pedestrian image through adaptive instance normalization (AdaIN), defined as:
$$\mathrm{AdaIN}(F, \gamma, \beta) = \gamma \left( \frac{F - \mu(F)}{\sigma(F)} \right) + \beta$$
where $\mu(\cdot)$ computes the mean and $\sigma(\cdot)$ computes the standard deviation. AdaIN builds on instance normalization but replaces the affine parameters with conditional style information, which is what performs the style transfer;
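The two operations can be sketched in NumPy as follows. The sketch rests on stated assumptions. Part masking broadcasts an (H, W) part mask over a (3, H, W) image. AdaIN normalizes each channel of a feature map over its spatial positions, as in the standard AdaIN formulation. The text does not specify how $E_{id}$ turns a masked image into $(\gamma, \beta)$, or how the 5 per-part parameter sets are applied inside $G_{dec}$, so the sketch takes `gamma` and `beta` as given inputs. The function names and the small `eps` stabilizer are hypothetical additions.

```python
import numpy as np

def mask_part(image: np.ndarray, part_mask: np.ndarray) -> np.ndarray:
    """x_a ⊙ s_a^k: broadcast an (H, W) part mask over a (3, H, W) image."""
    return image * part_mask[None, :, :]

def adain(feat: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
          eps: float = 1e-5) -> np.ndarray:
    """AdaIN on a (C, H, W) feature map with per-channel gamma, beta of shape (C,)."""
    mu = feat.mean(axis=(1, 2), keepdims=True)      # per-channel mean
    sigma = feat.std(axis=(1, 2), keepdims=True)    # per-channel std
    normalized = (feat - mu) / (sigma + eps)
    return gamma[:, None, None] * normalized + beta[:, None, None]

rng = np.random.default_rng(0)
image = rng.random((3, 4, 4))
mask = np.zeros((4, 4))
mask[:2] = 1.0                                      # top two rows form one part
masked = mask_part(image, mask)

feat = rng.normal(size=(2, 8, 8))                   # stand-in for a structural feature
out = adain(feat, gamma=np.array([2.0, 0.5]), beta=np.array([1.0, -1.0]))
print(out.mean(axis=(1, 2)))   # close to beta
print(out.std(axis=(1, 2)))    # close to gamma
```

After AdaIN, each channel's statistics are taken from the identity side (`gamma`, `beta`), while the spatial pattern still comes from the structural feature. This is the property the identity-injection step relies on.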
identity migration covers two cases:
when the identity labels satisfy $y_a \neq y_b$, the generation is cross-identity generation; otherwise it is same-identity generation. In same-identity generation, the training set contains a real image corresponding to the generated image, namely $x_b$. The generated image $\hat{x}_{ab}$ should acquire the identity features of pedestrian $y_a$ and also keep clear structural features. To achieve this, it is supervised with an $\ell_1$ loss:
$$\mathcal{L}_{rec} = \mathbb{E}\left[ \lVert \hat{x}_{ab} - x_b \rVert_1 \right]$$
when $y_a = y_b$, images $x_a$ and $x_b$ show the same pedestrian. Reconstructing the generated image under supervision therefore teaches the generator complete structural information.
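The sketch below shows how the two cases might be handled in one training step. It assumes the reconstruction target for a same-identity pair is $x_b$, the image that supplies the structure, and that the $\ell_1$ loss is $\lVert \hat{x}_{ab} - x_b \rVert_1$ averaged over pixels (averaging rather than summing is also an assumption). The generator is passed in as any callable that maps `(x_a, s_a, x_b, s_b)` to an image. `migration_step` and the toy generator `copy_structure` are hypothetical; `copy_structure` simply returns $x_b$.

```python
from typing import Callable, Optional

import numpy as np

Generator = Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray]

def migration_step(G: Generator, x_a, y_a, s_a, x_b, y_b, s_b):
    """Generate x_hat_ab (identity of x_a, structure of x_b).

    Returns the generated image and the l1 reconstruction loss, which is only
    defined for same-identity pairs (y_a == y_b); it is None otherwise.
    """
    x_hat = G(x_a, s_a, x_b, s_b)
    rec_loss: Optional[float] = None
    if y_a == y_b:  # same-identity generation: x_b is the real target
        rec_loss = float(np.abs(x_hat - x_b).mean())
    return x_hat, rec_loss

# Hypothetical stand-in generator: ignores identity and copies the structure image.
def copy_structure(x_a, s_a, x_b, s_b):
    return x_b.copy()

rng = np.random.default_rng(1)
x_a, x_b = rng.random((3, 8, 4)), rng.random((3, 8, 4))
s_a = s_b = np.ones((5, 8, 4))
_, loss_same = migration_step(copy_structure, x_a, 7, s_a, x_b, 7, s_b)
_, loss_cross = migration_step(copy_structure, x_a, 7, s_a, x_b, 3, s_b)
print(loss_same, loss_cross)   # 0.0 None
```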
Further, the adversarial training in step 2 of the present invention specifically includes:
training the generator $G$ and the discriminator $D$ by adversarial learning so that the generated image $\hat{x}_{ab}$ looks more realistic. The adversarial loss of $G$ and $D$ is defined as follows:
during training, the WGAN-GP adversarial loss is optimized, which makes training more stable.
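The standard WGAN-GP objective (Gulrajani et al.) is sketched below for reference. The critic loss is $\mathbb{E}[D(\text{fake})] - \mathbb{E}[D(\text{real})] + \lambda\,\mathbb{E}[(\lVert \nabla D(\tilde{x}) \rVert_2 - 1)^2]$, where $\tilde{x}$ is a random interpolation of real and fake samples; the generator loss is $-\mathbb{E}[D(\text{fake})]$. The sketch uses a linear critic $D(x) = w \cdot x$ only because its input gradient is exactly $w$, which lets the penalty be computed without an autodiff library. A real implementation would differentiate the critic network at the interpolated samples. The penalty weight `lam=10.0` is the usual WGAN-GP default, not a value given in this text.

```python
import numpy as np

def wgan_gp_losses(w: np.ndarray, real: np.ndarray, fake: np.ndarray,
                   rng: np.random.Generator, lam: float = 10.0):
    """WGAN-GP losses for a linear critic D(x) = x @ w.

    real, fake: (batch, dim) flattened images. Returns (loss_D, loss_G).
    """
    d_real = real @ w
    d_fake = fake @ w
    # Random interpolation between real and fake samples.
    eps = rng.random((real.shape[0], 1))
    interp = eps * real + (1.0 - eps) * fake
    # For a linear critic the gradient of D at every interpolated sample
    # is w itself, so it only needs to be broadcast to the batch shape.
    grads = np.broadcast_to(w, interp.shape)
    grad_norm = np.linalg.norm(grads, axis=1)
    penalty = np.mean((grad_norm - 1.0) ** 2)
    loss_d = d_fake.mean() - d_real.mean() + lam * penalty
    loss_g = -d_fake.mean()
    return loss_d, loss_g

rng = np.random.default_rng(0)
w = np.array([0.6, 0.8])                  # ||w|| = 1, so the penalty vanishes
real = np.array([[1.0, 0.0], [0.0, 1.0]])
fake = np.zeros((2, 2))
loss_d, loss_g = wgan_gp_losses(w, real, fake, rng)
print(loss_d, loss_g)
```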
Further, the gradient enhancement method based on the local quality attention mechanism constructed in step 3 specifically includes:
in the local quality attention mechanism, the non-overlapping patches of a generated image are scored with the no-reference image quality assessment model BIECON. After evaluation, every non-overlapping patch of the generated image receives a score in $[0, 1]$. A score closer to 0 means poorer quality, and a score closer to 1 means better quality. The quality score of each patch is assigned to every pixel in that patch, which gives a quality score matrix $Q$ of the same size as the input. The local quality attention is then computed as:
$$M = 1 - Q$$
the larger a value in the attention matrix $M$, the worse the quality of that pixel, and the more the generator should focus on that region;
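The sketch below builds $Q$ and $M$. It assumes the per-patch scores in $[0, 1]$ have already been produced by the quality model; BIECON itself is not reproduced here, and the score grid in the usage lines is made up for illustration. It also assumes the image height and width are multiples of the patch size. `np.kron` with a block of ones copies each patch score onto every pixel of its patch. The function name is hypothetical.

```python
import numpy as np

def attention_from_patch_scores(patch_scores: np.ndarray, patch: int) -> np.ndarray:
    """Expand (H/patch, W/patch) quality scores to a per-pixel Q and return M = 1 - Q."""
    q = np.kron(patch_scores, np.ones((patch, patch)))  # each pixel gets its patch's score
    return 1.0 - q

scores = np.array([[0.9, 0.2],
                   [0.5, 1.0]])   # illustrative scores for a 2x2 grid of patches
M = attention_from_patch_scores(scores, patch=2)
print(M.shape)    # (4, 4)
print(M[0, 2])    # pixel in the patch scored 0.2, so its attention is high
```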
in the gradient back-propagation stage, the gradient $\delta_D$ of the discriminator is computed from the adversarial loss and the discriminator's parameters. The gradient $\delta_{\hat{x}}$ of the generated sample $\hat{x}_{ab}$ is then computed from $\delta_D$. In a standard generative adversarial network, the gradient of the generated sample is used directly to update the generator's parameters. The gradient enhancement method based on local quality attention instead modifies $\delta_{\hat{x}}$ with the attention matrix $M$ through an element-wise product:
$$\delta'_{\hat{x}} = \delta_{\hat{x}} + \alpha \, (\delta_{\hat{x}} \odot M)$$
where $\alpha$ is a hyperparameter that adjusts the weight; the generator updates its parameters with the modified gradient.
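The gradient modification is sketched below. It assumes the XAI-GAN form of the update, $\delta' = \delta + \alpha\,(\delta \odot M)$, and uses $\alpha = 0.2$, the value the embodiment adopts from XAI-GAN. In training, `grad` would come from back-propagating the adversarial loss through the discriminator to the generated image. Here it is just an array with arbitrary values. The function name is hypothetical.

```python
import numpy as np

def enhance_gradient(grad: np.ndarray, attention: np.ndarray,
                     alpha: float = 0.2) -> np.ndarray:
    """delta' = delta + alpha * (delta ⊙ M): amplify gradients in low-quality regions.

    `attention` may be (H, W) and `grad` (C, H, W); NumPy broadcasting applies
    the same per-pixel attention to every channel.
    """
    return grad + alpha * grad * attention

grad = np.array([[1.0, -2.0],
                 [0.5,  4.0]])
M = np.array([[0.0, 1.0],
              [0.5, 0.0]])
enhanced = enhance_gradient(grad, M)
print(enhanced)
```

Where $M = 0$ (a patch judged perfect) the gradient is unchanged. Where $M = 1$ it is scaled by $1 + \alpha$, so poorly generated regions receive proportionally larger updates.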
Further, the joint training in step 4 of the present invention includes:
different loss functions are used for generated images and real images. The triplet loss is applied when training on generated images and is defined as:
$$\mathcal{L}_{tri} = \sum_{i=1}^{B} \sum_{a=1}^{E} \left[ \gamma + \max_{p=1,\dots,E} \lVert f_a^{i} - f_p^{i} \rVert_2 - \min_{\substack{j=1,\dots,B,\; j \neq i \\ n=1,\dots,E}} \lVert f_a^{i} - f_n^{j} \rVert_2 \right]_+$$
where $B$ is the number of identities in a mini-batch and $E$ is the number of instances per identity. $f_a$, $f_p$ and $f_n$ are the feature vectors of the anchor, positive and negative samples extracted by the pedestrian re-identification network, and $\gamma$ is the margin hyperparameter between intra-class and inter-class distances. The triplet loss pulls the positive sample closer to the anchor and pushes the negative sample away from the anchor, so the network learns discriminative feature representations. Real images are trained with the ID loss:
$$\mathcal{L}_{id} = \mathbb{E}\left[ -\log p(y \mid x) \right]$$
where $x$ is a real image in the training set and $p(y \mid x)$ is the probability that $x$ is predicted as its true identity label $y$;
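A NumPy sketch of the triplet loss on generated images follows. It assumes the batch-hard form (hardest positive and hardest negative per anchor), which matches the "difficult sample mining" wording used later in the embodiment. The margin defaults to the $\gamma = 0.3$ used in the experiments. Summing over anchors is an assumption; many implementations average instead. The function name is hypothetical.

```python
import numpy as np

def batch_hard_triplet(features: np.ndarray, labels: np.ndarray,
                       margin: float = 0.3) -> float:
    """Batch-hard triplet loss summed over all anchors in the mini-batch.

    features: (B*E, d) feature vectors; labels: (B*E,) identity labels.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))                 # (n, n) Euclidean distances
    same = labels[:, None] == labels[None, :]
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)  # farthest same-identity sample
    hardest_neg = np.where(same, np.inf, dist).min(axis=1)   # closest other-identity sample
    return float(np.maximum(margin + hardest_pos - hardest_neg, 0.0).sum())

# B = 2 identities, E = 2 instances each, 1-D features for readability.
feats = np.array([[0.0], [1.0], [1.5], [3.0]])
ids = np.array([0, 0, 1, 1])
loss = batch_hard_triplet(feats, ids)
print(loss)
```

In this example only the two middle samples violate the margin: the sample at 1.0 lies closer to the other identity (distance 0.5) than to its own (distance 1.0).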
the generative adversarial network and the pedestrian re-identification network are trained jointly by optimizing an overall objective formed as a weighted sum of the losses:
$$\mathcal{L}_{total} = \mathcal{L}_{adv} + \lambda_{id}\,\mathcal{L}_{id} + \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{tri}\,\mathcal{L}_{tri}$$
where $\mathcal{L}_{adv}$ is the adversarial loss, which ensures that the generator produces visually realistic images, and $\lambda_{id}$, $\lambda_{rec}$ and $\lambda_{tri}$ are hyperparameters that balance the corresponding loss terms.
Further, the method in step 4 of the present invention also includes:
the generative adversarial network creates no new identities when it generates images. To prevent the pedestrian re-identification model from overfitting, a two-stage training scheme is therefore adopted for it. In the first stage, the model is trained jointly with the overall objective. In the second stage, the LSRO (label smoothing regularization for outliers) method is introduced to fine-tune the model further. LSRO reduces the risk of overfitting by assigning a uniformly distributed label to each generated image:
$$q_{LSRO}(k \mid \hat{x}) = \frac{1}{K}, \quad k \in [1, K]$$
where $\hat{x}$ denotes a generated image. $q_{LSRO}(k \mid \hat{x}) = 1/K$ therefore means that the generated image $\hat{x}$ belongs to each identity with probability $1/K$. Both real and generated images are trained with the ID loss, and their losses are unified as:
$$\mathcal{L}_{id} = -(1 - z)\log p(y \mid x) - \frac{z}{K}\sum_{k=1}^{K}\log p(k \mid x)$$
where $z = 0$ for a real image and $z = 1$ for a generated image.
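The unified loss can be sketched in NumPy. The sketch assumes the LSRO form from Zheng et al., $-(1-z)\log p(y \mid x) - \frac{z}{K}\sum_{k=1}^{K}\log p(k \mid x)$, with $z = 0$ for real images and $z = 1$ for generated images. It also assumes the per-image losses are averaged over the mini-batch. The logits would come from the re-identification network's classifier. The function names are hypothetical.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def unified_id_loss(logits: np.ndarray, labels: np.ndarray,
                    is_generated: np.ndarray) -> float:
    """Batch mean of -(1-z) log p(y|x) - (z/K) sum_k log p(k|x).

    logits: (n, K); labels: (n,) identity ids (ignored where z = 1);
    is_generated: (n,) with z = 0 for real and z = 1 for generated images.
    """
    log_p = log_softmax(logits)
    K = logits.shape[1]
    z = is_generated.astype(float)
    ce_true = -log_p[np.arange(len(labels)), labels]   # standard ID loss (real images)
    ce_uniform = -log_p.sum(axis=1) / K                 # LSRO: uniform target (generated)
    return float(np.mean((1.0 - z) * ce_true + z * ce_uniform))

logits = np.zeros((2, 4))   # one real and one generated image, K = 4
loss_uniform = unified_id_loss(logits, np.array([1, 0]), np.array([0, 1]))
print(loss_uniform)         # equals log(4) for all-zero logits
```

A confident prediction is cheap for a real image whose label matches. The same prediction is expensive for a generated image, because LSRO penalizes any departure from the uniform distribution over identities.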
The beneficial effects of the invention are as follows:
(1) Random noise and pose keypoints cannot accurately guide the generation of pedestrian features. To address this, semantic maps are introduced into the pedestrian image generation process, and an identity-migration generative adversarial network guided by semantic maps is proposed. The semantic map precisely delineates the different regions of a pedestrian, which allows the pedestrian image to be edited precisely and improves generation quality. The network migrates the identity of the pedestrian in one image onto other pedestrian images. This increases the diversity of the pedestrian re-identification dataset and improves the model's robustness to differences in background, viewpoint, pose and so on.
(2) Generative adversarial networks produce local regions of uneven quality. To address this, a gradient enhancement method based on a local quality attention mechanism is proposed, so that the network adjusts generation quality globally and also improves image quality locally.
(3) To let the pedestrian re-identification network make better use of generated images, a joint training scheme for the generative adversarial network and the re-identification network is proposed. On the one hand, the re-identification network classifies the images generated by the adversarial network, which strengthens the network's identity migration capability. On the other hand, the re-identification network learns more discriminative feature representations from the generated images.
Drawings
The invention is further described below with reference to the accompanying drawings and embodiments, in which:
FIG. 1 shows the overall structure of the model according to an embodiment of the present invention;
FIG. 2 shows same-identity migration according to an embodiment of the present invention;
FIG. 3 shows the two-stage training of the pedestrian re-identification network according to an embodiment of the present invention;
FIG. 4 shows the gradient enhancement method based on the local quality attention mechanism according to an embodiment of the present invention;
FIG. 5 shows identity migration results of the model on the Market-1501 dataset according to an embodiment of the present invention;
FIG. 6 is a training flowchart of the model according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention.
Embodiment one:
the pedestrian re-identification method based on the identity-migration generative adversarial network provided by this embodiment of the invention comprises the following steps:
(1) Constructing the semantic-map-based identity-migration generative adversarial network model.
The semantic-map-based identity-migration generative adversarial network consists of a structure encoder $E_s$, an identity information extractor $E_{id}$, a decoder $G_{dec}$ and a discriminator $D$. $E_s$, $E_{id}$ and $G_{dec}$ together form the generator $G$. The generator and the discriminator $D$ form a generative adversarial network, which is trained with an adversarial loss. The training dataset is defined as $\mathcal{D} = \{(x_n, y_n, s_n)\}_{n=1}^{N}$. Each training sample consists of three parts: a pedestrian image $x_n \in \mathbb{R}^{3 \times H \times W}$, its identity label $y_n \in [1, K]$ and the pedestrian's semantic map $s_n \in \mathbb{R}^{C \times H \times W}$. Here $N$ is the number of images in the dataset, $K$ is the number of identities, $C$ is the number of semantic label categories, and $H$ and $W$ are the height and width of the images. While training the generative adversarial network, two real samples $(x_a, y_a, s_a)$ and $(x_b, y_b, s_b)$ are drawn at random from the training dataset, with $a \in [1, N]$ and $b \in [1, N]$. To migrate the identity features of image $x_a$ onto image $x_b$, the generator $G$ first uses the identity extractor $E_{id}$ to extract the identity information $I_a$ of $x_a$. It then uses the structure encoder $E_s$ to encode $x_b$ and its semantic map $s_b$ into the structural feature $F_b$. Finally, the decoder $G_{dec}$ decodes $I_a$ and $F_b$ into a new pedestrian image $\hat{x}_{ab}$. This image should have the structural features of pedestrian $y_b$ and the identity features of pedestrian $y_a$.
Specifically, to migrate the identity features of image $x_a$ onto image $x_b$, the semantic map $s_a$ corresponding to $x_a$ must first be preprocessed. $s_a$ contains the semantic information of pedestrian $y_a$. According to its spatial layout, all semantic information is grouped into 5 parts (head, upper body, lower body, shoes and background), denoted $\{s_a^k\}_{k=1}^{5}$. The identity feature extraction network $E_{id}$ then extracts the identity features of each part of the pedestrian:
$$(\gamma_a^k, \beta_a^k) = E_{id}(x_a \odot s_a^k), \quad k = 1, \dots, 5, \qquad I_a = \{(\gamma_a^k, \beta_a^k)\}_{k=1}^{5}$$
During the computation, $s_a^k$ is automatically broadcast to the shape of $x_a$, so that $\odot$ is an element-wise product. $\gamma_a^k$ and $\beta_a^k$ are the affine parameters carrying the identity information of each semantic part. The identity information is injected into the pedestrian image through adaptive instance normalization (AdaIN), defined as:
$$\mathrm{AdaIN}(F, \gamma, \beta) = \gamma \left( \frac{F - \mu(F)}{\sigma(F)} \right) + \beta$$
where $\mu(\cdot)$ computes the mean and $\sigma(\cdot)$ computes the standard deviation. AdaIN builds on instance normalization but replaces the affine parameters with conditional style information, which is what performs the style transfer.
Because of the semantic labels, the identity features contain accurate feature information for each semantic part of the pedestrian image. The style transfer capability of AdaIN then migrates this identity information accurately onto the target image, which gives the generator G a more accurate identity feature migration capability.
Identity migration covers two cases. When the identity labels satisfy $y_a \neq y_b$, the generation is cross-identity generation; otherwise it is same-identity generation. In same-identity generation, the generated image has a corresponding real image in the training dataset, namely $x_b$. The generated image $\hat{x}_{ab}$ should acquire the identity features of pedestrian $y_a$ and also keep clear structural features. To achieve this, it is supervised with an $\ell_1$ loss:
$$\mathcal{L}_{rec} = \mathbb{E}\left[ \lVert \hat{x}_{ab} - x_b \rVert_1 \right]$$
When $y_a = y_b$, images $x_a$ and $x_b$ show the same pedestrian. Reconstructing the generated image under supervision therefore teaches the generator complete structural information.
The generated image $\hat{x}_{ab}$ should correctly acquire the identity of pedestrian $y_a$, so the pedestrian re-identification network is used to constrain its identity. The re-identification network classifies $\hat{x}_{ab}$, and an identity loss is applied to the generated image:
$$\mathcal{L}_{id}^{G} = \mathbb{E}\left[ -\log p(y_a \mid \hat{x}_{ab}) \right]$$
where $p(y_a \mid \hat{x}_{ab})$ is the probability that $\hat{x}_{ab}$ is predicted as the class label $y_a$ of image $x_a$. Minimizing the generator's identity loss $\mathcal{L}_{id}^{G}$ makes the generator learn identity knowledge from the pedestrian re-identification network.
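The sketch below shows the generator's identity loss, assuming it is the cross-entropy $-\log p(y_a \mid \hat{x}_{ab})$ averaged over the generated images in a batch. The key point is the choice of target. The classification target is $y_a$, the identity of the image that supplied the identity features, not $y_b$, the identity of the image that supplied the structure. The logits stand in for the re-identification network's classifier output and are made up. The function name is hypothetical.

```python
import numpy as np

def generator_identity_loss(logits_generated: np.ndarray, y_a: np.ndarray) -> float:
    """Mean of -log p(y_a | x_hat_ab) over a batch of generated images.

    The target is the identity y_a of the identity-providing image x_a,
    not y_b of the structure-providing image x_b.
    """
    shifted = logits_generated - logits_generated.max(axis=1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_p[np.arange(len(y_a)), y_a].mean())

# Re-identification logits for two generated images over K = 3 identities (made up).
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 0.0, 4.0]])
y_a = np.array([0, 2])   # identities of the identity-providing images
y_b = np.array([1, 1])   # identities of the structure-providing images
print(generator_identity_loss(logits, y_a) < generator_identity_loss(logits, y_b))  # True
```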
The generator and the discriminator are trained by adversarial learning so that the generated image $\hat{x}_{ab}$ looks more realistic. The adversarial losses of the generator and the discriminator are defined as follows:
During training, the WGAN-GP adversarial loss is optimized, which makes training more stable.
(2) Constructing the gradient enhancement method based on the local quality attention mechanism.
The generator and the discriminator are trained by adversarial learning. The generator tries to generate images that are as realistic as possible to fool the discriminator, while the discriminator tries to distinguish generated images from real ones. In the generator's training phase, the discriminator takes a generated image as input and predicts whether it is real. A loss value is then computed from this prediction, and the discriminator uses it to send feedback to the generator. The generator updates its parameters with this feedback, which improves its ability to generate images and makes them more visually realistic. This analysis shows that the discriminator's feedback is computed from a single value stating whether the whole image is real or fake. It therefore ignores the uneven generation quality across local regions of the image. This imbalance appears as artifacts, blurring and similar defects in local regions of the generated image. These defects in turn affect how the pedestrian re-identification network judges the identity of the generated image.
The proposed method consists of two parts: the local quality attention mechanism and gradient enhancement. The local quality attention mechanism locates the poorly generated regions in a generated image so that the generator pays more attention to generating those regions. The non-overlapping patches of the generated image are scored with the no-reference image quality assessment model BIECON. After evaluation, each non-overlapping patch receives a score in $[0, 1]$. A score closer to 0 means poorer quality, and a score closer to 1 means better quality. The quality score of each patch is used as the quality score of every pixel in that patch, which yields a quality score matrix $Q$ of the same size as the input. The local quality attention is then computed as:
$$M = 1 - Q \tag{8}$$
The larger a value in the attention matrix $M$, the worse the quality of that pixel, and the more attention the generator should pay to that region. In the gradient back-propagation stage, the gradient $\delta_D$ of the discriminator is computed from the adversarial loss and the discriminator's parameters. The gradient $\delta_{\hat{x}}$ of the generated sample $\hat{x}_{ab}$ is then computed from $\delta_D$. In a standard generative adversarial network, the gradient of the generated sample is used directly to update the generator's parameters. The gradient enhancement method based on local quality attention instead modifies $\delta_{\hat{x}}$ with the attention matrix $M$ through an element-wise product:
$$\delta'_{\hat{x}} = \delta_{\hat{x}} + \alpha \, (\delta_{\hat{x}} \odot M)$$
where $\alpha$ is a hyperparameter that adjusts the weight; following XAI-GAN, $\alpha$ is set to 0.2. The generator updates its parameters with the modified gradient. Intuitively, the attention matrix enlarges the gradient in poor-quality regions, which guides the generator to pay more attention to how local regions are generated. The model thus improves the overall image quality and also refines quality locally.
(3) Establishing the joint training scheme for the generative adversarial network and the pedestrian re-identification network.
The pedestrian re-identification network is trained jointly with the generative adversarial network (GAN). The GAN generates new pedestrian images, which are used together with the real images of the training dataset to train the re-identification network. The identity information of a generated image comes from the image that provides the identity features. In theory, therefore, the identity label of the generated image should match that of the identity-providing image.

However, GAN training is progressive. Early in training, the generated images are of imperfect quality and identity migration is not yet accurate. Assigning identity labels directly to the generated images would mislead the re-identification network as it learns identity features. This degrades the accuracy of identity migration, makes training unstable and can even cause it to collapse.

To avoid these problems, different loss functions are used for generated images and real images. The batch-hard triplet loss (triplet loss with hard sample mining) is applied to the generated images and is defined as follows:
$$\mathcal{L}_{tri} = \sum_{i=1}^{B}\sum_{a=1}^{E}\Big[\gamma + \max_{p=1,\dots,E}\big\lVert f_a^{(i)} - f_p^{(i)}\big\rVert_2 - \min_{j \neq i,\ n=1,\dots,E}\big\lVert f_a^{(i)} - f_n^{(j)}\big\rVert_2\Big]_+ \qquad (10)$$
where B and E denote the number of identities and the number of instances per identity in a mini-batch, respectively. f_a, f_p and f_n denote the feature vectors of the anchor, positive and negative samples extracted by the pedestrian re-identification network, with the superscript indexing the identity. γ is the margin hyperparameter between intra-class and inter-class distances, set to 0.3 in the experiments. The triplet loss learns a discriminative feature representation by pulling the anchor and positive samples closer together and pushing the negative sample away from the anchor. Real images are learned with the ID loss:
$$\mathcal{L}_{id} = \mathbb{E}\big[-\log p(y \mid x)\big] \qquad (11)$$
where x denotes a real image in the training dataset and p(y|x) denotes the predicted probability that x belongs to its true identity label y.
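The triplet loss applied to the generated images uses batch-hard mining. For each anchor, the farthest sample with the same identity and the closest sample with a different identity in the mini-batch form the triplet.

The following is a minimal plain-Python sketch under stated assumptions:
- It uses Euclidean distance, which the patent does not state.
- It averages over the B·E anchors instead of summing. This is a common implementation choice that only rescales the loss.
- It assumes every identity in the batch has at least two instances, as B × E sampling guarantees.

The function names are illustrative.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(
    features: List[Sequence[float]], labels: List[int], margin: float = 0.3
) -> float:
    """Mean over anchors of [margin + hardest positive - hardest negative]_+."""
    n = len(features)
    total = 0.0
    for a in range(n):
        # Hardest positive: the farthest other sample with the anchor's identity.
        hardest_pos = max(
            euclidean(features[a], features[p])
            for p in range(n)
            if p != a and labels[p] == labels[a]
        )
        # Hardest negative: the closest sample with a different identity.
        hardest_neg = min(
            euclidean(features[a], features[j])
            for j in range(n)
            if labels[j] != labels[a]
        )
        total += max(0.0, margin + hardest_pos - hardest_neg)
    return total / n
```

With γ = 0.3, well-separated identities give zero loss. The loss becomes positive only when some negative lies within the margin of an anchor's farthest positive.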
The generative adversarial network and the pedestrian re-identification network are trained jointly by optimizing an overall objective formed by the weighted sum of losses (4), (5), (6), (7), (10) and (11):
where the adversarial loss term ensures that the generator produces visually realistic images, and λ_id, λ_rec and λ_tri are hyperparameters that balance the corresponding loss terms.
The generative adversarial network does not create new identities when it generates images. To prevent the pedestrian re-identification model from overfitting, a two-stage training scheme is therefore adopted, as shown in Fig. 3. In the first stage, the model is trained jointly with the overall objective above. In the second stage, the LSRO (label smoothing regularization for outliers) method is introduced to further fine-tune the model. LSRO reduces the risk of overfitting by assigning a uniformly distributed label to each generated image, defined as follows:
$$q_{LSRO}(k) = \frac{1}{K}, \qquad k \in [1, K]$$
where k ∈ [1, K], so a generated image is assigned a probability of 1/K of belonging to each identity. Both real and generated images are then trained with the ID loss. Combining this with Eq. (5), the losses for real and generated images are unified as follows:
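$$\mathcal{L}_{ID} = -(1 - z)\log p(y \mid x) - \frac{z}{K}\sum_{k=1}^{K}\log p(k \mid x)$$
where x is either a real or a generated image. For a real image, z = 0; for a generated image, z = 1.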
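Because z is either 0 or 1, the unified loss reduces to the following:
- For real images, it is ordinary cross-entropy against the true identity.
- For generated images, it is cross-entropy against the uniform 1/K LSRO target.

Below is a minimal plain-Python sketch for a single image. It assumes the re-identification network outputs K raw logits, and the function names are illustrative.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the K identity logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def unified_id_loss(logits: Sequence[float], label: Optional[int], z: int) -> float:
    """-(1 - z) log p(y|x) - (z / K) * sum_k log p(k|x), with z in {0, 1}."""
    if z not in (0, 1):
        raise ValueError("z must be 0 (real image) or 1 (generated image)")
    log_p = log_softmax(logits)
    if z == 0:
        if label is None:
            raise ValueError("real images need their identity label")
        return -log_p[label]  # standard ID (cross-entropy) loss
    # Generated image: cross-entropy against the uniform LSRO target 1/K.
    return -sum(log_p) / len(log_p)
```

The LSRO term is smallest when the prediction for a generated image is uniform. This discourages the network from confidently assigning generated images, whose identities are not new, to any single training identity.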
Embodiment two:
The pedestrian re-identification method based on an identity-migration generative adversarial network provided by this embodiment of the invention comprises the following steps:
(1) Training data set preparation
The Market-1501 dataset is acquired. It was collected with 6 cameras on the Tsinghua University campus and contains 1,501 annotated pedestrians. Of these, 751 identities are used for the training set and 750 for the test set, and no identity is shared between the two sets.

The semantic map of each pedestrian image is generated with a human parsing model (Self-Correction for Human Parsing), which assigns a semantic category to every pixel of the image. The resulting semantic maps contain 20 semantic categories: background, hat, hair, gloves, sunglasses, upper clothes, dress, coat, socks, pants, jumpsuit, scarf, skirt, face, left arm, right arm, left leg, right leg, left shoe and right shoe. According to their spatial positions, these categories are roughly grouped into 5 parts: head, upper body, lower body, shoes and background.

During identity migration, the semantic map is used to extract the features of each part separately, which achieves fine-grained feature extraction. These features are then injected into the generative adversarial network to generate pedestrian images with more accurate features. All input images are resized to 256 × 128 pixels before training.
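The grouping of the 20 parsing categories into 5 parts can be expressed as a lookup from class index to part, from which one binary mask per part is built. The sketch below assumes the class indices follow the order listed above, 0 = background through 19 = right shoe, which matches the LIP label set used by Self-Correction for Human Parsing. The exact class-to-part assignment is not given in the text, so the `PART_OF_CLASS` table is a hypothetical choice. For example, dress is placed in the upper body, and jumpsuit and socks in the lower body.

```python
from typing import Dict, List

# Class order as listed in the text (index 0..19).
CLASSES = [
    "background", "hat", "hair", "gloves", "sunglasses", "upper_clothes",
    "dress", "coat", "socks", "pants", "jumpsuit", "scarf", "skirt", "face",
    "left_arm", "right_arm", "left_leg", "right_leg", "left_shoe", "right_shoe",
]
PARTS = ["head", "upper_body", "lower_body", "shoes", "background"]

# Hypothetical class-to-part assignment (not specified in the patent).
PART_OF_CLASS: Dict[str, str] = {
    "background": "background",
    "hat": "head", "hair": "head", "sunglasses": "head", "face": "head",
    "upper_clothes": "upper_body", "dress": "upper_body", "coat": "upper_body",
    "scarf": "upper_body", "gloves": "upper_body",
    "left_arm": "upper_body", "right_arm": "upper_body",
    "pants": "lower_body", "jumpsuit": "lower_body", "skirt": "lower_body",
    "socks": "lower_body", "left_leg": "lower_body", "right_leg": "lower_body",
    "left_shoe": "shoes", "right_shoe": "shoes",
}


def part_masks(label_map: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Turn an H x W map of class indices into one binary H x W mask per part."""
    masks = {p: [[0] * len(row) for row in label_map] for p in PARTS}
    for y, row in enumerate(label_map):
        for x, cls in enumerate(row):
            masks[PART_OF_CLASS[CLASSES[cls]]][y][x] = 1
    return masks
```

Because every class maps to exactly one part, the five masks partition the image, so each pixel contributes to exactly one part's features.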
(2) Model construction
All models are implemented in the PyTorch deep learning framework. The overall structure, shown in Fig. 1, consists of a generator G, a discriminator D and a pedestrian re-identification network R.

The generator G adopts an encoder-decoder architecture. The structure encoder E_s is a shallow network of three convolutional layers, and the decoder G_dec is a network of three transposed convolutional layers. The identity information extractor E_id consists of five convolutional layers and obtains the adaptive instance normalization parameters I through global average pooling at its last layer; all E_id branches share network parameters. Following MUNIT, the generator G injects the identity information of the different semantic regions into the structural features F through five residual blocks, respectively, and each residual block contains two adaptive instance normalization layers.

The discriminator D follows the popular PatchGAN structure. The pedestrian re-identification network R is based on ResNet-50 and is initialized with parameters pre-trained on ImageNet. The output dimension of its fully connected layer is changed to K, the number of identities in the training dataset.
(3) Joint training of the generative adversarial network and the pedestrian re-identification network
During training, both the generative adversarial network and the pedestrian re-identification network are optimized with Adam, with β1 = 0.5 and β2 = 0.999. The weights in the overall loss are set to λ_id = 1, λ_rec = 10 and λ_tri = 1.

In the first stage, the GAN and the re-identification network are trained jointly. The learning rate of the generator and the discriminator is 0.0001, and that of the re-identification network is 0.00035. The batch size is 32, with B = 8 identities and E = 4 instances per identity in each batch. In the second stage, GAN training is stopped and the re-identification network is fine-tuned with the LSRO loss.

Throughout the experiments, all input images are resized to 256 × 128. The input of the structure encoder E_s is converted to a grayscale image to remove the influence of the original identity information.
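The batch of 32 images is built from B = 8 identities with E = 4 instances each. This identity-balanced (P×K) sampling is what the batch-hard triplet loss relies on. Below is a minimal sketch using Python's standard `random` module.

Sampling with replacement for identities that have fewer than E images is an assumption, since the text does not say how such identities are handled. The function name is illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def identity_balanced_batch(
    labels: Sequence[int],
    num_ids: int = 8,        # B identities per batch
    num_instances: int = 4,  # E instances per identity
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return dataset indices for one batch of num_ids * num_instances images."""
    rng = rng or random.Random()
    by_id: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_id[y].append(idx)
    batch: List[int] = []
    # Pick B distinct identities, then E images of each.
    for y in rng.sample(sorted(by_id), num_ids):
        pool = by_id[y]
        if len(pool) >= num_instances:
            batch.extend(rng.sample(pool, num_instances))
        else:
            # Too few images of this identity: sample with replacement.
            batch.extend(rng.choices(pool, k=num_instances))
    return batch
```

With the default arguments, each batch has 8 × 4 = 32 images, matching the batch size above.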
(4) Experimental analysis
The model is evaluated on image generation and on pedestrian re-identification.

Image generation is evaluated by using the generative adversarial network to migrate the identity of a pedestrian image onto different images; the results are shown in Fig. 5. In Fig. 5, the first column shows the identity source images. The first row shows the target images of identity migration, which provide the structural information. The remaining images in Fig. 5 are the results of identity migration. They show that the generated images preserve the structural information of the target images well while accurately migrating the identity information. This indicates that the identity-migration generative adversarial network of the invention has good image generation and identity migration capabilities.

Pedestrian re-identification is evaluated with two criteria:
(1) the Rank-n value, the probability that at least one of the top n images in the query result matches the query identity;
(2) mAP (mean average precision), which reflects how well all the correct images in the gallery are ranked at the top of the query result.

The pedestrian re-identification network achieves a Rank-1 accuracy of 93.9% and an mAP of 83.5% on the Market-1501 test set. By migrating the identities of pedestrian images onto different images with the generative adversarial network, the invention effectively increases the diversity of the training dataset. This improves the robustness of the re-identification network to variations in background, viewpoint and pose.
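For a query whose gallery has been sorted by feature distance, Rank-n and average precision (AP) can be computed as below. mAP is the mean AP over all queries.

This plain-Python sketch makes two simplifications:
- It assumes each ranked list already covers the whole gallery.
- It omits Market-1501 protocol details such as discarding same-camera matches and junk images.

It is therefore an illustration of the metrics, not the official evaluation code, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def rank_n_hit(ranked_ids: Sequence[int], query_id: int, n: int) -> bool:
    """True if at least one of the top-n gallery results has the query identity."""
    return query_id in ranked_ids[:n]


def average_precision(ranked_ids: Sequence[int], query_id: int) -> float:
    """Mean of the precision values at the rank of each correct match."""
    hits, precision_sum = 0, 0.0
    for rank, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(
    ranked_lists: List[Sequence[int]], query_ids: List[int], n: int = 1
) -> Tuple[float, float]:
    """Return (Rank-n accuracy, mAP) over all queries."""
    pairs = list(zip(ranked_lists, query_ids))
    rank_n = sum(rank_n_hit(r, q, n) for r, q in pairs) / len(pairs)
    m_ap = sum(average_precision(r, q) for r, q in pairs) / len(pairs)
    return rank_n, m_ap
```

Rank-1 only checks the top result, whereas AP also rewards placing every correct gallery image early. This is why mAP (83.5%) is lower than Rank-1 (93.9%) in the results above.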
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims.
Claims (6)
1. A pedestrian re-identification method based on an identity-migration generative adversarial network, the method comprising the following steps:
Step 1, acquiring a pedestrian image dataset; generating the semantic map corresponding to each pedestrian image with a human parsing model that assigns a semantic category to every pixel of the pedestrian image; and dividing the pedestrian images, together with their semantic maps, into a training set and a test set;
Step 2, constructing the overall pedestrian re-identification model, which comprises a generator G, a discriminator D and a pedestrian re-identification network R; the generator G comprises a structure encoder E_s, an identity information extractor E_id and a decoder G_dec; the generator G and the discriminator D form a generative adversarial network based on semantic-map identity migration and are trained against each other by adversarial learning;
the method in step 2 comprises the following steps:
the semantic-map-based identity-migration generative adversarial network consists of a structure encoder E_s, an identity information extractor E_id, a decoder G_dec and a discriminator D; E_s, E_id and G_dec together form the generator G, which forms a generative adversarial network with the discriminator D, trained with an adversarial loss;
the training set is defined as $\{(x_n, y_n, s_n)\}_{n=1}^{N}$, where each training sample consists of a pedestrian image $x_n \in \mathbb{R}^{3 \times H \times W}$, its identity label $y_n \in [1, K]$ and its pedestrian semantic map $s_n \in \mathbb{R}^{C \times H \times W}$; N denotes the number of images in the dataset, K the number of identities, C the number of semantic label categories, and H and W the height and width of the images, respectively;
during training of the generative adversarial network, two real samples $(x_a, y_a, s_a)$ and $(x_b, y_b, s_b)$ are drawn at random from the training set, where a ∈ [1, N] and b ∈ [1, N]; to migrate the identity features of image x_a onto image x_b, the generator G first uses the identity extractor E_id to extract the identity information I_a of image x_a, then uses the structure encoder E_s to encode image x_b and its semantic map s_b into the structural features F_b, and finally uses the decoder G_dec to decode I_a and F_b into a new pedestrian image, i.e., the generated image, which has the structural features of pedestrian y_b and the identity features of pedestrian y_a;
Step 3, constructing a gradient enhancement method based on a local quality attention mechanism to improve the generative adversarial network;
the gradient enhancement method based on the local quality attention mechanism constructed in step 3 specifically comprises:
in the local quality attention mechanism, the non-overlapping patches of the generated image are scored with the no-reference image quality assessment model BIECON; after evaluation, each non-overlapping patch of the generated image receives a score in [0, 1], where a score closer to 0 indicates worse quality and a score closer to 1 indicates better quality; the quality score of each patch is taken as the quality score of every pixel in that patch, giving a quality score matrix Q of the same size as the input; finally, the local quality attention mechanism is implemented as follows:
$$M = 1 - Q$$
the larger a value in the attention matrix M, the worse the quality of that pixel, and the generator focuses on that region;
in the gradient back-propagation stage, the discriminator gradient δ_D is computed from the adversarial loss and the discriminator parameters, and the gradient δ_G of the generated sample is then computed from δ_D; in a standard generative adversarial network, the gradient of the generated sample is used directly to update the generator parameters, whereas the gradient enhancement method based on local quality attention modifies δ_G with the attention matrix M, using the element-wise product:
$$\delta_G' = \delta_G + \alpha\,(M \odot \delta_G)$$
where α is a hyperparameter that adjusts the weight, and the generator updates the model parameters with the modified gradient;
Step 4, establishing joint training of the generative adversarial network and the pedestrian re-identification network: the training set is input; the generative adversarial network outputs new generated images; the generated images and the pedestrian images of the training set are used together to train the pedestrian re-identification network, yielding the trained overall model; and the trained model is tested with the test set;
Step 5, inputting a pedestrian image to be identified and outputting the pedestrian re-identification result through the trained pedestrian re-identification network.
2. The pedestrian re-identification method based on an identity-migration generative adversarial network according to claim 1, wherein step 1 comprises:
acquiring a pedestrian image dataset in which each pedestrian in the images carries a pedestrian label, and dividing the pedestrian images into a training set and a test set that share no pedestrian labels; generating the semantic map corresponding to each pedestrian image with a human parsing model that assigns a semantic category to every pixel of the image, the generated semantic map comprising 20 semantic categories: background, hat, hair, gloves, sunglasses, upper clothes, dress, coat, socks, pants, jumpsuit, scarf, skirt, face, left arm, right arm, left leg, right leg, left shoe and right shoe; grouping all semantic categories into 5 parts, namely head, upper body, lower body, shoes and background, according to their spatial positions; extracting the features of each part with the semantic map to achieve fine-grained feature extraction; and uniformly resizing all images to a fixed pixel size before training.
3. The pedestrian re-identification method based on an identity-migration generative adversarial network according to claim 1, wherein the identity feature migration in step 2 specifically comprises:
in migrating the identity features of image x_a onto image x_b, the semantic map s_a corresponding to image x_a is first preprocessed; the semantic map s_a contains the semantic information of pedestrian y_a, and all semantic information is divided, according to its spatial positions, into 5 parts, namely head, upper body, lower body, shoes and background, each represented by a part mask; the identity feature extraction network E_id then extracts the identity features of each part of the pedestrian, computed as follows:
$$(\gamma_a^{k}, \beta_a^{k}) = E_{id}\big(x_a \odot s_a^{k}\big), \qquad k = 1, \dots, 5$$
where $s_a^{k}$ is the mask of the k-th part; in the computation the mask is automatically broadcast to 3 dimensions, so the operation is an element-wise multiplication; $\gamma_a^{k}$ and $\beta_a^{k}$ are the affine parameters containing the identity information of each semantic part; the identity information of the pedestrian image is injected through adaptive instance normalization, defined as follows:
$$\mathrm{AdaIN}(F, \gamma, \beta) = \gamma\,\frac{F - \mu(F)}{\sigma(F)} + \beta$$
where μ(·) computes the mean and σ(·) the standard deviation; building on instance normalization, adaptive instance normalization replaces the affine parameters with conditional style information, thereby achieving style transfer;
identity migration has two cases:
when the identity labels satisfy y_a ≠ y_b, the generation is cross-identity generation; otherwise it is same-identity generation; in same-identity generation, the training set contains a real image corresponding to the generated image; so that the generated image not only acquires the identity features of pedestrian y_a but also retains clear structural features, the generated image is supervised with an ℓ1 loss:
when y_a = y_b, images x_a and x_b belong to the same pedestrian, and the generated image is reconstructed under supervised learning so that the generator learns complete structural information.
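The per-part masking, the AdaIN injection and the same-/cross-identity ℓ1 supervision of claim 3 can be sketched on small nested lists as follows. This is an illustration under stated assumptions, not the patented implementation:
- The identity extractor E_id and the decoder are not modeled.
- The `eps` term added to the variance in AdaIN is an assumed numerical-stability constant.
- The ℓ1 loss is averaged over pixels, and its target in the same-identity case is assumed to be x_b.
- Indices are 0-based rather than the [1, N] used in the claim.

All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Channel = List[List[float]]  # one H x W image or feature channel


def mask_image(image: List[Channel], mask: List[List[int]]) -> List[Channel]:
    """Multiply every channel of a C x H x W image by the same H x W part mask."""
    return [
        [[v * m for v, m in zip(row, m_row)] for row, m_row in zip(channel, mask)]
        for channel in image
    ]


def adain(feature: Channel, gamma: float, beta: float, eps: float = 1e-5) -> Channel:
    """gamma * (F - mu(F)) / sigma(F) + beta using one channel's own statistics."""
    values = [v for row in feature for v in row]
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    sigma = math.sqrt(var + eps)
    return [[gamma * (v - mu) / sigma + beta for v in row] for row in feature]


def sample_pair(labels: Sequence[int], rng: random.Random) -> Tuple[int, int, bool]:
    """Draw indices a and b at random and flag whether y_a == y_b."""
    a = rng.randrange(len(labels))
    b = rng.randrange(len(labels))
    return a, b, labels[a] == labels[b]


def reconstruction_loss(
    generated: Sequence[float], x_b: Sequence[float], same_identity: bool
) -> float:
    """Mean l1 distance to x_b for same-identity pairs; cross-identity pairs
    have no ground-truth image, so they contribute 0 here."""
    if not same_identity:
        return 0.0
    return sum(abs(g - t) for g, t in zip(generated, x_b)) / len(x_b)
```

AdaIN discards the channel's own mean and standard deviation and substitutes the identity-derived γ and β. This is how the identity information of x_a replaces that of x_b while the spatial structure in F is kept.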
4. The pedestrian re-identification method based on an identity-migration generative adversarial network according to claim 3, wherein the adversarial training in step 2 specifically comprises:
training the generator G and the discriminator D by adversarial learning so that the generated images are visually more realistic, the adversarial losses of the generator G and the discriminator D being defined as follows:
the WGAN-GP adversarial loss is optimized during training, which makes the training process more stable.
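WGAN-GP trains the discriminator as a critic on the difference between its mean scores for generated and real images. It adds a penalty that pushes the norm of the critic's gradient toward 1 at random interpolations between real and generated samples.

The exact loss in the claim is not reproduced here. The sketch assumes the standard WGAN-GP formulation with penalty weight λ = 10, the default from the WGAN-GP paper rather than a value stated in the patent. To stay dependency-free, it estimates the gradient with central finite differences on a toy critic. A PyTorch implementation would instead obtain the gradient with `torch.autograd.grad(..., create_graph=True)`. The function names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence

Critic = Callable[[Sequence[float]], float]


def numerical_grad(critic: Critic, x: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central finite-difference estimate of the critic's gradient at x."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((critic(plus) - critic(minus)) / (2 * h))
    return grad


def gradient_penalty(critic: Critic, real: Sequence[float], fake: Sequence[float],
                     rng: random.Random, lam: float = 10.0) -> float:
    """lam * (||grad D(x_hat)||_2 - 1)^2 at a random real/fake interpolation x_hat."""
    eps = rng.random()
    x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]
    norm = math.sqrt(sum(g * g for g in numerical_grad(critic, x_hat)))
    return lam * (norm - 1.0) ** 2


def critic_loss(critic: Critic, reals: List[Sequence[float]], fakes: List[Sequence[float]],
                rng: random.Random, lam: float = 10.0) -> float:
    """Discriminator loss: E[D(fake)] - E[D(real)] + gradient penalty."""
    n = len(reals)
    wasserstein = sum(critic(f) for f in fakes) / n - sum(critic(r) for r in reals) / n
    penalty = sum(gradient_penalty(critic, r, f, rng, lam) for r, f in zip(reals, fakes)) / n
    return wasserstein + penalty


def generator_loss(critic: Critic, fakes: List[Sequence[float]]) -> float:
    """Generator loss: -E[D(fake)]."""
    return -sum(critic(f) for f in fakes) / len(fakes)
```

The penalty replaces the weight clipping of the original WGAN. It enforces the 1-Lipschitz constraint softly, which is what the claim credits for the more stable training.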
5. The pedestrian re-identification method based on an identity-migration generative adversarial network according to claim 1, wherein the joint training in step 4 comprises:
using different loss functions for generated images and real images, the triplet loss being applied to training on the generated images and defined as follows:
$$\mathcal{L}_{tri} = \sum_{i=1}^{B}\sum_{a=1}^{E}\Big[\gamma + \max_{p=1,\dots,E}\big\lVert f_a^{(i)} - f_p^{(i)}\big\rVert_2 - \min_{j \neq i,\ n=1,\dots,E}\big\lVert f_a^{(i)} - f_n^{(j)}\big\rVert_2\Big]_+$$
where B and E denote the number of identities and the number of instances per identity in a mini-batch, respectively; f_a, f_p and f_n denote the feature vectors of the anchor, positive and negative samples extracted by the pedestrian re-identification network, with the superscript indexing the identity; γ is the margin hyperparameter between intra-class and inter-class distances; the triplet loss pulls the anchor and positive samples closer together and pushes the negative sample away from the anchor, thereby learning a discriminative feature representation; real images are learned with the ID loss:
$$\mathcal{L}_{id} = \mathbb{E}\big[-\log p(y \mid x)\big]$$
where x denotes a real image in the training dataset and p(y|x) denotes the predicted probability that x belongs to its true identity label y;
the generative adversarial network and the pedestrian re-identification network are trained jointly by optimizing an overall objective formed by the weighted sum of the losses:
6. The pedestrian re-identification method based on an identity-migration generative adversarial network according to claim 5, wherein step 4 further comprises:
because the generative adversarial network does not create new identities when generating images, a two-stage training scheme is adopted for the pedestrian re-identification model to prevent overfitting; in the first stage, joint training is performed with the overall objective, and in the second stage, the LSRO method is introduced to further fine-tune the model; LSRO reduces the risk of overfitting by assigning a uniformly distributed label to each generated image, defined as follows:
$$q_{LSRO}(k) = \frac{1}{K}, \qquad k \in [1, K]$$
where k ∈ [1, K], so a generated image is assigned a probability of 1/K of belonging to each identity; both real and generated images are trained with the ID loss, and the losses for real and generated images are unified as follows:
$$\mathcal{L}_{ID} = -(1 - z)\log p(y \mid x) - \frac{z}{K}\sum_{k=1}^{K}\log p(k \mid x)$$
For a real image, z=0; for the generated image, z=1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210890765.1A CN115205903B (en) | 2022-07-27 | 2022-07-27 | Pedestrian re-recognition method based on identity migration generation countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115205903A CN115205903A (en) | 2022-10-18 |
CN115205903B true CN115205903B (en) | 2023-05-23 |
Family
ID=83583415
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210890765.1A Active CN115205903B (en) | 2022-07-27 | 2022-07-27 | Pedestrian re-recognition method based on identity migration generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115205903B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116276956B (en) * | 2022-12-01 | 2023-12-08 | 北京科技大学 | Method and device for simulating and learning operation skills of customized medicine preparation robot |
CN117351522A (en) * | 2023-12-06 | 2024-01-05 | 云南联合视觉科技有限公司 | Pedestrian re-recognition method based on style injection and cross-view difficult sample mining |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688897A (en) * | 2019-08-23 | 2020-01-14 | 深圳久凌软件技术有限公司 | Pedestrian re-identification method and device based on joint judgment and generation learning |
CN111126155A (en) * | 2019-11-25 | 2020-05-08 | 天津师范大学 | Pedestrian re-identification method for generating confrontation network based on semantic constraint |
CN112949608A (en) * | 2021-04-15 | 2021-06-11 | 南京邮电大学 | Pedestrian re-identification method based on twin semantic self-encoder and branch fusion |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723611A (en) * | 2019-03-20 | 2020-09-29 | 北京沃东天骏信息技术有限公司 | Pedestrian re-identification method and device and storage medium |
CN110659586B (en) * | 2019-08-31 | 2022-03-15 | 电子科技大学 | Gait recognition method based on identity-preserving cyclic generation type confrontation network |
CN110688966B (en) * | 2019-09-30 | 2024-01-09 | 华东师范大学 | Semantic guidance pedestrian re-recognition method |
CN111666851B (en) * | 2020-05-28 | 2022-02-15 | 大连理工大学 | Cross domain self-adaptive pedestrian re-identification method based on multi-granularity label |
CN111783603A (en) * | 2020-06-24 | 2020-10-16 | 有半岛(北京)信息科技有限公司 | Training method for generating confrontation network, image face changing method and video face changing method and device |
CN113592982B (en) * | 2021-09-29 | 2022-09-27 | 北京奇艺世纪科技有限公司 | Identity migration model construction method and device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115205903A (en) | 2022-10-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||