CN111428650B - Pedestrian re-recognition method based on SP-PGGAN style migration - Google Patents
Pedestrian re-recognition method based on SP-PGGAN style migration
- Publication number
- CN111428650B (application CN202010226128.5A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- picture
- pggan
- generator
- discriminator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a pedestrian re-identification method based on SP-PGGAN style migration. The method comprises the following steps: constructing an SP-PGGAN model based on CycleGAN; inputting the training set of a labeled pedestrian re-identification dataset together with the training set of an unlabeled pedestrian re-identification dataset into the SP-PGGAN model for training, so that generator G produces a style-migrated version of the labeled training set; training a classification network on the migrated labeled training set with the IDE pedestrian re-identification model to obtain a trained IDE model; and feeding the test set of the unlabeled dataset into the trained IDE model to perform pedestrian re-identification on the unlabeled dataset. Because the proposed SP-PGGAN model performs style migration more accurately, it substantially improves re-identification performance on the unlabeled dataset.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to technologies such as deep learning, adversarial networks, image processing, and feature extraction. The proposed pedestrian re-identification method based on SP-PGGAN style migration transfers the style of a labeled dataset to that of an unlabeled dataset; a re-identification network trained on the migrated labeled data improves re-identification performance on the unlabeled dataset.
Background
Pedestrian re-identification (Person Re-identification), also called pedestrian re-recognition, is a computer-vision technique for determining whether a specific pedestrian appears in an image or video sequence. It is widely regarded as a sub-problem of image retrieval: given a surveillance image of a pedestrian, the same pedestrian is retrieved across other cameras. It compensates for the visual limitations of fixed cameras, can be combined with pedestrian detection and tracking, and is widely applicable to intelligent video surveillance, intelligent security, and related fields. In recent years, pedestrian re-identification has received increasing attention in computer vision. It has broad application prospects, including pedestrian retrieval, pedestrian tracking, street-event detection, and pedestrian behavior analysis. One of its central challenges is the need for large amounts of labeled training data. The invention addresses this challenge with an effective migration model, so that a model trained on labeled data can be applied to an unlabeled dataset and the re-identification accuracy on the unlabeled data is improved.
Most existing approaches to unlabeled data train a model on a labeled dataset and then apply the trained model directly to the unlabeled test set. Because the two datasets have different underlying distributions, such direct transfer yields low re-identification accuracy. In recent years, with the development of adversarial networks and advances in transfer techniques, style migration with unpaired datasets has become feasible, and even datasets with different underlying distributions can be converted in style. Against this background, how to perform more accurate migration on the basis of CycleGAN is a research hotspot in image recognition with broad application prospects.
Disclosure of Invention
To address the difficulty of data annotation in pedestrian re-identification, the invention uses deep learning to propose a new migration model, SP-PGGAN, and realizes pedestrian re-identification on top of it. Most existing approaches to unlabeled data train on a labeled dataset and apply the trained model directly to the unlabeled test set; because the underlying distributions of the two datasets differ, this direct transfer yields low accuracy. The invention first builds the SP-PGGAN model, then feeds the training set of the labeled re-identification dataset and the training set of the unlabeled re-identification dataset into the model simultaneously for style migration, uses the migrated labeled training set to train an IDE network, and finally performs re-identification on the test set of the unlabeled dataset. The main flow, shown in FIG. 1, comprises three steps: constructing the SP-PGGAN model, style migration with the SP-PGGAN model, and implementation of pedestrian re-identification.
(1) Construction of SP-PGGAN model
The invention first builds the SP-PGGAN model, whose structure improves on CycleGAN. CycleGAN is essentially two mirror-symmetric GANs forming a ring network: the two GANs share two generators, and each has its own local discriminator, i.e., two local discriminators and two generators in total. SP-PGGAN keeps the CycleGAN generators, appends a twin (Siamese) network after each generator to guide the generation process, and replaces each of the two discriminators with a parallel pair of one local discriminator and one global discriminator. The novel aspects of the proposed SP-PGGAN model are illustrated in FIG. 2.
(2) Style migration of SP-PGGAN model
To improve testing on the unlabeled test set, the training set of the labeled re-identification dataset and the training set of the unlabeled re-identification dataset are input simultaneously into the constructed SP-PGGAN model for style migration. Two properties are maintained during migration: first, an image migrated from the labeled dataset to the unlabeled dataset must match the style of the unlabeled dataset; second, the pedestrian ID information of the labeled image must remain unchanged before and after migration. This ID information is neither the background nor the style of the image, but the pedestrian region of the image, excluding the background, that carries the labeling information of the pedestrian.
(3) Implementation of pedestrian re-recognition
After style migration, pedestrian re-identification experiments are carried out. The migrated labeled training set obtained from SP-PGGAN is used to train a classification network with the IDE re-identification model, yielding a trained IDE model; re-identification is then performed on the test set of the unlabeled dataset.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings required in the description of the embodiments will be briefly described as follows:
FIG. 1 is a flow chart of an SP-PGGAN style migration method for pedestrian re-identification;
FIG. 2 is a diagram showing the comparison of SP-PGGAN model and cycleGAN model;
FIG. 3 is a schematic diagram of a cycleGAN model network;
FIG. 4 is a network architecture diagram based on SP-PGGAN style migration;
FIG. 5 is a network architecture diagram of the pedestrian re-identification method (IDE).
Advantageous Effects
The invention adds a twin network on top of CycleGAN; the twin network pulls two pictures with similar pedestrian information closer together and pushes two pictures with dissimilar pedestrian information further apart. Meanwhile, to account for both local and global information during pedestrian migration and improve its effect, the invention uses a global discriminator and a local discriminator together as the SP-PGGAN discriminator. The improved migration network produces pictures that are better suited for training the classification network, and experiments show a 12% improvement in mAP (mean average precision) over direct transfer.
Detailed Description
In light of the foregoing, the following is a specific implementation, but the scope of protection of this patent is not limited to this implementation.
Step 1: construction of SP-PGGAN model
The invention first builds the SP-PGGAN migration model. The model is used here for the migration stage of pedestrian re-identification, but is not limited to it; pedestrian re-identification is one application scenario of the invention, and the model generalizes to other style-migration scenarios.
The SP-PGGAN model is an improvement over CycleGAN. CycleGAN is bidirectional: it is a ring network formed by two mirror-symmetric GANs that share two generators, each GAN having its own local discriminator, i.e., two local discriminators and two generators in total. As shown in FIG. 3, G and F are the two CycleGAN generators, and D_X and D_Y are its two local discriminators. In this model, the generator aims to make the generated pictures increasingly realistic so as to fool the discriminator, while the discriminator becomes increasingly discerning during training and learns to tell real pictures from generated ones.
As shown in FIG. 4, the proposed SP-PGGAN model consists of two major parts, generators and discriminators. The generator part of SP-PGGAN has two components. One component is the CycleGAN generators G and F; from shallow to deep, each generator consists of two convolution layers with stride 2, six residual blocks, and two deconvolution layers with stride 1/2. The other component is a twin network consisting, from shallow to deep, of four convolution layers with stride 2, four max-pooling layers with stride 2, and one fully connected layer. The discriminator part of SP-PGGAN comprises discriminator D_T and discriminator D_S, where D_T consists of a local discriminator D_T2 and a global discriminator D_T1, and D_S consists of a local discriminator D_S2 and a global discriminator D_S1. D_T and D_S have the same structure but do not share parameters. The global discriminator D_T1 and the local discriminator D_T2 share four convolution layers with stride 2 and split into two branches after the fourth layer: one branch is the global discriminator D_T1, which outputs a single binary value deciding whether the whole picture is real or fake; the other branch is the local discriminator D_T2, which outputs a 256-dimensional vector through a fully connected layer, each entry deciding whether the image block at the corresponding position is real or fake. The specific network structure of the SP-PGGAN model is shown in Table 1.
TABLE 1
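For illustration, the following is a minimal Keras sketch of the layer layout described above (generator: two stride-2 convolutions, six residual blocks, two stride-1/2 deconvolutions; twin network: four stride-2 convolutions interleaved with stride-2 max pools plus one fully connected layer; discriminator: four shared stride-2 convolutions splitting into a global binary head and a local 256-dimensional head). Channel counts, kernel sizes, input resolution, activations, and the embedding dimension are assumptions not given in this description.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def residual_block(x, filters=256):
    # identity shortcut around two 3x3 convolutions
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.ReLU()(layers.add([x, y]))

def build_generator(input_shape=(256, 128, 3)):  # input size is an assumption
    inp = layers.Input(input_shape)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)  # two stride-2 convs
    for _ in range(6):                                    # six residual blocks
        x = residual_block(x, 256)
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)  # two stride-1/2 deconvs
    out = layers.Conv2D(3, 7, padding="same", activation="tanh")(x)  # RGB projection (assumed)
    return Model(inp, out)

def build_siamese_branch(input_shape=(256, 128, 3), embed_dim=128):
    # twin network: four stride-2 convs, each followed by a stride-2 max pool, then one FC layer
    inp = layers.Input(input_shape)
    x = inp
    for f in (64, 128, 256, 512):
        x = layers.Conv2D(f, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.MaxPool2D(2, strides=2, padding="same")(x)
    x = layers.Flatten()(x)
    return Model(inp, layers.Dense(embed_dim)(x))

def build_discriminator(input_shape=(256, 128, 3)):
    inp = layers.Input(input_shape)
    x = inp
    for f in (64, 128, 256, 512):                         # four shared stride-2 convs
        x = layers.Conv2D(f, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    g = layers.GlobalAveragePooling2D()(x)
    global_out = layers.Dense(1)(g)                       # one logit for the whole picture
    l = layers.Flatten()(x)
    local_out = layers.Dense(256)(l)                      # 256-d vector, one logit per image block
    return Model(inp, [global_out, local_out])
```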
Step 2: style migration of SP-PGGAN model
After the model is built, to improve testing on the unlabeled test set, the labeled and unlabeled training sets are input simultaneously into the constructed SP-PGGAN model for style migration. As shown in FIG. 4, in the forward direction a picture x from the labeled dataset produces a picture G(x) through generator G, and G(x) then produces F(G(x)) through the other generator F. Part of the generator loss is the difference loss between x and F(G(x)); another part comes from the twin network, which pulls x and G(x) closer together while pushing G(x) and a picture y from the unlabeled dataset further apart, contributing the distance losses of these two sample pairs. The discriminator loss is the cross-entropy loss of the generated picture G(x) against picture y under the global discriminator D_T1 and the local discriminator D_T2. Likewise, in the reverse direction, a picture y from the unlabeled dataset produces F(y) through generator F, and F(y) then produces G(F(y)) through the other generator G. Part of the generator loss is the difference loss between y and G(F(y)); another part comes from the twin network, which pulls y and F(y) closer together while pushing F(y) and a picture x from the labeled dataset further apart, again contributing the distance losses of the two sample pairs. The discriminator loss is the cross-entropy loss of the generated picture F(y) against picture x under the global discriminator D_S1 and the local discriminator D_S2. During training, the discriminator parameters are first held fixed while the generator parameters are trained; then the generator parameters are fixed and the discriminator parameters are trained. Repeating this process lets the generators and discriminators evolve step by step. The overall loss of this process is shown in equation (1):
L = L_Tadv + L_Sadv + L_PTadv + L_PSadv + γ1·L_cyc + γ2·L_ide + γ3·L_con    (1)
L Tadv and L Sadv Is a local discriminant D of two mirror symmetry in the SP-PGGAN model T And D s The losses of (2) are cross entropy losses, as shown in formula (2) and formula (3); wherein x and y respectively represent pictures in the training set with the marked data set and pictures in the training set without the marked data set, P x 、P y Refers to the x and y compliant distribution, G, D T Generator and local discriminant, F, D, representing forward direction respectively s A generator and a local arbiter representing the opposite direction, respectively. G (x) represents a picture generated by the marked training set picture x after passing through the generator G, D T (G (x)) and D T (y) denotes that G (x) and y pass through the local discriminator D T And judging the result. F (y) represents a picture generated by generating a non-labeling training set picture y through a generator F, D s (x) And D s (F (y)) means that x and F (y) pass through the local discriminant D s And judging the result.
L_PTadv and L_PSadv are the losses of the two mirror-symmetric global discriminators D_T1 and D_S1 in the SP-PGGAN model; they take the same form as the local-discriminator losses, i.e., L_PTadv = L_Tadv and L_PSadv = L_Sadv.
L_cyc is the sum of the losses of the two mirror-symmetric generators in the SP-PGGAN model, i.e., the sum of the Euclidean distances between x and F(G(x)) and between y and G(F(y)), as shown in formula (4). In the forward direction, x is a picture from the labeled training set, G(x) is the picture obtained through generator G, F(G(x)) is the picture obtained by further passing G(x) through generator F, and the Euclidean distance between x and F(G(x)) is a generation loss. In the reverse direction, y is a picture from the unlabeled training set, F(y) is the picture obtained through generator F, G(F(y)) is the picture obtained by further passing F(y) through generator G, and the Euclidean distance between y and G(F(y)) is a generation loss.
L_ide is the color-consistency loss of the forward and reverse generation processes, as shown in formula (5). In the forward direction, x is a picture from the labeled training set, F(x) is the picture generated from x by generator F, and the Euclidean distance between x and F(x) is the color-consistency loss of the generation process. In the reverse direction, y is a picture from the unlabeled training set, G(y) is the picture generated from y by generator G, and the Euclidean distance between y and G(y) is the color-consistency loss of the generation process.
L_con is the generation loss of the twin network, composed of the Euclidean distances of positive and negative sample pairs, as shown in formula (6). x1 and x2 are the two input pictures of the twin network, and i is the label of the input pair. When i = 1, (x1, x2) is a positive sample pair, i.e., two pictures of the same pedestrian: x and G(x) in the forward direction, and y and F(y) in the reverse direction. When i = 0, (x1, x2) is a negative sample pair, i.e., two pictures of different pedestrians: y and G(x) in the forward direction, and x and F(y) in the reverse direction. d is the Euclidean distance between the two input pictures, and the margin m takes values in [0,2].
L_con(i, x1, x2) = (1 - i)·{max(0, m - d)}² + i·d²    (6)
In formula (1), γ1, γ2, and γ3 are importance parameters controlling L_cyc, L_ide, and L_con respectively, each taking values in [1,10].
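As a concrete reading of equations (1) and (6), the following sketch assembles the loss terms described above. The squared-error form of the Euclidean distances and the helper names are assumptions; the adversarial terms follow the cross-entropy description in the text.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def adversarial_losses(d_real, d_fake):
    # cross-entropy losses for one (global or local) discriminator head:
    # the discriminator wants real -> 1 and fake -> 0
    d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    g_loss = bce(tf.ones_like(d_fake), d_fake)   # the generator wants fake -> 1
    return d_loss, g_loss

def cycle_loss(x, f_g_x, y, g_f_y):
    # L_cyc: distances between x and F(G(x)) and between y and G(F(y))
    return tf.reduce_mean(tf.square(x - f_g_x)) + tf.reduce_mean(tf.square(y - g_f_y))

def ide_loss(x, f_x, y, g_y):
    # L_ide: color consistency, distances between x and F(x) and between y and G(y)
    return tf.reduce_mean(tf.square(x - f_x)) + tf.reduce_mean(tf.square(y - g_y))

def contrastive_loss(i, e1, e2, m=2.0):
    # L_con, formula (6): e1, e2 are twin-network embeddings of the two pictures,
    # i = 1 for a positive pair (same pedestrian), i = 0 for a negative pair
    d = tf.norm(e1 - e2, axis=-1)
    return tf.reduce_mean((1.0 - i) * tf.square(tf.maximum(0.0, m - d)) + i * tf.square(d))

def total_loss(adv_terms, l_cyc, l_ide, l_con, g1=10.0, g2=5.0, g3=3.0):
    # formula (1): adv_terms gathers L_Tadv, L_Sadv, L_PTadv, L_PSadv
    return tf.add_n(adv_terms) + g1 * l_cyc + g2 * l_ide + g3 * l_con
```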
The invention trains SP-PGGAN with Market-1501 and DukeMTMC-reID as training sets. Market-1501 contains 1,501 pedestrians, 12,936 training images, and 19,732 gallery images; 751 pedestrians are used for training and 750 for testing, and each pedestrian is captured by at most six cameras. DukeMTMC-reID contains 34,183 images of 1,404 pedestrians: 702 pedestrians are used for training and the remaining 702 for testing, with 2,228 query images and 17,661 gallery images.
The invention uses TensorFlow as the framework and trains SP-PGGAN on Market-1501 and DukeMTMC-reID without using any ID information during training. In all migration experiments we set γ1 = 10, γ2 = 5, γ3 = 3 in formula (1) and m = 2 in formula (6). The initial learning rate of SP-PGGAN is 0.0002, and training stops after 5 epochs. In the test phase we use generator G for the Market-1501 → DukeMTMC-reID migration and generator F for the DukeMTMC-reID → Market-1501 migration.
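A minimal sketch of the alternating update described above, under the reported settings (learning rate 0.0002). The choice of Adam and the gen_objective / disc_objective callables (which would assemble the loss terms sketched earlier) are assumptions, not specified here.

```python
import tensorflow as tf

def make_train_step(G, F, siamese, D_T, D_S, gen_objective, disc_objective):
    g_opt = tf.keras.optimizers.Adam(2e-4)  # lr from the patent; Adam is an assumption
    d_opt = tf.keras.optimizers.Adam(2e-4)

    @tf.function
    def train_step(x, y):
        # 1) update generators and twin network with discriminator weights held fixed
        with tf.GradientTape() as tape:
            g_loss = gen_objective(x, y)
        g_vars = (G.trainable_variables + F.trainable_variables
                  + siamese.trainable_variables)
        g_opt.apply_gradients(zip(tape.gradient(g_loss, g_vars), g_vars))

        # 2) update discriminators with generator weights held fixed
        with tf.GradientTape() as tape:
            d_loss = disc_objective(x, y)
        d_vars = D_T.trainable_variables + D_S.trainable_variables
        d_opt.apply_gradients(zip(tape.gradient(d_loss, d_vars), d_vars))
        return g_loss, d_loss

    return train_step
```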
Step 3: implementation of pedestrian re-recognition
After style migration, pedestrian re-identification experiments are carried out. Since the invention mainly improves the style-migration stage, the second stage adopts the basic IDE (ID-discriminative Embedding) method; it could be replaced by any pedestrian re-identification method. IDE is chosen because it is a standard baseline for re-identification, and the proposed migration method achieves good results on this model.
In the re-identification experiment, the migrated labeled training set obtained from SP-PGGAN is used to train the classification network with the IDE re-identification model, yielding a trained classification network. The test set of the unlabeled dataset is then fed into the trained network to perform re-identification. As shown in FIG. 5, the style-migrated pictures G(x) are used to train the IDE model. During testing, a query pedestrian picture from the unlabeled dataset is input together with the gallery set into the trained IDE network, and the gallery pictures showing the same pedestrian as the query are retrieved, completing the re-identification process.
As shown in FIG. 5, the IDE of the present invention adopts ResNet-50 as the baseline model and sets the output dimension of the last fully connected layer to the number of pedestrian identities in the training dataset. IDE trains a classification network: the first five stages are convolutional, layers 6 and 7 are fully connected layers of 1024 neurons each, and layer 8 is a classification layer with one output per ID. The network is then trained as a classification task.
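A sketch of such an IDE classifier under the description above: ResNet-50 backbone, two 1024-unit fully connected layers, and a final classification layer sized to the number of training identities. The pooling choice and input resolution are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_ide(num_ids, input_shape=(256, 128, 3)):
    # ResNet-50 backbone pre-trained on ImageNet (the five convolutional stages)
    backbone = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_shape=input_shape, pooling="avg")
    x = layers.Dense(1024, activation="relu")(backbone.output)  # layer 6
    x = layers.Dense(1024, activation="relu")(x)                # layer 7
    out = layers.Dense(num_ids, activation="softmax")(x)        # layer 8: one output per ID
    return Model(backbone.input, out)

ide_market = build_ide(751)   # Market-1501: 751 training identities
ide_duke = build_ide(702)     # DukeMTMC-reID: 702 training identities
```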
The present invention fine-tunes a ResNet-50 pre-trained on ImageNet on the migrated training set. The final fully connected layer is set to 751 outputs for Market-1501 and 702 for DukeMTMC-reID. This step of the experiment trains the CNN model with mini-batch SGD on a GTX 1080 Ti GPU. The batch size, maximum number of epochs, momentum, and gamma are set to 16, 50, 0.9, and 0.1 respectively. The initial learning rate is 0.001, decaying to 0.0001 after 40 epochs.
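The reported optimization settings might be wired up as follows; the step boundaries assume the decay is applied per epoch via a piecewise schedule, and build_ide refers to the sketch above.

```python
import tensorflow as tf

steps_per_epoch = 12936 // 16           # migrated Market-1501 training images / batch size
lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[40 * steps_per_epoch],  # decay after 40 epochs
    values=[0.001, 0.0001])
sgd = tf.keras.optimizers.SGD(learning_rate=lr, momentum=0.9)

model = build_ide(751)                  # hypothetical helper from the sketch above
model.compile(optimizer=sgd,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_ids, batch_size=16, epochs=50)
```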
To further enhance re-identification on the target dataset, the invention introduces a feature-pooling method called Local Maximum Pooling (LMP). It can be applied directly to the trained IDE model and reduces the impact of erroneous pictures produced during migration on the re-identification process. The original ResNet-50 applies global average pooling to the fifth convolutional stage. With LMP, the feature map from the fifth stage is first divided into P parts along the horizontal direction, and global maximum or average pooling is applied to each part; the pooled results are then concatenated. This procedure can be used directly at test time. In the experiments, global maximum pooling and global average pooling are compared, and the better-performing one is used for the LMP concatenation.
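A minimal sketch of LMP as described: the stage-5 feature map is split into P horizontal parts, each part is pooled globally (max or average), and the results are concatenated. P and the pooling mode are the free parameters.

```python
import tensorflow as tf

def lmp(feature_map, parts=8, mode="max"):
    # feature_map: (N, H, W, C) output of the fifth ResNet-50 stage; H must be divisible by parts
    strips = tf.split(feature_map, parts, axis=1)        # P horizontal strips
    pool = tf.reduce_max if mode == "max" else tf.reduce_mean
    pooled = [pool(s, axis=[1, 2]) for s in strips]      # global pool each strip -> (N, C)
    return tf.concat(pooled, axis=-1)                    # (N, P*C) descriptor

# example: a (1, 8, 4, 2048) stage-5 map yields a (1, 16384) LMP descriptor with P = 8
feat = tf.random.normal([1, 8, 4, 2048])
desc = lmp(feat, parts=8)
```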
The results of the examples of the present invention are shown in Table 2:
TABLE 2
As can be seen from Table 2, migration with SP-PGGAN (m = 2) outperforms CycleGAN for pedestrian re-identification: rank-1 and mAP tested on Market-1501 with SP-PGGAN (m = 2) are 11.2% and 6.2% higher than with CycleGAN respectively, and on DukeMTMC-reID they are 6.7% and 3.5% higher. This demonstrates the effectiveness of the proposed SP-PGGAN model. Meanwhile, applying LMP to SP-PGGAN further improves re-identification to some extent.
Claims (5)
1. The pedestrian re-identification method based on SP-PGGAN style migration is characterized by comprising the following three steps:
step (1) construction of SP-PGGAN model
The model is used for style migration between a labeled pedestrian re-identification dataset and an unlabeled pedestrian re-identification dataset. Its structure improves on CycleGAN: CycleGAN is two mirror-symmetric GANs forming a ring network, the two GANs sharing a generator G and a generator F, each with its own local discriminator, D_X and D_Y respectively. SP-PGGAN keeps the CycleGAN generators, appends a twin network after each generator to guide the generation process, and replaces the CycleGAN local discriminators D_X and D_Y with parallel pairs of one local discriminator and one global discriminator;
step (2) style migration of SP-PGGAN model
Simultaneously inputting the training set of the labeled pedestrian re-identification dataset and the training set of the unlabeled pedestrian re-identification dataset into the constructed SP-PGGAN model for style migration, and obtaining the migrated labeled training set; two properties are maintained during migration: first, an image migrated from the labeled dataset to the unlabeled dataset must match the style of the unlabeled dataset; second, the pedestrian ID information of the labeled image must remain unchanged before and after migration; the ID information refers to the pedestrian region of the image, excluding the background, that has a potential relation to the ID, i.e., the labeling information of the pedestrian;
implementation of pedestrian re-identification in step (3)
Training a classification network on the migrated labeled training set obtained after SP-PGGAN style migration with the IDE pedestrian re-identification model to obtain a trained IDE model, and performing pedestrian re-identification on the test set of the unlabeled dataset;
the step (2) comprises the following steps:
in the migration process, for the forward direction, the generation process is: a training-set picture x from the labeled dataset produces a picture G(x) through generator G, and G(x) produces a picture F(G(x)) through generator F; then the pairs x and G(x), and G(x) and a picture y from the unlabeled training set, are respectively input into the twin network, which is used to keep the pedestrian ID information of the labeled data accurate during style migration; the discrimination process is: the picture G(x) and the picture y from the unlabeled training set are input simultaneously into the global discriminator D_T1 and the local discriminator D_T2 to train the discriminators, the global discriminator judging whether the whole picture is real or fake and the local discriminator judging whether local regions of the picture are real or fake; for the reverse direction, the generation process is: a picture y from the unlabeled training set produces a picture F(y) through generator F, and F(y) produces a picture G(F(y)) through generator G; then the pairs y and F(y), and F(y) and a picture x from the labeled training set, are respectively input into the twin network; the discrimination process is: the picture F(y) and the picture x from the labeled training set are input simultaneously into the global discriminator D_S1 and the local discriminator D_S2 to train the discriminators;
in the training process, the discriminator parameters are first held fixed while the generator parameters are trained; then the generator parameters are fixed and the discriminator parameters are trained; repeating these steps lets the generators and discriminators evolve step by step.
2. The pedestrian re-recognition method based on SP-PGGAN style migration according to claim 1, characterized in that the SP-PGGAN model in step (1) consists of generators and discriminators;
wherein the generator part of SP-PGGAN has two components: one component is the CycleGAN generators G and F, each consisting, from shallow to deep, of two convolution layers with stride 2, six residual blocks, and two deconvolution layers with stride 1/2; the other component is a twin network consisting, from shallow to deep, of four convolution layers with stride 2, four max-pooling layers with stride 2, and one fully connected layer;
the SP-PGGAN discriminator includes a discriminator D T Sum discriminator D S Wherein the discriminator D T Comprising a local discriminant D T2 And global arbiter D T1 Discriminator D s Comprising a local discriminant D S2 And global arbiter D S1 The method comprises the steps of carrying out a first treatment on the surface of the Distinguishing device D T Sum discriminator D S The structure is the same, the parameters are not shared, wherein, the global arbiter D T1 And local discriminant D T2 The four convolution layers with the step length of 2 are shared, and the four convolution layers are divided into two paths from the end of the fourth convolution layer, wherein one path is a global discriminator D T1 Finally, a binary number is output, and the binary number determines whether the whole picture is judged to be true or false; the other path is a local discriminator D T2 It outputs a 256-dimensional vector through the full-connection layer, and each number determines whether the image block at the corresponding position is true or false.
3. The pedestrian re-recognition method based on SP-PGGAN style migration according to claim 1, characterized in that the loss function of the SP-PGGAN model during the repeated training process is:
L = L_Tadv + L_Sadv + L_PTadv + L_PSadv + γ1·L_cyc + γ2·L_ide + γ3·L_con
where L_Tadv and L_Sadv are the losses of the two mirror-symmetric local discriminators D_T2 and D_S2 in the SP-PGGAN model, both cross-entropy losses; L_PTadv and L_PSadv are the losses of the two mirror-symmetric global discriminators D_T1 and D_S1, taking the same cross-entropy form as the local-discriminator losses; L_cyc is the sum of the losses of the two mirror-symmetric generators; L_ide is the color-consistency loss of the forward and reverse generation processes; L_con is the generation loss of the twin network; and γ1, γ2, γ3 are importance parameters controlling L_cyc, L_ide, L_con respectively, each taking values in [1,10].
4. The pedestrian re-recognition method based on SP-PGGAN style migration according to claim 3, characterized in that:
the L is Tadv The calculation formula of (2) is as follows:
wherein G (x) represents a marked trainingThe training set picture x is a picture generated by a generator G, D T (G (x)) and D T (y) denotes the passage of G (x) and y through the local discriminator D, respectively T2 The result of the discrimination;
the L is Sadv The calculation formula of (2) is as follows:
wherein F (y) represents a picture generated by the non-labeling training set picture y after the non-labeling training set picture passes through the generator F, D s (x) And D s (F (y)) means that x and F (y) pass through the local discriminant D S2 The result of the discrimination;
the L is cyc Is the sum of Euclidean distances between x and F (G (x)), y and G (F (y));
the L is ide The calculation formula of (2) is as follows:
wherein F (x) represents the picture in which x is generated by the generator F, and G (y) represents the picture in which x is generated by the generator F;
the L is con The calculation formula of (2) is as follows:
L con (i,x 1 ,x 2 )=(1-i){max(0,m-d)} 2 +id 2
wherein x is 1 ,x 2 Two input pictures representing a twin network, i representing the labels of the input vector pair, and x when i=1 1 ,x 2 Is a positive sample pair, which refers to two pictures of the same pedestrian, namely x and G (x) in the positive direction and y and F (y) in the opposite direction; i=0 times represents x 1 ,x 2 Is a negative sample pair, which refers to two pictures of different pedestrians, namely y and G (x) in the positive direction, and x and F (y) in the opposite direction; d represents the Euclidean distance between two input pictures; m is E [0,2 ]]The range of values of m is defined.
5. The pedestrian re-recognition method based on SP-PGGAN style migration according to claim 1, characterized in that step (3) is specifically:
training the SP-PGGAN style migrated picture G (x) through an IDE model to obtain a trained IDE model, and inputting the picture of the pedestrian to be tested and the search set in the unlabeled data set into the trained IDE model together in the test process, so as to intensively find the picture of the pedestrian identical to the picture of the pedestrian to be tested, thereby realizing the process of re-identifying the pedestrian.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010226128.5A CN111428650B (en) | 2020-03-26 | 2020-03-26 | Pedestrian re-recognition method based on SP-PGGAN style migration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111428650A (en) | 2020-07-17
CN111428650B (en) | 2024-04-02
Family
ID=71548862
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010226128.5A Active CN111428650B (en) | 2020-03-26 | 2020-03-26 | Pedestrian re-recognition method based on SP-PGGAN style migration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111428650B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232422A (en) * | 2020-10-20 | 2021-01-15 | 北京大学 | Target pedestrian re-identification method and device, electronic equipment and storage medium |
CN113569627B (en) * | 2021-06-11 | 2024-06-14 | 北京旷视科技有限公司 | Human body posture prediction model training method, human body posture prediction method and device |
CN113658178B (en) * | 2021-10-14 | 2022-01-25 | 北京字节跳动网络技术有限公司 | Tissue image identification method and device, readable medium and electronic equipment |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670528A (en) * | 2018-11-14 | 2019-04-23 | 中国矿业大学 | The data extending method for blocking strategy at random based on paired samples towards pedestrian's weight identification mission |
CN110163110A (en) * | 2019-04-23 | 2019-08-23 | 中电科大数据研究院有限公司 | A kind of pedestrian's recognition methods again merged based on transfer learning and depth characteristic |
Non-Patent Citations (2)
Title |
---|
Satoshi Iizuka et al. "Globally and Locally Consistent Image Completion." ACM Transactions on Graphics, vol. 36, no. 4, 2017, pp. 1-14.
Weijian Deng et al. "Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification." arXiv:1711.07027v3, 2018, pp. 1-10.
Also Published As
Publication number | Publication date |
---|---|
CN111428650A (en) | 2020-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | SAANet: Siamese action-units attention network for improving dynamic facial expression recognition | |
Liu et al. | Leveraging unlabeled data for crowd counting by learning to rank | |
Vazquez et al. | Virtual and real world adaptation for pedestrian detection | |
CN111428650B (en) | Pedestrian re-recognition method based on SP-PGGAN style migration | |
Peng et al. | Learning multi-region features for vehicle re-identification with context-based ranking method | |
Wang et al. | Afan: Augmented feature alignment network for cross-domain object detection | |
CN104200237A (en) | High speed automatic multi-target tracking method based on coring relevant filtering | |
Xiong et al. | ASK: Adaptively selecting key local features for RGB-D scene recognition | |
Luo et al. | SFA: small faces attention face detector | |
Jiang et al. | Application of a fast RCNN based on upper and lower layers in face recognition | |
Fan et al. | Multi-task and multi-modal learning for rgb dynamic gesture recognition | |
Yang et al. | Sampling agnostic feature representation for long-term person re-identification | |
Qian et al. | URRNet: A Unified Relational Reasoning Network for Vehicle Re-Identification | |
Shi et al. | Spatial-wise and channel-wise feature uncertainty for occluded person re-identification | |
Tian et al. | Domain adaptive object detection with model-agnostic knowledge transferring | |
Song et al. | Dense face network: A dense face detector based on global context and visual attention mechanism | |
Cai et al. | Beyond photo-domain object recognition: Benchmarks for the cross-depiction problem | |
Tian et al. | Self-regulation feature network for person reidentification | |
Zhu et al. | Expression recognition method combining convolutional features and Transformer. | |
Yang et al. | Actor and action modular network for text-based video segmentation | |
Fu et al. | Distractor-aware event-based tracking | |
Ammar et al. | Comparative Study of latest CNN based Optical Flow Estimation | |
Lu et al. | A Traffic Sign Detection Network Based on PosNeg-Balanced Anchors and Domain Adaptation | |
CN114140524A (en) | Closed loop detection system and method for multi-scale feature fusion | |
Irawan et al. | Spontaneous Micro-Expression Recognition Using 3DCNN on Long Videos for Emotion Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||