CN111709364A - Pedestrian re-identification method based on visual angle information and batch characteristic erasing

Publication number
CN111709364A
Authority
CN
China
Prior art keywords
pedestrian
batch
features
visual angle
identification
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010549985.9A
Other languages
Chinese (zh)
Inventor
张红
李建华
徐志刚
曹洁
任伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou University of Technology
Original Assignee
Lanzhou University of Technology

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53: Recognition of crowd images, e.g. recognition of crowd congestion

Abstract

The invention discloses a pedestrian re-identification method based on view information and batch feature erasing, belonging to the technical field of computer vision and pattern recognition. The method realizes pedestrian re-identification mainly through the construction of a PSE network model, the training of the PSE model, and the construction of a BFE network model. By combining view information with batch feature erasing, the method shows good generalization capability and robustness on three datasets. Using the three attention mechanisms together gives better recognition results: Rank-1 accuracy improves by 0.5% over view-feature attention alone and by 0.2% over CBAM attention alone, and mAP improves by 1.3% over view-feature attention alone and by 1.2% over CBAM attention alone.

Description

Pedestrian re-identification method based on visual angle information and batch characteristic erasing
Technical Field
The invention belongs to the technical field of computer vision and pattern recognition, and particularly relates to a pedestrian re-identification method based on view information and batch feature erasing.
Background
Pedestrian re-identification refers to matching pedestrian images captured by non-overlapping cameras. In practical applications there are many surveillance cameras, and hardware and environmental differences between cameras mean that images of the same pedestrian are affected by background changes, illumination changes, pose changes and occlusion. The problem addressed by the invention is therefore how to select strongly discriminative image features in the face of these effects, and how to build a suitable model that makes pedestrian re-identification more efficient and robust. Current pedestrian re-identification research usually optimizes the model for only one kind of feature, either global or local. The invention therefore provides a method that makes full use of both the global and the fine-grained features of a pedestrian image: view features of the image are extracted as global features, fine-grained features are extracted by batch feature erasing, and joint learning of the two lets the model extract more discriminative image features. In building the model, an attention mechanism is used to simplify the complex structure, forming a mechanism in which several kinds of attention act together; this preserves re-identification accuracy while reducing the number of model parameters and improving generalization.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method based on view information and batch feature erasing that makes full use of the global and fine-grained features of a pedestrian image, extracting the view features of the image as the global features.
The pedestrian re-identification method based on visual angle information and batch characteristic erasing comprises the following steps:
1) Construction of the PSE network model: ResNet50 is used as the base structure, with Block1, Block2, Block3 and Block4 each corresponding to the respective block of ResNet50. A view classifier branch is added after Block1; after a series of convolution operations on the pedestrian image, softmax produces probability values for the front, back and side orientations of the image, and these values predict the view direction of the image;
2) PSE model training: the model is trained mainly as follows:
① loading ImageNet pre-trained weights into the ResNet50-related structures as initialization;
② training the view classifier on the RAP dataset, which contains orientation labels;
③ transferring the trained view classifier to the PSE network, fixing the parameters of the view classifier, Block1, Block2 and Block3, and training the view units on a pedestrian re-identification dataset so as to initialize the view-unit parameters;
④ extracting the information of 14 whole-body joint points from all pedestrian images using the DeeperCut model;
⑤ taking the 14 extracted joint points as input, giving a 17-channel input, fixing all Block structures, and fine-tuning the first layer and the final classification layer of the network so that the network adapts to the 17-channel input;
⑥ fine-tuning the view classifier using the joint-point information extracted from the RAP dataset;
⑦ training the network model on a pedestrian re-identification dataset;
3) Construction of the BFE network model: with ResNet50 as the base structure, two branches are drawn out, a Global branch and a Feature Erasing branch. A Bottleneck structure is added in the Feature Erasing branch and a Mask structure is introduced, which randomly erases the same region of the features of every sample in a batch; fine-grained features of the image are obtained by max pooling and dimension reduction of the remaining features; finally, the feature vectors extracted by the two branches are fused to form the final feature vector of the input pedestrian image.
Further, in step 1), when the predicted view direction of the image is "front", the "front" probability produced by softmax is large and the "back" and "side" probabilities are small.
Further, for the batch feature erasing of the BFE in step 3), when all input images for pedestrian re-identification are roughly aligned, the method erases the same semantic region from all features in the same batch.
Further, in step 3), a view unit structure is added after Block3, with different view units learning information about different orientations of the pedestrian image; a batch feature erasing branch is also added after Block3, which continues to extract deep features through a Block4 structure and a Bottleneck structure and then applies batch feature erasing followed by pooling, dimension reduction and similar operations to obtain the fine-grained features of the image.
Further, each view unit consists of a 1 × 1 convolution layer, a convolutional block attention module (CBAM), a 1 × 1 convolution layer, a batch normalization layer and a ReLU layer; the CBAM module computes attention maps for the input feature map along two independent dimensions, channel and spatial.
Further, an alternative network model training process comprises the following steps:
1) loading the ImageNet-trained weights for the ResNet50 part of the network to initialize the backbone;
2) training the view classifier on the RAP dataset with the learning rate set to 0.0001;
3) fixing the parameters of the view classifier and the ResNet50-related structures and, using a pedestrian re-identification dataset, fine-tuning only the view units and the final pedestrian identity classification layer;
4) training the entire network on a pedestrian re-identification dataset.
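The four-step training process above amounts to a schedule of which parts of the network are updated at each stage. The following is only an illustrative sketch: the component names are hypothetical labels, not module names from the patent, and treating the view classifier as the only trainable part in step 2) is an assumption, since the text does not say whether other parameters are also updated there.

```python
# Hypothetical component names; the patent does not name modules this way.
COMPONENTS = ("backbone", "view_classifier", "view_units", "id_classifier")

# (stage description, components whose parameters are updated in that stage)
STAGES = [
    ("1) initialize backbone from ImageNet weights", set()),
    # Assumption: only the view classifier is updated in step 2.
    ("2) train view classifier on RAP, lr = 1e-4", {"view_classifier"}),
    ("3) fine-tune view units and identity layer", {"view_units", "id_classifier"}),
    ("4) train the entire network", set(COMPONENTS)),
]


def frozen_components(stage_index):
    """Return the components held fixed during the given stage (0-based index)."""
    _, trainable = STAGES[stage_index]
    return sorted(set(COMPONENTS) - trainable)


for index, (description, trainable) in enumerate(STAGES):
    print(description, "| trainable:", sorted(trainable),
          "| frozen:", frozen_components(index))
```

In a real framework, each stage would correspond to switching off gradient updates for the frozen components before training resumes.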
Compared with the prior art, the invention has the following beneficial effects:
1) With the batch feature erasing of the BFE, when all input images for pedestrian re-identification are roughly aligned, the same semantic region is erased from all features in the same batch, so the model learns the remaining fine-grained features well.
2) The invention introduces the CBAM module, which applies channel and spatial attention to the features of each view unit. The three attention mechanisms act together, greatly enriching the extraction of high-level features and letting the three view units learn features that differ as much as possible. The improved view unit thus preserves re-identification accuracy while simplifying the network structure.
3) By combining view information with batch feature erasing, the invention has good generalization capability and robustness on three datasets.
4) Using the three attention mechanisms gives better recognition results: Rank-1 accuracy improves by 0.5% over view-feature attention alone and by 0.2% over CBAM attention alone, and mAP improves by 1.3% over view-feature attention alone and by 1.2% over CBAM attention alone.
Detailed Description
The following examples further describe in detail specific embodiments of the present invention. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example 1
The pedestrian re-identification method based on visual angle information and batch characteristic erasing comprises the following steps:
1) Construction of the PSE network model: ResNet50 is used as the base structure, with Block1, Block2, Block3 and Block4 each corresponding to the respective block of ResNet50. A view classifier branch is added after Block1; after a series of convolution operations on the pedestrian image, softmax produces probability values for the front, back and side orientations of the image, and these values predict the view direction of the image;
2) PSE model training: the model is trained mainly as follows:
① loading ImageNet pre-trained weights into the ResNet50-related structures as initialization;
② training the view classifier on the RAP dataset, which contains orientation labels;
③ transferring the trained view classifier to the PSE network, fixing the parameters of the view classifier, Block1, Block2 and Block3, and training the view units on a pedestrian re-identification dataset so as to initialize the view-unit parameters;
④ extracting the information of 14 whole-body joint points from all pedestrian images using the DeeperCut model;
⑤ taking the 14 extracted joint points as input, giving a 17-channel input, fixing all Block structures, and fine-tuning the first layer and the final classification layer of the network so that the network adapts to the 17-channel input;
⑥ fine-tuning the view classifier using the joint-point information extracted from the RAP dataset;
⑦ training the network model on a pedestrian re-identification dataset;
3) Construction of the BFE network model: with ResNet50 as the base structure, two branches are drawn out, a Global branch and a Feature Erasing branch. A Bottleneck structure is added in the Feature Erasing branch and a Mask structure is introduced, which randomly erases the same region of the features of every sample in a batch; fine-grained features of the image are obtained by max pooling and dimension reduction of the remaining features; finally, the feature vectors extracted by the two branches are fused to form the final feature vector of the input pedestrian image.
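The erasing operation of step 3) can be illustrated with a small sketch: one rectangular mask is drawn at random and applied to every sample in the batch, and the surviving features are max pooled. This is a minimal pure-Python illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the default ratios 0.5 and 1.0 are the r_h and r_w values given in the training settings of this example, activations are assumed non-negative (post-ReLU) so that zeroed positions never win the max, and the learned dimension-reduction layer is omitted.

```python
import random


def batch_erase_mask(height, width, rh=0.5, rw=1.0, rng=random):
    """Build one H x W mask: 0.0 inside a randomly placed rectangle, 1.0 elsewhere."""
    erase_h, erase_w = round(height * rh), round(width * rw)
    top = rng.randint(0, height - erase_h)    # randint is inclusive on both ends
    left = rng.randint(0, width - erase_w)
    return [[0.0 if top <= y < top + erase_h and left <= x < left + erase_w else 1.0
             for x in range(width)]
            for y in range(height)]


def erase_and_max_pool(batch, mask):
    """Apply the SAME mask to every sample, then global max pool each channel.

    batch: list of samples; each sample is a list of channels; each channel is
    an H x W nested list of non-negative activations (assumed post-ReLU).
    """
    pooled = []
    for sample in batch:
        vector = []
        for channel in sample:
            kept = [value * m
                    for row, mask_row in zip(channel, mask)
                    for value, m in zip(row, mask_row)]
            vector.append(max(kept))
        pooled.append(vector)
    return pooled


# Two samples, one channel each, on a 4 x 2 feature map.
batch = [[[[1, 2], [3, 4], [5, 6], [7, 8]]],
         [[[8, 7], [6, 5], [4, 3], [2, 1]]]]
mask = batch_erase_mask(4, 2, rng=random.Random(0))
print(mask)
print(erase_and_max_pool(batch, mask))
```

Because the mask is shared across the batch, roughly aligned pedestrian images lose the same body region at the same time, which is the property the description relies on.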
In step 1), when the predicted view direction of the image is "front", the "front" probability produced by softmax is large and the "back" and "side" probabilities are small.
For the batch feature erasing of the BFE in step 3), when all input images for pedestrian re-identification are roughly aligned, the method erases the same semantic region from all features in the same batch. A view unit structure is added after Block3, with different view units learning information about different orientations of the pedestrian image. A batch feature erasing branch is also added after Block3; it continues to extract deep features through a Block4 structure and a Bottleneck structure and then applies batch feature erasing followed by pooling, dimension reduction and similar operations to obtain the fine-grained features of the image. Each view unit consists of a 1 × 1 convolution layer, a convolutional block attention module (CBAM), a 1 × 1 convolution layer, a batch normalization layer and a ReLU layer; the CBAM module computes attention maps for the input feature map along two independent dimensions, channel and spatial.
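To make the interplay of the view classifier and the view units concrete, the sketch below turns three classifier scores into front/back/side probabilities with softmax and uses them to weight the outputs of three view units. The experimental section describes this weighting as a form of attention over view attributes; combining the weighted outputs by a plain sum is an assumption here, and the function names and toy feature values are hypothetical.

```python
import math

VIEWS = ("front", "back", "side")


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_view_units(view_logits, unit_features):
    """Weight each view unit's feature vector by its view probability and sum.

    view_logits: one score per view, in the order of VIEWS.
    unit_features: one feature vector per view unit, all of equal length.
    """
    probs = softmax(view_logits)
    dim = len(unit_features[0])
    fused = [sum(p * feature[d] for p, feature in zip(probs, unit_features))
             for d in range(dim)]
    return probs, fused


# A "front" image: the front score dominates, so the front unit dominates the result.
probs, fused = fuse_view_units([2.0, 0.1, -1.0],
                               [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(dict(zip(VIEWS, probs)))
print(fused)
```

With a confident "front" prediction the fused vector stays close to the front unit's output, while an ambiguous prediction blends the three units more evenly.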
The alternative network model training process comprises the following steps:
1) loading the ImageNet-trained weights for the ResNet50 part of the network to initialize the backbone;
2) training the view classifier on the RAP dataset with the learning rate set to 0.0001;
3) fixing the parameters of the view classifier and the ResNet50-related structures and, using a pedestrian re-identification dataset, fine-tuning only the view units and the final pedestrian identity classification layer;
4) training the entire network on a pedestrian re-identification dataset.
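For the whole-network training of step 4), the training settings of this example specify an epoch-dependent learning rate: a stepped warm-up until epoch 50, then 1e-3, 1e-4 and 1e-5. A minimal sketch of that schedule follows. Two details are assumptions, because the text leaves them open: "epoch/5" is read as integer division, and the boundary epochs 50, 200 and 300 are assigned to the later interval.

```python
def learning_rate(epoch):
    """Piecewise learning-rate schedule for whole-network training."""
    if epoch < 50:
        # Warm-up: the rate rises by 1e-4 every 5 epochs (integer division assumed).
        return 1e-4 * (epoch // 5 + 1)
    if epoch < 200:
        return 1e-3
    if epoch < 300:
        return 1e-4
    return 1e-5


for epoch in (0, 10, 49, 50, 199, 200, 299, 300, 699):
    print(epoch, learning_rate(epoch))
```

Under the integer-division reading, the warm-up reaches ten times the initial rate at epochs 45 to 49, which joins the 1e-3 plateau without a jump.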
During training, all input images are resized to 384 × 128, and normalization and random horizontal flipping are applied as data augmentation. Because horizontal flipping is used, the left and right orientations are both classified as "side" in the experiments. In the batch feature erasing branch, the erasing height and width ratios are set to r_h = 0.5 and r_w = 1.0, respectively. For whole-network training, the batch size is 128 and the network is trained for 700 epochs with the Adam optimizer. The learning rate changes with the epoch: it is 1e-4 × (epoch/5 + 1) when the epoch is less than 50, 1e-3 when the epoch is greater than 50 and less than 200, 1e-4 when the epoch is greater than 200 and less than 300, and 1e-5 after 300 epochs. The loss function is the one used in BFE model training, namely the sum of the soft margin batch-hard triplet loss and the softmax loss. The soft margin batch-hard triplet loss is defined as follows:
$$\ell_{\mathrm{triplet}}(X) = \sum_{i=1}^{P} \sum_{a=1}^{K} \ell_{\mathrm{BH}}(x_a^i)$$
$$\ell_{\mathrm{BH}}(x_a^i) = \ln\Big(1 + \exp\Big(\max_{p=1,\dots,K} D\big(f_\theta(x_a^i), f_\theta(x_p^i)\big) - \min_{\substack{j=1,\dots,P,\; j \neq i \\ n=1,\dots,K}} D\big(f_\theta(x_a^i), f_\theta(x_n^j)\big)\Big)\Big)$$
In the formula, P is the number of distinct pedestrians in a batch and K is the number of images per pedestrian. x_a^i, x_p^i and x_n^j are the anchor, positive and negative samples, which together form a triplet. The positive sample is the image feature that has the same pedestrian identity as the anchor but is farthest from it; the negative sample is the image feature that has a different pedestrian identity from the anchor and is closest to it. f_θ(·) denotes the learned feature vector, and D(·,·) is the Euclidean distance between samples.
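The loss described above can be sketched in a few lines. For each anchor, the hardest positive is the farthest same-identity sample, the hardest negative is the closest different-identity sample, and the soft margin replaces a fixed margin with ln(1 + exp(·)). This sketch follows the commonly used soft-margin batch-hard formulation and sums over anchors; the function names are hypothetical, and whether the patented training sums or averages over the batch is not stated in the text.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def soft_margin_batch_hard_triplet_loss(features, labels):
    """Sum over anchors of ln(1 + exp(hardest_positive - hardest_negative))."""
    total = 0.0
    for anchor, anchor_label in zip(features, labels):
        # Positives include the anchor itself (distance 0), which never raises the max.
        positives = [euclidean(anchor, f)
                     for f, l in zip(features, labels) if l == anchor_label]
        negatives = [euclidean(anchor, f)
                     for f, l in zip(features, labels) if l != anchor_label]
        if not negatives:
            continue  # no other identity in the batch, so no triplet exists
        x = max(positives) - min(negatives)
        # Numerically stable softplus: ln(1 + e^x).
        total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total


# P = 2 identities with K = 2 images each; identities are well separated.
features = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
labels = [0, 0, 1, 1]
print(soft_margin_batch_hard_triplet_loss(features, labels))
```

The loss shrinks toward zero as same-identity samples cluster and different identities move apart, without any margin threshold to tune.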
During testing, pedestrian images are likewise resized to 384 × 128 and normalized. Notably, all images of the query set X_query and the candidate set X_gallery are horizontally flipped, and the feature vector learned from the flipped image is averaged with that learned from the original image to give the final feature vector of the pedestrian image. The Euclidean distance between the feature vector f(q) of each query image q and the feature vector f(g) of each candidate image g is computed and sorted. Based on the resulting similarity ranking, a match counts as correct if the pedestrian IDs are the same and the camera IDs are different.
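The test procedure above can be sketched as three small steps: average the features of an image and its horizontal flip, rank the candidate set by Euclidean distance to the query, and accept a match only when the pedestrian ID agrees and the camera ID differs. Feature extraction itself is outside this sketch; the vectors, function names and tuple layout below are illustrative assumptions.

```python
import math


def average_with_flip(feature_original, feature_flipped):
    """Final descriptor: element-wise mean of original and flipped-image features."""
    return [(a + b) / 2.0 for a, b in zip(feature_original, feature_flipped)]


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_gallery(query_feature, gallery):
    """Return gallery indices sorted by increasing distance to the query.

    gallery: list of (feature, person_id, camera_id) tuples.
    """
    return sorted(range(len(gallery)),
                  key=lambda i: euclidean(query_feature, gallery[i][0]))


def is_correct_match(query, candidate):
    """Correct only if the pedestrian ID matches and the camera ID differs."""
    _, query_pid, query_cam = query
    _, cand_pid, cand_cam = candidate
    return query_pid == cand_pid and query_cam != cand_cam


query = (average_with_flip([0.2, 0.0], [-0.2, 0.0]), 1, 1)
gallery = [([3.0, 0.0], 2, 1), ([0.1, 0.0], 1, 2), ([1.0, 0.0], 1, 1)]
order = rank_gallery(query[0], gallery)
print(order, [is_correct_match(query, gallery[i]) for i in order])
```

The same-camera candidate with the query's ID is not counted as a correct match, which mirrors the cross-camera criterion stated in the text.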
1. Results and analysis of the experiments
The experiments use the deep learning framework PyTorch 1.0.1 with Python 3.7 and are carried out on three public pedestrian re-identification datasets: Market1501, DukeMTMC-reID and CUHK03. Compared with other advanced methods, the method achieves good accuracy on the Rank-1 and mAP metrics, showing good pedestrian re-identification performance.
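Rank-1 and mAP are the two metrics reported throughout the experiments. The sketch below computes them from ranked match lists using their standard definitions: Rank-1 is the fraction of queries whose top-ranked candidate is correct, and mAP is the mean over queries of the average precision. It assumes each list is already sorted by increasing distance and already excludes candidates that the evaluation protocol ignores; the function names are hypothetical.

```python
def rank1_and_ap(ranked_matches):
    """Return (rank-1 hit, average precision) for one query.

    ranked_matches: list of booleans in ranking order, True = correct identity.
    """
    rank1 = 1.0 if ranked_matches and ranked_matches[0] else 0.0
    hits, precisions = 0, []
    for position, match in enumerate(ranked_matches, start=1):
        if match:
            hits += 1
            precisions.append(hits / position)  # precision at this recall point
    ap = sum(precisions) / len(precisions) if precisions else 0.0
    return rank1, ap


def evaluate(all_queries):
    """Return (Rank-1 accuracy, mAP) averaged over all queries."""
    results = [rank1_and_ap(matches) for matches in all_queries]
    count = len(results)
    rank1 = sum(r for r, _ in results) / count
    mean_ap = sum(a for _, a in results) / count
    return rank1, mean_ap


print(evaluate([[True, False, True], [False, True]]))
```

Rank-1 only looks at the first candidate, whereas mAP rewards placing every correct candidate early, which is why the two metrics can move by different amounts in the comparisons that follow.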
1.1 Comparison of the recognition performance of the method of the invention with other ReID methods
To show that the method achieves state-of-the-art pedestrian re-identification performance, nine recent pedestrian re-identification methods were selected for comparison experiments; the results are shown in Table 1.1.
[Table 1.1: recognition performance of the method of the invention compared with other ReID methods (table image not reproduced)]
As can be seen from Table 1.1, the Rank-1 accuracy and mAP of the method of the invention improve markedly on the three datasets. On Market1501, Rank-1 is 3.6% higher and mAP 11.1% higher than HA-CNN, and Rank-1 is 2.4% higher and mAP 9.5% higher than PCB. On DukeMTMC-reID, Rank-1 is 6.9% higher and mAP 11.9% higher than PCB, and Rank-1 is 8.3% higher and mAP 13.4% higher than HA-CNN. On CUHK03-label, Rank-1 is 14.7% higher and mAP 16.1% higher than DaRe, and Rank-1 is 1.2% higher and mAP 1.3% higher than BFE. On CUHK03-detect, Rank-1 is 1.8% higher and mAP 2.4% higher than BFE, and Rank-1 is 14.0% higher and mAP 15.7% higher than DaRe. Compared with MGN, the method does not reach the best Rank-1 or mAP on Market1501 and DukeMTMC-reID, but on CUHK03-label it exceeds MGN by 12.8% in Rank-1 and 10.3% in mAP, and on CUHK03-detect by 10.5% in Rank-1 and 8.7% in mAP. Because the CUHK03 dataset has fewer training samples, which makes the network harder to train and prone to overfitting, these results indicate that the method has better recognition performance and generalization capability.
In addition, the MGN model has 8 feature extraction branches and 11 loss functions, and its network structure is very complex. Compared with the BFE model, Rank-1 accuracy on Market1501 and DukeMTMC-reID improves only slightly (+0.4%, +0.1%), but mAP improves noticeably (+1.8%, +1.4%). The experiments show that combining view information with batch feature erasing gives good generalization capability and robustness on the three datasets.
1.2 Comparison of recognition performance using different view unit modules
The view units of the PSE network consist of three Block4 structures, each with more than 20 layers, so the structure is complex and the parameter count is very large. Simply combining the view branch of the PSE network with the batch erasing branch of the BFE network would exceed the available compute memory. The invention therefore proposes a simple and effective view unit structure. To verify its performance, comparative experiments were performed on the view information branch; the results are shown in Table 1.2:
[Table 1.2: recognition performance using different view unit modules (table image not reproduced)]
1.3 Comparison of recognition performance with different attention mechanisms
Weighting the view units by the scores predicted by the view classifier is essentially an attention mechanism over feature attributes. On top of this, introducing a CBAM module lets each view unit apply channel attention and spatial attention, so that three attention mechanisms are formed. The effectiveness of the three attention mechanisms was verified experimentally; the results are shown in Table 1.3.
In this experiment, the view information branch of the proposed model is again used as the base structure, and Market1501 is used as the dataset. Three configurations are compared. View-feature attention alone introduces only the view classifier to predict the image orientation for feature attention, without a CBAM module in the view unit structure. CBAM attention alone uses a single view unit structure with a CBAM module and no view classifier. View and CBAM attention is the complete improved view information branch, with both the CBAM attention module and the feature attention from view information. According to the results, CBAM attention alone has a slight advantage over view-feature attention alone: its Rank-1 accuracy is 0.3% higher and its mAP is 0.1% higher. Using the three attention mechanisms together gives better recognition results: Rank-1 accuracy improves by 0.5% over view-feature attention alone and by 0.2% over CBAM attention alone, and mAP improves by 1.3% over view-feature attention alone and by 1.2% over CBAM attention alone. The three attention mechanisms therefore improve the performance of the model, which verifies their effectiveness.
[Table 1.3: recognition performance with different attention mechanisms (table image not reproduced)]
1.4 Comparison of recognition performance with multi-feature fusion
The invention combines view information with batch feature erasing: the view information serves as the global feature, and batch feature erasing learns the fine-grained features. To verify that fusing the two kinds of features gives the network model better discriminative performance, an experiment was carried out; the results are shown in Table 1.4.
[Table 1.4: recognition performance with multi-feature fusion (table image not reproduced)]
The feature fusion comparison experiment uses the Market1501 dataset. The view information branch is the view information branch of the model of the invention, which learns the global features of the pedestrian image. The feature erasing branch is the feature erasing branch of the model of the invention, which learns the fine-grained features of the pedestrian image. "All" is the whole network structure, in which the learned global features and fine-grained features are fused. As can be seen from Table 1.4, on Market1501 the feature erasing branch is slightly better than the view information branch, with Rank-1 accuracy 0.1% higher and mAP 1.7% higher. When the two branches act together, however, Rank-1 accuracy and mAP are the best, reaching 94.8% and 86.8% respectively. Compared with the view information branch alone, Rank-1 improves by 1.9% and mAP by 4.4%; compared with the feature erasing branch alone, Rank-1 improves by 1.0% and mAP by 2.7%.
The analysis shows that combining the global features learned by the view information branch with the fine-grained features learned by the feature erasing branch gives stronger representations and higher recognition accuracy than either the view features or the fine-grained features alone, demonstrating the effectiveness of fusing the two kinds of features.
1.5 Comparison of the recognition performance of the batch feature erasing module on other models
The invention improves the view information structure of the PSE model and adds a BFE module branch to it, thereby extracting more discriminative pedestrian features. Experiments show that the BFE module effectively improves the re-identification accuracy of the model and generalizes well across several datasets. To verify whether the BFE module can also improve recognition accuracy on other network models, experiments were carried out with two other network models; the results are shown in Table 1.5.
[Table 1.5: recognition performance of the batch feature erasing module on other models (table image not reproduced)]
The experiment uses the Market1501 dataset, and the network models are IDE and PCB. IDE is a common baseline structure for pedestrian re-identification; it modifies ResNet50 slightly and mainly extracts the global features of the pedestrian image. The PCB network divides the pedestrian features uniformly into 6 parts, trains each part with its own loss, and mainly mines the local features of the pedestrian image. In the experiment with the IDE model, triplet loss is chosen as the loss function with the margin set to 1.0, and the learning rate and optimizer are the same as in the experiments of the invention. IDE + BFE adds a BFE branch to the IDE model. According to the results, adding the BFE branch improves Rank-1 accuracy and mAP by 2.1% and 3.0%, respectively, over the IDE model. For the PCB model, the learning rate starts at 0.1 and is reduced to 0.01 after 40 epochs; its Rank-1 accuracy is 92.2% and its mAP is 77.8%. After the BFE module is added to the PCB model, accuracy drops instead: Rank-1 accuracy is only 88.3% and mAP is 70.4%, far below the results of PCB alone. The PCB model already extracts local features of the pedestrian image, so adding the BFE module to extract fine-grained features does not improve recognition accuracy, and the fine-grained features instead act as interference. The IDE model mainly extracts global features, and there the fine-grained features improve recognition accuracy because they are supervised and complemented by the global features.
From the above analysis, when the batch feature erasing module is added to the two network models, recognition accuracy improves on the IDE model, which extracts global features, but not on the PCB model, which extracts local features. The BFE module therefore improves a model's recognition accuracy when its fine-grained features are supervised and complemented by global features.
1.6 Comparison of recognition performance with different loss functions
During training, the method uses a loss function that combines the soft margin batch-hard triplet loss with the softmax loss. Combining the two loss functions improves the discriminative ability of the model and effectively improves the accuracy of pedestrian re-identification. To verify the performance of the soft margin batch-hard triplet loss, a comparison experiment was performed against the benchmark triplet loss: the combination of the benchmark triplet loss and the softmax loss is compared with the loss function of the invention, and the results are shown in Table 1.6.
[Table 1.6: recognition performance with different loss functions (table image not reproduced)]
Compared with the benchmark triplet loss, the soft margin batch-hard triplet loss achieves the best recognition accuracy. This loss function also avoids having to set a margin threshold, and training the model with it is fast and effective.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.

Claims (6)

1. A pedestrian re-identification method based on visual angle information and batch characteristic erasing, characterized by comprising the following steps:
1) constructing a PSE network model: with Resnet50 as the basic structure, Block1, Block2, Block3 and Block4 each correspond to the respective block structure of Resnet50; a view classifier branch is added after Block1; after a series of convolution operations on the pedestrian image, softmax is used to obtain the probability values of the front, back and side orientations of the pedestrian image, and these values predict the view direction of the image;
2) training the PSE model: model training is mainly carried out as follows:
① loading ImageNet pre-trained weight parameters into the Resnet50-related structures as initialization;
② training the view classifier using the RAP dataset, which contains orientation labels;
③ transferring the trained view classifier to the PSE network, fixing the parameters of the view classifier, Block1, Block2 and Block3, and training the view units using a pedestrian re-identification dataset so as to initialize the parameters of the view units;
④ extracting 14 whole-body joint points of the pedestrian from all pedestrian images using the DeeperCut model;
⑤ taking the extracted 14 joint points as input, so that the input has 17 channels, fixing all Block structures, and fine-tuning the first layer and the last classification layer of the network so that the network adapts to the 17-channel input;
⑥ fine-tuning the view classifier using the joint point information extracted from the RAP dataset;
⑦ training the network model using a pedestrian re-identification dataset;
3) constructing a BFE network model: with Resnet50 as the basic structure, two branches are drawn out, a Global Branch and a Feature Erasing Branch; a Bottleneck structure is added in the Feature Erasing Branch and a Mask structure is introduced, which randomly erases features within the same batch; the remaining features are max-pooled and reduced in dimension to obtain fine-grained features of the image; finally, the feature vectors extracted by the two branches are fused as the final feature vector of the input pedestrian image.
2. The pedestrian re-identification method based on view information and batch feature erasing as claimed in claim 1, wherein in step 1), when the predicted view direction of the image is "front", the probability value of "front" obtained by softmax is large and the probability values of "back" and "side" are small.
3. The pedestrian re-identification method based on view information and batch feature erasing as claimed in claim 1, wherein, in the batch feature erasing of BFE in step 3), when all input pedestrian re-identification images are roughly aligned, the method can erase the same semantic region from all features in the same batch.
4. The pedestrian re-identification method based on view information and batch feature erasing as claimed in claim 1, wherein in step 3), a view unit structure is added after Block3, and different view units learn different orientation information of pedestrian images; a batch feature erasing branch is added after the Block3 structure; this branch continues to extract deep features through a Block4 structure and a Bottleneck structure, then performs batch feature erasing, pooling, dimension reduction and other operations, and finally obtains fine-grained features of the image.
5. The pedestrian re-identification method based on view information and batch feature erasing as claimed in claim 1, wherein each view unit is composed of a 1 × 1 convolution layer, a Convolutional Block Attention Module (CBAM), a 1 × 1 convolution layer, a batch normalization layer and a ReLU layer, and the CBAM performs attention mapping on the input feature map along two independent dimensions, channel and spatial.
6. The pedestrian re-identification method based on view information and batch feature erasing as claimed in claim 1, wherein an alternative network model training process comprises the following steps:
1) loading ImageNet-pretrained weight parameters into the Resnet50 part of the network to initialize the backbone;
2) training the view classifier using the RAP dataset, with the learning rate set to 0.0001;
3) fixing the parameters of the view classifier and the Resnet50-related structures, and, using a pedestrian re-identification dataset, fine-tuning only the view units and the final pedestrian identity classification layer;
4) training the entire network using a pedestrian re-identification dataset.
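The batch feature erasing described in claims 1 and 3 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function names, the nested-list tensor layout (batch, channel, height, width) and the erase ratios `h_ratio` and `w_ratio` are hypothetical. The point it shows is that one random mask is drawn per batch and applied to every sample, so the same region is erased for all roughly aligned images before max pooling.

```python
import random
from typing import List

Grid = List[List[float]]      # height x width
Sample = List[Grid]           # channels x height x width


def make_erase_mask(height: int, width: int, h_ratio: float,
                    w_ratio: float, rng: random.Random) -> Grid:
    """Build a 0/1 mask with one randomly placed rectangular erased block."""
    erase_h = min(height, max(1, round(height * h_ratio)))
    erase_w = min(width, max(1, round(width * w_ratio)))
    top = rng.randint(0, height - erase_h)    # randint is inclusive on both ends
    left = rng.randint(0, width - erase_w)
    return [[0.0 if top <= y < top + erase_h and left <= x < left + erase_w else 1.0
             for x in range(width)]
            for y in range(height)]


def erase_batch(batch: List[Sample], mask: Grid) -> List[Sample]:
    """Apply the same mask to every channel of every sample in the batch."""
    return [[[[v * m for v, m in zip(row, mask_row)]
              for row, mask_row in zip(channel, mask)]
             for channel in sample]
            for sample in batch]


def global_max_pool(sample: Sample) -> List[float]:
    """Max over the spatial positions of each channel."""
    return [max(max(row) for row in channel) for channel in sample]
```

Drawing a new mask for each batch, rather than for each image, means the network cannot rely on any single body region across the whole batch, which is what pushes the erasing branch toward the remaining fine-grained features.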
CN202010549985.9A 2020-06-16 2020-06-16 Pedestrian re-identification method based on visual angle information and batch characteristic erasing Withdrawn CN111709364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010549985.9A CN111709364A (en) 2020-06-16 2020-06-16 Pedestrian re-identification method based on visual angle information and batch characteristic erasing


Publications (1)

Publication Number Publication Date
CN111709364A true CN111709364A (en) 2020-09-25

Family

ID=72540679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010549985.9A Withdrawn CN111709364A (en) 2020-06-16 2020-06-16 Pedestrian re-identification method based on visual angle information and batch characteristic erasing

Country Status (1)

Country Link
CN (1) CN111709364A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766353A (en) * 2021-01-13 2021-05-07 南京信息工程大学 Double-branch vehicle re-identification method for enhancing local attention
CN113221770A (en) * 2021-05-18 2021-08-06 青岛根尖智能科技有限公司 Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN114022906A (en) * 2021-12-10 2022-02-08 南通大学 Pedestrian re-identification method based on multi-level features and attention mechanism

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766353A (en) * 2021-01-13 2021-05-07 南京信息工程大学 Double-branch vehicle re-identification method for enhancing local attention
CN112766353B (en) * 2021-01-13 2023-07-21 南京信息工程大学 Double-branch vehicle re-identification method for strengthening local attention
CN113221770A (en) * 2021-05-18 2021-08-06 青岛根尖智能科技有限公司 Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN113221770B (en) * 2021-05-18 2024-06-04 青岛根尖智能科技有限公司 Cross-domain pedestrian re-recognition method and system based on multi-feature hybrid learning
CN114022906A (en) * 2021-12-10 2022-02-08 南通大学 Pedestrian re-identification method based on multi-level features and attention mechanism
CN114022906B (en) * 2021-12-10 2024-07-09 南通大学 Pedestrian re-identification method based on multi-level characteristics and attention mechanism

Similar Documents

Publication Publication Date Title
Guo et al. Two-level attention network with multi-grain ranking loss for vehicle re-identification
CN113408492B (en) Pedestrian re-identification method based on global-local feature dynamic alignment
CN110321813B (en) Cross-domain pedestrian re-identification method based on pedestrian segmentation
Teng et al. Multi-view spatial attention embedding for vehicle re-identification
CN110263697A (en) Pedestrian based on unsupervised learning recognition methods, device and medium again
CN110110689B (en) Pedestrian re-identification method
CN111709364A (en) Pedestrian re-identification method based on visual angle information and batch characteristic erasing
CN111507217A (en) Pedestrian re-identification method based on local resolution feature fusion
CN109598268A (en) A kind of RGB-D well-marked target detection method based on single flow depth degree network
Li et al. Discriminative semi-coupled projective dictionary learning for low-resolution person re-identification
Chen et al. Vehicle re-identification using distance-based global and partial multi-regional feature learning
CN111738143A (en) Pedestrian re-identification method based on expectation maximization
Kim et al. Deep stereo confidence prediction for depth estimation
CN114782977B (en) Pedestrian re-recognition guiding method based on topology information and affinity information
CN110956158A (en) Pedestrian shielding re-identification method based on teacher and student learning frame
CN112070010A (en) Pedestrian re-recognition method combining multi-loss dynamic training strategy to enhance local feature learning
CN113627380A (en) Cross-vision-field pedestrian re-identification method and system for intelligent security and early warning
CN117935299A (en) Pedestrian re-recognition model based on multi-order characteristic branches and local attention
Cygert et al. Closer look at the uncertainty estimation in semantic segmentation under distributional shift
Zhang et al. Unsupervised maritime vessel re-identification with multi-level contrastive learning
Jiao et al. Vehicle re-identification in aerial images and videos: Dataset and approach
CN111695531A (en) Cross-domain pedestrian re-identification method based on heterogeneous convolutional network
Lee et al. Learning to distill convolutional features into compact local descriptors
Nikhal et al. Multi-context grouped attention for unsupervised person re-identification
CN114495004A (en) Unsupervised cross-modal pedestrian re-identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200925