CN114022686A - Pedestrian re-identification method oriented to occlusion scene - Google Patents

Pedestrian re-identification method oriented to occlusion scene Download PDF

Info

Publication number
CN114022686A
Authority
CN
China
Prior art keywords
features
global
loss
pedestrian
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111484998.3A
Other languages
Chinese (zh)
Inventor
王蓉
孙义博
张文靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Original Assignee
PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA filed Critical PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Priority to CN202111484998.3A priority Critical patent/CN114022686A/en
Publication of CN114022686A publication Critical patent/CN114022686A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a pedestrian re-identification method for occlusion scenes. A global contrast pooling module is introduced into the global feature branch to fuse the average-pooled and max-pooled features and extract global features that are more robust to background noise and occlusion. A One-vs-rest relation module is introduced into the local feature branch to associate each local feature with the remaining local features and extract local features that contain global information. In the metric learning stage, the training of the model is supervised by combining three loss functions, namely cross-entropy loss, hard sample sampling triplet loss and center loss, so that the network can extract more discriminative pedestrian features. Finally, the method is evaluated on the Occluded-DukeMTMC dataset, which fully demonstrates its effectiveness and advancement in addressing the occlusion problem in pedestrian re-identification.

Description

Pedestrian re-identification method oriented to occlusion scene
Technical Field
The invention relates to a pedestrian re-identification method, in particular to a pedestrian re-identification method facing an occlusion scene.
Background
Pedestrian re-identification is a popular and challenging research topic in computer vision. Given a query image of a specific pedestrian, computer vision techniques are used to determine whether that pedestrian appears under other non-overlapping cameras, which largely compensates for the limited field of view of a single fixed camera. It is usually combined with technologies such as pedestrian detection and pedestrian tracking and applied in fields such as pattern recognition.
Compared with traditional pedestrian re-identification methods, methods based on deep learning can extract more discriminative feature descriptions through automatic learning and simple similarity measurement, and achieve better performance. However, in practical application scenarios, pedestrian re-identification still faces various problems: 1) variable pedestrian postures caused by the angle and position of the camera; 2) changes in image resolution caused by the shooting distance; 3) cross-modal re-identification caused by image-domain variation; 4) occlusion. In occlusion scenes, pedestrians captured by the camera are often blocked by obstacles (such as luggage, counters, other people, cars and trees), or part of the pedestrian's body leaves the camera's field of view, so the identity information contained in the pedestrian image is reduced while noise from the occluded region interferes with matching, making it easy to match the target pedestrian with the wrong person. Therefore, accurately matching pedestrian images in which only part of the body is visible, efficiently exploiting the features of the non-occluded regions and the relationships among local features, and reducing the noise interference of the occluded regions are the keys to improving pedestrian re-identification accuracy in occlusion scenes.
At present, a common way for the academic community to address occlusion in pedestrian re-identification is to detect human-body key points with a pose estimator and use them as auxiliary information to guide the model to focus on the visible regions of the image, so as to distinguish the occluded and non-occluded regions of a pedestrian image and reduce the noise interference of the occluded regions. However, these methods usually use the local features of the pedestrian directly and do not consider the relationships among body parts, so two different pedestrians with similar features at the corresponding parts are easily confused. In addition, some methods supervise the training of the model with only a single loss function, so the class information of the samples is not fully exploited and mined, which ultimately reduces re-identification accuracy. Therefore, the invention provides a pedestrian re-identification method for occlusion scenes.
Disclosure of Invention
In order to overcome the shortcomings of the existing technology, the invention provides a pedestrian re-identification method for occlusion scenes. A global contrast pooling module is introduced into the global feature branch to fuse the average-pooled and max-pooled features and extract global features that are more robust to background noise and occlusion; a One-vs-rest relation module is introduced into the local feature branch to associate each local feature with the remaining local features and extract local features that contain global information; and in the metric learning stage, the training of the model is supervised by combining three loss functions, namely cross-entropy loss, hard sample sampling triplet loss and center loss, so that the network can extract more discriminative pedestrian features.
In order to solve the technical problems, the invention adopts the technical scheme that: a pedestrian re-identification method facing an occlusion scene comprises the following steps:
step S1: generating human-body key points by means of a pre-trained pose estimator and combining them with the global features, so that the model pays more attention to the non-occluded pedestrian regions and the noise interference caused by the occluded regions is reduced;
step S2: introducing a global contrast pooling module, so that the noise interference caused by background clutter and occlusion is reduced and the information of the whole pedestrian body region is better expressed;
step S3: performing deeper feature extraction on the local block features through a One-vs-rest relation module, so that the features at each local level contain the information of the corresponding part and of the other body parts, and the relationships among body parts are better reflected;
step S4: supervising the model with multi-loss joint training, so that the model ensures the accuracy of the predicted labels while taking inter-class dispersion and intra-class compactness into account.
Step S1 specifically includes the following steps:
step S11: inputting a pedestrian picture, namely the original picture, and generating the human-body key points $M_i$, the coordinates $(x_i, y_i)$ of each key point and the corresponding confidence $c_i$ through a pre-trained pose estimator;
step S12: removing key points with low confidence through a filtering mechanism, namely keeping the coordinates of key points whose confidence $c_i$ is greater than the threshold $\theta$ and removing the coordinates of key points whose confidence $c_i$ is smaller than the threshold $\theta$, the filtering mechanism being shown in formula 1:

$$M_i = \begin{cases} (x_i,\, y_i), & c_i > \theta \\ 0, & c_i \le \theta \end{cases}, \qquad i = 1, 2, \dots, N \tag{1}$$

wherein $M_i$, $(x_i, y_i)$ and $c_i$ respectively denote the i-th key point, its coordinates and its confidence, $\theta$ denotes the threshold of the filtering mechanism, and $N$ denotes the number of key points;
step S13: mapping the key points onto the original picture to generate heat maps, down-sampling them by bilinear interpolation, and combining them with the global features to form the pose-guided features;
step S14: concatenating the pose-guided features obtained through average pooling and maximum pooling with the average-pooled global features from the local feature branch, and reducing the dimension through a fully connected layer;
step S15: performing label prediction on the extracted global features through a fully connected layer and a softmax layer.
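For illustration, a minimal PyTorch-style sketch of the key-point filtering (formula 1) and the pose-guided feature fusion of steps S13 to S15 is given below; the function names, tensor shapes and the threshold value are assumptions of the sketch rather than the exact implementation of the invention, and the final fully connected reduction and softmax classification are only indicated in comments.

```python
import torch
import torch.nn.functional as F

def filter_keypoints(conf, theta=0.2):
    """Formula 1: keep only key points whose confidence exceeds the threshold theta."""
    return conf > theta                                   # boolean visibility mask, shape (N,)

def pose_guided_features(feat_map, heatmaps, mask):
    """Steps S13-S14: fuse key-point heat maps with the global feature map."""
    B, C, H, W = feat_map.shape
    # Down-sample the heat maps to the feature-map resolution by bilinear interpolation
    hm = F.interpolate(heatmaps, size=(H, W), mode='bilinear', align_corners=False)
    hm = hm * mask.view(1, -1, 1, 1).float()              # zero out low-confidence key points
    guided = feat_map.unsqueeze(1) * hm.unsqueeze(2)      # (B, N, C, H, W) pose-guided maps
    avg = guided.flatten(3).mean(-1)                      # average pooling -> (B, N, C)
    mx = guided.flatten(3).amax(-1)                       # maximum pooling -> (B, N, C)
    global_avg = feat_map.flatten(2).mean(-1)             # average-pooled global feature (B, C)
    fused = torch.cat([avg.flatten(1), mx.flatten(1), global_avg], dim=1)
    return fused  # a fully connected layer would reduce this vector before the softmax classifier
```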
Step S2 specifically includes the following steps:
step S21: the local block features $P_1 \sim P_6$ are passed through global maximum pooling and global average pooling to obtain $P_{max}$ and $P_{avg}$, and the two are subtracted to obtain the contrast feature $P_{cont}$, the calculation process being shown in formula 2:

$$P_{cont} = P_{max} - P_{avg} \tag{2}$$

step S22: $P_{max}$ and $P_{cont}$ are each reduced in dimension through a sub-network consisting of a 1×1 convolution layer, a BN normalization layer and a ReLU activation function layer to obtain the features $\tilde{P}_{max}$ and $\tilde{P}_{cont}$; after concatenation, the result is fed to the sub-network again to be reduced from 2c dimensions to c dimensions, and finally added to $\tilde{P}_{max}$ to obtain the representative global feature $Q_0$, the specific implementation process being shown in formula 3:

$$Q_0 = CBR\big(Concat(\tilde{P}_{max},\, \tilde{P}_{cont})\big) + \tilde{P}_{max}, \qquad \tilde{P}_{max} = CBR(P_{max}),\ \tilde{P}_{cont} = CBR(P_{cont}) \tag{3}$$

wherein $Q_0$ denotes the global contrast feature, CBR denotes the sub-network consisting of the 1×1 convolution layer, the BN normalization layer and the ReLU activation function layer, and Concat(·) denotes the concatenation operation.
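For illustration, the global contrast pooling of formulas 2 and 3 can be sketched in PyTorch as follows; stacking the six block features as a (B, C, 6, 1) tensor, together with the class and parameter names, is an illustrative assumption rather than the exact implementation.

```python
import torch
import torch.nn as nn

def cbr(in_ch, out_ch):
    """CBR sub-network: 1x1 convolution + BN normalization + ReLU activation."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class GlobalContrastPooling(nn.Module):
    """Global contrast pooling (formulas 2-3): fuses the max- and average-pooled
    part features into one global contrast feature Q_0."""
    def __init__(self, channels, reduced):
        super().__init__()
        self.cbr_max = cbr(channels, reduced)
        self.cbr_cont = cbr(channels, reduced)
        self.cbr_fuse = cbr(2 * reduced, reduced)

    def forward(self, parts):
        # parts: (B, C, P, 1) stack of the P local block features P_1..P_P
        p_max = parts.max(dim=2, keepdim=True).values   # global maximum pooling
        p_avg = parts.mean(dim=2, keepdim=True)         # global average pooling
        p_cont = p_max - p_avg                          # contrast feature (formula 2)
        t_max, t_cont = self.cbr_max(p_max), self.cbr_cont(p_cont)
        fused = self.cbr_fuse(torch.cat([t_max, t_cont], dim=1))  # 2c -> c
        return fused + t_max                            # global contrast feature Q_0 (formula 3)
```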
Step S3 specifically includes the following steps:
step S31: taking $P_i$ as an example, global average pooling is performed on the local features $P_j$ other than $P_i$ to obtain $R_i$, the calculation process being shown in formula 4:

$$R_i = \frac{1}{5}\sum_{\substack{j=1 \\ j \ne i}}^{6} P_j \tag{4}$$

wherein $P_j$ denotes the local block features other than the i-th one;
step S32: $P_i$ and $R_i$ are each reduced in dimension through a sub-network consisting of a 1×1 convolution layer, a BN normalization layer and a ReLU activation function layer to obtain the features $\tilde{P}_i$ and $\tilde{R}_i$; after concatenation, the result is fed to the sub-network again to be reduced from 2c dimensions to c dimensions, and finally integrated with $\tilde{P}_i$ to obtain the local feature $Q_i$ with global ties, the specific implementation process being shown in formula 5:

$$Q_i = CBR\big(Concat(\tilde{P}_i,\, \tilde{R}_i)\big) + \tilde{P}_i, \qquad \tilde{P}_i = CBR(P_i),\ \tilde{R}_i = CBR(R_i) \tag{5}$$

wherein $Q_i$ denotes the local feature with global ties, CBR denotes the sub-network consisting of the 1×1 convolution layer, the BN normalization layer and the ReLU activation function layer, and Concat(·) denotes the concatenation operation.
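Similarly, the One-vs-rest relation module of formulas 4 and 5 can be sketched in PyTorch as follows; sharing one CBR sub-network across all parts and passing the parts as a list of tensors are simplifying assumptions of the sketch.

```python
import torch
import torch.nn as nn

def cbr(in_ch, out_ch):
    """CBR sub-network: 1x1 convolution + BN normalization + ReLU activation."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class OneVsRestRelation(nn.Module):
    """One-vs-rest relation module (formulas 4-5): relates each part feature P_i
    to the average R_i of the remaining parts."""
    def __init__(self, channels, reduced):
        super().__init__()
        self.cbr_part = cbr(channels, reduced)
        self.cbr_rest = cbr(channels, reduced)
        self.cbr_fuse = cbr(2 * reduced, reduced)

    def forward(self, parts):
        # parts: list of P tensors, each of shape (B, C, 1, 1)
        outputs = []
        for i, p_i in enumerate(parts):
            rest = [p for j, p in enumerate(parts) if j != i]
            r_i = torch.stack(rest, dim=0).mean(dim=0)   # average of the other parts (formula 4)
            t_p, t_r = self.cbr_part(p_i), self.cbr_rest(r_i)
            q_i = self.cbr_fuse(torch.cat([t_p, t_r], dim=1)) + t_p   # formula 5
            outputs.append(q_i)
        return outputs
```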
Step S4 specifically includes the following steps:
step S41: the pose-guided features are constrained with a cross-entropy loss function, i.e. the difference between the predicted label and the ground-truth label is calculated, the calculation process being shown in formula 6 and formula 7:

$$L_{ID\_loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{K} \mathbb{1}\{y_n = c\}\,\log \hat{y}_{n,c} \tag{6}$$

$$\hat{y}_{n,c} = \frac{\exp(\tilde{q}_{n,c})}{\sum_{k=1}^{K}\exp(\tilde{q}_{n,k})} \tag{7}$$

wherein $L_{ID\_loss}$ denotes the cross-entropy loss function, $K$ denotes the number of classes, $c$ denotes a class, $\tilde{q}_{n,c}$ denotes the output value of feature $q_n$ for class $c$ after the fully connected classification layer, $N$ and $y_n$ respectively denote the number of input images in a mini-batch and the ground-truth label, and $\hat{y}_{n,c}$ denotes the predicted label of feature $q_n$;
step S42: the global contrast features and the local relation features are constrained with three loss functions, namely the cross-entropy loss, the hard sample sampling triplet loss and the center loss.
The calculation process of the hard sample sampling triplet loss function is shown in formula 8:

$$L_{TriHard} = \frac{1}{N_K N_M}\sum_{k=1}^{N_K}\sum_{m=1}^{N_M}\left[\alpha + \max_{n=1,\dots,N_M} d\!\left(x^{a}_{k,m},\, x^{p}_{k,n}\right) - \min_{\substack{l=1,\dots,N_K,\ l\ne k \\ n=1,\dots,N_M}} d\!\left(x^{a}_{k,m},\, x^{n}_{l,n}\right)\right]_{+} \tag{8}$$

wherein $N_K$ denotes the number of entities in a mini-batch, $N_M$ denotes the number of images of each entity, $\alpha$ denotes a threshold parameter that controls the distance between positive and negative sample pairs in the feature space, $x^{a}_{k,m}$, $x^{p}_{k,n}$ and $x^{n}_{l,n}$ respectively denote the anchor picture, the positive sample and the negative sample, $k$ and $l$ denote entity indices, $m$ and $n$ denote image indices, and $d(\cdot,\cdot)$ denotes the distance in the feature space.
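A compact batch-hard implementation consistent with formula 8 can be sketched as follows; the Euclidean distance and the default margin value are assumptions of the sketch.

```python
import torch

def trihard_loss(feats, labels, margin=0.3):
    """Batch-hard triplet loss (formula 8): for every anchor, take the farthest
    positive and the closest negative within the mini-batch."""
    dist = torch.cdist(feats, feats, p=2)                        # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)            # same-identity mask
    hardest_pos = (dist * same.float()).max(dim=1).values        # hardest positive per anchor
    hardest_neg = (dist + same.float() * 1e6).min(dim=1).values  # hardest negative per anchor
    return torch.relu(margin + hardest_pos - hardest_neg).mean()
```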
The calculation process of the center loss function is shown in formula 9:

$$L_{Center\_loss} = \frac{1}{2}\sum_{i=1}^{m}\left\| f_i - c_{y_i} \right\|_2^2 \tag{9}$$

wherein $m$ denotes the batch size, $f_i$ and $y_i$ respectively denote the feature and the label of the i-th picture in the batch, and $c_{y_i}$ denotes the feature center of the sample data belonging to class $y_i$.
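A learnable-center implementation of formula 9 could look like the following sketch; the random initialization of the centers and the class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Center loss (formula 9): penalizes the squared distance between each
    feature and the learnable center of its class."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        # feats: (m, feat_dim), labels: (m,)
        diff = feats - self.centers[labels]   # distance to the corresponding class center
        return 0.5 * diff.pow(2).sum()        # summed over the batch as in formula 9
```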
step S43: the loss functions of steps S41 and S42 are combined to jointly supervise the training of the model, so that the inter-class distance and the intra-class distance of the samples are constrained while the classification error is reduced, the network learns more discriminative features, and the generalization ability of the model is improved; the calculation process is shown in formula 10, formula 11 and formula 12:

$$L_{ML} = \lambda L_{GCLR} + (1-\lambda) L_{PGF} \tag{10}$$

$$L_{PGF} = L_{ID\_loss} \tag{11}$$

$$L_{GCLR} = L_{TriHard} + \alpha L_{ID\_loss} + \beta L_{Center\_loss} \tag{12}$$

wherein $L_{ML}$ denotes the joint loss function, $L_{PGF}$ denotes the cross-entropy loss applied to the pose-guided features, $L_{GCLR}$ denotes the three losses applied to the global contrast and local relation features, $L_{TriHard}$ denotes the hard sample sampling triplet loss function, $L_{ID\_loss}$ denotes the cross-entropy loss function, $L_{Center\_loss}$ denotes the center loss function, and $\alpha$, $\beta$ and $\lambda$ denote weighting coefficients that balance the respective loss terms.
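The joint supervision of formulas 10 to 12 can then be expressed as a thin wrapper around the individual losses, as in the sketch below; it assumes a standard cross-entropy loss for L_ID_loss and loss callables such as the trihard_loss and CenterLoss sketches above, and the default coefficients follow the values reported in the embodiment (2, 5e-4 and 0.2).

```python
import torch.nn as nn

id_loss = nn.CrossEntropyLoss()   # L_ID_loss

def joint_loss(logits_pgf, logits_gclr, feats_gclr, labels,
               trihard_loss, center_loss, alpha=2.0, beta=5e-4, lam=0.2):
    """Joint supervision: L_ML = lam * L_GCLR + (1 - lam) * L_PGF (formulas 10-12)."""
    l_pgf = id_loss(logits_pgf, labels)                      # formula 11
    l_gclr = (trihard_loss(feats_gclr, labels)
              + alpha * id_loss(logits_gclr, labels)
              + beta * center_loss(feats_gclr, labels))      # formula 12
    return lam * l_gclr + (1 - lam) * l_pgf                  # formula 10
```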
According to the invention, the global feature branch generates human-body key points by means of a pre-trained pose estimator and maps them into the original feature map to generate heat maps, so that, combined with the global features, the model focuses more on the non-occluded pedestrian regions and the noise interference caused by the occluded regions is reduced; a global contrast pooling module is introduced, so that the noise interference caused by background clutter and occlusion is reduced and the information of the whole pedestrian body region is better expressed. In the local feature branch, the feature map generated by the backbone network is horizontally partitioned, local block features are obtained after maximum pooling, and deeper feature extraction is then performed on them through a One-vs-rest relation module, so that the features at each local level contain the information of the corresponding part and of the other body parts, and the relationships among body parts are better reflected. For metric learning, the model is supervised with multi-loss joint training, so that it ensures the accuracy of the predicted labels while taking inter-class dispersion and intra-class compactness into account.
Drawings
Fig. 1 is a diagram of a network model architecture of the present invention.
FIG. 2 is a schematic diagram of pose keypoint generation according to the present invention.
FIG. 3 is a heat map of the present invention.
FIG. 4 is a diagram of the global contrast pooling process of the present invention.
FIG. 5 is a schematic diagram of One-vs-rest relationship module according to the present invention.
FIG. 6 is a diagram of a multiple loss fusion process of the present invention.
FIG. 7 is a CMC graph of the pedestrian re-identification method based on feature association improvement on Occluded-DukeMTMC data set in accordance with the present invention.
FIG. 8 is a CMC graph of the pedestrian re-identification method based on feature association and multi-loss fusion on Occluded-DukeMTMC data set.
FIG. 9 is a graph of the visual ranking results of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the pedestrian re-identification method facing the occlusion scene includes two branches:
1. Global feature branch: human-body key points are generated by a pre-trained pose estimator and mapped into the original feature map; combined with the global features, this makes the model pay more attention to the non-occluded pedestrian regions and reduces the noise interference caused by the occluded regions. A global contrast pooling module is introduced to reduce the noise interference caused by background clutter and occlusion and to better express the information of the whole pedestrian body region;
2. local feature branching: the local part block features are extracted more deeply through a One-vs-rest relation module, so that the features of each local layer can contain the information of the corresponding part and other body parts, and the relation among the body parts is better reflected; for metric learning, a model is supervised by adopting a mode of multi-loss function joint training, so that the model ensures the accuracy of a predicted label and considers the inter-class dispersion and the intra-class compactness.
The invention improves and optimizes a baseline method (PGFA); the original model and the improved models are trained and tested on the dataset, so that the improvement with the highest recognition accuracy can be selected. The recognition accuracy is reflected in two evaluation indexes, CMC and mAP.
The invention is further illustrated by the following examples.
As shown in fig. 2, a pre-trained AlphaPose estimator is used to generate the human-body key points of the input picture together with the coordinates and confidence of each key point, and key points with low confidence are filtered out, the filtering mechanism being shown in formula 1. Next, the key points are mapped onto the original picture to generate heat maps, which are down-sampled by bilinear interpolation and combined with the global features to form the pose-guided features, as shown in FIG. 3. The pose-guided features obtained after average pooling and maximum pooling are then concatenated with the average-pooled global features from the local feature branch, and the dimension is reduced through a fully connected layer. Finally, label prediction is performed on the extracted global features through a fully connected layer and a softmax layer (the output layer of the neural network).
$$M_i = \begin{cases} (x_i,\, y_i), & c_i > \theta \\ 0, & c_i \le \theta \end{cases}, \qquad i = 1, 2, \dots, N \tag{1}$$

wherein $M_i$, $(x_i, y_i)$ and $c_i$ respectively denote the i-th key point, its coordinates and its confidence, $\theta$ denotes the threshold of the filtering mechanism, and $N$ denotes the number of key points.
as shown in fig. 4, the local blocking feature P1~P6Obtaining P through global maximum pooling and global average poolingmaxAnd PavgSubtracting the two to obtain a difference characteristic PcontThe calculation process is shown in formula 2:
Pcont=Pmax-Pavgequation 2
PmaxAnd PcontThe two parts of features are respectively reduced in dimension through a sub-network consisting of a 1 × 1 convolution layer, a BN (batch normalization) normalization layer and a ReLU (rectified Linear Unit) activation function layer to obtain the features
Figure RE-GDA0003454395450000091
After splicing, the two are conveyed to a sub-network again to be reduced from 2c dimension to c dimension, and finally, the two are connected with
Figure RE-GDA0003454395450000092
Adding to obtain a representative global feature Q0The specific implementation process is shown in formula 3:
Figure RE-GDA0003454395450000093
wherein Q is0Represents the global contrast feature, CBR represents the sub-network consisting of the 1 × 1 convolution layer, the BN normalization layer, and the ReLU activation function layer, and Concat (·) represents the splicing operation.
As shown in fig. 5, the local part block features are subjected to deeper feature extraction through the One-vs-rest relation module, so that the features of each local layer can contain information of the corresponding part and other body parts, and the relationship among the body parts is better reflected.
Taking $P_i$ as an example, global average pooling is performed on the local features $P_j$ other than $P_i$ to obtain $R_i$, the calculation process being shown in formula 4:

$$R_i = \frac{1}{5}\sum_{\substack{j=1 \\ j \ne i}}^{6} P_j \tag{4}$$

wherein $P_j$ denotes the local block features other than the i-th one.
$P_i$ and $R_i$ are each reduced in dimension through a sub-network consisting of a 1×1 convolution layer, a BN normalization layer and a ReLU activation function layer to obtain the features $\tilde{P}_i$ and $\tilde{R}_i$; after concatenation, the result is fed to the sub-network again to be reduced from 2c dimensions to c dimensions, and finally integrated with $\tilde{P}_i$ to obtain the local feature $Q_i$ with global ties, the specific implementation process being shown in formula 5:

$$Q_i = CBR\big(Concat(\tilde{P}_i,\, \tilde{R}_i)\big) + \tilde{P}_i, \qquad \tilde{P}_i = CBR(P_i),\ \tilde{R}_i = CBR(R_i) \tag{5}$$

wherein $Q_i$ denotes the local relation feature with global ties, CBR denotes the sub-network consisting of the 1×1 convolution layer, the BN normalization layer and the ReLU activation function layer, and Concat(·) denotes the concatenation operation.
As shown in fig. 6, the training of the model is supervised through joint training with three loss functions, namely cross-entropy loss, hard sample sampling triplet loss and center loss, so that the inter-class distance and the intra-class distance of the samples are constrained while the classification error is reduced, the network learns more discriminative features, and the generalization ability of the model is improved. The calculation process of the cross-entropy loss function is shown in formula 6 and formula 7:
$$L_{ID\_loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{K} \mathbb{1}\{y_n = c\}\,\log \hat{y}_{n,c} \tag{6}$$

$$\hat{y}_{n,c} = \frac{\exp(\tilde{q}_{n,c})}{\sum_{k=1}^{K}\exp(\tilde{q}_{n,k})} \tag{7}$$

wherein $L_{ID\_loss}$ denotes the cross-entropy loss function, $K$ denotes the number of classes, $c$ denotes a class, $\tilde{q}_{n,c}$ denotes the output value of feature $q_n$ for class $c$ after the fully connected classification layer, $N$ and $y_n$ respectively denote the number of input images in a mini-batch and the ground-truth label, and $\hat{y}_{n,c}$ denotes the predicted label of feature $q_n$;
the calculation process of the hard sample sampling triplet loss function is shown in formula 8:

$$L_{TriHard} = \frac{1}{N_K N_M}\sum_{k=1}^{N_K}\sum_{m=1}^{N_M}\left[\alpha + \max_{n=1,\dots,N_M} d\!\left(x^{a}_{k,m},\, x^{p}_{k,n}\right) - \min_{\substack{l=1,\dots,N_K,\ l\ne k \\ n=1,\dots,N_M}} d\!\left(x^{a}_{k,m},\, x^{n}_{l,n}\right)\right]_{+} \tag{8}$$

wherein $N_K$ denotes the number of entities in a mini-batch, $N_M$ denotes the number of images of each entity, $\alpha$ denotes a threshold parameter that controls the distance between positive and negative sample pairs in the feature space, $x^{a}_{k,m}$, $x^{p}_{k,n}$ and $x^{n}_{l,n}$ respectively denote the anchor picture, the positive sample and the negative sample, $k$ and $l$ denote entity indices, $m$ and $n$ denote image indices, and $d(\cdot,\cdot)$ denotes the distance in the feature space;
the calculation process of the center loss function is shown in formula 9:

$$L_{Center\_loss} = \frac{1}{2}\sum_{i=1}^{m}\left\| f_i - c_{y_i} \right\|_2^2 \tag{9}$$

wherein $m$ denotes the batch size, $f_i$ and $y_i$ respectively denote the feature and the label of the i-th picture in the batch, and $c_{y_i}$ denotes the feature center of the sample data belonging to class $y_i$;
the joint loss function finally adopted consists of the cross-entropy loss function, the hard sample sampling triplet loss function and the center loss function, and the calculation process is shown in formula 10, formula 11 and formula 12:

$$L_{ML} = \lambda L_{GCLR} + (1-\lambda) L_{PGF} \tag{10}$$

$$L_{PGF} = L_{ID\_loss} \tag{11}$$

$$L_{GCLR} = L_{TriHard} + \alpha L_{ID\_loss} + \beta L_{Center\_loss} \tag{12}$$

wherein $L_{ML}$ denotes the joint loss function, $L_{PGF}$ denotes the cross-entropy loss applied to the pose-guided features, $L_{GCLR}$ denotes the three losses applied to the global contrast and local relation features, $L_{TriHard}$ denotes the hard sample sampling triplet loss function, $L_{ID\_loss}$ denotes the cross-entropy loss function, $L_{Center\_loss}$ denotes the center loss function, and $\alpha$, $\beta$ and $\lambda$ denote weighting coefficients that balance the respective loss terms.
The experiments were implemented on an NVIDIA TITAN V GPU using the PyTorch deep learning framework. ResNet50 with the average pooling layer and the fully connected layer removed is adopted as the backbone network, and the convolution stride of the last stage is set to 1. During training, the input images are augmented by random flipping and random erasing, the image size is adjusted to 384×128, the total number of epochs is set to 60, and the batch size is set to 16. The initial learning rate is set to 0.01 and is automatically multiplied by 0.1 every 20 epochs; the model is optimized by stochastic gradient descent, with the weight decay parameter set to $5\times10^{-4}$ and the momentum set to 0.9; the three weighting coefficients $\alpha$, $\beta$ and $\lambda$ are set to 2, $5\times10^{-4}$ and 0.2 respectively. PyTorch is a Python-based scientific computing package that provides two high-level features: 1. tensor computation with powerful GPU acceleration (similar to NumPy); 2. deep neural networks built on an automatic differentiation system. ResNet is an abbreviation of Residual Network, a family of networks widely used for object classification and as the backbone of classical computer vision models, typical examples being ResNet50 and ResNet101.
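The following configuration sketch is consistent with the settings reported above; it uses the stock torchvision ResNet50 and standard torchvision transforms as assumptions and omits the re-identification branches and the loss wiring.

```python
import torch
import torchvision
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR
from torchvision import transforms

# Backbone: ResNet50 without its classification head, last-stage stride set to 1.
backbone = torchvision.models.resnet50()
backbone.layer4[0].conv2.stride = (1, 1)
backbone.layer4[0].downsample[0].stride = (1, 1)
backbone.fc = nn.Identity()

# Optimizer and schedule reported above: SGD, lr 0.01, x0.1 every 20 of 60 epochs.
optimizer = SGD(backbone.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)

# Data augmentation reported above: resize to 384x128, random flip, random erasing.
train_transform = transforms.Compose([
    transforms.Resize((384, 128)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.RandomErasing(),
])
```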
The method trains and tests on Occluded-DukeMTMC data set. The Occluded-DukeMTMC data set is a large-scale data set specially constructed by Miao et al for the pedestrian re-identification problem in an occlusion scene, wherein a training set comprises 702 entities and 15618 images, and a test set comprises 1110 entities, 2210 images of an Occluded pedestrian to be queried and 17661 images of a candidate pedestrian.
The evaluation indexes adopted by the invention are CMC and mAP, which are commonly used in retrieval tasks. CMC refers to the cumulative matching characteristic curve, whose abscissa is k and whose ordinate is Rank-k, representing the probability that the target to be queried appears among the top k matching results. mAP refers to the mean average precision, with a value range of [0, 1]; compared with the CMC index, it better reflects the overall performance of the pedestrian re-identification model. The larger the mAP, the more positive samples the model retrieves and the higher they are ranked, indicating better model performance.
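A simplified NumPy sketch of the CMC and mAP computation is given below; the full Occluded-DukeMTMC protocol additionally filters gallery images that share the identity and camera of the query, which is omitted here for brevity.

```python
import numpy as np

def cmc_and_map(dist, query_ids, gallery_ids, max_rank=10):
    """Compute Rank-k accuracies (CMC) and mean average precision from a
    query-by-gallery distance matrix."""
    cmc = np.zeros(max_rank)
    aps, valid = [], 0
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                         # gallery sorted by ascending distance
        matches = (gallery_ids[order] == query_ids[i]).astype(np.int32)
        if matches.sum() == 0:
            continue                                        # query with no positive sample is skipped
        valid += 1
        first_hit = int(np.argmax(matches))                 # rank of the first correct match
        if first_hit < max_rank:
            cmc[first_hit:] += 1
        hits = np.cumsum(matches)
        precision = hits / (np.arange(len(matches)) + 1)    # precision at every position
        aps.append((precision * matches).sum() / matches.sum())
    return cmc / valid, float(np.mean(aps))
```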
TABLE 1 Performance of pedestrian re-identification method on Occluded-DukeMTMC dataset based on feature correlation improvement
An ablation experiment is conducted on the pedestrian re-identification method improved by feature association, where PGFA denotes the baseline method, OURS_IDloss(R) denotes introducing only the One-vs-rest relation module, OURS_IDloss(G) denotes introducing only the global contrast pooling module, and OURS_IDloss denotes introducing the global contrast pooling module and the One-vs-rest relation module simultaneously. As can be seen from Table 1, Rank-1 and mAP of the method that introduces only the One-vs-rest relation module are improved by 2.7% and 2.2% respectively; Rank-1 and mAP of the method that introduces only the global contrast pooling module are not improved; and the method that introduces both the global contrast pooling module and the One-vs-rest relation module achieves higher re-identification accuracy, with Rank-1 and mAP improved by 3.1% and 2.7% respectively.
FIG. 7 shows the cumulative matching curve of the pedestrian re-identification method improved by feature association on the Occluded-DukeMTMC dataset. It intuitively reflects that introducing the global contrast pooling module and the One-vs-rest relation module at the same time enables the model to extract more discriminative and fine-grained pedestrian features and achieve higher re-identification accuracy.
TABLE 2 Performance of pedestrian re-identification method on Occluded-DukeMTMC dataset based on feature association and multiple loss fusion
An ablation experiment is conducted on the pedestrian re-identification method based on feature association and multi-loss fusion, where PGFA denotes the baseline method, OURS_IDloss denotes performing metric learning with only the cross-entropy loss function, OURS_IDloss+TriHard denotes performing metric learning with the cross-entropy loss function and the hard sample sampling triplet loss function simultaneously, and OURS_IDloss+TriHard+Centerloss denotes performing metric learning with the three loss functions of cross-entropy loss, hard sample sampling triplet loss and center loss simultaneously. As can be seen from Table 2, Rank-1 and mAP of the method using cross-entropy loss and hard sample sampling triplet loss are improved by 2.1% and 3.3% respectively; Rank-1 and mAP of the method using cross-entropy loss, hard sample sampling triplet loss and center loss simultaneously are improved by 3.5% and 4.2% respectively.
FIG. 8 shows the cumulative matching curve of the pedestrian re-identification method based on feature association and multi-loss fusion on the Occluded-DukeMTMC dataset. It intuitively reflects that supervising the training of the model with the three loss functions of cross-entropy loss, hard sample sampling triplet loss and center loss simultaneously achieves higher re-identification accuracy.
As shown in fig. 9, a pedestrian image is randomly selected as the target to be queried, the improved methods are tested, and the visual ranking results are returned. Query denotes the target to be queried, and the images marked with T/F are the query results returned from the candidate gallery: T indicates that the query result and the target to be queried belong to the same entity, and F indicates that they do not. It can be seen that the improved pedestrian re-identification methods return better ranking results. PGFA (Pose-Guided Feature Alignment) denotes the baseline method, OURS_IDloss(R) denotes introducing only the One-vs-rest relation module, OURS_IDloss(G) denotes introducing only the global contrast pooling module, OURS_IDloss denotes introducing the global contrast pooling module and the One-vs-rest relation module simultaneously, OURS_IDloss+TriHard denotes adopting the cross-entropy loss and the hard sample sampling triplet loss while introducing the global contrast pooling module and the One-vs-rest relation module, and OURS_IDloss+TriHard+Centerloss denotes adopting the cross-entropy loss, the hard sample sampling triplet loss and the center loss while introducing the global contrast pooling module and the One-vs-rest relation module. As can be seen from fig. 9, the invention is effective and advanced in solving the occlusion problem in pedestrian re-identification and can achieve higher re-identification accuracy.
The above embodiments are not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make variations, modifications, additions or substitutions within the technical scope of the present invention.

Claims (5)

1. A pedestrian re-identification method facing an occlusion scene is characterized in that: the method comprises the following steps:
step S1: generating human-body key points by means of a pre-trained pose estimator and combining them with the global features, so that the model pays more attention to the non-occluded pedestrian regions and the noise interference caused by the occluded regions is reduced;
step S2: introducing a global contrast pooling module, so that the noise interference caused by background clutter and occlusion is reduced and the information of the whole pedestrian body region is better expressed;
step S3: performing deeper feature extraction on the local block features through a One-vs-rest relation module, so that the features at each local level contain the information of the corresponding part and of the other body parts, and the relationships among body parts are better reflected;
step S4: supervising the model with multi-loss joint training, so that the model ensures the accuracy of the predicted labels while taking inter-class dispersion and intra-class compactness into account.
2. The occlusion scene-oriented pedestrian re-identification method according to claim 1, wherein: the step S1 specifically includes the following steps:
step S11: inputting a pedestrian picture, namely the original picture, and generating the human-body key points $M_i$, the coordinates $(x_i, y_i)$ of each key point and the corresponding confidence $c_i$ through a pre-trained pose estimator;
step S12: removing key points with low confidence through a filtering mechanism, namely keeping the coordinates of key points whose confidence $c_i$ is greater than the threshold $\theta$ and removing the coordinates of key points whose confidence $c_i$ is smaller than the threshold $\theta$, the filtering mechanism being shown in formula 1:

$$M_i = \begin{cases} (x_i,\, y_i), & c_i > \theta \\ 0, & c_i \le \theta \end{cases}, \qquad i = 1, 2, \dots, N \tag{1}$$

wherein $M_i$, $(x_i, y_i)$ and $c_i$ respectively denote the i-th key point, its coordinates and its confidence, $\theta$ denotes the threshold of the filtering mechanism, and $N$ denotes the number of key points;
step S13: mapping the key points onto the original picture to generate heat maps, down-sampling them by bilinear interpolation, and combining them with the global features to form the pose-guided features;
step S14: concatenating the pose-guided features obtained through average pooling and maximum pooling with the average-pooled global features from the local feature branch, and reducing the dimension through a fully connected layer;
step S15: performing label prediction on the extracted global features through a fully connected layer and a softmax layer.
3. The occlusion scene-oriented pedestrian re-identification method according to claim 1, wherein: the step S2 specifically includes the following steps:
step S21: the local block features $P_1 \sim P_6$ are passed through global maximum pooling and global average pooling to obtain $P_{max}$ and $P_{avg}$, and the two are subtracted to obtain the contrast feature $P_{cont}$, the calculation process being shown in formula 2:

$$P_{cont} = P_{max} - P_{avg} \tag{2}$$

step S22: $P_{max}$ and $P_{cont}$ are each reduced in dimension through a sub-network consisting of a 1×1 convolution layer, a BN normalization layer and a ReLU activation function layer to obtain the features $\tilde{P}_{max}$ and $\tilde{P}_{cont}$; after concatenation, the result is fed to the sub-network again to be reduced from 2c dimensions to c dimensions, and finally added to $\tilde{P}_{max}$ to obtain the representative global feature $Q_0$, the specific implementation process being shown in formula 3:

$$Q_0 = CBR\big(Concat(\tilde{P}_{max},\, \tilde{P}_{cont})\big) + \tilde{P}_{max}, \qquad \tilde{P}_{max} = CBR(P_{max}),\ \tilde{P}_{cont} = CBR(P_{cont}) \tag{3}$$

wherein $Q_0$ denotes the global contrast feature, CBR denotes the sub-network consisting of the 1×1 convolution layer, the BN normalization layer and the ReLU activation function layer, and Concat(·) denotes the concatenation operation.
4. The occlusion scene-oriented pedestrian re-identification method according to claim 1, wherein: the step S3 specifically includes the following steps:
step S31: taking $P_i$ as an example, global average pooling is performed on the local features $P_j$ other than $P_i$ to obtain $R_i$, the calculation process being shown in formula 4:

$$R_i = \frac{1}{5}\sum_{\substack{j=1 \\ j \ne i}}^{6} P_j \tag{4}$$

wherein $P_j$ denotes the local block features other than the i-th one;
step S32: $P_i$ and $R_i$ are each reduced in dimension through a sub-network consisting of a 1×1 convolution layer, a BN normalization layer and a ReLU activation function layer to obtain the features $\tilde{P}_i$ and $\tilde{R}_i$; after concatenation, the result is fed to the sub-network again to be reduced from 2c dimensions to c dimensions, and finally integrated with $\tilde{P}_i$ to obtain the local feature $Q_i$ with global ties, the specific implementation process being shown in formula 5:

$$Q_i = CBR\big(Concat(\tilde{P}_i,\, \tilde{R}_i)\big) + \tilde{P}_i, \qquad \tilde{P}_i = CBR(P_i),\ \tilde{R}_i = CBR(R_i) \tag{5}$$

wherein $Q_i$ denotes the local feature with global ties, CBR denotes the sub-network consisting of the 1×1 convolution layer, the BN normalization layer and the ReLU activation function layer, and Concat(·) denotes the concatenation operation.
5. The occlusion scene-oriented pedestrian re-identification method according to claim 1, wherein: the step S4 specifically includes the following steps:
step S41: the pose-guided features are constrained with a cross-entropy loss function, i.e. the difference between the predicted label and the ground-truth label is calculated, the calculation process being shown in formula 6 and formula 7:

$$L_{ID\_loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{K} \mathbb{1}\{y_n = c\}\,\log \hat{y}_{n,c} \tag{6}$$

$$\hat{y}_{n,c} = \frac{\exp(\tilde{q}_{n,c})}{\sum_{k=1}^{K}\exp(\tilde{q}_{n,k})} \tag{7}$$

wherein $L_{ID\_loss}$ denotes the cross-entropy loss function, $K$ denotes the number of classes, $c$ denotes a class, $\tilde{q}_{n,c}$ denotes the output value of feature $q_n$ for class $c$ after the fully connected classification layer, $N$ and $y_n$ respectively denote the number of input images in a mini-batch and the ground-truth label, and $\hat{y}_{n,c}$ denotes the predicted label of feature $q_n$;
step S42: constraining the global contrast features and the local relation features with three loss functions, namely the cross-entropy loss, the hard sample sampling triplet loss and the center loss;
the calculation process of the hard sample sampling triplet loss function is shown in formula 8:

$$L_{TriHard} = \frac{1}{N_K N_M}\sum_{k=1}^{N_K}\sum_{m=1}^{N_M}\left[\alpha + \max_{n=1,\dots,N_M} d\!\left(x^{a}_{k,m},\, x^{p}_{k,n}\right) - \min_{\substack{l=1,\dots,N_K,\ l\ne k \\ n=1,\dots,N_M}} d\!\left(x^{a}_{k,m},\, x^{n}_{l,n}\right)\right]_{+} \tag{8}$$

wherein $N_K$ denotes the number of entities in a mini-batch, $N_M$ denotes the number of images of each entity, $\alpha$ denotes a threshold parameter that controls the distance between positive and negative sample pairs in the feature space, $x^{a}_{k,m}$, $x^{p}_{k,n}$ and $x^{n}_{l,n}$ respectively denote the anchor picture, the positive sample and the negative sample, $k$ and $l$ denote entity indices, $m$ and $n$ denote image indices, and $d(\cdot,\cdot)$ denotes the distance in the feature space;
the calculation process of the center loss function is shown in formula 9:

$$L_{Center\_loss} = \frac{1}{2}\sum_{i=1}^{m}\left\| f_i - c_{y_i} \right\|_2^2 \tag{9}$$

wherein $m$ denotes the batch size, $f_i$ and $y_i$ respectively denote the feature and the label of the i-th picture in the batch, and $c_{y_i}$ denotes the feature center of the sample data belonging to class $y_i$;
step S43: combining the loss functions of steps S41 and S42 to jointly supervise the training of the model, so that the inter-class distance and the intra-class distance of the samples are constrained while the classification error is reduced, the network learns more discriminative features, and the generalization ability of the model is improved; the calculation process is shown in formula 10, formula 11 and formula 12:

$$L_{ML} = \lambda L_{GCLR} + (1-\lambda) L_{PGF} \tag{10}$$

$$L_{PGF} = L_{ID\_loss} \tag{11}$$

$$L_{GCLR} = L_{TriHard} + \alpha L_{ID\_loss} + \beta L_{Center\_loss} \tag{12}$$

wherein $L_{ML}$ denotes the joint loss function, $L_{PGF}$ denotes the cross-entropy loss applied to the pose-guided features, $L_{GCLR}$ denotes the three losses applied to the global contrast and local relation features, $L_{TriHard}$ denotes the hard sample sampling triplet loss function, $L_{ID\_loss}$ denotes the cross-entropy loss function, $L_{Center\_loss}$ denotes the center loss function, and $\alpha$, $\beta$ and $\lambda$ denote weighting coefficients that balance the respective loss terms.
CN202111484998.3A 2021-12-07 2021-12-07 Pedestrian re-identification method oriented to occlusion scene Pending CN114022686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111484998.3A CN114022686A (en) 2021-12-07 2021-12-07 Pedestrian re-identification method oriented to occlusion scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111484998.3A CN114022686A (en) 2021-12-07 2021-12-07 Pedestrian re-identification method oriented to occlusion scene

Publications (1)

Publication Number Publication Date
CN114022686A true CN114022686A (en) 2022-02-08

Family

ID=80067980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111484998.3A Pending CN114022686A (en) 2021-12-07 2021-12-07 Pedestrian re-identification method oriented to occlusion scene

Country Status (1)

Country Link
CN (1) CN114022686A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956560A (en) * 2016-05-06 2016-09-21 电子科技大学 Vehicle model identification method based on pooling multi-scale depth convolution characteristics
KR20200121206A (en) * 2019-04-15 2020-10-23 계명대학교 산학협력단 Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
CN111339903A (en) * 2020-02-21 2020-06-26 河北工业大学 Multi-person human body posture estimation method
CN113469006A (en) * 2021-06-24 2021-10-01 浙江华巽科技有限公司 Pedestrian attribute identification method based on graph convolution
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
CN113642515A (en) * 2021-08-30 2021-11-12 北京航空航天大学 Pedestrian recognition method and device based on attitude association, electronic equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙义博 (Sun Yibo), 王蓉 (Wang Rong): "基于特征关联和多损失融合的行人再识别方法" [Pedestrian re-identification method based on feature association and multi-loss fusion], 《中国科技论文》 [China Sciencepaper], 23 March 2022 (2022-03-23) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824695A (en) * 2023-06-07 2023-09-29 南通大学 Pedestrian re-identification non-local defense method based on feature denoising

Similar Documents

Publication Publication Date Title
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN112200111B (en) Global and local feature fused occlusion robust pedestrian re-identification method
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
CN106846361B (en) Target tracking method and device based on intuitive fuzzy random forest
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN110070066A (en) A kind of video pedestrian based on posture key frame recognition methods and system again
CN110728694B (en) Long-time visual target tracking method based on continuous learning
CN110163117B (en) Pedestrian re-identification method based on self-excitation discriminant feature learning
CN113239907B (en) Face recognition detection method and device, electronic equipment and storage medium
CN112801051A (en) Method for re-identifying blocked pedestrians based on multitask learning
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN112668544A (en) Pedestrian re-identification method based on hard sample confusion and feature robustness enhancement
CN113343247A (en) Biological characteristic identification counterattack sample attack safety evaluation method, system, device, processor and computer readable storage medium thereof
CN117935299A (en) Pedestrian re-recognition model based on multi-order characteristic branches and local attention
CN112507924A (en) 3D gesture recognition method, device and system
Kadim et al. Deep-learning based single object tracker for night surveillance.
CN115223239A (en) Gesture recognition method and system, computer equipment and readable storage medium
CN115376159A (en) Cross-appearance pedestrian re-recognition method based on multi-mode information
Li et al. Egocentric action recognition by automatic relation modeling
CN114022686A (en) Pedestrian re-identification method oriented to occlusion scene
CN113378620B (en) Cross-camera pedestrian re-identification method in surveillance video noise environment
CN108257148B (en) Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking
Budiarsa et al. Face recognition for occluded face with mask region convolutional neural network and fully convolutional network: a literature review
CN113298037B (en) Vehicle weight recognition method based on capsule network
CN113032612B (en) Construction method of multi-target image retrieval model, retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination