CN112101150B - Multi-feature fusion pedestrian re-identification method based on orientation constraint - Google Patents

Multi-feature fusion pedestrian re-identification method based on orientation constraint

Info

Publication number
CN112101150B
CN112101150B
Authority
CN
China
Prior art keywords
orientation
pedestrian
samples
network
different
Prior art date
Legal status
Active
Application number
CN202010901241.9A
Other languages
Chinese (zh)
Other versions
CN112101150A (en)
Inventor
艾明晶
单国志
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010901241.9A priority Critical patent/CN112101150B/en
Publication of CN112101150A publication Critical patent/CN112101150A/en
Application granted granted Critical
Publication of CN112101150B publication Critical patent/CN112101150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • G06V40/25Recognition of walking or running movements, e.g. gait recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-feature fusion pedestrian re-identification method based on orientation constraint, and proposes a new network model that addresses factors such as orientation change and local occlusion. Existing research on pedestrian re-identification mostly ignores the influence of orientation differences; the proposed method preferentially retrieves targets with the same orientation as the query image while maintaining accuracy. First, a pedestrian orientation classifier is designed to label the orientation of the pedestrian in each picture; then the pictures are fed into a two-branch convolutional neural network that extracts global and local pedestrian features and performs constrained training, with one branch processing samples of the same orientation and the other processing samples of different orientations. A mixed loss function with orientation constraints is also designed, and the network weights are learned by combining three loss terms, effectively improving accuracy. Experiments show that the invention achieves rank-1 accuracies of 94.71% and 87.31% on the Market-1501 and DukeMTMC-ReID data sets, respectively, and its average level is superior to most existing methods.

Description

Multi-feature fusion pedestrian re-identification method based on orientation constraint
Technical Field
The invention relates to the fields of computer vision and image processing, in particular to a multi-feature fusion pedestrian re-identification method based on orientation constraint (figure 1). The method mainly overcomes the adverse effect of orientation differences on re-identification through a two-branch orientation-constraint network model, addresses local occlusion by fusing global and local features, and can preferentially retrieve pedestrian targets with the same orientation as the query image while maintaining re-identification accuracy, so that the retrieval results are more accurate and orderly and better match practical application scenarios.
Background
Pedestrian Re-identification (ReID) is a technique that uses computer vision to determine whether a specific pedestrian appears in an image or a video, and is a sub-problem of image retrieval. Because of its application value in video surveillance and security, it has become a research hotspot in recent years. In 2006, pedestrian re-identification was first separated from target tracking and studied as an independent vision topic. To date, research methods fall mainly into two categories: traditional methods based on hand-crafted features and deep learning methods based on neural networks. Before 2014, pedestrian re-identification mainly relied on traditional image processing to extract low-level color features, texture features and mid-level attribute features, but because these features are easily disturbed by the external environment and are not sufficiently discriminative, high accuracy could not be achieved.
In recent years, the wide application of deep learning in computer vision has brought breakthrough progress to this technology. However, it remains a very challenging problem owing to local occlusion, pose variation, orientation differences, illumination and resolution. According to research emphasis, deep-learning-based pedestrian re-identification methods can generally be divided into metric learning and feature extraction. Among them, approaches based on pedestrian orientation are most relevant to the work of the present invention; the related research background is described below.
(1) Methods based on metric learning
The goal of metric learning is to make the maximum distance between samples belonging to the same class smaller than the minimum distance between samples of different classes. In deep learning, the main focus of implementing metric learning is how to design the corresponding loss function. At the beginning of this research, pedestrian re-identification was simply treated as a classification problem: the pictures belonging to the same pedestrian are taken as one category, a fully connected layer is appended to the end of a Convolutional Neural Network (CNN), the outputs are converted into a probability distribution by a softmax function, and training is finally carried out with a cross-entropy loss.
With continued research, metric learning methods directly map the pictures of pedestrians into a high-dimensional space to form a clustering effect: pictures of the same pedestrian are regarded as a positive sample pair and pictures of different pedestrians as a negative sample pair, and the essence is to make the distance between positive sample pairs in the high-dimensional space smaller than the distance between negative sample pairs. Typical metric learning losses include the contrastive loss, triplet loss, quadruplet loss, and so on.
In 2015, related scholars proposed the triplet loss function in research on face recognition, which has become a typical metric loss; its schematic diagram is shown in fig. 2, where (Anchor, Positive) is a positive sample pair and (Anchor, Negative) is a negative sample pair. Through continuous iteration of the training process, the distance between positive pairs gradually decreases and the distance between negative pairs increases, achieving the clustering purpose, as shown in formula (1). (Reference 1: Schroff, Florian; Kalenichenko, Dmitry; Philbin, James. "FaceNet: A Unified Embedding for Face Recognition and Clustering," 2015.)
L_tri = [d(a, p) - d(a, n) + α]_+    (1)
d(x_i, x_j) = || x_i - x_j ||_2    (2)
[x]_+ = max(x, 0)    (3)
Here d denotes the distance between two feature vectors, for which the Euclidean distance is generally adopted, as shown in formula (2). [x]_+ denotes the larger of x and 0, as shown in formula (3). The reference sample (Anchor) is denoted by a, p denotes the positive sample (Positive), n denotes the negative sample (Negative), and α is the margin controlling the sample distances.
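As an illustration, a minimal PyTorch sketch of formulas (1)-(3) is given below; the function and parameter names are illustrative and assume the anchor, positive and negative features are already extracted as batches of vectors.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss of formula (1): [d(a,p) - d(a,n) + alpha]_+ with Euclidean d."""
    d_ap = F.pairwise_distance(anchor, positive, p=2)   # d(a, p), formula (2)
    d_an = F.pairwise_distance(anchor, negative, p=2)   # d(a, n)
    # [x]_+ = max(x, 0), formula (3), averaged over the batch
    return torch.clamp(d_ap - d_an + margin, min=0.0).mean()
```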
In 2016, Cheng et al. proposed an improved triplet loss function that additionally considers the absolute distance, which greatly improved re-identification performance. (Reference 2: D. Cheng, Y. H. Gong, S. P. Zhou, J. J. Wang and N. N. Zheng, "Person re-identification by multi-channel parts-based CNN with improved triplet loss function," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA: IEEE, pages 1335-1344, 2016.)
In 2017, Chen et al. proposed the quadruplet loss, which uses one more negative sample image than the triplet, namely: reference sample a, positive sample p, and negative samples n1 and n2. As shown in formula (4), the former term is called the strong push and the latter the weak push. By adding the weak push, the quadruplet directly considers the absolute distance between positive and negative samples, so that the model can learn better features. (Reference 3: W. H. Chen, X. T. Chen, J. G. Zhang and K. Q. Huang, "Beyond triplet loss: a deep quadruplet network for person re-identification," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA: IEEE, 2017.)
L_q = [d(a, p) - d(a, n1) + α]_+ + [d(a, p) - d(n1, n2) + β]_+    (4)
Where n1 denotes the first negative sample, n2 denotes the second negative sample, and α and β are threshold parameters.
In 2017, Hermans et al. proposed a hard sample mining method that focuses on the relations between the input samples for pedestrian re-identification. The basic idea is that the selected sample pairs should be as hard as possible: within a training batch, the sample least similar to (farthest from) the reference sample anchor among the images of the same person is selected as the positive sample, and the sample most similar to (closest to) the anchor among the images of other persons is selected as the negative sample. The hard triplets obtained in this way markedly improve generalization. (Reference 4: A. Hermans, L. Beyer and B. Leibe, "In defense of the triplet loss for person re-identification," arXiv preprint arXiv:1703.07737, 2017.)
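A minimal sketch of this batch-hard mining idea is shown below, assuming a batch of already-extracted feature vectors and integer identity labels; the names are illustrative.

```python
import torch

def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Batch-hard triplet loss in the style of Hermans et al.: for every sample,
    take its farthest positive and closest negative inside the batch (a sketch)."""
    dist = torch.cdist(features, features, p=2)             # pairwise Euclidean distances
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)     # positive mask (includes diagonal)
    # hardest positive: maximum distance among samples with the same identity
    d_ap = (dist * same_id.float()).max(dim=1).values
    # hardest negative: minimum distance among samples with a different identity
    d_an = dist.masked_fill(same_id, float('inf')).min(dim=1).values
    return torch.clamp(d_ap - d_an + margin, min=0.0).mean()
```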
In addition, Xiao et al. further proposed a margin sample mining loss that combines the advantages of the quadruplet loss and hard sample mining. Metric-learning methods built on the triplet loss have thus become the most widely used approach to similarity measurement in pedestrian re-identification, and fig. 3 summarizes the evolution of the metric losses.
(2) Methods based on feature representation
In general, the feature description of an image is divided into three levels: low-level color features, mid-level attribute features, and deep features. At present, the main approach is to extract deep features with deep neural networks. Initially, CNN-based methods mainly extracted global features from the whole pedestrian picture; as research progressed, it became widely recognized that global features alone cannot achieve a sufficient degree of discrimination, so methods based on semantic features and on local features have become current research hotspots.
Methods based on global features. In the early stage of research, some methods directly used classical models such as ResNet and GoogLeNet to extract global features from the whole pedestrian picture. For example, in 2017 Sun et al. proposed the SVDNet pedestrian re-identification network, which iteratively optimizes a converged network model by decomposing the fully connected layer weights with singular value decomposition. (Reference 5: Y. Sun, L. Zheng, W. Deng and S. Wang, "SVDNet for Pedestrian Retrieval," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017.) In addition, Luo et al. summarized a set of training tricks that form a strong baseline for deep person re-identification. (Reference 6: H. Luo, Y. Gu, X. Liao, S. Lai, and W. Jiang, "Bags of Tricks and A Strong Baseline for Deep Person Re-identification," arXiv preprint arXiv:1903.07071, 2019.)
Methods based on semantic features. These methods developed gradually along with research on human pose estimation; the main idea is to obtain local Regions of Interest (ROI) by locating skeleton key points or by semantic image segmentation, and to obtain a richer feature representation in combination with global features. In 2017, the Spindle Net method proposed by Zhao et al. was a representative semantic-feature study: it first extracts 14 human body key points with a pose detection model, then divides 7 ROI regions using these key points, which enter the same CNN as the original picture to extract features. (Reference 7: H. Y. Zhao, M. Q. Tian, S. Y. Sun, J. Shao, J. J. Yan and S. Yi et al., "Spindle Net: person re-identification with human body region guided feature decomposition and fusion," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA: IEEE, pages 907-915, 2017.) In addition, the GLAD (Global-Local-Alignment Descriptor) model proposed by Wei et al. is a comparatively classical semantic feature extraction method. (Reference 8: L. Wei, S. Zhang, H. Yao, W. Gao and Q. Tian, "GLAD: Global-Local-Alignment Descriptor for Scalable Person Re-Identification," in IEEE Transactions on Multimedia, vol. 21, no. 4.)
Methods based on local features. Although it is reasonable to extract local features according to semantic image division, it is not always necessary, especially since current human pose estimation is not ideal and erroneous pose estimates introduce errors. Therefore, much research now divides the pedestrian image horizontally or vertically and then aligns the parts with a certain strategy. In 2018, the PCB method proposed by Tsinghua University and the AlignedReID method proposed by Megvii were two typical methods. PCB divides the picture into 6 parts from top to bottom, obtains 6 local features by horizontal pooling, and then passes each feature through a fully connected layer and computes a cross-entropy loss for representation learning. (Reference 9: Y. Sun, et al., "Beyond part models: Person retrieval with refined part pooling," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.) AlignedReID likewise horizontally divides the picture into 8 parts, extracts features with a CNN, and realizes local alignment using a shortest-path method. (Reference 10: X. Zhang, H. Luo, X. Fan, W. L. Xiao, Y. X. Sun and Q. Xiao et al., "AlignedReID: Surpassing human-level performance in person re-identification," arXiv preprint arXiv:1711.08184, 2017.)
(3) Methods based on orientation and viewpoint
Most current research focuses on the important factor of local occlusion, while factors such as orientation change receive relatively little attention, so the network model cannot adapt to complex orientation changes and may misjudge when the orientation difference is obvious. The related research on orientation change is analyzed below. In 2018, the DVAML method attempted to learn feature spaces between samples of the same orientation and of different orientations, but it did not achieve very high accuracy. (Reference 11: P. Chen, X. Xu and C. Deng, "Deep view-aware metric learning for person re-identification," in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pages 620-626, International Joint Conferences on Artificial Intelligence Organization, 2018.) Sun et al. generated a large virtual pedestrian data set, PersonX, using the Unity engine and quantitatively analyzed the impact of orientation on pedestrian re-identification, which greatly inspired the present invention. (Reference 12: X. Sun and L. Zheng, "Dissecting person re-identification from the viewpoint of viewpoint," arXiv preprint arXiv:1812.02162, 2018.)
In summary, most existing pedestrian re-identification techniques solve the common occlusion problem by fusing global and local features with inter-class constrained training, but there is relatively little research on orientation differences; in particular, a network may misjudge when samples of the same person in different orientations differ markedly, or when samples of different pedestrians in the same orientation are very similar. The present invention mainly solves this problem: it improves identification accuracy and can preferentially retrieve samples with the same orientation as the query image.
Disclosure of Invention
The invention aims to overcome the influence of factors such as pedestrian orientation differences and local occlusion on ReID, and to improve the accuracy and reliability of pedestrian re-identification through the proposed pedestrian orientation classification algorithm and pedestrian re-identification network model, thereby providing a research basis for target tracking and other computer vision tasks.
The invention mainly studies the pedestrian re-identification problem from the angle of orientation differences. As shown in fig. 4, from the orientation perspective, one of the main factors affecting ReID is that samples of the same person in different orientations have lower similarity (e.g., a and b in fig. 4 are images of the same pedestrian in different orientations), while different pedestrians with the same orientation are sometimes very similar (e.g., c, d, e, f in fig. 4 are images of different pedestrians in the same orientation). In terms of feature similarity, the distance between images of the same person should be smaller than the distance between images of different persons. Meanwhile, for the same person, the distance between images with the same orientation should be smaller than the distance between images with different orientations, which is reasonable because images of the same person in the same orientation should have the highest similarity.
Based on these considerations, the invention proposes an effective pedestrian orientation classification method for judging the orientation of the pedestrian in a picture, and on this basis proposes a multi-feature fusion pedestrian re-identification network model based on orientation constraint. The model comprises two different branches that respectively process samples with the same orientation and with different orientations; each branch fuses global and local features to represent the pedestrian, and finally three loss functions are combined for constrained training. In this way, accurate pedestrian re-identification is realized and targets with the same orientation as the query picture can be retrieved preferentially, which has great practical value.
As shown in fig. 5, a represents the original pedestrian sample data, where different colors represent different pedestrians and different shapes represent different body orientations. The pedestrian re-identification network maps each pedestrian sample to a high-dimensional space; the result obtained without considering the orientation factor is represented by graph b, where only the inter-class distance is considered, while the result of the invention is shown in graph c: different pedestrians are distinguished and, at the same time, orientation-level clusters are formed, so that the model can preferentially identify pedestrians with the same orientation.
Next, the main content of the present invention will be described in detail, which specifically includes the following steps:
the method comprises the following steps: design pedestrian orientation classifier based on multi-feature fusion
Orientation information (front, back, left and right) is an inherent attribute of a pedestrian image and greatly influences the discriminative ability of a re-identification network, but existing ReID data sets were not annotated with this attribute when collected. Therefore, the invention first designs a pedestrian orientation classifier based on multi-feature fusion for accurately determining the orientation of the pedestrian in a picture.
Orientation classification is actually a multi-class task. To improve classification accuracy, the invention designs the orientation classification network model shown in fig. 6. As shown in the figure, for a pedestrian image, 18 joint key points of the pedestrian are first extracted with the PAFs method (the OpenPose human pose key point extraction network); these 18 key points roughly describe the contour of the pedestrian and also give the precise coordinate position of each key point in the image. Secondly, by dividing these coordinates transversely, the whole pedestrian image can be split into three body parts: head, upper body and lower body.
The entire pedestrian image and the three body parts form the input to a convolutional neural network, which extracts features of the four image parts using convolution modules (here, the ResNet50 network) and then concatenates the resulting feature vectors into a combined vector for the final pedestrian representation. Finally, a fully connected layer is added at the end of the network, four-way classification is performed with a softmax loss function, and continuous training and iteration yield the final classification result.
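A minimal sketch of this classifier is given below, assuming the head, upper-body and lower-body crops have already been produced from the pose key points; whether the four parts share one ResNet-50 trunk is an assumption, and all names are illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

class OrientationClassifier(nn.Module):
    """Sketch of the multi-feature orientation classifier: the whole image and the
    head / upper-body / lower-body crops each pass through a ResNet-50 trunk; the
    four feature vectors are concatenated and fed to a fully connected layer for
    the four-way orientation softmax."""
    def __init__(self, num_orientations=4):
        super().__init__()
        trunk = models.resnet50(weights=None)
        trunk.fc = nn.Identity()                 # keep the 2048-d pooled feature
        self.trunk = trunk                       # assumed shared across the four parts
        self.classifier = nn.Linear(2048 * 4, num_orientations)

    def forward(self, whole, head, upper, lower):
        feats = [self.trunk(x) for x in (whole, head, upper, lower)]
        combined = torch.cat(feats, dim=1)       # fused global + local representation
        return self.classifier(combined)         # logits for front / back / left / right

# training step (illustrative): nn.CrossEntropyLoss()(model(whole, head, upper, lower), labels)
```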
The advantage of this method is that the global feature is fused with the three local body features, which strengthens the robustness of the features; in particular, local features of the head and feet can better distinguish orientation differences in certain cases. The network is trained with the RAP data set, because the RAP data set is already labeled with orientation, which reduces the labeling cost. It should be noted that the classifier is pre-trained with this training set, and the weights of this network are not changed during the implementation of the subsequent steps.
Finally, the orientation classification network is used to label the orientation information of the two large-scale pedestrian re-identification data sets Market-1501 and DukeMTMC-ReID, providing a foundation for the orientation constraints in subsequent pedestrian re-identification. Experiments show that the method performs well on pedestrian orientation recognition and is superior to most methods.
Step two: sampling difficult samples based on pedestrian orientation, selecting triplets for training
As shown in formula (1), the triplet loss is the most widely used metric, but the training process depends to a great extent on how the triplet samples are selected, and overly simple triplets are not conducive to learning image features. Practice has shown that hard sample mining within a training batch is a relatively effective triplet selection strategy: for each training batch, P pedestrians are randomly selected and K different pictures are randomly chosen from each pedestrian's images, i.e., one batch contains P × K images; then, for each picture, the hardest positive sample (the least similar) and the hardest negative sample (the most similar) are selected to form a triplet.
The second step of the invention is to add the consideration of the orientation of the pedestrian to the widely used sampling strategy of the difficult sample, and provide a sampling strategy of the difficult sample based on the orientation of the pedestrian for selecting the training triples. This is based on the simple assumption that the distance between differently oriented samples of the same person is greater than the distance between the same oriented samples, i.e. samples of the same person in the same direction should be more similar.
Specifically, on the basis of batch-wise hard sample mining, P pedestrians are still randomly selected for each training batch, but the K pictures of each pedestrian are not chosen purely at random: it is ensured that among the K pictures there are both samples with the same orientation and samples with different orientations. For example, with K = 4 in the experiments, when selecting samples for each pedestrian, two different orientations are first chosen (the orientations are divided into four directions: front, back, left and right), and then two samples of that pedestrian are selected from each orientation, so that the training batch contains samples of different orientations.
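A minimal sketch of this orientation-aware P × K sampling is shown below; the index structure, the fallback to sampling with replacement when an orientation has too few images, and all names are assumptions for illustration.

```python
import random

def sample_orientation_batch(index, P=32, K=4):
    """Orientation-aware P*K batch sampling (a sketch): `index` is assumed to map
    pid -> {orientation: [image paths]}.  For each chosen pedestrian, two distinct
    orientations are picked and K//2 images are drawn from each, so every identity
    contributes both same-orientation and different-orientation pairs."""
    pids = random.sample(list(index.keys()), P)
    batch = []
    for pid in pids:
        orients = random.sample(list(index[pid].keys()), 2)   # two distinct orientations
        for o in orients:
            imgs = index[pid][o]
            k = K // 2
            picks = random.sample(imgs, k) if len(imgs) >= k else random.choices(imgs, k=k)
            batch.extend((img, pid, o) for img in picks)
    return batch   # P*K tuples of (image, identity, orientation label)
```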
The advantage of this orientation-based sample selection strategy is that, when positive and negative samples are mined for each pedestrian, it is more likely that a less similar positive sample (i.e., a sample with a different orientation) is found. To verify the correctness of this strategy, the invention performed experimental verification on the Market-1501 data set on top of the baseline. The results show that, with all other strategies unchanged, merely changing the sample selection strategy improves the two performance indexes mAP and rank-1 by about 0.7%. The verification of this strategy also lays the foundation for building the orientation-constrained re-identification network model.
Step three: multi-feature pedestrian re-identification network design based on orientation constraint
On the basis of the orientation judgment and the sampling strategy verification of the first two steps, the third step of the method designs a specific pedestrian re-identification network model. The method is divided into the following three aspects:
3.1 design of network architecture
The overall network structure of the invention is shown in fig. 1. The main purpose of the network is to overcome the influence of orientation differences on re-identification, that is, to make similar samples that belong to the same orientation but different classes more separable, and to pull samples that belong to different orientations but the same class closer together. The network is a two-branch structure that maps a sample into two different feature spaces simultaneously, with each feature space corresponding to one network branch. Each branch is designed with a different mixed loss function, so that the first branch (called the same-orientation branch) focuses mainly on samples with the same orientation, while the second branch (called the different-orientation branch) better adapts to the variation of samples with different orientations.
First, based on the pedestrian-orientation-based triplet sampling method proposed in step two, the network selects a batch of input images (N images, where N = P × K). Then, the trained orientation classifier is used to judge the orientation of the pedestrian in each picture to obtain the corresponding orientation label. Thus, each pedestrian picture can be represented by the triplet (I, Y, O), where I denotes the image, Y denotes the ID of the pedestrian (i.e., which pedestrian it belongs to), and O denotes the orientation label of the pedestrian.
As shown in fig. 1, for the input images of the batch, the network first extracts their simple features through a shared convolution network module; because these convolution modules belong to the lower layers, the shared module can extract common color, attribute and texture features. Then, two branch convolution networks are added after the shared convolution module to map the samples into two different high-dimensional subspaces. The invention uses ResNet50, currently the most common choice, as the backbone network; its structure has a clear hierarchy, so a shared layer and branch layers can be divided easily. After testing, the first layer is used as the shared module and the last three layers as the branches, and finally the upper and lower networks each output an N × d feature vector. Thus, two different features are extracted for each picture, and 2N features are output in total.
The two feature spaces differ mainly in that the first branch selects only triplets in which the pedestrians have the same orientation, while the second branch selects only triplets in which the pedestrians have different orientations. Because of this difference in triplets, the upper and lower branches each have their own emphasis and can better adapt to variations caused by orientation differences. Different triplets represent different training strategies; the feature fusion strategy used in the network and the training strategy of each branch are introduced below.
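A minimal PyTorch sketch of this shared-plus-two-branch backbone is given below; the stride modification and BN-neck mentioned in the experimental details are omitted, and the class and attribute names are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

class TwoBranchOrientationNet(nn.Module):
    """Sketch of the two-branch orientation-constraint backbone: the stem and
    layer1 of ResNet-50 are shared, while layers 2-4 are duplicated so that the
    same batch is mapped into two feature spaces (same-orientation branch and
    different-orientation branch)."""
    def __init__(self):
        super().__init__()
        stem = models.resnet50(weights=None)
        self.shared = nn.Sequential(stem.conv1, stem.bn1, stem.relu,
                                    stem.maxpool, stem.layer1)   # shared low-level module
        self.branch_same = self._branch()   # trained with same-orientation triplets
        self.branch_diff = self._branch()   # trained with different-orientation triplets

    @staticmethod
    def _branch():
        r = models.resnet50(weights=None)
        return nn.Sequential(r.layer2, r.layer3, r.layer4,
                             nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x):
        shared = self.shared(x)
        return self.branch_same(shared), self.branch_diff(shared)   # two N x 2048 outputs
```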
3.2 Multi-feature fusion strategy
The design of the network structure is mainly designed by considering orientation factors, which is the main innovation of the invention. In order to improve the distinguishing degree of the features, the invention also adopts a multi-feature fusion mode when each branch extracts the image features.
The combination of the global feature and the local feature can better describe the pedestrian feature of an image and can overcome the problem of occlusion of body parts to a certain extent. Therefore, when extracting image features, each branch of the network does not simply extract a global representation, but also adopts a mode of combining global features and local features. By introducing local features, the expression capacity of the features can be enhanced on one hand; on the other hand, especially in the same orientation branch, the negative sample pair (a pair of images not belonging to the same pedestrian) of the same orientation is likely to be very similar, and the local features can capture some detail differences, thereby better distinguishing the positive and negative samples.
In order to avoid complicated semantic division, the invention refers to the simple and effective horizontal partition method AlignedReID; as shown in fig. 7, AlignedReID horizontally divides the picture into 8 parts and then extracts features with a CNN. To solve the alignment problem between parts, for the 8 local features of two pictures AlignedReID computes the distance matrix between them and then finds a shortest path from the start point to the end point; the total distance of this shortest path is the final local distance between the two pictures. Similarly, in the present invention the body parts are obtained by horizontal division, and automatic alignment is realized by a shortest-path-based dynamic programming method. In the training stage of the network, the triplets are mined directly from the global features, but local-feature triplets are added in the upper and lower branches for auxiliary training; in the testing stage only the global features are used. This further improves the performance of the model.
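The shortest-path alignment can be sketched as below for two sets of stripe features; the exponential squashing follows the AlignedReID formulation, and the function name and the assumption that it is used for distance computation (rather than inside the differentiable training graph) are illustrative.

```python
import torch

def aligned_local_distance(local_a, local_b):
    """AlignedReID-style local distance (a sketch): `local_a`, `local_b` are (H, C)
    tensors of H horizontal-stripe features for two images.  The stripe-to-stripe
    distance matrix is traversed by dynamic programming from the top-left to the
    bottom-right corner; the shortest-path cost is the aligned local distance."""
    d = torch.cdist(local_a, local_b, p=2)            # H x H stripe distances
    d = (torch.exp(d) - 1.0) / (torch.exp(d) + 1.0)   # squash to [0, 1) as in AlignedReID
    H, W = d.shape
    cost = torch.zeros_like(d)
    for i in range(H):
        for j in range(W):
            if i == 0 and j == 0:
                cost[i, j] = d[i, j]
            elif i == 0:
                cost[i, j] = cost[i, j - 1] + d[i, j]
            elif j == 0:
                cost[i, j] = cost[i - 1, j] + d[i, j]
            else:
                cost[i, j] = torch.minimum(cost[i - 1, j], cost[i, j - 1]) + d[i, j]
    return cost[-1, -1]                                # total cost of the shortest path
```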
3.3 network training strategy
In the practice of convolutional neural networks, the key after the network structure is determined is the selection of a loss function, and the characteristics of the loss function are directly related to the learning effect of the network. The triple loss can consider the relative distance between a positive sample and a negative sample, the softmax classification loss can learn the distribution of the samples in the feature space, and the center loss can enable the samples of the same class to be close to the class center. Many studies show that the three kinds of loss joint training can achieve better effect, so that most of the current researches adopt a mixed training strategy.
The hybrid training strategy adopted in the present invention is described in detail below, and is composed of three parts, namely, same-orientation branches, different-orientation branches and cross constraints. This is one of the key factors that the present invention can achieve in addition to the network structure.
Same-orientation branch. For the same-orientation branch, only samples with the same orientation are selected from a batch to form triplets, that is, both the positive and the negative sample chosen for the current training sample have the same orientation as it (since each batch is selected according to the orientation-based sampling strategy described in step two, samples with the same orientation must exist). Triplets selected in this way have obvious advantages: while learning pedestrian identity information, the network also learns certain pedestrian orientation information, which reduces the complexity of re-identification to some extent, so that samples that belong to the same pedestrian and have the same orientation cluster more tightly; moreover, the apparent features of different pedestrians in the same orientation are often very similar, so the negative samples have a certain difficulty and are more representative. Here a denotes the anchor, p denotes a positive sample, n denotes a negative sample, s denotes the same orientation, and d denotes a different orientation, as shown in formula (5):
L_triSame = [d(a, ps) - d(a, ns) + α]_+    (5)
where ps denotes a sample of the same pedestrian in the same orientation, ns denotes a sample of a different pedestrian in the same orientation, and the remaining symbols have the same meaning as in formula (1).
Meanwhile, in order to better learn the feature distribution, this strategy adds a softmax classification loss and a center loss on top of the triplets. It should be noted that, since the same-orientation branch only considers samples of the same orientation, classification is not performed purely according to the pedestrian id; instead, the id and the orientation are regarded as a combined label, and each orientation of each person forms one category. For example, with M pedestrians and four orientations there are M × 4 different categories. Both the softmax loss and the center loss are computed at this combined classification level. As shown below, formula (6) is the softmax loss and formula (7) is the center loss.
L_ceSame = -(1/N) Σ_{i=1..N} log( exp(f_i(label_i)) / Σ_{k=1..M×T} exp(f_i(k)) )    (6)
L_center = (1/2) Σ_{i=1..N} || f_i - C_label_i ||_2^2    (7)
where N denotes the batch size, f_i denotes the feature vector of the i-th image, f_i(k) denotes the k-th dimension of the i-th feature vector, label_i denotes the combined label of id and orientation, M × T denotes the number of categories, which is also the length of the feature vector obtained after the fully connected layer, and C_label_i denotes the class center of class label_i.
The final total loss for the same orientation branch is made up of three parts (as shown in equation 8). It can be seen that the purpose of these three parts of lost training is consistent, and they together make the same-orientation samples with people form a good clustering effect in the feature space. In fact, even though this branch only considers samples of the same orientation, the rank-1 accuracy obtained on Market-1501 by performing experiments using this branch alone is already over 90%. Where λ represents a weighting factor for the center loss.
L_same = L_triSame + L_ceSame + λ L_center    (8)
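For illustration, a minimal sketch of how the three terms of formula (8) could be combined is given below; the class and function names are assumptions, the center loss uses a batch mean rather than the exact normalization of formula (7), and the triplet distances are assumed to be mined from same-orientation samples as described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterLoss(nn.Module):
    """Center loss in the spirit of formula (7): pulls every feature towards its
    class centre; `num_classes` would be M*T (identity x orientation) here."""
    def __init__(self, num_classes, feat_dim=2048):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        return 0.5 * (feats - self.centers[labels]).pow(2).sum(dim=1).mean()

def same_orientation_loss(feats, logits, combo_labels, d_ap_same, d_an_same,
                          center_loss, alpha=1.0, lam=5e-4):
    """Hybrid loss of formula (8): same-orientation triplet term (5), softmax loss
    over the combined id-and-orientation classes (6), and center loss (7)."""
    l_tri = torch.clamp(d_ap_same - d_an_same + alpha, min=0.0).mean()
    l_ce = F.cross_entropy(logits, combo_labels)
    return l_tri + l_ce + lam * center_loss(feats, combo_labels)
```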
Different-orientation branch. Clearly, the first branch ignores the relations between samples of different orientations, so the second branch is used to handle the training of differently oriented samples and thereby make up for the deficiency of the first branch. Because of the different orientations, even the same person often shows very different apparent features. Therefore, the triplets of this branch are constructed from samples of different orientations: when a triplet is selected, both the positive and the negative sample are images whose orientation differs from that of the training sample. The purpose is to let samples of different orientations receive more attention, so that the inter-class distance can be enlarged.
Similar to equation (5), the triplets of differently oriented branches are represented as equation (9):
L_triDiff = [d(a, pd) - d(a, nd) + β]_+    (9)
where pd represents a sample of different orientations of the same person, nd represents different pedestrians in different orientations, and β is the distance threshold.
In order to take into account the distribution and the intra-class distance of the samples, the softmax loss and the center loss are also used, but since the branch only considers samples with different orientations, the branch is classified only according to the id of the sample, and M persons have M categories, which are different from the branch with the same orientation. The formula (10) is a softmax loss formula, and the central loss is the same as the formula (7).
L_ceDiff = -(1/N) Σ_{i=1..N} log( exp(f_i(label_i)) / Σ_{k=1..M} exp(f_i(k)) )    (10)
Wherein M represents the number of classes of pedestrians, and the rest characters have the same meaning as formula (6).
Finally, the loss of the different-orientation branch is likewise the sum of three parts, as shown in formula (11), which compensates for the relations between differently oriented samples that the first branch does not consider.
L_diff = L_triDiff + L_ceDiff + λ L_center    (11)
Cross-constraint training. The first two branches respectively consider the relations between samples under orientation constraints. In order to take the overall distribution of the samples into account, the training strategy adds cross constraints between the branches, still mainly based on triplet losses, as in formulas (12) and (13).
L_cross = [d(a, pd) - d(a, ns) + θ]_+    (12)
Formula (12) is also a triplet loss, where θ is the distance margin. For a training sample a, the selected positive sample has a different orientation from a, while the negative sample has the same orientation as a. This selection ensures that the positive-sample distances obtained in the two branches are always smaller than the negative-sample distances, thereby organically combining the training of the two branches.
L_intra = [d(a, ps) - d(a, pd) + δ]_+    (13)
Formula (13) does not consider negative samples but only the relative relation between positive samples, so it is an intra-class constraint. A relatively small margin δ is chosen to ensure that within a class (i.e., among samples of the same pedestrian) the distance between samples of the same orientation is smaller than the distance between samples of different orientations. This is meaningful because, with such an intra-class constraint, samples with the same orientation are often retrieved first for a query image, which increases the probability that they belong to the same person.
In summary, when the network is trained by combining the above three losses in the training phase, the total loss function can be represented by equation (14), where μ is a weight parameter and can take a relatively small value.
L_Total = L_same + L_diff + L_cross + μ L_intra    (14)
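The cross constraint and intra-class constraint of formulas (12)-(14) can be sketched as follows; the distance arguments are assumed to be mined with the orientation rules described above, and the default margins mirror the values listed in the experimental details.

```python
import torch

def cross_constraint(d_a_pd, d_a_ns, theta=0.7):
    """Formula (12): positive sample with a different orientation, negative sample
    with the same orientation as the anchor."""
    return torch.clamp(d_a_pd - d_a_ns + theta, min=0.0).mean()

def intra_constraint(d_a_ps, d_a_pd, delta=0.001):
    """Formula (13): within one identity, same-orientation samples should sit
    slightly closer than different-orientation samples."""
    return torch.clamp(d_a_ps - d_a_pd + delta, min=0.0).mean()

# total objective of formula (14), with mu a small weight for the intra-class term:
#   L_total = L_same + L_diff + cross_constraint(...) + mu * intra_constraint(...)
```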
Since the distance between the features of two sample images represents their similarity, the test stage can be converted into distance calculation. Likewise, the orientation of each sample image is first judged by the designed orientation classifier; if two samples have the same orientation, the Euclidean distance is calculated with the features from the first branch, and if they have different orientations, the distance is calculated with the features from the second branch. Finally, the distances are fused into a distance matrix, and performance is tested on the test gallery based on this matrix.
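A minimal sketch of this test-time fusion is shown below; the feature matrices are assumed to be the branch outputs for the query and gallery sets, and the function name is illustrative.

```python
import torch

def fused_distance_matrix(q_same, q_diff, g_same, g_diff, q_orient, g_orient):
    """Test-stage distance fusion (a sketch): if a query/gallery pair has the same
    predicted orientation, its Euclidean distance comes from the same-orientation
    branch features, otherwise from the different-orientation branch features."""
    d_same = torch.cdist(q_same, g_same, p=2)     # distances in branch-1 feature space
    d_diff = torch.cdist(q_diff, g_diff, p=2)     # distances in branch-2 feature space
    same_orient = q_orient.unsqueeze(1) == g_orient.unsqueeze(0)   # Q x G orientation mask
    return torch.where(same_orient, d_same, d_diff)
```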
In general, the main contribution of the invention is the design of a pedestrian re-identification neural network model based on orientation constraint and multi-feature fusion, addressing a classical topic in the field of computer vision. Starting from the multiple factors that influence pedestrian re-identification accuracy, the method comprehensively considers pedestrian orientation differences, local occlusion and related problems, handles pedestrians of the same orientation and of different orientations separately, and overcomes the influence of orientation differences on re-identification to a certain extent; since the proposed orientation constraint scheme is not involved in other existing pedestrian re-identification methods, it has a certain degree of innovation.
In addition, in cooperation with the proposed pedestrian re-identification method, the invention also provides a sampling strategy for selecting the triple sample based on the pedestrian orientation information and a classifier design scheme for judging the pedestrian orientation, which are also one of the main contributions of the invention. Experimental results prove that the pedestrian re-identification scheme provided by the invention is superior to most of the existing methods in identification accuracy and is more applicable in practical scenes.
Drawings
Fig. 1 is an overall structure diagram of the orientation constraint re-identification network proposed by the present invention, which is described in detail in step three.
Fig. 2 is a schematic diagram of a triplet loss function.
Fig. 3 is a diagram illustrating the evolution of the metric loss function.
Fig. 4 is an example of pedestrian image contrast for different orientations.
Fig. 5 is a schematic diagram of the recognition effect of the network model of the present invention.
FIG. 6 is a proposed global and local feature based orientation classifier of the present invention.
FIG. 7 is a schematic diagram of an AlignedReiD method based on local feature alignment.
FIG. 8 is a schematic view of vector angles for orientation classification based on pose joint points.
Fig. 9 is an example of a data set sample used in the experiment.
FIG. 10 is an example of the results of an experiment of the present invention on a data set.
Fig. 11 is a schematic diagram of a distance curve obtained in the training process of the proposed network model.
Detailed Description
The technical scheme, the experimental method and the test result of the invention are further described in detail with reference to the accompanying drawings and specific experimental embodiments.
The invention relates to a pedestrian re-identification subject in the field of computer vision, and provides a multi-feature fusion pedestrian re-identification method based on orientation constraint.
The experimental procedure is specifically described below.
The method comprises the following steps: the data set is prepared (taking Market-1501 as an example), the orientation of each image in the data set is judged using the method based on the combination of global and local features (method 3), and orientation labels are annotated.
Step two: and (3) constructing a two-branch convolutional neural network, realizing a corresponding loss function, inputting a training set sample into the network for training, observing the training condition, and continuously iterating to obtain a training model.
Step three: testing is performed according to the training result; for each query image, the pedestrian images with the same id as the query image are retrieved from the gallery to form a result sequence, and the corresponding evaluation indexes are calculated at the same time.
The experimental conditions and conclusions of this patent are described in detail below.
(1) Experimental results of pedestrian orientation classifier
To test the accuracy of the orientation classifier proposed in the invention, comparative experiments were carried out on the RAP data set against two other methods.
The first method classifies based on the relative positions of pose joint points: the joint key points of each pedestrian image are first extracted with the PAFs method, then the left-shoulder and right-shoulder joint points are selected to form a left-to-right vector, and finally the clockwise angle between this vector and the vertical direction (from top to bottom) is computed. The pedestrian orientation can then be judged from the range of this angle (with 45 degrees as the classification interval), as shown in fig. 8.
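A minimal sketch of this geometric cue is given below; image coordinates (x to the right, y downward), the sign convention of the angle, and the fact that the interval-to-class mapping is left to the caller are all illustrative assumptions.

```python
import math

def shoulder_vector_angle(left_shoulder, right_shoulder):
    """Angle between the left-to-right shoulder vector and the downward vertical,
    in degrees in [0, 360).  The orientation class is then read off from
    45-degree intervals of this angle (mapping not reproduced here)."""
    vx = right_shoulder[0] - left_shoulder[0]
    vy = right_shoulder[1] - left_shoulder[1]
    return math.degrees(math.atan2(vx, vy)) % 360.0

# e.g. shoulder_vector_angle((52, 80), (96, 82)) is roughly 90 degrees (shoulders horizontal)
```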
The second method directly trains the convolutional neural network ResNet50 to realize four-way classification of the pedestrian images.
The third method is a classification method based on the fusion of global features and local features.
The classification accuracy and performance of the three pedestrian orientation classification methods are compared through experiments, the experimental results are shown in table 1, and obviously, the method provided by the invention has certain advantages in accuracy.
TABLE 1 comparison of Performance of three pedestrian orientation classifiers
Method Description of the method Accuracy (%)
Method 1 Classification based on the relative positions of pose joint points (mathematical method) 82.07
Method 2 Classification using a CNN based on the global feature of the pedestrian image 87.33
Method 3 Classification based on the combination of global and local features (the invention) 89.03
(2) Pedestrian re-identification data set and evaluation index
The test data sets and evaluation indexes used in the ReID experiments are introduced next. As shown in fig. 9, the proposed method was tested on the two large public data sets Market-1501 and DukeMTMC-ReID. Market-1501 includes 1501 pedestrians captured by 6 cameras and 32668 detected pedestrian bounding boxes; the training set contains 751 persons with 12,936 images, on average 17.2 training images per person, while the test set contains 750 persons with 19,732 images, on average 26.3 test images per person. DukeMTMC-ReID is the pedestrian re-identification subset of the DukeMTMC pedestrian tracking data set; it contains 36,411 pictures of 1404 pedestrians in total, of which 16,522 images of 702 pedestrians are used for training and the remaining images are used for testing.
In the pedestrian re-identification task, the testing process usually gives one (or a group of) image(s) to be queried (query), then calculates the similarity between the query and the images in a candidate set (gallery) according to the model, and sorts the gallery images by similarity from large to small, so that images closer to the front are more similar to the query image. To evaluate the performance of a pedestrian re-identification algorithm, the current practice is to compute the corresponding indexes on public data sets and then compare with other models. The CMC curve (Cumulative Matching Characteristics) and mAP (mean Average Precision) are the two most commonly used evaluation criteria.
In the experiments, the most commonly used indexes of the CMC curve, rank-1 and rank-5, and the mAP index are mainly reported. rank-k refers to the probability that a correct result appears among the top k (highest-confidence) retrieval results. The mAP index reflects the average level: the higher the mAP, the higher the correct results with the same identity as the query are ranked in the whole list, and the better the model.
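For clarity, a minimal sketch of the rank-k and mAP computation is given below; the camera-id filtering applied by the standard Market-1501 protocol is omitted, and the function name is illustrative.

```python
import numpy as np

def cmc_and_map(dist, q_ids, g_ids, topk=(1, 5)):
    """CMC rank-k and mAP from a Q x G distance matrix and identity labels (a sketch)."""
    order = np.argsort(dist, axis=1)                                 # gallery sorted per query
    matches = (g_ids[order] == q_ids[:, None]).astype(np.float64)    # Q x G hit matrix
    cmc = {k: float((matches[:, :k].sum(axis=1) > 0).mean()) for k in topk}
    aps = []
    for row in matches:                                              # average precision per query
        hits = np.where(row > 0)[0]
        if hits.size == 0:
            continue
        precision = np.arange(1, hits.size + 1) / (hits + 1)         # precision at each hit
        aps.append(precision.mean())
    return cmc, float(np.mean(aps))
```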
(3) ReID experimental details and main parameter configuration
In the experiments, the present invention uses ResNet50 as the backbone network, with the first layer and all layers before it as the shared module and the last three layers as branch modules (weights not shared). The convolution stride of the last layer is set to 1, the final 2048-dimensional features are obtained through global average pooling, and a batch normalization layer and a fully connected layer are added to compute the classification loss.
For all input data, the method resizes all images to 256 × 128 and sets the batch size to 128, containing 32 pedestrians with 4 pictures each (N = 128, P = 32, K = 4). The images are then randomly augmented and cropped, and each image is processed with Random Erasing (REA) with a probability of 0.5. It should be noted that when distinguishing between left and right orientations, horizontal flipping cannot be used, as it would change the orientation of the pedestrian.
In training, the network is trained for 120 epochs with an initial learning rate of 3.5 × 10^-4. The first 10 epochs use a learning-rate warm-up strategy, and the learning rate is then reduced to 0.1 times its previous value at epochs 35, 75 and 95. In the design of the loss function, the corresponding distance margins and weight parameters are selected through experiments as follows: α = 1, β = 0.7, θ = 0.7, δ = 0.001, λ = 0.0005, μ = 0.1.
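This schedule could be realized as in the sketch below; the optimizer choice (Adam), its weight decay, and the linear warm-up shape are assumptions, since the text only specifies the learning-rate values and decay epochs.

```python
import torch

def build_optimizer_and_scheduler(model, base_lr=3.5e-4, warmup_epochs=10,
                                  milestones=(35, 75, 95), gamma=0.1):
    """Sketch of the training schedule: linear warm-up for the first 10 epochs,
    then the learning rate is multiplied by 0.1 at epochs 35, 75 and 95."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr, weight_decay=5e-4)

    def lr_factor(epoch):
        factor = (epoch + 1) / warmup_epochs if epoch < warmup_epochs else 1.0
        for m in milestones:
            if epoch >= m:
                factor *= gamma
        return factor

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
    return optimizer, scheduler
```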
(4) Re-recognition of network experimental results
Based on the above evaluation indexes and experimental details, the method was tested on the two data sets and the corresponding experimental results were obtained. As shown in tables 2 and 3, the experiments compare the method of the present invention with other relatively advanced network models. For comparison, the methods most closely related to the present invention were selected, including: global-feature-based methods, metric learning methods, pose-based methods, and horizontal-partition local methods. In particular, a comparison is also made with the closest viewpoint-based method. RR indicates that the retrieval results are re-ranked.
TABLE 2 Comparison of results with other methods (Market-1501 data set)
Name of method Rank-1(%) Rank-5(%) mAP(%)
PCB 92.3 97.2 77.4
AlignedReID 91.8 97.1 79.3
PIE 87.33 95.56 69.25
GLAD 89.9 - 73.9
Spindle 76.9 91.5 -
HA-CNN 91.2 - 75.7
TriHard 86.67 93.38 81.07
HPM 94.2 97.5 82.7
PGR 93.87 97.74 77.21
OSCNN 83.9 - 73.5
Ours 94.71 98.06 84.11
Ours+RR 94.87 98.30 92.71
TABLE 3 Comparison of results with other methods (DukeMTMC-ReID data set)
Name of method Rank-1(%) Rank-5(%) mAP(%)
PCB 81.7 89.7 66.1
AlignedReID 81.2 - 67.4
PIE 80.84 88.30 64.09
HA-CNN 80.5 - 63.8
HPM 86.6 - 74.3
PGR 83.63 91.66 65.98
SVDNet 76.7 - 56.8
Ours 87.31 93.54 73.20
Ours+RR 90.63 94.25 87.67
To further demonstrate the effectiveness of the proposed network structure and training strategy, an ablation experiment was designed on the Market-1501 data set. First, ResNet50 is used as the backbone network and the network is trained with the combination of triplet loss and cross-entropy loss; the test result is taken as the baseline.
Next, the sample selection mode of each batch is changed from random selection to orientation-based selection strategy, and the loss function and other parameters of the network are kept unchanged. After repeated experiments, the strategy was found to bring about a 0.7% improvement for rank-1 and mAP.
Then, experiments were conducted considering only the same-orientation branch or only the different-orientation branch, testing each branch alone. Each branch alone performed relatively poorly, which was expected because a single branch only considers a single orientation combination and misses many representative triplets.
Finally, on the basis of two-branch co-training, the experiment verifies the effects of cross-constraint and introduction of local features, and the result proves that the cross-constraint is very effective because the cross-constraint makes negative samples in the same orientation and positive samples in different orientations more separable.
Specific ablation test results are shown in table 4.
TABLE 4 comparison of ablation test results
The above comparison and ablation experiments show that the method is superior to existing methods in pedestrian re-identification accuracy. Meanwhile, because the method adds an intra-class constraint to the cross-space constraints, there is a small margin between samples of the same pedestrian in different orientations, i.e., samples of the same pedestrian in the same orientation are closer. This is very meaningful in practical applications: for example, in target tracking, when two targets are very similar, it may be preferable to identify pedestrians with the same orientation, which is often the correct choice. As shown in fig. 10, some retrieval examples on the data set are given, where the leftmost image is the query image and the following five images are the top five retrieved images, arranged from high to low similarity; this intuitively verifies the effectiveness of the invention.
As shown in fig. 11, in the data set training process, the present invention records four distance relationships between samples on two data sets, namely, the distance between the same person in the same orientation (curve a in the figure), the distance between the same person in different orientations (curve B in the figure), the distance between different pedestrians in different orientations (curve C), and the distance between different pedestrians in the same orientation (curve D). The relative relation of the four distance curves can represent the training process and the purpose of the method, and has certain descriptive significance.
In conclusion, the invention proposes a multi-feature fusion pedestrian re-identification method based on orientation constraint. Through the two-branch re-identification network model, different orientation combinations receive attention and the influence of orientation differences on re-identification is overcome; the proposed network achieves rank-1 accuracies of 94.71% and 87.31% on the Market-1501 and DukeMTMC-ReID data sets, respectively, and its average level is superior to most current methods. Meanwhile, the invention proposes a pedestrian orientation classifier based on multi-feature fusion and an orientation-based sample selection strategy, annotates the orientation information of the two data sets accordingly, and, by fusing global and local features, again demonstrates the important influence of orientation change and local occlusion on pedestrian re-identification. In particular, the method can preferentially retrieve pedestrians with the same orientation, and can provide references for further analysis of orientation factors and for the construction of future pedestrian re-identification data sets.

Claims (2)

1. A multi-feature fusion pedestrian re-identification method based on orientation constraint is characterized by comprising the following steps:
processing same-orientation and different-orientation samples separately through a two-branch orientation-constraint network model, fusing global and local features in each branch to represent pedestrians, and finally training with a joint constraint of three loss terms to obtain the network weight parameters; the method comprises the following implementation steps:
s1, designing an orientation constraint network:
the orientation constraint network is a two-branch network structure in which a pedestrian sample is simultaneously mapped to two different feature spaces, each feature space corresponding to one network branch, and a different mixed loss function is designed for each branch; this design makes the first branch focus mainly on samples with the same orientation, while the second branch is better suited to samples with different orientations, and joint training of the two branches gives the network model good adaptability to pedestrian orientation; the first branch is called the same-orientation branch and the second branch the different-orientation branch;
firstly, the network selects a batch of N pedestrian images, where N = P × K, P being the number of pedestrians in the batch and K the number of different samples selected for each pedestrian; then the trained orientation classifier is used to judge the orientation of the pedestrian in each picture and obtain the corresponding orientation label; each pedestrian picture can thus be represented by the triple (I, Y, O), where I denotes the image, Y the pedestrian ID, i.e. which pedestrian the image belongs to, and O the orientation label of the pedestrian;
for the input images of the batch, the orientation constraint network first extracts simple features, including common color, attribute and texture features, through a shared convolution module; two branch convolution networks are then appended after the shared module, and the samples are mapped to two different high-dimensional subspaces through the learning of the weight parameters; taking ResNet50 as the backbone network, ResNet50 is divided into a shared part and a branch part: the first convolution stage of ResNet50 serves as the shared module and the last three stages serve as the branches, so that the upper and lower branch networks each output a feature matrix of dimension N × d; two different features are thus extracted from each pedestrian image, giving 2N features in total;
after the 2N features are obtained, the network selects different types of orientation-based triplets in each branch and iterates the training with the orientation-constrained mixed loss function to obtain the final network weights;
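A minimal sketch of this two-branch backbone is given below; it is an illustration only, not the patented implementation, and the module split (stem plus first residual stage shared, last three stages duplicated per branch), the pooling choice and all names (TwoBranchOrientationNet, branch_same, branch_diff) are assumptions made for the sketch.

```python
# A minimal PyTorch sketch (not the patented implementation) of the two-branch
# orientation-constraint backbone in step S1: the ResNet50 stem and first
# residual stage are shared, the last three stages are duplicated into a
# same-orientation branch and a different-orientation branch, and each branch
# emits one feature vector per image (an N x d matrix per batch).
import copy
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TwoBranchOrientationNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)  # load pretrained weights in practice
        # Shared shallow module: stem + first residual stage (low-level features).
        self.shared = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu,
            backbone.maxpool, backbone.layer1,
        )
        # Two deep branches: independent copies of the last three stages.
        deep = nn.Sequential(backbone.layer2, backbone.layer3, backbone.layer4)
        self.branch_same = deep
        self.branch_diff = copy.deepcopy(deep)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        shared = self.shared(x)
        f_same = self.pool(self.branch_same(shared)).flatten(1)  # N x 2048
        f_diff = self.pool(self.branch_diff(shared)).flatten(1)  # N x 2048
        return f_same, f_diff

# Example: a batch of N = P*K images yields 2N feature vectors in total.
# f_same, f_diff = TwoBranchOrientationNet()(torch.randn(8, 3, 256, 128))
```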
s2, a difficult sample sampling strategy based on the orientation of the pedestrian:
in the network training process, P pedestrians are randomly selected in each training batch, and K different images are randomly selected for each pedestrian; the selection of the K images is orientation-based, and it must be ensured that the K images contain both samples with the same orientation and samples with different orientations, so as to guarantee sampling diversity; after network mapping, positive and negative sample pairs are selected for each image to form triplets that participate in the computation of the mixed loss; specifically, for a pedestrian image a, the hardest positive sample and the hardest negative sample in the batch are selected: the hardest positive sample is the positive sample whose feature vector is farthest from that of a, i.e. the positive sample least similar to a, while the hardest negative sample is the negative sample whose feature vector is closest to that of a, i.e. the negative sample most similar to a;
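The orientation-aware hard-example selection of step S2 can be sketched as follows; this is a minimal illustration under the assumption that features are compared with the Euclidean distance and that IDs and orientation labels are integer tensors, and the helper name hard_mine is illustrative.

```python
# A minimal sketch (assumed details, not the original code) of the
# orientation-aware hard-example mining of step S2: for each anchor image,
# pick the hardest positive (same ID, largest feature distance) and the
# hardest negative (different ID, smallest feature distance), restricted to
# the required orientation relation.
import torch

def hard_mine(feats, pids, orients, same_orientation=True):
    """feats: (N, d) float tensor; pids, orients: (N,) integer tensors.
    Returns per-anchor indices of the hardest positive and hardest negative."""
    dist = torch.cdist(feats, feats)                 # pairwise Euclidean distances
    same_id = pids.unsqueeze(0) == pids.unsqueeze(1)
    same_or = orients.unsqueeze(0) == orients.unsqueeze(1)
    or_mask = same_or if same_orientation else ~same_or
    eye = torch.eye(len(pids), dtype=torch.bool, device=feats.device)

    pos_mask = same_id & ~eye & or_mask              # same pedestrian, required orientation
    neg_mask = ~same_id & or_mask                    # different pedestrian, required orientation

    # Hardest positive = farthest valid positive; hardest negative = closest valid negative.
    pos_idx = dist.masked_fill(~pos_mask, float('-inf')).argmax(dim=1)
    neg_idx = dist.masked_fill(~neg_mask, float('inf')).argmin(dim=1)
    # Anchors without any valid positive/negative should be excluded in practice.
    return pos_idx, neg_idx
```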
s3, network joint training strategy:
on the specific training strategy, the network jointly trains three loss functions, namely the triplet loss, the softmax classification loss and the center loss, with orientation-dependent constraints added on the different branches; for the same-orientation branch, only samples with the same orientation in a batch are selected to form triplets, i.e. when positive and negative samples are chosen for the current training sample only same-orientation samples are used, so that the network learns the identity information of the pedestrian while ignoring the influence of orientation information; let a denote a pedestrian image in the batch, p a positive sample of the same pedestrian as a, n a negative sample from a different pedestrian, s the same orientation and d a different orientation; the loss is given by formula (1):
L_triSame = Σ_{a=1}^{N} [ d(f_a, f_ps) − d(f_a, f_ns) + α ]_+        (1)
wherein ps denotes a sample of the same pedestrian in the same orientation, ns denotes a sample of a different pedestrian with the same orientation, d(·,·) denotes the Euclidean distance between two feature vectors, and α denotes the distance margin of the triplet loss;
for the classification loss, the same-orientation branch takes the pedestrian identity and the orientation as a combined label, so that M pedestrians with the four orientations front, back, left and right are divided into M × 4 different categories; formula (2) is the softmax loss and formula (3) is the center loss, where N denotes the number of images in the batch, f_i the feature vector of the i-th image, f_i(k) the k-th dimension of the i-th feature vector, label_i the combined label of pedestrian ID and orientation, M × T the number of classification categories (with T = 4 orientations, i.e. M × 4), which is also the length of the feature vector output by the fully connected layer, and c_label_i the class center of class label_i;
L_ceSame = −(1/N) Σ_{i=1}^{N} log( exp(f_i(label_i)) / Σ_{k=1}^{M×T} exp(f_i(k)) )        (2)
L_center = (1/2) Σ_{i=1}^{N} || f_i − c_label_i ||²        (3)
finally, the total loss of the same-orientation branch consists of three parts, as in formula (4);
L_same = L_triSame + L_ceSame + λ·L_center        (4)
wherein λ represents a weighting factor of the center loss;
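Formulas (1) to (4) can be combined into one mixed loss as sketched below; this is an illustrative fragment, not the original code: it reuses the hard_mine helper sketched above and assumes orientation labels in {0, 1, 2, 3}, logits from a fully connected layer with M × 4 outputs, and a learnable (M × 4, d) center matrix, with illustrative margin and weight values.

```python
# A minimal sketch of the same-orientation branch loss of formula (4):
# orientation-restricted triplet loss (1) + softmax loss over M*4 joint
# (ID, orientation) classes (2) + center loss (3).
import torch
import torch.nn.functional as F

def same_branch_loss(feats, logits, centers, pids, orients, alpha=0.3, lam=0.005):
    # (1) Triplet loss: positives and negatives both share the anchor's orientation.
    pos_idx, neg_idx = hard_mine(feats, pids, orients, same_orientation=True)
    d_ap = (feats - feats[pos_idx]).norm(dim=1)
    d_an = (feats - feats[neg_idx]).norm(dim=1)
    l_tri = F.relu(d_ap - d_an + alpha).mean()

    # (2) Softmax loss over the joint label: class = pedestrian ID * 4 + orientation.
    joint = pids * 4 + orients
    l_ce = F.cross_entropy(logits, joint)

    # (3) Center loss: pull each feature towards the center of its joint class.
    l_center = 0.5 * (feats - centers[joint]).pow(2).sum(dim=1).mean()

    # (4) L_same = L_triSame + L_ceSame + lambda * L_center
    return l_tri + l_ce + lam * l_center
```

The different-orientation branch described next follows the same pattern, with same_orientation=False in the mining step and a plain M-way ID classifier in place of the joint label.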
the different-orientation branch mainly handles the training of samples with different orientations, so its triplets are composed of samples with different orientations; the triplet loss of the different-orientation branch is given by formula (5):
L_triDiff = Σ_{a=1}^{N} [ d(f_a, f_pd) − d(f_a, f_nd) + β ]_+        (5)
where pd denotes a sample of the same pedestrian in a different orientation, nd denotes a sample of a different pedestrian in a different orientation, and β is again a distance margin; the difference lies in the design of the classification loss, which classifies only the pedestrian ID, giving M categories for M pedestrians; formula (6) is the softmax loss of the different-orientation branch, and the center loss is the same as formula (3);
L_ceDiff = −(1/N) Σ_{i=1}^{N} log( exp(f_i(label_i)) / Σ_{k=1}^{M} exp(f_i(k)) )        (6)
wherein M denotes the number of pedestrian classes, label_i here is the pedestrian ID label only, and the meanings of the remaining symbols are the same as in formula (2);
finally, the total loss of the different-orientation branch is obtained as shown in formula (7);
L_diff = L_triDiff + L_ceDiff + λ·L_center        (7)
meanwhile, cross constraints between the branches are added during training, still based mainly on the triplet loss, as shown in formulas (8) and (9);
L_cross = Σ_{a=1}^{N} [ d(f_a, f_pd) − d(f_a, f_ns) + θ ]_+        (8)
formula (8) is also a triplet loss, where θ is the distance margin; for a training sample a, the selected positive sample has a different orientation from a, while the selected negative sample has the same orientation as a;
L_intra = Σ_{a=1}^{N} [ d(f_a, f_ps) − d(f_a, f_pd) + δ ]_+        (9)
formula (9) does not consider negative samples and only constrains the relative relation between positive samples, forming an intra-class constraint: a margin threshold δ is set so that, within the same class, i.e. among the samples of the same pedestrian, the distance between samples with the same orientation is smaller than the distance between samples with different orientations;
L_Total = L_same + L_diff + L_cross + μ·L_intra        (10)
the overall loss function can be represented by equation (10), where μ is a weighting parameter.
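The cross constraint of formula (8), the intra-class constraint of formula (9) and the total loss of formula (10) can be sketched as follows; again a minimal illustration with assumed names and margin values, reusing the hard_mine helper from the earlier sketch.

```python
# A minimal sketch of formulas (8)-(10): positives in a different orientation
# are pulled closer than same-orientation negatives (L_cross), and
# same-orientation positives are kept slightly closer than different-orientation
# positives of the same pedestrian (L_intra).
import torch
import torch.nn.functional as F

def cross_and_intra_loss(feats, pids, orients, theta=0.3, delta=0.1):
    pd_idx, _ = hard_mine(feats, pids, orients, same_orientation=False)      # same ID, different orientation
    ps_idx, ns_idx = hard_mine(feats, pids, orients, same_orientation=True)  # same / different ID, same orientation

    d_a_pd = (feats - feats[pd_idx]).norm(dim=1)
    d_a_ns = (feats - feats[ns_idx]).norm(dim=1)
    d_a_ps = (feats - feats[ps_idx]).norm(dim=1)

    l_cross = F.relu(d_a_pd - d_a_ns + theta).mean()   # formula (8)
    l_intra = F.relu(d_a_ps - d_a_pd + delta).mean()   # formula (9)
    return l_cross, l_intra

def total_loss(l_same, l_diff, l_cross, l_intra, mu=0.5):
    # Formula (10): L_Total = L_same + L_diff + L_cross + mu * L_intra
    return l_same + l_diff + l_cross + mu * l_intra
```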
2. The multi-feature fusion pedestrian re-identification method based on orientation constraint as claimed in claim 1, which further proposes a pedestrian orientation classification method based on the fusion of global features and local features, characterized in that: the global features and the local features of the pedestrian image are combined so that pedestrians in different orientations become more separable, the orientations comprising front, back, left and right; the method comprises the following implementation steps:
for a pedestrian image, 18 body joint key points of the pedestrian are first extracted with the PAFs method and used to describe the contour of the pedestrian; at the same time, the coordinate position of each key point in the image is obtained accurately;
secondly, the whole pedestrian image is divided transversely into three body parts, namely the head, the upper body and the lower body, so that the whole pedestrian image and the three body parts form multiple inputs to a convolutional neural network; the features of the four image parts are extracted by separate convolution modules using a ResNet50 network, and the obtained feature vectors are then concatenated into a combined vector for the final pedestrian representation;
and finally, a fully connected layer is added at the end of the network, four-way classification is performed with a softmax loss function, and the training is iterated continuously to obtain the final classification result.
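A minimal sketch of the orientation classifier of claim 2 is given below; the key-point detection and the cropping of the head, upper-body and lower-body regions are assumed to be done beforehand (e.g. with an OpenPose-style PAFs detector), and the class name, feature dimension and use of a separate ResNet50 trunk per part are illustrative assumptions.

```python
# A minimal sketch (assumed structure, not the original implementation) of the
# orientation classifier of claim 2: the whole image plus head, upper-body and
# lower-body crops each pass through a ResNet50 trunk; the four feature vectors
# are concatenated and a fully connected layer performs the four-way
# front/back/left/right classification.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class OrientationClassifier(nn.Module):
    def __init__(self, feat_dim=2048, num_orientations=4):
        super().__init__()
        def trunk():
            m = resnet50(weights=None)
            m.fc = nn.Identity()          # keep the 2048-d pooled feature
            return m
        # One trunk per input: whole image, head, upper body, lower body.
        self.trunks = nn.ModuleList([trunk() for _ in range(4)])
        self.fc = nn.Linear(4 * feat_dim, num_orientations)

    def forward(self, whole, head, upper, lower):
        feats = [t(x) for t, x in zip(self.trunks, (whole, head, upper, lower))]
        joint = torch.cat(feats, dim=1)   # fused global + local representation
        return self.fc(joint)             # train with softmax cross-entropy
```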
CN202010901241.9A 2020-09-01 2020-09-01 Multi-feature fusion pedestrian re-identification method based on orientation constraint Active CN112101150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010901241.9A CN112101150B (en) 2020-09-01 2020-09-01 Multi-feature fusion pedestrian re-identification method based on orientation constraint

Publications (2)

Publication Number Publication Date
CN112101150A CN112101150A (en) 2020-12-18
CN112101150B true CN112101150B (en) 2022-08-12

Family

ID=73757029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010901241.9A Active CN112101150B (en) 2020-09-01 2020-09-01 Multi-feature fusion pedestrian re-identification method based on orientation constraint

Country Status (1)

Country Link
CN (1) CN112101150B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613480A (en) * 2021-01-04 2021-04-06 上海明略人工智能(集团)有限公司 Face recognition method, face recognition system, electronic equipment and storage medium
CN114821629A (en) * 2021-01-27 2022-07-29 天津大学 Pedestrian re-identification method for performing cross image feature fusion based on neural network parallel training architecture
CN113011440B (en) * 2021-03-19 2023-11-28 中联煤层气有限责任公司 Coal-bed gas well site monitoring and re-identification technology
CN113011429B (en) * 2021-03-19 2023-07-25 厦门大学 Real-time street view image semantic segmentation method based on staged feature semantic alignment
CN113159142B (en) * 2021-04-02 2024-02-20 杭州电子科技大学 Loss function variable super-parameter determination method for fine-granularity image classification
CN113095263B (en) * 2021-04-21 2024-02-20 中国矿业大学 Training method and device for pedestrian re-recognition model under shielding and pedestrian re-recognition method and device under shielding
CN113688776B (en) * 2021-09-06 2023-10-20 北京航空航天大学 Space-time constraint model construction method for cross-field target re-identification
CN113723345B (en) * 2021-09-09 2023-11-14 河北工业大学 Domain self-adaptive pedestrian re-identification method based on style conversion and joint learning network
CN113642547B (en) * 2021-10-18 2022-02-11 中国海洋大学 Unsupervised domain adaptive character re-identification method and system based on density clustering
CN114299542A (en) * 2021-12-29 2022-04-08 北京航空航天大学 Video pedestrian re-identification method based on multi-scale feature fusion
CN114937231B (en) * 2022-07-21 2022-09-30 成都西物信安智能系统有限公司 Target identification tracking method
CN115661688B (en) * 2022-10-09 2024-04-26 武汉大学 Unmanned aerial vehicle target re-identification method, system and equipment with rotation invariance
CN115661722B (en) * 2022-11-16 2023-06-06 北京航空航天大学 Pedestrian re-identification method combining attribute and orientation
CN115631464B (en) * 2022-11-17 2023-04-04 北京航空航天大学 Pedestrian three-dimensional representation method oriented to large space-time target association
CN116403269B (en) * 2023-05-17 2024-03-26 智慧眼科技股份有限公司 Method, system, equipment and computer storage medium for analyzing occlusion human face
CN117238039B (en) * 2023-11-16 2024-03-19 暗物智能科技(广州)有限公司 Multitasking human behavior analysis method and system based on top view angle

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017000115A1 (en) * 2015-06-29 2017-01-05 北京旷视科技有限公司 Person re-identification method and device
CN106709449A (en) * 2016-12-22 2017-05-24 深圳市深网视界科技有限公司 Pedestrian re-recognition method and system based on deep learning and reinforcement learning
US20180374233A1 (en) * 2017-06-27 2018-12-27 Qualcomm Incorporated Using object re-identification in video surveillance
CN108960127A (en) * 2018-06-29 2018-12-07 厦门大学 Pedestrian's recognition methods again is blocked based on the study of adaptive depth measure

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"A Fast Mode Decision Algorithm for Multiview Video Coding";Mingjing Ai 等;《IEEE》;20101018;全文 *
"Orientation-Guided Similarity Learning for Person Re-identification";Na Jiang 等;《IEEE》;20181129;全文 *
"Person Re-identification by Features Fusion";Wan Xin 等;《IEEE》;20160905;全文 *

Also Published As

Publication number Publication date
CN112101150A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN112101150B (en) Multi-feature fusion pedestrian re-identification method based on orientation constraint
CN107832672B (en) Pedestrian re-identification method for designing multi-loss function by utilizing attitude information
Bai et al. Group-sensitive triplet embedding for vehicle reidentification
Qu et al. RGBD salient object detection via deep fusion
Guo et al. Efficient and deep person re-identification using multi-level similarity
Sarfraz et al. Deep view-sensitive pedestrian attribute inference in an end-to-end model
Liu et al. Matching-cnn meets knn: Quasi-parametric human parsing
Bashir et al. Vr-proud: Vehicle re-identification using progressive unsupervised deep architecture
Wang et al. A survey of vehicle re-identification based on deep learning
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN113408492B (en) Pedestrian re-identification method based on global-local feature dynamic alignment
CN111507217A (en) Pedestrian re-identification method based on local resolution feature fusion
CN110263712B (en) Coarse and fine pedestrian detection method based on region candidates
Tang et al. Weakly supervised learning of deformable part-based models for object detection via region proposals
Wang et al. Traffic sign detection using a cascade method with fast feature extraction and saliency test
CN113221625A (en) Method for re-identifying pedestrians by utilizing local features of deep learning
Wang et al. S3D: scalable pedestrian detection via score scale surface discrimination
Tang et al. Weakly-supervised part-attention and mentored networks for vehicle re-identification
Fu et al. Learning latent features with local channel drop network for vehicle re-identification
Li et al. VRID-1: A basic vehicle re-identification dataset for similar vehicles
Zhang et al. Bioinspired scene classification by deep active learning with remote sensing applications
Liu et al. Multi-attention deep reinforcement learning and re-ranking for vehicle re-identification
Cai et al. Beyond photo-domain object recognition: Benchmarks for the cross-depiction problem
Lu et al. It’s okay to be wrong: Cross-view geo-localization with step-adaptive iterative refinement
Chen et al. Part alignment network for vehicle re-identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant