CN109165612B - Pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization - Google Patents

Info

Publication number
CN109165612B
Authority
CN
China
Prior art keywords
knn
pedestrian
bidirectional
distance
image
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811007813.8A
Other languages
Chinese (zh)
Other versions
CN109165612A (en)
Inventor
包宗铭
龚声蓉
刘纯平
王朝晖
钟珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Application filed by Suzhou University
Priority to CN201811007813.8A
Publication of CN109165612A
Application granted
Publication of CN109165612B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization, which comprises three stages. Feature extraction stage: a ResNet-50 model pre-trained on the ImageNet data set is obtained, the pre-trained model is fine-tuned on a pedestrian re-identification data set, and the fine-tuned model is used to extract features from the target image and from the candidate set images, yielding their feature vectors. Metric learning stage: a metric function is learned over the extracted feature vectors to measure the similarity between them, and an initial ranking is obtained from the distances between them. Reordering stage: the bidirectional KNN distance between two images is calculated from the bidirectional KNN relation and the bidirectional KNN sets, and the initial distance and the bidirectional KNN distance are combined in a weighted sum to give the final distance, from which the optimized ranking is obtained. The invention represents pedestrians more effectively, improves the accuracy of pedestrian re-identification over the initial ranking, reduces the burden on human and machine resources, and is broadly applicable.

Description

Pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization
Technical Field
The invention relates to a pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization.
Background
Pedestrian re-identification is a cross-camera pedestrian matching technique. It generally consists of two parts: pedestrian detection and pedestrian matching. Pedestrian detection is the process of locating and extracting pedestrians from raw video frames. Pedestrian matching is the process of retrieving a specified target from a candidate set in a cross-camera environment. The final output of pedestrian re-identification is a list of candidate set images ranked by similarity (distance) to the target image.
The purpose of pedestrian re-identification is to retrieve a target pedestrian from a set of candidates in a cross-camera environment. It has broad application value in fields such as multi-camera tracking, multi-view behavior analysis and intelligent video surveillance. In recent years, public concern about public safety has grown steadily, and the burden on public security departments has grown with it. As urban surveillance systems have matured, the video recorded by cameras has come to provide strong technical support for criminal investigation and forensic evidence collection, and intelligent video surveillance has become a new investigative tool for public security departments. In large public places such as parks, schools, squares and stations, management departments can combine a camera network with intelligent video surveillance to monitor and track target pedestrians accurately and visually. As an important component of an intelligent video surveillance system, pedestrian re-identification plays an essential role. It allows a target person to be searched for across cameras and, combined with techniques such as target tracking, allows the target's motion trajectory to be computed. All of this helps to improve the public's sense of security.
Pedestrian re-identification can be implemented in many ways, but almost all of them involve the following three steps. First, a feature descriptor is designed to represent the pedestrian; that is, the pedestrian image is converted into a feature vector through feature design and extraction. Second, the feature vectors are mapped into a metric space of relatively low dimension; that is, a metric function is learned to measure the similarity (distance) between feature vectors, so that the distance between samples of the same class is as small as possible and the distance between samples of different classes is as large as possible. Finally, whether two samples belong to the same individual is judged from the distance between them (the smaller the distance, the more similar the samples), and the candidate set images are sorted by distance. A reordering algorithm is sometimes introduced to optimize the initial ranking and thereby improve the accuracy of pedestrian re-identification.
Feature design and extraction are involved in most computer vision tasks. Because a computer cannot recognize images the way humans do, images must be converted into a form that a computer can process. A good feature representation describes an image well and is robust to a variety of conditions. In pedestrian re-identification, pedestrian images are generally obtained from video frames captured by cameras, either by manual annotation or by a DPM (Deformable Part Model) detector, and because the images come from different cameras, conditions such as illumination and viewing angle differ. Illumination strongly affects the appearance of an image: under different lighting conditions, two images of the same pedestrian may look very different. The same pedestrian may also appear at different viewing angles in different cameras, which again makes two images of the same individual look very different. In addition, low resolution, occlusion and cluttered backgrounds can greatly affect the accuracy of pedestrian re-identification. Designing a discriminative and robust feature description algorithm is therefore an important part of pedestrian re-identification. In 2015, Liao et al. performed sliding-window sampling on the input image and extracted HSV color features and SILTP texture features within each small window. To address the misalignment of body parts across cameras, the authors took the maximum value over the sliding windows at the same horizontal position. The resulting feature vector, LOMO, captures both global and local features and is robust to illumination changes, viewing-angle changes and body part misalignment.
Owing to its robustness and efficiency, the LOMO descriptor is one of the feature representations most widely studied and applied. In recent years, with the rapid advance of deep learning, image feature representation has developed quickly, and almost every computer vision task, from image recognition and object detection to visual question answering, has benefited from deep learning methods. Pedestrian re-identification methods based on deep learning fall into two main categories. The first uses a deep model to extract image features, replacing conventional feature descriptors such as the LOMO descriptor mentioned above. The second learns an end-to-end deep model that takes images as input and directly outputs the result.
The result of pedestrian re-identification is a ranking of candidate set images by their similarity to the target image. To some extent, pedestrian re-identification can therefore be regarded as a fine-grained image retrieval task. In image retrieval, reordering algorithms are often used to optimize the retrieval ranking. Similarly, context information can be introduced so that candidate images more similar to the target image are ranked nearer the front. The choice of context information varies. Liu et al. optimize the initial matching result of pedestrian re-identification through manual feedback. Chen et al. propose a manifold ranking method that can easily be embedded into a general pedestrian re-identification pipeline. Ye et al. propose a method that adjusts and re-aggregates initial ranking results using similarity and dissimilarity cues between pedestrians. Leng et al. propose a reordering method that fuses content similarity and context similarity to perform bidirectional ranking optimization on the initial result.
For feature representation and reordering in pedestrian re-identification, there are currently two major problems:
1. Robustness. For some simple visual tasks, such as image classification and face recognition, hand-crafted features often achieve good results. However, because of the particular nature of pedestrian re-identification, conditions such as cross-view matching and occlusion place higher demands on the robustness of pedestrian features: lighting conditions differ between cameras, and pedestrians appear at different viewing angles, so two images of the same individual may look dissimilar. In addition, the images used in pedestrian re-identification generally have low resolution, so facial information cannot be used. Conventional hand-crafted features therefore struggle to reach high accuracy in more complex visual tasks such as pedestrian re-identification. In 2012, AlexNet, proposed by Alex Krizhevsky, attracted great attention at that year's ILSVRC; since then deep learning has been studied and applied by more and more researchers and has achieved excellent performance on many tasks. Deep learning can be thought of as a black box through which the deep features of an image are extracted. In the first few layers of a network model these features usually correspond to low-level image features such as corners and edges. In the middle layers they correspond to shape features carrying some semantic information, such as the shape of a head or a foot. In the last few layers they correspond to high-level features with abstract meaning, such as a person or a bag. Moreover, many studies have shown that deep features extracted by deep models are far more robust and discriminative than traditional hand-crafted features.
Deep learning relies on two basic conditions: large-scale training data and high-performance GPUs. Data sets for pedestrian re-identification are generally small, and a small training sample can cause the model to overfit, meaning the trained model does not generalize well. To improve the robustness of feature extraction in pedestrian re-identification with deep learning, the problem of the small number of training samples must therefore be solved first.
2. Context information. Another concern is the context information introduced during reordering. Reordering algorithms generally do not require additional training data, but the effect of reordering is limited by the initial ranking. For example, when the KNN relation between images is used as context information for reordering, a candidate image that belongs to the k nearest neighbors of the target image is assumed to be very likely a positive match. However, if negative samples are also present in the KNN set of the target image, KNN-based reordering may actually add new noise to the initial ranking and reduce the accuracy of pedestrian re-identification. When designing a reordering algorithm, more reasonable and sufficient context information and reordering strategies therefore need to be considered in order to optimize the initial ranking.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method based on depth features and bidirectional KNN sequencing optimization.
The technical scheme of the invention is as follows: a pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization, characterized by comprising three stages, namely feature extraction, metric learning and reordering:
the feature extraction stage comprises:
step 1) obtaining a ResNet-50 model pre-trained on the ImageNet data set;
step 2) fine-tuning the pre-trained model using a pedestrian re-identification data set;
step 3) extracting features from the target image and from the candidate set images using the fine-tuned model, to obtain a feature vector of the target image and feature vectors of the candidate set images, respectively;
the metric learning stage comprises:
step 4) learning, through metric learning, a metric function over the extracted feature vectors of the target image and of the candidate set images to measure the similarity between two feature vectors, that is, the distance between them, and sorting by this distance to obtain an initial ranking;
the reordering stage comprises:
step 5) for the target image p, let N(p,k) denote the k-nearest-neighbor set of the target image; then:
$$N(p,k) = \{g_1, g_2, \dots, g_k\}, \quad |N(p,k)| = k \tag{1}$$
according to the above definition, the bidirectional KNN set D(p,k) is expressed as:
$$D(p,k) = \{\, g_i \mid (g_i \in N(p,k)) \wedge (p \in N(g_i,k)) \,\} \tag{2}$$
wherein $g_i$ denotes a candidate set image and $N(g_i,k)$ denotes the k-nearest-neighbor set of that candidate set image;
the distance $d(p,g_i)$ between p and $g_i$ is obtained by comparing their bidirectional KNN sets:
$$d(p,g_i) = 1 - \frac{|D(p,k) \cap D(g_i,k)|}{|D(p,k) \cup D(g_i,k)|} \tag{3}$$
wherein $D(g_i,k)$ denotes the bidirectional KNN set of $g_i$;
the computation is simplified by encoding each bidirectional KNN set as an N-dimensional vector, each entry of which indicates whether the corresponding image is contained in D(p,k); the encoding rule is as follows:
$$V_{p,g_i} = \begin{cases} 1, & g_i \in D(p,k) \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
the intersection and union of the bidirectional KNN sets are computed by formula (5) and formula (6); substituting formula (5) and formula (6) into formula (3) converts the set comparison problem into a pure vector computation problem; formula (5) and formula (6) are respectively expressed as:
$$|D(p,k) \cap D(g_i,k)| = \|\min(V_p, V_{g_i})\|_1 \tag{5}$$
$$|D(p,k) \cup D(g_i,k)| = \|\max(V_p, V_{g_i})\|_1 \tag{6}$$
wherein $V_p$ and $V_{g_i}$ denote the vectors encoding $D(p,k)$ and $D(g_i,k)$, min and max denote the element-wise minimum and maximum of the two input vectors, and $\|\cdot\|_1$ denotes the L1 norm;
step 6) finally, the initial distance and the bidirectional KNN distance are combined in a weighted sum to obtain the final distance, from which the optimized ranking is obtained.
Further, in the present invention, the distance measurement in step 4) consists of calculating the Mahalanobis distance between the feature vector of the target image and the feature vector of each candidate set image.
Further, in the present invention, in step 6), the final distance $d^*$ can be calculated by formula (7):
$$d^*(p,g_i) = (1-\lambda)\, d(p,g_i) + \lambda\, d_0(p,g_i) \tag{7}$$
wherein $d_0(p,g_i)$ denotes the initial distance and $\lambda \in [0,1]$ is a penalty factor.
Compared with the prior art, the invention has the following advantages:
1) The invention provides a deep feature extraction algorithm based on ResNet. By fine-tuning a model pre-trained on the ImageNet data set, a model for extracting deep pedestrian features is obtained. ImageNet is a database of more than 14 million images covering more than 20,000 categories. A data set of this scale fully meets the data requirements of deep learning, and the idea of transfer learning is applied: a model trained on the data set of a related visual task is adapted to pedestrian re-identification through fine-tuning. ResNet-50 is chosen as the initial network architecture because its residual structure addresses the degradation of model performance that occurs as networks become deeper. After training on a large-scale data set such as ImageNet, the model can extract rich features and can be applied well to tasks such as detection, recognition and classification. Finally, this model is used to extract pedestrian features from images.
2) To further improve the accuracy of the pedestrian re-identification result, a reordering algorithm based on the KNN relation is improved. To reduce the influence of negative samples in an image's KNN set on the initial ranking, a bidirectional KNN relation between the target image and the candidate image is introduced and used as context information to optimize the ranking. The traditional KNN method uses only the k nearest neighbors of the target image to adjust the initial ranking. The bidirectional KNN method additionally takes the k nearest neighbors of the candidate images into account: if a candidate image belongs to the KNN set of the target image and the target image also belongs to the KNN set of the candidate image, the pair is very likely to match. The images that satisfy this relation with a given image form its bidirectional KNN set. The distance between a target image and a candidate image is then measured by comparing their bidirectional KNN sets: when the two sets overlap, the two images can be considered similar, and the larger the overlap, the more similar they are. For ease of computation, the neighbor sets are encoded as vectors, so that the distance comparison between images can be carried out by vector operations. The distance computed in this way is combined with the initial distance in a weighted sum to give the final distance, and the candidates are reordered according to it.
Drawings
The invention is further described with reference to the following figures and examples:
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
Example:
The specific implementation of the pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization is shown in the accompanying drawing and comprises three stages: feature extraction, metric learning and reordering.
1. Feature extraction
Hand-crafted features can alleviate, in a targeted way, the effects of changes in factors such as illumination and viewing angle. For more complex computer vision tasks, however, their discriminative power and robustness are still insufficient, which prevents pedestrian re-identification from reaching high accuracy. The rise of deep learning has brought breakthroughs to many computer vision tasks. The invention therefore trains a deep residual network model on related data sets and uses it to extract features from pedestrian images, mainly through the following steps:
step 1): obtaining a ResNet-50 model pre-trained on an ImageNet dataset;
In the pedestrian re-identification task, the related data sets contain few samples, so the model needs to be pre-trained on a data set from another task. The ImageNet data set, a large data set of more than 14 million images, is selected. Increasing network depth also causes problems. For a neural network with tens of layers, the vanishing gradient problem can be handled with techniques such as normalization layers, but as the number of layers continues to grow the model suffers from degradation: accuracy saturates and then begins to decline. The deep residual network handles this problem well, so ResNet-50 is selected as the initial network structure. The initial parameters of the network are essentially the same as in the original model. The initial model converges after repeated rounds of forward propagation, back-propagation and weight updates.
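The residual idea referred to in this step can be written compactly. The following is the standard residual-block formulation from the ResNet literature, given as background rather than as text from this patent; the symbols $x$, $y$, $\mathcal{F}$ and $W_i$ are introduced only for this illustration.

```latex
% x: input to the block, y: output of the block
% \mathcal{F}: the stacked layers (with weights W_i) that learn the residual
y = \mathcal{F}(x, \{W_i\}) + x
```

If the identity mapping is already close to optimal, the stacked layers only need to drive $\mathcal{F}$ toward zero, which is the usual explanation for why deeper residual networks are less prone to the degradation problem.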
Step 2): fine-tuning the pre-training model by using the pedestrian re-identification data set;
Step 1) yields a pre-trained model. To transfer it effectively to pedestrian re-identification, fine-tuning is required. Fine-tuning is also training in essence, but it uses fewer training samples than pre-training. Because data sets differ, different fine-tuning strategies are adopted for different data sets. Take CUHK03 and PRW as examples. On CUHK03, the pre-trained model is fine-tuned directly. On PRW, a cascaded fine-tuning strategy is adopted: a two-class network that distinguishes pedestrians from background is first trained on the detection data, and a 482-class identification network is then fine-tuned from it using the PRW training data. The advantage of cascaded fine-tuning is that it reduces the influence of false detections and improves the recognition capability of the model.
Step 3): extracting features from the target image and from the candidate set images using the fine-tuned model, to obtain the feature vector of the target image and the feature vectors of the candidate set images, respectively.
2. Metric learning
Metrics for measuring the distance or similarity between two feature vectors fall into two categories: those that require no learning and those that are learned. The L1 norm, the L2 norm and cosine similarity require no learning; they cannot effectively exploit the discriminative information contained in the data, so their matching performance in pedestrian re-identification is weak. Learned metrics, such as the Mahalanobis distance, make full use of the distribution of the training data to learn highly discriminative model parameters and can achieve excellent matching accuracy in pedestrian re-identification. The invention therefore adopts metric learning, which mainly comprises the following step:
Step 4): a metric function is learned, through metric learning, over the extracted feature vectors of the target image and of the candidate set images to measure the similarity between two feature vectors, that is, the distance between them. Specifically, the Mahalanobis distance between the feature vector of the target image and the feature vector of each candidate set image is calculated, and the candidate set images are sorted by this distance to obtain an initial ranking.
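As a rough illustration of step 4), the sketch below ranks candidates by a squared Mahalanobis-style distance $(x-y)^\top M (x-y)$. The matrix `M` is assumed to have been learned elsewhere (the experiments mention XQDA and KISSME); here it is a hand-picked toy value, and the function names and data are hypothetical, not taken from the patent.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis-style distance (x - y)^T M (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff, written out in plain Python.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in M]
    return sum(d_i * md_i for d_i, md_i in zip(diff, m_diff))


def initial_ranking(query, gallery, M):
    """Return candidate indices sorted by ascending distance, plus the distances."""
    dists = [mahalanobis_sq(query, g, M) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: dists[i])
    return order, dists


# Toy data (assumption): 2-D features and a diagonal metric matrix.
M = [[2.0, 0.0], [0.0, 0.5]]
query = [0.0, 0.0]
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

order, dists = initial_ranking(query, gallery, M)
identity = [[1.0, 0.0], [0.0, 1.0]]
euclid_sq = [mahalanobis_sq(query, g, identity) for g in gallery]
```

With the identity matrix the first two candidates tie (the distance reduces to squared Euclidean distance), whereas the toy `M` separates them, which is the kind of discrimination a learned metric is meant to provide.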
3. Reordering
Reordering algorithms are often used in image retrieval to optimize the retrieval results. Pedestrian re-identification can be regarded as a fine-grained pedestrian image retrieval task, so the matching result for a given query image can be reordered so that images that are positive samples are ranked nearer the front. Reordering algorithms generally do not require additional training data, but the effect of reordering is limited by the initial ranking. To address this problem, the invention provides a bidirectional KNN reordering method, which mainly comprises the following steps:
Step 5): the bidirectional KNN distance between two images is calculated from the bidirectional KNN relation and the bidirectional KNN sets in order to perform reordering. Specifically:
for the target image p, let N(p,k) denote the k-nearest-neighbor set of the target image; then:
$$N(p,k) = \{g_1, g_2, \dots, g_k\}, \quad |N(p,k)| = k \tag{1}$$
where $g_i$ denotes a candidate set image.
According to the above definition, the bidirectional KNN set D(p,k) can be expressed as:
$$D(p,k) = \{\, g_i \mid (g_i \in N(p,k)) \wedge (p \in N(g_i,k)) \,\} \tag{2}$$
As the above formula shows, a pair of images that satisfies the bidirectional KNN relation is more likely to be a positive match than a pair that satisfies only the one-way k-nearest-neighbor relation. The images that satisfy the bidirectional KNN relation with a given image form its bidirectional KNN set, and a distance function is needed to calculate the distance between p and $g_i$. The Jaccard distance is selected here: the distance $d(p,g_i)$ between p and $g_i$ is obtained by comparing their bidirectional KNN sets:
$$d(p,g_i) = 1 - \frac{|D(p,k) \cap D(g_i,k)|}{|D(p,k) \cup D(g_i,k)|} \tag{3}$$
The more similar two images are, the more their bidirectional KNN sets overlap and, according to the above formula, the smaller the distance.
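The set definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: all images (query and candidates) are assumed to share one pairwise distance matrix `dist`, an image is never counted as its own neighbor, and an empty union is treated as distance 1. The function names and the 1-D toy data are hypothetical.

```python
def knn_set(dist, i, k):
    """N(i, k): indices of the k images nearest to image i, excluding i itself."""
    others = [j for j in range(len(dist)) if j != i]
    others.sort(key=lambda j: dist[i][j])
    return set(others[:k])


def bidirectional_knn_set(dist, i, k):
    """D(i, k): neighbors j of i that also have i among their own k nearest."""
    return {j for j in knn_set(dist, i, k) if i in knn_set(dist, j, k)}


def jaccard_distance(dist, p, g, k):
    """Jaccard distance between the bidirectional KNN sets of images p and g."""
    d_p = bidirectional_knn_set(dist, p, k)
    d_g = bidirectional_knn_set(dist, g, k)
    union = d_p | d_g
    if not union:
        return 1.0  # assumption: no shared context means maximal distance
    return 1.0 - len(d_p & d_g) / len(union)


# Toy data (assumption): five images placed on a line; image 0 is the query.
positions = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in positions] for a in positions]

one_way = knn_set(dist, 3, 2)                 # image 2 is a one-way neighbor of image 3
two_way = bidirectional_knn_set(dist, 3, 2)   # but image 3 is not a neighbor of image 2
d_near = jaccard_distance(dist, 0, 1, 2)
d_far = jaccard_distance(dist, 0, 3, 2)
```

In this toy example image 2 appears in the one-way neighbor set of image 3 only because image 3 has few close neighbors; the bidirectional check removes it, which is the noise-reduction effect the method relies on.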
Furthermore, the computation is simplified by encoding each bidirectional KNN set as a vector. The encoding rule is as follows:
$$V_{p,g_i} = \begin{cases} 1, & g_i \in D(p,k) \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
With this rule, a bidirectional KNN set can be represented as an N-dimensional vector $V_p$, each entry of which indicates whether the corresponding image is contained in D(p,k).
Based on the above definition, the intersection and union of the bidirectional KNN sets can be computed by formula (5) and formula (6):
$$|D(p,k) \cap D(g_i,k)| = \|\min(V_p, V_{g_i})\|_1 \tag{5}$$
$$|D(p,k) \cup D(g_i,k)| = \|\max(V_p, V_{g_i})\|_1 \tag{6}$$
where min and max denote the element-wise minimum and maximum of the two input vectors, and $\|\cdot\|_1$ denotes the L1 norm. Substituting formula (5) and formula (6) into formula (3) converts the set comparison problem into a pure vector computation problem.
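The equivalence between the set form and the vector form can be checked with a small example. The sketch below assumes binary indicator vectors of length N (the number of images); the two bidirectional KNN sets are made-up toy values and the helper names are hypothetical.

```python
def encode(neighbor_set, n):
    """Binary indicator vector: entry j is 1 if image j is in the set, else 0."""
    return [1 if j in neighbor_set else 0 for j in range(n)]


def l1_norm(v):
    """L1 norm: sum of absolute values of the entries."""
    return sum(abs(x) for x in v)


def jaccard_from_vectors(v_p, v_g):
    """Jaccard distance computed from element-wise min (intersection) and max (union)."""
    inter = l1_norm([min(a, b) for a, b in zip(v_p, v_g)])
    union = l1_norm([max(a, b) for a, b in zip(v_p, v_g)])
    return 1.0 - inter / union if union else 1.0


# Toy bidirectional KNN sets over N = 6 images (assumption).
n = 6
d_p = {1, 2, 4}
d_g = {2, 4, 5}

v_p = encode(d_p, n)
v_g = encode(d_g, n)
vector_distance = jaccard_from_vectors(v_p, v_g)
set_distance = 1.0 - len(d_p & d_g) / len(d_p | d_g)
```

Both routes give the same value; the vector form is convenient because the min, max and sum operations can be applied to many image pairs at once with array libraries.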
Step 6): finally, the initial distance and the bidirectional KNN distance are combined in a weighted sum to obtain the final distance, from which the optimized ranking is obtained. Specifically:
the final distance $d^*$ equals the weighted sum of the initial distance and the bidirectional KNN distance:
$$d^*(p,g_i) = (1-\lambda)\, d(p,g_i) + \lambda\, d_0(p,g_i) \tag{7}$$
where $d_0(p,g_i)$ denotes the initial distance, $d(p,g_i)$ denotes the bidirectional KNN distance of formula (3), and $\lambda \in [0,1]$ is a penalty factor. When λ = 0, only the bidirectional KNN distance is considered; when λ = 1, only the initial distance is considered. The final ranking is in ascending order of distance: candidates with smaller distances are ranked first and those with larger distances last.
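The weighted fusion of formula (7) can be sketched as follows. The distance values are toy numbers, and the initial distances are assumed to have been scaled to a range comparable to the Jaccard distance (the source does not say how the two are normalized); the function names are hypothetical.

```python
def final_distances(initial, knn, lam):
    """Formula (7): d* = (1 - lam) * d_knn + lam * d_initial, per candidate."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [(1.0 - lam) * d + lam * d0 for d, d0 in zip(knn, initial)]


def rank(distances):
    """Candidate indices in ascending order of distance."""
    return sorted(range(len(distances)), key=lambda i: distances[i])


# Toy distances for three candidates (assumption).
initial = [0.30, 0.25, 0.90]   # initial (metric learning) distances
knn = [0.20, 0.80, 1.00]       # bidirectional KNN distances

initial_order = rank(initial)
fused_order = rank(final_distances(initial, knn, 0.5))
only_initial = final_distances(initial, knn, 1.0)
only_knn = final_distances(initial, knn, 0.0)
```

In this toy case candidate 1 leads the initial ranking, but its weak neighborhood overlap pushes it behind candidate 0 after fusion; setting `lam` to 1 or 0 recovers the initial or the bidirectional KNN ranking, respectively.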
The data sets used in the experiments for this example are CUHK03, Market1501 and PRW.
The CUHK03 data set was collected at the Chinese University of Hong Kong and comprises 14096 images of 1467 pedestrians. Each pedestrian is captured by two different cameras, with an average of 4.8 images per pedestrian per camera. The data set provides two kinds of bounding boxes: manually labeled and detected by DPM (Deformable Part Model).
Market1501 is one of the largest pedestrian re-identification data sets at present. It was captured by six different cameras and contains 32668 images of 1501 pedestrians in total. The data set is divided into two parts: 12936 images of 751 pedestrians for training and 19732 images of 750 pedestrians for testing.
PRW is also a large data set, containing 11816 images of 932 pedestrians from six different cameras; the training set contains 5704 images of 482 pedestrians and the test set contains 6112 images of 450 pedestrians. Table 1 gives detailed information on these data sets.
TABLE 1
| Data set | Year | Number of pedestrians | Number of cameras | Number of images | Image size | Labeling mode |
|---|---|---|---|---|---|---|
| CUHK03 | 2014 | 1467 | 10 | 13164 | Varies | Hand/DPM |
| Market1501 | 2015 | 1501 | 6 | 32217 | 128*64 | Hand/DPM |
| PRW | 2016 | 932 | 6 | 34304 | Varies | Hand |
In addition, the experimental environment was: Ubuntu 16.04, a GTX 1080 Ti graphics card with 12 GB of video memory, a Core(TM) i7 processor at 3.4 GHz, and 16 GB of RAM.
The code running environment is as follows: the Caffe deep learning framework, Python 2.7 and MATLAB 2014a.
The experimental results are as follows:
We selected two evaluation metrics, CMC (cumulative matching characteristic) and mAP (mean average precision), to evaluate the experimental results. The experimental strategy differs across the three data sets, so we analyze each of them separately below.
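As a non-limiting illustration of the two metrics, the following Python sketch computes a CMC curve and the mAP from ranked match flags. The function names and the two toy queries are assumptions made for illustration. Benchmark-specific details, such as discarding same-camera or junk images, are omitted, and every query is assumed to be ranked against a candidate set of the same size.

```python
import numpy as np

def cmc_and_ap(ranked_is_match):
    """ranked_is_match: flags over the ranked candidate list, True where the
    candidate shows the same pedestrian as the query."""
    hits = np.asarray(ranked_is_match, dtype=bool)
    if not hits.any():
        raise ValueError("query has no correct match in the candidate set")
    # CMC: 1 from the rank of the first correct match onward, 0 before it.
    cmc = (np.cumsum(hits) > 0).astype(float)
    # AP: mean of the precision values taken at each correct match.
    ranks = np.flatnonzero(hits) + 1
    precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
    return cmc, float(precision_at_hits.mean())

def evaluate(all_queries):
    """Average the per-query CMC curves and AP values over all queries."""
    curves, aps = zip(*(cmc_and_ap(q) for q in all_queries))
    return np.mean(curves, axis=0), float(np.mean(aps))

# Two hypothetical queries, each ranked against four candidates.
queries = [
    [True, False, True, False],
    [False, True, False, False],
]
cmc, mean_ap = evaluate(queries)
print(cmc[0], mean_ap)  # rank-1 accuracy and mAP
```

The first entry of the CMC curve is the rank-1 accuracy; mAP additionally rewards placing all correct matches near the top of the list.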
For CUHK03, we split the data set into a training set (1160 pedestrians) and a test set (100 pedestrians). Since CUHK03 provides both manually labeled bounding boxes and DPM-detected bounding boxes, we tested the Labeled and Detected images separately. XQDA is used as the common metric learning method, and four features are compared: BOW, LOMO, IDE and the ResNet-based depth feature proposed by the invention. The comparison shows that depth features perform better than traditional hand-crafted features, and that the ResNet-based depth feature also performs better than a general CNN model. After re-ranking, the results of all four methods improve to a certain degree. We therefore conclude that the reordering algorithm can effectively improve the accuracy of pedestrian re-identification. The experimental results are shown in Table 2.
TABLE 2
Figure GDA0003044033890000111
For the Market1501 data set, as with CUHK03, we chose BOW, LOMO and our proposed ResNet-50-based depth feature as the features. To broaden the comparison, we added the KISSME method as a further metric learning method. In addition, we compared our reordering algorithm with the AQE and CDM reordering algorithms. The experimental results show that our method also performs well on Market1501; they are given in Table 3.
TABLE 3
Figure GDA0003044033890000112
The experimental results show that the reordering algorithm is helpful for different feature descriptors and metric learning methods. In addition, compared with two reordering algorithms, namely AQE and CDM, the reordering algorithm based on the bidirectional KNN relation also obtains better experimental results.
Finally, the PRW data set is an end-to-end data set, which is more challenging than image-based and video-based data. We therefore first detect candidate bounding boxes in the original images using DPM, then select LOMO and IDE as feature representations and XQDA as the metric function. The results are shown in Table 4.
TABLE 4
Figure GDA0003044033890000121
Experimental conclusions:
Experimental comparison on three different data sets (CUHK03, Market1501 and PRW) shows that, compared with traditional hand-crafted features, the ResNet-based depth feature has stronger discriminative power and robustness and represents a pedestrian better. In addition, the reordering algorithm based on the bidirectional KNN relation effectively improves the accuracy of pedestrian re-identification on top of the initial ranking, and its results are better than those of the other two reordering algorithms (AQE and CDM). Finally, the reordering algorithm needs no additional training data, which reduces the demand on manpower and machine resources. The experimental results also show that the algorithm is suitable for most feature extraction and metric learning methods and therefore has good universality.
It should be understood that the above-mentioned embodiments are only illustrative of the technical concepts and features of the present invention, and are intended to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the scope of the present invention. All modifications made according to the spirit of the main technical scheme of the invention are covered in the protection scope of the invention.

Claims (3)

1. A pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization is characterized by comprising three stages of feature extraction, metric learning and reordering:
the feature extraction stage comprises:
step 1) obtaining a ResNet-50 model pre-trained on an ImageNet data set;
step 2) fine-tuning the pre-trained model using a pedestrian re-identification data set;
step 3) respectively extracting the characteristics of the target image and the candidate set image by using the fine-tuned model to respectively obtain a characteristic vector of the target image and a characteristic vector of the candidate set image;
the metric learning phase comprises:
step 4) learning a metric function for the extracted feature vectors of the target image and the feature vectors of the candidate set image through metric learning to measure the similarity between the two feature vectors, namely measuring the distance between the two feature vectors, and sequencing according to the distance between the two feature vectors to obtain an initial ranking;
the reordering stage comprises:
step 5) for the target image $p$, we define $N(p, k)$ as the k-nearest-neighbor set of the target image, so that:
$N(p, k) = \{g_1, g_2, \ldots, g_k\}, \quad |N(p, k)| = k$, (1)
according to the above definition, the bidirectional KNN set $D(p, k)$ is expressed as:
$D(p, k) = \{\, g_i \mid (g_i \in N(p, k)) \wedge (p \in N(g_i, k)) \,\}$, (2)
wherein $g_i$ denotes a candidate set image and $N(g_i, k)$ denotes the k-nearest-neighbor set of that candidate set image,
the distance $d(p, g_i)$ between $p$ and $g_i$ is obtained by comparing their bidirectional KNN sets:
Figure FDA0003044033880000011
wherein $D(g_i, k)$ denotes the bidirectional KNN set of $g_i$;
the computation is simplified by encoding the bidirectional KNN set as an N-dimensional vector, each entry of which indicates whether the corresponding image is contained in $D(p, k)$, the encoding rule being as follows:
Figure FDA0003044033880000012
the intersection calculation of the bidirectional KNN sets is realized by formula (5) and formula (6); substituting formula (5) and formula (6) into formula (3) converts the set comparison problem into a pure vector calculation problem; formula (5) and formula (6) are respectively expressed as:
Figure FDA0003044033880000021
Figure FDA0003044033880000022
where min and max denote the element-wise minimum and maximum of the two input vectors, and $\|\cdot\|_1$ denotes the L1 norm;
step 6) finally, weighting and summing the initial distance and the bidirectional KNN distance to obtain a final distance, and obtaining the optimized ranking.
2. The pedestrian re-identification method based on depth feature and bidirectional KNN ranking optimization according to claim 1, characterized in that: in step 4), the distance between the two feature vectors is measured by calculating the Mahalanobis distance between the feature vector of the target image and the feature vector of each candidate set image.
3. The pedestrian re-identification method based on depth feature and bidirectional KNN ranking optimization according to claim 1, characterized in that: in step 6), the final distance $d^{*}$ is calculated by formula (7):
$d^{*}(p, g_i) = (1 - \lambda)\, d(p, g_i) + \lambda\, d_0(p, g_i)$ (7)
wherein $d_0(p, g_i)$ denotes the initial distance and $\lambda \in [0, 1]$ denotes a penalty factor.
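As a non-limiting illustration of step 5) of claim 1, the following Python sketch builds the bidirectional KNN sets of formula (2), encodes them as binary vectors, and compares two of them using the element-wise minimum and maximum with the L1 norm. Several points are assumptions, since formulas (3) to (6) are reproduced only as images: formula (3) is taken to be one minus the ratio of the intersection size to the union size (a Jaccard distance), the target and candidate images are indexed in one pool, and each image counts among its own k nearest neighbors. The function names and the toy data are hypothetical.

```python
import numpy as np

def bidirectional_knn_vectors(dist, k):
    """Row i is the binary encoding V_i of the bidirectional KNN set D(i, k).

    dist: (n, n) initial distance matrix over target and candidate images.
    Assumption: each image is among its own k neighbors (self-distance 0).
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # knn[i, j] is True when image j belongs to N(i, k).
    order = np.argsort(dist, axis=1, kind="stable")[:, :k]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.arange(n)[:, None], order] = True
    # Formula (2): j is in D(i, k) when j is in N(i, k) and i is in N(j, k).
    return (knn & knn.T).astype(float)

def bidirectional_knn_distance(v_p, v_g):
    """Assumed Jaccard form of formula (3), computed on the encoded vectors."""
    intersection = np.minimum(v_p, v_g).sum()  # ||min(V_p, V_g)||_1
    union = np.maximum(v_p, v_g).sum()         # ||max(V_p, V_g)||_1
    return 1.0 if union == 0 else 1.0 - intersection / union

# Hypothetical 1-D features: images 0 to 2 form one cluster, 3 and 4 another.
points = np.array([0.0, 1.0, 3.0, 50.0, 52.0])
dist = np.abs(points[:, None] - points[None, :])
V = bidirectional_knn_vectors(dist, k=3)
d_01 = bidirectional_knn_distance(V[0], V[1])  # same cluster
d_03 = bidirectional_knn_distance(V[0], V[3])  # different clusters
print(d_01, d_03)
```

Image 2 appears in the plain k-nearest-neighbor lists of images 3 and 4, but is excluded from their bidirectional sets because the relation is not mutual, which is the filtering effect that formula (2) describes.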
CN201811007813.8A 2018-08-31 2018-08-31 Pedestrian re-identification method based on depth feature and bidirectional KNN sequencing optimization Active CN109165612B (en)

Publications (2)

Publication Number Publication Date
CN109165612A CN109165612A (en) 2019-01-08
CN109165612B CN109165612B (en) 2021-07-09

Family

ID=64893607

Country Status (1)

Country Link
CN (1) CN109165612B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant