CN110516533B

CN110516533B - Pedestrian re-identification method based on depth measurement

Info

Publication number: CN110516533B
Application number: CN201910626883.XA
Authority: CN
Inventors: 苗夺谦; 王倩倩
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2019-07-11
Filing date: 2019-07-11
Publication date: 2023-06-02
Anticipated expiration: 2039-07-11
Also published as: CN110516533A

Abstract

The invention relates to a pedestrian re-identification method based on depth measurement, which comprises the following steps: 1) training a ResNet-50 network on the ImageNet data set so that the ResNet-50 network has initial parameter values; 2) removing the softmax layer and the last fully connected layer of the ResNet-50 network; 3) forming a depth measurement network from a plurality of nonlinear fully connected layers, and adding a Euclidean distance calculation unit after its output; 4) connecting the depth measurement network after the adjusted ResNet-50 network to form the final network model of the invention; 5) randomly cropping the images in the pedestrian re-identification training data set to obtain a training data set of 224 × 224 images, randomly selecting P different pedestrians from the training data set, and randomly selecting K images for each pedestrian to form a small training batch; 6) optimizing the network of 4) by minimizing the Hard Triplet Loss function using the training data obtained in 5), and repeating this step until the loss value converges; 7) inputting the pedestrian image to be identified and the images in the candidate library into the optimized model to obtain the feature vectors of the pedestrian images in the same feature space; 8) calculating the Euclidean distances between the feature vectors and sorting the distances, finally obtaining the matching rate between the pedestrian image to be identified and the comparison images.

Description

Pedestrian re-identification method based on depth measurement

Technical Field

The invention relates to the field of intelligent analysis of surveillance videos, in particular to a pedestrian re-identification method based on depth measurement.

Background

Pedestrian re-identification refers to the problem of matching pedestrians across different camera views in a system consisting of multiple cameras. It involves numerous research topics such as feature selection, saliency extraction, distance metric learning and deep learning. Pedestrian re-identification technology provides key support for tasks such as pedestrian identity analysis and tracking, and has become a key component of intelligent video surveillance.

The main methods in the pedestrian re-identification field can be divided into the following two categories: 1) methods based on feature representation; 2) methods based on distance metric learning.

The former aims to design or learn features that are robust to changes in illumination, viewing angle and the like. This type of approach typically combines multiple low-level visual features, usually color features (color space, histogram, dominant color, etc.) and texture features (LBP, Gabor, co-occurrence matrix, etc.). Examples include symmetry-based cumulative feature descriptors, covariance descriptors, horizontal-stripe partition descriptors, pyramid matching descriptors, pattern matching, saliency matching, deep learning models and so forth. These methods address illumination and viewing-angle problems to a certain extent, but they can extract only low-level visual information and their feature extraction rules are fixed, so the robustness and adaptability of the features are limited.

The latter focuses on designing a similarity metric model suited to pedestrian re-identification. Existing distance metric models are mainly divided into non-learning methods and learning methods. First-order distance, second-order distance, Bhattacharyya distance and the like are non-learning methods, which are generally mathematically simple; however, their recognition results are not ideal because the extracted pedestrian features suffer from problems such as redundancy and limited robustness. Learning-based metric methods generally learn discriminative information about the appearance features of the same pedestrian and of different pedestrians under different cameras, and optimize the differences and similarities between samples, so their recognition performance is relatively good. These methods mainly include RankSVM, relative distance comparison, kernel-based metric learning, Mahalanobis distance learning, deep metric learning, metric ensembles and the like.

In general, the above methods divide pedestrian re-identification into two steps, feature representation and distance measurement, and then optimize each step separately. This separates the feature representation from the metric, whereas in practice the effectiveness of the distance metric is closely tied to the feature representation and the two cannot be fully decoupled.

The Chinese application CN108171184A proposes a pedestrian re-identification method based on a Siamese network, which uses two identical ResNet-50 networks to form a Siamese network and uses paired training data to optimize the network. Although that method uses a convolutional neural network to learn image features automatically, it requires paired inputs during training and the training time is too long. Furthermore, owing to factors such as illumination changes, pose, viewing angle, occlusion and image resolution, pedestrian re-identification performance in the intelligent analysis of surveillance video remains poor.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a pedestrian re-identification method based on depth measurement.

The aim of the invention can be achieved by the following technical scheme:

A pedestrian re-identification method based on depth measurement comprises the following steps:

1. Constructing a network

1) Pre-training the ResNet-50 network:

training a ResNet-50 network with the ImageNet data set as the training data set, so that the ResNet-50 network has initial parameter values;

2) Adjusting the ResNet-50 network of step 1) by removing the softmax layer and the last fully connected layer of the ResNet-50 network; the result is provided to step 4);

3) Forming a depth measurement network from a plurality of nonlinear fully connected layers, adding a Euclidean distance calculation unit after the output of the depth measurement network, and initializing the network parameters randomly; the result is provided to step 4);

4) Constructing the pedestrian re-identification network model:

connecting the depth measurement network of step 3) after the ResNet-50 network adjusted in step 2) to form the final network model of the invention;

2. Training

5) Preprocessing the pedestrian re-identification training data set: randomly cropping the images in the training data set to obtain a training data set of 224 × 224 images, randomly selecting P different pedestrians from the training data set, and randomly selecting K images for each pedestrian to form a small training batch;

6) Training the network model:

inputting the training data obtained in step 5) into the network model constructed in step 4), optimizing the network model by minimizing the Hard Triplet Loss function, and repeating this step until the loss value converges;

3. Identification

7) Re-identifying pedestrians: inputting the image of the pedestrian to be identified and the images in the candidate library into the network model optimized in step 6) to obtain their feature vectors in the same feature space;

8) Calculating the similarity between the image to be identified and every image in the candidate library, that is, calculating the Euclidean distance between the feature vector of the image to be identified and that of each candidate library image, the feature vectors being those obtained in step 7). The images in the candidate library are then sorted by distance from small to large; the higher an image ranks, the more similar it is to the image to be identified. Here, similar means that the two images show the same pedestrian. The first-ranked image is taken to be an image of the same pedestrian as the image to be identified.
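The ranking in step 8) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the patented implementation: the function name `rank_gallery` and the toy 2-dimensional features are hypothetical, and in the method itself the features would be the outputs of the trained network.

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Sort gallery entries by ascending Euclidean distance to the query.

    query_feat:    (D,) feature vector of the image to be identified
    gallery_feats: (N, D) feature vectors of the candidate library
    Returns (order, sorted_dists); order[0] is the best match.
    """
    query_feat = np.asarray(query_feat, dtype=np.float64)
    gallery_feats = np.asarray(gallery_feats, dtype=np.float64)
    # Euclidean distance from the query to every gallery feature.
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    # Smaller distance means more similar, so sort ascending.
    order = np.argsort(dists, kind="stable")
    return order, dists[order]

# Toy data (hypothetical): three gallery features and one query.
gallery = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]])
query = np.array([0.1, 0.1])
order, sorted_dists = rank_gallery(query, gallery)
print(order)  # [2 0 1]
```

The first index in `order` plays the role of the first-ranked image described in step 8).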

Further, the pre-training of the ResNet-50 network in step 1) uses dropout or Batch Normalization to optimize training, so that the ResNet-50 network is capable of extracting image features.

Further, adjusting the ResNet-50 network in step 2) means deleting the softmax layer and the last fully connected layer of the ResNet-50 network, so that the final output is a 2048-dimensional vector.

Further, regarding the depth measurement network of step 3):

step 3) is a key innovative step of the invention, and the depth measurement network module is one of its innovations. The module takes the 2048-dimensional feature vector as input and outputs a feature vector in Euclidean space after nonlinear projection. The depth measurement network structure is specifically as follows:

a Euclidean distance calculation layer is added after a neural network consisting of M nonlinear fully connected layers. The input to the first fully connected layer has dimension 2048, and the parameters of each layer are randomly initialized according to a formula that is not reproduced in this text,

where $1 \le m \le M$, $r^{(m)}$ is the depth of the m-th layer, and $r^{(0)} = 2048$; $W^{(m)}$ is the weight of the m-th layer, and the bias $b^{(m)} \in \mathbb{R}^{r^{(m)}}$ of each layer is initialized to the zero vector; M is the total number of fully connected layers in the depth measurement network and is a hyperparameter.
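The structure described above can be sketched as a small NumPy forward pass. This is an illustrative sketch only: the layer widths 2048 → 512 → 128 and the tanh activation are taken from the embodiment given later in this document, whereas the Gaussian initialization with standard deviation 0.01 is an assumption made here because the original initialization formula is not available. The names `init_metric_net` and `metric_forward` are hypothetical.

```python
import numpy as np

def init_metric_net(layer_dims, rng, scale=0.01):
    """Create (W, b) for each fully connected layer.

    layer_dims = [r0, r1, ..., rM]. Weights are drawn from a Gaussian
    (an assumed stand-in for the unspecified random initialization);
    biases are zero vectors, as stated in the text.
    """
    params = []
    for r_in, r_out in zip(layer_dims[:-1], layer_dims[1:]):
        W = rng.normal(0.0, scale, size=(r_out, r_in))
        b = np.zeros(r_out)
        params.append((W, b))
    return params

def metric_forward(params, feats):
    """Map (N, r0) backbone features to (N, rM) embeddings.

    Each layer applies an affine map followed by tanh.
    """
    h = np.asarray(feats, dtype=np.float64)
    for W, b in params:
        h = np.tanh(h @ W.T + b)
    return h

rng = np.random.default_rng(0)
params = init_metric_net([2048, 512, 128], rng)
feats = rng.normal(size=(4, 2048))   # stand-in for ResNet-50 outputs
emb = metric_forward(params, feats)
print(emb.shape)  # (4, 128)
```

Euclidean distances between rows of `emb` would then serve as the similarity measure between the corresponding images.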

Further, the step 4) of constructing a pedestrian re-identification network model specifically includes:

connecting the ResNet-50 network adjusted in the step 2) with the depth measurement network obtained in the step 3), namely inputting the output of the ResNet-50 network into the depth measurement network, and constructing the pedestrian re-identification network model.

Further, training the network model in step 6) specifically means that, in the new training set generated in step 5), P different pedestrians are randomly selected and K images are randomly selected for each pedestrian to form a small training batch; the training batch is input into the network for training, and the loss function is the Hard Triplet Loss, calculated as follows:

61) Acquiring the feature $r(x_a^i)$ of each sample in the training batch extracted by the ResNet-50 network ($1 \le i \le P$, $1 \le a \le K$), where $x_a^i$ denotes the a-th image of the i-th pedestrian in the training batch and $r(\cdot)$ denotes the output of the ResNet-50 network.

62) Acquiring the output $f(r(x_a^i))$ of each feature vector $r(x_a^i)$ through the depth measurement network, specifically calculated as follows:

$$h^{(0)} = r(x_a^i), \qquad h^{(m)} = \varphi\left(W^{(m)} h^{(m-1)} + b^{(m)}\right) \in \mathbb{R}^{r^{(m)}}, \qquad f(r(x_a^i)) = h^{(M)}$$

where $1 \le m \le M$ and $h^{(m)}$ is the output of the m-th layer in the depth measurement network; $\varphi(\cdot)$ is the nonlinear activation function; $f(\cdot)$ is the nonlinear mapping function parameterized by the depth measurement network; $b^{(m)}$ is the bias vector of the m-th layer in the depth measurement network; $W^{(m)}$ is the weight of the m-th layer in the depth measurement network; $r^{(m)}$ is the depth of the m-th layer of the depth measurement network, with $r^{(0)} = 2048$; $\mathbb{R}^{r^{(m)}}$ denotes the set of $r^{(m)}$-dimensional vectors in which every element is a real value; and $\mathbb{R}$ is the set of real numbers.

63) Calculating the loss function value:

$$L_{BH}(\theta; X) = \sum_{i=1}^{P} \sum_{a=1}^{K} \Big[ \sigma + \max_{p = 1, \dots, K} d_f\big(r(x_a^i), r(x_p^i)\big) - \min_{\substack{j = 1, \dots, P;\ j \ne i \\ n = 1, \dots, K}} d_f\big(r(x_a^i), r(x_n^j)\big) \Big]_+$$

where $x_a^i$ denotes the a-th image of the i-th pedestrian in the training batch, $r(\cdot)$ denotes the output of the ResNet-50 network, and P and K are respectively the number of different pedestrians in the batch and the number of images of each pedestrian; X denotes the input batch, $\sigma$ is the threshold, $\theta$ denotes the parameters of the network, $\varphi(\cdot)$ is the nonlinear activation function, $f(\cdot)$ is the nonlinear mapping function parameterized by the depth measurement network, and $[z]_+ = \max(z, 0)$; $d_f(p_1, p_2) = \lVert f(p_1) - f(p_2) \rVert_2$ denotes the depth measurement distance between $p_1$ and $p_2$, where $p_1$ and $p_2$ are vectors.

The loss function is then minimized by stochastic gradient descent, thereby updating and optimizing the corresponding parameters.
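The hard-sample selection behind the Hard Triplet Loss can be sketched in NumPy as follows. This is an assumption-laden illustration rather than the patent's own code: it follows the commonly used batch-hard formulation (for every anchor, the farthest same-identity sample and the nearest different-identity sample), sums over anchors rather than averaging, and uses hypothetical names (`batch_hard_triplet_loss`, `margin` for the threshold σ). The rows of `emb` stand for the depth measurement network outputs.

```python
import numpy as np

def pairwise_euclidean(emb):
    """Return the (N, N) matrix of Euclidean distances between rows."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def batch_hard_triplet_loss(emb, labels, margin):
    """Batch-hard triplet loss summed over all anchors in the batch.

    emb:    (N, D) embeddings of the P*K batch images
    labels: (N,) pedestrian identity of each image
    margin: threshold (sigma in the text)
    """
    emb = np.asarray(emb, dtype=np.float64)
    labels = np.asarray(labels)
    dist = pairwise_euclidean(emb)
    same = labels[:, None] == labels[None, :]
    # Hardest positive: largest distance to a sample of the same identity.
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    # Hardest negative: smallest distance to a sample of another identity.
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    return np.maximum(margin + hardest_pos - hardest_neg, 0.0).sum()

# Toy batch (hypothetical): P = 2 identities, K = 2 one-dimensional embeddings each.
emb = np.array([[0.0], [2.0], [1.0], [3.0]])
labels = np.array([0, 0, 1, 1])
loss = batch_hard_triplet_loss(emb, labels, margin=0.5)
print(loss)  # 6.0
```

In this toy batch every anchor has a hardest positive at distance 2 and a hardest negative at distance 1, so each of the four anchors contributes 0.5 + 2 - 1 = 1.5.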

Further, re-identifying pedestrians in step 7) specifically means inputting the image of the pedestrian to be identified and the images in the candidate library into the network to obtain the output $f(r(x))$ for each image x, where x denotes any image among the images to be identified and the candidate library images.

Further, in step 8), the distance between the pedestrian image to be identified and a comparison image is:

$$d_f(r(x), r(y)) = d(f(r(x)), f(r(y))) = \lVert f(r(x)) - f(r(y)) \rVert_2$$

where x denotes any image to be identified and y denotes any image in the candidate library; $r(\cdot)$ denotes the output of the ResNet-50 network; $f(\cdot)$ is the nonlinear mapping function parameterized by the depth measurement network; and $d_f(p_1, p_2)$ denotes the depth measurement distance between $p_1$ and $p_2$, where $p_1$ and $p_2$ are vectors. $r(x)$ and $r(y)$ are the feature vectors of the image to be identified and the comparison image respectively, and $f(r(x))$ and $f(r(y))$ are the feature vectors of the image to be identified and the comparison image in the same feature space, obtained through the nonlinear mapping of the depth measurement network. Accordingly, $d_f(r(x), r(y))$ is the depth measurement distance between the image x to be identified and any image y in the candidate library.

According to this technical scheme, feature extraction and metric learning are integrated in a unified framework so that both are optimized under a unified objective, which improves the accuracy of pedestrian re-identification.

Compared with the prior art, the invention has the following advantages:

1. By taking a strong network model trained on a large-scale image database and fine-tuning it on the pedestrian re-identification database, image features can be learned automatically by the network model, without complex preprocessing operations when extracting image features;

2. A multi-layer nonlinear feedforward neural network is used to learn a latent nonlinear mapping function that maps the image features extracted by ResNet-50 into a low-dimensional feature space, and the Euclidean distance between the mapped features in that space serves as the similarity measure of the images. Compared with the traditional Mahalanobis distance, the depth metric can capture nonlinear relationships between data points;

3. Feature extraction and metric learning are fused in one framework and optimized under a unified objective, so that the extracted features are better suited to the re-identification problem.
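The comparison with the Mahalanobis distance in advantage 2 can be made concrete with a short side-by-side, given here as an explanatory sketch rather than as part of the original text. The matrices $A$ and $L$ are notation assumed for this illustration: $A$ is a positive semidefinite matrix factored as $A = L^{\top} L$.

```latex
% Mahalanobis distance: Euclidean distance after a linear map L
d_A(p_1, p_2) = \sqrt{(p_1 - p_2)^{\top} A \, (p_1 - p_2)}
              = \lVert L p_1 - L p_2 \rVert_2 , \qquad A = L^{\top} L

% Depth measurement distance: Euclidean distance after the nonlinear map f
d_f(p_1, p_2) = \lVert f(p_1) - f(p_2) \rVert_2
```

Read this way, a Mahalanobis-type metric corresponds to the special case in which $f$ is a single linear layer with no bias and no nonlinearity, while the multi-layer nonlinear $f$ of the depth measurement network is what allows nonlinear relationships between data points to be captured.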

Drawings

FIG. 1 is a schematic flow chart of the method of the present invention.

FIG. 2 is a schematic diagram of the system structure of the present invention.

Detailed Description

The invention will now be described in detail with reference to the drawings and specific examples.

Examples:

To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the following examples, which are illustrated in the flowchart and block diagram of FIG. 1 and FIG. 2. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention.

Step one: pre-training the ResNet-50 network: taking the ImageNet data set as the training data set, one ResNet-50 network is trained so that its parameters are effectively initialized; effective initialization means that the network has acquired a certain ability to learn image features;

Step two: trimming the ResNet-50 network: the softmax layer and the last fully connected layer of the ResNet-50 network are deleted, so that the output of the trimmed network is a 2048-dimensional vector;

Step three: constructing the depth measurement network: 2 nonlinear fully connected layers are connected to form the depth measurement network, and a Euclidean distance calculation unit is added after its output; the depths of the two fully connected layers are 512 and 128 respectively, the activation function is the tanh function, and the network parameters are randomly initialized according to a formula that is not reproduced in this text,

where $1 \le m \le 2$, $r^{(0)} = 2048$, $r^{(1)} = 512$, $r^{(2)} = 128$, and the biases of both layers are initialized to zero vectors.

Step four: constructing the pedestrian re-identification network model, specifically by connecting the depth measurement network of step three after the ResNet-50 network trimmed in step two, that is, feeding the output of the ResNet-50 network into the depth measurement network.

Step five: preprocessing the pedestrian re-identification training data set: all images in the training data set are randomly cropped to a uniform size of 224 × 224, the order of the cropped training data set is shuffled, P = 25 pedestrians are randomly selected from the training data set, and K = 4 images are randomly selected for each pedestrian to form a small training batch;

Step six: training the pedestrian re-identification network model: the Hard Triplet Loss function is computed on the training data obtained in step five, the network parameters are updated by stochastic gradient descent, and this step is repeated until the loss function converges. The specific calculation is as follows:

first, the feature $r(x_a^i)$ extracted by the ResNet-50 network is obtained for each sample in the training batch, where $x_a^i$ denotes the a-th image of the i-th pedestrian in the training batch and $r(\cdot)$ denotes the output of the ResNet-50 network. Then the output $f(r(x_a^i))$ of each feature vector $r(x_a^i)$ through the depth measurement network is obtained:

$$h^{(0)} = r(x_a^i), \qquad h^{(m)} = \varphi\left(W^{(m)} h^{(m-1)} + b^{(m)}\right), \qquad f(r(x_a^i)) = h^{(M)}$$

where $1 \le m \le M$, $h^{(m)}$ is the output of the m-th layer in the depth measurement network, $\varphi(\cdot)$ is the nonlinear activation function, and $f(\cdot)$ is the nonlinear mapping function parameterized by the depth measurement network. Finally, the loss function value is calculated:

$$L_{BH}(\theta; X) = \sum_{i=1}^{P} \sum_{a=1}^{K} \Big[ \sigma + \max_{p = 1, \dots, K} d_f\big(r(x_a^i), r(x_p^i)\big) - \min_{\substack{j = 1, \dots, P;\ j \ne i \\ n = 1, \dots, K}} d_f\big(r(x_a^i), r(x_n^j)\big) \Big]_+$$

where $d_f(r(x_a^i), r(x_p^i))$ denotes the Euclidean distance between $f(r(x_a^i))$ and $f(r(x_p^i))$, and in general $d_f(p_1, p_2)$ denotes the depth measurement distance between $p_1$ and $p_2$, where $p_1$ and $p_2$ are vectors.

Step seven: re-identifying pedestrians: the images to be identified and the images in the candidate library are input into the trained network, and the output of the last fully connected layer of the depth measurement network is extracted to obtain the feature vectors of the pedestrian images in the same feature space.

Step eight: calculating the Euclidean distance between the feature vector of the pedestrian image to be identified and the feature vector of each candidate library image, and sorting by distance; the higher-ranked images are those of the same class as the image to be identified, where the same class means images of the same pedestrian.
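The batch construction of step five above (random cropping followed by selecting P pedestrians with K images each) can be sketched as follows. This is a minimal illustration under assumptions not stated in the source: images are NumPy arrays at least 224 pixels on each side, an identity with fewer than K images is sampled with replacement, and the function names `sample_pk_batch` and `random_crop` are hypothetical.

```python
import numpy as np

def sample_pk_batch(labels, P, K, rng):
    """Return indices of a batch with P identities and K images each."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    chosen_ids = rng.choice(ids, size=P, replace=False)
    batch = []
    for pid in chosen_ids:
        idx = np.flatnonzero(labels == pid)
        # Assumption: fall back to sampling with replacement only when
        # an identity has fewer than K images.
        batch.extend(rng.choice(idx, size=K, replace=len(idx) < K))
    return np.array(batch)

def random_crop(image, size, rng):
    """Cut a random size x size window out of an (H, W, C) image."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(30), 6)      # toy set: 30 pedestrians, 6 images each
batch_idx = sample_pk_batch(labels, P=25, K=4, rng=rng)
image = np.zeros((256, 256, 3), dtype=np.uint8)
crop = random_crop(image, 224, rng)
print(batch_idx.shape, crop.shape)  # (100,) (224, 224, 3)
```

With P = 25 and K = 4 as in the embodiment, each small training batch contains 100 images.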

Further described in conjunction with the drawings.

FIG. 1 is a flowchart of an algorithm implementation of the present invention, and the specific embodiment is as follows:

1. Pre-training the ResNet-50 network, using dropout or Batch Normalization to optimize training, so that the ResNet-50 network is capable of extracting image features;

2. Trimming the ResNet-50 network: deleting the softmax layer and the last fully connected layer of the ResNet-50 network, so that the output of the trimmed network is a 2048-dimensional vector;

3. Forming a depth measurement network from a plurality of nonlinear fully connected layers, adding a Euclidean distance calculation unit after the output of the depth measurement network, and randomly initializing the network parameters according to a formula that is not reproduced in this text, in which $W^{(m)}$ is the weight of the m-th layer and the bias $b^{(m)} \in \mathbb{R}^{r^{(m)}}$ of each layer is initialized to the zero vector;

4. Constructing the pedestrian re-identification network model: connecting the depth measurement network after the adjusted ResNet-50 network to form the final network model of the invention, as shown in FIG. 2;

5. Preprocessing the pedestrian re-identification training data: randomly cropping the images in the training data set to obtain a training data set of 224 × 224 images, randomly selecting P different pedestrians from the training data set, and randomly selecting K images for each pedestrian to form a small training batch;

6. Training the network model: optimizing the network of 4 by minimizing the Hard Triplet Loss function using the training data obtained in 5, and repeating this step until the loss value converges;

7. Re-identifying pedestrians: inputting the images of the pedestrians to be identified and the images in the candidate library into the optimized model to obtain their feature vectors in the same feature space;

8. Calculating the Euclidean distance between the feature vector of the sample to be identified and the feature vectors of the pedestrian image library;

9. Sorting the images in the candidate library in ascending order of distance; the rank-1 image is taken to be an image of the same pedestrian as the image to be identified.

Tables 1-3 are comparisons of performance of the algorithms of the embodiments of the present invention after operation with other algorithms.

Table 1. Performance comparison of the algorithm of the invention with other algorithms on the VIPeR pedestrian re-identification public data set

Method	rank-1	rank-10	rank-20
Ours	56.34	90.25	98.45
DDML	46.50	87.53	96.13
XQDA	40.50	80.42	91.03
KISSME	19.73	61.20	77.01
DML	29.73	71.20	86.01

Table 2. Performance comparison of the algorithm of the invention with other algorithms on the Market-1501 pedestrian re-identification public data set

Method	rank-1	mAP
Ours	73.8	89.4
DDML	32.6	57.4
DML	29.4	53.7
Gated	39.6	65.9
Pose	56.0	79.3
Scalable	68.8	82.2

Table 3. Performance comparison of the algorithm of the invention with other algorithms on the CUHK03 pedestrian re-identification public data set

Method	rank-1	rank-5	rank-10
Ours	75.5	90.6	98.4
DDML	56.8	87.3	90.2
XQDA	46.3	78.9	88.6
KISSME	11.7	33.3	48.0
DML	35.7	60.9	73.4
Re-ranking	64.0	86.4	93.7

The experimental results on three public pedestrian re-identification data sets show that the rank-1 value of the CMC curve and the mAP value of this embodiment are better than those of the other algorithms, which indicates that the embodiment achieves good pedestrian re-identification performance by constructing a network model based on depth measurement and adopting a triplet loss function with hard sample selection.
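For readers unfamiliar with the rank-k values reported in the tables, the following sketch shows one simple way such values can be computed from a query-to-gallery distance matrix. It is a simplified illustration, not the evaluation protocol used for the tables: it assumes a single-shot setting without the camera-based filtering that benchmark protocols often apply, and the function name `cmc_rank_k` and the toy data are hypothetical.

```python
import numpy as np

def cmc_rank_k(dist, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Fraction of queries whose first correct match appears within the top k.

    dist:        (Q, G) distances between query and gallery features
    query_ids:   (Q,) pedestrian identity of each query image
    gallery_ids: (G,) pedestrian identity of each gallery image
    """
    dist = np.asarray(dist, dtype=np.float64)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    # Rank the gallery for every query, nearest first.
    order = np.argsort(dist, axis=1, kind="stable")
    matches = gallery_ids[order] == query_ids[:, None]
    has_hit = matches.any(axis=1)
    first_hit = matches.argmax(axis=1)  # position of the first correct match
    return {k: float(np.mean(has_hit & (first_hit < k))) for k in ks}

# Toy data (hypothetical): 2 queries against a gallery of 3 images.
dist = np.array([[0.1, 0.5, 0.9],
                 [0.2, 0.6, 0.1]])
scores = cmc_rank_k(dist, query_ids=[1, 2], gallery_ids=[1, 2, 3], ks=(1, 2, 3))
print(scores)  # {1: 0.5, 2: 0.5, 3: 1.0}
```

In the toy data the first query is matched at rank 1 and the second only at rank 3, which gives a rank-1 score of 0.5 and a rank-3 score of 1.0.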

It is apparent that the above examples are given by way of illustration only and do not limit the embodiments. Other variations or modifications in different forms can be made by those of ordinary skill in the art on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom remain within the scope of protection of the invention.

Claims

1. A pedestrian re-identification method based on depth measurement, characterized by comprising the following steps:

1. Constructing a network

1) Pre-training the ResNet-50 network;

2) Adjusting the ResNet-50 network of step 1) by removing the softmax layer and the last fully connected layer of the ResNet-50 network; the result is provided to step 4);

4) Constructing a pedestrian re-identification network model, which comprises:

connecting the ResNet-50 network adjusted in step 2) with the depth measurement network obtained in step 3), that is, inputting the output of the ResNet-50 network into the depth measurement network, to construct the pedestrian re-identification network model;

2. Training

6) Training the network model;

3. Identification

8) Calculating the similarity between the image to be identified and every image in the candidate library, that is, calculating the Euclidean distance between the feature vector of the image to be identified and that of each candidate library image, the feature vectors being those obtained in step 7); then sorting the images in the candidate library by distance from small to large, wherein the higher an image ranks, the more similar it is to the image to be identified; wherein similar means that the two images show the same pedestrian.

2. The pedestrian re-identification method based on depth measurement according to claim 1, wherein the pre-training of the ResNet-50 network in step 1) uses dropout or Batch Normalization to optimize training, so that the ResNet-50 network is capable of extracting image features.

3. The pedestrian re-identification method based on depth measurement according to claim 1, wherein adjusting the ResNet-50 network in step 2) means deleting the softmax layer and the last fully connected layer of the ResNet-50 network, so that the final output is a 2048-dimensional vector.

4. The pedestrian re-identification method based on depth measurement according to claim 3, wherein the depth measurement network in step 3) takes the 2048-dimensional feature vector as input and outputs a feature vector in Euclidean space after nonlinear projection; the depth measurement network structure is specifically as follows:

a Euclidean distance calculation layer is added after a neural network formed by M nonlinear fully connected layers; the input to the first fully connected layer has dimension 2048, and the parameters of each layer are randomly initialized according to a formula that is not reproduced in this text,

in which $W^{(m)}$ is the weight of the m-th layer and the bias $b^{(m)}$ of each layer is initialized to the zero vector.

5. The pedestrian re-identification method based on depth measurement according to claim 1, wherein training the network model in step 6) specifically means that, in the new training set generated in step 5), P different pedestrians are randomly selected and K images are randomly selected for each pedestrian to form a small training batch; the training batch is input into the network for training, the loss function is the Hard Triplet Loss, and the calculation is as follows:

61) acquiring the feature $r(x_a^i)$ of each sample in the training batch extracted by the ResNet-50 network, where $x_a^i$ denotes the a-th image of the i-th pedestrian in the training batch and $r(\cdot)$ denotes the output of the ResNet-50 network;

62) acquiring the output $f(r(x_a^i))$ of each feature vector $r(x_a^i)$ through the depth measurement network:

$$h^{(0)} = r(x_a^i), \qquad h^{(m)} = \varphi\left(W^{(m)} h^{(m-1)} + b^{(m)}\right) \in \mathbb{R}^{r^{(m)}}, \qquad f(r(x_a^i)) = h^{(M)}$$

where $\varphi(\cdot)$ is the nonlinear activation function and $f(\cdot)$ is the nonlinear mapping function parameterized by the depth measurement network; $b^{(m)}$ is the bias vector of the m-th layer in the depth measurement network; $W^{(m)}$ is the weight of the m-th layer in the depth measurement network; $r^{(m)}$ is the depth of the m-th layer of the depth measurement network, with $r^{(0)} = 2048$; $\mathbb{R}^{r^{(m)}}$ denotes the set of $r^{(m)}$-dimensional vectors in which every element is a real value; $\mathbb{R}$ is the set of real numbers;

63) calculating the loss function value:

$$L_{BH}(\theta; X) = \sum_{i=1}^{P} \sum_{a=1}^{K} \Big[ \sigma + \max_{p = 1, \dots, K} d_f\big(r(x_a^i), r(x_p^i)\big) - \min_{\substack{j = 1, \dots, P;\ j \ne i \\ n = 1, \dots, K}} d_f\big(r(x_a^i), r(x_n^j)\big) \Big]_+$$

where $x_a^i$ denotes the a-th image of the i-th pedestrian in the training batch, $r(\cdot)$ denotes the output of the ResNet-50 network, and P and K are respectively the number of different pedestrians in the batch and the number of images of each pedestrian; X denotes the input batch, $\sigma$ is the threshold, $\theta$ denotes the parameters of the network, $\varphi(\cdot)$ is the nonlinear activation function, $f(\cdot)$ is the nonlinear mapping function parameterized by the depth measurement network, and $[z]_+ = \max(z, 0)$; $d_f(p_1, p_2)$ denotes the depth measurement distance between $p_1$ and $p_2$, where $p_1$ and $p_2$ are vectors; $L_{BH}(\theta; X)$ is the network loss value for a single training batch;

6. The pedestrian re-recognition method based on the depth measurement according to claim 1, wherein the pedestrian re-recognition in step 7) specifically refers to inputting the image of the pedestrian to be recognized and the image in the candidate library into a network, and obtaining an output f (r (x)) of each image x.

7. The pedestrian re-identification method based on depth measurement according to claim 1, wherein in step 8), the distance between the pedestrian image to be identified and a comparison image is:

$$d_f(r(x), r(y)) = d(f(r(x)), f(r(y))) = \lVert f(r(x)) - f(r(y)) \rVert_2$$

where x denotes any image to be identified, and y denotes any image in the candidate library; $r(\cdot)$ denotes the output of the ResNet-50 network; $f(\cdot)$ is the nonlinear mapping function parameterized by the depth measurement network;

$d_f(p_1, p_2)$ denotes the depth measurement distance between $p_1$ and $p_2$, where $p_1$ and $p_2$ are vectors; $r(x)$ and $r(y)$ are the feature vectors of the image to be identified and the comparison image respectively, $f(r(x))$ and $f(r(y))$ are the feature vectors of the image to be identified and the comparison image in the same feature space obtained through the nonlinear mapping of the depth measurement network, and $d_f(r(x), r(y))$ denotes the depth measurement distance between the image x to be identified and the image y in the candidate library.