CN109961051B

CN109961051B - Pedestrian re-identification method based on clustering and block feature extraction

Info

Publication number: CN109961051B
Application number: CN201910243050.5A
Authority: CN
Inventors: 熊炜; 冯川; 熊子婕; 杨荻椿; 李敏; 王娟; 曾春艳; 刘敏
Original assignee: Hubei University of Technology
Current assignee: Hubei University of Technology
Priority date: 2019-03-28
Filing date: 2019-03-28
Publication date: 2022-11-15
Anticipated expiration: 2039-03-28
Also published as: CN109961051A

Abstract

The invention discloses a pedestrian re-identification method based on clustering and block feature extraction, which comprises the following steps of (1) clustering images through K-means, respectively inputting the clustered images into a DCGAN network, respectively generating images, and expanding an original training set; (2) In deep learning, feature extraction is carried out on real data and generated data without labels through block feature extraction, meanwhile, the clustering labeling smooth normalization loss function (CLS) is adopted to carry out labeling training on the data, and rearrangement (Re-ranking) is adopted during testing to further improve the effect of pedestrian Re-identification. The invention combines the clustering labeling smooth normalization loss function and the block feature extraction method, solves the problems of limited pedestrian re-identification training data and label distribution, and simultaneously extracts the effective features of the image in a block mode.

Description

Pedestrian re-identification method based on clustering and block feature extraction

Technical Field

The invention belongs to the field of digital image processing and computer vision, relates to a pedestrian re-identification method, and particularly relates to a pedestrian re-identification method based on a generated countermeasure network and a convolutional neural network.

Background

Pedestrian re-identification is a technology for judging whether a specific pedestrian exists in an image or a video sequence by using a computer vision technology, and is widely considered as a sub-problem of image retrieval, namely, given a monitored pedestrian image, the pedestrian image under the cross-device is retrieved. The pedestrian re-identification has wide application prospects including pedestrian retrieval, pedestrian tracking, street event detection, pedestrian action and behavior analysis and the like.

In the field of computer vision, the object of pedestrian re-identification is to specify a pedestrian image, and identify the image of the pedestrian from the image library of the pedestrian under the visual angle of other existing non-overlapped cameras. In surveillance video, very high quality face pictures are often not available due to camera resolution and shooting angle. When face recognition fails, pedestrian re-recognition becomes a very important substitute technology. With the development of deep learning in recent years, pedestrian re-identification has taken a very great breakthrough. Deep learning has become a focus of research in the field of computer vision. Because the convolutional neural network has the capability of learning how to extract features, the convolutional neural network is more suitable for practical application in the engineering field than the traditional method.

Because a depth model needs a large amount of training data, but the acquisition of pedestrian re-identification data sets needs to manually perform target framing and ID calibration, which is a relatively high-cost data acquisition, a method for rapidly generating more pedestrian re-identification training data through GAN becomes a popular research direction.

Pedestrian re-identification is a difficult problem, and many challenges are faced in solving the problem. These challenges can be summarized in two categories: the first challenge is the need for large amounts of training data; the second challenge is a non-ideal scenario.

The pedestrian re-identification research mainly comprises the following steps: a feature representation based method and a method for generating a countermeasure network. The characteristic representation method is mainly used for researching and extracting identification characteristics with robustness to represent pedestrians, and the other way for improving the performance of the pedestrian re-identification model is to generate a new pedestrian image by means of a generation countermeasure network (GAN) so as to expand training data.

At present, the following defects mainly exist in the pedestrian re-identification:

(1) Limited training data;

from the current situation of collecting pedestrian re-recognition training data, the spatiotemporal distribution of the collected data relative to the real data is very limited and local. Meanwhile, the data size for pedestrian re-identification is also very small compared to other vision tasks. For example, in the case of the large-scale image recognition data set ImageNet, the training data of the large-scale image recognition data set ImageNet comprises 125 ten thousand pictures, and the pedestrian re-recognition data set currently frequently used for pedestrian re-recognition comprises only 3 ten thousand pictures of pedestrians.

The pedestrian re-identification training and data acquisition are difficult, and it is difficult for people to collect pedestrian data of cross time, cross climate and multiple scenes. In addition, privacy concerns also hinder data acquisition.

(2) The pedestrian re-identification data is difficult to label;

firstly, the labeling workload is huge, and the labeling cost is very large in terms of time and money. Secondly, the annotation itself is sometimes very difficult, and it is difficult to separate two people of similar age and appearance from each other in a video, wearing the same clothing.

(3) Non-ideal scenes;

the pedestrian identification method mainly comprises the steps that pedestrians are in different postures, complex backgrounds, different lighting conditions and different shooting visual angles exist, and great troubles are brought to the pedestrian re-identification. The pedestrian image has the problems of pedestrian misalignment, partial occlusion, low image quality and the like.

Disclosure of Invention

In order to solve the technical problem, the invention provides a pedestrian re-identification method based on generation of a countermeasure network and a convolutional neural network.

The technical scheme adopted by the invention is as follows: a pedestrian re-identification method based on clustering and block feature extraction is characterized by comprising the following steps:

step 1: collecting Pedestrian images under a monitoring camera to obtain a Pedestrian image library Pedestrian01, and carrying out k-means clustering on the Pedestrian 01;

step 2: respectively inputting the clustered images into a DCGAN network, and respectively generating unlabeled images;

and step 3: performing label distribution on the label-free image generated in the step 2 by adopting a clustering label smooth normalized loss function (CLS) to obtain a labeled generated image;

and 4, step 4: fusing the collected Pedestrian image library with the generated image obtained in the step 3, and expanding the collected Pedestrian image library to obtain a new Pedestrian image library Pedestrian02;

and 5: dividing the Peer 02 into p blocks horizontally, and inputting each block into a CNN network for feature extraction to obtain local features of the image;

step 6: performing combined training on the CNN network by adopting a clustering label smooth normalized loss function CLS and a cross entropy loss function in the training process;

and 7: during testing, re-ranking Re-ranking optimization is adopted, and a pedestrian Re-identification result is output.

Compared with the existing algorithm, the method has the remarkable advantages that:

(1) Aiming at the problems of difficult pedestrian re-identification training and data acquisition, the invention adopts a DCGAN network to expand a data set. Firstly, clustering images of an original data set by adopting a K-means method, dividing the images with similar characteristics into the same class, and then respectively sending the clustered K-class images into a DCGAN network to train so as to obtain K-class generated images. The clustering process makes the generated image more real and effective.

(2) Compared with the label smooth normalization, the cluster smooth normalization loss function (CLS) provided by the invention has more adaptability to the label of the generated sample because the class labels in different clusters are removed, and the classes in the same cluster are distributed with uniform probability, thereby avoiding over-concentration on a certain class and solving the problems of label distribution and over-smoothing.

(3) The invention mainly solves the problem of non-ideal scenes by detecting and matching human body parts and adopting a block feature extraction method, so that a network can learn more potential factors and the robustness is enhanced.

(4) The pedestrian re-identification method based on clustering and block feature extraction can be applied to re-identification of pedestrians in complex scenes, has the advantages of strong transportability, stable algorithm and high speed for scene change, can effectively solve the problems of small data set and label distribution, and has strong practicability.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

Detailed Description

In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.

The method of the invention is divided into two parts: (1) Clustering the images through K-means, respectively inputting the clustered images into a DCGAN network, respectively generating images, and expanding an original training set; (2) In deep learning, feature extraction is carried out on real data and generated data without labels through block feature extraction, meanwhile, the clustering labeling smooth normalization loss function (CLS) is adopted to carry out labeling training on the data, and rearrangement (Re-ranking) is adopted during testing to further improve the effect of Re-identification of the pedestrians. The method combines the clustering labeling smooth normalization loss function and the block feature extraction method, solves the problems of limited pedestrian re-identification training data and label distribution, and extracts the effective features of the image in a block mode.

Referring to fig. 1, the pedestrian re-identification method based on clustering and block feature extraction provided by the invention comprises the following steps:

step 1: collecting Pedestrian images under a monitoring camera to obtain a Pedestrian image library Peerstrian 01, and carrying out k-means clustering on the Peerstrian 01;

the specific implementation of the step 1 comprises the following substeps:

step 1.1: pedestrian images Pedestrian01 to ResNet50 network input to the monitoring camera adopt a cross entropy loss function

Training to obtain a feature extraction model; wherein y is the original actual output,

is the desired output;

step 1.2: inputting the training data set into the feature extraction model of step 1.1, and extracting the feature mapping vector x of the last convolutional layer _n ；

Step 1.3: randomly selecting K feature mapping objects, wherein each object represents an initial mean value of a cluster, also called cluster center mu _j (ii) a Wherein the value of K is a positive integer greater than 0;

step 1.4: for each feature map x, calculating x separately _n With cluster center μ _j Then obtaining the nearest cluster center, and determining the cluster mark of x, namely: c _i ＝argmin _j∈[1,m] ||x _n -μ _j | |, finally the sample x is classified into the corresponding cluster C _i,x ＝C _i U is x; wherein m represents the number of clusters;

step 1.5: updating the cluster center for each cluster C _i Computing new cluster centers

Mu's' _j ≠μ _j Update mu _j Is mu' _j Otherwise mu _j The change is not changed;

step 1.6: repeat step 1.4 and step 1.5 to μ _j And no updating, and finally dividing the feature mapping vector into clusters: c = { C ₁ ,C ₂ ,...,C _m }。

And 2, step: respectively inputting the clustered images into a DCGAN network, and respectively generating unlabeled images;

and 3, step 3: performing label distribution on the label-free image generated in the step 2 by adopting a clustering label smooth normalized loss function (CLS) to obtain a labeled generated image;

and 4, step 4: fusing the collected Pedestrian image library with the generated image obtained in the step 3, and expanding the collected Pedestrian image library to obtain a new Pedestrian image library Peerseries 02;

the specific implementation of the step 4 comprises the following substeps:

step 4.1: inputting the feature vectors of each cluster into a DCGAN network respectively, wherein the DCGAN network consists of a generative model (obtaining data distribution) and a distinguishing model D (predicting whether the input is real or generated in G); wherein G is a simple neural network, a feature vector is input, and then a graph is generated as an output; d is also a simple neural network, inputs an image and then generates a confidence;

and 4.2: using a loss function L _GAN Training the DCGAN network to obtain a generated image;

L _GAN ＝logD(x)+log(1-D(G(z)))

wherein D (x) is a confidence coefficient, the value is [0-1], and G (z) is a generated image;

step 4.3: all the generated images are blended together and fused with the collected Pedestrian image library pedistrian 01 to serve as a new Pedestrian image library pedistrian 02.

And 5: dividing the Peer 02 level into p blocks, and inputting each block into the CNN network for feature extraction to obtain the local features of the image;

the specific implementation of the step 5 comprises the following substeps:

step 5.1: inputting a generated new training set in a ResNet50 network, forming a 3D tensor T through forward convolution, and dividing the tensor T into p horizontal strips; wherein the value of p is a positive integer greater than 0;

step 5.2: sampling tensor T space into p horizontal strips through a pooling layer, and averaging all column vectors in the same strip into a single partial-level column vector g;

step 5.3: reducing the dimension of the vector g through a convolution layer with the kernel size of 1 multiplied by 1, and finally respectively inputting the column vector h after dimension reduction of each vector g into a classifier;

step 5.4: connecting p h to form a final descriptor of the input image, and obtaining local features of the image;

and step 5.5: during training, each classifier predicts the identity of an input image, and a multi-loss optimization strategy is adopted, so that the softmax classifiers of p classes are used for training, and the loss function uses the cross entropy loss function in the step 1.1;

and 6: performing combined training on the CNN network by adopting a clustering label smooth normalized loss function CLS and a cross entropy loss function in the training process;

the specific implementation of the step 6 comprises the following substeps:

step 6.1: constructing a clustering label smooth normalization loss function CLS;

in order to solve the problem of new data label distribution, the invention invents a cluster label smooth normalized loss function (CLS) to train the CNN model according to the similarity of cluster samples during training.

According to the invention, a cluster labeling smooth normalization loss function (CLS) is constructed according to the similarity of cluster samples, firstly, a label smooth regularization LSRO loss function of an abnormal value is given:

wherein p (K | x) represents the prediction probability that input x belongs to class K, where K is the number of sample classes; t is a distinguishing parameter, for a real training image, T =0; for the generated image, T =1;

wherein z is _i ，z _k Respectively generating non-normalized probabilities of the ith and the K images by using K clustering;

given by N _i Cluster of individual classes C _i The generated image x, i belongs to [1,m ]]，q _g (k | x) is the one-hot category labeling probability of the image x, and the expression is as follows:

for all k, 0-1 coding is used, first when k belongs to cluster C _i Let K =1, otherwise K =0, where K ∈ {1,2, 3.., K }, and then respectively match the number with the general classAnd dividing by K to obtain the normalized clustering class label of the generated sample as follows:

considering the valid correct (ground-route) distribution so that

Since the samples are from cluster C _i And C is _i Each class has similar characteristics, so uniform distribution is adopted

Indicating that the generated sample x belongs to C _i Probability of each class, in z _k,x Denotes the log probability of unnormalization, let N _i As a cluster C _i Total number of classes of (2), then network normalized output z' _k The expression of (a) is:

deriving partial prediction probability expressions

Q is to be _g '(k | x) and p' (k | x) are substituted into the LSRO loss function to obtain CLS as:

in contrast to the LSRO penalty function, the CLS is written as:

the expression of the true sample label q' (k) of the CLS obtained from the existing label smooth normalized LSR is:

wherein delta _k,t ＝q _g (k/x)；

It can be seen from the observation that epsilon =0.1 in LSR, when k ≠ t, the LSRO penalty function gives a given label to the generated sample

Epsilon is 1; in the cluster label smooth normalized loss function (CLS) of the present invention

Step 6.2: and (3) training the CNN model by using the combination of the clustering label smooth normalized loss function CLS and the cross entropy loss function.

Step 6.3: during testing, the collected Pedestrian image library Pedestrian01 is input into the CNN model trained in the step 6.2, and an initial result arrangement table is obtained.

The specific implementation of step 7 comprises the following substeps:

step 7.1: the invention reorders the pictures in the Pedestrian image Pedestrian01 to be detected by adopting the existing k-order derivative coding-based method, so that the identification result is improved. Our goal is to reorder the initial ordered list in step 6.3 so that more positive samples appear at the front of the list. First, k-th order neighbor (k-nn), i.e., the first k samples of the ordered list, is defined:

wherein

Mahalanobis distance of feature vectors 0 to k, p is the query image,k is the k-th order inverse coded feature vector and N () is the set. Then, k-order nearest neighbor (k-rnn) is defined

Wherein g is _i For the ith image in Pedestrian01,

is p and g _i Are all in the k-th neighborhood of each other. However, due to a range of variations in illumination, pose, view angle, etc., positive samples may be excluded from the k-nn list, and we therefore employ a more robust k-rnn set.

I.e. for a collection of scripts

Find their k-rnn set for each sample q in (1)

If the number of coincident samples reaches a certain condition, the coincident samples are merged

In this way, the original text is not

Positive samples in the setAnd bringing back again.

Step 7.2: calculating the Jaccard distance between the two images through the k-rnn set obtained in the step 7.1:

where p is the query image, g _i For the ith image in Pedestrian01, k is the k-th order reciprocal coded feature vector.

Step 7.3: and (4) rearranging the initial result obtained in the step (6.3) according to the Jaccard distance obtained in the step (7.2), and finally outputting a pedestrian re-identification result.

It should be understood that parts of the specification not set forth in detail are well within the prior art.

It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A pedestrian re-identification method based on clustering and block feature extraction is characterized by comprising the following steps:

and step 3: performing label distribution on the label-free image generated in the step 2 by adopting a clustering label smooth normalized loss function CLS to obtain a labeled generated image;

2. The pedestrian re-identification method based on clustering and block feature extraction as claimed in claim 1, wherein the specific implementation of step 1 comprises the following sub-steps:

step 1.1: inputting Pedestrian image libraries Pedestrian01 to ResNet50 network under a monitoring camera, and adopting a cross entropy loss function

is the desired output;

step 1.2: inputting the training data set into the feature extraction model in step 1.1, and extracting the feature mapping vector x of the last convolutional layer _n ；

Step 1.3: randomly selecting K feature mapping objects, wherein each object represents an initial mean value of a cluster, also called a cluster center mu _j (ii) a Wherein the value of K is a positive integer greater than 0;

step 1.4: for each feature map x, calculating x separately _n With cluster center μ _j Then obtaining the nearest cluster center, and determining the cluster mark of x, namely: c _i ＝argmin _j∈[1,m] ||x _n -μ _j | |, finally the sample x is sorted into the corresponding cluster C _i,x ＝C _i U is x; wherein m represents the number of clusters;

step 1.5: in update clusterHeart, for each cluster C _i Computing new cluster centers

3. The pedestrian re-identification method based on clustering and block feature extraction as claimed in claim 1, wherein the specific implementation of step 4 comprises the following sub-steps:

step 4.1: respectively inputting the feature vectors of each cluster into a DCGAN network, wherein the DCGAN network consists of a generation model G and a distinguishing model D; wherein G is a simple neural network, a feature vector is input, and then a graph is generated as an output; d is also a simple neural network, inputs an image and then generates a confidence;

step 4.2: using a loss function L _GAN Training the DCGAN network to obtain a generated image;

L _GAN ＝logD(x)+log(1-D(G(z)))

4. The pedestrian re-identification method based on clustering and block feature extraction as claimed in claim 1, wherein the concrete implementation of step 5 comprises the following sub-steps:

step 5.1: inputting a generated new Pedestrian image library Pedestrian02 in a ResNet50 network, forming a 3D tensor T through forward convolution, and horizontally dividing the tensor T into p horizontal strips; wherein the value of p is a positive integer greater than 0;

and step 5.2: sampling a tensor T space into p horizontal strips through a pooling layer, and averaging all column vectors in the same strip into a single partial-level column vector g;

step 5.5: during training, each classifier predicts the identity of the input image, and is trained with the softmax classifiers of p classes using the multi-loss optimization strategy, and the loss function uses the cross-entropy loss function in step 1.1.

5. The pedestrian re-identification method based on clustering and block feature extraction as claimed in claim 1, wherein the specific implementation of step 6 comprises the following sub-steps:

first the label smoothing regularization LSRO loss function for a given outlier:

wherein p (K | x) represents the prediction probability that input x belongs to class K, where K is the number of sample classes; t is a distinguishing parameter, for a collected pedestrian image, T =0; for the generated image, T =1;

wherein z is _i ，z _k Respectively using the non-normalized probability of the ith and the K images generated by the K clustering;

for all k, 0-1 coding is used, first when k belongs to cluster C _i When K =1, otherwise K =0, where K belongs to {1,2,3,.., K }, and then the numbers are divided by the total class K, respectively, to obtain a normalized clustering class label of the generated sample as:

with uniform distribution

Indicating that the generated sample x belongs to C _i Probability of each class, using z _k,x Denotes the log probability of unnormalization, let N _i As a cluster C _i Total number of classes of (2), then network normalized output z' _k The expression of (a) is:

deriving partial prediction probability expressions

in contrast to the LSRO penalty function, the CLS is written as:

the expression of the real sample label q' (k) of the CLS obtained from the existing label smooth normalized LSR is:

wherein delta _k,t ＝q _g (k/x), ε =0.1 in LSR, when k ≠ t, the LSRO loss function gives a label to the generated sample

Epsilon is 1;

step 6.2: training the CNN model by using a clustering label smooth normalization loss function CLS and a cross entropy loss function;

step 6.3: during testing, the collected Pedestrian image library Pedestrian01 is input into the trained CNN model in the step 6.2, and an initial result arrangement table is obtained.

6. The pedestrian re-identification method based on clustering and block feature extraction according to any one of claims 1 to 5, wherein the specific implementation of step 7 comprises the following sub-steps:

step 7.1: reordering the pictures in the acquired Pedestrian image Pedestrian01 to be detected by adopting a k-order derivative coding-based mode, so that the identification result is improved;

first, k-th order neighbor k-nn is defined, i.e. the first k samples of the ordered list:

wherein

The Mahalanobis distance from the feature vector 0 to k is obtained, p is the query image, k is the k-order reciprocal coded feature vector, and N () is a set;

next, k-order nearest neighbors k-rnn are defined:

wherein g is _i For the ith image in Pedestrian01,

is p and g _i A set of conditions that are all neighbors of k-th order of the counterpart; however, due to a series of changes in illumination, attitude, and view angle, positive samples may be excluded from the k-nn list, and therefore a more robust k-rnn set is used:

i.e. for a collection of scripts

Find their k-rnn set for each sample q in the set

If the number of coincident samples reaches a certain condition, the coincident samples are combined

In this way, the original place is not

Bringing back the positive samples in the set;

and 7.2: calculating the Jaccard distance between the two images through the k-rnn set obtained in the step 7.1:

where p is the query image, g _i The ith image in Pedestrian01 is shown, and k is a feature vector coded by an inverse k order;