CN111191587A - Pedestrian re-identification method and system - Google Patents

Pedestrian re-identification method and system

Info

Publication number
CN111191587A
Authority
CN
China
Prior art keywords
pedestrian
image
features
grained
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911393313.7A
Other languages
Chinese (zh)
Other versions
CN111191587B (en)
Inventor
王阳萍
李力
张衍
李宝文
党建武
王松
雍玖
杨景玉
金静
郭治成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou Jiaotong University
Original Assignee
Lanzhou Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhou Jiaotong University filed Critical Lanzhou Jiaotong University
Priority to CN201911393313.7A priority Critical patent/CN111191587B/en
Publication of CN111191587A publication Critical patent/CN111191587A/en
Application granted granted Critical
Publication of CN111191587B publication Critical patent/CN111191587B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a pedestrian re-identification method and system. The method comprises the steps of obtaining a bilinear convolutional neural network model; acquiring an input pedestrian image; extracting intermediate features of the input pedestrian image by using a backbone network; slicing the intermediate features to obtain block features; carrying out convolution and linear rectification operations on the block features by using the backbone network and a branch network to obtain a first pedestrian blocking fine-grained feature and a second pedestrian blocking fine-grained feature; fusing the first pedestrian blocking fine-grained feature and the second pedestrian blocking fine-grained feature by adopting a bilinear pooling layer to obtain a fusion feature; and determining the image in the database to be searched that shows the same pedestrian as the pedestrian image to be identified according to the distance between the fusion feature of the pedestrian image to be identified and the fusion features of the images in the database to be searched. The method and system solve the problems of pedestrian misalignment and the resulting low recognition rate of pedestrian re-identification in the prior art.

Description

Pedestrian re-identification method and system
Technical Field
The invention relates to the technical field of image processing, in particular to a pedestrian re-identification method and system.
Background
Pedestrian re-identification (Person Re-identification, abbreviated ReID) is a technique that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence, and is widely recognized as a sub-problem of image retrieval. Given a monitored pedestrian image, the task is to retrieve images of that pedestrian across devices. In surveillance video, high-quality face pictures are often unavailable because of camera resolution and shooting angle, so when face recognition fails, pedestrian re-identification becomes a very important substitute technology.
Traditional pedestrian features mainly include Color Histograms, Local Binary Patterns, and the like. The processing and recognition methods for pedestrian features mainly include similarity metric learning and convolutional neural networks, where similarity metric learning maps pedestrian features into a metric space. Commonly used metric learning methods in the traditional approach include KISSME (Keep It Simple and Straightforward Metric learning), Local Fisher Discriminant Analysis, Marginal Fisher Analysis, Large Margin Nearest Neighbor, Locally Adaptive Decision Functions, attribute consistency matching, and the like.
In 2014, convolutional neural networks were first used to address the pedestrian re-identification problem. To cope with insufficient training data in person re-identification, image pairs or triplets are often employed to compute the loss. In MR-BCNN, each input image is divided into three overlapping parts, the three parts are passed through part-specific convolutional networks, and the outputs are concatenated to form the final representation; the representations of two images are then compared using a cosine similarity measure. Ustinova et al. also introduced a bilinear CNN to learn a discriminative descriptor and a Histogram Loss to train the convolutional neural network. These three methods do not specifically address the misalignment problem, so the recognition rate of pedestrian re-identification is low.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method and a pedestrian re-identification system, which solve the problems of pedestrian misalignment and the resulting low recognition rate of pedestrian re-identification in the prior art.
In order to achieve the purpose, the invention provides the following scheme:
a pedestrian re-identification method, comprising:
acquiring a bilinear convolutional neural network model; the bilinear convolutional neural network model comprises a main network and a branch network which are formed by residual error networks;
acquiring an input pedestrian image; the input pedestrian image comprises a pedestrian image to be identified and an image in a database to be searched;
extracting the intermediate features of the input pedestrian image by using the backbone network of the bilinear convolutional neural network model; the intermediate feature is a multi-layer feature map output by a residual block in the backbone network;
carrying out slicing operation on the intermediate features to obtain block features of the input pedestrian image;
carrying out convolution and linear rectification operation on the block characteristic by utilizing a backbone network of the bilinear convolution neural network model to obtain a first pedestrian block fine-grained characteristic of the input pedestrian image; the first pedestrian blocking fine-grained characteristic is the characteristic of the whole pedestrian in the input pedestrian image;
carrying out convolution and linear rectification operation on the block characteristics by utilizing a branch network of the bilinear convolution neural network model to obtain second pedestrian block fine-grained characteristics of the input pedestrian image; the second pedestrian blocking fine-grained features are features of different positions of pedestrians in the input pedestrian image;
fusing the first pedestrian blocking fine-grained characteristic and the second pedestrian blocking fine-grained characteristic by adopting a bilinear pooling layer to obtain a fusion characteristic of the input pedestrian image; the fusion features are fusion fine-grained features of different positions of pedestrians;
and determining the image which is the same as the pedestrian image to be identified in the database to be searched according to the distance between the fusion feature of the pedestrian image to be identified and the fusion feature of the image in the database to be searched.
Optionally, the fusing the first pedestrian blocking fine-grained feature and the second pedestrian blocking fine-grained feature by using a bilinear pooling layer to obtain a fused feature of the input pedestrian image, and then further comprising:
performing self-adaptive pooling operation and convolution operation on the fusion features to obtain fusion features after dimensionality reduction;
using the loss function L = 0.6*L1 + 0.4*L2 to optimize the fusion features after dimensionality reduction; L1 is the loss value of the backbone network and L2 is the loss value of the branch network.
Optionally, the obtaining a bilinear convolutional neural network model further includes:
training the bilinear convolutional neural network model by using the PyTorch deep learning library.
Optionally, the obtaining a bilinear convolutional neural network model further includes:
and removing a logistic regression layer of the bilinear convolutional neural network model.
A pedestrian re-identification system comprising:
the model acquisition module is used for acquiring a bilinear convolutional neural network model; the bilinear convolutional neural network model comprises a main network and a branch network which are formed by residual error networks;
the image acquisition module is used for acquiring an input pedestrian image; the input pedestrian image comprises a pedestrian image to be identified and an image in a database to be searched;
the intermediate feature extraction module is used for extracting the intermediate features of the input pedestrian image by utilizing the trunk network of the bilinear convolutional neural network model; the intermediate feature is a multi-layer feature map output by a residual block in the backbone network;
the slicing module is used for carrying out slicing operation on the intermediate features to obtain the blocking features of the input pedestrian image;
the first pedestrian blocking fine-grained feature determination module is used for performing convolution and linear rectification operations on the blocking features by utilizing a backbone network of the bilinear convolutional neural network model to obtain first pedestrian blocking fine-grained features of the input pedestrian image; the first pedestrian blocking fine-grained characteristic is the characteristic of the whole pedestrian in the input pedestrian image;
the second pedestrian blocking fine-grained feature determination module is used for performing convolution and linear rectification operations on the blocking features by using the branch network of the bilinear convolutional neural network model to obtain second pedestrian blocking fine-grained features of the input pedestrian image; the second pedestrian blocking fine-grained features are features of different positions of pedestrians in the input pedestrian image;
the fusion characteristic determining module is used for fusing the first pedestrian block fine-grained characteristic and the second pedestrian block fine-grained characteristic by adopting a bilinear pooling layer to obtain a fusion characteristic of the input pedestrian image; the fusion features are fusion fine-grained features of different positions of pedestrians;
and the identification module is used for determining the image which is the same as the image of the pedestrian to be identified in the database to be searched according to the distance between the fusion feature of the image of the pedestrian to be identified and the fusion feature of the image in the database to be searched.
Optionally, the method further includes:
the dimensionality reduction module is used for carrying out self-adaptive pooling operation and convolution operation on the fusion features to obtain fusion features subjected to dimensionality reduction;
an optimization module for optimizing the fusion features after dimensionality reduction by using the loss function L = 0.6*L1 + 0.4*L2; L1 is the loss value of the backbone network and L2 is the loss value of the branch network.
Optionally, the method further includes:
and the training module is used for training the bilinear convolutional neural network model by using the PyTorch deep learning library.
Optionally, the method further includes:
and the logistic regression layer removing module is used for removing the logistic regression layer of the bilinear convolutional neural network model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a pedestrian re-identification method and a system, wherein a branch network is arranged on a main network of a bilinear convolutional neural network, and the main network is utilized to perform slicing operation on an image; filtering background information of the image through a slicing operation; and then, the main network and the branch network are utilized to fuse the sliced images to obtain fusion characteristics and fusion fine-grained characteristics at different positions, so that the representation form of the pedestrian information comprises more information, the dislocation problem is prevented, and the identification precision of the pedestrian re-identification is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
Fig. 1 is a schematic flow chart of a pedestrian re-identification method provided by the invention;
FIG. 2 is a schematic diagram of a bilinear convolutional neural network model in a pedestrian re-identification method according to the present invention;
fig. 3 is a schematic structural diagram of a bilinear convolutional neural network model in a pedestrian re-identification method provided by the present invention;
FIG. 4 is a schematic diagram of a Loss value and a top1 error curve in a bilinear convolutional neural network model training process in the pedestrian re-identification method provided by the present invention;
fig. 5 is a schematic diagram of a pedestrian re-identification system provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a pedestrian re-identification method and a pedestrian re-identification system, which solve the problems of pedestrian misalignment and the resulting low recognition rate of pedestrian re-identification in the prior art.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a schematic flow chart of a pedestrian re-identification method provided by the present invention, and as shown in fig. 1, the pedestrian re-identification method provided by the present invention includes:
s101, acquiring a bilinear convolutional neural network model; the bilinear convolutional neural network model comprises a main network and a branch network which are formed by residual error networks.
S102, acquiring an input pedestrian image; the input pedestrian image includes a pedestrian image to be recognized and an image in a library to be searched.
S103, extracting the intermediate features of the input pedestrian image by using the backbone network of the bilinear convolutional neural network model; the intermediate features are multi-layer feature maps output by a residual block in the backbone network.
S104, performing slicing operation on the intermediate features to obtain block features of the input pedestrian image;
s105, carrying out convolution and linear rectification operation on the block feature by using a backbone network of the bilinear convolution neural network model to obtain a first pedestrian block fine-grained feature of the input pedestrian image; and the first pedestrian blocking fine-grained characteristic is the integral characteristic of the pedestrians in the input pedestrian image.
S106, carrying out convolution and linear rectification operation on the block feature by using a branch network of the bilinear convolution neural network model to obtain a second pedestrian block fine-grained feature of the input pedestrian image; and the second pedestrian blocking fine-grained characteristic is a characteristic of different positions of the pedestrian in the input pedestrian image.
S107, fusing the first pedestrian block fine-grained feature and the second pedestrian block fine-grained feature by using a bilinear pooling layer to obtain a fusion feature of the input pedestrian image; the fusion features are fusion fine-grained features of different positions of the pedestrian.
S108, determining the image in the to-be-searched library, which is the same as the to-be-recognized pedestrian image, according to the distance between the fusion feature of the to-be-recognized pedestrian image and the fusion feature of the image in the to-be-searched library.
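As an illustration of step S108, the following minimal sketch ranks the library to be searched by the Euclidean distance between fusion features; the choice of Euclidean distance and the helper name retrieve are assumptions for illustration, since the embodiment only states that a feature distance is used.

    import torch

    def retrieve(query_feat: torch.Tensor, gallery_feats: torch.Tensor) -> torch.Tensor:
        """Rank gallery images by Euclidean distance to the query fusion feature.

        query_feat: (d,) fusion feature of the pedestrian image to be identified.
        gallery_feats: (n, d) fusion features of the images in the library to be searched.
        Returns gallery indices sorted from smallest to largest distance.
        """
        dists = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)  # (n,)
        return torch.argsort(dists)

    # The top-ranked index is taken as the gallery image showing the same pedestrian.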
Fig. 2 is a schematic diagram of the bilinear convolutional neural network model in the pedestrian re-identification method provided by the present invention. As shown in Fig. 2, the extracted features are sliced to further filter out background information from the image.
The bilinear convolutional neural network model is constructed by a residual error network (ResNet).
As another example, to reduce model complexity, ResNet50 is used as the backbone network of the bilinear convolutional neural network model. The specific number of layers of the backbone network is shown in Table 1: the channels are reduced and the number of residual blocks is reduced to three. The slicing operation is performed on the features output by the second-layer residual block, and the intermediate feature of each pedestrian is horizontally divided into 4 parts that serve as the input of the branch network, which also starts from the position of the slice.
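A minimal sketch of this slicing step, assuming the intermediate feature is a standard (batch, channels, height, width) tensor; the function name slice_features and the example shape are illustrative assumptions.

    import torch

    def slice_features(feat: torch.Tensor, num_parts: int = 4) -> list:
        """Horizontally divide an intermediate feature map into equal stripes.

        feat: (batch, channels, height, width) output of the second-layer residual block.
        Returns num_parts tensors of shape (batch, channels, height // num_parts, width).
        """
        return list(torch.chunk(feat, num_parts, dim=2))  # split along the height axis

    # Example: a (32, 512, 24, 8) feature map yields four (32, 512, 6, 8) block features.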
Fig. 3 is a schematic structural diagram of a bilinear convolutional neural network model in the pedestrian re-identification method provided by the present invention, and as shown in fig. 3, branch networks are respectively disposed above and below a main network in the bilinear convolutional neural network model, and the branch networks disposed above and below the main network have the same structure.
As another example, the specific number of layers of the backbone network is shown in Table 1. The convolution kernel of the first layer (Conv1) of the branch network is 3 × 3 with a stride of 2 × 2. The second layer (Conv2) first applies 3 × 3 max pooling with a stride of 2 × 2 and then convolves the input features with 3 residual blocks; each residual block consists of 3 convolution layers with 32 convolution kernels of size 1 × 1, 32 convolution kernels of size 3 × 3 and 64 convolution kernels of size 1 × 1, respectively. The third layer (Conv3) consists of 3 residual blocks, each consisting of 64 convolution kernels of size 1 × 1, 64 convolution kernels of size 3 × 3 and 128 convolution kernels of size 1 × 1, respectively. The fourth layer (Conv4) consists of 3 residual blocks, each consisting of 128 convolution kernels of size 1 × 1, 128 convolution kernels of size 3 × 3 and 256 convolution kernels of size 1 × 1, respectively. The fifth layer (Conv5) consists of 3 residual blocks, each consisting of 256 convolution kernels of size 1 × 1, 256 convolution kernels of size 3 × 3 and 512 convolution kernels of size 1 × 1, respectively. Feature fusion is then performed using compact bilinear pooling, after which the feature dimensionality is reduced using adaptive pooling and a 1 × 1 convolution. The number of 1 × 1 convolution kernels is determined by the number of pedestrian identities in the training set; this embodiment uses the Market-1501 dataset, so the number is 751. The dimensionality-reduced features are then classified using Softmax.
TABLE 1
(The Table 1 image, which lists the layer-by-layer configuration of the networks, is not reproduced in this text.)
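As described above, the two streams are fused with compact bilinear pooling. The sketch below shows one common realization of compact bilinear pooling via Count Sketch projections combined in the frequency domain; the output dimension, the global average pooling over spatial positions, and the class name CompactBilinearPooling are assumptions, since the embodiment does not specify these details.

    import torch

    class CompactBilinearPooling(torch.nn.Module):
        """Approximate bilinear (outer-product) pooling of two feature streams via Count Sketch + FFT."""

        def __init__(self, in_dim1: int, in_dim2: int, out_dim: int = 8192):
            super().__init__()
            self.out_dim = out_dim
            # Fixed random hash indices and signs for the two Count Sketch projections
            self.register_buffer("h1", torch.randint(out_dim, (in_dim1,)))
            self.register_buffer("s1", torch.randint(0, 2, (in_dim1,)).float() * 2 - 1)
            self.register_buffer("h2", torch.randint(out_dim, (in_dim2,)))
            self.register_buffer("s2", torch.randint(0, 2, (in_dim2,)).float() * 2 - 1)

        def _sketch(self, x, h, s):
            # x: (batch, channels) -> (batch, out_dim), scattering signed values into hash bins
            out = x.new_zeros(x.size(0), self.out_dim)
            out.index_add_(1, h, x * s)
            return out

        def forward(self, x1, x2):
            # Average over spatial positions so each stream is a (batch, channels) vector
            x1 = x1.flatten(2).mean(-1)
            x2 = x2.flatten(2).mean(-1)
            f1 = torch.fft.rfft(self._sketch(x1, self.h1, self.s1))
            f2 = torch.fft.rfft(self._sketch(x2, self.h2, self.s2))
            # Element-wise product in the frequency domain equals circular convolution of the sketches
            return torch.fft.irfft(f1 * f2, n=self.out_dim)

With two 512-channel streams, for example, CompactBilinearPooling(512, 512) would produce an 8192-dimensional fusion vector per image before the adaptive pooling and 1 × 1 convolution described above.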
In order to improve the identification precision, the fusion features are optimized, and the specific operation is as follows:
performing self-adaptive pooling operation and convolution operation on the fusion features to obtain fusion features after dimensionality reduction;
using the loss function L = 0.6*L1 + 0.4*L2 to optimize the fusion features after dimensionality reduction; L1 is the loss value of the backbone network and L2 is the loss value of the branch network.
As a specific example, the convolution operation is a 1 × 1 convolution, and the result is classified using the softmax classifier.
The specific determination process of the loss function L = 0.6*L1 + 0.4*L2 is as follows:
step 1: by using
Figure BDA0002345595870000082
Determining a loss value for each network; y is(i)Is a real label of the bilinear convolutional neural network model,
Figure BDA0002345595870000083
is the output value of the bilinear convolutional neural network model.
Step 2: according to the plot diagram of the Loss value and top1 error in fig. 4, the weights of the trunk network and the branch network are determined.
Step 3: determining the loss function L-0.6L1+0.4*L2
Further, in order to ensure the accuracy of the bilinear convolutional neural network model, the PyTorch deep learning library is used to train the bilinear convolutional neural network model.
As an embodiment, the bilinear convolutional neural network model is implemented and trained using the PyTorch deep learning library and optimized with stochastic gradient descent (SGD). The initial learning rate is set to 0.01 and divided by 10 after every 10 epochs, the batch size is set to 32, and a dropout layer with a keep probability of 0.5 is placed in front of each output layer. All training and test images are resized to 384 × 192. Each training image is normalized by subtracting its channel mean and is then fed to the network in random order for training.
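The hyper-parameters above could be wired up as in the following sketch; the momentum value and the fixed normalization statistics are assumptions not stated in the embodiment (which subtracts each image's own channel mean), and model stands for any implementation of the bilinear network.

    import torch
    from torch import nn, optim
    from torchvision import transforms

    def build_training_setup(model: nn.Module):
        """SGD optimizer, learning-rate schedule and preprocessing matching the described settings."""
        optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # momentum is an assumption
        scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # lr / 10 every 10 epochs
        transform = transforms.Compose([
            transforms.Resize((384, 192)),  # all training and test images set to 384 x 192
            transforms.ToTensor(),
            # assumed fixed statistics; the embodiment subtracts each image's channel mean instead
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])
        return optimizer, scheduler, transform

    # Training then iterates over shuffled batches of 32 images, with dropout (keep probability 0.5)
    # applied before each output layer inside the model.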
Further, after S101, the method further includes:
and removing a logistic regression layer of the bilinear convolutional neural network model.
In order to verify the accuracy of the identification of the pedestrian re-identification method provided by the invention, after S108, the method further includes:
the accuracy of the identification of the pedestrian re-identification method provided by the invention is verified by using a Cumulative matching Characteristic curve (CMC) and an Average Precision Average (MAP).
CMC treats pedestrian re-identification as a ranking problem and uses the probability of a successful first match in the gallery, denoted rank-1, i.e. the probability that the top-ranked picture in the ranking list is the correct result; the value is obtained by averaging over many experiments.
mAP is the mean of the Average Precision (AP) and is an evaluation criterion for picture retrieval in pedestrian re-identification. The formulas for AP and mAP are as follows:
AP = ( Σ_{k=1}^{n} P(k) · b(k) ) / N_rel
where k is the rank of a retrieved image, P(k) is the precision at rank k (the proportion of relevant images among the top k results), b(k) is 1 when the k-th retrieved image is relevant to the query and 0 otherwise, and N_rel is the number of images relevant to the query; and
mAP = ( Σ_{q=1}^{Q} AP(q) ) / Q
where Q is the number of query images.
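For reference, the two metrics can be computed as in the sketch below; the helper names and the boolean-match representation are assumptions for illustration.

    import numpy as np

    def average_precision(matches: np.ndarray) -> float:
        """AP for one query; `matches` is b(k) over the ranked gallery (True where the k-th result is relevant)."""
        if matches.sum() == 0:
            return 0.0
        ranks = np.arange(1, len(matches) + 1)
        precision_at_k = np.cumsum(matches) / ranks         # P(k)
        return float((precision_at_k * matches).sum() / matches.sum())

    def rank1_and_map(all_matches: list) -> tuple:
        """all_matches: one boolean match array per query, each sorted by feature distance."""
        rank1 = float(np.mean([m[0] for m in all_matches]))                    # CMC rank-1
        mean_ap = float(np.mean([average_precision(m) for m in all_matches]))  # mAP
        return rank1, mean_ap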
The present invention further provides a pedestrian re-identification system, fig. 5 is a schematic diagram of the pedestrian re-identification system provided by the present invention, and as shown in fig. 5, the pedestrian re-identification system includes: the system comprises a model acquisition module 501, an image acquisition module 502, an intermediate feature extraction module 503, a slicing module 504, a first pedestrian block fine-grained feature determination module 505, a second pedestrian block fine-grained feature determination module 506, a fusion feature determination module 507 and an identification module 508.
The model obtaining module 501 is configured to obtain a bilinear convolutional neural network model; the bilinear convolutional neural network model comprises a main network and a branch network which are formed by residual error networks.
The image obtaining module 502 is used for obtaining an input pedestrian image; the input pedestrian image includes a pedestrian image to be recognized and an image in a library to be searched.
The intermediate feature extraction module 503 is configured to extract an intermediate feature of the input pedestrian image by using a backbone network of the bilinear convolutional neural network model; the intermediate features are multi-layer feature maps output by a residual block in the backbone network.
The slicing module 504 is configured to perform a slicing operation on the intermediate features to obtain block features of the input pedestrian image.
The first pedestrian blocking fine-grained feature determination module 505 is configured to perform convolution and linear rectification operations on the blocking features by using a backbone network of the bilinear convolutional neural network model to obtain first pedestrian blocking fine-grained features of the input pedestrian image; and the first pedestrian blocking fine-grained characteristic is the integral characteristic of the pedestrians in the input pedestrian image.
The second pedestrian blocking fine-grained feature determination module 506 is configured to perform convolution and linear rectification operations on the blocking features by using the branch network of the bilinear convolutional neural network model to obtain second pedestrian blocking fine-grained features of the input pedestrian image; and the second pedestrian blocking fine-grained characteristic is a characteristic of different positions of the pedestrian in the input pedestrian image.
The fusion feature determination module 507 is configured to perform fusion on the first pedestrian blocking fine-grained feature and the second pedestrian blocking fine-grained feature by using a bilinear pooling layer to obtain a fusion feature of the input pedestrian image; the fusion features are fusion fine-grained features of different positions of the pedestrian.
The identification module 508 is configured to determine an image in the to-be-searched library that is the same as the to-be-recognized pedestrian image according to a distance between the fusion feature of the to-be-recognized pedestrian image and the fusion feature of the image in the to-be-searched library.
The pedestrian re-identification system further comprises: the system comprises a dimension reduction module and an optimization module.
And the dimension reduction module is used for carrying out self-adaptive pooling operation and convolution operation on the fusion features to obtain the fusion features after dimension reduction.
The optimization module is used for optimizing the fusion features after dimensionality reduction by using the loss function L = 0.6*L1 + 0.4*L2; L1 is the loss value of the backbone network and L2 is the loss value of the branch network.
As an embodiment, the pedestrian re-identification system further includes: and a training module.
And the training module is used for training the bilinear convolutional neural network model by using the PyTorch deep learning library.
As another embodiment, the pedestrian re-identification system further includes: and removing the logistic regression layer module.
And the logistic regression layer removing module is used for removing the logistic regression layer of the bilinear convolutional neural network model.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A pedestrian re-identification method is characterized by comprising the following steps:
acquiring a bilinear convolutional neural network model; the bilinear convolutional neural network model comprises a main network and a branch network which are formed by residual error networks;
acquiring an input pedestrian image; the input pedestrian image comprises a pedestrian image to be identified and an image in a database to be searched;
extracting the intermediate features of the input pedestrian image by using the backbone network of the bilinear convolutional neural network model; the intermediate feature is a multi-layer feature map output by a residual block in the backbone network;
carrying out slicing operation on the intermediate features to obtain block features of the input pedestrian image;
carrying out convolution and linear rectification operation on the block characteristic by utilizing a backbone network of the bilinear convolution neural network model to obtain a first pedestrian block fine-grained characteristic of the input pedestrian image; the first pedestrian blocking fine-grained characteristic is the characteristic of the whole pedestrian in the input pedestrian image;
carrying out convolution and linear rectification operation on the block characteristics by utilizing a branch network of the bilinear convolution neural network model to obtain second pedestrian block fine-grained characteristics of the input pedestrian image; the second pedestrian blocking fine-grained features are features of different positions of pedestrians in the input pedestrian image;
fusing the first pedestrian blocking fine-grained characteristic and the second pedestrian blocking fine-grained characteristic by adopting a bilinear pooling layer to obtain a fusion characteristic of the input pedestrian image; the fusion features are fusion fine-grained features of different positions of pedestrians;
and determining the image which is the same as the pedestrian image to be identified in the database to be searched according to the distance between the fusion feature of the pedestrian image to be identified and the fusion feature of the image in the database to be searched.
2. The method according to claim 1, wherein the fusing the first pedestrian blocking fine-grained feature and the second pedestrian blocking fine-grained feature by using a bilinear pooling layer to obtain a fused feature of the input pedestrian image, and then further comprising:
performing self-adaptive pooling operation and convolution operation on the fusion features to obtain fusion features after dimensionality reduction;
using the loss function L = 0.6*L1 + 0.4*L2 to optimize the fusion features after dimensionality reduction; L1 is the loss value of the backbone network and L2 is the loss value of the branch network.
3. The method according to claim 1, wherein the obtaining a bilinear convolutional neural network model further comprises:
training the bilinear convolutional neural network model by using the PyTorch deep learning library.
4. The method according to claim 1, wherein the obtaining a bilinear convolutional neural network model further comprises:
and removing a logistic regression layer of the bilinear convolutional neural network model.
5. A pedestrian re-identification system, comprising:
the model acquisition module is used for acquiring a bilinear convolutional neural network model; the bilinear convolutional neural network model comprises a main network and a branch network which are formed by residual error networks;
the image acquisition module is used for acquiring an input pedestrian image; the input pedestrian image comprises a pedestrian image to be identified and an image in a database to be searched;
the intermediate feature extraction module is used for extracting the intermediate features of the input pedestrian image by utilizing the trunk network of the bilinear convolutional neural network model; the intermediate feature is a multi-layer feature map output by a residual block in the backbone network;
the slicing module is used for carrying out slicing operation on the intermediate features to obtain the blocking features of the input pedestrian image;
the first pedestrian blocking fine-grained feature determination module is used for performing convolution and linear rectification operations on the blocking features by utilizing a backbone network of the bilinear convolutional neural network model to obtain first pedestrian blocking fine-grained features of the input pedestrian image; the first pedestrian blocking fine-grained characteristic is the characteristic of the whole pedestrian in the input pedestrian image;
the second pedestrian blocking fine-grained feature determination module is used for performing convolution and linear rectification operations on the blocking features by using the branch network of the bilinear convolutional neural network model to obtain second pedestrian blocking fine-grained features of the input pedestrian image; the second pedestrian blocking fine-grained features are features of different positions of pedestrians in the input pedestrian image;
the fusion characteristic determining module is used for fusing the first pedestrian block fine-grained characteristic and the second pedestrian block fine-grained characteristic by adopting a bilinear pooling layer to obtain a fusion characteristic of the input pedestrian image; the fusion features are fusion fine-grained features of different positions of pedestrians;
and the identification module is used for determining the image which is the same as the image of the pedestrian to be identified in the database to be searched according to the distance between the fusion feature of the image of the pedestrian to be identified and the fusion feature of the image in the database to be searched.
6. The pedestrian re-identification system according to claim 5, further comprising:
the dimensionality reduction module is used for carrying out self-adaptive pooling operation and convolution operation on the fusion features to obtain fusion features subjected to dimensionality reduction;
an optimization module for optimizing the fusion features after dimensionality reduction by using the loss function L = 0.6*L1 + 0.4*L2; L1 is the loss value of the backbone network and L2 is the loss value of the branch network.
7. The pedestrian re-identification system according to claim 5, further comprising:
and the training module is used for training the bilinear convolutional neural network model by using the PyTorch deep learning library.
8. The pedestrian re-identification system according to claim 5, further comprising:
and the logistic regression layer removing module is used for removing the logistic regression layer of the bilinear convolutional neural network model.
CN201911393313.7A 2019-12-30 2019-12-30 Pedestrian re-identification method and system Active CN111191587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911393313.7A CN111191587B (en) 2019-12-30 2019-12-30 Pedestrian re-identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911393313.7A CN111191587B (en) 2019-12-30 2019-12-30 Pedestrian re-identification method and system

Publications (2)

Publication Number Publication Date
CN111191587A true CN111191587A (en) 2020-05-22
CN111191587B CN111191587B (en) 2021-04-09

Family

ID=70707824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911393313.7A Active CN111191587B (en) 2019-12-30 2019-12-30 Pedestrian re-identification method and system

Country Status (1)

Country Link
CN (1) CN111191587B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139316A1 (en) * 2020-07-31 2021-07-15 平安科技(深圳)有限公司 Method and apparatus for establishing expression recognition model, and computer device and storage medium
CN114998964A (en) * 2022-06-02 2022-09-02 天津道简智创信息科技有限公司 Novel license quality detection method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018081135A1 (en) * 2016-10-25 2018-05-03 Vmaxx Inc. Point to set similarity comparison and deep feature learning for visual recognition
CN108681746A (en) * 2018-05-10 2018-10-19 北京迈格威科技有限公司 A kind of image-recognizing method, device, electronic equipment and computer-readable medium
CN108776807A (en) * 2018-05-18 2018-11-09 复旦大学 It is a kind of based on can the double branch neural networks of skip floor image thickness grain-size classification method
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN109614853A (en) * 2018-10-30 2019-04-12 国家新闻出版广电总局广播科学研究院 It is a kind of based on body structure divide bilinearity pedestrian identify network establishing method again
CN110097090A (en) * 2019-04-10 2019-08-06 东南大学 A kind of image fine granularity recognition methods based on multi-scale feature fusion
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN110378301A (en) * 2019-07-24 2019-10-25 北京中星微电子有限公司 Pedestrian recognition methods and system again
CN110647912A (en) * 2019-08-15 2020-01-03 深圳久凌软件技术有限公司 Fine-grained image recognition method and device, computer equipment and storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018081135A1 (en) * 2016-10-25 2018-05-03 Vmaxx Inc. Point to set similarity comparison and deep feature learning for visual recognition
CN108681746A (en) * 2018-05-10 2018-10-19 北京迈格威科技有限公司 A kind of image-recognizing method, device, electronic equipment and computer-readable medium
CN108776807A (en) * 2018-05-18 2018-11-09 复旦大学 It is a kind of based on can the double branch neural networks of skip floor image thickness grain-size classification method
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN109614853A (en) * 2018-10-30 2019-04-12 国家新闻出版广电总局广播科学研究院 It is a kind of based on body structure divide bilinearity pedestrian identify network establishing method again
CN110097090A (en) * 2019-04-10 2019-08-06 东南大学 A kind of image fine granularity recognition methods based on multi-scale feature fusion
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN110378301A (en) * 2019-07-24 2019-10-25 北京中星微电子有限公司 Pedestrian recognition methods and system again
CN110647912A (en) * 2019-08-15 2020-01-03 深圳久凌软件技术有限公司 Fine-grained image recognition method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EVGENIYA USTINOVA et al.: "Multi-region Bilinear Convolutional Neural Networks for Person Re-Identification", 2017 IEEE *
JIAN LIU et al.: "MULTI-PART COMPACT BILINEAR CNN FOR PERSON RE-IDENTIFICATION", ICIP 2017 *
罗会兰 et al.: "Image classification method based on a three-stream convolutional neural network model" (in Chinese), Journal of Jiangxi University of Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139316A1 (en) * 2020-07-31 2021-07-15 平安科技(深圳)有限公司 Method and apparatus for establishing expression recognition model, and computer device and storage medium
CN114998964A (en) * 2022-06-02 2022-09-02 天津道简智创信息科技有限公司 Novel license quality detection method
CN114998964B (en) * 2022-06-02 2023-04-18 天津道简智创信息科技有限公司 Novel license quality detection method

Also Published As

Publication number Publication date
CN111191587B (en) 2021-04-09

Similar Documents

Publication Publication Date Title
CN110941594B (en) Splitting method and device of video file, electronic equipment and storage medium
CN111177446B (en) Method for searching footprint image
CN109784197B (en) Pedestrian re-identification method based on hole convolution and attention mechanics learning mechanism
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN110046266B (en) Intelligent management method and device for photos
CN111445459B (en) Image defect detection method and system based on depth twin network
CN109190446A (en) Pedestrian's recognition methods again based on triple focused lost function
CN111178120B (en) Pest image detection method based on crop identification cascading technology
CN108268823A (en) Target recognition methods and device again
CN111191587B (en) Pedestrian re-identification method and system
CN115171165A (en) Pedestrian re-identification method and device with global features and step-type local features fused
CN110443174B (en) Pedestrian re-identification method based on decoupling self-adaptive discriminant feature learning
CN112633382A (en) Mutual-neighbor-based few-sample image classification method and system
CN108205657A (en) Method, storage medium and the mobile terminal of video lens segmentation
CN111881741A (en) License plate recognition method and device, computer equipment and computer-readable storage medium
CN112948612A (en) Human body cover generation method and device, electronic equipment and storage medium
CN111353504B (en) Source camera identification method based on image block diversity selection and residual prediction module
CN110348366B (en) Automatic optimal face searching method and device
CN109344720B (en) Emotional state detection method based on self-adaptive feature selection
CN112613474B (en) Pedestrian re-identification method and device
CN115966006A (en) Cross-age face recognition system based on deep learning model
CN115661618A (en) Training method of image quality evaluation model, image quality evaluation method and device
CN108229263B (en) Target object identification method and device and robot
CN111382685B (en) Scene recognition method and system based on deep learning
CN111813996B (en) Video searching method based on sampling parallelism of single frame and continuous multi-frame

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant