CN111582126A - Pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion
Pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion
- Publication number
- CN111582126A (application number CN202010360873.9A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- scale
- features
- network
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
- G06V40/25—Recognition of walking or running movements, e.g. gait recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a pedestrian re-identification method based on multi-scale pedestrian contour segmentation and fusion. First, the data are preprocessed; second, the global features of the image and the contour features of the pedestrian are extracted and fused; next, the pedestrian re-identification network is trained with a label-smoothing loss function to optimize the network parameters; finally, for the query set and candidate (gallery) set contained in the re-identification dataset, the Euclidean distance between the specified object in the query set and each object in the candidate set is computed, and the distances are sorted in ascending order to obtain the re-identification ranking. The method removes clothing-dependent features and instead learns the pedestrian's body contour, combined with global features, to identify the person. The invention can therefore re-identify pedestrians reliably whether or not they have changed clothes.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a pedestrian re-identification method based on multi-scale pedestrian contour segmentation and fusion.
Background
Pedestrian re-identification (Re-ID) is a computer-vision technique that determines whether a specific pedestrian appears in an image or video sequence; concretely, it identifies a pedestrian's identity across images captured by different cameras. Given a query image containing a target pedestrian, a Re-ID system searches a large gallery of pedestrian images for the same person, so it is widely regarded as a sub-problem of image retrieval: given a surveillance image of a pedestrian, images of that pedestrian from other devices are retrieved. The technique compensates for the visual limitations of fixed cameras, can be combined with pedestrian detection and tracking, and is widely applicable to video surveillance, security, and related fields. Re-ID attracts strong interest from both academia and industry because of this broad application potential, e.g. video surveillance and cross-camera tracking.
Re-ID has developed very rapidly in recent years, yet compared with face-recognition technology it has seen very few deployed applications. Existing Re-ID models are in fact not good enough: accuracy on benchmark datasets is still not high, Re-ID scenes are more complex than face tasks, and several fundamental problems remain unsolved. Re-ID remains a very challenging task due to many uncontrolled sources of variation, such as significant changes in pose and viewpoint, complex variations in illumination, and poor image quality.
The most common and pressing of these problems - occlusion, poor lighting, and pedestrians changing their clothing - can render almost all existing Re-ID models largely, if not entirely, ineffective.
Disclosure of Invention
Aiming at the above problems and defects of the prior art, in particular the weakness of existing pedestrian re-identification technology in identifying clothes-changing pedestrians (re-identifying a pedestrian after a change of clothing), a pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion is provided.
The technical scheme adopted by the invention to solve this problem is as follows:
step (1), data preprocessing
Acquire a sufficient number of sample images and normalize them to obtain a data set.
Step (2), extracting the global features of the image and the contour features of the pedestrian
Inputting the data set into a pedestrian global feature extraction network to obtain global features of the image;
inputting the data set into a multi-scale pedestrian contour segmentation network to obtain the contour characteristics of the pedestrian;
the multi-scale pedestrian contour segmentation network adopts ResNet pre-trained on ImageNet as a main feature extraction network, and on the basis of the network, a new residual block is added for multi-scale feature learning, and the new residual block uses hole convolution to replace common convolution;
and the top of the new residual block is subjected to pyramid pooling by adopting a cavity space which can obtain the information of the human body outline dimensions of different pedestrians.
Step (3), inputting the global features and the contour features into the pedestrian re-identification network for fusion.
Step (4), training the pedestrian re-identification network with a label-smoothing loss function until the network parameters are optimal, specifically:
A pre-training network is obtained by training Inception-ResNet-v2 on the ImageNet database; the feature vector generated by fusing the global and contour features is fed into the label-smoothing loss function, and the parameters of the pedestrian re-identification network are trained by back-propagation until the whole network converges.
Step (5), for the query set and candidate set contained in the pedestrian re-identification data set, compute the Euclidean distance between the specified object in the query set and each object in the candidate set, then sort the computed distances in ascending order to obtain the pedestrian re-identification ranking.
Further, the preprocessing in step (1) is specifically: set an input image size; if a sample image is larger than this size, obtain the sample by random cropping; if it is smaller, enlarge it proportionally and then crop.
Further, the dilated convolution in the new residual block controls the feature-map resolution within the deep convolutional neural network and enlarges the receptive field of the convolution kernel to obtain multi-scale information; each dilated convolution uses a different dilation rate to capture multi-scale context information.
Further, the atrous spatial pyramid pooling uses dilated convolutions with different dilation rates to classify regions of arbitrary scale.
Further, the atrous spatial pyramid pooling comprises two parts: multi-scale dilated convolution and image-level features;
the multi-scale dilated convolutions comprise a 1x1 ordinary convolution and 3x3 dilated convolutions with dilation rates of 6, 12 and 18;
the image-level features are obtained by averaging the input over the spatial (height and width) dimensions, applying an ordinary convolution, and resizing back to the input image size by bilinear interpolation; finally the four convolution branches are concatenated with the image-level features, and the output of the network is obtained through a further convolution.
Further, step (3) fuses the global features and the contour features by element-wise (point-wise) addition.
Further, in step (3), when the two features have different dimensions, they are converted into vectors of the same dimension by a linear transformation.
The invention has the beneficial effects that:
1. The influence of the background is removed from the Re-ID process, and a person is identified by the pedestrian's body contour, which is closest to how humans themselves recognize pedestrians.
2. Clothing-dependent features are removed, which addresses the weakness of existing re-identification technology on clothes-changing pedestrians: the network does not rely on clothing features, and it also learns the pedestrian's body contour to identify the person. The two branches of the proposed method learn both the global features and the body-contour features of pedestrians, so the re-identification system performs well whether or not the pedestrian's clothing has been changed.
Drawings
FIG. 1 is a general block diagram according to the present invention;
FIG. 2 is a network architecture diagram of a multi-scale pedestrian contour segmentation network branch in accordance with the present invention;
fig. 3 is a block diagram of a dual-branch re-identification network according to the present invention.
Detailed Description
To describe the present invention more specifically, the technical solution is detailed below with reference to the accompanying drawings; the flow of an embodiment of the method is shown in fig. 1. The pedestrian re-identification method based on pedestrian contour segmentation comprises the following steps:
Step (1), acquire a sufficient number of pedestrian sample images; the images can be downloaded from public datasets (Market-1501, DukeMTMC-reID, CUHK03) or captured by the user. Normalize the pedestrian sample images: taking an input size of 512 x 512 as an example, if a sample image is larger than this size it is randomly cropped, and if it is smaller it is enlarged proportionally and then cropped.
Step (2), extracting the global features of the image and the contour features of the pedestrian
Inputting the data set into a pedestrian global feature extraction network to obtain global features of the image;
inputting the data set into a multi-scale pedestrian contour segmentation network to obtain the contour characteristics of the pedestrian;
the two branches can learn the global features of the image and can also well learn the human body contour features of the pedestrians. The two branches are effective to the defect of the existing pedestrian re-identification technology in the identification of the clothes-changing pedestrians, because the network does not depend on the clothes characteristics on the clothes, and the contour of the human body of the pedestrian is learned to identify the pedestrian. For the pedestrian re-identification system, the pedestrian garment can be well re-identified no matter whether the garment is replaced or not.
As shown in fig. 2, the multi-scale pedestrian contour segmentation network learns multi-scale contextual features. It uses a ResNet pre-trained on ImageNet as the backbone feature extractor, and on this basis adds a new residual block for multi-scale feature learning, in which dilated convolution 301 replaces ordinary convolution. Dilated convolution controls the feature-map resolution within the deep convolutional neural network and enlarges the receptive field of the convolution kernel to obtain multi-scale information.
In addition, each dilated convolution inside this residual block uses a different dilation rate to capture multi-scale context, and atrous spatial pyramid pooling 302 is applied at the top of the block. Atrous spatial pyramid pooling uses dilated convolutions with different dilation rates to classify regions of arbitrary scale, so this structure obtains pedestrian body-contour information at different scales.
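The receptive-field effect of dilated convolution can be illustrated with a small 1-D sketch (the helper below is illustrative, not the patent's implementation): a k-tap kernel with dilation rate d covers (k-1)*d + 1 input samples, which is why dilation rates 6, 12 and 18 see progressively larger context at no extra parameter cost.

```python
import numpy as np

def dilated_conv1d(x, w, d):
    """1-D 'same' dilated convolution: kernel tap i is read at offset i*d,
    so a k-tap kernel spans (k-1)*d + 1 input samples."""
    k = len(w)
    pad = (k - 1) * d // 2
    xp = np.pad(x, pad)
    return np.array([sum(w[i] * xp[t + i * d] for i in range(k))
                     for t in range(len(x))], dtype=float)

# An impulse makes the enlarged receptive field visible: with a 3-tap
# kernel and dilation 2 the response spans (3 - 1) * 2 + 1 = 5 positions.
x = np.zeros(11)
x[5] = 1.0
y = dilated_conv1d(x, [1.0, 1.0, 1.0], d=2)
```

The three nonzero responses sit at indices 3, 5 and 7: the same three weights now reach across five input positions instead of three.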
The atrous spatial pyramid pooling comprises two parts: multi-scale dilated convolution and image-level features. The multi-scale dilated convolutions comprise a 1x1 ordinary convolution and 3x3 dilated convolutions with dilation rates of 6, 12 and 18. The image-level features are obtained by averaging the input over the spatial (height and width) dimensions, applying an ordinary convolution, and resizing back to the input image size by bilinear interpolation; the four convolution branches are then concatenated with the image-level features, and one final convolution produces the output of the network. The network outputs a pixel-wise softmax, namely:

p_k(x) = exp(a_k(x)) / Σ_{k'=1}^{K} exp(a_{k'}(x))

where x is a pixel position on the two-dimensional plane, a_k(x) is the value of the k-th of K channels at pixel x in the last output layer of the network, and p_k(x) is the probability that pixel x belongs to class k.
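A minimal sketch of this pixel-wise softmax (the function name and channels-first array layout are illustrative, not from the patent):

```python
import numpy as np

def pixelwise_softmax(a):
    """a: logits of shape (K, H, W); returns p of the same shape with
    p[k, i, j] = exp(a[k, i, j]) / sum over k' of exp(a[k', i, j])."""
    a = a - a.max(axis=0, keepdims=True)  # subtract per-pixel max for stability
    e = np.exp(a)
    return e / e.sum(axis=0, keepdims=True)

p = pixelwise_softmax(np.zeros((4, 2, 3)))  # uniform logits -> uniform classes
```

Each spatial position gets its own probability distribution over the K classes, which is what turns the segmentation head's feature map into a per-pixel contour/background labelling.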
Meanwhile, the multi-scale pedestrian contour segmentation branch is pre-trained as a contour segmentation model on the abundant segmentation labels of the COCO dataset, so that in the proposed method a pedestrian image fed into this branch yields a pedestrian contour map.
Step (3), the pedestrian global feature extraction network and the multi-scale pedestrian contour segmentation network are finally fused through the structure shown in fig. 3. The network architecture shown in fig. 3 is prior art and is not described in detail. Both branches are trained with Inception-ResNet-v2 as the backbone network, and a pre-training network is obtained by training Inception-ResNet-v2 on the ImageNet database. Because Inception-ResNet-v2 itself fuses features of different scales, adopting it as the backbone allows features of different sizes to be fused with the multi-scale contour segmentation branch, gives a better correspondence between the two branches, and improves accuracy.
Inception-ResNet-v2 replaces an n x n convolution with a 1 x n convolution followed by an n x 1 convolution, and replaces 5x5 and 7x7 convolutions with stacks of 3x3 convolutions, which effectively reduces computation and speeds up the fused re-identification network relative to multi-scale contour segmentation alone. Inception-ResNet-v2 further combines the ResNet and Inception structures, and since ResNet is likewise adopted in the contour segmentation branch, accuracy improves accordingly.
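The computation savings claimed for these factorizations can be checked with simple per-filter weight counts (single input/output channel, bias ignored - an illustrative simplification):

```python
# Weights per filter for the factorizations described above.
full_7x7  = 7 * 7          # one 7x7 convolution: 49 weights
stack_3x3 = 3 * (3 * 3)    # three stacked 3x3 convs cover the same 7x7
                           # receptive field with 27 weights
factored  = 1 * 7 + 7 * 1  # a 1x7 conv followed by a 7x1 conv: 14 weights
```

Two stacked 3x3 convolutions see a 5x5 field and three see a 7x7 field, so both substitutions preserve the receptive field while cutting the weight count roughly in half or better.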
A pre-training network is obtained by training Inception-ResNet-v2 on the ImageNet database; the global features and the contour features are then fused by element-wise addition to obtain a feature vector. The feature vector is fed into the cross-entropy loss function, and the parameters of the defined multi-scale contour segmentation and pedestrian re-identification network are trained with the back-propagation algorithm until the network model's parameters are optimal.
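The point-wise-addition fusion, together with the linear projection used when the two branch dimensions differ (claim 7), can be sketched as follows; the fixed random projection matrix stands in for a learned linear map and is purely illustrative.

```python
import numpy as np

def fuse(global_feat, contour_feat, proj=None):
    """Fuse the two branch outputs by element-wise addition; if the
    dimensions differ, first map the contour feature to the global
    feature's dimension with a linear projection."""
    if global_feat.shape != contour_feat.shape:
        if proj is None:
            rng = np.random.default_rng(0)  # stand-in for a learned matrix
            proj = rng.standard_normal((contour_feat.shape[0],
                                        global_feat.shape[0]))
        contour_feat = contour_feat @ proj  # project to matching dimension
    return global_feat + contour_feat
```

Element-wise addition keeps the fused vector the same size as each branch's output, unlike concatenation, which would double it.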
Step (4), model training adopts a label-smoothing loss. Classification for pedestrian re-identification is usually trained with the cross-entropy loss function:

L = -Σ_{i=1}^{N} q_i log p_i

where N is the total number of pedestrian identities. When an image with pedestrian label y is input, q_i is 1 if i equals y and 0 otherwise, and p_i is the probability the network assigns to identity i. The label-smoothing loss function is introduced because plain cross-entropy depends too heavily on the correct pedestrian label, which easily causes overfitting during training; moreover, a small number of wrong labels may exist in the pedestrian training samples and would distort the prediction to some extent, and label smoothing also prevents the model from over-relying on the labels. The label smoothing therefore sets an error rate ε for the labels during training and trains with 1 - ε as the target for the true class (distributing ε over the remaining classes).
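Label smoothing as described above can be sketched as follows, assuming the standard formulation in which the non-target classes share the error rate ε evenly (the ε/(N-1) split and the helper name are assumptions, not stated in the patent):

```python
import numpy as np

def label_smoothing_ce(logits, y, eps=0.1):
    """Cross-entropy against smoothed targets: the true class y gets
    probability 1 - eps, every other class gets eps / (N - 1)."""
    n = logits.shape[0]
    p = np.exp(logits - logits.max())  # stable softmax
    p /= p.sum()
    q = np.full(n, eps / (n - 1))      # smoothed target distribution
    q[y] = 1.0 - eps
    return float(-(q * np.log(p)).sum())
```

With eps=0 this reduces to the plain cross-entropy of step (3); a nonzero eps penalizes over-confident predictions on the labelled class.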
Step (5), testing results
For the query set and candidate set contained in the pedestrian re-identification data set, compute the Euclidean distance between the specified object in the query set and each object in the candidate set, then sort the computed distances in ascending order to obtain the pedestrian re-identification ranking result.
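The ascending-distance ranking of step (5) takes only a few lines (illustrative helper, operating on per-image feature vectors):

```python
import numpy as np

def rank_gallery(query, gallery):
    """Rank gallery features by ascending Euclidean distance to one
    query feature, as in step (5)."""
    dists = np.linalg.norm(gallery - query, axis=1)  # distance to each candidate
    order = np.argsort(dists, kind="stable")         # ascending sort
    return order, dists[order]

order, sorted_d = rank_gallery(np.array([0.0, 0.0]),
                               np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]))
```

The first index in `order` is the candidate the system considers most likely to be the queried pedestrian.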
While the invention has been described in connection with specific embodiments thereof, it will be understood that these should not be construed as limiting the scope of the invention, which is defined in the following claims, and any variations which fall within the scope of the claims are intended to be embraced thereby.
Claims (7)
1. The pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion is characterized by comprising the following steps of:
step (1), data preprocessing
Acquiring a sufficient number of sample images, and carrying out normalization processing on the sample images to obtain a data set;
step (2), extracting the global features of the image and the contour features of the pedestrian
Inputting the data set into a pedestrian global feature extraction network to obtain global features of the image;
inputting the data set into a multi-scale pedestrian contour segmentation network to obtain the contour characteristics of the pedestrian;
the multi-scale pedestrian contour segmentation network adopts a ResNet pre-trained on ImageNet as the backbone feature extraction network; on this basis, a new residual block is added for multi-scale feature learning, and in the new residual block dilated (atrous) convolution replaces ordinary convolution;
the top of the new residual block applies atrous spatial pyramid pooling, which obtains pedestrian body-contour information at different scales;
inputting the global features and the contour features into a pedestrian re-identification network for fusion;
step (4), training the pedestrian re-identification network with a label-smoothing loss function until the network parameters are optimal, specifically:
a pre-training network is obtained by training Inception-ResNet-v2 on the ImageNet database; the feature vector generated by fusing the global and contour features is fed into the label-smoothing loss function, and the parameters of the pedestrian re-identification network are trained by back-propagation until the whole network converges;
and step (5), for the query set and candidate set contained in the pedestrian re-identification data set, computing the Euclidean distance between the specified object in the query set and each object in the candidate set, then sorting the computed distances in ascending order to obtain the pedestrian re-identification ranking.
2. The pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion according to claim 1, characterized in that the preprocessing in step (1) is specifically: set an input image size; if a sample image is larger than this size, obtain the sample by random cropping; if it is smaller, enlarge it proportionally and then crop.
3. The pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion according to claim 1, characterized in that the dilated convolution in the new residual block controls the feature-map resolution within the deep convolutional neural network and enlarges the receptive field of the convolution kernel to obtain multi-scale information, and each dilated convolution uses a different dilation rate to capture multi-scale context information.
4. The pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion according to claim 1, characterized in that the atrous spatial pyramid pooling uses dilated convolutions with different dilation rates to classify regions of arbitrary scale.
5. The pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion as claimed in claim 4, characterized in that the atrous spatial pyramid pooling comprises two parts: multi-scale dilated convolution and image-level features;
the multi-scale dilated convolutions comprise a 1x1 ordinary convolution and 3x3 dilated convolutions with dilation rates of 6, 12 and 18;
the image-level features are obtained by averaging the input over the spatial (height and width) dimensions, applying an ordinary convolution, and resizing back to the input image size by bilinear interpolation; finally the four convolution branches are concatenated with the image-level features, and the output of the network is obtained through a further convolution.
6. The pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion according to claim 1, characterized in that step (3) fuses the global features and the contour features by element-wise (point-wise) addition.
7. The pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion as claimed in claim 6, characterized in that when the two features in step (3) have different dimensions, they are converted into vectors of the same dimension by a linear transformation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010360873.9A CN111582126B (en) | 2020-04-30 | 2020-04-30 | Pedestrian re-recognition method based on multi-scale pedestrian contour segmentation fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111582126A true CN111582126A (en) | 2020-08-25 |
CN111582126B CN111582126B (en) | 2024-02-27 |
Family
ID=72114476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010360873.9A Active CN111582126B (en) | 2020-04-30 | 2020-04-30 | Pedestrian re-recognition method based on multi-scale pedestrian contour segmentation fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111582126B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255630A (en) * | 2021-07-15 | 2021-08-13 | 浙江大华技术股份有限公司 | Moving target recognition training method, moving target recognition method and device |
CN114626470A (en) * | 2022-03-18 | 2022-06-14 | 南京航空航天大学深圳研究院 | Aircraft skin key feature detection method based on multi-type geometric feature operator |
CN114758362A (en) * | 2022-06-15 | 2022-07-15 | 山东省人工智能研究院 | Clothing changing pedestrian re-identification method based on semantic perception attention and visual masking |
CN115738747A (en) * | 2022-11-29 | 2023-03-07 | 浙江致远环境科技股份有限公司 | Ceramic composite fiber catalytic filter tube for removing sulfur, nitrogen and dioxin and preparation method thereof |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271895A (en) * | 2018-08-31 | 2019-01-25 | 西安电子科技大学 | Pedestrian's recognition methods again based on Analysis On Multi-scale Features study and Image Segmentation Methods Based on Features |
CN109325534A (en) * | 2018-09-22 | 2019-02-12 | 天津大学 | A kind of semantic segmentation method based on two-way multi-Scale Pyramid |
CN109784258A (en) * | 2019-01-08 | 2019-05-21 | 华南理工大学 | A kind of pedestrian's recognition methods again cut and merged based on Analysis On Multi-scale Features |
CN110084156A (en) * | 2019-04-12 | 2019-08-02 | 中南大学 | A kind of gait feature abstracting method and pedestrian's personal identification method based on gait feature |
CN110084108A (en) * | 2019-03-19 | 2019-08-02 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | Pedestrian re-identification system and method based on GAN neural network |
CN110717411A (en) * | 2019-09-23 | 2020-01-21 | 湖北工业大学 | Pedestrian re-identification method based on deep layer feature fusion |
CN110852168A (en) * | 2019-10-11 | 2020-02-28 | 西北大学 | Pedestrian re-recognition model construction method and device based on neural framework search |
CN110969087A (en) * | 2019-10-31 | 2020-04-07 | 浙江省北大信息技术高等研究院 | Gait recognition method and system |
CN111027372A (en) * | 2019-10-10 | 2020-04-17 | 山东工业职业学院 | Pedestrian target detection and identification method based on monocular vision and deep learning |
Non-Patent Citations (4)
Title |
---|
WU, X., ET AL.: "Person Re-identification Based on Semantic Segmentation", pages 903 *
XIE, Y., ET AL.: "Cross-Camera Person Re-Identification With Body-Guided Attention Network", pages 361 *
LUO, H.; LU, C.; ZHENG, X.: "A Semantic Segmentation Network Based on Multi-scale Corner Detection", no. 33 *
CHEN, H., ET AL.: "Research on Semantic Image Segmentation Combining Deep Neural Networks and Dilated Convolution", pages 167 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255630A (en) * | 2021-07-15 | 2021-08-13 | 浙江大华技术股份有限公司 | Moving target recognition training method, moving target recognition method and device |
CN113255630B (en) * | 2021-07-15 | 2021-10-15 | 浙江大华技术股份有限公司 | Moving target recognition training method, moving target recognition method and device |
CN114626470A (en) * | 2022-03-18 | 2022-06-14 | 南京航空航天大学深圳研究院 | Aircraft skin key feature detection method based on multi-type geometric feature operator |
CN114626470B (en) * | 2022-03-18 | 2024-02-02 | 南京航空航天大学深圳研究院 | Aircraft skin key feature detection method based on multi-type geometric feature operator |
CN114758362A (en) * | 2022-06-15 | 2022-07-15 | 山东省人工智能研究院 | Cloth-changing pedestrian re-identification method based on semantic-aware attention and visual occlusion |
CN114758362B (en) * | 2022-06-15 | 2022-10-11 | 山东省人工智能研究院 | Cloth-changing pedestrian re-identification method based on semantic-aware attention and visual occlusion |
CN115738747A (en) * | 2022-11-29 | 2023-03-07 | 浙江致远环境科技股份有限公司 | Ceramic composite fiber catalytic filter tube for desulfurization, denitrification and dioxin removal, and preparation method thereof |
CN115738747B (en) * | 2022-11-29 | 2024-01-23 | 浙江致远环境科技股份有限公司 | Ceramic composite fiber catalytic filter tube for desulfurization, denitrification and dioxin removal, and preparation method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN111582126B (en) | 2024-02-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110414368B (en) | Unsupervised pedestrian re-identification method based on knowledge distillation | |
CN111582126B (en) | Pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion | |
CN108509859B (en) | Non-overlapping area pedestrian tracking method based on deep neural network | |
CN111783576B (en) | Pedestrian re-identification method based on improved YOLOv3 network and feature fusion | |
Kim et al. | Multi-task convolutional neural network system for license plate recognition | |
CN111611874B (en) | Face mask wearing detection method based on ResNet and Canny | |
KR101697161B1 (en) | Device and method for tracking pedestrian in thermal image using an online random fern learning | |
CN107480585B (en) | Target detection method based on DPM algorithm | |
CN111709311A (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
CN111178251A (en) | Pedestrian attribute identification method and system, storage medium and terminal | |
Dib et al. | A review on negative road anomaly detection methods | |
CN103093198A (en) | Crowd density monitoring method and device | |
Supreeth et al. | An approach towards efficient detection and recognition of traffic signs in videos using neural networks | |
CN113221770A (en) | Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning | |
CN115620090A (en) | Model training method, low-illumination target re-recognition method and device and terminal equipment | |
CN111582154A (en) | Pedestrian re-identification method based on multitask skeleton posture division component | |
CN113326738B (en) | Pedestrian target detection and re-identification method based on deep network and dictionary learning | |
CN110334703B (en) | Ship detection and identification method in day and night image | |
CN116912670A (en) | Deep sea fish identification method based on improved YOLO model | |
Zhang et al. | Reading various types of pointer meters under extreme motion blur | |
CN111046861B (en) | Method for identifying infrared image, method for constructing identification model and application | |
Kim et al. | Development of a real-time automatic passenger counting system using head detection based on deep learning | |
Vaidya et al. | Comparative analysis of motion based and feature based algorithms for object detection and tracking | |
Cheng et al. | Automatic Data Cleaning System for Large-Scale Location Image Databases Using a Multilevel Extractor and Multiresolution Dissimilarity Calculation | |
Pandya et al. | A novel approach for vehicle detection and classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||