CN111209799B - Pedestrian searching method based on partial shared network and cosine interval loss function

Info

Publication number: CN111209799B
Application number: CN201911337014.1A
Authority: CN
Inventors: 罗炬锋; 陈浩然; 李丹; 曹永长; 偰超; 张力; 崔笛扬; 郑春雷
Original assignee: Shanghai Internet Of Things Co ltd
Current assignee: Shanghai Internet Of Things Co ltd
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2022-12-23
Anticipated expiration: 2039-12-23
Also published as: CN111209799A

Abstract

The invention relates to a pedestrian search method based on a partially shared network and a cosine interval loss function. First, a new neural network structure is designed in which the pedestrian detection part and the pedestrian re-identification part share only the shallower features, so that each part can concentrate on its own task; this relieves the mutual interference between the two parts from the perspective of model structure. Second, the influence of the weight of the pedestrian re-identification loss function on model optimization in multi-loss joint optimization is studied, and the mutual interference between pedestrian detection and re-identification is relieved from the optimization perspective by setting reasonable loss function parameters. Finally, a more robust lookup table update strategy is provided, and a cosine interval is added to the OIM loss function to reduce the distance between samples of the same class, so that the pedestrian features learned by the network are more discriminative. The invention can reduce the mutual interference between pedestrian detection and pedestrian re-identification.

Description

Pedestrian searching method based on partial shared network and cosine interval loss function

Technical Field

The invention relates to the technical field of computer vision application, in particular to a pedestrian searching method based on a partial shared network and a cosine interval loss function.

Background

Pedestrian re-identification aims to match a target pedestrian across a multi-camera monitoring system with non-overlapping fields of view, and is an important and fast-developing research area in computer vision. It has already been applied in video surveillance, for example in searching for criminal suspects in crowds, cross-camera pedestrian tracking, and pedestrian activity analysis, and is important for safeguarding public life and property; in recent years it has therefore attracted extensive research in academia and industry. Although many data sets and algorithms for pedestrian re-identification have been proposed, a large gap remains between the technology and its practical application. Most studies on pedestrian re-identification use cropped pedestrian images as the data set to be queried, whereas in a real scene the monitoring system captures whole scene pictures, and the target pedestrian can be queried only after all pedestrians have been detected in those pictures. In practical applications it is therefore necessary to combine pedestrian detection and pedestrian re-identification and to handle both tasks together. Pedestrian search aims to perform pedestrian detection and pedestrian re-identification simultaneously and to find a target pedestrian in scene images; it is a new problem that has emerged in recent years and has attracted wide attention in academia and industry.

In the past few years a number of pedestrian search algorithms have been proposed, which fall broadly into two categories: two-step algorithms and end-to-end algorithms. A two-step algorithm handles the two tasks with a separate pedestrian detection model and pedestrian re-identification model: it takes a scene picture as input, first detects pedestrians, and then feeds the cropped detections to the pedestrian re-identification network for matching. An end-to-end algorithm handles both tasks in a single jointly optimized deep learning model by using a multi-loss function that includes pedestrian detection and pedestrian re-identification terms. In theory, the end-to-end algorithm has advantages over the two-step algorithm: joint optimization lets pedestrian detection and pedestrian re-identification reinforce each other, which reduces false detections and improves identification accuracy, and because the two tasks share feature layers, the search is also more time-efficient.

However, end-to-end models face three important issues. First, the features shared by pedestrian detection and pedestrian re-identification affect the performance of the model. Second, because the model is trained by multi-task learning, its performance depends heavily on a reasonable setting of the weights between the pedestrian detection and pedestrian re-identification loss functions. Third, the OIM loss function, the pedestrian re-identification loss widely used in end-to-end pedestrian search networks, has limited ability to discriminate between pedestrians: it only considers separating samples of different classes correctly and ignores optimizing the similarity between samples of the same class.

Disclosure of Invention

The invention aims to provide a pedestrian searching method based on a partial shared network and a cosine interval loss function, which can reduce mutual interference between pedestrian detection and pedestrian re-identification.

The technical scheme adopted by the invention for solving the technical problem is as follows: a pedestrian searching method based on a partial shared network and a cosine interval loss function is provided, and comprises the following steps:

(1) Constructing a neural network structure comprising a pedestrian proposal network and a pedestrian re-identification network; the neural network structure adopts ResNet-50 as the base network, wherein ResNet-50 comprises a convolution layer Conv1 and four convolution layer groups Conv2_x to Conv5_x, which contain 3, 4, 6 and 3 residual units respectively;

(2) Inputting pictures into the neural network structure, wherein Conv1 to Conv3_4 of the neural network structure serve as the backbone network to extract a shallow feature map, and the shallow feature map is shared by the pedestrian proposal network and the pedestrian re-identification network;

(3) After the shallow feature map is extracted, it enters two branches; the first branch sends the shallow feature map to a copy of the convolution layer group Conv4_1 to Conv4_3 for further feature extraction and then to the pedestrian proposal network to generate a plurality of pedestrian proposal boxes; the second branch sends the shallow feature map to a region-of-interest pooling layer, where the pedestrian proposal boxes obtained by the pedestrian proposal network are used as regions of interest and the pedestrian feature map corresponding to each box is pooled from the shallow feature map;

(4) Inputting the pedestrian feature maps into the pedestrian re-identification network formed by convolution layer groups Conv4_1 to Conv5_3, outputting deep pedestrian feature maps, pooling them through a global average pooling layer, and finally mapping each pedestrian feature map into a pedestrian feature vector of dimension a.
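As a rough illustration of steps (1) to (4), the following Python sketch lays out which ResNet-50 residual units would be shared and which would belong to each branch. The names `UNITS_PER_GROUP`, `OUT_CHANNELS` and `units` are hypothetical, and the channel counts are the standard ResNet-50 values, assumed here rather than stated in the steps.

```python
# Residual units per convolution layer group, as stated in step (1).
UNITS_PER_GROUP = {"Conv2": 3, "Conv3": 4, "Conv4": 6, "Conv5": 3}
# Standard ResNet-50 output channels per group (assumption, not from the steps).
OUT_CHANNELS = {"Conv2": 256, "Conv3": 512, "Conv4": 1024, "Conv5": 2048}


def units(first_group, first_unit, last_group, last_unit):
    """List unit names such as 'Conv4_1' from first to last, inclusive."""
    names = []
    groups = list(UNITS_PER_GROUP)
    for g in groups[groups.index(first_group): groups.index(last_group) + 1]:
        lo = first_unit if g == first_group else 1
        hi = last_unit if g == last_group else UNITS_PER_GROUP[g]
        names += [f"{g}_{k}" for k in range(lo, hi + 1)]
    return names


# Step (2): shared backbone, Conv1 through Conv3_4.
shared_backbone = ["Conv1"] + units("Conv2", 1, "Conv3", 4)
# Step (3), first branch: a separate copy of Conv4_1 to Conv4_3 for the proposal network.
proposal_branch = units("Conv4", 1, "Conv4", 3)
# Step (4): re-identification network, Conv4_1 to Conv5_3.
reid_network = units("Conv4", 1, "Conv5", 3)

print(shared_backbone, proposal_branch, reid_network)
print(OUT_CHANNELS["Conv3"], OUT_CHANNELS["Conv5"])
```

Under these assumptions the shared backbone ends at a 512-channel feature map and the re-identification network ends at 2048 channels, which agrees with the 512 × 40 × 20 pooled map and the 2048-dimensional vector given in the detailed description.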

The neural network structure in step (1) is trained with a multi-loss function, and the total loss function has the form: total loss = pedestrian proposal network loss function + r × pedestrian re-identification network loss function, wherein r is an adjusting parameter, and mutual interference between pedestrian detection and re-identification is relieved by adjusting r.
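The weighting can be sketched in a few lines of Python. The function name `total_loss` and the loss values are hypothetical and serve only to show how r scales the re-identification term relative to the detection term.

```python
def total_loss(proposal_loss, reid_loss, r):
    """Total loss = proposal-network loss + r * re-identification loss."""
    return proposal_loss + r * reid_loss


# Made-up loss values: a larger r lets the re-identification term dominate.
for r in (0.1, 0.5, 1.0):
    print(r, total_loss(proposal_loss=1.2, reid_loss=4.0, r=r))
```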

The pedestrian re-identification network loss function adopts a cosine interval loss function, and is expressed as follows:

wherein N represents the total number of labeled samples in a training batch, s is a scale parameter, $\theta_{j,i}$ denotes the angle between the i-th sample $x_i$ and the pedestrian feature vector $v_j$ of a labeled category, m is the introduced cosine interval, L is the total number of classes of labeled samples, Q ≈ L, and $\psi_{k,i}$ denotes the angle between the i-th sample $x_i$ and the pedestrian feature vector $u_k$ of an unlabeled category.

The pedestrian feature vector $v_j$ of a labeled category is updated as follows:

wherein $N_t$ represents the total number of samples of class t in a training batch, $C_i$ represents the confidence that the i-th sample $x_i$ is a pedestrian, and μ is a variable that increases as the training period increases.

The pedestrian proposal network in step (1) adopts the Faster R-CNN structure.

After step (4), the method further comprises: mapping the a-dimensional pedestrian feature vector into a b-dimensional feature subspace by using a fully connected layer, and then performing L2-normalization on the pedestrian feature in the b-dimensional feature subspace to prevent overfitting.
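A minimal sketch of this projection step is given below, using toy sizes (a = 4, b = 2) and made-up weights; the embodiment described later uses a = 2048 and b = 256. The helper names `fully_connected` and `l2_normalize` are hypothetical.

```python
import math


def fully_connected(x, weights, bias):
    """Map an a-dimensional vector x to b dimensions: y = W x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b0
            for row, b0 in zip(weights, bias)]


def l2_normalize(x, eps=1e-12):
    """Scale x to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


# Toy input and weights, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
W = [[0.5, 0.0, 0.0, 0.5],
     [0.0, 1.0, -1.0, 0.0]]
feature = l2_normalize(fully_connected(x, W, [0.0, 0.0]))
print(feature)
```

Because the resulting vector has unit length, its inner product with any other unit-length vector equals the cosine of the angle between them.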

Advantageous effects

By adopting the above technical scheme, the invention has the following advantages and positive effects compared with the prior art. The partially shared network reasonably reduces the number of network layers shared by the pedestrian detection part and the pedestrian re-identification part, which effectively reduces the mutual interference between the two parts from the perspective of model structure and lets each part focus on its own task. By setting an appropriate weight for the re-identification loss, the invention enables the whole network to converge to a better region of the parameter space. The invention also introduces a cosine interval into the OIM loss function, which increases the compactness of samples of the same class and makes the learned features more discriminative. The invention further designs a new update strategy for the pedestrian re-identification loss, and experiments show that this strategy is more robust. Finally, extensive experiments on two standard data sets show that the method outperforms existing end-to-end pedestrian search models on the mAP and top-1 evaluation metrics and is more practical.

Drawings

Fig. 1 is a diagram of a network architecture of the present invention.

Detailed Description

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

The embodiment of the invention relates to an end-to-end pedestrian search method based on a partially shared network and a cosine interval loss function. A new neural network structure is designed in which the pedestrian detection and pedestrian re-identification parts share only the shallower features, so that each part concentrates more on its own task and the mutual interference between them is relieved from the perspective of model structure. In the multi-loss joint optimization, the influence of the weight of the pedestrian re-identification loss function on model optimization is studied, and the mutual interference between pedestrian detection and re-identification is relieved from the optimization perspective by setting reasonable loss function parameters. The widely used OIM loss function is also improved: a more robust lookup table update strategy is provided, and a cosine interval is added to the OIM loss function to reduce the distance between samples of the same class, so that the pedestrian features learned by the network are more discriminative. The whole neural network architecture is shown in Fig. 1.

The partially shared network proposed by the present embodiment is an improvement on existing end-to-end pedestrian search networks. Specifically, ResNet-50 is used as the base network. ResNet-50 consists of a convolution layer (Conv1) and four convolution layer groups (Conv2_x to Conv5_x), which contain 3, 4, 6 and 3 residual units respectively. In the present embodiment, Conv1 to Conv3_4 are used as the backbone network, and the extracted shallow features are shared by a Pedestrian Proposal Network (PPN) and a pedestrian re-identification network (IDnet). After the shallow feature map is extracted, it enters two branches. The first branch sends the shallow feature map to a copy of Conv4_1 to Conv4_3 (which does not share parameters with the corresponding layers in IDnet) for further feature extraction and then to the PPN to generate a plurality of pedestrian proposal boxes. The second branch sends the shallow feature map to the region-of-interest pooling layer, where the pedestrian proposal boxes obtained by the PPN are used as regions of interest and the pedestrian feature map corresponding to each box is pooled from the shallow feature map. The pedestrian feature map corresponding to each box has size 512 × 40 × 20, and the structure of the PPN is the same as that of the region proposal network in Faster R-CNN. IDnet consists of Conv4_1 to Conv5_3; it takes the pooled pedestrian feature maps as input and outputs deep pedestrian feature maps with high-level classification semantics, which are then pooled by a Global Average Pooling (GAP) layer so that each pedestrian feature map is mapped into a 2048-dimensional pedestrian feature vector. Strictly speaking, IDnet should focus only on the pedestrian re-identification task and should not include a pedestrian detection part. However, experiments showed that keeping a pedestrian foreground/background classification layer and a bounding-box regression layer after IDnet improves the detection result, because falsely detected pedestrians can be further eliminated by using high-level semantics.

For the pedestrian re-identification part, the 2048-dimensional pedestrian vector is first mapped into a 256-dimensional feature subspace by a fully connected layer, and the 256-dimensional pedestrian features are then L2-normalized to prevent overfitting. The whole network is trained with multiple loss functions, and the composition of the total loss function is shown in equation 1.
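Equation 1 itself is not reproduced in this text. A form consistent with the description, with four detection-related terms plus the re-identification term weighted by the adjusting parameter r, might look as follows. The symbol names are assumptions, as is the split of the four detection terms into classification and regression losses for the PPN and for the detection layers kept after IDnet.

```latex
\mathcal{L} = \mathcal{L}^{\mathrm{ppn}}_{\mathrm{cls}}
            + \mathcal{L}^{\mathrm{ppn}}_{\mathrm{reg}}
            + \mathcal{L}^{\mathrm{det}}_{\mathrm{cls}}
            + \mathcal{L}^{\mathrm{det}}_{\mathrm{reg}}
            + r\,\mathcal{L}_{\mathrm{reid}} \tag{1}
```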

Here, the first four loss functions constitute the Faster R-CNN losses. In the partially shared network, the layers shared by the pedestrian detection and pedestrian re-identification tasks are Conv1 to Conv3_4, so the two tasks share only the shallower features and each has more layers of its own to focus on its task. Furthermore, the partially shared network fixes the aspect ratio (height to width) of the pooled pedestrian feature map at 2, which matches the 40 × 20 pooled size. With this compromise, the mutual interference between the pedestrian detection task and the pedestrian re-identification task caused by the shared features can be effectively mitigated, while the useful shared shallow feature information is preserved.
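The effect of pooling every proposal box to a fixed size with a 2:1 aspect ratio can be illustrated with a small region-of-interest pooling sketch. The helper `roi_max_pool` is hypothetical, max pooling is assumed (as in the usual Faster R-CNN pooling layer), and a single-channel toy map with a 4 × 2 output stands in for the 512-channel map and the 40 × 20 output.

```python
def roi_max_pool(feature, box, out_h, out_w):
    """Max-pool the region box = (y0, x0, y1, x1), end-exclusive, of a
    2-D feature map into a fixed out_h x out_w grid."""
    y0, x0, y1, x1 = box
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [ys, ye); every bin holds at least one row.
        ys = y0 + (i * h) // out_h
        ye = max(y0 + ((i + 1) * h + out_h - 1) // out_h, ys + 1)
        row = []
        for j in range(out_w):
            xs = x0 + (j * w) // out_w
            xe = max(x0 + ((j + 1) * w + out_w - 1) // out_w, xs + 1)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 8 x 6 feature map whose value encodes its position (row * 10 + column).
feature_map = [[y * 10 + x for x in range(6)] for y in range(8)]
large_box = roi_max_pool(feature_map, (0, 0, 8, 4), out_h=4, out_w=2)
small_box = roi_max_pool(feature_map, (2, 1, 6, 3), out_h=4, out_w=2)
print(large_box)
print(small_box)
```

Both boxes produce a 4 × 2 output even though their sizes differ, so every proposal reaches the re-identification network with the same shape.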

After the L2-normalized 256-dimensional pedestrian feature vectors are obtained, an existing algorithm uses the Online Instance Matching (OIM) loss function as the loss function of the pedestrian re-identification task. The OIM loss function maintains two additional caches that are updated online: a lookup table (LUT) $V \in \mathbb{R}^{D \times L}$ and a circular queue (CQ) $U \in \mathbb{R}^{D \times Q}$, where D is the vector dimension, L is the total number of classes of labeled samples, and Q is a value close to L. V maintains the pedestrian feature vectors of the labeled categories, and U maintains the pedestrian feature vectors of the unlabeled categories. The expression of the OIM loss function is shown in equation 2.
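Equation 2 is not reproduced in this text. Based on the definitions of V, U, N, L and Q, and on the later statement that the scale parameter s equals 1/τ, a plausible form is the softmax over inner products with all LUT and CQ entries shown below. The temperature symbol τ and the averaging over the batch are assumptions taken from that context.

```latex
\mathcal{L}_{\mathrm{OIM}} = -\frac{1}{N}\sum_{i=1}^{N}
  \log \frac{\exp\left(v_t^{\top} x_i / \tau\right)}
            {\sum_{j=1}^{L} \exp\left(v_j^{\top} x_i / \tau\right)
             + \sum_{k=1}^{Q} \exp\left(u_k^{\top} x_i / \tau\right)} \tag{2}
```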

where N represents the total number of labeled samples in a training batch, and $x_i$ is the i-th sample, a 256-dimensional L2-normalized feature vector with label t ($t \in [1, C]$, where C is the total number of categories). After each iteration, the LUT and the CQ are updated; the update strategy of the existing algorithm is shown in formula 3:

$v_t = \gamma v_t + (1-\gamma)\,x \qquad (3)$

where x is a pedestrian feature vector of category t, and the formula is applied in turn to each pedestrian feature vector of category t. This means that the feature vectors applied later in the sequence actually carry a larger weight in the updated $v_t$. In contrast, the present embodiment proposes a more efficient update approach:

where $N_t$ denotes the total number of samples of class t in a training batch, and $C_i$ denotes the confidence that sample $x_i$ is a pedestrian, obtained from the classification layer of IDnet. Moreover, μ is a variable that increases as the training period increases, unlike γ in formula 3, which is set to a fixed constant.

This update strategy is based on two main considerations. (1) Because the pedestrian feature vectors extracted from the same batch have no inherent order, updating with their weighted average is more robust than updating them one after another. (2) At the beginning of training the pedestrian features are not very accurate, so $v_t$ is not very reliable; a small value of μ is therefore needed so that $v_t$ can be updated more quickly. Once training reaches a certain stage and $v_t$ has become more stable, the value of μ needs to be increased to resist uncertain interference noise.
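The two considerations can be illustrated with a short Python sketch. `sequential_update` follows formula 3 as given. `batch_update` and `mu_schedule` are assumptions made for illustration only: the batch form mixes the old $v_t$ with a confidence-weighted mean of the batch samples, and the schedule lets μ grow linearly, since the proposed update formula itself is not reproduced in this text.

```python
def sequential_update(v, samples, gamma):
    """Formula 3 applied one sample at a time: v <- gamma*v + (1-gamma)*x."""
    for x in samples:
        v = [gamma * a + (1 - gamma) * b for a, b in zip(v, x)]
    return v


def batch_update(v, samples, confidences, mu):
    """Assumed batch form: v <- mu*v + (1-mu) * confidence-weighted mean."""
    total = sum(confidences)
    mean = [sum(c * x[d] for c, x in zip(confidences, samples)) / total
            for d in range(len(v))]
    return [mu * a + (1 - mu) * b for a, b in zip(v, mean)]


def mu_schedule(epoch, num_epochs, mu_start=0.1, mu_end=0.9):
    """Hypothetical linear schedule: mu grows from mu_start to mu_end."""
    return mu_start + (mu_end - mu_start) * epoch / max(num_epochs - 1, 1)


v0 = [0.0, 0.0]
x1, x2 = [1.0, 0.0], [0.0, 1.0]
seq_a = sequential_update(v0, [x1, x2], 0.5)  # x2 is applied last
seq_b = sequential_update(v0, [x2, x1], 0.5)  # x1 is applied last
bat_a = batch_update(v0, [x1, x2], [0.9, 0.9], 0.5)
bat_b = batch_update(v0, [x2, x1], [0.9, 0.9], 0.5)
print(seq_a, seq_b)  # the result depends on the order of the samples
print(bat_a, bat_b)  # the result does not depend on the order
```

In the sequential case the sample applied last ends up with twice the weight of the one applied first, whereas the batch form treats both samples identically.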

Because all features are L2-normalized, equation 2 can be rewritten as:
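The rewritten equation is not reproduced in this text. For unit-length vectors the inner product $v_j^{\top} x_i$ equals $\cos\theta_{j,i}$ and $u_k^{\top} x_i$ equals $\cos\psi_{k,i}$, so with $s = 1/\tau$ the loss plausibly takes the following form, where t is the label of $x_i$. This is a reconstruction from the surrounding definitions, not the original typeset equation.

```latex
\mathcal{L}_{\mathrm{OIM}} = -\frac{1}{N}\sum_{i=1}^{N}
  \log \frac{e^{s\cos\theta_{t,i}}}
            {\sum_{j=1}^{L} e^{s\cos\theta_{j,i}}
             + \sum_{k=1}^{Q} e^{s\cos\psi_{k,i}}} \tag{5}
```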

where $\theta_{j,i}$ is the angle between $x_i$ and $v_j$, $\psi_{k,i}$ is the angle between $x_i$ and $u_k$, and s is a scale parameter equal to 1/τ.

As equation 5 shows, the objective of the OIM loss function is to maximize the cosine similarity between x and the vector v of its own label; however, it does not explicitly reduce the differences between samples of the same class, so the learned pedestrian features are not sufficiently discriminative. To solve this problem, the present embodiment introduces a cosine interval into the OIM loss function, which fits naturally into the cosine form of the loss.

Formally, the proposed cosine interval OIM loss function (CM-OIM) expression is:
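The CM-OIM expression is not reproduced in this text. Assuming that the interval m is subtracted from the cosine of the sample's own class t only, as in large-margin cosine losses, and that all other terms keep the cosine form of the OIM loss, a plausible reconstruction is:

```latex
\mathcal{L}_{\mathrm{CM\text{-}OIM}} = -\frac{1}{N}\sum_{i=1}^{N}
  \log \frac{e^{s(\cos\theta_{t,i} - m)}}
            {e^{s(\cos\theta_{t,i} - m)}
             + \sum_{j=1,\, j \neq t}^{L} e^{s\cos\theta_{j,i}}
             + \sum_{k=1}^{Q} e^{s\cos\psi_{k,i}}} \tag{6}
```

With m = 0 this expression reduces to the cosine form of the OIM loss.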

where m is a hyperparameter representing the introduced cosine interval. The CM-OIM loss function enlarges the differences between samples of different classes in the sample space while increasing the similarity between samples of the same class, so that the learned features are more discriminative.

The method of the invention was verified experimentally on two widely used large-scale data sets, CUHK-SYSU and PRW: it obtains an mAP of 83.5% on CUHK-SYSU and 32.8% on PRW, and the results show better pedestrian search performance than other existing methods.
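The effect of the interval can be checked numerically with a single-sample sketch. The function `cm_oim_loss` is hypothetical and assumes that m is subtracted only from the cosine of the sample's own class; the cosine values, s = 10 and m = 0.25 are made up for illustration.

```python
import math


def cm_oim_loss(cos_labeled, cos_unlabeled, target, s, m):
    """Loss for one sample.

    cos_labeled[j]   : cosine between the sample and lookup-table entry v_j
    cos_unlabeled[k] : cosine between the sample and circular-queue entry u_k
    target           : index t of the sample's own class
    s, m             : scale parameter and cosine interval (m = 0 gives plain OIM)
    """
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cos_labeled)]
    logits += [s * c for c in cos_unlabeled]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[target]


# Made-up cosine similarities for a sample of class 0.
cos_v = [0.8, 0.3, -0.1]
cos_u = [0.2, 0.0]
plain = cm_oim_loss(cos_v, cos_u, target=0, s=10.0, m=0.0)
with_interval = cm_oim_loss(cos_v, cos_u, target=0, s=10.0, m=0.25)
print(plain, with_interval)
```

For the same features the loss with m > 0 is larger than the plain OIM loss, so the network has to push the sample closer to its own class vector to reach the same loss value.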

Claims

1. A pedestrian searching method based on a partially shared network and a cosine interval loss function is characterized by comprising the following steps:

(1) Constructing a neural network structure comprising a pedestrian proposal network and a pedestrian re-identification network; the neural network structure adopts ResNet-50 as the base network, wherein ResNet-50 comprises a convolution layer Conv1 and four convolution layer groups Conv2_x to Conv5_x, which contain 3, 4, 6 and 3 residual units respectively; wherein the neural network structure is trained with a multi-loss function, and the total loss function has the form: total loss = pedestrian proposal network loss function + r × pedestrian re-identification network loss function, wherein r is an adjusting parameter, and mutual interference between pedestrian detection and re-identification is relieved by adjusting r; the pedestrian re-identification network loss function adopts a cosine interval loss function, and is expressed as:

wherein N represents the total number of labeled samples in a training batch, s is a scale parameter, $\theta_{j,i}$ denotes the angle between the i-th sample $x_i$ and the pedestrian feature vector $v_j$ of a labeled category, m is the introduced cosine interval, L is the total number of classes of labeled samples, Q ≈ L, and $\psi_{k,i}$ denotes the angle between the i-th sample $x_i$ and the pedestrian feature vector $u_k$ of an unlabeled category; the pedestrian feature vector $v_j$ of a labeled category is updated as follows:

wherein $N_t$ represents the total number of samples of class t in a training batch, $C_i$ denotes the confidence that the i-th sample $x_i$ is a pedestrian, and μ is a variable that increases as the training period increases;

(4) Inputting the pedestrian feature maps into the pedestrian re-identification network formed by convolution layer groups Conv4_1 to Conv5_3, outputting deep pedestrian feature maps, pooling them through a global average pooling layer, and finally mapping each pedestrian feature map into a pedestrian feature vector of dimension a.

2. The pedestrian searching method based on the partially shared network and the cosine interval loss function according to claim 1, wherein the pedestrian proposal network in step (1) adopts the Faster R-CNN structure.

3. The pedestrian searching method based on the partially shared network and the cosine interval loss function according to claim 1, wherein step (4) is further followed by: mapping the a-dimensional pedestrian feature vector into a b-dimensional feature subspace by using a fully connected layer, and then performing L2-normalization on the pedestrian feature in the b-dimensional feature subspace to prevent overfitting.