CN112633417A - Pedestrian depth feature fusion method for pedestrian re-identification and with neural network modularization - Google Patents
- Publication number
- CN112633417A (Application No. CN202110059638.2A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- network
- feature
- identification
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Abstract
A pedestrian depth feature fusion method for pedestrian re-identification based on neural network modularization realizes a deep-learning re-identification approach applicable to different neural networks and loss functions, while fusing features drawn from different depths of different network structures. The feature fusion method therefore offers greater flexibility and robustness when applied to pedestrian re-identification, improves re-identification accuracy, and weakens the interference caused by factors such as changes in the monitoring view angle. The Resnet50 network and the Multiple Granularity Network (MGN) for pedestrian re-identification are adopted as baseline algorithms, with cross-entropy loss, triplet loss and circle loss as the candidate loss functions. The Market-1501, DukeMTMC-reID and CUHK03 data sets serve as test sets; on CUHK03, the disclosed method improves Rank-1/mAP by +2.71%/+2.11% over the MGN baseline.
Description
Technical Field
The invention relates to a pedestrian re-identification method, and in particular to a more flexible and robust re-identification method that fuses pedestrian features from different depths during neural network training.
Background
The pedestrian re-identification task can be regarded as a sub-task of image retrieval: given videos or images captured by cameras covering different areas, the goal is to find a pedestrian of a specific identity. As an emerging technology in the field of intelligent video analysis, pedestrian re-identification plays an important role in security and surveillance applications. At its core, the task detects, tracks and analyses the features of pedestrians in images or videos. Its accuracy is affected by the monitoring environment, changes in shooting angle, differences in shooting conditions, poor image quality and changes in human posture. With the rapid development of computer vision and machine learning, many key problems in pedestrian re-identification have been effectively addressed, and the task has gradually become a very active research direction.
Although deep learning has effectively improved the efficiency of pedestrian re-identification, the problem still presents many research difficulties owing to interference from changes in camera view angle and pedestrian posture. Because pictures of the same pedestrian captured from different camera angles can differ substantially, the features of the same pedestrian under different view angles must be fused. Existing deep learning methods for pedestrian re-identification usually focus on feature extraction for the classification problem, and feature learning for the same pedestrian under different view angles usually relies only on the constraint of the loss function to improve similarity. Effective feature fusion during network training can therefore improve the performance of a pedestrian re-identification network. The commonly used approach fuses features at a fixed node in the network, which limits the flexibility of its application.
The invention provides a pedestrian feature fusion method for pedestrian re-identification that modularizes the neural network, giving the fusion method greater flexibility and robustness. Modularization makes it possible to fuse features extracted at different network depths, to study the influence of features from different depths on re-identification, and to attach the feature fusion module more conveniently to different re-identification networks, thereby improving re-identification accuracy.
Disclosure of Invention
The pedestrian re-identification algorithm optimizes the fusion of image features of the same pedestrian under different view angles. By modularizing the neural network, the feature fusion module, which is composed of cascaded 1 x 1 convolution kernels, can be attached to a network architecture more flexibly, and the influence of features extracted at different network depths on the final identification features can be examined in order to find a comparatively good fusion configuration. The method comprises the following steps:
1) Divide the neural network architecture W into several training modules, taking selected inputs/outputs as nodes; the loss function is an independent module, and the architecture W is defined as the backbone network.
2) Select the output node p of a chosen partition module B_p as the input node of the feature fusion module B_f.
3) Define the sub-network behind node p in the original architecture W as W_q, and construct a sub-network M_q with the same structure as W_q.
4) Splice M_q after the feature fusion module B_f to construct the feature fusion branch network.
5) Select appropriate loss functions to constrain the backbone network and the feature fusion branch network respectively.
6) During training, the loss function of the whole architecture is the sum of the backbone-network loss and the feature-fusion-branch loss.
The candidate loss functions are the cross-entropy loss function L_softmax, the triplet loss function L_triplet and the circle loss function L_circle, defined as follows:
x_i is the i-th feature; W_k denotes the weight vector under which x_i would belong to the k-th person, and W_{y_i} denotes the weight vector of the true identity y_i of x_i. f_a, f_p and f_n denote the current feature (anchor), the hardest positive sample and the hardest negative sample respectively, where the hardest positive sample is the least similar feature in the current training batch that belongs to the same pedestrian as the anchor, and the hardest negative sample is the most similar feature in the current batch that belongs to a different pedestrian. α is a margin hyper-parameter of the network.
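The equations themselves are not reproduced in this text; a reconstruction using the standard formulations these symbol descriptions correspond to is sketched below. The circle-loss scale factor γ, pair similarities s_p, s_n and margins Δ_p, Δ_n follow the usual circle-loss formulation and are not named in the filing, so they are assumptions here:

```latex
L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{W_{y_i}^{\top} x_i}}{\sum_{k=1}^{C} e^{W_k^{\top} x_i}}

L_{triplet} = \max\left( \lVert f_a - f_p \rVert_2 - \lVert f_a - f_n \rVert_2 + \alpha,\; 0 \right)

L_{circle} = \log\left[ 1 + \sum_{j} e^{\,\gamma \alpha_n^{j} (s_n^{j} - \Delta_n)} \sum_{i} e^{-\gamma \alpha_p^{i} (s_p^{i} - \Delta_p)} \right]
```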
7) Input each picture into the entire network architecture.
8) Realize cross-image feature fusion at node p.
9) Constrain the hyper-parameters of the whole network structure through the loss function of the backbone network and the loss function of the feature fusion branch network.
10) Judge whether the number of iterations set for network training has been reached; if so, execute step 11), otherwise execute step 7).
11) Encode the feature feature_m obtained for each pedestrian picture through the backbone network together with the feature feature_f obtained through the feature fusion branch network to produce the feature finally used for pedestrian re-identification: the encoding takes the mean value or the maximum value of each dimension of feature_m and feature_f as the retrieval feature feature_o.
12) Judge pedestrian identity by measuring the distance between the retrieval features feature_o of the pedestrians: the smaller the distance, the more likely two pictures show the same person; the larger the distance, the less likely.
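Steps 1) to 12) can be sketched compactly with plain NumPy stand-ins for the network modules. The module shapes, the linear+ReLU stand-ins and the uniform 1×1-convolution mixing weights are all illustrative assumptions, not the patent's actual Resnet50/MGN modules:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the patent's modules (assumed shapes):
# each "module" is simply a callable on feature vectors.
def make_module(din, dout):
    W = rng.normal(0.0, 0.1, (dout, din))
    return lambda x: np.maximum(W @ x, 0.0)  # linear + ReLU stand-in

backbone_head = make_module(128, 64)  # partition module B_p, ending at node p
backbone_tail = make_module(64, 32)   # W_q: sub-network behind node p
branch_tail   = make_module(64, 32)   # M_q: same structure as W_q

def fuse_1x1(feats):
    """Feature fusion module B_f. A cascade of 1x1 convolutions mixes the
    stacked inputs channel-wise at each position; on flat feature vectors
    this reduces to a weighted sum (uniform weights assumed here)."""
    mix = np.full(len(feats), 1.0 / len(feats))
    return np.tensordot(mix, np.stack(feats), axes=1)

def forward(images_same_id):
    at_p = [backbone_head(x) for x in images_same_id]  # features at node p
    f_m  = [backbone_tail(f) for f in at_p]            # backbone features
    f_f  = branch_tail(fuse_1x1(at_p))                 # branch feature (steps 2-4, 8)
    # Step 11: encode each backbone feature with the branch feature (mean).
    return [0.5 * (m + f_f) for m in f_m]

def dist(a, b):
    # Step 12: identity decision by Euclidean distance between feature_o's.
    return float(np.linalg.norm(a - b))

feats_o = forward([rng.normal(size=128) for _ in range(4)])
```

Fusing at node p rather than at the network output is the point of the modularization: B_f and the branch tail M_q can be re-attached at a different partition module without touching the backbone.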
Drawings
FIG. 1 shows the overall flow chart of the process
Table I: Comparison of Resnet50 with the results of the method of the invention
Table II: Comparison of MGN with the results of the method of the invention
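The Rank-1 and mAP figures reported in these tables can be computed with the standard retrieval evaluation. The sketch below is a generic implementation under that assumption, not code from the filing; the toy distance matrix is invented for illustration:

```python
def evaluate(dist_rows, query_ids, gallery_ids):
    """Rank-1: fraction of queries whose nearest gallery item shares its id.
    mAP: mean over queries of the average precision of the ranked gallery."""
    rank1_hits, aps = 0, []
    for drow, qid in zip(dist_rows, query_ids):
        order = sorted(range(len(drow)), key=lambda j: drow[j])
        good = [gallery_ids[j] == qid for j in order]
        rank1_hits += good[0]
        hits, precisions = 0, []
        for rank, g in enumerate(good, start=1):
            if g:
                hits += 1
                precisions.append(hits / rank)
        aps.append(sum(precisions) / max(hits, 1))
    return rank1_hits / len(query_ids), sum(aps) / len(aps)

# Toy example: 2 queries, 3 gallery images.
dist_rows = [[0.1, 0.9, 0.5],   # query 0: nearest gallery item is index 0
             [0.8, 0.2, 0.3]]   # query 1: nearest gallery item is index 1
rank1, mAP = evaluate(dist_rows, query_ids=[0, 1], gallery_ids=[0, 1, 0])
```

Both toy queries rank a correct match first, so both metrics come out at 1.0 here; real evaluations on Market-1501, DukeMTMC-reID or CUHK03 additionally exclude same-camera matches, a detail omitted from this sketch.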
Detailed Description
Taking the CUHK03 data set and the MGN network as an example, the following is a preferred embodiment of the invention; the specific implementation steps are as follows:
1) Take res_conv1, res_conv2 and res_conv3 of the neural network architecture MGN as input/output nodes to divide the network into several training modules; the loss function is an independent module, and the architecture W is defined as the backbone network.
2) Select res_conv3 as the partition module B_p, whose output node p is the input node of the feature fusion module B_f.
3) Define the sub-network behind node p in the original architecture W as W_q, and construct a sub-network M_q with the same structure as W_q.
4) Splice M_q after the feature fusion module B_f to construct the feature fusion branch network.
5) The backbone-network loss combines a cross-entropy loss and a circle loss; the loss of the feature fusion branch network likewise combines a cross-entropy loss and a circle loss.
6) During training, the loss function of the whole architecture is the sum of the backbone-network loss and the feature-fusion-branch loss.
The cross-entropy loss function L_softmax and the circle loss function L_circle are defined as above: x_i is the i-th feature; W_k denotes the weight vector under which x_i would belong to the k-th person, and W_{y_i} denotes the weight vector of the true identity y_i of x_i. f_a, f_p and f_n denote the current feature (anchor), the hardest positive sample and the hardest negative sample respectively, where the hardest positive sample is the least similar feature in the current training batch that belongs to the same pedestrian as the anchor, and the hardest negative sample is the most similar feature in the current batch that belongs to a different pedestrian. α is a hyper-parameter of the network.
7) Input the pictures into the whole network architecture: each batch contains 8 pedestrian identities with 8 images per identity, i.e. 64 images in total are input into the deep neural network MGN for pedestrian re-identification; the input requires that the images of each identity be adjacent within the batch during parallel training.
8) Realize cross-image feature fusion at node p.
9) Constrain the hyper-parameters of the whole network structure through the loss function of the backbone network and the loss function of the feature fusion branch network.
10) Set the number of iterations to 400; judge whether the set number has been reached; if so, execute step 11), otherwise execute step 7).
11) Encode the feature feature_m obtained for each pedestrian picture through the backbone network together with the feature feature_f obtained through the feature fusion branch network to produce the feature finally used for pedestrian re-identification: the encoding takes the mean value of each dimension of feature_m and feature_f as the retrieval feature feature_o.
12) Judge pedestrian identity by measuring the distance between the retrieval features feature_o of the pedestrians: the smaller the distance, the more likely two pictures show the same person; the larger the distance, the less likely.
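The batch layout required in step 7) can be sketched as an identity-balanced sampler. This is a minimal illustration under the 8-identities-by-8-images assumption, not the training code from the filing; the dataset and shuffling details are invented:

```python
import random

def make_batches(labels, ids_per_batch=8, imgs_per_id=8, seed=0):
    """Group image indices by identity, then emit batches of
    ids_per_batch * imgs_per_id indices in which all images of one
    identity are adjacent, as the parallel training in step 7) requires."""
    rng = random.Random(seed)
    by_id = {}
    for idx, pid in enumerate(labels):
        by_id.setdefault(pid, []).append(idx)
    # Only identities with enough images can fill an adjacent block.
    pids = [p for p, v in by_id.items() if len(v) >= imgs_per_id]
    rng.shuffle(pids)
    batches = []
    for i in range(0, len(pids) - ids_per_batch + 1, ids_per_batch):
        batch = []
        for pid in pids[i:i + ids_per_batch]:
            batch.extend(rng.sample(by_id[pid], imgs_per_id))  # adjacent block
        batches.append(batch)
    return batches

# e.g. 16 identities with 10 images each -> 2 batches of 64 indices
labels = [pid for pid in range(16) for _ in range(10)]
batches = make_batches(labels)
```

Keeping the 8 images of each identity adjacent is what lets the fusion module at node p combine same-pedestrian features within a batch.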
Claims (1)
1. A method for optimizing image feature fusion of the same pedestrian under different visual angles based on a neural network modularization mode is characterized by comprising the following steps:
1) dividing a neural network architecture W into a plurality of training modules by taking a plurality of input/outputs as nodes, wherein a loss function is an independent module, and the neural network architecture W is defined as a backbone network;
2) selecting the output node p of a chosen partition module B_p as the input node of the feature fusion module B_f;
3) Defining a sub-network behind a node p in an original network architecture W as W _ q, and constructing a sub-network M _ q with the same structure as W _ q;
4) splicing M_q after the feature fusion module B_f to construct the feature fusion branch network;
5) selecting appropriate loss functions, from among a cross-entropy loss function, a triplet loss function and a circle loss function, to constrain the backbone network and the feature fusion branch network respectively;
6) inputting each picture into the whole network architecture, and realizing cross image feature fusion at a node p;
7) constraining the hyper-parameters of the whole network structure through the loss function of the backbone network and the loss function of the feature fusion branch network;
8) judging whether the number of iterations set for network training has been reached; if so, executing step 9), and if not, executing step 6);
9) encoding the feature feature_m obtained for each pedestrian picture through the backbone network and the feature feature_f obtained through the feature fusion branch network to obtain the feature finally used for pedestrian re-identification, wherein the encoding takes the mean value or the maximum value of each dimension of feature_m and feature_f as the retrieval feature feature_o;
10) judging pedestrian identity by measuring the distance between the retrieval features feature_o of the pedestrians, wherein the smaller the distance, the more likely the pedestrians are the same person, and the larger the distance, the less likely.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110059638.2A CN112633417A (en) | 2021-01-18 | 2021-01-18 | Pedestrian depth feature fusion method for pedestrian re-identification and with neural network modularization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112633417A true CN112633417A (en) | 2021-04-09 |
Family
ID=75294466
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255604A (en) * | 2021-06-29 | 2021-08-13 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device, equipment and medium based on deep learning network |
CN113255604B (en) * | 2021-06-29 | 2021-10-15 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device, equipment and medium based on deep learning network |
WO2023272995A1 (en) * | 2021-06-29 | 2023-01-05 | 苏州浪潮智能科技有限公司 | Person re-identification method and apparatus, device, and readable storage medium |
US11810388B1 (en) | 2021-06-29 | 2023-11-07 | Inspur Suzhou Intelligent Technology Co., Ltd. | Person re-identification method and apparatus based on deep learning network, device, and medium |
US11830275B1 (en) | 2021-06-29 | 2023-11-28 | Inspur Suzhou Intelligent Technology Co., Ltd. | Person re-identification method and apparatus, device, and readable storage medium |
CN115100690A (en) * | 2022-08-24 | 2022-09-23 | 天津大学 | Image feature extraction method based on joint learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113516012B (en) | Pedestrian re-identification method and system based on multi-level feature fusion | |
CN112633417A (en) | Pedestrian depth feature fusion method for pedestrian re-identification and with neural network modularization | |
CN115171165A (en) | Pedestrian re-identification method and device with global features and step-type local features fused | |
CN112651262B (en) | Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment | |
CN109508663A (en) | A kind of pedestrian's recognition methods again based on multi-level supervision network | |
Shankar et al. | Refining architectures of deep convolutional neural networks | |
CN113159466B (en) | Short-time photovoltaic power generation prediction system and method | |
CN114694185B (en) | Cross-modal target re-identification method, device, equipment and medium | |
CN110751018A (en) | Group pedestrian re-identification method based on mixed attention mechanism | |
CN109919084B (en) | Pedestrian re-identification method based on depth multi-index hash | |
Giraldo et al. | Graph CNN for moving object detection in complex environments from unseen videos | |
CN111639564A (en) | Video pedestrian re-identification method based on multi-attention heterogeneous network | |
CN110022422B (en) | Video frame sequence generation method based on dense connection network | |
CN111126223A (en) | Video pedestrian re-identification method based on optical flow guide features | |
CN115063832A (en) | Global and local feature-based cross-modal pedestrian re-identification method for counterstudy | |
CN112418087A (en) | Underwater video fish identification method based on neural network | |
CN109359530B (en) | Intelligent video monitoring method and device | |
CN116934796B (en) | Visual target tracking method based on twinning residual error attention aggregation network | |
CN110120009B (en) | Background blurring implementation method based on salient object detection and depth estimation algorithm | |
CN111079585A (en) | Image enhancement and pseudo-twin convolution neural network combined pedestrian re-identification method based on deep learning | |
CN109598227B (en) | Single-image mobile phone source re-identification method based on deep learning | |
CN114418003B (en) | Double-image recognition and classification method based on attention mechanism and multi-size information extraction | |
CN113537032B (en) | Diversity multi-branch pedestrian re-identification method based on picture block discarding | |
CN116385981A (en) | Vehicle re-identification method and device guided by camera topological graph | |
CN114821629A (en) | Pedestrian re-identification method for performing cross image feature fusion based on neural network parallel training architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210409 |