CN111428675A - Pedestrian re-recognition method integrated with pedestrian posture features


Info

Publication number
CN111428675A
CN111428675A (application CN202010252780.4A)
Authority
CN
China
Prior art keywords
pedestrian
model
posture
feature
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010252780.4A
Other languages
Chinese (zh)
Inventor
DONG Shichao
WANG Kai
LI Tao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202010252780.4A priority Critical patent/CN111428675A/en
Publication of CN111428675A publication Critical patent/CN111428675A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a pedestrian re-identification method that incorporates pedestrian posture features, comprising the following steps: preprocess a pedestrian re-identification data set and divide it into a training set and a validation set; train a pedestrian skeleton detection model on a public data set; extract high-dimensional pedestrian posture feature matrices with the skeleton model; fuse them into a pedestrian retrieval model for training; propagate forward to obtain the feature vectors produced by the feature extraction layer; compute the loss in the forward pass; compute gradient information for each sample in the data set; back-propagate the gradient of the loss layer and update the weight parameters of the feature processing module of the pedestrian retrieval model; if the model has not converged and the maximum number of iterations has not been reached, repeat the above steps; after network training is finished, complete the pedestrian retrieval task for the query set on the test set. The invention guides the model to learn and screen pedestrian posture features, and the fused posture information improves the ability of the pedestrian retrieval model to retrieve the same pedestrian.

Description

Pedestrian re-recognition method integrated with pedestrian posture features
Technical Field
The invention belongs to the technical field of neural networks, and particularly relates to a pedestrian re-identification method integrated with pedestrian posture features.
Background
Pedestrian screening, i.e. pedestrian re-identification, is a research hotspot in intelligent video analysis and has attracted wide attention from academia. Pedestrian re-identification determines whether two persons photographed by different cameras are the same person; broadly, it belongs to the field of image retrieval. The task is to find the same person in videos captured by different cameras. The first challenge is that, for the same person, differences in illumination, viewing angle, distance, and posture across cameras make that person's images look dissimilar. The second is that different pedestrians wearing clothes of the same color, or with similar body shapes, are easily misidentified as the same person. Pedestrian re-identification essentially performs image retrieval on the external human body, which has both flexible and rigid characteristics. Because it is easily affected by clothing color, illumination, viewing angle, and other factors, it is a very challenging subject. Constrained by public data sets and hardware technology, pedestrian re-identification is at present mainly applied to cross-camera retrieval. The cross-camera retrieval task aims to complete pedestrian retrieval across multiple cameras with a designed algorithm, so as to obtain a pedestrian's location information over a certain period of time. Different cameras differ in shooting angle, imaging style, and pixel values for the same pedestrian, which places higher demands on the robustness of the algorithm.
The development of deep-learning convolutional neural network (CNN) models in recent years has greatly advanced the field of pedestrian re-identification. The main remaining difficulty is that, when retrieving pedestrians, a re-identification model relies too heavily on clothing features: when a pedestrian's clothing changes even slightly, the retrieval performance of the model drops sharply. No effective solution to this problem has yet been proposed.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a pedestrian re-identification method that incorporates pedestrian posture features. During training, it guides a deep convolutional network to learn from the samples a high-dimensional pedestrian posture feature matrix beneficial to re-identification, thereby improving the retrieval capability of the model when clothing differs and samples are hard to distinguish.
The technical scheme adopted by the invention is as follows: a pedestrian re-recognition method integrated with pedestrian posture features comprises the following steps:
step 1: preprocessing a pedestrian re-identification data set (Market1501, Duke-MTMC, CUHKO3) and the like, dividing the data into a training set and a verification set after whitening, and stacking the training set by using a random erasing, random cutting, color migration and random noise adding mode for data expansion;
step 2: training a pedestrian skeleton detection model using a public data set (COCO, etc.);
step 3: extract high-dimensional pedestrian posture matrix features (feature maps) with the pedestrian skeleton detection model; specifically, input a sample into the backbone network of the skeleton detection model and obtain the high-dimensional posture feature matrix output by the last layer of the backbone;
step 4: fuse the extracted high-dimensional pedestrian posture matrix into a pedestrian retrieval model for training; the feature fusion may directly sum the posture feature matrix with the feature map extracted by the retrieval model, or splice the two feature matrices and compress them with a convolution layer, or first screen the posture feature matrix with an SE module, then splice it with the retrieval model's feature map and compress the result;
step 5: propagate forward and obtain the feature vectors extracted by the feature extraction layer;
step 6: compute the loss in the forward pass; the softmax loss function is as follows:
L_softmax = -(1/N) * sum_i log( exp(W_{y_i}^T f_i) / sum_j exp(W_j^T f_i) )
where f_i is the feature vector of sample i, y_i its identity label, and W_j the classifier weight for identity j;
the Triplet loss function is as follows:
L_triplet = [ m + max_p D(f_a, f_p) - min_n D(f_a, f_n) ]_+
where m is the distance margin between same-identity and different-identity samples; the hardest positive term max_p D(f_a, f_p) is the largest same-identity sample distance within the entire batch; the hardest negative term min_n D(f_a, f_n) is the smallest different-identity sample distance within the entire batch;
step 7: compute gradient information for each sample in the data set;
step 8: back-propagate the gradient of the loss layer and update the weight parameters of the feature processing module of the pedestrian retrieval model;
step 9: if the model has not converged and the maximum number of iterations has not been reached, repeat steps 3 to 8;
step 10: after network training is finished, complete the pedestrian retrieval task for the query set on the test set, and compute Rank1, Rank5, Rank10, and mAP.
Preferably, in step 6, an L2 regularization operation is performed on the output feature vectors; the L2-normalized unit feature vectors are spliced into triplets, and the triplet loss is computed from the triplet feature vectors. The unit feature vectors pass through a Batch Normalization layer to obtain a Test Vector, with the bias parameter of the Batch Normalization layer fixed at 0; the Test Vector is used in the later pedestrian retrieval test stage, and is also input into the fully connected layer at the end of the network model to compute the cross-entropy loss.
Compared with the prior art, the invention has the following beneficial effects: the invention guides the model to learn and screen pedestrian posture features, and the fused posture information improves the ability of the pedestrian retrieval model to retrieve the same pedestrian.
Drawings
FIG. 1 is a flow chart of pedestrian re-identification model training and testing of the present invention;
FIG. 2 is a flow chart of a pedestrian re-identification model pedestrian retrieval of the present invention;
FIG. 3 is a schematic diagram of a model location of interest after the guided model learns features of the present invention;
FIG. 4 is a schematic diagram of the effect of the pedestrian-oriented attitude model of the present invention;
FIG. 5 is a schematic diagram of a model of the present invention incorporating the posture features of a pedestrian;
FIG. 6 is a schematic diagram of a feature fusion approach of the present invention;
FIG. 7 is a schematic diagram of yet another feature fusion approach of the present invention;
FIG. 8 is a schematic view of yet another feature fusion approach of the present invention;
fig. 9 is the comparison of the pedestrian search results of the present invention and the common model.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention discloses a pedestrian re-identification method that incorporates pedestrian posture features, comprising the following steps, as shown in the figures:
step 1: preprocessing a pedestrian re-identification data set (Market1501, Duke-MTMC, CUHK03) and the like, dividing the data into a training set and a verification set after whitening, and stacking the training set by using a random erasing, random cutting, color migration and random noise adding mode for data expansion;
step 2: training a pedestrian skeleton detection model using a public data set (COCO, etc.);
step 3: input a sample into the backbone network of the pedestrian skeleton detection model and obtain the high-dimensional pedestrian posture feature matrix output by the last layer of the backbone;
step 4: fuse the extracted high-dimensional pedestrian posture matrix into a pedestrian retrieval model for training; in this embodiment, the feature fusion first screens the posture feature matrix with an SE module, then splices it with the feature map extracted by the retrieval model and compresses the result;
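The SE-based fusion of step 4 can be sketched as below. This is an illustrative NumPy version under assumed shapes (channel-first CxHxW feature maps, a two-layer excitation MLP, and the compressing 1x1 convolution written as a per-pixel linear projection), not the patent's exact architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_screen(feat, w1, w2):
    """Squeeze-and-Excitation channel screening of a CxHxW map:
    global average pool -> two FC layers -> sigmoid gate per channel."""
    s = feat.mean(axis=(1, 2))                  # squeeze: (C,)
    z = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))   # excitation: (C,)
    return feat * z[:, None, None]              # channel-wise rescale

def fuse(pose_feat, reid_feat, w1, w2, w_proj):
    """Screen the pose feature map with the SE module, concatenate it
    with the retrieval feature map along channels, then compress with
    a 1x1 convolution (a linear projection applied at every pixel)."""
    screened = se_screen(pose_feat, w1, w2)
    cat = np.concatenate([screened, reid_feat], axis=0)  # (2C, H, W)
    return np.tensordot(w_proj, cat, axes=([1], [0]))    # (Cout, H, W)
```

Here `w1` (C/r x C), `w2` (C x C/r), and `w_proj` (Cout x 2C) stand in for learned weights; `r` is the usual SE reduction ratio.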
step 5: propagate forward and obtain the feature vectors extracted by the feature extraction layer;
step 6: compute the loss in the forward pass.
An L2 regularization operation is performed on the output feature vector; the L2 normalization formula is:
L2(feature_vector) = feature_vector / |feature_vector|
After L2 normalization, the Euclidean distance between two unit feature vectors is positively correlated with their cosine distance:
D(A, B) = sqrt(2 - 2cos(A, B))
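The positive correlation above is easy to verify numerically; a short sketch:

```python
import numpy as np

def l2_normalize(v):
    """L2 normalization: v / |v|, as in the formula above."""
    return v / np.linalg.norm(v)

# Two arbitrary unit feature vectors.
a = l2_normalize(np.array([3.0, 4.0, 0.0]))
b = l2_normalize(np.array([0.0, 1.0, 1.0]))

# For unit vectors, Euclidean distance D(A,B) = sqrt(2 - 2*cos(A,B)).
euclid = np.linalg.norm(a - b)
cosine = float(a @ b)
assert np.isclose(euclid, np.sqrt(2 - 2 * cosine))
```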
The L2-normalized unit feature vectors are spliced into triplets, and the triplet loss is computed from the triplet feature vectors as:
L_triplet = [ m + max_p D(f_a, f_p) - min_n D(f_a, f_n) ]_+
where m is the distance margin between same-identity and different-identity samples; the hardest positive term max_p D(f_a, f_p) is the largest same-identity sample distance within the entire batch; the hardest negative term min_n D(f_a, f_n) is the smallest different-identity sample distance within the entire batch;
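A minimal NumPy sketch of the batch-hard triplet loss described above, using Euclidean distances between feature vectors (`margin` is the term m; batch-hard sampling assumes each identity appears at least twice per batch):

```python
import numpy as np

def batch_hard_triplet_loss(feats, labels, margin=0.3):
    """For each anchor, take the farthest same-identity sample
    (hardest positive) and the nearest different-identity sample
    (hardest negative), then apply the hinge with margin."""
    # Pairwise Euclidean distance matrix, shape (N, N).
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(len(feats)):
        pos = d[i][same[i]]      # includes d[i, i] = 0, harmless for max
        neg = d[i][~same[i]]
        losses.append(max(0.0, margin + pos.max() - neg.min()))
    return float(np.mean(losses))
```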
The L2-normalized unit feature vector passes through a Batch Normalization layer to obtain the Test Vector (TestVector), which is used in the later pedestrian retrieval test stage; the bias parameter of the Batch Normalization layer is set constantly equal to 0. The resulting BNNeck computation is:
TestVector = gamma * (f - mu) / sqrt(sigma^2 + eps)
where f is the unit feature vector, mu and sigma^2 are the batch mean and variance, gamma is the learned scale, and the bias beta is constantly 0.
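The BNNeck step can be sketched as follows. For brevity this uses batch statistics; at inference a real Batch Normalization layer would use running statistics instead.

```python
import numpy as np

def bnneck(feats, gamma, eps=1e-5):
    """Batch Normalization with the bias (beta) fixed at 0, as in the
    BNNeck formula above. feats: N x D batch of feature vectors;
    gamma: learned per-dimension scale."""
    mu = feats.mean(axis=0)
    var = feats.var(axis=0)
    return gamma * (feats - mu) / np.sqrt(var + eps)  # beta = 0
```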
the Test Vector is input to the fully connected layer (FC layer) after the network model and the loss is calculated, the softmax loss function is as follows:
Figure BDA0002436189450000053
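A numerically stable NumPy sketch of the softmax cross-entropy loss above, taking the FC-layer logits and identity labels:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch: logits (N, K), labels (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_p[np.arange(len(labels)), labels].mean())
```

With confident correct logits the loss approaches 0; with uniform logits over K classes it equals log(K).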
step 7: compute gradient information for each sample in the data set;
step 8: back-propagate the gradient of the loss layer and update the weight parameters of the feature processing module of the pedestrian retrieval model;
step 9: if the model has not converged and the maximum number of iterations has not been reached, return to step 3;
step 10: after network training is finished, complete the pedestrian retrieval task for the query set on the test set, and compute Rank1, Rank5, Rank10, and mAP.
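The Rank-k and mAP metrics of step 10 can be computed from a query-gallery distance matrix as sketched below. This simplified version omits the same-camera filtering used by standard re-id benchmarks and assumes every query has at least one gallery match.

```python
import numpy as np

def cmc_map(dist, q_ids, g_ids, ks=(1, 5, 10)):
    """Rank-k hit rates and mAP from a (num_query x num_gallery)
    distance matrix; q_ids/g_ids are identity labels."""
    ranks = {k: 0.0 for k in ks}
    aps = []
    for qi in range(len(q_ids)):
        order = np.argsort(dist[qi])          # gallery sorted by distance
        hits = g_ids[order] == q_ids[qi]      # correct-identity mask
        first = np.argmax(hits)               # position of first hit
        for k in ks:
            ranks[k] += float(first < k)
        precision = np.cumsum(hits) / (np.arange(len(hits)) + 1)
        aps.append((precision * hits).sum() / hits.sum())  # average precision
    n = len(q_ids)
    return {f"Rank{k}": ranks[k] / n for k in ks}, float(np.mean(aps))
```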
As shown in FIG. 3, GL is the attention region of a common (baseline) model, PA is pose-feature extraction branch I, EPA is pose-feature extraction branch II, and EAP-Net (the invention) is the combined attention region of the three branches GL + PA + EPA.
As shown in fig. 4, fusing the high-dimensional posture feature matrix with the global feature matrix produces a joint attention effect, providing a better retrieval basis and allowing more sample information to be attended to.
As shown in fig. 9, Query denotes the query sample for pedestrian retrieval, and Top-10 results denotes the 10 gallery samples closest to the query. Odd rows show the query results of the base model without fused posture features; even rows show the results of EAP-Net with fused posture features. Black boxes mark incorrectly retrieved samples. EAP-Net has greater pedestrian retrieval capability than the base model without posture features. When different pedestrians wear similar clothes and samples are hard to distinguish, EAP-Net can judge identity from pedestrian posture information, improving the robustness of the model. When computing Rank1, i.e. finding the gallery sample most similar to the query, EAP-Net preferentially retrieves samples with the same action as the query based on pedestrian posture features, improving the first-hit rate of the model in the identity discrimination stage.
The present invention has been described in detail with reference to the embodiments, but the description is only illustrative and should not be construed as limiting the scope of the invention, which is defined by the claims. Equivalent changes and modifications made by those skilled in the art based on the technical solutions of the present invention, or equivalent technical solutions designed to achieve the above technical effects, also fall within the scope of the present invention.

Claims (5)

1. A pedestrian re-identification method integrated with pedestrian posture features, characterized by comprising the following steps:
step 1: preprocessing a pedestrian re-identification data set, and dividing the whitened data into a training set and a verification set;
step 2: training a pedestrian skeleton detection model by using a public data set;
step 3: extract high-dimensional pedestrian posture matrix features with the pedestrian skeleton detection model;
step 4: fuse the extracted high-dimensional pedestrian posture matrix into a pedestrian retrieval model for training;
step 5: propagate forward and obtain the feature vectors extracted by the feature extraction layer;
step 6: compute the losses in the forward pass, the losses comprising a triplet loss and a cross-entropy loss;
step 7: compute gradient information for each sample in the data set;
step 8: back-propagate the gradient of the loss layer and update the weight parameters of the feature processing module of the pedestrian retrieval model;
step 9: if the model has not converged and the maximum number of iterations has not been reached, repeat steps 3 to 8;
step 10: after network training is finished, complete the pedestrian retrieval task for the query set on the test set, and compute Rank1, Rank5, Rank10, and mAP.
2. The pedestrian re-identification method integrated with pedestrian posture features according to claim 1, characterized in that: in step 3, the sample is input into the backbone network of the pedestrian skeleton detection model, and the high-dimensional pedestrian posture feature matrix output by the last layer of the backbone network is obtained.
3. The pedestrian re-identification method integrated with pedestrian posture features according to claim 1, characterized in that: in step 4, the feature fusion that fuses the high-dimensional pedestrian posture matrix into the pedestrian retrieval model may directly sum the posture feature matrix with the feature map extracted by the retrieval model; or splice the two feature matrices and compress them with a convolution layer; or first screen the posture feature matrix with an SE module, then splice it with the retrieval model's feature map and compress the result.
4. The pedestrian re-identification method integrated with pedestrian posture features according to claim 1, characterized in that in step 6, an L2 regularization operation is performed on the output feature vectors; the L2-normalized unit feature vectors are spliced into triplets and the triplet loss is computed from the triplet feature vectors; the unit feature vectors pass through a Batch Normalization layer to obtain a Test Vector, which is used in the later pedestrian retrieval test stage and is input into the fully connected layer at the end of the network model to compute the cross-entropy loss.
5. The pedestrian re-identification method integrated with pedestrian posture features according to claim 3, characterized in that: the bias parameter of the Batch Normalization layer is set constantly equal to 0 in step 6.
CN202010252780.4A 2020-04-02 2020-04-02 Pedestrian re-recognition method integrated with pedestrian posture features Pending CN111428675A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010252780.4A CN111428675A (en) 2020-04-02 2020-04-02 Pedestrian re-recognition method integrated with pedestrian posture features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010252780.4A CN111428675A (en) 2020-04-02 2020-04-02 Pedestrian re-recognition method integrated with pedestrian posture features

Publications (1)

Publication Number Publication Date
CN111428675A true CN111428675A (en) 2020-07-17

Family

ID=71550533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010252780.4A Pending CN111428675A (en) 2020-04-02 2020-04-02 Pedestrian re-recognition method integrated with pedestrian posture features

Country Status (1)

Country Link
CN (1) CN111428675A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524602A (en) * 2023-07-03 2023-08-01 华东交通大学 Method and system for re-identifying clothing changing pedestrians based on gait characteristics

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140176551A1 (en) * 2012-12-21 2014-06-26 Honda Motor Co., Ltd. 3D Human Models Applied to Pedestrian Pose Classification
WO2018107760A1 (en) * 2016-12-16 2018-06-21 北京大学深圳研究生院 Collaborative deep network model method for pedestrian detection
CN108647639A (en) * 2018-05-10 2018-10-12 电子科技大学 Real-time body's skeletal joint point detecting method
CN110163110A (en) * 2019-04-23 2019-08-23 中电科大数据研究院有限公司 A kind of pedestrian's recognition methods again merged based on transfer learning and depth characteristic
CN110659589A (en) * 2019-09-06 2020-01-07 中国科学院自动化研究所 Pedestrian re-identification method, system and device based on attitude and attention mechanism
CN110781771A (en) * 2019-10-08 2020-02-11 北京邮电大学 Abnormal behavior real-time monitoring method based on deep learning
CN110796026A (en) * 2019-10-10 2020-02-14 湖北工业大学 Pedestrian re-identification method based on global feature stitching

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140176551A1 (en) * 2012-12-21 2014-06-26 Honda Motor Co., Ltd. 3D Human Models Applied to Pedestrian Pose Classification
WO2018107760A1 (en) * 2016-12-16 2018-06-21 北京大学深圳研究生院 Collaborative deep network model method for pedestrian detection
CN108647639A (en) * 2018-05-10 2018-10-12 电子科技大学 Real-time body's skeletal joint point detecting method
CN110163110A (en) * 2019-04-23 2019-08-23 中电科大数据研究院有限公司 A kind of pedestrian's recognition methods again merged based on transfer learning and depth characteristic
CN110659589A (en) * 2019-09-06 2020-01-07 中国科学院自动化研究所 Pedestrian re-identification method, system and device based on attitude and attention mechanism
CN110781771A (en) * 2019-10-08 2020-02-11 北京邮电大学 Abnormal behavior real-time monitoring method based on deep learning
CN110796026A (en) * 2019-10-10 2020-02-14 湖北工业大学 Pedestrian re-identification method based on global feature stitching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BI Xiaojun: "Person re-identification based on viewpoint information embedding", Acta Optica Sinica, vol. 39, no. 6, 30 June 2019 (2019-06-30), pages 262-271 *
GAO Mingda: "A dual-branch network model for accurate human parsing with joint pose priors", Journal of Software, vol. 31, no. 7, 14 January 2020 (2020-01-14), pages 1959-1968 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524602A (en) * 2023-07-03 2023-08-01 华东交通大学 Method and system for re-identifying clothing changing pedestrians based on gait characteristics
CN116524602B (en) * 2023-07-03 2023-09-19 华东交通大学 Method and system for re-identifying clothing changing pedestrians based on gait characteristics

Similar Documents

Publication Publication Date Title
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
US11836224B2 (en) Cross-modality person re-identification method based on local information learning
CN111460914B (en) Pedestrian re-identification method based on global and local fine granularity characteristics
CN113065402B (en) Face detection method based on deformation attention mechanism
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
CN112784728B (en) Multi-granularity clothes changing pedestrian re-identification method based on clothing desensitization network
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN110598543A (en) Model training method based on attribute mining and reasoning and pedestrian re-identification method
CN108416295A (en) A kind of recognition methods again of the pedestrian based on locally embedding depth characteristic
CN114299542A (en) Video pedestrian re-identification method based on multi-scale feature fusion
CN112163498A (en) Foreground guiding and texture focusing pedestrian re-identification model establishing method and application thereof
CN112560604A (en) Pedestrian re-identification method based on local feature relationship fusion
CN114782977A (en) Method for guiding pedestrian re-identification based on topological information and affinity information
CN115171165A (en) Pedestrian re-identification method and device with global features and step-type local features fused
CN114882537B (en) Finger new visual angle image generation method based on nerve radiation field
CN113269099B (en) Vehicle re-identification method under heterogeneous unmanned system based on graph matching
CN113822145A (en) Face recognition operation method based on deep learning
CN111428675A (en) Pedestrian re-recognition method integrated with pedestrian posture features
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment
CN112446305A (en) Pedestrian re-identification method based on classification weight equidistant distribution loss model
CN115830643A (en) Light-weight pedestrian re-identification method for posture-guided alignment
CN113537032B (en) Diversity multi-branch pedestrian re-identification method based on picture block discarding
CN114821632A (en) Method for re-identifying blocked pedestrians
CN115376159A (en) Cross-appearance pedestrian re-recognition method based on multi-mode information
CN113869151A (en) Cross-view gait recognition method and system based on feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200717