CN113128461A - Pedestrian re-recognition performance improving method based on human body key point mining full-scale features - Google Patents

Pedestrian re-recognition performance improving method based on human body key point mining full-scale features

Info

Publication number
CN113128461A
CN113128461A (application CN202110492149.6A)
Authority
CN
China
Prior art keywords
pedestrian
network
key point
human body
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110492149.6A
Other languages
Chinese (zh)
Other versions
CN113128461B (en)
Inventor
杨绿溪
韩志伟
胡欣毅
惠鸿儒
李春国
黄永明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202110492149.6A priority Critical patent/CN113128461B/en
Publication of CN113128461A publication Critical patent/CN113128461A/en
Application granted granted Critical
Publication of CN113128461B publication Critical patent/CN113128461B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification performance improvement method based on mining full-scale features with human body key points. It uses human body key points as local pedestrian features to relieve the insufficient discriminability of global pedestrian features under occlusion, and combines the key-point heatmaps produced by a human key point network with a pedestrian re-identification network to add local features. The visibility of each key point is also predicted and applied to both the loss function and the feature-distance computation, mitigating the negative influence of invisible local features on network performance. Experimental results show that the proposed method effectively improves pedestrian re-identification accuracy.

Description

Pedestrian re-recognition performance improving method based on human body key point mining full-scale features
Technical Field
The invention relates to pedestrian re-identification technology and belongs to the technical field of computer vision image retrieval.
Background
Pedestrian re-identification (ReID) addresses the problem of identifying and retrieving pedestrians across cameras and across scenes. Given a picture of a target pedestrian, the goal is to find the best-matching pedestrian in a gallery of pictures captured by other cameras. Because blind spots exist between cameras, the complete trajectory of a pedestrian cannot be obtained by target tracking alone; the pedestrian must instead be identified and matched between cameras, and if timestamps are also used, the number of candidate pictures can be greatly reduced. Pedestrian re-identification can thus be regarded as a picture retrieval technique specialized for pedestrians. Compared with face recognition, it places fewer constraints on the scene, making it better suited to security applications.
Early pedestrian re-identification algorithms were based on hand-crafted features. First, pedestrian features are extracted with manually designed templates; common choices include color, texture, local, and semantic features. The distance between the features of a query picture and those of each candidate picture is then computed with a suitable metric, such as the Mahalanobis distance, and finally each candidate is judged to match the query or not. The core of these traditional algorithms is designing pedestrian features with high discriminability together with appropriate metric learning.
With the development of deep learning, more and more computer vision tasks have adopted it with great success. In pedestrian re-identification, deep learning methods far outperform algorithms based on hand-crafted features, so deep learning has become the mainstream research direction in this field in recent years. Current mainstream approaches include methods based on representation learning, local features, generative adversarial networks, and metric learning.
Disclosure of Invention
The purpose of the invention is as follows: in view of the prior art, a pedestrian re-identification performance improvement method based on mining full-scale features with human body key points is provided to address the problem of pedestrian occlusion in pedestrian re-identification.
The technical scheme is as follows: a pedestrian re-identification performance improvement method based on mining full-scale features with human body key points comprises the following steps:
Step 1: training an hourglass network for human key point detection on a human key point detection data set, and inputting a pedestrian re-identification picture into the trained hourglass network to obtain heatmaps of the pedestrian's key points;
Step 2: inputting the pedestrian heatmaps from the human key point detection data set into a visibility classification sub-network to train it, and inputting the key-point heatmaps obtained in step 1 into the trained sub-network for classification to obtain a visibility probability for each human key point;
Step 3: inputting the pedestrian re-identification picture into the full-scale network of the pedestrian re-identification network to obtain the pedestrian's global feature map before global average pooling;
Step 4: multiplying the global feature map obtained in step 3 by each key-point heatmap to obtain a pedestrian feature map for each key point, and passing these feature maps together with the global feature map through the subsequent global average pooling to obtain the pedestrian's global and local features;
Step 5: during training, inputting the global and local features into a classifier to obtain, for each feature, a probability over pedestrian identities, wherein the loss function is the weighted average of the cross-entropy losses between each feature's predicted probabilities and the true identity, with the key-point visibility probabilities as weights;
during testing, obtaining the global and local features of the query picture and of each database picture as in step 4 and computing the distance between the two pictures' features, wherein the distance is a weighted average of the global and local Euclidean distances and each local feature's weight is the normalized visibility probability of the corresponding human key point.
Further, step 1 specifically comprises the following sub-steps:
Step 1.1: training the hourglass network for human key point detection by cropping the pedestrian data set in the human key point detection data set according to the provided bounding boxes to obtain individual pedestrian pictures, so that the hourglass network can be trained on single-person key point detection;
Step 1.2: performing data augmentation on each pedestrian picture obtained in step 1.1, ensuring a picture size of 256 × 128, the augmentation comprising picture flipping, resizing and padding;
Step 1.3: inputting the augmented pictures into the hourglass network, which comprises several stacked hourglass modules, each module capturing the pedestrian's global and local information, combining them to predict key-point heatmaps, and passing the predicted heatmaps to the next hourglass module as input, until training of the hourglass network is completed;
Step 1.4: resizing the pedestrian re-identification picture to 256 × 128 and inputting it into the trained hourglass network to obtain heatmaps of the pedestrian's key points.
Further, step 2 specifically comprises the following sub-steps:
Step 2.1: training the visibility classification sub-network by inputting the key-point heatmaps into its convolutional layer to obtain heatmap features, feeding these features into a fully connected layer, and finally constraining the output to the range 0 to 1 with an activation function, so that the sub-network's output represents the visibility probability of each human key point;
Step 2.2: using the visibility probabilities as input to a binary classification loss function, with visible key points labeled 1 and invisible key points labeled 0;
Step 2.3: taking the key-point heatmaps obtained in step 1.4 as input to the visibility classification sub-network trained in step 2.2, and passing the sub-network's output through the activation function to obtain the visibility probability of each human key point.
Further, step 3 specifically comprises the following sub-steps:
Step 3.1: scaling the pedestrian re-identification picture to the common size 256 × 128, then augmenting it by horizontal flipping;
Step 3.2: inputting the augmented picture into the full-scale network of the pedestrian re-identification network, which first extracts features with a 7 × 7 convolution and a max-pooling layer, then feeds them through 4 residual modules to obtain new features, namely the pedestrian's global feature map before global average pooling.
Further, step 4 specifically comprises the following sub-steps:
Step 4.1: bilinearly interpolating the key-point heatmaps obtained in step 1 so that their size matches that of the global feature map obtained in step 3;
Step 4.2: multiplying each heatmap from step 4.1 by the global feature map from step 3 to obtain a pedestrian feature map for each key point;
Step 4.3: inputting the per-key-point feature maps and the global feature map into a global average pooling layer, then flattening each feature to obtain the pedestrian's local and global representations.
Further, step 5 specifically comprises the following sub-steps:
Step 5.1: during training, feeding the local and global representations obtained in step 4 through the classifier's fully connected layer and activation layer in turn, the activation function constraining the outputs to the range 0 to 1 with a sum of 1, so that they represent probabilities over pedestrian identities;
Step 5.2: computing the network loss from the probabilities of step 5.1 using cross-entropy, the loss comprising the classification loss of the pedestrian's global representation and those of the local representations, and forming the loss function as their weighted average with the key-point visibility probabilities as weights;
Step 5.3: during testing, obtaining the local and global representations of the query picture using step 4, and those of the database picture gallery by the same method;
Step 5.4: computing the distance between the query picture and every picture representation in the database, each distance being a weighted average of the Euclidean distances of the global and local representations with each local weight the normalized visibility probability of the corresponding key point, and taking the database picture with the smallest distance to the query picture as the match for the query.
Beneficial effects: in real scenes, occlusion is inevitable. To solve pedestrian re-identification under occlusion, the invention improves and optimizes two aspects. On one hand, extracting local pedestrian features is essential to handling occlusion, since effective local features avoid the influence occlusion has on global features; key points are local features commonly used for pedestrians, so combining human key points with pedestrian re-identification is effective. On the other hand, occlusion can make key points invisible, and invisible local features degrade overall network performance, so it is necessary to increase the weight of visible key-point features and decrease that of invisible ones. The invention introduces key-point visibility and applies it to the network's loss function and feature-distance computation, reducing the influence of invisible key points.
Moreover, the heatmaps generated by the human key point detection network are used to extract the pedestrian's local features, avoiding manually designed key-point region sizes. The method reaches 68.1% accuracy on the occluded pedestrian re-identification data set, a large improvement over current algorithms.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a full-scale interleaved-branch module;
FIG. 3 is a diagram of the overall structure of a full-scale network;
FIG. 4 is a schematic view of an hourglass module of the hourglass network;
FIG. 5 is an overall structural view of the hourglass network;
fig. 6 is a diagram of the overall network architecture of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings.
As shown in fig. 1 to fig. 6, the pedestrian re-identification performance improvement method based on mining full-scale features with human body key points comprises the following steps:
Step 1: train an hourglass network for human key point detection on a human key point detection data set, then input a pedestrian re-identification picture into the trained hourglass network to obtain heatmaps of the pedestrian's key points.
Step 1 specifically comprises the following sub-steps:
Step 1.1: to train the hourglass network, crop the pedestrian data set in the human key point detection data set according to the provided bounding boxes to obtain individual pedestrian pictures, so that the hourglass network can be trained on single-person key point detection.
Step 1.2: perform data augmentation on each pedestrian picture obtained in step 1.1, ensuring a picture size of 256 × 128; the augmentation comprises picture flipping, resizing and padding.
Step 1.3: input the augmented pictures into the hourglass network, which comprises several stacked hourglass modules; each module captures the pedestrian's global and local information, combines them to predict key-point heatmaps, and passes the predicted heatmaps to the next hourglass module as input, until training of the hourglass network is completed.
Step 1.4: resize the pedestrian re-identification picture to 256 × 128 and input it into the trained hourglass network to obtain heatmaps of the pedestrian's key points.
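For concreteness, the heatmaps of step 1 can be pictured with a small numpy sketch. The patent does not give the heatmap formula; the common practice for hourglass networks, assumed here, is a 2D Gaussian rendered around each key point at a reduced resolution (64 × 32 for a 256 × 128 input is an assumption, matching the typical 1/4 output downsampling).

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Render a 2D Gaussian centered at (cx, cy) on an h x w grid."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

# One heatmap per key point; a 256 x 128 input gives 64 x 32 heatmaps
# under the assumed 1/4 output resolution.
heatmap = gaussian_heatmap(64, 32, cx=16, cy=20)
```

The peak of the heatmap sits at the key-point location, which is what step 4 later exploits when the heatmap is multiplied with the feature map.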
Step 2: input the pedestrian heatmaps from the human key point detection data set into a visibility classification sub-network to train it, then input the key-point heatmaps obtained in step 1 into the trained sub-network for classification to obtain a visibility probability for each human key point.
Step 2 specifically comprises the following sub-steps:
Step 2.1: train the visibility classification sub-network by inputting the key-point heatmaps into its convolutional layer to obtain heatmap features, feeding these features into a fully connected layer, and finally constraining the output to the range 0 to 1 with an activation function, so that the sub-network's output represents the visibility probability of each human key point.
Step 2.2: use the visibility probabilities as input to a binary classification loss function, with visible key points labeled 1 and invisible key points labeled 0.
Step 2.3: take the key-point heatmaps obtained in step 1.4 as input to the visibility classification sub-network trained in step 2.2, and pass the sub-network's output through the activation function to obtain the visibility probability of each human key point.
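Steps 2.1 and 2.2 amount to a sigmoid output trained with a binary cross-entropy loss against 0/1 visibility labels. A minimal numpy sketch, with hypothetical logits standing in for the convolutional and fully connected layers the patent describes:

```python
import numpy as np

def sigmoid(z):
    """Activation constraining each output to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def visibility_bce(logits, labels, eps=1e-12):
    """Binary cross-entropy between predicted visibility and 0/1 labels."""
    p = sigmoid(logits)
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

logits = np.array([2.0, -1.5, 3.0])  # hypothetical per-key-point scores
labels = np.array([1.0, 0.0, 1.0])   # 1 = visible, 0 = occluded
probs = sigmoid(logits)              # visibility probabilities (step 2.3)
loss = visibility_bce(logits, labels)
```

At test time only the sigmoid output is used, as the per-key-point visibility probability of step 2.3.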
Step 3: input the pedestrian re-identification picture into the full-scale network of the pedestrian re-identification network to obtain the pedestrian's global feature map before global average pooling.
Step 3 specifically comprises the following sub-steps:
Step 3.1: scale the pedestrian re-identification picture to the common size 256 × 128, then augment it by horizontal flipping.
Step 3.2: input the augmented picture into the full-scale network of the pedestrian re-identification network, which first extracts features with a 7 × 7 convolution and a max-pooling layer, then feeds them through 4 residual modules to obtain new features, namely the pedestrian's global feature map before global average pooling. The residual modules use depthwise separable convolutions to reduce model parameters and computation, and each module learns multi-scale features through several interleaved parallel branches joined by residual connections.
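The parameter saving from the depthwise separable convolutions mentioned above can be checked by counting weights; the 256-channel, 3 × 3 setting below is illustrative and not taken from the patent:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise projection."""
    return c_in * k * k + c_in * c_out

std = conv_params(256, 256, 3)       # 589824 weights
sep = separable_params(256, 256, 3)  # 67840 weights, roughly 8.7x fewer
```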
Step 4: multiply the global feature map obtained in step 3 by each key-point heatmap to obtain a pedestrian feature map for each key point, then pass these feature maps together with the global feature map through the subsequent global average pooling to obtain the pedestrian's global and local features.
Step 4 specifically comprises the following sub-steps:
Step 4.1: bilinearly interpolate the key-point heatmaps obtained in step 1 so that their size matches that of the global feature map obtained in step 3.
Step 4.2: multiply each heatmap from step 4.1 by the global feature map from step 3 to obtain a pedestrian feature map for each key point.
Step 4.3: input the per-key-point feature maps and the global feature map into a global average pooling layer, then flatten each feature to obtain the pedestrian's local and global representations.
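Steps 4.1 to 4.3 reduce to an elementwise multiply followed by global average pooling. A numpy sketch with hypothetical sizes (512 channels, a 16 × 8 feature map and 17 key points are assumptions; none of these values appear in the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, K = 512, 16, 8, 17        # channels, feature size, key points (assumed)
feat = rng.random((C, H, W))       # global feature map before pooling (step 3)
heatmaps = rng.random((K, H, W))   # key-point heatmaps resized to (H, W) (step 4.1)

# Steps 4.2-4.3: heatmap-weighted feature map, then global average pooling
# (the spatial mean) and flattening to one vector per key point.
local = np.stack([(feat * hm).mean(axis=(1, 2)) for hm in heatmaps])  # (K, C)
global_feat = feat.mean(axis=(1, 2))                                  # (C,)
```

Each row of `local` is the local representation of one key point; `global_feat` is the global representation used alongside them in step 5.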
Step 5: during training, input the global and local features into a classifier to obtain, for each feature, a probability over pedestrian identities; the loss function is the weighted average of the cross-entropy losses between each feature's predicted probabilities and the true identity, with the key-point visibility probabilities as weights. During testing, obtain the global and local features of the query picture and of each database picture as in step 4 and compute the distance between the two pictures' features; the distance is a weighted average of the global and local Euclidean distances, where each local feature's weight is the normalized visibility probability of the corresponding human key point.
Step 5 specifically comprises the following sub-steps:
Step 5.1: during training, feed the local and global representations obtained in step 4 through the classifier's fully connected layer and activation layer in turn; the activation function constrains the outputs to the range 0 to 1 with a sum of 1, so that they represent probabilities over pedestrian identities.
Step 5.2: compute the network loss from the probabilities of step 5.1 using cross-entropy; the loss comprises the classification loss of the pedestrian's global representation and those of the local representations, and the loss function is their weighted average with the key-point visibility probabilities as weights.
Step 5.3: during testing, obtain the local and global representations of the query picture using step 4, and those of the database picture gallery by the same method.
Step 5.4: compute the distance between the query picture and every picture representation in the database, each distance being a weighted average of the Euclidean distances of the global and local representations with each local weight the normalized visibility probability of the corresponding key point; the database picture with the smallest distance to the query picture is taken as the match for the query.
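The test-time matching of step 5.4 can be sketched as below. The patent specifies a weighted average of the global and local Euclidean distances with normalized visibility as the local weights, but not the global/local balance; the equal 0.5/0.5 split here is an assumption.

```python
import numpy as np

def match_distance(gq, gd, lq, ld, vis):
    """Query-gallery distance: global Euclidean distance averaged with
    per-key-point local distances weighted by normalized visibility."""
    w = vis / (vis.sum() + 1e-12)              # normalized visibility weights
    d_global = np.linalg.norm(gq - gd)
    d_local = np.linalg.norm(lq - ld, axis=1)  # one distance per key point
    return 0.5 * d_global + 0.5 * np.dot(w, d_local)  # assumed 0.5/0.5 split

rng = np.random.default_rng(1)
gq, gd = rng.random(512), rng.random(512)              # global representations
lq, ld = rng.random((17, 512)), rng.random((17, 512))  # local representations
vis = rng.random(17)                                   # visibility probabilities
d = match_distance(gq, gd, lq, ld, vis)
```

The gallery picture minimizing this distance is returned as the match for the query.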
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the invention, and such modifications and refinements should also be regarded as falling within the protection scope of the invention.

Claims (6)

1. A pedestrian re-identification performance improvement method based on mining full-scale features with human body key points, characterized by comprising the following steps:
Step 1: training an hourglass network for human key point detection on a human key point detection data set, and inputting a pedestrian re-identification picture into the trained hourglass network to obtain heatmaps of the pedestrian's key points;
Step 2: inputting the pedestrian heatmaps from the human key point detection data set into a visibility classification sub-network to train it, and inputting the key-point heatmaps obtained in step 1 into the trained sub-network for classification to obtain a visibility probability for each human key point;
Step 3: inputting the pedestrian re-identification picture into the full-scale network of the pedestrian re-identification network to obtain the pedestrian's global feature map before global average pooling;
Step 4: multiplying the global feature map obtained in step 3 by each key-point heatmap to obtain a pedestrian feature map for each key point, and passing these feature maps together with the global feature map through the subsequent global average pooling to obtain the pedestrian's global and local features;
Step 5: during training, inputting the global and local features into a classifier to obtain, for each feature, a probability over pedestrian identities, wherein the loss function is the weighted average of the cross-entropy losses between each feature's predicted probabilities and the true identity, with the key-point visibility probabilities as weights;
during testing, obtaining the global and local features of the query picture and of each database picture as in step 4 and computing the distance between the two pictures' features, wherein the distance is a weighted average of the global and local Euclidean distances and each local feature's weight is the normalized visibility probability of the corresponding human key point.
2. The pedestrian re-identification performance improvement method based on mining full-scale features with human body key points according to claim 1, characterized in that step 1 specifically comprises the following sub-steps:
Step 1.1: training the hourglass network for human key point detection by cropping the pedestrian data set in the human key point detection data set according to the provided bounding boxes to obtain individual pedestrian pictures, so that the hourglass network can be trained on single-person key point detection;
Step 1.2: performing data augmentation on each pedestrian picture obtained in step 1.1, ensuring a picture size of 256 × 128, the augmentation comprising picture flipping, resizing and padding;
Step 1.3: inputting the augmented pictures into the hourglass network, which comprises several stacked hourglass modules, each module capturing the pedestrian's global and local information, combining them to predict key-point heatmaps, and passing the predicted heatmaps to the next hourglass module as input, until training of the hourglass network is completed;
Step 1.4: resizing the pedestrian re-identification picture to 256 × 128 and inputting it into the trained hourglass network to obtain heatmaps of the pedestrian's key points.
3. The method for improving pedestrian re-identification performance based on human body key point mining full-scale features of claim 2, wherein step 2 specifically comprises the following sub-steps:
step 2.1: training the visibility classification sub-network: inputting the heatmaps of the pedestrian key points into the convolution layers of the visibility classification sub-network to obtain heatmap features, inputting the heatmap features into a fully connected layer, and finally constraining the output to the range 0 to 1 through an activation function, so that the output of the visibility classification sub-network represents the visibility probability of each human body key point;
step 2.2: taking the visibility probabilities as the input of a binary classification loss function, with visible key points labeled 1 and invisible key points labeled 0;
step 2.3: taking the heatmaps of the pedestrian key points obtained in step 1.4 as the input of the visibility classification sub-network trained in step 2.2, and feeding the output of the sub-network through the activation function to obtain the visibility probability of each human body key point.
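A minimal sketch of the head of the visibility classification sub-network in steps 2.1 and 2.2, assuming a single fully connected layer with a sigmoid activation and hypothetical weight shapes; the convolutional feature extractor is omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def visibility_probs(heatmap_feats, weights, bias):
    # hypothetical fully connected head: flatten the heatmap features,
    # apply one linear layer, then squash each score into (0, 1)
    return sigmoid(heatmap_feats.reshape(-1) @ weights + bias)

def binary_cross_entropy(probs, labels):
    # step 2.2 loss: labels are 1 for visible key points, 0 for invisible
    eps = 1e-12
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1.0 - labels) * np.log(1.0 - probs + eps)))

rng = np.random.default_rng(1)
feats = rng.standard_normal((8, 16))    # toy heatmap features from the conv layers
W = rng.standard_normal((8 * 16, 6))    # 6 key points, purely illustrative
b = np.zeros(6)
probs = visibility_probs(feats, W, b)
loss = binary_cross_entropy(probs, np.array([1.0, 1.0, 0.0, 1.0, 0.0, 1.0]))
```

Because the sigmoid output lies strictly in (0, 1), each entry of `probs` can be read directly as a per-key-point visibility probability.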
4. The method for improving pedestrian re-identification performance based on human body key point mining full-scale features of claim 1, wherein step 3 specifically comprises the following sub-steps:
step 3.1: scaling each pedestrian re-identification picture to the same 256×128 size, and then performing data enhancement on it through horizontal flipping;
step 3.2: inputting the data-enhanced picture into the full-scale network of the pedestrian re-identification network: features are first extracted with a 7×7 convolution layer and a max pooling layer, and the extracted features are then fed into 4 residual modules to obtain new features, namely the global features of the pedestrian before global average pooling.
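The feature-extraction flow of step 3.2 can be outlined as below; the strided 7×7 convolution and max pooling are approximated by plain spatial downsampling, and the residual modules are toy stand-ins for the full-scale network's learned blocks:

```python
import numpy as np

def residual_block(x, scale):
    # toy residual module: identity shortcut plus a ReLU branch;
    # real residual modules use learned convolutions, this only
    # mimics the x + F(x) structure
    return x + np.maximum(0.0, scale * x)

def extract_global_features(picture):
    # stand-in for step 3.2: the strided 7x7 convolution and max pooling
    # are approximated by 4x spatial downsampling, then 4 residual modules;
    # the output plays the role of the global features before pooling
    feat = picture[..., ::4, ::4]               # 256x128 -> 64x32
    for scale in (0.1, 0.2, 0.3, 0.4):          # 4 residual modules
        feat = residual_block(feat, scale)
    return feat

img = np.random.default_rng(2).random((3, 256, 128))   # C x H x W picture
global_feat = extract_global_features(img)
```

The point of the sketch is the shape bookkeeping: a 256×128 input leaves the stem as a 64×32 feature map, which is what the heatmaps are later resized to match.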
5. The method for improving pedestrian re-identification performance based on human body key point mining full-scale features of claim 1, wherein step 4 specifically comprises the following sub-steps:
step 4.1: performing bilinear interpolation on the heatmaps of the pedestrian key points obtained in step 1 so that their size matches that of the global features obtained in step 3;
step 4.2: multiplying each heatmap obtained in step 4.1 with the global features obtained in step 3 to obtain the pedestrian feature map corresponding to each key point;
step 4.3: inputting the pedestrian feature map corresponding to each key point, together with the global features, into a global average pooling layer, and then flattening each feature to obtain the local representations and the global representation of the pedestrian.
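Steps 4.1 to 4.3 can be sketched in NumPy as a bilinear resize of each heatmap, an element-wise product with the global features, and global average pooling; the channel count and heatmap sizes are placeholders:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    # plain NumPy bilinear interpolation of a single-channel map (step 4.1)
    h, w = img.shape
    ys = np.linspace(0.0, h - 1, out_h)
    xs = np.linspace(0.0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def keypoint_representations(global_feat, heatmaps):
    # steps 4.2-4.3: mask the global features with each resized heatmap,
    # then global-average-pool and flatten into vectors
    c, H, W = global_feat.shape
    local_reps = []
    for hm in heatmaps:
        mask = bilinear_resize(hm, H, W)[None]      # broadcast over channels
        local_reps.append((global_feat * mask).mean(axis=(1, 2)))
    global_rep = global_feat.mean(axis=(1, 2))      # global representation
    return np.stack(local_reps), global_rep

rng = np.random.default_rng(3)
feat = rng.random((8, 64, 32))      # hypothetical global features (C, H, W)
hms = rng.random((5, 16, 8))        # 5 key-point heatmaps at lower resolution
local_reps, global_rep = keypoint_representations(feat, hms)
```

Each local representation is thus a C-dimensional vector whose entries are down-weighted wherever the corresponding key-point heatmap is near zero.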
6. The method for improving pedestrian re-identification performance based on human body key point mining full-scale features of claim 1, wherein step 5 specifically comprises the following sub-steps:
step 5.1: during training, sequentially inputting the local representations and the global representation of the pedestrian obtained in step 4 into the fully connected layer and activation layer of a classifier, the activation function constraining the outputs to the range 0 to 1 with their sum equal to 1, so that the outputs represent the probabilities of the different pedestrian identities;
step 5.2: computing the network loss from the probabilities obtained in step 5.1 with the cross-entropy loss; the loss comprises the classification loss of the global representation and the classification losses of the local representations of the pedestrian, and the loss function is the weighted average of these losses, the weights being the visibility probabilities of the human body key points;
step 5.3: during testing, obtaining the local representations and the global representation of the query picture by step 4, and obtaining the local representations and the global representations of the database pictures by the method of step 4;
step 5.4: computing the distance between the query picture and the representation of every picture in the database, each distance being the weighted average of the Euclidean distance of the global representations and the Euclidean distances of the local representations, the local weights being the normalized visibility probabilities of the human body key points; then taking the database picture with the smallest distance to the query picture as the picture matching the query.
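The matching rule of step 5.4 can be sketched as below; the equal split between the global term and the visibility-weighted local term is an assumption for illustration, since the claim only specifies a weighted average:

```python
import numpy as np

def match_distance(q_glob, q_locs, g_glob, g_locs, vis):
    # step 5.4: Euclidean distance of the global representations averaged
    # with the visibility-weighted Euclidean distances of the local ones;
    # the 0.5/0.5 split between the two terms is an assumption
    d_global = np.linalg.norm(q_glob - g_glob)
    d_locals = np.linalg.norm(q_locs - g_locs, axis=1)
    w = vis / vis.sum()                   # normalized visibility probabilities
    return 0.5 * d_global + 0.5 * float(w @ d_locals)

def best_match(query, gallery, vis):
    # return the index of the database picture closest to the query;
    # query and every gallery entry are (global, locals) pairs
    q_glob, q_locs = query
    dists = [match_distance(q_glob, q_locs, g, l, vis) for g, l in gallery]
    return int(np.argmin(dists))

q = (np.zeros(8), np.zeros((5, 8)))
gallery = [(np.ones(8), np.ones((5, 8))),            # far from the query
           (np.full(8, 0.1), np.full((5, 8), 0.1))]  # close to the query
vis = np.array([0.9, 0.8, 0.1, 0.7, 0.5])
idx = best_match(q, gallery, vis)
```

Normalizing the visibility probabilities keeps occluded key points from dominating the distance: a key point with near-zero visibility contributes almost nothing to the match score.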
CN202110492149.6A 2021-05-06 2021-05-06 Pedestrian re-recognition performance improving method based on human body key point mining full-scale features Active CN113128461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110492149.6A CN113128461B (en) 2021-05-06 2021-05-06 Pedestrian re-recognition performance improving method based on human body key point mining full-scale features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110492149.6A CN113128461B (en) 2021-05-06 2021-05-06 Pedestrian re-recognition performance improving method based on human body key point mining full-scale features

Publications (2)

Publication Number Publication Date
CN113128461A true CN113128461A (en) 2021-07-16
CN113128461B CN113128461B (en) 2022-11-08

Family

ID=76781541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110492149.6A Active CN113128461B (en) 2021-05-06 2021-05-06 Pedestrian re-recognition performance improving method based on human body key point mining full-scale features

Country Status (1)

Country Link
CN (1) CN113128461B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784258A * 2019-01-08 2019-05-21 South China University of Technology Pedestrian re-identification method based on multi-scale feature cropping and fusion
CN110796026A * 2019-10-10 2020-02-14 Hubei University of Technology Pedestrian re-identification method based on global feature stitching
CN111666843A * 2020-05-25 2020-09-15 Hubei University of Technology Pedestrian re-identification method based on global feature and local feature splicing

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830637A * 2022-12-13 2023-03-21 Hangzhou Dianzi University Method for re-identifying occluded pedestrians based on pose estimation and background suppression
CN115830637B * 2022-12-13 2023-06-23 Hangzhou Dianzi University Method for re-identifying occluded pedestrians based on pose estimation and background suppression
US11908222B1 (en) 2022-12-13 2024-02-20 Hangzhou Dianzi University Occluded pedestrian re-identification method based on pose estimation and background suppression

Also Published As

Publication number Publication date
CN113128461B (en) 2022-11-08

Similar Documents

Publication Publication Date Title
CN111126379B (en) Target detection method and device
CN109543667B (en) Text recognition method based on attention mechanism
CN110738207A (en) character detection method for fusing character area edge information in character image
CN111832568B (en) License plate recognition method, training method and device of license plate recognition model
CN111967470A (en) Text recognition method and system based on decoupling attention mechanism
Kang et al. Deep learning-based weather image recognition
CN114187450A (en) Remote sensing image semantic segmentation method based on deep learning
CN111310728B (en) Pedestrian re-identification system based on monitoring camera and wireless positioning
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN113744311A Siamese neural network moving target tracking method based on fully connected attention module
CN105243154A Remote sensing image retrieval method and system based on salient point features and sparse autoencoders
CN111460980A (en) Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion
CN113627266A (en) Video pedestrian re-identification method based on Transformer space-time modeling
CN114330529A Real-time occluded pedestrian detection method based on improved YOLOv4
CN112434599A Pedestrian re-identification method based on random occlusion recovery of noise channels
CN112861840A (en) Complex scene character recognition method and system based on multi-feature fusion convolutional network
CN115240121A (en) Joint modeling method and device for enhancing local features of pedestrians
CN113128461B (en) Pedestrian re-recognition performance improving method based on human body key point mining full-scale features
Xia et al. Urban remote sensing scene recognition based on lightweight convolution neural network
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN114022703A (en) Efficient vehicle fine-grained identification method based on deep learning
CN111079585B Pedestrian re-identification method combining image enhancement with pseudo-Siamese convolutional neural network
Cho et al. Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation
Saha et al. Neural network based road sign recognition
CN107358200B (en) Multi-camera non-overlapping vision field pedestrian matching method based on sparse learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant