CN110619271A - Pedestrian re-identification method based on depth region feature connection - Google Patents

Pedestrian re-identification method based on depth region feature connection

Info

Publication number
CN110619271A
CN110619271A (application CN201910741523.4A)
Authority
CN
China
Prior art keywords
pedestrian
image
network
feature
connection layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910741523.4A
Other languages
Chinese (zh)
Inventor
刘远超
吴宗林
夏路
高飞
何伟荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Haoteng Electronics Polytron Technologies Inc
Original Assignee
Zhejiang Haoteng Electronics Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Haoteng Electronics Polytron Technologies Inc filed Critical Zhejiang Haoteng Electronics Polytron Technologies Inc
Priority to CN201910741523.4A priority Critical patent/CN110619271A/en
Publication of CN110619271A publication Critical patent/CN110619271A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on depth region feature connection, which comprises the following steps. Step 1): given images to be matched P_d and P_c. Step 2): design a feature extraction network with ResNet as the backbone and the final average pooling layer and softmax layer removed, named FPEN. Step 3): apply a pose estimation algorithm to the pedestrians in the images to be matched P_d and P_c to predict their poses and obtain pedestrian skeletons. By combining pedestrian pose estimation with deep learning, the method divides the pedestrian image into accurate sub-regions and connects the distinct features of those sub-regions, comprehensively exploiting both global and local image features; compared with other methods, it improves the accuracy of pedestrian re-identification and the robustness of the method across pedestrian re-identification data sets.

Description

Pedestrian re-identification method based on depth region feature connection
Technical Field
The invention relates to the fields of deep learning, pedestrian pose estimation and computer vision, and in particular to a pedestrian re-identification method based on depth region feature connection.
Background
As pedestrian re-identification technology has developed and its recognition accuracy and efficiency have improved, it has been applied more and more widely in intelligent security, playing an increasingly important role in public security, criminal investigation and public safety, as well as in emerging areas such as unmanned supermarkets and photo album clustering. With the arrival of the big data era, the matching data sets used in pedestrian re-identification have grown enormously in both volume and variety: from the early VIPeR data set, with 1264 images of 632 pedestrians captured by only two cameras, to the MSMT17 data set, with 126441 images of 4101 pedestrians captured by 15 cameras. This richer, more diverse data also poses great challenges to the efficiency and accuracy of pedestrian re-identification. After years of development and with the advent of deep learning, deep learning methods now dominate pedestrian re-identification. Nevertheless, the technology is still far from a commercially mature standard, and deep-learning-based methods leave considerable room for exploration and improvement. From the perspective of data set migration in particular, matching accuracy drops sharply after transferring to a new data set, so robustness remains low.
Therefore, pedestrian re-identification combining deep learning with region segmentation remains of high research value and significance, and is highly feasible.
Zhang et al. (ICCV, 2015) re-identify pedestrians with an MTSR (multi-task sparse representation) method, which requires no alignment and represents features with sparse, dictionary-based coding; they add constraints to reduce the number of misaligned matched image blocks, introduce an image-block similarity scoring mechanism, take global component matching into account, and use upper spatial layout information to reduce mismatches. Kim et al. (The Chinese University of Hong Kong, 2017) combine a CNN, RoI pooling and an attention model: the RoI pooling layer extracts feature vectors corresponding to predefined parts of the input image, the attention model then selectively focuses on subsets of the CNN feature vectors, and the human body is divided into 13 local parts within the network framework to cope with occlusion of pedestrians at different positions. Su et al. (Tsinghua University, 2017) propose a pose-driven deep convolutional (PDC) model that uses the skeleton information of each body part to reduce pose variation, learns a more robust feature representation from the global image and each different local part, and designs a pose-driven feature weighting sub-network that learns adaptive feature fusion to match features from the whole body and each local body part. Sun et al. (CVPR, 2018) design a PCB (Part-based Convolutional Baseline) network, which obtains a comprehensive descriptor from several part-level features for pedestrian matching in the re-ID task; the baseline considers the continuity of information transition among the parts of a pedestrian image, adopts an adversarial idea during training, and uses an RPP (Refined Part Pooling) strategy to produce continuity within the parts, so that the part-based model finally attains stronger applicability and robustness. Zhao et al. (2017) migrate, without supervision, a pose estimation model trained on other data sets to a re-ID data set to locate local features, extract part-level features, and obtain the final pedestrian features for matching by combining a human-body-region-guided multi-stage Feature Extraction Network (FEN) with a tree-structured competitive Feature Fusion Network (FFN). He et al. (2018) propose a partial re-ID method that integrates sparse reconstruction learning and deep learning, requires no alignment of pedestrian images, and places no constraint on the size of the input image; an end-to-end deep model is trained by minimizing the reconstruction error between pedestrian images of the same identity and maximizing it between images of different identities.
Although the above methods all address pedestrian image matching through deep learning and related techniques, they still have the following disadvantages:
(1) each method targets only a single data set, and the segmentation of local pedestrian regions is not robust, so overall recognition robustness is low;
(2) re-identification accuracy has not yet reached the expected level and remains low under both evaluation indices, mean average precision (mAP) and Rank-1.
Therefore, how to design a new pedestrian re-identification method that achieves higher mean average precision and Rank-1 and performs well on every data set is a problem that urgently needs to be solved.
Disclosure of Invention
To overcome the defects of the above algorithms and methods and to improve the efficiency and accuracy of pedestrian re-identification, the invention provides a pedestrian re-identification method based on depth region feature connection.
The technical scheme of the invention is as follows:
a pedestrian re-identification method based on depth region feature connection is characterized by comprising the following steps:
Step 1): given images to be matched P_d and P_c;
Step 2): design a feature extraction network with ResNet as the backbone and the final average pooling layer and softmax layer removed, named FPEN;
Step 3): apply a pose estimation algorithm to the pedestrians in the images to be matched P_d and P_c to predict their poses and obtain pedestrian skeletons;
Step 4): according to the skeleton, divide the pedestrian in image P_d into five sub-images, the head, left torso, right torso, upper legs and lower legs, denoted P_dh, P_dl, P_dr, P_du and P_dd respectively;
Step 5): according to the skeleton, divide the pedestrian in image P_c into five sub-images, the head, left torso, right torso, upper legs and lower legs, denoted P_ch, P_cl, P_cr, P_cu and P_cd respectively;
Step 6): feed images P_d and P_c into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_dt and V_ct respectively;
Step 7): feed sub-images P_dh, P_dl, P_dr, P_du and P_dd into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_dh, V_dl, V_dr, V_du and V_dd respectively;
Step 8): feed sub-images P_ch, P_cl, P_cr, P_cu and P_cd into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_ch, V_cl, V_cr, V_cu and V_cd respectively;
Step 9): use a fully connected layer to connect feature vectors V_dl and V_dr into a new feature vector V_dm;
Step 10): use a fully connected layer to connect feature vectors V_du and V_dd into a new feature vector V_dn;
Step 11): use a fully connected layer to connect feature vectors V_cl and V_cr into a new feature vector V_cm;
Step 12): use a fully connected layer to connect feature vectors V_cu and V_cd into a new feature vector V_cn;
Step 13): use a fully connected layer to connect feature vectors V_dt, V_dm and V_dn into a new feature vector V_db;
Step 14): use a fully connected layer to connect feature vectors V_ct, V_cm and V_cn into a new feature vector V_cb;
Step 15): use a fully connected layer to connect feature vectors V_db and V_dt, obtaining the final feature description vector V_d of image P_d;
Step 16): use a fully connected layer to connect feature vectors V_cb and V_ct, obtaining the final feature description vector V_c of image P_c;
Step 17): compute the similarity between the feature vectors with the cosine distance formula, obtaining the similarity distance D(P_d, P_c) between the two images;
Step 18): if the similarity distance D(P_d, P_c) is greater than a set threshold T, the two images are considered to show the same person; otherwise they are not.
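Steps 4) and 5) above can be sketched as follows. This is a minimal illustration only: the keypoint names and the bounding-box construction are assumptions for the sketch, since the patent does not specify the skeleton layout produced by the pose estimator; here each sub-image is a padded axis-aligned box around its keypoints, with the torso box split vertically into left and right halves.

```python
# Sketch of steps 4)-5): divide a pedestrian image into five sub-images
# (head, left torso, right torso, upper legs, lower legs) from skeleton
# keypoints given as (x, y) pixel coordinates. Keypoint names and the box
# construction are illustrative assumptions, not the patent's exact scheme.

def bounding_box(points, margin, width, height):
    """Axis-aligned box (x0, y0, x1, y1) around points, padded and clipped."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(0, min(xs) - margin), max(0, min(ys) - margin),
            min(width, max(xs) + margin), min(height, max(ys) + margin))

def divide_into_subregions(kp, width, height, margin=10):
    """Return the five sub-image boxes of steps 4)-5) for one pedestrian."""
    shoulders = [kp["left_shoulder"], kp["right_shoulder"]]
    hips = [kp["left_hip"], kp["right_hip"]]
    knees = [kp["left_knee"], kp["right_knee"]]
    ankles = [kp["left_ankle"], kp["right_ankle"]]

    head = bounding_box([kp["nose"]] + shoulders, margin, width, height)
    # Torso box, split vertically at its midline into left and right halves.
    x0, y0, x1, y1 = bounding_box(shoulders + hips, margin, width, height)
    mid = (x0 + x1) / 2
    return {
        "head": head,                        # P_h
        "left_torso": (x0, y0, mid, y1),     # P_l
        "right_torso": (mid, y0, x1, y1),    # P_r
        "upper_legs": bounding_box(hips + knees, margin, width, height),    # P_u
        "lower_legs": bounding_box(knees + ankles, margin, width, height),  # P_d
    }
```

Cropping each box from P_d and P_c then yields the ten sub-images consumed in steps 6)-8).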
The invention has the following advantages: by combining pedestrian pose estimation with deep learning, the pedestrian image is divided into accurate sub-regions, the distinct features of the sub-regions are connected, and global and local image features are comprehensively exploited; compared with other methods, this improves the accuracy of pedestrian re-identification and the robustness of the method across pedestrian re-identification data sets.
Drawings
FIG. 1 is the skeleton diagram of image P_d to be matched;
FIG. 2 is the skeleton diagram of image P_c to be matched;
FIG. 3 is the body segmentation map of image P_d to be matched;
FIG. 4 is the body segmentation map of image P_c to be matched.
Detailed Description
The following describes a specific embodiment of the pedestrian re-identification method based on the depth region feature connection according to the present invention based on an example.
Step 1): given images to be matched P_d and P_c;
Step 2): design a feature extraction network with ResNet as the backbone and the final average pooling layer and softmax layer removed, named FPEN;
Step 3): apply a pose estimation algorithm to the pedestrians in the images to be matched P_d and P_c to predict their poses and obtain pedestrian skeletons; in this embodiment the pose estimation algorithm is OpenPose, and the skeleton diagrams of the images to be matched P_d and P_c are shown in FIG. 1 and FIG. 2;
Step 4): according to the skeleton, divide the pedestrian in image P_d into five sub-images, the head, left torso, right torso, upper legs and lower legs, denoted P_dh, P_dl, P_dr, P_du and P_dd respectively; in this embodiment the divided image is shown in FIG. 3;
Step 5): according to the skeleton, divide the pedestrian in image P_c into five sub-images, the head, left torso, right torso, upper legs and lower legs, denoted P_ch, P_cl, P_cr, P_cu and P_cd respectively; in this embodiment the divided image is shown in FIG. 4;
Step 6): feed images P_d and P_c into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_dt and V_ct respectively;
Step 7): feed sub-images P_dh, P_dl, P_dr, P_du and P_dd into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_dh, V_dl, V_dr, V_du and V_dd respectively;
Step 8): feed sub-images P_ch, P_cl, P_cr, P_cu and P_cd into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_ch, V_cl, V_cr, V_cu and V_cd respectively;
Step 9): use a fully connected layer to connect feature vectors V_dl and V_dr into a new feature vector V_dm;
Step 10): use a fully connected layer to connect feature vectors V_du and V_dd into a new feature vector V_dn;
Step 11): use a fully connected layer to connect feature vectors V_cl and V_cr into a new feature vector V_cm;
Step 12): use a fully connected layer to connect feature vectors V_cu and V_cd into a new feature vector V_cn;
Step 13): use a fully connected layer to connect feature vectors V_dt, V_dm and V_dn into a new feature vector V_db;
Step 14): use a fully connected layer to connect feature vectors V_ct, V_cm and V_cn into a new feature vector V_cb;
Step 15): use a fully connected layer to connect feature vectors V_db and V_dt, obtaining the final feature description vector V_d of image P_d;
Step 16): use a fully connected layer to connect feature vectors V_cb and V_ct, obtaining the final feature description vector V_c of image P_c;
Step 17): compute the similarity between the feature vectors with the cosine distance formula, obtaining the similarity distance D(P_d, P_c) between the two images;
Step 18): if the similarity distance D(P_d, P_c) is greater than the set threshold T, the two images are considered to show the same person, otherwise they are not; in this embodiment the threshold T is set to 0.74.
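Steps 9)-18) amount to repeatedly concatenating feature vectors through fully connected layers and comparing the two final descriptors by cosine similarity against the threshold T. The sketch below shows that data flow for one image branch; the random vectors and random layer weights stand in for the trained FPEN outputs and trained fully connected layers, and the layer sizes are illustrative assumptions.

```python
# Sketch of steps 9)-18): hierarchical feature connection via fully
# connected layers, then a cosine-similarity decision with T = 0.74.
# Random vectors/weights stand in for trained features and layers.
import numpy as np

rng = np.random.default_rng(0)

def fc_connect(*vectors, out_dim=256):
    """Steps 9)-16): concatenate feature vectors and pass the result
    through a fully connected layer (random stand-in weights here)."""
    x = np.concatenate(vectors)
    w = rng.standard_normal((out_dim, x.size)) / np.sqrt(x.size)
    return w @ x

def cosine_similarity(a, b):
    """Step 17): cosine similarity between two final descriptors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(v_d, v_c, threshold=0.74):
    """Step 18): decision with the embodiment's threshold T = 0.74."""
    return cosine_similarity(v_d, v_c) > threshold

# Data flow for the d branch (the c branch is identical).
dim = 128  # per-part feature size (assumed)
v_dt = rng.standard_normal(dim)                 # whole image, step 6)
v_dh, v_dl, v_dr, v_du, v_dd = (rng.standard_normal(dim) for _ in range(5))
v_dm = fc_connect(v_dl, v_dr)                   # step 9): torso halves
v_dn = fc_connect(v_du, v_dd)                   # step 10): leg parts
v_db = fc_connect(v_dt, v_dm, v_dn)             # step 13): body descriptor
v_d = fc_connect(v_db, v_dt)                    # step 15): final descriptor
```

Note that the whole-image vector V_dt enters the hierarchy twice (steps 13 and 15), which is how the method weights global appearance alongside the part features.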
The embodiments described in this specification merely illustrate implementations of the inventive concept; the scope of the invention should not be considered limited to the specific forms set forth in the embodiments, but extends to equivalents that may occur to those skilled in the art on the basis of the inventive concept.

Claims (1)

1. A pedestrian re-identification method based on depth region feature connection, characterized by comprising the following steps:
Step 1): given images to be matched P_d and P_c;
Step 2): design a feature extraction network with ResNet as the backbone and the final average pooling layer and softmax layer removed, named FPEN;
Step 3): apply a pose estimation algorithm to the pedestrians in the images to be matched P_d and P_c to predict their poses and obtain pedestrian skeletons;
Step 4): according to the skeleton, divide the pedestrian in image P_d into five sub-images, the head, left torso, right torso, upper legs and lower legs, denoted P_dh, P_dl, P_dr, P_du and P_dd respectively;
Step 5): according to the skeleton, divide the pedestrian in image P_c into five sub-images, the head, left torso, right torso, upper legs and lower legs, denoted P_ch, P_cl, P_cr, P_cu and P_cd respectively;
Step 6): feed images P_d and P_c into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_dt and V_ct respectively;
Step 7): feed sub-images P_dh, P_dl, P_dr, P_du and P_dd into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_dh, V_dl, V_dr, V_du and V_dd respectively;
Step 8): feed sub-images P_ch, P_cl, P_cr, P_cu and P_cd into the FPEN network for feature extraction and append a fully connected layer at the end of the network, obtaining feature vectors V_ch, V_cl, V_cr, V_cu and V_cd respectively;
Step 9): use a fully connected layer to connect feature vectors V_dl and V_dr into a new feature vector V_dm;
Step 10): use a fully connected layer to connect feature vectors V_du and V_dd into a new feature vector V_dn;
Step 11): use a fully connected layer to connect feature vectors V_cl and V_cr into a new feature vector V_cm;
Step 12): use a fully connected layer to connect feature vectors V_cu and V_cd into a new feature vector V_cn;
Step 13): use a fully connected layer to connect feature vectors V_dt, V_dm and V_dn into a new feature vector V_db;
Step 14): use a fully connected layer to connect feature vectors V_ct, V_cm and V_cn into a new feature vector V_cb;
Step 15): use a fully connected layer to connect feature vectors V_db and V_dt, obtaining the final feature description vector V_d of image P_d;
Step 16): use a fully connected layer to connect feature vectors V_cb and V_ct, obtaining the final feature description vector V_c of image P_c;
Step 17): compute the similarity between the feature vectors with the cosine distance formula, obtaining the similarity distance D(P_d, P_c) between the two images;
Step 18): if the similarity distance D(P_d, P_c) is greater than a set threshold T, the two images are considered to show the same person; otherwise they are not.
CN201910741523.4A 2019-08-12 2019-08-12 Pedestrian re-identification method based on depth region feature connection Pending CN110619271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910741523.4A CN110619271A (en) 2019-08-12 2019-08-12 Pedestrian re-identification method based on depth region feature connection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910741523.4A CN110619271A (en) 2019-08-12 2019-08-12 Pedestrian re-identification method based on depth region feature connection

Publications (1)

Publication Number Publication Date
CN110619271A true CN110619271A (en) 2019-12-27

Family

ID=68921807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910741523.4A Pending CN110619271A (en) 2019-08-12 2019-08-12 Pedestrian re-identification method based on depth region feature connection

Country Status (1)

Country Link
CN (1) CN110619271A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316031A (en) * 2017-07-04 2017-11-03 北京大学深圳研究生院 The image characteristic extracting method recognized again for pedestrian
CN107832672A (en) * 2017-10-12 2018-03-23 北京航空航天大学 A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information
CN108229444A (en) * 2018-02-09 2018-06-29 天津师范大学 A kind of pedestrian's recognition methods again based on whole and local depth characteristic fusion
CN109101865A (en) * 2018-05-31 2018-12-28 湖北工业大学 A kind of recognition methods again of the pedestrian based on deep learning
CN109886113A (en) * 2019-01-17 2019-06-14 桂林远望智能通信科技有限公司 A kind of spacious view pedestrian recognition methods again based on region candidate network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Haiyu Zhao et al.: "Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733695A (en) * 2021-01-04 2021-04-30 电子科技大学 Unsupervised key frame selection method in pedestrian re-identification field
CN112733695B (en) * 2021-01-04 2023-04-25 电子科技大学 Unsupervised keyframe selection method in pedestrian re-identification field

Similar Documents

Publication Publication Date Title
CN112101150B (en) Multi-feature fusion pedestrian re-identification method based on orientation constraint
CN110956185B (en) Method for detecting image salient object
Song et al. Region-based quality estimation network for large-scale person re-identification
CN108764065B (en) Pedestrian re-recognition feature fusion aided learning method
Reddy Mopuri et al. Object level deep feature pooling for compact image representation
Chen et al. Detection evolution with multi-order contextual co-occurrence
CN110796026A (en) Pedestrian re-identification method based on global feature stitching
CN109828251A (en) Radar target identification method based on feature pyramid light weight convolutional neural networks
CN108509859A (en) A kind of non-overlapping region pedestrian tracting method based on deep neural network
CN111310668B (en) Gait recognition method based on skeleton information
CN113688894B (en) Fine granularity image classification method integrating multiple granularity features
CN110472591B (en) Shielded pedestrian re-identification method based on depth feature reconstruction
CN110674874A (en) Fine-grained image identification method based on target fine component detection
CN112069940A (en) Cross-domain pedestrian re-identification method based on staged feature learning
CN110163117B (en) Pedestrian re-identification method based on self-excitation discriminant feature learning
CN101944183B (en) Method for identifying object by utilizing SIFT tree
CN110348383A (en) A kind of road axis and two-wire extracting method based on convolutional neural networks recurrence
CN114299542A (en) Video pedestrian re-identification method based on multi-scale feature fusion
Ikizler-Cinbis et al. Web-based classifiers for human action recognition
CN110443174B (en) Pedestrian re-identification method based on decoupling self-adaptive discriminant feature learning
CN113221770A (en) Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN111291705B (en) Pedestrian re-identification method crossing multiple target domains
Li et al. Locally-enriched cross-reconstruction for few-shot fine-grained image classification
Pang et al. Analysis of computer vision applied in martial arts
Dong et al. Parsing based on parselets: A unified deformable mixture model for human parsing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191227