CN111797813B - Partial pedestrian re-identification method based on visible perception texture semantic alignment


Info

Publication number
CN111797813B
CN111797813B (application CN202010708118.5A)
Authority
CN
China
Prior art keywords
pedestrian
human body
texture
alignment
partial
Prior art date
Legal status
Active
Application number
CN202010708118.5A
Other languages
Chinese (zh)
Other versions
CN111797813A (en)
Inventor
Zan Gao (高赞)
Lishuai Gao (高立帅)
Hua Zhang (张桦)
Current Assignee
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Priority date
Filing date
Publication date
Application filed by Tianjin University of Technology filed Critical Tianjin University of Technology
Priority to CN202010708118.5A
Publication of CN111797813A
Application granted
Publication of CN111797813B


Classifications

    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/24 Classification techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06V 10/267 Segmentation of patterns in the image field, or detection of occlusion, by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/40 Extraction of image or video features

Abstract

A partial pedestrian re-identification method (TSA) based on visible perception texture semantic alignment that efficiently addresses two problems at once: pedestrian occlusion and changes in pose or observation viewpoint. The method comprises the following steps: (1) designing a local area alignment network based on human body pose, mainly to solve the occlusion problem; (2) designing a texture alignment network based on semantic visibility, mainly to solve pose or viewpoint change; (3) training the two networks jointly so that the model generalizes better and can cope with occlusion and pose or viewpoint change simultaneously. The method performs efficient partial pedestrian re-identification based on visible perception texture semantic alignment and human-body-pose local area alignment; it effectively handles occlusion and pose change in pedestrian re-identification, converges quickly, and achieves efficient re-identification under pedestrian occlusion.

Description

Partial pedestrian re-identification method based on visible perception texture semantic alignment
Technical Field
The invention belongs to the technical field of computer vision and pattern recognition, and relates to a partial pedestrian re-identification method (TSA) based on visible perception texture semantic alignment, which aligns textures and local areas simultaneously and addresses the problems of occlusion and pose change in pedestrian re-identification.
Background
In recent years, with the development of infrastructure, tens of millions of non-overlapping cameras have been installed throughout many cities to secure public safety. In some cases, when a target person disappears from one camera, it is desirable to quickly re-identify that person in the other cameras. This is the pedestrian re-identification (ReID) task, a research focus in computer vision and machine learning. Because of its importance to public security and safety, many approaches have been proposed. These methods all assume that the whole person is fully visible to every camera, but in real monitoring environments occlusion occurs frequently. Owing to the marked difference between a partial figure and a whole figure, most existing ReID methods cannot identify the target person when applied to partial pedestrian re-identification, and their performance drops sharply. Since a partial view of a person may show any part of the body, that part usually must be scaled to a fixed-size image to be matched against whole-body images; unwanted distortion arises during the scaling, degrading performance. Some studies therefore consider how to match arbitrary parts of an occluded pedestrian image, such as SWM, AMC, DSR, and DCR. These methods divide the pedestrian image into independent blocks and compute image similarity from those blocks, but unshared regions remain; the unshared blocks become noise when matching a partial body against a whole body, causing misalignment and harming matching performance.
Disclosure of Invention
The invention aims to solve the problems of inconsistent input image sizes in the classical algorithms SWM [1] and AMC [1], the interference of non-shared region features in the DSR [2] and DCR [3] algorithms, and the human pose changes present in real scenes, and provides a partial pedestrian re-identification method (TSA) based on visible perception texture semantic alignment.
Technical scheme of the invention
A partial pedestrian re-identification method (TSA) based on visible perception texture semantic alignment specifically comprises the following steps:
1, designing a local area alignment network based on human body postures;
step 1.1, proposing a scheme for aligning local regions of human body parts to address the occlusion problem;
for a complete pedestrian picture, 17 human body key points are obtained using pose estimation, determined as the eyes, ears, mouth, shoulders, elbows, hands, hips, knees and feet; the pedestrian is divided longitudinally into 5 regions: head, trunk, upper leg, lower leg and foot, recorded as V_i, i = 1, 2, 3, 4, 5; which region is occluded is then judged from the missing key points; V_i equals 0 if the region is occluded and 1 otherwise;
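As an illustration of step 1.1, the following minimal sketch derives the region-visibility flags V_i from pose-estimation keypoint confidences; the COCO-style keypoint ordering, the confidence threshold, and the helper name `region_visibility` are assumptions for illustration, not part of the patent:

```python
# Map 17 pose keypoints to the 5 longitudinal regions of step 1.1 and mark a
# region occluded (V_i = 0) when none of its keypoints is detected.
# The region-to-keypoint assignment below is an assumed COCO-style ordering.
REGIONS = {
    "head": [0, 1, 2, 3, 4],                 # nose/mouth, eyes, ears
    "trunk": [5, 6, 7, 8, 9, 10, 11, 12],    # shoulders, elbows, hands, hips
    "upper_leg": [11, 12, 13, 14],           # hips, knees
    "lower_leg": [13, 14, 15, 16],           # knees, ankles
    "foot": [15, 16],                        # ankles/feet
}

def region_visibility(keypoint_conf, thresh=0.2):
    """keypoint_conf: list of 17 detection confidences; returns [V_1..V_5]."""
    return [
        1 if any(keypoint_conf[k] > thresh for k in ks) else 0
        for ks in REGIONS.values()
    ]
```

With the upper nine keypoints detected and the rest missing, only the head and trunk regions stay visible.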
step 1.2, further solving the pose transformation problem by using a pixel-level classification scheme;
classifying each pixel point with a softmax classifier, the classes being the 5 region classes obtained in step 1.1; performing softmax classification on the corresponding region of each pedestrian picture, the number of classes being the number of pedestrians in the training set; using the V_i information obtained in step 1.1 to calculate the Euclidean distance between query and gallery, i.e. the distance between the visible blocks of a partial pedestrian picture and the corresponding blocks of a complete pedestrian picture; for the base network, ResNet is selected as the backbone of this step;
step 1.3, calculating the cross entropy loss and the Euclidean distance on the basis of the step 1.2:
a. during training, a classification cross-entropy loss is applied to each pixel point, the class labels being the 5 region indices of step 1.1; b. during training, a classification cross-entropy loss is applied to each picture, the class label being the label of the image in the training set; c. a triplet loss function is designed from the Euclidean distance between pictures computed in step 1.2;
2, designing a texture alignment network based on semantic visibility;
the human body is represented by a 3D mesh and a texture map in UV coordinates; the texture alignment scheme computes the distance between pedestrians from the texture-map features of corresponding body parts, thereby handling changes in human pose and camera viewpoint;
step 2.1, generating a pedestrian image texture map with a texture generator; the texture map in UV coordinates gives the feature map invariance to viewing angle; a human body semantic segmentation model trained with the EANet method classifies the parts of every pedestrian in the ReID data set; the model, trained on the COCO-Part14 data set, divides pedestrian pictures into 14 body-part classes: head, trunk, left upper arm, right upper arm, left lower arm, right lower arm, left hand, right hand, left upper leg, right upper leg, left lower leg, right lower leg, left foot and right foot; the human semantic segmentation information also reveals which parts are missing or occluded;
step 2.2, the human body semantic segmentation model trained in step 2.1 judges which parts of the human body are occluded in a half-body picture, recorded as V_j; V_j equals 0 if occluded and 1 otherwise;
step 2.3, according to the texture map obtained in step 2.1 and the occlusion information V_j obtained in step 2.2, calculating the texture map corresponding to each body part; then concatenating the parts, performing softmax classification on each composed part, and calculating the Euclidean distance between query and gallery; for the base network, ResNet is again selected as the backbone of step 2;
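As a sketch of the masking in step 2.3 (an illustration, not the patent's implementation; the array shapes and the helper name `masked_part_features` are assumptions), the features of occluded parts can be zeroed with the V_j flags before the parts are concatenated:

```python
import numpy as np

def masked_part_features(part_feats, visibility):
    """part_feats: (14, d) array, one feature vector per body part;
    visibility: 14 flags V_j in {0, 1}.
    Returns the concatenated (14*d,) vector with occluded parts zeroed."""
    v = np.asarray(visibility).reshape(-1, 1)   # (14, 1) broadcastable mask
    return (part_feats * v).reshape(-1)         # occluded rows become zeros
```

Zeroing rather than dropping keeps a fixed feature length, so partial and whole-body images stay directly comparable.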
3, joint learning of the two networks;
the local area alignment network based on human body pose of step 1 is designed for the occlusion problem, and the texture alignment network based on semantic visibility of step 2 is designed for pose change; each of the two networks can perform re-identification on its own, but occlusion and pose diversity often occur together in partial pedestrian re-identification; it is therefore necessary to solve both problems at the same time, so the two networks are trained by joint learning;
step 3.1, performing an element-wise addition on the feature maps produced by the two branch networks of steps 1 and 2 to obtain fused features, then combining global and local features to improve ReID performance;
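A minimal sketch of the step-3.1 fusion (the feature-map shapes and the use of global average pooling for the global feature are assumptions for illustration):

```python
import numpy as np

def fuse_branches(pra_feat, tea_feat):
    """pra_feat, tea_feat: (C, H, W) feature maps from the PRA and TEA
    branches. Returns (fused_map, global_feature)."""
    fused = pra_feat + tea_feat              # element-wise addition
    global_feat = fused.mean(axis=(1, 2))    # (C,) global descriptor
    return fused, global_feat
```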
step 3.2, performing softmax classification with the global features; performing softmax classification on the fused features obtained in step 3.1 and computing the cross-entropy loss, the number of classes being the number of pedestrians;
step 3.3, matching human body part blocks with the local features, computing the Euclidean distances between the body-part blocks of the query and gallery images using the features fused in step 3.1, and designing a triplet loss function;
4, selecting a model training data set and a model testing data set, and verifying the effectiveness of the algorithm on the testing data set;
to approximate real scenes, Market1501 is used as the training set, and half-body pictures are obtained by cropping 0-50% off the whole-body pictures; the test sets are two half-body data sets, Partial REID and Partial-iLIDS; Partial REID has 600 pictures of 60 pedestrians, each with 5 whole-body and 5 half-body pictures; Partial-iLIDS has 476 pictures of 119 pedestrians, each with 3 whole-body and 1 half-body picture. On the two test sets, the method improves Rank-1 by 5% and 6.4%, respectively, over VPM, the best prior method.
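The training-data preparation above can be sketched as follows; cropping from the bottom of the image is one plausible reading of "cutting a whole-body picture in a proportion of 0-50%", and the helper name `make_partial` is an assumption:

```python
import random

def make_partial(img_h, img_w, max_ratio=0.5):
    """Simulate a partial pedestrian from a whole-body Market1501 image:
    remove a random 0-50% of the image height from the bottom.
    Returns a (top, left, height, width) crop box."""
    ratio = random.uniform(0.0, max_ratio)            # fraction to cut away
    new_h = max(1, int(round(img_h * (1.0 - ratio))))  # remaining height
    return (0, 0, new_h, img_w)
```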
Advantages and beneficial effects of the invention:
1) texture semantic alignment (TEA) gives the features spatial invariance; 2) local area alignment based on human body pose (PRA) adaptively aligns partial pedestrian images with whole-body images, minimizing the negative effect of irrelevant or occluded regions; 3) pixel-level classification effectively handles pose change; 4) the joint learning strategy optimizes the model and improves the convergence rate.
Drawings
Fig. 1 is a flow chart of a part of a pedestrian re-identification method TSA according to the present invention.
Fig. 2 is a structural diagram of adaptive alignment of a human posture local region.
FIG. 3 is a block diagram of texture semantic alignment.
FIG. 4 shows the correspondence between a texture map in the TEA sub-network branch and a human region in the PRA sub-network, where: a. the original pedestrian picture; b. the image divided longitudinally into 5 parts by pose estimation; c. the class distribution of each pixel in the image; d. the 14 classes obtained by human semantic segmentation; e. the generated texture map divided into the 14 corresponding human body parts according to the semantic segmentation information, after which the parts are merged to align with the features of a and b. After human semantic segmentation the pedestrian image is divided into the 14 classes shown in d; the body parts are then stitched as in e so that they align semantically with b and c.
Fig. 5 is a comparison between the existing method for solving the problem of pedestrian re-identification occlusion and the method proposed in the present invention on rank-k, wherein the corresponding documents of the comparison method in fig. 5 are as follows:
[1]Wei Shi Zheng,Li Xiang,Xiang Tao,Shengcai Liao,Jianhuang Lai,and Shaogang Gong.Par-tial person re-identification.In IEEE International Con-ference on Computer Vision(CVPR),2016.
[2]Lingxiao He,Jian Liang,Haiqing Li,and Zhenan Sun.Deep spatial feature reconstruction for par-tial person re-identification:Alignment-free approach.In Computer Vision and Pattern Recognition(CVPR),2018.
[3]Zan Gao,Lishuai Gao,Hua Zhang,Zhiyong Cheng,and Richang Hong.Deep spatial pyramid features collaborative reconstruction for partial person reid.In ACM International Conference on Multimedia,2019.
[4]Xin Jin,Cuiling Lan,Wenjun Zeng,Guo-qiang Wei,and Zhibo Chen.Semantics-aligned repre-sentation learning for person re-identification.In Thirty-Fourth AAAI Conference on Artificial Intelligence,2020.
[5]Hao Luo,Xing Fan,Chi Zhang,and Wei Jiang.Stnreid:Deep convolutional networks with pair-wise spatial transformer networks for partial person re-identification.In IEEE Conference on Computer Vision and Pattern Recognition(CVPR),2019.
[6]Yifan Sun,Qin Xu,and Yali et al.Li.Perceive where to focus:Learning visibility-aware part-level features for partial person re-identification.In IEEE Conference on Computer Vision and Pattern Recognition(CVPR),2019.
FIG. 6 compares the ROC curves of existing methods for the occluded pedestrian re-identification problem with the proposed method, showing a clear advantage for TSA. A is the comparison on the Partial-iLIDS data set and B the comparison on the Partial REID data set.
FIG. 7 compares the joint learning strategy with the texture alignment network based on semantic visibility and the local area alignment network based on human body pose used separately. A is the comparison on the Partial REID data set and B the comparison on the Partial-iLIDS data set; the jointly learned model generalizes better.
FIG. 8: graphs A and B show the effect of visibility information on part-feature alignment on the Partial REID and Partial-iLIDS data sets, respectively, demonstrating its positive effect on the robustness of the algorithm; graphs C and D show its effect on texture-feature alignment on the same two data sets.
Fig. 9 shows the rapid convergence of TSA during training.
Detailed Description
The network design proceeds in 3 specific steps: first, a local area alignment network based on human body pose, aimed mainly at the occlusion problem; then, a texture alignment network based on the visibility of human semantic information, aimed mainly at pose and camera-angle changes; finally, a joint learning network that unifies the two directions. The invention is further described below with reference to the accompanying drawings.
Example 1
Fig. 1 shows the workflow of the partial pedestrian re-identification method (TSA) based on visible perception texture semantic alignment; it comprises 3 parts: 1. a local area alignment network based on human body pose (PRA); 2. a texture alignment network based on the visibility of human semantic information (TEA); 3. the joint learning strategy. The operation steps are as follows:
step 1, designing a local area alignment network based on human body posture
As shown in the lower branch of Fig. 2, the pedestrian is divided into 5 regions using the 17 key points obtained by pose estimation (KD); which region is occluded is then judged from the missing key points, recorded as V_i (0 if occluded, 1 otherwise). The ID classification loss over the visible regions is therefore

L_{id}^{pra} = \sum_{i=1}^{5} V_i \, L_{id}^{i}

where L_{id}^{i} denotes the IDE classification loss of the feature of region i, so that only the regions visible for each pedestrian contribute.
The pose transformation problem is then further addressed with a pixel-level classification scheme. Each pixel point passes through the lower branch and is softmax-classified into the 5 regions; the loss of this step is

L_{pixel} = -\sum_{j} \sum_{i=1}^{5} \Gamma \, \log P(R_i \mid h_j)

where h_j denotes the j-th pixel value, R_i the index of the i-th divided region, and P(R_i | h_j) the probability that h_j belongs to R_i, obtained by softmax after the feature map passes through a 1x1 convolution layer W; Γ = 1 when pixel h_j belongs to region R_i, otherwise Γ = 0.
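The pixel-level loss can be sketched as below; this is an illustration, not the patent's code, and the dense array shapes and the name `pixel_region_loss` are assumptions (the 1x1 convolution is modeled as a per-pixel matrix multiply, which is equivalent):

```python
import numpy as np

def pixel_region_loss(feat, region_labels, W):
    """feat: (H, W, C) pixel features; region_labels: (H, W) ints in 0..4
    giving the region each pixel falls in; W: (C, 5) weights of the 1x1-conv
    classifier. Returns the mean cross-entropy over all pixels."""
    logits = feat @ W                                    # (H, W, 5)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    h, w = region_labels.shape
    # log-probability of the true region at every pixel
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], region_labels]
    return -picked.mean()
```

With untrained (zero) weights the prediction is uniform over the 5 regions, so the loss equals log 5.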
According to the V_i signal produced by the PRA branch, the Euclidean distance between query and gallery is calculated as

D_{pra} = \sum_{i=1}^{p} V_i \, \lVert f_{q}^{R_i} - f_{g}^{R_i} \rVert_2

where f_q^{R_i} denotes the feature of region R_i in the query image, f_g^{R_i} the feature of region R_i in the gallery image, and p the number of longitudinal blocks of the pedestrian image; in this work p = 5.
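A sketch of a visibility-aware part distance in this spirit; note the mutual-visibility mask, the normalisation by the number of shared regions, and the helper name `visible_part_distance` are assumptions added for illustration, not stated in the patent:

```python
import numpy as np

def visible_part_distance(fq, fg, vq, vg):
    """fq, fg: (p, d) region features for query and gallery; vq, vg: p
    visibility flags V_i. Only regions visible in BOTH images contribute;
    returns the mean Euclidean distance over those shared regions."""
    vq, vg = np.asarray(vq), np.asarray(vg)
    shared = (vq * vg).astype(bool)         # regions visible in both images
    if not shared.any():
        return np.inf                       # no comparable region at all
    diffs = fq[shared] - fg[shared]
    return np.linalg.norm(diffs, axis=1).mean()
```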
Analysis of the experimental results of step 1 (the TSA-PRA entry in Fig. 5) shows that the PRA network performs well on the occlusion problem.
Step 2, designing texture alignment network based on semantic visibility
As shown in the texture semantic alignment flow of Fig. 3, a pedestrian image texture map is generated in the lower branch with a texture generator; the upper branch then uses human semantic information to determine whether each part is occluded; multiplying the features obtained by the two branches yields the texture map corresponding to each body part. To semantically align the image features generated by the lower branch (TEA) and the upper branch (PRA) of Fig. 1, i.e. the correspondence explained for Fig. 4 in the description of the drawings, the texture map is fused using the strategy of Fig. 4.
The upper-branch pedestrian part semantic segmentation model (PPS) judges which parts of the human body are occluded, recorded as V_j (0 if occluded, 1 otherwise). The ID classification loss over the visible parts is therefore

L_{id}^{tea} = \sum_{j} V_j \, L_{id}^{j}

where L_{id}^{j} denotes the IDE classification loss of the feature of part j, so that only the parts visible for each pedestrian contribute.
With the V_j signal obtained from the PPS, the features of invisible parts are set to the zero matrix, so the Euclidean distance between query and gallery is

D_{tea} = \sum_{j=1}^{p} V_j \, \lVert f_{q}^{R_j} - f_{g}^{R_j} \rVert_2

where f_q^{R_j} denotes the feature of block R_j in the query image, f_g^{R_j} the feature of block R_j in the gallery image, and p the number of blocks into which the pedestrian texture maps are merged following the strategy of Fig. 4; in this work p = 5.
Analysis of the experimental results of step 2 (the TSA-TEA entry in Fig. 5) shows that the TEA network also performs well on the occlusion problem.
Step 3, the two networks are subjected to joint learning
The texture alignment network based on semantic visibility (TEA) is designed for pose change, and the local area alignment network based on human body pose (PRA) for the occlusion problem, but occlusion and pose diversity often occur together in partial pedestrian re-identification; both problems must therefore be solved at once.
First, the ID classification losses of step 1 and step 2 are summed:

L_{id} = L_{id}^{pra} + L_{id}^{tea}

Then the hard-sample triplet loss is constructed. For each batch, P identities are chosen at random and Q pictures are picked for each, giving P x Q pictures per batch; for each anchor in the batch we take the hardest positive sample (the farthest of all positives) and the hardest negative sample (the closest of all negatives) in the batch. The distance between the anchor and a positive or negative picture is the sum of the Euclidean distances of step 1 and step 2:

d_{a,p} = D_{pra}(a, p) + D_{tea}(a, p)

d_{a,n} = D_{pra}(a, n) + D_{tea}(a, n)

The hard-sample triplet loss thus constructed is

L_{tri} = \max\bigl(0, \; m + \max_{p} d_{a,p} - \min_{n} d_{a,n}\bigr)

where m denotes the margin.
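The batch-hard mining described above can be sketched as follows; this is an illustrative implementation of standard batch-hard triplet mining, not the patent's code, and the margin value and function name are assumptions:

```python
import numpy as np

def batch_hard_triplet(dist, labels, margin=0.3):
    """dist: (N, N) pairwise distances within a P x Q batch (e.g. the sum of
    the PRA and TEA distances); labels: (N,) identity labels.
    For each anchor, take the farthest positive and the nearest negative,
    then average the hinge losses over the batch."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    pos = np.where(same, dist, -np.inf)
    np.fill_diagonal(pos, -np.inf)              # an anchor is not its own positive
    hardest_pos = pos.max(axis=1)               # farthest positive per anchor
    neg = np.where(~same, dist, np.inf)
    hardest_neg = neg.min(axis=1)               # nearest negative per anchor
    return np.maximum(0.0, margin + hardest_pos - hardest_neg).mean()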
In the analysis of the experimental results of fig. 5 and fig. 7, it is found that the performance of the model involved by the combined learning strategy is further obviously improved.
Step 4 model training and testing
The model built in step 3 is trained with Market1501 as the training set; the loss function of the whole training process is the sum of the losses above:

L = L_{id} + L_{pixel} + L_{tri}
Testing proceeds in the following steps:
1. compared with the current most advanced method;
2. evaluating the superiority of the joint learning;
3. and analyzing the advantages of the visible perception method.
The experimental tests were performed on two public half-body data sets, Partial REID and Partial-iLIDS. Following common practice, the model is evaluated with the average Cumulative Matching Characteristics (CMC) curve at Rank-k and the Receiver Operating Characteristic (ROC) curve. As shown in Fig. 5 and Fig. 6, our method achieves higher accuracy. In Fig. 7, TSA-PRA denotes training with the PRA branch only, TSA-TEA training with the TEA branch only, and TSA the jointly learned model; the comparison of the results shows the effectiveness of joint learning. The comparison in Fig. 8 shows that the visibility-aware scheme matters greatly for the generalization ability of the model. Fig. 9 shows that the TSA method converges well during training.
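The Rank-k metric used above can be sketched as a simple CMC computation; the function name and the single-gallery-shot simplification are assumptions for illustration:

```python
import numpy as np

def cmc_rank_k(dist, q_ids, g_ids, k=1):
    """Rank-k matching rate from a (num_query, num_gallery) distance matrix:
    a query counts as a hit if any of its k nearest gallery images shares
    its identity."""
    order = np.argsort(dist, axis=1)       # gallery indices sorted per query
    hits = 0
    for i, row in enumerate(order):
        hits += q_ids[i] in [g_ids[j] for j in row[:k]]
    return hits / len(q_ids)
```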
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (1)

1. A partial pedestrian re-identification method based on semantic alignment of visible perception textures specifically comprises the following steps:
1, designing a local area alignment network based on human body postures;
step 1.1, proposing a scheme for aligning local regions of human body parts to address the occlusion problem;
for a complete pedestrian picture, 17 human body key points are obtained using pose estimation, determined as the eyes, ears, mouth, shoulders, elbows, hands, hips, knees and feet; the pedestrian is divided longitudinally into 5 regions: head, trunk, upper leg, lower leg and foot, recorded as V_i, i = 1, 2, 3, 4, 5; which region is occluded is then judged from the missing key points; V_i equals 0 if the region is occluded and 1 otherwise;
step 1.2, further solving the pose transformation problem by using a pixel-level classification scheme;
classifying each pixel point with a softmax classifier, the classes being the 5 region classes obtained in step 1.1; performing softmax classification on the corresponding region of each pedestrian picture, the number of classes being the number of pedestrians in the training set; using the V_i information obtained in step 1.1 to calculate the Euclidean distance between query and gallery, i.e. the distance between the visible blocks of a partial pedestrian picture and the corresponding blocks of a complete pedestrian picture; for the base network, ResNet is selected as the backbone of this step;
step 1.3, calculating the cross entropy loss and the Euclidean distance on the basis of the step 1.2:
a. during training, a classification cross-entropy loss is applied to each pixel point, the class labels being the 5 region indices of step 1.1; b. during training, a classification cross-entropy loss is applied to each picture, the class label being the label of the image in the training set; c. a triplet loss function is designed from the Euclidean distance between pictures computed in step 1.2;
2, designing a texture alignment network based on semantic visibility;
the human body is represented by a 3D mesh and a texture map in UV coordinates; the texture alignment scheme computes the distance between pedestrians from the texture-map features of corresponding body parts, thereby handling changes in human pose and camera viewpoint;
step 2.1, generating a pedestrian image texture map with a texture generator; the texture map in UV coordinates gives the feature map invariance to viewing angle; a human body semantic segmentation model trained with the EANet method classifies the parts of every pedestrian in the ReID data set; the model, trained on the COCO-Part14 data set, divides pedestrian pictures into 14 body-part classes: head, trunk, left upper arm, right upper arm, left lower arm, right lower arm, left hand, right hand, left upper leg, right upper leg, left lower leg, right lower leg, left foot and right foot; the human semantic segmentation information also reveals which parts are missing or occluded;
step 2.2, the human body semantic segmentation model trained in step 2.1 judges which parts of the human body are occluded in a half-body picture, recorded as V_j; V_j equals 0 if occluded and 1 otherwise;
Step 2.3, using the texture maps obtained in step 2.1 and the per-part visibility flags obtained in step 2.2, the texture-map features corresponding to each body part are computed; the part features are then concatenated, each synthesized part is classified with softmax, and the Euclidean distance between query and gallery is computed; for the basic network of step 2, ResNet is chosen as the backbone;
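The visibility-aware matching of steps 2.2–2.3 can be sketched as follows: each image carries a visibility flag per body part (0 = occluded, 1 = visible), and the query-gallery distance is averaged only over the 14 parts visible in both images. The function name and the averaging rule are illustrative assumptions, not the patent's exact formulation:

```python
import numpy as np

def visible_part_distance(q_feats, q_vis, g_feats, g_vis):
    """Euclidean distance between a query and a gallery image, computed
    only over body parts visible in BOTH images.
    q_feats, g_feats: (14, D) per-part texture-map features
    q_vis, g_vis:     (14,)  visibility flags, 1 = visible, 0 = occluded"""
    shared = (q_vis * g_vis).astype(bool)     # parts visible in both images
    if not shared.any():                      # no common visible part
        return np.inf
    d = np.linalg.norm(q_feats[shared] - g_feats[shared], axis=1)
    return d.mean()
```

Averaging over shared parts keeps the distance comparable between pairs with different numbers of visible parts.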
Step 3, joint learning of the two networks;
The human-pose-based local region alignment network of step 1 is designed specifically to handle occlusion, while the semantic-visibility-based texture alignment network of step 2 is designed specifically to handle pose changes. Each network can perform re-identification on its own, but in partial pedestrian re-identification tasks occlusion and pose diversity usually occur together; both problems therefore need to be solved at the same time, so the two networks are trained together by joint learning;
Step 3.1, the feature maps produced by the two branch networks of steps 1 and 2 are fused by an element-wise add operation; the global features are then combined with the local features to improve ReID performance;
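The fusion step can be sketched as follows; the element-wise add matches the description above, while the global pooling and the horizontal-stripe local features are illustrative stand-ins for the patent's part regions:

```python
import numpy as np

def fuse_branches(pose_feat, texture_feat):
    """Step 3.1: element-wise add of the two branch feature maps.
    Both are (C, H, W); matching shapes are assumed here."""
    assert pose_feat.shape == texture_feat.shape
    return pose_feat + texture_feat

def global_and_local(fused, n_parts=14):
    """Global feature: average over all spatial positions.
    Local features: average within n_parts horizontal stripes
    (a simple stand-in for the 14 body-part regions)."""
    c, h, w = fused.shape
    global_feat = fused.mean(axis=(1, 2))                       # (C,)
    stripes = np.array_split(np.arange(h), n_parts)
    local_feats = np.stack([fused[:, rows, :].mean(axis=(1, 2))
                            for rows in stripes])               # (n_parts, C)
    return global_feat, local_feats
```

The global feature feeds the softmax classifier of step 3.2, and the per-part local features feed the part matching of step 3.3.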
Step 3.2, softmax classification with the global features; softmax classification is performed on the fused features obtained in step 3.1 and a cross-entropy loss is computed, where the number of classes is the number of pedestrian identities;
Step 3.3, the local features are used to match body-part blocks; the Euclidean distances between corresponding body-part blocks of the query and gallery images are computed on the fused features from step 3.1, and a triplet loss function is designed;
Step 4, selecting training and test datasets, and verifying the effectiveness of the algorithm on the test sets;
To approximate real scenes, Market-1501 is used as the training set, and half-body pictures are obtained by cropping 0-50% off the whole-body pictures. Two partial-person datasets, Partial-REID and Partial-iLIDS, serve as test sets: Partial-REID contains 600 pictures of 60 pedestrians, each with 5 whole-body and 5 half-body pictures; Partial-iLIDS contains 476 pictures of 119 pedestrians, each with 3 whole-body pictures and 1 half-body picture. On the two test sets, the method improves Rank-1 accuracy by 5% and 6.4%, respectively, over VPM, the best prior method.
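The half-body synthesis described above can be sketched as a simple row crop; the patent does not say which side of the picture is removed, so dropping rows from the bottom is an assumption made here:

```python
import numpy as np

def make_half_body(img, rng):
    """Synthesize a partial (half-body) training picture by removing
    0-50% of the rows from a whole-body picture (H, W, 3).
    Cropping from the bottom is an assumption, not stated in the patent."""
    h = img.shape[0]
    ratio = rng.uniform(0.0, 0.5)        # fraction of the height to remove
    keep = h - int(round(h * ratio))
    return img[:keep]
```

Applying this to every Market-1501 training image yields partial pictures whose occlusion ratio varies continuously between 0% and 50%.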
CN202010708118.5A 2020-07-21 2020-07-21 Partial pedestrian re-identification method based on visible perception texture semantic alignment Active CN111797813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010708118.5A CN111797813B (en) 2020-07-21 2020-07-21 Partial pedestrian re-identification method based on visible perception texture semantic alignment

Publications (2)

Publication Number Publication Date
CN111797813A CN111797813A (en) 2020-10-20
CN111797813B true CN111797813B (en) 2022-08-02

Family

ID=72827271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010708118.5A Active CN111797813B (en) 2020-07-21 2020-07-21 Partial pedestrian re-identification method based on visible perception texture semantic alignment

Country Status (1)

Country Link
CN (1) CN111797813B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580525B (en) * 2020-12-22 2023-05-23 南京信息工程大学 Case activity track monitoring method based on pedestrian re-identification
CN113139504B (en) * 2021-05-11 2023-02-17 支付宝(杭州)信息技术有限公司 Identity recognition method, device, equipment and storage medium
CN113743239A (en) * 2021-08-12 2021-12-03 青岛图灵科技有限公司 Pedestrian re-identification method and device and electronic equipment
CN114842512B (en) * 2022-07-01 2022-10-14 山东省人工智能研究院 Shielded pedestrian re-identification and retrieval method based on multi-feature cooperation and semantic perception

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224937A (en) * 2015-11-13 2016-01-06 Wuhan University Fine-grained semantic-color person re-identification method based on human part position constraints
CN107832672A (en) * 2017-10-12 2018-03-23 Beihang University Person re-identification method using pose information to design multiple loss functions
CN108960127A (en) * 2018-06-29 2018-12-07 Xiamen University Occluded person re-identification method based on adaptive deep metric learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Pedestrian Alignment Network for ...; Zhedong Zheng et al.; arXiv; 2017-07-03; pp. 1-13 *
Person re-identification based on multi-scale convolutional feature fusion; Xu Longzhuang et al.; Laser & Optoelectronics Progress; July 2019; vol. 56, no. 14; pp. 141504-1-7 *
Person re-identification method based on feature fusion; Zhang Gengning et al.; Computer Engineering and Applications; 2016-05-10; vol. 53, no. 12; pp. 185-189 *

Also Published As

Publication number Publication date
CN111797813A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN111797813B (en) Partial pedestrian re-identification method based on visible perception texture semantic alignment
CN111339903B (en) Multi-person human body posture estimation method
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN112597941B (en) Face recognition method and device and electronic equipment
Zhang et al. Semantic-aware dehazing network with adaptive feature fusion
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
CN111160291A (en) Human eye detection method based on depth information and CNN
CN114119739A (en) Binocular vision-based hand key point space coordinate acquisition method
CN111597978B (en) Method for automatically generating pedestrian re-identification picture based on StarGAN network model
CN116092190A (en) Human body posture estimation method based on self-attention high-resolution network
Zhang et al. Removing Foreground Occlusions in Light Field using Micro-lens Dynamic Filter.
Gong et al. Dark-channel based attention and classifier retraining for smoke detection in foggy environments
CN114120389A (en) Network training and video frame processing method, device, equipment and storage medium
Zhu et al. Occlusion-free scene recovery via neural radiance fields
Zhu et al. Spectral Dual-Channel Encoding for Image Dehazing
Chen et al. Face recognition with masks based on spatial fine-grained frequency domain broadening
Vidyamol et al. An improved dark channel prior for fast dehazing of outdoor images
CN111709997B (en) SLAM implementation method and system based on point and plane characteristics
Fu et al. CBAM-SLAM: A Semantic SLAM Based on Attention Module in Dynamic Environment
CN113609993A (en) Attitude estimation method, device and equipment and computer readable storage medium
CN113269089A (en) Real-time gesture recognition method and system based on deep learning
Zhang et al. Light field occlusion removal network via foreground location and background recovery
Chen et al. Labelled silhouettes for human pose estimation
Wei et al. On active camera control and camera motion recovery with foveate wavelet transform
Lei et al. Human Pose Estimation of Diver Based on Improved Stacked Hourglass Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant