CN112418134B - Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method - Google Patents

Publication number
CN112418134B
Authority
CN
China
Prior art keywords
pedestrian
model
label
picture
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011387800.5A
Other languages
Chinese (zh)
Other versions
CN112418134A (en)
Inventor
王其聪
王旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Shenzhen Research Institute of Xiamen University
Original Assignee
Xiamen University
Shenzhen Research Institute of Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University and Shenzhen Research Institute of Xiamen University
Priority to CN202011387800.5A
Publication of CN112418134A
Application granted
Publication of CN112418134B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/245 Classification techniques relating to the decision surface
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis, relating to computer vision technology. A pedestrian re-identification data set is prepared, and each pedestrian picture is analyzed by a pedestrian analysis model to obtain masks of the pedestrian's upper body, lower body and whole body; a multi-branch attention network model is designed according to these masks. Using the mask, the hue of the pedestrian body region is adjusted to change the color of the pedestrian's clothing in the picture, and the recolored pedestrian is treated as a new pedestrian category to expand the data set. The expanded data set is organized as a double-label data set in which each pedestrian has two labels with different confidence levels, and a multi-label classification loss function is proposed. The trained network model is then used to extract feature representations of the test-set images, followed by similarity comparison and ranking. The method achieves good recognition performance on several public data sets and effectively mitigates the interference of cluttered backgrounds, occlusion and similar problems in pedestrian re-identification.

Description

Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method
Technical Field
The invention relates to computer vision technology, and in particular to a multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis.
Background
Pedestrian re-identification is a current research hotspot in computer vision. As intelligent security receives increasing attention, pedestrian re-identification has become one of its important research directions, which has driven rapid development of the field. However, many problems remain, such as the small number of data sets, cluttered backgrounds, and occlusion. At present, most research addresses pedestrian re-identification with deep neural networks: in the training stage, re-identification is treated as a classification task, and in the testing stage, features are extracted for similarity comparison.
Pedestrian re-identification data sets are captured by multiple cameras with non-overlapping fields of view, so the pictures contain complex background interference. The task is to judge whether pedestrians in pictures taken by different cameras are the same person; complex backgrounds strongly affect this task and introduce interfering information when features are extracted by a deep neural network. In addition, owing to camera conditions and the quality of the subsequent detection method, the proportion of the bounding box occupied by the pedestrian varies markedly: the detected bounding box is often considerably larger than the pedestrian, so the pedestrian occupies only a small part of the picture, which hinders model learning. Most current methods do not consider this effect. Because the pictures are captured at random moments by the cameras, occlusion also has a pronounced effect on pedestrian re-identification. Zhong et al. (Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, "Random erasing data augmentation," arXiv preprint arXiv:1708.04896, 2017) proposed a data augmentation method that erases a region of the picture with a certain probability. This method partially addresses occlusion, but it considers neither model design nor the spatial structure of the pedestrian in the picture, so its handling of occlusion is limited.
Disclosure of Invention
The invention aims to provide a multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis, addressing the technical problems of existing pedestrian re-identification models.
The invention comprises the following steps:
1) Preparing a pedestrian re-identification data set;
2) Designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative pedestrian features through the network model;
3) Expanding the prepared data set by means of a pedestrian analysis model to obtain a new expanded data set;
4) Designing a multi-label classification loss function;
5) Designing a multi-stream multi-task loss optimization function;
6) Pre-training the backbone network ResNet50 on large-scale image data using the back propagation algorithm to obtain a pre-trained ResNet50 model;
7) On the basis of the pre-trained ResNet50 model, using the expanded pedestrian re-identification data set, calculating the model loss with the designed multi-stream multi-task loss optimization function, and training the whole constructed model end to end with the back propagation algorithm to obtain the final trained model;
8) Performing pedestrian re-identification with the trained model, taking the output features of the final network model as the feature representation of the pedestrian image and using them for subsequent similarity measurement and ranking.
In step 1), the pedestrian re-identification data set may be prepared as follows: let the training set of pedestrian images be $\{(x_i, y_i),\ i = 1, \ldots, n\}$, where $n$ is the number of training samples and is a natural number; $x_i$ is the pedestrian image of the $i$-th training sample; $y_i$ ($1 \le y_i \le N$) is the pedestrian category label of the $i$-th training sample; and $N$ is the number of pedestrian categories in the training set and is a natural number.
In step 2), the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis is designed as follows:
(1) Masks of the whole body, upper body and lower body of the pedestrian in the picture are obtained with a pedestrian analysis model; in each mask, pixels in the specified body region are 1 and all other pixels are 0;
(2) The final fully connected layer used for classification is removed from the original network, the global average pooling before that layer is changed to global max pooling, and the part of the network after the first convolution and pooling is copied into four independent branches, with no parameter sharing among the branches;
(3) The first branch is left unchanged; the second, third and fourth branches apply attention to the whole body, upper body and lower body of the pedestrian respectively, and a channel attention mechanism is applied to the features of the corresponding body part in each branch, finally giving the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis.
In step 3), the specific steps of expanding the already prepared data set are as follows:
(1) A mask of the pedestrian's whole body in the picture is obtained with a pedestrian analysis model; pixels in the specified body region are 1 and all other pixels are 0;
(2) The obtained mask is used to separate the pedestrian foreground region from the background region, and the hue of the foreground region is then changed to change the pedestrian's clothing color, thereby expanding the data set.
In step 4), the specific steps of designing the multi-label classification loss function are as follows:
(1) For a pedestrian of category $y_{label}$ in the original data set, the recolored picture generated from it is assigned a newly generated pedestrian category. The original picture then has two category labels: $y_{label}$ is its first category and the generated category is its second category. The generated picture likewise has two category labels: the generated category is its first category and $y_{label}$ is its second category. This constructs a double-label pedestrian re-identification data set;
(2) Different confidence levels are set for different categories, and smoothing is added, so that a designed multi-label classification loss function is obtained as follows:
where $P(y_{label})$ is the probability predicted by the model for the first category label, and the companion term is the probability predicted by the model for the second category label.
In step 5), the specific steps of designing the multi-stream multi-task loss optimization function are as follows:
For each branch, a metric task and a classification task are computed: the metric task uses a triplet loss function and the classification task uses the multi-label classification loss function. The final multi-stream multi-task loss optimization function is:
$L = L_{\text{multi-labels}} + L_{\text{triplet}} \quad (2)$
where $L_{\text{multi-labels}}$ is the average classification loss over the branches of the model and $L_{\text{triplet}}$ is the average metric loss over the branches of the model.
Compared with the prior art, the invention has the following advantages:
The invention designs a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis. The multi-branch structure applies attention to the pedestrian in the picture to suppress interference from the background and occluded regions, and attention is also applied across the channels of same-layer features in the different branches, so that the model weights the channels carrying different pedestrian information and learns more discriminative features. A more efficient data enhancement method is then designed, which expands the data set by changing the clothing color of pedestrians in the pictures. Because each newly generated picture is derived from an original picture, the two are strongly similar in pedestrian texture, contour and background; a double-label data set is therefore built from the expanded data, and a multi-label classification loss function is proposed. Combined with the metric task, a multi-stream multi-task loss function is designed to better optimize the model, so that more discriminative feature representations are extracted for the subsequent similarity measurement and ranking that yield the final re-identification result. The invention effectively mitigates the interference of cluttered backgrounds, occlusion and similar problems in pedestrian re-identification.
Drawings
Fig. 1 is a framework diagram of an embodiment of the present invention.
Detailed Description
To make the above objects, features and advantages of the present invention easier to understand, the method is described in detail below with reference to the accompanying drawing and an embodiment. The embodiment is carried out on the premise of the technical scheme of the present invention, but the present invention is not limited to the following embodiment.
Referring to fig. 1, the implementation of the embodiment of the present invention includes the following steps:
1. Prepare a pedestrian re-identification data set. Let the training set of pedestrian images be $\{(x_i, y_i),\ i = 1, \ldots, n\}$, where $n$ is the number of training samples and is a natural number; $x_i$ is the pedestrian image of the $i$-th training sample; $y_i$ ($1 \le y_i \le N$) is the pedestrian category label of the $i$-th training sample; and $N$ is the number of pedestrian categories in the training set and is a natural number.
2. Design a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extract more discriminative pedestrian features through the network model.
B1. The pedestrian analysis model JPPNet is used to obtain masks marking the positions of the pedestrian's upper body, lower body and whole body in the picture, denoted $mask_{upper}$, $mask_{lower}$ and $mask_{whole}$ respectively. In $mask_{upper}$, pixels in the upper-body region are set to 1 and all other positions to 0; in $mask_{lower}$, pixels in the lower-body region are set to 1 and all other positions to 0; in $mask_{whole}$, pixels in the whole-body region are set to 1 and all other positions to 0.
B2. The final fully connected layer of the ResNet50 network, used for classification, is removed; the global average pooling before that layer is changed to global max pooling; and the part of the network after the first convolution and pooling is duplicated into four independent branches, with no parameter sharing among the branches.
B3. The first branch has the same structure as the modified ResNet50. After global max pooling the feature is 2048-dimensional; a fully connected layer with 512 neurons then reduces it to the 512-dimensional feature $Z_G$.
B4. In the second branch, JPPNet analyzes the pedestrian picture to obtain the whole-body mask $mask_{whole}$, which is then used to weight the feature $X$ obtained after the first convolution and pooling, giving the feature $X_{whole}$ with attention weighting on the pedestrian's whole-body region:
$X_{whole} = (1 + \alpha_1 \cdot mask_{whole}) \cdot X \quad (1)$
where $\alpha_1$ is the weight parameter of the attention mechanism applied to the pedestrian's whole body in the picture. This branch applies attention to the whole body of the pedestrian; the subsequent processing is the same as in the first branch, and the resulting 2048-dimensional feature is reduced to the 512-dimensional feature $Z_{whole}$.
B5. In the third and fourth branches, JPPNet is likewise used to analyze the pedestrian in the picture to obtain the upper-body mask $mask_{upper}$ and the lower-body mask $mask_{lower}$, which are used to weight the feature $X$, giving the features $X_{upper}$ and $X_{lower}$ with attention weighting on the pedestrian's upper-body and lower-body regions:
$X_{upper} = (1 + \alpha_2 \cdot mask_{upper}) \cdot X \quad (2)$
$X_{lower} = (1 + \alpha_3 \cdot mask_{lower}) \cdot X \quad (3)$
where $\alpha_2$ and $\alpha_3$ are the weight parameters of the attention mechanism applied to the pedestrian's upper body and lower body respectively. These two branches apply attention to the upper body and the lower body of the pedestrian; the subsequent processing is the same as in the first branch, finally giving the 512-dimensional reduced features $Z_{upper}$ and $Z_{lower}$.
B6. In the training stage, pedestrian re-identification is treated as a classification task, so each branch ends with its own fully connected layer used to classify that branch; the number of neurons in this layer equals the number of pedestrian categories in the training set. In the test stage, the reduced features of the four branches are concatenated to obtain the feature $Z$ used for similarity measurement.
Since each of the four branches contributes a 512-dimensional feature, the concatenated feature $Z$ is 2048-dimensional.
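The mask weighting of equations (1) to (3) and the test-time concatenation can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the helper names `mask_attention`, `global_max_pool` and `concat_branches` are hypothetical, the mask is assumed to be already resized to the feature-map resolution, and the constant 512-dimensional vectors merely stand in for the reduced branch features.

```python
def mask_attention(feature, mask, alpha):
    """Apply X_part = (1 + alpha * mask) * X to a C x H x W feature map.

    feature: nested list [C][H][W]; mask: nested list [H][W] of 0/1 values,
    assumed to be already resized to the feature-map resolution.
    """
    return [
        [
            [(1.0 + alpha * mask[r][c]) * channel[r][c]
             for c in range(len(channel[r]))]
            for r in range(len(channel))
        ]
        for channel in feature
    ]


def global_max_pool(feature):
    """Global max pooling: keep one maximum per channel."""
    return [max(max(row) for row in channel) for channel in feature]


def concat_branches(*branch_features):
    """Concatenate the reduced features of all branches into one vector."""
    fused = []
    for branch in branch_features:
        fused.extend(branch)
    return fused


# Toy example: one channel, 2 x 2 feature map, mask covering the left column.
X = [[[1.0, 2.0], [3.0, 4.0]]]
mask_upper = [[1, 0], [1, 0]]
X_upper = mask_attention(X, mask_upper, alpha=0.5)
pooled = global_max_pool(X_upper)

# Four placeholder 512-dimensional branch features give a 2048-dimensional Z.
Z = concat_branches([0.1] * 512, [0.2] * 512, [0.3] * 512, [0.4] * 512)
```

Masked positions are amplified by a factor of 1 + alpha while unmasked positions pass through unchanged, so the background is attenuated only in relative terms and is never zeroed out.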
3. Expand the prepared data set by means of the pedestrian analysis model to obtain a new expanded data set.
C1. JPPNet is used to analyze the pedestrian in the picture to obtain the whole-body mask $mask_{whole}$, and the pedestrian label, denoted label, is recorded. The label is a non-negative integer; for example, in the Market-1501 data set the label ranges from 0 to 1501;
C2. Using $mask_{whole}$, the pedestrian foreground region is separated from the background region, giving a picture P containing only the foreground pixel values and a picture B containing only the background pixel values;
C3. The pedestrian analysis model is not error-free. If the analyzed region is too small, that is, the ratio of the parsed pedestrian area to the total picture area is less than 0.3, the analysis of the picture is considered to have failed and the picture is not processed; otherwise, the following operations continue.
C4. The pedestrian foreground picture P is converted from RGB format to HSV format. The hue angle in HSV ranges from 0 to 360, so, to ensure that pedestrians with the same label receive the same hue after conversion, the new hue is computed as H = label % 360. The hue of picture P is set to this value to change the pedestrian's clothing color, and the recolored foreground picture P is converted back to RGB format.
C5. The foreground picture P is merged with the background picture B to obtain a new pedestrian picture, which is saved.
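Steps C2 to C5 can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the patented code: the function name `recolor_pedestrian` and the nested-list image format are hypothetical, the whole-body mask is assumed to come from a parsing model such as JPPNet, and the 0.3 threshold is taken to be relative to the whole picture area.

```python
import colorsys


def recolor_pedestrian(image, mask, label, min_ratio=0.3):
    """Return a recolored copy of image, or None if parsing is judged to have failed.

    image: H x W list of (r, g, b) tuples with values in 0..255.
    mask:  H x W list of 0/1 values, 1 on the pedestrian's whole body.
    label: non-negative integer pedestrian label.
    """
    height, width = len(image), len(image[0])
    foreground = sum(sum(row) for row in mask)
    # C3: skip pictures whose parsed pedestrian area is too small.
    if foreground / (height * width) < min_ratio:
        return None

    # C4: the same label always maps to the same hue angle (0..359 degrees).
    hue = (label % 360) / 360.0  # colorsys expects hue in [0, 1)

    result = []
    for r in range(height):
        row = []
        for c in range(width):
            red, green, blue = image[r][c]
            if mask[r][c]:
                # Foreground pixel: keep saturation and value, replace the hue.
                _, s, v = colorsys.rgb_to_hsv(red / 255.0, green / 255.0, blue / 255.0)
                nr, ng, nb = colorsys.hsv_to_rgb(hue, s, v)
                row.append((round(nr * 255), round(ng * 255), round(nb * 255)))
            else:
                # C5: background pixels are pasted back unchanged.
                row.append((red, green, blue))
        result.append(row)
    return result


# Toy picture: one red foreground pixel and one background pixel.
picture = [[(255, 0, 0), (10, 20, 30)]]
body_mask = [[1, 0]]
recolored = recolor_pedestrian(picture, body_mask, label=120)
rejected = recolor_pedestrian(picture, [[0, 0]], label=120)
```

Because the hue depends only on the label, every picture of the same pedestrian is shifted to the same clothing hue, which keeps the generated identity internally consistent.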
4. Design a multi-label classification loss function.
D1. For a pedestrian of category $y_{label}$ in the original data set, the recolored picture generated from it is assigned a newly generated pedestrian category. The original picture then has two category labels: $y_{label}$ is its first category and the generated category is its second category. The generated picture also has two category labels: the generated category is its first category and $y_{label}$ is its second category. This forms a double-label pedestrian re-identification data set.
D2. The multi-label classification loss function $L_{\text{multi-labels}}$ is designed:
where $\alpha$ and $\beta$ are hyperparameters, $K$ is the number of categories, $P(y_{label})$ is the probability predicted by the model for the first category label, and the companion term is the probability predicted by the model for the second category label. In the experiments, $\alpha$ and $\beta$ are both set to 0.1.
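One possible form of a double-label loss with smoothing is sketched below. It is an assumption for illustration only and should not be read as the formula of the invention: it builds a soft target that spreads $\beta$ uniformly over all $K$ categories, puts $\alpha$ on the second label and the remaining mass on the first label, then takes the cross entropy against the softmax output. The helper `generated_class` and its numbering scheme (generated categories follow the original ones) are also hypothetical.

```python
import math


def generated_class(label, num_original_classes):
    # Hypothetical numbering: generated classes follow the original ones.
    return label + num_original_classes


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def double_label_target(first, second, num_classes, alpha=0.1, beta=0.1):
    """Assumed soft target: beta uniform, alpha on the second label, rest on the first."""
    target = [beta / num_classes] * num_classes
    target[first] += 1.0 - alpha - beta
    target[second] += alpha
    return target


def multi_label_loss(logits, first, second, alpha=0.1, beta=0.1):
    """Cross entropy between the assumed soft target and the softmax output."""
    probs = softmax(logits)
    target = double_label_target(first, second, len(logits), alpha, beta)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


# Original picture of class 0 among N = 2 original classes (K = 4 in total).
first = 0
second = generated_class(first, 2)  # -> 2
loss_original = multi_label_loss([5.0, 0.0, 1.0, 0.0], first, second)
# The generated picture swaps the roles of the two labels.
loss_generated = multi_label_loss([1.0, 0.0, 5.0, 0.0], second, first)
```

With this target, a prediction that favors the first label and gives some probability to the second label is penalized less than one that favors an unrelated category, which matches the intent of assigning different confidence levels to the two labels.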
5. Design a multi-task loss optimization function.
E1. Calculate the average metric loss over the branches:
$L_{\text{triplet}} = \frac{1}{4} \sum_{i=1}^{4} L_{\text{triplet}}^{i}$
where $L_{\text{triplet}}^{i}$ is the triplet metric loss computed by the $i$-th branch, and $i \in \{1, 2, 3, 4\}$ indexes the four branches of the model.
E2. Calculate the average multi-label classification loss over the branches:
$L_{\text{multi-labels}} = \frac{1}{4} \sum_{i=1}^{4} L_{\text{multi-labels}}^{i}$
where $L_{\text{multi-labels}}^{i}$ is the multi-label classification loss computed by the $i$-th branch, and $i \in \{1, 2, 3, 4\}$ indexes the four branches of the model.
E3. Calculate the loss function of the whole model:
$L = L_{\text{multi-labels}} + L_{\text{triplet}} \quad (8)$
where $L_{\text{multi-labels}}$ is the average classification loss over the branches of the model and $L_{\text{triplet}}$ is the average metric loss over the branches of the model.
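The combination in step E3 can be sketched as follows. The per-branch averaging follows the description above; the hinge form of the triplet loss, the Euclidean distance and the margin value of 0.3 are assumptions, since the text names the triplet loss without giving its parameters, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss; the margin value is an assumption."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def total_loss(branch_cls_losses, branch_triplet_losses):
    """L = L_multi-labels + L_triplet, each averaged over the branches."""
    l_cls = sum(branch_cls_losses) / len(branch_cls_losses)
    l_tri = sum(branch_triplet_losses) / len(branch_triplet_losses)
    return l_cls + l_tri


# A well-separated triplet gives zero loss, a violated one gives a positive loss.
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])

# Four branches: placeholder classification and triplet losses.
L = total_loss([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0])
```

Averaging each loss over the four branches before summing keeps the two terms on a comparable scale regardless of the number of branches.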
6. Pre-train the backbone network ResNet50 on large-scale image data using the back propagation algorithm to obtain a pre-trained ResNet50 model.
7. On the basis of the pre-trained ResNet50 model, use the expanded pedestrian re-identification data set, calculate the model loss with the designed multi-task loss optimization function, and train the whole constructed model end to end with the back propagation algorithm to obtain the final trained model.
8. Perform pedestrian re-identification with the trained model, taking the output features of the final network model as the feature representation of the pedestrian image and using them for similarity measurement and ranking.
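The similarity measurement and ranking of step 8 can be sketched as below. The description does not fix the distance metric, so Euclidean distance on L2-normalized features is an assumption here, and `rank_gallery` is a hypothetical helper name; short 2-dimensional vectors stand in for the 2048-dimensional features.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    q = l2_normalize(query)
    scored = []
    for index, feature in enumerate(gallery):
        g = l2_normalize(feature)
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, g)))
        scored.append((distance, index))
    return [index for _, index in sorted(scored)]


# Toy features: gallery item 1 points almost in the same direction as the query.
query_feature = [1.0, 0.0]
gallery_features = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
ranking = rank_gallery(query_feature, gallery_features)
```

The first entries of the ranking are the gallery pictures most likely to show the same pedestrian as the query.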
Tables 1 and 2 compare the proposed method with other pedestrian re-identification methods on the Market-1501 and CUHK03 data sets, respectively.
TABLE 1
TABLE 2
In tables 1-2, other methods are as follows:
LSRO corresponds to the method proposed by Zheng et al (Z.Zheng, L.Zheng, and Y.Yang. "Unlabeled samples generated by gan improve the person re-identification baseline in vitro" arXiv preprint arXiv:1701.07717, vol.3, 2017.);
PNGAN corresponds to the method proposed by Qian et al (X.Qian, Y.Fu, T.Xiang, W.Wang, J.Qiu, Y.Wu, Y. -G.Jiang, and X.Xue. "Pose-normalized image generation for person re-identification" in European Conference on Computer Vision,2018, pp.661-678.);
CamStyle corresponds to the method proposed by Zhong et al (Z.Zhong, L.Zheng, Z.Zheng, S.Li, and Y.Yang. "Camera style adaptation for person re-identification" in proceedings of the IEEE conference on Computer Vision and Pattern Recognition,2018, pp.5157-5166.);
MLFN corresponds to the method proposed by Chang et al (X.Chang, T.Hospedales, and T.Xiang. "Multi-level factorisation net for person re-identification" in proceedings of the IEEE conference on Computer Vision and Pattern Recognition,2018, vol.1, p.2.);
HA-CNN corresponds to the method proposed by Li et al (W.Li, X.Zhu, and S.Gong. "Harmonious attention network for person re-identification" in proceedings of the IEEE conference on Computer Vision and Pattern Recognition,2018, pp.2285-2294);
PCB corresponds to the method proposed by Sun et al (Y.Sun, L.Zheng, Y.Yang, et al, "Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline)" in proceedings of the European Conference on Computer Vision, 2018:480-496);
MGN corresponds to the method proposed by Wang et al (G.Wang, Y.Yuan, X.Chen, et al, "Learning discriminative features with multiple granularities for person re-identification" in proceedings of the 26th ACM international conference on Multimedia,2018:274-282);
OSNet corresponds to the method proposed by Zhou et al (K.Zhou, Y.Yang, A.Cavallaro, et al, "Omni-scale feature learning for person re-identification" in proceedings of the IEEE International Conference on Computer Vision, 2019:3702-3712);
PAN corresponds to the method proposed by Zheng et al (Z.Zheng, L.Zheng, Y.Yang, et al, "Pedestrian alignment network for large-scale person re-identification" in IEEE Transactions on Circuits and Systems for Video Technology,2018,29 (10): 3037-3045);
AANet corresponds to the method proposed by Tay et al (C.Tay, S.Roy, K.Yap. "AANet: attribute attention network for person re-identifications" in proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2019: 7134-7143);
FPR corresponds to the method proposed by He et al (L.He, Y.Wang, W.Liu, et al, "Foreground-aware Pyramid Reconstruction for Alignment-free Occluded Person Re-identification" in proceedings of the IEEE International Conference on Computer Vision, 2019:8450-8459);
CRAN corresponds to the method proposed by Han et al (C.Han, R.Zheng, C.Gao, et al, "Complementation-Reinforced Attention Network for Person Re-Identification" in IEEE Transactions on Circuits and Systems for Video Technology, 2019);
CASN corresponds to the method proposed by Zheng et al (M.Zheng, S.Karanam, Z.Wu, et al, "Re-identification with consistent attentive siamese networks" in proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019:5735-5744);
JPPNet corresponds to the method proposed by Liang et al (X.Liang, K.Gong, X.Shen, et al, "Look into person: joint body parsing & pose estimation network and a new benchmark" in IEEE Transactions on Pattern Analysis and Machine Intelligence,2018,41 (4): 871-885).
First, pedestrian pictures are analyzed by a pedestrian analysis model to obtain masks of the pedestrian's upper body, lower body and whole body, and a multi-branch attention network model is designed according to these masks, in which each branch makes full use of local pedestrian information for learning. Second, according to the obtained mask, the hue of the pedestrian body region is adjusted to change the clothing color of the pedestrian in the picture, and the recolored pedestrian is treated as a new pedestrian category, thereby expanding the data set. An original picture and its expanded counterpart are strongly similar in background and pedestrian contour. On this basis, the expanded data set is organized as a double-label data set: each pedestrian has two labels, a first-class label and a second-class label, and different confidence levels are set for the two labels during classification. This yields the proposed multi-label classification loss function, which helps the model learn more discriminative features. Finally, the trained network model is used to extract feature representations of the test-set images for subsequent similarity comparison and ranking. Experimental analysis shows that the method reduces the interference of cluttered backgrounds, occlusion and similar problems in pedestrian re-identification and achieves good recognition performance on several public data sets.

Claims (3)

1. A multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis, characterized by comprising the following steps:
1) Preparing a pedestrian re-identification data set;
2) Designing a multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis, and extracting more discriminative pedestrian features through the network model;
3) Expanding the prepared data set by means of a pedestrian analysis model to obtain a new expanded data set;
the specific steps of expanding the prepared data set are as follows:
(1) Acquiring a mask of the whole body of the pedestrian in the picture by means of the pedestrian analysis model, wherein the mask is 1 at pixels of the specified body region and 0 elsewhere;
(2) Separating the pedestrian foreground region from the background region of the picture by using the obtained mask, and then changing the hue of the pedestrian foreground region so that the pedestrian appears to change clothes, thereby expanding the data set;
4) Designing the multi-label classification loss function, with the following specific steps:
(1) For a pedestrian of class $y_{label}$ in the original data set, a corresponding generated pedestrian category is assumed. The original picture has two category labels: $y_{label}$ is its first category and the generated category is its second category. The generated picture likewise has two category labels: the generated category is its first category and $y_{label}$ is its second category. A double-labeled pedestrian re-identification data set is thus constructed;
(2) Different confidence levels are set for the different categories, and smoothing is added, so that the designed multi-label classification loss function is obtained as follows:
wherein $P(y_{label})$ is the probability that the model predicts for the first category label, and the corresponding term for the generated category is the probability that the model predicts for the second label;
5) Designing the multi-stream multi-task loss optimization function, with the following specific steps:
for each branch, a metric task and a classification task are computed, wherein the metric task uses a triplet loss function and the classification task uses the multi-label classification loss function; the finally designed multi-stream multi-task loss optimization function is given by:
$$L = L_{multi\text{-}labels} + L_{triplet} \tag{2}$$
wherein $L_{multi\text{-}labels}$ is the average classification loss over the branches of the model, and $L_{triplet}$ is the average metric loss over the branches of the model;
6) Pre-training the backbone ResNet50 network on large-scale image data by using the back-propagation algorithm to obtain a pre-trained ResNet50 model;
7) On the basis of the pre-trained ResNet50 model, using the expanded pedestrian re-identification data set, computing the loss of the model with the designed multi-task loss optimization function, and training the whole constructed model end to end with the back-propagation algorithm to obtain the final trained model;
8) Performing pedestrian re-identification with the trained model, taking the output features of the final network model as the feature representation of each pedestrian image and using them for subsequent similarity measurement and ranking.
2. The multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis as claimed in claim 1, wherein in step 1) the specific method for preparing the pedestrian re-identification data set is as follows: the training-set pedestrian images are assumed to be $\{(x_i, y_i) \mid i = 1, \dots, n\}$, where n is the number of samples in the training set and is a natural number; $x_i$ is the pedestrian image corresponding to the i-th training sample; $y_i$, with $1 \le y_i \le N$, is the pedestrian category label of the i-th training sample; and N is the number of pedestrian categories contained in the training sample set and is a natural number.
3. The multi-stream multi-tag pedestrian re-identification method based on pedestrian analysis as claimed in claim 1, wherein in step 2) the specific steps of designing the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis are as follows:
(1) Acquiring masks of the whole body, the upper body and the lower body of the pedestrian in the picture by means of the pedestrian analysis model, wherein each mask is 1 at pixels of the specified body region and 0 elsewhere;
(2) Removing the final fully connected layer that the original network uses for classification, changing the global average pooling before the fully connected layer into global max pooling, and finally copying the part of the network after the first convolution and pooling into four independent branches, with no parameter sharing between the branches;
(3) Leaving the first branch unchanged; applying the attention mechanisms of the whole body, the upper body and the lower body of the pedestrian to the second, third and fourth branches respectively, and applying a channel attention mechanism to the features at the corresponding body-part position in each of these branches, finally obtaining the multi-stream attention pedestrian re-identification network model structure based on pedestrian analysis.
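The mask-guided branches of claim 3 can be illustrated with a simplified sketch. It rests on several assumptions that are not in the claims: the parsing result is taken to be an integer label map with 0 as background, the part-label sets `upper_ids` and `lower_ids` are hypothetical, the mask is assumed to be already resized to the feature-map resolution, the features are assumed non-negative (as after a ReLU), and a single shared feature map stands in for the four independently parameterized ResNet50 branches. The channel attention mechanism is omitted.

```python
def body_masks(parsing, upper_ids, lower_ids):
    """Build binary whole-body, upper-body and lower-body masks.

    parsing: H x W nested list of integer part labels, 0 = background
             (assumed encoding of the pedestrian analysis output).
    """
    whole = [[1 if p != 0 else 0 for p in row] for row in parsing]
    upper = [[1 if p in upper_ids else 0 for p in row] for row in parsing]
    lower = [[1 if p in lower_ids else 0 for p in row] for row in parsing]
    return whole, upper, lower


def apply_mask(feature_map, mask):
    """Zero out every position of a C x H x W feature map where mask is 0."""
    return [
        [[v * m for v, m in zip(f_row, m_row)]
         for f_row, m_row in zip(channel, mask)]
        for channel in feature_map
    ]


def global_max_pool(feature_map):
    """Reduce a C x H x W feature map to a length-C vector of channel maxima."""
    return [max(max(row) for row in channel) for channel in feature_map]


def multi_stream_features(feature_map, parsing, upper_ids, lower_ids):
    """Return one pooled feature vector per branch.

    Branch 1 is left unchanged; branches 2 to 4 are restricted to the whole
    body, the upper body and the lower body respectively.
    """
    whole, upper, lower = body_masks(parsing, upper_ids, lower_ids)
    branches = [
        feature_map,
        apply_mask(feature_map, whole),
        apply_mask(feature_map, upper),
        apply_mask(feature_map, lower),
    ]
    return [global_max_pool(branch) for branch in branches]
```

In a full model each branch would produce its own feature map from its own copy of the network, and each pooled vector would feed both the triplet loss and the multi-label classification loss before the per-branch losses are averaged as in formula (2).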
CN202011387800.5A 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method Active CN112418134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011387800.5A CN112418134B (en) 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method

Publications (2)

Publication Number Publication Date
CN112418134A (en) 2021-02-26
CN112418134B (en) 2024-02-27

Family

ID=74829464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011387800.5A Active CN112418134B (en) 2020-12-01 2020-12-01 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN112418134B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686228B (en) * 2021-03-12 2021-06-01 深圳市安软科技股份有限公司 Pedestrian attribute identification method and device, electronic equipment and storage medium
CN113095174B (en) * 2021-03-29 2024-07-23 深圳力维智联技术有限公司 Re-identification model training method, device, equipment and readable storage medium
CN113255604B (en) 2021-06-29 2021-10-15 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN114758362B (en) * 2022-06-15 2022-10-11 山东省人工智能研究院 Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN114998934B (en) * 2022-06-27 2023-01-03 山东省人工智能研究院 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
CN115147873A (en) * 2022-09-01 2022-10-04 汉斯夫(杭州)医学科技有限公司 Method, equipment and medium for automatically classifying dental images based on dual-label cascade
CN115661722B (en) * 2022-11-16 2023-06-06 北京航空航天大学 Pedestrian re-identification method combining attribute and orientation
CN116129473B (en) * 2023-04-17 2023-07-14 山东省人工智能研究院 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977893A (en) * 2019-04-01 2019-07-05 厦门大学 Depth multitask pedestrian recognition methods again based on the study of level conspicuousness channel
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
JP2020101968A (en) * 2018-12-21 2020-07-02 株式会社 日立産業制御ソリューションズ Multi-label data learning assisting apparatus, multi-label data learning assisting method and multi-label data learning assisting program
CN111738213A (en) * 2020-07-20 2020-10-02 平安国际智慧城市科技股份有限公司 Person attribute identification method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pedestrian attribute recognition based on attention mechanism (基于注意力机制的行人属性识别); 吴杰; 王怡涵; 侯米娜; 全晓鹏; 电子世界; 2020-01-30 (No. 02); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant