CN115050048A - Cross-modal pedestrian re-identification method based on local detail features - Google Patents

Cross-modal pedestrian re-identification method based on local detail features

Info

Publication number
CN115050048A
CN115050048A
Authority
CN
China
Prior art keywords
pedestrian
local
mask
features
heatmap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210604338.2A
Other languages
Chinese (zh)
Other versions
CN115050048B (en)
Inventor
产思贤
朱锦校
吴周检
林沛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Pixel Technology Co ltd
Original Assignee
Hangzhou Pixel Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Pixel Technology Co ltd filed Critical Hangzhou Pixel Technology Co ltd
Priority to CN202210604338.2A priority Critical patent/CN115050048B/en
Publication of CN115050048A publication Critical patent/CN115050048A/en
Application granted granted Critical
Publication of CN115050048B publication Critical patent/CN115050048B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a cross-modal pedestrian re-identification method based on local detail features, which guides the network to fully mine detailed pedestrian information. The proposed APMG generates a weight for each heatmap according to the pedestrian's pose and fuses the weighted heatmaps into masks that extract detailed pedestrian features. Because APMG lacks lower-body features, the proposed MC module fuses APMG with PCB slicing to jointly extract local feature representations of the pedestrian. Further, the proposed WIPA module exchanges context information between local features and uses the position information contained in the masks to suppress background information in the slice feature. The two local feature extraction schemes complement each other and make up for each other's deficiencies. The method combines global and local detail features as the representation of the pedestrian and achieves good results on the cross-modal pedestrian re-identification task.

Description

Cross-modal pedestrian re-identification method based on local detail features
Technical Field
The invention relates to the technical field of image processing, in particular to a cross-modal pedestrian re-identification method based on local detail features.
Background
Given a query image and a gallery image set from different modalities, the purpose of cross-modal pedestrian re-identification is to match the images in the gallery that share the query's identity. Due to its importance in the field of public safety, cross-modal pedestrian re-identification has become a popular problem in the re-identification field. Owing to variations in spectrum, pedestrian pose, and camera viewing angle, fully mining discriminative pedestrian identity features is a challenging task.
In order to fully capture pedestrian information, pedestrian representations that incorporate local features have become a common setting in cross-modal pedestrian re-identification. There are three main local feature extraction schemes: slicing, pose estimation, and mask filtering. The common slicing method PCB uniformly slices the feature map output by the backbone network into several strips along the vertical direction. From top to bottom, each strip characterizes a different part of the human body. These local features are then constrained with a loss function so that the network focuses on locally discriminative information. Although slicing guarantees coverage of all parts of a pedestrian, it inevitably introduces extraneous background information and cannot guarantee the alignment of local features.
In order to solve the above problems, methods such as PGII locate pedestrian body parts using pose estimation. A heatmap is generated with a pre-trained pose estimator, and local features are extracted using the heatmap as a mask. Pose estimation helps the network locate joint positions, alleviates the feature misalignment problem, and filters background information to some extent. However, pose estimates on pedestrian re-identification datasets may be inaccurate, which can introduce background features. In addition, these methods do not distinguish between heatmaps when extracting local features, so they lack robustness to pedestrian variations and may still introduce background information. To enhance the robustness of local features across different pedestrians, MPANet learns masks with a deep network model to extract local features. However, the generated masks can hardly focus stably on a given body part, and in the absence of mask labels the extracted local features remain misaligned.
In view of the above, the invention designs a cross-modal pedestrian re-identification method based on local detail features.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a cross-modal pedestrian re-identification method based on local detail features. Three modules are introduced: an adaptive body part mask generation module (APMG), a mask compensation (MC) scheme, and a weighted intra-part attention module (WIPA). Together they address the background-information and feature-misalignment problems of current local feature methods.
In order to achieve the purpose, the invention is realized by the following technical scheme: a cross-modal pedestrian re-identification method based on local detail features comprises the following steps:
step 1, reading a SYSU-MM01 data set which comprises images of pedestrians in two modes (normal light and infrared rays), and performing data enhancement on the data set. The training set was divided into uniform batches, each containing 8 images from 8 identities, four for each modality. Inputting a pair of cross-modal images with the same identity into a backbone network Resnet-50, and extracting a global feature map
Figure BDA0003660924830000021
Figure BDA0003660924830000022
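For concreteness, the following is a minimal PyTorch sketch of this feature extraction step; the input resolution, the pretrained weights, and the point at which ResNet-50 is truncated are illustrative assumptions rather than settings taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class GlobalFeatureExtractor(nn.Module):
    """Truncated ResNet-50 that outputs the global feature map F_g."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        # Drop the global average pooling and classification head,
        # keeping all convolutional stages.
        self.stem = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, x):
        # x: (B, 3, H, W) -> F_g: (B, 2048, H/32, W/32)
        return self.stem(x)

extractor = GlobalFeatureExtractor()
rgb = torch.randn(4, 3, 288, 144)  # 4 visible-light images of one identity (assumed resolution)
ir = torch.randn(4, 3, 288, 144)   # 4 infrared images of the same identity
f_g = extractor(torch.cat([rgb, ir], dim=0))  # global feature maps of the cross-modal pair
```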
Step 2, sending the image pair into the pose estimation network GCM to obtain the heatmaps of 16 human joints. Based on the estimated quality of each heatmap on SYSU-MM01, 9 high-quality heatmaps are screened to generate the human body part masks.
Step 3, sending the 9 selected heatmaps and the global feature map $F_g$ into the adaptive body part mask generation (APMG) module. APMG learns the contribution of the upper-body parts from $F_g$ and adaptively generates the weight of each heatmap. The selected heatmaps are divided into two groups, top part and mid part, and are downsampled to the spatial size of $F_g$ using max pooling. The heatmaps are then accumulated according to their weights to obtain the part masks $mask_{top}$ and $mask_{mid}$, which are used to partition $F_g$ and extract the local features $F_{l\_top}$ and $F_{l\_mid}$ of the top and mid parts.
Step 4, in order to compensate for the lower-body information missing from APMG, a mask compensation (MC) module is proposed. Following the PCB way of extracting local features, MC divides the global feature map into three slices along the vertical direction and takes the last slice $F_{l\_low}$ as the representation of the lower body. The slice and the mask-extracted local feature maps are then globally pooled together to obtain the local feature vector $f_{local} \in \mathbb{R}^{3\times C}$.
Step 5, sending $f_{local}$ into the weighted intra-part attention (WIPA) module to mine the context relationships between the parts while suppressing the background information in the lower-part feature. Finally, WIPA measures the contribution of each part and generates weights to reweight the features. The pooled global feature vector $f_g$ and $f_{local}$ are concatenated along the channel dimension as the representation of the pedestrian.
Step 6, in order to train the network to accurately capture modality-invariant pedestrian identity features, the network is trained with three loss functions: ID Loss, Center Cluster Loss, and Modality Learning Loss.
Step 7, respectively extracting the features of the pedestrians in the query set and the gallery set, and computing the similarity between each query image and every image in the gallery set, using the Euclidean distance between feature vectors as the similarity measure. Finally, the gallery images are sorted by similarity to obtain the re-identification result.
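A minimal sketch of this retrieval step, assuming the query and gallery feature vectors have already been extracted; the feature dimension and set sizes below are placeholders.

```python
import torch

def rank_gallery(query_feats: torch.Tensor, gallery_feats: torch.Tensor) -> torch.Tensor:
    """query_feats: (Nq, D); gallery_feats: (Ng, D).
    Returns, for each query, the gallery indices sorted by ascending
    Euclidean distance, i.e. descending similarity."""
    dist = torch.cdist(query_feats, gallery_feats, p=2)  # pairwise Euclidean distances, (Nq, Ng)
    return dist.argsort(dim=1)

ranking = rank_gallery(torch.randn(5, 4096), torch.randn(100, 4096))
# ranking[0] lists the gallery images most-to-least similar to query 0
```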
Preferably, step 2 specifically comprises: sending a pair of pedestrian images into the pose estimation network GCM to obtain the heatmaps of 16 different human joints; randomly sampling pedestrian images and inspecting the quality of the corresponding heatmaps; and finally selecting from the 16 heatmaps those of 9 joints (chest, upper neck, crown, left and right shoulders, left hip, left and right elbows, left wrist), denoted $Heatmap$.
Preferably, the APMG in step 3 adaptively generates masks to extract refined local features. The input of APMG consists of the selected heatmaps $Heatmap$ and the global feature map $F_g$ output by the backbone network. Specifically, the screened heatmaps are divided into two groups representing pedestrian parts: $P_{top}$ (chest, upper neck, crown, left and right shoulders) and $P_{mid}$ (left hip, left elbow, left wrist). Then $F_g$ is fed into the weight generation network $G_w(\cdot)$ to generate a weight for each heatmap, $W_{h\_map} \in \mathbb{R}^{1\times 9}$. The calculation formula is:

$$W_{h\_map} = \sigma(G_w(\mathrm{GAP}(F_g)))$$

where $\sigma(\cdot)$ denotes the sigmoid function and $G_w(\cdot)$ consists of a convolution with kernel size 1. The purpose of $G_w(\cdot)$ is to learn, from the global features, the contribution of each heatmap to the corresponding human body part and to generate the corresponding weights. With the generated weights $W_{h\_map}$, the heatmaps of the two groups $P_{top}$ and $P_{mid}$ are weighted and summed to obtain the masks of the corresponding parts, $mask_{top}$ and $mask_{mid}$:

$$mask_{top} = W_{h\_map}[P_{top}]^{T}\, Heatmap[P_{top}]$$
$$mask_{mid} = W_{h\_map}[P_{mid}]^{T}\, Heatmap[P_{mid}]$$

where $[P]$ denotes taking the elements of the set at the positions corresponding to $P$.

The masks are used to partition the global feature map $F_g$ and obtain the top-part and mid-part features of the pedestrian, $F_{l\_top}$ and $F_{l\_mid}$. The division formula of the local features is:

$$F_{l\_top} = mask_{top} \odot F_g$$
$$F_{l\_mid} = mask_{mid} \odot F_g$$
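A minimal PyTorch sketch of the APMG computation described by the formulas above; the joint-to-group index assignment, the channel count, and the use of adaptive average pooling for GAP are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class APMG(nn.Module):
    def __init__(self, channels=2048, num_heatmaps=9):
        super().__init__()
        # G_w: a 1x1 convolution producing one weight per screened heatmap
        self.g_w = nn.Conv2d(channels, num_heatmaps, kernel_size=1)
        self.top_idx = [0, 1, 2, 3, 4]  # chest, upper neck, crown, shoulders (assumed order)
        self.mid_idx = [5, 6, 7, 8]     # left hip, elbows, left wrist (assumed order)

    def forward(self, f_g, heatmaps):
        # f_g: (B, C, h, w); heatmaps: (B, 9, h, w), already max-pooled to f_g's size
        w = torch.sigmoid(self.g_w(F.adaptive_avg_pool2d(f_g, 1)))  # W_h_map: (B, 9, 1, 1)
        weighted = w * heatmaps                                     # per-heatmap weighting
        mask_top = weighted[:, self.top_idx].sum(dim=1, keepdim=True)
        mask_mid = weighted[:, self.mid_idx].sum(dim=1, keepdim=True)
        f_l_top = mask_top * f_g  # element-wise masking of the global feature map
        f_l_mid = mask_mid * f_g
        return f_l_top, f_l_mid
```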
preferably, the MC is F to compensate for the lack of lower body information of the mask g Dividing into three parts along vertical direction, and taking the last part of feature diagram F l_low As a representation of the lower body of the pedestrian. A local feature map F of three parts of the pedestrian l_low ,F l_top ,F l_mid Obtaining local characteristics f after global pooling local ∈R 3×C
Preferably, the WIPA module in step 5 takes $f_{local} \in \mathbb{R}^{3\times C}$ as input. A self-attention calculation between the local features is performed first: $f_{local}$ is fed into three $1\times 1$ convolutional layers $Q(\cdot)$, $K(\cdot)$, and $V(\cdot)$ to obtain query and key features of dimension $c_k$ and a value feature of dimension $c_v$. The feature obtained from the attention calculation is $\hat{f}_{local}$. The self-attention calculation formula is:

$$\hat{f}_{local} = \mathrm{softmax}\!\left(\frac{Q(f_{local})\,K(f_{local})^{T}}{\sqrt{c_k/h}}\right) V(f_{local})$$

where $h$ is the number of heads. Through the body part information contained in $F_{l\_top}$ and $F_{l\_mid}$, the attention helps suppress the background information in the slice feature $F_{l\_low}$. Because different pedestrian parts contribute differently to the re-identification task, the network learns a weight for each local feature to enhance useful information. Specifically, two fully-connected layers and one ReLU layer are set to learn the local feature weights; the calculation formulas are:

$$W_{part} = \sigma(W_2\,\mathrm{ReLU}(W_1\,\hat{f}_{local}))$$
$$f^{w}_{local} = W_{part} \odot \hat{f}_{local}$$

Finally, $f_g$ and $f^{w}_{local}$ are concatenated along the channel dimension to obtain the final pedestrian representation.
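A minimal single-head sketch of WIPA under the standard scaled dot-product attention form (taking $h=1$); the dimensions $c_k$ and $c_v$ and the hidden width of the two-layer weighting network are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WIPA(nn.Module):
    def __init__(self, channels=2048, c_k=512, c_v=2048):
        super().__init__()
        self.q = nn.Conv1d(channels, c_k, kernel_size=1)  # 1x1 convolutions Q, K, V
        self.k = nn.Conv1d(channels, c_k, kernel_size=1)
        self.v = nn.Conv1d(channels, c_v, kernel_size=1)
        self.scale = c_k ** -0.5  # 1/sqrt(c_k), i.e. h = 1 head
        # two fully-connected layers and a ReLU to learn per-part weights
        self.weight_mlp = nn.Sequential(
            nn.Linear(c_v, c_v // 4), nn.ReLU(), nn.Linear(c_v // 4, 1))

    def forward(self, f_local):
        # f_local: (B, 3, C) -> (B, C, 3) so the 1x1 convolutions act per part
        x = f_local.transpose(1, 2)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q.transpose(1, 2) @ k * self.scale, dim=-1)  # (B, 3, 3)
        out = attn @ v.transpose(1, 2)           # context-enhanced part features, (B, 3, c_v)
        w = torch.sigmoid(self.weight_mlp(out))  # per-part contribution weights, (B, 3, 1)
        return w * out                           # weighted part features
```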
Preferably, the ID Loss in step 6 is used to train the feature $F_{l\_low}$ and ensure that it contains lower-body identity information of the pedestrian. The three loss functions jointly train the final feature obtained by concatenating $f_g$ and $f_{local}$.
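A minimal sketch of the ID Loss term only, realized as identity cross-entropy on the pooled lower-body feature; the classifier width and the number of training identities are assumptions, and the Center Cluster and Modality Learning losses are omitted.

```python
import torch
import torch.nn as nn

num_ids = 395                       # assumed number of SYSU-MM01 training identities
id_head = nn.Linear(2048, num_ids)  # identity classifier over the pooled F_l_low
ce = nn.CrossEntropyLoss()

f_low = torch.randn(8, 2048, requires_grad=True)  # pooled lower-body features
labels = torch.randint(0, num_ids, (8,))          # pedestrian identity labels
loss_id = ce(id_head(f_low), labels)              # constrains F_l_low to carry identity
loss_id.backward()
```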
The invention has the following beneficial effects: it combines global and local detail features as the representation of the pedestrian and achieves good results on the cross-modal pedestrian re-identification task.
Drawings
The invention is described in detail below with reference to the drawings and the detailed description.
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Referring to fig. 1, this embodiment adopts the following technical solution: a cross-modal pedestrian re-identification method based on local detail features, which mainly comprises an adaptive human body part mask generation module (APMG), a mask compensation module (MC), and a weighted intra-part attention module (WIPA), and comprises the following steps:
step 1, reading a SYSU-MM01 data set which contains pedestrian images in two modes (normal light and infrared rays), and performing data enhancement on the data set; the training set is divided into a plurality of batches, where each batch includes 8 identities, each identity including 4 infrared images and 4 RGB images. Inputting a pair of cross-modal images with the same identity into a backbone network Resnet-50, and extracting a global feature map
Figure BDA0003660924830000051
Step 2, sending the pair of pedestrian images into the pose estimation network GCM to obtain the heatmaps of 16 different human joints. Pedestrian images are randomly sampled and the quality of the corresponding heatmaps is inspected. Finally, the heatmaps of 9 joints (chest, upper neck, crown, left and right shoulders, left hip, left and right elbows, left wrist) are selected from the 16, denoted $Heatmap$.
Step 3, sending the screened heatmaps $Heatmap$ and the global feature map $F_g$ output by the backbone network into APMG to obtain the masks of the different pedestrian parts. Specifically, the screened heatmaps are divided into two groups representing pedestrian parts: $P_{top}$ (chest, upper neck, crown, left and right shoulders) and $P_{mid}$ (left hip, left elbow, left wrist). Then $F_g$ is fed into the weight generation network $G_w(\cdot)$ to generate a weight for each heatmap, $W_{h\_map} \in \mathbb{R}^{1\times 9}$. The calculation formula is:

$$W_{h\_map} = \sigma(G_w(\mathrm{GAP}(F_g)))$$

where $\sigma(\cdot)$ denotes the sigmoid function and $G_w(\cdot)$ consists of a convolution with kernel size 1. The purpose of $G_w(\cdot)$ is to learn, from the global features, the contribution of each heatmap to the corresponding human body part and to generate the corresponding weights. With the generated weights $W_{h\_map}$, the heatmaps of the two groups $P_{top}$ and $P_{mid}$ are weighted and summed to obtain the masks of the corresponding parts, $mask_{top}$ and $mask_{mid}$:

$$mask_{top} = W_{h\_map}[P_{top}]^{T}\, Heatmap[P_{top}]$$
$$mask_{mid} = W_{h\_map}[P_{mid}]^{T}\, Heatmap[P_{mid}]$$

where $[P]$ denotes taking the elements of the set at the positions corresponding to $P$.

The masks are used to partition the global feature map $F_g$ and obtain the top-part and mid-part features of the pedestrian, $F_{l\_top}$ and $F_{l\_mid}$. The division formula of the local features is:

$$F_{l\_top} = mask_{top} \odot F_g$$
$$F_{l\_mid} = mask_{mid} \odot F_g$$
and 4, selecting the posture estimation model GCM to identify the structure of the lower half of the pedestrian, wherein the posture estimation model GCM only selects the upper half of the pedestrian to generate masks. To compensate for the lack of mask lower body information, the MC will F g Dividing into three parts along vertical direction, and taking the last part of feature diagram F l_low As a representation of the lower body of the pedestrian. A local feature map F of three parts of the pedestrian l_low ,F l_top ,F l_mid Obtaining local characteristics f after global pooling local ∈R 3×C
Step 5, in order to mine the context between local features and suppress the background information contained in the slice feature $F_{l\_low}$, the WIPA module is introduced. WIPA takes $f_{local} \in \mathbb{R}^{3\times C}$ as input. A self-attention calculation between the local features is performed first: $f_{local}$ is fed into three $1\times 1$ convolutional layers $Q(\cdot)$, $K(\cdot)$, and $V(\cdot)$ to obtain query and key features of dimension $c_k$ and a value feature of dimension $c_v$. The feature obtained from the attention calculation is $\hat{f}_{local}$. The self-attention calculation formula is:

$$\hat{f}_{local} = \mathrm{softmax}\!\left(\frac{Q(f_{local})\,K(f_{local})^{T}}{\sqrt{c_k/h}}\right) V(f_{local})$$

where $h$ is the number of heads. Through the body part information contained in $F_{l\_top}$ and $F_{l\_mid}$, the attention helps suppress the background information in the slice feature $F_{l\_low}$. Because different pedestrian parts contribute differently to the re-identification task, the network learns the weight of each local feature by itself to enhance useful information. Specifically, two fully-connected layers and one ReLU layer are set to learn the local feature weights; the calculation formulas are:

$$W_{part} = \sigma(W_2\,\mathrm{ReLU}(W_1\,\hat{f}_{local}))$$
$$f^{w}_{local} = W_{part} \odot \hat{f}_{local}$$

Finally, $f_g$ and $f^{w}_{local}$ are concatenated along the channel dimension to obtain the final pedestrian representation.
Step 6, jointly training the network with three loss functions: ID Loss, Center Cluster Loss, and Modality Learning Loss. To ensure that the local feature $F_{l\_low}$ contains the identity information of the pedestrian's lower body, it is constrained by the ID Loss. The three loss functions jointly guide the network to learn modality-invariant features on the final feature obtained by concatenating $f_g$ and $f_{local}$.
Step 7, respectively extracting the features of the pedestrians in the query set and the gallery set, and computing the similarity between each query image feature and the image features in the gallery in turn. The gallery images are sorted by similarity to obtain the re-identification result.
This embodiment guides the network to fully mine detailed pedestrian information. The proposed APMG generates a weight for each heatmap according to the pedestrian's pose and fuses the weighted heatmaps into masks that extract detailed pedestrian features. Because APMG lacks lower-body features, the proposed MC module fuses APMG with PCB slicing to jointly extract local feature representations of the pedestrian. Further, the proposed WIPA module exchanges context information between local features and uses the position information contained in the masks to suppress background information in the slice feature. The two local feature extraction schemes complement each other and make up for each other's deficiencies. The method combines global and local detail features as the representation of the pedestrian and achieves good results on the cross-modal pedestrian re-identification task.
The foregoing shows and describes the general principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and descriptions in the specification only illustrate the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. A cross-modal pedestrian re-identification method based on local detail features is characterized by comprising the following steps:
reading the SYSU-MM01 dataset, which contains pedestrian images in the two modalities of visible light and infrared, and performing data enhancement on the dataset; dividing the training set into uniform batches, wherein each batch comprises 8 identities and each identity contributes 4 visible-light and 4 infrared images; inputting a pair of cross-modal images with the same identity into the backbone network ResNet-50, and extracting the global feature maps $F_g$;
Step (2), sending the image pair into the pose estimation network GCM to obtain the heatmaps of 16 human joints; screening 9 high-quality heatmaps to generate the human body part masks according to the estimated quality of each heatmap on SYSU-MM01;
step (3), selecting 9 heatmaps and a global feature map F g Sending into an APMG (adaptive body part masking module) module; APMG learning F g Generating the weight of each heatmap in a self-adaptive manner according to the contribution degree of the upper body part; dividing the selected heatmap into two groups of top part and mid part; downsampling heatmap to F using maximum pooling g The size of the space of (a); then, the heatmap is added according to the weight to obtain the value of each part
Figure FDA0003660924820000014
Mask is used to divide F g Extraction of local features of top part and midpart
Figure FDA0003660924820000015
Step (4), in order to compensate for the lower-body information missing from APMG, a mask compensation (MC) module is provided; following the PCB way of extracting local features, MC divides the global feature map into three slices along the vertical direction and takes the last slice $F_{l\_low}$ as the representation of the lower body; the slice and the mask-extracted local feature maps are then globally pooled together to obtain the local feature vector $f_{local} \in \mathbb{R}^{3\times C}$;
Step (5), sending $f_{local}$ into the weighted intra-part attention (WIPA) module to mine the context relationships between the parts while suppressing the background information in the lower-part feature; finally, WIPA measures the contribution of each part and generates weights to reweight the features; the pooled global feature vector $f_g$ and $f_{local}$ are concatenated along the channel dimension as the representation of the pedestrian;
step (6), in order to train the network to accurately capture the pedestrian mode invariant identity feature, the network is trained by three Loss functions, namely ID Loss, Center Cluster Loss and modification Learning Loss;
step (7), respectively extracting the characteristics of the query and the galery set, and calculating the similarity between the images in the query set and each image in the galery set; the Euclidean distance between the characteristic vectors is used as similarity measurement; and finally, sorting the images in the galery set according to the similarity to obtain a re-recognition result.
2. The cross-modal pedestrian re-identification method based on local detail features according to claim 1, wherein step (2) is specifically: sending a pair of pedestrian images into the pose estimation network GCM to obtain the heatmaps of 16 different human joints; randomly sampling pedestrian images and inspecting the quality of the corresponding heatmaps; and finally selecting from the 16 heatmaps those of 9 joints (chest, upper neck, crown, left and right shoulders, left hip, left and right elbows, left wrist), denoted $Heatmap$.
3. The cross-modal pedestrian re-identification method based on local detail features according to claim 1, wherein the APMG in step (3) adaptively generates masks to extract refined local features; the input of APMG consists of the selected heatmaps $Heatmap$ and the global feature map $F_g$ output by the backbone network; specifically, the screened heatmaps are divided into two groups representing pedestrian parts: $P_{top}$ (chest, upper neck, crown, left and right shoulders) and $P_{mid}$ (left hip, left elbow, left wrist); then $F_g$ is fed into the weight generation network $G_w(\cdot)$ to generate a weight for each heatmap, $W_{h\_map} \in \mathbb{R}^{1\times 9}$; the calculation formula is:

$$W_{h\_map} = \sigma(G_w(\mathrm{GAP}(F_g)))$$

where $\sigma(\cdot)$ denotes the sigmoid function and $G_w(\cdot)$ consists of a convolution with kernel size 1; the purpose of $G_w(\cdot)$ is to learn, from the global features, the contribution of each heatmap to the corresponding human body part and to generate the corresponding weights; with the generated weights $W_{h\_map}$, the heatmaps of the two groups $P_{top}$ and $P_{mid}$ are weighted and summed to obtain the masks of the corresponding parts, $mask_{top}$ and $mask_{mid}$:

$$mask_{top} = W_{h\_map}[P_{top}]^{T}\, Heatmap[P_{top}]$$
$$mask_{mid} = W_{h\_map}[P_{mid}]^{T}\, Heatmap[P_{mid}]$$

where $[P]$ denotes taking the elements of the set at the positions corresponding to $P$;

the masks are used to partition the global feature map $F_g$ and obtain the top-part and mid-part features of the pedestrian, $F_{l\_top}$ and $F_{l\_mid}$; the division formula of the local features is:

$$F_{l\_top} = mask_{top} \odot F_g$$
$$F_{l\_mid} = mask_{mid} \odot F_g$$
4. the method as claimed in claim 1, wherein the MC is used for compensating for the lack of lower body information of the mask and converting the MC into F g Dividing into three parts along vertical direction, and taking the last part of feature diagram F l_low As a representation of the lower body of the pedestrian; a local feature map F of three parts of the pedestrian l_low ,F l_top ,F l_mid Obtaining a local characteristic f after global pooling local ∈R 3×C
5. The cross-modal pedestrian re-identification method based on local detail features according to claim 1, wherein the WIPA module in step (5) takes $f_{local} \in \mathbb{R}^{3\times C}$ as input; a self-attention calculation between the local features is performed first: $f_{local}$ is fed into three $1\times 1$ convolutional layers $Q(\cdot)$, $K(\cdot)$, and $V(\cdot)$ to obtain query and key features of dimension $c_k$ and a value feature of dimension $c_v$; the feature obtained from the attention calculation is $\hat{f}_{local}$; the self-attention calculation formula is:

$$\hat{f}_{local} = \mathrm{softmax}\!\left(\frac{Q(f_{local})\,K(f_{local})^{T}}{\sqrt{c_k/h}}\right) V(f_{local})$$

where $h$ is the number of heads; through the body part information contained in $F_{l\_top}$ and $F_{l\_mid}$, the attention helps suppress the background information in the slice feature $F_{l\_low}$; because different pedestrian parts contribute differently to the re-identification task, the network learns the weight of each local feature by itself to enhance useful information; specifically, two fully-connected layers and one ReLU layer are set to learn the local feature weights; the calculation formulas are:

$$W_{part} = \sigma(W_2\,\mathrm{ReLU}(W_1\,\hat{f}_{local}))$$
$$f^{w}_{local} = W_{part} \odot \hat{f}_{local}$$

finally, $f_g$ and $f^{w}_{local}$ are concatenated along the channel dimension to obtain the final pedestrian representation.
6. The cross-modal pedestrian re-identification method based on local detail features according to claim 1, wherein the ID Loss in step (6) is used to train the feature $F_{l\_low}$ and ensure that it contains lower-body identity information of the pedestrian; the three loss functions jointly train the final feature obtained by concatenating $f_g$ and $f_{local}$.
CN202210604338.2A 2022-05-25 2022-05-25 Cross-modal pedestrian re-identification method based on local detail features Active CN115050048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210604338.2A CN115050048B (en) 2022-05-25 2022-05-25 Cross-modal pedestrian re-identification method based on local detail features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210604338.2A CN115050048B (en) 2022-05-25 2022-05-25 Cross-modal pedestrian re-identification method based on local detail features

Publications (2)

Publication Number Publication Date
CN115050048A true CN115050048A (en) 2022-09-13
CN115050048B CN115050048B (en) 2023-04-18

Family

ID=83159414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210604338.2A Active CN115050048B (en) 2022-05-25 2022-05-25 Cross-modal pedestrian re-identification method based on local detail features

Country Status (1)

Country Link
CN (1) CN115050048B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830637A (en) * 2022-12-13 2023-03-21 杭州电子科技大学 Method for re-identifying shielded pedestrian based on attitude estimation and background suppression
CN118315022A (en) * 2024-06-05 2024-07-09 吉林大学 Intelligent management system and method for early rehabilitation training of children

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740541A (en) * 2019-01-04 2019-05-10 重庆大学 A kind of pedestrian weight identifying system and method
WO2021017303A1 (en) * 2019-07-30 2021-02-04 平安科技(深圳)有限公司 Person re-identification method and apparatus, computer device and storage medium
JP6830707B1 (en) * 2020-01-23 2021-02-17 同▲済▼大学 Person re-identification method that combines random batch mask and multi-scale expression learning
CN112434796A (en) * 2020-12-09 2021-03-02 同济大学 Cross-modal pedestrian re-identification method based on local information learning
CN112818931A (en) * 2021-02-26 2021-05-18 中国矿业大学 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion
CN113158891A (en) * 2021-04-20 2021-07-23 杭州像素元科技有限公司 Cross-camera pedestrian re-identification method based on global feature matching
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
WO2022001489A1 (en) * 2020-06-28 2022-01-06 北京交通大学 Unsupervised domain adaptation target re-identification method
CN114220124A (en) * 2021-12-16 2022-03-22 华南农业大学 Near-infrared-visible light cross-modal double-flow pedestrian re-identification method and system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740541A (en) * 2019-01-04 2019-05-10 重庆大学 A kind of pedestrian weight identifying system and method
WO2021017303A1 (en) * 2019-07-30 2021-02-04 平安科技(深圳)有限公司 Person re-identification method and apparatus, computer device and storage medium
JP6830707B1 (en) * 2020-01-23 2021-02-17 同▲済▼大学 Person re-identification method that combines random batch mask and multi-scale expression learning
WO2022001489A1 (en) * 2020-06-28 2022-01-06 北京交通大学 Unsupervised domain adaptation target re-identification method
CN112434796A (en) * 2020-12-09 2021-03-02 同济大学 Cross-modal pedestrian re-identification method based on local information learning
CN112818931A (en) * 2021-02-26 2021-05-18 中国矿业大学 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion
CN113158891A (en) * 2021-04-20 2021-07-23 杭州像素元科技有限公司 Cross-camera pedestrian re-identification method based on global feature matching
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
CN114220124A (en) * 2021-12-16 2022-03-22 华南农业大学 Near-infrared-visible light cross-modal double-flow pedestrian re-identification method and system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
QIONG WU ET AL.: "Discover Cross-Modality Nuances for Visible-Infrared Person Re-Identification", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
LIU Kangning et al.: "Feature representation method for person re-identification based on multi-task learning", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)
WU Shaojun et al.: "Person re-identification based on multi-level deep learning network", Journal of Shandong Normal University (Natural Science Edition)
LI Hao et al.: "Cross-modal person re-identification framework based on improved hard triplet loss", Computer Science
ZHENG Ye et al.: "Partial person re-identification based on pose-guided alignment network", Computer Engineering

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830637A (en) * 2022-12-13 2023-03-21 杭州电子科技大学 Method for re-identifying shielded pedestrian based on attitude estimation and background suppression
CN115830637B (en) * 2022-12-13 2023-06-23 杭州电子科技大学 Method for re-identifying blocked pedestrians based on attitude estimation and background suppression
US11908222B1 (en) 2022-12-13 2024-02-20 Hangzhou Dianzi University Occluded pedestrian re-identification method based on pose estimation and background suppression
CN118315022A (en) * 2024-06-05 2024-07-09 吉林大学 Intelligent management system and method for early rehabilitation training of children

Also Published As

Publication number Publication date
CN115050048B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN109815826B (en) Method and device for generating face attribute model
Zhong et al. Grayscale enhancement colorization network for visible-infrared person re-identification
Song et al. Region-based quality estimation network for large-scale person re-identification
CN115050048B (en) Cross-modal pedestrian re-identification method based on local detail features
CN110472604B (en) Pedestrian and crowd behavior identification method based on video
Jan et al. Accurate facial parts localization and deep learning for 3D facial expression recognition
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN109101865A (en) A kind of recognition methods again of the pedestrian based on deep learning
CN112287891B (en) Method for evaluating learning concentration through video based on expression behavior feature extraction
CN109447123B (en) Pedestrian re-identification method based on label consistency constraint and stretching regularization dictionary learning
CN110097029B (en) Identity authentication method based on high way network multi-view gait recognition
CN110263768A (en) A kind of face identification method based on depth residual error network
KR20200055811A (en) Facial emotional recognition apparatus for Identify Emotion and method thereof
CN112488229A (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN113869105B (en) Human behavior recognition method
CN111797705A (en) Action recognition method based on character relation modeling
CN112070010A (en) Pedestrian re-recognition method combining multi-loss dynamic training strategy to enhance local feature learning
Laines et al. Isolated sign language recognition based on tree structure skeleton images
CN115205903A (en) Pedestrian re-identification method for generating confrontation network based on identity migration
Xing et al. Multi-level adaptive perception guidance based infrared and visible image fusion
CN114743162A (en) Cross-modal pedestrian re-identification method based on generation of countermeasure network
CN109165551B (en) Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics
Zhou et al. Multitask deep neural network with knowledge-guided attention for blind image quality assessment
CN113283372A (en) Method and apparatus for processing image of person
CN112488165A (en) Infrared pedestrian identification method and system based on deep learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant