CN115050048B - Cross-modal pedestrian re-identification method based on local detail features - Google Patents


Info

Publication number
CN115050048B
CN115050048B (application CN202210604338.2A)
Authority
CN
China
Prior art keywords
pedestrian
local
features
mask
heatmap
Prior art date
Legal status
Active
Application number
CN202210604338.2A
Other languages
Chinese (zh)
Other versions
CN115050048A (en)
Inventor
产思贤
朱锦校
吴周检
林沛
Current Assignee
Hangzhou Pixel Technology Co ltd
Original Assignee
Hangzhou Pixel Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Pixel Technology Co ltd filed Critical Hangzhou Pixel Technology Co ltd
Priority to CN202210604338.2A priority Critical patent/CN115050048B/en
Publication of CN115050048A publication Critical patent/CN115050048A/en
Application granted granted Critical
Publication of CN115050048B publication Critical patent/CN115050048B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06N 3/02, G06N 3/08: Neural networks; learning methods
    • G06V 10/42: Global feature extraction by analysis of the whole pattern
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition using neural networks
    • Y02T 10/40: Engine management systems (climate-change mitigation tag)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a cross-modal pedestrian re-identification method based on local detail features, which guides a network to fully mine detailed pedestrian information. The proposed APMG generates a weight for each heatmap according to the pedestrian's pose and fuses the heatmaps into masks that extract detailed pedestrian features. Because APMG misses lower-body features, the proposed MC module fuses APMG with PCB-style slicing to jointly extract local feature representations of the pedestrian. Further, the proposed WIPA module exchanges contextual information between local features and uses the position information contained in the masks to suppress background information in the slice feature. The two local feature extraction schemes complement each other. The method combines global and local detail features as the pedestrian representation and achieves good results on the cross-modal pedestrian re-identification task.

Description

Cross-modal pedestrian re-identification method based on local detail features
Technical Field
The invention relates to the technical field of image processing, in particular to a cross-modal pedestrian re-identification method based on local detail features.
Background
Given a query image and a gallery image set from different modalities, the goal of cross-modal pedestrian re-identification is to match the images in the gallery that share the identity of the query. Because of its importance to public safety, cross-modal pedestrian re-identification has become a popular problem in the re-identification field. Due to variations in spectrum, pedestrian pose, and camera viewpoint, fully mining distinctive pedestrian identity features is a challenging task.
To capture pedestrian information fully, pedestrian representations that incorporate local features have become a common setting in cross-modal pedestrian re-identification. There are three main local feature extraction schemes: slicing, pose estimation, and mask filtering. The common slicing method PCB uniformly slices the backbone network's final feature map into several strips along the vertical direction. From top to bottom, each strip characterizes a different part of the human body. These local features are then constrained with a loss function so that the network focuses on locally discriminative information. Although slicing guarantees coverage of all parts of a pedestrian, it inevitably introduces irrelevant background information and cannot guarantee the alignment of local features.
To address these problems, methods such as PGII locate pedestrian parts using pose estimation. A heatmap is generated with a pre-trained pose estimator, and local features are extracted using the heatmap as a mask. Pose estimation helps the network locate joint positions, solves the feature misalignment problem, and filters background information to some extent. However, pose estimation may be inaccurate on pedestrian re-identification datasets, which can introduce background features. In addition, such methods do not distinguish between heatmaps when extracting local features, and background information may still be introduced because the heatmaps lack robustness to pedestrian variations. To enhance the robustness of local features across different pedestrians, MPANet learns masks with a deep network model to extract local features. However, the generated masks can hardly focus stably on a specific body part, and since no mask labels are available, the extracted local features remain misaligned.
To address these issues, the invention provides a cross-modal pedestrian re-identification method based on local detail features.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a cross-modal pedestrian re-identification method based on local detail features. Three modules are introduced: an adaptive body part mask generation module (APMG), a mask compensation (MC) scheme, and a weighted intra-part attention module (WIPA). These address the background interference and feature misalignment problems of current local feature methods.
To achieve this purpose, the invention is realized by the following technical scheme: a cross-modal pedestrian re-identification method based on local detail features comprises the following steps:
Step 1: read the SYSU-MM01 dataset, which contains pedestrian images in two modalities (visible light and infrared), and apply data augmentation. The training set is divided into uniform batches, each containing 8 identities with four images per modality per identity. A pair of cross-modal images with the same identity is input into the backbone network ResNet-50 to extract a global feature map F_g.
Step 2: the image pair is sent into the pose estimation network GCM to obtain heatmaps of 16 human joints. Based on the estimated quality of each heatmap on SYSU-MM01, 9 high-quality heatmaps are screened to generate the body part masks.
Step 3: the 9 selected heatmaps and the global feature map F_g are sent into the APMG (adaptive body part mask generation) module. APMG learns the contribution of each upper-body part from F_g and adaptively generates a weight for each heatmap. The selected heatmaps are divided into two groups, top part and mid part, and downsampled to the spatial size of F_g using max-pooling. The heatmaps are then accumulated according to the weights to obtain the mask of each part, mask_top and mask_mid. The masks are used to partition F_g and extract the local features F_l_top and F_l_mid of the top part and mid part.
Step 4: to compensate for the lower-body information missing from APMG, an MC (mask compensation) module is proposed. MC extracts local features in the PCB manner: the global feature map is divided into three slices along the vertical direction, and the last slice F_l_low is taken as the representation of the lower body. The slice feature map and the mask-extracted local feature maps are then jointly pooled globally to obtain the local feature vector f_local ∈ R^(3×C).
Step 5: f_local is sent into the proposed weighted intra-part attention (WIPA) module, which mines the contextual relations between parts while suppressing the background information in the lower-part feature. Finally, WIPA measures the contribution of each part to generate weights that reweight the features. The pooled global feature vector f_g and f_local are concatenated along the channel dimension as the pedestrian representation.
Step 6: to train the network to capture modality-invariant pedestrian identity features, the network is trained with three loss functions: ID Loss, Center Cluster Loss, and Modality Learning Loss.
Step 7: pedestrian features are extracted from the query set and the gallery set respectively, and the similarity between each query image and every image in the gallery set is computed, using the Euclidean distance between feature vectors as the similarity measure. Finally, the images in the gallery set are sorted by similarity to obtain the re-identification result.
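The retrieval in step 7 reduces to a nearest-neighbour search under the Euclidean distance. A minimal NumPy sketch (feature dimensions and the toy data are illustrative, not the trained features):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Sort gallery indices by Euclidean distance to the query (closest first)."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(dists)

# toy example: 4 gallery features of dimension 8, query near gallery[2]
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 8))
query = gallery[2] + 0.01 * rng.normal(size=8)
order = rank_gallery(query, gallery)   # order[0] is the best match
```

Ranking by ascending Euclidean distance is equivalent to ranking by descending similarity, which is all the re-identification result requires.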
Preferably, step 2 is specifically: a pair of pedestrian images is sent into the pose estimation network GCM to obtain heatmaps of 16 different human joints. Pedestrian images are randomly sampled and the quality of their corresponding heatmaps is inspected. Finally, the heatmaps of 9 joints (chest, upper neck, crown of head, left and right shoulders, left hip, left and right elbows, left wrist) are selected from the 16 heatmaps.
Preferably, the APMG in step 3 adaptively generates masks to extract refined local features. The input of APMG consists of the selected heatmaps and the global feature map F_g output by the backbone network. Specifically, the screened heatmaps are divided into two groups representing two pedestrian parts: P_top (chest, upper neck, crown of head, left and right shoulders) and P_mid (left hip, left elbow, left wrist). Then F_g is passed into the weight-generating network G_w(·) to generate a weight for each heatmap, W_h_map ∈ R^(1×9). The calculation formula is as follows:
W_h_map = σ(G_w(GAP(F_g)))
where σ(·) denotes the sigmoid function and G_w(·) consists of a convolution with kernel size 1. The purpose of G_w(·) is to learn, from the global features, the contribution of each heatmap to the corresponding body part and to generate the weights. With the generated weights W_h_map, the heatmaps of the two groups P_top and P_mid are weighted and summed to obtain the masks of the corresponding parts, mask_top and mask_mid, as follows:
mask_top = W_h_map[P_top]^T · Heatmap[P_top]
mask_mid = W_h_map[P_mid]^T · Heatmap[P_mid]
where [P] denotes the elements at the positions corresponding to the joint set P.
Partitioning the global feature map F_g with the masks yields the top-part and mid-part features of the pedestrian, F_l_top and F_l_mid. The division formula of the local features is as follows:
F_l_top = mask_top ⊙ F_g
F_l_mid = mask_mid ⊙ F_g
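The APMG computation above can be sketched in NumPy as follows. The 1×1 convolution G_w is represented by a plain weight matrix acting on the globally pooled feature; all shapes, weights, and the group index assignment are illustrative stand-ins, not the trained network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

C, H, W = 16, 8, 4                       # illustrative feature-map shape
rng = np.random.default_rng(0)
F_g = rng.normal(size=(C, H, W))         # global feature map
heatmaps = rng.uniform(size=(9, H, W))   # 9 selected heatmaps, resized to F_g's size

# W_h_map = sigmoid(G_w(GAP(F_g))): the 1x1 convolution reduces to a matrix here
G_w = rng.normal(size=(9, C)) * 0.1
gap = F_g.mean(axis=(1, 2))              # global average pooling -> (C,)
W_h_map = sigmoid(G_w @ gap)             # one weight per heatmap, in (0, 1)

P_top, P_mid = [0, 1, 2, 3, 4], [5, 6, 7, 8]   # illustrative group indices

# mask = weighted sum of the group's heatmaps
mask_top = np.tensordot(W_h_map[P_top], heatmaps[P_top], axes=1)   # (H, W)
mask_mid = np.tensordot(W_h_map[P_mid], heatmaps[P_mid], axes=1)

# F_l = mask elementwise F_g, broadcast over channels
F_l_top = mask_top[None] * F_g
F_l_mid = mask_mid[None] * F_g
```

Because the weights come from a sigmoid, each heatmap's contribution stays in (0, 1), so the fused mask softly gates F_g rather than hard-cropping it.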
preferably, the MC is F to compensate for the lack of lower body information of the mask g Dividing into three parts along vertical direction, and taking the last part of feature diagram F l_low As a representation of the lower body of a pedestrian. A local feature map F of three parts of the pedestrian l_low ,F l_top ,F l_mid Obtaining local characteristics f after global pooling local ∈R 3×C
Preferably, the WIPA module in step 5 takes f_local ∈ R^(3×C) as input. A self-attention calculation between the local features is performed first: f_local is fed into three 1×1 convolutional layers Q(·), K(·), and V(·) to obtain query and key features of dimension c_k and value features of dimension c_v. The feature obtained from the self-attention calculation is f_attn.
The self-attention calculation formula is:
head_j = softmax(Q_j(f_local) · K_j(f_local)^T / √c_k) · V_j(f_local), j = 1, …, h
f_attn = Concat(head_1, …, head_h)
where h is the number of heads. By propagating the body part information contained in F_l_top and F_l_mid, the attention helps suppress the background information in the slice feature F_l_low. Because different parts of the pedestrian contribute differently to the re-identification task, the network learns a weight for each local feature by itself to enhance useful information. Specifically, two fully connected layers and one ReLU layer learn the local feature weights, calculated as follows:
w = σ(FC_2(ReLU(FC_1(f_attn))))
f'_local = w ⊙ f_attn
where w contains one weight per part and f'_local denotes the weighted local features.
f_g and the weighted local features are concatenated along the channel dimension to obtain the final pedestrian representation.
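The WIPA computation described above can be sketched as a single-head case (h = 1), with random matrices standing in for the 1×1 convolutions Q, K, V and for the two fully connected layers; only the shapes and the softmax/ReLU/sigmoid structure follow the description:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

C, c_k, c_v = 16, 8, 16                  # illustrative dimensions
rng = np.random.default_rng(2)
f_local = rng.normal(size=(3, C))        # low / top / mid part vectors

# on vectors, 1x1 convolutions reduce to matrix multiplications
Wq, Wk, Wv = (rng.normal(size=(C, d)) * 0.1 for d in (c_k, c_k, c_v))
Q, K, V = f_local @ Wq, f_local @ Wk, f_local @ Wv

# single-head self-attention across the three parts
attn = softmax(Q @ K.T / np.sqrt(c_k), axis=-1)   # (3, 3), rows sum to 1
f_attn = attn @ V                                  # (3, c_v)

# FC -> ReLU -> FC -> sigmoid yields one weight per part
W1 = rng.normal(size=(c_v, c_v)) * 0.1
W2 = rng.normal(size=(c_v, 1)) * 0.1
w_part = sigmoid(np.maximum(f_attn @ W1, 0.0) @ W2)   # (3, 1)

f_weighted = w_part * f_attn             # weighted local features, (3, c_v)
```

Each row of the attention matrix lets one part attend to the other two, which is how the body-part features can inform, and suppress background in, the slice feature.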
Preferably, the ID Loss in step 6 is used to constrain the feature F_l_low, ensuring that it contains pedestrian lower-body information. The three loss functions jointly train the final feature obtained by concatenating f_g and the weighted local features.
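ID Loss is a standard identity-classification cross-entropy; Center Cluster Loss and Modality Learning Loss are specific to this method and not reproduced here. A minimal cross-entropy sketch over identity logits (shapes and values illustrative):

```python
import numpy as np

def id_loss(logits, labels):
    """Identity cross-entropy: mean of -log softmax(logits)[label]."""
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0]])   # 2 samples, 3 identities
labels = np.array([0, 1])              # correct identities
loss = id_loss(logits, labels)         # small, since logits match the labels
wrong = id_loss(logits, np.array([1, 0]))   # larger for wrong labels
```

Applying this loss to a classifier head on F_l_low forces the slice feature to carry lower-body identity information, which is the role the description assigns to ID Loss.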
The invention has the following beneficial effects: the invention combines global and local detail features as the pedestrian representation and achieves good results on the cross-modal pedestrian re-identification task.
Drawings
The invention is described in detail below with reference to the drawings and the detailed description;
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Referring to FIG. 1, this embodiment adopts the following technical solution: a cross-modal pedestrian re-identification method based on local detail features, mainly comprising an adaptive body part mask generation module (APMG), a mask compensation (MC) module, and a weighted intra-part attention module (WIPA), and comprising the following steps:
Step 1: read the SYSU-MM01 dataset, which contains pedestrian images in two modalities (visible light and infrared), and apply data augmentation. The training set is divided into batches, each containing 8 identities with 4 infrared and 4 RGB images per identity. A pair of cross-modal images with the same identity is input into the backbone network ResNet-50 to extract a global feature map F_g.
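The batch composition in step 1 (8 identities, 4 RGB + 4 infrared images each, i.e. 64 images per batch) can be sketched as a simple identity-balanced sampler; the dataset layout and file names are illustrative:

```python
import random

def sample_batch(rgb_by_id, ir_by_id, num_ids=8, per_modality=4, seed=0):
    """Pick num_ids identities; for each, per_modality RGB and per_modality IR images."""
    rng = random.Random(seed)
    ids = rng.sample(sorted(rgb_by_id), num_ids)
    batch = []
    for pid in ids:
        batch += [(pid, "rgb", p) for p in rng.sample(rgb_by_id[pid], per_modality)]
        batch += [(pid, "ir", p) for p in rng.sample(ir_by_id[pid], per_modality)]
    return batch

# toy dataset: 10 identities, 6 images per modality each
rgb = {i: [f"rgb_{i}_{k}.jpg" for k in range(6)] for i in range(10)}
ir = {i: [f"ir_{i}_{k}.jpg" for k in range(6)] for i in range(10)}
batch = sample_batch(rgb, ir)            # 8 identities x (4 RGB + 4 IR) = 64 images
```

Balancing identities and modalities within each batch is what makes it possible to form the same-identity cross-modal pairs that the backbone consumes.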
Step 2: a pair of pedestrian images is sent into the pose estimation network GCM to obtain heatmaps of 16 different human joints. Pedestrian images are randomly sampled and the quality of their corresponding heatmaps is inspected. Finally, the heatmaps of 9 joints (chest, upper neck, top of head, left and right shoulders, left hip, left and right elbows, left wrist) are selected from the 16 heatmaps.
Step 3: the screened heatmaps and the global feature map F_g output by the backbone network are sent into APMG to obtain the masks of the different pedestrian parts. Specifically, the screened heatmaps are divided into two groups representing two pedestrian parts: P_top (chest, upper neck, crown of head, left and right shoulders) and P_mid (left hip, left elbow, left wrist). Then F_g is passed into the weight-generating network G_w(·) to generate a weight for each heatmap, W_h_map ∈ R^(1×9). The calculation formula is as follows:
W_h_map = σ(G_w(GAP(F_g)))
where σ(·) denotes the sigmoid function and G_w(·) consists of a convolution with kernel size 1. The purpose of G_w(·) is to learn, from the global features, the contribution of each heatmap to the corresponding body part and to generate the weights. With the generated weights W_h_map, the heatmaps of the two groups P_top and P_mid are weighted and summed to obtain the masks of the corresponding parts, mask_top and mask_mid, as follows:
mask_top = W_h_map[P_top]^T · Heatmap[P_top]
mask_mid = W_h_map[P_mid]^T · Heatmap[P_mid]
where [P] denotes the elements at the positions corresponding to the joint set P.
Partitioning the global feature map F_g with the masks yields the top-part and mid-part features of the pedestrian, F_l_top and F_l_mid. The division formula of the local features is as follows:
F_l_top = mask_top ⊙ F_g
F_l_mid = mask_mid ⊙ F_g
and 4, selecting the posture estimation model GCM to identify the structure of the lower half of the pedestrian, wherein the posture estimation model GCM only selects the upper half of the pedestrian to generate masks. To compensate for the lack of mask lower body information, the MC will F g Dividing into three parts along vertical direction, and taking the last part of feature diagram F l_low As a representation of the lower body of the pedestrian. A local feature map F of three parts of the pedestrian l_low ,F l_top ,F l_mid Obtaining local characteristics f after global pooling local ∈R 3×C
Step 5: to mine the contextual information between local features and suppress the background information contained in the slice feature F_l_low, the WIPA module is introduced. WIPA takes f_local ∈ R^(3×C) as input. A self-attention calculation between the local features is performed first: f_local is fed into three 1×1 convolutional layers Q(·), K(·), and V(·) to obtain query and key features of dimension c_k and value features of dimension c_v. The feature obtained from the self-attention calculation is f_attn.
The self-attention calculation formula is:
head_j = softmax(Q_j(f_local) · K_j(f_local)^T / √c_k) · V_j(f_local), j = 1, …, h
f_attn = Concat(head_1, …, head_h)
where h is the number of heads. By propagating the body part information contained in F_l_top and F_l_mid, the attention helps suppress the background information in the slice feature F_l_low. Because different parts of the pedestrian contribute differently to the re-identification task, the network learns a weight for each local feature by itself to enhance useful information. Specifically, two fully connected layers and one ReLU layer learn the local feature weights, calculated as follows:
w = σ(FC_2(ReLU(FC_1(f_attn))))
f'_local = w ⊙ f_attn
where w contains one weight per part and f'_local denotes the weighted local features.
f_g and the weighted local features are concatenated along the channel dimension to obtain the final pedestrian representation.
Step 6: the network is trained with three loss functions: ID Loss, Center Cluster Loss, and Modality Learning Loss. To ensure that the local feature F_l_low contains the identity information of the pedestrian's lower body, it is constrained by ID Loss. The three loss functions jointly train the final feature obtained by concatenating f_g and the weighted local features, guiding the network to learn modality-invariant features.
Step 7: pedestrian features are extracted from the query set and the gallery set respectively, and the similarity between each query image feature and the image features in the gallery set is computed in turn. The images in the gallery are sorted by similarity to obtain the re-identification result.
This embodiment guides the network to fully mine detailed pedestrian information. The proposed APMG generates a weight for each heatmap according to the pedestrian's pose and fuses the heatmaps into masks that extract detailed pedestrian features. Because APMG misses lower-body features, the proposed MC module fuses APMG with PCB-style slicing to jointly extract local feature representations of the pedestrian. Further, the proposed WIPA module exchanges contextual information between local features and uses the position information contained in the masks to suppress background information in the slice feature. The two local feature extraction schemes complement each other. Global and local detail features are combined as the pedestrian representation, achieving good results on the cross-modal pedestrian re-identification task.
The foregoing shows and describes the general principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. A cross-modal pedestrian re-identification method based on local detail features is characterized by comprising the following steps:
reading the SYSU-MM01 dataset, which contains pedestrian images in two modalities, visible light and infrared, and performing data augmentation on the dataset; dividing the training set into uniform batches, each containing 8 identities with four images per modality per identity; inputting a pair of cross-modal images with the same identity into the backbone network ResNet-50 and extracting a global feature map F_g;
step (2), sending the image pair into the pose estimation network GCM to obtain heatmaps of 16 human joints; screening 9 high-quality heatmaps to generate the body part masks according to the estimated quality of each heatmap on SYSU-MM01;
step (3), sending the 9 selected heatmaps and the global feature map F_g into the APMG (adaptive body part mask generation) module; APMG learns the contribution of each upper-body part from F_g and adaptively generates a weight for each heatmap; the selected heatmaps are divided into two groups, top part and mid part, and downsampled to the spatial size of F_g using max-pooling; the heatmaps are then accumulated according to the weights to obtain the mask of each part, mask_top and mask_mid; the masks are used to partition F_g and extract the local features F_l_top and F_l_mid;
step (4), in order to compensate for the lower-body information missing from APMG, an MC (mask compensation) module is provided; MC extracts local features in the PCB manner, dividing the global feature into three slices along the vertical direction and taking the last slice F_l_low as the representation of the lower body; the slice feature map and the mask-extracted local feature maps are then jointly pooled globally to obtain the local feature vector f_local ∈ R^(3×C);
step (5), f_local is fed into the proposed WIPA (weighted intra-part attention) module to mine the contextual relations between parts while suppressing the background information in the lower-part feature; finally, WIPA measures the contribution of each part to generate weights that reweight the features; the pooled global feature vector f_g and f_local are concatenated along the channel dimension to obtain the pedestrian representation;
step (6), in order to train the network to capture modality-invariant pedestrian identity features, the network is trained with three loss functions: ID Loss, Center Cluster Loss, and Modality Learning Loss;
step (7), pedestrian features are extracted from the query set and the gallery set respectively, and the similarity between each query image and every image in the gallery set is computed, with the Euclidean distance between feature vectors as the similarity measure; finally, the images in the gallery set are sorted by similarity to obtain the re-identification result.
2. The cross-modal pedestrian re-identification method based on local detail features according to claim 1, wherein step (2) is specifically: sending a pair of pedestrian images into the pose estimation network GCM to obtain heatmaps of 16 different human joints; randomly sampling pedestrian images and inspecting the quality of their corresponding heatmaps; finally, selecting from the 16 heatmaps the heatmaps of 9 joints (chest, upper neck, crown of head, left and right shoulders, left hip, left and right elbows, left wrist).
3. The method according to claim 1, wherein the APMG in step (3) adaptively generates masks to extract refined local features; the input of APMG consists of the selected heatmaps and the global feature map F_g output by the backbone network; specifically, the screened heatmaps are divided into two groups representing two pedestrian parts: P_top (chest, upper neck, crown of head, left and right shoulders) and P_mid (left hip, left elbow, left wrist); then F_g is passed into the weight-generating network G_w(·) to generate a weight for each heatmap, W_h_map ∈ R^(1×9); the calculation formula is as follows:
W_h_map = σ(G_w(GAP(F_g)))
where σ(·) denotes the sigmoid function and G_w(·) consists of a convolution with kernel size 1; the purpose of G_w(·) is to learn, from the global features, the contribution of each heatmap to the corresponding body part and to generate the weights; with the generated weights W_h_map, the heatmaps of the two groups P_top and P_mid are weighted and summed to obtain the masks of the corresponding parts, mask_top and mask_mid, as follows:
mask_top = W_h_map[P_top]^T · Heatmap[P_top]
mask_mid = W_h_map[P_mid]^T · Heatmap[P_mid]
where [P] denotes the elements at the positions corresponding to the joint set P;
partitioning the global feature map F_g with the masks yields the top-part and mid-part features of the pedestrian, F_l_top and F_l_mid; the division formula of the local features is as follows:
F_l_top = mask_top ⊙ F_g
F_l_mid = mask_mid ⊙ F_g
4. The cross-modal pedestrian re-identification method based on local detail features according to claim 1, wherein, in order to compensate for the missing mask lower-body information, MC divides F_g into three parts along the vertical direction and takes the last slice of the feature map, F_l_low, as the representation of the pedestrian's lower body; the three local feature maps F_l_low, F_l_top, F_l_mid are globally pooled to obtain the local features f_local ∈ R^(3×C).
5. The method according to claim 1, wherein the WIPA module in step (5) takes f_local ∈ R^(3×C) as input; a self-attention calculation between the local features is performed first: f_local is fed into three 1×1 convolutional layers Q(·), K(·), and V(·) to obtain query and key features of dimension c_k and value features of dimension c_v; the feature obtained from the self-attention calculation is f_attn;
The self-attention calculation formula is:
head_j = softmax(Q_j(f_local) · K_j(f_local)^T / √c_k) · V_j(f_local), j = 1, …, h
f_attn = Concat(head_1, …, head_h)
where h is the number of heads; by propagating the body part information contained in F_l_top and F_l_mid, the attention helps suppress the background information in the slice feature F_l_low; because different parts of the pedestrian contribute differently to the re-identification task, the network learns a weight for each local feature by itself to enhance useful information; specifically, two fully connected layers and one ReLU layer learn the local feature weights, calculated as follows:
w = σ(FC_2(ReLU(FC_1(f_attn))))
f'_local = w ⊙ f_attn
where w contains one weight per part and f'_local denotes the weighted local features;
f_g and the weighted local features are concatenated along the channel dimension to obtain the final pedestrian representation.
6. The method for cross-modal pedestrian re-identification based on local detail features according to claim 1, wherein the ID Loss in step (6) is used to train the feature F_l_low, ensuring that it contains the pedestrian's lower-body information; three loss functions jointly train the final feature obtained by concatenating f_g and the weighted local features w ⊙ f̂_local.
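A common form of the ID Loss named above is a cross-entropy over identity classes; the sketch below assumes that form, and the unweighted combination of the three losses is likewise an assumption, since the patent does not state the weighting here:

```python
import numpy as np

def id_loss(logits, label):
    """Cross-entropy identity (ID) loss for one sample: the negative
    log-probability assigned to the true pedestrian identity."""
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return -np.log(p[label])

# hypothetical classifier scores over three identities; class 0 is correct
logits = np.array([2.0, 0.5, -1.0])
loss = id_loss(logits, 0)
```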
CN202210604338.2A 2022-05-25 2022-05-25 Cross-modal pedestrian re-identification method based on local detail features Active CN115050048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210604338.2A CN115050048B (en) 2022-05-25 2022-05-25 Cross-modal pedestrian re-identification method based on local detail features


Publications (2)

Publication Number Publication Date
CN115050048A CN115050048A (en) 2022-09-13
CN115050048B true CN115050048B (en) 2023-04-18

Family

ID=83159414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210604338.2A Active CN115050048B (en) 2022-05-25 2022-05-25 Cross-modal pedestrian re-identification method based on local detail features

Country Status (1)

Country Link
CN (1) CN115050048B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830637B (en) * 2022-12-13 2023-06-23 杭州电子科技大学 Method for re-identifying blocked pedestrians based on attitude estimation and background suppression
CN118315022B (en) * 2024-06-05 2024-10-01 吉林大学 Intelligent management system and method for early rehabilitation training of children

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818931A (en) * 2021-02-26 2021-05-18 中国矿业大学 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740541B (en) * 2019-01-04 2020-08-04 重庆大学 Pedestrian re-identification system and method
CN110532884B (en) * 2019-07-30 2024-04-09 平安科技(深圳)有限公司 Pedestrian re-recognition method, device and computer readable storage medium
CN111259850B (en) * 2020-01-23 2022-12-16 同济大学 Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN111814854B (en) * 2020-06-28 2023-07-28 北京交通大学 Target re-identification method without supervision domain adaptation
CN112434796B (en) * 2020-12-09 2022-10-25 同济大学 Cross-modal pedestrian re-identification method based on local information learning
CN113158891B (en) * 2021-04-20 2022-08-19 杭州像素元科技有限公司 Cross-camera pedestrian re-identification method based on global feature matching
CN113408492B (en) * 2021-07-23 2022-06-14 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
CN114220124B (en) * 2021-12-16 2024-07-12 华南农业大学 Near infrared-visible light cross-mode double-flow pedestrian re-identification method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818931A (en) * 2021-02-26 2021-05-18 中国矿业大学 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Partial pedestrian re-identification based on pose-guided alignment network; Zheng Ye et al.; Computer Engineering (No. 05); full text *

Also Published As

Publication number Publication date
CN115050048A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
Zhong et al. Grayscale enhancement colorization network for visible-infrared person re-identification
CN109815826B (en) Method and device for generating face attribute model
CN115050048B (en) Cross-modal pedestrian re-identification method based on local detail features
Song et al. Region-based quality estimation network for large-scale person re-identification
CN110472604B (en) Pedestrian and crowd behavior identification method based on video
CN109101865A (en) A pedestrian re-identification method based on deep learning
Jan et al. Accurate facial parts localization and deep learning for 3D facial expression recognition
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
Yao et al. Robust CNN-based gait verification and identification using skeleton gait energy image
CN106778604A (en) Pedestrian re-identification method based on matching convolutional neural networks
CN107169417B (en) RGBD image collaborative saliency detection method based on multi-core enhancement and saliency fusion
CN104504362A (en) Face detection method based on convolutional neural network
CN110097029B (en) Identity authentication method based on highway network multi-view gait recognition
Yang et al. Leveraging virtual and real person for unsupervised person re-identification
Huang et al. Domain adaptive attention learning for unsupervised person re-identification
CN109447123B (en) Pedestrian re-identification method based on label consistency constraint and stretching regularization dictionary learning
CN113963032A (en) Twin network structure target tracking method fusing target re-identification
CN112287891A (en) Method for evaluating learning concentration through video based on expression and behavior feature extraction
CN112488229A (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN109447175A (en) Pedestrian re-identification method combining deep learning and metric learning
CN112070010A (en) Pedestrian re-recognition method combining multi-loss dynamic training strategy to enhance local feature learning
Laines et al. Isolated sign language recognition based on tree structure skeleton images
CN111797705A (en) Action recognition method based on character relation modeling
Chen et al. Hierarchical posture representation for robust action recognition
CN114743162A (en) Cross-modal pedestrian re-identification method based on generation of countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant