CN113947782A - Pedestrian target alignment method based on attention mechanism - Google Patents
- Publication number
- CN113947782A (application CN202111197529.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- pedestrian
- attention
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a pedestrian target alignment method based on an attention mechanism. It addresses the problems of target loss and occlusion and improves alignment accuracy and performance through process optimization and local feature extraction. The method achieves high alignment accuracy, makes full use of the global structural information and the local original information of pedestrian image features, handles pedestrian target occlusion effectively, and mitigates the impact of local feature misalignment on algorithm performance.
Description
Technical Field
The invention relates to the technical field of pedestrian target alignment, in particular to a pedestrian target alignment method based on an attention mechanism.
Background
Pedestrian target alignment is a technique that identifies an initial target region in a sequence of pedestrian images and then retrieves that target across cameras and scenes. It is an important complement to face recognition and is widely applied in military and civilian fields such as pedestrian detection, video surveillance, action recognition, skeleton keypoint detection, and pose recognition.
In recent years, pedestrian target alignment, especially against complex backgrounds, has become a research focus, and a variety of techniques and methods have been proposed. Among them, attention-based methods are favored in academia and industry for their strong performance and high running speed. Target identification methods based on saliency learning add the saliency features of the target pedestrian to patch matching, so that the algorithm can find distinctive and reliable patch-matching features. Algorithms that fuse attention with convolutional neural networks jointly learn hard regional attention and soft pixel-level attention to achieve target alignment and to compensate for image misalignment. Spatiotemporal attention algorithms extract useful region information from each image frame with multiple spatial attention models and integrate the outputs through a temporal attention model, so that useful region information is drawn from all frames and robustness improves. However, the performance of existing alignment techniques remains unsatisfactory because of target occlusion, visual disparity, changes in lighting conditions, and similar factors.
Disclosure of Invention
The invention aims to provide a pedestrian target alignment method based on an attention mechanism that solves the problems of target loss and occlusion.
The purpose of the invention is realized as follows:
1. A pedestrian target alignment method based on an attention mechanism, comprising the following steps:
Step 1: compute the feature map of an image using a residual network with an attention mechanism.
Step 1.1: coarsely extract the features of the input image I using a CNN layer containing an initial convolution block.
Step 1.2: obtain the target intermediate feature tensor $X \in \mathbb{R}^{C \times H \times W}$ using the first residual block of the residual network, where $\mathbb{R}$ is the set of real numbers and W, H, and C are the width, height, and number of channels of the feature map, respectively.
Step 1.3: construct a spatial relation-aware attention block that takes the C-dimensional feature vector at each spatial position as a feature node and uses these nodes to learn a spatial attention map of size H × W:
Step 1.3.1: scan the spatial positions and denote the N feature nodes as $x_i \in \mathbb{R}^C$, where $i = 1, \ldots, N$.
Step 1.3.2: compute the pairwise relation $r_{i,j}$ from node i to node j, where ReLU(·) denotes the rectified linear unit and $s_1$ is a predefined positive integer that controls the dimension reduction ratio. The pairwise relation from node j to node i, $r_{j,i}$, is computed in the same way. The pair $(r_{i,j}, r_{j,i})$ describes the bidirectional relation between $x_i$ and $x_j$, and the affinity matrix $R_s \in \mathbb{R}^{N \times N}$ represents the pairwise relations between all nodes.
Step 1.3.3: compute the relation vector of the i-th feature node, $r_i = [R_s(i,:), R_s(:,i)] \in \mathbb{R}^{2N}$, where $i = 1, 2, \ldots, N$.
Step 1.3.4: compute the spatial relation-aware feature $y_i$:
$$y_i = [\mathrm{pool}_C(\mathrm{ReLU}(W_\psi r_i)),\ \mathrm{ReLU}(W_\varphi r_i)]$$
where $\mathrm{pool}_C(\cdot)$ denotes global average pooling along the channel dimension. In this way the global structural information and the local original information associated with the feature are both fully used.
Step 1.3.5: compute the spatial attention value $a_i$ of the i-th feature node:
$$a_i = \mathrm{Sigmoid}(W_2\,\mathrm{ReLU}(W_1 y_i))$$
where the weights $W_1$ and $W_2$ are implemented by 1 × 1 convolutions followed by batch normalization, and Sigmoid(·) is the sigmoid activation function.
Step 1.3.6: compute the spatial attention values of all feature nodes as in step 1.3.5 to obtain the spatial attention matrix $A = [a_1, a_2, \ldots, a_N]$, and update the target intermediate feature tensor with $X = X \times A$.
Step 1.4: construct a channel relation-aware attention block that takes the feature map of dimension d = H × W at each channel as a feature node and uses these nodes to learn a C-dimensional channel attention vector:
Step 1.4.1: scan the channel positions and denote the feature nodes as $z_i \in \mathbb{R}^d$, where $i = 1, \ldots, C$.
Step 1.4.2: compute the channel attention matrix B following steps 1.3.2 to 1.3.6, and update the target intermediate feature tensor with $X = X \times B$.
Step 1.5: repeat steps 1.2 to 1.4 until all four residual blocks of the residual network have been processed, yielding a feature map of size H × W × C.
Step 2: compute the feature maps of the reference pedestrian image and of the test pedestrian images according to step 1. Let $j = 1, 2, \ldots, M$ index the current test pedestrian image, where M is the total number of images in the test pedestrian image set.
Step 3: compute global features. Global features are extracted by applying global pooling directly to the feature map; the global features of the reference pedestrian image and the j-th test pedestrian image are denoted $Q$ and $P_j$, respectively.
Step 4: compute local features. Global pooling in the horizontal direction is applied to extract a local feature for each row; the local features of the reference pedestrian image and the j-th test pedestrian image are denoted $q = \{q_1, q_2, \ldots, q_H\}$ and $p^j = \{p^j_1, p^j_2, \ldots, p^j_H\}$, respectively, where H is the number of local features.
Step 5: compute the global distance $d(Q, P_j)$ between the reference image and the test image, where K denotes the dimension of the feature vectors.
Step 6: compute the local distance between the reference image and the test image.
Step 7: compute the final distance between the reference image and the test image:
$$d_{final}(j) = d(Q, P_j) + S_{H,H}$$
Step 8: repeat steps 3 to 7 until all images in the test image set have been traversed.
Step 9: complete image alignment using the minimum-distance method.
The invention further includes the following feature:
2. Step 6 specifically comprises:
Step 6.1: compute the distance between two local features $q_m$ and $p^j_n$, where $m, n \in \{1, 2, \ldots, H\}$, $e$ is Euler's number, and $\|\cdot\|_2$ is the $\ell_2$ norm.
Step 6.2: compute the shortest-path distance over the local features, where $S_{m,n}$ is the total distance along the shortest path from (1, 1) to (m, n); the local distance between the two pedestrian images is $S_{H,H}$.
Compared with the prior art, the invention has the following beneficial effects:
(1) the alignment accuracy is high;
(2) the global structural information and the local original information of the pedestrian image features are fully used;
(3) occlusion of the pedestrian target is handled effectively;
(4) the impact of local feature misalignment on algorithm performance is mitigated.
Drawings
Fig. 1 is a structural diagram of the attention-based residual network of the present invention.
Fig. 2 is a flow chart of the pedestrian target alignment technique of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Let I be the original input image containing only 1 pedestrian. The invention provides a pedestrian target alignment technology based on an attention mechanism, which comprises the following specific implementation steps:
Step 1: compute the feature map of the image using the attention-based residual network shown in Fig. 1:
Step 1.1: coarsely extract the features of I using a CNN layer containing an initial convolution block.
Step 1.2: obtain the target intermediate feature tensor $X \in \mathbb{R}^{C \times H \times W}$ using the first residual block of the residual network, where $\mathbb{R}$ is the set of real numbers and W, H, and C are the width, height, and number of channels of the feature map, respectively.
Step 1.3: construct a spatial relation-aware attention block that takes the C-dimensional feature vector at each spatial position as a feature node and uses these nodes to learn a spatial attention map of size H × W:
Step 1.3.1: scan the spatial positions and denote the N feature nodes as $x_i \in \mathbb{R}^C$, where $i = 1, \ldots, N$.
Step 1.3.2: compute the pairwise relation $r_{i,j}$ from node i to node j according to equation (1), in which ReLU(·) denotes the rectified linear unit and $s_1$ is a predefined positive integer that controls the dimension reduction ratio. The pairwise relation from node j to node i, $r_{j,i}$, is computed in the same way. The pair $(r_{i,j}, r_{j,i})$ describes the bidirectional relation between $x_i$ and $x_j$, and the affinity matrix $R_s \in \mathbb{R}^{N \times N}$ represents the pairwise relations between all nodes.
Step 1.3.3: compute the relation vector of the i-th feature node, $r_i = [R_s(i,:), R_s(:,i)] \in \mathbb{R}^{2N}$, where $i = 1, 2, \ldots, N$.
Step 1.3.4: compute the spatial relation-aware feature $y_i$ according to equation (2):
$$y_i = [\mathrm{pool}_C(\mathrm{ReLU}(W_\psi r_i)),\ \mathrm{ReLU}(W_\varphi r_i)] \qquad (2)$$
where $\mathrm{pool}_C(\cdot)$ denotes global average pooling along the channel dimension. In this way the global structural information and the local original information associated with the feature are both fully used.
Step 1.3.5: compute the spatial attention value $a_i$ of the i-th feature node according to equation (3):
$$a_i = \mathrm{Sigmoid}(W_2\,\mathrm{ReLU}(W_1 y_i)) \qquad (3)$$
where the weights $W_1$ and $W_2$ are implemented by 1 × 1 convolutions followed by batch normalization, and Sigmoid(·) is the sigmoid activation function.
Step 1.3.6: compute the spatial attention values of all feature nodes as in step 1.3.5 to obtain the spatial attention matrix $A = [a_1, a_2, \ldots, a_N]$, and update the target intermediate feature tensor with $X = X \times A$.
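Steps 1.3.1 to 1.3.6 can be sketched in NumPy as follows. This is only an illustration under stated assumptions: equation (1) is not reproduced in the text, so the pairwise relation is assumed here to be a dot product of two ReLU-embedded node features with embedding width C / s1; the weights are random stand-ins for the learned 1 × 1 convolutions; batch normalization is omitted; and the hidden widths `D` and `hidden` are hypothetical choices.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def spatial_relation_attention(X, s1=2, seed=0):
    """X: (C, H, W) feature tensor. Returns (X * A, A) with A of shape (H, W)."""
    rng = np.random.default_rng(seed)
    C, H, W = X.shape
    N = H * W
    nodes = X.reshape(C, N)                    # column i is the node x_i in R^C

    # ASSUMED form of equation (1): r_ij = ReLU(W_theta x_i)^T ReLU(W_phi x_j)
    Ck = max(C // s1, 1)                       # reduced embedding width
    W_theta = rng.standard_normal((Ck, C)) * 0.1
    W_phi = rng.standard_normal((Ck, C)) * 0.1
    Rs = relu(W_theta @ nodes).T @ relu(W_phi @ nodes)   # (N, N) affinity matrix

    # Step 1.3.3: r_i = [Rs(i, :), Rs(:, i)], stored as row i of r
    r = np.concatenate([Rs, Rs.T], axis=1)     # (N, 2N)

    # Step 1.3.4, following equation (2) as printed (both branches take r_i)
    D = max(N // s1, 1)                        # hypothetical embedding width
    W_psi = rng.standard_normal((D, 2 * N)) * 0.1
    W_varphi = rng.standard_normal((D, 2 * N)) * 0.1
    pooled = relu(r @ W_psi.T).mean(axis=1, keepdims=True)  # pool_C: mean over channels
    y = np.concatenate([pooled, relu(r @ W_varphi.T)], axis=1)  # (N, 1 + D)

    # Step 1.3.5: a_i = Sigmoid(W2 ReLU(W1 y_i))
    hidden = max((1 + D) // s1, 1)             # hypothetical hidden width
    W1 = rng.standard_normal((hidden, 1 + D)) * 0.1
    W2 = rng.standard_normal((1, hidden)) * 0.1
    a = sigmoid(relu(y @ W1.T) @ W2.T)         # (N, 1), one value per position

    # Step 1.3.6: reshape to the H x W attention map and reweight every channel
    A = a.reshape(H, W)
    return X * A[None, :, :], A
```

The attention map is broadcast across the channel axis, so every channel at a given spatial position is scaled by the same value.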
Step 1.4: construct a channel relation-aware attention block that takes the feature map of dimension d = H × W at each channel as a feature node and uses these nodes to learn a C-dimensional channel attention vector:
Step 1.4.1: scan the channel positions and denote the feature nodes as $z_i \in \mathbb{R}^d$, where $i = 1, \ldots, C$.
Step 1.4.2: compute the channel attention matrix B following steps 1.3.2 to 1.3.6, and update the target intermediate feature tensor with $X = X \times B$.
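The channel branch reuses the same relation-aware computation with the roles of the axes swapped: each channel's flattened H × W map becomes one node. The sketch below illustrates that arrangement only. The helper `relation_attention` is a hypothetical compact version of steps 1.3.2 to 1.3.6 with random stand-in weights, an assumed dot-product form for the pairwise relation, and no batch normalization.

```python
import numpy as np

def relation_attention(nodes, s=2, seed=0):
    """nodes: (n, dim), one row per node. Returns n attention values in (0, 1)."""
    rng = np.random.default_rng(seed)
    relu = lambda v: np.maximum(v, 0.0)
    n, dim = nodes.shape
    k = max(dim // s, 1)
    W_theta, W_phi = (rng.standard_normal((dim, k)) * 0.1 for _ in range(2))
    # ASSUMED pairwise relation: dot product of two ReLU embeddings
    R = relu(nodes @ W_theta) @ relu(nodes @ W_phi).T      # (n, n) affinity
    r = np.concatenate([R, R.T], axis=1)                   # row i is [R(i,:), R(:,i)]
    m = max(n // s, 1)                                     # hypothetical width
    W_psi, W_varphi = (rng.standard_normal((2 * n, m)) * 0.1 for _ in range(2))
    y = np.concatenate([relu(r @ W_psi).mean(axis=1, keepdims=True),
                        relu(r @ W_varphi)], axis=1)       # (n, 1 + m)
    W1 = rng.standard_normal((1 + m, m)) * 0.1
    W2 = rng.standard_normal((m, 1)) * 0.1
    return (1.0 / (1.0 + np.exp(-(relu(y @ W1) @ W2)))).ravel()

def channel_relation_attention(X):
    """X: (C, H, W). Returns (X * B, B) with one attention value per channel."""
    C, H, W = X.shape
    z = X.reshape(C, H * W)           # row i is the node z_i in R^d, d = H * W
    B = relation_attention(z)         # (C,)
    return X * B[:, None, None], B
```

Here B scales whole channels, whereas the spatial map A scales whole positions; applying both in sequence matches the order of steps 1.3 and 1.4.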
Step 1.5: repeat steps 1.2 to 1.4 until all four residual blocks of the residual network have been processed, yielding a feature map of size H × W × C.
Step 2: compute the feature maps of the reference pedestrian image and of the test pedestrian images according to step 1. Let $j = 1, 2, \ldots, M$ index the current test pedestrian image, where M is the total number of images in the test pedestrian image set.
Step 3: compute global features. Global features are extracted by applying global pooling directly to the feature map; the global features of the reference pedestrian image and the j-th test pedestrian image are denoted $Q$ and $P_j$, respectively.
Step 4: compute local features. Global pooling in the horizontal direction is applied to extract a local feature for each row; the local features of the reference pedestrian image and the j-th test pedestrian image are denoted $q = \{q_1, q_2, \ldots, q_H\}$ and $p^j = \{p^j_1, p^j_2, \ldots, p^j_H\}$, respectively, where H is the number of local features.
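Steps 3 and 4 amount to two pooling operations on the same feature map. The text does not say whether the pooling is average or max, so the sketch below assumes average pooling; the function names are illustrative.

```python
import numpy as np

def global_feature(F):
    """F: (C, H, W) feature map -> (C,) vector, pooled over all positions."""
    return F.mean(axis=(1, 2))

def local_features(F):
    """F: (C, H, W) feature map -> (H, C) array, one pooled vector per row."""
    # Pool along the width (horizontal direction), then put rows first.
    return F.mean(axis=2).T
```

For a feature map of shape (C, H, W), this yields one C-dimensional global vector and H row-wise local vectors, each also C-dimensional.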
Step 5: compute the global distance $d(Q, P_j)$ between the reference image and the test image according to equation (5), where K denotes the dimension of the feature vectors.
Step 6: compute the local distance between the reference image and the test image:
Step 6.1: compute the distance between two local features $q_m$ and $p^j_n$, where $m, n \in \{1, 2, \ldots, H\}$, $e$ is Euler's number, and $\|\cdot\|_2$ is the $\ell_2$ norm.
Step 6.2: compute the shortest-path distance over the local features according to equation (6), where $S_{m,n}$ is the total distance along the shortest path from (1, 1) to (m, n); the local distance between the two pedestrian images is $S_{H,H}$.
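The shortest-path computation in step 6 can be sketched with dynamic programming. The element-wise distance and equation (6) are not reproduced in the text, so two assumptions are made here: the distance between two local features is the normalized form $(e^{x} - 1)/(e^{x} + 1)$ with $x = \|q_m - p^j_n\|_2$, and the path from (1, 1) to (H, H) may only move one step right or one step down.

```python
import numpy as np

def local_distance(q, p):
    """q, p: (H, C) arrays of row-wise local features. Returns S_{H,H}."""
    # Pairwise l2 distances between every q_m and every p_n: shape (H, H).
    x = np.linalg.norm(q[:, None, :] - p[None, :, :], axis=2)
    # ASSUMED normalization (e^x - 1) / (e^x + 1); tanh(x / 2) is the same
    # function written in a numerically stable way.
    d = np.tanh(x / 2.0)

    H = d.shape[0]
    S = np.zeros_like(d)
    for m in range(H):
        for n in range(H):
            if m == 0 and n == 0:
                S[m, n] = d[m, n]
            elif m == 0:
                S[m, n] = S[m, n - 1] + d[m, n]      # first row: only from the left
            elif n == 0:
                S[m, n] = S[m - 1, n] + d[m, n]      # first column: only from above
            else:
                S[m, n] = min(S[m - 1, n], S[m, n - 1]) + d[m, n]
    return float(S[-1, -1])
```

Because the path is free to bend, rows of one image can be matched against vertically shifted rows of the other, which is how the local distance tolerates misalignment.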
Step 7: compute the final distance between the reference image and the test image using equation (7):
$$d_{final}(j) = d(Q, P_j) + S_{H,H} \qquad (7)$$
Step 8: repeat steps 3 to 7 until all images in the test image set have been traversed.
Step 9: complete image alignment using the minimum-distance method.
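Steps 7 to 9 can be sketched as a sum followed by an argmin. Equation (5) is not reproduced in the text, so the global distance is assumed here to be the Euclidean distance over the K feature dimensions, and the "minimum-distance method" is read as selecting the test image with the smallest final distance. The function name and argument layout are illustrative.

```python
import numpy as np

def match(Q, P, local_dists):
    """
    Q: (K,) global feature of the reference image.
    P: (M, K) global features of the M test images.
    local_dists: (M,) local distances S_{H,H}, one per test image.
    Returns (index of the best match, array of final distances).
    """
    # ASSUMED global distance: Euclidean norm over the K dimensions.
    global_dists = np.linalg.norm(P - Q, axis=1)
    d_final = global_dists + np.asarray(local_dists)   # equation (7)
    return int(np.argmin(d_final)), d_final
```

A short usage example with made-up numbers:

```python
Q = np.array([0.0, 0.0])
P = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
best, d_final = match(Q, P, [0.1, 0.5, 0.2])
# d_final is [5.1, 1.5, 2.2], so the second test image (index 1) is selected
```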
Claims (2)
1. A pedestrian target alignment method based on an attention mechanism, characterized in that the method comprises the following steps:
Step 1: compute the feature map of an image using a residual network with an attention mechanism.
Step 1.1: coarsely extract the features of the input image I using a CNN layer containing an initial convolution block.
Step 1.2: obtain the target intermediate feature tensor $X \in \mathbb{R}^{C \times H \times W}$ using the first residual block of the residual network, where $\mathbb{R}$ is the set of real numbers and W, H, and C are the width, height, and number of channels of the feature map, respectively.
Step 1.3: construct a spatial relation-aware attention block that takes the C-dimensional feature vector at each spatial position as a feature node and uses these nodes to learn a spatial attention map of size H × W:
Step 1.3.1: scan the spatial positions and denote the N feature nodes as $x_i \in \mathbb{R}^C$, where $i = 1, \ldots, N$.
Step 1.3.2: compute the pairwise relation $r_{i,j}$ from node i to node j, where ReLU(·) denotes the rectified linear unit and $s_1$ is a predefined positive integer that controls the dimension reduction ratio. The pairwise relation from node j to node i, $r_{j,i}$, is computed in the same way. The pair $(r_{i,j}, r_{j,i})$ describes the bidirectional relation between $x_i$ and $x_j$, and the affinity matrix $R_s \in \mathbb{R}^{N \times N}$ represents the pairwise relations between all nodes.
Step 1.3.3: compute the relation vector of the i-th feature node, $r_i = [R_s(i,:), R_s(:,i)] \in \mathbb{R}^{2N}$, where $i = 1, 2, \ldots, N$.
Step 1.3.4: compute the spatial relation-aware feature $y_i$:
$$y_i = [\mathrm{pool}_C(\mathrm{ReLU}(W_\psi r_i)),\ \mathrm{ReLU}(W_\varphi r_i)]$$
where $\mathrm{pool}_C(\cdot)$ denotes global average pooling along the channel dimension. In this way the global structural information and the local original information associated with the feature are both fully used.
Step 1.3.5: compute the spatial attention value $a_i$ of the i-th feature node:
$$a_i = \mathrm{Sigmoid}(W_2\,\mathrm{ReLU}(W_1 y_i))$$
where the weights $W_1$ and $W_2$ are implemented by 1 × 1 convolutions followed by batch normalization, and Sigmoid(·) is the sigmoid activation function.
Step 1.3.6: compute the spatial attention values of all feature nodes as in step 1.3.5 to obtain the spatial attention matrix $A = [a_1, a_2, \ldots, a_N]$, and update the target intermediate feature tensor with $X = X \times A$.
Step 1.4: construct a channel relation-aware attention block that takes the feature map of dimension d = H × W at each channel as a feature node and uses these nodes to learn a C-dimensional channel attention vector:
Step 1.4.1: scan the channel positions and denote the feature nodes as $z_i \in \mathbb{R}^d$, where $i = 1, \ldots, C$.
Step 1.4.2: compute the channel attention matrix B following steps 1.3.2 to 1.3.6, and update the target intermediate feature tensor with $X = X \times B$.
Step 1.5: repeat steps 1.2 to 1.4 until all four residual blocks of the residual network have been processed, yielding a feature map of size H × W × C.
Step 2: compute the feature maps of the reference pedestrian image and of the test pedestrian images according to step 1. Let $j = 1, 2, \ldots, M$ index the current test pedestrian image, where M is the total number of images in the test pedestrian image set.
Step 3: compute global features. Global features are extracted by applying global pooling directly to the feature map; the global features of the reference pedestrian image and the j-th test pedestrian image are denoted $Q$ and $P_j$, respectively.
Step 4: compute local features. Global pooling in the horizontal direction is applied to extract a local feature for each row; the local features of the reference pedestrian image and the j-th test pedestrian image are denoted $q = \{q_1, q_2, \ldots, q_H\}$ and $p^j = \{p^j_1, p^j_2, \ldots, p^j_H\}$, respectively, where H is the number of local features.
Step 5: compute the global distance $d(Q, P_j)$ between the reference image and the test image, where K denotes the dimension of the feature vectors.
Step 6: compute the local distance between the reference image and the test image.
Step 7: compute the final distance between the reference image and the test image:
$$d_{final}(j) = d(Q, P_j) + S_{H,H}$$
Step 8: repeat steps 3 to 7 until all images in the test image set have been traversed.
Step 9: complete image alignment using the minimum-distance method.
2. The pedestrian target alignment method based on an attention mechanism as claimed in claim 1, characterized in that step 6 specifically comprises:
Step 6.1: compute the distance between two local features $q_m$ and $p^j_n$, where $m, n \in \{1, 2, \ldots, H\}$, $e$ is Euler's number, and $\|\cdot\|_2$ is the $\ell_2$ norm.
Step 6.2: compute the shortest-path distance over the local features, where $S_{m,n}$ is the total distance along the shortest path from (1, 1) to (m, n); the local distance between the two pedestrian images is $S_{H,H}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111197529.3A CN113947782A (en) | 2021-10-14 | 2021-10-14 | Pedestrian target alignment method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113947782A (en) | 2022-01-18 |
Family
ID=79330390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111197529.3A Pending CN113947782A (en) | 2021-10-14 | 2021-10-14 | Pedestrian target alignment method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113947782A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070073A (en) * | 2019-05-07 | 2019-07-30 | 国家广播电视总局广播电视科学研究院 | Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism |
CN110969112A (en) * | 2019-11-28 | 2020-04-07 | 福州大学 | Pedestrian identity alignment method under camera-crossing scene |
CN111160295A (en) * | 2019-12-31 | 2020-05-15 | 广州视声智能科技有限公司 | Video pedestrian re-identification method based on region guidance and space-time attention |
CN111898431A (en) * | 2020-06-24 | 2020-11-06 | 南京邮电大学 | Pedestrian re-identification method based on attention mechanism part shielding |
JP6830707B1 (en) * | 2020-01-23 | 2021-02-17 | 同▲済▼大学 | Person re-identification method that combines random batch mask and multi-scale expression learning |
CN112949565A (en) * | 2021-03-25 | 2021-06-11 | 重庆邮电大学 | Single-sample partially-shielded face recognition method and system based on attention mechanism |
CN113221625A (en) * | 2021-03-02 | 2021-08-06 | 西安建筑科技大学 | Method for re-identifying pedestrians by utilizing local features of deep learning |
Non-Patent Citations (1)
Title |
---|
Yue Zhen; Chen Kaiyong: "Face recognition under partial occlusion" (局部遮挡条件下的人脸识别), Computer Applications and Software (计算机应用与软件), no. 05, 12 May 2018 (2018-05-12) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||