CN113947782A - Pedestrian target alignment method based on attention mechanism

Info

Publication number
CN113947782A
Authority
CN
China
Prior art keywords: image, feature, pedestrian, attention, node
Prior art date: 2021-10-14
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111197529.3A
Other languages
Chinese (zh)
Inventor
郑丽颖 (Zheng Liying)
郑薪竹 (Zheng Xinzhu)
张钰渤 (Zhang Yubo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2021-10-14
Publication date: 2022-01-18
Application filed by Harbin Engineering University
Priority to CN202111197529.3A
Publication of CN113947782A
Legal status: Pending (current)

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N 3/00 Computing arrangements based on biological models › G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology › G06N 3/045 Combinations of networks
    • G06N 3/04 Architecture, e.g. interconnection topology › G06N 3/048 Activation functions
    • G06N 3/08 Learning methods

Abstract

The invention provides a pedestrian target alignment method based on an attention mechanism, aimed at the problems of target loss and occlusion; it improves alignment precision and performance through process optimization and local feature extraction. The method achieves high alignment precision, makes full use of both the global structure information and the local original information of pedestrian image features, effectively handles the pedestrian occlusion problem, and mitigates the effect of the local feature misalignment problem on algorithm performance.

Description

Pedestrian target alignment method based on attention mechanism
Technical Field
The invention relates to the technical field of pedestrian target alignment, in particular to a pedestrian target alignment method based on an attention mechanism.
Background
Pedestrian target alignment is a technique that identifies an initial target region in a sequence of pedestrian images and then retrieves that target across cameras and scenes in subsequent frames. The technique serves as an important complement to face recognition and is widely applied in military and civil fields such as pedestrian detection, video surveillance, action recognition, skeleton keypoint detection, and pose recognition.
In recent years, the pedestrian target alignment problem, especially against complex backgrounds, has gradually become a research focus, and various alignment techniques have been proposed. Among them, attention-based methods have gained favor in both academia and industry for their strong performance and high running speed. Saliency-learning-based recognition methods incorporate the salient features of the target pedestrian into patch matching, so that the algorithm can find distinctive and reliable patch correspondences. Methods that combine attention with convolutional neural networks jointly learn hard region-level and soft pixel-level attention to align targets under image misalignment. Spatiotemporal attention algorithms extract useful region information from each frame with multiple spatial attention modules and integrate the outputs through a temporal attention model, which exploits all frames and improves robustness. However, the performance of existing alignment techniques remains unsatisfactory in the presence of target occlusion, viewpoint differences, changes in lighting conditions, and the like.
Disclosure of Invention
The invention aims to provide a pedestrian target alignment method based on an attention mechanism that addresses the problems of target loss and occlusion.
The purpose of the invention is realized as follows:
1. An attention mechanism-based pedestrian target alignment method comprises the following steps:
Step 1: compute the feature map of the image using an attention-based residual network;
Step 1.1: coarsely extract the features of the input image I using a CNN layer containing an initial convolution block.
Step 1.2: obtain the target intermediate feature tensor X ∈ R^{C×H×W} using the first residual block of the residual network, where R is the set of real numbers and W, H, C are the width, height, and number of channels of the feature map, respectively;
Step 1.3: construct a spatial relation-aware attention block, taking the C-dimensional feature vector at each spatial position as a feature node, and use these nodes to learn a spatial attention map of size H × W:
Step 1.3.1: scan the spatial positions and represent the N = H × W feature nodes as x_i ∈ R^C, where i = 1, ..., N;
Step 1.3.2: compute the pairwise relation r_{i,j} between node i and node j:

r_{i,j} = ReLU(W_θ x_i)^T ReLU(W_φ x_j)

where ReLU(·) denotes the linear rectification function, W_θ, W_φ ∈ R^{(C/s1)×C} are learnable embedding weights, and s1 is a predefined positive integer for controlling the size reduction rate. Similarly, the pairwise relation from node j to node i is computed as r_{j,i}. The pair (r_{i,j}, r_{j,i}) describes the bidirectional relationship between x_i and x_j, and the affinity matrix R_s ∈ R^{N×N} represents the pairwise relations between all nodes;
Step 1.3.3: compute the relation vector of the i-th feature node, r_i = [R_s(i,:), R_s(:,i)] ∈ R^{2N}, where i = 1, 2, ..., N;
Step 1.3.4: compute the spatial relation-aware feature y_i:

y_i = [pool_C(ReLU(W_ψ r_i)), ReLU(W_φ r_i)]

where W_ψ and W_φ are learnable weights that reduce the dimensionality of r_i by the factor s1, and pool_C(·) denotes global average pooling along the channel dimension. In this way, the global structure information and the local original information related to the feature are both fully utilized;
Step 1.3.5: compute the spatial attention value a_i of the i-th feature node:

a_i = Sigmoid(W_2 ReLU(W_1 y_i))

where the weights W_1 and W_2 are implemented by 1 × 1 convolutions, each followed by batch normalization, and Sigmoid(·) denotes the sigmoid activation function;
Step 1.3.6: compute the spatial attention values of all feature nodes according to step 1.3.5 to obtain the spatial attention matrix A = [a_1, a_2, ..., a_N], and update the target intermediate feature tensor X with X = X × A.
Step 1.4: construct a channel relation-aware attention block, taking the feature map of dimension d = H × W at each channel position as a feature node, and use these nodes to learn a C-dimensional channel attention vector;
Step 1.4.1: scan the channel positions and represent the feature nodes as z_i ∈ R^d, where i = 1, ..., C;
Step 1.4.2: compute the channel attention matrix B according to steps 1.3.2–1.3.6, and update the target intermediate feature tensor X with X = X × B.
Step 1.5: repeat steps 1.2 to 1.4 until all four residual blocks of the residual network have been processed, obtaining a feature map of size H × W × C.
Step 2: compute the feature maps of the reference pedestrian image and the test pedestrian images according to step 1. Let M be the total number of images in the test pedestrian image set, and let j = 1, 2, ..., M index the current test pedestrian image.
Step 3: compute global features. Extract global features by applying global pooling directly on the feature map, and denote the global features of the reference pedestrian image and the j-th test pedestrian image as Q and P_j, respectively.
Step 4: compute local features. Apply global pooling in the horizontal direction to extract the local feature of each row, and denote the local features of the reference pedestrian image and the j-th test pedestrian image as q = {q_1, q_2, ..., q_H} and p^j = {p_1^j, p_2^j, ..., p_H^j}, respectively, where H represents the number of local features.
Step 5: compute the global distance between the reference image and the test image as the Euclidean distance between their global features:

d(Q, P_j) = sqrt( Σ_{k=1}^{K} (Q_k − P_{j,k})^2 )

where K represents the dimension of the global feature vectors.
Step 6: compute the local distance between the reference image and the test image;
Step 7: compute the final distance between the reference image and the test image:

d_final(j) = d(Q, P_j) + S_{H,H}

Step 8: repeat steps 3 to 7 until all images in the test image set have been traversed.
Step 9: complete image alignment using the minimum-distance method.
The invention also includes the following features:
2. Step 6 specifically includes:
Step 6.1: compute the distance between two local features q_m and p_n^j:

d(q_m, p_n^j) = (e^{||q_m − p_n^j||_2} − 1) / (e^{||q_m − p_n^j||_2} + 1)

where m, n ∈ {1, 2, ..., H}, e is Euler's number, and ||·||_2 is the l2 norm;
Step 6.2: compute the shortest-path local distance:

S_{m,n} = min(S_{m−1,n}, S_{m,n−1}) + d(q_m, p_n^j)

where S_{m,n} is the distance of the shortest path from (1,1) to (m,n) (terms that fall outside the grid are omitted on the boundary, and S_{1,1} = d(q_1, p_1^j)); the local distance between the two pedestrian images is S_{H,H}.
Compared with the prior art, the beneficial effects of the invention are:
(1) high alignment precision;
(2) full use of both the global structure information and the local original information of pedestrian image features;
(3) effective handling of the pedestrian target occlusion problem;
(4) reduced impact of the local feature misalignment problem on algorithm performance.
Drawings
Fig. 1 is a structural diagram of the attention-based residual network of the present invention.
Fig. 2 is a flow chart of the pedestrian target alignment technique of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Let I be the original input image containing a single pedestrian. The invention provides a pedestrian target alignment technique based on an attention mechanism, whose specific implementation steps are as follows:
Step 1: compute the feature map of the image using the attention-based residual network shown in Fig. 1:
Step 1.1: coarsely extract the features of the input image I using a CNN layer containing an initial convolution block.
Step 1.2: obtain the target intermediate feature tensor X ∈ R^{C×H×W} using the first residual block of the residual network, where R is the set of real numbers and W, H, C are the width, height, and number of channels of the feature map, respectively.
Step 1.3: construct a spatial relation-aware attention block, taking the C-dimensional feature vector at each spatial position as a feature node, and use these nodes to learn a spatial attention map of size H × W:
Step 1.3.1: scan the spatial positions and represent the N = H × W feature nodes as x_i ∈ R^C, where i = 1, ..., N.
Step 1.3.2: compute the pairwise relation r_{i,j} between nodes i and j according to equation (1):

r_{i,j} = ReLU(W_θ x_i)^T ReLU(W_φ x_j)    (1)

where ReLU(·) denotes the linear rectification function, W_θ, W_φ ∈ R^{(C/s1)×C} are learnable embedding weights, and s1 is a predefined positive integer for controlling the size reduction rate. Similarly, the pairwise relation from node j to node i is computed as r_{j,i}. The pair (r_{i,j}, r_{j,i}) describes the bidirectional relationship between x_i and x_j, and the affinity matrix R_s ∈ R^{N×N} represents the pairwise relations between all nodes.
Step 1.3.3: compute the relation vector of the i-th feature node, r_i = [R_s(i,:), R_s(:,i)] ∈ R^{2N}, where i = 1, 2, ..., N.
Step 1.3.4: compute the spatial relation-aware feature y_i according to equation (2):

y_i = [pool_C(ReLU(W_ψ r_i)), ReLU(W_φ r_i)]    (2)

where W_ψ and W_φ are learnable weights that reduce the dimensionality of r_i by the factor s1, and pool_C(·) denotes global average pooling along the channel dimension. In this way, the global structure information and the local original information related to the feature are both fully utilized.
Step 1.3.5: compute the spatial attention value a_i of the i-th feature node according to equation (3):

a_i = Sigmoid(W_2 ReLU(W_1 y_i))    (3)

where the weights W_1 and W_2 are implemented by 1 × 1 convolutions, each followed by batch normalization, and Sigmoid(·) denotes the sigmoid activation function.
Step 1.3.6: compute the spatial attention values of all feature nodes according to step 1.3.5 to obtain the spatial attention matrix A = [a_1, a_2, ..., a_N], and update the target intermediate feature tensor X with X = X × A.
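For concreteness, a minimal PyTorch sketch of the spatial relation-aware attention block of steps 1.3.1–1.3.6 follows. It is an illustration under stated assumptions rather than the patented implementation: the pairwise relation of equation (1) is realized as a dot product between 1 × 1-convolution embeddings, the default reduction ratio s1 = 8 and all layer names are illustrative, and the update X = X × A is applied as an elementwise rescaling by the attention map.

```python
import torch
import torch.nn as nn

class SpatialRelationAttention(nn.Module):
    """Sketch of the spatial relation-aware attention block (steps 1.3.1-1.3.6).

    Assumes r_ij is a dot product between embedded nodes (equation (1)) and
    that s1 is the channel-reduction ratio; num_nodes must equal H * W of the
    incoming feature map.
    """

    def __init__(self, channels: int, num_nodes: int, s1: int = 8):
        super().__init__()
        c_red = max(channels // s1, 1)
        n_red = max(2 * num_nodes // s1, 1)
        # Embeddings for the pairwise relations r_ij (1x1 conv + BN + ReLU).
        self.theta = nn.Sequential(nn.Conv2d(channels, c_red, 1),
                                   nn.BatchNorm2d(c_red), nn.ReLU())
        self.phi = nn.Sequential(nn.Conv2d(channels, c_red, 1),
                                 nn.BatchNorm2d(c_red), nn.ReLU())
        # W_psi / W_phi compress the 2N-dim relation vector r_i (step 1.3.4).
        self.w_psi = nn.Sequential(nn.Conv2d(2 * num_nodes, n_red, 1),
                                   nn.BatchNorm2d(n_red), nn.ReLU())
        self.w_phi = nn.Sequential(nn.Conv2d(2 * num_nodes, n_red, 1),
                                   nn.BatchNorm2d(n_red), nn.ReLU())
        # W1 / W2 map the relation-aware feature y_i to one attention scalar.
        self.w1 = nn.Sequential(nn.Conv2d(n_red + 1, n_red, 1),
                                nn.BatchNorm2d(n_red), nn.ReLU())
        self.w2 = nn.Conv2d(n_red, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w
        # Steps 1.3.1-1.3.2: pairwise relations between all N = H*W nodes.
        t = self.theta(x).reshape(b, -1, n)               # B x C' x N
        p = self.phi(x).reshape(b, -1, n)                 # B x C' x N
        rel = torch.bmm(t.transpose(1, 2), p)             # B x N x N, r_ij
        # Step 1.3.3: relation vector r_i = [R_s(i,:), R_s(:,i)] in R^{2N}.
        r = torch.cat([rel, rel.transpose(1, 2)], dim=2)  # B x N x 2N
        r = r.transpose(1, 2).reshape(b, 2 * n, h, w)     # treat 2N as channels
        # Step 1.3.4: y_i = [pool_C(ReLU(W_psi r_i)), ReLU(W_phi r_i)].
        y_pooled = self.w_psi(r).mean(dim=1, keepdim=True)  # pool over channels
        y = torch.cat([y_pooled, self.w_phi(r)], dim=1)
        # Steps 1.3.5-1.3.6: a_i = Sigmoid(W2 ReLU(W1 y_i)); update X = X x A.
        a = torch.sigmoid(self.w2(self.w1(y)))            # B x 1 x H x W
        return x * a
```

For a 256-channel stage with a 16 × 8 feature map, the block would be instantiated as SpatialRelationAttention(256, 16 * 8) and inserted after the corresponding residual block, as in Fig. 1.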
Step 1.4: construct a channel relation-aware attention block, taking the feature map of dimension d = H × W at each channel position as a feature node, and use these nodes to learn a C-dimensional channel attention vector:
Step 1.4.1: scan the channel positions and represent the feature nodes as z_i ∈ R^d, where i = 1, ..., C.
Step 1.4.2: compute the channel attention matrix B according to steps 1.3.2–1.3.6, and update the target intermediate feature tensor X with X = X × B.
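The channel branch mirrors the spatial one with channels as nodes. The sketch below makes one further simplifying assumption: the node embeddings are taken as the identity, so the C × C affinity matrix is a plain dot product between the flattened channel maps z_i, and the learned C-dimensional vector rescales the channels as X = X × B.

```python
import torch
import torch.nn as nn

class ChannelRelationAttention(nn.Module):
    """Sketch of the channel relation-aware attention block (step 1.4).

    Each of the C channels is a node z_i in R^d with d = H * W; the affinity
    matrix is C x C. The node embedding is simplified to the identity here,
    an illustrative choice rather than the patented form.
    """

    def __init__(self, channels: int, s1: int = 8):
        super().__init__()
        red = max(2 * channels // s1, 1)
        self.w_psi = nn.Sequential(nn.Linear(2 * channels, red), nn.ReLU())
        self.w_phi = nn.Sequential(nn.Linear(2 * channels, red), nn.ReLU())
        self.w1 = nn.Sequential(nn.Linear(red + 1, red), nn.ReLU())
        self.w2 = nn.Linear(red, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z = x.reshape(b, c, h * w)                        # C nodes of dim d
        rel = torch.bmm(z, z.transpose(1, 2))             # B x C x C affinity
        r = torch.cat([rel, rel.transpose(1, 2)], dim=2)  # B x C x 2C
        # Mirror of step 1.3.4: pooled global term plus local embedding.
        y_pooled = self.w_psi(r).mean(dim=2, keepdim=True)
        y = torch.cat([y_pooled, self.w_phi(r)], dim=2)   # B x C x (red + 1)
        a = torch.sigmoid(self.w2(self.w1(y)))            # B x C x 1
        return x * a.reshape(b, c, 1, 1)                  # update X = X x B
```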
Step 1.5: repeat steps 1.2 to 1.4 until all four residual blocks of the residual network have been processed, obtaining a feature map of size H × W × C.
Step 2: compute the feature maps of the reference pedestrian image and the test pedestrian images according to step 1. Let M be the total number of images in the test pedestrian image set, and let j = 1, 2, ..., M index the current test pedestrian image.
Step 3: compute global features. Extract global features by applying global pooling directly on the feature map, and denote the global features of the reference pedestrian image and the j-th test pedestrian image as Q and P_j, respectively.
Step 4: compute local features. Apply global pooling in the horizontal direction to extract the local feature of each row, and denote the local features of the reference pedestrian image and the j-th test pedestrian image as q = {q_1, q_2, ..., q_H} and p^j = {p_1^j, p_2^j, ..., p_H^j}, respectively, where H represents the number of local features.
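As a concrete reading of steps 3 and 4, the sketch below derives both feature sets from the network output; average pooling is assumed wherever the text says "global pooling", which it does not pin down.

```python
import torch

def extract_features(feat_map: torch.Tensor):
    """Steps 3-4: global feature by pooling over the whole map, local
    features by pooling each horizontal row (average pooling assumed)."""
    # feat_map: B x C x H x W, the output of the attention-based residual network.
    global_feat = feat_map.mean(dim=(2, 3))   # B x C, i.e. Q or P_j
    local_feats = feat_map.mean(dim=3)        # B x C x H, one C-dim vector per row
    return global_feat, local_feats
```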
Step 5: compute the global distance between the reference image and the test image according to equation (5), taken here as the Euclidean distance between the global features:

d(Q, P_j) = sqrt( Σ_{k=1}^{K} (Q_k − P_{j,k})^2 )    (5)

where K represents the dimension of the global feature vectors.
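A one-function sketch of the global distance as reconstructed in equation (5), i.e. the Euclidean distance over the K feature dimensions:

```python
import torch

def global_distance(q: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """Step 5: Euclidean distance between two K-dimensional global features."""
    return ((q - p) ** 2).sum(dim=-1).sqrt()
```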
Step 6: compute the local distance between the reference image and the test image:
Step 6.1: compute the distance between two local features q_m and p_n^j:

d(q_m, p_n^j) = (e^{||q_m − p_n^j||_2} − 1) / (e^{||q_m − p_n^j||_2} + 1)

where m, n ∈ {1, 2, ..., H}, e is Euler's number, and ||·||_2 is the l2 norm.
Step 6.2: compute the shortest distance of the local features according to equation (6):

S_{m,n} = min(S_{m−1,n}, S_{m,n−1}) + d(q_m, p_n^j)    (6)

where S_{m,n} is the distance of the shortest path from (1,1) to (m,n) (terms that fall outside the grid are omitted on the boundary, and S_{1,1} = d(q_1, p_1^j)); the local distance between the two pedestrian images is S_{H,H}.
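The sketch below implements steps 6.1 and 6.2 as reconstructed above: strip-wise l2 distances are squashed into [0, 1) by the exponential normalization of step 6.1, and a dynamic program accumulates the shortest path from (1,1) to (H,H). Shapes and names are illustrative.

```python
import torch

def local_distance(q_loc: torch.Tensor, p_loc: torch.Tensor) -> torch.Tensor:
    """Step 6: shortest-path local distance S_{H,H} between two images whose
    local features are given as C x H matrices (one column per row strip)."""
    h = q_loc.shape[1]
    # Step 6.1: d_{m,n} = (e^{||q_m - p_n||_2} - 1) / (e^{||q_m - p_n||_2} + 1).
    diff = q_loc.unsqueeze(2) - p_loc.unsqueeze(1)      # C x H x H
    d = diff.pow(2).sum(dim=0).sqrt()                   # H x H of ||q_m - p_n||_2
    d = (d.exp() - 1) / (d.exp() + 1)
    # Step 6.2: S_{m,n} = min(S_{m-1,n}, S_{m,n-1}) + d_{m,n}.
    s = torch.zeros_like(d)
    for m in range(h):
        for n in range(h):
            if m == 0 and n == 0:
                s[m, n] = d[m, n]
            elif m == 0:
                s[m, n] = s[m, n - 1] + d[m, n]
            elif n == 0:
                s[m, n] = s[m - 1, n] + d[m, n]
            else:
                s[m, n] = torch.minimum(s[m - 1, n], s[m, n - 1]) + d[m, n]
    return s[-1, -1]                                    # local distance S_{H,H}
```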
Step 7: compute the final distance between the reference image and the test image using equation (7):

d_final(j) = d(Q, P_j) + S_{H,H}    (7)

Step 8: repeat steps 3 to 7 until all images in the test image set have been traversed.
Step 9: complete image alignment using the minimum-distance method.
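Steps 7–9 then reduce to a minimum search over the test set. A sketch under the same assumptions, reusing global_distance and local_distance from above (each feature pair is assumed squeezed to shapes (C,) and (C, H)):

```python
def align(ref_feats, test_feats_list):
    """Steps 7-9: d_final(j) = d(Q, P_j) + S_{H,H}; the test image with the
    minimum final distance is taken as the aligned match."""
    q_glob, q_loc = ref_feats              # from extract_features, batch squeezed
    best_j, best_d = -1, float("inf")
    for j, (p_glob, p_loc) in enumerate(test_feats_list):
        d_final = float(global_distance(q_glob, p_glob)
                        + local_distance(q_loc, p_loc))
        if d_final < best_d:
            best_j, best_d = j, d_final
    return best_j, best_d                  # index of the aligned image, distance
```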

Claims (2)

1. A pedestrian target alignment method based on an attention mechanism, characterized by comprising the following steps:
Step 1: compute the feature map of the image using an attention-based residual network;
Step 1.1: coarsely extract the features of the input image I using a CNN layer containing an initial convolution block.
Step 1.2: obtain the target intermediate feature tensor X ∈ R^{C×H×W} using the first residual block of the residual network, where R is the set of real numbers and W, H, C are the width, height, and number of channels of the feature map, respectively;
Step 1.3: construct a spatial relation-aware attention block, taking the C-dimensional feature vector at each spatial position as a feature node, and use these nodes to learn a spatial attention map of size H × W:
Step 1.3.1: scan the spatial positions and represent the N = H × W feature nodes as x_i ∈ R^C, where i = 1, ..., N;
Step 1.3.2: compute the pairwise relation r_{i,j} between node i and node j:

r_{i,j} = ReLU(W_θ x_i)^T ReLU(W_φ x_j)

where ReLU(·) denotes the linear rectification function, W_θ, W_φ ∈ R^{(C/s1)×C} are learnable embedding weights, and s1 is a predefined positive integer for controlling the size reduction rate; similarly, compute the pairwise relation from node j to node i as r_{j,i}; use (r_{i,j}, r_{j,i}) to describe the bidirectional relationship between x_i and x_j, and use the affinity matrix R_s ∈ R^{N×N} to represent the pairwise relations between all nodes;
Step 1.3.3: compute the relation vector of the i-th feature node, r_i = [R_s(i,:), R_s(:,i)] ∈ R^{2N}, where i = 1, 2, ..., N;
Step 1.3.4: compute the spatial relation-aware feature y_i:

y_i = [pool_C(ReLU(W_ψ r_i)), ReLU(W_φ r_i)]

where W_ψ and W_φ are learnable weights that reduce the dimensionality of r_i by the factor s1, and pool_C(·) denotes global average pooling along the channel dimension, so that the global structure information and the local original information related to the feature are both fully utilized;
Step 1.3.5: compute the spatial attention value a_i of the i-th feature node:

a_i = Sigmoid(W_2 ReLU(W_1 y_i))

where the weights W_1 and W_2 are implemented by 1 × 1 convolutions, each followed by batch normalization, and Sigmoid(·) denotes the sigmoid activation function;
Step 1.3.6: compute the spatial attention values of all feature nodes according to step 1.3.5 to obtain the spatial attention matrix A = [a_1, a_2, ..., a_N], and update the target intermediate feature tensor X with X = X × A.
Step 1.4: construct a channel relation-aware attention block, taking the feature map of dimension d = H × W at each channel position as a feature node, and use these nodes to learn a C-dimensional channel attention vector;
Step 1.4.1: scan the channel positions and represent the feature nodes as z_i ∈ R^d, where i = 1, ..., C;
Step 1.4.2: compute the channel attention matrix B according to steps 1.3.2–1.3.6, and update the target intermediate feature tensor X with X = X × B.
Step 1.5: repeat steps 1.2 to 1.4 until all four residual blocks of the residual network have been processed, obtaining a feature map of size H × W × C.
Step 2: compute the feature maps of the reference pedestrian image and the test pedestrian images according to step 1. Let M be the total number of images in the test pedestrian image set, and let j = 1, 2, ..., M index the current test pedestrian image.
Step 3: compute global features. Extract global features by applying global pooling directly on the feature map, and denote the global features of the reference pedestrian image and the j-th test pedestrian image as Q and P_j, respectively.
Step 4: compute local features. Apply global pooling in the horizontal direction to extract the local feature of each row, and denote the local features of the reference pedestrian image and the j-th test pedestrian image as q = {q_1, q_2, ..., q_H} and p^j = {p_1^j, p_2^j, ..., p_H^j}, respectively, where H represents the number of local features.
Step 5: compute the global distance between the reference image and the test image as the Euclidean distance between their global features:

d(Q, P_j) = sqrt( Σ_{k=1}^{K} (Q_k − P_{j,k})^2 )

where K represents the dimension of the global feature vectors.
Step 6: compute the local distance between the reference image and the test image;
Step 7: compute the final distance between the reference image and the test image:

d_final(j) = d(Q, P_j) + S_{H,H}

Step 8: repeat steps 3 to 7 until all images in the test image set have been traversed.
Step 9: complete image alignment using the minimum-distance method.
2. The pedestrian target alignment method based on an attention mechanism according to claim 1, characterized in that step 6 specifically comprises:
Step 6.1: compute the distance between two local features q_m and p_n^j:

d(q_m, p_n^j) = (e^{||q_m − p_n^j||_2} − 1) / (e^{||q_m − p_n^j||_2} + 1)

where m, n ∈ {1, 2, ..., H}, e is Euler's number, and ||·||_2 is the l2 norm;
Step 6.2: compute the shortest-path local distance:

S_{m,n} = min(S_{m−1,n}, S_{m,n−1}) + d(q_m, p_n^j)

where S_{m,n} is the distance of the shortest path from (1,1) to (m,n) (terms that fall outside the grid are omitted on the boundary, and S_{1,1} = d(q_1, p_1^j)); the local distance between the two pedestrian images is S_{H,H}.
CN202111197529.3A 2021-10-14 2021-10-14 Pedestrian target alignment method based on attention mechanism Pending CN113947782A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111197529.3A | 2021-10-14 | 2021-10-14 | Pedestrian target alignment method based on attention mechanism

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111197529.3A | 2021-10-14 | 2021-10-14 | Pedestrian target alignment method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN113947782A | 2022-01-18

Family

ID=79330390

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111197529.3A | Pedestrian target alignment method based on attention mechanism (pending) | 2021-10-14 | 2021-10-14

Country Status (1)

Country Link
CN (1) CN113947782A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
CN110969112A (en) * 2019-11-28 2020-04-07 福州大学 Pedestrian identity alignment method under camera-crossing scene
CN111160295A (en) * 2019-12-31 2020-05-15 广州视声智能科技有限公司 Video pedestrian re-identification method based on region guidance and space-time attention
CN111898431A (en) * 2020-06-24 2020-11-06 南京邮电大学 Pedestrian re-identification method based on attention mechanism part shielding
JP6830707B1 (en) * 2020-01-23 2021-02-17 同▲済▼大学 Person re-identification method that combines random batch mask and multi-scale expression learning
CN112949565A (en) * 2021-03-25 2021-06-11 重庆邮电大学 Single-sample partially-shielded face recognition method and system based on attention mechanism
CN113221625A (en) * 2021-03-02 2021-08-06 西安建筑科技大学 Method for re-identifying pedestrians by utilizing local features of deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
岳震 (Yue Zhen); 陈凯勇 (Chen Kaiyong): "Face recognition under partial occlusion" (局部遮挡条件下的人脸识别), 计算机应用与软件 (Computer Applications and Software), no. 05, 12 May 2018 (2018-05-12) *

Similar Documents

Publication Publication Date Title
CN110111366B (en) End-to-end optical flow estimation method based on multistage loss
CN109800692B (en) Visual SLAM loop detection method based on pre-training convolutional neural network
WO2022089077A1 (en) Real-time binocular stereo matching method based on adaptive candidate parallax prediction network
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN111723693B (en) Crowd counting method based on small sample learning
CN112818969B (en) Knowledge distillation-based face pose estimation method and system
CN112651262B (en) Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment
CN111507275B (en) Video data time sequence information extraction method and device based on deep learning
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN110059597B (en) Scene recognition method based on depth camera
CN111768354A (en) Face image restoration system based on multi-scale face part feature dictionary
CN112183675B (en) Tracking method for low-resolution target based on twin network
CN112633088A (en) Power station capacity estimation method based on photovoltaic component identification in aerial image
CN117058456A (en) Visual target tracking method based on multiphase attention mechanism
CN116452862A (en) Image classification method based on domain generalization learning
CN114463340A (en) Edge information guided agile remote sensing image semantic segmentation method
CN117373062A (en) Real-time end-to-end cross-resolution pedestrian re-identification method based on joint learning
CN114707611B (en) Mobile robot map construction method, storage medium and equipment based on graph neural network feature extraction and matching
CN116246305A (en) Pedestrian retrieval method based on hybrid component transformation network
CN113947782A (en) Pedestrian target alignment method based on attention mechanism
CN116311345A (en) Transformer-based pedestrian shielding re-recognition method
CN113793472B (en) Image type fire detector pose estimation method based on feature depth aggregation network
CN112784800B (en) Face key point detection method based on neural network and shape constraint
CN115578325A (en) Image anomaly detection method based on channel attention registration network
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination