CN111832348A - Pedestrian re-identification method based on pixel and channel attention mechanism


Info

Publication number
CN111832348A
CN111832348A (application CN201910310802.5A; granted as CN111832348B)
Authority
CN
China
Prior art keywords
channel
pedestrian
pixel
information
features
Prior art date
Legal status
Granted
Application number
CN201910310802.5A
Other languages
Chinese (zh)
Other versions
CN111832348B (en)
Inventor
王敏杰
李现�
张加焕
肖江剑
Current Assignee
Ningbo Institute of Material Technology and Engineering of CAS
Original Assignee
Ningbo Institute of Material Technology and Engineering of CAS
Priority date
Filing date
Publication date
Application filed by Ningbo Institute of Material Technology and Engineering of CAS filed Critical Ningbo Institute of Material Technology and Engineering of CAS
Priority to CN201910310802.5A
Publication of CN111832348A
Application granted
Publication of CN111832348B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on a pixel and channel attention mechanism, comprising the following steps: extracting the global features of a pedestrian from the person's bounding box (search box); dividing the pedestrian picture evenly into two parts and into three parts, and extracting the pedestrian's local features from each; and matching the extracted person features against the person information in the Gallery to find the required person information. The method extracts features with channel and pixel attention modules, which effectively reduces the influence of background information on the retrieval result. Middle-layer supervision is further designed into the neural network: during feature extraction, a multi-loss function supervises the middle-layer feature information and accelerates network convergence. The resulting pedestrian re-identification network, built on the channel attention mechanism, the pixel attention mechanism and middle-layer supervision, effectively removes redundant information from the person bounding box, so that person information is effectively aggregated and retrieval precision is markedly improved.

Description

Pedestrian re-identification method based on pixel and channel attention mechanism
Technical Field
The invention relates to a pedestrian re-identification method, in particular to a pedestrian re-identification method based on a pixel and channel attention mechanism, and belongs to the technical field of image processing.
Background
At present, criminal activity at home and abroad poses a serious threat to the stable, sustainable development of society. Places with heavy foot traffic, such as shopping malls, stations, airports and pedestrian streets, are covered by surveillance equipment of every size, yet accurately finding the person or information one needs in that surveillance footage remains a great challenge. In criminal investigation work in particular, police must locate a suspect in a large volume of long-duration surveillance footage in order to learn the situation in time and bring the suspect under control. The footage, however, is enormous in quantity and complicated in content, and each camera covers only a narrow field of view, which makes finding the target person quickly and accurately very difficult. Face recognition technology is now mature and widely applied in many fields, but in surveillance video the camera's resolution and shooting angle often prevent the capture of a clear, usable face image, so person information cannot be retrieved by face recognition alone.
To solve the problem of person retrieval under such complex conditions, person re-identification technology, also called pedestrian re-identification, has emerged. The technology retrieves person information by computer and can save a great deal of manpower and material resources. With the development of deep learning, deep-learning-based methods have become the mainstream of pedestrian re-identification. Existing deep-learning-based re-identification methods fall mainly into five categories: methods based on representation learning, on metric learning, on local features, on video sequences, and on GAN-generated images.
These methods are widely used in person re-identification research, but each has problems. Representation-learning methods use global features as the feature vector, so many detail features are lost during extraction and errors appear in the retrieval results. Metric-learning methods compare the similarity distance between two pictures through a neural network, and how to compute inter-picture similarity accurately is still an open research topic. Local-feature methods divide the person picture into several parts in the vertical direction and extract local features from each part; however, because of the person's pose and similar factors, the division is often inaccurate, which seriously degrades system accuracy. Video-sequence-based re-identification still needs further work on removing redundant frames. And at present, the pictures generated by GAN-based methods can only serve as negative samples and suffer from fairly severe distortion.
Beyond the drawbacks of the individual methods, low camera resolution, occlusion, and variations in viewpoint, pose and illumination all harm a re-identification system. Current deep-learning-based pedestrian re-identification methods use pooling to reduce the dimensionality of extracted features, but whether maximum or average pooling is used, every channel and every pixel of the picture is treated identically. In particular, a bounding box (search box) contains both person information and background information, which the neural network cannot distinguish, so the background is extracted as part of the person's features and can strongly degrade the accuracy of the whole re-identification system. Effectively reducing the influence of background information on re-identification is therefore a major challenge.
To effectively reduce the influence of background information on the retrieval result, the invention proposes extracting features with channel and pixel attention modules. Before maximum pooling and average pooling, the channel and pixel attention modules are applied to remove redundant information and improve the effectiveness of the picture feature vector. The invention also extracts the global and local features of the pedestrian with a neural network, designs middle-layer supervision for that network, and uses a multi-loss function to supervise the middle-layer feature information during extraction, which speeds network convergence and improves retrieval precision.
Disclosure of Invention
The invention mainly aims to provide a pedestrian re-identification method based on a pixel and channel attention mechanism so as to overcome the defects in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
extracting the global features of the pedestrian from the person's bounding box (search box);
dividing the pedestrian picture evenly into two parts and into three parts, and extracting the pedestrian's local features from each;
and matching the extracted person features against the person information in the Gallery to find the required person information.
Preferably, global and local features of the pedestrian are extracted based on the neural network, the extracted global features of the pedestrian include color and edge features, and the extracted local features of the pedestrian include color and edge features of different regions of the pedestrian in the vertical direction.
Preferably, in the process of extracting the local features of the pedestrian, the channel attention module and the pixel attention module are used for aggregating the character feature information, the extracted character features are character feature information obtained by feature aggregation through a neural network, and the character information in the Gallery is character feature information output after pictures in the Gallery are input into a trained model.
Preferably, the extracting global and local features of the pedestrian based on the neural network specifically includes:
a ResNet-50 network is used as the base network for extracting picture features, of which only the first three layers are used; the whole network is then divided into three branches: the first branch extracts the global features of the image, the second branch divides the feature tensor into two parts in the vertical direction, and the third branch divides it into three parts in the vertical direction; the channel attention module then aggregates the feature information and removes redundant channel information; maximum pooling reduces the dimensionality; finally, a 1 × 1 convolutional layer reduces the feature-vector dimension from 2048 to 256;
middle-layer supervision is attached after the first three layers of the ResNet network, i.e. layer1, layer2 and layer3, where pixel attention modules are used to reduce the value of background pixels and increase the value of person pixels.
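As an illustration of the pooling and 1 × 1-convolution step above, the following NumPy sketch (the function name, the 2 × 2 pooling window and the weight shapes are assumptions not fixed by the text) max-pools a feature tensor and maps 2048 channels down to 256:

```python
import numpy as np

def reduce_features(feat, weights):
    """Max-pool an (H, W, C) tensor with a 2x2 window, then apply a 1x1
    convolution that maps C = 2048 channels to 256 (weights: (2048, 256))."""
    h, wd, c = feat.shape
    # 2x2 max pooling: group pixels into 2x2 blocks and take the block maximum
    pooled = feat.reshape(h // 2, 2, wd // 2, 2, c).max(axis=(1, 3))
    # a 1x1 convolution is a per-pixel matrix multiply over the channel axis
    return pooled @ weights
```

A 1 × 1 convolution changes only the channel count, which is why it reduces to a matrix product here.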
Preferably, the channel attention module is implemented as follows:
let the size of the input tensor be H × W × C, denoted X = [x_1, x_2, …, x_C], wherein H represents the height of the image, W the width of the image, and C the channels;
the first step: reduce the dimension of each channel's feature information; the feature of channel c after dimension reduction is represented by F_c:

F_c = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} x_c(i, j)   (1)

wherein x_c(i, j) is the value at position (i, j) on channel c; the formula averages the tensor within each channel, which achieves feature aggregation;
the second step: filter each channel with a filter and delete redundant information:

ω_c = f_1(F_c)   (2)

wherein ω_c represents the weight given to each channel, F_c represents the tensor value of channel c, and f_1 represents the filtering operation;
the third step: perform the dimension-raising operation:

Z_c = f_2(ω_c)   (3)

wherein ω_c is the weight of each channel, Z_c is the final weight of each channel, and f_2 is the dimension-raising function, representing a convolution operation;
the fourth step: weight the source tensor:

X_result-c(i, j) = Z_c · x_c(i, j)   (4).
preferably, the pixel attention module is implemented as follows:
let the size of the input tensor be H × W × C, denoted Y = [y_1, y_2, …, y_C], wherein H represents the height of the image, W the width of the image, and C the channels;
the first step: compress the number of channels to 1 according to the following formula (5) for subsequent processing:

D(i, j) = (1/C) · Σ_{c=1}^{C} y_c(i, j)   (5)

the second step: rearrange the tensor values:

E_α = g_0(D), α = 3·j + i   (6)

the third step: perform screening:

{I_1, I_2, …, I_C} = g_1({η_1, η_2, …, η_α} · {E_1, E_2, …, E_α})   (7)
{J_1, J_2, …, J_α} = g_2({γ_1, γ_2, …, γ_N} · {I_1, I_2, …, I_N})   (8)

the fourth step: restore the obtained vector to the original map size:

K = g_4(J)   (9)

the fifth step: assign a weight to each pixel:

Y_result-c(i, j) = K(i, j) · Y(i, j)   (10).
compared with the prior art, the invention has the advantages that:
(1) features are extracted with the channel and pixel attention modules: before the maximum-pooling and average-pooling operations, the channel and pixel attention modules are applied to remove redundant information and improve the effectiveness of the picture feature vector; meanwhile, the invention extracts the global and local features of the pedestrian with a neural network, designs middle-layer supervision for that network, and uses a multi-loss function to supervise the middle-layer feature information during extraction, which speeds network convergence and improves retrieval precision;
(2) the invention provides an innovative pedestrian re-identification network based on a channel attention mechanism, a pixel attention mechanism and intermediate layer supervision. The network can effectively delete redundant information in the character bounding box, so that the character information is effectively aggregated, and the retrieval precision is obviously improved;
(3) the invention verifies the experimental effect on three data sets, Market1501, DukeMTMC-reID and CUHK03-NP; the results show that, compared with other methods, the proposed re-identification network improves markedly on both the CMC and mAP indexes, especially on the CUHK03-NP data set.
Drawings
FIG. 1 is a schematic diagram of a main workflow of pedestrian re-identification in an exemplary embodiment of the present invention;
FIG. 2 is a schematic diagram of a re-identification network structure including a channel and pixel attention mechanism in an exemplary embodiment of the invention;
FIG. 3 is a block diagram of a channel attention module in accordance with an exemplary embodiment of the present invention;
FIG. 4 is an attention map of a channel attention module in an exemplary embodiment of the invention;
FIG. 5 is a block diagram of a pixel attention module in accordance with an exemplary embodiment of the present invention;
FIG. 6 is an attention map of a pixel attention module in an exemplary embodiment of the invention;
FIG. 7 is a diagram illustrating the retrieval results on the data sets Market1501, DukeMTMC-reID and CUHK03-NP in accordance with an exemplary embodiment of the present invention.
Detailed Description
In view of the deficiencies in the prior art, the inventors of the present invention have made extensive studies and extensive practices to provide technical solutions of the present invention. The technical solution, its implementation and principles, etc. will be further explained as follows.
Referring to fig. 1, in fig. 1, CA represents a channel attention module, PA represents a pixel attention module, and a pedestrian re-identification method based on a pixel and channel attention mechanism includes:
firstly, extracting the global characteristics of the pedestrian according to a bounding box of the person;
then, the pedestrian picture is divided evenly into two parts and into three parts and the pedestrian's local features are extracted from each; in this process, the channel and pixel attention modules are used to aggregate the pedestrian feature information;
and then matching the extracted character features with the character information in the Gallery to find out the required character information.
The extracted global features of the pedestrian mainly include features such as color and edge, and the local features of the pedestrian refer to features such as color and edge of different areas of the pedestrian in the vertical direction.
The character information in the Gallery specifically refers to character feature information output after pictures in the Gallery are input into a trained model. The extracted human features refer to human feature information obtained by feature aggregation through a neural network.
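The matching step against the Gallery can be sketched as a nearest-neighbour search over the feature vectors; the patent does not fix a distance measure, so the cosine similarity and the function name below are illustrative assumptions:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery entries by cosine similarity to the query feature.

    query_feat:    (D,) feature vector of the probe person
    gallery_feats: (N, D) feature vectors of the gallery pictures
    Returns gallery indices, most similar first.
    """
    # L2-normalise so the dot product equals cosine similarity
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                 # one similarity score per gallery entry
    return np.argsort(-sims)     # descending similarity order
```

The first index returned corresponds to the gallery picture deemed most likely to show the queried person.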
The invention extracts global and local features with a neural network. Fig. 2 shows the structure of the re-identification network containing the channel and pixel attention mechanisms; in the overall structure it can be seen that, of the three branches of the main network, the upper branch extracts the person's global features, while the middle and lower branches extract the person's local features.
The specific details of the overall neural network are described below:
(1) Overall network structure, shown in fig. 2, where PA is the pixel attention model, CA is the channel attention model, Triplet_Loss is the triplet loss function, CrossEntropy Loss is the cross-entropy loss function, and Sum_Loss is the total loss function. The network uses ResNet-50 as the base network for extracting picture features. Unlike the base network, we use only the first three layers of ResNet-50, after which the whole network is divided into three branches: in the first branch we extract the global features of the image, the second branch divides the feature tensor into two parts in the vertical direction, and the third branch divides it into three parts in the vertical direction. We then use the channel attention module to aggregate the feature information and remove redundant channel information, use max pooling to reduce the dimensionality, and finally use a 1 × 1 convolutional layer to reduce the feature-vector dimension from 2048 to 256. As also shown in fig. 2, we add middle-layer supervision after layer1, layer2 and layer3, where we use the pixel attention module to reduce the value of background pixels and increase the value of person pixels. The dimensions of the network feature maps are given in Table 1.
Numbering  Module               Feature size  Dimension
1          Layer1               96×32         256
2          Layer2               48×16         512
3          Layer3               24×8          1024
4          Branch_Global        12×4          2048
5          Branch_Part1         24×8          2048
6          Branch_Part2         24×8          2048
7          Channel Attention-1  12×4          2048
8          Channel Attention-2  24×8          2048
9          Channel Attention-3  24×8          2048
10         Pixel Attention-1    96×32         256
11         Pixel Attention-2    48×16         512
12         Pixel Attention-3    24×8          1024

Table 1. Network feature-map information; the resolution of the input picture is set to 384 × 128.
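The three-branch partition of the feature tensor can be sketched as follows (the helper name is an assumption; the shapes follow Table 1, with the height assumed divisible by 6):

```python
import numpy as np

def three_branches(feat):
    """Split an (H, W, C) feature tensor as in Fig. 2: kept whole for the
    global branch, halved vertically for the two-part branch, and cut into
    thirds for the three-part branch."""
    h = feat.shape[0]
    global_branch = feat
    two_parts = [feat[: h // 2], feat[h // 2 :]]
    three_parts = [feat[i * h // 3 : (i + 1) * h // 3] for i in range(3)]
    return global_branch, two_parts, three_parts
```

Each returned part is then processed by its own attention and pooling stage before classification.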
(2) The channel attention module is structured as shown in fig. 3.
Before this work, CNN-based networks gave every channel of a tensor the same weight. Equal weights do not match reality: they make it impossible to delete redundant channel information, so noise enters the final feature vector and degrades the retrieval result. The key to a channel attention mechanism is how to give each channel its own weight; fig. 3 shows the channel attention model structure we designed.
As shown in fig. 3, AvgPool2d is an adaptive pooling layer and Conv2d is a convolutional layer. Let the size of the input tensor be H × W × C, denoted X = [x_1, x_2, …, x_C]. In the first step we reduce the dimension of each channel's feature information; the feature of channel c after dimension reduction is represented by F_c:

F_c = (1/(H·W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} x_c(i, j)   (1)

where x_c(i, j) is the value at location (i, j) on channel c. The formula averages the tensor within each channel, achieving feature aggregation.
Then the filter is applied to each channel and redundant information is deleted:

ω_c = f_1(F_c)   (2)

In formula (2), ω_c represents the weight given to each channel, F_c represents the tensor value of channel c, and f_1 represents the filtering operation.
Then the dimension-raising operation is performed:

Z_c = f_2(ω_c)   (3)

In formula (3), ω_c is the weight of each channel, Z_c is the final weight of each channel, and f_2 is the dimension-raising function, shown as a convolution operation in the structure diagram. Finally, the source tensor is weighted:

X_result-c(i, j) = Z_c · x_c(i, j)   (4)
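The four steps of the channel attention module can be sketched in NumPy as a squeeze-and-excitation-style gate; the ReLU and sigmoid non-linearities and the weight shapes are assumptions, since the text only names f1 as a filtering operation and f2 as a dimension-raising convolution:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Channel attention sketch for an (H, W, C) tensor.

    w1: (C, C_r) weights for the filtering / reduction step f1 (assumed)
    w2: (C_r, C) weights for the dimension-raising step f2 (assumed)
    """
    # Step 1, Eq. (1): average each channel to a single value F_c
    f = x.mean(axis=(0, 1))                      # shape (C,)
    # Step 2, Eq. (2): filter channels; ReLU non-linearity assumed
    omega = np.maximum(f @ w1, 0.0)              # shape (C_r,)
    # Step 3, Eq. (3): raise back to C dims; sigmoid keeps weights in (0, 1)
    z = 1.0 / (1.0 + np.exp(-(omega @ w2)))      # shape (C,)
    # Step 4, Eq. (4): re-weight every position of the source tensor
    return x * z                                  # broadcast over H and W
```

Because every output channel is the source channel scaled by its weight Z_c, background-dominated channels can be attenuated toward zero.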
Fig. 4 shows attention maps of the channel attention module, where "Input image" is the model's input image; as the overall network structure diagram shows, we use the channel attention module in the upper, middle and lower branches of the main network. "No-CA1" is the attention feature map without the channel attention model, and "CA1" is the attention feature map after adding it.
The six images on the right of fig. 4 show the model's feature-aggregation effect after the attention module is used; the highlighted regions mark features with an important influence on the retrieval result. We can see that after using CA (the channel attention module), the neural network effectively deletes the background information, the person features are strengthened, and the retrieval result benefits.
(3) The pixel attention module is shown in fig. 5.
In the present invention, we apply the pixel attention module to the middle supervision branches. As with channel attention, let the size of the input tensor be H × W × C, denoted Y = [y_1, y_2, …, y_C]. The specific operation of the first step, which compresses the number of channels to 1 for subsequent processing, is:

D(i, j) = (1/C) · Σ_{c=1}^{C} y_c(i, j)   (5)

The tensor values are then rearranged, as shown in fig. 5:

E_α = g_0(D), α = 3·j + i   (6)

We then screen them, similarly to channel attention:

{I_1, I_2, …, I_C} = g_1({η_1, η_2, …, η_α} · {E_1, E_2, …, E_α})   (7)
{J_1, J_2, …, J_α} = g_2({γ_1, γ_2, …, γ_N} · {I_1, I_2, …, I_N})   (8)

The resulting vector is then restored to the original map size:

K = g_4(J)   (9)

Finally, we assign a weight to each pixel:

Y_result-c(i, j) = K(i, j) · Y(i, j)   (10)
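A minimal sketch of the pixel attention pipeline of Eqs. (5) to (10); the text leaves g0 to g4 and the weights η, γ unspecified, so here the channel compression is a mean, g0/g4 are flatten/reshape, g1/g2 are scalar gatings, and a sigmoid (an added assumption) keeps each pixel weight in (0, 1):

```python
import numpy as np

def pixel_attention(y, eta=1.0, gamma=1.0):
    """Pixel attention sketch for an (H, W, C) tensor; g0..g4 assumed."""
    h, w, c = y.shape
    d = y.mean(axis=2)               # Eq. (5): compress C channels to 1
    e = d.reshape(-1)                # Eq. (6): rearrange the map into a vector
    i_vec = eta * e                  # Eq. (7): screening step g1 (scalar gate)
    j_vec = gamma * i_vec            # Eq. (8): screening step g2 (scalar gate)
    k = j_vec.reshape(h, w)          # Eq. (9): restore the original map size
    k = 1.0 / (1.0 + np.exp(-k))     # squash to (0, 1) (assumption)
    return y * k[:, :, None]         # Eq. (10): weight every pixel
```

Pixels whose weight K(i, j) is small, typically background, contribute less to the supervised middle-layer features.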
Fig. 6 shows attention maps of the pixel attention module. As with channel attention, we use the pixel attention module in the three branches after layer1, layer2 and layer3. "No-PA1" is the attention feature map without the pixel attention model, and "PA1" is the attention feature map after adding it. As is apparent from fig. 6, after the pixel attention module is used, the environmental information is effectively suppressed and the person's feature information is further strengthened, improving the retrieval result.
The invention provides an innovative pedestrian re-identification network based on a channel attention mechanism, a pixel attention mechanism and intermediate layer supervision. The network can effectively delete redundant information in the character bounding box, so that the character information is effectively aggregated, and the retrieval precision is obviously improved.
(4) Technical effects of the invention
The invention verifies the experimental effect mainly on three data sets: Market1501, DukeMTMC-reID and CUHK03-NP. Tables 2-4 give the comparison results on Market1501, DukeMTMC-reID and CUHK03-NP respectively, where RK stands for the re-ranking algorithm.
Table 2. Comparison of results on the data set Market1501; RK stands for the re-ranking algorithm (table provided as an image in the original).
Table 3. Comparison of results on the data set DukeMTMC-reID; RK stands for the re-ranking algorithm (table provided as an image in the original).
Table 4. Comparison of results on the data set CUHK03-NP; RK stands for the re-ranking algorithm (table provided as an image in the original).
Tables 2-4 show that, compared with other methods, the re-identification network of the invention improves markedly on both the CMC and mAP indexes, especially on the CUHK03-NP data set, where the accuracy reaches rank1/mAP = 80.9/78.7 on CUHK03-labeled and rank1/mAP = 78.9/76.4 on CUHK03-detected, far exceeding other re-ID methods.
Table 5 gives the ablation results, testing the backbone, backbone + CA, and backbone + CA + PA network structures on the DukeMTMC-reID and CUHK03 data sets; it can be seen that the CA and PA modules provided by the invention clearly improve the retrieval effect of the original neural network.
Table 5. Ablation test results (table provided as an image in the original).
FIG. 7 shows the retrieval results obtained with the invention on the data sets Market1501, DukeMTMC-reID and CUHK03-NP.
It should be understood that the above-mentioned embodiments are merely illustrative of the technical concepts and features of the present invention, which are intended to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and therefore, the protection scope of the present invention is not limited thereby. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (6)

1. A pedestrian re-identification method based on a pixel and channel attention mechanism is characterized by comprising the following steps:
extracting the global features of the pedestrian according to the person's search box;
dividing the pedestrian picture evenly into two parts and into three parts, and extracting the pedestrian's local features from each;
and matching the extracted person features against the person information in the Gallery to find the required person information.
2. The method of claim 1 for pedestrian re-identification based on a pixel and channel attention mechanism, comprising: the method comprises the steps of extracting global and local features of the pedestrian based on a neural network, wherein the extracted global features of the pedestrian comprise color and edge features, and the extracted local features of the pedestrian comprise color and edge features of different areas of the pedestrian in the vertical direction.
3. The method of claim 2, comprising: the method comprises the steps that a channel attention module and a pixel attention module are used for aggregating character feature information in the process of extracting the local features of pedestrians, the extracted character features are character feature information obtained by feature aggregation through a neural network, and the character information in the Gallery is the character feature information output after pictures in the Gallery are input into a trained model.
4. The pedestrian re-identification method based on the pixel and channel attention mechanism according to claim 3, wherein the extracting global and local features of the pedestrian based on the neural network specifically comprises:
using a ResNet-50 network as a basic network to extract picture characteristics, and using the first three layers of the ResNet-50 network; then dividing the whole network into three branches, extracting global features of the image in the first branch, dividing the feature tensor into two parts in the vertical direction by the second branch, and dividing the feature tensor into three parts in the vertical direction by the third branch; then, the channel attention module is used for aggregating the characteristic information and deleting redundant channel information; then using maximal pooling to reduce dimensions; finally, using 1 × 1 convolutional layer, reducing the dimension of the feature vector from 2048 to 256;
the first three layers of the ResNet network, i.e., layer1, layer2, and layer3, are followed by middle layer supervision, in which pixel attention modules are used to reduce the value of background pixels and increase the value of human pixels.
5. The pedestrian re-identification method based on the pixel and channel attention mechanism as claimed in claim 3, wherein the channel attention module is implemented as follows:
let the size of the input tensor be H × W × C, and be denoted as X ═ X1,x2,…,xc]Wherein H represents the height of the image, W represents the width of the image, and C represents the channel;
the first step is as follows: reducing the dimension of the characteristic information of each channel, and taking the characteristic of each channel after dimension reduction as FcTo carry out the presentation of the contents,
Figure FDA0002030658020000021
wherein x isc(i, j) is the value at position (i, j) on channel c, and the formula averages tensor in each channel, so that the characteristic aggregation effect can be achieved;
the second step is that: filtering each channel by using a filter, and deleting redundant information;
Figure FDA0002030658020000022
wherein, ω iscRepresents the weight given to each channel, FcRepresents the tensor value of the c channel, f1Represents a filtering operation;
the third step: perform the dimension-raising operation;

Z_c = f_2(ω_c)   (3)

where ω_c is the weight of each channel, Z_c is the final weight of each channel, and f_2 is the dimension-raising function, realized as a convolution operation;
the fourth step: weight the source tensor;

X_result-c(i, j) = Z_c · x_c(i, j)   (4).
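The four steps of the channel attention module can be sketched numerically. Here f_1 and f_2 are taken to be small linear maps followed by a ReLU and a sigmoid gate (SE-style); these nonlinearities and the weight shapes are assumptions, since the claim does not fix them:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """x: (C, H, W) input tensor; w1: (C//r, C) channel-reducing map (f1);
    w2: (C, C//r) dimension-raising map (f2)."""
    f = x.mean(axis=(1, 2))                   # Eq. (1): per-channel aggregation F_c
    omega = np.maximum(w1 @ f, 0.0)           # Eq. (2): filtering, ReLU assumed
    z = 1.0 / (1.0 + np.exp(-(w2 @ omega)))   # Eq. (3): raise dimension, sigmoid gate assumed
    return x * z[:, None, None]               # Eq. (4): weight the source tensor
```

With all-zero weights the gate is sigmoid(0) = 0.5 for every channel, so the output is the input scaled by one half, which makes the re-weighting step easy to verify.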
6. The method of claim 3, wherein the pixel attention module is implemented as follows:
let the size of the input tensor be H × W × C, denoted Y = [y_1, y_2, …, y_C], where H is the image height, W is the image width, and C is the number of channels;
the first step: compress the number of channels to 1 according to the following formula (5) for subsequent processing;

D(i, j) = (1/C) · Σ_{c=1}^{C} y_c(i, j)   (5)
the second step: rearrange the tensor values into a vector;

E_α = g_0(D), α = 3·j + i   (6)
the third step: perform screening;

{I_1, I_2, …, I_N} = g_1({η_1, η_2, …, η_α} · {E_1, E_2, …, E_α})   (7)

{J_1, J_2, …, J_α} = g_2({γ_1, γ_2, …, γ_N} · {I_1, I_2, …, I_N})   (8)
the fourth step: restore the obtained vector to the original map size, where the map size is the size of the feature map;

K = g_4(J)   (9)
the fifth step: assign a weight to each pixel;

Y_result-c(i, j) = K(i, j) · y_c(i, j)   (10).
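A numerical sketch of the five steps, with assumed choices for the maps the claim leaves unspecified: channel-mean for the compression, flattening for g_0, elementwise η/γ weighting with a ReLU (g_1) and a sigmoid (g_2), and a reshape for g_4:

```python
import numpy as np

def pixel_attention(y, eta, gamma):
    """y: (C, H, W) input tensor; eta, gamma: (H*W,) per-position weights."""
    c, h, w = y.shape
    d = y.mean(axis=0)                              # Eq. (5): compress channels to 1 (mean assumed)
    e = d.reshape(-1)                               # Eq. (6): rearrange into a vector
    i_vec = np.maximum(eta * e, 0.0)                # Eq. (7): first screening, ReLU assumed
    j_vec = 1.0 / (1.0 + np.exp(-gamma * i_vec))    # Eq. (8): second screening, sigmoid assumed
    k = j_vec.reshape(h, w)                         # Eq. (9): restore to the feature-map size
    return y * k[None, :, :]                        # Eq. (10): weight each pixel
```

With η = 0 the sigmoid gate outputs 0.5 at every position, so the map scales every pixel uniformly; a trained η/γ would instead suppress background pixels and boost pedestrian pixels, as the claim intends.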
CN201910310802.5A 2019-04-17 2019-04-17 Pedestrian re-identification method based on pixel and channel attention mechanism Active CN111832348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910310802.5A CN111832348B (en) 2019-04-17 2019-04-17 Pedestrian re-identification method based on pixel and channel attention mechanism


Publications (2)

Publication Number Publication Date
CN111832348A true CN111832348A (en) 2020-10-27
CN111832348B CN111832348B (en) 2022-05-06

Family

ID=72914987



Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832672A * 2017-10-12 2018-03-23 Beihang University Pedestrian re-identification method using pose information to design multiple loss functions
CN108510012A * 2018-05-04 2018-09-07 Sichuan University Fast target detection method based on multi-scale feature maps


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIANSHENG GUO et al.: "Deep Network with Spatial and Channel Attention for Person Re-identification", 2018 IEEE Visual Communications and Image Processing *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836646A (en) * 2021-02-05 2021-05-25 华南理工大学 Video pedestrian re-identification method based on channel attention mechanism and application
CN112836646B (en) * 2021-02-05 2023-04-28 华南理工大学 Video pedestrian re-identification method based on channel attention mechanism and application
CN112884680A (en) * 2021-03-26 2021-06-01 南通大学 Single image defogging method using end-to-end neural network


Similar Documents

Publication Publication Date Title
CN109829443B (en) Video behavior identification method based on image enhancement and 3D convolution neural network
CN110348376B (en) Pedestrian real-time detection method based on neural network
CN106682108B (en) Video retrieval method based on multi-mode convolutional neural network
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN108304808A Surveillance video object detection method based on spatio-temporal information and deep networks
CN110378849B (en) Image defogging and rain removing method based on depth residual error network
CN107330390B (en) People counting method based on image analysis and deep learning
CN109685045B (en) Moving target video tracking method and system
CN109886159B (en) Face detection method under non-limited condition
CN111709331B (en) Pedestrian re-recognition method based on multi-granularity information interaction model
CN111832348B (en) Pedestrian re-identification method based on pixel and channel attention mechanism
Ma et al. Image-based air pollution estimation using hybrid convolutional neural network
TW201308254A Motion detection method for complex scenes
CN113792606A (en) Low-cost self-supervision pedestrian re-identification model construction method based on multi-target tracking
CN114627269A Virtual reality security monitoring platform based on deep-learning target detection
CN110866453B (en) Real-time crowd steady state identification method and device based on convolutional neural network
CN112164010A (en) Multi-scale fusion convolution neural network image defogging method
CN111507416A (en) Smoking behavior real-time detection method based on deep learning
CN115171183A (en) Mask face detection method based on improved yolov5
CN105701515A (en) Face super-resolution processing method and system based on double-layer manifold constraint
CN110751667A (en) Method for detecting infrared dim small target under complex background based on human visual system
CN105930789A (en) Human body behavior recognition based on logarithmic Euclidean space BOW (bag of words) model
CN117710888A (en) Method and system for re-identifying blocked pedestrians
CN108597172A Forest fire recognition method, device, electronic equipment and storage medium
CN108764287A Object detection method and system based on deep learning and grouped convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant