CN113792669B - Pedestrian re-recognition baseline method based on hierarchical self-attention network - Google Patents

Pedestrian re-recognition baseline method based on hierarchical self-attention network

Info

Publication number
CN113792669B
CN113792669B (application CN202111087471.7A)
Authority
CN
China
Prior art keywords
image
pedestrian
swin
block
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111087471.7A
Other languages
Chinese (zh)
Other versions
CN113792669A (en)
Inventor
陈炳才
张繁盛
聂冰洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202111087471.7A priority Critical patent/CN113792669B/en
Publication of CN113792669A publication Critical patent/CN113792669A/en
Application granted granted Critical
Publication of CN113792669B publication Critical patent/CN113792669B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network and belongs to the field of computer vision. The invention introduces the Swin Transformer into the pedestrian re-identification field as the backbone network and uses the weighted sum of the ID loss and the Circle loss as the loss function; through effective data preprocessing and reasonable parameter tuning, the feature extraction capability is improved while the structure remains simple. Compared with the traditional ResNet-based baseline method, the pedestrian re-identification method provided by the invention significantly improves re-identification performance.

Description

Pedestrian re-recognition baseline method based on hierarchical self-attention network
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a pedestrian re-identification baseline method based on a hierarchical self-attention network.
Background
Pedestrian re-identification uses computer vision technology to identify a specific pedestrian in a cross-camera environment: given a surveillance image of a pedestrian, images of the same pedestrian captured by other devices are retrieved. Identifying a specific pedestrian is of great significance for violation judgment, criminal investigation, danger early warning and the like.
A good baseline method should achieve strong performance while keeping the number of parameters low. Existing pedestrian re-identification baseline methods are based on ResNet; limited by the shortcomings of convolutional neural networks in feature extraction, these ResNet-based baselines cannot achieve ideal results.
As research has progressed, the Transformer has been increasingly applied in the field of computer vision. However, existing Transformer-based pedestrian re-identification methods suffer from problems such as excessive computational cost and a single-scale feature receptive field.
Disclosure of Invention
The invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network, aiming to solve the problems described in the background art and to achieve good performance with a simple structure.
The technical scheme of the invention is as follows:
a pedestrian re-identification baseline method based on a hierarchical self-attention network comprises the following specific steps:
Step one, data preprocessing;
Suppose there are N different pedestrians in total, where the i-th pedestrian has M_i images, M_i > 1, M_i denotes the number of images in the class of the i-th pedestrian, and i denotes the ID number of each pedestrian; for the i-th pedestrian, M_i - 1 images are used as the training set, 1 image is used as the verification set, and i is used as the label indicating that the image corresponds to the i-th pedestrian;
1.1) Using a bicubic interpolation algorithm, scale the image to (H, W, C) as the input image, where H denotes the height of the image, W the width, and C the number of channels, with C = 3; the method comprises the following steps:
1.1.1) Construct the Bicubic function:
Where a is a coefficient parameter used to control the shape of the Bicubic curve;
1.1.2) The interpolation formula is as follows:
Where (x, y) denotes the pixel to be interpolated; for each pixel, the 4×4 neighbouring pixels are used to perform the bicubic interpolation operation.
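For illustration, the following minimal Python sketch reproduces the standard bicubic convolution kernel (the form commonly written with shape parameter a, which equation (1) is understood to denote) and the 4×4-neighbourhood weighting of step 1.1.2). The helper names and the boundary handling are assumptions made for this sketch, not part of the claimed method.

```python
import numpy as np

def bicubic_kernel(x, a=-0.5):
    """Standard bicubic convolution kernel W(x); a controls the curve shape."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    elif x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def bicubic_resize(img, out_h, out_w, a=-0.5):
    """Resize an (H, W, C) image: each output pixel is a weighted sum of the
    4x4 nearest input pixels, weighted by the separable bicubic kernel."""
    in_h, in_w, c = img.shape
    out = np.zeros((out_h, out_w, c), dtype=np.float64)
    for oy in range(out_h):
        for ox in range(out_w):
            # Map the output pixel back to (possibly fractional) input coordinates.
            sy = (oy + 0.5) * in_h / out_h - 0.5
            sx = (ox + 0.5) * in_w / out_w - 0.5
            iy, ix = int(np.floor(sy)), int(np.floor(sx))
            acc, norm = np.zeros(c), 0.0
            for dy in range(-1, 3):          # 4 rows of the neighbourhood
                for dx in range(-1, 3):      # 4 columns of the neighbourhood
                    py = min(max(iy + dy, 0), in_h - 1)   # clamp at image borders
                    px = min(max(ix + dx, 0), in_w - 1)
                    w = bicubic_kernel(sy - (iy + dy), a) * bicubic_kernel(sx - (ix + dx), a)
                    acc += w * img[py, px]
                    norm += w
            out[oy, ox] = acc / norm if norm != 0 else acc
    return out
```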
1.2) Data enhancement using a random erasing algorithm;
1.2.1) Set a threshold probability p and generate a random number p1 in [0, 1]; when p1 > p, the input image is left unchanged, otherwise erasing is performed:
p1=Rand(0,1) (3)
1.2.2) Determine the erasing region;
H_e = Rand(H/8, H/4) (4)
W_e = Rand(W/8, W/4) (5)
S_e = H_e × W_e (6)
Where H denotes the height of the input image and W its width; H_e denotes the height of the erased region, W_e its width, and S_e its area;
1.2.3) Determine the erasing coordinates;
x_e = Rand(0, H - H_e) (7)
y_e = Rand(0, W - W_e) (8)
Where x_e and y_e denote the x- and y-coordinates of the upper-left corner of the erased region.
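As a concrete reading of steps 1.2.1)-1.2.3), the following sketch applies random erasing to an array of shape (H, W, C); the choice of fill value (0) and of NumPy's random generator are illustrative assumptions.

```python
import numpy as np

def random_erase(img, p=0.5, rng=None):
    """Random-erasing augmentation following steps 1.2.1)-1.2.3).
    img: array of shape (H, W, C); returns the (possibly erased) image."""
    rng = np.random.default_rng() if rng is None else rng
    H, W, _ = img.shape
    p1 = rng.uniform(0.0, 1.0)                   # equation (3): p1 = Rand(0, 1)
    if p1 > p:                                   # keep the image unchanged
        return img
    He = int(rng.integers(H // 8, H // 4 + 1))   # equation (4): He = Rand(H/8, H/4)
    We = int(rng.integers(W // 8, W // 4 + 1))   # equation (5): We = Rand(W/8, W/4)
    xe = int(rng.integers(0, H - He + 1))        # equation (7): xe = Rand(0, H - He)
    ye = int(rng.integers(0, W - We + 1))        # equation (8): ye = Rand(0, W - We)
    out = img.copy()
    out[xe:xe + He, ye:ye + We, :] = 0           # erase the Se = He x We region (6)
    return out
```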
Step two, input the preprocessed image into the hierarchical self-attention network, i.e., the Swin Transformer neural network, and perform forward propagation;
The backbone network comprises 4 processing stages, where stages 2-4 share an identical network structure; the specific steps are as follows:
2.1) Stage 1;
2.1.1) Block segmentation: starting from the upper-left corner of the image, the input image is divided into a set of non-overlapping image blocks, each of size 4 × 4, so the image is divided into image blocks of size (4, 4, 3), where the number of image blocks N_patch is:
N_patch = (H/4) × (W/4) (9)
2.1.2) Linear embedding: each image block is flattened into a vector of dimension C through a fully connected layer, and the image blocks are fed into two consecutive Swin blocks;
2.1.3) Swin block feature extraction;
The Swin blocks comprise Swin block 1 and Swin block 2; the main structure of Swin block 1 is a window-based multi-head self-attention module followed by a multi-layer perceptron, with layer normalization applied before each of the two modules and a residual connection added after each; the main structure of Swin block 2 is a shifted-window multi-head self-attention module followed by a multi-layer perceptron, again with layer normalization before each module and a residual connection after each;
After Swin block feature extraction, key feature information such as the pedestrian's head, hands and actions is obtained, and a feature set of size (H/4, W/4, C) is output;
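The block structure described above (layer normalization before each module, window or shifted-window multi-head self-attention, multi-layer perceptron, residual connection after each module) can be summarized in PyTorch as below. The `window_attn` module is assumed to implement (shifted-)window multi-head self-attention as in the published Swin Transformer; this is a structural sketch, not the reference implementation of the invention.

```python
import torch.nn as nn

class SwinBlock(nn.Module):
    """One Swin block: LN -> (shifted-)window MSA -> residual, then LN -> MLP -> residual.
    `window_attn` is an externally supplied module assumed to implement window MSA
    (Swin block 1) or shifted-window MSA (Swin block 2); it is not defined here."""
    def __init__(self, dim, window_attn, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)          # layer normalization before attention
        self.attn = window_attn                 # W-MSA (block 1) or SW-MSA (block 2)
        self.norm2 = nn.LayerNorm(dim)          # layer normalization before the MLP
        self.mlp = nn.Sequential(               # multi-layer perceptron
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):                       # x: (num_tokens, dim) per image
        x = x + self.attn(self.norm1(x))        # residual connection after attention
        x = x + self.mlp(self.norm2(x))         # residual connection after the MLP
        return x
```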
2.2) Stage 2;
2.2.1) Block fusion: the input features are merged pairwise, a fully connected layer adjusts the feature dimension to twice the original, and a feature set of size (H/8, W/8, 2C) is output;
2.2.2) Swin block feature extraction: the structure is identical to that of the Swin blocks in 2.1.3); after Swin block processing, a key feature set of size (H/8, W/8, 2C) is output;
2.3) Stages 3-4;
The network structures of stage 3 and stage 4 are identical to that of stage 2; after processing, feature sets of sizes (H/16, W/16, 4C) and (H/32, W/32, 8C) are output, respectively;
2.4) Global average pooling layer and fully connected layer: global average pooling is applied to the feature set output by stage 4 to obtain a vector of length 8C, and a fully connected layer maps the feature to N classes, where N is the number of pedestrian classes in the data set of step one.
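To make the tensor shapes of steps 2.1)-2.4) explicit, the following sketch traces an (H, W, 3) input through the four stages, global average pooling and the final fully connected layer; only the shapes are computed, and the Swin computations themselves are abstracted away. The embedding dimension C is a free parameter.

```python
def backbone_shapes(H, W, C, num_classes):
    """Trace the feature-map sizes produced by the hierarchical backbone of step two
    (shapes only, no actual computation)."""
    shapes = {}
    shapes["stage1"] = (H // 4,  W // 4,  C)      # patch split + linear embedding + 2 Swin blocks
    shapes["stage2"] = (H // 8,  W // 8,  2 * C)  # block fusion doubles the channel dimension
    shapes["stage3"] = (H // 16, W // 16, 4 * C)
    shapes["stage4"] = (H // 32, W // 32, 8 * C)
    shapes["pooled"] = (8 * C,)                   # global average pooling over the spatial grid
    shapes["logits"] = (num_classes,)             # fully connected layer maps to N pedestrian classes
    return shapes

# For the embodiment described later (224 x 224 input, C = 128, 751 identities) this gives
# stage1 (56, 56, 128), stage2 (28, 28, 256), stage3 (14, 14, 512),
# stage4 (7, 7, 1024), pooled (1024,), logits (751,).
print(backbone_shapes(224, 224, 128, 751))
```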
Step three, compute the loss function, back-propagate, and update the network parameters;
3.1) The loss function consists of two parts, the ID loss and the Circle loss, formulated as follows:
L_reid = w_1 L_id + w_2 L_circle (10)
Where w_1 and w_2 denote the weights of the ID loss and the Circle loss, respectively; L_reid denotes the total loss function, L_id the ID loss, and L_circle the Circle loss;
3.2) The ID loss formula is as follows:
Where n denotes the number of samples in each training batch and p(y_i|x_i) denotes the conditional probability that the input image x_i is assigned the label y_i;
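Formula (11) is present only as an image in the original document; under the common assumption that the ID loss is the cross-entropy of the identity classifier's softmax output, it can be sketched as:

```python
import torch
import torch.nn.functional as F

def id_loss(logits, labels):
    """ID loss, assumed here to be the cross-entropy
    L_id = -(1/n) * sum_i log p(y_i | x_i), with p the softmax of the logits."""
    log_probs = F.log_softmax(logits, dim=1)                  # log p(. | x_i) per sample
    return -log_probs.gather(1, labels.unsqueeze(1)).mean()   # average over the n batch samples
```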
3.3) The Circle loss formula is as follows:
Δ_n = m (13)
Δ_m = 1 - m (14)
Where N denotes the number of distinct pedestrian classes and M_i the number of images in the class of the i-th pedestrian; γ is a scale parameter; m controls the strictness of the optimization; S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix; a_n and a_p are non-negative matrices, the weight matrices of S_n and S_p respectively, formulated as follows:
Where S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix;
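Formulas (12), (15) and (16) are likewise only present as images; assuming the standard Circle loss formulation from the literature (weights a_p = [1 + m - s_p]_+ and a_n = [s_n + m]_+, margins Δ_n = m and Δ_p = 1 - m), the Circle loss and the weighted total loss of equation (10) can be sketched as follows, reusing `id_loss` from the sketch above:

```python
import torch
import torch.nn.functional as F

def circle_loss(sp, sn, m=0.25, gamma=32.0):
    """Circle loss over intra-class similarities sp and inter-class similarities sn
    (1-D tensors), in the standard formulation assumed here."""
    ap = torch.clamp_min(1.0 + m - sp, 0.0)   # assumed weight a_p (formula (15) not reproduced)
    an = torch.clamp_min(sn + m, 0.0)         # assumed weight a_n (formula (16) not reproduced)
    delta_p, delta_n = 1.0 - m, m             # margins, equations (14) and (13)
    logit_p = -gamma * ap * (sp - delta_p)
    logit_n = gamma * an * (sn - delta_n)
    # log(1 + sum_n exp(.) * sum_p exp(.)) written with softplus/logsumexp for numerical stability
    return F.softplus(torch.logsumexp(logit_n, dim=0) + torch.logsumexp(logit_p, dim=0))

def total_loss(logits, labels, sp, sn, w1=0.4, w2=0.6):
    """Equation (10): weighted sum of the two losses; w1 = 0.4, w2 = 0.6 follow the embodiment."""
    return w1 * id_loss(logits, labels) + w2 * circle_loss(sp, sn)
```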
3.4) Set the hyperparameters and train the network. A warm-up learning rate is adopted: the learning rate is initialized to r and is gradually increased to ten times r over the first 10 training steps; the optimizer is an optimized stochastic gradient descent algorithm with weight decay d_1 and momentum d_2; using the configured optimizer and learning rate, and combining the loss values computed in 3.1) to 3.3), back-propagation is performed and the network parameters are updated.
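A sketch of the optimizer and warm-up schedule of 3.4) in PyTorch; the values of r, d_1 and d_2 are left as placeholders (the embodiment's Table 1 is not reproduced here), and the linear ramp to ten times the initial rate over the first 10 steps is one straightforward reading of the text.

```python
import torch

def build_optimizer(model, r, d1, d2):
    """Optimized SGD as described in 3.4): learning rate r, weight decay d1, momentum d2
    (concrete values are placeholders; see the embodiment's Table 1)."""
    return torch.optim.SGD(model.parameters(), lr=r, weight_decay=d1, momentum=d2)

def warmup_lr(step, r, warmup_steps=10, factor=10.0):
    """Warm-up: grow the learning rate linearly from r to factor * r over the first
    `warmup_steps` training steps; any later decay schedule is omitted here."""
    if step < warmup_steps:
        return r * (1.0 + (factor - 1.0) * step / warmup_steps)
    return factor * r

# Usage sketch: before each training step,
#   for group in optimizer.param_groups:
#       group["lr"] = warmup_lr(global_step, r)
```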
Step four, pedestrian re-identification matching is carried out;
The pedestrian image to be identified is scaled and input into the Swin Transformer neural network of step two to obtain the output; softmax is then applied to obtain N probability values, corresponding to the probabilities that the pedestrian belongs to the different classes, and the class with the largest probability is taken as the pedestrian's identity.
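Step four can be read as the following inference sketch: scale the query image, run the backbone of step two, apply softmax, and take the class with the largest probability as the pedestrian's identity. The `bicubic_resize` helper and the model interface are the assumed ones from the earlier sketches, and no preprocessing beyond scaling is shown.

```python
import numpy as np
import torch

def identify(model, image, size=(224, 224)):
    """Return the predicted identity index and its probability for one query image.
    `image` is an (H, W, 3) array; `model` maps a (1, 3, H, W) tensor to (1, N) logits."""
    img = bicubic_resize(image.astype(np.float64), *size)             # scale as in step one
    x = torch.from_numpy(img).float().permute(2, 0, 1).unsqueeze(0)   # (1, 3, H, W)
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]                     # N class probabilities
    idx = int(torch.argmax(probs))                                    # class with largest probability
    return idx, float(probs[idx])
```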
The invention has the beneficial effects that: the invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network, introduces the Swin Transformer into the pedestrian re-identification field as the backbone network, takes the weighted sum of the ID loss and the Circle loss as the loss function, and, through effective data preprocessing and reasonable parameter tuning, greatly improves the training effect while keeping the structure simple.
Drawings
FIG. 1 is a diagram of the overall improved concept of the present invention;
FIG. 2 is a model diagram of a pedestrian re-recognition baseline method based on a hierarchical self-attention network of the present invention;
Fig. 3 is a schematic diagram of the structure of the Swin block.
Detailed Description
The following describes an embodiment of the present invention in detail with reference to the accompanying drawings; the embodiment is implemented on the premise of the technical solution of the invention and provides a detailed implementation and specific operating procedure. The data set used in the experiment is the Market-1501 data set collected at a university; the training set contains 751 identities with 12936 images, and the test set contains 750 identities with 19732 images.
Fig. 1 is an overall improved idea diagram of the present invention, and fig. 2 is a model diagram of a pedestrian re-recognition baseline method based on a hierarchical self-attention network, where specific steps in this embodiment are as follows:
Step one, data preprocessing;
The training set contains 751 pedestrians in total, and each pedestrian has M_i images, where M_i > 1, M_i denotes the number of images in the class of the i-th pedestrian, and i denotes the ID number of each pedestrian; for the i-th pedestrian, M_i - 1 images are used as the training set, 1 image is used as the verification set, and i is used as the label indicating that the image corresponds to the i-th pedestrian;
1.1) Using a bicubic interpolation algorithm, scale the image to (224, 224, 3), where H denotes the height of the image, W the width, and C the number of channels, as follows:
1.1.1) Construct the Bicubic function:
Where a = -0.5 is the coefficient parameter controlling the shape of the Bicubic curve;
1.1.2) The interpolation formula is as follows:
Where (x, y) denotes the pixel to be interpolated; for each pixel, the 4×4 neighbouring pixels are used to perform the bicubic interpolation operation.
1.2) Data enhancement using a random erasing algorithm;
1.2.1) Set the threshold probability p = 0.5 and generate a random number p1 in [0, 1]; when p1 > p, the image is left unchanged, otherwise erasing is performed:
p1=Rand(0,1) (3)
1.2.2) Determine the erasing region;
H_e = Rand(H/8, H/4) (4)
W_e = Rand(W/8, W/4) (5)
S_e = H_e × W_e (6)
Where H denotes the height of the input image and W its width; H_e denotes the height of the erased region, W_e its width, and S_e its area;
1.2.3) Determine the erasing coordinates;
x_e = Rand(0, H - H_e) (7)
y_e = Rand(0, W - W_e) (8)
Where x_e and y_e denote the x- and y-coordinates of the upper-left corner of the erased region.
Step two, input the preprocessed image into the hierarchical self-attention network, i.e., the Swin Transformer neural network, and perform forward propagation;
The backbone network comprises 4 processing stages, where stages 2-4 share an identical network structure; the specific steps are as follows:
2.1) Stage 1;
2.1.1) Block segmentation: starting from the upper-left corner of the image, the input image is divided into a set of non-overlapping image blocks, each of size 4 × 4, so the image is divided into image blocks of size (4, 4, 3), where the number of image blocks N_patch is:
N_patch = (H/4) × (W/4) (9)
Where H and W denote the height and width of the input image, respectively; here N_patch = 56 × 56;
2.1.2) Linear embedding: each image block is flattened into a vector of dimension 128 through a fully connected layer and fed into two consecutive Swin blocks;
2.1.3) Swin block feature extraction;
As shown in Fig. 3, the Swin blocks comprise Swin block 1 and Swin block 2; the main structure of Swin block 1 is a window-based multi-head self-attention module followed by a multi-layer perceptron, with layer normalization applied before each of the two modules and a residual connection added after each; the main structure of Swin block 2 is a shifted-window multi-head self-attention module followed by a multi-layer perceptron, again with layer normalization before each module and a residual connection after each;
After Swin block feature extraction, key feature information such as the pedestrian's head, hands and actions is obtained; a feature set of size (56, 56, 128) is output and passed to the next module;
2.2) Stage 2;
2.2.1) Block fusion: the input features are merged pairwise, a fully connected layer adjusts the feature dimension to twice the original, and a feature set of size (28, 28, 256) is output;
2.2.2) Swin block feature extraction: the structure is identical to that of 2.1.3); after Swin block processing, a feature set of size (28, 28, 256) is output;
2.3) Stages 3-4;
The structures of stage 3 and stage 4 are identical to that of stage 2; after processing, feature sets of sizes (14, 14, 512) and (7, 7, 1024) are output, respectively;
2.4) Global average pooling layer and fully connected layer: global average pooling is applied to the feature set output by stage 4 to obtain a vector of length 1024, and a fully connected layer maps the feature to 751 classes, where 751 is the number of pedestrian classes in the data set used in this embodiment.
Step three, compute the loss function, back-propagate, and update the network parameters;
3.1) The loss function consists of two parts, the ID loss and the Circle loss, formulated as follows:
L_reid = w_1 L_id + w_2 L_circle (10)
Where w_1 and w_2 denote the weights of the ID loss and the Circle loss, respectively, with w_1 = 0.4 and w_2 = 0.6; L_reid denotes the total loss function, L_id the ID loss, and L_circle the Circle loss;
3.2) The ID loss formula is as follows:
Where n denotes the number of samples in each training batch, set to 16 in this embodiment, and p(y_i|x_i) denotes the conditional probability that the input image x_i is assigned the label y_i;
3.3) The Circle loss formula is as follows:
Δ_n = m (13)
Δ_m = 1 - m (14)
Where N denotes the number of distinct pedestrian classes, set to 751 in this embodiment; M_i denotes the number of images in the class of the i-th pedestrian; γ is the scale parameter, set to 32 in this embodiment; m controls the strictness of the optimization, set to 0.25 in this embodiment; S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix; a_n and a_p are non-negative matrices, the weight matrices of S_n and S_p respectively, formulated as follows:
Where S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix;
3.4) The hyperparameter settings used when training the neural network are shown in Table 1; using the configured optimizer and learning rate, and combining the loss values computed in 3.1) to 3.3), back-propagation is performed and the network parameters are updated.
Table 1. Hyperparameter settings for training the network
Step four, pedestrian re-identification matching is carried out;
The pedestrian image to be identified is scaled and input into the Swin Transformer neural network of step two to obtain the output; softmax is then applied to obtain 751 probability values, corresponding to the probabilities that the pedestrian belongs to the different classes, and the class with the largest probability is taken as the pedestrian's identity.
In this embodiment, the pedestrian re-identification performance is tested on the Market-1501 dataset and compared with existing pedestrian re-identification baseline models based on global features, as shown in Table 2:
Table 2. Comparison of results with existing baseline models
The comparison of experimental results shows that the baseline model provided by the invention effectively improves the Rank-1 and mAP metrics of pedestrian re-identification, which demonstrates the effectiveness of the method and is of great significance for the practical application of pedestrian re-identification; in addition, the network structure is simple and highly extensible, providing a valuable reference for the design of future pedestrian re-identification methods.

Claims (1)

1. A hierarchical self-attention network-based pedestrian re-recognition baseline method, characterized in that the method comprises the following steps:
Step one, data preprocessing;
Suppose there are N different pedestrians in total, where the i-th pedestrian has M_i images, M_i > 1, M_i denotes the number of images in the class of the i-th pedestrian, and i denotes the ID number of each pedestrian; for the i-th pedestrian, M_i - 1 images are used as the training set, 1 image is used as the verification set, and i is used as the label indicating that the image corresponds to the i-th pedestrian;
1.1) Using a bicubic interpolation algorithm, scale the image to (H, W, C) as the input image, where H denotes the height of the image, W the width, and C the number of channels, with C = 3; the method comprises the following steps:
1.1.1) Construct the Bicubic function:
Where a is a coefficient parameter used to control the shape of the Bicubic curve;
1.1.2) The interpolation formula is as follows:
Where (x, y) denotes the pixel to be interpolated; for each pixel, the 4×4 neighbouring pixels are used to perform the bicubic interpolation operation;
1.2) Data enhancement using a random erasing algorithm;
1.2.1) Set a threshold probability p and generate a random number p1 in [0, 1]; when p1 > p, the input image is left unchanged, otherwise erasing is performed:
p1=Rand(0,1) (3)
1.2.2) Determine the erasing region;
H_e = Rand(H/8, H/4) (4)
W_e = Rand(W/8, W/4) (5)
S_e = H_e × W_e (6)
Where H denotes the height of the input image and W its width; H_e denotes the height of the erased region, W_e its width, and S_e its area;
1.2.3) Determine the erasing coordinates;
x_e = Rand(0, H - H_e) (7)
y_e = Rand(0, W - W_e) (8)
Where x_e and y_e denote the x- and y-coordinates of the upper-left corner of the erased region;
Step two, input the preprocessed image into the hierarchical self-attention network, i.e., the Swin Transformer neural network, and perform forward propagation;
The backbone network comprises 4 processing stages, where stages 2-4 share an identical network structure; the specific steps are as follows:
2.1) Stage 1;
2.1.1) Block segmentation: starting from the upper-left corner of the image, the input image is divided into a set of non-overlapping image blocks, each of size 4 × 4, so the image is divided into image blocks of size (4, 4, 3), where the number of image blocks N_patch is:
N_patch = (H/4) × (W/4) (9)
2.1.2) Linear embedding: each image block is flattened into a vector of dimension C through a fully connected layer, and the image blocks are fed into two consecutive Swin blocks;
2.1.3) Swin block feature extraction;
The Swin blocks comprise Swin block 1 and Swin block 2; the main structure of Swin block 1 is a window-based multi-head self-attention module followed by a multi-layer perceptron, with layer normalization applied before each of the two modules and a residual connection added after each; the main structure of Swin block 2 is a shifted-window multi-head self-attention module followed by a multi-layer perceptron, again with layer normalization before each module and a residual connection after each;
After Swin block feature extraction, key feature information of the pedestrian's head, hands and actions is obtained, and a feature set of size (H/4, W/4, C) is output;
2.2) Stage 2;
2.2.1) Block fusion: the input features are merged pairwise, a fully connected layer adjusts the feature dimension to twice the original, and a feature set of size (H/8, W/8, 2C) is output;
2.2.2) Swin block feature extraction: the structure is identical to that of the Swin blocks in 2.1.3); after Swin block processing, a key feature set of size (H/8, W/8, 2C) is output;
2.3) Stages 3-4;
The network structures of stage 3 and stage 4 are identical to that of stage 2; after processing, feature sets of sizes (H/16, W/16, 4C) and (H/32, W/32, 8C) are output, respectively;
2.4) Global average pooling layer and fully connected layer: global average pooling is applied to the feature set output by stage 4 to obtain a vector of length 8C, and a fully connected layer maps the feature to N classes, where N is the number of pedestrian classes in the data set of step one;
Step three, compute the loss function, back-propagate, and update the network parameters;
3.1) The loss function consists of two parts, the ID loss and the Circle loss, formulated as follows:
L_reid = w_1 L_id + w_2 L_circle (10)
Where w_1 and w_2 denote the weights of the ID loss and the Circle loss, respectively; L_reid denotes the total loss function, L_id the ID loss, and L_circle the Circle loss;
3.2) The ID loss formula is as follows:
Where n denotes the number of samples in each training batch and p(y_i|x_i) denotes the conditional probability that the input image x_i is assigned the label y_i;
3.3) The Circle loss formula is as follows:
Δ_n = m (13)
Δ_m = 1 - m (14)
Where N denotes the number of distinct pedestrian classes and M_i the number of images in the class of the i-th pedestrian; γ is a scale parameter; m controls the strictness of the optimization; S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix; a_n and a_p are non-negative matrices, the weight matrices of S_n and S_p respectively, formulated as follows:
Where S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix;
3.4) Set the hyperparameters and train the network. A warm-up learning rate is adopted: the learning rate is initialized to r and is gradually increased to ten times r over the first 10 training steps; the optimizer is an optimized stochastic gradient descent algorithm with weight decay d_1 and momentum d_2; using the configured optimizer and learning rate, and combining the loss values computed in 3.1) to 3.3), back-propagation is performed and the network parameters are updated;
step four, pedestrian re-identification matching is carried out;
The pedestrian image to be identified is scaled and input into the Swin Transformer neural network of step two to obtain the output; softmax is then applied to obtain N probability values, corresponding to the probabilities that the pedestrian belongs to the different classes, and the class with the largest probability is taken as the pedestrian's identity.
CN202111087471.7A 2021-09-16 2021-09-16 Pedestrian re-recognition baseline method based on hierarchical self-attention network Active CN113792669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111087471.7A CN113792669B (en) 2021-09-16 2021-09-16 Pedestrian re-recognition baseline method based on hierarchical self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111087471.7A CN113792669B (en) 2021-09-16 2021-09-16 Pedestrian re-recognition baseline method based on hierarchical self-attention network

Publications (2)

Publication Number Publication Date
CN113792669A CN113792669A (en) 2021-12-14
CN113792669B true CN113792669B (en) 2024-06-14

Family

ID=78878614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111087471.7A Active CN113792669B (en) 2021-09-16 2021-09-16 Pedestrian re-recognition baseline method based on hierarchical self-attention network

Country Status (1)

Country Link
CN (1) CN113792669B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842085B (en) * 2022-07-05 2022-09-16 松立控股集团股份有限公司 Full-scene vehicle attitude estimation method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160297B (en) * 2019-12-31 2022-05-13 武汉大学 Pedestrian re-identification method and device based on residual attention mechanism space-time combined model
CN112183468A (en) * 2020-10-27 2021-01-05 南京信息工程大学 Pedestrian re-identification method based on multi-attention combined multi-level features
CN112818790A (en) * 2021-01-25 2021-05-18 浙江理工大学 Pedestrian re-identification method based on attention mechanism and space geometric constraint

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pedestrian re-identification feature extraction method based on attention mechanism; Liu Ziyan; Wan Peipei; Journal of Computer Applications; 2020-12-31 (No. 03); full text *

Also Published As

Publication number Publication date
CN113792669A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
CN107563385B (en) License plate character recognition method based on depth convolution production confrontation network
WO2019228317A1 (en) Face recognition method and device, and computer readable medium
CN109492529A (en) A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion
CN111967470A (en) Text recognition method and system based on decoupling attention mechanism
CN111259940B (en) Target detection method based on space attention map
CN107437100A (en) A kind of picture position Forecasting Methodology based on the association study of cross-module state
CN109325440B (en) Human body action recognition method and system
CN109815814B (en) Face detection method based on convolutional neural network
Kaluri et al. A framework for sign gesture recognition using improved genetic algorithm and adaptive filter
CN112198966B (en) Stroke identification method and system based on FMCW radar system
CN113591978B (en) Confidence penalty regularization-based self-knowledge distillation image classification method, device and storage medium
CN113255557B (en) Deep learning-based video crowd emotion analysis method and system
CN111950340B (en) Face convolutional neural network characteristic expression learning and extracting method suitable for wearing mask
CN114742224A (en) Pedestrian re-identification method and device, computer equipment and storage medium
CN113792669B (en) Pedestrian re-recognition baseline method based on hierarchical self-attention network
CN114694255B (en) Sentence-level lip language recognition method based on channel attention and time convolution network
CN113496260B (en) Grain depot personnel non-standard operation detection method based on improved YOLOv3 algorithm
CN114821736A (en) Multi-modal face recognition method, device, equipment and medium based on contrast learning
CN113139618B (en) Robustness-enhanced classification method and device based on integrated defense
CN116758621B (en) Self-attention mechanism-based face expression depth convolution identification method for shielding people
CN116229584A (en) Text segmentation recognition method, system, equipment and medium in artificial intelligence field
CN110688880A (en) License plate identification method based on simplified ResNet residual error network
CN111626298B (en) Real-time image semantic segmentation device and segmentation method
CN114913339A (en) Training method and device of feature map extraction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant