CN112818931A - Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
Abstract
The invention discloses a multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion, which comprises the steps of selecting a pedestrian re-identification data set and preprocessing the training set in the data set; selecting a residual network as the basic skeleton, the residual network comprising a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch; adopting Softmax loss and triplet loss as supervision to train the pedestrian re-identification network model; and fusing the network features of the different branches as the final descriptor of the pedestrian, taking the image of the pedestrian to be queried as the input of the pedestrian re-identification network model to obtain the pedestrian re-identification result. The invention effectively relieves the pressure that a complex background or posture changes place on the re-identification task and improves the recognition precision.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a pedestrian re-identification method.
Background
Pedestrian re-identification is a technology that uses computer vision algorithms to search for a given pedestrian across cameras according to clothing, posture, hair style and other information. Once a specific pedestrian has been identified in one monitoring device, the method can retrieve that pedestrian from other non-overlapping camera devices and carry out tracking identification. In recent years, combined with video pedestrian tracking and detection technologies, it has been widely applied to safety monitoring in public places.
Traditional research methods are mostly based on hand-designed features, but with the rapid development of deep learning they have gradually been replaced. The convolutional neural network, one of the typical feature-extraction methods in deep learning, can automatically learn features from data samples and improves the performance of pedestrian re-identification systems, so it is widely applied in this field. However, in real scenes, deployed pedestrian re-identification systems still suffer from many factors: for example, images acquired from monitoring equipment are blurred, pedestrian postures change, camera angles differ, and occlusion interferes, resulting in a low identification rate.
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, the invention provides a multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
the multi-scale pedestrian re-identification method based on the multi-granularity depth feature fusion comprises the following steps:
(1) selecting a pedestrian re-identification data set, and preprocessing a training set in the data set;
(2) selecting a residual error network as a basic framework, wherein the residual error network comprises a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch;
(3) learning multi-level coarse-grained characteristic information of the pedestrian by adopting a global coarse-grained fusion learning branch;
(4) extracting pedestrian local features from the local area by adopting a local coarse-grained fusion learning branch;
(5) adopting a local attention fine-grained fusion learning branch, introducing an attention mechanism to eliminate background interference, and extracting pedestrian fine-grained local features;
(6) adopting Softmax loss and triplet loss as supervision to train the pedestrian re-identification network model;
(7) fusing the network features of the different branches as the final descriptor of the pedestrian, taking the image of the pedestrian to be queried as the input of the pedestrian re-identification network model, retrieving from a candidate gallery, calculating the feature distances between the queried pedestrian image and all images in the candidate gallery, and sorting by feature distance, wherein the candidate gallery image with the closest feature distance and the queried pedestrian image are data of the same pedestrian.
Further, in step (1), the training set is preprocessed as follows:
(1a) fixing the size of the pedestrian images in the training set;
(1b) carrying out horizontal flipping, rotation, random cropping and normalization on the samples of each identity in the training set, thereby increasing the number of training samples.
Further, in the step (2), the steps of selecting the residual network as the basic skeleton are as follows:
(2a) the parameters of the 4th residual stage of the ResNet50 backbone network are refined;
(2b) for the global coarse-grained fusion learning branch, following ResNet50 stages 1, 2 and 3, stages 4 and 5 are kept, and global average pooling is applied to the resulting feature map to obtain the 2048-dimensional global feature vector f_g_2048;
(2c) for the local coarse-grained fusion learning branch, the feature map of ResNet50 stage 3 is divided horizontally into two equal parts, and the down-sampling stride in ResNet50 stage 5 is set to 1; a global average pooling layer then yields the local features f_p_2048_1 and f_p_2048_2 of each partition;
(2d) for the local attention fine-grained fusion learning branch, a convolutional attention module is introduced on the basis of the local coarse-grained fusion learning branch to obtain the local features f_pab_2048_1 and f_pab_2048_2 of each partition.
Further, in step (3), the method for learning the multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch is as follows: first, f_g_2048 is used as the final pedestrian feature descriptor and trained with the triplet loss function; next, f_g_2048 is processed with Conv1×1 to obtain the 512-dimensional feature vector f_g_512, which is trained with the Softmax loss function.
Further, in step (4), the method for extracting pedestrian local features from the local regions with the local coarse-grained fusion learning branch is as follows: first, f_p_2048_1 is used as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer and a second batch normalization layer are applied in order to f_p_2048_2; finally, Conv1×1 yields the 512-dimensional f_p_512_1 and f_p_512_2, which are trained with the Softmax loss function.
Further, in the step (5), the method for extracting pedestrian fine-grained local features with the local attention fine-grained fusion learning branch is as follows: the attention mechanism comprises a channel attention mechanism and a spatial attention mechanism; first, a pedestrian picture is input and passed through the ResNet50 basic skeleton to obtain the input feature map F; second, F enters the channel attention module, where average pooling and maximum pooling produce two different one-dimensional feature vectors that are passed to a shared multilayer perceptron, yielding the feature matrix F′; then, the feature F″ is obtained through spatial attention;
for the local features f_pab_2048_1 and f_pab_2048_2 of each partition obtained through the attention mechanism: first, f_pab_2048_1 is used as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer and a second batch normalization layer are applied in order to f_pab_2048_2; finally, Conv1×1 yields the 512-dimensional f_pab_512_1 and f_pab_512_2, which are trained with the Softmax loss function.
Further, the feature matrix F′ is obtained as follows:

F′ = M_C(F) × F

M_C(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))

In the above formulas, σ is the activation function, MLP denotes the multilayer perceptron, W_0 and W_1 are the weights of the MLP, AvgPool and MaxPool denote average pooling and maximum pooling respectively, and F_avg^c and F_max^c are the two different descriptors obtained after average pooling and maximum pooling respectively.
Further, the spatial attention mechanism first generates two 2-dimensional feature maps F_avg^s and F_max^s using average pooling and maximum pooling, and concatenates them to generate an effective feature descriptor, which is then passed through the convolutional layer f^{7×7} to generate a spatial attention map that is compressed to obtain F″:

F″ = M_S(F′) × F′

M_S(F′) = σ(f^{7×7}([AvgPool(F′); MaxPool(F′)]))

In the above formulas, σ is the activation function, and AvgPool and MaxPool denote average pooling and maximum pooling respectively.
Further, in step (6), a joint loss function of the Softmax loss and the triplet loss is constructed:

Loss_total = (1 − w)·Loss_softmax + w·Loss_triplet

Loss_softmax = −(1/N) Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i + b_{y_i}) / Σ_{j=1}^{C} exp(W_j^T f_i + b_j) )

Loss_triplet = Σ [ ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α ]_+

where Loss_total is the joint loss function, Loss_softmax is the Softmax loss function, Loss_triplet is the triplet loss function, w is the balance coefficient with w ∈ (0,1), N is the training batch size, C is the number of training classes, f_i is a given input feature vector and y_i its corresponding label, W_i and b_i are the weight vector and bias of class i, W_{y_i} and b_{y_i} are those of class y_i, the superscript T denotes transposition, α is the margin between positive and negative sample pairs, f_a, f_p and f_n are the features extracted by the network from the anchor picture, the positive sample picture and the negative sample picture respectively, and [X]_+ = max(X, 0); for each anchor sample in a batch, the most dissimilar positive sample and the most similar negative sample in the batch are selected to form a triplet.
The above technical scheme brings the following beneficial effects:
The invention designs a multi-branch network structure comprising a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch, which relieves the pressure that a complex background or pedestrian posture changes place on the re-identification task and learns both global information and effective local discriminative features of pedestrians. The invention adopts a network trained jointly with the Softmax loss and the triplet loss function, so that the feature distances between samples of the same pedestrian become closer and those between samples of different pedestrians become farther, improving pedestrian re-identification performance.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of a Market-1501 data set in the embodiment;
FIG. 3 is a schematic diagram of an embodiment of a network framework;
FIG. 4 is a CBAM framework diagram of the embodiment;
FIG. 5 is a pedestrian retrieval result on the Market-1501 data set of the embodiment;
FIG. 6 is a schematic diagram of the performance indices for the w parameter on the Market-1501 data set of the embodiment.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the accompanying drawings.
The invention designs a multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion; as shown in figure 1, the steps are as follows:
Step 1: selecting a pedestrian re-identification data set and preprocessing the training set in the data set;
Step 2: selecting a residual network as the basic skeleton, the residual network comprising a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch;
Step 3: learning multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch;
Step 4: extracting pedestrian local features from the local regions with the local coarse-grained fusion learning branch;
Step 5: adopting the local attention fine-grained fusion learning branch, introducing an attention mechanism to eliminate background interference, and extracting pedestrian fine-grained local features;
Step 6: adopting Softmax loss and triplet loss as supervision to train the pedestrian re-identification network model;
Step 7: fusing the network features of the different branches as the final descriptor of the pedestrian, taking the image of the pedestrian to be queried as the input of the pedestrian re-identification network model, retrieving from a candidate gallery, calculating the feature distances between the queried pedestrian image and all images in the candidate gallery, and sorting by feature distance, wherein the candidate gallery image with the closest feature distance and the queried pedestrian image are data of the same pedestrian.
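The retrieval in Step 7 can be sketched as follows with random stand-in descriptors (the 2048-dimensional size matches f_g_2048). Euclidean distance is used here for simplicity, while the embodiment ranks by cosine similarity; the near-duplicate planted at gallery index 2 is purely illustrative:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Sort gallery images by Euclidean feature distance to the query (closest first)."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)  # distance to each gallery image
    order = np.argsort(dists)                                   # ascending: rank-1 is the best match
    return order, dists

# Toy example: 4 gallery descriptors, one of them close to the query.
rng = np.random.default_rng(0)
query = rng.normal(size=2048)
gallery = rng.normal(size=(4, 2048))
gallery[2] = query + 0.01 * rng.normal(size=2048)  # plant a near-duplicate at index 2

order, dists = rank_gallery(query, gallery)
print(order[0])  # 2 (the planted near-duplicate is ranked first)
```

The same ranking loop applies unchanged to real descriptors produced by the fused branches.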
Referring to fig. 2, this embodiment selects the Market-1501 data set, a public data set in the pedestrian re-identification field consisting of images of 1501 pedestrians captured by five high-resolution cameras and one low-resolution camera on the Tsinghua University campus. The figure shows samples of part of the Market-1501 data; the image quality is uneven, pedestrian postures vary greatly, and the backgrounds are noisy. The training set is preprocessed as follows:
s1.1: setting the image size of any sample data in the training set to be 256 multiplied by 128 in a uniform size;
s1.2: horizontally turning and rotating any sample data in the training set, then cutting the random length-width ratio to be 0.75, and then scaling the image obtained after cutting to be 256 multiplied by 128; the pixel values are then normalized.
In this embodiment, the Market-1501 data set is selected as the research content. As shown in fig. 3, the network framework of this embodiment is developed based on ResNet50; the specific steps are as follows:
S2.1: the parameters of the 4th residual stage of the ResNet50 backbone network are refined;
S2.2: for the global coarse-grained fusion learning branch, following ResNet50 stages 1, 2 and 3, stages 4 and 5 are kept, and global average pooling is applied to the resulting feature map to obtain the 2048-dimensional global feature vector f_g_2048;
S2.3: for the local coarse-grained fusion learning branch, the feature map of ResNet50 stage 3 is divided horizontally into two equal parts, and the down-sampling stride in ResNet50 stage 5 is set to 1; a global average pooling layer then yields the local features f_p_2048_1 and f_p_2048_2 of each partition;
S2.4: for the local attention fine-grained fusion learning branch, a convolutional attention module is introduced on the basis of the local coarse-grained fusion learning branch to obtain the local features f_pab_2048_1 and f_pab_2048_2 of each partition.
In this embodiment, the method for learning the multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch is as follows: first, f_g_2048 is used as the final pedestrian feature descriptor and trained with the triplet loss function; next, f_g_2048 is processed with Conv1×1 to obtain the 512-dimensional feature vector f_g_512, which is trained with the Softmax loss function.
In this embodiment, the method for extracting pedestrian local features from the local regions with the local coarse-grained fusion learning branch is as follows: first, f_p_2048_1 is used as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer and a second batch normalization layer are applied in order to f_p_2048_2; finally, Conv1×1 yields the 512-dimensional f_p_512_1 and f_p_512_2, which are trained with the Softmax loss function.
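The reduction head described above (batch normalization, ReLU, Dropout, a second batch normalization, then Conv1×1 from 2048 to 512 dimensions) can be sketched as follows; the Dropout rate is an assumption, since the patent does not state it:

```python
import torch
import torch.nn as nn

class ReductionHead(nn.Module):
    """BN -> ReLU -> Dropout -> BN, then Conv1x1 mapping 2048 -> 512."""
    def __init__(self, in_dim=2048, out_dim=512, p_drop=0.5):
        super().__init__()
        self.process = nn.Sequential(
            nn.BatchNorm1d(in_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(p_drop),                # rate 0.5 is an assumption
            nn.BatchNorm1d(in_dim),
        )
        self.conv1x1 = nn.Conv2d(in_dim, out_dim, kernel_size=1)  # Conv1x1: 2048 -> 512

    def forward(self, f):                      # f: (N, 2048) pooled local feature
        f = self.process(f)
        f = self.conv1x1(f[:, :, None, None])  # treat the vector as a 1x1 feature map
        return f.flatten(1)                    # (N, 512), fed to the Softmax loss

head = ReductionHead()
head.eval()                                    # use running BatchNorm statistics here
f_512 = head(torch.randn(2, 2048))
print(f_512.shape)                             # torch.Size([2, 512])
```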
In this embodiment, the method for extracting pedestrian fine-grained local features with the local attention fine-grained fusion learning branch is as follows: as shown in fig. 4, the convolutional block attention module (CBAM) comprises a channel attention mechanism and a spatial attention mechanism. First, a pedestrian picture is input and passed through the ResNet50 basic skeleton to obtain the input feature map F; second, F enters the channel attention module, where average pooling and maximum pooling produce two different one-dimensional feature vectors that are passed to a shared multilayer perceptron, yielding the feature matrix F′; then, the feature F″ is obtained through a further spatial attention step.
The feature matrix F′ is obtained as follows:

F′ = M_C(F) × F

M_C(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))

In the above formulas, σ is the activation function, MLP denotes the multilayer perceptron, W_0 and W_1 are the weights of the MLP, AvgPool and MaxPool denote average pooling and maximum pooling respectively, and F_avg^c and F_max^c are the two different descriptors obtained after average pooling and maximum pooling respectively.
The spatial attention mechanism first generates two 2-dimensional feature maps F_avg^s and F_max^s using average pooling and maximum pooling, and concatenates them to generate an effective feature descriptor, which is then passed through the convolutional layer f^{7×7} to generate a spatial attention map that is compressed to obtain F″:

F″ = M_S(F′) × F′

M_S(F′) = σ(f^{7×7}([AvgPool(F′); MaxPool(F′)]))
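A minimal PyTorch sketch of the channel-then-spatial attention (CBAM) described by the formulas above; the reduction ratio r = 16 inside the shared MLP is an assumption taken from the usual CBAM design, not stated in the patent:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """F' = Mc(F) x F (channel attention), F'' = Ms(F') x F' (spatial attention)."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.mlp = nn.Sequential(              # shared MLP: W1(W0(.))
            nn.Conv2d(channels, channels // r, 1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1, bias=False),  # W1
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)  # f^{7x7}
        self.sigmoid = nn.Sigmoid()            # activation sigma

    def forward(self, F):
        # Channel attention: Mc(F) = sigma(MLP(AvgPool(F)) + MLP(MaxPool(F)))
        avg = self.mlp(F.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(F.amax(dim=(2, 3), keepdim=True))
        F1 = self.sigmoid(avg + mx) * F
        # Spatial attention: Ms(F') = sigma(f7x7([AvgPool(F'); MaxPool(F')]))
        s = torch.cat([F1.mean(dim=1, keepdim=True),
                       F1.amax(dim=1, keepdim=True)], dim=1)
        return self.sigmoid(self.spatial(s)) * F1

F = torch.randn(2, 64, 16, 8)                  # stand-in feature map
out = CBAM(64)(F)
print(out.shape)                               # torch.Size([2, 64, 16, 8])
```

The module preserves the input shape, so it can be dropped into the local branch without changing the partitioning logic.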
For the local features f_pab_2048_1 and f_pab_2048_2 of each partition obtained through the attention mechanism: first, f_pab_2048_1 is used as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer and a second batch normalization layer are applied in order to f_pab_2048_2; finally, Conv1×1 yields the 512-dimensional f_pab_512_1 and f_pab_512_2, which are trained with the Softmax loss function.
In this example, a joint loss function of the Softmax loss and the triplet loss is constructed:

Loss_total = (1 − w)·Loss_softmax + w·Loss_triplet

Loss_softmax = −(1/N) Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i + b_{y_i}) / Σ_{j=1}^{C} exp(W_j^T f_i + b_j) )

Loss_triplet = Σ [ ‖f_a − f_p‖_2 − ‖f_a − f_n‖_2 + α ]_+

where Loss_total is the joint loss function, Loss_softmax is the Softmax loss function, Loss_triplet is the triplet loss function, w is the balance coefficient with w ∈ (0,1), N is the training batch size, C is the number of training classes, f_i is a given input feature vector and y_i its corresponding label, W_i and b_i are the weight vector and bias of class i, W_{y_i} and b_{y_i} are those of class y_i, the superscript T denotes transposition, α is the margin between positive and negative sample pairs, f_a, f_p and f_n are the features extracted by the network from the anchor picture, the positive sample picture and the negative sample picture respectively, and [X]_+ = max(X, 0); for each anchor sample in a batch, the most dissimilar positive sample and the most similar negative sample in the batch are selected to form a triplet.
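The joint objective above can be sketched with standard PyTorch losses; the margin α = 0.3, the batch shapes and the random logits/features are illustrative assumptions (751 is the number of training identities in Market-1501):

```python
import torch
import torch.nn as nn

# Loss_total = (1 - w) * Loss_softmax + w * Loss_triplet.
# nn.CrossEntropyLoss implements the Softmax loss over class logits;
# nn.TripletMarginLoss implements [||f_a - f_p|| - ||f_a - f_n|| + alpha]_+.
w = 0.6                                        # balance coefficient from the embodiment
softmax_loss = nn.CrossEntropyLoss()
triplet_loss = nn.TripletMarginLoss(margin=0.3)  # alpha = 0.3 is an assumption

logits = torch.randn(8, 751)                   # stand-in classifier outputs (C = 751)
labels = torch.randint(0, 751, (8,))
f_a, f_p, f_n = (torch.randn(8, 2048) for _ in range(3))  # anchor/positive/negative features

loss_total = (1 - w) * softmax_loss(logits, labels) + w * triplet_loss(f_a, f_p, f_n)
print(loss_total.item() >= 0)                  # True: both terms are non-negative
```

In a real training loop the hard positives/negatives would be mined within each P × K batch rather than sampled at random.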
In this embodiment, the multi-scale network with multi-granularity depth feature fusion is initialized with ResNet50 weights pre-trained on ImageNet, and the weights of the different branches are shared. Each mini-batch randomly samples P identities and K images per identity from the training set to meet the triplet requirement; in the experiments P = 16 and K = 4. An SGD optimizer is selected, with weight decay set to 5e-4 and momentum 0.9. The total number of training epochs is set to 240, the initial learning rate to 0.1, and the learning rate is divided by 10 every 40 epochs until it reaches 0.001 and remains unchanged; the balance coefficient w is set to 0.6.
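A minimal sketch of this training configuration; the stand-in model and the MultiStepLR milestones at epochs 40 and 80 are assumptions that reproduce the 0.1 → 0.01 → 0.001 schedule described above:

```python
import torch
import torch.nn as nn

# SGD with momentum 0.9 and weight decay 5e-4; lr starts at 0.1, is divided by 10
# every 40 epochs, and stays at 0.001 once reached.
model = nn.Linear(2048, 751)                   # stand-in for the re-ID network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[40, 80], gamma=0.1)

lrs = []
for epoch in range(240):                       # 240 total training epochs
    # ... one epoch of P x K sampled mini-batches (P=16 identities, K=4 images) ...
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()

print(round(lrs[0], 4), round(lrs[40], 4), round(lrs[239], 4))  # 0.1 0.01 0.001
```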
Fig. 5 shows person search results on the Market-1501 data set in this embodiment; the images in the first column are the query images, and the pictures retrieved from the gallery are ranked 1 to 10 by cosine similarity. The ranking shows that most of the retrieved images are selected correctly; a few erroneous images, marked with red numbers and detection boxes, remain, probably because insufficient image information is collected from a single-view camera.
FIG. 6 shows the effect of the w parameter in this embodiment. Rank-1 denotes the probability that the retrieved picture and the picture to be queried share the same identity, and mAP denotes the corresponding mean average precision. When w = 0, only the Softmax loss supervises the training network and the convolutional descriptor serves as the sole pedestrian feature descriptor, so pedestrian feature information at different levels is not fully utilized; moreover, the Softmax loss learns only separable features, so the learned features lack discrimination. When w > 0, the method of jointly supervised training with Softmax loss and triplet loss improves significantly; the effect is best when w = 0.6, verifying the effectiveness of the proposed method: combining Softmax loss and triplet loss supervision compensates for their respective shortcomings and learns multi-level, finer-grained features. But when w = 1, because the local fusion branches directly use f_pab_2048_1 and f_pab_2048_2 as the final descriptors, supervised training with the triplet loss alone is not as effective as the joint training.
The embodiments are only intended to illustrate the technical idea of the present invention and do not limit it; any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the scope of the present invention.
Claims (9)
1. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion is characterized by comprising the following steps of:
(1) selecting a pedestrian re-identification data set, and preprocessing a training set in the data set;
(2) selecting a residual error network as a basic framework, wherein the residual error network comprises a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch;
(3) learning multi-level coarse-grained characteristic information of the pedestrian by adopting a global coarse-grained fusion learning branch;
(4) extracting pedestrian local features from the local area by adopting a local coarse-grained fusion learning branch;
(5) adopting a local attention fine-grained fusion learning branch, introducing an attention mechanism to eliminate background interference, and extracting pedestrian fine-grained local features;
(6) adopting Softmax loss and triplet loss as supervision to train the pedestrian re-identification network model;
(7) fusing the network features of the different branches as the final descriptor of the pedestrian, taking the image of the pedestrian to be queried as the input of the pedestrian re-identification network model, retrieving from a candidate gallery, calculating the feature distances between the queried pedestrian image and all images in the candidate gallery, and sorting by feature distance, wherein the candidate gallery image with the closest feature distance and the queried pedestrian image are data of the same pedestrian.
2. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 1, wherein in the step (1), the training set is preprocessed as follows:
(1a) fixing the size of the pedestrian images in the training set;
(1b) carrying out horizontal flipping, rotation, random cropping and normalization on the samples of each identity in the training set, thereby increasing the number of training samples.
3. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 1, wherein in the step (2), the steps of selecting the residual network as the basic skeleton are as follows:
(2a) the parameters of the 4th residual stage of the ResNet50 backbone network are refined;
(2b) for the global coarse-grained fusion learning branch, following ResNet50 stages 1, 2 and 3, stages 4 and 5 are kept, and global average pooling is applied to the resulting feature map to obtain the 2048-dimensional global feature vector f_g_2048;
(2c) for the local coarse-grained fusion learning branch, the feature map of ResNet50 stage 3 is divided horizontally into two equal parts, and the down-sampling stride in ResNet50 stage 5 is set to 1; a global average pooling layer then yields the local features f_p_2048_1 and f_p_2048_2 of each partition;
(2d) for the local attention fine-grained fusion learning branch, a convolutional attention module is introduced on the basis of the local coarse-grained fusion learning branch to obtain the local features f_pab_2048_1 and f_pab_2048_2 of each partition.
4. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 3, wherein in step (3), the method for learning the multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch is as follows: first, f_g_2048 is used as the final pedestrian feature descriptor and trained with the triplet loss function; next, f_g_2048 is processed with Conv1×1 to obtain the 512-dimensional feature vector f_g_512, which is trained with the Softmax loss function.
5. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 3, wherein in step (4), the local features of the pedestrian are extracted from the local regions with the local coarse-grained fusion learning branch as follows: first, f_p_2048_1 is used as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer and a second batch normalization layer are applied in order to f_p_2048_2; finally, Conv1×1 yields the 512-dimensional f_p_512_1 and f_p_512_2, which are trained with the Softmax loss function.
6. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion is characterized in that in step (5), the local attention fine-grained fusion learning branch extracts fine-grained pedestrian local features through a channel attention mechanism and a spatial attention mechanism; first, a pedestrian picture is input and passed through the ResNet50 backbone to obtain an input feature map F; next, F enters the channel attention module, where average pooling and maximum pooling produce two different one-dimensional feature vectors, which are fed into a shared multilayer perceptron to obtain the feature matrix F′; then, the feature F″ is obtained through spatial attention;
the local features f_pab_2048_1 and f_pab_2048_2 of the two partitions are obtained through the attention mechanism; first, f_pab_2048_1 is taken as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, f_pab_2048_2 is processed in order by a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer, and a second batch normalization layer; finally, 512-dimensional f_pab_512_1 and f_pab_512_2 are obtained through a Conv1×1 and trained with the Softmax loss function.
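The BN → ReLU → Dropout → BN → Conv1×1 reduction head described in claims 5 and 6 can be sketched as follows. This is an illustrative NumPy approximation under stated assumptions: inference-style batch normalization without learned affine parameters, and a Conv1×1 on pooled (spatially 1×1) features, which reduces to a linear projection. The function names and weight initialization are not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(x, eps=1e-5):
    """Per-dimension normalization over the batch (inference-style, no affine)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def reduction_head(x, w, drop_rate=0.5, training=False):
    """BN -> ReLU -> Dropout -> BN -> Conv1x1, mapping 2048-dim descriptors
    to 512-dim descriptors; on 1x1 spatial maps a Conv1x1 is a matrix product."""
    x = batch_norm(x)
    x = np.maximum(x, 0.0)                # ReLU
    if training:                          # inverted dropout (skipped at inference)
        mask = rng.random(x.shape) >= drop_rate
        x = x * mask / (1.0 - drop_rate)
    x = batch_norm(x)
    return x @ w                          # Conv1x1 as a linear projection

f_p_2048 = rng.random((32, 2048))            # a batch of 2048-dim partition features
w = rng.standard_normal((2048, 512)) * 0.01  # illustrative 1x1-conv weights
f_p_512 = reduction_head(f_p_2048, w)        # 512-dim descriptors for Softmax loss
```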
7. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 6, characterized in that the feature matrix F′ is obtained as follows:
F′ = M_C(F) × F
M_C(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))
In the above formulas, σ is the activation function, MLP denotes the shared multilayer perceptron, W_0 and W_1 are the weights of the MLP, AvgPool and MaxPool denote average pooling and maximum pooling respectively, and F_avg^c and F_max^c are the two different channel descriptors obtained after average pooling and maximum pooling, respectively.
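The channel attention step can be sketched in NumPy as below. This is a minimal illustration of the formula, assuming a sigmoid for σ and a ReLU between the two shared MLP layers; the channel count, reduction ratio, and weight values are toy assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """F' = M_C(F) * F with M_C(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))."""
    avg = F.mean(axis=(1, 2))                       # F_avg^c, shape (C,)
    mx = F.max(axis=(1, 2))                         # F_max^c, shape (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)    # shared two-layer MLP
    m_c = sigmoid(mlp(avg) + mlp(mx))               # channel weights in (0, 1)
    return m_c[:, None, None] * F                   # reweight every channel of F

C, r = 64, 16                                       # toy channel count, reduction ratio
rng = np.random.default_rng(1)
F = rng.random((C, 8, 4))
W0 = rng.standard_normal((C // r, C)) * 0.1         # reduction weights
W1 = rng.standard_normal((C, C // r)) * 0.1         # expansion weights
F_prime = channel_attention(F, W0, W1)
```

Because the attention weights lie in (0, 1), the output keeps the shape of F and never amplifies a channel, only suppresses less informative ones.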
8. The method according to claim 6, wherein the spatial attention mechanism first generates two 2-dimensional feature maps F_avg^s and F_max^s by average pooling and maximum pooling, concatenates them to form a valid feature descriptor, and then passes the result through a convolutional layer f^7×7 to generate a spatial attention map, which is compressed to obtain F″:
F″ = M_S(F′) × F′
M_S(F′) = σ(f^7×7([AvgPool(F′); MaxPool(F′)])) = σ(f^7×7([F_avg^s; F_max^s]))
In the above formulas, σ is the activation function, and AvgPool and MaxPool denote average pooling and maximum pooling, respectively.
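The spatial attention step can likewise be sketched in NumPy. This is an illustrative toy version, assuming a sigmoid for σ and a naive same-padded 7×7 convolution; the kernel values and feature sizes are assumptions, not the patented parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    """Naive 'same'-padded 2D convolution of a (2, H, W) input with a
    (2, kh, kw) kernel, producing a single (H, W) output map."""
    _, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    h, w = x.shape[1], x.shape[2]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i : i + kh, j : j + kw] * k)
    return out

def spatial_attention(F, k):
    """F'' = M_S(F') * F' with M_S = sigmoid(f7x7([AvgPool(F'); MaxPool(F')]))."""
    avg = F.mean(axis=0)                  # F_avg^s, shape (H, W)
    mx = F.max(axis=0)                    # F_max^s, shape (H, W)
    stacked = np.stack([avg, mx])         # channel-wise concatenation, (2, H, W)
    m_s = sigmoid(conv2d_same(stacked, k))
    return m_s[None, :, :] * F            # reweight every spatial position of F'

rng = np.random.default_rng(2)
F_prime = rng.random((64, 8, 4))          # toy channel-attended feature map
k = rng.standard_normal((2, 7, 7)) * 0.1  # toy 7x7 convolution kernel
F_dprime = spatial_attention(F_prime, k)
```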
9. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 1, characterized in that in step (6), a joint loss function combining the Softmax loss and the triplet loss is constructed:
Loss_total = (1 - w)·Loss_softmax + w·Loss_triplet
where Loss_total is the joint loss function, Loss_softmax is the Softmax loss function, and Loss_triplet is the triplet loss function:
Loss_softmax = -(1/N) Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i + b_{y_i}) / Σ_{j=1}^{C} exp(W_j^T f_i + b_j) )
Loss_triplet = Σ [α + ||f_a - f_p||_2 - ||f_a - f_n||_2]_+
Here w is a balance coefficient with w ∈ (0,1); N is the number of training samples in a batch; C is the number of training sample classes; f_i is the input feature vector and y_i its corresponding label; W_i and b_i are the weight vector and bias of sample i, and W_{y_i} and b_{y_i} those of class y_i; the superscript T denotes transposition; α is the margin between positive and negative sample pairs; f_a, f_p and f_n are the features extracted by the network from the anchor picture, the positive sample picture and the negative sample picture, respectively; and [X]_+ = max(X, 0). For each anchor sample in the batch, the most dissimilar positive sample and the most similar negative sample are selected to form the triplet.
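The joint loss of claim 9 can be sketched in NumPy. This is a minimal batch-hard illustration under stated assumptions (Euclidean distance, toy feature dimensions and class count); the function names are not from the patent.

```python
import numpy as np

def softmax_loss(feats, labels, W, b):
    """Cross-entropy over class scores W^T f_i + b (the claim's Softmax loss)."""
    logits = feats @ W + b
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def batch_hard_triplet_loss(feats, labels, alpha=0.3):
    """For each anchor, pick the most dissimilar positive and the most similar
    negative, then apply [alpha + d(a,p) - d(a,n)]_+ averaged over the batch."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    loss = 0.0
    for i in range(len(labels)):
        d_p = d[i][same[i]].max()                   # hardest (farthest) positive
        d_n = d[i][~same[i]].min()                  # hardest (nearest) negative
        loss += max(alpha + d_p - d_n, 0.0)
    return loss / len(labels)

def joint_loss(feats, labels, W, b, w=0.5, alpha=0.3):
    """Loss_total = (1 - w) * Loss_softmax + w * Loss_triplet."""
    return (1 - w) * softmax_loss(feats, labels, W, b) + w * batch_hard_triplet_loss(feats, labels, alpha)

rng = np.random.default_rng(3)
feats = rng.random((8, 16))                         # toy batch of feature vectors
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])         # two samples per identity
W, b = rng.standard_normal((16, 4)) * 0.1, np.zeros(4)
total = joint_loss(feats, labels, W, b, w=0.4)
```

Note that batch-hard mining requires every identity in the batch to contribute at least two samples, which is why re-identification training batches are sampled per identity.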
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110218857.0A CN112818931A (en) | 2021-02-26 | 2021-02-26 | Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112818931A true CN112818931A (en) | 2021-05-18 |
Family
ID=75864137
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110218857.0A Pending CN112818931A (en) | 2021-02-26 | 2021-02-26 | Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818931A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113177518A (en) * | 2021-05-24 | 2021-07-27 | 西安建筑科技大学 | Vehicle weight identification method recommended by weak supervision area |
CN113239784A (en) * | 2021-05-11 | 2021-08-10 | 广西科学院 | Pedestrian re-identification system and method based on space sequence feature learning |
CN113255615A (en) * | 2021-07-06 | 2021-08-13 | 南京视察者智能科技有限公司 | Pedestrian retrieval method and device for self-supervision learning |
CN113255597A (en) * | 2021-06-29 | 2021-08-13 | 南京视察者智能科技有限公司 | Transformer-based behavior analysis method and device and terminal equipment thereof |
CN113283507A (en) * | 2021-05-27 | 2021-08-20 | 大连海事大学 | Multi-view-based feature fusion vehicle re-identification method |
CN113361464A (en) * | 2021-06-30 | 2021-09-07 | 重庆交通大学 | Vehicle weight recognition method based on multi-granularity feature segmentation |
CN113537032A (en) * | 2021-07-12 | 2021-10-22 | 南京邮电大学 | Diversity multi-branch pedestrian re-identification method based on picture block discarding |
CN113743497A (en) * | 2021-09-02 | 2021-12-03 | 南京理工大学 | Fine granularity identification method and system based on attention mechanism and multi-scale features |
CN113792686A (en) * | 2021-09-17 | 2021-12-14 | 中南大学 | Vehicle weight identification method based on cross-sensor invariance of visual representation |
CN114140700A (en) * | 2021-12-01 | 2022-03-04 | 西安电子科技大学 | Step-by-step heterogeneous image template matching method based on cascade network |
CN114187606A (en) * | 2021-10-21 | 2022-03-15 | 江阴市智行工控科技有限公司 | Garage pedestrian detection method and system adopting branch fusion network for light weight |
CN115050044A (en) * | 2022-04-02 | 2022-09-13 | 广西科学院 | Cross-modal pedestrian re-identification method based on MLP-Mixer |
CN115050048A (en) * | 2022-05-25 | 2022-09-13 | 杭州像素元科技有限公司 | Cross-modal pedestrian re-identification method based on local detail features |
CN115240121A (en) * | 2022-09-22 | 2022-10-25 | 之江实验室 | Joint modeling method and device for enhancing local features of pedestrians |
CN115294601A (en) * | 2022-07-22 | 2022-11-04 | 苏州大学 | Pedestrian re-identification method based on multi-scale feature dynamic fusion |
CN115841683A (en) * | 2022-12-27 | 2023-03-24 | 石家庄铁道大学 | Light-weight pedestrian re-identification method combining multi-level features |
CN115909455A (en) * | 2022-11-16 | 2023-04-04 | 航天恒星科技有限公司 | Expression recognition method integrating multi-scale feature extraction and attention mechanism |
CN116052218A (en) * | 2023-02-13 | 2023-05-02 | 中国矿业大学 | Pedestrian re-identification method |
CN117612266A (en) * | 2024-01-24 | 2024-02-27 | 南京信息工程大学 | Cross-resolution pedestrian re-identification method based on multi-scale image and feature layer alignment |
CN117994822A (en) * | 2024-04-07 | 2024-05-07 | 南京信息工程大学 | Cross-mode pedestrian re-identification method based on auxiliary mode enhancement and multi-scale feature fusion |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019025601A1 (en) * | 2017-08-03 | 2019-02-07 | Koninklijke Philips N.V. | Hierarchical neural networks with granularized attention |
CN111178432A (en) * | 2019-12-30 | 2020-05-19 | 武汉科技大学 | Weak supervision fine-grained image classification method of multi-branch neural network model |
CN111460914A (en) * | 2020-03-13 | 2020-07-28 | 华南理工大学 | Pedestrian re-identification method based on global and local fine-grained features |
CN111461038A (en) * | 2020-04-07 | 2020-07-28 | 中北大学 | Pedestrian re-identification method based on layered multi-mode attention mechanism |
US20200242153A1 (en) * | 2019-01-29 | 2020-07-30 | Samsung Electronics Co., Ltd. | Method, apparatus, electronic device and computer readable storage medium for image searching |
CN111539370A (en) * | 2020-04-30 | 2020-08-14 | 华中科技大学 | Image pedestrian re-identification method and system based on multi-attention joint learning |
CN111666851A (en) * | 2020-05-28 | 2020-09-15 | 大连理工大学 | Cross domain self-adaptive pedestrian re-identification method based on multi-granularity label |
CN111709331A (en) * | 2020-06-03 | 2020-09-25 | 江南大学 | Pedestrian re-identification method based on multi-granularity information interaction model |
Non-Patent Citations (1)
Title |
---|
LU, JIAN et al.: "A survey of deep learning based person re-identification research", Laser & Optoelectronics Progress, vol. 57, no. 16, pages 1 - 18 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818931A (en) | Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion | |
CN111126360B (en) | Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model | |
CN108960140B (en) | Pedestrian re-identification method based on multi-region feature extraction and fusion | |
CN111259786B (en) | Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video | |
Wang et al. | Large-scale isolated gesture recognition using convolutional neural networks | |
CN111709311B (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
CN111325111A (en) | Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision | |
CN105718889B (en) | Face identification method based on the GB(2D)2PCANet deep convolution model | |
CN110503076B (en) | Video classification method, device, equipment and medium based on artificial intelligence | |
CN110598543B (en) | Model training method based on attribute mining and reasoning and pedestrian re-identification method | |
CN111401145B (en) | Visible light iris recognition method based on deep learning and DS evidence theory | |
CN109784288B (en) | Pedestrian re-identification method based on discrimination perception fusion | |
CN113158815A (en) | Unsupervised pedestrian re-identification method, system and computer readable medium | |
CN113361549A (en) | Model updating method and related device | |
CN115909407A (en) | Cross-modal pedestrian re-identification method based on character attribute assistance | |
CN112084895A (en) | Pedestrian re-identification method based on deep learning | |
CN111985332A (en) | Gait recognition method for improving loss function based on deep learning | |
CN116704611A (en) | Cross-visual-angle gait recognition method based on motion feature mixing and fine-granularity multi-stage feature extraction | |
CN116091946A (en) | Yolov 5-based unmanned aerial vehicle aerial image target detection method | |
CN113111797A (en) | Cross-view gait recognition method combining self-encoder and view transformation model | |
CN115984765A (en) | Pedestrian re-identification method based on double-current block network, electronic equipment and medium | |
CN113627380A (en) | Cross-vision-field pedestrian re-identification method and system for intelligent security and early warning | |
CN113537032A (en) | Diversity multi-branch pedestrian re-identification method based on picture block discarding | |
Cheng et al. | Automatic Data Cleaning System for Large-Scale Location Image Databases Using a Multilevel Extractor and Multiresolution Dissimilarity Calculation | |
CN110580503A (en) | AI-based double-spectrum target automatic identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||