CN112818931A - Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion - Google Patents

Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion

Info

Publication number
CN112818931A
CN112818931A
Authority
CN
China
Prior art keywords
pedestrian
local
grained
feature
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110218857.0A
Other languages
Chinese (zh)
Inventor
云霄
葛敏
张晓光
周成峰
周恒�
李岳健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202110218857.0A priority Critical patent/CN112818931A/en
Publication of CN112818931A publication Critical patent/CN112818931A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion, which comprises: selecting a pedestrian re-identification data set and preprocessing the training set in the data set; selecting a residual network as the basic skeleton, comprising a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch; training the pedestrian re-identification network model with Softmax loss and triplet loss as supervision; and fusing the network features of the different branches as the final pedestrian descriptor, taking the query pedestrian image as the input of the pedestrian re-identification network model to obtain the re-identification result. The invention effectively relieves the pressure that complex backgrounds and posture changes place on the re-identification task and improves recognition accuracy.

Description

Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a pedestrian re-identification method.
Background
Pedestrian re-identification is a technique that uses computer vision algorithms to retrieve a given pedestrian across cameras according to clothing, posture, hairstyle and other cues. Once a specific pedestrian has been located in one monitoring device, the method can retrieve and track that pedestrian across other non-overlapping camera devices. In recent years, combined with video pedestrian tracking and detection technologies, it has been widely applied to security monitoring in public places.
Traditional methods are mostly based on hand-crafted features, but with the rapid development of deep learning they have gradually been replaced. The convolutional neural network, a typical feature extractor in deep learning, can learn features automatically from data samples and improves the performance of pedestrian re-identification systems, so it is widely used in this field. However, deploying a pedestrian re-identification system in real scenes still suffers from many factors: images acquired from monitoring equipment are blurred, pedestrian postures change, camera angles differ, and occlusions interfere, all of which lead to low recognition rates.
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, the invention provides a multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
the multi-scale pedestrian re-identification method based on the multi-granularity depth feature fusion comprises the following steps:
(1) selecting a pedestrian re-identification data set, and preprocessing a training set in the data set;
(2) selecting a residual error network as a basic framework, wherein the residual error network comprises a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch;
(3) learning multi-level coarse-grained characteristic information of the pedestrian by adopting a global coarse-grained fusion learning branch;
(4) extracting pedestrian local features from the local area by adopting a local coarse-grained fusion learning branch;
(5) adopting a local attention fine-grained fusion learning branch, introducing an attention mechanism to eliminate background interference, and extracting pedestrian fine-grained local features;
(6) adopting Softmax loss and triple loss as a re-recognition network monitor to train a pedestrian re-recognition network model;
(7) fusing network features of different branches to serve as a final descriptor of a pedestrian, taking an image of the pedestrian to be inquired as input of a pedestrian re-identification network model, retrieving from a candidate gallery, calculating characteristic distances between the image of the pedestrian to be inquired and all images in the candidate gallery, and sorting according to the characteristic distances, wherein the image in the candidate gallery with the closest characteristic distance and the image of the pedestrian to be inquired are data of the same pedestrian.
Further, in step (1), the training set is preprocessed as follows:
(1a) fixing the image size of the pedestrian images in the training set;
(1b) performing horizontal flipping, rotation, random cropping and normalization on the samples of each identity in the training set, thereby increasing the number of training samples.
Further, in step (2), the steps of selecting the residual network as the basic skeleton are as follows:
(2a) parameter refinement is carried out on the 4th residual stage of the ResNet50 backbone network;
(2b) for the global coarse-grained fusion learning branch, starting from ResNet50 stages 1, 2 and 3, ResNet50 stages 4 and 5 are kept, and global average pooling is applied to the resulting feature map to obtain the 2048-dimensional global feature vector f_g_2048;
(2c) for the local coarse-grained fusion learning branch, the feature map of ResNet50 stage 3 is divided horizontally into two equal parts, and the stride of the down-sampling layer at ResNet50 stage 5 is set to 1; then a global average pooling layer is used to obtain the local features f_p_2048_1 and f_p_2048_2 of each partition;
(2d) for the local attention fine-grained fusion learning branch, a convolutional attention module is introduced on top of the local coarse-grained fusion learning branch to obtain the local features f_pab_2048_1 and f_pab_2048_2 of each partition.
Further, in step (3), the method for learning multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch is as follows: first, f_g_2048 is used as the final pedestrian feature descriptor and trained with the triplet loss function; next, a Conv1×1 processes f_g_2048 to obtain the 512-dimensional feature vector f_g_512, which is trained with the Softmax loss function.
Further, in step (4), the method for extracting pedestrian local features from local regions with the local coarse-grained fusion learning branch is as follows: first, f_p_2048_1 is used as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, f_p_2048_2 is processed in order by a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer, and a batch normalization layer; finally, a Conv1×1 yields the 512-dimensional f_p_512_1 and f_p_512_2, which are trained with the Softmax loss function.
Further, in step (5), the method for extracting fine-grained pedestrian local features with the local attention fine-grained fusion learning branch is as follows. The attention mechanism comprises a channel attention mechanism and a spatial attention mechanism: first, a pedestrian picture is input and passed through the ResNet50 basic skeleton to obtain the input feature map F; second, F enters channel attention, where average pooling and max pooling produce two different one-dimensional feature vectors that are fed into a shared multilayer perceptron to obtain the feature matrix F′; then the feature F″ is obtained through spatial attention.
For the local features f_pab_2048_1 and f_pab_2048_2 of each partition obtained through the attention mechanism: first, f_pab_2048_1 is used directly as the final pedestrian feature descriptor and trained with the triplet hard negative sample loss function; next, f_pab_2048_2 is processed in order by a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer, and a batch normalization layer; finally, a Conv1×1 yields the 512-dimensional f_pab_512_1 and f_pab_512_2, which are trained with the Softmax loss function.
Further, the feature matrix F′ is obtained as follows:
F′ = M_C(F) × F
M_C(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))
In the above formulas, σ is the activation function, MLP denotes the multilayer perceptron, W_0 and W_1 are the weights of the MLP, AvgPool and MaxPool denote average pooling and max pooling respectively, and F_avg^c and F_max^c are the two different feature descriptors obtained after average pooling and max pooling respectively.
Further, the spatial attention mechanism first uses average pooling and max pooling to generate two 2-dimensional feature maps F_avg^s and F_max^s, concatenates them to generate an effective feature descriptor, and then passes it through the convolutional layer f^(7×7) to generate a spatial attention map, which is compressed to obtain F″:
F″ = M_S(F′) × F′
M_S(F′) = σ(f^(7×7)([AvgPool(F′); MaxPool(F′)])) = σ(f^(7×7)([F_avg^s; F_max^s]))
In the above formula, σ is the activation function, and AvgPool and MaxPool denote average pooling and max pooling respectively.
Further, in step (6), a joint loss function of Softmax loss and triplet loss is constructed:
Loss_total = (1 − w)·Loss_softmax + w·Loss_triplet
Loss_softmax = −(1/N) Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i + b_{y_i}) / Σ_{j=1}^{C} exp(W_j^T f_i + b_j) )
Loss_triplet = Σ_{a=1}^{N} [ α + max_p ‖f_a − f_p‖_2 − min_n ‖f_a − f_n‖_2 ]_+
where Loss_total is the joint loss function, Loss_softmax is the Softmax loss function, Loss_triplet is the triplet loss function, w is a balance coefficient with w ∈ (0, 1), N is the number of training samples in a batch, C is the number of training classes, f_i is a given input feature vector and y_i its corresponding label, W_j and b_j are the weight vector and bias of the j-th class and W_{y_i} and b_{y_i} those of class y_i, the superscript T denotes transposition, α is the margin between positive and negative sample pairs, f_a, f_p and f_n are the features extracted by the network from the anchor picture, the positive sample picture and the negative sample picture respectively, and [X]_+ = max(X, 0). For each anchor sample in a batch, the most dissimilar positive sample and the most similar negative sample in the batch are selected to form the triplet.
The above technical scheme brings the following beneficial effects:
The invention designs a multi-branch network structure comprising a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch, which relieves the pressure that complex backgrounds and pedestrian posture changes place on the re-identification task and learns both global information and effective locally discriminative features of pedestrians. The invention trains the network jointly with the Softmax loss and triplet loss functions, so that the feature distances between samples of the same pedestrian become smaller and those between samples of different pedestrians become larger, improving pedestrian re-identification performance.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of a Market-1501 data set in the embodiment;
FIG. 3 is a schematic diagram of an embodiment of a network framework;
FIG. 4 is a CBAM framework map of an embodiment;
FIG. 5 is a diagram of pedestrian retrieval results on the Market-1501 data set in the embodiment;
FIG. 6 is a schematic diagram of the performance indices versus the w parameter on the Market-1501 data set in the embodiment.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
The invention designs a multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion, as shown in FIG. 1, with the following steps:
Step 1: selecting a pedestrian re-identification data set, and preprocessing the training set in the data set;
Step 2: selecting a residual network as the basic skeleton, comprising a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch;
Step 3: learning multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch;
Step 4: extracting pedestrian local features from local regions with the local coarse-grained fusion learning branch;
Step 5: adopting the local attention fine-grained fusion learning branch, introducing an attention mechanism to eliminate background interference, and extracting fine-grained pedestrian local features;
Step 6: adopting Softmax loss and triplet loss as supervision for the re-identification network to train the pedestrian re-identification network model;
Step 7: fusing the network features of the different branches as the final pedestrian descriptor, taking the query pedestrian image as the input of the pedestrian re-identification network model, retrieving from the candidate gallery, computing the feature distances between the query image and all images in the candidate gallery, and sorting by feature distance; the gallery image with the closest feature distance to the query image is data of the same pedestrian. A sketch of this ranking follows.
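As an illustrative sketch of the ranking in Step 7 (the function name and the random test descriptors are assumptions; cosine similarity stands in for the feature distance, matching the cosine ranking used for Fig. 5 in the embodiment below):

import torch
import torch.nn.functional as F

def rank_gallery(query_feat, gallery_feats):
    # query_feat:    (D,) fused final descriptor of the query pedestrian image
    # gallery_feats: (M, D) fused descriptors of all candidate-gallery images
    # Returns gallery indices sorted from most similar to least similar.
    q = F.normalize(query_feat.unsqueeze(0), dim=1)  # (1, D), unit length
    g = F.normalize(gallery_feats, dim=1)            # (M, D), unit length
    sims = (q @ g.t()).squeeze(0)                    # cosine similarity per gallery image
    return torch.argsort(sims, descending=True)

# Toy usage with random 2048-dimensional descriptors as placeholders:
ranking = rank_gallery(torch.randn(2048), torch.randn(100, 2048))
print(ranking[:10])  # indices of the ten closest gallery images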
Referring to FIG. 2, this embodiment selects the Market-1501 data set from the public pedestrian re-identification data sets; it contains images of 1501 pedestrians captured by five high-resolution cameras and one low-resolution camera on the Tsinghua University campus. FIG. 2 shows a sample of the Market-1501 data: the image quality is uneven, pedestrian postures change greatly, and the backgrounds are noisy. The training set is preprocessed as follows:
s1.1: setting the image size of any sample data in the training set to be 256 multiplied by 128 in a uniform size;
s1.2: horizontally turning and rotating any sample data in the training set, then cutting the random length-width ratio to be 0.75, and then scaling the image obtained after cutting to be 256 multiplied by 128; the pixel values are then normalized.
In this embodiment, the Market-1501 data set is selected as the pedestrian data to be studied. As shown in FIG. 3, the network framework of this embodiment is built on ResNet50, with the following specific steps:
s2.1: parameter refinement is carried out on the ResNet50 backbone network 4 th residual error stage;
s2.2: for the global coarse-grained fusion learning branch, starting from ResNet50 stage1,2,3, keeping ResNet50 stage 4,5, and using the obtained feature map to use the globalAverage pooling treatment to obtain 2048-dimensional global feature vector fg_2048
S2.3: for local coarse-grained fusion learning branches, the feature map of the ResNet50 stage3 is divided into two equal parts horizontally, and the step size of a downsampling layer is set to be 1 at the ResNet50 stage 5; secondly, using a global average pooling layer to obtain the local characteristic f of each partitionp_2048_1And fp_2048_2
S2.4: for the local attention fine-grained fusion learning branch, a convolution attention module is introduced on the basis of the local coarse-grained fusion learning branch to obtain the local characteristic f of each partitionpab_2048_1And fpab_2048_2
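The skeleton of S2.1-S2.4 can be sketched in PyTorch as follows. This is a simplified reading under stated assumptions: torchvision's layer1-layer4 are taken to correspond to the patent's stages 2-5, the horizontal split is applied to the local branch's final feature map (the translated text is ambiguous about where the split occurs), and the attention branch, which duplicates the local branch with a CBAM module inserted, is elided:

import copy
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultiGranularitySkeleton(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")  # ImageNet-pretrained
        # Shared stem plus stages 2 and 3 (the branches split after stage 3).
        self.shared = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2,
        )
        # Global branch keeps stages 4 and 5 unchanged (S2.2).
        self.global_branch = nn.Sequential(copy.deepcopy(backbone.layer3),
                                           copy.deepcopy(backbone.layer4))
        # Local branch: stage-5 down-sampling stride set to 1 (S2.3).
        local_tail = nn.Sequential(copy.deepcopy(backbone.layer3),
                                   copy.deepcopy(backbone.layer4))
        local_tail[1][0].conv2.stride = (1, 1)
        local_tail[1][0].downsample[0].stride = (1, 1)
        self.local_branch = local_tail
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling

    def forward(self, x):
        shared = self.shared(x)
        f_g_2048 = self.gap(self.global_branch(shared)).flatten(1)
        local_map = self.local_branch(shared)
        upper, lower = torch.chunk(local_map, 2, dim=2)  # horizontal halves
        f_p_2048_1 = self.gap(upper).flatten(1)
        f_p_2048_2 = self.gap(lower).flatten(1)
        return f_g_2048, f_p_2048_1, f_p_2048_2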
In this embodiment, the method for learning multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch is as follows: first, f_g_2048 is used as the final pedestrian feature descriptor and trained with the triplet loss function; next, a Conv1×1 processes f_g_2048 to obtain the 512-dimensional feature vector f_g_512, which is trained with the Softmax loss function.
In this embodiment, the method for extracting pedestrian local features from local regions with the local coarse-grained fusion learning branch is as follows: first, f_p_2048_1 is used as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, f_p_2048_2 is processed in order by a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer, and a batch normalization layer; finally, a Conv1×1 yields the 512-dimensional f_p_512_1 and f_p_512_2, which are trained with the Softmax loss function.
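A sketch of this per-partition reduction head (the Dropout probability is an assumption; 751 is the number of training identities in Market-1501):

import torch.nn as nn

class ReductionHead(nn.Module):
    # BN -> ReLU -> Dropout -> BN, then Conv1x1 down to 512 dimensions, then a
    # linear classifier whose output is trained with the Softmax loss.
    def __init__(self, in_dim=2048, out_dim=512, num_classes=751):
        super().__init__()
        self.refine = nn.Sequential(
            nn.BatchNorm2d(in_dim), nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),                 # probability assumed
            nn.BatchNorm2d(in_dim),
        )
        self.reduce = nn.Conv2d(in_dim, out_dim, kernel_size=1)  # Conv1x1
        self.classifier = nn.Linear(out_dim, num_classes)

    def forward(self, pooled):                 # pooled: (B, 2048, 1, 1)
        f_512 = self.reduce(self.refine(pooled)).flatten(1)
        return f_512, self.classifier(f_512)   # feature and class logits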
In this embodiment, the method for extracting fine-grained pedestrian local features with the local attention fine-grained fusion learning branch is as follows. As shown in FIG. 4, the attention mechanism (CBAM) comprises a channel attention mechanism and a spatial attention mechanism: first, a pedestrian picture is input and passed through the ResNet50 basic skeleton to obtain the input feature map F; second, F enters channel attention, where average pooling and max pooling produce two different one-dimensional feature vectors that are fed into a shared multilayer perceptron to obtain the feature matrix F′; then the feature F″ is obtained through spatial attention.
The feature matrix F′ is obtained as follows:
F′ = M_C(F) × F
M_C(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))
In the above formulas, σ is the activation function, MLP denotes the multilayer perceptron, W_0 and W_1 are the weights of the MLP, AvgPool and MaxPool denote average pooling and max pooling respectively, and F_avg^c and F_max^c are the two different feature descriptors obtained after average pooling and max pooling respectively.
The spatial attention mechanism first uses average pooling and max pooling to generate two 2-dimensional feature maps F_avg^s and F_max^s, concatenates them to generate an effective feature descriptor, and then passes it through the convolutional layer f^(7×7) to generate a spatial attention map, which is compressed to obtain F″:
F″ = M_S(F′) × F′
M_S(F′) = σ(f^(7×7)([AvgPool(F′); MaxPool(F′)])) = σ(f^(7×7)([F_avg^s; F_max^s]))
In the above formula, σ is the activation function, and AvgPool and MaxPool denote average pooling and max pooling respectively.
For the local features f_pab_2048_1 and f_pab_2048_2 of each partition obtained through the attention mechanism: first, f_pab_2048_1 is used directly as the final pedestrian feature descriptor and trained with the triplet hard negative sample loss function; next, f_pab_2048_2 is processed in order by a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer, and a batch normalization layer; finally, a Conv1×1 yields the 512-dimensional f_pab_512_1 and f_pab_512_2, which are trained with the Softmax loss function.
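The CBAM described above might be sketched as follows (the channel-reduction ratio r = 16 follows the original CBAM paper and is an assumption here; the shared MLP is written as 1 × 1 convolutions playing the roles of W_0 and W_1):

import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.mlp = nn.Sequential(              # shared MLP: W_1(W_0(.))
            nn.Conv2d(channels, channels // r, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1, bias=False),
        )
        self.conv7 = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)
        self.sigmoid = nn.Sigmoid()            # the activation sigma

    def forward(self, F):
        # Channel attention: M_C(F) = sigma(MLP(AvgPool(F)) + MLP(MaxPool(F)))
        avg = self.mlp(F.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(F.amax(dim=(2, 3), keepdim=True))
        F1 = self.sigmoid(avg + mx) * F        # F' = M_C(F) x F
        # Spatial attention: M_S(F') = sigma(f7x7([AvgPool(F'); MaxPool(F')]))
        pooled = torch.cat([F1.mean(dim=1, keepdim=True),
                            F1.amax(dim=1, keepdim=True)], dim=1)
        return self.sigmoid(self.conv7(pooled)) * F1  # F'' = M_S(F') x F'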
In this embodiment, a joint loss function of Softmax loss and triplet loss is constructed:
Loss_total = (1 − w)·Loss_softmax + w·Loss_triplet
Loss_softmax = −(1/N) Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i + b_{y_i}) / Σ_{j=1}^{C} exp(W_j^T f_i + b_j) )
Loss_triplet = Σ_{a=1}^{N} [ α + max_p ‖f_a − f_p‖_2 − min_n ‖f_a − f_n‖_2 ]_+
where Loss_total is the joint loss function, Loss_softmax is the Softmax loss function, Loss_triplet is the triplet loss function, w is a balance coefficient with w ∈ (0, 1), N is the number of training samples in a batch, C is the number of training classes, f_i is a given input feature vector and y_i its corresponding label, W_j and b_j are the weight vector and bias of the j-th class and W_{y_i} and b_{y_i} those of class y_i, the superscript T denotes transposition, α is the margin between positive and negative sample pairs, f_a, f_p and f_n are the features extracted by the network from the anchor picture, the positive sample picture and the negative sample picture respectively, and [X]_+ = max(X, 0). For each anchor sample in a batch, the most dissimilar positive sample and the most similar negative sample in the batch are selected to form the triplet.
In this embodiment, the multi-scale network with multi-granularity depth feature fusion is initialized with ResNet50 weights pre-trained on ImageNet, and the weights of the different branches are shared. Each mini-batch randomly samples P identities, and each identity randomly samples K images from the training set to satisfy the triplet requirement; in the experiments P = 16 and K = 4. An SGD optimizer is selected, with weight decay set to 5e-4 and momentum to 0.9. The total number of training epochs is set to 240 and the initial learning rate to 0.1; the learning rate is divided by 10 every 40 epochs until it reaches 0.001 and then remains unchanged, and the balance coefficient w is set to 0.6.
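These hyperparameters translate into, for example, the following PyTorch configuration (the stand-in model is a placeholder; the dataset and the P × K identity sampler are elided):

import torch

P, K = 16, 4                          # identities per batch x images per identity
model = torch.nn.Linear(10, 10)       # placeholder for the multi-branch network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Learning rate divided by 10 every 40 epochs: 0.1 -> 0.01 -> 0.001, then held.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[40, 80], gamma=0.1)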
FIG. 5 shows pedestrian retrieval results on the Market-1501 data set in this embodiment. The images in the first column are the query images, and the pictures retrieved from the gallery are ranked 1 to 10 by cosine similarity. The ranking shows that most of the retrieved images are selected correctly, but a few erroneous images, marked with red numbers and detection boxes, remain, probably because the image information collected from a single-view camera is insufficient.
FIG. 6 shows the effect of the w parameter in this embodiment. Rank-1 denotes the probability that the top retrieved picture has the same identity as the query picture, and mAP denotes the mean average precision of retrieving pictures of the same identity as the query. When w = 0, a single Softmax loss supervises the training network and the convolutional descriptor serves as the only pedestrian feature descriptor, so pedestrian feature information at different levels is not fully exploited; moreover, the Softmax loss learns only separable features, so the learned features lack discrimination. When w > 0, supervised training with the combined Softmax loss and triplet loss improves performance significantly, and the effect is best when w = 0.6, which verifies the effectiveness of the proposed method: joint supervision with Softmax loss and triplet loss compensates for their respective shortcomings and learns multi-level, finer-grained features. However, when w = 1, because the local fusion branches directly use f_pab_2048_1 and f_pab_2048_2 as final descriptors, supervised training with the triplet loss alone is not as effective as joint training.
The embodiments are only for illustrating the technical idea of the present invention, and the technical idea of the present invention is not limited thereto, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the scope of the present invention.

Claims (9)

1. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion is characterized by comprising the following steps:
(1) selecting a pedestrian re-identification data set, and preprocessing the training set in the data set;
(2) selecting a residual network as the basic skeleton, comprising a global coarse-grained fusion learning branch, a local coarse-grained fusion learning branch and a local attention fine-grained fusion learning branch;
(3) learning multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch;
(4) extracting pedestrian local features from local regions with the local coarse-grained fusion learning branch;
(5) adopting the local attention fine-grained fusion learning branch, introducing an attention mechanism to eliminate background interference, and extracting fine-grained pedestrian local features;
(6) adopting Softmax loss and triplet loss as supervision for the re-identification network to train the pedestrian re-identification network model;
(7) fusing the network features of the different branches as the final pedestrian descriptor, taking the query pedestrian image as the input of the pedestrian re-identification network model, retrieving from the candidate gallery, computing the feature distances between the query image and all images in the candidate gallery, and sorting by feature distance; the gallery image with the closest feature distance to the query image is data of the same pedestrian.
2. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 1, wherein in step (1) the training set is preprocessed as follows:
(1a) fixing the image size of the pedestrian images in the training set;
(1b) performing horizontal flipping, rotation, random cropping and normalization on the samples of each identity in the training set, thereby increasing the number of training samples.
3. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 1, wherein in step (2) the steps of selecting the residual network as the basic skeleton are as follows:
(2a) parameter refinement is carried out on the 4th residual stage of the ResNet50 backbone network;
(2b) for the global coarse-grained fusion learning branch, starting from ResNet50 stages 1, 2 and 3, ResNet50 stages 4 and 5 are kept, and global average pooling is applied to the resulting feature map to obtain the 2048-dimensional global feature vector f_g_2048;
(2c) for the local coarse-grained fusion learning branch, the feature map of ResNet50 stage 3 is divided horizontally into two equal parts, and the stride of the down-sampling layer at ResNet50 stage 5 is set to 1; then a global average pooling layer is used to obtain the local features f_p_2048_1 and f_p_2048_2 of each partition;
(2d) for the local attention fine-grained fusion learning branch, a convolutional attention module is introduced on top of the local coarse-grained fusion learning branch to obtain the local features f_pab_2048_1 and f_pab_2048_2 of each partition.
4. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 3, wherein in step (3) the method for learning multi-level coarse-grained feature information of the pedestrian with the global coarse-grained fusion learning branch is as follows: first, f_g_2048 is used as the final pedestrian feature descriptor and trained with the triplet loss function; next, a Conv1×1 processes f_g_2048 to obtain the 512-dimensional feature vector f_g_512, which is trained with the Softmax loss function.
5. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 3, wherein in step (4) the method for extracting pedestrian local features from local regions with the local coarse-grained fusion learning branch is as follows: first, f_p_2048_1 is used as the final pedestrian feature descriptor and trained directly with the triplet hard negative sample loss function; next, f_p_2048_2 is processed in order by a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer, and a batch normalization layer; finally, a Conv1×1 yields the 512-dimensional f_p_512_1 and f_p_512_2, which are trained with the Softmax loss function.
6. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 3, wherein in step (5) the method for extracting fine-grained pedestrian local features with the local attention fine-grained fusion learning branch is as follows. The attention mechanism comprises a channel attention mechanism and a spatial attention mechanism: first, a pedestrian picture is input and passed through the ResNet50 basic skeleton to obtain the input feature map F; second, F enters channel attention, where average pooling and max pooling produce two different one-dimensional feature vectors that are fed into a shared multilayer perceptron to obtain the feature matrix F′; then the feature F″ is obtained through spatial attention;
for the local features f_pab_2048_1 and f_pab_2048_2 of each partition obtained through the attention mechanism: first, f_pab_2048_1 is used directly as the final pedestrian feature descriptor and trained with the triplet hard negative sample loss function; next, f_pab_2048_2 is processed in order by a batch normalization layer, the nonlinear activation function ReLU, a Dropout layer, and a batch normalization layer; finally, a Conv1×1 yields the 512-dimensional f_pab_512_1 and f_pab_512_2, which are trained with the Softmax loss function.
7. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion according to claim 6, wherein the feature matrix F′ is obtained as follows:
F′ = M_C(F) × F
M_C(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))
In the above formulas, σ is the activation function, MLP denotes the multilayer perceptron, W_0 and W_1 are the weights of the MLP, AvgPool and MaxPool denote average pooling and max pooling respectively, and F_avg^c and F_max^c are the two different feature descriptors obtained after average pooling and max pooling respectively.
8. The method according to claim 6, wherein the spatial attention mechanism first uses average pooling and max pooling to generate two 2-dimensional feature maps F_avg^s and F_max^s, concatenates them to generate an effective feature descriptor, and then passes it through the convolutional layer f^(7×7) to generate a spatial attention map, which is compressed to obtain F″:
F″ = M_S(F′) × F′
M_S(F′) = σ(f^(7×7)([AvgPool(F′); MaxPool(F′)])) = σ(f^(7×7)([F_avg^s; F_max^s]))
In the above formula, σ is the activation function, and AvgPool and MaxPool denote average pooling and max pooling respectively.
9. The multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion as claimed in claim 1, wherein in step (6) a joint loss function of Softmax loss and triplet loss is constructed:
Loss_total = (1 − w)·Loss_softmax + w·Loss_triplet
Loss_softmax = −(1/N) Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i + b_{y_i}) / Σ_{j=1}^{C} exp(W_j^T f_i + b_j) )
Loss_triplet = Σ_{a=1}^{N} [ α + max_p ‖f_a − f_p‖_2 − min_n ‖f_a − f_n‖_2 ]_+
where Loss_total is the joint loss function, Loss_softmax is the Softmax loss function, Loss_triplet is the triplet loss function, w is a balance coefficient with w ∈ (0, 1), N is the number of training samples in a batch, C is the number of training classes, f_i is a given input feature vector and y_i its corresponding label, W_j and b_j are the weight vector and bias of the j-th class and W_{y_i} and b_{y_i} those of class y_i, the superscript T denotes transposition, α is the margin between positive and negative sample pairs, f_a, f_p and f_n are the features extracted by the network from the anchor picture, the positive sample picture and the negative sample picture respectively, and [X]_+ = max(X, 0); for each anchor sample in a batch, the most dissimilar positive sample and the most similar negative sample in the batch are selected to form the triplet.
CN202110218857.0A 2021-02-26 2021-02-26 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion Pending CN112818931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110218857.0A CN112818931A (en) 2021-02-26 2021-02-26 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110218857.0A CN112818931A (en) 2021-02-26 2021-02-26 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion

Publications (1)

Publication Number Publication Date
CN112818931A true CN112818931A (en) 2021-05-18

Family

ID=75864137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110218857.0A Pending CN112818931A (en) 2021-02-26 2021-02-26 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion

Country Status (1)

Country Link
CN (1) CN112818931A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019025601A1 (en) * 2017-08-03 2019-02-07 Koninklijke Philips N.V. Hierarchical neural networks with granularized attention
US20200242153A1 (en) * 2019-01-29 2020-07-30 Samsung Electronics Co., Ltd. Method, apparatus, electronic device and computer readable storage medium for image searching
CN111178432A (en) * 2019-12-30 2020-05-19 武汉科技大学 Weak supervision fine-grained image classification method of multi-branch neural network model
CN111460914A (en) * 2020-03-13 2020-07-28 华南理工大学 Pedestrian re-identification method based on global and local fine-grained features
CN111461038A (en) * 2020-04-07 2020-07-28 中北大学 Pedestrian re-identification method based on layered multi-mode attention mechanism
CN111539370A (en) * 2020-04-30 2020-08-14 华中科技大学 Image pedestrian re-identification method and system based on multi-attention joint learning
CN111666851A (en) * 2020-05-28 2020-09-15 大连理工大学 Cross domain self-adaptive pedestrian re-identification method based on multi-granularity label
CN111709331A (en) * 2020-06-03 2020-09-25 江南大学 Pedestrian re-identification method based on multi-granularity information interaction model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lu Jian et al., "Survey of Deep Learning Based Person Re-identification Research", Laser & Optoelectronics Progress, vol. 57, no. 16, pages 1-18 *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239784A (en) * 2021-05-11 2021-08-10 广西科学院 Pedestrian re-identification system and method based on space sequence feature learning
CN113177518A (en) * 2021-05-24 2021-07-27 西安建筑科技大学 Vehicle weight identification method recommended by weak supervision area
CN113177518B (en) * 2021-05-24 2023-04-28 西安建筑科技大学 Vehicle re-identification method based on weak supervision area recommendation
CN113283507A (en) * 2021-05-27 2021-08-20 大连海事大学 Multi-view-based feature fusion vehicle re-identification method
CN113283507B (en) * 2021-05-27 2024-04-05 大连海事大学 Feature fusion vehicle re-identification method based on multiple views
CN113255597A (en) * 2021-06-29 2021-08-13 南京视察者智能科技有限公司 Transformer-based behavior analysis method and device and terminal equipment thereof
CN113361464A (en) * 2021-06-30 2021-09-07 重庆交通大学 Vehicle weight recognition method based on multi-granularity feature segmentation
CN113255615A (en) * 2021-07-06 2021-08-13 南京视察者智能科技有限公司 Pedestrian retrieval method and device for self-supervision learning
CN113255615B (en) * 2021-07-06 2021-09-28 南京视察者智能科技有限公司 Pedestrian retrieval method and device for self-supervision learning
CN113537032B (en) * 2021-07-12 2023-11-28 南京邮电大学 Diversity multi-branch pedestrian re-identification method based on picture block discarding
CN113537032A (en) * 2021-07-12 2021-10-22 南京邮电大学 Diversity multi-branch pedestrian re-identification method based on picture block discarding
CN113743497A (en) * 2021-09-02 2021-12-03 南京理工大学 Fine granularity identification method and system based on attention mechanism and multi-scale features
CN113792686A (en) * 2021-09-17 2021-12-14 中南大学 Vehicle weight identification method based on cross-sensor invariance of visual representation
CN113792686B (en) * 2021-09-17 2023-12-08 中南大学 Vehicle re-identification method based on visual representation of invariance across sensors
CN114187606A (en) * 2021-10-21 2022-03-15 江阴市智行工控科技有限公司 Garage pedestrian detection method and system adopting branch fusion network for light weight
CN114140700A (en) * 2021-12-01 2022-03-04 西安电子科技大学 Step-by-step heterogeneous image template matching method based on cascade network
CN115050044A (en) * 2022-04-02 2022-09-13 广西科学院 Cross-modal pedestrian re-identification method based on MLP-Mixer
CN115050044B (en) * 2022-04-02 2023-06-23 广西科学院 Cross-modal pedestrian re-identification method based on MLP-Mixer
CN115050048B (en) * 2022-05-25 2023-04-18 杭州像素元科技有限公司 Cross-modal pedestrian re-identification method based on local detail features
CN115050048A (en) * 2022-05-25 2022-09-13 杭州像素元科技有限公司 Cross-modal pedestrian re-identification method based on local detail features
CN115294601A (en) * 2022-07-22 2022-11-04 苏州大学 Pedestrian re-identification method based on multi-scale feature dynamic fusion
CN115294601B (en) * 2022-07-22 2023-07-11 苏州大学 Pedestrian re-recognition method based on multi-scale feature dynamic fusion
CN115240121A (en) * 2022-09-22 2022-10-25 之江实验室 Joint modeling method and device for enhancing local features of pedestrians
CN115909455A (en) * 2022-11-16 2023-04-04 航天恒星科技有限公司 Expression recognition method integrating multi-scale feature extraction and attention mechanism
CN115909455B (en) * 2022-11-16 2023-09-19 航天恒星科技有限公司 Expression recognition method integrating multi-scale feature extraction and attention mechanism
CN115841683A (en) * 2022-12-27 2023-03-24 石家庄铁道大学 Light-weight pedestrian re-identification method combining multi-level features
CN115841683B (en) * 2022-12-27 2023-06-20 石家庄铁道大学 Lightweight pedestrian re-identification method combining multi-level features
CN116052218B (en) * 2023-02-13 2023-07-18 中国矿业大学 Pedestrian re-identification method
CN116052218A (en) * 2023-02-13 2023-05-02 中国矿业大学 Pedestrian re-identification method
CN117612266A (en) * 2024-01-24 2024-02-27 南京信息工程大学 Cross-resolution pedestrian re-identification method based on multi-scale image and feature layer alignment
CN117612266B (en) * 2024-01-24 2024-04-19 南京信息工程大学 Cross-resolution pedestrian re-identification method based on multi-scale image and feature layer alignment
CN117994822A (en) * 2024-04-07 2024-05-07 南京信息工程大学 Cross-mode pedestrian re-identification method based on auxiliary mode enhancement and multi-scale feature fusion

Similar Documents

Publication Publication Date Title
CN112818931A (en) Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion
CN111126360B (en) Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
Wang et al. Large-scale isolated gesture recognition using convolutional neural networks
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN111325111A (en) Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
CN105718889B (en) Based on GB (2D)2The face personal identification method of PCANet depth convolution model
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN110598543B (en) Model training method based on attribute mining and reasoning and pedestrian re-identification method
CN111401145B (en) Visible light iris recognition method based on deep learning and DS evidence theory
CN109784288B (en) Pedestrian re-identification method based on discrimination perception fusion
CN113158815A (en) Unsupervised pedestrian re-identification method, system and computer readable medium
CN113361549A (en) Model updating method and related device
CN115909407A (en) Cross-modal pedestrian re-identification method based on character attribute assistance
CN112084895A (en) Pedestrian re-identification method based on deep learning
CN111985332A (en) Gait recognition method for improving loss function based on deep learning
CN116704611A (en) Cross-visual-angle gait recognition method based on motion feature mixing and fine-granularity multi-stage feature extraction
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN113111797A (en) Cross-view gait recognition method combining self-encoder and view transformation model
CN115984765A (en) Pedestrian re-identification method based on double-current block network, electronic equipment and medium
CN113627380A (en) Cross-vision-field pedestrian re-identification method and system for intelligent security and early warning
CN113537032A (en) Diversity multi-branch pedestrian re-identification method based on picture block discarding
Cheng et al. Automatic Data Cleaning System for Large-Scale Location Image Databases Using a Multilevel Extractor and Multiresolution Dissimilarity Calculation
CN110580503A (en) AI-based double-spectrum target automatic identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination