CN113963150A - Pedestrian re-identification method based on multi-scale twin cascade network - Google Patents
- Publication number
- CN113963150A, CN202111355189.2A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- cascade
- network
- sample
- scale
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, comprising: constructing a multi-scale twin cascade network that comprises a multi-scale twin cascade color network, a multi-scale twin cascade gray network, a fusion layer and a PCA dimension reduction layer, where the color network and the gray network each comprise a first cascaded sub-network, a second cascaded sub-network and a third cascaded sub-network. The invention uses a multi-scale cascade network: at each scale, the input is fused with the sub-feature maps of the corresponding upper-level sub-network and fed into the lower-level sub-network for pedestrian feature extraction, and the pedestrian features of all sub-networks are then fused, yielding a more macroscopic and accurate pedestrian feature expression. The method can therefore obtain a more global, high-level and accurate pedestrian feature expression, avoid interference from color difference, illumination, scale, scene and the like, and improve the accuracy of pedestrian re-identification.
Description
Technical Field
The invention belongs to the technical field of intelligent video image processing, and particularly relates to a pedestrian re-identification method based on a multi-scale twin cascade network.
Background
With the rapid development of 5G and the Internet of Things, intelligent living has emerged. Intelligent security is an important part of intelligent living, and pedestrian re-identification, which searches for a pedestrian across camera devices, is one of its key technologies, so its accuracy is critical. Current pedestrian re-identification technology has certain limitations: because camera devices differ, the appearance of a pedestrian is easily affected by clothing color differences, illumination, scale, scene and the like, which harms accuracy. These factors make it difficult to popularize and apply pedestrian re-identification technology. It is therefore very important to extract the key, effective features of pedestrians captured by different devices.
Existing pedestrian re-identification methods mainly express features in the following ways: 1. semantic information extracted from the image is used to represent pedestrian features; features extracted this way depend strongly on clothing color, so pedestrians wearing the same outfit or clothing of the same color are difficult to distinguish; 2. pedestrian features are extracted from a single-scale input, which ignores the detail features of the image at different granularities; 3. neural-network-based methods mainly use a single network to extract pedestrian features, so the feature information is limited and depends heavily on the design of the network structure.
Therefore, in view of the problems in the prior art, it is necessary to find a way to extract more key, effective, accurate and comprehensive pedestrian features from images captured by different camera devices.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which can effectively solve the problems.
The technical scheme adopted by the invention is as follows:
the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which comprises the following steps of:
dividing the data set into a training set TrainSet and a verification set;
the network structures of the multi-scale twin cascade color network Network_1 and the multi-scale twin cascade gray network Network_2 are completely the same;
the multi-scale twin cascade color network Network_1 comprises a first cascade color sub-network level_1s, a second cascade color sub-network level_2s and a third cascade color sub-network level_3s;
the multi-scale twin cascade gray network Network_2 comprises a first cascade gray sub-network level_1g, a second cascade gray sub-network level_2g and a third cascade gray sub-network level_3g;
training the multi-scale twin cascade network by adopting the following mode to obtain the trained multi-scale twin cascade network:
step 2.1, taking 3 sample groups as one batch; the 3 sample groups of each batch are denoted sample group u_1, sample group u_2 and sample group u_3, where sample group u_1 is the fixed (anchor) sample; sample group u_2 corresponds to the same pedestrian as sample group u_1 and is the positive sample of u_1; sample group u_3 corresponds to a different pedestrian from sample group u_1 and is the negative sample of u_1;
inputting the batch of sample groups into the multi-scale twin cascade network;
step 2.2, for each sample group, the color picture sample is denoted rgb_tu and the gray picture sample is denoted gray_tu;
inputting the color picture sample rgb_tu into the multi-scale twin cascade color network Network_1 to obtain the first cascade color pedestrian classification result class_1s output by the first cascade color sub-network level_1s, the second cascade color pedestrian classification result class_2s output by the second cascade color sub-network level_2s, the third cascade color pedestrian classification result class_3s output by the third cascade color sub-network level_3s, and the color pedestrian fusion feature map rgb_features output by Network_1;
inputting the gray picture sample gray_tu into the multi-scale twin cascade gray network Network_2 to obtain the first cascade gray pedestrian classification result class_1g output by the first cascade gray sub-network level_1g, the second cascade gray pedestrian classification result class_2g output by the second cascade gray sub-network level_2g, the third cascade gray pedestrian classification result class_3g output by the third cascade gray sub-network level_3g, and the gray pedestrian fusion feature map gray_features output by Network_2;
where the color picture sample rgb_tu is processed by the multi-scale twin cascade color network Network_1 as follows:
step 2.2.1, reducing the color picture sample rgb_tu to obtain a Scale_a picture sample; further reducing the Scale_a picture sample to obtain a Scale_b picture sample; further reducing the Scale_b picture sample to obtain a Scale_c picture sample;
step 2.2.2, inputting the Scale_a picture sample into the first cascade color sub-network level_1s, which processes it as follows:
A1) performing convolution, batch normalization and activation on the Scale_a picture sample to obtain a pedestrian feature map rgb_feature_a;
A2) down-sampling the pedestrian feature map rgb_feature_a to obtain a pedestrian feature map rgb_feature1 with the same size as the Scale_b picture sample;
A3) down-sampling the pedestrian feature map rgb_feature1 to obtain a pedestrian feature map rgb_feature2 with the same size as the Scale_c picture sample;
A4) performing convolution and global average pooling on the pedestrian feature map rgb_feature2 and inputting the result into the first fully connected layer to obtain the first cascade pedestrian feature map rgb_stag1_feature;
A5) inputting the first cascade pedestrian feature map rgb_stag1_feature into the second fully connected layer to obtain the first cascade color pedestrian classification result class_1s;
step 2.2.3, the second cascade color sub-network level_2s processes its input as follows:
B1) performing convolution, batch normalization and activation on the Scale_b picture sample to obtain a pedestrian feature map rgb_feature_b;
B2) fusing the pedestrian feature map rgb_feature_b with the pedestrian feature map rgb_feature1, and then down-sampling to obtain a pedestrian feature map rgb_feature3 with the same size as the Scale_c picture sample;
B3) performing convolution and global average pooling on the pedestrian feature map rgb_feature3 and inputting the result into the first fully connected layer to obtain the second cascade pedestrian feature map rgb_stag2_feature;
B4) inputting the second cascade pedestrian feature map rgb_stag2_feature into the second fully connected layer to obtain the second cascade color pedestrian classification result class_2s;
step 2.2.4, the third cascade color sub-network level_3s processes its input as follows:
C1) performing convolution, batch normalization and activation on the Scale_c picture sample to obtain a pedestrian feature map rgb_feature_c;
C2) fusing the pedestrian feature map rgb_feature_c, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3, performing convolution and global average pooling, and inputting the result into the first fully connected layer to obtain the third cascade pedestrian feature map rgb_stag3_feature;
C3) inputting the third cascade pedestrian feature map rgb_stag3_feature into the second fully connected layer to obtain the third cascade color pedestrian classification result class_3s;
step 2.2.5, fusing the first cascade pedestrian feature map rgb_stag1_feature, the second cascade pedestrian feature map rgb_stag2_feature and the third cascade pedestrian feature map rgb_stag3_feature to obtain the color pedestrian fusion feature map rgb_features;
step 2.3, for each sample group, fusing the color pedestrian fusion feature map rgb_features and the gray pedestrian fusion feature map gray_features through the fusion layer, and then reducing the dimension through the PCA dimension reduction layer to obtain the final global pedestrian feature map features; passing the global pedestrian feature map features through a fully connected layer to obtain the global pedestrian classification result classifys;
step 2.4, the batch contains 3 sample groups in total; for any sample group u_i, i = 1, 2, 3, the following are obtained: the corresponding global pedestrian feature map, the global pedestrian classification result, the first, second and third cascade color pedestrian classification results, and the first, second and third cascade gray pedestrian classification results;
step 2.5, calculating the loss values of the sub-networks at each level:
step 2.5.1, comparing the first cascade color pedestrian classification result of sample group u_i with the sample label of sample group u_i to obtain the first cascade color pedestrian classification loss value;
comparing the second cascade color pedestrian classification result of sample group u_i with the sample label of sample group u_i to obtain the second cascade color pedestrian classification loss value;
comparing the third cascade color pedestrian classification result of sample group u_i with the sample label of sample group u_i to obtain the third cascade color pedestrian classification loss value;
comparing the first cascade gray pedestrian classification result of sample group u_i with the sample label of sample group u_i to obtain the first cascade gray pedestrian classification loss value;
comparing the second cascade gray pedestrian classification result of sample group u_i with the sample label of sample group u_i to obtain the second cascade gray pedestrian classification loss value;
comparing the third cascade gray pedestrian classification result of sample group u_i with the sample label of sample group u_i to obtain the third cascade gray pedestrian classification loss value;
step 2.5.2, calculating the loss value Loss_1s of the first cascade color sub-network level_1s, the loss value Loss_2s of the second cascade color sub-network level_2s, the loss value Loss_3s of the third cascade color sub-network level_3s, the loss value Loss_1g of the first cascade gray sub-network level_1g, the loss value Loss_2g of the second cascade gray sub-network level_2g and the loss value Loss_3g of the third cascade gray sub-network level_3g by adopting the following formula:
step 2.6, calculating the loss value Loss_0 of the multi-scale twin cascade network:
step 2.6.1, comparing the global pedestrian classification result of sample group u_i with the sample label of sample group u_i to obtain the global pedestrian classification loss value;
step 2.6.2, calculating the loss value Loss_0 of the multi-scale twin cascade network by adopting the following formula:
step 2.7, calculating the similarity loss function value Loss_sim between the sample groups:
step 2.7.1, calculating the sample distance between the global pedestrian feature map of sample group u_1 and the global pedestrian feature map of sample group u_2, denoted d(u_1, u_2);
calculating the sample distance between the global pedestrian feature map of sample group u_1 and the global pedestrian feature map of sample group u_3, denoted d(u_1, u_3);
step 2.7.2, calculating the preliminary loss function value Loss_d by adopting the following formula:
Loss_d = d(u_1, u_2) - d(u_1, u_3) + α
where α is the loss function coefficient, with value range α < d(u_1, u_3) - d(u_1, u_2);
step 2.7.3, obtaining the similarity loss function value Loss_sim as follows:
if Loss_d is greater than 0, Loss_sim = Loss_d;
if Loss_d is less than or equal to 0, Loss_sim = 0;
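Steps 2.7.1 to 2.7.3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not name the distance d, so Euclidean distance is assumed here, and the feature vectors and α values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed choice of d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_loss(features_u1, features_u2, features_u3, alpha):
    """Loss_sim: Loss_d if it is positive, otherwise 0."""
    loss_d = euclidean(features_u1, features_u2) - euclidean(features_u1, features_u3) + alpha
    return loss_d if loss_d > 0 else 0.0

# Hypothetical global feature vectors for u_1 (fixed), u_2 (positive), u_3 (negative).
u1, u2, u3 = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]   # d(u_1, u_2) = 5, d(u_1, u_3) = 10
print(similarity_loss(u1, u2, u3, alpha=2.0))     # 0.0  (Loss_d = -3)
print(similarity_loss(u1, u2, u3, alpha=7.0))     # 2.0  (Loss_d = 2)
```

The loss is zero whenever the negative sample is already farther from the fixed sample than the positive sample by more than α.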
step 2.8, obtaining the final loss function value Loss_final by adopting the following formula:
Loss_final = λ_1·Loss_1s + λ_1·Loss_2s + λ_1·Loss_3s + λ_1·Loss_1g + λ_1·Loss_2g + λ_1·Loss_3g + λ_2·Loss_0 + λ_3·Loss_sim
where:
λ_1 is the weight coefficient of the loss of each cascaded sub-network;
λ_2 is the weight coefficient of the loss of the multi-scale twin cascade network;
λ_3 is the weight coefficient of the similarity loss function value;
step 2.9, judging whether the final loss function value Loss_final has converged; if it has converged, the trained multi-scale twin cascade network is obtained and step 3 is executed; if not, adjusting the network parameters of the multi-scale twin cascade network, taking another batch of sample groups as input, and returning to step 2.1 to train the multi-scale twin cascade network iteratively;
step 4, performing feature recognition on the input pedestrian picture by adopting the multi-scale twin cascade network to obtain a pedestrian feature recognition result.
Preferably, λ_1 is 1, λ_2 is 6, and λ_3 is 7.
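The weighted sum of step 2.8 can be sketched as below, using the preferred weights 1, 6 and 7 as defaults. The function name, the dictionary layout and the loss values are hypothetical and only illustrate the arithmetic.

```python
def final_loss(sub_network_losses, loss_0, loss_sim, lam1=1.0, lam2=6.0, lam3=7.0):
    """Weighted sum of the six sub-network losses, the global loss and the similarity loss."""
    return lam1 * sum(sub_network_losses.values()) + lam2 * loss_0 + lam3 * loss_sim

# Hypothetical loss values for one batch.
sub_network_losses = {
    "Loss_1s": 0.5, "Loss_2s": 0.5, "Loss_3s": 0.5,
    "Loss_1g": 0.5, "Loss_2g": 0.5, "Loss_3g": 0.5,
}
loss_final = final_loss(sub_network_losses, loss_0=0.25, loss_sim=0.0)
print(loss_final)  # 4.5
```

With these weights the global classification loss and the similarity loss dominate the total, while the six sub-network losses act as auxiliary terms.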
Preferably, step 4 specifically comprises:
step 4.1, the input pedestrian picture is a picture Q; a pedestrian sample library G is established in advance;
step 4.2, inputting the picture Q into the multi-scale twin cascade network to obtain its global pedestrian feature map features[Q];
for each pedestrian sample picture G_j in the pedestrian sample library G, j = 1, 2, ..., z, where z is the number of pedestrian sample pictures in G, inputting the picture into the multi-scale twin cascade network to obtain the corresponding global pedestrian feature map;
step 4.3, calculating the similarity between the global pedestrian feature map features[Q] and the global pedestrian feature map of each pedestrian sample picture G_j; sorting the similarities from largest to smallest, and outputting the pedestrian sample picture in the pedestrian sample library G with the highest similarity to the picture Q.
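The ranking in step 4.3 can be sketched as follows. The text does not specify the similarity measure, so cosine similarity is assumed here; the gallery names and feature vectors are hypothetical stand-ins for the network outputs.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_gallery(query_features, gallery_features):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query_features, feat))
              for name, feat in gallery_features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical global features for picture Q and three gallery pictures G_j.
gallery = {"G_1": [1.0, 0.0], "G_2": [0.6, 0.8], "G_3": [0.0, 1.0]}
ranking = rank_gallery([0.8, 0.6], gallery)
best_match = ranking[0][0]
print([name for name, _ in ranking])  # ['G_2', 'G_1', 'G_3']
```

In practice the gallery features would be computed once and cached, so only the query picture passes through the network at search time.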
The pedestrian re-identification method based on the multi-scale twin cascade network provided by the invention has the following advantages:
The invention uses a multi-scale cascade network: at each scale, the input is fused with the sub-feature maps of the corresponding upper-level sub-network and fed into the lower-level sub-network for pedestrian feature extraction, and the pedestrian features of all sub-networks are then fused, yielding a more macroscopic and accurate pedestrian feature expression. The method can therefore obtain a more global, high-level and accurate pedestrian feature expression, avoid interference from color difference, illumination, scale, scene and the like, and improve the accuracy of pedestrian re-identification.
Drawings
FIG. 1 is a schematic overall flow chart of a pedestrian re-identification method based on a multi-scale twin cascade network provided by the invention;
FIG. 2 is an overall schematic diagram of a multi-scale twin cascaded network provided by the present invention;
FIG. 3 is a structural diagram of the first cascaded sub-network level_1 provided by the invention;
FIG. 4 is a structural diagram of the second cascaded sub-network level_2 provided by the invention;
FIG. 5 is a structural diagram of the third cascaded sub-network level_3 provided by the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Because interference such as color difference, illumination, scale and scene in the prior art easily reduces the accuracy of pedestrian re-identification, the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network with the following characteristics: 1) a multi-scale twin cascade network with a dual color and gray input structure is constructed, the multi-scale color cascade features and the multi-scale gray cascade features are fused, and a feature dimension reduction strategy is then applied, yielding a more global, high-level and accurate pedestrian feature expression; 2) a multi-scale cascade network is used: at each scale, the input is fused with the sub-feature maps of the corresponding upper-level sub-network and fed into the lower-level sub-network for pedestrian feature extraction, and the pedestrian features of all sub-networks are fused, yielding a more macroscopic and accurate pedestrian feature expression. The method can therefore obtain a more global, high-level and accurate pedestrian feature expression, avoid interference from color difference, illumination, scale, scene and the like, and improve the accuracy of pedestrian re-identification.
The invention provides a pedestrian re-identification method based on a multi-scale twin cascade network which, with reference to FIG. 1, comprises the following steps:
dividing the data set into a training set TrainSet and a verification set;
the training set TrainSet is used for training the multi-scale twin cascade network; the verification set is used for verifying the accuracy of the multi-scale twin cascaded network.
the network structures of the multi-scale twin cascade color network Network_1 and the multi-scale twin cascade gray network Network_2 are completely the same;
in the invention, each color picture sample in the data set has a corresponding gray picture sample; the color picture sample is input into the multi-scale twin cascade color network Network_1, and the gray picture sample is input into the multi-scale twin cascade gray network Network_2. Providing a multi-scale twin cascade gray network Network_2 with the same structure as the multi-scale twin cascade color network Network_1 compensates for the effects of color difference, illumination, scene, posture and the like caused by crossing cameras, and improves the accuracy of pedestrian feature extraction.
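The pairing of a color sample with a gray sample can be illustrated with the sketch below. The text does not say how the gray picture is produced; the ITU-R BT.601 luma weights used here are an assumption, and the tiny 2 × 2 picture is hypothetical.

```python
def to_gray(rgb_pixels):
    """Convert rows of (R, G, B) pixels to gray levels using BT.601 luma weights (assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_pixels]

# Hypothetical 2 x 2 color picture sample rgb_tu: white, black, red, blue.
rgb_tu = [[(255, 255, 255), (0, 0, 0)],
          [(255, 0, 0), (0, 0, 255)]]
gray_tu = to_gray(rgb_tu)
print(gray_tu)  # [[255, 0], [76, 29]]
```

Because the gray picture discards hue, two pictures of the same pedestrian under different color casts end up closer in the gray branch than in the color branch.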
the multi-scale twin cascade color network Network_1 comprises a first cascade color sub-network level_1s, a second cascade color sub-network level_2s and a third cascade color sub-network level_3s;
the multi-scale twin cascade gray network Network_2 comprises a first cascade gray sub-network level_1g, a second cascade gray sub-network level_2g and a third cascade gray sub-network level_3g;
in the invention, the three cascaded sub-networks must satisfy the following requirements: their backbone networks differ, and the structure may become simpler from one level to the next. A backbone may be a simple convolutional network, a residual network or a combination of several networks, but the output scale of a sub-feature map from an upper-level network must match the input scale of the lower-level network; here scale refers only to height and width.
Training the multi-scale twin cascade network by adopting the following mode to obtain the trained multi-scale twin cascade network:
step 2.1, taking 3 sample groups as one batch; the 3 sample groups of each batch are denoted sample group u_1, sample group u_2 and sample group u_3, where sample group u_1 is the fixed (anchor) sample; sample group u_2 corresponds to the same pedestrian as sample group u_1 and is the positive sample of u_1; sample group u_3 corresponds to a different pedestrian from sample group u_1 and is the negative sample of u_1;
inputting the batch of sample groups into the multi-scale twin cascade network;
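The batch construction of step 2.1 can be sketched as below. The sampling strategy (uniform random choice) and the toy training set are assumptions for illustration; the text only requires that u_2 share the pedestrian of u_1 and that u_3 come from a different pedestrian.

```python
import random

def sample_triplet(groups_by_pedestrian, rng):
    """Pick (u_1, u_2, u_3): fixed sample, positive (same pedestrian), negative (different pedestrian)."""
    # Only pedestrians with at least two sample groups can supply both u_1 and u_2.
    eligible = [pid for pid, groups in groups_by_pedestrian.items() if len(groups) >= 2]
    anchor_id = rng.choice(eligible)
    u1, u2 = rng.sample(groups_by_pedestrian[anchor_id], 2)
    negative_id = rng.choice([pid for pid in groups_by_pedestrian if pid != anchor_id])
    u3 = rng.choice(groups_by_pedestrian[negative_id])
    return u1, u2, u3

# Hypothetical training set: pedestrian ID -> names of that pedestrian's sample groups.
train_set = {
    "p1": ["p1_cam1", "p1_cam2"],
    "p2": ["p2_cam1", "p2_cam3"],
    "p3": ["p3_cam2"],
}
u1, u2, u3 = sample_triplet(train_set, random.Random(0))
```

Each sample group would in turn carry a color picture and its gray counterpart, as described in step 2.2.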
step 2.2, for each sample group, the color picture sample is denoted rgb_tu and the gray picture sample is denoted gray_tu;
referring to FIG. 2, the color picture sample rgb_tu is input into the multi-scale twin cascade color network Network_1 to obtain the first cascade color pedestrian classification result class_1s output by the first cascade color sub-network level_1s, the second cascade color pedestrian classification result class_2s output by the second cascade color sub-network level_2s, the third cascade color pedestrian classification result class_3s output by the third cascade color sub-network level_3s, and the color pedestrian fusion feature map rgb_features output by Network_1;
the gray picture sample gray_tu is input into the multi-scale twin cascade gray network Network_2 to obtain the first cascade gray pedestrian classification result class_1g output by the first cascade gray sub-network level_1g, the second cascade gray pedestrian classification result class_2g output by the second cascade gray sub-network level_2g, the third cascade gray pedestrian classification result class_3g output by the third cascade gray sub-network level_3g, and the gray pedestrian fusion feature map gray_features output by Network_2;
because the color picture sample rgb_tu is processed by Network_1 in exactly the same way as the gray picture sample gray_tu is processed by Network_2, only the processing of rgb_tu by Network_1 is described in detail, in steps 2.2.1 to 2.2.5; the processing of gray_tu by Network_2 is not repeated.
The color picture sample rgb_tu is processed by the multi-scale twin cascade color network Network_1 as follows:
step 2.2.1, reducing the color picture sample rgb_tu to obtain a Scale_a picture sample; further reducing the Scale_a picture sample to obtain a Scale_b picture sample; further reducing the Scale_b picture sample to obtain a Scale_c picture sample;
thus the picture sizes of the Scale_a, Scale_b and Scale_c picture samples decrease in that order.
As a specific implementation, the Scale_a picture sample is reduced by a factor of two to obtain the Scale_b picture sample, and the Scale_b picture sample is reduced by a factor of two to obtain the Scale_c picture sample. For example, the Scale_a picture sample size is 128 × 384, the Scale_b picture sample size is 64 × 192 and the Scale_c picture sample size is 32 × 96, where the first number (e.g. 32) is the picture width and the second (e.g. 96) is the picture height.
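The example sizes above follow from repeated halving, as this short sketch shows. The function name and parameters are hypothetical; only the factor of two and the 128 × 384 starting size come from the text.

```python
def scale_pyramid(width, height, levels=3, factor=2):
    """Return the (width, height) of each scale, dividing by `factor` at every level."""
    sizes = []
    for _ in range(levels):
        sizes.append((width, height))
        width, height = width // factor, height // factor
    return sizes

# Scale_a, Scale_b and Scale_c sizes from the example in the text.
sizes = scale_pyramid(128, 384)
print(sizes)  # [(128, 384), (64, 192), (32, 96)]
```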
step 2.2.2, inputting the Scale_a picture sample into the first cascade color sub-network level_1s, which, referring to FIG. 3, processes it as follows:
A1) performing convolution, batch normalization and activation on the Scale_a picture sample to obtain a pedestrian feature map rgb_feature_a;
A2) down-sampling the pedestrian feature map rgb_feature_a to obtain a pedestrian feature map rgb_feature1 with the same size as the Scale_b picture sample;
A3) down-sampling the pedestrian feature map rgb_feature1 to obtain a pedestrian feature map rgb_feature2 with the same size as the Scale_c picture sample;
A4) performing convolution and global average pooling on the pedestrian feature map rgb_feature2 and inputting the result into the first fully connected layer to obtain the first cascade pedestrian feature map rgb_stag1_feature;
A5) inputting the first cascade pedestrian feature map rgb_stag1_feature into the second fully connected layer to obtain the first cascade color pedestrian classification result class_1s;
For example, taking a convolutional network, the data flows in sequence through: the Scale_a picture sample input at scale 128 × 384, a convolution layer, a batch normalization layer, a ReLU activation layer, two down-sampling units downsampling_unit, convolution, average pooling, the fully connected layer fc1 and the fully connected layer fc2.
The down-sampling unit in this embodiment is implemented by a convolution with a stride of 2 (see the dashed-line frame region in FIG. 2). After down-sampling the feature map, downsampling_unit1 and downsampling_unit2 yield the pedestrian feature map rgb_feature1 and the pedestrian feature map rgb_feature2 respectively; the fully connected layer fc1 of the cascade network level_1 outputs the first cascade pedestrian feature map rgb_stag1_feature, and the fully connected layer fc2 of the cascade network level_1 outputs the first cascade color pedestrian classification result class_1s.
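To see why a stride-2 convolution halves the feature map, the standard output-size formula can be applied to the example sizes. The text fixes only the stride; the 3 × 3 kernel and padding of 1 used below are assumptions chosen so that the sizes divide exactly.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def downsample(width, height):
    """Apply one stride-2 down-sampling unit to a (width, height) feature map."""
    return conv_output_size(width), conv_output_size(height)

feature1_size = downsample(128, 384)        # same size as the Scale_b picture sample
feature2_size = downsample(*feature1_size)  # same size as the Scale_c picture sample
print(feature1_size, feature2_size)  # (64, 192) (32, 96)
```

Other kernel and padding combinations also work, provided the result matches the size of the next scale so that the later fusion is possible.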
step 2.2.3, referring to FIG. 4, the second cascade color sub-network level_2s processes its input as follows:
B1) performing convolution, batch normalization and activation on the Scale_b picture sample to obtain a pedestrian feature map rgb_feature_b;
B2) fusing the pedestrian feature map rgb_feature_b with the pedestrian feature map rgb_feature1, and then down-sampling to obtain a pedestrian feature map rgb_feature3 with the same size as the Scale_c picture sample;
B3) performing convolution and global average pooling on the pedestrian feature map rgb_feature3 and inputting the result into the first fully connected layer to obtain the second cascade pedestrian feature map rgb_stag2_feature;
B4) inputting the second cascade pedestrian feature map rgb_stag2_feature into the second fully connected layer to obtain the second cascade color pedestrian classification result class_2s;
In this embodiment, taking a convolutional network as an example, the data flows in sequence through: the Scale_b picture sample input at scale 64 × 192, a convolution layer, a batch normalization layer, a ReLU activation layer, one down-sampling unit downsampling_unit, convolution, average pooling, the fully connected layer fc1 and the fully connected layer fc2.
The difference from level_1 is that this network has a dual-input structure: the Scale_b picture sample input at scale 64 × 192 and the pedestrian feature map rgb_feature1, and only one down-sampling unit is needed. The number of convolution kernels in the first layer of the network must equal the number of channels of the pedestrian feature map rgb_feature1; the pedestrian feature map rgb_feature1 and the 64 × 192 Scale_b picture sample after the first convolution layer are then added, and the sum is input into the subsequent network structure. The downsampling_unit down-samples the feature map to obtain the pedestrian feature map rgb_feature3 required by the level_3 network; the fully connected layer fc1 of the cascade network level_2 outputs the second cascade pedestrian feature map rgb_stag2_feature, and the fully connected layer fc2 of the cascade network level_2 outputs the second cascade color pedestrian classification result class_2s.
Step 2.2.4, with reference to Fig. 5, the processing procedure of the third cascade color sub-network level_3s is:
C1) applying convolution, batch normalization and activation to the Scale_c picture sample to obtain a pedestrian feature map rgb_feature_c;
C2) fusing the pedestrian feature map rgb_feature_c, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3, applying convolution and global average pooling, and feeding the result into the first fully connected layer to obtain the third cascade pedestrian feature map rgb_stag3_feature;
C3) feeding the third cascade pedestrian feature map rgb_stag3_feature into the second fully connected layer to obtain the third cascade color pedestrian classification result class_3s;
In this embodiment, taking a convolutional network as an example, the layers in data-flow order are: Scale_c picture sample input at scale 32 x 96, convolution layer, batch normalization layer, ReLU activation layer, convolution, average pooling, fully connected layer fc1, and fully connected layer fc2.
The differences from the first two networks are: the network has a three-input structure, taking the Scale_c picture sample at scale 32 x 96, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3 as inputs. The number of convolution kernels in the first layer must equal the number of channels of the pedestrian feature maps rgb_feature2 and rgb_feature3. The 32 x 96 picture sample after the first-layer convolution, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3 are combined by an Add operation, and the sum is fed into the subsequent network structure. Finally, the fully connected layer fc1 of cascade network level_3 outputs the third cascade pedestrian feature map rgb_stag3_feature, and its fully connected layer fc2 outputs the third cascade color pedestrian classification result class_3s.
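The data flow between the three cascade levels can be sketched as follows. This is only a shape-level illustration under stated assumptions: the helper names conv_stub, downsample and cascade_forward are hypothetical, a random 1x1 channel projection stands in for the learned convolution, batch normalization and ReLU, 2x2 average pooling stands in for downsampling_unit, and the picture sizes (including the Scale_a size and the height-by-width order) are chosen only so that each scale is half of the previous one.

```python
import numpy as np

def conv_stub(image, out_channels, rng):
    """Stand-in for conv + batch norm + ReLU: a random 1x1 channel projection.

    Hypothetical helper; the real first layer is a learned convolution.
    """
    in_channels = image.shape[-1]
    weights = rng.standard_normal((in_channels, out_channels))
    return np.maximum(image @ weights, 0.0)  # ReLU; height and width are unchanged

def downsample(feature_map):
    """Stand-in for downsampling_unit: 2x2 average pooling (halves H and W)."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def cascade_forward(scale_a, scale_b, scale_c, channels=8, seed=0):
    rng = np.random.default_rng(seed)
    # level_1: single input, two down-sampling units.
    rgb_feature_a = conv_stub(scale_a, channels, rng)
    rgb_feature1 = downsample(rgb_feature_a)   # same size as Scale_b
    rgb_feature2 = downsample(rgb_feature1)    # same size as Scale_c
    # level_2: dual input fused by element-wise addition, one down-sampling unit.
    rgb_feature_b = conv_stub(scale_b, channels, rng)
    rgb_feature3 = downsample(rgb_feature_b + rgb_feature1)
    # level_3: three inputs fused by element-wise addition, no down-sampling.
    rgb_feature_c = conv_stub(scale_c, channels, rng)
    fused_c = rgb_feature_c + rgb_feature2 + rgb_feature3
    return rgb_feature1, rgb_feature2, rgb_feature3, fused_c

# Assumed sizes (height, width, channels); only the halving pattern matters here.
scale_a = np.ones((384, 128, 3))
scale_b = np.ones((192, 64, 3))
scale_c = np.ones((96, 32, 3))
f1, f2, f3, fused = cascade_forward(scale_a, scale_b, scale_c)
print(f1.shape, f2.shape, f3.shape, fused.shape)
```

The element-wise Add only works because every first-layer convolution produces the same number of channels, which is the constraint stated in the text above.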
Step 2.2.5, fusing the first cascade pedestrian feature map rgb_stag1_feature, the second cascade pedestrian feature map rgb_stag2_feature and the third cascade pedestrian feature map rgb_stag3_feature to obtain the color pedestrian fusion feature map rgb_features;
Step 2.3, for each sample group, fusing the color pedestrian fusion feature map rgb_features and the gray pedestrian fusion feature map gray_features through the fusion layer, and then reducing the dimension through the PCA dimension reduction layer to obtain the final global pedestrian feature map features; the global pedestrian feature map features passes through the fully connected layer to obtain the global pedestrian classification result classifys;
In this step, the color pedestrian fusion feature map rgb_features and the gray pedestrian fusion feature map gray_features are channel-fused, so that the multi-scale features of the color and gray images are combined to obtain richer pedestrian information. A PCA layer then successively performs mean-centering, covariance computation and eigenvalue decomposition on the fused features, and the final effective feature dimension is selected according to the eigenvalue decomposition result to obtain the final global pedestrian feature map features.
For example, the color pedestrian fusion feature map rgb_features and the gray pedestrian fusion feature map gray_features of the twin cascade network are channel-fused to obtain $D = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})$, where $m$ is 1024 dimensions in this embodiment and each $x^{(i)}$ is a column vector whose length is the batch size. For PCA dimension reduction, $D$ is first mean-centered, see formula, and the result is denoted $X = (x^{(1)\prime}, x^{(2)\prime}, \ldots, x^{(m)\prime})$. Then the covariance matrix $V = XX^{T}$ is computed, and finally $V$ is decomposed as $V = U \Sigma U^{T}$. The purpose of the matrix decomposition is to split the fused matrix $V$ into eigenvalues and eigenvectors; the magnitude of each eigenvalue is used to judge the importance of its eigenvector. The $m$ eigenvalues of $V$ are $\Sigma = (\lambda_1, \lambda_2, \ldots, \lambda_m)$, with corresponding eigenvectors $U = \{w_1, w_2, \ldots, w_m\}$. Finally, according to the preset feature dimension $k$, the eigenvectors $\{w_1, w_2, \ldots, w_k\}$ corresponding to the first $k$ eigenvalues are selected to form the final feature vector: the global pedestrian feature map features.
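The channel fusion and PCA steps described above can be sketched in NumPy as follows. This is a minimal sketch, not the patented layer: the function name pca_reduce is hypothetical, the feature sizes are toy values rather than 1024, np.linalg.eigh is used for the decomposition because V is symmetric, and X is stored with one row per feature dimension so that V = XX^T is an m x m matrix, which is an assumption about the orientation.

```python
import numpy as np

def pca_reduce(fused, k):
    """Reduce `fused` (batch x m) to (batch x k) by eigendecomposition.

    X is stored as m x batch so that V = X X^T is m x m; this orientation
    is an assumption made for the sketch.
    """
    X = (fused - fused.mean(axis=0, keepdims=True)).T   # mean-centering, m x batch
    V = X @ X.T                                          # unnormalized covariance, m x m
    eigenvalues, U = np.linalg.eigh(V)                   # eigh returns ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]                # largest eigenvalue first
    W_k = U[:, order[:k]]                                # first k eigenvectors, m x k
    return (W_k.T @ X).T, eigenvalues[order]             # batch x k, sorted eigenvalues

rng = np.random.default_rng(0)
rgb_features = rng.standard_normal((16, 6))    # toy color fusion features
gray_features = rng.standard_normal((16, 6))   # toy gray fusion features
fused = np.concatenate([rgb_features, gray_features], axis=1)  # channel fusion, 16 x 12
features, eigenvalues = pca_reduce(fused, k=4)
print(features.shape)
```

After projection, the k retained components are mutually uncorrelated and their energies equal the k largest eigenvalues, which is why the eigenvalue magnitude can be used to choose k.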
Step 2.4, the batch contains 3 sample groups in total; for any sample group $u_i$, $i = 1, 2, 3$, the following are obtained for that sample group: the global pedestrian feature map, the global pedestrian classification result, the first cascade color pedestrian classification result, the second cascade color pedestrian classification result, the third cascade color pedestrian classification result, the first cascade gray pedestrian classification result, the second cascade gray pedestrian classification result, and the third cascade gray pedestrian classification result.
Step 2.5, calculating the loss values of the sub-networks at each level:
Step 2.5.1, comparing the first cascade color pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the first cascade color pedestrian classification loss value;
comparing the second cascade color pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the second cascade color pedestrian classification loss value;
comparing the third cascade color pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the third cascade color pedestrian classification loss value;
comparing the first cascade gray pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the first cascade gray pedestrian classification loss value;
comparing the second cascade gray pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the second cascade gray pedestrian classification loss value;
comparing the third cascade gray pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the third cascade gray pedestrian classification loss value;
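One common way to compare a classification result with a sample label is softmax cross-entropy. The sketch below assumes that choice, which the text above does not confirm; the function name cross_entropy, the scores and the labels are all hypothetical illustration values for a single sub-network output.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy between one classification result and its label."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]  # equals -log(softmax(logits)[label])

# Hypothetical scores of one sub-network for sample groups u_1, u_2, u_3
# over 4 pedestrian identities, with the identity label of each group.
class_1s = {
    "u1": [2.0, 0.5, 0.1, -1.0],
    "u2": [1.5, 0.2, 0.0, 0.3],
    "u3": [0.1, 0.2, 3.0, 0.0],
}
labels = {"u1": 0, "u2": 0, "u3": 2}

# One classification loss value per sample group.
loss_class_1s = {name: cross_entropy(scores, labels[name]) for name, scores in class_1s.items()}
print(loss_class_1s)
```

The same comparison would be repeated for each of the six cascade sub-network outputs.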
Step 2.5.2, calculating the loss value Loss_1s of the first cascade color sub-network level_1s, the loss value Loss_2s of the second cascade color sub-network level_2s, the loss value Loss_3s of the third cascade color sub-network level_3s, the loss value Loss_1g of the first cascade gray sub-network level_1g, the loss value Loss_2g of the second cascade gray sub-network level_2g, and the loss value Loss_3g of the third cascade gray sub-network level_3g, respectively, by adopting the following formula:
Step 2.6, calculating the loss value Loss_0 of the multi-scale twin cascade network:
Step 2.6.1, comparing the global pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the global pedestrian classification loss value;
Step 2.6.2, calculating the loss value Loss_0 of the multi-scale twin cascade network by adopting the following formula:
Step 2.7, calculating the similarity loss function value Loss_sim between the sample groups:
Step 2.7.1, calculating the sample distance between the global pedestrian feature map of sample group $u_1$ and the global pedestrian feature map of sample group $u_2$, expressed as $d(u_1, u_2)$;
calculating the sample distance between the global pedestrian feature map of sample group $u_1$ and the global pedestrian feature map of sample group $u_3$, expressed as $d(u_1, u_3)$;
Step 2.7.2, calculating the preliminary loss function value Loss_d by adopting the following formula:
$\mathrm{Loss\_d} = d(u_1, u_2) - d(u_1, u_3) + \alpha$
wherein $\alpha$ is the loss function coefficient, with the value range $\alpha < d(u_1, u_3) - d(u_1, u_2)$;
Step 2.7.3, obtaining the similarity loss function value Loss_sim as follows:
if Loss_d is greater than 0, then Loss_sim = Loss_d;
if Loss_d is less than or equal to 0, then Loss_sim = 0;
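Steps 2.7.1 to 2.7.3 amount to a hinge on the difference between the positive and negative distances. The sketch below assumes the Euclidean distance for d, which the text does not specify; the function names euclidean and similarity_loss and the feature vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Assumed sample distance d: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_loss(feat_u1, feat_u2, feat_u3, alpha):
    """Loss_sim = Loss_d if Loss_d > 0, else 0, with Loss_d = d(u1,u2) - d(u1,u3) + alpha."""
    loss_d = euclidean(feat_u1, feat_u2) - euclidean(feat_u1, feat_u3) + alpha
    return loss_d if loss_d > 0 else 0.0

anchor = [0.0, 0.0]
near = [3.0, 4.0]   # distance 5 from the anchor
far = [6.0, 8.0]    # distance 10 from the anchor

# Positive sample closer than the negative one: Loss_d = 5 - 10 + 2 = -3, so the loss is 0.
print(similarity_loss(anchor, near, far, alpha=2.0))
# Positive sample farther than the negative one: Loss_d = 10 - 5 + 2 = 7.
print(similarity_loss(anchor, far, near, alpha=2.0))
```

The loss is therefore zero whenever the positive sample group is already closer to the fixed sample than the negative one by more than alpha.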
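Step 2.8, obtaining the final loss function value Loss_final by adopting the following formula:
$\mathrm{Loss\_final} = \lambda_1 \mathrm{Loss\_1s} + \lambda_1 \mathrm{Loss\_2s} + \lambda_1 \mathrm{Loss\_3s} + \lambda_1 \mathrm{Loss\_1g} + \lambda_1 \mathrm{Loss\_2g} + \lambda_1 \mathrm{Loss\_3g} + \lambda_2 \mathrm{Loss\_0} + \lambda_3 \mathrm{Loss\_sim}$
wherein:
$\lambda_1$ is the weight coefficient of each cascade sub-network;
$\lambda_2$ is the weight coefficient of the loss of the multi-scale twin cascade network;
$\lambda_3$ is the weight coefficient of the similarity loss function value;
As a specific implementation, $\lambda_1$ is 1, $\lambda_2$ is 6, and $\lambda_3$ is 7.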
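The weighted sum can be checked with a small worked example. The weights 1, 6 and 7 come from the embodiment above; the function name final_loss and the individual loss values are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def final_loss(sub_losses, loss_0, loss_sim, lam1=1.0, lam2=6.0, lam3=7.0):
    """Loss_final = lam1 * (sum of six sub-network losses) + lam2 * Loss_0 + lam3 * Loss_sim."""
    return lam1 * sum(sub_losses) + lam2 * loss_0 + lam3 * loss_sim

# Hypothetical values for Loss_1s, Loss_2s, Loss_3s, Loss_1g, Loss_2g, Loss_3g.
sub_losses = [0.5, 0.25, 0.25, 0.5, 0.75, 0.25]   # sums to 2.5
loss_final = final_loss(sub_losses, loss_0=0.25, loss_sim=0.5)
print(loss_final)  # 1 * 2.5 + 6 * 0.25 + 7 * 0.5 = 7.5
```

With these weights, the global classification loss and the similarity loss dominate the six per-level classification losses.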
Step 2.9, judging whether the final loss function value Loss_final has converged; if it has, the trained multi-scale twin cascade network is obtained and step 3 is executed; if not, the network parameters of the multi-scale twin cascade network are adjusted, another batch of sample groups is taken as input, and the process returns to step 2.1 to train the multi-scale twin cascade network iteratively;
Step 4, performing feature recognition on the input pedestrian picture using the multi-scale twin cascade network to obtain a pedestrian feature recognition result.
Step 4 specifically comprises the following steps:
Step 4.1, the input pedestrian picture is picture Q; a pedestrian sample library G is established in advance;
Step 4.2, inputting picture Q into the multi-scale twin cascade network to obtain the global pedestrian feature map features[Q];
for each pedestrian sample picture $G_j$ in the pedestrian sample library G, $j = 1, 2, \ldots, z$, where z is the number of pedestrian sample pictures in the pedestrian sample library G, inputting the picture into the multi-scale twin cascade network to obtain the corresponding global pedestrian feature map;
Step 4.3, calculating the similarity between the global pedestrian feature map features[Q] and the global pedestrian feature map of each pedestrian sample picture; sorting the similarities from largest to smallest, and outputting the pedestrian sample picture in the pedestrian sample library G with the highest similarity to picture Q.
As a specific implementation, the single pedestrian sample picture in the pedestrian sample library G with the highest similarity to picture Q may be output. Alternatively, a parameter K for the number of best matches may be preset; for example, if K is set to 10, the pedestrian sample pictures in the pedestrian sample library G are sorted by similarity from largest to smallest, and the 1st to 10th pictures are output in that order.
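The ranking in step 4.3 can be sketched as follows. The similarity measure is not specified above, so cosine similarity is assumed here; the function names cosine_similarity and rank_gallery and the feature vectors are hypothetical, and indices into the sample library are 0-based.

```python
import math

def cosine_similarity(a, b):
    """Assumed similarity measure between two global pedestrian feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_gallery(query_feature, gallery_features, top_k=1):
    """Return the indices of the top_k gallery features, most similar first."""
    scores = [(cosine_similarity(query_feature, g), j) for j, g in enumerate(gallery_features)]
    scores.sort(key=lambda item: item[0], reverse=True)  # sort from largest to smallest
    return [j for _, j in scores[:top_k]]

features_q = [1.0, 0.0, 1.0]            # feature of picture Q
gallery = [
    [0.0, 1.0, 0.0],                    # orthogonal to Q
    [1.0, 0.1, 0.9],                    # nearly parallel to Q
    [1.0, 1.0, 1.0],                    # partially aligned with Q
    [-1.0, 0.0, -1.0],                  # opposite to Q
]
print(rank_gallery(features_q, gallery, top_k=3))  # [1, 2, 0]
```

Setting top_k to 1 returns only the best match, and setting it to K returns the K best matches in ranked order.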
The technical essentials of this patent are: 1. establishing a multi-scale cascade network, in which the picture sample at each scale and the color or gray sub-features of the preceding sub-network are fused and fed into the next sub-network for pedestrian feature extraction, and the pedestrian features of the sub-networks are then fused; 2. using a twin network with dual color and gray-level inputs, fusing the color multi-scale and gray-level multi-scale pedestrian features, and then applying a pedestrian feature dimension reduction strategy, thereby obtaining a stronger, more comprehensive and more effective pedestrian feature representation.
Compared with the prior art, the invention has the following beneficial effects:
Compared with a single input, adding the gray-level input can balance and supplement the pedestrian feature information affected by chromatic aberration, illumination, scene and other factors caused by camera settings, thereby extracting more comprehensive pedestrian feature information.
To reduce interference from redundant information, the final pedestrian features undergo dimension reduction, so that the resulting pedestrian feature representation is comprehensive without being overly complex.
Unlike the prior art, which extracts pedestrian features at a single scale only, the method extracts pedestrian features from the image at each scale and performs channel fusion, thereby obtaining pedestrian features with both spatial information and strong semantic information.
Also unlike the prior art, which extracts pedestrian features with a single network structure, the method builds a multi-scale twin cascade network from several sub-networks and uses each level of the network to extract several sub-features, so that the pedestrian feature extraction at each level is mutually independent; the output of each level is combined with the original image at a different scale as the input of the next level, and finally the pedestrian features of all cascades are fused so that the different cascade networks complement one another, thereby mining more salient pedestrian features and strengthening the expressiveness of the pedestrian features.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.
Claims (3)
1. A pedestrian re-identification method based on a multi-scale twin cascade network is characterized by comprising the following steps:
step 1, constructing a data set; the data set comprises a plurality of sample groups; each sample group comprises two image samples which are respectively a color image sample and a gray image sample; the gray image sample is an image sample obtained after graying the color image sample;
dividing the data set into a training set TrainSet and a verification set;
step 2, constructing a multi-scale twin cascade network; the multi-scale twin cascade network comprises a multi-scale twin cascade color Network_1, a multi-scale twin cascade gray Network_2, a fusion layer and a PCA dimension reduction layer;
the network structures of the multi-scale twin cascade color Network_1 and the multi-scale twin cascade gray Network_2 are completely the same;
the multi-scale twin cascade color Network_1 comprises a first cascade color sub-network level_1s, a second cascade color sub-network level_2s and a third cascade color sub-network level_3s;
the multi-scale twin cascade gray Network_2 comprises a first cascade gray sub-network level_1g, a second cascade gray sub-network level_2g and a third cascade gray sub-network level_3g;
training the multi-scale twin cascade network by adopting the following mode to obtain the trained multi-scale twin cascade network:
step 2.1, taking 3 sample groups as one batch of sample groups; the 3 sample groups of each batch are denoted: sample group $u_1$, sample group $u_2$ and sample group $u_3$; wherein sample group $u_1$ is the fixed sample; sample group $u_2$ and sample group $u_1$ correspond to the same pedestrian, and sample group $u_2$ is a positive sample of sample group $u_1$; sample group $u_3$ and sample group $u_1$ correspond to different pedestrians, and sample group $u_3$ is a negative sample of sample group $u_1$;
inputting one batch of sample groups into the multi-scale twin cascade network;
step 2.2, for each sample group, its color picture sample is denoted rgb_tu and its gray picture sample is denoted gray_tu;
inputting the color picture sample rgb_tu into the multi-scale twin cascade color Network_1 to obtain the first cascade color pedestrian classification result class_1s output by the first cascade color sub-network level_1s, the second cascade color pedestrian classification result class_2s output by the second cascade color sub-network level_2s, the third cascade color pedestrian classification result class_3s output by the third cascade color sub-network level_3s, and the color pedestrian fusion feature map rgb_features output by the multi-scale twin cascade color Network_1;
inputting the gray picture sample gray_tu into the multi-scale twin cascade gray Network_2 to obtain the first cascade gray pedestrian classification result class_1g output by the first cascade gray sub-network level_1g, the second cascade gray pedestrian classification result class_2g output by the second cascade gray sub-network level_2g, the third cascade gray pedestrian classification result class_3g output by the third cascade gray sub-network level_3g, and the gray pedestrian fusion feature map gray_features output by the multi-scale twin cascade gray Network_2;
wherein the color picture sample rgb_tu is input into the multi-scale twin cascade color Network_1, and the specific process is as follows:
step 2.2.1, reducing the color picture sample rgb_tu to obtain a Scale_a picture sample; further reducing the Scale_a picture sample to obtain a Scale_b picture sample; further reducing the Scale_b picture sample to obtain a Scale_c picture sample;
step 2.2.2, inputting the Scale_a picture sample into the first cascade color sub-network level_1s, wherein the processing procedure of the first cascade color sub-network level_1s is as follows:
A1) applying convolution, batch normalization and activation to the Scale_a picture sample to obtain a pedestrian feature map rgb_feature_a;
A2) down-sampling the pedestrian feature map rgb_feature_a to obtain a pedestrian feature map rgb_feature1 with the same size as the Scale_b picture sample;
A3) down-sampling the pedestrian feature map rgb_feature1 to obtain a pedestrian feature map rgb_feature2 with the same size as the Scale_c picture sample;
A4) applying convolution and global average pooling to the pedestrian feature map rgb_feature2 and feeding the result into the first fully connected layer to obtain the first cascade pedestrian feature map rgb_stag1_feature;
A5) feeding the first cascade pedestrian feature map rgb_stag1_feature into the second fully connected layer to obtain the first cascade color pedestrian classification result class_1s;
step 2.2.3, the processing procedure of the second cascade color sub-network level_2s is as follows:
B1) applying convolution, batch normalization and activation to the Scale_b picture sample to obtain a pedestrian feature map rgb_feature_b;
B2) fusing the pedestrian feature map rgb_feature_b with the pedestrian feature map rgb_feature1, and then down-sampling to obtain a pedestrian feature map rgb_feature3 with the same size as the Scale_c picture sample;
B3) applying convolution and global average pooling to the pedestrian feature map rgb_feature3 and feeding the result into the first fully connected layer to obtain the second cascade pedestrian feature map rgb_stag2_feature;
B4) feeding the second cascade pedestrian feature map rgb_stag2_feature into the second fully connected layer to obtain the second cascade color pedestrian classification result class_2s;
step 2.2.4, the processing procedure of the third cascade color sub-network level_3s is as follows:
C1) applying convolution, batch normalization and activation to the Scale_c picture sample to obtain a pedestrian feature map rgb_feature_c;
C2) fusing the pedestrian feature map rgb_feature_c, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3, applying convolution and global average pooling, and feeding the result into the first fully connected layer to obtain the third cascade pedestrian feature map rgb_stag3_feature;
C3) feeding the third cascade pedestrian feature map rgb_stag3_feature into the second fully connected layer to obtain the third cascade color pedestrian classification result class_3s;
step 2.2.5, fusing the first cascade pedestrian feature map rgb_stag1_feature, the second cascade pedestrian feature map rgb_stag2_feature and the third cascade pedestrian feature map rgb_stag3_feature to obtain the color pedestrian fusion feature map rgb_features;
step 2.3, for each sample group, fusing the color pedestrian fusion feature map rgb_features and the gray pedestrian fusion feature map gray_features through the fusion layer, and then reducing the dimension through the PCA dimension reduction layer to obtain the final global pedestrian feature map features; the global pedestrian feature map features passes through the fully connected layer to obtain the global pedestrian classification result classifys;
step 2.4, the batch contains 3 sample groups in total; for any sample group $u_i$, $i = 1, 2, 3$, the following are obtained for that sample group: the global pedestrian feature map, the global pedestrian classification result, the first cascade color pedestrian classification result, the second cascade color pedestrian classification result, the third cascade color pedestrian classification result, the first cascade gray pedestrian classification result, the second cascade gray pedestrian classification result, and the third cascade gray pedestrian classification result;
step 2.5, calculating the loss values of the sub-networks at each level:
step 2.5.1, comparing the first cascade color pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the first cascade color pedestrian classification loss value;
comparing the second cascade color pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the second cascade color pedestrian classification loss value;
comparing the third cascade color pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the third cascade color pedestrian classification loss value;
comparing the first cascade gray pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the first cascade gray pedestrian classification loss value;
comparing the second cascade gray pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the second cascade gray pedestrian classification loss value;
comparing the third cascade gray pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the third cascade gray pedestrian classification loss value;
step 2.5.2, calculating the loss value Loss_1s of the first cascade color sub-network level_1s, the loss value Loss_2s of the second cascade color sub-network level_2s, the loss value Loss_3s of the third cascade color sub-network level_3s, the loss value Loss_1g of the first cascade gray sub-network level_1g, the loss value Loss_2g of the second cascade gray sub-network level_2g, and the loss value Loss_3g of the third cascade gray sub-network level_3g, respectively, by adopting the following formula:
step 2.6, calculating the loss value Loss_0 of the multi-scale twin cascade network:
step 2.6.1, comparing the global pedestrian classification result of sample group $u_i$ with the sample label of sample group $u_i$ to obtain the global pedestrian classification loss value;
step 2.6.2, calculating the loss value Loss_0 of the multi-scale twin cascade network by adopting the following formula:
step 2.7, calculating the similarity loss function value Loss_sim between the sample groups:
step 2.7.1, calculating the sample distance between the global pedestrian feature map of sample group $u_1$ and the global pedestrian feature map of sample group $u_2$, expressed as $d(u_1, u_2)$;
calculating the sample distance between the global pedestrian feature map of sample group $u_1$ and the global pedestrian feature map of sample group $u_3$, expressed as $d(u_1, u_3)$;
step 2.7.2, calculating the preliminary loss function value Loss_d by adopting the following formula:
$\mathrm{Loss\_d} = d(u_1, u_2) - d(u_1, u_3) + \alpha$
wherein $\alpha$ is the loss function coefficient, with the value range $\alpha < d(u_1, u_3) - d(u_1, u_2)$;
step 2.7.3, obtaining the similarity loss function value Loss_sim as follows:
if Loss_d is greater than 0, then Loss_sim = Loss_d;
if Loss_d is less than or equal to 0, then Loss_sim = 0;
step 2.8, obtaining the final loss function value Loss_final by adopting the following formula:
$\mathrm{Loss\_final} = \lambda_1 \mathrm{Loss\_1s} + \lambda_1 \mathrm{Loss\_2s} + \lambda_1 \mathrm{Loss\_3s} + \lambda_1 \mathrm{Loss\_1g} + \lambda_1 \mathrm{Loss\_2g} + \lambda_1 \mathrm{Loss\_3g} + \lambda_2 \mathrm{Loss\_0} + \lambda_3 \mathrm{Loss\_sim}$
wherein:
$\lambda_1$ is the weight coefficient of each cascade sub-network;
$\lambda_2$ is the weight coefficient of the loss of the multi-scale twin cascade network;
$\lambda_3$ is the weight coefficient of the similarity loss function value;
step 2.9, judging whether the final loss function value Loss_final has converged; if it has, the trained multi-scale twin cascade network is obtained and step 3 is executed; if not, the network parameters of the multi-scale twin cascade network are adjusted, another batch of sample groups is taken as input, and the process returns to step 2.1 to train the multi-scale twin cascade network iteratively;
step 3, performing precision verification test on the trained multi-scale twin cascade network by using a verification set, and if the test precision meets the requirement, obtaining a multi-scale twin cascade network which passes the verification;
step 4, performing feature recognition on the input pedestrian picture using the multi-scale twin cascade network to obtain a pedestrian feature recognition result.
2. The pedestrian re-identification method based on the multi-scale twin cascade network as claimed in claim 1, wherein $\lambda_1$ is 1, $\lambda_2$ is 6, and $\lambda_3$ is 7.
3. The pedestrian re-identification method based on the multi-scale twin cascade network as claimed in claim 1, wherein the step 4 specifically comprises:
step 4.1, the input pedestrian picture is picture Q; a pedestrian sample library G is established in advance;
step 4.2, inputting picture Q into the multi-scale twin cascade network to obtain the global pedestrian feature map features[Q];
for each pedestrian sample picture $G_j$ in the pedestrian sample library G, $j = 1, 2, \ldots, z$, where z is the number of pedestrian sample pictures in the pedestrian sample library G, inputting the picture into the multi-scale twin cascade network to obtain the corresponding global pedestrian feature map;
step 4.3, calculating the similarity between the global pedestrian feature map features[Q] and the global pedestrian feature map of each pedestrian sample picture; sorting the similarities from largest to smallest, and outputting the pedestrian sample picture in the pedestrian sample library G with the highest similarity to picture Q.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111355189.2A CN113963150B (en) | 2021-11-16 | 2021-11-16 | Pedestrian re-identification method based on multi-scale twin cascade network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113963150A true CN113963150A (en) | 2022-01-21 |
CN113963150B CN113963150B (en) | 2022-04-08 |
Family
ID=79470827
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111355189.2A Active CN113963150B (en) | 2021-11-16 | 2021-11-16 | Pedestrian re-identification method based on multi-scale twin cascade network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113963150B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109446898A (en) * | 2018-09-20 | 2019-03-08 | 暨南大学 | A kind of recognition methods again of the pedestrian based on transfer learning and Fusion Features |
CN110084215A (en) * | 2019-05-05 | 2019-08-02 | 上海海事大学 | A kind of pedestrian of the twin network model of binaryzation triple recognition methods and system again |
CN110909605A (en) * | 2019-10-24 | 2020-03-24 | 西北工业大学 | Cross-modal pedestrian re-identification method based on contrast correlation |
CN111259850A (en) * | 2020-01-23 | 2020-06-09 | 同济大学 | Pedestrian re-identification method integrating random batch mask and multi-scale representation learning |
US20200226421A1 (en) * | 2019-01-15 | 2020-07-16 | Naver Corporation | Training and using a convolutional neural network for person re-identification |
CN111709317A (en) * | 2020-05-28 | 2020-09-25 | 西安理工大学 | Pedestrian re-identification method based on multi-scale features under saliency model |
CN111931637A (en) * | 2020-08-07 | 2020-11-13 | 华南理工大学 | Cross-modal pedestrian re-identification method and system based on double-current convolutional neural network |
CN112883850A (en) * | 2021-02-03 | 2021-06-01 | 湖北工业大学 | Multi-view aerospace remote sensing image matching method based on convolutional neural network |
CN112906605A (en) * | 2021-03-05 | 2021-06-04 | 南京航空航天大学 | Cross-modal pedestrian re-identification method with high accuracy |
CN112926531A (en) * | 2021-04-01 | 2021-06-08 | 深圳市优必选科技股份有限公司 | Feature information extraction method, model training method and device and electronic equipment |
CN112949608A (en) * | 2021-04-15 | 2021-06-11 | 南京邮电大学 | Pedestrian re-identification method based on twin semantic self-encoder and branch fusion |
-
2021
- 2021-11-16 CN CN202111355189.2A patent/CN113963150B/en active Active
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109446898A (en) * | 2018-09-20 | 2019-03-08 | 暨南大学 | A kind of recognition methods again of the pedestrian based on transfer learning and Fusion Features |
US20200226421A1 (en) * | 2019-01-15 | 2020-07-16 | Naver Corporation | Training and using a convolutional neural network for person re-identification |
CN110084215A (en) * | 2019-05-05 | 2019-08-02 | 上海海事大学 | A kind of pedestrian of the twin network model of binaryzation triple recognition methods and system again |
CN110909605A (en) * | 2019-10-24 | 2020-03-24 | 西北工业大学 | Cross-modal pedestrian re-identification method based on contrast correlation |
CN111259850A (en) * | 2020-01-23 | 2020-06-09 | 同济大学 | Pedestrian re-identification method integrating random batch mask and multi-scale representation learning |
CN111709317A (en) * | 2020-05-28 | 2020-09-25 | 西安理工大学 | Pedestrian re-identification method based on multi-scale features under saliency model |
CN111931637A (en) * | 2020-08-07 | 2020-11-13 | 华南理工大学 | Cross-modal pedestrian re-identification method and system based on double-current convolutional neural network |
CN112883850A (en) * | 2021-02-03 | 2021-06-01 | 湖北工业大学 | Multi-view aerospace remote sensing image matching method based on convolutional neural network |
CN112906605A (en) * | 2021-03-05 | 2021-06-04 | 南京航空航天大学 | Cross-modal pedestrian re-identification method with high accuracy |
CN112926531A (en) * | 2021-04-01 | 2021-06-08 | 深圳市优必选科技股份有限公司 | Feature information extraction method, model training method and device and electronic equipment |
CN112949608A (en) * | 2021-04-15 | 2021-06-11 | 南京邮电大学 | Pedestrian re-identification method based on twin semantic self-encoder and branch fusion |
Non-Patent Citations (8)
Title |
---|
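JIANGUO JIANG et al.: "A Cross-Modal Multi-granularity Attention Network for RGB-IR Person Re-identification", 《NEUROCOMPUTING》 * |
MANG YE et al.: "Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking", 《PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-18)》 * |
YUNZHOU ZHANG et al.: "SCALE-INVARIANT SIAMESE NETWORK FOR PERSON RE-IDENTIFICATION", 《2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 * |
YAN CHENCHEN: "Research on vehicle re-identification methods based on multi-scale joint learning", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Engineering Science and Technology II》 * |
ZHANG SHAN et al.: "Research on a cascaded multi-scale pedestrian detection algorithm", 《Transducer and Microsystem Technologies》 * |
JIAO LONG: "Design and implementation of pedestrian re-identification for security surveillance", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology》 * |
TONG JINGRAN et al.: "Multimodal pedestrian detection algorithm based on feature pyramid fusion", 《Computer Engineering and Applications》 * |
FAN XING: "Research on pedestrian re-identification methods in intelligent video surveillance", 《China Excellent Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology》 * |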
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117132937A (en) * | 2023-09-06 | 2023-11-28 | Foshan Graduate School of Innovation, Northeastern University | Dual-channel pedestrian re-identification method based on attention twin network |
Also Published As
Publication number | Publication date |
---|---|
CN113963150B (en) | 2022-04-08 |
Similar Documents
Publication | Title |
---|---|
CN109902732B (en) | Automatic vehicle classification method and related device |
CN109472298B (en) | Deep bidirectional feature pyramid enhanced network for small-scale target detection |
CN107766850B (en) | Face recognition method based on combination of face attribute information |
CN115937655B (en) | Multi-order feature interaction target detection model, construction method, device and application thereof |
CN112348036A (en) | Adaptive target detection method based on lightweight residual learning and deconvolution cascade |
CN111160249A (en) | Multi-class target detection method for optical remote sensing images based on cross-scale feature fusion |
CN109063649B (en) | Pedestrian re-identification method based on twin pedestrian alignment residual network |
CN106022300A (en) | Traffic sign recognition method and system based on cascaded deep learning |
CN110826462A (en) | Human behavior recognition method using a non-local two-stream convolutional neural network model |
CN112364721A (en) | Road surface foreign matter detection method |
EP4323915A1 (en) | License plate classification method, license plate classification apparatus, and computer-readable storage medium |
CN112036260A (en) | Expression recognition method and system for multi-scale sub-block aggregation in natural environment |
CN111860683A (en) | Target detection method based on feature fusion |
CN113052184A (en) | Target detection method based on two-stage local feature alignment |
CN117516937A (en) | Rolling bearing unknown fault detection method based on multimodal feature fusion enhancement |
CN116486251A (en) | Hyperspectral image classification method based on multimodal fusion |
CN117333908A (en) | Cross-modal pedestrian re-identification method based on pose feature alignment |
CN114331946A (en) | Image data processing method, device and medium |
CN113963150B (en) | Pedestrian re-identification method based on multi-scale twin cascade network |
CN115272242B (en) | YOLOv5-based optical remote sensing image target detection method |
CN116563410A (en) | Electrical equipment electric spark image generation method based on two-stage generative adversarial network |
CN111723852A (en) | Robust training method for target detection network |
CN111340064A (en) | Hyperspectral image classification method based on high-low order information fusion |
CN110688976A (en) | Store comparison method based on image identification |
KR20210011707A (en) | A CNN-based scene classifier with attention model for scene recognition in video |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |