CN113963150B - Pedestrian re-identification method based on multi-scale twin cascade network - Google Patents


Info

Publication number
CN113963150B
Authority
CN
China
Prior art keywords
pedestrian
cascade
network
sample
scale
Prior art date
Legal status
Active
Application number
CN202111355189.2A
Other languages
Chinese (zh)
Other versions
CN113963150A
Inventor
宋春晓
瞿洪桂
孙家乐
Current Assignee
Beijing Sinonet Science and Technology Co Ltd
Original Assignee
Beijing Sinonet Science and Technology Co Ltd
Application filed by Beijing Sinonet Science and Technology Co Ltd filed Critical Beijing Sinonet Science and Technology Co Ltd
Priority to CN202111355189.2A
Publication of CN113963150A
Application granted
Publication of CN113963150B

Classifications

    • G06F18/214 — Pattern recognition; Analysing; Design or setup of recognition systems; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/253 — Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06N3/045 — Computing arrangements based on biological models; Neural networks; Architecture; Combinations of networks
    • G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods


Abstract

The invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which comprises the following steps: constructing a multi-scale twin cascade network, which comprises a multi-scale twin cascade color network, a multi-scale twin cascade grayscale network, a fusion layer and a PCA dimension reduction layer; the multi-scale twin cascade color network and the multi-scale twin cascade grayscale network each comprise a first cascaded sub-network, a second cascaded sub-network and a third cascaded sub-network. The invention uses a multi-scale cascade network: the multi-scale inputs and the cascaded sub-feature maps of the corresponding higher-level sub-network are fused and fed into the lower-level sub-network for pedestrian feature extraction, and the pedestrian features of all sub-networks are fused, thereby obtaining a more global and accurate pedestrian feature expression. Therefore, the method can obtain a more global, higher-level and accurate pedestrian feature expression, resist interference from color difference, illumination, scale, scene and the like, and improve the accuracy of pedestrian re-identification.

Description

Pedestrian re-identification method based on multi-scale twin cascade network
Technical Field
The invention belongs to the technical field of intelligent video image processing, and particularly relates to a pedestrian re-identification method based on a multi-scale twin cascade network.
Background
With the rapid development of 5G and the Internet of Things, intelligent living has become a reality. Intelligent security is an important component of intelligent living, and as a key technology of intelligent security, the accuracy of pedestrian re-identification, which searches for pedestrians across camera devices, is crucial. The current pedestrian re-identification technology has certain limitations: for example, due to differences among camera devices, pedestrian appearance is easily affected by clothing color difference, illumination, scale, scene and the like, which degrades accuracy. These variation factors therefore hinder the popularization and application of pedestrian re-identification technology, and it is very important to extract key, effective pedestrian characteristics across different devices.
The feature expression methods in existing pedestrian re-identification approaches mainly include: 1. extracting semantic information of the image to represent pedestrian features; the pedestrian features extracted in this way depend strongly on clothing color, so pedestrians wearing clothes of the same color are difficult to distinguish; 2. extracting pedestrian features from a single-scale input; the features extracted in this way ignore detail features of the image at different granularities; 3. neural-network-based pedestrian re-identification methods that mainly use a single network to extract pedestrian features; the pedestrian feature information is limited and depends heavily on the design of the network structure.
Therefore, in view of the problems in the prior art, it is very necessary to extract more key, effective, accurate and comprehensive pedestrian features across different image capture devices.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which can effectively solve the problems.
The technical scheme adopted by the invention is as follows:
the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which comprises the following steps of:
step 1, constructing a data set; the data set comprises a plurality of sample groups; each sample group comprises two image samples which are respectively a color image sample and a gray image sample; the gray image sample is an image sample obtained after graying the color image sample;
dividing the data set into a training set TrainSet and a verification set;
step 2, constructing a multi-scale twin cascade network; the multi-scale twin cascade Network comprises a multi-scale twin cascade color Network _1, a multi-scale twin cascade gray scale Network _2, a fusion layer and a PCA dimension reduction layer;
the Network structures of the multi-scale twin cascade color Network _1 and the multi-scale twin cascade gray scale Network _2 are completely the same;
the multi-scale twin cascade color Network _1 comprises a first cascade color sub-Network level _1s, a second cascade color sub-Network level _2s and a third cascade color sub-Network level _3 s;
the multi-scale twin cascade gray level Network _2 comprises a first cascade gray level sub-Network level _1g, a second cascade gray level sub-Network level _2g and a third cascade gray level sub-Network level _3 g;
training the multi-scale twin cascade network by adopting the following mode to obtain the trained multi-scale twin cascade network:
step 2.1, taking 3 sample groups as one batch of sample groups; the 3 sample groups of each batch are denoted as: sample group u1, sample group u2 and sample group u3; wherein the sample group u1 is the fixed (anchor) sample; sample group u2 corresponds to the same pedestrian as sample group u1, so sample group u2 is a positive sample of sample group u1; sample group u3 corresponds to a different pedestrian from sample group u1, so sample group u3 is a negative sample of sample group u1;
inputting the batch of sample groups into the multi-scale twin cascade network;
step 2.2, for each sample group, its color picture sample is denoted rgb_tu and its grayscale picture sample is denoted gray_tu;
inputting the color picture sample rgb _ tu into the multi-scale twin cascade color Network _1 to obtain a first cascade color pedestrian classification result class _1s output by the first cascade color sub-Network level _1s, a second cascade color pedestrian classification result class _2s output by the second cascade color sub-Network level _2s, a third cascade color pedestrian classification result class _3s output by the third cascade color sub-Network level _3s, and a color pedestrian fusion feature map rgb _ features output by the multi-scale twin cascade color Network _ 1;
inputting the gray picture sample gray _ tu into a multi-scale twin cascade gray Network _2 to obtain a first cascade gray pedestrian classification result class _1g output by a first cascade gray sub-Network level _1g, a second cascade gray pedestrian classification result class _2g output by a second cascade gray sub-Network level _2g, a third cascade gray pedestrian classification result class _3g output by a third cascade gray sub-Network level _3g and a gray pedestrian fusion feature map gray _ features output by a multi-scale twin cascade gray Network _ 2;
wherein, the color picture sample rgb _ tu is input into the multi-scale twin cascade color Network _1, and the specific process is as follows:
step 2.2.1, the color picture sample rgb _ tu is reduced to obtain a Scale _ a picture sample; further reducing the Scale _ a picture sample to obtain a Scale _ b picture sample; further reducing the Scale _ b picture sample to obtain a Scale _ c picture sample;
step 2.2.2, inputting the Scale _ a picture sample into a first cascade color sub-network level _1s, wherein the processing process of the first cascade color sub-network level _1s is as follows:
A1) carrying out convolution, batch normalization and activation on the Scale _ a picture samples to obtain a pedestrian feature map rgb _ feature _ a;
A2) down-sampling the pedestrian feature map rgb _ feature _ a to obtain a pedestrian feature map rgb _ feature1 with the same size as the Scale _ b picture sample;
A3) down-sampling the pedestrian feature map rgb _ feature1 to obtain a pedestrian feature map rgb _ feature2 with the same size as the Scale _ c picture sample;
A4) carrying out convolution and global average pooling operation on the pedestrian feature map rgb _ feature2, and inputting the operation into a first full connection layer to obtain a first cascade pedestrian feature map rgb _ stag1_ feature;
A5) inputting the first cascade pedestrian feature map rgb _ stag1_ feature into a second full-connection layer to obtain a first cascade color pedestrian classification result class _1 s;
step 2.2.3, the processing procedure of the second cascade color sub-network level _2s is as follows:
B1) carrying out convolution, batch normalization and activation on the Scale _ b picture samples to obtain a pedestrian characteristic image rgb _ feature _ b;
B2) performing pedestrian feature fusion on the pedestrian feature map rgb _ feature _ b and the pedestrian feature map rgb _ feature1, and then performing down-sampling to obtain a pedestrian feature map rgb _ feature3 with the same size as the Scale _ c picture sample;
B3) carrying out convolution and global average pooling operation on the pedestrian feature map rgb _ feature3, and inputting the operation into the first full connection layer to obtain a second cascade pedestrian feature map rgb _ stag2_ feature;
B4) inputting the second cascade pedestrian feature map rgb _ stag2_ feature into a second full-connection layer to obtain a second cascade color pedestrian classification result class _2 s;
step 2.2.4, the processing procedure of the third cascade color sub-network level _3s is as follows:
C1) carrying out convolution, batch normalization and activation on the Scale _ c picture samples to obtain a pedestrian characteristic image rgb _ feature _ c;
C2) pedestrian feature fusion is carried out on the pedestrian feature map rgb_feature_c, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3, convolution and global average pooling operation are carried out, and the result is then input into the first full connection layer to obtain a third cascade pedestrian feature map rgb_stag3_feature;
C3) inputting the third cascade pedestrian feature map rgb_stag3_feature into the second full connection layer to obtain a third cascade color pedestrian classification result class_3s;
step 2.2.5, carrying out pedestrian feature fusion on the first cascade pedestrian feature map rgb _ stag1_ feature, the second cascade pedestrian feature map rgb _ stag2_ feature and the third cascade pedestrian feature map rgb _ stag3_ feature to obtain a colorful pedestrian fusion feature map rgb _ features;
step 2.3, for each sample group, carrying out pedestrian feature fusion on the color pedestrian fusion feature map rgb _ features and the gray level pedestrian fusion feature map gray _ features through a fusion layer, and then carrying out dimension reduction treatment through a PCA dimension reduction layer to obtain a final global pedestrian feature map features; the global pedestrian feature map features pass through the full connection layer to obtain a global pedestrian classification result classifys;
step 2.4, this batch has 3 sample groups in total; for the u_i-th sample group, i = 1, 2, 3, the following are obtained: the global pedestrian feature map features[u_i] corresponding to the u_i-th sample group, the global pedestrian classification result classifys[u_i], the first cascade color pedestrian classification result class_1s[u_i], the second cascade color pedestrian classification result class_2s[u_i], the third cascade color pedestrian classification result class_3s[u_i], the first cascade gray pedestrian classification result class_1g[u_i], the second cascade gray pedestrian classification result class_2g[u_i], and the third cascade gray pedestrian classification result class_3g[u_i];
Step 2.5, calculating loss values of all levels of sub-networks:
step 2.5.1, the first cascade color pedestrian classification result class_1s[u_i] is compared with the sample label of the u_i-th sample group to obtain the first cascade color pedestrian classification loss value loss_1s[u_i];
the second cascade color pedestrian classification result class_2s[u_i] is compared with the sample label of the u_i-th sample group to obtain the second cascade color pedestrian classification loss value loss_2s[u_i];
the third cascade color pedestrian classification result class_3s[u_i] is compared with the sample label of the u_i-th sample group to obtain the third cascade color pedestrian classification loss value loss_3s[u_i];
the first cascade gray pedestrian classification result class_1g[u_i] is compared with the sample label of the u_i-th sample group to obtain the first cascade gray pedestrian classification loss value loss_1g[u_i];
the second cascade gray pedestrian classification result class_2g[u_i] is compared with the sample label of the u_i-th sample group to obtain the second cascade gray pedestrian classification loss value loss_2g[u_i];
the third cascade gray pedestrian classification result class_3g[u_i] is compared with the sample label of the u_i-th sample group to obtain the third cascade gray pedestrian classification loss value loss_3g[u_i];
Step 2.5.2, respectively calculating and obtaining a Loss value Loss _1s of the first cascade color sub-network level _1s, a Loss value Loss _2s of the second cascade color sub-network level _2s, a Loss value Loss _3s of the third cascade color sub-network level _3s, a Loss value Loss _1g of the first cascade gray sub-network level _1g, a Loss value Loss _2g of the second cascade gray sub-network level _2g, and a Loss value Loss _3g of the third cascade gray sub-network level _3g by adopting the following formula:
Loss_1s = loss_1s[u_1] + loss_1s[u_2] + loss_1s[u_3]
Loss_2s = loss_2s[u_1] + loss_2s[u_2] + loss_2s[u_3]
Loss_3s = loss_3s[u_1] + loss_3s[u_2] + loss_3s[u_3]
Loss_1g = loss_1g[u_1] + loss_1g[u_2] + loss_1g[u_3]
Loss_2g = loss_2g[u_1] + loss_2g[u_2] + loss_2g[u_3]
Loss_3g = loss_3g[u_1] + loss_3g[u_2] + loss_3g[u_3]
step 2.6, calculating a Loss value Loss _0 of the multi-scale twin cascade network:
step 2.6.1, the global pedestrian classification result classifys[u_i] is compared with the sample label of the u_i-th sample group to obtain the global pedestrian classification loss value loss_0[u_i];
Step 2.6.2, calculating to obtain a Loss value Loss _0 of the multi-scale twin cascade network by adopting the following formula:
Loss_0 = loss_0[u_1] + loss_0[u_2] + loss_0[u_3]
step 2.7, calculating a similarity Loss function value Loss _ sim between the sample groups:
step 2.7.1, calculating the sample distance between the global pedestrian feature map features[u_1] of sample group u1 and the global pedestrian feature map features[u_2] of sample group u2, denoted d(u1, u2);
calculating the sample distance between the global pedestrian feature map features[u_1] of sample group u1 and the global pedestrian feature map features[u_3] of sample group u3, denoted d(u1, u3);
Step 2.7.2, calculating a preliminary Loss function value Loss _ d by adopting the following formula:
Loss_d = d(u1, u2) - d(u1, u3) + α
wherein: α is the loss function coefficient, and its value range is: α < d(u1, u3) - d(u1, u2);
Step 2.7.3, obtaining a similarity Loss function value Loss _ sim by the following method:
if the Loss _ d is larger than 0, then Loss _ sim is Loss _ d;
if the Loss _ d is less than or equal to 0, then Loss _ sim is 0;
and 2.8, obtaining a final Loss function value Loss _ final by adopting the following formula:
Loss_final = λ1·Loss_1s + λ1·Loss_2s + λ1·Loss_3s + λ1·Loss_1g + λ1·Loss_2g + λ1·Loss_3g + λ2·Loss_0 + λ3·Loss_sim
wherein:
λ1 represents the weight coefficient of each cascaded sub-network;
λ2 represents the weight coefficient of the loss of the multi-scale twin cascade network;
λ3 represents the weight coefficient of the similarity loss function value;
step 2.9, judging whether the final Loss function value Loss _ final is converged; if the convergence is achieved, obtaining a trained multi-scale twin cascade network, and executing the step 3; if not, adjusting the network parameters of the multi-scale twin cascade network, taking another batch of sample group as input, returning to the step 2.1, and performing iterative training on the multi-scale twin cascade network;
step 3, performing precision verification test on the trained multi-scale twin cascade network by using a verification set, and if the test precision meets the requirement, obtaining a multi-scale twin cascade network which passes the verification;
and 4, performing feature recognition on the input pedestrian picture by adopting a multi-scale twin cascade network to obtain a pedestrian feature recognition result.
Preferably, λ1 is 1, λ2 is 6, and λ3 is 7.
Preferably, step 4 specifically comprises:
step 4.1, the input pedestrian picture is a picture Q; pre-establishing a pedestrian sample library G;
step 4.2, inputting the picture Q into the multi-scale twin cascade network to obtain a global pedestrian feature map features[Q];
each pedestrian sample picture g_j in the pedestrian sample library G, j = 1, 2, ..., z, where z represents the number of pedestrian sample pictures in the pedestrian sample library G, is likewise input into the multi-scale twin cascade network to obtain the corresponding global pedestrian feature map features[g_j];
step 4.3, calculating the similarity between the global pedestrian feature map features[Q] and each global pedestrian feature map features[g_j]; sorting the similarities from large to small, and outputting the pedestrian sample pictures in the pedestrian sample library G with the highest similarity to the picture Q.
The pedestrian re-identification method based on the multi-scale twin cascade network provided by the invention has the following advantages:
The invention uses a multi-scale cascade network: the multi-scale inputs and the cascaded sub-feature maps of the corresponding higher-level sub-network are fused and fed into the lower-level sub-network for pedestrian feature extraction, and the pedestrian features of all sub-networks are fused, thereby obtaining a more global and accurate pedestrian feature expression. Therefore, the method can obtain a more global, higher-level and accurate pedestrian feature expression, resist interference from color difference, illumination, scale, scene and the like, and improve the accuracy of pedestrian re-identification.
Drawings
FIG. 1 is a schematic overall flow chart of a pedestrian re-identification method based on a multi-scale twin cascade network provided by the invention;
FIG. 2 is an overall schematic diagram of a multi-scale twin cascaded network provided by the present invention;
FIG. 3 is a structural diagram of the first cascaded sub-network level_1 provided by the present invention;
FIG. 4 is a structural diagram of the second cascaded sub-network level_2 provided by the present invention;
FIG. 5 is a structural diagram of the third cascaded sub-network level_3 provided by the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Because interference factors such as color difference, illumination, scale and scene in the prior art easily reduce the accuracy of pedestrian re-identification, the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which has the following characteristics: 1) a multi-scale twin cascade network with a dual color/grayscale input structure is constructed, the multi-scale color cascade features and the multi-scale grayscale cascade features are fused, and a feature dimension-reduction strategy is then applied, thereby obtaining a more global, higher-level and accurate pedestrian feature expression; 2) a multi-scale cascade network is used: the multi-scale inputs and the cascaded sub-feature maps of the corresponding higher-level sub-network are fused and fed into the lower-level sub-network for pedestrian feature extraction, and the pedestrian features of all sub-networks are fused, thereby obtaining a more global and accurate pedestrian feature expression. Therefore, the method can obtain a more global, higher-level and accurate pedestrian feature expression, resist interference from color difference, illumination, scale, scene and the like, and improve the accuracy of pedestrian re-identification.
The invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which comprises the following steps with reference to a figure 1:
step 1, constructing a data set; the data set comprises a plurality of sample groups; each sample group comprises two image samples which are respectively a color image sample and a gray image sample; the gray image sample is an image sample obtained after graying the color image sample;
dividing the data set into a training set TrainSet and a verification set;
the training set TrainSet is used for training the multi-scale twin cascade network; the verification set is used for verifying the accuracy of the multi-scale twin cascaded network.
Step 2, constructing a multi-scale twin cascade network; the multi-scale twin cascade Network comprises a multi-scale twin cascade color Network _1, a multi-scale twin cascade gray scale Network _2, a fusion layer and a PCA dimension reduction layer;
the Network structures of the multi-scale twin cascade color Network _1 and the multi-scale twin cascade gray scale Network _2 are completely the same;
In the invention, each color picture sample in the data set corresponds to a grayscale picture sample; the color picture sample is input into the multi-scale twin cascade color Network_1, and the grayscale picture sample is input into the multi-scale twin cascade grayscale Network_2. By providing the multi-scale twin cascade grayscale Network_2 with the same structure as the multi-scale twin cascade color Network_1, the influences of color difference, illumination, scene, posture and the like caused by crossing cameras can be compensated for, and the accuracy of pedestrian feature extraction is improved.
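As an illustrative sketch only (assuming OpenCV is used for preprocessing; the function and variable names are hypothetical and not part of the invention), one sample group could be prepared as follows:

    import cv2

    def make_sample_group(color_path):
        """Build one sample group: a color picture sample rgb_tu and its grayed copy gray_tu."""
        rgb_tu = cv2.imread(color_path)                   # color picture sample (BGR layout in OpenCV)
        gray = cv2.cvtColor(rgb_tu, cv2.COLOR_BGR2GRAY)   # graying of the color picture sample
        gray_tu = cv2.merge([gray, gray, gray])           # replicate to 3 channels so both branches share one input layout (assumption)
        return rgb_tu, gray_tu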
The multi-scale twin cascade color Network _1 comprises a first cascade color sub-Network level _1s, a second cascade color sub-Network level _2s and a third cascade color sub-Network level _3 s;
the multi-scale twin cascade gray level Network _2 comprises a first cascade gray level sub-Network level _1g, a second cascade gray level sub-Network level _2g and a third cascade gray level sub-Network level _3 g;
In the present invention, the requirements on the three cascaded sub-networks are: the backbone networks differ, and the structure may become simpler level by level. The backbone network can be a simple convolutional network, a residual network or a combination of several networks, but the output scale of the sub-feature map of the upper-level network must be consistent with the input scale of the lower-level network, where scale refers only to height and width.
Training the multi-scale twin cascade network by adopting the following mode to obtain the trained multi-scale twin cascade network:
step 2.1, taking 3 sample groups as one batch of sample groups; the 3 sample groups of each batch are denoted as: sample group u1, sample group u2 and sample group u3; wherein the sample group u1 is the fixed (anchor) sample; sample group u2 corresponds to the same pedestrian as sample group u1, so sample group u2 is a positive sample of sample group u1; sample group u3 corresponds to a different pedestrian from sample group u1, so sample group u3 is a negative sample of sample group u1;
inputting the batch of sample groups into the multi-scale twin cascade network;
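A minimal sketch of how one batch of 3 sample groups (anchor u1, positive u2, negative u3) might be drawn, assuming a dictionary mapping pedestrian identities to image paths and reusing the hypothetical make_sample_group helper above:

    import random

    def sample_batch(id_to_paths):
        """Return [u1, u2, u3]: u1 the fixed sample, u2 a positive of u1, u3 a negative of u1."""
        pid_anchor, pid_other = random.sample(list(id_to_paths), 2)   # two different pedestrian identities
        path_u1, path_u2 = random.sample(id_to_paths[pid_anchor], 2)  # u1 and u2 show the same pedestrian
        path_u3 = random.choice(id_to_paths[pid_other])               # u3 shows a different pedestrian
        return [make_sample_group(p) for p in (path_u1, path_u2, path_u3)]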
step 2.2, for each sample group, its color picture sample is denoted rgb_tu and its grayscale picture sample is denoted gray_tu;
referring to fig. 2, the color picture sample rgb _ tu is input to the multi-scale twin cascade color Network _1 to obtain a first cascade color pedestrian classification result class _1s output by the first cascade color sub-Network level _1s, a second cascade color pedestrian classification result class _2s output by the second cascade color sub-Network level _2s, a third cascade color pedestrian classification result class _3s output by the third cascade color sub-Network level _3s, and a color pedestrian fusion feature map rgb _ features output by the multi-scale twin cascade color Network _ 1;
inputting the gray picture sample gray _ tu into a multi-scale twin cascade gray Network _2 to obtain a first cascade gray pedestrian classification result class _1g output by a first cascade gray sub-Network level _1g, a second cascade gray pedestrian classification result class _2g output by a second cascade gray sub-Network level _2g, a third cascade gray pedestrian classification result class _3g output by a third cascade gray sub-Network level _3g and a gray pedestrian fusion feature map gray _ features output by a multi-scale twin cascade gray Network _ 2;
because the processing procedure of inputting the color picture sample rgb _ tu into the multi-scale twin cascaded color Network _1 is completely the same as the processing procedure of inputting the gray picture sample gray _ tu into the multi-scale twin cascaded gray Network _2, the invention only takes the processing procedure of inputting the color picture sample rgb _ tu into the multi-scale twin cascaded color Network _1 as an example, and the detailed description is carried out through the steps 2.2.1 to 2.2.5, and the processing procedure of inputting the gray picture sample gray _ tu into the multi-scale twin cascaded gray Network _2 is not repeated.
Wherein, the color picture sample rgb _ tu is input into the multi-scale twin cascade color Network _1, and the specific process is as follows:
step 2.2.1, the color picture sample rgb _ tu is reduced to obtain a Scale _ a picture sample; further reducing the Scale _ a picture sample to obtain a Scale _ b picture sample; further reducing the Scale _ b picture sample to obtain a Scale _ c picture sample;
Therefore, the picture sizes of the Scale_a picture sample, the Scale_b picture sample, and the Scale_c picture sample are successively reduced.
As a specific implementation manner, the Scale_a picture sample is reduced by a factor of two to obtain the Scale_b picture sample, and the Scale_b picture sample is reduced by a factor of two to obtain the Scale_c picture sample. For example, the Scale_a picture sample size is 128 × 384, the Scale_b picture sample size is 64 × 192, and the Scale_c picture sample size is 32 × 96, where 32 indicates the width of the picture and 96 indicates the height of the picture.
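A sketch of this scale pyramid for the example sizes (the resizing code is illustrative; cv2.resize takes (width, height)):

    import cv2

    def build_scales(rgb_tu):
        """Shrink the input twice by a factor of two: Scale_a -> Scale_b -> Scale_c."""
        scale_a = cv2.resize(rgb_tu, (128, 384))   # Scale_a picture sample, 128 x 384 (width x height)
        scale_b = cv2.resize(scale_a, (64, 192))   # Scale_b picture sample
        scale_c = cv2.resize(scale_b, (32, 96))    # Scale_c picture sample
        return scale_a, scale_b, scale_c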
Step 2.2.2, the Scale _ a picture sample is input into the first cascade color sub-network level _1s, and the processing procedure of the first cascade color sub-network level _1s refers to fig. 3 as follows:
A1) carrying out convolution, batch normalization and activation on the Scale _ a picture samples to obtain a pedestrian feature map rgb _ feature _ a;
A2) down-sampling the pedestrian feature map rgb _ feature _ a to obtain a pedestrian feature map rgb _ feature1 with the same size as the Scale _ b picture sample;
A3) down-sampling the pedestrian feature map rgb _ feature1 to obtain a pedestrian feature map rgb _ feature2 with the same size as the Scale _ c picture sample;
A4) carrying out convolution and global average pooling operation on the pedestrian feature map rgb _ feature2, and inputting the operation into a first full connection layer to obtain a first cascade pedestrian feature map rgb _ stag1_ feature;
A5) inputting the first cascade pedestrian feature map rgb _ stag1_ feature into a second full-connection layer to obtain a first cascade color pedestrian classification result class _1 s;
For example, the first cascaded sub-network consists, in the order of data flow, of: the Scale_a picture sample input at scale 128 × 384, a convolution layer, a batch normalization layer, a ReLU activation layer, two downsampling units downsampling_unit, convolution, average pooling, full connection layer fc1, and full connection layer fc2.
The downsampling unit in this embodiment is implemented using convolution with a stride of 2 (see the dashed-box region in fig. 2). Downsampling_unit1 and downsampling_unit2 downsample the feature map to obtain the pedestrian feature map rgb_feature1 and the pedestrian feature map rgb_feature2 respectively; the full connection layer fc1 of the cascaded sub-network level_1 outputs the first cascade pedestrian feature map rgb_stag1_feature, and the full connection layer fc2 of the cascaded sub-network level_1 outputs the first cascade color pedestrian classification result class_1s.
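The following PyTorch sketch illustrates one possible realization of the first cascaded color sub-network level_1s under the structure described above; the channel counts, feature dimension and stride-2 downsampling convolutions are assumptions, not values fixed by the patent:

    import torch.nn as nn

    class Level1s(nn.Module):
        """Sketch of level_1s: stem conv + BN + ReLU, two stride-2 downsampling units, conv, GAP, fc1, fc2."""
        def __init__(self, num_ids, feat_dim=256, channels=64):
            super().__init__()
            self.stem = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1),
                                      nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            self.down1 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)   # downsampling_unit1
            self.down2 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)   # downsampling_unit2
            self.conv = nn.Conv2d(channels, channels, 3, padding=1)
            self.gap = nn.AdaptiveAvgPool2d(1)                                   # global average pooling
            self.fc1 = nn.Linear(channels, feat_dim)                             # -> rgb_stag1_feature
            self.fc2 = nn.Linear(feat_dim, num_ids)                              # -> class_1s

        def forward(self, scale_a):
            rgb_feature_a = self.stem(scale_a)
            rgb_feature1 = self.down1(rgb_feature_a)        # same size as the Scale_b picture sample
            rgb_feature2 = self.down2(rgb_feature1)         # same size as the Scale_c picture sample
            x = self.gap(self.conv(rgb_feature2)).flatten(1)
            rgb_stag1_feature = self.fc1(x)
            class_1s = self.fc2(rgb_stag1_feature)
            return rgb_feature1, rgb_feature2, rgb_stag1_feature, class_1s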
Step 2.2.3, the processing procedure of the second cascaded color sub-network level _2s with reference to fig. 4 is:
B1) carrying out convolution, batch normalization and activation on the Scale _ b picture samples to obtain a pedestrian characteristic image rgb _ feature _ b;
B2) performing pedestrian feature fusion on the pedestrian feature map rgb _ feature _ b and the pedestrian feature map rgb _ feature1, and then performing down-sampling to obtain a pedestrian feature map rgb _ feature3 with the same size as the Scale _ c picture sample;
B3) carrying out convolution and global average pooling operation on the pedestrian feature map rgb _ feature3, and inputting the operation into the first full connection layer to obtain a second cascade pedestrian feature map rgb _ stag2_ feature;
B4) inputting the second cascade pedestrian feature map rgb _ stag2_ feature into a second full-connection layer to obtain a second cascade color pedestrian classification result class _2 s;
In this embodiment, taking the construction of a convolutional network as an example, the second cascaded sub-network consists, in the order of data flow, of: the Scale_b picture sample input at scale 64 × 192, a convolution layer, a batch normalization layer, a ReLU activation layer, one downsampling unit downsampling_unit, convolution, average pooling, full connection layer fc1, and full connection layer fc2.
The differences from level_1 are: the network is a dual-input structure, whose inputs are the Scale_b picture sample and the pedestrian feature map rgb_feature1, both at scale 64 × 192, and only one downsampling unit is needed. The number of first-layer convolution kernels of the network must be consistent with the number of channels of the pedestrian feature map rgb_feature1; the pedestrian feature map rgb_feature1 and the Scale_b picture sample after the first-layer convolution operation are then added, and the added data are input into the subsequent network structure. The downsampling_unit downsamples the feature map to obtain the pedestrian feature map rgb_feature3 needed by the level_3 network; the full connection layer fc1 of the cascaded sub-network level_2 outputs the second cascade pedestrian feature map rgb_stag2_feature, and the full connection layer fc2 of the cascaded sub-network level_2 outputs the second cascade color pedestrian classification result class_2s.
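A corresponding sketch of the second cascaded color sub-network level_2s with its dual-input Add fusion (again, channel counts and kernel sizes are illustrative assumptions):

    import torch.nn as nn

    class Level2s(nn.Module):
        """Sketch of level_2s: stem output channels match rgb_feature1, Add fusion, one downsampling unit."""
        def __init__(self, num_ids, feat_dim=256, channels=64):
            super().__init__()
            self.stem = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1),      # first-layer kernels match rgb_feature1 channels
                                      nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)    # the single downsampling_unit
            self.conv = nn.Conv2d(channels, channels, 3, padding=1)
            self.gap = nn.AdaptiveAvgPool2d(1)
            self.fc1 = nn.Linear(channels, feat_dim)                             # -> rgb_stag2_feature
            self.fc2 = nn.Linear(feat_dim, num_ids)                              # -> class_2s

        def forward(self, scale_b, rgb_feature1):
            rgb_feature_b = self.stem(scale_b)
            fused = rgb_feature_b + rgb_feature1             # Add fusion of the two pedestrian feature maps
            rgb_feature3 = self.down(fused)                  # same size as the Scale_c picture sample
            x = self.gap(self.conv(rgb_feature3)).flatten(1)
            rgb_stag2_feature = self.fc1(x)
            class_2s = self.fc2(rgb_stag2_feature)
            return rgb_feature3, rgb_stag2_feature, class_2s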
Step 2.2.4, the processing procedure of the third cascaded color sub-network level _3s with reference to fig. 5 is:
C1) carrying out convolution, batch normalization and activation on the Scale _ c picture samples to obtain a pedestrian characteristic image rgb _ feature _ c;
C2) pedestrian feature fusion is carried out on the pedestrian feature map rgb_feature_c, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3, convolution and global average pooling operation are carried out, and the result is then input into the first full connection layer to obtain a third cascade pedestrian feature map rgb_stag3_feature;
C3) inputting the third cascade pedestrian feature map rgb_stag3_feature into the second full connection layer to obtain a third cascade color pedestrian classification result class_3s;
In this embodiment, taking the construction of a convolutional network as an example, the third cascaded sub-network consists, in the order of data flow, of: the Scale_c picture sample input at scale 32 × 96, a convolution layer, a batch normalization layer, a ReLU activation layer, convolution, average pooling, full connection layer fc1, and full connection layer fc2.
The differences from the first two networks are: the network is a triple-input structure, whose inputs are the Scale_c picture sample at scale 32 × 96, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3. The number of first-layer convolution kernels of the network must be consistent with the number of channels of the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3; the 32 × 96 original image after the first-layer convolution operation, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3 are then combined by an Add operation, and the result is input into the subsequent network structure. Finally, the full connection layer fc1 of the cascaded sub-network level_3 outputs the third cascade pedestrian feature map rgb_stag3_feature, and the full connection layer fc2 of the cascaded sub-network level_3 outputs the third cascade color pedestrian classification result class_3s.
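And a sketch of the third cascaded color sub-network level_3s with its triple-input Add fusion (same caveats as above):

    import torch.nn as nn

    class Level3s(nn.Module):
        """Sketch of level_3s: stem output channels match rgb_feature2/rgb_feature3, triple Add fusion, no downsampling."""
        def __init__(self, num_ids, feat_dim=256, channels=64):
            super().__init__()
            self.stem = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1),
                                      nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            self.conv = nn.Conv2d(channels, channels, 3, padding=1)
            self.gap = nn.AdaptiveAvgPool2d(1)
            self.fc1 = nn.Linear(channels, feat_dim)                             # -> rgb_stag3_feature
            self.fc2 = nn.Linear(feat_dim, num_ids)                              # -> class_3s

        def forward(self, scale_c, rgb_feature2, rgb_feature3):
            rgb_feature_c = self.stem(scale_c)
            fused = rgb_feature_c + rgb_feature2 + rgb_feature3    # Add fusion of the three feature maps
            x = self.gap(self.conv(fused)).flatten(1)
            rgb_stag3_feature = self.fc1(x)
            class_3s = self.fc2(rgb_stag3_feature)
            return rgb_stag3_feature, class_3s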
Step 2.2.5, carrying out pedestrian feature fusion on the first cascade pedestrian feature map rgb _ stag1_ feature, the second cascade pedestrian feature map rgb _ stag2_ feature and the third cascade pedestrian feature map rgb _ stag3_ feature to obtain a colorful pedestrian fusion feature map rgb _ features;
step 2.3, for each sample group, carrying out pedestrian feature fusion on the color pedestrian fusion feature map rgb _ features and the gray level pedestrian fusion feature map gray _ features through a fusion layer, and then carrying out dimension reduction treatment through a PCA dimension reduction layer to obtain a final global pedestrian feature map features; the global pedestrian feature map features pass through the full connection layer to obtain a global pedestrian classification result classifys;
In this step, the color pedestrian fusion feature map rgb_features and the gray pedestrian fusion feature map gray_features are subjected to channel fusion, so that the multi-scale features of the color and gray images are combined to obtain richer pedestrian information; a PCA layer is then connected to sequentially perform mean centering, covariance calculation and eigenvalue decomposition on the fused features, and finally the effective feature dimensions are selected according to the eigenvalue decomposition result to obtain the final global pedestrian feature map features.
For example, the color pedestrian fusion feature map rgb_features and the gray pedestrian fusion feature map gray_features of the twin cascade network are subjected to channel fusion to obtain D = (x^(1), x^(2), ..., x^(m)), where m is 1024 dimensions in this embodiment and each x^(i) is a column vector of length batch. When PCA feature dimension reduction is carried out, the mean-centering operation is first performed on D (each x^(i) has the mean subtracted), and the centered features are denoted X = (x^(1)′, x^(2)′, ..., x^(m)′). Then the covariance matrix V = XX^T is calculated, and finally the matrix decomposition V = UΣU^T is performed on V; the purpose of the matrix decomposition is to decompose the fused matrix V into eigenvalues and eigenvectors, where the magnitude of the eigenvalues is used to judge the importance of the eigenvectors. The m eigenvalues after decomposing V are Σ = (λ1, λ2, ..., λm), with corresponding eigenvectors U = (w1, w2, ..., wm). Finally, according to the set feature dimension k, the eigenvectors {w1, w2, ..., wk} corresponding to the first k eigenvalues are selected to form the final feature vector: the global pedestrian feature map features.
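A NumPy sketch of this PCA step (the (m × batch) layout follows the description above; the use of numpy.linalg.eigh for the symmetric matrix V is an implementation assumption):

    import numpy as np

    def pca_reduce(D, k):
        """D: (m, batch) fused features; returns the k-dimensional global pedestrian features."""
        X = D - D.mean(axis=1, keepdims=True)      # mean-centering
        V = X @ X.T                                # covariance matrix V = X X^T
        eigvals, eigvecs = np.linalg.eigh(V)       # eigen-decomposition V = U Sigma U^T
        top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
        U_k = eigvecs[:, top]                      # eigenvectors w1 ... wk, shape (m, k)
        return U_k.T @ X                           # global pedestrian feature map "features", shape (k, batch)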
Step 2.4, this batch has 3 sample groups in total; for the u_i-th sample group, i = 1, 2, 3, the following are obtained: the global pedestrian feature map features[u_i] corresponding to the u_i-th sample group, the global pedestrian classification result classifys[u_i], the first cascade color pedestrian classification result class_1s[u_i], the second cascade color pedestrian classification result class_2s[u_i], the third cascade color pedestrian classification result class_3s[u_i], the first cascade gray pedestrian classification result class_1g[u_i], the second cascade gray pedestrian classification result class_2g[u_i], and the third cascade gray pedestrian classification result class_3g[u_i];
Step 2.5, calculating loss values of all levels of sub-networks:
step 2.5.1, the first cascade color pedestrian classification result class_1s[u_i] is compared with the sample label of the u_i-th sample group to obtain the first cascade color pedestrian classification loss value loss_1s[u_i];
the second cascade color pedestrian classification result class_2s[u_i] is compared with the sample label of the u_i-th sample group to obtain the second cascade color pedestrian classification loss value loss_2s[u_i];
the third cascade color pedestrian classification result class_3s[u_i] is compared with the sample label of the u_i-th sample group to obtain the third cascade color pedestrian classification loss value loss_3s[u_i];
the first cascade gray pedestrian classification result class_1g[u_i] is compared with the sample label of the u_i-th sample group to obtain the first cascade gray pedestrian classification loss value loss_1g[u_i];
the second cascade gray pedestrian classification result class_2g[u_i] is compared with the sample label of the u_i-th sample group to obtain the second cascade gray pedestrian classification loss value loss_2g[u_i];
the third cascade gray pedestrian classification result class_3g[u_i] is compared with the sample label of the u_i-th sample group to obtain the third cascade gray pedestrian classification loss value loss_3g[u_i];
Step 2.5.2, respectively calculating and obtaining a Loss value Loss _1s of the first cascade color sub-network level _1s, a Loss value Loss _2s of the second cascade color sub-network level _2s, a Loss value Loss _3s of the third cascade color sub-network level _3s, a Loss value Loss _1g of the first cascade gray sub-network level _1g, a Loss value Loss _2g of the second cascade gray sub-network level _2g, and a Loss value Loss _3g of the third cascade gray sub-network level _3g by adopting the following formula:
Loss_1s = loss_1s[u_1] + loss_1s[u_2] + loss_1s[u_3]
Loss_2s = loss_2s[u_1] + loss_2s[u_2] + loss_2s[u_3]
Loss_3s = loss_3s[u_1] + loss_3s[u_2] + loss_3s[u_3]
Loss_1g = loss_1g[u_1] + loss_1g[u_2] + loss_1g[u_3]
Loss_2g = loss_2g[u_1] + loss_2g[u_2] + loss_2g[u_3]
Loss_3g = loss_3g[u_1] + loss_3g[u_2] + loss_3g[u_3]
step 2.6, calculating a Loss value Loss _0 of the multi-scale twin cascade network:
step 2.6.1, the global pedestrian classification result classifys[u_i] is compared with the sample label of the u_i-th sample group to obtain the global pedestrian classification loss value loss_0[u_i];
Step 2.6.2, calculating to obtain a Loss value Loss _0 of the multi-scale twin cascade network by adopting the following formula:
Loss_0 = loss_0[u_1] + loss_0[u_2] + loss_0[u_3]
step 2.7, calculating a similarity Loss function value Loss _ sim between the sample groups:
step 2.7.1, calculating the sample distance between the global pedestrian feature map features[u_1] of sample group u1 and the global pedestrian feature map features[u_2] of sample group u2, denoted d(u1, u2);
calculating the sample distance between the global pedestrian feature map features[u_1] of sample group u1 and the global pedestrian feature map features[u_3] of sample group u3, denoted d(u1, u3);
Step 2.7.2, calculating a preliminary Loss function value Loss _ d by adopting the following formula:
Loss_d = d(u1, u2) - d(u1, u3) + α
wherein: α is the loss function coefficient, and its value range is: α < d(u1, u3) - d(u1, u2);
Step 2.7.3, obtaining a similarity Loss function value Loss _ sim by the following method:
if the Loss _ d is larger than 0, then Loss _ sim is Loss _ d;
if the Loss _ d is less than or equal to 0, then Loss _ sim is 0;
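A PyTorch sketch of this similarity loss Loss_sim (the Euclidean distance and the example margin value are assumptions; the patent only requires α < d(u1,u3) − d(u1,u2)):

    import torch
    import torch.nn.functional as F

    def similarity_loss(feat_u1, feat_u2, feat_u3, alpha=0.3):
        """Loss_sim = Loss_d if Loss_d > 0 else 0, with Loss_d = d(u1,u2) - d(u1,u3) + alpha."""
        d_pos = F.pairwise_distance(feat_u1, feat_u2)   # d(u1, u2), positive pair
        d_neg = F.pairwise_distance(feat_u1, feat_u3)   # d(u1, u3), negative pair
        loss_d = d_pos - d_neg + alpha
        return torch.clamp(loss_d, min=0.0).mean()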
and 2.8, obtaining a final Loss function value Loss _ final by adopting the following formula:
Loss_final = λ1·Loss_1s + λ1·Loss_2s + λ1·Loss_3s + λ1·Loss_1g + λ1·Loss_2g + λ1·Loss_3g + λ2·Loss_0 + λ3·Loss_sim
wherein:
λ1 represents the weight coefficient of each cascaded sub-network;
λ2 represents the weight coefficient of the loss of the multi-scale twin cascade network;
λ3 represents the weight coefficient of the similarity loss function value;
as a specific implementation, λ1Is 1, λ2Is 6, λ3Is 7.
Step 2.9, judging whether the final Loss function value Loss _ final is converged; if the convergence is achieved, obtaining a trained multi-scale twin cascade network, and executing the step 3; if not, adjusting the network parameters of the multi-scale twin cascade network, taking another batch of sample group as input, returning to the step 2.1, and performing iterative training on the multi-scale twin cascade network;
step 3, performing precision verification test on the trained multi-scale twin cascade network by using a verification set, and if the test precision meets the requirement, obtaining a multi-scale twin cascade network which passes the verification;
and 4, performing feature recognition on the input pedestrian picture by adopting a multi-scale twin cascade network to obtain a pedestrian feature recognition result.
The step 4 specifically comprises the following steps:
step 4.1, the input pedestrian picture is a picture Q; pre-establishing a pedestrian sample library G;
step 4.2, inputting the picture Q into the multi-scale twin cascade network to obtain a global pedestrian feature map features[Q];
each pedestrian sample picture g_j in the pedestrian sample library G, j = 1, 2, ..., z, where z represents the number of pedestrian sample pictures in the pedestrian sample library G, is likewise input into the multi-scale twin cascade network to obtain the corresponding global pedestrian feature map features[g_j];
step 4.3, calculating the similarity between the global pedestrian feature map features[Q] and each global pedestrian feature map features[g_j]; sorting the similarities from large to small, and outputting the pedestrian sample pictures in the pedestrian sample library G with the highest similarity to the picture Q.
As a specific implementation manner, 1 pedestrian sample picture in the pedestrian sample library G with the highest similarity to the picture Q may be output. The parameter K of the number of best matches of the images may also be preset, for example, if K is set to 10, sorting the pedestrian sample images in the pedestrian sample library G from large to small according to the similarity, and then selecting the 1 st to 10 th pedestrian sample images and outputting the pedestrian sample images according to the sorting order.
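A sketch of this retrieval step, assuming cosine similarity is used as the similarity measure (the patent does not fix the measure) and that the gallery features features[g_j] are stacked row-wise:

    import numpy as np

    def rank_gallery(feat_q, gallery_feats, top_k=10):
        """Return the indices of the top_k pedestrian sample pictures in G most similar to picture Q."""
        q = feat_q / np.linalg.norm(feat_q)
        g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
        sims = g @ q                                 # cosine similarity between features[Q] and each features[g_j]
        order = np.argsort(sims)[::-1]               # sort similarities from large to small
        return order[:top_k], sims[order[:top_k]]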
The technical essentials of this patent are: 1. a multi-scale cascade network is established; the multi-scale color or grayscale inputs and the sub-features of the corresponding higher-level sub-network are fused and fed into the lower-level sub-network for pedestrian feature extraction, and the pedestrian features of the sub-networks are fused; 2. a twin network with dual color and grayscale inputs is used; the color multi-scale and grayscale multi-scale pedestrian features are fused, and a pedestrian feature dimension-reduction strategy is then applied, thereby obtaining a stronger, more comprehensive and more effective pedestrian feature expression.
Compared with the prior art, the invention has the beneficial effects that:
compared with single input, the gray level input mode can balance and increase pedestrian characteristic information influenced by chromatic aberration, illumination, scenes and the like caused by camera setting, thereby extracting more comprehensive pedestrian characteristic information.
In order to reduce the interference of redundant information, a dimension-reduction operation is applied to the final pedestrian features, so that the obtained pedestrian feature representation is more comprehensive without being overly redundant.
Unlike the prior art, which extracts pedestrian features only in a single-scale manner, the method extracts pedestrian features from images at each scale in a multi-scale manner and performs channel fusion, thereby obtaining pedestrian features with both spatial information and strong semantic information.
Furthermore, unlike prior-art methods that extract pedestrian features with a single network structure, the network is constructed from a plurality of sub-networks to form a multi-scale twin cascade network, and each level of the network extracts its own sub-features, ensuring that the feature extraction at each level is mutually independent; the output of each level is combined with the original image at a different scale as the input of the next level, and finally the pedestrian features of all cascades are fused so that the different cascaded networks complement each other, thereby mining more salient pedestrian features and enhancing the pedestrian feature expressiveness.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (3)

1. A pedestrian re-identification method based on a multi-scale twin cascade network is characterized by comprising the following steps:
step 1, constructing a data set; the data set comprises a plurality of sample groups; each sample group comprises two image samples which are respectively a color image sample and a gray image sample; the gray image sample is an image sample obtained after graying the color image sample;
dividing the data set into a training set TrainSet and a verification set;
step 2, constructing a multi-scale twin cascade network; the multi-scale twin cascade Network comprises a multi-scale twin cascade color Network _1, a multi-scale twin cascade gray scale Network _2, a fusion layer and a PCA dimension reduction layer;
the Network structures of the multi-scale twin cascade color Network _1 and the multi-scale twin cascade gray scale Network _2 are completely the same;
the multi-scale twin cascade color Network _1 comprises a first cascade color sub-Network level _1s, a second cascade color sub-Network level _2s and a third cascade color sub-Network level _3 s;
the multi-scale twin cascade gray level Network _2 comprises a first cascade gray level sub-Network level _1g, a second cascade gray level sub-Network level _2g and a third cascade gray level sub-Network level _3 g;
training the multi-scale twin cascade network by adopting the following mode to obtain the trained multi-scale twin cascade network:
step 2.1, taking 3 sample groups as one batch of sample groups; the 3 sample groups of each batch are denoted as: sample group u1, sample group u2 and sample group u3; wherein the sample group u1 is the fixed (anchor) sample; sample group u2 corresponds to the same pedestrian as sample group u1, so sample group u2 is a positive sample of sample group u1; sample group u3 corresponds to a different pedestrian from sample group u1, so sample group u3 is a negative sample of sample group u1;
inputting the batch of sample groups into the multi-scale twin cascade network;
step 2.2, for each sample group, its color picture sample is denoted rgb_tu and its grayscale picture sample is denoted gray_tu;
inputting the color picture sample rgb _ tu into the multi-scale twin cascade color Network _1 to obtain a first cascade color pedestrian classification result class _1s output by the first cascade color sub-Network level _1s, a second cascade color pedestrian classification result class _2s output by the second cascade color sub-Network level _2s, a third cascade color pedestrian classification result class _3s output by the third cascade color sub-Network level _3s, and a color pedestrian fusion feature map rgb _ features output by the multi-scale twin cascade color Network _ 1;
inputting the gray picture sample gray _ tu into a multi-scale twin cascade gray Network _2 to obtain a first cascade gray pedestrian classification result class _1g output by a first cascade gray sub-Network level _1g, a second cascade gray pedestrian classification result class _2g output by a second cascade gray sub-Network level _2g, a third cascade gray pedestrian classification result class _3g output by a third cascade gray sub-Network level _3g and a gray pedestrian fusion feature map gray _ features output by a multi-scale twin cascade gray Network _ 2;
wherein, the color picture sample rgb _ tu is input into the multi-scale twin cascade color Network _1, and the specific process is as follows:
step 2.2.1, the color picture sample rgb _ tu is reduced to obtain a Scale _ a picture sample; further reducing the Scale _ a picture sample to obtain a Scale _ b picture sample; further reducing the Scale _ b picture sample to obtain a Scale _ c picture sample;
step 2.2.2, inputting the Scale _ a picture sample into a first cascade color sub-network level _1s, wherein the processing process of the first cascade color sub-network level _1s is as follows:
A1) carrying out convolution, batch normalization and activation on the Scale _ a picture samples to obtain a pedestrian feature map rgb _ feature _ a;
A2) down-sampling the pedestrian feature map rgb _ feature _ a to obtain a pedestrian feature map rgb _ feature1 with the same size as the Scale _ b picture sample;
A3) down-sampling the pedestrian feature map rgb _ feature1 to obtain a pedestrian feature map rgb _ feature2 with the same size as the Scale _ c picture sample;
A4) carrying out convolution and global average pooling operation on the pedestrian feature map rgb _ feature2, and inputting the operation into a first full connection layer to obtain a first cascade pedestrian feature map rgb _ stag1_ feature;
A5) inputting the first cascade pedestrian feature map rgb _ stag1_ feature into a second full-connection layer to obtain a first cascade color pedestrian classification result class _1 s;
step 2.2.3, the processing procedure of the second cascade color sub-network level _2s is as follows:
B1) carrying out convolution, batch normalization and activation on the Scale _ b picture samples to obtain a pedestrian characteristic image rgb _ feature _ b;
B2) performing pedestrian feature fusion on the pedestrian feature map rgb _ feature _ b and the pedestrian feature map rgb _ feature1, and then performing down-sampling to obtain a pedestrian feature map rgb _ feature3 with the same size as the Scale _ c picture sample;
B3) carrying out convolution and global average pooling operation on the pedestrian feature map rgb _ feature3, and inputting the operation into the first full connection layer to obtain a second cascade pedestrian feature map rgb _ stag2_ feature;
B4) inputting the second cascade pedestrian feature map rgb _ stag2_ feature into a second full-connection layer to obtain a second cascade color pedestrian classification result class _2 s;
step 2.2.4, the processing procedure of the third cascade color sub-network level_3s is as follows:
C1) carrying out convolution, batch normalization and activation on the Scale_c picture sample to obtain a pedestrian feature map rgb_feature_c;
C2) carrying out pedestrian feature fusion on the pedestrian feature map rgb_feature_c, the pedestrian feature map rgb_feature2 and the pedestrian feature map rgb_feature3, carrying out convolution and global average pooling, and then inputting the result into the first full connection layer to obtain a third cascade pedestrian feature map rgb_stag3_feature;
C3) inputting the third cascade pedestrian feature map rgb_stag3_feature into the second full connection layer to obtain the third cascade color pedestrian classification result class_3s;
step 2.2.5, carrying out pedestrian feature fusion on the first cascade pedestrian feature map rgb_stag1_feature, the second cascade pedestrian feature map rgb_stag2_feature and the third cascade pedestrian feature map rgb_stag3_feature to obtain the color pedestrian fusion feature map rgb_features (a minimal implementation sketch of steps 2.2.1 to 2.2.5 follows below);
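For illustration only, and not as part of the claim, the cascade processing of steps 2.2.1 to 2.2.5 can be sketched in Python/PyTorch. The sketch assumes that each scale reduction is a halving, that "pedestrian feature fusion" is channel concatenation followed by a 1x1 convolution, that the first and second full connection layers are shared by the three cascade sub-networks, and that the channel count, feature dimension and identity count (751) are placeholder values.

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout):
    # convolution + batch normalization + activation (steps A1 / B1 / C1)
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class CascadeColorBranch(nn.Module):
    # One branch (Network_1) of the multi-scale twin cascade network; the gray
    # branch (Network_2) would be an identically structured module fed with the
    # gray picture sample gray_tu.
    def __init__(self, num_ids=751, c=64, feat_dim=256):
        super().__init__()
        self.stem_a, self.stem_b, self.stem_c = (conv_bn_relu(3, c) for _ in range(3))
        self.fuse_b = nn.Conv2d(2 * c, c, 1)      # fuses rgb_feature_b with rgb_feature1
        self.fuse_c = nn.Conv2d(3 * c, c, 1)      # fuses rgb_feature_c with rgb_feature2, rgb_feature3
        self.conv1, self.conv2, self.conv3 = (conv_bn_relu(c, c) for _ in range(3))
        self.fc1 = nn.Linear(c, feat_dim)         # "first full connection layer" (assumed shared)
        self.fc2 = nn.Linear(feat_dim, num_ids)   # "second full connection layer" (assumed shared)

    def forward(self, scale_a):
        # step 2.2.1: successive reduction to obtain the Scale_b and Scale_c picture samples
        scale_b = F.interpolate(scale_a, scale_factor=0.5, mode='bilinear', align_corners=False)
        scale_c = F.interpolate(scale_b, scale_factor=0.5, mode='bilinear', align_corners=False)

        # first cascade sub-network level_1s (steps A1 to A5)
        fa = self.stem_a(scale_a)                            # rgb_feature_a
        f1 = F.avg_pool2d(fa, 2)                             # rgb_feature1, same size as Scale_b
        f2 = F.avg_pool2d(f1, 2)                             # rgb_feature2, same size as Scale_c
        stag1 = self.fc1(self.conv1(f2).mean(dim=(2, 3)))    # conv + global average pooling + fc1
        class_1s = self.fc2(stag1)

        # second cascade sub-network level_2s (steps B1 to B4)
        fb = self.stem_b(scale_b)                            # rgb_feature_b
        f3 = F.avg_pool2d(self.fuse_b(torch.cat([fb, f1], dim=1)), 2)   # rgb_feature3
        stag2 = self.fc1(self.conv2(f3).mean(dim=(2, 3)))
        class_2s = self.fc2(stag2)

        # third cascade sub-network level_3s (steps C1 to C3)
        fc = self.stem_c(scale_c)                            # rgb_feature_c
        fused = self.fuse_c(torch.cat([fc, f2, f3], dim=1))
        stag3 = self.fc1(self.conv3(fused).mean(dim=(2, 3)))
        class_3s = self.fc2(stag3)

        # step 2.2.5: fuse the three cascade pedestrian feature maps into rgb_features
        rgb_features = torch.cat([stag1, stag2, stag3], dim=1)
        return class_1s, class_2s, class_3s, rgb_features

x = torch.randn(2, 3, 256, 128)   # a small batch of color picture samples (size is illustrative)
class_1s, class_2s, class_3s, rgb_features = CascadeColorBranch()(x)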
step 2.3, for each sample group, carrying out pedestrian feature fusion on the color pedestrian fusion feature map rgb_features and the gray pedestrian fusion feature map gray_features through a fusion layer, and then carrying out dimension reduction through a PCA dimension reduction layer to obtain a final global pedestrian feature map features; the global pedestrian feature map features is passed through the full connection layer to obtain a global pedestrian classification result classifys;
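As an illustration of step 2.3 only, the following sketch assumes that the fusion layer is a concatenation and that the PCA dimension reduction layer is a fixed projection onto principal directions estimated from previously collected training features (torch.pca_lowrank); the feature dimensions, the number of retained components (256) and the identity count (751) are placeholder values.

import torch
import torch.nn as nn

# principal directions estimated offline from fused training features (placeholder data)
train_feats = torch.randn(1024, 1536)
mean = train_feats.mean(dim=0)
U, S, V = torch.pca_lowrank(train_feats - mean, q=256, center=False)   # V: (1536, 256) basis

# one batch of 3 sample groups: fuse the color and gray pedestrian fusion feature maps
rgb_features = torch.randn(3, 768)
gray_features = torch.randn(3, 768)
fused = torch.cat([rgb_features, gray_features], dim=1)    # fusion layer (assumed: concatenation)

features = (fused - mean) @ V                 # PCA dimension reduction layer -> global features, (3, 256)
classifys = nn.Linear(256, 751)(features)     # full connection layer -> global classification result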
step 2.4, this batch has 3 sample groups in total; for the u_i-th sample group, i = 1, 2, 3, the u_i-th sample group's global pedestrian feature map features^{u_i}, global pedestrian classification result classifys^{u_i}, first cascade color pedestrian classification result class_1s^{u_i}, second cascade color pedestrian classification result class_2s^{u_i}, third cascade color pedestrian classification result class_3s^{u_i}, first cascade gray pedestrian classification result class_1g^{u_i}, second cascade gray pedestrian classification result class_2g^{u_i} and third cascade gray pedestrian classification result class_3g^{u_i} are obtained;
Step 2.5, calculating loss values of all levels of sub-networks:
step 2.5.1, comparing the first cascade color pedestrian classification result class_1s^{u_i} with the sample label of the u_i-th sample group to obtain a first cascade color pedestrian classification loss value loss_1s^{u_i};
comparing the second cascade color pedestrian classification result class_2s^{u_i} with the sample label of the u_i-th sample group to obtain a second cascade color pedestrian classification loss value loss_2s^{u_i};
comparing the third cascade color pedestrian classification result class_3s^{u_i} with the sample label of the u_i-th sample group to obtain a third cascade color pedestrian classification loss value loss_3s^{u_i};
comparing the first cascade gray pedestrian classification result class_1g^{u_i} with the sample label of the u_i-th sample group to obtain a first cascade gray pedestrian classification loss value loss_1g^{u_i};
comparing the second cascade gray pedestrian classification result class_2g^{u_i} with the sample label of the u_i-th sample group to obtain a second cascade gray pedestrian classification loss value loss_2g^{u_i};
comparing the third cascade gray pedestrian classification result class_3g^{u_i} with the sample label of the u_i-th sample group to obtain a third cascade gray pedestrian classification loss value loss_3g^{u_i};
step 2.5.2, respectively calculating the loss value Loss_1s of the first cascade color sub-network level_1s, the loss value Loss_2s of the second cascade color sub-network level_2s, the loss value Loss_3s of the third cascade color sub-network level_3s, the loss value Loss_1g of the first cascade gray sub-network level_1g, the loss value Loss_2g of the second cascade gray sub-network level_2g and the loss value Loss_3g of the third cascade gray sub-network level_3g over the 3 sample groups by adopting the following formulas:

Loss_1s = Σ_{i=1}^{3} loss_1s^{u_i}
Loss_2s = Σ_{i=1}^{3} loss_2s^{u_i}
Loss_3s = Σ_{i=1}^{3} loss_3s^{u_i}
Loss_1g = Σ_{i=1}^{3} loss_1g^{u_i}
Loss_2g = Σ_{i=1}^{3} loss_2g^{u_i}
Loss_3g = Σ_{i=1}^{3} loss_3g^{u_i}
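A minimal sketch of steps 2.5.1 and 2.5.2, assuming that the comparison with the sample label is a cross-entropy loss (the claim does not name the loss function) and that Loss_1s sums the per-sample-group values as reconstructed above; Loss_2s, Loss_3s, Loss_1g, Loss_2g and Loss_3g follow the same pattern.

import torch
import torch.nn.functional as F

labels = torch.tensor([12, 12, 47])      # sample labels of the 3 sample groups u_1, u_2, u_3 (illustrative)
class_1s = torch.randn(3, 751)           # first cascade color classification results for the 3 sample groups

loss_1s_per_group = F.cross_entropy(class_1s, labels, reduction='none')   # loss_1s^{u_i}, shape (3,)
Loss_1s = loss_1s_per_group.sum()                                          # loss of sub-network level_1s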
step 2.6, calculating the loss value Loss_0 of the multi-scale twin cascade network:
step 2.6.1, comparing the global pedestrian classification result classifys^{u_i} with the sample label of the u_i-th sample group to obtain a global pedestrian classification loss value loss_0^{u_i};
step 2.6.2, calculating the loss value Loss_0 of the multi-scale twin cascade network by adopting the following formula:

Loss_0 = Σ_{i=1}^{3} loss_0^{u_i}
step 2.7, calculating a similarity loss function value Loss_sim between the sample groups:
step 2.7.1, calculating the sample distance between the global pedestrian feature map features^{u_1} of sample group u_1 and the global pedestrian feature map features^{u_2} of sample group u_2, expressed as d(u_1, u_2);
calculating the sample distance between the global pedestrian feature map features^{u_1} of sample group u_1 and the global pedestrian feature map features^{u_3} of sample group u_3, expressed as d(u_1, u_3);
step 2.7.2, calculating a preliminary loss function value Loss_d by adopting the following formula:

Loss_d = d(u_1, u_2) - d(u_1, u_3) + α

wherein α is a loss function coefficient whose value range is: α < d(u_1, u_3) - d(u_1, u_2);
step 2.7.3, obtaining the similarity loss function value Loss_sim as follows:
if Loss_d > 0, then Loss_sim = Loss_d;
if Loss_d ≤ 0, then Loss_sim = 0;
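A minimal sketch of step 2.7, assuming the sample distance is the Euclidean distance and using an illustrative value of α; only the form Loss_d = d(u_1, u_2) - d(u_1, u_3) + α followed by the hinge at zero is taken from the claim.

import torch

f_u1, f_u2, f_u3 = torch.randn(3, 256).unbind(0)   # global pedestrian feature maps of u_1, u_2, u_3
alpha = 0.3                                         # loss function coefficient (illustrative)

d_12 = torch.dist(f_u1, f_u2)                       # d(u_1, u_2)
d_13 = torch.dist(f_u1, f_u3)                       # d(u_1, u_3)
Loss_d = d_12 - d_13 + alpha                        # step 2.7.2
Loss_sim = torch.clamp(Loss_d, min=0)               # step 2.7.3: Loss_d if positive, otherwise 0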
step 2.8, obtaining a final loss function value Loss_final by adopting the following formula:

Loss_final = λ1·Loss_1s + λ1·Loss_2s + λ1·Loss_3s + λ1·Loss_1g + λ1·Loss_2g + λ1·Loss_3g + λ2·Loss_0 + λ3·Loss_sim

wherein:
λ1 represents the weight coefficient of each cascade sub-network loss;
λ2 represents the weight coefficient of the multi-scale twin cascade network loss;
λ3 represents the weight coefficient of the similarity loss function value;
step 2.9, judging whether the final loss function value Loss_final has converged; if it has converged, the trained multi-scale twin cascade network is obtained, and step 3 is executed; if not, adjusting the network parameters of the multi-scale twin cascade network, taking another batch of sample groups as input, and returning to step 2.1 to continue the iterative training of the multi-scale twin cascade network;
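A sketch of the iterative training of step 2.9; the optimizer, learning rate and convergence threshold are assumptions, and compute_loss_final is a hypothetical helper standing in for steps 2.1 to 2.8 applied to one batch of sample groups.

import torch

def train(model, batches, compute_loss_final, lr=3e-4, eps=1e-4, max_iters=10000):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = None
    for _, batch in zip(range(max_iters), batches):
        loss_final = compute_loss_final(model, batch)   # steps 2.1 to 2.8 on this batch
        opt.zero_grad()
        loss_final.backward()
        opt.step()                                      # adjust the network parameters
        if prev is not None and abs(prev - loss_final.item()) < eps:
            break                                       # Loss_final is treated as converged
        prev = loss_final.item()
    return model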
step 3, performing a precision verification test on the trained multi-scale twin cascade network by using the verification set, and if the test precision meets the requirement, obtaining a multi-scale twin cascade network that passes the verification;
step 4, performing feature recognition on the input pedestrian picture by adopting the verified multi-scale twin cascade network to obtain a pedestrian feature recognition result.
2. The pedestrian re-identification method based on the multi-scale twin cascade network as claimed in claim 1, wherein λ1 is 1, λ2 is 6, and λ3 is 7.
3. The pedestrian re-identification method based on the multi-scale twin cascade network as claimed in claim 1, wherein the step 4 specifically comprises:
step 4.1, the input pedestrian picture is a picture Q; pre-establishing a pedestrian sample library G;
step 4.2, inputting the picture Q into the multi-scale twin cascade network to obtain a global pedestrian feature map features^[Q];
inputting each pedestrian sample picture G_j in the pedestrian sample library G, j = 1, 2, ..., z, where z represents the number of pedestrian sample pictures in the pedestrian sample library G, into the multi-scale twin cascade network to obtain the corresponding global pedestrian feature map features^[G_j];
step 4.3, calculating the similarity between the global pedestrian feature map features^[Q] and each global pedestrian feature map features^[G_j]; sorting the similarities from large to small, and outputting the pedestrian sample picture in the pedestrian sample library G with the highest similarity to the picture Q.
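For illustration of step 4 only, the following sketch ranks the gallery (the pedestrian sample library G) against a query picture Q by cosine similarity; the choice of cosine similarity and a model that returns the global pedestrian feature map are assumptions.

import torch
import torch.nn.functional as F

def rank_gallery(model, query_img, gallery_imgs, top_k=10):
    # query_img: (3, H, W) tensor for picture Q; gallery_imgs: list of (3, H, W) tensors G_j
    model.eval()
    with torch.no_grad():
        feat_q = F.normalize(model(query_img.unsqueeze(0)), dim=1)                  # features^[Q], (1, D)
        feats_g = torch.stack([model(g.unsqueeze(0)).squeeze(0) for g in gallery_imgs])
        feats_g = F.normalize(feats_g, dim=1)                                       # features^[G_j], (z, D)
    sims = feats_g @ feat_q.squeeze(0)             # cosine similarity to every G_j
    order = sims.argsort(descending=True)          # sort from large to small
    return order[:top_k], sims[order[:top_k]]      # indices and similarities of the best matches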
CN202111355189.2A 2021-11-16 2021-11-16 Pedestrian re-identification method based on multi-scale twin cascade network Active CN113963150B (en)

Priority Applications (1)

Application Number: CN202111355189.2A (granted as CN113963150B); Priority Date: 2021-11-16; Filing Date: 2021-11-16; Title: Pedestrian re-identification method based on multi-scale twin cascade network


Publications (2)

Publication Number: CN113963150A (en); Publication Date: 2022-01-21
Publication Number: CN113963150B (en); Publication Date: 2022-04-08

Family ID: 79470827


Country Status (1): CN


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138469B2 (en) * 2019-01-15 2021-10-05 Naver Corporation Training and using a convolutional neural network for person re-identification

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446898A (en) * 2018-09-20 2019-03-08 Jinan University Pedestrian re-identification method based on transfer learning and feature fusion
CN110084215A (en) * 2019-05-05 2019-08-02 Shanghai Maritime University Pedestrian re-identification method and system based on a binarized triplet twin network model
CN110909605A (en) * 2019-10-24 2020-03-24 Northwestern Polytechnical University Cross-modal pedestrian re-identification method based on contrast correlation
CN111259850A (en) * 2020-01-23 2020-06-09 Tongji University Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN111709317A (en) * 2020-05-28 2020-09-25 Xi'an University of Technology Pedestrian re-identification method based on multi-scale features under a saliency model
CN111931637A (en) * 2020-08-07 2020-11-13 South China University of Technology Cross-modal pedestrian re-identification method and system based on a dual-stream convolutional neural network
CN112883850A (en) * 2021-02-03 2021-06-01 Hubei University of Technology Multi-view aerospace remote sensing image matching method based on convolutional neural network
CN112906605A (en) * 2021-03-05 2021-06-04 Nanjing University of Aeronautics and Astronautics Cross-modal pedestrian re-identification method with high accuracy
CN112926531A (en) * 2021-04-01 2021-06-08 Shenzhen Ubtech Technology Co., Ltd. Feature information extraction method, model training method and device, and electronic device
CN112949608A (en) * 2021-04-15 2021-06-11 Nanjing University of Posts and Telecommunications Pedestrian re-identification method based on twin semantic self-encoder and branch fusion

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
A Cross-Modal Multi-granularity Attention Network for RGB-IR Person Re-identification; Jianguo Jiang et al.; Neurocomputing; 2020-04-22; Vol. 406; pp. 59-67 *
Scale-Invariant Siamese Network for Person Re-identification; Yunzhou Zhang et al.; 2020 IEEE International Conference on Image Processing (ICIP); 2020-09-30; pp. 2436-2440 *
Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking; Mang Ye et al.; Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18); 2018-12-31; pp. 1092-1099 *
Research on Vehicle Re-identification Based on Multi-scale Joint Learning; Yan Chenchen; China Master's Theses Full-text Database, Engineering Science and Technology II; 2020-02-15 (No. 02); p. C034-576 *
Research on Pedestrian Re-identification Methods in Intelligent Video Surveillance; Fan Xing; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2021-01-15 (No. 01); p. I136-266 *
Multimodal Pedestrian Detection Algorithm with Feature Pyramid Fusion; Tong Jingran et al.; Computer Engineering and Applications; 2019-12-31; Vol. 55, No. 19; pp. 214-222 *
Research on Cascaded Multi-scale Pedestrian Detection Algorithms; Zhang Shan et al.; Transducer and Microsystem Technologies; 2020-01-31; Vol. 39, No. 1; pp. 42-45, 52 *
Design and Implementation of Pedestrian Re-identification for Security Surveillance; Jiao Long; China Master's Theses Full-text Database, Information Science and Technology; 2021-09-15 (No. 09); p. I138-583 *


Similar Documents

Publication Publication Date Title
CN109472298B (en) Deep bidirectional feature pyramid enhanced network for small-scale target detection
CN111612008B (en) Image segmentation method based on convolution network
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN102385592B (en) Image concept detection method and device
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
CN115937655B (en) Multi-order feature interaction target detection model, construction method, device and application thereof
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN111046732A (en) Pedestrian re-identification method based on multi-granularity semantic analysis and storage medium
CN110826462A (en) Human body behavior identification method of non-local double-current convolutional neural network model
CN112364721A (en) Road surface foreign matter detection method
CN113052184A (en) Target detection method based on two-stage local feature alignment
CN115272242B (en) YOLOv 5-based optical remote sensing image target detection method
CN111860683A (en) Target detection method based on feature fusion
CN112395953A (en) Road surface foreign matter detection system
Asgarian Dehkordi et al. Vehicle type recognition based on dimension estimation and bag of word classification
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN116563410A (en) Electrical equipment electric spark image generation method based on two-stage generation countermeasure network
CN111723852A (en) Robust training method for target detection network
CN110688976A (en) Store comparison method based on image identification
KR20210011707A (en) A CNN-based Scene classifier with attention model for scene recognition in video
CN113450297A (en) Fusion model construction method and system for infrared image and visible light image
CN117516937A (en) Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant