CN113963150B

CN113963150B - Pedestrian re-identification method based on multi-scale twin cascade network

Info

Publication number: CN113963150B
Application number: CN202111355189.2A
Authority: CN
Inventors: 宋春晓; 瞿洪桂; 孙家乐
Original assignee: Beijing Sinonet Science and Technology Co Ltd
Current assignee: Beijing Sinonet Science and Technology Co Ltd
Priority date: 2021-11-16
Filing date: 2021-11-16
Publication date: 2022-04-08
Anticipated expiration: 2041-11-16
Also published as: CN113963150A

Abstract

The invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which comprises the following steps of: constructing a multi-scale twin cascade network; the multi-scale twin cascade network comprises a multi-scale twin cascade color network, a multi-scale twin cascade gray network, a fusion layer and a PCA dimension reduction layer; the multi-scale twin cascaded color network and the multi-scale twin cascaded gray scale network respectively comprise a first cascaded sub-network, a second cascaded sub-network and a third cascaded sub-network. The invention uses the multi-scale cascade network, fuses the cascade sub-feature graphs of the multi-scale and the corresponding superior sub-network and inputs the cascade sub-feature graphs into the secondary sub-network for pedestrian feature extraction, and fuses the pedestrian features of each sub-network, thereby obtaining more macroscopic and accurate pedestrian feature expression. Therefore, the method can obtain more global, high-level and accurate pedestrian feature expression, avoid the interference of chromatic aberration, illumination, scale set scenes and the like, and improve the accuracy of pedestrian re-identification.

Description

Pedestrian re-identification method based on multi-scale twin cascade network

Technical Field

The invention belongs to the technical field of intelligent video image processing, and particularly relates to a pedestrian re-identification method based on a multi-scale twin cascade network.

Background

With the rapid development of 5G and the Internet of things, intelligent life is still natural. The intelligent security is an important component of intelligent life, and as a key technology of the intelligent security, the accuracy of a pedestrian re-identification technology for searching pedestrians under the condition of crossing camera devices is important. The current pedestrian re-identification technology has certain limitations, for example, due to differences among camera devices, pedestrians are susceptible to wearing color differences, illumination, scales, scenes and the like, and therefore accuracy is damaged. Therefore, the above factors of variation bring difficulties to the popularization and application of the pedestrian re-identification technology. Therefore, it is very important to extract the key effective characteristics of pedestrians under different equipment.

The characteristic expression method in the existing pedestrian re-identification method mainly comprises the following steps: 1. the semantic information of the extracted image represents the pedestrian features, and the pedestrian features extracted by the method have strong dependence on the clothing color, so that the collision/clothing color is difficult to distinguish when consistent; 2. the pedestrian features are extracted by using a single-scale input mode, and the detail features of images with different granularities are ignored by the pedestrian features extracted by the method; 3. the pedestrian re-identification method based on the neural network mainly uses a single network to extract pedestrian features, the pedestrian feature information is single, and the dependence on the design of a network structure is large.

Therefore, for the problems existing in the prior art, how to extract more key, effective, accurate and comprehensive pedestrian features in pedestrians in different image capturing devices is very necessary.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which can effectively solve the problems.

The technical scheme adopted by the invention is as follows:

the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which comprises the following steps of:

step 1, constructing a data set; the data set comprises a plurality of sample groups; each sample group comprises two image samples which are respectively a color image sample and a gray image sample; the gray image sample is an image sample obtained after graying the color image sample;

dividing the data set into a training set TrainSet and a verification set;

step 2, constructing a multi-scale twin cascade network; the multi-scale twin cascade Network comprises a multi-scale twin cascade color Network _1, a multi-scale twin cascade gray scale Network _2, a fusion layer and a PCA dimension reduction layer;

the Network structures of the multi-scale twin cascade color Network _1 and the multi-scale twin cascade gray scale Network _2 are completely the same;

the multi-scale twin cascade color Network _1 comprises a first cascade color sub-Network level _1s, a second cascade color sub-Network level _2s and a third cascade color sub-Network level _3 s;

the multi-scale twin cascade gray level Network _2 comprises a first cascade gray level sub-Network level _1g, a second cascade gray level sub-Network level _2g and a third cascade gray level sub-Network level _3 g;

training the multi-scale twin cascade network by adopting the following mode to obtain the trained multi-scale twin cascade network:

step 2.1, taking 3 sample groups as a batch of sample group sets; each batch of 3 sample groups is represented as: sample set u₁Sample group u₂And a sample group u₃(ii) a Wherein the sample group u₁To fix the sample; sample set u₂And a sample group u₁Corresponding to the same pedestrian, sample group u₂Is a sample group u₁A positive sample of (a); sample set u₃And a sample group u₁Corresponding to different pedestrians, sample group u₃Is a sample group u₁A negative sample of (d);

inputting a set of sets of sample sets of a batch into the multi-scale twin cascaded network;

step 2.2, for each sample group, its color picture samples are represented as: color picture samples rgb _ tu, grayscale picture samples denoted gray _ tu;

inputting the color picture sample rgb _ tu into the multi-scale twin cascade color Network _1 to obtain a first cascade color pedestrian classification result class _1s output by the first cascade color sub-Network level _1s, a second cascade color pedestrian classification result class _2s output by the second cascade color sub-Network level _2s, a third cascade color pedestrian classification result class _3s output by the third cascade color sub-Network level _3s, and a color pedestrian fusion feature map rgb _ features output by the multi-scale twin cascade color Network _ 1;

inputting the gray picture sample gray _ tu into a multi-scale twin cascade gray Network _2 to obtain a first cascade gray pedestrian classification result class _1g output by a first cascade gray sub-Network level _1g, a second cascade gray pedestrian classification result class _2g output by a second cascade gray sub-Network level _2g, a third cascade gray pedestrian classification result class _3g output by a third cascade gray sub-Network level _3g and a gray pedestrian fusion feature map gray _ features output by a multi-scale twin cascade gray Network _ 2;

wherein, the color picture sample rgb _ tu is input into the multi-scale twin cascade color Network _1, and the specific process is as follows:

step 2.2.1, the color picture sample rgb _ tu is reduced to obtain a Scale _ a picture sample; further reducing the Scale _ a picture sample to obtain a Scale _ b picture sample; further reducing the Scale _ b picture sample to obtain a Scale _ c picture sample;

step 2.2.2, inputting the Scale _ a picture sample into a first cascade color sub-network level _1s, wherein the processing process of the first cascade color sub-network level _1s is as follows:

A1) carrying out convolution, batch normalization and activation on the Scale _ a picture samples to obtain a pedestrian feature map rgb _ feature _ a;

A2) down-sampling the pedestrian feature map rgb _ feature _ a to obtain a pedestrian feature map rgb _ feature1 with the same size as the Scale _ b picture sample;

A3) down-sampling the pedestrian feature map rgb _ feature1 to obtain a pedestrian feature map rgb _ feature2 with the same size as the Scale _ c picture sample;

A4) carrying out convolution and global average pooling operation on the pedestrian feature map rgb _ feature2, and inputting the operation into a first full connection layer to obtain a first cascade pedestrian feature map rgb _ stag1_ feature;

A5) inputting the first cascade pedestrian feature map rgb _ stag1_ feature into a second full-connection layer to obtain a first cascade color pedestrian classification result class _1 s;

step 2.2.3, the processing procedure of the second cascade color sub-network level _2s is as follows:

B1) carrying out convolution, batch normalization and activation on the Scale _ b picture samples to obtain a pedestrian characteristic image rgb _ feature _ b;

B2) performing pedestrian feature fusion on the pedestrian feature map rgb _ feature _ b and the pedestrian feature map rgb _ feature1, and then performing down-sampling to obtain a pedestrian feature map rgb _ feature3 with the same size as the Scale _ c picture sample;

B3) carrying out convolution and global average pooling operation on the pedestrian feature map rgb _ feature3, and inputting the operation into the first full connection layer to obtain a second cascade pedestrian feature map rgb _ stag2_ feature;

B4) inputting the second cascade pedestrian feature map rgb _ stag2_ feature into a second full-connection layer to obtain a second cascade color pedestrian classification result class _2 s;

step 2.2.4, the processing procedure of the third cascade color sub-network level _3s is as follows:

C1) carrying out convolution, batch normalization and activation on the Scale _ c picture samples to obtain a pedestrian characteristic image rgb _ feature _ c;

C2) pedestrian feature fusion is carried out on the pedestrian feature map rgb _ feature _ c, the pedestrian feature map rgb _ feature2 and the pedestrian feature map rgb _ feature3, convolution and global average pooling operation are carried out, then the pedestrian feature fusion is input into the first full connection layer, and a third-level joint pedestrian feature map rgb _ stag3_ feature is obtained;

C3) inputting the third-level joint pedestrian feature map rgb _ stag3_ feature into a second full-connection layer to obtain a third-level joint color pedestrian classification result class _3 s;

step 2.2.5, carrying out pedestrian feature fusion on the first cascade pedestrian feature map rgb _ stag1_ feature, the second cascade pedestrian feature map rgb _ stag2_ feature and the third cascade pedestrian feature map rgb _ stag3_ feature to obtain a colorful pedestrian fusion feature map rgb _ features;

step 2.3, for each sample group, carrying out pedestrian feature fusion on the color pedestrian fusion feature map rgb _ features and the gray level pedestrian fusion feature map gray _ features through a fusion layer, and then carrying out dimension reduction treatment through a PCA dimension reduction layer to obtain a final global pedestrian feature map features; the global pedestrian feature map features pass through the full connection layer to obtain a global pedestrian classification result classifys;

step 2.4, this batch has 3 sample groups in total, for any u < th > sample group_iA set of samples, i ═ 1,2,3, gives the u th_iGlobal pedestrian feature map corresponding to each sample group

Global pedestrian classification results

First cascade color pedestrian classification result

Second cascade color pedestrian classification result

Third-level color pedestrian classification result

First cascade gray pedestrian classification result

Second cascade gray pedestrian classification result

And third-level gray pedestrian classification result

Step 2.5, calculating loss values of all levels of sub-networks:

step 2.5.1, classifying the first cascade color pedestrian classification result

And u_iComparing the sample labels of the sample groups to obtain a first cascade color pedestrian classification loss value

Classifying the second cascade color pedestrian

And u_iComparing the sample labels of the sample groups to obtain a second cascade color pedestrian classification loss value

Classifying the third cascade color pedestrian

And u_iComparing the sample labels of the sample groups to obtain a third cascade color pedestrian classification loss value

Classifying the first cascade gray pedestrian

And u_iComparing the sample labels of the individual sample groups to obtain a first cascade gray pedestrian classification loss value

Classifying the second cascade gray level pedestrian

And u_iComparing the sample labels of the sample groups to obtain a second cascade gray level pedestrian classification loss value

Classifying the third cascade gray pedestrian

And u_iComparing the sample labels of the sample groups to obtain a third cascade gray pedestrian classification loss value

Step 2.5.2, respectively calculating and obtaining a Loss value Loss _1s of the first cascade color sub-network level _1s, a Loss value Loss _2s of the second cascade color sub-network level _2s, a Loss value Loss _3s of the third cascade color sub-network level _3s, a Loss value Loss _1g of the first cascade gray sub-network level _1g, a Loss value Loss _2g of the second cascade gray sub-network level _2g, and a Loss value Loss _3g of the third cascade gray sub-network level _3g by adopting the following formula:

step 2.6, calculating a Loss value Loss _0 of the multi-scale twin cascade network:

step 2.6.1, classifying the global pedestrian results

And u_iComparing the sample labels of the individual sample groups to obtain a global pedestrian classification loss value

Step 2.6.2, calculating to obtain a Loss value Loss _0 of the multi-scale twin cascade network by adopting the following formula:

step 2.7, calculating a similarity Loss function value Loss _ sim between the sample groups:

step 2.7.1, calculate sample set u₁Global pedestrian feature map of

And a sample group u₂Global pedestrian feature map of

The sample distance between, expressed as: d (u)₁,u₂)；

Computing a set of samples u₁Global pedestrian feature map of

And a sample group u₃Global pedestrian feature map of

The sample distance between, expressed as: d (u)₁,u₃)；

Step 2.7.2, calculating a preliminary Loss function value Loss _ d by adopting the following formula:

Loss_d＝d(u₁,u₂)-d(u₁,u₃)+α

wherein: alpha is a loss function coefficient, and the value range is as follows: alpha is alpha<d(u₁,u₃)-d(u₁,u₂)

Step 2.7.3, obtaining a similarity Loss function value Loss _ sim by the following method:

if the Loss _ d is larger than 0, then Loss _ sim is Loss _ d;

if the Loss _ d is less than or equal to 0, then Loss _ sim is 0;

and 2.8, obtaining a final Loss function value Loss _ final by adopting the following formula:

Loss_final＝λ₁Loss_1s+λ₁Loss_2s+λ₁Loss_3s+λ₁Loss_1g+λ₁Loss_2g+λ₁Loss_3g+λ₂Loss_0+λ₃Loss_sim

wherein:

λ₁weight coefficients representing each cascaded subnetwork;

λ₂a weight coefficient representing a loss of the multi-scale twin cascaded network;

λ₃a similarity loss function value weight coefficient;

step 2.9, judging whether the final Loss function value Loss _ final is converged; if the convergence is achieved, obtaining a trained multi-scale twin cascade network, and executing the step 3; if not, adjusting the network parameters of the multi-scale twin cascade network, taking another batch of sample group as input, returning to the step 2.1, and performing iterative training on the multi-scale twin cascade network;

step 3, performing precision verification test on the trained multi-scale twin cascade network by using a verification set, and if the test precision meets the requirement, obtaining a multi-scale twin cascade network which passes the verification;

and 4, performing feature recognition on the input pedestrian picture by adopting a multi-scale twin cascade network to obtain a pedestrian feature recognition result.

Preferably, λ₁Is 1, λ₂Is 6, λ₃Is 7.

Preferably, step 4 specifically comprises:

step 4.1, the input pedestrian picture is a picture Q; pre-establishing a pedestrian sample library G;

step 4.2, inputting the picture Q into a multi-scale twin cascade network to obtain a global pedestrian feature map features^[Q]；

For each pedestrian sample picture G in the pedestrian sample library G_jJ ═ 1, 2.. once, z, z represents the number of pedestrian sample pictures in the pedestrian sample library G, and the pictures are respectively input into the multi-scale twin cascade network to obtain the corresponding global pedestrian feature map

Step 4.3, calculating global pedestrian feature maps features^[Q]And global pedestrian feature map

The similarity of (2); and (4) sorting the similarity from large to small, and outputting the pedestrian sample pictures in the pedestrian sample library G with the highest similarity with the picture Q.

The pedestrian re-identification method based on the multi-scale twin cascade network provided by the invention has the following advantages:

the invention uses the multi-scale cascade network, fuses the cascade sub-feature graphs of the multi-scale and the corresponding superior sub-network and inputs the cascade sub-feature graphs into the secondary sub-network for pedestrian feature extraction, and fuses the pedestrian features of each sub-network, thereby obtaining more macroscopic and accurate pedestrian feature expression. Therefore, the method can obtain more global, high-level and accurate pedestrian feature expression, avoid the interference of chromatic aberration, illumination, scale set scenes and the like, and improve the accuracy of pedestrian re-identification.

Drawings

FIG. 1 is a schematic overall flow chart of a pedestrian re-identification method based on a multi-scale twin cascade network provided by the invention;

FIG. 2 is an overall schematic diagram of a multi-scale twin cascaded network provided by the present invention;

FIG. 3 is a diagram of a first cascaded sub-network level _1 according to the present invention;

FIG. 4 is a diagram of a level _2 of a second cascaded sub-network according to the present invention;

fig. 5 is a structural diagram of a level _3 of the third hierarchical network provided by the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Because interference conditions such as chromatic aberration, illumination, scale set scenes and the like in the prior art easily cause the reduction of the accuracy of pedestrian re-identification, the invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which has the following characteristics: 1) constructing a multi-scale twin cascade network with a color and gray level dual input structure, performing feature fusion on the multi-scale color cascade features and the multi-scale gray level cascade features, and then adopting a feature dimension reduction strategy, thereby obtaining more global, high-level and accurate pedestrian feature expression; 2) and (3) using a multi-scale cascade network, fusing the multi-scale cascade sub-feature graphs corresponding to the superior sub-networks and inputting the multi-scale cascade sub-feature graphs into the secondary sub-networks for pedestrian feature extraction, and fusing the pedestrian features of the sub-networks, thereby obtaining more macroscopic and accurate pedestrian feature expression. Therefore, the method can obtain more global, high-level and accurate pedestrian feature expression, avoid the interference of chromatic aberration, illumination, scale set scenes and the like, and improve the accuracy of pedestrian re-identification.

The invention provides a pedestrian re-identification method based on a multi-scale twin cascade network, which comprises the following steps with reference to a figure 1:

dividing the data set into a training set TrainSet and a verification set;

the training set TrainSet is used for training the multi-scale twin cascade network; the verification set is used for verifying the accuracy of the multi-scale twin cascaded network.

in the invention, each color picture sample in the data set corresponds to a gray picture sample, the color picture sample is input into a multi-scale twin cascade color Network _1, and the gray picture sample is input into a multi-scale twin cascade gray Network _ 2. By setting the multi-scale twin cascade gray level Network 2 with the same structure as the multi-scale twin cascade color Network 1, the influences of color difference, illumination, scenes, postures and the like caused by camera crossing can be supplemented, and the accuracy of pedestrian feature extraction is improved.

in the present invention, the requirements of three cascaded subnetworks are: the backbone network is different, and the structure can be simpler in one level than in one level. The backbone network can be a simple convolution network, a residual error network or a combination of various networks, but the output scale of the sub-feature graph of the upper-level network is required to be consistent with the input scale of the lower-level network, and the scales only refer to height and width.

referring to fig. 2, the color picture sample rgb _ tu is input to the multi-scale twin cascade color Network _1 to obtain a first cascade color pedestrian classification result class _1s output by the first cascade color sub-Network level _1s, a second cascade color pedestrian classification result class _2s output by the second cascade color sub-Network level _2s, a third cascade color pedestrian classification result class _3s output by the third cascade color sub-Network level _3s, and a color pedestrian fusion feature map rgb _ features output by the multi-scale twin cascade color Network _ 1;

because the processing procedure of inputting the color picture sample rgb _ tu into the multi-scale twin cascaded color Network _1 is completely the same as the processing procedure of inputting the gray picture sample gray _ tu into the multi-scale twin cascaded gray Network _2, the invention only takes the processing procedure of inputting the color picture sample rgb _ tu into the multi-scale twin cascaded color Network _1 as an example, and the detailed description is carried out through the steps 2.2.1 to 2.2.5, and the processing procedure of inputting the gray picture sample gray _ tu into the multi-scale twin cascaded gray Network _2 is not repeated.

therefore, the picture sizes of the Scale _ a picture sample, the Scale _ b picture sample, and the Scale _ c picture sample are not reduced.

As a specific implementation manner, the Scale _ a picture sample is reduced by two times to obtain a Scale _ b picture sample; and reducing the Scale of the Scale _ b picture sample by two times to obtain a Scale _ c picture sample. For example, Scale _ a picture sample size is 128 × 384, Scale _ b picture sample size is 64 × 192, Scale _ c picture sample size is 32 × 96, and 32 indicates the width of the picture; 96 refers to the height of the picture.

Step 2.2.2, the Scale _ a picture sample is input into the first cascade color sub-network level _1s, and the processing procedure of the first cascade color sub-network level _1s refers to fig. 3 as follows:

for example, Scale _ a picture sample input at Scale 128 × 384, convolution layer, batch normalization layer, ReLU activation layer, two down-sampling units down sampling _ unit, convolution, average pooling, full connection layer fc1, and full connection layer fc 2.

The downsampling unit in this embodiment is implemented by using convolution with a step size of 2, see a dashed-line frame region in fig. 2, the downsampling unit1 and the downsampling unit2 implement that after downsampling of the feature map, a pedestrian feature map rgb 1 and a pedestrian feature map rgb feature2 are respectively obtained, the output of fc1 of the cascade network level _1 full network obtains a first cascade pedestrian feature map rgb _ stag1_ feature, and the fc2 of the cascade network level _1 full network outputs a first cascade color pedestrian classification result class _1 s.

Step 2.2.3, the processing procedure of the second cascaded color sub-network level _2s with reference to fig. 4 is:

in this embodiment, a convolutional network is constructed as an example, and the following are sequentially performed according to the direction of data flow: scale _ b picture sample input at Scale 96 x 192, convolution layer, batch normalization layer, ReLU activation layer, one down-sampling unit down sampling _ unit, convolution, average pooling, full connection layer fc1, and full connection layer fc 2.

Different from level _1 are: the network is a dual-input structure and comprises the following parts: scale _ b picture sample input and pedestrian feature map rgb _ feature1 at Scale 64 x 192, and only one down-sampling unit is needed. The number of the first layer of convolution kernels of the network needs to be consistent with the number of channels of the pedestrian feature map rgb _ feature1, and then the pedestrian feature map rgb _ feature1 and the 192Scale _ b picture sample after the first layer of convolution operation are added, and then the added data are input into a subsequent network structure, wherein a downsamping _ unit is used for downsampling the feature map to obtain the pedestrian feature map rgb _ feature3 needed by a level _3 network, and the fc1 of a cascade network level _2 full network outputs a second cascade pedestrian feature map rgb _ stag2_ feature and the cascade network level _2 full network fc2 outputs a second cascade color pedestrian classification result class _2 s.

Step 2.2.4, the processing procedure of the third cascaded color sub-network level _3s with reference to fig. 5 is:

in this embodiment, a convolutional network is constructed as an example, and the following are sequentially performed according to the direction of data flow: scale _ c picture sample input at Scale 32 x 96, convolution layer, batch normalization layer, ReLU activation layer, convolution, average pooling, full connection layer fc1, and full connection layer fc 2.

Different from the first two networks are: the network is a three-input structure, which is respectively: scale _ c picture sample input with Scale 32 x 96, pedestrian feature map rgb feature2 and pedestrian feature map rgb feature 3. The number of the first layer of convolution of the network needs to be consistent with the number of channels of the pedestrian feature map rgb _ feature2 and the pedestrian feature map rgb _ feature3, then the first layer of convolution operation 32 × 96 original image, the pedestrian feature map rgb _ feature2 and the pedestrian feature map rgb _ feature3 are subjected to Add operation, and then the added operation is input into a subsequent network structure, finally the fc1 of the cascade network level _3 full network outputs a third-level joint pedestrian feature map rgb _ stag3_ feature and the fc2 of the cascade network level _3 full network outputs a third-level joint color pedestrian classification result class _3 s.

in the step, the color pedestrian fusion feature map rgb _ features and the gray level pedestrian fusion feature map gray _ features are subjected to channel fusion, so that multi-scale features of color and gray level images are fused to obtain richer pedestrian information, then PCA is connected to sequentially reduce mean centralization, calculate covariance and decompose feature values of the fusion features, and finally a final effective feature dimension is selected according to a feature value decomposition result to obtain a final global pedestrian feature map features.

For example, the color pedestrian fusion feature map rgb _ features and the gray pedestrian fusion feature map gray _ features of the twin cascade network are subjected to channel fusion to obtain D ═ (x ═ x)⁽¹⁾,x⁽²⁾,...x^(m)) M is 1024 dimensions in this embodiment, where x is a column vector of length batch; when PCA feature dimension reduction is carried out, firstly, the centralization operation of mean value reduction is carried out on D, see formula

The obtained feature vector is represented by X, that is, X is (X1)^(1)′,x1^(2)′,...x1^(m)′). Then, the covariance matrix V-XX is calculated^TFinally, matrix decomposition V ═ U ∑ U is carried out on V^TThe purpose of matrix decomposition is to decompose the fused matrix V into eigenvalues and eigenvectors, the magnitude of which is used to determine the quality of the eigenvectors. The m eigenvalues after V decomposition are ∑ ═ (λ)₁,λ₂,...λ_m) Corresponding feature vector U ═ w₁,w₂...w_mFinally, selecting eigenvectors { w) corresponding to the first k eigenvalues according to the set eigendimension k₁,w₂...w_kThe final feature vector is composed of: global pedestrian feature maps features.

Step 2.4, this batchThere were 3 sample groups in total, for any u < th > sample_iA set of samples, i ═ 1,2,3, gives the u th_iGlobal pedestrian feature map corresponding to each sample group

Global pedestrian classification results

First cascade color pedestrian classification result

Second cascade color pedestrian classification result

Third-level color pedestrian classification result

First cascade gray pedestrian classification result

Second cascade gray pedestrian classification result

And third-level gray pedestrian classification result

Step 2.5, calculating loss values of all levels of sub-networks:

The second cascade color pedestrianClassification result

Classifying the third cascade color pedestrian

Classifying the first cascade gray pedestrian

Classifying the second cascade gray level pedestrian

Classifying the third cascade gray pedestrian

step 2.6.1, classifying the global pedestrian results

And u_iSample labeling of individual sample groupsComparing to obtain a global pedestrian classification loss value

step 2.7.1, calculate sample set u₁Global pedestrian feature map of

And a sample group u₂Global pedestrian feature map of

The sample distance between, expressed as: d (u)₁,u₂)；

Computing a set of samples u₁Global pedestrian feature map of

And a sample group u₃Global pedestrian feature map of

The sample distance between, expressed as: d (u)₁,u₃)；

Loss_d＝d(u₁,u₂)-d(u₁,u₃)+α

if the Loss _ d is larger than 0, then Loss _ sim is Loss _ d;

if the Loss _ d is less than or equal to 0, then Loss _ sim is 0;

wherein:

λ₁weight coefficients representing each cascaded subnetwork;

λ₃a similarity loss function value weight coefficient;

as a specific implementation, λ₁Is 1, λ₂Is 6, λ₃Is 7.

The step 4 specifically comprises the following steps:

For pedestrian sample library GEach pedestrian sample picture G in (1)_jJ ═ 1, 2.. once, z, z represents the number of pedestrian sample pictures in the pedestrian sample library G, and the pictures are respectively input into the multi-scale twin cascade network to obtain the corresponding global pedestrian feature map

As a specific implementation manner, 1 pedestrian sample picture in the pedestrian sample library G with the highest similarity to the picture Q may be output. The parameter K of the number of best matches of the images may also be preset, for example, if K is set to 10, sorting the pedestrian sample images in the pedestrian sample library G from large to small according to the similarity, and then selecting the 1 st to 10 th pedestrian sample images and outputting the pedestrian sample images according to the sorting order.

The technical essential that this patent relates to: 1. establishing a multi-scale cascade network, fusing and inputting the multi-scale and color or gray sub-features of a corresponding superior sub-network into a secondary sub-network for pedestrian feature extraction, and fusing the pedestrian features of the sub-networks; 2. and a twin network with double input of color and gray levels is used, the pedestrian features of color multi-scale and gray level multi-scale are subjected to feature fusion, and then a pedestrian feature dimension reduction strategy is adopted, so that stronger, comprehensive and effective pedestrian feature expression is obtained.

Compared with the prior art, the invention has the beneficial effects that:

compared with single input, the gray level input mode can balance and increase pedestrian characteristic information influenced by chromatic aberration, illumination, scenes and the like caused by camera setting, thereby extracting more comprehensive pedestrian characteristic information.

In order to reduce the interference of redundant information, the final pedestrian characteristics are subjected to dimension reduction operation, so that the obtained pedestrian characteristics represent more comprehensive and are not too complicated.

The method is different from the prior art that only a single-scale mode is adopted for pedestrian feature extraction, but a multi-scale mode is adopted for carrying out pedestrian feature extraction on images of each scale and carrying out channel fusion, so that the pedestrian features with both spatial information and strong semantic information are obtained.

Therefore, the method is different from the method for extracting the pedestrian features by adopting a single network structure in the prior art, when the network is constructed, a plurality of sub-networks are adopted to construct a multi-scale twin cascade network, and each level of network is utilized to extract a plurality of sub-features, so that the method for extracting the pedestrian features at each level is ensured to be mutually independent; the output of each level is combined with original images with different scales to serve as the input of the next level, and finally, the pedestrian features of each cascade are fused to realize mutual supplement of different cascade networks, so that more obvious pedestrian features are mined, and the pedestrian feature expression force is enhanced.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims

1. A pedestrian re-identification method based on a multi-scale twin cascade network is characterized by comprising the following steps:

dividing the data set into a training set TrainSet and a verification set;

Global pedestrian classification results

First cascade color pedestrian classification result

Second cascade color pedestrian classification result

Third-level color pedestrian classification result

First cascade gray pedestrian classification result

Second cascade gray pedestrian classification result

And third-level gray pedestrian classification result

Step 2.5, calculating loss values of all levels of sub-networks:

Classifying the second cascade color pedestrian

Classifying the third cascade color pedestrian

Classifying the first cascade gray pedestrian

Classifying the second cascade gray level pedestrian

Classifying the third cascade gray pedestrian

step 2.6.1, classifying the global pedestrian results

step 2.7.1, calculate sample set u₁Global pedestrian feature map of

And a sample group u₂Global pedestrian feature map of

The sample distance between, expressed as: d (u)₁,u₂)；

Computing a set of samples u₁Global pedestrian feature map of

And a sample group u₃Global pedestrian feature map of

The sample distance between, expressed as: d (u)₁,u₃)；

Loss_d＝d(u₁,u₂)-d(u₁,u₃)+α

if the Loss _ d is larger than 0, then Loss _ sim is Loss _ d;

if the Loss _ d is less than or equal to 0, then Loss _ sim is 0;

wherein:

λ₁weight coefficients representing each cascaded subnetwork;

λ₃a similarity loss function value weight coefficient;

2. The pedestrian re-identification method based on the multi-scale twin cascade network as claimed in claim 1, wherein λ is₁Is 1, λ₂Is 6, λ₃Is 7.

3. The pedestrian re-identification method based on the multi-scale twin cascade network as claimed in claim 1, wherein the step 4 specifically comprises: