CN112270228A - Pedestrian re-identification method based on DCCA fusion characteristics - Google Patents
Pedestrian re-identification method based on DCCA fusion characteristics
- Publication number: CN112270228A (application CN202011109621.5A)
- Authority: CN (China)
- Prior art keywords: neural network, convolutional neural, pedestrian, deep convolutional, network model
- Prior art date: 2020-10-16
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Abstract
The invention discloses a pedestrian re-identification method based on DCCA fusion features, implemented according to the following steps: preprocess the pedestrian re-identification data set and resize the images to suitable dimensions; extract depth features from the processed pedestrian data set with a vgg16 deep convolutional neural network and an omni-scale deep convolutional neural network, respectively; perform canonical correlation analysis on the extracted depth features, solve the respective projection matrices, and fuse the projected features according to a feature fusion strategy; and complete the whole pedestrian re-identification process with the fused features. The method combines the advantages of the vgg16 and omni-scale deep networks, improves feature robustness, effectively eliminates redundant information while fusing the features, strengthens feature discrimination, and raises the accuracy of pedestrian re-identification.
Description
Technical Field
The invention belongs to the technical field of computer vision and relates to a pedestrian re-identification method based on DCCA fusion features.
Background
Pedestrian re-identification has become a very popular research topic in computer vision in recent years. It can be regarded as a sub-problem of image retrieval: the task of retrieving a specific pedestrian across images or videos using computer vision techniques. That is, given an image of a pedestrian of interest, the goal is to find that pedestrian again across non-overlapping surveillance cameras. The task is applied in fields such as intelligent surveillance and criminal investigation.
Traditional methods mainly approach pedestrian re-identification from two directions: 1. designing hand-crafted features, extracting robust features to characterize pedestrians; 2. learning a better distance metric, whose purpose is to measure the similarity between two images. In application, on top of the feature representation, a discriminative distance metric is learned so that the similarity between pedestrian images can be judged from the similarity between their features, making the distance between images of the same pedestrian as small as possible and the distance between images of different pedestrians as large as possible. With the advent of deep learning, algorithms represented by the convolutional neural network (CNN) have excelled in computer vision, notably in the well-known ImageNet image classification challenge, demonstrating that deep neural networks achieve strong recognition performance. A convolutional neural network can automatically attend to important regions of the input image and extract features at different network layers, which are more expressive than traditional hand-crafted features. Existing deep-learning-based methods generally complete the pedestrian re-identification process end to end, performing automatic feature extraction and feature similarity matching in one pass; however, because of this end-to-end design, feature redundancy and feature dimensionality may be high.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method based on DCCA fusion features, which combines the advantages of the vgg16 and omni-scale networks, improves feature robustness, and eliminates redundant information to a certain extent while fusing the features, thereby improving feature discrimination and the accuracy of pedestrian re-identification.
The technical scheme adopted by the invention is a pedestrian re-identification method based on DCCA fusion features, implemented according to the following steps:
step 1, preprocessing the pedestrian re-identification data set and resizing the images to suitable dimensions;
step 2, extracting depth features from the processed pedestrian data set with a vgg16 deep convolutional neural network and an omni-scale deep convolutional neural network, respectively;
step 3, performing canonical correlation analysis on the extracted depth features, solving the respective projection matrices, and fusing the projected features according to a feature fusion strategy;
step 4, completing the whole pedestrian re-identification process with the fused features.
The present invention is also characterized in that,
the pedestrian re-identification data set is a marker 1501 data set, the data set is divided into a training set train and a test set, and the test set comprises a query set probe and a candidate set galery.
The pedestrian re-recognition images in the training set train and the test set are each adjusted to a size of 224 × 224 pixels and a size of 256 × 128 pixels, respectively.
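For illustration, the two input pipelines can be sketched as follows with torchvision (the library choice and the example file name are assumptions; the patent does not prescribe an implementation):

```python
from PIL import Image
from torchvision import transforms

# Two input pipelines, one per backbone: 224 x 224 for vgg16 and
# 256 x 128 (height x width) for the omni-scale network.
resize_vgg = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
resize_osnet = transforms.Compose([
    transforms.Resize((256, 128)),
    transforms.ToTensor(),
])

img = Image.open("0001_c1s1_000151_00.jpg")  # example Market-1501 file name
x_vgg = resize_vgg(img)    # tensor of shape (3, 224, 224)
x_os = resize_osnet(img)   # tensor of shape (3, 256, 128)
```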
The step 2 specifically comprises the following steps:
step 2.1, constructing the vgg16 deep convolutional neural network model, which comprises thirteen sequentially connected convolutional layers, the output of the last convolutional layer being followed by three fully connected layers;
step 2.2, constructing the omni-scale deep convolutional neural network model, which comprises five sequentially connected convolutional layers, the output of the last convolutional layer being followed by two transition layers and the output of the last transition layer by a fully connected layer;
step 2.3, transferring pre-trained weight parameters, obtained by training on the ImageNet data set, to the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2, respectively;
step 2.4, inputting the training set with image size 224 × 224 into the vgg16 deep convolutional neural network model and the training set with image size 256 × 128 into the omni-scale deep convolutional neural network model processed in step 2.3 and training the two models, initializing part of the weight parameters of the vgg16 model constructed in step 2.1 and the omni-scale model constructed in step 2.2 with the pre-trained weight parameters during training, and then extracting the finally output depth features of the vgg16 and omni-scale deep convolutional neural network models, denoted H1 and H2 respectively, where H1 ∈ R^(o×m), H2 ∈ R^(o×m), o is the dimension of the two features, m is the number of samples in the data set, and R is the set of real numbers.
In step 2.4, initializing part of the weight parameters of the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2 with the pre-trained weight parameters specifically means: applying the pre-trained weight parameters layer by layer to initialize the first thirteen layers of the vgg16 deep convolutional neural network model and then assigning random values to the weight parameters of the last three fully connected layers; and applying the pre-trained weight parameters layer by layer to initialize the first seven layers of the omni-scale deep convolutional neural network model and then assigning random values to the weight parameters of the last fully connected layer.
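A minimal sketch of this partial initialization, assuming the torchvision vgg16 with ImageNet weights stands in for the pre-trained model (the standard deviation of the random initialization is an assumption):

```python
import torch.nn as nn
from torchvision.models import vgg16

# ImageNet weights initialize the thirteen convolutional layers; the three
# fully connected layers are given fresh random values, and the output
# layer is resized to the 751 training identities of Market-1501.
model = vgg16(pretrained=True)            # conv layers keep pre-trained weights
for layer in model.classifier:            # the three fully connected layers
    if isinstance(layer, nn.Linear):
        nn.init.normal_(layer.weight, std=0.01)   # random re-initialization
        nn.init.zeros_(layer.bias)
model.classifier[6] = nn.Linear(4096, 751)        # classification head: 751 classes
```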
The step 3 specifically comprises the following steps:
step 3.1, standardizing H1 and H2 to obtain standard data with a mean of 0 and a variance of 1;
step 3.2, computing the covariance matrices S11 and S22 of the standardized H1 and H2 and their cross-covariance matrix S12;
step 3.3, constructing the matrix T = S11^(-1/2) S12 S22^(-1/2);
step 3.4, performing singular value decomposition on the matrix T to obtain the maximum singular value ρ and the left and right singular vectors u, v corresponding to the maximum singular value;
step 3.5, calculating the projection matrices A1 = u^T S11^(-1/2) and A2 = v^T S22^(-1/2) of H1 and H2, the representations of the two depth features in the correlated subspace then being H'1 = A1H1 and H'2 = A2H2 (retaining the top r singular-vector pairs of T yields r-dimensional projected features).
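A NumPy sketch of steps 3.1 to 3.5 under the stated convention H1, H2 ∈ R^(o×m) (the ridge term eps and the retained dimension r are assumptions, added for numerical stability and illustration):

```python
import numpy as np

def cca_projections(H1, H2, r=64, eps=1e-8):
    """Steps 3.1-3.5: standardize, build T, SVD, project.
    H1, H2: (o, m) depth-feature matrices, samples in columns."""
    m = H1.shape[1]
    # 3.1: standardize each feature dimension to mean 0, variance 1
    H1 = (H1 - H1.mean(1, keepdims=True)) / (H1.std(1, keepdims=True) + eps)
    H2 = (H2 - H2.mean(1, keepdims=True)) / (H2.std(1, keepdims=True) + eps)
    # 3.2: covariance and cross-covariance matrices (eps ridge for stability)
    S11 = H1 @ H1.T / m + eps * np.eye(H1.shape[0])
    S22 = H2 @ H2.T / m + eps * np.eye(H2.shape[0])
    S12 = H1 @ H2.T / m
    # 3.3: T = S11^(-1/2) S12 S22^(-1/2), via symmetric eigendecomposition
    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    S11_is, S22_is = inv_sqrt(S11), inv_sqrt(S22)
    T = S11_is @ S12 @ S22_is
    # 3.4: SVD; the largest singular value is the maximum correlation rho
    U, s, Vt = np.linalg.svd(T)
    # 3.5: projection matrices from the top-r singular-vector pairs
    A1 = U[:, :r].T @ S11_is            # (r, o)
    A2 = Vt[:r] @ S22_is                # (r, o)
    return A1 @ H1, A2 @ H2, s[0]       # projected features and rho
```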
The step 4 specifically comprises the following steps:
step 4.1, inputting the images of the query set (probe) and the candidate set (gallery) into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model trained as in step 2.4, extracting their depth features, and then calculating the fusion features corresponding to the probe and gallery sets according to step 3, their dimension being r;
step 4.2, taking the fusion features obtained from the training set (train) and the corresponding training sample labels as input to the XQDA algorithm, whose output is a subspace mapping matrix W and a measurement matrix M = Σ'I^(-1) - Σ'E^(-1), where Σ'I is the intra-class covariance matrix and Σ'E is the inter-class covariance matrix;
step 4.3, for the pedestrian image represented by each fusion feature of the query set (probe), performing feature similarity measurement against all fusion features of the candidate set (gallery) to obtain a similarity ranking result, the ranking being determined by similarity: the higher the similarity, the earlier the rank, and the recognition is completed.
In step 4.3, the similarity measurement adopts the Mahalanobis distance: the fusion features corresponding to the query set (probe) and the candidate set (gallery) are respectively mapped with the subspace mapping matrix W and scored with the measurement matrix M, yielding the Mahalanobis distance between the corresponding probe and gallery features in the subspace; the smaller the distance, the higher the similarity.
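Assuming W and M have already been learned by XQDA, the ranking of step 4.3 reduces to a bilinear Mahalanobis-style score in the subspace; a sketch:

```python
import numpy as np

def rank_gallery(F_probe, F_gallery, W, M):
    """F_probe: (r, p) and F_gallery: (r, g) fusion features;
    W: (r, d) subspace mapping; M: (d, d) metric kernel from XQDA.
    Returns gallery indices sorted by ascending distance, plus distances."""
    P = W.T @ F_probe       # (d, p) probe features in the learned subspace
    G = W.T @ F_gallery     # (d, g) gallery features in the learned subspace
    dists = np.empty((P.shape[1], G.shape[1]))
    for i in range(P.shape[1]):
        diff = G - P[:, [i]]                                 # (d, g)
        dists[i] = np.einsum('dg,de,eg->g', diff, M, diff)   # (x-y)^T M (x-y)
    return np.argsort(dists, axis=1), dists   # smaller distance = more similar
```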
The invention has the beneficial effects that:
the method is based on a fusion characteristic pedestrian re-identification method and CCA (typical correlation analysis), combines the advantages of vgg16 and the omni-scale deep network, and improves the characteristic robustness. Meanwhile, maximum correlation analysis is carried out on the two depths by using a DCCA (depth canonical correlation analysis) algorithm, and finally a feature fusion strategy is selected to fuse the two features. The method analyzes the maximum correlation of the features in different spaces in the public subspace, takes the maximum correlation feature between the two features as the discrimination information, effectively eliminates redundant information while fusing the features, improves the feature discrimination capability, and can improve the accuracy of pedestrian re-identification to a certain extent.
Drawings
FIG. 1 is a flowchart of the pedestrian re-identification method based on DCCA fusion features of the present invention;
FIG. 2 is a process diagram of the vgg16 network extracting pedestrian features in the pedestrian re-identification method based on DCCA fusion features of the present invention;
FIG. 3 is a process diagram of the omni-scale network extracting pedestrian features in the pedestrian re-identification method based on DCCA fusion features of the present invention;
FIG. 4 is a schematic structural diagram of a bottleneck block in the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the Lite 3 × 3 unit in the omni-scale deep convolutional neural network model in the embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a pedestrian re-identification method based on DCCA fusion features; its flow is shown in FIG. 1, and it is implemented according to the following steps:
step 1, preprocessing the pedestrian re-identification data set and resizing the images to suitable dimensions; the Market-1501 data set is selected as the pedestrian re-identification data set and divided into a training set (train) and a test set, the test set comprising a query set (probe) and a candidate set (gallery); the pedestrian images in the training and test sets are resized to 224 × 224 pixels for the vgg16 network and 256 × 128 pixels for the omni-scale network;
step 2, extracting depth features from the processed pedestrian data set with the vgg16 deep convolutional neural network and the omni-scale deep convolutional neural network, respectively; specifically:
as shown in FIG. 2, step 2.1, constructing the vgg16 deep convolutional neural network model, which comprises thirteen sequentially connected convolutional layers, the output of the last convolutional layer being followed by three fully connected layers;
as shown in FIG. 3, step 2.2, constructing the omni-scale deep convolutional neural network model, which comprises five sequentially connected convolutional layers, the output of the last convolutional layer being followed by two transition layers and the output of the last transition layer by a fully connected layer;
step 2.3, transferring pre-trained weight parameters, obtained by training on the ImageNet data set, to the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2, respectively;
step 2.4, inputting the training set with image size 224 × 224 into the vgg16 deep convolutional neural network model and the training set with image size 256 × 128 into the omni-scale deep convolutional neural network model processed in step 2.3 and training the two models, initializing part of the weight parameters of the vgg16 model constructed in step 2.1 and the omni-scale model constructed in step 2.2 with the pre-trained weight parameters during training, and then extracting the finally output depth features of the two models, denoted H1 and H2 respectively, where H1 ∈ R^(o×m), H2 ∈ R^(o×m), o is the dimension of the two features, m is the number of samples in the data set, and R is the set of real numbers.
In step 2.4, initializing part of the weight parameters with the pre-trained weight parameters specifically means: applying the pre-trained weight parameters layer by layer to initialize the first thirteen layers of the vgg16 deep convolutional neural network model and then assigning random values to the weight parameters of the last three fully connected layers; and applying the pre-trained weight parameters layer by layer to initialize the first seven layers of the omni-scale deep convolutional neural network model and then assigning random values to the weight parameters of the last fully connected layer;
step 3, performing canonical correlation analysis on the extracted depth features, solving the respective projection matrices, and fusing the projected features according to a feature fusion strategy; specifically:
step 3.1, standardizing H1 and H2 to obtain standard data with a mean of 0 and a variance of 1;
step 3.2, computing the covariance matrices S11 and S22 of the standardized H1 and H2 and their cross-covariance matrix S12;
step 3.3, constructing the matrix T = S11^(-1/2) S12 S22^(-1/2);
step 3.4, performing singular value decomposition on the matrix T to obtain the maximum singular value ρ and the left and right singular vectors u, v corresponding to the maximum singular value;
step 3.5, calculating the projection matrices A1 = u^T S11^(-1/2) and A2 = v^T S22^(-1/2) of H1 and H2, the representations of the two depth features in the correlated subspace then being H'1 = A1H1 and H'2 = A2H2 (retaining the top r singular-vector pairs of T yields r-dimensional projected features);
step 4, completing the whole pedestrian re-identification process with the fused features; specifically:
step 4.1, inputting the images of the query set (probe) and the candidate set (gallery) into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model trained as in step 2.4, extracting their depth features, and then calculating the fusion features corresponding to the probe and gallery sets according to step 3, their dimension being r;
step 4.2, taking the fusion features obtained from the training set (train) and the corresponding training sample labels as input to the XQDA algorithm, whose output is a subspace mapping matrix W and a measurement matrix M = Σ'I^(-1) - Σ'E^(-1), where Σ'I is the intra-class covariance matrix and Σ'E is the inter-class covariance matrix;
step 4.3, for the pedestrian image represented by each fusion feature of the query set (probe), performing feature similarity measurement against all fusion features of the candidate set (gallery) to obtain a similarity ranking result, the ranking being determined by similarity: the higher the similarity, the earlier the rank, and the recognition is completed; the similarity measurement adopts the Mahalanobis distance: the fusion features corresponding to the probe and gallery sets are respectively mapped with the subspace mapping matrix W and scored with the measurement matrix M, yielding the Mahalanobis distance between the corresponding probe and gallery features in the subspace; the smaller the distance, the higher the similarity.
Examples
The invention, a pedestrian re-identification method based on DCCA fusion features, is implemented according to the following steps:
step 1, preprocessing the pedestrian re-identification data set and resizing the images to suitable dimensions; the Market-1501 data set is selected and divided into a training set (train) and a test set, the test set comprising a query set (probe) and a candidate set (gallery); the pedestrian images in the training and test sets are resized to 224 × 224 pixels and 256 × 128 pixels, respectively;
The pedestrian re-identification data set Market-1501 was captured in summer at Tsinghua University by 6 cameras (5 high-definition and 1 low-definition). It contains 1501 pedestrians and 32668 detected bounding boxes; each pedestrian is captured by at least 2 cameras and may have multiple images under a single camera. The training set contains 751 identities and the test set 750. Each image is scaled to 128 × 48 pixels, and the resolution is then adjusted to meet the input requirements of the deep networks used: for the vgg16 network the images are resized to 224 × 224, and for the omni-scale network to 256 × 128. In total the data set contains the 1501 pedestrians and 32668 images, divided into train (751 identities) and test (750 identities); train contains 12936 images, and test comprises probe and gallery, where probe contains 3368 images and gallery 19732 images;
step 2, extracting depth features from the processed pedestrian data set with the vgg16 deep convolutional neural network and the omni-scale deep convolutional neural network, respectively; specifically:
step 2.1, constructing the vgg16 deep convolutional neural network model, which comprises thirteen sequentially connected convolutional layers, the output of the last convolutional layer being followed by three fully connected layers; as shown in FIG. 2, the feature map sizes after the thirteen convolutional layers and three fully connected layers are: 224 × 224 × 64, 112 × 112 × 128, 56 × 56 × 256, 28 × 28 × 512, 14 × 14 × 512, 7 × 7 × 512, 1 × 1 × 4096 and 1 × 1 × 751. The specific network structure parameters are set as follows: the first and second convolutional layers have 3 × 3 × 64 kernels, stride 1 × 1 and 'same' padding, the second being followed by max pooling with a 2 × 2 window; the third and fourth convolutional layers have 3 × 3 × 128 kernels, stride 1 × 1 and 'same' padding, the fourth being followed by max pooling with a 2 × 2 window; the fifth, sixth and seventh convolutional layers have 3 × 3 × 256 kernels, stride 1 × 1 and 'same' padding, the seventh being followed by max pooling with a 2 × 2 window; the eighth, ninth and tenth convolutional layers have 3 × 3 × 512 kernels, stride 1 × 1 and 'same' padding, the tenth being followed by max pooling with a 2 × 2 window; the eleventh, twelfth and thirteenth convolutional layers have 3 × 3 × 512 kernels, stride 1 × 1 and 'same' padding, the thirteenth being followed by max pooling with a 2 × 2 window. The feature map of the thirteenth layer is then flattened and fed into the fully connected layers: the fourteenth (fully connected) layer passes through a Dropout layer that randomly disconnects input neurons with probability 0.5 during parameter updates to prevent overfitting and outputs 4096 neurons; the fifteenth (fully connected) layer likewise passes through a Dropout layer that randomly disconnects input neurons with probability 0.25 and outputs 4096 neurons; and the sixteenth fully connected layer outputs 751 neurons according to the number of classes in the data set;
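The layer settings above follow the standard vgg16 configuration; a compact sketch that builds the thirteen convolutional layers from that channel plan (batch normalization is omitted, as in the original vgg16):

```python
import torch.nn as nn

# Channel plan of the thirteen 3 x 3 convolutions; 'M' marks a 2 x 2 max
# pool, matching the per-layer settings listed above.
CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def make_vgg16_features(in_ch=3):
    layers = []
    for v in CFG:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),  # 'same' padding
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

features = make_vgg16_features()   # maps 3 x 224 x 224 to 512 x 7 x 7
```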
step 2.2, constructing the omni-scale deep convolutional neural network model, which comprises five sequentially connected convolutional layers, the output of the last convolutional layer being followed by two transition layers and the output of the last transition layer by a fully connected layer; as shown in FIG. 3, the feature map sizes after the convolutional, transition and fully connected layers are: 128 × 64 × 64, 64 × 32 × 256, 32 × 16 × 384, 16 × 8 × 512 and 1 × 1 × 512. The specific network structure parameters are set as follows: the first convolutional layer has 7 × 7 × 64 kernels with stride 2 × 2, followed by max pooling with a 2 × 2 window; the second convolutional layer comprises two bottleneck block structures, where the bottleneck structure, shown in FIG. 4, adopts an improved residual block with 4 convolutional stream branches, and its Lite 3 × 3 unit, shown in FIG. 5, is a depthwise separable convolution that improves on the standard convolution by factorizing the 3 × 3 convolution into a 1 × 1 pointwise convolution and a 3 × 3 depthwise convolution, reducing the parameters the network must update; the third layer, a transition layer, comprises a convolutional layer with 1 × 1 × 256 kernels and stride 1 × 1 followed by average pooling with a 2 × 2 window and stride 2 × 2; the fourth convolutional layer comprises two bottleneck structures; the fifth layer, a transition layer, comprises a convolutional layer with 1 × 1 × 256 kernels and stride 1 × 1 followed by average pooling with a 2 × 2 window and stride 2 × 2; the sixth convolutional layer comprises two bottleneck structures; the seventh convolutional layer has 1 × 1 × 512 kernels with stride 1 × 1; and the final fully connected layer outputs 751 neurons according to the number of classes in the data set;
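A sketch of the Lite 3 × 3 unit as described, factorizing a standard 3 × 3 convolution into a 1 × 1 pointwise and a 3 × 3 depthwise convolution (the pointwise-then-depthwise order and the batch-normalization placement are assumptions following the omni-scale network literature):

```python
import torch.nn as nn

class Lite3x3(nn.Module):
    """1 x 1 pointwise convolution followed by a 3 x 3 depthwise
    convolution, replacing a standard 3 x 3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1,
                                   groups=out_ch, bias=False)  # one filter per channel
        self.bn = nn.BatchNorm2d(out_ch)    # placement assumed
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.depthwise(self.pointwise(x))))
```

For in_ch input and out_ch output channels this unit needs in_ch·out_ch + 9·out_ch weights instead of the 9·in_ch·out_ch of a standard 3 × 3 convolution, which is the parameter saving the paragraph above refers to.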
step 2.3, transferring pre-trained weight parameters, obtained by training on the ImageNet data set, to the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2, respectively; transfer learning gives the network models good initial parameters, accelerating network convergence and improving generalization;
step 2.4, inputting the training set with image size 224 × 224 into the vgg16 deep convolutional neural network model and the training set with image size 256 × 128 into the omni-scale deep convolutional neural network model processed in step 2.3 and training the two models; during training, part of the weight parameters of the vgg16 model constructed in step 2.1 and the omni-scale model constructed in step 2.2 are initialized with the pre-trained weight parameters, namely: the pre-trained weight parameters are applied layer by layer to initialize the first thirteen layers of the vgg16 deep convolutional neural network model, and the weight parameters of the last three fully connected layers are assigned random values; the pre-trained weight parameters are applied layer by layer to initialize the first seven layers of the omni-scale deep convolutional neural network model, and the weight parameters of the last fully connected layer are assigned random values; finally, the output depth features of the two models are extracted and denoted H1 and H2, where H1 ∈ R^(o×m), H2 ∈ R^(o×m), o is the dimension of the two features, m is the number of samples in the data set, and R is the set of real numbers; both outputs are 751-dimensional;
step 3, performing canonical correlation analysis on the extracted depth features, solving the respective projection matrices, and fusing the projected features according to a feature fusion strategy; specifically:
step 3.1, standardizing H1 and H2 to obtain standard data with a mean of 0 and a variance of 1;
step 3.2, computing the covariance matrices S11 and S22 of the standardized H1 and H2 and their cross-covariance matrix S12;
step 3.3, constructing the matrix T = S11^(-1/2) S12 S22^(-1/2);
step 3.4, performing singular value decomposition on the matrix T to obtain the maximum singular value ρ and the left and right singular vectors u, v corresponding to the maximum singular value;
step 3.5, calculating the projection matrices A1 = u^T S11^(-1/2) and A2 = v^T S22^(-1/2) of H1 and H2, the representations of the two depth features in the correlated subspace then being H'1 = A1H1 and H'2 = A2H2 (retaining the top r singular-vector pairs of T yields r-dimensional projected features);
step 4, completing the whole pedestrian re-identification process with the fused features; specifically:
step 4.1, inputting the images of the query set (probe) and the candidate set (gallery) into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model trained as in step 2.4, extracting their depth features, and then calculating the fusion features corresponding to the probe and gallery sets according to step 3, their dimension being r;
step 4.2, taking the fusion features obtained from the training set (train) and the corresponding training sample labels as input to the XQDA algorithm, whose output is a subspace mapping matrix W and a measurement matrix M = Σ'I^(-1) - Σ'E^(-1), where Σ'I is the intra-class covariance matrix and Σ'E is the inter-class covariance matrix;
step 4.3, for the pedestrian image represented by each fusion feature of the query set (probe), performing feature similarity measurement against all fusion features of the candidate set (gallery) to obtain a similarity ranking result, the ranking being determined by similarity: the higher the similarity, the earlier the rank, and the recognition is completed; the similarity measurement adopts the Mahalanobis distance: the fusion features corresponding to the probe and gallery sets are respectively mapped with the subspace mapping matrix W and scored with the measurement matrix M, yielding the Mahalanobis distance between the corresponding probe and gallery features in the subspace; the smaller the distance, the higher the similarity.
The evaluation uses CMC curves, with rank-1, rank-5, rank-10 and rank-20 as evaluation indices; the rank-1 value is particularly important when evaluating pedestrian re-identification performance.
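A sketch of how the rank-k indices can be read off a distance ranking (assuming every probe identity appears at least once in the gallery; the cross-camera filtering of the standard Market-1501 protocol is omitted for brevity):

```python
import numpy as np

def cmc_scores(sorted_idx, probe_ids, gallery_ids, ks=(1, 5, 10, 20)):
    """sorted_idx: (p, g) gallery indices per probe, best match first
    (e.g. from rank_gallery above). Assumes each probe identity
    appears somewhere in the gallery."""
    matches = gallery_ids[sorted_idx] == probe_ids[:, None]  # (p, g) booleans
    first_hit = matches.argmax(axis=1)   # rank position of first correct match
    return {f"rank{k}": float((first_hit < k).mean()) for k in ks}
```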
The derivation process of step 3 is as follows:
CCA (canonical correlation analysis) is used to solve for the projection matrices of H1 and H2. Let the projection matrices of H1 and H2 be A1 and A2 respectively; their representations in the subspace are H'1 = A1H1 and H'2 = A2H2, and their correlation coefficient can be expressed as:
ρ = Cov(A1H1, A2H2) / sqrt(Var(A1H1) · Var(A2H2))
The objective function is:
(A1, A2) = argmax ρ
i.e., solving for the mapping matrices A1 and A2 corresponding to the maximum correlation coefficient.
Before projection, the raw data is first standardized to obtain data with a mean of 0 and a variance of 1, such that:
Cov(H1, H2) = E[H1 H2^T], Var(H1) = E[H1 H1^T], and likewise Var(H2) = E[H2 H2^T], where H1 and H2 represent the two network depth features, Cov represents the covariance matrix, E represents the expectation, and Var represents the variance matrix.
Since the means of H1 and H2 are both 0, writing S12 = E[H1 H2^T], S11 = E[H1 H1^T] and S22 = E[H2 H2^T]:
ρ = (A1 S12 A2^T) / sqrt((A1 S11 A1^T)(A2 S22 A2^T))
Since multiplying the numerator and denominator by the same factor leaves the optimization result unchanged, an optimization method similar to that of the SVM is adopted: fix the denominator and optimize the numerator, namely:
max A1 S12 A2^T  s.t.  A1 S11 A1^T = 1, A2 S22 A2^T = 1    (6)
To solve the objective function in (6), SVD (singular value decomposition) may be employed. Substituting u = S11^(1/2) A1^T and v = S22^(1/2) A2^T, the objective function becomes:
max u^T T v
s.t. u^T u = 1, v^T v = 1
For this objective function, let the matrix T = S11^(-1/2) S12 S22^(-1/2). In this case, u and v may be regarded as the left and right singular vectors corresponding to one singular value of the matrix T, and singular value decomposition gives T = U Σ V^T, where U and V are the matrices formed by the left and right singular vectors of T respectively, and Σ is the diagonal matrix formed by the singular values of T. Since all columns of U and V are orthonormal bases, U^T u and V^T v each yield a vector with a single entry equal to 1 and the remaining entries 0. Maximizing u^T T v therefore selects a single singular value, and the maximum of the optimization target is the maximum singular value obtained from the decomposition, i.e., the maximum correlation coefficient between H1 and H2. Using the corresponding left and right singular vectors, the projection matrices A1 and A2 of H1 and H2 are respectively:
A1 = u^T S11^(-1/2), A2 = v^T S22^(-1/2)
With the projection matrices A1 and A2, the features of DCCA (deep canonical correlation analysis) are fused according to a feature fusion strategy; the specific fusion modes include the following two: concatenation of the projected features,
F1 = [A1H1; A2H2]
and their sum,
F2 = H'1 + H'2 = A1H1 + A2H2.
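A sketch of the two fusion strategies applied to the projected features H'1 = A1H1 and H'2 = A2H2 (treating F1 as concatenation follows common CCA-fusion practice and is an assumption):

```python
import numpy as np

def fuse(H1p, H2p, mode="sum"):
    """H1p = A1 @ H1 and H2p = A2 @ H2: projected (r, m) features.
    'concat' stacks the two views (assumed form of F1); 'sum' adds
    them, giving F2 = A1 H1 + A2 H2."""
    if mode == "concat":
        return np.vstack([H1p, H2p])   # F1: (2r, m)
    return H1p + H2p                   # F2: (r, m)
```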
The method, a fusion-feature pedestrian re-identification method based on CCA (canonical correlation analysis), improves feature robustness by combining the advantages of the vgg16 and omni-scale deep networks, uses the DCCA (deep canonical correlation analysis) algorithm to perform a maximum-correlation analysis on the two depth features, and finally selects a feature fusion strategy to fuse the two features.
Claims (8)
1. A pedestrian re-identification method based on DCCA fusion features is characterized by comprising the following steps:
step 1, preprocessing a pedestrian re-identification data set and resizing the images to suitable dimensions;
step 2, extracting depth features from the processed pedestrian data set with a vgg16 deep convolutional neural network and an omni-scale deep convolutional neural network, respectively;
step 3, performing canonical correlation analysis on the extracted depth features, solving the respective projection matrices, and fusing the projected features according to a feature fusion strategy;
step 4, completing the whole pedestrian re-identification process with the fused features.
2. The method as claimed in claim 1, characterized in that the pedestrian re-identification data set is the Market-1501 data set, divided into a training set (train) and a test set, wherein the test set comprises a query set (probe) and a candidate set (gallery).
3. The pedestrian re-identification method based on DCCA fusion features according to claim 2, characterized in that the pedestrian images in the training set and the test set are resized to 224 × 224 pixels and 256 × 128 pixels, respectively.
4. The pedestrian re-identification method based on DCCA fusion features according to claim 3, characterized in that step 2 specifically comprises:
step 2.1, constructing the vgg16 deep convolutional neural network model, which comprises thirteen sequentially connected convolutional layers, the output of the last convolutional layer being followed by three fully connected layers;
step 2.2, constructing the omni-scale deep convolutional neural network model, which comprises five sequentially connected convolutional layers, the output of the last convolutional layer being followed by two transition layers and the output of the last transition layer by a fully connected layer;
step 2.3, transferring pre-trained weight parameters, obtained by training on the ImageNet data set, to the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2, respectively;
step 2.4, inputting the training set with image size 224 × 224 into the vgg16 deep convolutional neural network model and the training set with image size 256 × 128 into the omni-scale deep convolutional neural network model processed in step 2.3 and training the two models, initializing part of the weight parameters of the vgg16 model constructed in step 2.1 and the omni-scale model constructed in step 2.2 with the pre-trained weight parameters during training, and then extracting the finally output depth features of the two models, denoted H1 and H2 respectively, where H1 ∈ R^(o×m), H2 ∈ R^(o×m), o is the dimension of the two features, m is the number of samples in the data set, and R is the set of real numbers.
5. The pedestrian re-identification method based on DCCA fusion features according to claim 4, characterized in that in step 2.4, initializing part of the weight parameters of the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2 with the pre-trained weight parameters specifically means: applying the pre-trained weight parameters layer by layer to initialize the first thirteen layers of the vgg16 deep convolutional neural network model and then assigning random values to the weight parameters of the last three fully connected layers; and applying the pre-trained weight parameters layer by layer to initialize the first seven layers of the omni-scale deep convolutional neural network model and then assigning random values to the weight parameters of the last fully connected layer.
6. The pedestrian re-identification method based on DCCA fusion features according to claim 5, characterized in that step 3 specifically comprises:
step 3.1, standardizing H1 and H2 to obtain standard data with a mean of 0 and a variance of 1;
step 3.2, computing the covariance matrices S11 and S22 of the standardized H1 and H2 and their cross-covariance matrix S12;
step 3.3, constructing the matrix T = S11^(-1/2) S12 S22^(-1/2);
step 3.4, performing singular value decomposition on the matrix T to obtain the maximum singular value ρ and the left and right singular vectors u, v corresponding to the maximum singular value;
step 3.5, calculating the projection matrices A1 = u^T S11^(-1/2) and A2 = v^T S22^(-1/2) of H1 and H2, the representations of the two depth features in the correlated subspace then being H'1 = A1H1 and H'2 = A2H2.
7. The pedestrian re-identification method based on DCCA fusion features according to claim 6, characterized in that step 4 specifically comprises:
step 4.1, inputting the images of the query set (probe) and the candidate set (gallery) into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model trained as in step 2.4, extracting their depth features, and then calculating the fusion features corresponding to the probe and gallery sets according to step 3, their dimension being r;
step 4.2, taking the fusion features obtained from the training set (train) and the corresponding training sample labels as input to the XQDA algorithm, whose output is a subspace mapping matrix W and a measurement matrix M = Σ'I^(-1) - Σ'E^(-1), where Σ'I is the intra-class covariance matrix and Σ'E is the inter-class covariance matrix;
step 4.3, for the pedestrian image represented by each fusion feature of the query set (probe), performing feature similarity measurement against all fusion features of the candidate set (gallery) to obtain a similarity ranking result, the ranking being determined by similarity: the higher the similarity, the earlier the rank, and the recognition is completed.
8. The pedestrian re-identification method based on DCCA fusion features according to claim 7, characterized in that the similarity measurement in step 4.3 adopts the Mahalanobis distance: the fusion features corresponding to the query set (probe) and the candidate set (gallery) are respectively mapped with the subspace mapping matrix W and scored with the measurement matrix M, yielding the Mahalanobis distance between the corresponding probe and gallery features in the subspace; the smaller the distance, the higher the similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011109621.5A CN112270228A (en) | 2020-10-16 | 2020-10-16 | Pedestrian re-identification method based on DCCA fusion characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011109621.5A CN112270228A (en) | 2020-10-16 | 2020-10-16 | Pedestrian re-identification method based on DCCA fusion characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112270228A true CN112270228A (en) | 2021-01-26 |
Family
ID=74337570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011109621.5A Pending CN112270228A (en) | 2020-10-16 | 2020-10-16 | Pedestrian re-identification method based on DCCA fusion characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112270228A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111974A (en) * | 2021-05-10 | 2021-07-13 | 清华大学 | Vision-laser radar fusion method and system based on depth canonical correlation analysis |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508731A (en) * | 2018-10-09 | 2019-03-22 | 中山大学 | A kind of vehicle based on fusion feature recognition methods, system and device again |
CN110874576A (en) * | 2019-11-14 | 2020-03-10 | 西安工程大学 | Pedestrian re-identification method based on canonical correlation analysis fusion features |
CN111401178A (en) * | 2020-03-09 | 2020-07-10 | 蔡晓刚 | Video target real-time tracking method and system based on depth feature fusion and adaptive correlation filtering |
WO2020164270A1 (en) * | 2019-02-15 | 2020-08-20 | 平安科技(深圳)有限公司 | Deep-learning-based pedestrian detection method, system and apparatus, and storage medium |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508731A (en) * | 2018-10-09 | 2019-03-22 | 中山大学 | A kind of vehicle based on fusion feature recognition methods, system and device again |
WO2020164270A1 (en) * | 2019-02-15 | 2020-08-20 | 平安科技(深圳)有限公司 | Deep-learning-based pedestrian detection method, system and apparatus, and storage medium |
CN110874576A (en) * | 2019-11-14 | 2020-03-10 | 西安工程大学 | Pedestrian re-identification method based on canonical correlation analysis fusion features |
CN111401178A (en) * | 2020-03-09 | 2020-07-10 | 蔡晓刚 | Video target real-time tracking method and system based on depth feature fusion and adaptive correlation filtering |
Non-Patent Citations (2)
Title |
---|
KAIYANG ZHOU et al.: "Omni-Scale Feature Learning for Person Re-Identification", arXiv, pages 1-14 *
ZENG CHAO: "Vehicle Re-identification Based on Deep Learning", China Masters' Theses Full-text Database, Engineering Science and Technology II, no. 3, pages 034-863 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111974A (en) * | 2021-05-10 | 2021-07-13 | 清华大学 | Vision-laser radar fusion method and system based on depth canonical correlation analysis |
CN113111974B (en) * | 2021-05-10 | 2021-12-14 | 清华大学 | Vision-laser radar fusion method and system based on depth canonical correlation analysis |
US11532151B2 (en) | 2021-05-10 | 2022-12-20 | Tsinghua University | Vision-LiDAR fusion method and system based on deep canonical correlation analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |