CN112270228A - Pedestrian re-identification method based on DCCA fusion characteristics - Google Patents

Pedestrian re-identification method based on DCCA fusion characteristics

Info

Publication number
CN112270228A
CN112270228A (application CN202011109621.5A)
Authority
CN
China
Prior art keywords
neural network
convolutional neural
pedestrian
deep convolutional
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011109621.5A
Other languages
Chinese (zh)
Inventor
张凯兵
唐瑞琪
李敏奇
卢健
景军锋
刘薇
陈小改
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Polytechnic University
Original Assignee
Xian Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Polytechnic University filed Critical Xian Polytechnic University
Priority to CN202011109621.5A priority Critical patent/CN112270228A/en
Publication of CN112270228A publication Critical patent/CN112270228A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/32Normalisation of the pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on DCCA fusion features, which is implemented according to the following steps: preprocess the pedestrian re-identification dataset and resize the images to suitable dimensions; extract deep features from the processed pedestrian dataset with a vgg16 deep convolutional neural network and an omni-scale deep convolutional neural network, respectively; perform canonical correlation analysis on the extracted deep features, solve the respective projection matrices, and fuse the projected features according to a feature fusion strategy; and complete the whole pedestrian re-identification process with the fused features. By combining the advantages of the vgg16 and omni-scale deep networks, the method improves feature robustness, effectively eliminates redundant information while fusing the features, strengthens the discriminative power of the features, and improves the accuracy of pedestrian re-identification.

Description

Pedestrian re-identification method based on DCCA fusion characteristics
Technical Field
The invention belongs to the technical field of computer vision, and relates to a pedestrian re-identification method based on DCCA fusion characteristics.
Background
Pedestrian re-identification has become a very popular research topic in computer vision in recent years. It can be regarded as a sub-problem of image retrieval: given an image of a pedestrian of interest, the task is to retrieve that pedestrian from images or videos captured by non-overlapping surveillance cameras. The task is applied in fields such as intelligent surveillance and criminal investigation.
Traditional methods mainly address pedestrian re-identification from two aspects: (1) designing hand-crafted features, i.e., extracting robust features to characterize pedestrians; and (2) learning a better distance metric whose purpose is to measure the similarity between two images. In practice, on top of the feature representation, a highly discriminative distance metric is learned so that the similarity between features can be used to judge the similarity between pedestrian images, making the distance between images of the same pedestrian as small as possible and the distance between images of different pedestrians as large as possible. With the rise of deep learning, algorithms represented by the convolutional neural network (CNN) have stood out in computer vision, most notably in the well-known ImageNet image classification challenge, which demonstrated the strong recognition ability of deep neural networks. A convolutional neural network can automatically focus on the important regions of the input image and extract features from different network layers, which are more expressive than traditional hand-crafted features. Existing deep learning-based methods generally complete the pedestrian re-identification process in an end-to-end manner, performing automatic feature extraction and similarity matching between features in one pass; however, because of this end-to-end design, the extracted features may suffer from high redundancy and high dimensionality.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method based on DCCA fusion features, which combines the advantages of the vgg16 network and the omni-scale network, improves feature robustness, and eliminates redundant information to a certain extent while fusing the features, thereby strengthening the discriminative power of the features and improving the accuracy of pedestrian re-identification.
The technical scheme adopted by the invention is a pedestrian re-identification method based on DCCA fusion features, implemented according to the following steps:
step 1, preprocessing the pedestrian re-identification dataset and resizing the images to suitable dimensions;
step 2, extracting deep features from the processed pedestrian dataset with a vgg16 deep convolutional neural network and an omni-scale deep convolutional neural network, respectively;
step 3, performing canonical correlation analysis on the extracted deep features, solving the respective projection matrices, and fusing the projected features according to a feature fusion strategy;
and step 4, completing the whole pedestrian re-identification process with the fused features.
The present invention is also characterized in that,
the pedestrian re-identification data set is a marker 1501 data set, the data set is divided into a training set train and a test set, and the test set comprises a query set probe and a candidate set galery.
The pedestrian re-recognition images in the training set train and the test set are each adjusted to a size of 224 × 224 pixels and a size of 256 × 128 pixels, respectively.
The step 2 specifically comprises the following steps:
step 2.1, constructing the vgg16 deep convolutional neural network model, which comprises thirteen convolutional layers connected in sequence, with the output of the last convolutional layer connected to three fully connected layers in sequence;
step 2.2, constructing the omni-scale deep convolutional neural network model, which comprises five convolutional layers connected in sequence, with the output of the last convolutional layer connected to two transition layers in sequence and the output of the last transition layer connected to a fully connected layer;
step 2.3, transferring the pre-trained weight parameters obtained by training on the ImageNet dataset to the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2, respectively;
step 2.4, inputting the training set with image size 224 × 224 and the training set with image size 256 × 128 into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model processed in step 2.3, respectively, to train the two models; during training, initializing part of the weight parameters of the vgg16 model constructed in step 2.1 and of the omni-scale model constructed in step 2.2 with the pre-trained weight parameters; and then extracting the finally output deep features of the vgg16 deep convolutional neural network model and of the omni-scale deep convolutional neural network model, recorded as H1 and H2, where H1 ∈ R^(o×m), H2 ∈ R^(o×m), o denotes the feature dimension, m denotes the number of samples in the dataset, and R is the set of real numbers.
In step 2.4, initializing part of the weight parameters of the vgg16 deep convolutional neural network model constructed in step 2.1 and of the omni-scale deep convolutional neural network model constructed in step 2.2 with the pre-trained weight parameters specifically means: the pre-trained weight parameters are used to initialize, layer by layer, the weight parameters of the first thirteen layers of the vgg16 deep convolutional neural network model, and the weight parameters of the last three fully connected layers are then assigned random values; likewise, the pre-trained weight parameters are used to initialize, layer by layer, the weight parameters of the first seven layers of the omni-scale deep convolutional neural network model, and the weight parameters of the last fully connected layer are then assigned random values.
The step 3 specifically comprises the following steps:
step 3.1, standardizing H1 and H2 to obtain standardized data with mean 0 and variance 1;
step 3.2, calculating the variance matrix of H1, S11 = Cov(H1, H1), the variance matrix of H2, S22 = Cov(H2, H2), and the covariance matrix of H1 and H2, S12 = Cov(H1, H2);
step 3.3, calculating the matrix T = S11^(-1/2) S12 S22^(-1/2);
step 3.4, performing singular value decomposition on the matrix T to obtain the maximum singular value ρ and the corresponding left and right singular vectors u and v;
step 3.5, calculating the projection matrices A1 and A2 of H1 and H2, A1 = S11^(-1/2) u and A2 = S22^(-1/2) v; the representations of the two deep features in the correlated subspace are then H'1 = A1^T H1 and H'2 = A2^T H2;
step 3.6, expressing the fused feature either as the concatenation F1 = [H'1; H'2] = [A1^T H1; A2^T H2] or as the sum F2 = H'1 + H'2 = A1^T H1 + A2^T H2, the fused feature dimension being r.
The step 4 specifically comprises the following steps:
step 4.1, inputting the images in the query set (probe) and the candidate set (gallery) into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model trained as in step 2.4, extracting their deep features, and then calculating the fused features corresponding to the probe set and the gallery set according to step 3, each with dimension r;
step 4.2, taking the fused features obtained from the training set and the corresponding training sample labels as input to the XQDA algorithm, whose output is a subspace mapping matrix W and a metric matrix M = Σ'_I^(-1) - Σ'_E^(-1), where Σ'_I is the intra-class covariance matrix and Σ'_E is the inter-class covariance matrix in the learned subspace;
step 4.3, for a pedestrian image represented by a fused feature from the probe set, measuring its similarity against all fused features of the gallery set and obtaining a similarity ranking; the ranking is determined by similarity, with higher similarity ranked earlier, which completes the identification.
In step 4.3, the similarity is measured with the Mahalanobis distance: the fused features corresponding to the probe set and the gallery set are projected with the subspace mapping matrix W and compared with the metric matrix M, giving the Mahalanobis distance d(x, z) = (x - z)^T W M W^T (x - z) between a probe feature x and a gallery feature z in the subspace; the smaller the distance, the higher the similarity.
The invention has the beneficial effects that:
the method is based on a fusion characteristic pedestrian re-identification method and CCA (typical correlation analysis), combines the advantages of vgg16 and the omni-scale deep network, and improves the characteristic robustness. Meanwhile, maximum correlation analysis is carried out on the two depths by using a DCCA (depth canonical correlation analysis) algorithm, and finally a feature fusion strategy is selected to fuse the two features. The method analyzes the maximum correlation of the features in different spaces in the public subspace, takes the maximum correlation feature between the two features as the discrimination information, effectively eliminates redundant information while fusing the features, improves the feature discrimination capability, and can improve the accuracy of pedestrian re-identification to a certain extent.
Drawings
FIG. 1 is a flowchart of the operation of a pedestrian re-identification method based on DCCA fusion features according to the present invention;
FIG. 2 is a process diagram of vgg16 network extraction of pedestrian features in the pedestrian re-identification method based on DCCA fusion features;
FIG. 3 is a process diagram of extracting pedestrian features by the omni-scale network in the pedestrian re-identification method based on DCCA fusion features of the present invention;
FIG. 4 is a schematic structural diagram of a bottleneck block in the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of Lite3 × 3 in the omni-scale deep convolutional neural network model in the embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a pedestrian re-identification method based on DCCA fusion characteristics, the flow of which is shown in figure 1 and is implemented according to the following steps:
step 1, preprocessing the pedestrian re-identification dataset and resizing the images to suitable dimensions: the Market-1501 dataset is selected as the pedestrian re-identification dataset and divided into a training set (train) and a test set, where the test set comprises a query set (probe) and a candidate set (gallery); the pedestrian re-identification images in the training set and the test set are resized to 224 × 224 pixels for the vgg16 branch and 256 × 128 pixels for the omni-scale branch;
step 2, extracting deep features from the processed pedestrian dataset with the vgg16 deep convolutional neural network and the omni-scale deep convolutional neural network, respectively; specifically:
step 2.1, as shown in fig. 2, constructing the vgg16 deep convolutional neural network model, which comprises thirteen convolutional layers connected in sequence, with the output of the last convolutional layer connected to three fully connected layers in sequence;
step 2.2, as shown in fig. 3, constructing the omni-scale deep convolutional neural network model, which comprises five convolutional layers connected in sequence, with the output of the last convolutional layer connected to two transition layers in sequence and the output of the last transition layer connected to a fully connected layer;
step 2.3, transferring the pre-trained weight parameters obtained by training on the ImageNet dataset to the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2, respectively;
step 2.4, inputting the training set with image size 224 × 224 and the training set with image size 256 × 128 into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model processed in step 2.3, respectively, to train the two models; during training, initializing part of the weight parameters of the vgg16 model constructed in step 2.1 and of the omni-scale model constructed in step 2.2 with the pre-trained weight parameters; and then extracting the finally output deep features of the vgg16 deep convolutional neural network model and of the omni-scale deep convolutional neural network model, recorded as H1 and H2, where H1 ∈ R^(o×m), H2 ∈ R^(o×m), o denotes the feature dimension, m denotes the number of samples in the dataset, and R is the set of real numbers.
In step 2.4, initializing part of the weight parameters of the vgg16 deep convolutional neural network model constructed in step 2.1 and of the omni-scale deep convolutional neural network model constructed in step 2.2 with the pre-trained weight parameters specifically means: the pre-trained weight parameters are used to initialize, layer by layer, the weight parameters of the first thirteen layers of the vgg16 deep convolutional neural network model, and the weight parameters of the last three fully connected layers are then assigned random values; likewise, the pre-trained weight parameters are used to initialize, layer by layer, the weight parameters of the first seven layers of the omni-scale deep convolutional neural network model, and the weight parameters of the last fully connected layer are then assigned random values;
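A minimal PyTorch sketch of this initialization scheme for the vgg16 branch is given below. It assumes a recent torchvision (whose ImageNet-pretrained vgg16 supplies the thirteen convolutional layers) and uses a simple normal initializer for the randomly initialized fully connected layers; neither choice is prescribed by the patent text. The omni-scale branch would be initialized analogously from its own ImageNet-pretrained weights.

```python
# Illustrative sketch only (not the patent's exact code): keep the ImageNet-pretrained
# convolutional weights of vgg16 and re-initialize the three fully connected layers
# randomly for the 751 Market-1501 training identities.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 751  # number of training identities in Market-1501

# Load torchvision's vgg16 with ImageNet weights; the 13 conv layers keep these weights.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Replace the classifier so the last fully connected layer outputs 751 classes,
# and give all three fully connected layers freshly initialized (random) weights.
vgg.classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
    nn.Linear(4096, NUM_CLASSES),
)
for m in vgg.classifier:
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)  # random values, as described in step 2.4
        nn.init.zeros_(m.bias)
```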
step 3, performing canonical correlation analysis on the extracted deep features, solving the respective projection matrices, and fusing the projected features according to a feature fusion strategy; specifically:
step 3.1, standardizing H1 and H2 to obtain standardized data with mean 0 and variance 1;
step 3.2, calculating the variance matrix of H1, S11 = Cov(H1, H1), the variance matrix of H2, S22 = Cov(H2, H2), and the covariance matrix of H1 and H2, S12 = Cov(H1, H2);
step 3.3, calculating the matrix T = S11^(-1/2) S12 S22^(-1/2);
step 3.4, performing singular value decomposition on the matrix T to obtain the maximum singular value ρ and the corresponding left and right singular vectors u and v;
step 3.5, calculating the projection matrices A1 and A2 of H1 and H2, A1 = S11^(-1/2) u and A2 = S22^(-1/2) v; the representations of the two deep features in the correlated subspace are then H'1 = A1^T H1 and H'2 = A2^T H2;
step 3.6, expressing the fused feature either as the concatenation F1 = [H'1; H'2] = [A1^T H1; A2^T H2] or as the sum F2 = H'1 + H'2 = A1^T H1 + A2^T H2, the fused feature dimension being r;
step 4, completing the whole pedestrian re-identification process by using the fused features; the method specifically comprises the following steps:
step 4.1, inputting the images in the query set (probe) and the candidate set (gallery) into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model trained as in step 2.4, extracting their deep features, and then calculating the fused features corresponding to the probe set and the gallery set according to step 3, each with dimension r;
step 4.2, taking the fused features obtained from the training set and the corresponding training sample labels as input to the XQDA algorithm, whose output is a subspace mapping matrix W and a metric matrix M = Σ'_I^(-1) - Σ'_E^(-1), where Σ'_I is the intra-class covariance matrix and Σ'_E is the inter-class covariance matrix in the learned subspace;
step 4.3, for a pedestrian image represented by a fused feature from the probe set, measuring its similarity against all fused features of the gallery set and obtaining a similarity ranking, the ranking being determined by similarity, with higher similarity ranked earlier, which completes the identification; the similarity is measured with the Mahalanobis distance: the fused features corresponding to the probe set and the gallery set are projected with the subspace mapping matrix W and compared with the metric matrix M, giving the Mahalanobis distance d(x, z) = (x - z)^T W M W^T (x - z) between a probe feature x and a gallery feature z in the subspace; the smaller the distance, the higher the similarity.
Examples
The invention relates to a pedestrian re-identification method based on DCCA fusion characteristics, which is implemented according to the following steps:
step 1, preprocessing the pedestrian re-identification dataset and resizing the images to suitable dimensions: the Market-1501 dataset is selected as the pedestrian re-identification dataset and divided into a training set (train) and a test set, where the test set comprises a query set (probe) and a candidate set (gallery); the pedestrian re-identification images in the training set and the test set are resized to 224 × 224 pixels for the vgg16 branch and 256 × 128 pixels for the omni-scale branch;
The Market-1501 dataset was captured in summer at Tsinghua University by 6 cameras (5 high-definition cameras and 1 low-definition camera) and contains 1501 pedestrians and 32688 detected bounding boxes in total; each pedestrian is captured by at least 2 cameras and may have multiple images under one camera. The training set contains 751 identities and the test set contains 750 identities. Each image is of size 128 × 48 pixels, and the resolution is adjusted to meet the input requirements of the deep networks used: for the vgg16 network the image size is adjusted to 224 × 224, and for the omni-scale network it is adjusted to 256 × 128. In total, Market-1501 contains 1501 pedestrians and 32688 images, divided into train (751 identities) and test (750 identities); train contains 12936 images, and test consists of probe and gallery, where probe contains 3368 images and gallery contains 19732 images;
step 2, extracting deep features from the processed pedestrian dataset with the vgg16 deep convolutional neural network and the omni-scale deep convolutional neural network, respectively; specifically:
step 2.1, constructing the vgg16 deep convolutional neural network model, which comprises thirteen convolutional layers connected in sequence, with the output of the last convolutional layer connected to three fully connected layers in sequence; as shown in fig. 2, the feature map sizes after the thirteen convolutional layers and three fully connected layers are: 224 × 224 × 64, 112 × 112 × 128, 28 × 28 × 256, 14 × 14 × 512, 7 × 7 × 512, 1 × 1 × 4096 and 1 × 1 × 751. The specific network structure parameters are set as follows: the first convolutional layer has kernel size 3 × 3 × 64, stride 1 × 1 and "same" padding; the second convolutional layer has kernel size 3 × 3 × 64, stride 1 × 1 and "same" padding, followed by a max-pooling layer with a 2 × 2 pooling window; the third convolutional layer has kernel size 3 × 3 × 128, stride 1 × 1 and "same" padding; the fourth convolutional layer has kernel size 3 × 3 × 128, stride 1 × 1 and "same" padding, followed by a max-pooling layer with a 2 × 2 pooling window; the fifth, sixth and seventh convolutional layers each have kernel size 3 × 3 × 256, stride 1 × 1 and "same" padding, with a max-pooling layer (2 × 2 window) after the seventh; the eighth, ninth and tenth convolutional layers each have kernel size 3 × 3 × 512, stride 1 × 1 and "same" padding, with a max-pooling layer (2 × 2 window) after the tenth; the eleventh, twelfth and thirteenth convolutional layers each have kernel size 3 × 3 × 512, stride 1 × 1 and "same" padding, with a max-pooling layer (2 × 2 window) after the thirteenth. The feature map of the thirteenth layer is then flattened and fed into the fully connected layers: the fourteenth-layer fully connected layer outputs 4096 neurons and passes through a Dropout layer that randomly drops input neurons with probability 0.5 during parameter updates to prevent overfitting; the fifteenth-layer fully connected layer likewise outputs 4096 neurons and passes through a Dropout layer with a drop probability of 25%; and the sixteenth-layer fully connected layer outputs 751 neurons, set according to the number of classes in the dataset;
step 2.2, constructing the omni-scale deep convolutional neural network model, which comprises five convolutional layers connected in sequence, with the output of the last convolutional layer connected to two transition layers in sequence and the output of the last transition layer connected to a fully connected layer; as shown in fig. 3, the feature map sizes after the convolutional layers, transition layers and fully connected layer are: 128 × 64 × 64, 64 × 32 × 256, 32 × 16 × 384, 16 × 8 × 512 and 1 × 1 × 512. The specific network structure parameters are set as follows: the first convolutional layer has kernel size 7 × 7 × 64 and stride 2 × 2, followed by a max-pooling layer with a 2 × 2 pooling window; the second convolutional layer comprises two bottleneck block structures; the bottleneck structure, shown in fig. 4, is an improved residual block with 4 convolutional stream branches, and its Lite 3 × 3 structure is a depthwise separable convolution, shown in fig. 5, which improves on the standard convolution by factorizing the 3 × 3 convolution into a 1 × 1 pointwise convolution and a 3 × 3 depthwise convolution, reducing the number of parameters the network has to update (a code sketch of this factorization is given below). The third layer is a transition layer comprising a convolutional layer with kernel size 1 × 1 × 256 and stride 1 × 1 and an average-pooling layer with a 2 × 2 window and stride 2 × 2; the fourth layer comprises two bottleneck structures; the fifth layer is a transition layer comprising a convolutional layer with kernel size 1 × 1 × 256 and stride 1 × 1 and an average-pooling layer with a 2 × 2 window and stride 2 × 2; the sixth layer comprises two bottleneck structures; the seventh convolutional layer has kernel size 1 × 1 × 512 and stride 1 × 1; and the final fully connected layer outputs 751 neurons, set according to the number of classes in the dataset;
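A minimal sketch of the Lite 3 × 3 factorization described above, assuming PyTorch: a 1 × 1 pointwise convolution followed by a 3 × 3 depthwise convolution (one filter per channel). The BatchNorm/ReLU epilogue is a common choice, not something fixed by the patent text.

```python
# Illustration of depthwise separable convolution as used by the Lite 3 x 3 structure;
# not the patent's exact block.
import torch
import torch.nn as nn

class Lite3x3(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1,
                                   groups=out_channels, bias=False)  # one filter per channel
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.depthwise(self.pointwise(x))))

# Example: a 64-channel feature map of size 64 x 32 mapped to 256 channels.
feats = Lite3x3(64, 256)(torch.randn(1, 64, 64, 32))  # -> shape (1, 256, 64, 32)
```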
step 2.3, transferring the pre-trained weight parameters obtained by training on the ImageNet dataset to the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2, respectively; transfer learning gives the network models good initial parameters, which accelerates convergence and improves the generalization ability of the networks;
step 2.4, inputting the training set with image size 224 × 224 and the training set with image size 256 × 128 into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model processed in step 2.3, respectively, to train the two models; during training, initializing part of the weight parameters of the vgg16 model constructed in step 2.1 and of the omni-scale model constructed in step 2.2 with the pre-trained weight parameters, namely: the pre-trained weight parameters are used to initialize, layer by layer, the weight parameters of the first thirteen layers of the vgg16 deep convolutional neural network model, and the weight parameters of the last three fully connected layers are then assigned random values; likewise, the pre-trained weight parameters are used to initialize, layer by layer, the weight parameters of the first seven layers of the omni-scale deep convolutional neural network model, and the weight parameters of the last fully connected layer are then assigned random values; the finally output deep features of the two models are extracted and recorded as H1 and H2, where H1 ∈ R^(o×m), H2 ∈ R^(o×m), o denotes the feature dimension, m denotes the number of samples in the dataset, R is the set of real numbers, and both outputs are 751-dimensional;
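The feature matrices H1 and H2 can be assembled by running every training image through the corresponding trained branch and stacking the final-layer outputs column-wise, as in the hedged sketch below; `model` and `loader` stand for a trained branch and its data loader and are illustrative placeholders, not names defined by the patent.

```python
# Illustrative sketch: collect the final-layer outputs of a trained branch for every
# training image and stack them into a feature matrix H of shape (o, m), matching the
# notation H1, H2 in R^(o x m) above.
import torch

@torch.no_grad()
def extract_feature_matrix(model, loader, device="cpu"):
    model.eval().to(device)
    columns = []
    for images, _labels in loader:                 # each batch: (B, C, H, W)
        out = model(images.to(device))             # final output, e.g. 751-dimensional
        columns.append(out.cpu())
    return torch.cat(columns, dim=0).t()           # (o, m): one column per sample

# H1 = extract_feature_matrix(vgg_model, vgg_train_loader)
# H2 = extract_feature_matrix(osnet_model, osnet_train_loader)
```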
step 3, performing canonical correlation analysis on the extracted deep features, solving the respective projection matrices, and fusing the projected features according to a feature fusion strategy; specifically:
step 3.1, standardizing H1 and H2 to obtain standardized data with mean 0 and variance 1;
step 3.2, calculating the variance matrix of H1, S11 = Cov(H1, H1), the variance matrix of H2, S22 = Cov(H2, H2), and the covariance matrix of H1 and H2, S12 = Cov(H1, H2);
step 3.3, calculating the matrix T = S11^(-1/2) S12 S22^(-1/2);
step 3.4, performing singular value decomposition on the matrix T to obtain the maximum singular value ρ and the corresponding left and right singular vectors u and v;
step 3.5, calculating the projection matrices A1 and A2 of H1 and H2, A1 = S11^(-1/2) u and A2 = S22^(-1/2) v; the representations of the two deep features in the correlated subspace are then H'1 = A1^T H1 and H'2 = A2^T H2;
step 3.6, expressing the fused feature either as the concatenation F1 = [H'1; H'2] = [A1^T H1; A2^T H2] or as the sum F2 = H'1 + H'2 = A1^T H1 + A2^T H2, the fused feature dimension being r;
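A compact numpy sketch of steps 3.1-3.6 is given below. It assumes that the top r singular-vector pairs of T are retained (the steps above describe the maximum singular value; keeping r pairs yields r-dimensional projected features) and adds a small regularization term eps before the matrix inverse square roots, which is not part of the patent text.

```python
import numpy as np

def inv_sqrt(S, eps=1e-6):
    """Inverse matrix square root of a symmetric positive (semi-)definite matrix."""
    w, V = np.linalg.eigh(S + eps * np.eye(S.shape[0]))
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_fuse(H1, H2, r, mode="sum"):
    # step 3.1: standardize each feature dimension to mean 0, variance 1
    H1 = (H1 - H1.mean(axis=1, keepdims=True)) / (H1.std(axis=1, keepdims=True) + 1e-12)
    H2 = (H2 - H2.mean(axis=1, keepdims=True)) / (H2.std(axis=1, keepdims=True) + 1e-12)
    m = H1.shape[1]
    # step 3.2: variance and covariance matrices
    S11, S22, S12 = H1 @ H1.T / m, H2 @ H2.T / m, H1 @ H2.T / m
    # step 3.3: T = S11^(-1/2) S12 S22^(-1/2)
    S11_is, S22_is = inv_sqrt(S11), inv_sqrt(S22)
    T = S11_is @ S12 @ S22_is
    # step 3.4: SVD of T; the singular values are the canonical correlations
    U, rho, Vt = np.linalg.svd(T)
    # step 3.5: projection matrices and projected features
    A1, A2 = S11_is @ U[:, :r], S22_is @ Vt[:r, :].T
    H1p, H2p = A1.T @ H1, A2.T @ H2
    # step 3.6: fusion by summation (F2) or concatenation (F1)
    return H1p + H2p if mode == "sum" else np.vstack([H1p, H2p])

# Example: two 751-dimensional feature sets for 2000 samples, fused to r = 64 dimensions.
F = cca_fuse(np.random.randn(751, 2000), np.random.randn(751, 2000), r=64)
```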
step 4, completing the whole pedestrian re-identification process by using the fused features; the method specifically comprises the following steps:
step 4.1, inputting the images in the query set (probe) and the candidate set (gallery) into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model trained as in step 2.4, extracting their deep features, and then calculating the fused features corresponding to the probe set and the gallery set according to step 3, each with dimension r;
step 4.2, taking the fused features obtained from the training set and the corresponding training sample labels as input to the XQDA algorithm, whose output is a subspace mapping matrix W and a metric matrix M = Σ'_I^(-1) - Σ'_E^(-1), where Σ'_I is the intra-class covariance matrix and Σ'_E is the inter-class covariance matrix in the learned subspace (a simplified code sketch of this step and of the ranking step follows after step 4.3);
step 4.3, for a pedestrian image represented by a fused feature from the probe set, measuring its similarity against all fused features of the gallery set and obtaining a similarity ranking, the ranking being determined by similarity, with higher similarity ranked earlier, which completes the identification; the similarity is measured with the Mahalanobis distance: the fused features corresponding to the probe set and the gallery set are projected with the subspace mapping matrix W and compared with the metric matrix M, giving the Mahalanobis distance d(x, z) = (x - z)^T W M W^T (x - z) between a probe feature x and a gallery feature z in the subspace; the smaller the distance, the higher the similarity.
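The sketch below illustrates, under stated assumptions, how the XQDA step and the Mahalanobis ranking could look in numpy: Sigma_I and Sigma_E are estimated from intra-class and inter-class sample differences, W comes from the generalized eigenproblem on the two covariance matrices, M = Σ'_I^(-1) - Σ'_E^(-1) in the subspace, and the gallery is ranked by d(x, z) = (x - z)^T W M W^T (x - z). This is a simplified illustration of the idea behind XQDA (after Liao et al.), not the reference implementation; the eps regularizer and the pairwise-difference loop are simplifications not found in the patent text.

```python
import numpy as np
from scipy.linalg import eigh

def xqda_train(X, y, dim, eps=1e-4):
    """X: fused training features, shape (d, n), column-wise; y: identity labels, shape (n,)."""
    d, n = X.shape
    diffs_I, diffs_E = [], []
    for i in range(n):                                  # quadratic in n; fine for a sketch
        for j in range(i + 1, n):
            (diffs_I if y[i] == y[j] else diffs_E).append(X[:, i] - X[:, j])
    Sigma_I = np.cov(np.stack(diffs_I, axis=1)) + eps * np.eye(d)
    Sigma_E = np.cov(np.stack(diffs_E, axis=1)) + eps * np.eye(d)
    # Directions maximizing inter-class over intra-class variance of the differences.
    vals, vecs = eigh(Sigma_E, Sigma_I)
    W = vecs[:, np.argsort(vals)[::-1][:dim]]           # top `dim` generalized eigenvectors
    SpI, SpE = W.T @ Sigma_I @ W, W.T @ Sigma_E @ W     # covariances in the subspace
    M = np.linalg.inv(SpI) - np.linalg.inv(SpE)         # metric matrix
    return W, M

def rank_gallery(Xp, Xg, W, M):
    """Rank gallery features for each probe by d(x, z) = (x - z)^T W M W^T (x - z)."""
    Pp, Pg = W.T @ Xp, W.T @ Xg                         # project into the subspace
    dist = np.zeros((Pp.shape[1], Pg.shape[1]))
    for i in range(Pp.shape[1]):
        diff = Pg - Pp[:, i:i + 1]                      # differences to every gallery feature
        dist[i] = np.einsum("dn,dc,cn->n", diff, M, diff)
    return np.argsort(dist, axis=1)                     # most similar (smallest distance) first
```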
The evaluation uses CMC curves, with rank-1, rank-5, rank-10 and rank-20 as evaluation indices; the rank-1 value is particularly important when evaluating the performance of pedestrian re-identification.
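For reference, a short sketch of how rank-k CMC scores can be computed from a ranking such as the one produced above; it assumes a single ground-truth match per probe and omits the same-camera filtering and mAP computation of the full Market-1501 protocol.

```python
import numpy as np

def cmc_scores(ranking, probe_ids, gallery_ids, ks=(1, 5, 10, 20)):
    """ranking: (n_probe, n_gallery) gallery indices sorted from most to least similar."""
    hits = np.zeros(len(ks))
    for i, order in enumerate(ranking):
        ranked_ids = np.asarray(gallery_ids)[order]
        first_hit = np.flatnonzero(ranked_ids == probe_ids[i])[0]   # position of the true match
        hits += np.array([first_hit < k for k in ks])
    return dict(zip((f"rank-{k}" for k in ks), hits / len(ranking)))
```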
The derivation process of step 3 is as follows:
CCA (canonical correlation analysis) is used to solve for the projection matrices of H1 and H2, denoted A1 and A2 respectively; their representations in the subspace are

H'1 = A1^T H1 and H'2 = A2^T H2.

Their correlation coefficient can be expressed as

ρ = Cov(A1^T H1, A2^T H2) / sqrt( Var(A1^T H1) · Var(A2^T H2) ),

and the objective function is

(A1*, A2*) = argmax_{A1, A2} ρ,

i.e. solving for the mapping matrices A1 and A2 corresponding to the maximum correlation coefficient.
Before projection, the raw data are first standardized to obtain data with mean 0 and variance 1, so that

Cov(H1, H2) = E[H1 H2^T], Var(H1) = E[H1 H1^T],

and, by the same reasoning,

Var(H2) = E[H2 H2^T],

where H1 and H2 denote the two network deep features, Cov denotes the covariance matrix, E denotes expectation, and Var denotes the variance matrix.
Since the means of H1 and H2 are both 0,

Cov(A1^T H1, A2^T H2) = A1^T E[H1 H2^T] A2 = A1^T S12 A2,
Var(A1^T H1) = A1^T E[H1 H1^T] A1 = A1^T S11 A1,
Var(A2^T H2) = A2^T E[H2 H2^T] A2 = A2^T S22 A2.

Letting S11 = Var(H1), S22 = Var(H2) and S12 = Cov(H1, H2), the objective function is converted into

ρ = A1^T S12 A2 / ( sqrt(A1^T S11 A1) · sqrt(A2^T S22 A2) ).

Since scaling A1 and A2 multiplies the numerator and the denominator by the same factor, the value of the optimization objective is unchanged; as in the SVM formulation, the denominator can therefore be fixed and the numerator optimized, namely:

max_{A1, A2} A1^T S12 A2
s.t. A1^T S11 A1 = 1, A2^T S22 A2 = 1.

This objective function can be solved with SVD (singular value decomposition). Let

u = S11^(1/2) A1 and v = S22^(1/2) A2,

where u and v are unit vectors; then

A1 = S11^(-1/2) u and A2 = S22^(-1/2) v.

At the same time, from A1^T S11 A1 = 1 the following can be obtained:

u^T u = 1,

and from A2^T S22 A2 = 1 the following can be obtained:

v^T v = 1.

At this time, the objective function is

max_{u, v} u^T S11^(-1/2) S12 S22^(-1/2) v
s.t. u^T u = 1, v^T v = 1.

For this objective function, let the matrix

T = S11^(-1/2) S12 S22^(-1/2).

In this case u and v can be regarded as the left and right singular vectors associated with one singular value of the matrix T. Singular value decomposition gives T = U Σ V^T, where U and V are the matrices formed by the left and right singular vectors of T respectively, and Σ is the diagonal matrix of the singular values of T. Since the columns of U and V are orthonormal bases, U^T u and V^T v are vectors with a single entry equal to 1 and all other entries equal to 0. Maximizing

u^T T v

therefore amounts to choosing the pair of left and right singular vectors associated with the largest singular value; that is, after the singular value decomposition of T, the maximum singular value is the maximum of the optimization objective, i.e. the maximum correlation coefficient between H1 and H2. Using these left and right singular vectors, the projection matrices A1 and A2 of H1 and H2 are respectively

A1 = S11^(-1/2) u and A2 = S22^(-1/2) v.

With the projection matrices A1 and A2 and a feature fusion strategy, the DCCA (deep canonical correlation analysis) features are fused correspondingly; the two specific fusion modes are:

F1 = [H'1; H'2] = [A1^T H1; A2^T H2] (concatenation),
F2 = H'1 + H'2 = A1^T H1 + A2^T H2 (summation).
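The conclusion of the derivation can be checked numerically: the largest singular value of T should equal the empirical correlation coefficient of the projected features A1^T H1 and A2^T H2. The snippet below does this on synthetic data; the inv_sqrt helper (with a small eps regularizer not present in the derivation) is repeated from the earlier sketch.

```python
import numpy as np

def inv_sqrt(S, eps=1e-6):
    w, V = np.linalg.eigh(S + eps * np.eye(S.shape[0]))
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 2000))                      # shared signal across both views
H1 = rng.standard_normal((8, 2000)) + Z.repeat(3, axis=0)[:8]
H2 = rng.standard_normal((8, 2000)) + Z.repeat(3, axis=0)[:8]
H1 = (H1 - H1.mean(axis=1, keepdims=True)) / H1.std(axis=1, keepdims=True)
H2 = (H2 - H2.mean(axis=1, keepdims=True)) / H2.std(axis=1, keepdims=True)
m = H1.shape[1]
S11, S22, S12 = H1 @ H1.T / m, H2 @ H2.T / m, H1 @ H2.T / m
S11_is, S22_is = inv_sqrt(S11), inv_sqrt(S22)
U, rho, Vt = np.linalg.svd(S11_is @ S12 @ S22_is)
a, b = (S11_is @ U[:, 0]) @ H1, (S22_is @ Vt[0]) @ H2   # first pair of projected features
print(abs(np.corrcoef(a, b)[0, 1] - rho[0]) < 1e-4)     # expected output: True
```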
based on a fusion feature pedestrian re-identification method and CCA (canonical correlation analysis), the method improves feature robustness by combining the advantages of vgg16 and the omni-scale depth network, simultaneously utilizes DCCA (canonical correlation analysis) algorithm to carry out maximum correlation analysis on two depth features, and finally selects a feature fusion strategy to fuse the two features.

Claims (8)

1. A pedestrian re-identification method based on DCCA fusion features is characterized by comprising the following steps:
step 1, preprocessing the pedestrian re-identification dataset and resizing the images to suitable dimensions;
step 2, extracting deep features from the processed pedestrian dataset with a vgg16 deep convolutional neural network and an omni-scale deep convolutional neural network, respectively;
step 3, performing canonical correlation analysis on the extracted deep features, solving the respective projection matrices, and fusing the projected features according to a feature fusion strategy;
and step 4, completing the whole pedestrian re-identification process with the fused features.
2. The method as claimed in claim 1, wherein the pedestrian re-identification dataset is the Market-1501 dataset, the dataset is divided into a training set (train) and a test set, and the test set comprises a query set (probe) and a candidate set (gallery).
3. The DCCA-fusion-feature-based pedestrian re-identification method according to claim 2, wherein the pedestrian re-identification images in the training set and the test set are resized to 224 × 224 pixels and 256 × 128 pixels for the vgg16 and omni-scale branches, respectively.
4. The pedestrian re-identification method based on DCCA fusion characteristics according to claim 3, wherein said step 2 specifically is:
step 2.1, constructing the vgg16 deep convolutional neural network model, which comprises thirteen convolutional layers connected in sequence, with the output of the last convolutional layer connected to three fully connected layers in sequence;
step 2.2, constructing the omni-scale deep convolutional neural network model, which comprises five convolutional layers connected in sequence, with the output of the last convolutional layer connected to two transition layers in sequence and the output of the last transition layer connected to a fully connected layer;
step 2.3, transferring the pre-trained weight parameters obtained by training on the ImageNet dataset to the vgg16 deep convolutional neural network model constructed in step 2.1 and the omni-scale deep convolutional neural network model constructed in step 2.2, respectively;
step 2.4, inputting the training set with image size 224 × 224 and the training set with image size 256 × 128 into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model processed in step 2.3, respectively, to train the two models; during training, initializing part of the weight parameters of the vgg16 model constructed in step 2.1 and of the omni-scale model constructed in step 2.2 with the pre-trained weight parameters; and then extracting the finally output deep features of the vgg16 deep convolutional neural network model and of the omni-scale deep convolutional neural network model, recorded as H1 and H2, where H1 ∈ R^(o×m), H2 ∈ R^(o×m), o denotes the feature dimension, m denotes the number of samples in the dataset, and R is the set of real numbers.
5. The pedestrian re-identification method based on DCCA fusion characteristics according to claim 4, wherein in step 2.4, initializing part of the weight parameters of the vgg16 deep convolutional neural network model constructed in step 2.1 and of the omni-scale deep convolutional neural network model constructed in step 2.2 with the pre-trained weight parameters specifically means: using the pre-trained weight parameters to initialize, layer by layer, the weight parameters of the first thirteen layers of the vgg16 deep convolutional neural network model and then assigning random values to the weight parameters of the last three fully connected layers; and using the pre-trained weight parameters to initialize, layer by layer, the weight parameters of the first seven layers of the omni-scale deep convolutional neural network model and then assigning random values to the weight parameters of the last fully connected layer.
6. The pedestrian re-identification method based on DCCA fusion characteristics according to claim 5, wherein said step 3 specifically is:
step 3.1, standardizing H1 and H2 to obtain standardized data with mean 0 and variance 1;
step 3.2, calculating the variance matrix of H1, S11 = Cov(H1, H1), the variance matrix of H2, S22 = Cov(H2, H2), and the covariance matrix of H1 and H2, S12 = Cov(H1, H2);
step 3.3, calculating the matrix T = S11^(-1/2) S12 S22^(-1/2);
step 3.4, performing singular value decomposition on the matrix T to obtain the maximum singular value ρ and the corresponding left and right singular vectors u and v;
step 3.5, calculating the projection matrices A1 and A2 of H1 and H2, A1 = S11^(-1/2) u and A2 = S22^(-1/2) v; the representations of the two deep features in the correlated subspace are then H'1 = A1^T H1 and H'2 = A2^T H2;
step 3.6, expressing the fused feature either as the concatenation F1 = [H'1; H'2] = [A1^T H1; A2^T H2] or as the sum F2 = H'1 + H'2 = A1^T H1 + A2^T H2, the fused feature dimension being r.
7. The pedestrian re-identification method based on DCCA fusion characteristics according to claim 6, wherein said step 4 specifically is:
step 4.1, inputting the images in the query set (probe) and the candidate set (gallery) into the vgg16 deep convolutional neural network model and the omni-scale deep convolutional neural network model trained as in step 2.4, extracting their deep features, and then calculating the fused features corresponding to the probe set and the gallery set according to step 3, each with dimension r;
step 4.2, taking the fused features obtained from the training set and the corresponding training sample labels as input to the XQDA algorithm, whose output is a subspace mapping matrix W and a metric matrix M = Σ'_I^(-1) - Σ'_E^(-1), where Σ'_I is the intra-class covariance matrix and Σ'_E is the inter-class covariance matrix in the learned subspace;
and step 4.3, for a pedestrian image represented by a fused feature from the probe set, measuring its similarity against all fused features of the gallery set and obtaining a similarity ranking, the ranking being determined by similarity, with higher similarity ranked earlier, whereby the identification is completed.
8. The pedestrian re-identification method based on DCCA fusion characteristics according to claim 7, wherein the similarity measurement in step 4.3 adopts the Mahalanobis distance: the fused features corresponding to the query set (probe) and the candidate set (gallery) are projected with the subspace mapping matrix W and compared with the metric matrix M = Σ'_I^(-1) - Σ'_E^(-1), giving the Mahalanobis distance d(x, z) = (x - z)^T W M W^T (x - z) of the corresponding probe and gallery features in the subspace; the smaller the distance, the higher the similarity.
CN202011109621.5A 2020-10-16 2020-10-16 Pedestrian re-identification method based on DCCA fusion characteristics Pending CN112270228A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011109621.5A CN112270228A (en) 2020-10-16 2020-10-16 Pedestrian re-identification method based on DCCA fusion characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011109621.5A CN112270228A (en) 2020-10-16 2020-10-16 Pedestrian re-identification method based on DCCA fusion characteristics

Publications (1)

Publication Number Publication Date
CN112270228A 2021-01-26

Family

ID=74337570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011109621.5A Pending CN112270228A (en) 2020-10-16 2020-10-16 Pedestrian re-identification method based on DCCA fusion characteristics

Country Status (1)

Country Link
CN (1) CN112270228A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508731A (en) * 2018-10-09 2019-03-22 中山大学 A kind of vehicle based on fusion feature recognition methods, system and device again
WO2020164270A1 (en) * 2019-02-15 2020-08-20 平安科技(深圳)有限公司 Deep-learning-based pedestrian detection method, system and apparatus, and storage medium
CN110874576A (en) * 2019-11-14 2020-03-10 西安工程大学 Pedestrian re-identification method based on canonical correlation analysis fusion features
CN111401178A (en) * 2020-03-09 2020-07-10 蔡晓刚 Video target real-time tracking method and system based on depth feature fusion and adaptive correlation filtering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAIYANG ZHOU et al.: "Omni-Scale Feature Learning for Person Re-Identification", arXiv, pages 1-14 *
ZENG CHAO: "Vehicle Re-identification Based on Deep Learning", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 3, pages 034-863 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111974A (en) * 2021-05-10 2021-07-13 清华大学 Vision-laser radar fusion method and system based on depth canonical correlation analysis
CN113111974B (en) * 2021-05-10 2021-12-14 清华大学 Vision-laser radar fusion method and system based on depth canonical correlation analysis
US11532151B2 (en) 2021-05-10 2022-12-20 Tsinghua University Vision-LiDAR fusion method and system based on deep canonical correlation analysis

Similar Documents

Publication Publication Date Title
CN110110642B (en) Pedestrian re-identification method based on multi-channel attention features
CN111814584B (en) Vehicle re-identification method based on multi-center measurement loss under multi-view environment
CN107341452B (en) Human behavior identification method based on quaternion space-time convolution neural network
CN108460356B (en) Face image automatic processing system based on monitoring system
CN109543602B (en) Pedestrian re-identification method based on multi-view image feature decomposition
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN106372581B (en) Method for constructing and training face recognition feature extraction network
CN108921107B (en) Pedestrian re-identification method based on sequencing loss and Simese network
KR102224253B1 (en) Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
CN109977757B (en) Multi-modal head posture estimation method based on mixed depth regression network
CN110321830B (en) Chinese character string picture OCR recognition method based on neural network
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
CN110580460A (en) Pedestrian re-identification method based on combined identification and verification of pedestrian identity and attribute characteristics
CN110807434A (en) Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN109447123B (en) Pedestrian re-identification method based on label consistency constraint and stretching regularization dictionary learning
CN111783576A (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
Ma et al. Orientation driven bag of appearances for person re-identification
CN110097029B (en) Identity authentication method based on high way network multi-view gait recognition
CN107169117B (en) Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN108280421B (en) Human behavior recognition method based on multi-feature depth motion map
CN110728216A (en) Unsupervised pedestrian re-identification method based on pedestrian attribute adaptive learning
CN111353447A (en) Human skeleton behavior identification method based on graph convolution network
CN111639580A (en) Gait recognition method combining feature separation model and visual angle conversion model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination