CN110163110B - Pedestrian re-recognition method based on transfer learning and depth feature fusion - Google Patents

Pedestrian re-recognition method based on transfer learning and depth feature fusion

Info

Publication number
CN110163110B
Authority
CN
China
Prior art keywords: pedestrian, training, network model, global, convolutional neural
Prior art date
Legal status: Active
Application number: CN201910329733.2A
Other languages: Chinese (zh)
Other versions: CN110163110A (en)
Inventors: 丁剑飞, 王进, 阚丹会, 闫盈盈, 曹扬
Current Assignee: CETC Big Data Research Institute Co Ltd
Original Assignee: CETC Big Data Research Institute Co Ltd
Application filed by CETC Big Data Research Institute Co Ltd
Priority to CN201910329733.2A
Publication of CN110163110A
Application granted
Publication of CN110163110B
Status: Active

Classifications

    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2163: Partitioning the feature space
    • G06F18/253: Fusion techniques of extracted features
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Neural network learning methods
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • Y02T10/40: Engine management systems

Abstract

The invention provides a pedestrian re-identification method based on transfer learning and depth feature fusion, comprising the following steps: pre-training, human posture correction and segmentation, feature vector extraction, depth feature fusion, model training, model testing, and recognition. The method extracts global and local pedestrian features with a deep convolutional neural network and deeply fuses the two kinds of features to obtain the final pedestrian feature representation. During training of the deep convolutional neural network, transfer learning is adopted to obtain a more effective pedestrian re-identification network model, so that the features extracted by this model are more discriminative, which improves the accuracy of pedestrian re-identification.

Description

Pedestrian re-recognition method based on transfer learning and depth feature fusion
Technical Field
The invention relates to a pedestrian re-identification method based on transfer learning and depth feature fusion, and belongs to the technical field of deep learning and transfer learning.
Background
Pedestrian re-identification mainly addresses the task of matching pedestrians across a multi-camera network with non-overlapping fields of view, that is, finding a target pedestrian captured by cameras at different positions and at different times.
With the development of artificial intelligence technology, pedestrian re-identification for application scenarios such as public security and image retrieval has attracted wide research attention. However, compared with traditional biometric technologies such as face recognition and gesture recognition, pedestrian re-identification suffers from low recognition accuracy because surveillance video environments are complex and uncontrollable: image resolution is low, and viewpoint, posture, and lighting vary while occlusions occur. Pedestrian re-identification therefore faces great challenges in practical application scenarios.
To improve the accuracy of pedestrian re-identification and enhance system robustness, many researchers have proposed different methods over years of study. The computer vision company Face++ has made great progress in the field of pedestrian re-identification: its paper AlignedReID proposes a new method built on dynamic alignment and mutual learning followed by re-ranking, and finds experimentally that, at test time, extracting only global pedestrian features achieves almost the same recognition accuracy as fusing global and local pedestrian features. Yi et al. propose a deep metric learning method based on a Siamese convolutional neural network, with good results. Liu et al. propose a deep nonlinear metric learning method based on neighborhood component analysis and a deep belief network, in which neighborhood component analysis maximizes, through a data transformation, the number of correctly identifiable samples of each class in the training data, and the deep belief network learns a nonlinear feature transformation to extend that data transformation. However, most existing pedestrian re-identification methods extract global feature vectors from whole-body pedestrian images during training; some extract local features but do not deeply fuse them with global features to obtain a discriminative image representation. Moreover, simply fine-tuning a pre-trained model on a pedestrian database ignores the difference in data distribution between the source-domain and target-domain datasets, so the transfer effect of the network is not ideal.
Disclosure of Invention
In order to solve the technical problems that existing methods cannot deeply fuse the global and local features of pedestrians and do not fully consider the difference in data distribution during network fine-tuning, the invention provides a pedestrian re-identification method based on transfer learning and depth feature fusion.
The invention is realized by the following technical scheme.
The invention provides a pedestrian re-identification method based on transfer learning and depth feature fusion, which comprises the following steps:
(1) Pre-training: pre-train the ImageNet-based pre-trained model on pedestrian re-identification data to obtain a pedestrian re-identification pre-trained network model;
(2) Human posture correction and segmentation: select hard-to-distinguish sample pairs from a pedestrian dataset, input them into a human skeleton key-point detection network to detect fourteen key points, correct the human posture and segment local pedestrian ROIs, and obtain data-enhanced hard sample pairs together with corrected global and local images;
(3) Feature vectors: input the corrected global and local images and the data-enhanced hard sample pairs into the pedestrian re-identification pre-trained network model to obtain local and global pedestrian feature vectors;
(4) Depth feature fusion: deeply fuse the local and global pedestrian feature vectors to obtain the final pedestrian feature vector;
(5) Model training: fine-tune the pedestrian re-identification pre-trained network model by transfer learning with the final pedestrian feature vector from step (4), and add an adaptive layer to it to obtain the pedestrian re-identification network model;
(6) Model testing: input query and target pedestrian images and extract two discriminative global pedestrian feature vectors with the pedestrian re-identification network model;
(7) Recognition result: based on the global pedestrian feature vectors from step (6), compute the similarity between the query pedestrian and each image in the target pedestrian dataset; the pedestrian with the highest similarity is considered the same pedestrian.
In the training stage, the input to the pedestrian re-identification network model adopts triplet pedestrian images.
The step (1) is divided into the following steps:
(1.1) obtaining a deep convolutional network model pre-trained on the ImageNet dataset, and training it on pedestrian re-identification data;
(1.2) when pre-training the deep convolutional neural network model on the pedestrian re-identification data, fine-tuning it using only the sample annotation information.
The step (1.2) is divided into the following steps:
(1.2.1) removing the top fully connected layer from the ResNet50 network model pre-trained on the ImageNet dataset, and adding two fully connected layers and one softmax layer after the max-pooling layer;
(1.2.2) fine-tuning the constructed deep convolutional neural network using the label information of the annotated pedestrian images, with the first three layers of the network fixed during fine-tuning;
(1.2.3) obtaining the prediction probability of the global pedestrian image from the deep convolutional neural network;
(1.2.4) defining the loss function of the deep convolutional neural network from the prediction probability.
The corrected global and local images obtained in step (2) are the hard positive-and-negative-sample triplet pedestrian images, the posture-corrected global pedestrian image, and the local ROI images.
The step (2) is divided into the following steps:
(2.1) randomly selecting pedestrians of P IDs for each training batch and K different images per pedestrian, so that each batch contains P × K pedestrian images;
(2.2) taking each image in the training batch as an anchor sample H_n and selecting the hardest positive sample H_p and the hardest negative sample H_q to form a triplet with H_n, where the hard sample pair is chosen such that the positive-pair distance d(H_n, H_p) is maximal and the negative-pair distance d(H_n, H_q) is minimal;
(2.3) inputting the hard positive-and-negative-sample triplet pedestrian images into the human skeleton key-point detection network, detecting fourteen human skeleton key points covering the head, four limbs, upper body, and lower body, and correcting the human posture using the fourteen key points as coordinates;
(2.4) dividing the global pedestrian image into three local pedestrian ROI images (head, upper body, and lower body) according to the fourteen human skeleton key points, obtaining a corrected global pedestrian image and three local pedestrian images.
In step (2.2), using the deep convolutional neural network model pre-trained in step (1.1), the lowest-scoring pedestrian image among images with the same pedestrian ID as the anchor sample H_n is selected to form the hard positive pair, and the highest-scoring pedestrian image among images with different pedestrian IDs is selected to form the hard negative pair.
The step (3) is divided into the following steps:
(3.1) taking the pre-trained deep convolutional neural network model from step (1.1) and the corrected global and local images from step (2), and removing the top softmax layer and fully connected layer of the pre-trained model;
(3.2) inputting the data-enhanced hard sample pairs and the corrected global and local images into the deep convolutional neural network model constructed in step (3.1) to obtain the global and local pedestrian feature vectors.
The step (4) is divided into the following steps:
(4.1) inputting the local and global pedestrian feature vectors from step (3) into a fully connected layer for depth feature fusion, and outputting the fused pedestrian feature vector;
(4.2) inputting the fused pedestrian feature vector and the local pedestrian feature vectors from step (3.2) into a square layer, which measures the similarity between hard sample pairs by the squared Euclidean distance.
The step (6) is divided into the following steps:
(6.1) inputting the query and target pedestrian images into the human key-point and posture-correction network to correct the human posture;
and (6.2) inputting the pedestrian image with the corrected human body posture into a pedestrian re-recognition network model to obtain a pedestrian global feature vector.
The invention has the following beneficial effects: global and local pedestrian features are extracted by a deep convolutional neural network and deeply fused to obtain the final pedestrian feature representation; during training of the deep convolutional neural network, transfer learning is adopted to obtain a more effective pedestrian re-identification network model; the features extracted by this model are therefore more discriminative, which improves the accuracy of pedestrian re-identification.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a network block diagram of a global feature and local feature depth fusion in accordance with an embodiment of the present invention;
fig. 3 is a network structure diagram of a deep feature fusion and local feature learning model based on a deep convolutional neural network according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described below, but the scope of the claimed invention is not limited to this description.
As shown in fig. 1, a pedestrian re-recognition method based on transfer learning and depth feature fusion comprises the following steps:
(1) Pre-training: pre-train the ImageNet-based pre-trained model on pedestrian re-identification data to obtain a pedestrian re-identification pre-trained network model;
(2) Human posture correction and segmentation: select hard-to-distinguish sample pairs from a pedestrian dataset, input them into a human skeleton key-point detection network to detect fourteen key points, correct the human posture and segment local pedestrian ROIs, and obtain data-enhanced hard sample pairs together with corrected global and local images;
(3) Feature vectors: input the corrected global and local images and the data-enhanced hard sample pairs into the pedestrian re-identification pre-trained network model to obtain local and global pedestrian feature vectors;
(4) Depth feature fusion: deeply fuse the local and global pedestrian feature vectors to obtain the final pedestrian feature vector;
(5) Model training: fine-tune the pedestrian re-identification pre-trained network model by transfer learning with the final pedestrian feature vector from step (4), and add an adaptive layer to it to obtain the pedestrian re-identification network model;
(6) Model testing: input query and target pedestrian images and extract two discriminative global pedestrian feature vectors with the pedestrian re-identification network model;
(7) Recognition result: based on the global pedestrian feature vectors from step (6), compute the similarity between the query pedestrian and each image in the target pedestrian dataset; the pedestrian with the highest similarity is considered the same pedestrian.
In the training stage, the input to the pedestrian re-identification network model adopts triplet pedestrian images.
The step (1) is divided into the following steps:
(1.1) obtaining a deep convolutional network model pre-trained on the ImageNet dataset, and training it on pedestrian re-identification data;
(1.2) when pre-training the deep convolutional neural network model on the pedestrian re-identification data, fine-tuning it using only the sample annotation information.
The step (1.2) is divided into the following steps:
(1.2.1) removing the top fully connected layer from the ResNet50 network model pre-trained on the ImageNet dataset, and adding two fully connected layers and one softmax layer after the max-pooling layer;
(1.2.2) fine-tuning the constructed deep convolutional neural network using the label information of the annotated pedestrian images, with the first three layers of the network fixed during fine-tuning;
(1.2.3) obtaining the prediction probability of the global pedestrian image from the deep convolutional neural network;
(1.2.4) defining the loss function of the deep convolutional neural network from the prediction probability.
The corrected global and local images obtained in step (2) are the hard positive-and-negative-sample triplet pedestrian images, the posture-corrected global pedestrian image, and the local ROI images.
The step (2) is divided into the following steps:
(2.1) randomly selecting pedestrians of P IDs for each training batch and K different images per pedestrian, so that each batch contains P × K pedestrian images;
(2.2) taking each image in the training batch as an anchor sample H_n and selecting the hardest positive sample H_p and the hardest negative sample H_q to form a triplet with H_n, where the hard sample pair is chosen such that the positive-pair distance d(H_n, H_p) is maximal and the negative-pair distance d(H_n, H_q) is minimal;
(2.3) inputting the hard positive-and-negative-sample triplet pedestrian images into the human skeleton key-point detection network, detecting fourteen human skeleton key points covering the head, four limbs, upper body, and lower body, and correcting the human posture using the fourteen key points as coordinates;
(2.4) dividing the global pedestrian image into three local pedestrian ROI images (head, upper body, and lower body) according to the fourteen human skeleton key points, obtaining a corrected global pedestrian image and three local pedestrian images.
In step (2.2), using the deep convolutional neural network model pre-trained in step (1.1), the lowest-scoring pedestrian image among images with the same pedestrian ID as the anchor sample H_n is selected to form the hard positive pair, and the highest-scoring pedestrian image among images with different pedestrian IDs is selected to form the hard negative pair.
The step (3) is divided into the following steps:
(3.1) taking the pre-trained deep convolutional neural network model from step (1.1) and the corrected global and local images from step (2), and removing the top softmax layer and fully connected layer of the pre-trained model;
(3.2) inputting the data-enhanced hard sample pairs and the corrected global and local images into the deep convolutional neural network model constructed in step (3.1) to obtain the global and local pedestrian feature vectors.
The step (4) is divided into the following steps:
(4.1) inputting the local and global pedestrian feature vectors from step (3) into a fully connected layer for depth feature fusion, and outputting the fused pedestrian feature vector;
(4.2) inputting the fused pedestrian feature vector and the local pedestrian feature vectors from step (3.2) into a square layer, which measures the similarity between hard sample pairs by the squared Euclidean distance.
The step (6) is divided into the following steps:
(6.1) inputting the query and target pedestrian images into the human key-point and posture-correction network to correct the human posture;
and (6.2) inputting the pedestrian image with the corrected human body posture into a pedestrian re-recognition network model to obtain a pedestrian global feature vector.
In summary, the invention exploits transfer learning and the adaptive-learning strengths of deep learning, and fuses the local and whole-body features of pedestrian images to obtain a network model that attends to local pedestrian features, thereby improving the accuracy of pedestrian re-identification.
Example 1
As described above, the pedestrian re-recognition method based on transfer learning and depth feature fusion comprises the following steps:
(1) pre-training: pre-training the pre-training model based on the ImageNet on the pedestrian re-recognition data to obtain a pedestrian re-recognition pre-training network model; the method comprises the following steps:
(1.1) acquiring a depth convolution network model trained in advance on an ImageNet data set, and training the depth convolution network model on pedestrian re-identification data;
(1.2) when the deep convolutional neural network model is pre-trained on the pedestrian re-recognition data, only sample marking information is utilized to finely tune the deep convolutional neural network model;
(1.2.1) removing the top full connection layer from the pre-trained ResNet50 network model on the ImageNet dataset, and adding two full connection layers and one softmax layer after the maximum pooling layer;
(1.2.2) fine-tuning the constructed deep convolutional neural network using the label information of the annotated pedestrian images, with the first three layers fixed during fine-tuning, since the features extracted by the first three layers of a deep convolutional neural network are usually textures, edges, and the like, and therefore have a degree of generality;
(1.2.3) obtaining the prediction probability y_i of the global pedestrian image from the deep convolutional neural network, expressed as:

$$y_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$$

where y_i is the probability that sample x belongs to the i-th class, z_i is the network output for class i, the denominator $\sum_{j=1}^{C} e^{z_j}$ is the normalization term, and C is the total number of classes;
(1.2.4) defining the loss function L_I of the deep convolutional neural network from the prediction probability, expressed as:

$$L_I = -\sum_{j=1}^{C} q_j \log y_j$$

where q_j is the label probability and C is the total number of classes.
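For illustration only (not part of the original disclosure), steps (1.2.1)-(1.2.4) can be sketched roughly as follows in PyTorch; the layer sizes, the 751-class setting, and all names are assumptions for a Market-1501-style configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

class ReIDPretrainNet(nn.Module):
    """ResNet50 with the top FC layer removed and two FC layers added (step 1.2.1);
    the softmax is folded into the cross-entropy loss below."""
    def __init__(self, num_classes=751):  # assumed: Market-1501 training identities
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop original FC
        self.fc1 = nn.Linear(2048, 2048)
        self.fc2 = nn.Linear(2048, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)           # pooled 2048-d global feature
        return self.fc2(torch.relu(self.fc1(f)))  # class logits

net = ReIDPretrainNet()

# Step 1.2.2: freeze the first three children (their features, textures and edges,
# are generic) and fine-tune the rest on the labelled pedestrian images.
for child in list(net.features.children())[:3]:
    for p in child.parameters():
        p.requires_grad = False

# Step 1.2.4: L_I = -sum_j q_j log y_j, i.e. cross-entropy over the softmax output.
criterion = nn.CrossEntropyLoss()
```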
(2) Human posture correction and segmentation: select hard-to-distinguish sample pairs from the pedestrian dataset, input them into the human skeleton key-point detection network to detect fourteen key points, then correct the human posture and segment local pedestrian ROIs, obtaining data-enhanced hard sample pairs and corrected global and local images; the corrected global and local images are the hard positive-and-negative-sample triplet pedestrian images, the posture-corrected global pedestrian image, and the local ROI images; the method specifically comprises the following steps:
(2.1) randomly selecting pedestrians of P IDs for each training batch and K different images per pedestrian, so that each batch contains P × K pedestrian images;
(2.2) taking each image in the training batch as an anchor sample H_n and selecting the hardest positive sample H_p and the hardest negative sample H_q to form a triplet with H_n, where the hard sample pair is chosen such that the positive-pair distance d(H_n, H_p) is maximal and the negative-pair distance d(H_n, H_q) is minimal;
Specifically, using the deep convolutional neural network model pre-trained in step (1.1), the lowest-scoring pedestrian image among images with the same pedestrian ID as the anchor sample H_n is selected to form the hard positive pair, and the highest-scoring pedestrian image among images with different pedestrian IDs is selected to form the hard negative pair (illustrated in the sketch following step (2.4));
(2.3) inputting the hard positive-and-negative-sample triplet pedestrian images into the human skeleton key-point detection network, detecting fourteen human skeleton key points covering the head, four limbs, upper body, and lower body, and correcting the human posture using the fourteen key points as coordinates;
(2.4) dividing the global pedestrian image into three local pedestrian ROI images (head, upper body, and lower body) according to the fourteen human skeleton key points, obtaining a corrected global pedestrian image and three local pedestrian images.
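For illustration only, the P × K sampling and hard-pair selection of steps (2.1)-(2.2) can be sketched as follows; the distance measure and all names are assumptions:

```python
import torch

def batch_hard_triplets(feats: torch.Tensor, pids: torch.Tensor):
    """feats: (P*K, d) feature vectors; pids: (P*K,) identity labels.
    Per anchor, returns the hardest positive index (maximal same-ID distance)
    and the hardest negative index (minimal different-ID distance)."""
    dist = torch.cdist(feats, feats).pow(2)        # squared Euclidean distances
    same = pids.unsqueeze(0) == pids.unsqueeze(1)  # same-ID mask
    pos = dist.masked_fill(~same, float('-inf'))   # keep only same-ID candidates
    pos.fill_diagonal_(float('-inf'))              # exclude the anchor itself
    neg = dist.masked_fill(same, float('inf'))     # keep only different-ID candidates
    return pos.argmax(dim=1), neg.argmin(dim=1)

# Example with an assumed P=4, K=16 batch of 128-d features:
feats = torch.randn(64, 128)
pids = torch.arange(4).repeat_interleave(16)
hard_pos, hard_neg = batch_hard_triplets(feats, pids)
```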
(3) Feature vector: inputting corrected global and local images and difficult-to-separate sample pairs with enhanced data into a pedestrian re-identification pre-training network model to obtain pedestrian local and global feature vectors; the method comprises the following steps:
(3.1) obtaining the pre-training depth convolutional neural network model in the step (1.1) and the global and local images corrected in the step (2), and removing a softmax layer and a full connection layer on the top layer of the pre-training depth convolutional neural network model;
(3.2) respectively inputting the data-enhanced hard sample pairs and the corrected global and local images into the deep convolutional neural network model constructed in step (3.1) to obtain the global pedestrian feature vector A and the local pedestrian feature vectors B_1, B_2, B_3, where B_1 is the head-region feature vector, B_2 is the upper-body-region feature vector, and B_3 is the lower-body feature vector.
Further, when the data-enhanced hard sample pairs are input into the deep convolutional neural networks in parallel, all the network branches propagate simultaneously and share weights.
(4) Depth feature fusion: depth feature fusion is carried out on the local feature vector and the global feature vector of the pedestrian, and a final pedestrian feature vector is obtained; the method comprises the following steps:
(4.1) inputting the pedestrian local and global feature vectors in the step (3) into a full-connection layer, and carrying out depth feature fusion to obtain an output fused pedestrian feature vector C;
(4.2) inputting the fused pedestrian feature vector C and the local pedestrian feature vectors B_1, B_2, B_3 from step (3.2) into a square layer, which measures the similarity between hard sample pairs by the squared Euclidean distance, expressed as:

$$d_{a,p} = \lVert f(a) - f(p) \rVert_2^2$$
$$d_{a,n} = \lVert f(a) - f(n) \rVert_2^2$$

where a is the anchor sample, p is the hardest positive sample, n is the hardest negative sample, f(·) denotes the extracted feature vector, d_{a,p} is the distance of the hard positive pair, and d_{a,n} is the distance of the hard negative pair.
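As an illustrative sketch of steps (4.1)-(4.2) (the dimensions and the use of concatenation before the fully connected layer are assumptions not stated in the text):

```python
import torch
import torch.nn as nn

class DepthFeatureFusion(nn.Module):
    """One FC layer that fuses the global vector A with local vectors B1-B3 (step 4.1)."""
    def __init__(self, global_dim=2048, local_dim=2048, out_dim=2048):
        super().__init__()
        self.fuse = nn.Linear(global_dim + 3 * local_dim, out_dim)

    def forward(self, A, B1, B2, B3):
        return self.fuse(torch.cat([A, B1, B2, B3], dim=1))  # fused vector C

def square_layer(x, y):
    """Squared Euclidean distance between paired feature vectors (step 4.2)."""
    return (x - y).pow(2).sum(dim=1)

fusion = DepthFeatureFusion()
A, B1, B2, B3 = (torch.randn(8, 2048) for _ in range(4))
C = fusion(A, B1, B2, B3)
d_ap = square_layer(C[:4], C[4:])  # e.g. four anchor-positive pair distances
```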
Preferably, so that the deep convolutional neural network extracts more discriminative pedestrian features and fully exploits the annotation information of the pedestrian samples, cross-entropy loss and triplet loss are both used during training: the deep convolutional neural network that fuses global and local features uses both loss functions, while the deep convolutional neural networks that extract the head, upper body, and lower body use only the triplet loss function;
Further, the deep convolutional neural network that fuses global and local features uses the cross-entropy loss and the TriHard loss, the latter expressed as:

$$L_{th} = \frac{1}{P \times K} \sum_{a} \left[ \max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \alpha \right]_+$$

where A is the set of samples with the same ID as the anchor sample a, B is the set of remaining samples with different IDs, L_th is the TriHard loss, L_I is the cross-entropy loss, α in L_th is a manually set margin parameter, q_j is the label probability, and C is the total number of classes;
Further, the deep convolutional neural networks that extract the head, upper body, and lower body of the pedestrian use the TriHard loss function and, by sharing weight parameters, make the deep convolutional neural network that extracts global pedestrian features pay more attention to discriminative local features, where the loss function L_th is:

$$L_{th} = \frac{1}{P \times K} \sum_{a} \left[ \max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \alpha \right]_+$$

where A is the set of samples with the same ID as the anchor sample a, B is the set of remaining samples with different IDs, and α is a manually set margin parameter;
Finally, the losses of the extracted depth-fused features and of the local features are weighted by their corresponding weights to form the total loss, and back-propagation over the whole network updates the network parameters;
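For illustration, the TriHard loss above can be written compactly as follows (the margin value is an assumption):

```python
import torch

def trihard_loss(feats, pids, margin=0.3):
    """Batch-hard triplet (TriHard) loss: for each anchor a, the maximal
    same-ID distance minus the minimal different-ID distance, plus the
    margin alpha, clamped at zero and averaged over the batch."""
    dist = torch.cdist(feats, feats).pow(2)
    same = pids.unsqueeze(0) == pids.unsqueeze(1)
    d_ap = dist.masked_fill(~same, float('-inf')).amax(dim=1)  # hardest positive
    d_an = dist.masked_fill(same, float('inf')).amin(dim=1)    # hardest negative
    return torch.clamp(d_ap - d_an + margin, min=0).mean()
```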
(5) Model training: fine-tune the pedestrian re-identification pre-trained network model by transfer learning with the final pedestrian feature vector from step (4), and add an adaptive layer to it to obtain the pedestrian re-identification network model; the adaptive layer is added to obtain a better transfer-learning effect by bringing the data distributions of the source and target domains closer, so that the pedestrian re-identification network model performs better;
Specifically, parameter learning of the multi-kernel MMD (MK-MMD) metric is added to the training of the deep convolutional neural network to measure the discrepancy between the source domain and the target domain, where the multi-kernel of the MK-MMD metric is expressed as:

$$K \triangleq \Big\{ k = \sum_{u=1}^{m} \beta_u k_u : \beta_u \ge 0, \ \forall u \Big\}$$

The distribution distance between the source domain and the target domain is expressed as:

$$d_k^2(p, q) = \big\lVert \mathbf{E}_p[\phi(x^s)] - \mathbf{E}_q[\phi(x^t)] \big\rVert_H^2$$

where φ(·) is a mapping of the original variables into a reproducing kernel Hilbert space, and the subscript H indicates that the distance is measured in the reproducing kernel Hilbert space (RKHS) into which φ(·) maps the data;
The optimization objective of the adaptive layer consists of a loss term and an adaptation term, expressed as:

$$\min_{\Theta} \ \frac{1}{n_a} \sum_{i=1}^{n_a} J\big(\theta(x_i^a), y_i^a\big) + \lambda \sum_{l=l_1}^{l_2} d_k^2\big(D_s^l, D_t^l\big)$$

where Θ denotes all weight and bias parameters of the network, which are the target parameters to be learned; l_1 to l_2 are the first and last layers of network adaptation (layers before l_1 are not adapted); n_a denotes the amount of all annotated data in the source and target domains; λ is a trade-off weight; and J(·) is the loss function;
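A minimal sketch of the MK-MMD term follows (here with fixed equal kernel weights β_u and assumed Gaussian bandwidths; the method additionally learns these parameters):

```python
import torch

def mk_mmd(source, target, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Multi-kernel MMD^2 estimate between source- and target-domain batches,
    using a sum of Gaussian kernels k_u with equal weights beta_u."""
    n_s = source.size(0)
    x = torch.cat([source, target], dim=0)
    d2 = torch.cdist(x, x).pow(2)
    k = sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas) / len(sigmas)
    k_ss = k[:n_s, :n_s].mean()    # source-source kernel mean
    k_tt = k[n_s:, n_s:].mean()    # target-target kernel mean
    k_st = k[:n_s, n_s:].mean()    # cross-domain kernel mean
    return k_ss + k_tt - 2 * k_st  # estimate of ||E_p[phi(x^s)] - E_q[phi(x^t)]||_H^2

# Added to the task loss of the adapted layers with a trade-off weight lambda:
# total = task_loss + lam * mk_mmd(source_feats, target_feats)
```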
Specifically, after removing the top softmax layer from the obtained pre-trained deep convolutional neural network, a pedestrian image is selected as input, a trained classifier is used to compute scores for the several topmost convolutional layers of the network, the layers before the highest-scoring layer are fixed, and the highest-scoring layer and the layers after it are fine-tuned;
(6) Model testing: using only the pedestrian re-identification network model, input the query and target pedestrian images to obtain two discriminative pedestrian feature vectors, from which the global pedestrian feature vectors are extracted; the method specifically comprises the following steps:
(6.1) inputting the query and target pedestrian images into the human key-point and posture-correction network to correct the human posture;
and (6.2) inputting the pedestrian image with the corrected human body posture into a pedestrian re-recognition network model to obtain a pedestrian global feature vector.
(7) Recognition result: and (3) calculating the similarity between the query pedestrian and any image in the target pedestrian data set based on the pedestrian global feature vector in the step (6), wherein the pedestrian with the highest similarity is considered to be the same pedestrian.
Further, the input of the pedestrian re-recognition network model in the training stage adopts a triplet (triplet) pedestrian image.
Example 2
As described above, the pedestrian re-recognition method based on transfer learning and depth feature fusion comprises the following steps:
step S1, pre-training a pre-training model based on ImageNet on pedestrian re-recognition data to obtain a pedestrian re-recognition pre-training network model;
step S11, a depth convolution network model trained in advance on an ImageNet data set is obtained, and training is carried out on pedestrian re-identification data;
step S12, when the deep convolutional neural network model is pre-trained on the pedestrian re-identification data, only sample labeling information is used for fine tuning the network model;
step S121, removing the full connection layer of the top layer from a pre-trained ResNet50 network model on an ImageNet data set, and adding two full connection layers and one softmax layer after the maximum pooling layer;
Further, the two added fully connected layers have parameters 1×1×2048 and 1×1×751 respectively, and the input images are 224×224; when pre-training ResNet50, gradient descent is used for iterative optimization, with the number of iterations set to 75, the learning rate initialized to 0.1, the weight-decay value set to 0.001, and 64 pedestrian samples input per batch;
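These hyperparameters correspond to a plain gradient-descent loop, sketched below for illustration (the optimizer variant and the random stand-in data are assumptions):

```python
import torch
from torch import nn, optim
from torchvision import models

model = models.resnet50(num_classes=751)  # stand-in for the modified network of step S121
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=0.001)
criterion = nn.CrossEntropyLoss()

# One illustrative batch of 64 pedestrian samples at 224x224 (random stand-in data).
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 751, (64,))

for it in range(75):  # number of iterations set to 75
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```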
and step S122, performing fine adjustment on the constructed deep convolutional neural network by using label information marked by the pedestrian image, and fixing the first three layers of the network in the fine adjustment process. Because the characteristics extracted from the first three layers of the convolutional neural network are usually textures, edges and the like, the characteristics have certain universality;
step S123, obtaining the prediction probability of the pedestrian global image according to the convolutional neural network, wherein the prediction probability is expressed as follows:
Figure GDA0004088048170000151
wherein y is i Representing the probability that sample x belongs to the i-th class,
Figure GDA0004088048170000152
is a normalization term, C is the total number of categories;
Preferably, C = 751 when training and testing on the Market-1501 database.
Step S124, setting the loss function of the convolutional neural network to L_I according to the prediction probability, expressed as:

$$L_I = -\sum_{j=1}^{C} q_j \log y_j$$

where q_j is the label probability and C is 751;
Step S2, in the training stage, triplet pedestrian images are adopted as the input of the network model. First, hard-to-distinguish sample pairs are selected from the pedestrian dataset and input into the human skeleton key-point detection network to detect fourteen key points; the human posture is then corrected and local pedestrian ROIs are segmented, yielding corrected global and local images;
Step S21, randomly select pedestrians of P IDs for each training batch and K different images per pedestrian, so that each batch contains P × K pedestrian images;
Specifically, in this embodiment 6 pedestrian IDs are selected, 16 different images are randomly chosen for each ID, and each batch contains 64 pedestrian images;
Step S22, take each image in the training batch as an anchor sample H_n and select the hardest positive sample H_p and the hardest negative sample H_q to form a triplet with H_n, where the hard sample pair is chosen such that the positive-pair distance d(H_n, H_p) is maximal and the negative-pair distance d(H_n, H_q) is minimal;
Further, using the convolutional neural network model pre-trained in step S1, the lowest-scoring pedestrian image among images with the same pedestrian ID as the anchor sample H_n is selected to form the hard positive pair, and the highest-scoring pedestrian image among images with different pedestrian IDs is selected to form the hard negative pair;
Step S23, inputting the triplet pedestrian images into the human skeleton key-point detection network, detecting fourteen human skeleton key points covering the head, four limbs, upper body, and lower body, and correcting the human posture using the fourteen key points as coordinates;
step S24, dividing the pedestrian global image into three pedestrian local ROI images of the head, the upper body and the lower body according to fourteen human skeleton key points, and further obtaining a corrected pedestrian global image and three pedestrian local images;
Step S3, inputting the hard sample pairs, after human posture correction and data enhancement, into the pre-trained network to obtain the local and global pedestrian feature vectors;
step S31, obtaining a deep convolutional neural network model pre-trained in step S1, obtaining a pedestrian global image and a pedestrian local image based on a refractory sample pair in step S2, and removing a softmax layer and a full-connection layer on the top layer of the pre-trained deep convolutional neural network;
In this embodiment, the added fully connected layer has parameters 1×1×751 and the input images are 224×224; gradient descent is used for iterative optimization when pre-training ResNet50, with the number of iterations set to 60, the learning rate initialized to 0.01 for the first 20 iterations and 0.001 for the remaining 40, the weight-decay value set to 0.0001, and 64 pedestrian samples input per batch;
Step S32, input the obtained hard sample pairs, each comprising a global pedestrian image and local pedestrian images, into the deep convolutional neural network constructed in step S31 to obtain the global pedestrian feature vector A and the local pedestrian feature vectors B_1, B_2, B_3, where B_1 is the head-region feature vector, B_2 is the upper-body-region feature vector, and B_3 is the lower-body feature vector;
Step S33, when the hard sample pairs are input into the deep convolutional neural networks in parallel, all the network branches propagate simultaneously and share weights;
Step S4, deeply fuse the obtained local and global pedestrian feature vectors to obtain the final pedestrian feature vector;
Step S41, obtain the global pedestrian feature vector A and the local pedestrian feature vectors B_1, B_2, B_3 through step S3, then input them into one fully connected layer for depth feature fusion and output the fused pedestrian feature vector C, as shown in FIG. 2;
Step S42, input the fused pedestrian feature vector C and the local feature vectors B_1, B_2, B_3 into a square layer, which measures the similarity between hard sample pairs by the squared Euclidean distance, expressed as:

$$d_{a,p} = \lVert f(a) - f(p) \rVert_2^2$$
$$d_{a,n} = \lVert f(a) - f(n) \rVert_2^2$$

where a is the anchor sample, p is the hardest positive sample, n is the hardest negative sample, f(·) denotes the extracted feature vector, d_{a,p} is the distance of the hard positive pair, and d_{a,n} is the distance of the hard negative pair;
Step S43, so that the deep convolutional neural network extracts more discriminative pedestrian features and fully exploits the annotation information of the pedestrian samples, cross-entropy loss and triplet loss are both used during training: the deep convolutional neural network that fuses global and local features uses both loss functions, while the deep convolutional neural networks that extract head, upper-body, and lower-body features use only the triplet loss function;
Step S431, the deep convolutional neural network that fuses global and local features uses the cross-entropy loss and the TriHard loss, the latter expressed as:

$$L_{th} = \frac{1}{P \times K} \sum_{a} \left[ \max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \alpha \right]_+$$

where A is the set of samples with the same ID as the anchor sample a, B is the set of remaining samples with different IDs, L_th is the TriHard loss, L_I is the cross-entropy loss, α in L_th is a manually set margin parameter, q_j is the label probability, and C is 751;
Step S432, the deep convolutional neural networks that extract the head, upper body, and lower body of the pedestrian use the TriHard loss function and, by sharing weight parameters, make the deep convolutional neural network that extracts global pedestrian features pay more attention to discriminative local features, where the loss function L_th is:

$$L_{th} = \frac{1}{P \times K} \sum_{a} \left[ \max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \alpha \right]_+$$

where A is the set of samples with the same ID as the anchor sample a, B is the set of remaining samples with different IDs, and α is a manually set margin parameter;
Step S433, finally, the losses of the extracted depth-fused features and of the local features are weighted by their corresponding weights to form the total loss, and back-propagation over the whole network updates the network parameters, as shown in FIG. 3;
Further, the per-branch network losses are weighted and combined as follows:

$$L_{total} = \alpha_1 p_c + \alpha_2 p_t + \alpha_3 p_{t_1} + \alpha_4 p_{t_2} + \alpha_5 p_{t_3}$$

where p_c is the cross-entropy loss of the extracted depth-fused features, and p_t, p_{t_1}, p_{t_2}, p_{t_3} are the TriHard losses of the extracted depth-fused feature, head feature, upper-body feature, and lower-body feature respectively; the weight factors α_1, α_2, α_3, α_4, α_5 are each set to 0.2;
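In code, this weighted combination is simply the following (loss values shown as random stand-ins; in training they come from the corresponding network branches):

```python
import torch

# p_c: cross-entropy loss of the depth-fused feature;
# p_t, p_t1, p_t2, p_t3: TriHard losses of the fused, head,
# upper-body, and lower-body features.
p_c, p_t, p_t1, p_t2, p_t3 = (torch.rand(1, requires_grad=True) for _ in range(5))

alphas = (0.2, 0.2, 0.2, 0.2, 0.2)  # all weight factors set to 0.2
total = sum(a * l for a, l in zip(alphas, (p_c, p_t, p_t1, p_t2, p_t3)))
total.backward()  # back-propagate over the whole network
```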
step S5, in the whole process of training the pedestrian re-identification network model, fine tuning is carried out on the pre-training network in a transfer learning mode, and a self-adaptive layer is added in the network;
Step S51, after removing the top softmax layer of the obtained pre-trained convolutional neural network, a pedestrian image is selected and input into the network, a trained classifier is used to compute scores for the several topmost convolutional layers of the network, the layers before the highest-scoring layer are fixed, and the highest-scoring layer and the subsequent network layers are fine-tuned;
Step S52, an adaptive layer is added among the fine-tuned network layers to obtain a better transfer-learning effect by bringing the data distributions of the source and target domains closer, so that the pedestrian re-identification network performs better. Parameter learning of the multi-kernel MMD (MK-MMD) metric is added to the training of the deep convolutional neural network to measure the discrepancy between the source domain and the target domain, where the multi-kernel of the MK-MMD metric is expressed as:

$$K \triangleq \Big\{ k = \sum_{u=1}^{m} \beta_u k_u : \beta_u \ge 0, \ \forall u \Big\}$$

The distribution distance between the source domain and the target domain is expressed as:

$$d_k^2(p, q) = \big\lVert \mathbf{E}_p[\phi(x^s)] - \mathbf{E}_q[\phi(x^t)] \big\rVert_H^2$$

where φ(·) is a mapping of the original variables into a reproducing kernel Hilbert space, and the subscript H indicates that the distance is measured in the reproducing kernel Hilbert space (RKHS) into which φ(·) maps the data;
Step S53, the optimization objective of the adaptive layer consists of a loss term and an adaptation term, expressed as:

$$\min_{\Theta} \ \frac{1}{n_a} \sum_{i=1}^{n_a} J\big(\theta(x_i^a), y_i^a\big) + \lambda \sum_{l=l_1}^{l_2} d_k^2\big(D_s^l, D_t^l\big)$$

where Θ denotes all weight and bias parameters of the network, which are the target parameters to be learned; l_1 to l_2 are the first and last layers of network adaptation (layers before l_1 are not adapted); n_a denotes the amount of all annotated data in the source and target domains; λ is a trade-off weight; and J(·) is the loss function;
Step S6, in the test stage, only the trained network model is used to extract global pedestrian feature vectors. Based on the network model trained in the above steps, the query and target pedestrian images are input to obtain two highly discriminative pedestrian feature vectors;
Step S61, the global pedestrian feature vector extracted by the deep convolutional neural network trained in the above steps is highly discriminative, so only the global pedestrian feature vector is extracted in the test stage of the model;
Step S62, input the query and target pedestrian images into the human key-point and posture-correction network to correct the human posture;
Step S7, based on the global pedestrian feature vectors, compute the similarity between the query pedestrian image and each image in the target pedestrian image dataset; the pedestrian with the highest similarity is considered the same pedestrian, giving the pedestrian re-identification result.
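Step S7 amounts to nearest-neighbour retrieval over the global feature vectors; a minimal sketch follows (Euclidean distance as the similarity measure is an assumption, consistent with the square layer above):

```python
import torch

def rank_gallery(query_feat: torch.Tensor, gallery_feats: torch.Tensor):
    """Return gallery indices sorted from most to least similar to the query.
    query_feat: (d,); gallery_feats: (N, d) global feature vectors."""
    dists = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)
    return torch.argsort(dists)  # smallest distance = highest similarity

query = torch.randn(2048)
gallery = torch.randn(1000, 2048)
ranking = rank_gallery(query, gallery)
best_match = ranking[0]  # rank-1 result: considered the same pedestrian
```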
Specifically, with the Market-1501 pedestrian database as the training and test sets, the pedestrian re-identification method based on transfer learning and depth feature fusion reaches a rank-1 accuracy of 85% and an mAP of 60%. Because the method adopts transfer learning and the deep fusion of global and local pedestrian features during training, it greatly improves the accuracy of pedestrian re-identification, which demonstrates its effectiveness.
In summary, a network model that deeply fuses local and global pedestrian features is trained by transfer learning. In the training stage, hard positive and negative sample pairs are selected from the pedestrian dataset and input into the human skeleton key-point detection network; fourteen key points are detected, the pedestrian posture is corrected, and the image is divided by the fourteen key points into three pedestrian sub-regions. The pedestrian training images containing the hard positive and negative sample pairs are input into the pre-trained network, with each input sample expanded into a whole-body pedestrian image and three sub-region images, to obtain the local and global pedestrian feature vectors. The three local pedestrian feature vectors are input together with the global pedestrian feature vector into one fully connected layer to obtain the depth-fused pedestrian feature vector. During training of the pedestrian re-identification network, the pre-trained network is fine-tuned by transfer learning, and an adaptive layer is added at its top to adapt the source and target domains, bringing their data distributions closer and improving the pedestrian re-identification network. In the test stage, only the global feature network model is used: the query and target pedestrian images are input into the global feature extraction network model to obtain two global feature vectors, and the similarity between the query and target pedestrians is then computed to obtain the recognition result.

Claims (6)

1. A pedestrian re-identification method based on transfer learning and depth feature fusion, characterized by comprising the following steps:
(1) pre-training: pre-training the pre-training model based on the ImageNet on the pedestrian re-recognition data to obtain a pedestrian re-recognition pre-training network model;
(2) human posture correction and segmentation: selecting hard-to-distinguish sample pairs from a pedestrian dataset, inputting them into a human skeleton key-point detection network, detecting fourteen key points, correcting the human posture and segmenting local pedestrian ROIs, and obtaining data-enhanced hard sample pairs and corrected global and local images;
(3) feature vector: inputting corrected global and local images and difficult-to-separate sample pairs with enhanced data into a pedestrian re-identification pre-training network model to obtain pedestrian local and global feature vectors;
(4) depth feature fusion: depth feature fusion is carried out on the local feature vector and the global feature vector of the pedestrian, and a final pedestrian feature vector is obtained;
(5) training a model: fine tuning the pedestrian re-recognition pre-training network model by adopting a transfer learning mode and the final pedestrian feature vector in the step (4), and adding a self-adaptive layer into the pedestrian re-recognition pre-training network model to obtain the pedestrian re-recognition network model;
(6) test model: inputting inquiry pedestrian and target pedestrian images, and extracting two distinguishable pedestrian global feature vectors by using a pedestrian re-recognition network model;
(7) recognition result: based on the pedestrian global feature vector in the step (6), calculating the similarity between the query pedestrian and any one image in the target pedestrian data set, wherein the pedestrian with the highest similarity is considered to be the same pedestrian;
the step (1) is divided into the following steps:
(1.1) acquiring a depth convolution network model trained in advance on an ImageNet data set, and training the depth convolution network model on pedestrian re-identification data;
(1.2) when the deep convolutional neural network model is pre-trained on the pedestrian re-recognition data, only sample marking information is utilized to finely tune the deep convolutional neural network model;
the step (1.2) is divided into the following steps:
(1.2.1) removing the top full connection layer from the pre-trained ResNet50 network model on the ImageNet dataset, and adding two full connection layers and one softmax layer after the maximum pooling layer;
(1.2.2) finely adjusting the constructed deep convolutional neural network by using label information marked by pedestrian images, and fixing the first three layers of the deep convolutional neural network in the fine adjustment process;
(1.2.3) obtaining the prediction probability of the pedestrian global image according to the deep convolutional neural network;
(1.2.4) defining a loss function in the deep convolutional neural network according to the predictive probability;
the step (2) is divided into the following steps:
(2.1) randomly selecting P ID pedestrians from each training batch, randomly selecting K different images from each pedestrian, wherein each batch contains P multiplied by K pedestrian images;
(2.2) taking each image in the training batch as an anchor sample H_n and selecting the hardest positive sample H_p and the hardest negative sample H_q to form a triplet with H_n, wherein the hard sample pair is chosen such that the positive-pair distance d(H_n, H_p) is maximal and the negative-pair distance d(H_n, H_q) is minimal;
(2.3) inputting the pedestrian images of the positive and negative sample triplets which are difficult to separate into a human skeleton key point detection network, respectively detecting fourteen human skeleton key points, including a head, four limbs, an upper half body and a lower half body, and correcting the human body posture by taking the fourteen key points as coordinates;
(2.4) dividing the pedestrian global image into three pedestrian local ROI images of the head, the upper body and the lower body according to fourteen human skeleton key points, and obtaining a corrected pedestrian global image and three pedestrian local images;
the step (6) is divided into the following steps:
(6.1) inputting the query and target pedestrian images into the human key point and posture correction network to correct the human body posture;
(6.2) inputting the posture-corrected pedestrian images into the pedestrian re-recognition network model to obtain the global pedestrian feature vectors;
in the step (5), parameter learning of the multi-kernel MMD (maximum mean discrepancy) metric is added to the training of the deep convolutional neural network to measure the difference between the source domain and the target domain, wherein the multi-kernel of the MK-MMD metric is expressed as:

$$\mathcal{K} \triangleq \Big\{ k = \sum_{u=1}^{m} \beta_u k_u : \beta_u \ge 0,\ \forall u \Big\}$$

the distribution distance between the source domain and the target domain is expressed as:

$$d_k^2(p, q) = \big\| \mathbf{E}_p[\phi(x^s)] - \mathbf{E}_q[\phi(x^t)] \big\|_{\mathcal{H}}^2$$

wherein φ(·) is the mapping that sends the original variables into the reproducing kernel Hilbert space, and the subscript H indicates that the distance is measured after the data are mapped by φ(·) into the RKHS;

the optimization objective of the adaptive layer consists of the loss function and the adaptation loss, expressed as:

$$\min_{\Theta}\ \frac{1}{n_a} \sum_{i=1}^{n_a} J\big(\theta(x_i^a), y_i^a\big) + \lambda \sum_{l=l_1}^{l_2} d_k^2\big(\mathcal{D}_s^l, \mathcal{D}_t^l\big)$$

wherein Θ denotes all the weight and bias parameters of the network, which are the parameters to be learned; l_1 to l_2 are the first and last adapted layers of the network (layers before l_1 are not adapted); λ is the penalty weight of the adaptation term; n_a denotes the number of annotated samples from the source and target domains; and J(·) is the loss function (a multi-kernel MMD sketch follows).
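A hedged sketch of the multi-kernel MMD penalty described above, using a fixed family of Gaussian kernels with equal weights β_u; the bandwidth family is an assumption, the β_u are not learned here (the patent calls for parameter learning of the metric), and the biased MMD² estimator is used for brevity.

```python
import torch

def mk_mmd(source: torch.Tensor, target: torch.Tensor,
           bandwidths=(1.0, 2.0, 4.0, 8.0, 16.0)) -> torch.Tensor:
    """Biased MMD^2 estimate between source (n, d) and target (m, d) feature
    batches under an equal-weight combination of Gaussian kernels."""
    def kernel(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        d2 = torch.cdist(a, b).pow(2)  # squared Euclidean distances
        # k(a, b) = sum_u beta_u * exp(-d2 / (2 * sigma_u^2)) with beta_u = 1/m
        return sum(torch.exp(-d2 / (2.0 * s * s)) for s in bandwidths) / len(bandwidths)

    return (kernel(source, source).mean()
            + kernel(target, target).mean()
            - 2.0 * kernel(source, target).mean())
```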
2. The pedestrian re-recognition method based on transfer learning and depth feature fusion as claimed in claim 1, wherein: in the training stage, the input to the pedestrian re-recognition network model is the triplet pedestrian images.
3. The pedestrian re-recognition method based on transfer learning and depth feature fusion as claimed in claim 1, wherein: the corrected global and local images obtained in the step (2) are the pedestrian images of the hard positive and negative sample triplets, namely the posture-corrected global pedestrian images and the local ROI images.
4. The pedestrian re-recognition method based on transfer learning and depth feature fusion as claimed in claim 1, wherein: in the step (2.2), the deep convolutional neural network model pre-trained in step (1.1) scores the candidate samples: the pedestrian image with the lowest score among images of the same pedestrian ID as the anchor sample H_n is selected to form the hard positive pair, and the pedestrian image with the highest score among images of different pedestrian IDs is selected to form the hard negative pair with the anchor sample H_n.
5. The pedestrian re-recognition method based on transfer learning and depth feature fusion as claimed in claim 1, wherein: the step (3) is divided into the following steps:
(3.1) acquiring the pre-trained deep convolutional neural network model from step (1.1) and the corrected global and local images from step (2), and removing the softmax layer and the fully connected layer at the top of the pre-trained model;
(3.2) inputting the data-enhanced hard sample pairs and the corrected global and local images into the deep convolutional neural network model constructed in step (3.1) to obtain the global and local pedestrian feature vectors (as sketched below).
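Continuing the hypothetical build_pretrain_model helper from the earlier sketch, step (3.1) amounts to keeping only the backbone and discarding the FC/softmax head; the input resolution and batch size below are placeholders.

```python
import torch

model = build_pretrain_model()   # hypothetical helper from the earlier sketch
feature_extractor = model[0]     # (3.1): drop the FC + softmax head, keep the backbone
feature_extractor.eval()

with torch.no_grad():            # (3.2): images -> 2048-d feature vectors
    feats = feature_extractor(torch.randn(8, 3, 256, 128))  # dummy batch, shape (8, 2048)
```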
6. The pedestrian re-recognition method based on transfer learning and depth feature fusion as claimed in claim 1, wherein: the step (4) is divided into the following steps:
(4.1) inputting the local and global pedestrian feature vectors from step (3) into a fully connected layer for depth feature fusion, and outputting the fused pedestrian feature vector;
(4.2) inputting the fused pedestrian feature vector and the local pedestrian feature vectors from step (3.2) into a square layer, which measures the similarity between the hard sample pairs by the squared Euclidean distance (see the sketch below).
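An illustrative sketch of steps (4.1)-(4.2): the local and global feature vectors are fused through a fully connected layer, and a "square layer" compares a hard sample pair by squared Euclidean distance. All dimensions, and the exact pairing of inputs to the square layer, are assumptions.

```python
import torch
import torch.nn as nn

class FusionWithSquareLayer(nn.Module):
    def __init__(self, local_dim: int = 3 * 2048, global_dim: int = 2048,
                 fused_dim: int = 2048):
        super().__init__()
        # (4.1): depth feature fusion of concatenated local + global features
        self.fuse = nn.Linear(local_dim + global_dim, fused_dim)

    def forward(self, local_feats, global_feat, paired_feat):
        fused = self.fuse(torch.cat([local_feats, global_feat], dim=1))
        # (4.2): square layer - squared Euclidean distance to the paired sample
        sq_dist = (fused - paired_feat).pow(2).sum(dim=1)
        return fused, sq_dist
```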
CN201910329733.2A 2019-04-23 2019-04-23 Pedestrian re-recognition method based on transfer learning and depth feature fusion Active CN110163110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910329733.2A CN110163110B (en) 2019-04-23 2019-04-23 Pedestrian re-recognition method based on transfer learning and depth feature fusion

Publications (2)

Publication Number Publication Date
CN110163110A CN110163110A (en) 2019-08-23
CN110163110B (en) 2023-06-06

Family

ID=67639792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910329733.2A Active CN110163110B (en) 2019-04-23 2019-04-23 Pedestrian re-recognition method based on transfer learning and depth feature fusion

Country Status (1)

Country Link
CN (1) CN110163110B (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569779B (en) * 2019-08-28 2022-10-04 西北工业大学 Pedestrian attribute identification method based on pedestrian local and overall attribute joint learning
CN110533184B (en) * 2019-08-31 2023-01-06 南京人工智能高等研究院有限公司 Network model training method and device
CN110555420B (en) * 2019-09-09 2022-04-12 电子科技大学 Fusion model network and method based on pedestrian regional feature extraction and re-identification
CN110674881B (en) * 2019-09-27 2022-02-11 长城计算机软件与系统有限公司 Trademark image retrieval model training method, system, storage medium and computer equipment
CN110688976A (en) * 2019-10-09 2020-01-14 创新奇智(北京)科技有限公司 Store comparison method based on image identification
CN110705499B (en) * 2019-10-12 2020-06-02 成都考拉悠然科技有限公司 Crowd counting method based on transfer learning
CN110795580B (en) * 2019-10-23 2023-12-08 武汉理工大学 Vehicle weight identification method based on space-time constraint model optimization
CN112784630A (en) * 2019-11-06 2021-05-11 广东毓秀科技有限公司 Method for re-identifying pedestrians based on local features of physical segmentation
CN111126198B (en) * 2019-12-11 2023-05-09 中山大学 Pedestrian re-identification method based on deep representation learning and dynamic matching
CN112990424A (en) * 2019-12-17 2021-06-18 杭州海康威视数字技术股份有限公司 Method and device for training neural network model
CN111126599B (en) * 2019-12-20 2023-09-05 复旦大学 Neural network weight initialization method based on transfer learning
CN111126275B (en) * 2019-12-24 2023-05-05 广东省智能制造研究所 Pedestrian re-identification method and device based on multi-granularity feature fusion
CN111160295B (en) * 2019-12-31 2023-05-12 广州视声智能科技有限公司 Video pedestrian re-recognition method based on region guidance and space-time attention
CN111274922B (en) * 2020-01-17 2022-11-29 山东师范大学 Pedestrian re-identification method and system based on multi-level deep learning network
CN111401265B (en) * 2020-03-19 2020-12-25 重庆紫光华山智安科技有限公司 Pedestrian re-identification method and device, electronic equipment and computer-readable storage medium
CN111428650B (en) * 2020-03-26 2024-04-02 北京工业大学 Pedestrian re-recognition method based on SP-PGGAN style migration
CN111539257B (en) * 2020-03-31 2022-07-26 苏州科达科技股份有限公司 Person re-identification method, device and storage medium
CN111428675A (en) * 2020-04-02 2020-07-17 南开大学 Pedestrian re-recognition method integrated with pedestrian posture features
CN111582154A (en) * 2020-05-07 2020-08-25 浙江工商大学 Pedestrian re-identification method based on multitask skeleton posture division component
CN111696056B (en) * 2020-05-25 2023-05-02 五邑大学 Digital archive image correction method based on multitasking transfer learning
CN111881842A (en) * 2020-07-30 2020-11-03 深圳力维智联技术有限公司 Pedestrian re-identification method and device, electronic equipment and storage medium
CN111967389B (en) * 2020-08-18 2022-02-18 厦门理工学院 Face attribute recognition method and system based on deep double-path learning network
CN112132200A (en) * 2020-09-17 2020-12-25 山东大学 Lithology identification method and system based on multi-dimensional rock image deep learning
CN112183438B (en) * 2020-10-13 2022-11-04 深圳龙岗智能视听研究院 Image identification method for illegal behaviors based on small sample learning neural network
CN112989911A (en) * 2020-12-10 2021-06-18 奥比中光科技集团股份有限公司 Pedestrian re-identification method and system
CN112990144B (en) * 2021-04-30 2021-08-17 德鲁动力科技(成都)有限公司 Data enhancement method and system for pedestrian re-identification
CN113221770A (en) * 2021-05-18 2021-08-06 青岛根尖智能科技有限公司 Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN113379627B (en) * 2021-06-07 2023-06-27 北京百度网讯科技有限公司 Training method of image enhancement model and method for enhancing image
CN113377991B (en) * 2021-06-10 2022-04-15 电子科技大学 Image retrieval method based on most difficult positive and negative samples
CN113378729A (en) * 2021-06-16 2021-09-10 西安理工大学 Pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method
CN113591864B (en) * 2021-07-28 2023-04-07 北京百度网讯科技有限公司 Training method, device and system for text recognition model framework
CN114863138B (en) * 2022-07-08 2022-09-06 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3532993A4 (en) * 2016-10-25 2020-09-30 Deep North, Inc. Point to set similarity comparison and deep feature learning for visual recognition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766791A (en) * 2017-09-06 2018-03-06 北京大学 A kind of pedestrian based on global characteristics and coarseness local feature recognition methods and device again
CN107832672A (en) * 2017-10-12 2018-03-23 北京航空航天大学 A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information
CN108334849A (en) * 2018-01-31 2018-07-27 中山大学 A kind of recognition methods again of the pedestrian based on Riemann manifold
CN109002761A (en) * 2018-06-13 2018-12-14 中山大学新华学院 A kind of pedestrian's weight identification monitoring system based on depth convolutional neural networks
CN109271870A (en) * 2018-08-21 2019-01-25 平安科技(深圳)有限公司 Pedestrian recognition methods, device, computer equipment and storage medium again
CN109446898A (en) * 2018-09-20 2019-03-08 暨南大学 A kind of recognition methods again of the pedestrian based on transfer learning and Fusion Features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"GLAD: Global-Local-Alignment Descriptor for Scalable Person Re-Identification";Longhui Wei;《IEEE Transactions on Multimedia》;20180914;986-999页 *
"基于深度学习的行人再识别研究综述";徐梦洋;《计算机科学》;20181031;119-122页 *

Similar Documents

Publication Publication Date Title
CN110163110B (en) Pedestrian re-recognition method based on transfer learning and depth feature fusion
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
US20200285896A1 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
CN107862705B (en) Unmanned aerial vehicle small target detection method based on motion characteristics and deep learning characteristics
CN109598268A (en) A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN109190446A (en) Pedestrian's recognition methods again based on triple focused lost function
CN113033520B (en) Tree nematode disease wood identification method and system based on deep learning
CN109284767B (en) Pedestrian retrieval method based on augmented sample and multi-flow layer
CN112347888A (en) Remote sensing image scene classification method based on bidirectional feature iterative fusion
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN108960142B (en) Pedestrian re-identification method based on global feature loss function
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN112966647A (en) Pedestrian re-identification method based on layer-by-layer clustering and enhanced discrimination
JP2022082493A (en) Pedestrian re-identification method for random shielding recovery based on noise channel
CN112966740A (en) Small sample hyperspectral image classification method based on core sample adaptive expansion
CN112070010B (en) Pedestrian re-recognition method for enhancing local feature learning by combining multiple-loss dynamic training strategies
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN115830643B (en) Light pedestrian re-recognition method based on posture guiding alignment
CN112446305A (en) Pedestrian re-identification method based on classification weight equidistant distribution loss model
CN113762009A (en) Crowd counting method based on multi-scale feature fusion and double-attention machine mechanism
CN113763417B (en) Target tracking method based on twin network and residual error structure
CN115376159A (en) Cross-appearance pedestrian re-recognition method based on multi-mode information
Liu et al. A novel deep transfer learning method for sar and optical fusion imagery semantic segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant