CN109271895B - Pedestrian re-identification method based on multi-scale feature learning and feature segmentation


Info

Publication number
CN109271895B
CN109271895B (application CN201811007656.0A)
Authority
CN
China
Prior art keywords
pedestrian
feature
image
layer
module
Prior art date
Legal status
Active
Application number
CN201811007656.0A
Other languages
Chinese (zh)
Other versions
CN109271895A (en)
Inventor
何立火
邢志伟
高新波
王智康
路文
李琪琦
张怡
钟炎喆
武天妍
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN201811007656.0A
Publication of CN109271895A
Application granted
Publication of CN109271895B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion

Abstract

A pedestrian re-identification method based on multi-scale feature learning and feature segmentation, which mainly solves two problems of the prior art: poor representation caused by using only two scales, and errors caused by inaccurate extraction of human body features through human body part recognition. The method comprises the following specific steps: (1) constructing a multi-scale feature learning module; (2) constructing a feature segmentation module; (3) constructing a feature learning network; (4) preprocessing a video containing pedestrians; (5) training the feature learning network; (6) calculating the feature distance; (7) obtaining matching images. The invention uses the multi-scale feature learning module to extract multi-scale features of the pedestrian image and uses the feature segmentation module to extract global features and local features at coarse and fine granularities; the extracted features are highly distinguishable and robust, so pedestrian re-identification reaches higher precision.

Description

Pedestrian re-identification method based on multi-scale feature learning and feature segmentation
Technical Field
The invention belongs to the technical field of image processing, and more specifically relates to a pedestrian re-identification method based on multi-scale feature learning and feature segmentation within the field of image recognition. The invention can be used to determine whether pedestrian images obtained from the surveillance videos of different cameras at different angles show the same pedestrian.
Background
With the continuous development of society, public safety has become a topic of widespread concern. Large numbers of surveillance cameras are installed in public places and generate massive volumes of video data every day, so intelligent analysis of these data has become a hot research topic. Pedestrian re-identification compares a target pedestrian appearing under one camera with all pedestrians under the other cameras in a surveillance network, identifies the target accurately and rapidly, and finds all images of the target pedestrian under all cameras, thereby enabling cross-camera tracking and positioning of the target. Pedestrian re-identification judges whether two pedestrian images show the same pedestrian by comparing them; however, because different cameras have different angles, surveillance scenes are complex, and background and illumination vary, the captured postures and appearances of pedestrians also differ. In addition, occlusion between pedestrians, or between pedestrians and other objects, to different degrees poses great challenges to pedestrian re-identification.
Shanghai Jiao Tong University proposed a multi-scale feature-fused pedestrian comparison method in its patent document "a multi-scale feature-fused pedestrian comparison method" (patent application No. 201410635897.5, application publication No. CN 104376334A). The method is implemented as follows: establish a pedestrian set; extract color and contour features from the low-scale image and fuse them by cascading to obtain low-scale features; perform semi-supervised SVM learning on the low-scale features and carry out a first round of pedestrian comparison screening to obtain a candidate pedestrian set; using the high-scale image, calculate the similarity between each pedestrian in the screened candidate set and the target pedestrian with a comparison algorithm based on local feature points; and superpose the pedestrian similarities at the two scales to obtain the final ranking of the candidates in the screened candidate pedestrian set. The disadvantage of this method is that it extracts only two manually designed scale features, high scale and low scale; with so few scales the representation is poor, the manually designed features lack universality, and targets are easily missed.
China Jiliang University proposed a pedestrian re-identification method based on transfer learning in the patent document "pedestrian re-identification method based on transfer learning" (patent application No. 201510445055.8, application publication No. CN 105095870A). The method comprises the following steps: segment the pedestrian foreground, extract pedestrian features, learn a source-domain model, transfer it to the target domain, and measure pedestrian distances. First, a pedestrian target is selected in the video with the GrabCut algorithm; then the human body is divided into five regions (head, left and right upper limbs, and left and right legs) with a human body symmetry model, and color, edge and texture features are extracted; a neural network model is trained and optimized with pedestrian data of the source domain; on the basis of these model parameters, transfer learning is performed with target-domain data; finally, pedestrians are compared with the neural network model refined on the target domain to obtain a ranking by pedestrian distance, giving the pedestrian re-identification result. The disadvantage of this method is that pedestrian postures vary greatly in video surveillance, so the accuracy of recognizing human body parts with the symmetry model is hard to guarantee; the extracted regional features of the head, limbs and so on are therefore inaccurate, and the resulting errors lower the accuracy of the re-identification result.
Disclosure of Invention
Aiming at the defects of the prior art described above, the invention provides a pedestrian re-identification method based on multi-scale feature learning and feature segmentation.
The idea for realizing the purpose of the invention is to construct a multi-scale feature learning network to extract multi-scale features of the pedestrian image, to construct a feature segmentation module and use it to further extract, from the features at each scale, the global features and the local features at coarse and fine granularities, and to adaptively fuse the global and local features across all scales, so that the extracted features are more distinguishable and more robust, thereby improving the precision of the algorithm.
The method comprises the following specific steps:
(1) constructing a multi-scale feature learning module:
(1a) an 11-layer multi-scale feature learning module is built with the following sequential structure: input layer → convolutional layer → max pooling layer → eight hourglass modules; each hourglass module consists of ten serially connected residual blocks, where the output of the first residual block is connected to the input of the tenth, the output of the second to the input of the ninth, the output of the third to the input of the eighth, and the output of the fourth to the input of the seventh;
(1b) setting parameters of each module of the multi-scale feature learning module;
(2) constructing a feature segmentation module:
(2a) eight 4-layer feature segmentation modules are built, each with the following sequential structure: feature segmentation layer → global pooling fusion layer → full convolution layer → SoftMax classification layer;
(2b) the parameters of each layer of the feature segmentation module are set as follows: the pooling fusion layer outputs 1792 feature maps, and the full convolution layer outputs 256 feature maps;
(3) constructing a feature learning network:
connecting the output of each hourglass module in the multi-scale feature learning module with the input of each feature segmentation module in a one-to-one manner, and connecting the outputs of the seventh, eighth, ninth and tenth residual blocks in each hourglass module with the input of each feature segmentation module in a four-to-one manner;
(4) preprocessing a video containing pedestrians:
(4a) extracting continuous video images containing multiple pedestrians from the video shot by a camera, selecting one frame from each video image, cutting out the image of the area occupied by each pedestrian in each frame, forming a pedestrian image set A from all the cut-out images, and uniformly resizing the pedestrian images in set A to 384×124 pixels;
(4b) labeling all images of the same pedestrian in set A with the same class as their real label, each class containing at least one pedestrian image, and forming the pedestrian image training set from all labeled pedestrian images;
(4c) extracting continuous video images containing multiple pedestrians from the video shot by a camera, selecting one frame from each video image, cutting out the image of the area occupied by each pedestrian in each frame, forming a pedestrian image set B from all the cut-out images, and uniformly resizing the pedestrian images in set B to 384×124 pixels;
(4d) randomly selecting one pedestrian image from set B as the query target pedestrian image, and taking the remaining images in set B as the candidate pedestrian image set;
(5) training a feature learning network:
(5a) inputting the pedestrian image training set into the feature learning network, taking the probability distribution output by the SoftMax classification layer of the eighth feature segmentation module as the predicted probability distribution of each pedestrian image, and taking the class with the maximum predicted probability as the predicted label of that image;
(5b) calculating the cross entropy between the predicted label of each pedestrian image in the training set and its corresponding real label using the label smoothing cross entropy formula, and taking the sum of all cross entropies as the loss value of the feature learning network;
(5c) training the feature learning network using the stochastic gradient descent method;
(6) calculating the feature distance:
(6a) inputting the query target pedestrian image and each image in the candidate pedestrian image set into the feature learning network, and taking the feature map output by the full convolution layer of the eighth feature segmentation module as each pedestrian image's feature;
(6b) calculating the feature distance between the query target pedestrian's image feature and each candidate pedestrian's image feature using the Euclidean distance formula;
(7) obtaining a matching image:
and sequencing the pedestrian images in the pedestrian candidate set according to the ascending order of the characteristic distance, and taking the first 20 images as matching images for pedestrian re-identification.
Compared with the prior art, the invention has the following advantages:
First, because the invention constructs a multi-scale feature learning module to extract multi-scale features of the pedestrian image, features at multiple scales represent the pedestrian image more fully at different resolutions. This overcomes the poor representation caused by using only two scales and the low universality of manually designed features in the prior art, so the invention has good feature representation and high universality.
Second, the invention constructs a feature segmentation module to extract, at each scale, the global features and the local features at two different granularities, coarse and fine, making full use of the global and local information of the pedestrian image. This overcomes the errors caused by inaccurate human body part recognition in the prior art, so the invention has low complexity and high recognition precision.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic view of an hourglass module of the present invention;
FIG. 3 is a simulation diagram of the present invention.
Detailed Description
The implementation steps of the present invention are described in further detail with reference to FIG. 1.
Step 1, constructing a multi-scale feature learning module.
An 11-layer multi-scale feature learning module is built with the following sequential structure: input layer → convolutional layer → max pooling layer → eight hourglass modules; each hourglass module consists of ten serially connected residual blocks, where the output of the first residual block is connected to the input of the tenth, the output of the second to the input of the ninth, the output of the third to the input of the eighth, and the output of the fourth to the input of the seventh.
the residual block is nine layers, and the structure thereof is as follows in sequence: the first batch normalization layer → the first ReLU layer → the first convolution layer → the second batch normalization layer → the second convolution layer → the third batch normalization layer → the third ReLU layer → the third convolution layer, the first batch normalization layer and the third convolution layer are connected to form the output of the residual block; the feature maps of the first convolutional layer in the residual block are set to 64, the convolutional kernel size is set to 3 × 3 pixels, the step size is set to 1 pixel, the feature map of the second convolutional layer in the residual block is set to 256, the convolutional kernel size is set to 1 × 1 pixel, the step size is set to 1 pixel, the feature map of the third convolutional layer in the residual block is set to 256, the convolutional kernel size is set to 3 × 3 pixels, and the step size is set to 1 pixel.
The parameters of each module of the multi-scale feature learning module are set as follows: the total number of feature maps of the convolutional layer is set to 64, with a 7×7-pixel convolution kernel and a step size of 2 pixels; the step size of the max pooling layer is set to 2 pixels.
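Continuing the sketch, the hourglass module (ten chained residual blocks with the skip pattern 1→10, 2→9, 3→8, 4→7) and the stem of the multi-scale feature learning module could look as follows. Merging each skip with the main path by addition, and the 1×1 projection to 256 channels after the stem, are assumptions; the patent specifies only the connections and the stem parameters.

```python
class Hourglass(nn.Module):
    """Ten serial residual blocks; outputs of blocks 1-4 feed the inputs of
    blocks 10-7 respectively. Skips are merged by addition (an assumption)."""
    def __init__(self, ch=256):
        super().__init__()
        self.blocks = nn.ModuleList(ResidualBlock(ch) for _ in range(10))

    def forward(self, x):
        early = []
        for i in range(6):                        # blocks 1-6
            x = self.blocks[i](x)
            early.append(x)
        taps = []
        for i in range(6, 10):                    # blocks 7-10 receive skip inputs
            x = self.blocks[i](x + early[9 - i])  # 7<-4, 8<-3, 9<-2, 10<-1
            taps.append(x)
        return x, taps                            # taps feed one feature segmentation module

class MultiScaleFeatureNet(nn.Module):
    """Stem (7x7 conv, 64 maps, step 2 -> max pool, step 2) -> eight hourglass modules."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.proj = nn.Conv2d(64, 256, kernel_size=1)  # channel lift to 256, assumed
        self.hourglasses = nn.ModuleList(Hourglass(256) for _ in range(8))

    def forward(self, x):
        x = self.proj(self.stem(x))
        all_taps = []
        for hg in self.hourglasses:
            x, taps = hg(x)
            all_taps.append(taps)
        return all_taps                            # one group of four maps per hourglass
```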
Step 2, constructing a feature segmentation module.
Eight 4-layer feature segmentation modules are built, each with the following sequential structure: feature segmentation layer → global pooling fusion layer → full convolution layer → SoftMax classification layer.
The parameters of each layer of the feature segmentation module are set as follows: the pooling fusion layer outputs 1792 feature maps, and the full convolution layer outputs 256 feature maps.
Step 3, constructing a feature learning network.
The output of each hourglass module in the multi-scale feature learning module is connected to the input of one feature segmentation module in a one-to-one manner; specifically, the outputs of the seventh, eighth, ninth and tenth residual blocks of each hourglass module are connected, four-to-one, to the input of the corresponding feature segmentation module.
The cascade of the hourglass modules and the feature segmentation modules is described in further detail with reference to FIG. 2.
The ten rectangles in FIG. 2 represent the ten residual blocks that make up an hourglass module; the rounded rectangle represents a feature segmentation module.
The outputs of the seventh, eighth, ninth and tenth residual blocks serve as the outputs of each hourglass module, and the output of each hourglass module is connected to the input of one feature segmentation module.
The feature maps output by the seventh, eighth, ninth and tenth residual blocks of each hourglass module are input into the feature segmentation module. In the feature segmentation layer, each input feature map is divided horizontally into 1 part, 2 parts and 4 parts, which are input into the global pooling fusion layer. In the global pooling fusion layer, a global maximum pooling operation and a global average pooling operation are applied to the feature maps respectively, and the feature maps output by the two pooling operations are added to form the output of the pooling fusion layer.
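A minimal sketch of one feature segmentation module following this description is given below. How the four incoming feature maps from residual blocks 7-10 are fused before splitting is not stated in the patent, so summing them (which also makes the stripe count 1+2+4 = 7 consistent with the 1792 = 7×256 output maps of the pooling fusion layer) and the class count `num_classes` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureSegmentation(nn.Module):
    """Feature segmentation layer -> global pooling fusion layer ->
    full convolution layer -> SoftMax classification layer."""
    def __init__(self, ch=256, num_classes=751):  # num_classes is an assumption
        super().__init__()
        self.fullconv = nn.Conv2d(7 * ch, 256, kernel_size=1)  # 1792 -> 256 maps
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, taps):
        x = sum(taps)                              # fuse blocks 7-10 (assumed: addition)
        stripes = [x]                              # 1 part
        stripes += torch.chunk(x, 2, dim=2)        # 2 horizontal parts
        stripes += torch.chunk(x, 4, dim=2)        # 4 horizontal parts
        pooled = [F.adaptive_max_pool2d(s, 1) + F.adaptive_avg_pool2d(s, 1)
                  for s in stripes]                # global max + avg pooling, added
        feat = self.fullconv(torch.cat(pooled, dim=1))  # 7 x 256 = 1792 -> 256
        logits = self.classifier(feat.flatten(1))
        return feat.flatten(1), F.softmax(logits, dim=1)
```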
Step 4, preprocessing the video containing pedestrians.
Extract continuous video images containing multiple pedestrians from the video shot by a camera, select one frame from each video image, cut out the image of the area occupied by each pedestrian in each frame, form a pedestrian image set A from all the cut-out images, and uniformly resize the pedestrian images in set A to 384×124 pixels.
Label all images of the same pedestrian in set A with the same class as their real label, each class containing at least one pedestrian image, and form the pedestrian image training set from all labeled pedestrian images.
Extract continuous video images containing multiple pedestrians from the video shot by a camera, select one frame from each video image, cut out the image of the area occupied by each pedestrian in each frame, form a pedestrian image set B from all the cut-out images, and uniformly resize the pedestrian images in set B to 384×124 pixels.
Randomly select one pedestrian image from set B as the query target pedestrian image, and take the remaining images in set B as the candidate pedestrian image set.
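As an illustration of this preprocessing, the sketch below crops one pedestrian's region from a frame and resizes it to the uniform 384×124 size; `frame_path` and `box` are hypothetical placeholders for a frame file and a pedestrian bounding box.

```python
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.Resize((384, 124)),   # uniform (height, width) in pixels
    transforms.ToTensor(),
])

def crop_pedestrian(frame_path, box):
    """box = (left, upper, right, lower) of the area occupied by one pedestrian."""
    frame = Image.open(frame_path).convert("RGB")
    return to_tensor(frame.crop(box))  # one image for set A or set B
```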
Step 5, training the feature learning network.
Input the pedestrian image training set into the feature learning network, take the probability distribution output by the SoftMax classification layer of the eighth feature segmentation module as the predicted probability distribution of each pedestrian image, and take the class with the maximum predicted probability as the predicted label of that image.
Calculate the cross entropy between the predicted label of each pedestrian image in the training set and its corresponding real label using the label smoothing cross entropy formula, and take the sum of all cross entropies as the loss value of the feature learning network.
The cross entropy formula for label smoothing is as follows:
$$L_h = -\sum_{k=1}^{K}\left[(1-\varepsilon)\,q_h(k) + \frac{\varepsilon}{K}\right]\log_2 p_h(k)$$

wherein $L_h$ denotes the cross entropy between the predicted label of the h-th pedestrian image in the training set and its corresponding real label; $K$ denotes the total number of pedestrian image classes in the training set; $\sum$ denotes summation; $k$ denotes the serial number of a pedestrian image class ($1 \le k \le K$); $\varepsilon$ denotes a smoothing parameter with value 0.1; $q_h(k)$ denotes the probability that the real label of the h-th pedestrian image is class $k$, taking the value 1 if the real label is $k$ and 0 otherwise; $\log_2$ denotes the base-2 logarithm; and $p_h(k)$ denotes the predicted probability that the h-th pedestrian image belongs to class $k$.
Train the feature learning network using the stochastic gradient descent method.
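A minimal sketch of steps 5(b) and 5(c): the label smoothing cross entropy from the formula above and a stochastic gradient descent loop. It assumes `net` returns the classification logits of the eighth feature segmentation module and that `loader` yields (image, label) batches; the learning rate, momentum and epoch count are assumptions, since the patent only names the method.

```python
import math
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, num_classes, eps=0.1):
    """L_h = -sum_k [(1 - eps) * q_h(k) + eps / K] * log2 p_h(k), summed over the batch."""
    log2_p = F.log_softmax(logits, dim=1) / math.log(2.0)      # base-2 log-probabilities
    q = torch.full_like(log2_p, eps / num_classes)             # eps/K for every class
    q.scatter_(1, targets.unsqueeze(1), 1.0 - eps + eps / num_classes)  # true-class weight
    return -(q * log2_p).sum()

def train(net, loader, num_classes, epochs=60, lr=0.01):
    opt = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)  # assumed hyperparameters
    for _ in range(epochs):
        for images, labels in loader:
            loss = label_smoothing_ce(net(images), labels, num_classes)
            opt.zero_grad()
            loss.backward()
            opt.step()
```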
Step 6, calculating the feature distance.
Input the query target pedestrian image and each image in the candidate pedestrian image set into the feature learning network, and take the feature map output by the full convolution layer of the eighth feature segmentation module as each pedestrian image's feature.
Calculate the feature distance between the query target pedestrian's image feature and each candidate pedestrian's image feature using the Euclidean distance formula.
The Euclidean distance formula is as follows:

$$d(x,y) = \sqrt{\sum_{j=1}^{n}\left(x_j - y_j\right)^2}$$

wherein $d(x,y)$ denotes the distance between vector $x$ and vector $y$ in Euclidean space; $\sqrt{\cdot}$ denotes the square-root operation; $n$ denotes the dimension of the vectors; $\sum$ denotes summation; $j$ denotes the dimension index; $x_j$ denotes the value of the j-th dimension of vector $x$; and $y_j$ denotes the value of the j-th dimension of vector $y$.
Step 7, obtaining matching images.
Sort the pedestrian images in the candidate set in ascending order of feature distance, and take the first 20 images as the matching images for pedestrian re-identification.
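Steps 6 and 7 can be sketched together as follows, assuming the 256-dimensional features output by the full convolution layer of the eighth feature segmentation module have already been collected into a query vector and a gallery matrix:

```python
import torch

def top20_matches(query_feat, gallery_feats):
    """Euclidean distance from the query feature to every candidate feature,
    then the 20 candidates with the smallest distances (ascending order)."""
    d = torch.sqrt(((gallery_feats - query_feat) ** 2).sum(dim=1))  # d(x, y) per candidate
    order = torch.argsort(d)                                        # ascending feature distance
    return order[:20], d[order[:20]]                                # matching images
```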
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation conditions:
The hardware platform of the simulation experiment is an Intel(R) Core(TM) i9-7900X CPU at 3.3 GHz with 32 GB of memory and an NVIDIA 1080 Ti GPU; the software platform is Ubuntu 16.04 LTS.
2. Simulation content and result analysis:
the simulation experiment of the invention adopts the method of the invention to train and test the pedestrian re-identification model on two public data sets of Market-1501 and DukeMTMC-reiD, and the simulation result is shown in figure 3, wherein figure 3(a) is the image of the query target pedestrian selected during the simulation experiment test of the invention. Fig. 3(b) to (u) show 20 best matching images, where the 20 best matching images are the first 20 images with the smallest characteristic distance obtained by sorting all the images in the candidate pedestrian set in ascending order of the characteristic distance by using the method of the present invention, and the 20 images are the best matching images with the target pedestrian image. Wherein the pedestrian in 7 images in fig. 3(b) to (h) is the same pedestrian as the pedestrian in the query target pedestrian image, and is an accurate recognition result of pedestrian re-recognition, and the pedestrian in 13 images in fig. 3(i) to (u) is not the same pedestrian as the pedestrian in the query target pedestrian image.
To evaluate the accuracy of the pedestrian re-identification model obtained with the method of the invention, the Rank-1, Rank-5, Rank-10 and Rank-20 values of the cumulative matching curve (CMC curve) were used; the results are shown in Table 1.
The Rank-t value of the cumulative matching curve (CMC curve) is the ratio of the number of images of the target pedestrian among the first t best matching images in the candidate set to the total number of images of the target pedestrian in the candidate set.
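For reference, a small sketch of Rank-t exactly as defined here (target-pedestrian images among the first t matches, divided by all target-pedestrian images in the candidate set); `order` is the ascending-distance index from the matching step, and `gallery_ids`/`query_id` are hypothetical identity labels:

```python
def rank_t(order, gallery_ids, query_id, t):
    """Rank-t of the CMC curve per the definition above."""
    hits = (gallery_ids[order[:t]] == query_id).sum().item()   # target images in top t
    total = (gallery_ids == query_id).sum().item()             # all target images
    return hits / total
```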
TABLE 1. Accuracy evaluation of the pedestrian re-identification model
(Table 1 appears as an image in the original document; it lists the Rank-1, Rank-5, Rank-10 and Rank-20 accuracies of the model on the Market-1501 and DukeMTMC-reID datasets.)
As can be seen from Table 1, the pedestrian re-identification results of the invention achieve high accuracy on both datasets. This indicates that, by extracting multi-scale features with the multi-scale feature learning module and extracting global features and local features at coarse and fine granularities with the feature segmentation module, the extracted features are highly distinguishable and robust, so pedestrian re-identification reaches higher precision.

Claims (5)

1. A pedestrian re-identification method based on multi-scale feature learning and feature segmentation, characterized in that multi-scale features of the pedestrian image are extracted with a constructed multi-scale feature learning network, and global features and local features at coarse and fine granularities are extracted at each scale with a constructed feature segmentation module; the method comprises the following specific steps:
(1) constructing a multi-scale feature learning module:
(1a) an 11-layer multi-scale feature learning module is built with the following sequential structure: input layer → convolutional layer → max pooling layer → eight hourglass modules; each hourglass module consists of ten serially connected residual blocks, where the output of the first residual block is connected to the input of the tenth, the output of the second to the input of the ninth, the output of the third to the input of the eighth, and the output of the fourth to the input of the seventh;
(1b) setting parameters of each module of the multi-scale feature learning module;
(2) constructing a feature segmentation module:
(2a) eight 4-layer feature segmentation modules are built, each with the following sequential structure: feature segmentation layer → global pooling fusion layer → full convolution layer → SoftMax classification layer; each feature map input to the feature segmentation layer is divided horizontally into 1 part, 2 parts and 4 parts, which are input into the global pooling fusion layer; in the global pooling fusion layer, a global maximum pooling operation and a global average pooling operation are applied to the feature maps respectively, and the feature maps output by the two pooling operations are added to form the output of the pooling fusion layer;
(2b) the parameters of each layer of the feature segmentation module are set as follows: the pooling fusion layer outputs 1792 feature maps, and the full convolution layer outputs 256 feature maps;
(3) constructing a feature learning network:
connecting the output of each hourglass module in the multi-scale feature learning module with the input of each feature segmentation module in a one-to-one manner, and connecting the outputs of the seventh, eighth, ninth and tenth residual blocks in each hourglass module with the input of each feature segmentation module in a four-to-one manner;
(4) preprocessing a video containing pedestrians:
(4a) extracting continuous video images containing multiple pedestrians from the video shot by a camera, selecting one frame from each video image, cutting out the image of the area occupied by each pedestrian in each frame, forming a pedestrian image set A from all the cut-out images, and uniformly resizing the pedestrian images in set A to 384×124 pixels;
(4b) labeling all images of the same pedestrian in set A with the same class as their real label, each class containing at least one pedestrian image, and forming the pedestrian image training set from all labeled pedestrian images;
(4c) extracting continuous video images containing multiple pedestrians from the video shot by a camera, selecting one frame from each video image, cutting out the image of the area occupied by each pedestrian in each frame, forming a pedestrian image set B from all the cut-out images, and uniformly resizing the pedestrian images in set B to 384×124 pixels;
(4d) randomly selecting one pedestrian image from set B as the query target pedestrian image, and taking the remaining images in set B as the candidate pedestrian image set;
(5) training a feature learning network:
(5a) inputting the pedestrian image training set into the feature learning network, taking the probability distribution output by the SoftMax classification layer of the eighth feature segmentation module as the predicted probability distribution of each pedestrian image, and taking the class with the maximum predicted probability as the predicted label of that image;
(5b) calculating the cross entropy between the predicted label of each pedestrian image in the training set and its corresponding real label using the label smoothing cross entropy formula, and taking the sum of all cross entropies as the loss value of the feature learning network;
(5c) training the feature learning network using the stochastic gradient descent method;
(6) calculating the feature distance:
(6a) inputting the query target pedestrian image and each image in the candidate pedestrian image set into the feature learning network, and taking the feature map output by the full convolution layer of the eighth feature segmentation module as each pedestrian image's feature;
(6b) calculating the feature distance between the query target pedestrian's image feature and each candidate pedestrian's image feature using the Euclidean distance formula;
(7) obtaining a matching image:
and sequencing the pedestrian images in the pedestrian candidate set according to the ascending order of the characteristic distance, and taking the first 20 images as matching images for pedestrian re-identification.
2. The pedestrian re-identification method based on multi-scale feature learning and feature segmentation according to claim 1, wherein each residual block in step (1a) has nine layers with the following sequential structure: first batch normalization layer → first ReLU layer → first convolutional layer → second batch normalization layer → second ReLU layer → second convolutional layer → third batch normalization layer → third ReLU layer → third convolutional layer; the input of the first batch normalization layer is added to the output of the third convolutional layer to form the output of the residual block; the number of feature maps of the first convolutional layer is set to 64, with a 3×3-pixel convolution kernel and a step size of 1 pixel; the number of feature maps of the second convolutional layer is set to 256, with a 1×1-pixel convolution kernel and a step size of 1 pixel; the number of feature maps of the third convolutional layer is set to 256, with a 3×3-pixel convolution kernel and a step size of 1 pixel.
3. The pedestrian re-identification method based on multi-scale feature learning and feature segmentation according to claim 1, wherein the parameters of each module of the multi-scale feature learning network in step (1b) are set as follows: the total number of feature maps of the convolutional layer is set to 64, with a 7×7-pixel convolution kernel and a step size of 2 pixels; the step size of the max pooling layer is set to 2 pixels.
4. The pedestrian re-identification method based on multi-scale feature learning and feature segmentation according to claim 1, wherein the label smoothing cross entropy formula in step (5b) is as follows:

$$L_h = -\sum_{k=1}^{K}\left[(1-\varepsilon)\,q_h(k) + \frac{\varepsilon}{K}\right]\log_2 p_h(k)$$

wherein $L_h$ denotes the cross entropy between the predicted label of the h-th pedestrian image in the training set and its corresponding real label; $K$ denotes the total number of pedestrian image classes in the training set; $\sum$ denotes summation; $k$ denotes the serial number of a pedestrian image class, $1 \le k \le K$; $\varepsilon$ denotes a smoothing parameter with value 0.1; $q_h(k)$ denotes the probability that the real label of the h-th pedestrian image is class $k$, taking the value 1 if the real label is $k$ and 0 otherwise; $\log_2$ denotes the base-2 logarithm; and $p_h(k)$ denotes the predicted probability that the h-th pedestrian image belongs to class $k$.
5. The pedestrian re-identification method based on multi-scale feature learning and feature segmentation according to claim 1, wherein the Euclidean distance formula in step (6b) is as follows:

$$d(x,y) = \sqrt{\sum_{j=1}^{n}\left(x_j - y_j\right)^2}$$

wherein $d(x,y)$ denotes the distance between vector $x$ and vector $y$ in Euclidean space; $\sqrt{\cdot}$ denotes the square-root operation; $n$ denotes the dimension of the vectors; $\sum$ denotes summation; $j$ denotes the dimension index; $x_j$ denotes the value of the j-th dimension of vector $x$; and $y_j$ denotes the value of the j-th dimension of vector $y$.
CN201811007656.0A 2018-08-31 2018-08-31 Pedestrian re-identification method based on multi-scale feature learning and feature segmentation Active CN109271895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811007656.0A CN109271895B (en) 2018-08-31 2018-08-31 Pedestrian re-identification method based on multi-scale feature learning and feature segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811007656.0A CN109271895B (en) 2018-08-31 2018-08-31 Pedestrian re-identification method based on multi-scale feature learning and feature segmentation

Publications (2)

Publication Number Publication Date
CN109271895A CN109271895A (en) 2019-01-25
CN109271895B (en) 2022-03-04

Family

ID=65154787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811007656.0A Active CN109271895B (en) 2018-08-31 2018-08-31 Pedestrian re-identification method based on multi-scale feature learning and feature segmentation

Country Status (1)

Country Link
CN (1) CN109271895B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902658A (en) * 2019-03-15 2019-06-18 百度在线网络技术(北京)有限公司 Pedestrian's characteristic recognition method, device, computer equipment and storage medium
CN109919246A (en) * 2019-03-18 2019-06-21 西安电子科技大学 Pedestrian's recognition methods again based on self-adaptive features cluster and multiple risks fusion
CN111723611A (en) * 2019-03-20 2020-09-29 北京沃东天骏信息技术有限公司 Pedestrian re-identification method and device and storage medium
CN110502964B (en) * 2019-05-21 2021-09-28 杭州电子科技大学 Unsupervised data-driven pedestrian re-identification method
US11048917B2 (en) * 2019-07-31 2021-06-29 Baidu Usa Llc Method, electronic device, and computer readable medium for image identification
CN110648291B (en) * 2019-09-10 2023-03-03 武汉科技大学 Unmanned aerial vehicle motion blurred image restoration method based on deep learning
CN110598654B (en) * 2019-09-18 2022-02-11 合肥工业大学 Multi-granularity cross modal feature fusion pedestrian re-identification method and re-identification system
CN110796643A (en) * 2019-10-18 2020-02-14 四川大学 Rail fastener defect detection method and system
CN110807434B (en) * 2019-11-06 2023-08-15 威海若维信息科技有限公司 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination
CN111178178B (en) * 2019-12-16 2023-10-10 汇纳科技股份有限公司 Multi-scale pedestrian re-identification method, system, medium and terminal combined with region distribution
CN111582126B (en) * 2020-04-30 2024-02-27 浙江工商大学 Pedestrian re-recognition method based on multi-scale pedestrian contour segmentation fusion
CN111783576B (en) * 2020-06-18 2023-08-18 西安电子科技大学 Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN111738172B (en) * 2020-06-24 2021-02-12 中国科学院自动化研究所 Cross-domain target re-identification method based on feature counterstudy and self-similarity clustering
CN111950346A (en) * 2020-06-28 2020-11-17 中国电子科技网络信息安全有限公司 Pedestrian detection data expansion method based on generation type countermeasure network
CN117612266B (en) * 2024-01-24 2024-04-19 南京信息工程大学 Cross-resolution pedestrian re-identification method based on multi-scale image and feature layer alignment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9117147B2 (en) * 2011-04-29 2015-08-25 Siemens Aktiengesellschaft Marginal space learning for multi-person tracking over mega pixel imagery
CN104766096B (en) * 2015-04-17 2017-11-10 南京大学 A kind of image classification method based on multiple dimensioned global characteristics and local feature
CN105224937B (en) * 2015-11-13 2018-04-20 武汉大学 Fine granularity semanteme color pedestrian recognition methods again based on human part position constraint
CN106485253B (en) * 2016-09-14 2019-05-14 同济大学 A kind of pedestrian of maximum particle size structured descriptor discrimination method again
CN107330397B (en) * 2017-06-28 2020-10-02 苏州经贸职业技术学院 Pedestrian re-identification method based on large-interval relative distance measurement learning
CN107766791A (en) * 2017-09-06 2018-03-06 北京大学 A kind of pedestrian based on global characteristics and coarseness local feature recognition methods and device again
CN107657249A (en) * 2017-10-26 2018-02-02 珠海习悦信息技术有限公司 Method, apparatus, storage medium and the processor that Analysis On Multi-scale Features pedestrian identifies again
CN108427927B (en) * 2018-03-16 2020-11-27 深圳市商汤科技有限公司 Object re-recognition method and apparatus, electronic device, program, and storage medium

Also Published As

Publication number Publication date
CN109271895A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN109271895B (en) Pedestrian re-identification method based on multi-scale feature learning and feature segmentation
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN106096561B (en) Infrared pedestrian detection method based on image block deep learning features
Sirmacek et al. Urban area detection using local feature points and spatial voting
CN106845341B (en) Unlicensed vehicle identification method based on virtual number plate
CN107480620B (en) Remote sensing image automatic target identification method based on heterogeneous feature fusion
CN110633708A (en) Deep network significance detection method based on global model and local optimization
CN106960176B (en) Pedestrian gender identification method based on transfinite learning machine and color feature fusion
CN103761531A (en) Sparse-coding license plate character recognition method based on shape and contour features
Zhang et al. Road recognition from remote sensing imagery using incremental learning
CN105184298A (en) Image classification method through fast and locality-constrained low-rank coding process
CN111639587B (en) Hyperspectral image classification method based on multi-scale spectrum space convolution neural network
CN106372624A (en) Human face recognition method and human face recognition system
CN108460400A (en) A kind of hyperspectral image classification method of combination various features information
CN108073940B (en) Method for detecting 3D target example object in unstructured environment
Li et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes
Bai et al. Multimodal information fusion for weather systems and clouds identification from satellite images
Tao et al. Smoke vehicle detection based on spatiotemporal bag-of-features and professional convolutional neural network
CN110991374B (en) Fingerprint singular point detection method based on RCNN
CN112329771A (en) Building material sample identification method based on deep learning
Tereikovskyi et al. The method of semantic image segmentation using neural networks
CN109034213A (en) Hyperspectral image classification method and system based on joint entropy principle
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN112990282B (en) Classification method and device for fine-granularity small sample images
CN106548195A (en) A kind of object detection method based on modified model HOG ULBP feature operators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant