CN111177447B - Pedestrian image identification method based on depth network model - Google Patents
- Publication number
- CN111177447B CN111177447B CN201911362901.4A CN201911362901A CN111177447B CN 111177447 B CN111177447 B CN 111177447B CN 201911362901 A CN201911362901 A CN 201911362901A CN 111177447 B CN111177447 B CN 111177447B
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- feature
- features
- pedestrian image
- image
- Prior art date
- Legal status: Active (assumed status, not a legal conclusion)
Classifications
- G06F16/583—Retrieval characterised by using metadata automatically derived from the content (information retrieval of still image data)
- G06F16/55—Clustering; Classification (information retrieval of still image data)
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045—Combinations of networks (neural network architectures)
- G06N3/047—Probabilistic or stochastic networks
- G06N3/084—Backpropagation, e.g. using gradient descent (learning methods)
- G06V40/23—Recognition of whole body movements, e.g. for sport training
Abstract
The invention provides a pedestrian image identification method based on a depth network model, comprising the following steps: preprocessing the pedestrian image data; applying an adaptive sampling algorithm to the preprocessed data to obtain batches containing more hard samples; extracting multi-layer features through a backbone network model, where sub-modules enhance the low-level features, which are then downscaled and concatenated with the high-level features to obtain the multi-layer features; segmenting the multi-layer features at different granularities to form a multi-branch structure, extracting the component features and global feature of each branch, and concatenating all extracted features to obtain the depth representation of a pedestrian image; training the constructed network model; and extracting the depth representation of each query image through the trained network model and returning its identification result according to the cosine-distance similarity between the query image and the queried set. Through this multi-layer, multi-granularity pedestrian re-identification depth model, the best pedestrian re-identification performance at the present stage is achieved.
Description
Technical Field
The invention relates to the field of machine learning and computer vision, in particular to a pedestrian image identification method based on a depth network model.
Background
With the development of modern society, public safety has gradually attracted people's attention. Large numbers of surveillance cameras are installed in crowded places prone to public-safety incidents, such as shopping malls, apartments, schools, hospitals, office buildings and large squares, and research on surveillance video concentrates on identifying visible objects, especially pedestrians, since pedestrians are usually the target of a monitoring system. More specifically, the task of the surveillance system is to search for a specific pedestrian in the surveillance video data, i.e. the pedestrian re-identification task.
However, the volume of surveillance video data is often enormous, and finding a specific pedestrian in such massive data is very challenging because of factors such as ambient lighting, occluders, pedestrian clothing, shooting angle and camera differences. Monitoring through manual identification is not only costly but also inefficient and unstable, and relying on manual identification alone for pedestrian re-identification is unrealistic in the long run. Therefore, quickly analysing the surveillance video data of public-safety places and automatically finding specific pedestrians can significantly improve monitoring quality, and is of great significance for city construction and social safety.
Among existing pedestrian re-identification methods, component-based depth models achieve the most advanced performance. However, component-based depth models at the present stage usually obtain component features by segmenting the high-level features of the backbone network. On the one hand, the high-level features of a depth model are highly coupled, and simply segmenting them causes a loss of semantic information, which limits model performance. On the other hand, although the semantic information of the low-level features is weaker, the low-level features are only weakly coupled and therefore more robust to segmentation; combining the high-level and low-level features can alleviate the semantic-information loss caused by segmentation.
Disclosure of Invention
The purpose of the invention: the invention aims to solve the technical problem of the prior art by providing a pedestrian image recognition method based on a depth network model, so as to solve the semantic-information-loss problem of prior-art component-based depth models for pedestrian re-identification. The invention comprises the following steps:
step 1, preprocessing the pedestrian image data;
step 2, performing adaptive sampling on the preprocessed data;
step 3, constructing a network model and the depth representation of the pedestrian image;
step 4, training the network model constructed in step 3;
step 5, re-identifying the pedestrian.
The step 1 comprises the following steps:
step 1-1, resize the input pedestrian image using bicubic interpolation: for any channel of a pedestrian image of arbitrary size, adjust the size to 3K × K, where K is generally 128 or 192. For any point P(0,0) in the image, define the relative coordinates P(r,c) of the 16 surrounding points including the point itself, where -1 ≤ r ≤ 2 and -1 ≤ c ≤ 2; r and c denote the offsets of the abscissa and ordinate respectively, a negative value denoting a leftward or upward offset and a positive value a rightward or downward offset, e.g. P(0,1) is the point adjacent to the right of P(0,0);
where P(0,0) denotes the mapping point in the original image closest to a pixel point (x1, y1) of the target interpolated image, the coordinate offset of P(0,0) from the exact mapping position is (u, v), and the absolute coordinate of P(0,0) in the original image is (i, j). Bicubic interpolation is then the sum of the convolution interpolations of the above 16 points, i.e. the interpolation function F(i + u, j + v):
F(i + u, j + v) = Σ_{r=-1..2} Σ_{c=-1..2} f(i + r, j + c) · s(r - u) · s(c - v),
where x1 = i + u, y1 = j + v, f(i + r, j + c) denotes the pixel value of any one of the 16 points in the original image, and s(x) is the sampling kernel:
s(x) = (a + 2)|x|^3 - (a + 3)|x|^2 + 1 for |x| ≤ 1; s(x) = a|x|^3 - 5a|x|^2 + 8a|x| - 4a for 1 < |x| < 2; s(x) = 0 otherwise,
where a is the kernel coefficient, for which a common value is -0.5;
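The cubic convolution kernel and 16-point weighted sum described above can be sketched as follows; function names are illustrative, and a = -0.5 is the common coefficient mentioned in the text.

```python
def s(x, a=-0.5):
    """Cubic convolution sampling kernel s(x) with coefficient a."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def bicubic_value(f, i, j, u, v):
    """F(i+u, j+v): weighted sum of the 16 neighbours f[i+r][j+c], -1 <= r, c <= 2."""
    return sum(
        f[i + r][j + c] * s(r - u) * s(c - v)
        for r in range(-1, 3)
        for c in range(-1, 3)
    )
```

At integer positions (u = v = 0) the kernel weights are 1 at the centre and 0 at the other 15 neighbours, so the original pixel value is reproduced exactly.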
step 1-2, randomly horizontally flip the pedestrian image: flip any channel of a pedestrian image of size 3K × K horizontally with probability P1, 0 < P1 < 1. For a second arbitrary point (x2, y2) on the pedestrian image, the coordinates (xf, yf) of its symmetric point after horizontal flipping are:
(xf, yf) = (x2, K - y2 - 1),
where (x2, y2) are the coordinates of the second arbitrary point in the pedestrian image, 0 ≤ x2 ≤ 3K, 0 ≤ y2 ≤ K, so the flipped column index K - y2 - 1 stays within the column range;
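A minimal sketch of the random horizontal flip, assuming the image is stored as a list of rows; note the flipped column index is K - y - 1, which is what the stated column range 0 ≤ y2 ≤ K implies. The `rng` parameter is an illustrative hook for deterministic testing.

```python
import random

def random_hflip(img, p1=0.5, rng=random):
    """Flip each row left-to-right with probability p1."""
    if rng.random() < p1:
        return [row[::-1] for row in img]
    return img
```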
Step 1-3, by randomly erasing the pedestrian image: randomly erasing a random area with the size of h multiplied by w according to the following random erasing function f () with the probability P2, 0 < P2 < 1 for any channel of a pedestrian image with the size of 3 KxK, and setting all pixel values of each channel in the random area as the pixel value mean value of the channel:
f(x3:x3+h,y3:y3+w)=m,
wherein (x)3,y3) X is more than or equal to 0 and is the coordinate of a third arbitrary point in the pedestrian image3≤3K,0≤y3K is less than or equal to K, and m is the pixel value mean value of each channel in the pedestrian image;
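A single-channel sketch of the random-erasing step; for clarity the region corner (x3, y3) is passed in rather than drawn at random, which is a simplification of the text.

```python
import random

def random_erase(img, x3, y3, h, w, p2=0.5, rng=random):
    """With probability p2, fill the h x w region at (x3, y3) with the channel mean."""
    if rng.random() >= p2:
        return img
    m = sum(sum(row) for row in img) / (len(img) * len(img[0]))  # channel mean
    out = [row[:] for row in img]  # copy so the input is not mutated
    for x in range(x3, x3 + h):
        for y in range(y3, y3 + w):
            out[x][y] = m
    return out
```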
step 1-4, standardize the data of each channel of the pedestrian image: normalize and standardize any channel of a pedestrian image of size 3K × K according to the following standardization function f(x):
f(x) = (x/255 - μ) / δ,
where x, 0 ≤ x ≤ 255, is the pixel value of any point in each channel of the pedestrian image obtained in step 1-3, μ is the channel mean of the public data set ImageNet, and δ is the channel standard deviation of the public data set ImageNet.
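A one-line sketch of the standardization step, assuming (as the ImageNet statistics imply) that the pixel value is first scaled to [0, 1]; the constants below are the usual ImageNet per-channel statistics, also given later in the embodiment.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(x, mu, delta):
    """f(x) = (x / 255 - mu) / delta for a pixel value 0 <= x <= 255."""
    return (x / 255.0 - mu) / delta
```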
The step 2 comprises the following steps:
step 2-1, counting an index list corresponding to a pedestrian image of each identity in a training set, wherein the pedestrian image in the training set is a training sample, defining a dictionary set of an unsampled sample index list as US, a set of correct classification of a model as TS, a set of incorrect classification of the model as FS, initializing TS and FS as empty, and US as a dictionary set formed by all current training samples;
step 2-2, performing dynamic sampling, and acquiring a batch consisting of P pedestrians and Q images corresponding to the P pedestrians from the training set under the current iteration turn, so that the identities of the P pedestrians are randomly sampled from a label list of the training set;
step 2-3, preferentially sampling and acquiring Q images from the US set for each pedestrian identity acquired in the step 2-2, if the US set is empty or the number of the pedestrian images with the residual corresponding identities is less than Q, sampling and complementing from the FS set, if the number of the pedestrian images is still insufficient, sampling and complementing from the TS set, and if the number of the pedestrian images is still insufficient, circulating the step 2-3 and repeatedly sampling until Q images are acquired;
step 2-4, after each iteration sampling, transferring the samples sampled in the current iteration round from the US set to the FS set, simultaneously transferring the samples correctly classified by the model from the FS set to the TS set, and transferring the samples wrongly classified by the model from the TS set to the FS set;
step 2-5, the step 2-3 and the step 2-4 are circulated until a batch with the size of P multiplied by Q is obtained by sampling;
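The per-identity sampling priority of steps 2-1 to 2-5 (US first, then FS, then TS, then repeat-sampling) can be sketched as follows; the per-identity dictionary layout and function name are illustrative assumptions, not part of the patent.

```python
import random

def sample_identity(pid, us, fs, ts, q, rng=random):
    """Return q image indices for identity pid, preferring US, then FS, then TS."""
    chosen = []
    for pool in (us, fs, ts):  # priority order from step 2-3
        avail = pool.get(pid, [])
        take = min(q - len(chosen), len(avail))
        chosen += rng.sample(avail, take)
        if len(chosen) == q:
            return chosen
    # still short of q images: repeat-sample, mirroring the loop in step 2-3
    while len(chosen) < q and chosen:
        chosen.append(rng.choice(chosen))
    return chosen
```

After a batch is drawn, step 2-4's bookkeeping (moving sampled indices US→FS, correctly classified FS→TS, misclassified TS→FS) would update the three pools between iterations.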
the step 3 comprises the following steps:
step 3-1, constructing a network model for pedestrian re-identification, wherein the network model comprises a backbone network model and sub-modules;
extract multi-layer features, i.e. features of different depths, through the backbone network model. The features of different depths comprise the first-layer depth feature l1, the second-layer depth feature l2, the third-layer depth feature l3 and the fourth-layer depth feature l4. The backbone network model is ResNet, a classical classification network for the ImageNet data set;
the sub-modules comprise an enhancement module, a downscaling module, a reduction module and a maximum-pooling module. The first-layer depth feature l1 and the second-layer depth feature l2 are defined as low-level features, while the third-layer depth feature l3 and the fourth-layer depth feature l4 are high-level features;
when the first-layer depth feature l1 has size C × H × W, then according to the backbone network model the second-layer depth feature l2 has size 2C × H/2 × W/2 and the third-layer depth feature l3 has size 4C × H/4 × W/4, where C is the number of channels, H the height and W the width of l1;
step 3-2, enhance the first-layer depth feature l1 and the second-layer depth feature l2 with two enhancement modules respectively, keeping their sizes unchanged, and then reduce the scales of l1 and l2 to 2C × H/4 × W/4 through the two downscaling modules respectively;
step 3-3, halve the number of channels of the third-layer depth feature l3 with the reduction module, i.e. reduce its size to 2C × H/4 × W/4;
concatenate the downscaled first-layer depth feature l1 and the reduced third-layer depth feature l3 along the channel dimension to obtain the first multi-layer depth feature l13 of size 2C × H/4 × W/4;
concatenate the downscaled second-layer depth feature l2 and the reduced third-layer depth feature l3 along the channel dimension to obtain the second multi-layer depth feature l23 of size 2C × H/4 × W/4;
Step 3-4, the multilayer depth characteristic l obtained in the step 3-3 is processed13And l23And a backboneThird layer depth feature l in network model3Fourth layer depth feature l respectively accessed into backbone network model4A corresponding network layer forming the multi-branch structure, the global features including: first global feature l4-1Second global feature l4-2And a third global feature l4-3(ii) a First global feature l4-1By third layer depth features l in the backbone network model3Accessing fourth layer depth feature l4The corresponding network layer obtains a fourth layer depth feature equivalent to the backbone network model, and a second global feature l4-2By a 123Accessing fourth layer depth feature l4The corresponding network layer obtains the third global feature l4-3Then pass through13Accessing fourth layer depth feature l4Obtaining a corresponding network layer;
segmenting the global features into component features, including: the first global feature l is combined4-1Cutting into first part features with granularity of 1, and dividing the second global features l into first part features with granularity of 14-2Cutting into second part features with granularity of 2, and dividing the third global features l4-3A third part feature cut to a grain size of 3;
pool the resolutions of the global features and component features to 1 × 1 using the maximum-pooling module, then reduce the number of channels of the global features and component features to F, where F is generally 9; the reduction module is a 1 × 1 convolution layer with a shared convolution kernel. Each global and component feature after reduction has size F × 1 × 1, and the set of reduced component features is denoted S;
concatenate all reduced global features and reduced component features to obtain the constructed depth representation of the pedestrian image, of size M × F, where M is the total number of global and component features.
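The layout of the final depth representation can be checked with a small sketch; the function name is illustrative, and the defaults follow the text (branch granularities 1, 2, 3; F = 9): each branch contributes one global feature plus `granularity` component features.

```python
def descriptor_layout(granularities=(1, 2, 3), f_channels=9):
    """Return (M, total vector length) for the concatenated M x F representation."""
    m = len(granularities) + sum(granularities)  # global features + component features
    return m, m * f_channels
```

With the defaults this gives M = 3 + 6 = 9 features and a 9 × 9 = 81-dimensional concatenated descriptor.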
Step 4 comprises the following steps:
step 4-1, define the experiment-related configuration: before training the network model on the training set, first define the model optimizer used for parameter updates; set the batch size of the dynamic sampling of step 2 to P × Q, where P is the number of pedestrian identities in each batch and Q the number of pedestrian images per identity; finally, set the learning-rate scheduler. The training set carries pedestrian identity labels, and the number of identity classes in the training set is denoted Y;
step 4-2, optimize each global feature of step 3 separately: each global feature is optimized with an improved triplet loss function L_triplet for the feature metric, averaged over the global features:
L_triplet = (1/G) Σ_{g=1..G} Σ_{i=1..P} Σ_{a=1..Q} [ α + max_{p=1..Q} ‖x_{i,a}^g − x_{i,p}^g‖ − min_{j=1..P, j≠i; n=1..Q} ‖x_{i,a}^g − x_{j,n}^g‖ ]_+ ,
where G denotes the number of global features, G = 3; x_{i,a}^g denotes the anchor sample of the g-th global feature of the i-th pedestrian identity, x_{i,p}^g a positive sample of the g-th global feature of the i-th identity, and x_{j,n}^g a negative sample of the g-th global feature of the j-th identity; α is a hyper-parameter controlling the gap between inter-class and intra-class distances, 1.0 < α < 1.5, 1 ≤ i ≤ P, 1 ≤ a ≤ Q;
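Improved triplet losses of this anchor/positive/negative form are commonly realized with batch-hard mining (farthest positive, nearest negative per anchor); the sketch below assumes that form, which the patent text does not spell out, and uses Euclidean distance on plain Python lists.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_hard_triplet(features, labels, alpha=1.2):
    """Average over anchors of [alpha + d(hardest positive) - d(hardest negative)]_+."""
    total, n = 0.0, 0
    for i, (fa, la) in enumerate(zip(features, labels)):
        pos = [euclid(fa, f) for j, (f, l) in enumerate(zip(features, labels))
               if l == la and j != i]
        neg = [euclid(fa, f) for f, l in zip(features, labels) if l != la]
        if pos and neg:
            total += max(0.0, alpha + max(pos) - min(neg))
            n += 1
    return total / max(n, 1)
```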
step 4-3, optimize each reduced component feature obtained in step 3-4 with an identity-classification cross-entropy loss: each component feature uses a linear classifier without bias term, component features and linear classifiers corresponding one to one. The identity-classification cross-entropy loss L_id is:
L_id = −(1/(N·P·Q)) Σ_{j=1..N} Σ_{q=1..P×Q} Σ_{r=1..Y} 1_{r=y} log softmax( fc_j(f_{jq}) )_r ,
where fc_j denotes the j-th linear classifier and f_{jq} the vector of the j-th component feature f_j for the q-th pedestrian image in a batch, 1 ≤ j ≤ N, 1 ≤ q ≤ P×Q; N denotes the total number of linear classifiers, i.e. the number of component features; 1_{r=y} denotes a one-hot coded vector whose length is the number of pedestrian identities, the index r of the one element being equal to the ground-truth identity y of the pedestrian image;
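A per-feature sketch of this identity cross-entropy, with the bias-free linear classifier represented as a plain weight matrix W of shape Y × F (names and layout are illustrative):

```python
import math

def softmax(logits):
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - mx) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def id_cross_entropy(weight, feature, y):
    """-log softmax(W f)[y] for a bias-free linear classifier W (Y x F)."""
    logits = [sum(w * f for w, f in zip(row, feature)) for row in weight]
    return -math.log(softmax(logits)[y])
```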
step 4-4, adding the cross entropy loss function and the improved ternary loss function to obtain a loss function L used in final training, which is as follows:
L = L_triplet + L_id,
and 4-5, performing model training of the network model on the training set.
In steps 4-5, when training the network model on the training set, the inputs are: the training set D; the pedestrian identity labels y; the number of iterations T; the sampler S, the optimizer OPT and the learning-rate scheduler LR; the initialization parameters θ0, the subscript 0 denoting the current iteration number, and the initial model Φ(x; θ0). The output is the model Φ(x; θT). The specific training procedure comprises:
step 4-5-1, load the parameters θ0 of a model pre-trained on the public data set ImageNet;
step 4-5-2, the sampler S dynamically samples Nb preprocessed pedestrian images from the training set D according to the configuration of step 4-1, xi denoting the i-th preprocessed pedestrian image, where Nb = P × Q;
step 4-5-3, the optimizer OPT clears the accumulated gradients;
step 4-5-4, forward-propagate the sampled batch through the current model Φ(x; θt) to obtain the depth representations and classifier outputs;
step 4-5-5, compute the loss value loss according to L = L_triplet + L_id;
step 4-5-6, back-propagate according to the loss value loss;
step 4-5-7, the optimizer OPT updates the model parameters θt, and the learning-rate scheduler LR updates the learning rate;
step 4-5-8, iterate steps 4-5-2 to 4-5-7 until the number of iterations reaches T.
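The control flow of steps 4-5-1 to 4-5-8 can be sketched with stub callables standing in for the sampler, forward pass/loss and optimizer step; all names are illustrative and no real network is implied, only the sample → zero-grad → loss → backward → update loop.

```python
def train(theta0, sample_batch, compute_loss, update, T):
    """Skeleton of the training loop: T iterations of sample/loss/update."""
    theta, history = theta0, []
    for t in range(T):
        batch = sample_batch(t)            # step 4-5-2: dynamic sampling
        loss = compute_loss(theta, batch)  # steps 4-5-4/5: forward pass, L = L_triplet + L_id
        theta = update(theta, loss)        # steps 4-5-6/7: backprop, optimizer and LR step
        history.append(loss)
    return theta, history
```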
The step 5 comprises the following steps:
step 5-1, loading the network model trained in the step 4, and extracting pedestrian images in a test set by using the network model, namely extracting the depth representations of the query images in the query set and the queried images in the queried set;
all global features and component features in the test set are concatenated as defined in step 3-4, each feature of the test set being represented as:
f_i = Φ(x_i; θ_T), x_i ∈ N_test,
where N_test denotes the test set and θ_T the parameter set at iteration T. The depth representation finally extracted for a pedestrian image is the concatenated vector f_i.
step 5-2, eliminate the deviation between the training set and the test set of the pedestrian image data set: add the depth representation f_i of the pedestrian image and the depth representation f_i^flip of the horizontally flipped pedestrian image, taking f_i + f_i^flip as the test-set depth representation of the pedestrian image;
Step 5-3, normalizing the depth representation of the pedestrian image obtained in the step 5-2 by using a two-normThe two-norm is calculated according to the following formula:
the depth characterization of the pedestrian image normalized using the two-norm to obtain the final test set is as follows:
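Steps 5-2 and 5-3 can be sketched together: sum the descriptor of the image and of its flipped copy, then L2-normalize so that cosine similarity later reduces to a dot product (names illustrative).

```python
import math

def fuse_and_normalize(feat, feat_flipped):
    """Add the original and flipped descriptors, then divide by the two-norm."""
    fused = [a + b for a, b in zip(feat, feat_flipped)]
    norm = math.sqrt(sum(v * v for v in fused))
    return [v / norm for v in fused]
```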
step 5-4, according to the depth characterization of the pedestrian images of the final test set, calculating the distance between each pedestrian image in the query set and each pedestrian image in the queried set, obtaining the query result of each pedestrian image in the query set, and realizing pedestrian re-identification;
step 5-4 comprises: if the depth representation of each pedestrian image in the query set is q_i and the depth representation of each pedestrian image in the queried set is g_j, the distance matrix between the query set and the queried set is:
M = Q G^T, i.e. M_ij = q_i · g_j,
where N_gallery denotes the queried set, N_query the query set, and M_ij the element in row i, column j of the matrix. Sorting the distances between each query image and each pedestrian image of the queried set in ascending order yields the identification result of each query image.
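Assuming L2-normalized descriptors as in step 5-3, cosine similarity is a dot product, and ranking by ascending cosine distance is the same as ranking by descending similarity; a minimal sketch (names illustrative):

```python
def rank_gallery(query_feats, gallery_feats):
    """For each query descriptor, return gallery indices sorted by descending similarity."""
    results = []
    for q in query_feats:
        sims = [sum(a * b for a, b in zip(q, g)) for g in gallery_feats]
        order = sorted(range(len(sims)), key=lambda k: -sims[k])
        results.append(order)
    return results
```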
Advantageous effects:
In the prior art, component-based depth models suffer from semantic-information loss because of the high coupling of the high-level features. By adopting the method disclosed by the invention, this loss is suppressed through the multi-layer, multi-granularity depth model, improving the pedestrian re-identification performance of component-based depth models; starting from data preprocessing, the method constructs the pedestrian depth representation, trains the model and finally completes pedestrian re-identification, achieving the best pedestrian re-identification performance at the present stage.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic flow chart of a multi-layer multi-granularity pedestrian re-identification depth model according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a multi-layer multi-granularity pedestrian re-identification depth model according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a convolutional network structure of an enhancement module, a downscaling module and a reduction module in a multilayer multi-granularity pedestrian re-identification depth model according to an embodiment of the present invention, where a maximum pooling layer module is a basic network pooling layer;
fig. 4 is a diagram of an example of query results in a multi-layer multi-granularity pedestrian re-identification depth model according to an embodiment of the present invention.
Fig. 5 is a schematic diagram defining the relative coordinates of the 16 points, including itself, surrounding any point P(0,0).
Detailed Description
The embodiment of the invention provides a pedestrian image identification method based on a deep network model, which is applied to rapidly analyzing monitoring video data of public security places, automatically finding out specific pedestrians, remarkably improving monitoring quality and having important significance on city construction and social security.
As shown in fig. 1, a schematic workflow diagram of a pedestrian image recognition method based on a depth network model provided in the embodiment of the present invention is provided, and the embodiment discloses a pedestrian image recognition method based on a depth network model, including:
Step 4, training the network model constructed in step 3, comprises: defining the experiment-related configuration and optimizing the model parameters of the network model. Specifically, in this embodiment, the model parameters are optimized by combining the identity-classification cross-entropy loss with the improved triplet loss for the feature metric; the loss function used in the final training is the sum of the average cross-entropy loss over the component features and the average improved triplet loss over the global features.
And 5, re-identifying the pedestrian, comprising the following steps: under the condition that the identity of the pedestrians in the test set and the identity of the pedestrians in the training set are not repeated, extracting the depth representation of the query image through the network model trained in the step 4, normalizing the depth representation of the query image by using a two-norm method, and returning the identification result of each query image according to the similarity of each query image and the queried set based on the cosine distance. In the step, the pedestrian is re-identified under the condition that the pedestrian identity is not repeated, and the effectiveness of the model can be verified through the returned identification result.
In the modern society, the monitoring video data of public safety places are quickly analyzed, specific pedestrians are automatically found, the monitoring quality can be obviously improved, and the method has important significance for city construction and social safety. The invention provides a multilayer and multi-granularity pedestrian re-identification depth model and realizes the best pedestrian re-identification performance at the present stage.
In the following, the steps of the present invention are described in detail, and in the multi-layer and multi-granularity pedestrian re-identification depth model according to this embodiment, the step 1 includes:
step 1-1, resize the input pedestrian image using bicubic interpolation: for any channel of a pedestrian image of arbitrary size, adjust the size to 3K × K, and for any point P(0,0) define the relative coordinates of the 16 points including itself, as shown in Fig. 5, with -1 ≤ r ≤ 2 and -1 ≤ c ≤ 2; r and c denote the offsets of the abscissa and ordinate respectively, a negative value denoting a leftward or upward offset and a positive value a rightward or downward offset, e.g. P(0,1) is the point adjacent to the right of P(0,0);
where P(0,0) denotes the mapping point in the original image closest to a pixel point (x1, y1) of the target interpolated image, the coordinate offset of P(0,0) is (u, v), and the absolute coordinate of P(0,0) in the original image is (i, j); bicubic interpolation is then the sum of the convolution interpolations of the above 16 points, i.e. the interpolation function:
F(i + u, j + v) = Σ_{r=-1..2} Σ_{c=-1..2} f(i + r, j + c) · s(r - u) · s(c - v),
here x1 = i + u, y1 = j + v, f(i + r, j + c) denotes the pixel value of any one of the 16 points in the original image, and s(x) is the sampling kernel, specifically:
s(x) = (a + 2)|x|^3 - (a + 3)|x|^2 + 1 for |x| ≤ 1; s(x) = a|x|^3 - 5a|x|^2 + 8a|x| - 4a for 1 < |x| < 2; s(x) = 0 otherwise,
wherein, a is a formula coefficient, and a common value can be-0.5;
step 1-2, perform data enhancement by randomly horizontally flipping the pedestrian image: any channel of a pedestrian image of size 3K × K is flipped horizontally with probability P1, 0 < P1 < 1; in this embodiment the probability P1 = 0.5 is used in the actual experiments. For a second arbitrary point (x2, y2) on the pedestrian image, the coordinates of its symmetric point after horizontal flipping are:
(xf, yf) = (x2, K - y2 - 1),
where (x2, y2) are the coordinates of the second arbitrary point in the pedestrian image, 0 ≤ x2 ≤ 3K, 0 ≤ y2 ≤ K.
Step 1-3, perform data enhancement by randomly erasing the pedestrian image: for any channel of a pedestrian image of size 3K × K, with probability P2, 0 < P2 < 1 (in this embodiment P2 = 0.5 in the actual experiments), randomly erase a region of size h × w according to the following erasing function, setting the pixel values of each channel in the region to that channel's mean pixel value:
f(x3 : x3 + h, y3 : y3 + w) = m,
where (x3, y3) are the coordinates of a third arbitrary point in the pedestrian image, 0 ≤ x3 ≤ 3K, 0 ≤ y3 ≤ K, and m is the mean pixel value of each channel of the pedestrian image.
Step 1-4, standardize the data of each channel of the pedestrian image: normalize and standardize any channel of a pedestrian image of size 3K × K according to the following standardization function:
f(x) = (x/255 - μ) / δ,
where x, 0 ≤ x ≤ 255, is the pixel value obtained in step 1-3, μ is the mean of the public data set ImageNet and δ its standard deviation. In this embodiment, the per-channel mean and standard deviation of the ImageNet data set are actually used; specifically, the RGB channel means are 0.485, 0.456 and 0.406, and the standard deviations are 0.229, 0.224 and 0.225.
step 2-1, counting an index list corresponding to a pedestrian image of each identity in a training set, defining a dictionary set of an index list of samples which are not sampled as US, a correctly classified set of models as TS, an incorrectly classified set of models as FS, initializing TS, FS as null, and US as a dictionary set formed by all current training samples;
step 2-2, performing dynamic sampling: in the current iteration round, a batch consisting of P pedestrians with Q images each is acquired from the training set, where the P pedestrian identities are randomly sampled from the label list of the training set;
step 2-3, preferentially sampling and acquiring Q images from the US set for each pedestrian identity acquired in the step 2-2, if the US set is empty or the number of the pedestrian images with the residual corresponding identities is less than Q, sampling and complementing from the FS set, if the number of the pedestrian images is still insufficient, sampling and complementing from the TS set, and if the number of the pedestrian images is still insufficient, circulating the step and repeatedly sampling until Q images are acquired;
step 2-4, after each iteration sampling, transferring the samples sampled in the current iteration round from the US set to the FS set, simultaneously transferring the samples correctly classified by the model from the FS set to the TS set, and transferring the samples wrongly classified by the model from the TS set to the FS set;
step 2-5, repeating step 2-3 and step 2-4 in a loop until a batch of size P × Q has been sampled;
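The per-identity priority sampling of step 2-3 can be sketched as below. This is an illustrative pure-Python sketch: the function name and the dictionary layout (each pool maps identity to a list of image indices) are assumptions, and it presumes each identity has at least one image.

```python
import random

def sample_identity(us, fs, ts, pid, q, rng=None):
    """Sketch of step 2-3: draw Q image indices for identity `pid`,
    preferring the not-yet-sampled dictionary US, then the misclassified set
    FS, then the correctly classified set TS; if every pool runs dry,
    repeat already-drawn images until Q are collected."""
    rng = rng or random.Random()
    picked = []
    while len(picked) < q:
        pool = next((p for p in (us, fs, ts) if p.get(pid)), None)
        if pool is None:
            picked.append(rng.choice(picked))  # all pools empty: resample
            continue
        avail = pool[pid]
        picked.append(avail.pop(rng.randrange(len(avail))))
    return picked
```

Step 2-4's bookkeeping (moving sampled indices from US into FS, and between FS and TS according to the model's predictions) would then operate on the same three dictionaries.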
After that, the depth characterization of the pedestrian image is constructed through step 3. In the multilayer, multi-granularity pedestrian re-identification depth model of this embodiment, step 3 includes:
step 3-1, extracting multilayer features through a backbone network model. In this embodiment, the backbone network model is an existing basic deep convolutional neural network model such as ResNet or VGG; features of different depths can be extracted through the backbone network ResNet101, including: the first-layer depth feature l1, the second-layer depth feature l2, the third-layer depth feature l3 and the fourth-layer depth feature l4 (the fourth-layer depth feature l4 is not shown in FIG. 2). The sub-modules include an enhancement module, a downscaling module, a maximum pooling layer module and a reduction module. Specifically, in FIG. 2, the arrows labeled 0 indicate the layers of the backbone network, the arrows labeled 1 indicate the enhancement modules, the arrows labeled 2 indicate the downscaling modules, the arrows labeled 3 indicate the maximum pooling layer modules, and the arrows labeled 4 indicate the reduction modules.
As shown in FIG. 3, a schematic diagram of the convolutional network structure of the enhancement module, the downscaling module and the reduction module in the multilayer, multi-granularity pedestrian re-identification depth model provided in the embodiment of the present invention, where Conv is a convolutional layer, the number after Conv is the convolution kernel size of that layer, BatchNorm2d is a batch normalization layer, and ReLU is a non-linear activation function layer; the maximum pooling layer module is an existing common basic module and is not shown here.
In this embodiment, step 3-2 includes: enhancing the characterization capability of the first-layer depth feature l1 and the second-layer depth feature l2 through enhancement modules, and then reducing, through two downscaling modules, the sizes of the first-layer depth feature l1 and the second-layer depth feature l2 to be consistent with the size of the reduced third-layer depth feature l3.
When the size of the first-layer depth feature l1 is C × H × W (in this embodiment W is generally K/4 and H is generally 3W), the size of the second-layer depth feature l2 obtained according to the backbone network model is 2C × H/2 × W/2, the size of the third-layer depth feature l3 is 4C × H/4 × W/4, and the size of the reduced third-layer depth feature l3 is 2C × H/4 × W/4, where C is the number of channels, H is the height of the first-layer depth feature l1 (96 in this example), and W is the width of the first-layer depth feature l1 (32 in this example).
Step 3-3, after passing through the two downscaling modules, the sizes of the first-layer depth feature l1 and the second-layer depth feature l2 are reduced to be consistent with the size of the reduced third-layer depth feature l3, namely 2C × H/4 × W/4;
concatenating the downscaled first-layer depth feature l1 and the reduced third-layer depth feature l3 along the channel dimension yields a depth feature of size 4C × H/4 × W/4, whose size is consistent with the third-layer depth feature l3 before reduction; from it, the first multilayer depth feature l13 of size 2C × H/4 × W/4 is obtained;
concatenating the downscaled second-layer depth feature l2 and the reduced third-layer depth feature l3 along the channel dimension yields a depth feature of size 4C × H/4 × W/4, whose size is consistent with the third-layer depth feature l3 before reduction; from it, the second multilayer depth feature l23 of size 2C × H/4 × W/4 is obtained;
Step 3-4, the multilayer depth features l13 and l23 obtained in step 3-3 and the third-layer depth feature l3 in the backbone network are separately connected to the network layer corresponding to the fourth-layer depth feature l4 in the backbone network, forming the multi-branch structure; the global features include: the first global feature l4-1, the second global feature l4-2 and the third global feature l4-3;
segmenting the global features into component features, including: cutting the first global feature l4-1 into first component features with granularity 1, cutting the second global feature l4-2 into second component features with granularity 2, and cutting the third global feature l4-3 into third component features with granularity 3;
further reducing the number of channels of the global features and the component features to F by using a reduction module, and pooling the sizes of the global features and the component features to 1 × 1; the reduction module is a shared 1 × 1 convolution layer, the size of each reduced global feature and component feature is F × 1 × 1, and the set formed by the reduced component features is denoted S; specifically, in this embodiment F = 256.
Splicing all the reduced global features and component features yields the depth characterization of the constructed pedestrian image, of size M × F, where M is the total number of global features and component features; specifically, in this embodiment M = 9.
In the multilayer and multi-granularity pedestrian re-identification depth model according to this embodiment, the step 4 includes:
step 4-1, defining the relevant configuration of the experiment, including: before training the pedestrian re-identification model on the training set, first defining the model optimizer used to update parameters; specifically, in this embodiment, an Adam optimizer with the AMSGrad method is used, loaded with the parameters of the pedestrian re-identification model constructed in step 3. The batch size of the input images is set to P × Q, where P represents the number of pedestrian identities included in each batch and Q represents the number of pedestrian images included for each pedestrian identity; specifically, in this embodiment, P = 12 and Q = 4. Finally, a learning rate scheduler is set. The training set is contained in a public pedestrian image data set, carries pedestrian identity labels, and the number of pedestrian identity label classes of the training set is denoted Y. Specifically, in this embodiment, a multi-step learning rate scheduler MultiStepLR is used: whenever training reaches a preset iteration time point, the learning rate is multiplied by gamma; in this embodiment gamma = 0.1, and an iteration time point is preset every 40 iterations.
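The MultiStepLR schedule described above (a milestone every 40 iterations, gamma = 0.1) can be sketched as a pure function. The base learning rate is not stated in the text, so `base_lr` is a placeholder parameter.

```python
def multistep_lr(base_lr, iteration, step=40, gamma=0.1):
    """Sketch of the step 4-1 MultiStepLR schedule: every `step` iterations
    a milestone is passed and the learning rate is multiplied by gamma."""
    return base_lr * (gamma ** (iteration // step))
```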
Step 4-2, optimizing each global feature in step 3 respectively, including: applying an improved ternary loss function to each global feature for the feature metric and averaging over the global features, the improved ternary loss function being:
where G denotes the number of global features, G = 3; the loss uses the anchor sample of the g-th global feature of the i-th pedestrian identity, the positive sample of the g-th global feature of the i-th pedestrian identity, and the negative sample of the g-th global feature of the j-th pedestrian identity; alpha is a hyper-parameter controlling the difference between the inter-class distance and the intra-class distance, 1.0 < alpha < 1.5, 1 ≤ i ≤ P, 1 ≤ a ≤ Q, and in this embodiment alpha = 1.2.
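The improved ternary (triplet) loss formula appeared as an image and is not reproduced in the text, so the following is a hedged reconstruction under a common assumption: a margin triplet loss max(d(anchor, positive) - d(anchor, negative) + alpha, 0) with Euclidean distance d, averaged over the G global features.

```python
def improved_triplet_loss(anchors, positives, negatives, alpha=1.2):
    """Assumed form of the step 4-2 loss: margin triplet loss with Euclidean
    distance, averaged over the G global features; alpha = 1.2 as in the
    embodiment. Each argument is a list of G feature vectors."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    terms = [max(dist(a, p) - dist(a, n) + alpha, 0.0)
             for a, p, n in zip(anchors, positives, negatives)]
    return sum(terms) / len(terms)
```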
Step 4-3, optimizing each reduced component feature obtained in step 3-4 with a cross entropy loss function for identity classification. In this embodiment, because identity classification must keep the output dimensionality consistent with the number Y of pedestrian identity labels, a linear layer without a bias term is added to each component feature, so that a component feature of dimensionality F is mapped to an output of dimensionality Y through the linear layer; each component feature uses its own bias-free linear classifier, the component features corresponding to the linear classifiers one to one. The cross entropy loss function for identity classification is as follows:
wherein fcj denotes the j-th linear classifier, fjq represents the vector of the q-th pedestrian image in a batch for the j-th component feature fj, 1 ≤ j ≤ N, 1 ≤ q ≤ P × Q, where P × Q is the size of a batch and N represents the total number of linear classifiers, i.e. the number of component features described in step 3-1; 1r=y denotes the one-hot coded vector whose length is the number of pedestrian identities, where the index r of the one-hot element equals the ground-truth identity y of the pedestrian image.
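For one component feature, the identity cross entropy described above reduces to negative log-softmax at the true identity index; a minimal sketch (the function name is hypothetical, and the exact formula image is not reproduced in the text):

```python
import math

def identity_ce_loss(logits, y):
    """Sketch of the step 4-3 identity cross entropy for one component
    feature: `logits` are the Y outputs of a bias-free linear classifier and
    y is the ground-truth identity index; the loss is -log softmax(logits)[y],
    i.e. cross entropy against the one-hot vector 1_{r=y}."""
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - mx) for z in logits]
    return -math.log(exps[y] / sum(exps))
```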
Step 4-4, adding the average cross entropy loss function of each component feature and the average improved ternary loss function of each global feature to obtain a loss function used in final training, as follows:
L=Ltriplet+Lid;
and 4-5, performing model training of the network model on the training set. The specific training algorithm is as follows:
inputting: training set D; pedestrian identity label y; number of iterations T; sampler S; optimizer OPT; learning rate scheduler LR; initialization parameters theta0, whose subscript is the current iteration number; initial model Phi(x; theta0);
and outputting: model Phi(x; thetaT);
Step 4-5-1, loading the pre-trained model theta0 from the public data set ImageNet;
Step 4-5-2, the sampler S dynamically samples Nb preprocessed pedestrian images from the training set D according to the configuration of step 3-1, where xi represents the i-th preprocessed pedestrian image and Nb = P × Q;
step 4-5-3, the optimizer OPT clears the accumulated gradient;
step 4-5-6, performing back propagation according to the loss value loss;
step 4-5-7, the optimizer OPT updates the model parameters thetat; meanwhile, the learning rate scheduler LR updates the learning rate;
step 4-5-8, circularly and iteratively executing the step 4-5-2 to the step 4-5-7 until the iteration number reaches T;
wherein the parameter subscript t in the model output by the training algorithm represents the current iteration number, and the batch size Nb = P × Q.
In the multilayer and multi-granularity pedestrian re-identification depth model according to this embodiment, the step 5 includes:
Step 5-1, loading the network model trained in step 4, and extracting the depth characterizations of the pedestrian images in the test set, where the test set comprises a query set and a queried set; that is, the model is used to extract the depth characterizations of the query images and of the queried images.
As defined in steps 3-4, all global features and component features in the test set are stitched together, each feature of the test set being represented as:
wherein Ntest represents the test set and thetaT represents the parameter set when the iteration number is T;
the depth characterization of the final extracted pedestrian image is as follows:
Step 5-2, eliminating the deviation between the training set and the test set introduced by enhancing the pedestrian image data set: because random horizontal flipping of the training set significantly changes the data distribution, the flipped pedestrian image is also considered at test time, and the depth characterization of the pedestrian image and the depth characterization of the flipped pedestrian image are added to serve as the pedestrian depth characterization of the test set; specifically, in this embodiment, the flipping function is as given in step 1-2.
Step 5-3, normalizing the pedestrian depth characterization obtained in step 5-2 using the two-norm, which is calculated according to the following formula:
the pedestrian depth characterization of the final test set, obtained using two-norm normalization, is:
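The two-norm normalization of step 5-3 can be sketched directly (the function name and the small epsilon guard against a zero norm are illustrative additions):

```python
def l2_normalize(feature, eps=1e-12):
    """Sketch of step 5-3: divide the pedestrian depth characterization by
    its two-norm (the square root of the sum of squared entries) so the
    retrieval distance becomes insensitive to feature magnitude."""
    norm = sum(v * v for v in feature) ** 0.5
    return [v / (norm + eps) for v in feature]
```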
step 5-4, according to the pedestrian depth representation of the final test set, calculating the distance between each pedestrian image in the query set and each pedestrian image in the queried set, obtaining the query result of each pedestrian image in the query set, and realizing pedestrian re-identification;
Given the depth characterization of each pedestrian image in the query set and the depth characterization of each pedestrian image in the queried set, the distance matrix between the query set and the queried set is:
wherein Ngallery represents the queried set and Nquery represents the query set;
The distances between each query image and all pedestrian images in the queried set are sorted in ascending order; the smaller the distance between a pedestrian image in the queried set and the query image, the higher the probability that they show the same pedestrian. The identification result of each query image is thereby obtained, and the first ten query results are generally taken for evaluation.
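The retrieval of step 5-4 can be sketched as follows, under the assumption (consistent with the summary's mention of cosine-distance similarity) that with two-norm-normalized features the distance is the cosine distance 1 - <q, g>; the function name is hypothetical.

```python
def rank_queries(query_feats, gallery_feats, topk=10):
    """Sketch of step 5-4: for each query, compute the cosine distance
    1 - <q, g> to every queried-set (gallery) feature, sort the gallery
    indices in ascending order of distance, and keep the first `topk`
    (ten, per the text) as the retrieval result."""
    results = []
    for q in query_feats:
        dist = [1.0 - sum(a * b for a, b in zip(q, g)) for g in gallery_feats]
        order = sorted(range(len(gallery_feats)), key=dist.__getitem__)
        results.append(order[:topk])
    return results
```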
As shown in FIG. 4, an example diagram of the query results of the multilayer, multi-granularity pedestrian re-identification depth model provided in the embodiment of the present invention, where √ denotes a correct retrieval and × denotes an incorrect retrieval. In each example query, the first row is the query result obtained by the present invention and the second row is the query result of the classical component model PCB; it can be seen that the method of the present invention is significantly better than the classical component model PCB and achieves the best pedestrian re-identification performance at the present stage.
According to the technical scheme, the embodiment of the invention provides a multilayer, multi-granularity pedestrian re-identification depth model, comprising the following steps: step 1, preprocessing the pedestrian images in a pedestrian image data set, including: adjusting the size of the pedestrian image, enhancing the data, and performing data normalization and standardization on the pedestrian image after data enhancement, where the pedestrian image data set comprises a training set, a query set and a queried set; step 2, dynamically sampling the preprocessed training set; step 3, constructing a network model for pedestrian re-identification, i.e. constructing the depth characterization of the pedestrian image, including: extracting multilayer features through a backbone network model, enhancing and fusing the multilayer features with sub-modules to form a multi-branch structure, and extracting the component features and global features of each branch; step 4, training the network model constructed in step 3, including: defining the experiment-related configuration and optimizing the model parameters of the network model; and step 5, re-identifying the pedestrian, including: extracting the depth characterization of each query image through the network model trained in step 4, normalizing it using the two-norm, and returning the identification result of each query image according to its cosine-distance-based similarity to the queried set.
In the prior art, a component-based depth model often only segments the highly coupled high-level features of the backbone network; the segmentation loses semantic information of the high-level features that is beneficial to re-identification, so the pedestrian re-identification performance of component-based depth models is unstable.
By adopting the method, the multilayer, multi-granularity depth features solve the problem of semantic information being lost after high-level feature segmentation, thereby improving the pedestrian re-identification performance of the component-based depth model: the pedestrian depth characterization is constructed on top of the data preprocessing, the model is trained, and pedestrian re-identification is finally completed, achieving the best pedestrian re-identification performance at the present stage.
In particular implementations, the present invention also provides a computer storage medium, where the computer storage medium may store a program that, when executed, may include some or all of the steps in embodiments of a multi-layered, multi-granular pedestrian re-identification depth model provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The present invention provides a pedestrian image identification method based on a depth network model, and there are many methods and ways to implement the technical scheme. The above description is only a preferred embodiment of the present invention; it should be noted that, for those skilled in the art, several improvements and embellishments can be made without departing from the principle of the present invention, and these improvements and embellishments should also be regarded as within the protection scope of the present invention. All components not specified in this embodiment can be realized by the prior art.
Claims (3)
1. A pedestrian image identification method based on a depth network model is characterized by comprising the following steps:
step 1, preprocessing pedestrian images in a pedestrian image data set, wherein the pedestrian image data set comprises a training set and a testing set, the testing set comprises a query set and a queried set, the pedestrian identities in the testing set and the training set are not repeated, and the query set and the queried set have the same pedestrian identity;
step 2, dynamically sampling the preprocessed training set;
step 3, constructing a network model for pedestrian re-identification;
step 4, training the network model constructed in the step 3;
step 5, re-identifying the pedestrian;
the step 1 comprises the following steps:
step 1-1, adjusting the size of the input pedestrian image using bicubic interpolation: for any channel of pedestrian images of different sizes, the size of the pedestrian image is adjusted to 3K × K, and for any point P(0,0) in the image, the relative coordinates of the 16 surrounding points in the pedestrian image are defined as P(r, c), -1 ≤ r ≤ 2, -1 ≤ c ≤ 2; r and c respectively represent the offset of the abscissa and the offset of the ordinate, a negative value representing a leftward or upward offset and a positive value representing a rightward or downward offset;
wherein P(0,0) represents the mapping point in the original image closest to the pixel point (x1, y1) in the target interpolation image; denoting the coordinate offset of (x1, y1) relative to P(0,0) as (u, v) and the absolute coordinate of P(0,0) in the original image as (i, j), the bicubic interpolation is the sum of the convolution interpolations of the above 16 points, i.e. the following interpolation function F(i + u, j + v):
wherein x1 = i + u, y1 = j + v, f(i + r, j + c) represents the pixel value in the original image of any one of the 16 points, and s(x) is the sampling formula, specifically:
wherein a is a formula coefficient;
step 1-2, flipping the pedestrian image by random horizontal flipping: randomly horizontally flipping any channel of a pedestrian image of size 3K × K with probability P1, 0 < P1 < 1; the coordinates (xf, yf) of the horizontally flipped symmetric point of a second arbitrary point (x2, y2) on the pedestrian image are:
(xf, yf) = (x2, K - y2 - 1)
wherein (x2, y2) is the coordinate of a second arbitrary point in the pedestrian image, 0 ≤ x2 ≤ 3K, 0 ≤ y2 ≤ K;
step 1-3, by randomly erasing the pedestrian image: for any channel of a pedestrian image of size 3K × K, with probability P2, 0 < P2 < 1, randomly erasing a random region of size h × w according to the following random erasing function f(), and setting all pixel values of each channel in the random region to the mean pixel value of that channel:
f(x3:x3+h,y3:y3+w)=m,
wherein (x3, y3) is the coordinate of a third arbitrary point in the pedestrian image, 0 ≤ x3 ≤ 3K, 0 ≤ y3 ≤ K, and m is the mean pixel value of each channel in the pedestrian image;
step 1-4, performing data standardization on the data of each channel of the pedestrian image: applying data normalization and standardization to any channel of the pedestrian image of size 3K × K according to the following normalization function f(x):
wherein x is the pixel value of any point in each channel of the pedestrian image obtained in step 1-3, 0 ≤ x ≤ 255, mu is the mean of the public data set ImageNet, and delta is the standard deviation of the public data set ImageNet;
the step 2 comprises the following steps:
step 2-1, counting the index list corresponding to the pedestrian images of each identity in the training set, the pedestrian images in the training set being the training samples; defining the dictionary of index lists of samples not yet sampled as US, the set of samples the model classifies correctly as TS, and the set of samples the model classifies incorrectly as FS; TS and FS are initialized as empty, and US as the dictionary formed by all current training samples;
step 2-2, performing dynamic sampling, and acquiring a batch consisting of P pedestrians and Q images corresponding to the P pedestrians from the training set under the current iteration turn, so that the identities of the P pedestrians are randomly sampled from a label list of the training set;
step 2-3, preferentially sampling and acquiring Q images from the US set for each pedestrian identity acquired in the step 2-2, if the US set is empty or the number of the pedestrian images with the residual corresponding identities is less than Q, sampling and complementing from the FS set, if the number of the pedestrian images is still insufficient, sampling and complementing from the TS set, and if the number of the pedestrian images is still insufficient, circulating the step 2-3 and repeatedly sampling until Q images are acquired;
step 2-4, after each iteration sampling, transferring the samples sampled in the current iteration round from the US set to the FS set, simultaneously transferring the samples correctly classified by the model from the FS set to the TS set, and transferring the samples wrongly classified by the model from the TS set to the FS set;
step 2-5, repeating step 2-3 and step 2-4 in a loop until a batch of size P × Q has been sampled;
the step 3 comprises the following steps:
step 3-1, constructing a network model for pedestrian re-identification, wherein the network model comprises a backbone network model and sub-modules;
extracting multilayer features through the backbone network model, namely extracting features of different depths, the features of different depths including: the first-layer depth feature l1, the second-layer depth feature l2, the third-layer depth feature l3 and the fourth-layer depth feature l4; the backbone network model is ResNet, a classical classification network for the ImageNet data set;
the sub-modules comprise an enhancement module, a downscaling module, a reduction module and a maximum pooling layer module; the first-layer depth feature l1 and the second-layer depth feature l2 are defined as low-level features, while the third-layer depth feature l3 and the fourth-layer depth feature l4 are high-level features;
when the size of the first-layer depth feature l1 is C × H × W, the size of the second-layer depth feature l2 obtained according to the backbone network model is 2C × H/2 × W/2 and the size of the third-layer depth feature l3 is 4C × H/4 × W/4, where C is the number of channels of the first-layer depth feature l1, H is the height of the first-layer depth feature l1, and W is the width of the first-layer depth feature l1;
step 3-2, enhancing the first-layer depth feature l1 and the second-layer depth feature l2 respectively through two enhancement modules while keeping their sizes unchanged, and then reducing the sizes of the first-layer depth feature l1 and the second-layer depth feature l2 to 2C × H/4 × W/4 respectively through the two downscaling modules;
step 3-3, reducing the number of channels of the third-layer depth feature l3 to half of the original number through the reduction module, i.e. reducing its size to 2C × H/4 × W/4;
concatenating the downscaled first-layer depth feature l1 and the reduced third-layer depth feature l3 along the channel dimension to obtain the first multilayer depth feature l13 of size 2C × H/4 × W/4;
concatenating the downscaled second-layer depth feature l2 and the reduced third-layer depth feature l3 along the channel dimension to obtain the second multilayer depth feature l23 of size 2C × H/4 × W/4;
step 3-4, the multilayer depth features l13 and l23 obtained in step 3-3 and the third-layer depth feature l3 in the backbone network model are respectively connected to the network layer of the backbone network model corresponding to the fourth-layer depth feature l4, forming a multi-branch structure; the global features comprise: the first global feature l4-1, the second global feature l4-2 and the third global feature l4-3; the first global feature l4-1 is obtained by feeding the third-layer depth feature l3 of the backbone network model into the network layer corresponding to the fourth-layer depth feature l4 and is equivalent to the fourth-layer depth feature of the backbone network model; the second global feature l4-2 is obtained by feeding l23 into the network layer corresponding to the fourth-layer depth feature l4; and the third global feature l4-3 is obtained by feeding l13 into the network layer corresponding to the fourth-layer depth feature l4;
segmenting the global features into component features, including: cutting the first global feature l4-1 into first component features with granularity 1, cutting the second global feature l4-2 into second component features with granularity 2, and cutting the third global feature l4-3 into third component features with granularity 3;
pooling the resolutions of the global features and the component features to 1 × 1 using the maximum pooling layer module, and further reducing the number of channels of the global features and the component features to F using the reduction module, where the reduction module is a shared 1 × 1 convolution layer; the size of each reduced global feature and component feature is F × 1 × 1, and the set formed by the reduced component features is denoted S;
splicing all the reduced global features and component features to obtain the depth representation of the constructed pedestrian image, wherein the size is M multiplied by F, and M is the total number of the global features and the component features;
step 4 comprises the following steps:
step 4-1, defining experiment related configuration: before training a network model on a training set, firstly defining a model optimizer for updating parameters; setting the size of the batch of the dynamic sampling in the step 2 to be P multiplied by Q, wherein P represents the number of the pedestrian identities included in each batch, and Q represents the number of the pedestrian images included in each pedestrian identity; finally, a learning rate scheduler is set; the training set is provided with a pedestrian identity label, and the number of the pedestrian identity label classes of the training set is recorded as Y;
step 4-2, optimizing each global feature in step 3 respectively: applying an improved ternary loss function to each global feature for the feature metric and averaging over the global features, the loss function Ltriplet being:
wherein G represents the number of global features; the loss uses the anchor sample of the g-th global feature of the i-th pedestrian identity, the positive sample of the g-th global feature of the i-th pedestrian identity, and the negative sample of the g-th global feature of the j-th pedestrian identity; alpha is a hyper-parameter controlling the difference between the inter-class distance and the intra-class distance, 1.0 < alpha < 1.5, 1 ≤ i ≤ P, 1 ≤ a ≤ Q;
step 4-3, optimizing each reduced component feature obtained in step 3-4 with a cross entropy loss function for identity classification, where each component feature uses a linear classifier without bias terms and the component features correspond to the linear classifiers one to one; the cross entropy loss function Lid for identity classification is as follows:
wherein fcj denotes the j-th linear classifier, fjq represents the vector of the q-th pedestrian image in a batch for the j-th component feature fj, 1 ≤ j ≤ N, 1 ≤ q ≤ P × Q; N represents the total number of linear classifiers, i.e. the number of component features; 1r=y denotes the one-hot coded vector whose length is the number of pedestrian identities, where the index r of the one-hot element equals the ground-truth identity y of the pedestrian image;
step 4-4, adding the cross entropy loss function and the improved ternary loss function to obtain a loss function L used in final training, which is as follows:
L=Ltriplet+Lid,
4-5, performing model training of a network model on the training set;
in step 4-5, when model training of the network model is performed on the training set, the input is: training set D; pedestrian identity label y; number of iterations T; sampler S; optimizer OPT; learning rate scheduler LR; initialization parameters theta0, whose subscript 0 is the current iteration number; initial model Phi(x; theta0); the output is: model Phi(x; thetaT); the specific training process comprises the following steps:
step 4-5-1, loading the model θ0 pre-trained on the public data set ImageNet;
step 4-5-2, the sampler S dynamically samples Nb preprocessed pedestrian images from the training set D according to the configuration of step 3-1, where xi denotes the i-th preprocessed pedestrian image and Nb = P×Q;
step 4-5-3, the optimizer OPT clears the accumulated gradients;
step 4-5-4, forward-propagating the sampled preprocessed pedestrian images through the current network model to obtain the global features and component features; step 4-5-5, calculating the loss value loss of the batch according to the loss function L; step 4-5-6, performing back propagation according to the loss value loss;
step 4-5-7, the optimizer OPT updates the model parameters θt, and at the same time the learning rate scheduler LR updates the learning rate;
step 4-5-8, iteratively executing steps 4-5-2 to 4-5-7 in a loop until the number of iterations reaches T.
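The loop of steps 4-5-1 to 4-5-8 can be sketched with a toy one-parameter model trained by SGD (everything here — the scalar model, the squared loss, the decay factor — is a hypothetical stand-in; the patent's model φ, sampler S, optimizer OPT and scheduler LR are far more elaborate):

```python
import random

def train(dataset, T, theta0=0.0, lr0=0.5):
    """Toy mirror of the sample -> clear-gradient -> forward -> loss ->
    backward -> update -> schedule cycle of steps 4-5-2 .. 4-5-8."""
    theta, lr = theta0, lr0                # step 4-5-1: initial parameters
    random.seed(0)
    for t in range(T):                     # step 4-5-8: iterate until T
        x = random.choice(dataset)         # step 4-5-2: sampler draws data
        grad = 0.0                         # step 4-5-3: clear the gradient
        pred = theta                       # step 4-5-4: forward pass
        loss = (pred - x) ** 2             # step 4-5-5: loss value
        grad = 2.0 * (pred - x)            # step 4-5-6: back propagation
        theta -= lr * grad                 # step 4-5-7: optimizer update ...
        lr *= 0.99                         #             ... scheduler decays lr
    return theta

theta_T = train([1.0, 1.2, 0.8], T=200)    # converges near the sample mean
```

The returned θT plays the role of the trained model parameters φ(x; θT) in the claim.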
2. The method of claim 1, wherein step 5 comprises:
step 5-1, loading the network model trained in step 4, and using it to extract the depth characterizations of the pedestrian images in the test set, namely the depth characterizations of the query images in the query set and of the queried images in the queried set;
all global features and component features in the test set are stitched together as defined in step 3-4, each feature of the test set being represented as:
wherein Ntest represents the test set, and θT represents the parameter set when the number of iterations is T;
the depth characterization of the final extracted pedestrian image is as follows:
step 5-2, eliminating the deviation between the training set and the test set in the pedestrian image data set: the depth characterization of the pedestrian image and the depth characterization of the flipped pedestrian image are added together, and the sum serves as the depth characterization of the pedestrian image in the test set;
step 5-3, normalizing the depth characterization of the pedestrian image obtained in step 5-2 using the two-norm, which is calculated according to the following formula:
the depth characterization of the pedestrian image in the final test set, normalized using the two-norm, is obtained as follows:
step 5-4, calculating the distance between each pedestrian image in the query set and each pedestrian image in the queried set according to the depth characterizations of the pedestrian images in the final test set, obtaining the query result for each pedestrian image in the query set, and thereby realizing pedestrian re-identification.
3. The method of claim 2, wherein step 5-4 comprises: given the depth characterization of each pedestrian image in the query set and the depth characterization of each pedestrian image in the queried set, the distance matrix between the query set and the queried set is:
wherein Ngallery represents the queried set, Nquery represents the query set, and Mji represents the element in the i-th row and j-th column of the matrix; the distances between each query image and the pedestrian images in the queried set are sorted in ascending order to obtain the identification result of each query image.
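A minimal sketch of the ranking in step 5-4, assuming Euclidean distance on the normalized characterizations (the helper names and toy features are hypothetical): build the query-gallery distance matrix, then sort each query's gallery indices by ascending distance.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two depth characterizations.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_gallery(query_feats, gallery_feats):
    """Step 5-4 sketch: one row of distances per query image, with the
    gallery indices returned nearest-first."""
    results = []
    for q in query_feats:
        dists = [euclidean(q, g) for g in gallery_feats]
        order = sorted(range(len(gallery_feats)), key=lambda j: dists[j])
        results.append(order)   # gallery indices, nearest first
    return results

queries = [[0.0, 0.0]]
gallery = [[2.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
print(rank_gallery(queries, gallery))  # [[1, 2, 0]]
```

The first index in each ranked list is the gallery image identified as the same pedestrian as the query.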
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911362901.4A CN111177447B (en) | 2019-12-26 | 2019-12-26 | Pedestrian image identification method based on depth network model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111177447A CN111177447A (en) | 2020-05-19 |
CN111177447B true CN111177447B (en) | 2021-04-30 |
Family
ID=70655664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911362901.4A Active CN111177447B (en) | 2019-12-26 | 2019-12-26 | Pedestrian image identification method based on depth network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111177447B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783526B (en) * | 2020-05-21 | 2022-08-05 | 昆明理工大学 | Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment |
CN111882548A (en) * | 2020-07-31 | 2020-11-03 | 北京小白世纪网络科技有限公司 | Method and device for counting cells in pathological image based on deep learning |
CN112926569B (en) * | 2021-03-16 | 2022-10-18 | 重庆邮电大学 | Method for detecting natural scene image text in social network |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101865A (en) * | 2018-05-31 | 2018-12-28 | 湖北工业大学 | A kind of recognition methods again of the pedestrian based on deep learning |
CN110008842A (en) * | 2019-03-09 | 2019-07-12 | 同济大学 | A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth |
CN110046553A (en) * | 2019-03-21 | 2019-07-23 | 华中科技大学 | A kind of pedestrian weight identification model, method and system merging attributive character |
CN110059616A (en) * | 2019-04-17 | 2019-07-26 | 南京邮电大学 | Pedestrian's weight identification model optimization method based on fusion loss function |
CN110096947A (en) * | 2019-03-15 | 2019-08-06 | 昆明理工大学 | A kind of pedestrian based on deep learning recognizer again |
CN110188611A (en) * | 2019-04-26 | 2019-08-30 | 华中科技大学 | A kind of pedestrian recognition methods and system again introducing visual attention mechanism |
CN110472591A (en) * | 2019-08-19 | 2019-11-19 | 浙江工业大学 | It is a kind of that pedestrian's recognition methods again is blocked based on depth characteristic reconstruct |
Non-Patent Citations (2)
Title |
---|
Jinlin Wu, Zhen Lei et al.; "Clustering and Dynamic Sampling Based Unsupervised Domain Adaptation for Person Re-Identification"; 2019 IEEE International Conference on Multimedia and Expo; 2019-07-12; pp. 886-888 * |
Guanshuo Wang et al.; "Learning Discriminative Features with Multiple Granularities for Person Re-Identification"; Proceedings of the 26th ACM International Conference on Multimedia; 2018-12-26; pp. 274-282 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111080628B (en) | Image tampering detection method, apparatus, computer device and storage medium | |
Xie et al. | Multilevel cloud detection in remote sensing images based on deep learning | |
CN110443143B (en) | Multi-branch convolutional neural network fused remote sensing image scene classification method | |
US11200424B2 (en) | Space-time memory network for locating target object in video content | |
CN109754015B (en) | Neural networks for drawing multi-label recognition and related methods, media and devices | |
EP3388978B1 (en) | Image classification method, electronic device, and storage medium | |
US8503792B2 (en) | Patch description and modeling for image subscene recognition | |
EP2701098B1 (en) | Region refocusing for data-driven object localization | |
US8705866B2 (en) | Region description and modeling for image subscene recognition | |
Jung et al. | A unified spectral-domain approach for saliency detection and its application to automatic object segmentation | |
CN111177447B (en) | Pedestrian image identification method based on depth network model | |
US10262214B1 (en) | Learning method, learning device for detecting lane by using CNN and testing method, testing device using the same | |
US8503768B2 (en) | Shape description and modeling for image subscene recognition | |
CN110909618B (en) | Method and device for identifying identity of pet | |
US10275667B1 (en) | Learning method, learning device for detecting lane through lane model and testing method, testing device using the same | |
CN111709313B (en) | Pedestrian re-identification method based on local and channel combination characteristics | |
CN107784288A (en) | A kind of iteration positioning formula method for detecting human face based on deep neural network | |
CN108154133B (en) | Face portrait-photo recognition method based on asymmetric joint learning | |
CN107862680B (en) | Target tracking optimization method based on correlation filter | |
KR20200027887A (en) | Learning method, learning device for optimizing parameters of cnn by using multiple video frames and testing method, testing device using the same | |
CN114332544B (en) | Image block scoring-based fine-grained image classification method and device | |
CN109472733A (en) | Image latent writing analysis method based on convolutional neural networks | |
CN110781817B (en) | Pedestrian re-identification method for solving component misalignment | |
Srinagesh et al. | A modified shape feature extraction technique for image retrieval | |
AU2009347563A1 (en) | Detection of objects represented in images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||