CN111738143A - Pedestrian re-identification method based on expectation maximization - Google Patents

Info

Publication number
CN111738143A
Authority
CN
China
Prior art keywords
pedestrian
feature
features
input
expectation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010567949.5A
Other languages
Chinese (zh)
Other versions
CN111738143B (en)
Inventor
周非
陈文峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202010567949.5A
Publication of CN111738143A
Application granted
Publication of CN111738143B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a pedestrian re-identification method based on expectation maximization and belongs to the field of computer vision applications. First, a residual convolutional neural network, ResNet50, is used as the backbone network to extract intermediate features of the input pedestrian images. An attention module is then constructed: within the module, a Non-Local operation that uses covariance as its correlation measure captures correlation information among different regions of the features, and an EM (expectation maximization) algorithm then performs sparse attention reconstruction of the features, so that feature redundancy is reduced while the latent variables in the features are mined and the representation of effective feature information is strengthened. Finally, the network is jointly trained with the triplet loss function, the cross-entropy loss function and the center loss function. The invention captures more discriminative features, reduces feature redundancy to obtain a low-rank attention feature map, and thereby further improves the recognition rate.

Description

Pedestrian re-identification method based on expectation maximization
Technical Field
The invention belongs to the field of computer vision, and relates to a pedestrian re-identification method based on expectation maximization.
Background
Pedestrian re-identification (Re-ID), also called cross-camera tracking, is one of the important research topics in computer vision and plays a vital role in video surveillance, intelligent security, pedestrian identity verification, human-computer interaction and similar fields. Given a query pedestrian image, pedestrian re-identification aims to retrieve images of the same identity from a large-scale pedestrian gallery captured by multiple non-overlapping cameras at different viewing angles, times and places. Compared with face recognition, pedestrian re-identification scenes are closer to real environments, but they are more easily affected by illumination changes, pose changes, background changes, different shooting angles and so on, which makes the task very challenging.
At present, research on pedestrian re-identification follows two main approaches: feature representation learning and metric learning. Feature representation learning treats pedestrian re-identification as a classification problem, i.e., all images of the same pedestrian ID are assigned to one class. The main task of such methods is therefore to learn more discriminative features from the images of each ID, which reduces the difficulty of classification. Metric learning maps high-dimensional pedestrian images into a low-dimensional feature space and measures the semantic similarity between the embedded features, so that the intra-class distance is reduced and the inter-class distance is increased. Traditional feature representation methods describe features with manually designed descriptors; with the development of deep learning in recent years, features extracted by deep networks have become more discriminative than hand-crafted ones. However, a neural network treats all features learned through its hierarchical structure equally, whereas different features in fact contribute differently to the re-identification task. For example, the correlation among feature regions improves the representational power of the features, yet it is often ignored by ordinary convolutional networks.
Attention mechanisms allow a neural network to reallocate its computational resources toward more important content. In pedestrian re-identification, the attention mechanism focuses on capturing information that is meaningful to the task, strengthens the representational power of the features, and reduces interference from useless information such as background and occlusion. The document "Hu, Jie et al., Squeeze-and-Excitation Networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017" proposes modeling the correlation between feature channels and selecting the channel features with the largest response, which provided ideas for the subsequent development of attention mechanisms. Self-attention based approaches are also increasingly used in computer vision tasks. The self-attention mechanism represents the response at one position of the image by attending to all positions of the feature map and taking their weighted average in an embedding space. For example, the document "Xiaolong Wang et al., Non-Local Neural Networks. In CVPR, pages 7794-.
To sum up, the current problems in the technical field of pedestrian re-identification are: 1) the images in the data sets have low resolution and the extracted features are not sufficiently representative, so re-identification accuracy is low; 2) the extracted features are high-dimensional, so the classification boundary is overly complex; 3) although a network with self-attention modeling gains inter-region correlation information, it also increases the redundancy of other features.
Disclosure of Invention
In view of the above, and to address the insufficient representational power and high redundancy of features extracted by neural networks, the present invention provides a pedestrian re-identification method based on expectation maximization. The method introduces a Non-Local operation that uses covariance as its correlation operation to model the correlation between the regions of a feature map, and introduces an EM (Expectation Maximization) algorithm to perform low-rank reconstruction of the features, so as to mine, as far as possible, the most discriminative information in the redundant features, i.e., the attention information.
In order to achieve the purpose, the invention provides the following technical scheme:
A pedestrian re-identification method based on expectation maximization, the method comprising the following steps:
S1: performing different preprocessing operations on the input training and test images;
S2: constructing a ResNet50 backbone network and dividing ResNet50 into four stages, Stage1 to Stage4, which extract feature information successively from shallow to deep;
S3: constructing an attention module whose input and output dimensions are identical, so that it can be inserted into Stage2 and Stage3 of ResNet50, the attention module comprising two parts: a Non-Local operation that uses covariance as its correlation function, and an EM algorithm that reconstructs the features;
S4: after the backbone network ResNet50 extracts features, splitting the network into two branches, a Global Branch and a Local Branch, wherein the Global Branch extracts the complete features of a pedestrian and the Local Branch extracts the features remaining after a feature erasing operation;
S5: training on the training-set feature vectors extracted by the two branches using the triplet loss function, the cross-entropy loss function and the center loss function;
S6: inputting the Gallery pedestrian image set into the model trained in S5 to obtain a pedestrian feature database, wherein each feature in the database corresponds to a unique pedestrian ID;
S7: inputting a Query image into the CNN model to obtain its feature, measuring the similarity between this feature and the pedestrian features in the feature database of S6, sorting the results by similarity from largest to smallest, and returning the number of pedestrian images specified by the user.
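The retrieval in S6 and S7 can be illustrated with a minimal pure-Python sketch. The text does not name the similarity measure, so cosine similarity is an assumption here, and the names `cosine_similarity`, `retrieve` and the toy two-dimensional features are hypothetical stand-ins for the learned features.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_feature, gallery, top_n):
    """Rank gallery entries by similarity to the query, largest first.

    `gallery` is a list of (pedestrian_id, feature_vector) pairs that stands in
    for the feature database built in S6.
    """
    scored = [(cosine_similarity(query_feature, feat), pid) for pid, feat in gallery]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored[:top_n]]

# Toy gallery with hypothetical 2-D features.
gallery = [("id_a", [1.0, 0.0]), ("id_b", [0.0, 1.0]), ("id_c", [0.7, 0.7])]
ranked = retrieve([0.9, 0.1], gallery, top_n=2)
```

In a real system the features would be the 512-dimensional branch outputs, and `top_n` corresponds to the number of images requested by the user.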
Optionally, the preprocessing operation in step S1 includes:
random horizontal flipping, i.e., flipping the input set of images with a given probability;
image rotation, namely rotating an input pedestrian image at a certain angle;
color enhancement, i.e., randomly altering the intensity of each channel of the input RGB image.
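Two of these preprocessing operations can be sketched in plain Python as follows. This is only an illustration under stated assumptions: an image is represented as a nested list of RGB tuples, the jitter range `max_delta` is a hypothetical parameter, and rotation is omitted because the text does not specify the angle or interpolation.

```python
import random

def random_horizontal_flip(image, prob, rng):
    """Mirror the image left-to-right with probability `prob`."""
    if rng.random() < prob:
        return [list(reversed(row)) for row in image]
    return image

def color_jitter(image, max_delta, rng):
    """Scale each RGB channel by one random factor in [1 - max_delta, 1 + max_delta]."""
    factors = [1.0 + rng.uniform(-max_delta, max_delta) for _ in range(3)]
    return [
        [tuple(min(255, max(0, round(v * f))) for v, f in zip(pixel, factors))
         for pixel in row]
        for row in image
    ]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[(10, 20, 30), (40, 50, 60)],
         [(70, 80, 90), (100, 110, 120)]]
flipped = random_horizontal_flip(image, prob=1.0, rng=rng)
jittered = color_jitter(flipped, max_delta=0.2, rng=rng)
```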
Optionally, in step S2, dilated (atrous) convolution is used in Stage3 and Stage4 of the backbone network ResNet50, so as to obtain a larger feature map and retain sufficient feature information.
Optionally, in step S3, the construction of the attention module is divided into two stages:
Stage 1: performing the Non-Local calculation on the input features, wherein the correlation is obtained by computing the covariance between pixels, and the core Non-Local operator is:
$$y_i = \frac{1}{C(x)} \sum_{j} f(x_i, x_j)\, g(x_j)$$
where x is the input feature map, the function f(·,·) computes the correlation between pixel i and pixel j, the function g(x_j) computes the embedding of the feature map at pixel j, and C(x) is the normalization coefficient. y_i is the weighted average of the g-transformed features of the other pixels, where the weights are the normalized similarity function;
Stage 2: the second-order covariance statistics provide rich correlation information between regions but also introduce some highly redundant features, so an EM (expectation maximization) algorithm is used to sparsely reconstruct the redundant features. The EM algorithm assumes that $X = \{x_1, x_2, \ldots, x_N\}$ is the obtained set of feature information, consisting of N observed samples, and that each data point $x_i$ has corresponding latent information $z_i$, i.e., the most discriminative information. {X, Z} is the complete data, with log-likelihood $\ln p(X, Z \mid \theta)$, where θ is the set of all parameters of the model. In practice, knowledge of the latent information Z comes from the posterior distribution $p(Z \mid X, \theta)$. The EM algorithm maximizes the likelihood $\ln p(X, Z \mid \theta)$ through two operations, the expectation (E) step and the maximization (M) step.
E step:
$$Q(\theta, \theta^{(i)}) = E_Z\left[\ln p(X, Z \mid \theta) \mid X, \theta^{(i)}\right] = \sum_Z \ln p(X, Z \mid \theta)\, p(Z \mid X, \theta^{(i)})$$
where $p(Z \mid X, \theta^{(i)})$ is the probability distribution of the latent variable data Z, i.e., the attention information, given the feature data X and the parameter estimate $\theta^{(i)}$ of the i-th iteration.
M step: the parameters are updated by maximizing the expectation obtained in the E step, giving the parameter estimate $\theta^{(i+1)}$ of the (i+1)-th iteration:
$$\theta^{(i+1)} = \arg\max_{\theta} Q(\theta, \theta^{(i)})$$
Optionally, in step S4, after the Global Branch extracts features, each feature map is pooled into a 2048 × 1 feature vector by a global average pooling (GAP) layer, and the feature vector is then reduced to a 512 × 1 vector by a feature dimensionality reduction layer. The Local Branch uses Batch DropBlock to erase the same region, in a fixed proportion, from every input feature map in a batch; a global max pooling (GMP) layer then replaces the global average pooling layer to produce a 2048-dimensional max feature vector, and after dimensionality reduction the Local Branch feature also becomes 512-dimensional.
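The pooling and erasing operations of the two branches can be sketched with NumPy as below. This is an illustrative sketch rather than the patented implementation: the 24 × 8 spatial size, the erase ratios and the function names are assumptions, and the learned reduction from 2048 to 512 dimensions is left out.

```python
import numpy as np

def global_average_pool(features):
    """(batch, channels, height, width) -> (batch, channels) by spatial mean."""
    return features.mean(axis=(2, 3))

def global_max_pool(features):
    """(batch, channels, height, width) -> (batch, channels) by spatial max."""
    return features.max(axis=(2, 3))

def batch_drop_block(features, height_ratio, width_ratio, rng):
    """Zero the same randomly placed rectangle in every feature map of the batch."""
    _, _, h, w = features.shape
    drop_h = int(round(h * height_ratio))
    drop_w = int(round(w * width_ratio))
    top = rng.integers(0, h - drop_h + 1)    # upper bound is exclusive
    left = rng.integers(0, w - drop_w + 1)
    mask = np.ones((1, 1, h, w), dtype=features.dtype)
    mask[:, :, top:top + drop_h, left:left + drop_w] = 0
    return features * mask                    # one mask shared across the batch

rng = np.random.default_rng(0)
features = rng.random((2, 2048, 24, 8))       # hypothetical backbone output
global_vec = global_average_pool(features)    # Global Branch: GAP
local_vec = global_max_pool(batch_drop_block(features, 0.3, 1.0, rng))  # Local Branch: erase, then GMP
```

Sharing one mask across the batch is what distinguishes this erasing step from per-sample dropout: every image in the batch loses the same region.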
Optionally, in step S5, several loss functions are trained jointly. The triplet loss function minimizes the distance between a target sample and its positive sample and maximizes the distance between the target sample and its negative sample:
$$L_{triplet} = \left[d_{pos} - d_{neg} + m\right]_{+}$$
where $d_{pos}$ is the distance between the target sample and the positive sample, $d_{neg}$ is the distance between the target sample and the negative sample, $[\cdot]_{+} = \max(\cdot, 0)$, and m is the margin (threshold) of the triplet loss.
Cross entropy describes the distance between two probability distributions; the smaller the cross entropy, the closer the two distributions:
$$L_{id} = -\sum_{k=1}^{K} q(k) \log p(k)$$
where k ∈ {1, 2, …, K} denotes the pedestrian class output by the re-identification network, p(k) is the predicted probability that the input image belongs to class k, and q(k) is the actual probability.
The center loss function reduces the distance between samples of the same class, which increases their similarity:
$$L_{center} = \frac{1}{2} \sum_{j} \left\| f_j - c_{y_j} \right\|_2^2$$
where $f_j$ is the feature of the j-th sample and $c_{y_j}$ is the center of its class $y_j$. The final loss function is a weighted sum of the three loss functions above:
$$L_{total} = L_{triplet} + \lambda_i L_{id} + \lambda_c L_{center}$$
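The three losses and their weighted sum can be written out in a few lines of plain Python. This sketch assumes the standard forms of each loss (hinge-style triplet loss, cross entropy against a target distribution, and half the summed squared distance to the class center); the weights passed to `total_loss` are hypothetical, since the text does not give their values.

```python
import math

def triplet_loss(d_pos, d_neg, margin):
    """Hinge on (positive distance - negative distance + margin)."""
    return max(d_pos - d_neg + margin, 0.0)

def cross_entropy(predicted, actual):
    """-sum_k q(k) log p(k); terms with q(k) = 0 contribute nothing."""
    return -sum(q * math.log(p) for p, q in zip(predicted, actual) if q > 0)

def center_loss(features, labels, centers):
    """Half the summed squared distance of each feature to its class center."""
    total = 0.0
    for feature, label in zip(features, labels):
        total += sum((x - c) ** 2 for x, c in zip(feature, centers[label]))
    return 0.5 * total

def total_loss(l_triplet, l_id, l_center, weight_id, weight_center):
    """Weighted sum of the three losses (weights are assumed values)."""
    return l_triplet + weight_id * l_id + weight_center * l_center

l_tri = triplet_loss(d_pos=1.0, d_neg=0.5, margin=0.3)
l_id = cross_entropy(predicted=[0.5, 0.5], actual=[1.0, 0.0])
l_cen = center_loss([[1.0, 2.0]], [0], {0: [0.0, 0.0]})
loss = total_loss(l_tri, l_id, l_cen, weight_id=1.0, weight_center=0.1)
```

Note that the triplet term vanishes once the negative is farther than the positive by at least the margin, so only hard or semi-hard triplets contribute gradient.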
the invention has the beneficial effects that:
(1) the invention obtains the correlation among the characteristic areas through covariance operation, can bring rich second-order statistical information to the characteristics, and enhances the characterization force of the characteristics.
(2) According to the method, the characteristics are reconstructed by using an expectation maximization algorithm, attention information and model parameters are updated through E and M two-step multiple iteration in the reconstruction process, the characteristics are reconstructed by using the converged attention information and the model parameters after a convergence state is finally achieved, and the reconstructed characteristics have lower redundancy compared with the original characteristics;
(3) experimental results show that compared with the traditional space attention and channel attention, the method has higher re-identification precision.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic general flow chart of a pedestrian re-identification system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an attention module according to an embodiment of the present invention;
FIG. 3 is a visual attention diagram of the attention module of the present invention;
FIG. 4 is a comparison of the CMC curves of the algorithm of the present invention on the Market1501 and DukeMTMC data sets;
FIG. 5 is a comparison of the CMC curves of the algorithm of the present invention on the CUHK03-labeled data set;
FIG. 6 is a comparison of the CMC curves of the algorithm of the present invention on the CUHK03-detected data set.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for illustration only and are not intended to limit the invention; they are schematic representations rather than physical drawings. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and they do not represent the size of an actual product; those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Fig. 1 is a schematic flow chart of a pedestrian re-identification method based on an expectation-maximization algorithm according to an embodiment of the present invention, as shown in the figure, the method includes the following steps:
S1: Different preprocessing operations are performed on the input training and test image sets.
In this step, the input training and test data come from three public pedestrian data sets: Market-1501, DukeMTMC-reID and CUHK03.
The Market-1501 data set contains 1501 different pedestrians and 32668 detected pedestrian bounding boxes captured by 6 cameras at Tsinghua University. The training set has 751 pedestrians and 12936 images; the test set has 750 pedestrians and 19732 images. During testing, 3368 images of the 750 pedestrians are used as the query set to identify the correct pedestrian identities in the test set.
The DukeMTMC-reID data set contains 36411 pedestrian images of 1812 identities captured by 8 cameras. The training set contains 16522 images of 702 identities; the test set contains 17661 images; the query set consists of 2228 images of another 702 identities.
The CUHK03 data set contains 14097 images of 1467 identities. It provides two kinds of bounding boxes, manually labeled boxes and boxes detected by DPM, which form the "labeled" and "detected" subsets respectively. CUHK03 has two test protocols; the present invention uses the new protocol, which, similarly to Market-1501, divides the data set into a training set of 767 identities and a test set of 700 identities.
TABLE 1 Summary of the experimental data sets
Data set | Year | Number of pedestrians | Number of images | Number of cameras
Market1501 | 2015 | 1501 | 32668 | 6
DukeMTMC-reID | 2017 | 1812 | 36411 | 8
CUHK03 | 2014 | 1467 | 13164 | 10
The present invention uses two evaluation criteria to assess model performance on all data sets. The first is the Cumulative Matching Characteristics (CMC) curve, which gives the probability of finding the correct match among the first k matching results. CMC treats pedestrian re-identification as a ranking problem and is reported as Rank-k: if the Rank-k recognition rate is P, then the probability that the correct target appears within the top k of the ranked results is P. The second criterion is the mean Average Precision (mAP), which treats pedestrian re-identification as a retrieval problem and evaluates the overall performance of the model.
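Both criteria can be computed from ranked retrieval lists, as in the simplified plain-Python sketch below. The function names and toy data are hypothetical, and the sketch leaves out protocol details that the text does not describe, such as how gallery images from the query's own camera are handled.

```python
def rank_k_hit(ranked_ids, query_id, k):
    """1 if the correct identity appears among the first k results, else 0."""
    return int(query_id in ranked_ids[:k])

def average_precision(ranked_ids, query_id):
    """Mean of the precision values measured at each correct match."""
    hits, precision_sum = 0, 0.0
    for position, pid in enumerate(ranked_ids, start=1):
        if pid == query_id:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0

def evaluate(results, k):
    """`results` is a list of (query_id, ranked_gallery_ids) pairs.

    Returns the Rank-k rate and the mAP over all queries.
    """
    rank_k = sum(rank_k_hit(r, q, k) for q, r in results) / len(results)
    mean_ap = sum(average_precision(r, q) for q, r in results) / len(results)
    return rank_k, mean_ap

# Two toy queries: identity "a" is matched at ranks 1 and 3, "b" at rank 2.
results = [("a", ["a", "b", "a"]), ("b", ["a", "b", "c"])]
rank1, mean_ap = evaluate(results, k=1)
rank2, _ = evaluate(results, k=2)
```

Rank-k only asks whether a correct match appears early, while mAP rewards placing every correct match early, which is why the two numbers are usually reported together.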
S2: Constructing a ResNet50 backbone network and dividing ResNet50 into four stages, Stage1 to Stage4, which extract feature information successively from shallow to deep;
S3: Constructing an attention module whose input and output dimensions are identical, and inserting it into Stage2 and Stage3 of ResNet50.
Specifically, the attention module comprises two stages, F and B: F is the Non-Local operation that uses covariance as its correlation function, and B is the EM algorithm that reconstructs the features. Fig. 2 is a schematic structural diagram of the attention module according to an embodiment of the present invention. Because the parts of a pedestrian's body are correlated with one another, stage F introduces the second-order covariance statistic to capture the correlation between non-local regions of the feature map.
Given an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where h and w are the height and width of the feature map and c is the number of channels, the spatial dimensions of X are flattened into one dimension so that $X \in \mathbb{R}^{hw \times c}$. Two functions θ(x) and g(x) are then constructed, each from a 1 × 1 convolution, a batch normalization layer and a ReLU activation function, yielding two embeddings of dimension
$$\theta(x),\ g(x) \in \mathbb{R}^{hw \times c/r}$$
where r is the feature channel reduction factor. The covariance matrix is then calculated from θ(x), and the formula is as follows:
$$\Sigma = \theta(x)\,\bar{I}\,\theta(x)^{T}$$
where
$$\bar{I} = \frac{1}{c/r}\left(I - \frac{1}{c/r}\mathbf{1}\right)$$
I is the identity matrix, $\mathbf{1}$ is the all-ones matrix, and
$$\Sigma \in \mathbb{R}^{hw \times hw}$$
has dimension hw × hw. The scaling factor
$$\frac{1}{\sqrt{c/r}}$$
is multiplied by the covariance matrix, the result is passed through a softmax function, and a matrix multiplication with g(x) then gives X':
$$X' = \mathrm{softmax}\left(\frac{\Sigma}{\sqrt{c/r}}\right) g(x)$$
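Stage F can be sketched in NumPy as follows. This is an illustration under assumptions rather than the patented implementation: the covariance is taken as the sample covariance between spatial positions computed over the reduced channels, the scaling is assumed to be 1/√(c/r), and random arrays stand in for the embeddings θ(x) and g(x) that the real module produces with 1 × 1 convolution, batch normalization and ReLU.

```python
import numpy as np

def softmax(matrix, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = matrix - matrix.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def covariance_non_local(theta, g):
    """theta, g: (hw, c_r) embeddings of the flattened feature map.

    Returns X' of shape (hw, c_r) and the (hw, hw) covariance matrix.
    """
    _, c_r = theta.shape
    # Centering-and-scaling matrix: (I - 1/c_r * ones) / c_r  (assumed form)
    i_bar = (np.eye(c_r) - np.ones((c_r, c_r)) / c_r) / c_r
    sigma = theta @ i_bar @ theta.T                       # covariance between positions
    attention = softmax(sigma / np.sqrt(c_r), axis=-1)    # each row sums to 1
    return attention @ g, sigma

rng = np.random.default_rng(0)
theta = rng.standard_normal((12, 8))   # hypothetical: hw = 12 positions, c/r = 8 channels
g = rng.standard_normal((12, 8))
x_prime, sigma = covariance_non_local(theta, g)
```

With this form, `sigma` coincides with `np.cov(theta, bias=True)`, i.e. each spatial position is treated as a variable and each channel as an observation.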
considering that the second-order statistics is introduced in the F stage, a large amount of redundant feature information is brought, and negative effects are brought to the pedestrian re-identification task. For this reason, an Expectation Maximization (EM) algorithm is introduced to reconstruct the characteristics output by the F part by using a small number of characteristic descriptors, and the reconstructed characteristics have low rank characteristics.
Stage B consists of three steps: the expectation (E) operation, the maximization (M) operation and the feature reconstruction operation. The EM algorithm finds the maximum likelihood solution of a model containing hidden variables. Here the hidden variable is the mapping matrix Z and the model parameters are K descriptors. The feature map input to stage B is $X' \in \mathbb{R}^{hw \times c/r}$ and the initial value of the descriptors is $u \in \mathbb{R}^{K \times c/r}$. The E step updates the mapping matrix $Z \in \mathbb{R}^{hw \times K}$ (the attention map):
$$Z = \mathrm{softmax}\left(\lambda X' u^{T}\right)$$
where λ is a hyper-parameter that controls the distribution of Z, with a default value of 1.
The M step updates the descriptors u (the parameters): each descriptor is computed as a weighted average of X', and the k-th descriptor is updated as:
$$u_k = \frac{\sum_{n=1}^{hw} z_{nk}\, x'_n}{\sum_{n=1}^{hw} z_{nk}}$$
The E and M steps are executed alternately for T iterations, until u and Z approximately converge. u and Z are then used to re-estimate X', giving X'':
$$X'' = Z u$$
The reconstructed $X'' \in \mathbb{R}^{hw \times c/r}$ has its number of channels restored by a 1 × 1 convolution and is added to the original feature map X to obtain X''':
$$X''' = X + X''$$
the attention module pseudo code is as follows:
TABLE 2 attention Module Algorithm framework
Figure BDA0002548202290000081
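The E, M and reconstruction steps of stage B can be sketched in NumPy as follows. This is a minimal illustration, not a reproduction of the algorithm table above: the function name `em_attention`, the random initial descriptors and the toy sizes are assumptions, and the learned 1 × 1 convolution and residual addition that produce X''' are omitted.

```python
import numpy as np

def softmax(matrix, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = matrix - matrix.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def em_attention(x, descriptors, num_iters=3, lam=1.0):
    """x: (hw, c_r) features X'; descriptors: (K, c_r) initial u.

    Returns the reconstruction X'' = Z u, the attention map Z and the final u.
    """
    u = descriptors
    for _ in range(num_iters):
        z = softmax(lam * (x @ u.T), axis=1)      # E step: (hw, K), rows sum to 1
        u = (z.T @ x) / z.sum(axis=0)[:, None]    # M step: weighted average of x per descriptor
    return z @ u, z, u

rng = np.random.default_rng(0)
x_prime = rng.standard_normal((48, 16))   # hypothetical: hw = 48, c/r = 16
u0 = rng.standard_normal((4, 16))         # hypothetical: K = 4 descriptors
x_rec, z, u = em_attention(x_prime, u0, num_iters=3)
```

Because X'' is the product of an hw × K matrix and a K × c/r matrix, its rank is at most K, which is the low-rank property the description attributes to the reconstructed features.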
S4: after the backbone network ResNet50 extracts features, the network is split into two branches: global Branch (global Branch) and Local Branch (Local Branch), wherein the global Branch extracts the complete features of pedestrians, and the Local Branch extracts the features after feature erasing operation.
S5: Training on the training-set feature vectors extracted by the two branches using the triplet loss function, the cross-entropy loss function and the center loss function;
S6: Inputting the Gallery pedestrian image set into the model trained in S5 to obtain a pedestrian feature database, wherein each feature in the database corresponds to a unique pedestrian ID;
S7: Inputting a Query image into the CNN model to obtain its feature, measuring the similarity between this feature and the pedestrian features in the feature database of S6, sorting the results by similarity from largest to smallest, and returning the number of pedestrian images specified by the user.
Fig. 3 shows the attention feature map produced after the EM algorithm in the attention module of the present invention has converged. As the figure shows, the EM iterations guide the attention of the model toward the pedestrian while ignoring, to some extent, the interference caused by background information. In addition, the feature descriptors u generated during the iterations are mutually orthogonal, which reduces feature redundancy. Experiments show that the highest accuracy is obtained when the number of feature descriptors K is 160 and the number of iterations T is 3.
The network performance of the invention was verified experimentally in four configurations: without the attention module, with stage F only, with stage B only, and with the complete attention module. The results are shown in Tables 3 and 4.
TABLE 3 Comparison of attention module splitting experiments on the DukeMTMC-reID and Market1501 data sets
Figure BDA0002548202290000091
TABLE 4 comparison of attention Module splitting experiments on CUHK03 dataset
Figure BDA0002548202290000092
The two tables show that stage F and stage B each improve network accuracy to some extent. On the DukeMTMC-reID data set, adding F alone improves the mean average precision (mAP) and the first-hit rate (rank1) by 1.1% and 1.0% respectively, which shows that the features extracted after introducing covariance are more expressive than the original features. Introducing B alone, i.e., processing the features with the EM algorithm, improves mAP and rank1 by 1.7% and 1.5% over the original network, which shows that features reconstructed by the EM algorithm are effective for model optimization. Fusing the two stages F and B improves mAP and rank1 further compared with using either stage alone, to 78.8% and 89.4% respectively. As on the DukeMTMC-reID data set, the attention module also improves the accuracy of the base network on Market1501 and CUHK03.
FIGS. 4, 5 and 6 compare the recognition rates of the proposed attention module on the three data sets DukeMTMC-reID, Market1501 and CUHK03. The figures show that stage F and stage B each improve the recognition rate to a different degree, and that the complete attention module, which fuses the two stages, gives the largest improvement.
The above embodiments are provided only to illustrate the present invention and not to limit it; those skilled in the art may make various changes or modifications without departing from the spirit and scope of the present invention.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (6)

1. A pedestrian re-identification method based on expectation maximization, characterized by comprising the following steps:
S1: applying different preprocessing operations to the input training and testing images;
S2: constructing a ResNet50 backbone network and dividing ResNet50 into four stages, Stage1 to Stage4, which extract feature information sequentially from shallow to deep;
S3: constructing an attention module, wherein the input and output dimensions of the attention module are identical and the attention module can be inserted into Stage2 and Stage3 of ResNet50, the attention module comprising two parts: a Non-Local operation that uses covariance as the correlation function, and a feature reconstruction operation performed with the EM algorithm;
S4: after the backbone network ResNet50 extracts features, splitting the network into two branches, a Global Branch and a Local Branch, wherein the Global Branch extracts the complete features of the pedestrian and the Local Branch extracts the features remaining after a feature erasing operation;
S5: training on the training-set feature vectors extracted by the two branches using the triplet loss function, the cross-entropy loss function, and the center loss function;
S6: inputting the Gallery pedestrian image set into the model trained in S5 to obtain a pedestrian feature database, wherein each feature in the database corresponds to a unique pedestrian ID;
S7: inputting a Query image into the CNN model to obtain its feature, measuring the similarity between this feature and the pedestrian features in the feature database of S6, sorting the results in descending order of similarity, and returning the number of pedestrian images specified by the user.
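The retrieval described in S6 and S7 can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim only speaks of a "similarity measurement", so the use of cosine similarity, the function names, and the toy three-dimensional features are all assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_gallery(query_feature, gallery, top_k):
    """Return the top_k (pedestrian_id, similarity) pairs, most similar first.

    `gallery` is a list of (pedestrian_id, feature_vector) pairs and stands in
    for the feature database built in S6 (hypothetical layout).
    """
    scored = [(pid, cosine_similarity(query_feature, feat)) for pid, feat in gallery]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy gallery with made-up IDs and features.
gallery = [
    ("id_a", [1.0, 0.0, 0.0]),
    ("id_b", [0.6, 0.8, 0.0]),
    ("id_c", [0.0, 0.0, 1.0]),
]
ranked = rank_gallery([1.0, 0.1, 0.0], gallery, top_k=2)
```

Here `top_k` plays the role of the user-specified number of returned images; in a real system the features would come from the trained network rather than being written by hand.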
2. The expectation-maximization-based pedestrian re-identification method according to claim 1, wherein: in the step S1, the preprocessing operation includes:
random horizontal flipping, i.e., flipping the input set of images with a given probability;
image rotation, i.e., rotating the input pedestrian image by a certain angle;
color enhancement, i.e., randomly altering the intensity of each channel of the input RGB image.
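Two of the preprocessing operations in this claim can be sketched on a tiny image stored as nested lists. This is only an illustration under assumptions: the function names, the multiplicative form of the color change, and the `max_delta` parameter are hypothetical, and rotation is omitted because it needs interpolation that an image library would normally provide.

```python
import random

def random_horizontal_flip(image, p, rng):
    """Flip an H x W x 3 image (nested lists) left-right with probability p."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return image

def random_channel_scale(image, max_delta, rng):
    """Scale each RGB channel by its own random factor in [1 - max_delta, 1 + max_delta]."""
    factors = [1.0 + rng.uniform(-max_delta, max_delta) for _ in range(3)]
    return [
        [[min(255.0, max(0.0, px[c] * factors[c])) for c in range(3)] for px in row]
        for row in image
    ]

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
image = [[[10, 20, 30], [40, 50, 60]]]      # a 1 x 2 RGB image
flipped = random_horizontal_flip(image, p=1.0, rng=rng)
jittered = random_channel_scale(image, max_delta=0.2, rng=rng)
```

With `p=1.0` the flip always happens, so `flipped` has its two pixels swapped; in training a value such as 0.5 would be more typical, though the claim does not fix the probability.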
3. The expectation-maximization-based pedestrian re-identification method according to claim 2, wherein: in step S2, dilated (atrous) convolution is applied to the features in Stage3 and Stage4 of the backbone network ResNet50, so as to obtain a larger feature map and sufficient feature information.
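Why dilation yields a larger feature map can be seen from the standard convolution output-size formula. The sketch below assumes the common practice of replacing a stride-2 convolution with a stride-1, dilation-2 convolution; the claim does not spell out this detail, and the 24-pixel input width is a made-up number.

```python
def conv_output_size(size, kernel, stride, padding, dilation):
    """Spatial output size of a convolution (standard floor formula)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1

# A 3x3 convolution applied to a 24-pixel-wide feature map (hypothetical size).
strided = conv_output_size(24, kernel=3, stride=2, padding=1, dilation=1)
dilated = conv_output_size(24, kernel=3, stride=1, padding=2, dilation=2)
```

The strided variant halves the width to 12, while the dilated variant keeps it at 24 and still covers a 5-pixel-wide receptive field per output position.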
4. The expectation-maximization-based pedestrian re-identification method according to claim 3, wherein: in step S3, the construction of the attention module is divided into two stages:
Stage 1: performing a Non-Local computation on the input features, wherein the correlation is obtained by computing the covariance between pixels; the core Non-Local operator is:
$$y_i = \frac{1}{C(x)} \sum_{j} f(x_i, x_j)\, g(x_j)$$
where $x$ is the input feature map, the function $f(\cdot,\cdot)$ computes the correlation between pixel $i$ and pixel $j$, the function $g(x_j)$ computes the mapping of the feature map at pixel $j$, $C(x)$ is the normalization coefficient, and $y_i$ is the weighted average, after the $g$ transformation, of all pixels other than pixel $i$, the weights being the normalized similarity function;
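A covariance-weighted Non-Local computation of the kind described in Stage 1 might look like the sketch below. Several choices here are assumptions rather than details taken from the claim: the covariance is computed over the channel entries of two pixel vectors, $g$ is the identity, $C(x)$ equals the number of pixels, and the sum runs over every position $j$ including $i$ (excluding $j = i$ would be a one-line change).

```python
def covariance(u, v):
    """Population covariance of two equal-length channel vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / len(u)

def non_local_covariance(x):
    """Compute y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j) for a list of pixel vectors.

    Assumptions for this sketch: f is the covariance of the two pixels'
    channel vectors, g is the identity, and C(x) = N (the number of pixels).
    """
    n = len(x)
    channels = len(x[0])
    y = []
    for i in range(n):
        weights = [covariance(x[i], x[j]) for j in range(n)]
        y.append([sum(weights[j] * x[j][c] for j in range(n)) / n
                  for c in range(channels)])
    return y

x = [[1.0, 3.0], [2.0, 6.0], [5.0, 5.0]]   # N = 3 pixels, C = 2 channels
y = non_local_covariance(x)
```

The output has the same shape as the input, which is what allows a module of this kind to be inserted between existing network stages.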
Stage 2: the second-order covariance statistics capture rich correlation information between regions but also introduce some highly redundant features, so the EM (expectation-maximization) algorithm is used to sparsely reconstruct the redundant features. The EM algorithm assumes that $X = \{x_1, x_2, \dots, x_N\}$ is the obtained set of feature information, consisting of $N$ observed samples, and that each data point $x_i$ has corresponding latent information $z_i$, namely the most representative feature information. $\{X, Z\}$ is the complete data, with likelihood function $\ln p(X, Z \mid \theta)$, where $\theta$ is the set of all parameters in the model. In practice, knowledge of the latent information in $Z$ comes from the posterior distribution $p(Z \mid X, \theta)$. The EM algorithm maximizes the likelihood $\ln p(X, Z \mid \theta)$ through two operations, the expectation step E and the maximization step M:
$$\text{E:}\quad Q(\theta, \theta^{(i)}) = E_Z\left[\ln p(X, Z \mid \theta) \mid X, \theta^{(i)}\right] = \sum_Z \ln p(X, Z \mid \theta)\, p(Z \mid X, \theta^{(i)})$$
where $p(Z \mid X, \theta^{(i)})$ is the probability distribution of the latent variable data $Z$, i.e. the attention information, given the feature information data $X$ and the $i$-th parameter estimate $\theta^{(i)}$. The M step updates the parameters by maximizing the expectation obtained in the E step, giving the parameter estimate $\theta^{(i+1)}$ of the $(i+1)$-th iteration:
$$\text{M:}\quad \theta^{(i+1)} = \arg\max_{\theta} Q(\theta, \theta^{(i)})$$
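The alternation of E and M steps can be illustrated on a textbook case, a one-dimensional mixture of two Gaussians. This is a generic sketch of the EM procedure, not the feature reconstruction used in the claimed attention module; the mixture model, the function names, and the toy data are assumptions chosen so that both steps have a simple closed form.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations):
    """Alternate E and M steps for a 1-D Gaussian mixture; returns the parameters."""
    k = len(means)
    for _ in range(iterations):
        # E step: posterior p(z = j | x, theta^(i)) for every sample.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([q / total for q in joint])
        # M step: closed-form parameters that maximise Q(theta, theta^(i)).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)    # floor keeps the density well defined
            weights[j] = nj / len(data)
    return means, variances, weights

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]       # two well-separated toy clusters
means, variances, weights = em_gmm_1d(
    data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5], iterations=20
)
```

The responsibilities computed in the E step correspond to $p(Z \mid X, \theta^{(i)})$, and the parameter updates in the M step correspond to $\theta^{(i+1)}$.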
5. The expectation-maximization-based pedestrian re-identification method according to claim 4, wherein: in step S4, after the Global Branch extracts features, each feature map is pooled into a 2048 × 1 feature vector by a global average pooling (GAP) layer and then reduced to a 512 × 1 vector by feature dimensionality reduction; the Local Branch uses Batch DropBlock to erase the same region, in a certain proportion, from each batch of input features, then uses a global max pooling (GMP) layer in place of the global average pooling layer to generate a 2048-dimensional max feature vector, which becomes 512-dimensional after dimensionality reduction.
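The difference between the two branches can be sketched on a toy batch. This illustration makes several assumptions: the erased rectangle is passed in as fixed coordinates (a real Batch DropBlock would pick it at random once per batch), the feature maps are 2 x 2 with a single channel instead of 2048 channels, and the reduction to 512 dimensions is left out.

```python
def global_average_pool(feature_map):
    """Mean over all spatial positions of one H x W channel."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

def global_max_pool(feature_map):
    """Maximum over all spatial positions of one H x W channel."""
    return max(v for row in feature_map for v in row)

def batch_drop_block(batch, top, left, height, width):
    """Zero the same rectangle in every channel of every sample in the batch."""
    erased = []
    for sample in batch:                     # sample: list of H x W channels
        new_sample = []
        for channel in sample:
            new_channel = [row[:] for row in channel]   # copy, leave input intact
            for r in range(top, top + height):
                for c in range(left, left + width):
                    new_channel[r][c] = 0.0
            new_sample.append(new_channel)
        erased.append(new_sample)
    return erased

# 2 samples, 1 channel each, 2 x 2 spatial size (toy values).
batch = [[[[1.0, 2.0], [3.0, 8.0]]], [[[4.0, 5.0], [6.0, 7.0]]]]
global_feats = [[global_average_pool(ch) for ch in sample] for sample in batch]
dropped = batch_drop_block(batch, top=1, left=0, height=1, width=2)
local_feats = [[global_max_pool(ch) for ch in sample] for sample in dropped]
```

Because the bottom row is erased in every sample, the max-pooled local features are forced to come from the remaining region, which is the intended effect of the erasing operation.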
6. The expectation-maximization-based pedestrian re-identification method according to claim 5, wherein: in step S5, a plurality of loss functions are trained jointly; the triplet loss function minimizes the distance between any target sample and its positive sample and maximizes the distance between that target sample and its negative sample, and is given by:
Figure FDA0002548202280000022
where
Figure FDA0002548202280000023
represents the distance between the target sample and the positive sample,
Figure FDA0002548202280000024
represents the distance between the target sample and the negative sample, and m is the threshold of the triplet loss;
the cross entropy describes the distance between two probability distributions, and the smaller the cross entropy, the closer the two distributions are; it is given by:
Figure FDA0002548202280000031
where $k \in \{1, 2, \dots, K\}$ denotes the pedestrian class output by the pedestrian re-identification network, $p(k)$ is the predicted probability that the input image belongs to class $k$, and $q(k)$ is the actual probability;
the center loss function reduces the distance between samples of the same class, thereby increasing their similarity, and is given by:
Figure FDA0002548202280000032
where $c$ is the sample class center; the final loss function is a weighted sum of the above three loss functions, namely:
$$L_{total} = L_{triplet} + \lambda_i L_{id} + \lambda_c L_{center}$$
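The three losses and their weighted sum can be sketched with commonly used textbook forms. The exact formulas of the claim are given as images that are not reproduced in this text, so the hinge form of the triplet loss, the factor of one half in the center loss, the Euclidean distance, and the weight values below are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

def cross_entropy(predicted, actual):
    """-sum_k q(k) * log p(k) for predicted p and actual q over K classes."""
    return -sum(q * math.log(p) for p, q in zip(predicted, actual) if q > 0.0)

def center_loss(features, labels, centers):
    """Half the summed squared distance of each feature to its class center."""
    return 0.5 * sum(euclidean(f, centers[y]) ** 2 for f, y in zip(features, labels))

def total_loss(l_triplet, l_id, l_center, w_id, w_center):
    """Weighted sum of the three losses; the weights are free hyperparameters."""
    return l_triplet + w_id * l_id + w_center * l_center

# Toy inputs with made-up values.
l_tri = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 5.0], margin=0.3)
l_id = cross_entropy([0.5, 0.25, 0.25], [1.0, 0.0, 0.0])
l_cen = center_loss([[1.0, 1.0], [2.0, 0.0]], [0, 1],
                    {0: [1.0, 0.0], 1: [2.0, 0.0]})
loss = total_loss(l_tri, l_id, l_cen, w_id=1.0, w_center=0.0005)
```

In this toy case the positive and negative samples are equally far from the anchor, so the triplet term reduces to the margin; a small center-loss weight is a common choice because that term can otherwise dominate, but the claim does not state the values used.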

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010567949.5A CN111738143B (en) 2020-06-19 2020-06-19 Pedestrian re-identification method based on expectation maximization

Publications (2)

Publication Number Publication Date
CN111738143A true CN111738143A (en) 2020-10-02
CN111738143B CN111738143B (en) 2022-04-19

Family

ID=72651842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010567949.5A Active CN111738143B (en) 2020-06-19 2020-06-19 Pedestrian re-identification method based on expectation maximization

Country Status (1)

Country Link
CN (1) CN111738143B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541453A (en) * 2020-12-18 2021-03-23 广州丰石科技有限公司 Luggage weight recognition model training and luggage weight recognition method
CN112580525A (en) * 2020-12-22 2021-03-30 南京信息工程大学 Case activity track monitoring method based on pedestrian re-identification
CN112597825A (en) * 2020-12-07 2021-04-02 深延科技(北京)有限公司 Driving scene segmentation method and device, electronic equipment and storage medium
CN112684427A (en) * 2020-12-15 2021-04-20 南京理工大学 Radar target identification method based on serial quadratic reinforcement training
CN112733590A (en) * 2020-11-06 2021-04-30 哈尔滨理工大学 Pedestrian re-identification method based on second-order mixed attention
CN112766353A (en) * 2021-01-13 2021-05-07 南京信息工程大学 Double-branch vehicle re-identification method for enhancing local attention
CN112784795A (en) * 2021-01-30 2021-05-11 深圳市心和未来教育科技有限公司 Quick face recognition and analysis equipment and system
CN112949406A (en) * 2021-02-02 2021-06-11 西北农林科技大学 Sheep individual identity recognition method based on deep learning algorithm
CN113065516A (en) * 2021-04-22 2021-07-02 中国矿业大学 Unsupervised pedestrian re-identification system and method based on sample separation
US11468697B2 (en) * 2019-12-31 2022-10-11 Wuhan University Pedestrian re-identification method based on spatio-temporal joint model of residual attention mechanism and device thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005760A (en) * 2015-06-11 2015-10-28 华中科技大学 Pedestrian re-identification method based on finite mixture model
CN108960140A (en) * 2018-07-04 2018-12-07 国家新闻出版广电总局广播科学研究院 The pedestrian's recognition methods again extracted and merged based on multi-region feature
CN110796026A (en) * 2019-10-10 2020-02-14 湖北工业大学 Pedestrian re-identification method based on global feature stitching
CN110852276A (en) * 2019-11-12 2020-02-28 智慧视通(杭州)科技发展有限公司 Pedestrian re-identification method based on multitask deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
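Zou Jiawei (邹嘉伟): "Automatic Image Recognition and Annotation Based on Convolutional Neural Networks" (基于卷积神经网络的自动图像识别与标注), China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology series *
Zheng Zhouheng (郑舟恒) et al.: "Pedestrian Re-identification Algorithm Based on Metric Matrix Regularization" (基于测度矩阵正则化的行人重识别算法), Computer Engineering and Design (《计算机工程与设计》) *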

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11468697B2 (en) * 2019-12-31 2022-10-11 Wuhan University Pedestrian re-identification method based on spatio-temporal joint model of residual attention mechanism and device thereof
CN112733590A (en) * 2020-11-06 2021-04-30 哈尔滨理工大学 Pedestrian re-identification method based on second-order mixed attention
CN112597825A (en) * 2020-12-07 2021-04-02 深延科技(北京)有限公司 Driving scene segmentation method and device, electronic equipment and storage medium
CN112684427A (en) * 2020-12-15 2021-04-20 南京理工大学 Radar target identification method based on serial quadratic reinforcement training
CN112684427B (en) * 2020-12-15 2024-05-17 南京理工大学 Radar target recognition method based on serial secondary reinforcement training
CN112541453A (en) * 2020-12-18 2021-03-23 广州丰石科技有限公司 Luggage weight recognition model training and luggage weight recognition method
CN112580525A (en) * 2020-12-22 2021-03-30 南京信息工程大学 Case activity track monitoring method based on pedestrian re-identification
CN112580525B (en) * 2020-12-22 2023-05-23 南京信息工程大学 Case activity track monitoring method based on pedestrian re-identification
CN112766353A (en) * 2021-01-13 2021-05-07 南京信息工程大学 Double-branch vehicle re-identification method for enhancing local attention
CN112766353B (en) * 2021-01-13 2023-07-21 南京信息工程大学 Double-branch vehicle re-identification method for strengthening local attention
CN112784795A (en) * 2021-01-30 2021-05-11 深圳市心和未来教育科技有限公司 Quick face recognition and analysis equipment and system
CN112784795B (en) * 2021-01-30 2022-02-01 深圳市心和未来教育科技有限公司 Quick face recognition and analysis equipment and system
CN112949406A (en) * 2021-02-02 2021-06-11 西北农林科技大学 Sheep individual identity recognition method based on deep learning algorithm
CN113065516A (en) * 2021-04-22 2021-07-02 中国矿业大学 Unsupervised pedestrian re-identification system and method based on sample separation
CN113065516B (en) * 2021-04-22 2023-12-01 中国矿业大学 Sample separation-based unsupervised pedestrian re-identification system and method

Also Published As

Publication number Publication date
CN111738143B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN111738143B (en) Pedestrian re-identification method based on expectation maximization
CN109961051B (en) Pedestrian re-identification method based on clustering and block feature extraction
Wu et al. A comprehensive study on cross-view gait based human identification with deep cnns
Shotton et al. Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context
Kusakunniran et al. Recognizing gaits across views through correlated motion co-clustering
Gnouma et al. Stacked sparse autoencoder and history of binary motion image for human activity recognition
Oliva et al. Scene-centered description from spatial envelope properties
US7957584B2 (en) Fast object detection for augmented reality systems
Jia et al. Visual tracking via coarse and fine structural local sparse appearance models
CN106897669B (en) Pedestrian re-identification method based on consistent iteration multi-view migration learning
Carmona et al. Human action recognition by means of subtensor projections and dense trajectories
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
CN113408492A (en) Pedestrian re-identification method based on global-local feature dynamic alignment
CN112464730B (en) Pedestrian re-identification method based on domain-independent foreground feature learning
CN113688894B (en) Fine granularity image classification method integrating multiple granularity features
Dong Optimal Visual Representation Engineering and Learning for Computer Vision
JPH09134432A (en) Pattern recognition method
Yang et al. Diffusion model as representation learner
CN111695455B (en) Low-resolution face recognition method based on coupling discrimination manifold alignment
Gao et al. Evaluation of regularized multi-task leaning algorithms for single/multi-view human action recognition
Gao et al. Adaptive random down-sampling data augmentation and area attention pooling for low resolution face recognition
CN111414958B (en) Multi-feature image classification method and system for visual word bag pyramid
CN116030495A (en) Low-resolution pedestrian re-identification algorithm based on multiplying power learning
CN112818779B (en) Human behavior recognition method based on feature optimization and multiple feature fusion
Jampour et al. A joint mapping and synthesis approach for multiview facial expression recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant