CN111783526A - Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment - Google Patents

Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment

Info

Publication number
CN111783526A
CN111783526A (application CN202010434344.9A)
Authority
CN
China
Prior art keywords
domain
pedestrian
representing
identity
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010434344.9A
Other languages
Chinese (zh)
Other versions
CN111783526B (en)
Inventor
李华锋
庞健
严双林
欧洋汛
张亚飞
余正涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202010434344.9A priority Critical patent/CN111783526B/en
Publication of CN111783526A publication Critical patent/CN111783526A/en
Application granted granted Critical
Publication of CN111783526B publication Critical patent/CN111783526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries

Abstract

The invention provides a cross-domain pedestrian re-identification method that exploits posture invariance and graph structure alignment, belonging to the field of computer vision. A dictionary learning algorithm based on matrix decomposition is proposed to eliminate the influence of cross-dataset domain information and pedestrian posture information on cross-domain pedestrian re-identification. Specifically, the method consists of two parts: (1) based on the idea of matrix decomposition, the original visual features are decomposed into a posture-invariant component, a domain information component and an interference component, with the aim of extracting visual components that are unaffected by domain information and pedestrian posture; (2) to further improve the generalization ability of the model, a hypergraph structure alignment constraint is introduced to establish the relation between the posture-invariant features and the semantic attributes, so that the pedestrian attributes of the target data set can be accurately predicted at a later stage. Finally, pedestrian similarity is measured by combining the posture-invariant features with the semantic attributes, which further improves recognition performance.

Description

Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment
Technical Field
The invention relates to a cross-domain pedestrian re-identification method using posture invariance and graph structure alignment, and belongs to the field of computer vision.
Background
With the rapid development of artificial intelligence, there is a pressing need to apply pedestrian re-identification technology based on high-dimensional features in real life. Researchers at home and abroad have therefore made considerable progress on pedestrian re-identification, and many methods have been developed. Some methods design hand-crafted discriminative features for the target data set that are robust to changes in illumination, viewing angle and so on, or cluster the unlabeled target data. However, the performance of such methods is poor, mainly because the target data carry no labels and it is very difficult for the model to mine discriminative information. More advanced approaches treat pedestrian re-identification as an unsupervised domain adaptation problem and focus on transferring knowledge from the source domain to the target domain. Unlike traditional unsupervised domain adaptation, the pedestrian identities in the source and target domains are completely disjoint, which makes the problem more challenging. Such methods still perform poorly compared to supervised methods.
Disclosure of Invention
The invention aims to provide a cross-domain pedestrian re-identification method that exploits posture invariance and graph structure alignment, in order to solve the problem that existing pedestrian re-identification algorithms are difficult to deploy. An effective hypergraph structure alignment constraint is introduced to establish a conversion relation between the posture-invariant features and the semantic attributes, and the advantages of both are fully combined for joint measurement; the specific flow is shown in FIG. 1. Compared with existing methods, the method can perform the cross-domain re-identification task, that is, the trained model can be deployed to a completely new camera network for pedestrian recognition.
A cross-domain pedestrian re-identification method using posture invariance and graph structure alignment comprises the following steps:
1) defining the data set variables and the features and attributes of pedestrians;
2) designing a feature decomposition module and determining an objective function containing a posture-invariant component dictionary, a domain information component dictionary, an interference component dictionary and conversion matrices;
3) designing a hypergraph structure alignment module using semantic attribute information;
4) designing a domain adaptation module capable of reducing domain shift;
5) merging the proposed loss functions into a final optimization function;
6) obtaining the dictionaries and conversion matrices with an alternating optimization algorithm, and thereby further obtaining the coding coefficients of the target domain data;
7) predicting the identity and attributes of each pedestrian from the target domain coding coefficients;
8) calculating the similarity between pedestrians using cosine similarity and the predicted identities and attributes.
The method comprises the following specific steps:
Step 1: define the source data set, which contains K pedestrians, as {X_s, A_s, Y_s} = {(x_s^i, a_s^i, y_s^i)}_{i=1}^{N_s}, where x_s^i ∈ R^d denotes the i-th pedestrian feature of the source domain s and d is the feature dimension, a_s^i ∈ R^c denotes the i-th pedestrian's attributes and c is the attribute dimension, y_s^i denotes the i-th pedestrian label, and N_s is the number of samples; X_s, A_s and Y_s respectively denote the source domain feature set, attribute set and label set. Define the target data set X_t = {x_t^i}_{i=1}^{N_t}, containing N_t samples in total, where x_t^i ∈ R^d denotes the i-th pedestrian feature of the target domain t and d is the feature dimension. At the feature level, GOG pedestrian features are used, and the attributes already annotated for the existing data sets are used as the pedestrian attributes.
Step 2: design the following feature decomposition (FD) loss term L_FD, whose purpose is to decompose the source domain feature set into a posture-invariant component, a domain component and an interference component:

(equation shown as an image in the original document)

where V_s denotes the total number of source domain views and X_{s,v,i} denotes the features of the i-th identity under the v-th view in the training set s. D_p, D_d and D_r respectively denote the posture-invariant component dictionary, the domain information component dictionary and the interference component dictionary, and the coding coefficients of X_{s,v,i} over the three component dictionaries are denoted correspondingly. ‖·‖_* denotes the nuclear norm of a matrix, ‖·‖_{2,1} denotes the structured sparse norm, and η, λ_1 and λ_2 are regularization parameters. Φ(D_r, C_p, C_r) denotes a regularization term that promotes domain separation, defined as follows:

(equation shown as an image in the original document)

where C_p and C_r denote the coding coefficients of the data set as a whole, λ_3 and λ_4 are regularization parameters, and I and Q denote identity matrices.
Step 3: to enhance the robustness and domain invariance of the semantic attributes, semantic attributes are introduced to assist cross-domain pedestrian re-identification. The hypergraph structure alignment (HSA) loss term L_HSA is expressed as follows:

(equation shown as an image in the original document)

First, a hypergraph G(X, E) is constructed from the image samples of the source domain and the pedestrian identities; it comprises a set of vertices and a set of hyperedges, where |N_j| and |N_r| respectively denote the numbers of vertices and hyperedges. For any given hypergraph, its hyperedges can easily be converted into an incidence matrix. α_1, α_2 and β_1 denote hyperparameters; two hypergraph Laplacian regularization terms of the form tr(C_p L C_p^T) are included; P and E denote linear transformation coefficient matrices; L = I - W denotes the hypergraph Laplacian matrix; and W denotes the weight matrix of the hypergraph, which measures the degree of correlation between two vertices:

(equation shown as an image in the original document)

D_x and D_e denote the diagonal matrices of the vertex degrees and the hyperedge degrees, respectively, and W_e denotes the diagonal matrix of hyperedge weights.
Step 4: to address domain shift, a domain adaptation (DA) term is introduced so that part of the unlabeled target domain data participates in training the feature decomposition model. The domain adaptation loss L_DA is expressed as follows:

(equation shown as an image in the original document)

where V_t denotes the total number of target domain views, N_t denotes the number of target domain samples, and X_{t,v,i} denotes the pedestrian image features of the i-th identity under the v-th view in the target data set t; the coding coefficients of X_{t,v,i} over the three component dictionaries D_p, D_d and D_r are denoted correspondingly, and λ_2 is a regularization parameter. Finally, the entire objective function is expressed as:

L = L_FD + L_HSA + L_DA    (6)
and 5: the proposed functions are then consolidated and merged, and the overall loss function L in step 4 can be expanded into the following form:
Figure BDA0002501673230000041
step 6: in the step 5, 9 variables need to be solved, each variable is solved by using an alternating iterative optimization algorithm, and other variables need to be fixed in the process of solving one variable. Obtaining an attitude invariant component dictionary D by solvingpDomain information component dictionary DdDictionary of interference components DrAnd transformation matrices P and E. With these dictionaries, the corresponding coding coefficients can be calculated by the following formula
Figure BDA0002501673230000042
Figure BDA0002501673230000043
ζ represents a regularization parameter.
Step 7: once the coding coefficients of a target sample have been computed, h_{t,i} and a_{t,i} can be obtained from equations (9) and (10) using the transformation matrices P and E obtained in Step 6:

(equations (9) and (10), shown as images in the original document)

In these formulas, h_{t,i} and E can be regarded as constants; the optimal a_{t,i} is found by minimizing the squared F-norm of the right-hand term. Each test sample thus receives a predicted identity representation h_{t,i} and semantic attributes a_{t,i}. α_2 denotes a regularization parameter.
Step 8: finally, the similarity scores sim_h and sim_a of a pedestrian image pair in the identity space and in the semantic space can be computed with the cosine distance formula of equation (11):

(equation (11), shown as an image in the original document)

where z_a and z_b respectively denote the identity representation vector and the semantic attribute vector of the current pedestrian; they are of the same kind as the h_{t,i} and a_{t,i} obtained in Step 7, except that z_a and z_b refer broadly to the current pedestrian, whereas h_{t,i} and a_{t,i} denote the identity representation and semantic attributes of the i-th pedestrian. The smoothing constant in equation (11) is 0.0000001. The similarity scores obtained in the identity space and in the semantic attribute space are weighted and summed, and this weighted score serves as the final measure of pedestrian similarity:

sim_final = τ·sim_a + (1 - τ)·sim_h    (12)

where τ > 0 denotes the weight assigned to each space; in the present invention τ is set to 0.2. In this way, the solved variables can finally be used to measure the similarity of pedestrians in the target data set.
The invention has the following beneficial effects:
(1) The proposed decomposition model eliminates the influence of cross-dataset domain information and pedestrian posture information on cross-domain pedestrian re-identification and reduces the differences between domains, which helps the model extract more robust pedestrian features in real scenes.
(2) An effective hypergraph structure alignment constraint establishes a conversion relation between the posture-invariant features and the semantic attributes, and the similarity measurement that combines the two makes the model more discriminative for different pedestrians: for example, when the appearances of two pedestrians are very similar, the attribute information can prevent them from being identified as the same pedestrian, thus avoiding misjudgment.
Drawings
FIG. 1 is a flow chart of a cross-domain pedestrian re-identification method using gesture invariance and graph structure alignment according to the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
Example 1: as shown in fig. 1, a cross-domain pedestrian re-identification method using posture invariance and graph structure alignment includes the following steps:
1) defining the data set variables and the features and attributes of pedestrians;
2) designing a feature decomposition module and determining an objective function containing a posture-invariant component dictionary, a domain information component dictionary, an interference component dictionary and conversion matrices;
3) designing a hypergraph structure alignment module using semantic attribute information;
4) designing a domain adaptation module capable of reducing domain shift;
5) merging the proposed loss functions into a final optimization function;
6) obtaining the dictionaries and conversion matrices with an alternating optimization algorithm, and thereby further obtaining the coding coefficients of the target domain data;
7) predicting the identity and attributes of each pedestrian from the target domain coding coefficients;
8) calculating the similarity between pedestrians using cosine similarity and the predicted identities and attributes.
The method comprises the following specific steps:
Step 1: define the source data set, which contains K pedestrians, as {X_s, A_s, Y_s} = {(x_s^i, a_s^i, y_s^i)}_{i=1}^{N_s}, where x_s^i ∈ R^d denotes the i-th pedestrian feature of the source domain s and d is the feature dimension, a_s^i ∈ R^c denotes the i-th pedestrian's attributes and c is the attribute dimension, y_s^i denotes the i-th pedestrian label, and N_s is the number of samples; X_s, A_s and Y_s respectively denote the source domain feature set, attribute set and label set. Define the target data set X_t = {x_t^i}_{i=1}^{N_t}, containing N_t samples in total, where x_t^i ∈ R^d denotes the i-th pedestrian feature of the target domain t and d is the feature dimension. At the feature level, GOG pedestrian features are used, and the attributes already annotated for the existing data sets are used as the pedestrian attributes.
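To make the variables of Step 1 concrete, the following minimal Python/NumPy sketch lays out the shapes involved; the sample counts and the feature and attribute dimensions used here are hypothetical placeholders, and the GOG feature extraction itself is assumed to be performed by an external tool.

```python
import numpy as np

# Hypothetical sizes: d is the GOG feature dimension, c the number of
# annotated semantic attributes, K the number of source identities.
d, c, K = 7567, 85, 200
N_s, N_t = 400, 316

# Source domain: features X_s (d x N_s), attributes A_s (c x N_s),
# identity labels Y_s (one label in {1, ..., K} per sample).
X_s = np.random.randn(d, N_s)               # stand-in for GOG features
A_s = (np.random.rand(c, N_s) > 0.5) * 1.0  # binary attribute annotations
Y_s = np.random.randint(1, K + 1, size=N_s)

# Target domain: only unlabeled features X_t (d x N_t) are available.
X_t = np.random.randn(d, N_t)
```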
Step 2: design the following feature decomposition (FD) loss term L_FD, which decomposes the source domain features X_s into a posture-invariant component, a domain component and an interference component:

(equation shown as an image in the original document)

where V_s denotes the total number of source domain views and X_{s,v,i} denotes the features of the i-th identity under the v-th view in the training set s. D_p, D_d and D_r respectively denote the posture-invariant component dictionary, the domain information component dictionary and the interference component dictionary, and the coding coefficients of X_{s,v,i} over the three component dictionaries are denoted correspondingly. ‖·‖_* denotes the nuclear norm of a matrix, ‖·‖_{2,1} denotes the structured sparse norm, and η, λ_1 and λ_2 are regularization parameters. Φ(D_r, C_p, C_r) denotes a regularization term that promotes domain separation, defined as follows:

(equation shown as an image in the original document)

where C_p and C_r denote the coding coefficients of the data set as a whole, λ_3 and λ_4 are regularization parameters, and I and Q denote identity matrices.
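The full objective of the FD term is only available as an image above, so the sketch below illustrates just the core idea of Step 2 under stated assumptions: the features are approximated as the sum of a posture-invariant part, a domain part and an interference part, X ≈ D_p C_p + D_d C_d + D_r C_r, and the residual of that reconstruction is the data-fidelity part of L_FD. The dictionary atom sizes follow the settings reported later; the random matrices are placeholders, not the patented update rules.

```python
import numpy as np

def fd_data_term(X, D_p, D_d, D_r, C_p, C_d, C_r):
    """Squared Frobenius reconstruction error of the three-way split
    X ~ D_p C_p + D_d C_d + D_r C_r (data-fidelity part of L_FD)."""
    R = X - D_p @ C_p - D_d @ C_d - D_r @ C_r
    return np.linalg.norm(R, 'fro') ** 2

# Shape check with the assumed atom sizes d_p=600, d_d=180, d_r=180.
rng = np.random.default_rng(0)
d, n = 7567, 64
D_p, D_d, D_r = (rng.standard_normal((d, k)) for k in (600, 180, 180))
C_p, C_d, C_r = (rng.standard_normal((k, n)) for k in (600, 180, 180))
X = D_p @ C_p + D_d @ C_d + D_r @ C_r   # a perfectly decomposable toy X
print(fd_data_term(X, D_p, D_d, D_r, C_p, C_d, C_r))  # ~0 up to round-off
```

The nuclear-norm, structured-sparse and Φ(D_r, C_p, C_r) terms of the full loss would be added on top of this data term.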
Step 3: to enhance the robustness and domain invariance of the semantic attributes, semantic attributes are introduced to assist cross-domain pedestrian re-identification. The hypergraph structure alignment (HSA) loss term L_HSA is expressed as follows:

(equation shown as an image in the original document)

First, a hypergraph G(X, E) is constructed from the image samples of the source domain and the pedestrian identities; it comprises a set of vertices and a set of hyperedges, where |N_j| and |N_r| respectively denote the numbers of vertices and hyperedges. For any given hypergraph, its hyperedges can easily be converted into an incidence matrix. α_1, α_2 and β_1 denote hyperparameters; two hypergraph Laplacian regularization terms of the form tr(C_p L C_p^T) are included; P and E denote linear transformation coefficient matrices; L = I - W denotes the hypergraph Laplacian matrix; and W denotes the weight matrix of the hypergraph, which measures the degree of correlation between two vertices:

(equation shown as an image in the original document)

D_x and D_e denote the diagonal matrices of the vertex degrees and the hyperedge degrees, respectively, and W_e denotes the diagonal matrix of hyperedge weights.
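The weight matrix W is given only as an image above. The sketch below therefore builds an identity-based hypergraph (one hyperedge per pedestrian identity, matching the construction from source samples and identities described in Step 3) and uses the common normalized form W = D_x^{-1/2} H W_e D_e^{-1} H^T D_x^{-1/2} from hypergraph learning (Zhou et al.), which is consistent with the quantities D_x, D_e and W_e defined here but is an assumption rather than the exact patented formula; the incidence-matrix name H is likewise introduced only for illustration.

```python
import numpy as np

def hypergraph_laplacian(labels, edge_weights=None):
    """Build L = I - W for a hypergraph whose hyperedges group the samples
    that share the same pedestrian identity (assumed construction)."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    n, m = labels.size, ids.size
    H = (labels[:, None] == ids[None, :]).astype(float)   # incidence matrix
    w_e = np.ones(m) if edge_weights is None else np.asarray(edge_weights)

    d_x = H @ w_e                  # vertex degrees
    d_e = H.sum(axis=0)            # hyperedge degrees
    Dx_inv_sqrt = np.diag(1.0 / np.sqrt(d_x))
    De_inv = np.diag(1.0 / d_e)
    W = Dx_inv_sqrt @ H @ np.diag(w_e) @ De_inv @ H.T @ Dx_inv_sqrt
    return np.eye(n) - W           # hypergraph Laplacian L = I - W

# Example: six source samples belonging to three identities.
L = hypergraph_laplacian([1, 1, 2, 2, 3, 3])
print(L.shape)   # (6, 6)
```

A Laplacian regularizer such as tr(C_p L C_p^T) can then be evaluated directly as np.trace(C_p @ L @ C_p.T).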
Step 4: to address domain shift, a domain adaptation (DA) term is introduced so that part of the unlabeled target domain data participates in training the feature decomposition model. The domain adaptation loss L_DA is expressed as follows:

(equation shown as an image in the original document)

where V_t denotes the total number of target domain views, N_t denotes the number of target domain samples, and X_{t,v,i} denotes the pedestrian image features of the i-th identity under the v-th view in the target data set t; the coding coefficients of X_{t,v,i} over the three component dictionaries D_p, D_d and D_r are denoted correspondingly, and λ_2 is a regularization parameter. Finally, the entire objective function is expressed as:

L = L_FD + L_HSA + L_DA    (6)
and 5: the proposed functions are then consolidated and merged, and the overall loss function L in step 4 can be expanded into the following form:
Figure BDA0002501673230000081
step 6: in the step 5, 9 variables need to be solved, each variable is solved by using an alternating iterative optimization algorithm, and other variables need to be fixed in the process of solving one variable. Obtaining an attitude invariant component dictionary D by solvingpDomain information component dictionary DdDictionary of interference components DrAnd transformation matrices P and E. With these dictionaries, the corresponding coding coefficients can be calculated by the following formula
Figure BDA0002501673230000082
Figure BDA0002501673230000083
ζ represents a regularization parameter.
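The coding-coefficient formulas of Step 6 are reproduced only as images, so the following is a hedged sketch of one plausible closed-form update: with the dictionaries fixed, an L2-regularized (ridge) coding step over the concatenated dictionary [D_p, D_d, D_r] has the standard solution shown below; the actual update prescribed by the patent may differ.

```python
import numpy as np

def code_over_dictionaries(X, D_p, D_d, D_r, zeta=0.1):
    """Ridge-style coding of features X over the learned dictionaries;
    an illustrative stand-in for the coding-coefficient formulas."""
    D = np.hstack([D_p, D_d, D_r])
    C = np.linalg.solve(D.T @ D + zeta * np.eye(D.shape[1]), D.T @ X)
    # Split the stacked coefficients back into the three blocks.
    return np.split(C, [D_p.shape[1], D_p.shape[1] + D_d.shape[1]], axis=0)

# Usage on target features X_t once D_p, D_d, D_r have been learned:
# C_p_t, C_d_t, C_r_t = code_over_dictionaries(X_t, D_p, D_d, D_r, zeta=0.1)
```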
Step 7: once the coding coefficients of a target sample have been computed, h_{t,i} and a_{t,i} can be obtained from equations (9) and (10) using the transformation matrices P and E obtained in Step 6:

(equations (9) and (10), shown as images in the original document)

In these formulas, h_{t,i} and E can be regarded as constants; the optimal a_{t,i} is found by minimizing the squared F-norm of the right-hand term. Each test sample thus receives a predicted identity representation h_{t,i} and semantic attributes a_{t,i}. α_2 denotes a regularization parameter.
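Equations (9) and (10) are also only available as images. Assuming the identity representation is obtained by applying the transformation P to the posture-invariant code, and that the attributes are then recovered by an α_2-regularized least-squares fit against E (which matches the description of treating h_{t,i} and E as constants and minimizing a squared F-norm), a hedged sketch of Step 7 looks as follows; the exact mappings in the patent may differ.

```python
import numpy as np

def predict_identity_and_attributes(c_p, P, E, alpha2=0.1):
    """Hedged sketch of Step 7: identity representation h = P c_p
    (assumed form of eq. (9)), then attributes a from the regularized
    least-squares problem min_a ||h - E a||^2 + alpha2 * ||a||^2
    (assumed form of eq. (10)), solved in closed form below."""
    h = P @ c_p
    a = np.linalg.solve(E.T @ E + alpha2 * np.eye(E.shape[1]), E.T @ h)
    return h, a
```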
Step 8: finally, the similarity scores sim_h and sim_a of a pedestrian image pair in the identity space and in the semantic space can be computed with the cosine distance formula of equation (11):

(equation (11), shown as an image in the original document)

where z_a and z_b respectively denote the identity representation vector and the semantic attribute vector of the current pedestrian; they are of the same kind as the h_{t,i} and a_{t,i} obtained in Step 7, except that z_a and z_b refer broadly to the current pedestrian, whereas h_{t,i} and a_{t,i} denote the identity representation and semantic attributes of the i-th pedestrian. The smoothing constant in equation (11) is 0.0000001. The similarity scores obtained in the identity space and in the semantic attribute space are weighted and summed, and this weighted score serves as the final measure of pedestrian similarity:

sim_final = τ·sim_a + (1 - τ)·sim_h    (12)

where τ > 0 denotes the weight assigned to each space; in the present invention τ is set to 0.2. In this way, the solved variables can finally be used to measure the similarity of pedestrians in the target data set.
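Step 8 can be written out directly, since the cosine similarity, the constant 0.0000001 and the fusion sim_final = τ·sim_a + (1 - τ)·sim_h with τ = 0.2 are all stated in the text; the only assumption in the sketch below is that the small constant sits in the denominator of the cosine score for numerical stability.

```python
import numpy as np

EPS = 1e-7   # the constant 0.0000001 from equation (11)

def cosine_sim(z_a, z_b):
    """Cosine similarity of two vectors, with the small constant added
    to the denominator (assumed placement) for numerical stability."""
    return float(z_a @ z_b / (np.linalg.norm(z_a) * np.linalg.norm(z_b) + EPS))

def fused_similarity(h_probe, a_probe, h_gallery, a_gallery, tau=0.2):
    """Weighted fusion of identity-space and attribute-space scores:
    sim_final = tau * sim_a + (1 - tau) * sim_h  (equation (12))."""
    sim_h = cosine_sim(h_probe, h_gallery)
    sim_a = cosine_sim(a_probe, a_gallery)
    return tau * sim_a + (1 - tau) * sim_h
```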
In the model proposed above, 11 parameters need to be set, including the atom sizes d_p, d_d, d_r of the dictionaries D_p, D_d, D_r and the regularization parameters λ_1, λ_2, λ_3, λ_4, α_1, α_2, β, ζ. In the experiments these parameters were set to d_p = 600, d_d = 180, d_r = 180, λ_1 = 0.0001, λ_2 = 0.0001, λ_3 = 0.01, λ_4 = 1, α_1 = 0.1, α_2 = 0.1, β = 0.1, ζ = 0.1.
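For reference, the parameter values listed above can be collected into a single configuration object; this is purely a restatement of the reported settings in code form, with key names chosen here for illustration.

```python
# Hyperparameter settings reported in the description (key names are illustrative).
PARAMS = {
    "d_p": 600, "d_d": 180, "d_r": 180,   # dictionary atom sizes
    "lambda1": 1e-4, "lambda2": 1e-4,     # regularization weights
    "lambda3": 1e-2, "lambda4": 1.0,
    "alpha1": 0.1, "alpha2": 0.1,
    "beta": 0.1, "zeta": 0.1,
    "tau": 0.2,                           # fusion weight in equation (12)
}
```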
GOG features are used as the visual features of pedestrians, and the standard semantic attributes already annotated for the data sets are used as the pedestrian attributes. To demonstrate that the algorithm can be deployed in real life, experiments were performed on the VIPeR data set. This data set contains two cameras, each capturing one image per person, and exhibits a variety of changes in pedestrian pose, viewing angle and illumination. PRID2011 and GRID are taken as the source data sets, and the data is evenly split into training and testing parts. Training is repeated 10 times and the average is taken as the final performance. The comparison results are shown in Table 1. The experiments show that the trained model can be deployed directly to the VIPeR scenario for recognition while maintaining a good recognition rate.
TABLE 1 Comparison results on the VIPeR data set (the table is provided as an image in the original document and is not reproduced here)
Experiments were also carried out on the CUHK01 data set, which was collected on the campus of the Chinese University of Hong Kong; the two cameras are placed in a teaching building and in an outdoor scene, respectively, and differ considerably in viewpoint. Tests were performed with VIPeR as the source data set and CUHK01 as the target data set. The results are shown in Table 2, which also lists the performance of other methods; it can be seen that the proposed method achieves relatively high performance.
TABLE 2 Comparison results on the CUHK01 data set (the table is provided as an image in the original document and is not reproduced here)
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (2)

1. A cross-domain pedestrian re-identification method using posture invariance and graph structure alignment, characterized in that the method comprises the following steps:
1) defining the data set variables and the features and attributes of pedestrians;
2) designing a feature decomposition module and determining an objective function containing a posture-invariant component dictionary, a domain information component dictionary, an interference component dictionary and conversion matrices;
3) designing a hypergraph structure alignment module using semantic attribute information;
4) designing a domain adaptation module capable of reducing domain shift;
5) merging the proposed loss functions into a final optimization function;
6) obtaining the dictionaries and conversion matrices with an alternating optimization algorithm, and thereby further obtaining the coding coefficients of the target domain data;
7) predicting the identity and attributes of each pedestrian from the target domain coding coefficients;
8) calculating the similarity between pedestrians using cosine similarity and the predicted identities and attributes.
2. The method of claim 1, wherein the method comprises the following steps: the method comprises the following specific steps:
step 1, define the source data set, which contains K pedestrians, as {X_s, A_s, Y_s} = {(x_s^i, a_s^i, y_s^i)}_{i=1}^{N_s}, wherein x_s^i ∈ R^d denotes the i-th pedestrian feature of the source domain s, d denotes the feature dimension, a_s^i ∈ R^c denotes the i-th pedestrian's attributes, c denotes the attribute dimension, y_s^i denotes the i-th pedestrian label, N_s denotes the number of samples, and X_s, A_s, Y_s respectively denote the source domain feature set, the source domain attribute set and the source domain label set; define the target data set X_t = {x_t^i}_{i=1}^{N_t}, containing N_t samples in total, wherein x_t^i denotes the i-th pedestrian feature of the target domain t; GOG pedestrian features are used at the feature level, and the attributes of the existing data sets are used as the pedestrian attributes;
step 2: design the following feature decomposition loss term L_FD, which decomposes the source domain features X_s into a posture-invariant component, a domain component and an interference component: (equation shown as an image in the original document) wherein V_s denotes the total number of source domain views, X_{s,v,i} denotes the features of the i-th identity under the v-th view in the training set s, D_p, D_d, D_r respectively denote the posture-invariant component dictionary, the domain information component dictionary and the interference component dictionary, the coding coefficients of X_{s,v,i} over the three component dictionaries are denoted correspondingly, ‖·‖_* denotes the nuclear norm of a matrix, ‖·‖_{2,1} denotes the structured sparse norm, and η, λ_1, λ_2 denote regularization parameters; Φ(D_r, C_p, C_r) denotes a regularization term that promotes domain separation, defined as follows: (equation shown as an image in the original document) wherein C_p, C_r denote the coding coefficients of the data set as a whole, λ_3 and λ_4 denote regularization parameters, and I and Q denote identity matrices;
step 3: to enhance the robustness and domain invariance of the semantic attributes, semantic attributes are introduced to assist cross-domain pedestrian re-identification, and the hypergraph structure alignment loss term L_HSA is expressed as follows: (equation shown as an image in the original document) first, a hypergraph G(X, E) is constructed from the image samples of the source domain and the pedestrian identities; it comprises a set of vertices and a set of hyperedges, wherein |N_j| and |N_r| respectively denote the numbers of vertices and hyperedges, and for any given hypergraph its hyperedges can easily be converted into an incidence matrix; α_1, α_2, β_1 denote hyperparameters, tr(C_p L C_p^T) denotes the hypergraph Laplacian regularization, P and E denote linear transformation coefficient matrices, L = I - W denotes the hypergraph Laplacian matrix, and W denotes the weight matrix of the hypergraph, which measures the degree of correlation between two vertices: (equation shown as an image in the original document) D_x and D_e denote the diagonal matrices of the vertex degrees and the hyperedge degrees, respectively, and W_e denotes the diagonal matrix of hyperedge weights;
step 4: to address domain shift, a domain adaptation term is introduced so that part of the unlabeled target domain data participates in training the feature decomposition model; the domain adaptation loss term L_DA is expressed as follows: (equation shown as an image in the original document) wherein V_t denotes the total number of target domain views, N_t denotes the number of target domain samples, X_{t,v,i} denotes the pedestrian image features of the i-th identity under the v-th view in the target data set t, the coding coefficients of X_{t,v,i} over the three component dictionaries D_p, D_d, D_r are denoted correspondingly, and λ_2 denotes a regularization parameter; finally, the entire objective function is expressed as:
L = L_FD + L_HSA + L_DA    (6)
step 5: the proposed loss terms are then consolidated and merged, and the overall loss function L of step 4 can be expanded into the following form: (equation shown as an image in the original document)
step 6: nine variables need to be solved in the expression of step 5; each variable is solved with an alternating iterative optimization algorithm, and while one variable is being solved the other variables are held fixed; solving yields the posture-invariant component dictionary D_p, the domain information component dictionary D_d, the interference component dictionary D_r and the transformation matrices P and E; with these dictionaries, the corresponding coding coefficients can be computed by the following formulas: (equations shown as images in the original document) ζ denotes a regularization parameter;
step 7: once the coding coefficients of a target sample have been computed, h_{t,i} and a_{t,i} can be obtained from equations (9) and (10) using the transformation matrices P and E obtained in step 6: (equations (9) and (10), shown as images in the original document) in these formulas, h_{t,i} and E can be regarded as constants; the optimal a_{t,i} is found by minimizing the squared F-norm of the right-hand term; each test sample thus receives a predicted identity representation h_{t,i} and semantic attributes a_{t,i}, and α_2 denotes a regularization parameter;
step 8: finally, the similarity scores sim_h and sim_a of a pedestrian image pair in the identity space and in the semantic space can be computed with the cosine distance formula of equation (11): (equation (11), shown as an image in the original document) wherein z_a and z_b respectively denote the identity representation vector and the semantic attribute vector of the current pedestrian, of the same kind as the h_{t,i} and a_{t,i} obtained in step 7, except that z_a and z_b refer broadly to the current pedestrian whereas h_{t,i}, a_{t,i} denote the identity representation and semantic attributes of the i-th pedestrian; the smoothing constant in equation (11) is 0.0000001; the similarity scores obtained in the identity space and the semantic attribute space are weighted and summed, and this weighted score serves as the final measure of pedestrian similarity:
sim_final = τ·sim_a + (1 - τ)·sim_h    (12)
wherein τ > 0 denotes the weight assigned to each space, τ is set to 0.2 in the invention, and finally the solved variables can be used to measure the similarity of pedestrians in the target data set.
CN202010434344.9A 2020-05-21 2020-05-21 Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment Active CN111783526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010434344.9A CN111783526B (en) 2020-05-21 2020-05-21 Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010434344.9A CN111783526B (en) 2020-05-21 2020-05-21 Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment

Publications (2)

Publication Number Publication Date
CN111783526A true CN111783526A (en) 2020-10-16
CN111783526B CN111783526B (en) 2022-08-05

Family

ID=72754356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010434344.9A Active CN111783526B (en) 2020-05-21 2020-05-21 Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment

Country Status (1)

Country Link
CN (1) CN111783526B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528756A (en) * 2020-11-20 2021-03-19 华南理工大学 Unsupervised pedestrian re-identification method based on heteromorphic image

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020068842A1 (en) * 1999-01-29 2002-06-06 Brundage Scott R. Blending of economic, reduced oxygen, winter gasoline
US20030019787A1 (en) * 2001-05-15 2003-01-30 Doherty Helen M. Reduced emissions transportation fuel
CN104268583A (en) * 2014-09-16 2015-01-07 上海交通大学 Pedestrian re-recognition method and system based on color area features
CN105160312A (en) * 2015-08-27 2015-12-16 南京信息工程大学 Recommendation method for star face make up based on facial similarity match
CN106778464A (en) * 2016-11-09 2017-05-31 深圳市深网视界科技有限公司 A kind of pedestrian based on deep learning recognition methods and device again
CN107563328A (en) * 2017-09-01 2018-01-09 广州智慧城市发展研究院 A kind of face identification method and system based under complex environment
CN109101865A (en) * 2018-05-31 2018-12-28 湖北工业大学 A kind of recognition methods again of the pedestrian based on deep learning
CN109190470A (en) * 2018-07-27 2019-01-11 北京市商汤科技开发有限公司 Pedestrian recognition methods and device again
CN109214442A (en) * 2018-08-24 2019-01-15 昆明理工大学 A kind of pedestrian's weight recognizer constrained based on list and identity coherence
CN110046870A (en) * 2019-04-16 2019-07-23 中山大学 A kind of method and system of registering based on geographical location and face characteristic
CN110349240A (en) * 2019-06-26 2019-10-18 华中科技大学 It is a kind of it is unsupervised under based on posture conversion pedestrian's picture synthetic method and system
CN110826417A (en) * 2019-10-12 2020-02-21 昆明理工大学 Cross-view pedestrian re-identification method based on discriminant dictionary learning
CN111177447A (en) * 2019-12-26 2020-05-19 南京大学 Pedestrian image identification method based on depth network model

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020068842A1 (en) * 1999-01-29 2002-06-06 Brundage Scott R. Blending of economic, reduced oxygen, winter gasoline
US20030019787A1 (en) * 2001-05-15 2003-01-30 Doherty Helen M. Reduced emissions transportation fuel
CN104268583A (en) * 2014-09-16 2015-01-07 上海交通大学 Pedestrian re-recognition method and system based on color area features
CN105160312A (en) * 2015-08-27 2015-12-16 南京信息工程大学 Recommendation method for star face make up based on facial similarity match
CN106778464A (en) * 2016-11-09 2017-05-31 深圳市深网视界科技有限公司 A kind of pedestrian based on deep learning recognition methods and device again
CN107563328A (en) * 2017-09-01 2018-01-09 广州智慧城市发展研究院 A kind of face identification method and system based under complex environment
CN109101865A (en) * 2018-05-31 2018-12-28 湖北工业大学 A kind of recognition methods again of the pedestrian based on deep learning
CN109190470A (en) * 2018-07-27 2019-01-11 北京市商汤科技开发有限公司 Pedestrian recognition methods and device again
CN109214442A (en) * 2018-08-24 2019-01-15 昆明理工大学 A kind of pedestrian's weight recognizer constrained based on list and identity coherence
CN110046870A (en) * 2019-04-16 2019-07-23 中山大学 A kind of method and system of registering based on geographical location and face characteristic
CN110349240A (en) * 2019-06-26 2019-10-18 华中科技大学 It is a kind of it is unsupervised under based on posture conversion pedestrian's picture synthetic method and system
CN110826417A (en) * 2019-10-12 2020-02-21 昆明理工大学 Cross-view pedestrian re-identification method based on discriminant dictionary learning
CN111177447A (en) * 2019-12-26 2020-05-19 南京大学 Pedestrian image identification method based on depth network model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YANG E: "Shared predictive cross-modal deep quantization", IEEE Transactions on Neural Networks and Learning Systems, 31 December 2018 (2018-12-31), pages 5292 - 5303 *
张耿宁 et al.: "Person re-identification based on feature fusion and kernel local Fisher discriminant analysis" (基于特征融合与核局部Fisher判别分析的行人重识别), Journal of Computer Applications (计算机应用), no. 09, 10 September 2016 (2016-09-10), pages 2597 - 2600 *
詹敏 et al.: "Person re-identification based on adaptive metric learning" (基于自适应度量学习的行人再识别), Computer Knowledge and Technology (电脑知识与技术), no. 10, 5 April 2017 (2017-04-05), pages 159 - 164 *
郑伟诗 et al.: "Asymmetric person re-identification: cross-camera persistent pedestrian tracking" (非对称行人重识别:跨摄像机持续行人追踪), SCIENTIA SINICA Informationis (中国科学:信息科学), no. 05, 20 May 2018 (2018-05-20), pages 545 - 563 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528756A (en) * 2020-11-20 2021-03-19 华南理工大学 Unsupervised pedestrian re-identification method based on heteromorphic image
CN112528756B (en) * 2020-11-20 2023-06-23 华南理工大学 Unsupervised pedestrian re-identification method based on different composition

Also Published As

Publication number Publication date
CN111783526B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN106919920B (en) Scene recognition method based on convolution characteristics and space vision bag-of-words model
CN111160533B (en) Neural network acceleration method based on cross-resolution knowledge distillation
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN109299707A (en) A kind of unsupervised pedestrian recognition methods again based on fuzzy depth cluster
CN107463954B (en) A kind of template matching recognition methods obscuring different spectrogram picture
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN108256486B (en) Image identification method and device based on nonnegative low-rank and semi-supervised learning
CN111339343A (en) Image retrieval method, device, storage medium and equipment
CN107862680B (en) Target tracking optimization method based on correlation filter
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN112257741B (en) Method for detecting generative anti-false picture based on complex neural network
CN105760879A (en) Fourier-Mellin transform-based image geometric matching method
CN111783521A (en) Pedestrian re-identification method based on low-rank prior guidance and based on domain invariant information separation
CN109492570A (en) A kind of SAR image target recognition method based on multiple dimensioned rarefaction representation
CN111695455B (en) Low-resolution face recognition method based on coupling discrimination manifold alignment
CN103714331A (en) Facial expression feature extraction method based on point distribution model
Pratama et al. Face recognition for presence system by using residual networks-50 architecture
CN110458235A (en) Movement posture similarity comparison method in a kind of video
CN109766748A (en) A kind of pedestrian based on projective transformation and dictionary learning knows method for distinguishing again
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN114972904A (en) Zero sample knowledge distillation method and system based on triple loss resistance
CN111783526B (en) Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment
Liu et al. Loop closure detection based on improved hybrid deep learning architecture
CN108960013B (en) Pedestrian re-identification method and device
Lei et al. Student action recognition based on multiple features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant