CN116777004B - Non-convex discriminative transfer subspace learning method incorporating distribution alignment information - Google Patents
- Publication number: CN116777004B (application CN202310779007.7A)
- Authority: CN (China)
- Legal status: Active (an assumption; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Description
Technical Field
The present invention relates to the field of unsupervised learning, and more particularly to a non-convex discriminative transfer subspace learning method that incorporates distribution alignment information.
Background Art
With the development of the big-data era, data in fields such as image recognition, natural language processing, and healthcare often come from different domains and therefore follow different distributions. At the same time, massive amounts of data are typically unlabeled. How to classify such unlabeled data (i.e., target-domain data) is therefore an important problem that has attracted the attention of many researchers. Unsupervised domain adaptation provides a natural way to solve this problem: it promotes the learning of a target classifier by deeply mining the information contained in labeled data (i.e., source-domain data).
In unsupervised domain adaptation, traditional methods fall into two broad categories: distribution-adaptation methods and transfer subspace learning methods. The first category reduces the discrepancy between the source and target domains by aligning their joint distributions. The discrepancy between two domains can be measured by distances defined in various ways, among which the MMD criterion is widely adopted for its simplicity and solid theoretical foundation. Many distribution-adaptation methods exist; several classic ones are reviewed below. To address the differing distributions of the source and target domains, Transfer Component Analysis (TCA) maps the data of both domains into a high-dimensional reproducing kernel Hilbert space, in which the marginal distribution discrepancy between the two is minimized while their respective intrinsic properties are preserved as much as possible.
Building on this, Joint Distribution Adaptation (JDA) considers both the marginal distribution discrepancy and the class-conditional distribution discrepancy of the data, achieving a finer alignment of the two domain distributions from the whole down to each class. Computing the class-conditional discrepancy requires the labels of the target-domain data, which is precisely the task to be solved; JDA therefore proposes the idea of pseudo-labels and, given their unreliability, adopts an iterative update mechanism to reduce their adverse effects. However, JDA does not adjust the weights of the marginal and conditional discrepancies to suit the application scenario: when the data sets differ more strongly, the marginal distributions differ more, whereas when the data sets are similar, the conditional distributions deserve more attention. Fixing both weights therefore only degrades the algorithm's performance. Motivated by this observation, Wang et al. proposed Balanced Distribution Adaptation (BDA), together with a Weighted Balanced Distribution Adaptation (W-BDA) algorithm that addresses class imbalance during transfer by adaptively changing the class weights. In addition, Zhang et al. argued that previous methods compute a weighted sum of the marginal and class-conditional discrepancies of the two domains, whereas a more natural measure is to compute the discrepancy of their joint probability distributions directly. They also noted that previous methods only increase transferability across domains while ignoring discriminability across classes, which can degrade classification performance.
Joint Probability Distribution Adaptation (JPDA) was therefore proposed to address this problem. Wang et al. explored the essence of MMD through theoretical derivation and showed that minimizing the class-conditional distribution discrepancy is equivalent to minimizing the between-class distances of the classes of the source and target domains, and also equivalent to maximizing the data variance together with the between-class distances of the classes within the two domains, which intuitively explains why feature discriminability degrades under MMD. Guided by this theory, Wang et al. proposed a new discriminative MMD with two parallel strategies to mitigate the negative impact of MMD on feature discriminability.
Compared with the first category, methods based on transfer subspace learning start from the geometric structure of the data, leading to more concise and effective representations. They can in turn be divided into two groups. Methods represented by Sampling Geodesic Flow (SGF) and the Geodesic Flow Kernel (GFK) introduce manifold learning to preserve the data structure: each domain is regarded as a point on a Grassmann manifold, and the two points are connected by d intermediate points along the geodesic between them to form a path, so that finding a suitable transformation at each step realizes the transformation from the source domain to the target domain. The other group aligns statistical characteristics of the source- and target-domain data to achieve knowledge transfer, e.g., Subspace Alignment (SA), Subspace Distribution Alignment (SDA), and CORrelation ALignment (CORAL). These methods, however, are susceptible to noise and outliers, which led Shao et al. to propose Low-rank Transfer Subspace Learning (LTSL).
LTSL transfers the source- and target-domain data into a unified generalized subspace, reconstructs the target-domain samples linearly from the source-domain samples, and imposes a low-rank constraint on the reconstruction matrix to preserve the global structure of the data; a noise term is introduced to improve robustness against noise arising during transfer. Fang et al. instead fit the labels by least squares. To account for the adverse effect of label noise on the model, the labels of the source-domain data are relaxed, an operation that gives the model more freedom to fit the labels while enlarging the margins between different classes as much as possible. Discriminative Transfer Subspace Learning (DTSL) combines the two ideas above, holding that a target sample is better reconstructed as a linear combination of only a few similar source-domain samples rather than all of them; it therefore imposes a sparsity constraint on the reconstruction matrix to capture the local structure of the data. The Feature Selection and Structure Preservation (FSSP) method and the Joint Low-Rank Representation and Feature Selection (JLRFS) method impose structured sparsity constraints on the projection matrix, prompting the model to select the most critical features, and introduce a graph Laplacian term to better preserve the data structure.
However, these methods still have many limitations. First, methods based on transfer subspace learning usually impose a low-rank constraint on the reconstruction matrix to preserve the global structure of the data. Since rank minimization is NP-hard, many researchers approximate the rank function with the trace norm. This approximation, however, is strongly affected by the largest singular value, whereas the rank function is not, so its solution may deviate severely from the optimal one. Second, traditional transfer subspace learning methods use only the feature information of the data to reconstruct the target samples, ignoring the category information: if a target sample is linearly represented by source data with similar features but different categories, it is difficult to classify correctly. To address these problems, the present invention proposes a non-convex discriminative transfer subspace learning method that incorporates distribution alignment information.
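The sensitivity of the trace norm described above can be illustrated numerically. The sketch below (for illustration only, not part of the patent) scales only the largest singular value of a matrix: the rank is unchanged, but the trace (nuclear) norm inflates proportionally.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 8))  # rank-3 matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_scaled = s.copy()
s_scaled[0] *= 100.0                      # inflate only the largest singular value
B = U @ np.diag(s_scaled) @ Vt

rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)         # rank is unaffected
trace_norm_A = s.sum()
trace_norm_B = s_scaled.sum()             # trace norm blows up

print(rank_A, rank_B)
print(trace_norm_B / trace_norm_A)
```

This is exactly the failure mode the invention targets: a surrogate dominated by the largest singular value drifts away from the rank it is supposed to approximate.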
Summary of the Invention
The object of the present invention is to provide a non-convex discriminative transfer subspace learning method incorporating distribution alignment information, so as to overcome the shortcomings of the prior art.
To achieve the above object, the present invention adopts the following technical solution:
A non-convex discriminative transfer subspace learning method incorporating distribution alignment information, applied to image recognition, comprises the following steps:
S1. Constructing an objective function by truncating the squared Frobenius norm;
S2. Iteratively solving the objective function with the IALM algorithm to obtain a projection matrix P;
S3. Projecting the source-domain and target-domain data into a common feature subspace through the projection matrix P.
Further, step S1 specifically comprises:
S10. Expressing the target-domain data linearly in terms of the source-domain data as:

P^T X_t = P^T X_s Z + E

where X_t and X_s are the feature matrices of the target-domain and source-domain data, respectively; n_t and n_s are the numbers of samples in the two domains; Z is the reconstruction matrix; and E is the noise matrix.
S11. Imposing a low-rank constraint on Z to maintain the global structure of the data, imposing the l_1 norm on Z to make it sparse and thereby preserve the local structure, and simultaneously imposing a sparsity constraint on E, giving the objective function:

where α, β, and λ are regularization parameters;
S12. Approximating the low-rank condition by truncating the squared Frobenius norm so that the k smallest singular values are forced to 0, transforming the objective function into:

where k = n_s − rank(Z), and the discriminative subspace learning term strengthens the discriminability of the data by fitting the source-domain labels in the subspace. It uses the relaxed labels Y° = Y_s + B⊙M to eliminate the influence of label noise on the model, where Y_s is the one-hot encoding of the source-domain labels, M is a non-negative matrix, ⊙ is the Hadamard operator, and the matrix B is:
S13. Imposing the structured sparse l_{1,2} norm on the projection matrix, transforming the objective function of step S12 into:

where γ is a balance factor;
S14. Reducing the distance between the source and target domains in the subspace by learning P; based on the distribution-adaptation approach, the following formula is obtained:

where ω is a balance factor, n_s^(c) and n_t^(c) are the numbers of class-c samples in the source and target domains, respectively, X = [X_s, X_t], and M_0 and M_c are defined as follows:
S15. Using pseudo-labels to evaluate the formula in step S14 and refining the pseudo-labels iteratively; the final objective function is:
Further, step S2 specifically comprises:
S20. Introducing matrices Q and J, whose solution can be obtained equivalently by minimizing the augmented Lagrangian function of the objective function:

where Y_1, Y_2, and Y_3 are Lagrange multipliers and μ is a penalty parameter;
S21. Fixing the matrices Q, Z, J, E, and M, the objective function of step S20 is rewritten as:

Setting the partial derivative of this objective function to 0 yields the closed-form solution of the matrix P:

where G_1 = Y_s + B⊙M and G_2 = X_t − X_s Z;
S22. Fixing the matrices P, Z, J, E, and M, the objective function for updating Q is:

Applying the shrinkage operator, the solution for Q can be expressed as:

where [A]_{:,i} denotes the i-th column of the matrix A;
S23. Fixing the matrices P, Q, J, E, and M, the objective function for updating Z is:

Since minimizing this term directly is NP-hard, by the Ky Fan theorem it has an equivalent form, and the objective function can be rewritten as:

F and Z are computed iteratively until this objective function converges, and the update of step S24 is performed only after the optimal Z is obtained, rather than after a single iteration of F and Z. Setting the partial derivative of the above formula to 0 yields the closed-form solution of Z:

When computing F, the following problem is optimized:

The optimal F is formed by the k left singular vectors corresponding to the k smallest singular values of Z, i.e., U_2, where U is the left singular vector matrix obtained from the SVD of Z, with U = [U_1, U_2]. Since only FF^T is needed to compute the closed-form solution of Z, it is computed as:
S24. Fixing the matrices P, Q, Z, E, and M, the objective function for updating J is:

Applying the shrinkage operator, the iterative solution for J is:

where shrink(x, c) = sign(x) · max(|x| − c, 0);
S25. Fixing the matrices P, Q, Z, J, and M, the objective function for updating E is:

Applying the shrinkage operator, the iterative solution for E is:
S26. Fixing the matrices P, Q, Z, J, and E, the objective function for updating M is:

Letting R = P^T X_s − Y_s, the above objective function decomposes into d × n_s sub-optimization problems, where d and n_s are the numbers of rows and columns of the non-negative relaxed label matrix M, respectively. The optimal solution is M_ij = max(R_ij B_ij, 0), so the closed-form solution of the objective function is:

M* = max(R⊙B, 0)
S27. Updating the Lagrange multipliers Y_1, Y_2, Y_3 and the penalty factor μ:

where ρ is the learning rate and μ_max is the maximum value that μ may take.
Compared with the prior art, the present invention has the following advantages: it adopts, as a surrogate model, a non-convex regularizer that approximates the rank function more tightly than the trace norm, better approximating the low-rank constraint by minimizing the k smallest singular values of the reconstruction matrix; and it aligns the category information of the data by minimizing the joint distribution discrepancy between the source and target domains, so that the source-domain data reconstruct the target-domain data better and classification performance is improved.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a diagram of the iterative optimization procedure of the non-convex discriminative transfer subspace learning method incorporating distribution alignment information according to the present invention.
Detailed Description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the present invention can be more easily understood by those skilled in the art, and the protection scope of the present invention can be defined more clearly.
Traditional transfer subspace learning methods usually approximate the low rank with the trace norm; however, the trace norm is too loose a relaxation, causing the solution to deviate severely from the ideal one. In addition, traditional methods ignore the category information of the aligned data, which reduces classification performance. This embodiment therefore proposes a non-convex discriminative transfer subspace learning method incorporating distribution alignment information to solve these problems. Specifically, the goal of this embodiment is to learn a projection matrix P that projects the source domain and the target domain into a common subspace.
Referring to Figure 1, this embodiment discloses a non-convex discriminative transfer subspace learning method incorporating distribution alignment information, applied to image recognition, comprising the following steps:
Step S1: Construct the objective function by truncating the squared Frobenius norm, which specifically includes:
Step S10: Since the data of the two domains lie on the same manifold, the target-domain data can be expressed linearly in terms of the source-domain data; the original formula is written as:

P^T X_t = P^T X_s Z + E   (1)

where X_t and X_s are the feature matrices of the target-domain and source-domain data, respectively; n_t and n_s are the numbers of samples in the two domains; Z is the reconstruction matrix; and E is the noise matrix.
Step S11: Every target sample can be represented linearly by a combination of its neighboring source-domain samples. A low-rank constraint is therefore imposed on the reconstruction matrix Z to maintain this property and protect the global structure of the data. At the same time, the reconstruction of each target sample requires the participation of only a few source-domain samples, so the l_1 norm is imposed on Z to make it sparse and preserve the local structure of the data, giving the objective function:

where α, β, and λ are regularization parameters.
Step S12: Since the trace norm is the tightest convex approximation of the rank function over the unit ball, many researchers use it as a surrogate model to approximate the low-rank condition. However, by the definition of the trace norm, it changes drastically when the largest singular value of the matrix changes significantly, even though the rank of the matrix does not change; this relaxation therefore drives the solution far from the ideal one. This embodiment approximates the low-rank condition by truncating the squared Frobenius norm, forcing the k smallest singular values to 0 and eliminating the influence of the other, larger singular values on the surrogate model, transforming objective function (2) into:

where k = n_s − rank(Z), and the discriminative subspace learning term strengthens the discriminability of the data by fitting the source-domain labels in the subspace. It uses the relaxed labels Y° = Y_s + B⊙M to eliminate the influence of label noise on the model, where Y_s is the one-hot encoding of the source-domain labels, M is a non-negative matrix, ⊙ is the Hadamard operator, and the matrix B is:
Step S13: In addition, the selected features may be redundant, so the structured sparse l_{1,2} norm is imposed on the projection matrix, transforming the objective function of step S12 into:

where γ is a balance factor.
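The truncated-Frobenius surrogate of step S12 can be sketched numerically. The helper below (an illustrative reconstruction; the patent's exact formula appears only as an image) sums the squares of the k smallest singular values — driving this quantity to zero forces rank(Z) ≤ n_s − k:

```python
import numpy as np

def truncated_frobenius_sq(Z, k):
    """Sum of squares of the k smallest singular values of Z.

    Driving this to zero forces the k smallest singular values to vanish,
    i.e. rank(Z) <= n_s - k, which is the non-convex low-rank surrogate
    described in step S12. (Sketch for illustration only.)
    """
    s = np.linalg.svd(Z, compute_uv=False)   # singular values, descending
    return float(np.sum(s[-k:] ** 2))

# A rank-2 3x3 matrix: with k = 1 the surrogate is ~0, since only the
# third (zero) singular value is penalized.
Z = (np.outer([1.0, 2.0, 3.0], [1.0, 0.5, -1.0])
     + np.outer([0.0, 1.0, 1.0], [2.0, 1.0, 0.0]))
print(truncated_frobenius_sq(Z, k=1))
```

Note that, unlike the trace norm, this surrogate is completely insensitive to the large singular values that are excluded from the truncation.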
Step S14: However, traditional methods and the formulas above use only the feature information of the data to align the subspaces of the source and target domains, ignoring the influence of category information on the model. This embodiment therefore exploits the semantic information of the different classes to reduce the distribution discrepancy between the two domains and thereby improve the classification performance of the model. Specifically, this embodiment reduces the distance between the two domains in the subspace by learning P; based on the distribution-adaptation approach, the following formula is obtained:

where ω is a balance factor, n_s^(c) and n_t^(c) are the numbers of class-c samples in the source and target domains, respectively, X = [X_s, X_t], and M_0 and M_c are defined as follows:
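The definitions of M_0 and M_c appear only as images in the source. The sketch below uses the standard JDA-style construction, which the surrounding text suggests but does not confirm, so it should be read as an assumption:

```python
import numpy as np

def build_mmd_matrices(ys, yt_pseudo, num_classes):
    """JDA-style MMD matrices (a standard construction; the patent's own
    definitions of M_0 and M_c are given as images and assumed to match).

    ys: source labels (n_s,); yt_pseudo: target pseudo-labels (n_t,).
    Returns M0 (marginal term) and a list of Mc (class-conditional terms),
    each of shape (n_s + n_t, n_s + n_t).
    """
    ns, nt = len(ys), len(yt_pseudo)
    n = ns + nt
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    M0 = np.outer(e, e)

    Mcs = []
    for c in range(num_classes):
        ec = np.zeros(n)
        src_c = np.where(ys == c)[0]
        tgt_c = ns + np.where(yt_pseudo == c)[0]
        if len(src_c) > 0:
            ec[src_c] = 1.0 / len(src_c)
        if len(tgt_c) > 0:
            ec[tgt_c] = -1.0 / len(tgt_c)
        Mcs.append(np.outer(ec, ec))
    return M0, Mcs

M0, Mcs = build_mmd_matrices(np.array([0, 0, 1]), np.array([0, 1]), num_classes=2)
print(M0.shape)   # (n_s + n_t, n_s + n_t)
```

With these matrices, the alignment term of step S14 takes the familiar form Tr(P^T X (M_0 + ω Σ_c M_c) X^T P) over the concatenated data X = [X_s, X_t].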
Step S15: However, when evaluating formula (5), the target domain is unlabeled, so whether the j-th target sample belongs to class c is unknown. Pseudo-labels are therefore used for the computation, and because pseudo-labels are of low reliability, they are refined iteratively. The final objective function is:
Step S2: Iteratively solve the objective function with the IALM algorithm to obtain the projection matrix P.
Specifically, this embodiment designs an IALM algorithm to solve objective function (6) efficiently. Step S2 specifically includes:
Step S20: Matrices Q and J are introduced, and the solution can be obtained equivalently by minimizing the augmented Lagrangian function (7):

where Y_1, Y_2, and Y_3 are Lagrange multipliers and μ is a penalty parameter;
Step S21: Fixing the matrices Q, Z, J, E, and M, the objective function of step S20 is rewritten as:

Setting the partial derivative of objective function (8) to 0 yields the closed-form solution of the matrix P:

where G_1 = Y_s + B⊙M and G_2 = X_t − X_s Z;
Step S22: Fixing the matrices P, Z, J, E, and M, the objective function for updating Q is:

Applying the shrinkage operator, the solution for Q can be expressed as:

where [A]_{:,i} denotes the i-th column of the matrix A;
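The column-wise shrinkage behind the Q-update has a standard closed form. The sketch below assumes the l_{1,2} regularizer sums the l_2 norms of the columns, which the column notation [A]_{:,i} suggests; the patent's exact formula is given only as an image:

```python
import numpy as np

def column_l2_shrink(A, tau):
    """Proximal operator of tau * sum_i ||[A]_{:,i}||_2: each column is
    scaled toward zero by tau in l_2 norm, and columns with norm below tau
    are zeroed entirely. (Illustrative sketch of the Q-update.)"""
    Q = np.zeros_like(A)
    norms = np.linalg.norm(A, axis=0)
    keep = norms > tau
    Q[:, keep] = A[:, keep] * (1.0 - tau / norms[keep])
    return Q

A = np.array([[3.0, 0.1],
              [4.0, 0.1]])
# First column (norm 5) survives scaled by 4/5; second (norm ~0.14) is zeroed.
print(column_l2_shrink(A, 1.0))
```

Zeroing whole columns is what makes the l_{1,2} penalty a structured-sparsity device: it discards redundant projection directions rather than individual entries.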
Step S23: Fixing the matrices P, Q, J, E, and M, the objective function for updating Z is:

Since minimizing this term directly is NP-hard, by the Ky Fan theorem it has an equivalent form, and the objective function can be rewritten as:

F and Z are computed iteratively until this objective function converges, and the update of step S24 is performed only after the optimal Z is obtained, rather than after a single iteration of F and Z. Setting the partial derivative of formula (13) to 0 yields the closed-form solution of Z:

When computing F, the following problem is optimized:

The optimal F is formed by the k left singular vectors corresponding to the k smallest singular values of Z, i.e., U_2, where U is the left singular vector matrix obtained from the SVD of Z, with U = [U_1, U_2]. Since only FF^T is needed to compute the closed-form solution of Z, it is computed as:
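The formula for FF^T is given only as an image; the computation it describes can be sketched directly from the text (the k left singular vectors of the smallest singular values, assembled into a projector):

```python
import numpy as np

def smallest_left_singvecs_outer(Z, k):
    """F F^T, where the columns of F are the k left singular vectors of Z
    associated with its k smallest singular values -- the optimum of the
    F-subproblem by the Ky Fan theorem. (Illustrative sketch.)"""
    U, s, Vt = np.linalg.svd(Z)          # singular values in descending order
    F = U[:, -k:]                         # last k columns <-> smallest singular values
    return F @ F.T

# For a rank-1 matrix, F F^T with k = 2 projects onto the orthogonal
# complement of the column space, so it annihilates Z.
Z = np.outer([1.0, 2.0, 2.0], [1.0, 0.0, 1.0])
print(np.round(smallest_left_singvecs_outer(Z, 2) @ Z, 8))
```

Since only FF^T enters the closed-form Z-update, F itself never needs to be stored beyond this product.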
Step S24: Fixing the matrices P, Q, Z, E, and M, the objective function for updating J is:

Applying the shrinkage operator, the iterative solution for J is:

where shrink(x, c) = sign(x) · max(|x| − c, 0).
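The shrink operator defined above is element-wise soft-thresholding, the closed-form proximal operator of the l_1 norm; it is used both here for J and below for E. A minimal sketch:

```python
import numpy as np

def shrink(X, c):
    """Element-wise soft-thresholding: shrink(x, c) = sign(x) * max(|x| - c, 0),
    the proximal operator of c * ||X||_1 used in the J- and E-updates."""
    return np.sign(X) * np.maximum(np.abs(X) - c, 0.0)

# Entries with magnitude <= c are zeroed; the rest move toward zero by c.
print(shrink(np.array([-2.0, -0.3, 0.5, 1.5]), 0.5))
```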
Step S25: Fixing the matrices P, Q, Z, J, and M, the objective function for updating E is:

Applying the shrinkage operator, the iterative solution for E is:
Step S26: Fixing the matrices P, Q, Z, J, and E, the objective function for updating M is:

Letting R = P^T X_s − Y_s, the above objective function decomposes into d × n_s sub-optimization problems, where d and n_s are the numbers of rows and columns of the non-negative relaxed label matrix M, respectively. The optimal solution is M_ij = max(R_ij B_ij, 0), so the closed-form solution of the objective function is:

M* = max(R⊙B, 0)   (21)
步骤S27、更新拉格朗日乘数Y1,Y2,Y3以及惩罚因子μ:Step S27: Update Lagrange multipliers Y 1 , Y 2 , Y 3 and penalty factor μ:
where ρ is the learning rate and μmax is the maximum value that μ may take.
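The multiplier and penalty updates of step S27 follow the standard augmented-Lagrangian pattern and can be sketched as follows (an assumption-laden sketch: `r1`, `r2`, `r3` stand for the three constraint residuals, which this excerpt does not spell out):

```python
def update_multipliers(Y1, Y2, Y3, mu, r1, r2, r3, rho=1.1, mu_max=1e6):
    """Dual ascent on the three Lagrange multipliers, then grow the
    penalty factor geometrically, capped at mu_max."""
    Y1 = Y1 + mu * r1
    Y2 = Y2 + mu * r2
    Y3 = Y3 + mu * r3
    mu = min(rho * mu, mu_max)
    return Y1, Y2, Y3, mu
```

Growing μ geometrically while capping it at μmax is the usual way to accelerate ALM convergence without letting the penalty blow up.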
In summary, the optimization steps of the model are listed in Table 1 below:
Step S3: project the source-domain and target-domain data into a common feature subspace through the projection matrix P.
This embodiment further improves the above algorithm. When approximating the rank function, another variant, constructed by truncating the trace norm, is adopted. Compared with the earlier approximation, this variant is closer to the rank function in its formal definition and therefore approximates it more accurately. The resulting objective function is:
Analysis shows that changing the surrogate model for the low-rank condition only affects the computation of the reconstruction matrix Z. Therefore, when the other variables are fixed and Z is updated, the above objective function becomes:
This problem is likewise NP-hard to solve directly. Fortunately, it can be equivalently converted into the following form:
where r = rank(Z). When optimizing the trace norm, the present invention converts it into Tr(Z^T D Z) and solves it iteratively, with D defined in terms of the current optimal solution. Objective function (24) can therefore be transformed into:
When the matrices F and G are fixed and Z is updated, the closed-form solution of Z is obtained as follows:
When Z is fixed, the optimal solutions of F and G are formed by the left and right singular vectors corresponding to the r largest singular values of Z, respectively.
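For the truncated-trace-norm variant, the joint F/G update when Z is fixed can be sketched as follows (illustrative NumPy, not the patent's code):

```python
import numpy as np

def update_FG(Z, r):
    """F and G are the left/right singular vectors corresponding to the
    r largest singular values of Z."""
    U, s, Vt = np.linalg.svd(Z)
    F = U[:, :r]
    G = Vt[:r, :].T
    return F, G
```

With this choice, Tr(F^T Z G) equals the sum of the r largest singular values of Z, which is the quantity the truncated variant subtracts from the trace norm so that only the remaining small singular values are penalized.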
Compared with the trace norm, the present invention adopts as its surrogate model a non-convex regularization term that approximates the rank function more tightly. It better approximates the low-rank constraint by minimizing the k smallest singular values of the reconstruction matrix, and it aligns the category information of the data by minimizing the joint distribution discrepancy between the source and target domains, so that the source-domain data reconstructs the target-domain data more faithfully and classification performance improves.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the patent owner may make various variations or modifications within the scope of the appended claims; as long as they do not exceed the protection scope described in the claims, they shall fall within the protection scope of the present invention.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310779007.7A CN116777004B (en) | 2023-06-29 | 2023-06-29 | Non-convex discriminative transfer subspace learning method incorporating distribution alignment information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116777004A CN116777004A (en) | 2023-09-19 |
CN116777004B true CN116777004B (en) | 2024-02-06 |
Family
ID=87992694
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116777004B (en) |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |