CN109635951A - Unsupervised cross-domain adaptive data calibration method and system based on weighted distribution alignment and geometric feature alignment

Info

Publication number
CN109635951A
CN109635951A (application number CN201811547551.4A)
Authority
CN
China
Prior art keywords
data
probability distribution
matrix
alignment
weighted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811547551.4A
Other languages
Chinese (zh)
Inventor
何慧
张伟哲
方滨兴
杨洪伟
李韬
白雅雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201811547551.4A priority Critical patent/CN109635951A/en
Publication of CN109635951A publication Critical patent/CN109635951A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Image Analysis (AREA)

Abstract

An unsupervised cross-domain adaptive data calibration method and system based on weighted distribution alignment and geometric feature alignment, relating to the technical field of data calibration. The invention aims to effectively improve data calibration accuracy. Weighted distribution alignment weighs the importance of the marginal probability distribution and the conditional probability distribution of the sample data, thereby reducing the difference between domains; geometric feature alignment not only further mines the geometric features of the sample data across domains, but also preserves the geometric structure of the sample data space through graph Laplacian regularization, improving sample separability and data calibration accuracy. Experimental comparison with other methods shows that the system developed by the invention, based on the unsupervised cross-domain adaptive data calibration method with weighted distribution alignment and geometric feature alignment, can effectively improve data calibration accuracy.

Description

Unsupervised cross-domain self-adaptive data calibration method and system based on weighted distribution alignment and geometric feature alignment
Technical Field
The invention relates to an unsupervised cross-domain self-adaptive data calibration method and system, and relates to the technical field of data calibration.
Background
The unsupervised domain adaptation problem is a sub-problem of transfer learning that aims to handle domain adaptation when the target domain has no labeled data. Previous research has mainly focused on sample-based domain adaptation and feature-transformation-based domain adaptation. Feature-transformation methods can be divided into data-centered and subspace-centered methods. Data-centered methods seek a single transformation that maps source-domain and target-domain data into a domain-invariant space, reducing the distribution difference while retaining the features of the original space; however, because the original feature space is distorted or stretched by the transformation, these methods do not further exploit the geometric features of the data. Subspace-centered methods operate only on the subspaces and do not explicitly consider the distribution difference between the domains after mapping.
Disclosure of Invention
The invention aims to provide an unsupervised cross-domain self-adaptive data calibration method and system based on weighted distribution alignment and geometric feature alignment so as to effectively improve the data calibration accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the first technical scheme is as follows: an unsupervised cross-domain adaptive data calibration method based on weighted distribution alignment and geometric feature alignment is implemented by the following steps:
the inputs of the method are: X_s, the source-domain samples (samples with known labels); X_t, the target-domain samples (samples to be labeled); and Y_s, the source-domain sample labels;
parameters are as follows:
α evaluates the importance of maximizing the variance of the samples to be labeled,
λ evaluates the importance of the internal difference of the generalized feature transformation,
β evaluates the importance of maximizing the between-class variance (across samples of different classes),
μ ∈ [0,1] weighs the importance of the marginal distribution versus the conditional distribution between domains,
δ ∈ [0,1] is the coefficient of the graph Laplacian regularization term (the importance of further mining the marginal distribution),
p is the number of nearest neighbors of a sample,
k is the dimension of the subspaces, and T is the number of iterations;
the output of the method is:
the transformation matrices Φ and Ψ; the embedding Z_s obtained by transforming X_s with Φ and the embedding Z_t obtained by transforming X_t with Ψ; and the adapted classifier f;
step 1: compute the scatter matrix S_t of the target domain, the between-class scatter matrix S_b and the within-class scatter matrix S_w of the data, and
M′_s, M′_t, M′_st, M′_ts, each the weighted sum of the marginal and conditional probability-distribution terms of the source-domain and target-domain samples plus the corresponding weighted Laplacian regularization term (the aim is to further mine the latent distribution characteristics of the conditional and marginal probability distributions and so provide prior knowledge for classifying the target-domain samples);
M′_s, M′_t, M′_st, M′_ts are the four partitions of one matrix:
M′_s is that quantity over source-source sample pairs;
M′_t is that quantity over target-target sample pairs;
M′_st is that quantity over source-target sample pairs;
M′_ts is that quantity over target-source sample pairs;
initialize the pseudo labels of the target domain with a classifier trained in the source domain.
Step 2: repeating the step 3 to the step 6;
Step 3: solve the generalized eigenvalue problem, take the eigenvectors corresponding to the first k eigenvalues as the generalized feature transformation U, and from it obtain the source-domain data sample transformation matrix Φ and the target-domain data sample transformation matrix Ψ;
Step 4: map the original data into the corresponding subspaces to obtain the embeddings Z_s and Z_t;
Step 5: train a classifier on the transformed source-domain data to update the pseudo labels of the target domain;
Step 6: update M′_s, M′_t, M′_st, M′_ts;
Step 7: repeat until convergence;
Step 8: finally obtain the embeddings Z_s, Z_t and the classifier f trained on them.
The second technical scheme is as follows: an unsupervised cross-domain adaptive data calibration method based on weighted distribution alignment and geometric feature alignment, used to compute the variables of the first technical scheme; the method is realized by the following steps:
step one, maximizing target field data variance
The target-domain data variance is maximized in the corresponding feature subspace to avoid mapping the data features onto irrelevant dimensions:

max_Ψ tr(Ψ^T S_t Ψ), with S_t = X_t H_t X_t^T,

where S_t is the target-domain scatter matrix, H_t = I_t − (1/m) 1 1^T is the centering matrix, 1 is a column vector whose elements are all 1, m is the number of target-domain samples, and I_t is an identity matrix;
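A minimal sketch of this computation (the helper name is ours), with the columns of X_t as samples:

```python
import numpy as np

def target_scatter(Xt):
    """S_t = Xt @ Ht @ Xt.T, where Ht = I - (1/n) 1 1^T is the centering
    matrix and the columns of Xt are target-domain samples (Step one)."""
    n = Xt.shape[1]
    Ht = np.eye(n) - np.ones((n, n)) / n
    return Xt @ Ht @ Xt.T
```

Maximizing tr(Ψ^T S_t Ψ) then keeps the mapped target data spread out instead of collapsing onto irrelevant dimensions.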
step two, source field data separability characteristic maintenance
Because the label space of the source domain is available during data calibration, the source-domain label information is used to control the separability of the transformed source-domain data:

max_Φ tr(Φ^T S_b Φ) and min_Φ tr(Φ^T S_w Φ),

where S_b and S_w are the between-class and within-class scatter matrices of the data, defined as:

S_w = Σ_c X_s^(c) H_s^(c) (X_s^(c))^T,  S_b = Σ_c n_s^(c) (m_s^(c) − m̄_s)(m_s^(c) − m̄_s)^T,

where X_s^(c) is the set of source-domain data samples belonging to class c, H_s^(c) = I_s^(c) − (1/n_s^(c)) 1 1^T is the centering matrix for the class-c data, I_s^(c) is an identity matrix, 1 is a column vector whose elements are all 1, n_s^(c) is the number of source-domain data samples belonging to class c, m_s^(c) is the mean of the class-c source samples, and m̄_s is the overall source mean;
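The two scatter matrices can be computed directly from the definitions above; this sketch (names ours) follows the standard LDA form:

```python
import numpy as np

def class_scatters(Xs, Ys):
    """Between-class (S_b) and within-class (S_w) scatter matrices of the
    source data (Step two); columns of Xs are samples.  The within-class
    term uses the per-class centering matrix H_c = I - (1/n_c) 1 1^T."""
    Ys = np.asarray(Ys)
    d = Xs.shape[0]
    m_all = Xs.mean(axis=1, keepdims=True)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(Ys):
        Xc = Xs[:, Ys == c]
        nc = Xc.shape[1]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += nc * (mc - m_all) @ (mc - m_all).T   # class mean vs overall mean
        Hc = np.eye(nc) - np.ones((nc, nc)) / nc
        Sw += Xc @ Hc @ Xc.T                       # spread within the class
    return Sb, Sw
```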
step three, weighted distribution alignment
The weighted distribution alignment quantitatively evaluates the importance of the marginal probability distribution and the conditional probability distribution, and is given as:

D_w = (1 − μ) D(P_s, P_t) + μ D(Q_s, Q_t)

The marginal and conditional distances are computed with the maximum mean discrepancy (MMD), so the marginal term D(P_s, P_t) and the conditional term D(Q_s, Q_t) can be written as:

D(P_s, P_t) = || (1/n_s) Σ_{x_i ∈ X_s} Φ^T x_i − (1/n_t) Σ_{x_j ∈ X_t} Ψ^T x_j ||²
D(Q_s, Q_t) = Σ_c || (1/n_s^(c)) Σ_{x_i ∈ X_s^(c)} Φ^T x_i − (1/n_t^(c)) Σ_{x_j ∈ X_t^(c)} Ψ^T x_j ||²

where X_s^(c) is the set of samples belonging to class c in the source domain and y_s^(c) are their labels; correspondingly, X_t^(c) is the set of samples belonging to class c in the target domain and y_t^(c) are their (pseudo) labels; n_s^(c) is the number of class-c samples in the source domain and n_t^(c) is the number of class-c samples in the target domain.
A classifier is trained in the labeled source domain and applied to the target domain to obtain pseudo labels, and the conditional probability-distribution difference between domains is reduced through pseudo-label iteration. The weighted distribution alignment can therefore be rewritten as a quadratic form in the generalized transformation U = [Φ; Ψ], built from X_s, X_t and the block matrix whose partitions M_s, M_t, M_st, M_ts align the weighted marginal and conditional probability distributions of the source-domain and target-domain samples.
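The block matrix behind the rewritten form can be assembled as follows. This is our own sketch of the standard MMD-matrix construction, with the marginal block weighted by (1 − μ) and one conditional block per class weighted by μ, the conditional blocks being driven by the target pseudo labels:

```python
import numpy as np

def weighted_mmd_matrix(Ys, Yt_pseudo, mu=0.5):
    """Block matrix M for the weighted distribution alignment
    D_w = (1-mu) D(Ps,Pt) + mu D(Qs,Qt), written so that the alignment is
    a quadratic form over [X_s, X_t].  The source-source, target-target
    and cross blocks of M correspond to the M_s, M_t, M_st, M_ts
    partitions in the text (before the Laplacian term is added)."""
    Ys = np.asarray(Ys)
    Yt_pseudo = np.asarray(Yt_pseudo)
    ns, nt = len(Ys), len(Yt_pseudo)
    n = ns + nt
    # marginal term: indicator vector +1/ns on source, -1/nt on target
    e = np.concatenate([np.ones(ns) / ns, -np.ones(nt) / nt])
    M = (1 - mu) * np.outer(e, e)
    for c in np.unique(Ys):                    # conditional term, per class
        ec = np.zeros(n)
        s_idx = np.flatnonzero(Ys == c)
        t_idx = ns + np.flatnonzero(Yt_pseudo == c)
        if len(s_idx) and len(t_idx):
            ec[s_idx] = 1.0 / len(s_idx)
            ec[t_idx] = -1.0 / len(t_idx)
            M += mu * np.outer(ec, ec)
    return M
```

Because every indicator vector sums to zero, each block row of M also sums to zero, which is a quick sanity check on the construction.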
step four, minimizing the internal difference of the transformation
Denote

U = [Φ; Ψ]

the generalized feature transformation stacking the two sub-transforms. The minimization of the internal difference of the transformation can then be written as:

min ||Φ − Ψ||_F²
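Writing U = [Φ; Ψ], the internal-difference term is the quadratic form tr(Uᵀ C U) with C = [[I, −I], [−I, I]]; the small check below (our code) evaluates it that way:

```python
import numpy as np

def internal_difference(Phi, Psi):
    """Evaluate ||Phi - Psi||_F^2 as tr(U^T C U) with U = [Phi; Psi]
    stacked and C = [[I, -I], [-I, I]] (Step four)."""
    d = Phi.shape[0]
    U = np.vstack([Phi, Psi])
    I = np.eye(d)
    C = np.block([[I, -I], [-I, I]])
    return np.trace(U.T @ C @ U)
```

Expressing the term this way lets it be folded into the same quadratic objective in U as the other components.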
step five, graph Laplace regularization
If, for x_i, x_j ∈ X_s ∪ X_t, the marginal probability distributions P_s(x_s) and P_t(x_t) of the data are sufficiently close, then the conditional probability distributions Q_s(y_s|x_s) and Q_t(y_t|x_t) are also sufficiently similar. Assuming the labeling function is sufficiently smooth from x_i to x_j, the graph Laplacian regularization term can be expressed as:

tr(U^T X L X^T U)

where W is the graph adjacency matrix with elements W_ij, L is the regularized graph Laplacian matrix with elements L_ij, and x_i, x_j are elements of the sample space. W_ij is nonzero only when x_j lies in N_p(x_i), the set of p nearest neighbors of x_i; L = I − D^{−1/2} W D^{−1/2}, where D is the diagonal matrix with diagonal elements D_ii = Σ_j W_ij.
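A sketch of the Laplacian construction (our code): the text does not fix the edge weight, so cosine similarity between p-nearest neighbours is assumed here.

```python
import numpy as np

def normalized_graph_laplacian(X, p=2):
    """Normalized Laplacian L = I - D^{-1/2} W D^{-1/2} over a p-nearest-
    neighbour graph (Step five).  Columns of X are samples; edge weights
    use cosine similarity, symmetrised (an assumed, common choice)."""
    n = X.shape[1]
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12)
    S = Xn.T @ Xn                              # pairwise cosine similarity
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(-S[i])[1:p + 1]      # p nearest neighbours, no self
        W[i, nbrs] = S[i, nbrs]
    W = np.maximum(W, W.T)                     # symmetrise the graph
    deg = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    return np.eye(n) - Dinv @ W @ Dinv
```

For non-negative weights the resulting L is symmetric positive semidefinite, which is what makes it usable as a regularizer in the quadratic objective.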
step six, formalization solving
In summary, the optimization objective of WDGA combines the terms TDA, BCV, WDA, GLR, IDT and WCV, which respectively denote target-domain data variance maximization, source-domain data separability preservation, weighted distribution alignment, graph Laplacian regularization, transformation internal-difference minimization and within-class variance minimization.
Substituting the components computed in the five steps above into this objective yields a trace-ratio optimization expression of the form:

max_U tr(U^T A U) / tr(U^T B U)

where A collects the terms to be maximized and B the terms to be minimized. To obtain the solution U, set the first derivative to zero, which yields the generalized eigenproblem:

A U = B U Λ

where Λ = diag(λ_1, …, λ_k) contains the first k eigenvalues, U = [U_1, …, U_k] contains the eigenvectors belonging to the corresponding eigenvalues, and I is the identity matrix. The second technical scheme corresponds to the first technical scheme.
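The final solve reduces to a standard symmetric generalized eigenproblem, for which `scipy.linalg.eigh` can be used directly. Whether the k smallest or k largest eigenvalues are kept depends on how the ratio is arranged; this sketch (ours) keeps the smallest:

```python
import numpy as np
from scipy.linalg import eigh

def solve_transform(A, B, k):
    """Solve A u = lambda B u (A symmetric, B symmetric positive definite)
    and return the k smallest eigenvalues with their eigenvectors as the
    columns of U = [U_1, ..., U_k] (Step six)."""
    w, V = eigh(A, B)          # ascending eigenvalues, B-orthonormal vectors
    return w[:k], V[:, :k]
```

`eigh` normalizes each eigenvector v so that v^T B v = 1, so the returned U satisfies U^T B U = I, matching the constraint that appears when the derivative of the Lagrangian is set to zero.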
The third technical scheme is as follows: an unsupervised cross-domain self-adaptive data calibration system based on weighted distribution alignment and geometric feature alignment comprises an input module, a data calibration module and an output module, wherein the input module is used for reading data to be calibrated and labels and transmitting the data to the data calibration module, the data calibration module is used for calibrating and classifying the input data to be calibrated, and the output module is used for outputting results classified by the data calibration module. The third technical solution is realized based on the algorithm described in the first and second technical solutions.
The invention has the beneficial effects that:
the transfer learning is used as a means for solving the problem of cross-domain identification data labeling, and the reduction of the difference between the source field data and the target field data is important for solving the problem of unsupervised field self-adaptive data calibration. And the method based on the weighted distribution alignment and the geometric feature alignment better solves the problem. Firstly, the importance of marginal probability distribution and conditional probability distribution of sample data can be balanced by weighted distribution alignment, so that the difference between fields is reduced; the geometric feature alignment can further excavate the geometric features of the sample data between domains, and the geometric structure of the sample data space can be well maintained through the graph-Laplacian regularization, so that the sample separability and the data calibration accuracy are improved. Through experimental comparison with other methods, the unsupervised cross-domain adaptive data calibration method based on weighted distribution alignment and geometric feature alignment, which is a system developed by people, can effectively improve the data calibration accuracy.
The invention provides an unsupervised cross-domain adaptive data calibration method based on weighted distribution alignment and geometric feature alignment, which is provided for overcoming various defects of the prior art in solving the field adaptation problem, and develops a set of system on the basis of the data calibration method.
Drawings
FIG. 1 compares the data characteristics of WDGA with those of three other algorithms; FIG. 2 plots the parameters p, δ, μ and T against accuracy; FIG. 3 plots the accuracy of the method of the present invention; FIG. 4 is a flow chart of data calibration performed by the data calibration system based on the method of the present invention.
Detailed Description
The unsupervised cross-domain adaptive data calibration method and system based on weighted distribution alignment and geometric feature alignment are explained by combining the accompanying drawings and tables as follows:
As the names suggest, weighted distribution alignment weights and aligns the marginal and conditional probability distributions of the data, while geometric feature alignment aligns the geometric features of the sample data space; a Laplacian regularization term is also introduced to further preserve the geometric structure of the sample space. The result is an optimization problem that admits a closed-form solution, which solves the domain adaptation problem.
In the present invention, definitions are given for all parameters except common-sense parameters and intermediate parameters arising in the derivation.
1. Target-domain data variance maximization: to avoid mapping the data features onto irrelevant dimensions, we maximize the target-domain data variance in the corresponding feature subspace:

max_Ψ tr(Ψ^T S_t Ψ), with S_t = X_t H_t X_t^T

where S_t is the target-domain scatter matrix, H_t is the centering matrix, and 1 is a column vector whose elements are all 1.
2. Source-domain data separability preservation: since the source-domain label space is available during data calibration, we can use the source-domain label information to control the separability of the transformed source-domain data:

max_Φ tr(Φ^T S_b Φ) and min_Φ tr(Φ^T S_w Φ)

where S_b and S_w are the between-class and within-class scatter matrices of the data:

S_w = Σ_c X_s^(c) H_s^(c) (X_s^(c))^T,  S_b = Σ_c n_s^(c) (m_s^(c) − m̄_s)(m_s^(c) − m̄_s)^T

where X_s^(c) is the set of source-domain data samples belonging to class c, H_s^(c) is the centering matrix for the class-c data, I_s^(c) is an identity matrix, 1 is a column vector whose elements are all 1, n_s^(c) is the number of source-domain data samples belonging to class c, m_s^(c) is the class-c source mean, and m̄_s is the overall source mean.
3. Weighted distribution alignment: the purpose of weighted distribution alignment is to quantitatively evaluate the importance of the marginal probability distribution and the conditional probability distribution; in general it is given as:

D_w = (1 − μ) D(P_s, P_t) + μ D(Q_s, Q_t)

Here the marginal and conditional distances are computed with the maximum mean discrepancy (MMD), so the marginal term D(P_s, P_t) and the conditional term D(Q_s, Q_t) can be written as:

D(P_s, P_t) = || (1/n_s) Σ_{x_i ∈ X_s} Φ^T x_i − (1/n_t) Σ_{x_j ∈ X_t} Ψ^T x_j ||²
D(Q_s, Q_t) = Σ_c || (1/n_s^(c)) Σ_{x_i ∈ X_s^(c)} Φ^T x_i − (1/n_t^(c)) Σ_{x_j ∈ X_t^(c)} Ψ^T x_j ||²

where X_s^(c) is the set of samples belonging to class c in the source domain and y_s^(c) are their labels; correspondingly, X_t^(c) is the set of samples belonging to class c in the target domain. Since the target domain has no labels, we adopt the idea proposed by Long Mingsheng et al.: train a classifier in the labeled source domain, apply it to the target domain to obtain pseudo labels, and reduce the conditional probability-distribution difference between domains through pseudo-label iteration.
The form of the weighted distribution alignment can therefore be rewritten as a quadratic form in U = [Φ; Ψ] built from X_s, X_t and the block matrix with partitions M_s, M_t, M_st, M_ts.
4. Transformation internal-difference minimization: denote

U = [Φ; Ψ]

the generalized feature transformation. The minimization of the internal difference of the transformation can then be written as:

min ||Φ − Ψ||_F²
5. Graph Laplacian regularization: the domain adaptation problem contains both labeled and unlabeled data. We want to further exploit the marginal probability distributions P_s and P_t; in other words, unlabeled data can reveal potential data characteristics of the target domain, such as the sample variance. The prevailing assumption can be stated as: if, for x_i, x_j ∈ X_s ∪ X_t, the marginal distributions P_s(x_s) and P_t(x_t) of the data are sufficiently close, then the conditional distributions Q_s(y_s|x_s) and Q_t(y_t|x_t) are also sufficiently similar. Assuming the labeling function is sufficiently smooth from x_i to x_j, the graph Laplacian regularization term can be expressed as:

tr(U^T X L X^T U)

where W is the graph adjacency matrix and L is the regularized graph Laplacian matrix; N_p(x_i) denotes the p nearest neighbors of x_i, L = I − D^{−1/2} W D^{−1/2}, and D is the diagonal matrix with D_ii = Σ_j W_ij.
6. Formalized solution: in summary, from the five steps 1-5 the optimization expression can be obtained as a trace-ratio problem:

max_U tr(U^T A U) / tr(U^T B U)

To obtain the solution U, set the first derivative to zero, which yields the generalized eigenproblem:

A U = B U Λ

where Λ = diag(λ_1, …, λ_k) contains the first k eigenvalues and U = [U_1, …, U_k] the eigenvectors belonging to the corresponding eigenvalues. Pseudo code for the weighted distribution alignment and geometric feature alignment (WDGA) algorithm is shown as Algorithm 2-1.
the data calibration system based on the method of the present invention performs data calibration, and the flow chart is shown in fig. 4.
For the invented data calibration system developed based on the weighted distribution alignment and geometric feature alignment (WDGA) algorithm, we tested on the following public data set:
table 1: six image recognition public data sets
Dataset Name Data Features Classes Domain(s)
Office-10 1,410 800(4,096) 10 A,W,D
Caltech-10 1,123 800(4,096) 10 C
USPS 1,800 256 10 USPS(U)
MNIST 2,000 256 10 MNIST(M)
ImageNet 7,341 4,096 5 ImageNet(I)
VOC2007 3,376 4,096 5 VOC(V)
Through tests based on different features on different data sets, it can be seen that the system developed by us is significantly better than other algorithms in terms of image classification accuracy (as shown in tables 2, 3, 4).
Table 2: accuracy rate based on SURF features on dataset Office + Caltech10
Task Raw SA SDA PCA TCA GFK JDA TJM ARTL SCA JGSA WDGA(P)
C→A 36.01 49.27 49.69 36.95 45.82 46.03 45.62 46.76 44.13 45.61 51.46 53.24
C→W 29.15 40.00 38.98 32.54 31.19 36.95 41.69 38.98 31.48 40.02 45.42 51.86
C→D 38.22 39.49 40.13 38.22 34.39 40.76 45.22 44.59 39.53 47.08 45.86 54.14
A→C 34.19 39.98 39.54 34.73 42.39 40.69 39.36 39.45 36.07 39.70 41.50 41.59
A→W 31.19 33.22 30.85 35.59 36.27 36.95 37.97 42.03 33.63 34.93 45.76 50.85
A→D 35.67 33.76 33.76 27.39 33.76 40.13 39.49 45.22 36.87 39.47 47.13 45.86
W→C 28.76 35.17 34.73 26.36 29.39 24.76 31.17 30.19 29.72 31.06 33.21 33.21
W→A 31.63 39.25 39.25 31.00 28.91 27.56 32.78 29.96 38.28 30.04 39.87 39.87
W→D 84.71 75.16 75.80 77.07 89.17 85.35 89.17 89.17 87.89 87.34 90.45 87.26
D→C 29.56 34.55 35.89 29.65 30.72 29.30 31.52 31.43 30.49 30.67 29.92 34.11
D→A 28.29 39.87 38.73 32.05 31.00 28.71 33.09 32.78 34.84 31.58 38.00 41.44
D→W 83.73 76.95 76.95 75.93 86.10 80.34 89.49 85.42 88.56 84.42 91.86 89.49
Average 40.93 44.72 44.52 39.79 43.26 43.13 46.38 46.33 44.30 45.20 50.04 51.91
Table 3: accuracy on datasets MNIST + USPS and ImageNet + VOC2007
Task Raw SA SDA PCA TCA GFK JDA TJM ARTL SCA JGSA WDGA(P)
M→U 65.94 67.78 65.01 66.20 56.33 61.22 67.28 63.28 88.76 65.11 80.44 85.78
U→M 44.70 48.80 35.73 45.04 51.20 46.45 59.65 52.25 67.66 48.03 68.15 68.30
I→V - - - 58.36 63.71 59.51 63.44 63.69 62.37 - 52.32 69.88
V→I - - - 65.13 64.86 73.79 70.16 73.02 72.22 - 70.58 82.35
Average - - - 58.68 59.02 60.24 65.13 63.06 72.75 - 67.87 76.58
Table 4: accuracy rate based on Decaf6 features on dataset Office + Caltech10
Task DMM OTGL PCA TCA GFK JDA TJM SCA ARTL CORAL JGSA WDGA(L)
C→A 92.41 92.15 88.13 89.78 88.21 90.19 88.83 89.52 92.44 92.01 91.44 92.59
C→W 87.49 84.17 83.37 78.32 77.59 85.42 81.37 85.38 87.76 80.02 86.78 90.17
C→D 90.42 87.25 84.12 85.37 86.58 85.99 84.68 87.89 86.61 84.72 93.63 94.27
A→C 84.78 85.51 79.28 82.63 79.22 81.92 84.32 78.81 87.39 83.15 84.86 88.33
A→W 84.73 83.05 70.86 74.19 70.92 80.68 71.92 75.93 88.48 74.61 81.02 91.19
A→D 92.37 85.00 82.26 81.51 82.18 81.53 76.38 85.37 85.42 84.09 88.54 94.27
W→C 81.74 81.45 70.33 80.42 69.77 81.21 83.01 74.87 88.23 75.53 84.95 87.44
W→A 86.46 90.62 73.47 84.08 76.83 90.71 87.59 85.03 92.37 81.17 90.71 91.75
W→D 98.73 96.25 99.41 100 100 100 100 100 100 100 100 99.36
D→C 83.27 84.11 71.69 82.30 71.40 80.32 83.80 78.11 87.33 76.80 86.20 85.84
D→A 90.74 92.31 79.22 89.06 76.34 91.96 90.34 90.01 92.67 85.54 91.96 92.90
D→W 99.27 96.29 97.99 99.73 99.28 99.32 99.25 98.57 100 99.37 99.66 95.25
Average 89.41 88.18 81.70 85.59 81.52 87.44 85.99 85.89 90.70 84.71 89.98 91.95
As can be seen from fig. 1, in the data processing stage, the system can maintain the original data characteristics, and can also collect the same type of samples as much as possible to improve the data separability, thereby improving the accuracy of data calibration and further improving the accuracy of image recognition.
From fig. 2, the relationship between the regularization parameters δ, μ and the parameters p and the iteration number T and the accuracy can be seen.
FIG. 3 shows the classification accuracy of our proposed algorithm based on weighted distribution alignment and Laplacian regularization.
Table 5 compares the runtime of our method (WDGA) with the JGSA method.
Table 5: WDGA and JGSA algorithm runtime comparison
Task Data×Features JGSA WDGA
C→A 2,081×800 18.50 19.19
M→U 3,800×256 16.21 17.09
D→A 1,115×800 13.17 13.45
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (3)

1. An unsupervised cross-domain adaptive data calibration method based on weighted distribution alignment and geometric feature alignment is characterized in that the method is realized by the following steps:
the inputs of the method are: X_s, the source-domain samples (samples with known labels); X_t, the target-domain samples (samples to be labeled); and Y_s, the source-domain sample labels;
parameters are as follows:
α evaluates the importance of maximizing the variance of the samples to be labeled,
λ evaluates the importance of the internal difference of the generalized feature transformation,
β evaluates the importance of maximizing the between-class variance,
μ ∈ [0,1] weighs the importance of the marginal distribution versus the conditional distribution between domains,
δ ∈ [0,1] is the coefficient of the graph Laplacian regularization term,
p is the number of nearest neighbors of a sample,
k is the dimension of the subspaces, and T is the number of iterations;
the output of the method is:
the transformation matrices Φ and Ψ; the embedding Z_s obtained by transforming X_s with Φ and the embedding Z_t obtained by transforming X_t with Ψ; and the adapted classifier f;
step 1: compute the scatter matrix S_t of the target domain, the between-class scatter matrix S_b and the within-class scatter matrix S_w of the data, and
M′_s, M′_t, M′_st, M′_ts, each the weighted sum of the marginal and conditional probability-distribution terms plus the corresponding weighted Laplacian regularization term;
M′_s, M′_t, M′_st, M′_ts are the four partitions of one matrix:
M′_s is that quantity over source-source sample pairs;
M′_t is that quantity over target-target sample pairs;
M′_st is that quantity over source-target sample pairs;
M′_ts is that quantity over target-source sample pairs;
initialize the pseudo labels of the target domain with a classifier trained in the source domain;
Step 2: repeating the step 3 to the step 6;
step 3: solve the generalized eigenvalue problem, take the eigenvectors corresponding to the first k eigenvalues as the generalized feature transformation U, and from it obtain the source-domain data sample transformation matrix Φ and the target-domain data sample transformation matrix Ψ;
step 4: map the original data into the corresponding subspaces to obtain the embeddings Z_s and Z_t;
step 5: train a classifier on the transformed source-domain data to update the pseudo labels of the target domain;
step 6: update M′_s, M′_t, M′_st, M′_ts;
step 7: repeat until convergence;
step 8: finally obtain the embeddings Z_s, Z_t and the classifier f trained on them.
2. An unsupervised cross-domain adaptive data calibration method based on weighted distribution alignment and geometric feature alignment, used to compute the variables of claim 1 and realized by the following steps:
step one, maximizing target field data variance
the target-domain data variance is maximized in the corresponding feature subspace to avoid mapping the data features onto irrelevant dimensions:
max_Ψ tr(Ψ^T S_t Ψ), with S_t = X_t H_t X_t^T,
where S_t is the target-domain scatter matrix, H_t = I_t − (1/m) 1 1^T is the centering matrix, 1 is a column vector whose elements are all 1, m is the number of target-domain samples, and I_t is an identity matrix;
step two, source field data separability characteristic maintenance
because the label space of the source domain is available during data calibration, the source-domain label information is used to control the separability of the transformed source-domain data:
max_Φ tr(Φ^T S_b Φ) and min_Φ tr(Φ^T S_w Φ),
where S_b and S_w are the between-class and within-class scatter matrices of the data, defined as
S_w = Σ_c X_s^(c) H_s^(c) (X_s^(c))^T,  S_b = Σ_c n_s^(c) (m_s^(c) − m̄_s)(m_s^(c) − m̄_s)^T,
where X_s^(c) is the set of source-domain data samples belonging to class c, H_s^(c) = I_s^(c) − (1/n_s^(c)) 1 1^T is the centering matrix for the class-c data, I_s^(c) is an identity matrix, 1 is a column vector whose elements are all 1, n_s^(c) is the number of source-domain data samples belonging to class c, m_s^(c) is the mean of the class-c source samples, and m̄_s is the overall source mean;
step three, weighted distribution alignment
the weighted distribution alignment quantitatively evaluates the importance of the marginal probability distribution and the conditional probability distribution, and is given as
D_w = (1 − μ) D(P_s, P_t) + μ D(Q_s, Q_t);
the marginal and conditional distances are computed with the maximum mean discrepancy (MMD), so the marginal term D(P_s, P_t) and the conditional term D(Q_s, Q_t) can be written as
D(P_s, P_t) = || (1/n_s) Σ_{x_i ∈ X_s} Φ^T x_i − (1/n_t) Σ_{x_j ∈ X_t} Ψ^T x_j ||²,
D(Q_s, Q_t) = Σ_c || (1/n_s^(c)) Σ_{x_i ∈ X_s^(c)} Φ^T x_i − (1/n_t^(c)) Σ_{x_j ∈ X_t^(c)} Ψ^T x_j ||²,
where X_s^(c) is the set of samples belonging to class c in the source domain and y_s^(c) are their labels; correspondingly, X_t^(c) is the set of samples belonging to class c in the target domain and y_t^(c) are their (pseudo) labels; n_s^(c) is the number of class-c samples in the source domain and n_t^(c) is the number of class-c samples in the target domain;
a classifier is trained in the labeled source domain and applied to the target domain to obtain pseudo labels, and the conditional probability-distribution difference between domains is reduced through pseudo-label iteration; the weighted distribution alignment can therefore be rewritten as a quadratic form in the generalized transformation U = [Φ; Ψ], built from X_s, X_t and the block matrix whose partitions M_s, M_t, M_st, M_ts align the weighted marginal and conditional probability distributions of the source-domain and target-domain samples;
step four, minimizing the internal difference of the transformation
Note the book
U is a generalized feature transform and is,
thus, transform internal difference minimization can be written as:
step five, graph Laplace regularization
If, for nearby samples x_i and x_j, the marginal probability distributions P_s(x_s) and P_t(x_t) of the data are sufficiently close, then the conditional probability distributions Q_s(y_s|x_s) and Q_t(y_t|x_t) are also sufficiently similar. Assuming the prediction varies sufficiently smoothly from x_i to x_j, the graph Laplacian regularization term can be expressed as:

tr(U^T X L X^T U)

where W is the graph adjacency matrix with elements W_ij, which are nonzero only when x_j ∈ N_p(x_i), the set of p-nearest neighbours of x_i; L is the normalized graph Laplacian matrix with elements L_ij, given by L = I − D^{−1/2} W D^{−1/2}, where D is the diagonal matrix with diagonal elements D_ii = Σ_j W_ij; and x_i, x_j are elements of the sample space.
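The adjacency matrix and normalized Laplacian above can be sketched as follows (the Gaussian edge weight is an assumed choice — the patent text does not state the weight function; samples are columns of X):

```python
import numpy as np

def normalized_graph_laplacian(X, p=5, sigma=1.0):
    """p-nearest-neighbour adjacency W (Gaussian weights, assumed) and
    normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns of X.
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:p + 1]          # p nearest (skip self)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                         # symmetrize the graph
    dinv = 1.0 / np.sqrt(W.sum(axis=1))            # D^{-1/2} diagonal
    L = np.eye(n) - (dinv[:, None] * W) * dinv[None, :]
    return W, L
```

A useful property for checking the construction: L annihilates the vector D^{1/2} 1.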
step six, formalization solving
In summary, the optimization objective of WDGA combines the components above into a generalized Rayleigh quotient, maximizing the quantities that should be large against those that should be small:

max_U (TDA + BCV) / (WDA + GLR + IDT + WCV)

where TDA, BCV, WDA, GLR, IDT and WCV denote, respectively, maximization of the target-domain data variance, preservation of the source-domain separability (between-class variance), weighted distribution alignment, graph Laplacian regularization, minimization of the transform internal difference, and minimization of the within-class variance;

substituting the component expressions derived in the five preceding steps into this formula yields the optimization expression as a trace-ratio problem in the transform U.
To obtain the solution U, set the first derivative with respect to U to zero; this reduces the objective to a generalized eigendecomposition problem, from which:

Λ = diag(λ_1, …, λ_k) contains the first k eigenvalues, U = [U_1, …, U_k] contains the eigenvectors belonging to the corresponding eigenvalues, and I is the identity matrix.
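The eigendecomposition step can be sketched with SciPy (a minimal sketch, assuming the objective reduces to a symmetric-definite generalized eigenproblem A u = λ B u, with A collecting the numerator terms and B the denominator terms; names are my own):

```python
import numpy as np
from scipy.linalg import eigh

def solve_projection(A, B, k):
    """Keep the k leading generalized eigenvectors of A u = lambda B u.

    A must be symmetric and B symmetric positive definite (regularize B,
    e.g. B + eps*I, if it is only semi-definite)."""
    lam, U = eigh(A, B)               # ascending generalized eigenvalues
    idx = np.argsort(lam)[::-1][:k]   # indices of the k largest
    return lam[idx], U[:, idx]        # Lambda = diag(lam), U = [U_1..U_k]
```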
3. The system is characterized by comprising an input module, a data calibration module and an output module, wherein the input module reads the data to be calibrated together with its labels and passes them to the data calibration module; the data calibration module calibrates and classifies the input data to be calibrated; and the output module outputs the classification results produced by the data calibration module.
CN201811547551.4A 2018-12-18 2018-12-18 Unsupervised cross-cutting self-adapting data scaling method and system based on weight distribution alignment and geometrical characteristic alignment Pending CN109635951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811547551.4A CN109635951A (en) 2018-12-18 2018-12-18 Unsupervised cross-cutting self-adapting data scaling method and system based on weight distribution alignment and geometrical characteristic alignment

Publications (1)

Publication Number Publication Date
CN109635951A true CN109635951A (en) 2019-04-16

Family

ID=66074962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811547551.4A Pending CN109635951A (en) 2018-12-18 2018-12-18 Unsupervised cross-cutting self-adapting data scaling method and system based on weight distribution alignment and geometrical characteristic alignment

Country Status (1)

Country Link
CN (1) CN109635951A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449811A (en) * 2021-07-16 2021-09-28 桂林电子科技大学 Low-illumination target detection method based on MS-WSDA
WO2022095356A1 (en) * 2020-11-05 2022-05-12 平安科技(深圳)有限公司 Transfer learning method for image classification, related device, and storage medium


Similar Documents

Publication Publication Date Title
Xia et al. Are anchor points really indispensable in label-noise learning?
Smola et al. On a kernel-based method for pattern recognition, regression, approximation, and operator inversion
CN107273927B (en) Unsupervised field adaptive classification method based on inter-class matching
CN105891422B (en) The electronic nose Gas Distinguishing Method that the limit learns drift compensation is migrated based on source domain
CN105335756B (en) A kind of image classification method and image classification system based on Robust Learning model
CN108875933B (en) Over-limit learning machine classification method and system for unsupervised sparse parameter learning
CN107451545B (en) The face identification method of Non-negative Matrix Factorization is differentiated based on multichannel under soft label
CN110717519B (en) Training, feature extraction and classification method, device and storage medium
CN108846357B (en) Face recognition method based on improved incremental non-negative matrix factorization
CN111461157A (en) Self-learning-based cross-modal Hash retrieval method
CN106066992B (en) The face identification method and system of differentiation dictionary learning based on adaptive local constraint
CN114580484B (en) Small sample communication signal automatic modulation identification method based on incremental learning
CN110598636B (en) Ship target identification method based on feature migration
Chen et al. Dictionary learning from ambiguously labeled data
CN109635951A (en) Unsupervised cross-cutting self-adapting data scaling method and system based on weight distribution alignment and geometrical characteristic alignment
Wang et al. Time-weighted kernel-sparse-representation-based real-time nonlinear multimode process monitoring
CN109063750B (en) SAR target classification method based on CNN and SVM decision fusion
Abrahamsen et al. Regularized pre-image estimation for kernel PCA de-noising: input space regularization and sparse reconstruction
Zhang et al. Minimal nonlinear distortion principle for nonlinear independent component analysis
Zhu et al. Multi-kernel low-rank dictionary pair learning for multiple features based image classification
CN108009570A (en) A kind of data classification method propagated based on the positive and negative label of core and system
CN115861708A (en) Low-rank sparse representation transfer learning method with adaptive graph diffusion
Derksen et al. Segmentation of multivariate mixed data via lossy coding and compression
CN115439710A (en) Remote sensing sample labeling method based on combined transfer learning
Singh et al. Classification in likelihood spaces

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190416