A hyperspectral data dimensionality reduction method based on a factor analysis model
Technical field
The present invention relates to a hyperspectral data dimensionality reduction method. It belongs to the field of hyperspectral data processing methods and their applications, and is applicable to theoretical and applied research on hyperspectral data dimensionality reduction based on a factor analysis model.
Background art
A hyperspectral imager is a novel remote sensing payload. Its spectral sampling is dense and continuous, and it records the spectral and spatial characteristics of the observed ground objects simultaneously; as a result, hyperspectral data have high dimensionality and large volume. Because the spectral resolution is high, neighboring bands are strongly correlated and carry a large amount of redundant information, and the processing load rises exponentially as the number of bands grows. Reducing the dimensionality of hyperspectral data and eliminating the correlation between bands has therefore become a key link in data processing.
Many methods have been proposed for hyperspectral dimensionality reduction; they fall into two broad classes, band selection and feature extraction. Band selection replaces the n-dimensional hyperspectral data with a representative m-dimensional subset of bands (m < n); although such methods preserve the characteristics of the original image data well, they lose part of the information content and the spectral detail. Feature extraction compresses the n-dimensional data into an m-dimensional model space (m < n) so as to maximize the separability between classes; such methods emphasize the differences between image data during dimensionality reduction, but alter the primary characteristics of the data. The main open problem for dimensionality reduction methods is thus that they cannot both retain all the useful information the data provide and express the characteristics of same-class ground objects in the same bands.
Summary of the invention
The object of the present invention is to propose a hyperspectral data dimensionality reduction method based on a factor analysis model, addressing shortcomings of existing hyperspectral dimensionality reduction methods such as the loss of data information.
The technical solution of the present invention is a method that uses a factor analysis model to project high-dimensional hyperspectral data into a low-dimensional space that characterizes the intrinsic properties of the data. The method computes the factor loading matrix from the factor analysis model, rotates the loadings by the maximum variance (varimax) criterion, and computes factor scores to obtain the intrinsic dimensions that characterize the hyperspectral data, thereby achieving dimensionality reduction.
The steps of the hyperspectral data dimensionality reduction method based on a factor analysis model are as follows:
(1) read in the hyperspectral remote sensing data;
(2) establish the factor analysis model for hyperspectral dimensionality reduction;
(3) compute the mean, covariance matrix and correlation matrix of the hyperspectral data;
(4) compute the eigenvalues and orthonormalized eigenvectors of the correlation matrix;
(5) solve for the factor loading matrix by the principal component method;
(6) compute the specific-factor variance matrix and the communalities of the data variables in the factor analysis model;
(7) compute the varimax factor loading rotation matrix;
(8) compute the factor scores by the weighted least-squares method;
(9) obtain the intrinsic dimensions characterizing the hyperspectral data, achieving the dimensionality reduction.
Wherein, " the setting up the Factor Analysis Model of high-spectral data dimensionality reduction " described in the step (2),
Its Factor Analysis Model is:
X=μ+AF+ε
In the formula, X=(x
1, x
2..., x
pThe observable stochastic variable of) ' be, average μ=E (X), the common factor F=(F of X
1, F
2..., F
m) ' (m<p) is a unobservable random variable, and satisfies average E (F)=0, variance D (F)=I
m(being that each component variance of F is 1 and uncorrelated mutually), specific factor ε=(ε
1, ε
2..., ε
p) ' uncorrelated mutually with F, and satisfy average E (ε)=0, variance
Common factor generally all works to each component of X, and specific factor only works to a certain component of X, and between the specific factor and uncorrelated mutually between specific factor and the common factor; Factor loading matrix A=(a
Ij) p * mBe matrix of coefficients to be estimated, in factorial analysis, specific factor plays residual error or noise.
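The model above can be illustrated numerically. The following is a minimal simulation sketch (assuming NumPy; the sizes p, m, n, the seed and all values are hypothetical choices of this sketch) showing that the model implies Cov(X) = AA′ + D:

```python
import numpy as np

# Hypothetical illustration of the factor analysis model X = mu + A F + eps.
rng = np.random.default_rng(0)
p, m, n = 8, 2, 20000                    # p observed bands, m common factors

A = rng.normal(size=(p, m))              # factor loading matrix, p x m
mu = rng.normal(size=p)                  # mean vector
D = np.diag(rng.uniform(0.1, 0.5, p))    # specific-factor variances (diagonal)

F = rng.normal(size=(m, n))              # common factors: E(F)=0, D(F)=I_m
eps = rng.normal(size=(p, n)) * np.sqrt(np.diag(D))[:, None]  # specific factors
X = mu[:, None] + A @ F + eps            # observed p-dimensional data, n samples

cov_model = A @ A.T + D                  # the covariance the model implies
cov_emp = np.cov(X)                      # empirical covariance of the samples
```

With a large enough sample, `cov_emp` matches `cov_model` closely, which is the structural assumption the remaining steps exploit.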
In step (3), the mean, covariance matrix and correlation matrix of the hyperspectral data are computed as

X̄ = (1/N) Σ_{t=1}^{N} X^{(t)},
V = (1/N) Σ_{t=1}^{N} (X^{(t)} − X̄)(X^{(t)} − X̄)′,
R = (r_ij)_{p×p}, r_ij = v_ij / sqrt(v_ii v_jj),

where X̄ is the data mean, V = (v_ij) is the covariance matrix, R is the correlation matrix, N denotes the number of pixels, and X^{(t)} denotes the t-th pixel.
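A minimal sketch of step (3) (assuming NumPy; `cube` and its shape are hypothetical stand-ins for real data):

```python
import numpy as np

# Per-band mean, covariance matrix V and correlation matrix R over all N pixels.
rng = np.random.default_rng(1)
cube = rng.random((10, 15, 4))            # stand-in hyperspectral cube (rows, cols, bands)

rows, cols, bands = cube.shape
X = cube.reshape(-1, bands)               # N x p matrix, one row per pixel
N = X.shape[0]

mean = X.mean(axis=0)                     # band means
Xc = X - mean
V = (Xc.T @ Xc) / N                       # covariance matrix, p x p
s = np.sqrt(np.diag(V))
R = V / np.outer(s, s)                    # correlation matrix, unit diagonal
```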
Wherein, " eigenwert of computational data correlation matrix and the standardized feature vector " described in the step (4), its implication is explained as follows: the eigenwert of calculating correlation matrix R is λ
1>=λ
2>=...>=λ
p>=0, and the proper vector l of the Schimidt orthogonalization of each eigenwert correspondence
1, l
2..., l
p
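A minimal sketch of step (4) (assuming NumPy; the matrix R here is built from random data purely for illustration):

```python
import numpy as np

# Eigenvalues of R in descending order with orthonormal eigenvectors.
rng = np.random.default_rng(2)
B = rng.normal(size=(200, 5))
V = np.cov(B, rowvar=False)
s = np.sqrt(np.diag(V))
R = V / np.outer(s, s)                    # a synthetic 5 x 5 correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)      # eigh: symmetric input, ascending order
order = np.argsort(eigvals)[::-1]         # reorder so lam1 >= lam2 >= ... >= lamp
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]               # columns are orthonormal eigenvectors
```

`numpy.linalg.eigh` already returns orthonormal eigenvectors for a symmetric matrix, so no separate Schmidt orthogonalization step is needed here.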
Wherein, " the carrying out finding the solution of factor loading matrix " described in the step (5) by the major component method of separating, its computing method are following:
Basis at first
Smallest positive integral, confirm common factor number m;
The method of utilizing major component to separate is then found the solution the factor loading matrix: the eigenwert of utilizing the correlation matrix R that calculates in the step (4) is λ
1>=λ
2>=...>=λ
p>=0, the proper vector behind the corresponding Schimidt orthogonalization is l
1, l
2..., l
p, then utilize the spectral factorization of correlation matrix R:
Separate for one that obtains Factor Analysis Model:
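A minimal sketch of step (5) (assuming NumPy; the 85% threshold and the synthetic correlation matrix are choices of this sketch, not values fixed by the method):

```python
import numpy as np

# Pick the smallest m whose cumulative eigenvalue share reaches a threshold,
# then form the principal component solution A = (sqrt(lam1) l1, ..., sqrt(lamm) lm).
rng = np.random.default_rng(3)
L = rng.normal(size=(6, 2))
R0 = L @ L.T + np.diag(rng.uniform(0.1, 0.3, 6))
s = np.sqrt(np.diag(R0))
R = R0 / np.outer(s, s)                   # a synthetic 6 x 6 correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

threshold = 0.85                          # preset cumulative-contribution threshold
cum = np.cumsum(eigvals) / eigvals.sum()
m = int(np.searchsorted(cum, threshold) + 1)   # smallest m with cum[m-1] >= threshold

A = eigvecs[:, :m] * np.sqrt(eigvals[:m])      # factor loading matrix, p x m
```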
Wherein, " the common degree of the variance matrix of specific factor and data variable in the calculated factor analytical model " described in the step (6), its implication is explained as follows: the variance of specific factor is in the Factor Analysis Model:
Variable x
iCommon degree h
i 2Be estimated as
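A minimal sketch of step (6) (assuming NumPy; the 3 × 2 loading matrix is hypothetical):

```python
import numpy as np

# Communality h_i^2 = sum_j a_ij^2 (row sums of squared loadings) and
# specific variance sigma_i^2 = 1 - h_i^2 for standardized variables.
A = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7]])               # hypothetical 3 x 2 loading matrix

h2 = (A ** 2).sum(axis=1)                # communalities, one per variable
sigma2 = 1.0 - h2                        # specific-factor variances
D = np.diag(sigma2)                      # diagonal matrix D in X = mu + AF + eps
```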
Wherein, " calculating " described in the step (7) based on the maximum factor loading rotation matrix of variance; Its implication is explained as follows: common factor is done the quadrature rotation exactly loading matrix A is done orthogonal transformation; Orthogonal matrix Γ is taken advantage of on the right side, makes A Γ have distincter meaning, and postrotational common factor vector is F
*=Γ ' F, its each component F
1 *, F
2 *..., F
m *It also is mutual incoherent common factor; Difference according to orthogonal matrix Γ is chosen mode, and with constructing different quadrature spinning solutions, the method that adopts among the present invention is the spinning solution of maximum variance; Concrete implementation procedure is following:
Order
A then
*The relative variance of j column element square can be defined as:
In the formula, get d
Ij 2Be in order to eliminate d
IjThe influence that symbol is different; Use a
Ij *Divided by h
iBe in order to eliminate each original variable X
iThe influence different to the common factor degree of dependence; The maximum variance spinning solution is selected orthogonal matrix Γ exactly, makes A
*It is maximum that the relative variance V of all m column elements square reaches: V=V
1+ V
2+ ... + V
mOne by one per two common factors are carried out above-mentioned rotation, to common factor F
lAnd F
kRotation is carried out orthogonal transformation to the l of A matrix row and k row exactly, make the relative variance sum maximum of the element square of these two row, and all the other row is constant, rotation matrix Γ
LkCan be expressed as:
In the formula, θ is F
lAnd F
kTrip's gyration, all the other elements of matrix are 0; Ask θ to make V reach maximum, by the method for asking extreme value in the infinitesimal analysis, to the θ differentiate, and to make it be 0 with V:
So, by
:
In the formula:
a
I1, a
I2Be the element in the factor loading matrix, h
i 2Be variable x
iCommon degree.
The first round, postrotational factor loading matrix was A
(1), begin the rotation of next round then, obtain a series of factor loading matrix and be: A
(1), A
(2)..., A
(s)..., then must have: V
(1)≤V
(2)≤...≤V
(s)≤..., V
(s)Be A
(s)The relative variance sum of each column element square in the practical application, is worked as V
(s-1)-V
(s)During≤ε, can stop the rotation.
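The rotation described above can be sketched as follows. Note that this sketch uses the common SVD-based formulation of varimax rather than the pairwise Givens rotations described above, and omits the division by h_i (Kaiser normalization) for brevity; it is an illustration of the criterion, not of the exact iteration of the method:

```python
import numpy as np

def varimax(A, tol=1e-6, max_iter=100):
    """Varimax rotation of a loading matrix A (p x m); returns (A @ Gamma, Gamma).

    SVD-based iteration: each step increases the summed relative variance
    of the squared column elements of the rotated loadings.
    """
    p, m = A.shape
    Gamma = np.eye(m)
    var_old = 0.0
    for _ in range(max_iter):
        B = A @ Gamma
        U, s, Vt = np.linalg.svd(
            A.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p))
        Gamma = U @ Vt                    # orthogonal by construction
        var_new = s.sum()
        if var_new - var_old < tol:       # criterion increase below tolerance
            break
        var_old = var_new
    return A @ Gamma, Gamma

# Hypothetical 8 x 3 loading matrix, purely for illustration.
rng = np.random.default_rng(4)
A0 = rng.normal(size=(8, 3))
A_rot, Gamma = varimax(A0)
```

Because Γ is orthogonal, the rotation preserves the row sums of squares of the loadings, i.e. the communalities are unchanged.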
Wherein, " utilization " described in the step (8) based on weighted least require method calculated factor score, its implication is explained as follows: on the basis that obtains A and D, adopt weighted least-squares that F is found the solution:
Promptly obtained the weighted least-squares estimation of F, in the formula,
Be the factor loading matrix that estimation obtains,
Be the specific factor variance matrix that estimation obtains, X reads in data.
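A minimal sketch of step (8) (assuming NumPy; A, D, μ and the data are hypothetical, generated from the model itself so that the recovered scores can be checked):

```python
import numpy as np

# Weighted least-squares (Bartlett) factor scores:
# F_hat = (A' D^-1 A)^-1 A' D^-1 (X - mu)
rng = np.random.default_rng(5)
p, m, n = 6, 2, 500
A = rng.normal(size=(p, m))               # hypothetical loading matrix
sigma2 = rng.uniform(0.1, 0.4, p)         # hypothetical specific variances
Dinv = np.diag(1.0 / sigma2)

F_true = rng.normal(size=(m, n))          # factors used to generate the data
mu = rng.normal(size=p)
X = mu[:, None] + A @ F_true + rng.normal(size=(p, n)) * np.sqrt(sigma2)[:, None]

W = np.linalg.solve(A.T @ Dinv @ A, A.T @ Dinv)   # (A'D^-1A)^-1 A'D^-1
F_hat = W @ (X - mu[:, None])                     # m x n factor scores
```

Since W A = I, the estimator is unbiased for F: the scores equal the true factors plus a small weighted residual term.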
Wherein, " obtaining characterizing the intrinsic dimension of high-spectral data; realize the high-spectral data dimensionality reduction " described in the step (9), its implication is explained as follows: utilize factor score to estimate to obtain characterizing the intrinsic dimension of high-spectral data, thereby realized the high-spectral data dimensionality reduction.
Compared with the prior art, the present invention overcomes limitations of existing hyperspectral dimensionality reduction methods such as loss of data information and failure to retain the primary characteristics of the data. The method uses a factor analysis model together with a varimax orthogonal rotation of the factor matrix to obtain the intrinsic dimensions characterizing the hyperspectral data, achieving dimensionality reduction. It has the following advantages: (1) modeling the hyperspectral data with a factor analysis model yields intrinsic dimensions that reflect the structure and characteristics of the data, simplifying the data structure and eliminating the correlation between bands while preserving the primary characteristics of the data well; (2) the varimax rotation of the factor loadings makes the meaning represented by each factor easy to interpret, and each kind of ground object dominates only one factor, which maximizes the separability between different kinds of ground objects.
Description of drawings
Fig. 1 is the flow chart of the hyperspectral data dimensionality reduction method based on a factor analysis model according to the present invention.
Embodiment
To better illustrate the hyperspectral data dimensionality reduction method based on a factor analysis model, PHI airborne hyperspectral data are used for fine crop classification in the Fanglu tea plantation area, Jiangsu. The flow of the method is shown in Fig. 1, and the concrete steps are as follows:
(1) Read in the hyperspectral data: the Fanglu tea plantation PHI hyperspectral data are read in, and bands with low signal-to-noise ratio or affected by atmospheric absorption are removed, leaving data of size 210 × 150 × 64;
(2) Establish the factor analysis model of hyperspectral dimensionality reduction:

X = μ + AF + ε

where X = (x_1, x_2, ..., x_64)′ is the observable random vector with mean μ = E(X); the common factors F = (F_1, F_2, ..., F_m)′ (m < 64) form an unobservable random vector satisfying E(F) = 0 and D(F) = I_m (each component of F has unit variance and the components are mutually uncorrelated); the specific factors ε = (ε_1, ε_2, ..., ε_64)′ are uncorrelated with F and satisfy E(ε) = 0 and D(ε) = D = diag(σ_1², ..., σ_64²). The common factors generally act on every component of X, while each specific factor acts only on one component of X; the specific factors are mutually uncorrelated and uncorrelated with the common factors. The matrix A = (a_ij)_{p×m} of coefficients to be estimated is called the factor loading matrix; in factor analysis the specific factors play the role of residuals or noise;
(3) Compute the mean, covariance matrix and correlation matrix of the hyperspectral data. For the 210 × 150 × 64 data, the mean X̄, covariance matrix V and correlation matrix R are computed as

X̄ = (1/N) Σ_{t=1}^{N} X^{(t)},
V = (1/N) Σ_{t=1}^{N} (X^{(t)} − X̄)(X^{(t)} − X̄)′,
R = (r_ij), r_ij = v_ij / sqrt(v_ii v_jj),

where X̄ is the data mean, V = (v_ij) is the covariance matrix, R is the correlation matrix, N = 210 × 150 is the number of pixels, and X^{(t)} denotes the t-th pixel.
(4) Compute the eigenvalues and orthonormalized eigenvectors of the data correlation matrix: the eigenvalues of R are computed as λ_1 ≥ λ_2 ≥ ... ≥ λ_64 ≥ 0, with corresponding Schmidt-orthonormalized eigenvectors l_1, l_2, ..., l_64.
(5) Solve for the factor loading matrix by the principal component method.
First, the number of common factors is determined as the smallest positive integer for which the cumulative variance contribution (Σ_{i=1}^{m} λ_i) / (Σ_{i=1}^{64} λ_i) reaches the preset threshold, giving m = 6.
Then the principal component method is used to solve for the factor loading matrix: with the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_64 ≥ 0 of the correlation matrix R computed in step (4) and the corresponding Schmidt-orthonormalized eigenvectors l_1, l_2, ..., l_64, the spectral decomposition of R,

R = λ_1 l_1 l_1′ + λ_2 l_2 l_2′ + ... + λ_64 l_64 l_64′,

yields one solution of the factor analysis model:

A = (√λ_1 l_1, √λ_2 l_2, ..., √λ_6 l_6);
(6) Compute the specific-factor variance matrix and the communalities of the data variables: the specific-factor variances in the factor analysis model are

σ̂_i² = 1 − Σ_{j=1}^{6} a_ij², i = 1, 2, ..., 64,

and the communality h_i² of variable x_i is estimated as h_i² = Σ_{j=1}^{6} a_ij²;
(7) Compute the varimax factor loading rotation matrix. An orthogonal rotation of the common factors amounts to an orthogonal transformation of the loading matrix A: right-multiplying by an orthogonal matrix Γ gives AΓ a more interpretable meaning, and the rotated common factor vector is F* = Γ′F, whose components F_1*, F_2*, ..., F_m* are again mutually uncorrelated common factors. Different choices of the orthogonal matrix Γ yield different orthogonal rotation methods; the method adopted in the present invention is the maximum variance (varimax) rotation, implemented as follows.

Let

A* = AΓ = (a_ij*), d_ij = a_ij* / h_i, d̄_j = (1/p) Σ_{i=1}^{p} d_ij².

Then the relative variance of the squared elements of the j-th column of A* is defined as

V_j = (1/p) Σ_{i=1}^{p} (d_ij² − d̄_j)².

Here d_ij is squared to remove the influence of its sign, and a_ij* is divided by h_i to remove the influence of the differing degree of dependence of each original variable x_i on the common factors; dividing by h_i is natural because the i-th row sum of squares of A* equals h_i² (an orthogonal rotation leaves the communalities unchanged): Σ_{j=1}^{6} (a_ij*)² = h_i². The varimax rotation selects the orthogonal matrix Γ so that the sum of the relative variances of the squared elements of all 6 columns of A* is maximized: V = V_1 + V_2 + ... + V_6.

The rotation is carried out on pairs of common factors in turn. Rotating common factors F_l and F_k means applying an orthogonal transformation to columns l and k of A so that the sum of the relative variances of the squared elements of these two columns is maximized while the remaining columns are unchanged. The rotation matrix Γ_lk equals the identity matrix except that its (l, l) and (k, k) entries are cos θ, its (l, k) entry is −sin θ and its (k, l) entry is sin θ, where θ is the rotation angle between F_l and F_k; all other off-diagonal elements are 0. The angle θ that maximizes V is found by the usual extremum method of calculus: V is differentiated with respect to θ and the derivative set to 0, which leads to

tan 4θ = (d − 2ab/p) / (c − (a² − b²)/p),

where, with u_i = (a_il/h_i)² − (a_ik/h_i)² and v_i = 2(a_il/h_i)(a_ik/h_i),

a = Σ_{i=1}^{p} u_i, b = Σ_{i=1}^{p} v_i, c = Σ_{i=1}^{p} (u_i² − v_i²), d = 2 Σ_{i=1}^{p} u_i v_i

(scalars, not to be confused with the matrices A and D); a_il and a_ik are elements of the factor loading matrix and h_i² is the communality of variable x_i.

The loading matrix after the first full round of pairwise rotations is A^(1); further rounds yield a sequence of loading matrices A^(1), A^(2), ..., A^(s), ..., whose relative variance sums necessarily satisfy V^(1) ≤ V^(2) ≤ ... ≤ V^(s) ≤ ..., where V^(s) is the sum of the relative variances of the squared column elements of A^(s). In practice, the rotation is stopped once V^(s) − V^(s−1) ≤ ε (in the present invention, ε ≤ 0.001);
(8) Compute the factor scores by the weighted least-squares method. On the basis of the factor loading matrix A and the specific-factor variance matrix D obtained above, F is solved by weighted least squares:

F̂ = (Â′ D̂⁻¹ Â)⁻¹ Â′ D̂⁻¹ (X − μ̂),

which is the weighted least-squares estimate of F;
(9) Obtain the intrinsic dimensions characterizing the hyperspectral data, achieving the dimensionality reduction: the factor score estimates give the intrinsic dimensions of the data, yielding 6 intrinsic bands that characterize all the information of the hyperspectral data used.
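The steps above can be tied together in a compact end-to-end sketch (assuming NumPy; the cube is random and much smaller than the 210 × 150 × 64 PHI scene, so only the shape of the pipeline is illustrated, not a classification result; the rotation step is omitted for brevity, where the full method would apply varimax to A before scoring):

```python
import numpy as np

# End-to-end sketch of steps (1)-(9) on a synthetic stand-in cube.
rng = np.random.default_rng(6)
rows, cols, p, m = 20, 15, 16, 6

cube = rng.random((rows, cols, p))        # step (1): read-in stand-in
X = cube.reshape(-1, p)                   # N x p pixel matrix
mu = X.mean(axis=0)                       # step (3): mean
Xc = X - mu
V = (Xc.T @ Xc) / X.shape[0]              # covariance matrix
s = np.sqrt(np.diag(V))
R = V / np.outer(s, s)                    # correlation matrix

lam, l = np.linalg.eigh(R)                # step (4): eigen-decomposition
order = np.argsort(lam)[::-1]
lam, l = lam[order], l[:, order]

A = l[:, :m] * np.sqrt(lam[:m])           # step (5): principal component solution
h2 = (A ** 2).sum(axis=1)                 # step (6): communalities
Dinv = np.diag(1.0 / np.clip(1.0 - h2, 1e-6, None))  # specific variances, guarded

# step (7) omitted here: varimax would rotate A before scoring.
Z = Xc / s                                # standardized data, matching R
F = (np.linalg.solve(A.T @ Dinv @ A, A.T @ Dinv) @ Z.T).T   # step (8): WLS scores
reduced = F.reshape(rows, cols, m)        # step (9): m intrinsic bands
```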