CN104063617A - Multiple linear regression method based on dimensionality reduction hyperplane - Google Patents
- Publication number: CN104063617A (application CN201410318782.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to the field of probability theory and mathematical statistics, and in particular to a multiple linear regression method based on a dimensionality-reduction hyperplane. The method comprises: setting variables x1, x2, …, xn, y that have a linear dependence relation; forming the matrix of observed values (X, Y); performing principal component analysis on (X, Y), or on its normalized matrix (X1, Y1); solving the equation of the hyperplane perpendicular to the (n+1)-th principal component and passing through the center of gravity of the data; rearranging the hyperplane equation into the form ŷ = β̂0 + β̂1x1 + … + β̂nxn; and thereby obtaining the estimated multiple linear regression equation based on the observation data (X, Y).
Description
Technical field
The present invention relates to the field of probability theory and mathematical statistics, and in particular to a multiple linear regression analysis method based on a dimensionality-reduction hyperplane.
Background art
Linear regression analysis is one of the most basic research methods in mathematical statistics, used to study correlation relationships between variables. In the socioeconomic field, even when the relation between variables is not linear at the macroscopic level, it can often be approximated by a linearization at the microscopic level. In addition, preprocessing such as taking logarithms of the variables can sometimes transform a nonlinear relation between variables into a linear one. Mainstream statistical-analysis and numerical-computation software is currently built on matrix operations. High-precision linear regression on variables therefore plays an important foundational role.
Depending on the numbers of independent and dependent variables, linear regression divides into several cases: a single dependent variable with one regressor, a single dependent variable with several regressors, several dependent variables, and so on. Multiple linear regression with a single dependent variable is one of the basic problems among these and can be summarized as follows:
Suppose the variables x1, x2, …, xn, y satisfy the linear relation y = β0 + β1x1 + … + βnxn + ε, where the βi (i = 0, 1, …, n) are constants and ε is a random error. Each variable is observed N times; the observations are X = (xij)N×n and Y = (y1, y2, …, yN)′, where xij denotes the i-th observation of the variable xj.
These data are equivalent to the scatter set S = {(xi1, …, xin, yi) | i ∈ {1, …, N}}. The estimated multiple linear regression equation based on these observations is ŷ = β̂0 + β̂1x1 + … + β̂nxn.
In matrix form the multiple linear regression equation is Y = (1, X)B + E, where B = (β0, …, βn)′ and E = (ε1, …, εN)′.
The most common solution to multiple linear regression with a single dependent variable is the least-squares method: y is treated as the dependent variable and x1, x2, …, xn as independent variables; the independent variables are not treated as random variables, only the dependent variable is. The maximum-likelihood estimate of the parameter matrix B is B̂ = ((1, X)′(1, X))⁻¹(1, X)′Y.
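As an illustrative sketch (not part of the patent), the least-squares estimate can be computed with NumPy; the data below are made up, and `np.linalg.lstsq` is used in place of the explicit normal-equations formula:

```python
import numpy as np

# Hypothetical observations: N = 6 samples of n = 2 regressors.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
Y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]         # exact linear relation, no noise

design = np.column_stack([np.ones(len(X)), X])   # the (1, X) design matrix
B_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(B_hat)                                     # recovers [1, 2, 3] up to rounding
```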
The result of least-squares linear regression is not coordinate-independent. Coordinate independence means that applying an orthogonal transformation (translation and/or rotation) to the coordinate system in which the computation is performed does not change the result.
Among socioeconomic variables, "pure" independent variables whose values carry no randomness are rare. Because of differences in viewpoint, observation instruments, data definitions, and aggregation methods, observation data for the same economic phenomenon can differ greatly in form, yet after some linear transformation, or even a simple coordinate transformation, the data often turn out to be clearly equivalent. For these reasons, it is natural to require that data groups related by such an equivalence yield identical regression results; it is therefore necessary to develop a linear regression method with coordinate invariance.
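The lack of coordinate independence is easy to see numerically. The following sketch (my own illustration with synthetic data, not from the patent) rotates a two-dimensional scatter and checks that the least-squares line of the rotated data is not the rotation of the original least-squares line, whereas the first principal direction — the kind of quantity the proposed method relies on — rotates exactly with the data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(0.0, 1.0, 50)           # noisy linear scatter
P = np.column_stack([x, y])

theta = np.pi / 6                                # rotate coordinates by 30 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q = P @ R.T                                      # the same points, rotated

def ols_slope(pts):
    # ordinary least-squares slope of pts[:, 1] on pts[:, 0]
    slope, _ = np.polyfit(pts[:, 0], pts[:, 1], 1)
    return slope

def first_pc(pts):
    # unit vector of the first principal component of the centered scatter,
    # sign-fixed so that comparisons below are well defined
    _, _, vt = np.linalg.svd(pts - pts.mean(axis=0))
    v = vt[0]
    return v if v[0] >= 0 else -v

# OLS is not coordinate-independent: fitting after rotation differs
# substantially from rotating the fitted line.
slope_after_rotation = ols_slope(Q)
rotated_fitted_slope = np.tan(np.arctan(ols_slope(P)) + theta)
print(abs(slope_after_rotation - rotated_fitted_slope) > 1.0)   # True

# The principal direction is rotation-equivariant: it transforms with the data.
pc_after_rotation = first_pc(Q)
rotated_pc = R @ first_pc(P)
rotated_pc = rotated_pc if rotated_pc[0] >= 0 else -rotated_pc
print(np.allclose(pc_after_rotation, rotated_pc))               # True
```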
Summary of the invention
The present invention provides a multiple linear regression analysis method based on a dimensionality-reduction hyperplane that makes the regression result coordinate-independent.
To achieve the above object, the technical solution adopted by the present invention is a multiple linear regression analysis method based on a dimensionality-reduction hyperplane, with the following steps:
(1) Suppose the variables x1, x2, …, xn, y satisfy the linear relation y = β0 + β1x1 + … + βnxn + ε, where the βi (i = 0, 1, …, n) are constants, ε is a random error, and n > 2. Each variable is observed N times; the observations are X = (xij)N×n and Y = (y1, y2, …, yN)′, where xij denotes the i-th observation of the variable xj. These data are equivalent to the scatter set S = {(xi1, …, xin, yi) | i ∈ {1, …, N}}, and the estimated multiple linear regression equation based on them is ŷ = β̂0 + β̂1x1 + … + β̂nxn.
(2) Perform principal component analysis on (X, Y). Let the principal components be, in order, G1, …, Gn+1, and let their unit direction vectors span the (n+1)-dimensional coordinate system (g1, …, gn+1); the matrix of (X, Y) in this system is G = (gij)(n+1)×N.
(3) In the coordinate system (g1, …, gn+1), compute the hyperplane perpendicular to Gn+1 that passes through the geometric center (ḡ1, …, ḡn+1) of the sample points; its equation is gn+1 = ḡn+1, where ḡi = (1/N) Σj gij is the mean of the i-th coordinate over the N sample points.
(4) In the equation gn+1 = ḡn+1, express gn+1 in terms of x1, x2, …, xn, y and rearrange it into the form ŷ = β̂0 + β̂1x1 + … + β̂nxn, which is the estimated multiple linear regression equation based on the observation data (X, Y).
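Steps (2)–(4) can be sketched numerically as follows. This is an assumed implementation for illustration (the function name and test data are mine, not the patent's): the last principal component of the joint data (X, Y) serves as the normal of the fitted hyperplane through the centroid, and the hyperplane equation is then solved for y:

```python
import numpy as np

def hyperplane_regression(X, Y):
    """Fit y ~ b0 + b1*x1 + ... + bn*xn via the dimensionality-reduction
    hyperplane: the last principal component of the joint data (X, Y) is
    taken as the normal of a hyperplane through the data centroid."""
    D = np.column_stack([X, Y])                   # N x (n+1) joint data
    center = D.mean(axis=0)                       # geometric center of the samples
    _, _, vt = np.linalg.svd(D - center)          # PCA via SVD of centered data
    normal = vt[-1]                               # unit vector of the (n+1)-th PC
    a, c = normal[:-1], normal[-1]
    if np.isclose(c, 0.0):                        # hyperplane parallel to y-axis
        raise ValueError("cannot solve the hyperplane equation for y")
    coefs = -a / c                                # b1, ..., bn
    intercept = center[-1] - center[:-1] @ coefs  # b0: plane passes through center
    return intercept, coefs

# Hypothetical data (n = 3 > 2, as the method requires).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
Y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.05, 200)
b0, b = hyperplane_regression(X, Y)
print(b0, b)    # close to 1.0 and [2.0, -1.0, 0.5]
```

Unlike least squares, this fit minimizes orthogonal distances to the hyperplane, which is what makes it invariant under rotations of the joint (x, y) coordinates.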
The present invention also provides a second multiple linear regression analysis method based on a dimensionality-reduction hyperplane, with the following steps:
(1) Suppose the variables x1, x2, …, xn, y satisfy the linear relation y = β0 + β1x1 + … + βnxn + ε, where the βi (i = 0, 1, …, n) are constants, ε is a random error, and n > 2. Each variable is observed N times; the observations are X = (xij)N×n and Y = (y1, y2, …, yN)′, where xij denotes the i-th observation of the variable xj. These data are equivalent to the scatter set S = {(xi1, …, xin, yi) | i ∈ {1, …, N}}, and the estimated multiple linear regression equation based on them is ŷ = β̂0 + β̂1x1 + … + β̂nxn.
(2) Normalize each column vector of Y and of X separately and merge the results into the matrix (X1, Y1); the variables corresponding to the columns of (X1, Y1) are x1¹, x2¹, …, xn¹, y¹. The normalization is: let Z be a column vector, mean(Z) the mean of its components, and std(Z) the standard deviation of its components; the normalized vector of Z is Z1 = (Z − mean(Z))/std(Z). If z and z1 are the variables corresponding to Z and Z1, their relation is z1 = (z − mean(Z))/std(Z).
(3) Perform principal component analysis on (X1, Y1). Let the principal components be, in order, F1, …, Fn+1, and let their unit direction vectors span the (n+1)-dimensional coordinate system (f1, …, fn+1); the matrix of (X1, Y1) in this system is F = (fij)(n+1)×N.
(4) In the coordinate system (f1, …, fn+1), compute the hyperplane perpendicular to Fn+1 that passes through the geometric center (f̄1, …, f̄n+1) of the sample points; its equation is fn+1 = f̄n+1, where f̄i = (1/N) Σj fij is the mean of the i-th coordinate over the N sample points.
(5) In the equation fn+1 = f̄n+1, express fn+1 in terms of x1¹, x2¹, …, xn¹, y¹ and rearrange it into the form ŷ¹ = β̂0¹ + β̂1¹x1¹ + … + β̂n¹xn¹.
(6) Express each variable in terms of x1, x2, …, xn, y (undoing the normalization of step (2)) and rearrange into the form ŷ = β̂0 + β̂1x1 + … + β̂nxn, which is the estimated multiple linear regression equation based on the observation data (X, Y).
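The normalized variant of steps (2)–(6) can be sketched in the same way (again an assumed illustration under my own naming, not the patent's reference code): each column is standardized, the hyperplane is fitted in the normalized coordinates, and the resulting equation is mapped back to the original variables:

```python
import numpy as np

def hyperplane_regression_normalized(X, Y):
    """Normalized variant: standardize each column of (X, Y), fit the
    dimensionality-reduction hyperplane in normalized coordinates, then
    map the equation back to the original variables."""
    D = np.column_stack([X, Y])
    mu, sigma = D.mean(axis=0), D.std(axis=0)
    D1 = (D - mu) / sigma                  # step (2): column-wise normalization
    _, _, vt = np.linalg.svd(D1)           # D1 already has zero column means
    normal = vt[-1]                        # last principal direction
    a, c = normal[:-1], normal[-1]
    coefs1 = -a / c                        # slopes in normalized coordinates
    # steps (5)-(6): map y1 = sum_j coefs1[j] * xj1 back to x1, ..., xn, y
    coefs = coefs1 * sigma[-1] / sigma[:-1]
    intercept = mu[-1] - mu[:-1] @ coefs
    return intercept, coefs

# Hypothetical data whose columns have wildly different scales.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) * np.array([1.0, 10.0, 0.1])
Y = 5.0 + X @ np.array([3.0, 0.2, -8.0]) + rng.normal(0.0, 0.01, 300)
b0, b = hyperplane_regression_normalized(X, Y)
print(b0, b)    # close to 5.0 and [3.0, 0.2, -8.0]
```

Normalizing first makes the fit insensitive to the units in which each variable is recorded, at the cost of no longer being invariant under rotations of the original coordinates.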
Beneficial effects achieved by the present invention: the regression result is coordinate-independent, and regression accuracy is improved.
Description of the drawings
Embodiment
The specific steps of the regression method of the present invention without the normalization step are as follows:
(1) Suppose the variables x1, x2, …, xn, y satisfy the linear relation y = β0 + β1x1 + … + βnxn + ε, where the βi (i = 0, 1, …, n) are constants, ε is a random error, and n > 2. Each variable is observed N times; the observations are X = (xij)N×n and Y = (y1, y2, …, yN)′, where xij denotes the i-th observation of the variable xj. These data are equivalent to the scatter set S = {(xi1, …, xin, yi) | i ∈ {1, …, N}}, and the estimated multiple linear regression equation based on them is ŷ = β̂0 + β̂1x1 + … + β̂nxn.
(2) Perform principal component analysis on (X, Y). Let the principal components be, in order, G1, …, Gn+1, and let their unit direction vectors span the (n+1)-dimensional coordinate system (g1, …, gn+1); the matrix of (X, Y) in this system is G = (gij)(n+1)×N.
(3) In the coordinate system (g1, …, gn+1), compute the hyperplane perpendicular to Gn+1 that passes through the geometric center (ḡ1, …, ḡn+1) of the sample points; its equation is gn+1 = ḡn+1, where ḡi = (1/N) Σj gij is the mean of the i-th coordinate over the N sample points.
(4) In the equation gn+1 = ḡn+1, express gn+1 in terms of x1, x2, …, xn, y and rearrange it into the form ŷ = β̂0 + β̂1x1 + … + β̂nxn, which is the estimated multiple linear regression equation based on the observation data (X, Y).
The specific steps of the regression method of the present invention with the normalization step are as follows:
(1) Suppose the variables x1, x2, …, xn, y satisfy the linear relation y = β0 + β1x1 + … + βnxn + ε, where the βi (i = 0, 1, …, n) are constants, ε is a random error, and n > 2. Each variable is observed N times; the observations are X = (xij)N×n and Y = (y1, y2, …, yN)′, where xij denotes the i-th observation of the variable xj. These data are equivalent to the scatter set S = {(xi1, …, xin, yi) | i ∈ {1, …, N}}, and the estimated multiple linear regression equation based on them is ŷ = β̂0 + β̂1x1 + … + β̂nxn.
(2) Normalize each column vector of Y and of X separately and merge the results into the matrix (X1, Y1); the variables corresponding to the columns of (X1, Y1) are x1¹, x2¹, …, xn¹, y¹. The normalization is: let Z be a column vector, mean(Z) the mean of its components, and std(Z) the standard deviation of its components; the normalized vector of Z is Z1 = (Z − mean(Z))/std(Z). If z and z1 are the variables corresponding to Z and Z1, their relation is z1 = (z − mean(Z))/std(Z).
(3) Perform principal component analysis on (X1, Y1). Let the principal components be, in order, F1, …, Fn+1, and let their unit direction vectors span the (n+1)-dimensional coordinate system (f1, …, fn+1); the matrix of (X1, Y1) in this system is F = (fij)(n+1)×N.
(4) In the coordinate system (f1, …, fn+1), compute the hyperplane perpendicular to Fn+1 that passes through the geometric center (f̄1, …, f̄n+1) of the sample points; its equation is fn+1 = f̄n+1, where f̄i = (1/N) Σj fij is the mean of the i-th coordinate over the N sample points.
(5) In the equation fn+1 = f̄n+1, express fn+1 in terms of x1¹, x2¹, …, xn¹, y¹ and rearrange it into the form ŷ¹ = β̂0¹ + β̂1¹x1¹ + … + β̂n¹xn¹.
(6) Express each variable in terms of x1, x2, …, xn, y (undoing the normalization of step (2)) and rearrange into the form ŷ = β̂0 + β̂1x1 + … + β̂nxn, which is the estimated multiple linear regression equation based on the observation data (X, Y).
Claims (2)
1. A multiple linear regression analysis method based on a dimensionality-reduction hyperplane, characterized in that the steps are as follows:
(1) Suppose the variables x1, x2, …, xn, y satisfy the linear relation y = β0 + β1x1 + … + βnxn + ε, where the βi (i = 0, 1, …, n) are constants, ε is a random error, and n > 2. Each variable is observed N times; the observations are X = (xij)N×n and Y = (y1, y2, …, yN)′, where xij denotes the i-th observation of the variable xj. These data are equivalent to the scatter set S = {(xi1, …, xin, yi) | i ∈ {1, …, N}}, and the estimated multiple linear regression equation based on them is ŷ = β̂0 + β̂1x1 + … + β̂nxn.
(2) Perform principal component analysis on (X, Y). Let the principal components be, in order, G1, …, Gn+1, and let their unit direction vectors span the (n+1)-dimensional coordinate system (g1, …, gn+1); the matrix of (X, Y) in this system is G = (gij)(n+1)×N.
(3) In the coordinate system (g1, …, gn+1), compute the hyperplane perpendicular to Gn+1 that passes through the geometric center (ḡ1, …, ḡn+1) of the sample points; its equation is gn+1 = ḡn+1, where ḡi = (1/N) Σj gij is the mean of the i-th coordinate over the N sample points.
(4) In the equation gn+1 = ḡn+1, express gn+1 in terms of x1, x2, …, xn, y and rearrange it into the form ŷ = β̂0 + β̂1x1 + … + β̂nxn, which is the estimated multiple linear regression equation based on the observation data (X, Y).
2. A multiple linear regression analysis method based on a dimensionality-reduction hyperplane, characterized in that the steps are as follows:
(1) Suppose the variables x1, x2, …, xn, y satisfy the linear relation y = β0 + β1x1 + … + βnxn + ε, where the βi (i = 0, 1, …, n) are constants, ε is a random error, and n > 2. Each variable is observed N times; the observations are X = (xij)N×n and Y = (y1, y2, …, yN)′, where xij denotes the i-th observation of the variable xj. These data are equivalent to the scatter set S = {(xi1, …, xin, yi) | i ∈ {1, …, N}}, and the estimated multiple linear regression equation based on them is ŷ = β̂0 + β̂1x1 + … + β̂nxn.
(2) Normalize each column vector of Y and of X separately and merge the results into the matrix (X1, Y1); the variables corresponding to the columns of (X1, Y1) are x1¹, x2¹, …, xn¹, y¹. The normalization is: let Z be a column vector, mean(Z) the mean of its components, and std(Z) the standard deviation of its components; the normalized vector of Z is Z1 = (Z − mean(Z))/std(Z). If z and z1 are the variables corresponding to Z and Z1, their relation is z1 = (z − mean(Z))/std(Z).
(3) Perform principal component analysis on (X1, Y1). Let the principal components be, in order, F1, …, Fn+1, and let their unit direction vectors span the (n+1)-dimensional coordinate system (f1, …, fn+1); the matrix of (X1, Y1) in this system is F = (fij)(n+1)×N.
(4) In the coordinate system (f1, …, fn+1), compute the hyperplane perpendicular to Fn+1 that passes through the geometric center (f̄1, …, f̄n+1) of the sample points; its equation is fn+1 = f̄n+1, where f̄i = (1/N) Σj fij is the mean of the i-th coordinate over the N sample points.
(5) In the equation fn+1 = f̄n+1, express fn+1 in terms of x1¹, x2¹, …, xn¹, y¹ and rearrange it into the form ŷ¹ = β̂0¹ + β̂1¹x1¹ + … + β̂n¹xn¹.
(6) Express each variable in terms of x1, x2, …, xn, y (undoing the normalization of step (2)) and rearrange into the form ŷ = β̂0 + β̂1x1 + … + β̂nxn, which is the estimated multiple linear regression equation based on the observation data (X, Y).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410318782.3A CN104063617A (en) | 2014-07-07 | 2014-07-07 | Multiple linear regression method based on dimensionality reduction hyperplane |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104063617A (en) | 2014-09-24 |
Family
ID=51551327
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410318782.3A Pending CN104063617A (en) | 2014-07-07 | 2014-07-07 | Multiple linear regression method based on dimensionality reduction hyperplane |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104063617A (en) |
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104462818A * | 2014-12-08 | 2015-03-25 | Tianjin University | Embedding manifold regression model based on Fisher criterion |
CN104462818B * | 2014-12-08 | 2017-10-10 | Tianjin University | Embedding manifold regression model based on the Fisher criterion |
CN113449656A * | 2021-07-01 | 2021-09-28 | Huaiyin Institute of Technology | Driver state identification method based on improved convolutional neural network |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C53 | Correction of patent for invention or patent application | |
| CB02 | Change of applicant information | Address after: School of Management, Zhejiang University of Media and Communications, 998 Xiasha, Hangzhou, Zhejiang, 310018; Applicant after: Xu Weiwei. Address before: Room 504, Building 8, Qinghe clear East Lane, Haidian District, Beijing, 100192; Applicant before: Xu Weiwei |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140924 |