CN104063617A - Multiple linear regression method based on dimensionality reduction hyperplane - Google Patents

Info

Publication number: CN104063617A
Application number: CN201410318782.3A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 许蔚蔚
Original assignee / current assignee: XU YUYU (the listed assignee is an assumption, not a legal conclusion)
Legal status: Pending
Prior art keywords: beta, variable, linear regression, multiple linear, overbar
Classification: Management, Administration, Business Operations System, And Electronic Commerce

Abstract

The invention relates to the field of probability theory and mathematical statistics, and in particular to a multiple linear regression method based on a dimensionality-reduction hyperplane. The method comprises: setting variables $x_1, x_2, \ldots, x_n, y$ that have a linear dependence relation; forming the matrix $(X, Y)$ of their observed values; performing principal component analysis on $(X, Y)$, or on its normalized matrix $(X_1, Y_1)$; solving for the hyperplane that is perpendicular to the $(n+1)$-th principal component and passes through the center of gravity of the data; and rearranging the hyperplane equation into the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$, which is the estimated multiple linear regression equation based on the observation data $(X, Y)$.

Description

A multiple linear regression method based on a dimensionality-reduction hyperplane
Technical field
The present invention relates to the field of probability theory and mathematical statistics, and in particular to a multiple linear regression method based on a dimensionality-reduction hyperplane.
Background technology
Linear regression analysis is one of the most fundamental research methods in mathematical statistics, used to study the correlation between variables. In the socio-economic field, even when the relation between variables is not linear at the macroscopic level, it can often still be approximated by linearization at the microscopic level. In addition, preprocessing such as taking logarithms of the variables can sometimes transform a nonlinear relation between variables into a linear one. The mainstream statistical analysis and numerical computation software of today is built on matrix operations. High-precision linear regression of variables therefore plays an important foundational role.
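As a concrete illustration of the log-transform preprocessing mentioned above (the data, the power-law parameters, and all names below are invented for the example), a power-law relation $y = a x^b$ becomes linear after taking logarithms, $\ln y = \ln a + b \ln x$, and can then be fitted by ordinary linear regression:

```python
import numpy as np

# Hypothetical power-law data y = a * x^b with multiplicative noise.
# Taking logarithms turns the nonlinear relation into the linear one
# ln y = ln a + b ln x, which a straight-line fit can handle.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = 2.5 * x**1.7 * np.exp(rng.normal(0.0, 0.05, x.size))

# Ordinary least squares on the log-transformed variables;
# np.polyfit returns (slope, intercept) for degree 1.
b, log_a = np.polyfit(np.log(x), np.log(y), 1)
a = np.exp(log_a)
print(a, b)  # close to the true parameters 2.5 and 1.7
```

The recovered $a$ and $b$ are then parameters of the original nonlinear model, obtained entirely with linear machinery.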
According to the numbers of independent and dependent variables, linear regression can be divided into several cases: simple (one independent variable, one dependent variable), multiple (several independent variables, one dependent variable), and multivariate (several dependent variables). Multiple linear regression with a single dependent variable is one of the basic problems among these, and is summarized as follows:
Let the variables $x_1, x_2, \ldots, x_n, y$ satisfy the linear relation $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$, where the $\beta_i$ ($i = 0, 1, \ldots, n$) are constants and $\varepsilon$ is a random error. Each variable is observed $N$ times; the observed values are $Y = (y_1, y_2, \ldots, y_N)'$ and $X = (x_{ij})_{N \times n}$, where $x_{ij}$ denotes the $i$-th observation of the variable $x_j$.
These data are equivalent to the scatter set $S = \{(x_{i1}, \ldots, x_{in}, y_i) \mid i \in (1, \ldots, N)\}$. The estimated multiple linear regression equation based on these observation data is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$.
The matrix form of the multiple linear regression equation is $Y = (1, X)B + E$, where $B = (\beta_0, \ldots, \beta_n)'$ and $E = (\varepsilon_1, \ldots, \varepsilon_N)'$.
The most common solution of multiple linear regression is the least-squares method: $y$ is treated as the dependent variable and $x_1, x_2, \ldots, x_n$ as independent variables; the independent variables are not regarded as random variables, only the dependent variable is. The maximum likelihood estimate of the parameter matrix $B$ is $\hat{B} = (\hat{\beta}_0, \ldots, \hat{\beta}_n)' = ((1, X)'(1, X))^{-1}(1, X)'Y$.
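A minimal numerical sketch of this least-squares estimate (the synthetic data and the chosen coefficient values are only for illustration):

```python
import numpy as np

# Least-squares estimate B_hat = ((1,X)'(1,X))^{-1} (1,X)' Y as in the text.
# X is N x n; prepending a column of ones gives the design matrix (1, X).
rng = np.random.default_rng(1)
N, n = 200, 3
X = rng.normal(size=(N, n))
beta_true = np.array([1.0, 2.0, -0.5, 3.0])        # beta_0 .. beta_n
Y = beta_true[0] + X @ beta_true[1:] + rng.normal(0.0, 0.1, N)

D = np.hstack([np.ones((N, 1)), X])                # the (1, X) matrix
B_hat = np.linalg.solve(D.T @ D, D.T @ Y)          # solve the normal equations
print(B_hat)  # approximately [1.0, 2.0, -0.5, 3.0]
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to forming the explicit inverse, but is algebraically the same formula.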
The result of least-squares linear regression is not coordinate-independent. Coordinate independence means that applying an orthogonal transformation (translation and/or rotation) to the coordinate system in which the computation is carried out does not affect the result of the computation.
Among socio-economic variables, "pure" independent variables whose values carry no randomness are rare. Owing to differences in viewing angle, observation instruments, data definitions, and aggregation methods, observation data of the same economic phenomenon may differ greatly in form, yet after a certain linear transformation, or even a simple coordinate transformation, the data often show an obvious equivalence. For these reasons, it is natural to require that data groups related by such an equivalence yield identical regression results; it is therefore necessary to develop a linear regression method with coordinate invariance.
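The rotational behaviour at issue can be checked numerically. The sketch below uses invented two-dimensional data and shows that the normal direction of the total-least-squares line, i.e. the smallest principal component of the centred data, which is the construction this patent builds on, rotates exactly with the data:

```python
import numpy as np

# Illustrative check: the unit normal of the best orthogonal-fit line is the
# eigenvector of the data covariance with the smallest eigenvalue. Rotating
# the data by R rotates that normal by exactly R (up to sign), so the fitted
# line is coordinate-equivariant. Data and the rotation angle are made up.
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(0.0, 0.5, 100)
P = np.column_stack([x, y])

def tls_normal(P):
    # np.linalg.eigh returns eigenvalues in ascending order, so column 0
    # of V is the smallest-variance direction.
    C = np.cov((P - P.mean(0)).T)
    w, V = np.linalg.eigh(C)
    return V[:, 0]

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

n0 = tls_normal(P)
n1 = tls_normal(P @ R.T)    # refit in the rotated coordinate system
# Equivariance: n1 equals R @ n0 up to sign, so |<n1, R n0>| = 1.
print(np.allclose(np.abs(n1 @ (R @ n0)), 1.0))  # prints True
```

The OLS slope, by contrast, does not transform this way under rotation, because OLS minimizes only vertical distances, which are not preserved by rotating the axes.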
Summary of the invention
The invention provides a multiple linear regression method based on a dimensionality-reduction hyperplane, which makes the regression result coordinate-independent.
To achieve the above object, the technical solution adopted by the invention is a multiple linear regression method based on a dimensionality-reduction hyperplane, whose steps are as follows:
(1) Let the variables $x_1, x_2, \ldots, x_n, y$ satisfy the linear relation $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$, where the $\beta_i$ ($i = 0, 1, \ldots, n$) are constants, $\varepsilon$ is a random error, and $n > 2$. Observe each variable $N$ times; the observed values are $Y = (y_1, y_2, \ldots, y_N)'$ and $X = (x_{ij})_{N \times n}$, where $x_{ij}$ denotes the $i$-th observation of the variable $x_j$. These data are equivalent to the scatter set $S = \{(x_{i1}, \ldots, x_{in}, y_i) \mid i \in (1, \ldots, N)\}$, and the estimated multiple linear regression equation based on these observation data is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$;
(2) Perform principal component analysis on $(X, Y)$; let the principal components, in order, be $G_1, \ldots, G_{n+1}$, with corresponding unit vectors $g_1, \ldots, g_{n+1}$. The coordinate matrix of $(X, Y)$ in the $(n+1)$-dimensional space $(g_1, \ldots, g_{n+1})$ is $G = (g_{ij})_{N \times (n+1)}$;
(3) In the $(n+1)$-dimensional space $(g_1, \ldots, g_{n+1})$, compute the hyperplane that is perpendicular to $G_{n+1}$ and passes through the geometric center of the sample points; its equation is $g_{n+1} - \overline{g_{n+1}} = 0$, where $\overline{g_j} = \frac{1}{N} \sum_{i=1}^{N} g_{ij}$;
(4) In the equation $g_{n+1} - \overline{g_{n+1}} = 0$, express $g_{n+1}$ in terms of $x_1, x_2, \ldots, x_n, y$ and rearrange into the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$; this is the estimated multiple linear regression equation based on the observation data $(X, Y)$.
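The four steps above can be sketched in code. This is an illustrative implementation under the patent's description (the function name and the synthetic data are mine): PCA on the joint matrix $(X, Y)$, the hyperplane through the centroid perpendicular to the last principal component, and the rearrangement into regression-equation form. The construction coincides with classical orthogonal (total-least-squares) regression.

```python
import numpy as np

def hyperplane_regression(X, Y):
    Z = np.column_stack([X, Y])            # N x (n+1) joint data matrix (X, Y)
    c = Z.mean(axis=0)                     # geometric center of the sample points
    # Right singular vectors of the centred data are the principal directions;
    # the last row of Vt spans the smallest-variance, (n+1)-th component.
    _, _, Vt = np.linalg.svd(Z - c, full_matrices=False)
    u = Vt[-1]                             # unit normal of the fitted hyperplane
    # Hyperplane u . (z - c) = 0, solved for y:
    #   y = c_y - (1/u_y) * sum_j u_j (x_j - c_j)
    beta = -u[:-1] / u[-1]                 # beta_1 .. beta_n
    beta0 = c[-1] - beta @ c[:-1]          # beta_0
    return beta0, beta

# Synthetic check with known coefficients.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
Y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 0.05, 500)
beta0, beta = hyperplane_regression(X, Y)
print(beta0, beta)  # approximately 1.0 and [2.0, -3.0]
```

Because the last principal direction and the centroid both transform equivariantly under orthogonal maps of the $(x_1, \ldots, x_n, y)$ space, the fitted hyperplane itself is coordinate-independent, which is the property claimed for the method.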
The invention also provides a second multiple linear regression method based on a dimensionality-reduction hyperplane, whose steps are as follows:
(1) Let the variables $x_1, x_2, \ldots, x_n, y$ satisfy the linear relation $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$, where the $\beta_i$ ($i = 0, 1, \ldots, n$) are constants, $\varepsilon$ is a random error, and $n > 2$. Observe each variable $N$ times; the observed values are $Y = (y_1, y_2, \ldots, y_N)'$ and $X = (x_{ij})_{N \times n}$, where $x_{ij}$ denotes the $i$-th observation of the variable $x_j$. These data are equivalent to the scatter set $S = \{(x_{i1}, \ldots, x_{in}, y_i) \mid i \in (1, \ldots, N)\}$, and the estimated multiple linear regression equation based on these observation data is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$;
(2) Normalize each column vector of $Y$ and $X$ and merge the results into the matrix $(X_1, Y_1)$; the variables corresponding to the columns of $(X_1, Y_1)$ are $x^1_1, x^1_2, \ldots, x^1_n, y^1$. The normalization method: let $Z$ be a column vector, $\mathrm{mean}(Z)$ the average of its components, and $\mathrm{std}(Z)$ their standard deviation; the normalized vector of $Z$ is $Z_1 = (Z - \mathrm{mean}(Z))/\mathrm{std}(Z)$. If $z$ and $z_1$ are the variables corresponding to $Z$ and $Z_1$, their relation is $z_1 = (z - \mathrm{mean}(Z))/\mathrm{std}(Z)$;
(3) Perform principal component analysis on $(X_1, Y_1)$; let the principal components, in order, be $F_1, \ldots, F_{n+1}$, with corresponding unit vectors $f_1, \ldots, f_{n+1}$. The coordinate matrix of $(X_1, Y_1)$ in the $(n+1)$-dimensional space $(f_1, \ldots, f_{n+1})$ is $F = (f_{ij})_{N \times (n+1)}$;
(4) In the $(n+1)$-dimensional space $(f_1, \ldots, f_{n+1})$, compute the hyperplane that is perpendicular to $F_{n+1}$ and passes through the geometric center of the sample points; its equation is $f_{n+1} - \overline{f_{n+1}} = 0$, where $\overline{f_j} = \frac{1}{N} \sum_{i=1}^{N} f_{ij}$;
(5) In the equation $f_{n+1} - \overline{f_{n+1}} = 0$, express $f_{n+1}$ in terms of $x^1_1, x^1_2, \ldots, x^1_n, y^1$ and rearrange into the linear form $\hat{y}^1 = \hat{\beta}^1_0 + \hat{\beta}^1_1 x^1_1 + \cdots + \hat{\beta}^1_n x^1_n$;
(6) Express each normalized variable $x^1_1, x^1_2, \ldots, x^1_n, y^1$ in the equation of step (5) in terms of the original variables $x_1, x_2, \ldots, x_n, y$ and rearrange into the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$; this is the estimated multiple linear regression equation based on the observation data $(X, Y)$.
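The normalized variant can be sketched the same way (again an illustrative implementation; the helper name and the synthetic mixed-scale data are mine). Columns are standardized to zero mean and unit standard deviation, the hyperplane is fitted by PCA in the normalized space, where the centroid is the origin, and the equation is then mapped back to the original variables via $z = \mathrm{mean}(Z) + \mathrm{std}(Z)\, z_1$:

```python
import numpy as np

def normalized_hyperplane_regression(X, Y):
    Z = np.column_stack([X, Y])
    mu, sd = Z.mean(axis=0), Z.std(axis=0, ddof=1)
    Z1 = (Z - mu) / sd                      # normalised data (X1, Y1)
    # Z1 has zero column means, so its right singular vectors are the
    # principal directions; the last one is F_{n+1}.
    _, _, Vt = np.linalg.svd(Z1, full_matrices=False)
    u = Vt[-1]
    # Hyperplane u . z1 = 0 (the centroid is the origin after normalisation);
    # substituting z1_j = (z_j - mu_j)/sd_j gives the original-variable form.
    w = u / sd                              # coefficients on (z - mu)
    beta = -w[:-1] / w[-1]
    beta0 = mu[-1] - beta @ mu[:-1]
    return beta0, beta

# Columns with very different scales, which is where normalisation matters.
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2)) * np.array([1.0, 10.0])
Y = 5.0 + 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0.0, 0.05, 500)
beta0, beta = normalized_hyperplane_regression(X, Y)
print(beta0, beta)  # approximately 5.0 and [0.4, 0.2]
```

Normalizing first makes the fit invariant to the units of the individual columns, at the cost of the strict rotational invariance of the unnormalized variant; which trade-off is appropriate depends on the data.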
Beneficial effects achieved by the invention: the regression result is coordinate-independent, and regression accuracy is improved.
Description of the drawings
Embodiment
The concrete steps of the regression method of the invention without the normalization step are as follows:
(1) Let the variables $x_1, x_2, \ldots, x_n, y$ satisfy the linear relation $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$, where the $\beta_i$ ($i = 0, 1, \ldots, n$) are constants, $\varepsilon$ is a random error, and $n > 2$. Observe each variable $N$ times; the observed values are $Y = (y_1, y_2, \ldots, y_N)'$ and $X = (x_{ij})_{N \times n}$, where $x_{ij}$ denotes the $i$-th observation of the variable $x_j$. These data are equivalent to the scatter set $S = \{(x_{i1}, \ldots, x_{in}, y_i) \mid i \in (1, \ldots, N)\}$, and the estimated multiple linear regression equation based on these observation data is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$;
(2) Perform principal component analysis on $(X, Y)$; let the principal components, in order, be $G_1, \ldots, G_{n+1}$, with corresponding unit vectors $g_1, \ldots, g_{n+1}$. The coordinate matrix of $(X, Y)$ in the $(n+1)$-dimensional space $(g_1, \ldots, g_{n+1})$ is $G = (g_{ij})_{N \times (n+1)}$;
(3) In the $(n+1)$-dimensional space $(g_1, \ldots, g_{n+1})$, compute the hyperplane that is perpendicular to $G_{n+1}$ and passes through the geometric center of the sample points; its equation is $g_{n+1} - \overline{g_{n+1}} = 0$, where $\overline{g_j} = \frac{1}{N} \sum_{i=1}^{N} g_{ij}$;
(4) In the equation $g_{n+1} - \overline{g_{n+1}} = 0$, express $g_{n+1}$ in terms of $x_1, x_2, \ldots, x_n, y$ and rearrange into the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$; this is the estimated multiple linear regression equation based on the observation data $(X, Y)$.
The concrete steps of the regression method of the invention with the normalization step are as follows:
(1) Let the variables $x_1, x_2, \ldots, x_n, y$ satisfy the linear relation $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$, where the $\beta_i$ ($i = 0, 1, \ldots, n$) are constants, $\varepsilon$ is a random error, and $n > 2$. Observe each variable $N$ times; the observed values are $Y = (y_1, y_2, \ldots, y_N)'$ and $X = (x_{ij})_{N \times n}$, where $x_{ij}$ denotes the $i$-th observation of the variable $x_j$. These data are equivalent to the scatter set $S = \{(x_{i1}, \ldots, x_{in}, y_i) \mid i \in (1, \ldots, N)\}$, and the estimated multiple linear regression equation based on these observation data is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$;
(2) Normalize each column vector of $Y$ and $X$ and merge the results into the matrix $(X_1, Y_1)$; the variables corresponding to the columns of $(X_1, Y_1)$ are $x^1_1, x^1_2, \ldots, x^1_n, y^1$. The normalization method: let $Z$ be a column vector, $\mathrm{mean}(Z)$ the average of its components, and $\mathrm{std}(Z)$ their standard deviation; the normalized vector of $Z$ is $Z_1 = (Z - \mathrm{mean}(Z))/\mathrm{std}(Z)$. If $z$ and $z_1$ are the variables corresponding to $Z$ and $Z_1$, their relation is $z_1 = (z - \mathrm{mean}(Z))/\mathrm{std}(Z)$;
(3) Perform principal component analysis on $(X_1, Y_1)$; let the principal components, in order, be $F_1, \ldots, F_{n+1}$, with corresponding unit vectors $f_1, \ldots, f_{n+1}$. The coordinate matrix of $(X_1, Y_1)$ in the $(n+1)$-dimensional space $(f_1, \ldots, f_{n+1})$ is $F = (f_{ij})_{N \times (n+1)}$;
(4) In the $(n+1)$-dimensional space $(f_1, \ldots, f_{n+1})$, compute the hyperplane that is perpendicular to $F_{n+1}$ and passes through the geometric center of the sample points; its equation is $f_{n+1} - \overline{f_{n+1}} = 0$, where $\overline{f_j} = \frac{1}{N} \sum_{i=1}^{N} f_{ij}$;
(5) In the equation $f_{n+1} - \overline{f_{n+1}} = 0$, express $f_{n+1}$ in terms of $x^1_1, x^1_2, \ldots, x^1_n, y^1$ and rearrange into the linear form $\hat{y}^1 = \hat{\beta}^1_0 + \hat{\beta}^1_1 x^1_1 + \cdots + \hat{\beta}^1_n x^1_n$;
(6) Express each normalized variable $x^1_1, x^1_2, \ldots, x^1_n, y^1$ in the equation of step (5) in terms of the original variables $x_1, x_2, \ldots, x_n, y$ and rearrange into the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$; this is the estimated multiple linear regression equation based on the observation data $(X, Y)$.

Claims (2)

1. A multiple linear regression method based on a dimensionality-reduction hyperplane, characterized in that the steps are as follows:
(1) Let the variables $x_1, x_2, \ldots, x_n, y$ satisfy the linear relation $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$, where the $\beta_i$ ($i = 0, 1, \ldots, n$) are constants, $\varepsilon$ is a random error, and $n > 2$. Observe each variable $N$ times; the observed values are $Y = (y_1, y_2, \ldots, y_N)'$ and $X = (x_{ij})_{N \times n}$, where $x_{ij}$ denotes the $i$-th observation of the variable $x_j$. These data are equivalent to the scatter set $S = \{(x_{i1}, \ldots, x_{in}, y_i) \mid i \in (1, \ldots, N)\}$, and the estimated multiple linear regression equation based on these observation data is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$;
(2) Perform principal component analysis on $(X, Y)$; let the principal components, in order, be $G_1, \ldots, G_{n+1}$, with corresponding unit vectors $g_1, \ldots, g_{n+1}$. The coordinate matrix of $(X, Y)$ in the $(n+1)$-dimensional space $(g_1, \ldots, g_{n+1})$ is $G = (g_{ij})_{N \times (n+1)}$;
(3) In the $(n+1)$-dimensional space $(g_1, \ldots, g_{n+1})$, compute the hyperplane that is perpendicular to $G_{n+1}$ and passes through the geometric center of the sample points; its equation is $g_{n+1} - \overline{g_{n+1}} = 0$, where $\overline{g_j} = \frac{1}{N} \sum_{i=1}^{N} g_{ij}$;
(4) In the equation $g_{n+1} - \overline{g_{n+1}} = 0$, express $g_{n+1}$ in terms of $x_1, x_2, \ldots, x_n, y$ and rearrange into the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$; this is the estimated multiple linear regression equation based on the observation data $(X, Y)$.
2. A multiple linear regression method based on a dimensionality-reduction hyperplane, characterized in that the steps are as follows:
(1) Let the variables $x_1, x_2, \ldots, x_n, y$ satisfy the linear relation $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$, where the $\beta_i$ ($i = 0, 1, \ldots, n$) are constants, $\varepsilon$ is a random error, and $n > 2$. Observe each variable $N$ times; the observed values are $Y = (y_1, y_2, \ldots, y_N)'$ and $X = (x_{ij})_{N \times n}$, where $x_{ij}$ denotes the $i$-th observation of the variable $x_j$. These data are equivalent to the scatter set $S = \{(x_{i1}, \ldots, x_{in}, y_i) \mid i \in (1, \ldots, N)\}$, and the estimated multiple linear regression equation based on these observation data is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$;
(2) Normalize each column vector of $Y$ and $X$ and merge the results into the matrix $(X_1, Y_1)$; the variables corresponding to the columns of $(X_1, Y_1)$ are $x^1_1, x^1_2, \ldots, x^1_n, y^1$. The normalization method: let $Z$ be a column vector, $\mathrm{mean}(Z)$ the average of its components, and $\mathrm{std}(Z)$ their standard deviation; the normalized vector of $Z$ is $Z_1 = (Z - \mathrm{mean}(Z))/\mathrm{std}(Z)$. If $z$ and $z_1$ are the variables corresponding to $Z$ and $Z_1$, their relation is $z_1 = (z - \mathrm{mean}(Z))/\mathrm{std}(Z)$;
(3) Perform principal component analysis on $(X_1, Y_1)$; let the principal components, in order, be $F_1, \ldots, F_{n+1}$, with corresponding unit vectors $f_1, \ldots, f_{n+1}$. The coordinate matrix of $(X_1, Y_1)$ in the $(n+1)$-dimensional space $(f_1, \ldots, f_{n+1})$ is $F = (f_{ij})_{N \times (n+1)}$;
(4) In the $(n+1)$-dimensional space $(f_1, \ldots, f_{n+1})$, compute the hyperplane that is perpendicular to $F_{n+1}$ and passes through the geometric center of the sample points; its equation is $f_{n+1} - \overline{f_{n+1}} = 0$, where $\overline{f_j} = \frac{1}{N} \sum_{i=1}^{N} f_{ij}$;
(5) In the equation $f_{n+1} - \overline{f_{n+1}} = 0$, express $f_{n+1}$ in terms of $x^1_1, x^1_2, \ldots, x^1_n, y^1$ and rearrange into the linear form $\hat{y}^1 = \hat{\beta}^1_0 + \hat{\beta}^1_1 x^1_1 + \cdots + \hat{\beta}^1_n x^1_n$;
(6) Express each normalized variable $x^1_1, x^1_2, \ldots, x^1_n, y^1$ in the equation of step (5) in terms of the original variables $x_1, x_2, \ldots, x_n, y$ and rearrange into the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_n x_n$; this is the estimated multiple linear regression equation based on the observation data $(X, Y)$.
Application CN201410318782.3A (priority date 2014-07-07, filed 2014-07-07): Multiple linear regression method based on dimensionality reduction hyperplane — status: Pending.


Publications (1)

CN104063617A — published 2014-09-24

Family ID: 51551327


Cited By (3)

* Cited by examiner, † Cited by third party

  • CN104462818A * (priority 2014-12-08, published 2015-03-25, 天津大学): Embedding manifold regression model based on Fisher criterion
  • CN104462818B * (priority 2014-12-08, granted 2017-10-10, 天津大学): An embedded manifold regression model based on the Fisher criterion
  • CN113449656A * (priority 2021-07-01, published 2021-09-28, 淮阴工学院): Driver state identification method based on improved convolutional neural network


Legal Events

  • C06 / PB01: Publication
  • C10 / SE01: Entry into substantive examination (entry into force of request for substantive examination)
  • C53 / CB02: Correction of patent application — change of applicant information. Address after: 310018, School of Management, Zhejiang University of Media and Communications, 998 Xiasha, Hangzhou, Zhejiang (applicant after: Xu Weiwei). Address before: Room 504, Building 8, Qinghe East Lane, Haidian District, Beijing 100192 (applicant before: Xu Weiwei)
  • WD01: Invention patent application deemed withdrawn after publication (application publication date: 2014-09-24)