CN113190956A - Regression modeling method for big data of manufacturing industry

Regression modeling method for big data of manufacturing industry

Info

Publication number
CN113190956A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110295478.1A
Other languages
Chinese (zh)
Other versions
CN113190956B (en)
Inventor
任鸿儒 (Ren Hongru)
邱勇 (Qiu Yong)
鲁仁全 (Lu Renquan)
吴元清 (Wu Yuanqing)
李鸿一 (Li Hongyi)
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202110295478.1A
Publication of CN113190956A
Application granted
Publication of CN113190956B
Legal status: Active
Anticipated expiration: legal status Critical

Classifications

    • G - PHYSICS
        • G06 - COMPUTING; CALCULATING OR COUNTING
            • G06F - ELECTRIC DIGITAL DATA PROCESSING
                • G06F 30/00 - Computer-aided design [CAD]
                    • G06F 30/20 - Design optimisation, verification or simulation
                • G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
                    • G06F 17/10 - Complex mathematical operations
                        • G06F 17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
                        • G06F 17/18 - Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
                • G06F 2111/00 - Details relating to CAD techniques
                    • G06F 2111/10 - Numerical modelling
            • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
                • G06Q 50/00 - ICT specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
                    • G06Q 50/04 - Manufacturing
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
        • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
                • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
                    • Y02P 90/30 - Computing systems specially adapted for manufacturing


Abstract

The invention discloses a regression modeling method for big data of the manufacturing industry, comprising the following steps: S1, obtaining low-dimensional features suitable for modeling through data preprocessing; S2, converting the low-dimensional data of different business domains into latent variable form; S3, establishing regression equations among the latent variables through partial least squares regression analysis, computing the latent variables according to the maximum covariance among them, determining the number of latent variables using the prediction residual sum of squares, and thereby realizing simultaneous regression analysis of multiple dependent variables on multiple independent variables; and S4, establishing a quadratic polynomial regression equation between the latent variables to obtain the standard regression coefficient β of each independent variable acting on each dependent variable, and thus the single-business predicted value. By establishing a latent structure model among business domains, the invention mines the influence relations among the data of different business domains and links the different types of data of multiple business domains, so that single-business modeling performs better and the quality and efficiency of the business are improved.

Description

Regression modeling method for big data of manufacturing industry
Technical Field
The invention relates to the technical field of big data analysis and modeling, and in particular to a regression modeling method for big data of the manufacturing industry.
Background
The manufacturing industry is one of the pillar industries of the national economy and an embodiment of national modernization and comprehensive national strength. With the continued development of the economy and of science and technology, the volume of data generated by modern manufacturing has grown exponentially, and the potential and value of big data have gradually been recognized by society; the combination of big data with manufacturing is driving a comprehensive reform of the design, management, manufacturing and service modes of the industry. However, such manufacturing data are usually multi-source, heterogeneous and complex, which is one of the main problems manufacturing enterprises face when modeling big data.
Existing big data models target only a single business of a manufacturing enterprise. They do not consider the correlations among businesses, neglect the influence of design, management, service and other businesses on the manufacturing process, and do not establish the association between the manufacturing business and the other businesses. As a result, the data across a manufacturing enterprise's businesses are not fully utilized, and the whole-process manufacturing flow cannot be strictly controlled and reasonably planned.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a regression modeling method for big data of a manufacturing industry.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
A regression modeling method for big data of the manufacturing industry, characterized in that influence relations among the data of different business domains are mined by establishing a latent structure model among the business domains, and different types of data from multiple business domains are linked; the method specifically comprises the following steps:
S1, performing dimensionality reduction and denoising on the high-dimensional data of different business domains through data preprocessing to obtain low-dimensional features suitable for modeling;
S2, converting the low-dimensional data of different business domains into latent variable form;
S3, establishing regression equations among the latent variables through partial least squares regression analysis, determining the weight coefficients by maximizing the covariance among the latent variables, i.e., maximizing their degree of correlation, computing the latent variables, and determining the number of latent variables using the prediction residual sum of squares, thereby realizing simultaneous regression analysis of multiple dependent variables on multiple independent variables;
and S4, establishing a quadratic polynomial regression equation between the latent variables to obtain the standard regression coefficient β of each independent variable acting on each dependent variable, and thus the single-business predicted value.
Further, in step S1, a principal component analysis method is used to establish a linear projection mapping between the high-dimensional space and the low-dimensional space, yielding the projection matrix W. The specific process is as follows:

Let Y = [y_1, y_2, ..., y_N] denote the high-dimensional data requiring dimensionality reduction and X = [x_1, x_2, ..., x_N] the low-dimensional data after reduction, where N is the number of samples. Assume the data noise η_n ∈ R^D follows an independent Gaussian distribution η_n ~ N(0, β^{-1}I), where β^{-1} is the noise variance and I is an identity matrix. The relation between the low-dimensional and high-dimensional spaces is expressed as:

y_n = W·x_n + η_n,  (1)

where the mapping is determined by the projection matrix W. The likelihood of the high-dimensional data space is then:

p(y_n|x_n, W, β) = N(y_n|W x_n, β^{-1}I),  (2)

Assuming the data points in the low-dimensional space are independently and identically distributed:

p(x_n) = N(x_n|0, I),  (3)

Integrating out the low-dimensional data points yields the marginal likelihood:

p(y_n|W, β) = ∫ p(y_n|x_n, W, β) p(x_n) dx_n = N(y_n|0, W W^T + β^{-1}I),  (4)

and the joint likelihood of the high-dimensional data:

p(Y|W, β) = ∏_{n=1}^{N} N(y_n|0, W W^T + β^{-1}I).  (5)

The projection matrix W is obtained by the maximum likelihood method;

From the obtained projection matrix W, the low-dimensional representation X of the high-dimensional data Y is obtained via formula (1).
Further, in step S1, when obtaining the projection matrix W, the maximum likelihood estimates of the parameters are obtained with the EM algorithm, specifically as follows:

(i) Compute the expectation of the log-likelihood of the data. The likelihood function is:

p(Y; W) = ∏_{n=1}^{N} p(y_n|x_n, W, β) p(x_n),  (6)

Writing the log-likelihood as ln p(Y; W), it can be expressed as:

ln p(Y; W) = Σ_{n=1}^{N} { ln p(y_n|x_n, W, β) + ln p(x_n) },  (7)

The expected value E{ln p(Y; W)} of ln p(Y; W) is obtained as:

E{ln p(Y; W)} = −Σ_{n=1}^{N} { (D/2) ln(2πβ^{-1}) + (1/2) tr(⟨x_n x_n^T⟩) + (β/2)‖y_n − μ‖² − β⟨x_n⟩^T W^T (y_n − μ) + (β/2) tr(W^T W ⟨x_n x_n^T⟩) },  (8)

where ⟨·⟩ denotes the posterior expectation, μ denotes the mean of the high-dimensional data Y, D denotes the data dimension, and tr(·) denotes the trace, with

⟨x_n⟩ = M^{-1} W^T (y_n − μ),  (9)

⟨x_n x_n^T⟩ = β^{-1} M^{-1} + ⟨x_n⟩⟨x_n⟩^T,  (10)

where M = W^T W + β^{-1}I;

(ii) Maximize the expected value E{ln p(Y; W)} with respect to the projection matrix W, i.e., setting the derivative of E{ln p(Y; W)} with respect to W to zero gives the optimal value of W, noted as:

W_new = [ Σ_{n=1}^{N} (y_n − μ)⟨x_n⟩^T ] [ Σ_{n=1}^{N} ⟨x_n x_n^T⟩ ]^{-1},  (11)

(iii) Alternate (i) and (ii) until convergence, judging convergence by the difference of E{ln p(Y; W)} between any two consecutive iterations:

‖E{ln p(Y; W)}_{t+1} − E{ln p(Y; W)}_t‖ ≤ ε,  (12)

When the above formula is satisfied, E{ln p(Y; W)} is considered to have reached an extreme point, and the projection matrix W is obtained.
Further, in step S2, the low-dimensional representation X = [x_1, x_2, ..., x_N] of each business domain's data forms the data set of the latent structure model, which consists of two parts: an explanatory variable space denoted X_{n×m} and a response variable space denoted Y_{n×k}, where n represents the number of samples, and m and k represent the numbers of variables;

The latent variables t_j and u_j (j = 1, 2, ..., A) are computed by the formulas t_j = X_j w_j and u_j = Y_j q_j, where A is the number of latent variables, and w_j and q_j are the weight vectors that maximize the covariance of t_j and u_j, i.e., the weight coefficients at which the degree of correlation of t_j and u_j is maximal, satisfying:

(w_j, q_j) = arg max Cov(t_j, u_j) = arg max w_j^T X_j^T Y_j q_j,  (13)

subject to w_j^T w_j = 1, q_j^T q_j = 1.  (14)
further, in the step S3, the objective is to obtain the quantitative relationship between the multiple explanatory variables and the multiple reaction variables by the partial least squares regression analysis, that is, in the explanatory variable space Xn×mAnd reaction variable space Yn×kSeparately looking for linear combinations tjAnd uj(j ═ 1, 2.., a), and maximizes the covariance of the two variable spaces;
the specific process is as follows:
(1) at a latent variable tjAnd ujEstablishing a regression equation:
uj=bjtj+ej (15)
wherein e isjIs an error vector, bjIs an unknown parameter, and bjCan be calculated by the following formula:
Figure BDA0002984183990000043
carrying out estimation; is provided with
Figure BDA0002984183990000051
For the predicted values of uj, the matrices X and Y are decomposed into the following outer product form:
Figure BDA0002984183990000052
in the formula, E and F are residual errors of matrixes X and Y after the latent variables A are extracted respectively;
(2) in partial least squaresIn the analysis process, each pair of latent variables tjAnd uj(j 1, 2.., a) are extracted in turn in an iterative process, then the extracted residual is calculated, and the analysis of the residual of each step is continued until the logarithm of the extraction latent variable is determined according to some criterion.
Further, in step S3, the prediction residual sum of squares, PRESS, is used to determine the number of latent variable pairs to extract: at each step, the sample points are removed one at a time, the predicted estimate ŷ of the response variable for the removed point is computed, and PRESS is the sum of squared residuals between these predictions and the actual observations y:

PRESS(j) = Σ_{h=1}^{l} Σ_{i=1}^{n} ( y_{hi} − ŷ_{hi}(j) )²,  (18)

where l is the number of dependent variables. When PRESS(j) − PRESS(j−1) is less than a preset precision, the iterative process ends; otherwise latent variables continue to be extracted for iterative computation.
Further, in step S4, the following quadratic polynomial regression model is established using the A latent variables obtained from the partial least squares regression analysis:

ŷ = β_0 + Σ_{i=1}^{A} β_i x_i + Σ_{i=1}^{A} β_{ii} x_i² + Σ_{i<j} β_{ij} x_i x_j,  (19)

where β_0, β_i, β_{ii}, β_{ij} are all regression coefficients, x ∈ t_j, y ∈ u_j, and t_j and u_j are the latent variables;

According to the obtained latent variables and their number, and with reference to the PRESS statistic, the standard regression coefficient β of each independent variable's effect on each dependent variable is obtained.
Compared with the prior art, the principles and advantages of this scheme are as follows:
1. The latent structure model of this scheme simplifies the data structure while modeling; the partial least squares regression analysis captures the mutual influence among different businesses, yielding a regression model of multiple dependent variables on multiple independent variables, which is more effective, more reliable and more holistic than regressing the dependent variables one by one.
2. The scheme finally obtains the regression coefficients between a single business and the other businesses, mines the influence relations among originally independent business-domain data, and obtains the single-business predicted value, breaking through the limitation of modeling on a single business domain's data, fully exploiting the data value of each business domain, improving the single-business modeling effect, and helping improve the quality and efficiency of the business.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a regression modeling method for manufacturing big data according to the present invention;
FIG. 2 is a schematic flow chart of data preprocessing in the regression modeling method for big data in manufacturing industry according to the present invention;
FIG. 3 is a schematic flow chart of latent structure modeling in the regression modeling method for big data in manufacturing industry according to the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples:
In the regression modeling method for big data of the manufacturing industry, influence relations among the data of different business domains are mined by establishing a latent structure model among the business domains, and different types of data from multiple business domains are linked;
as shown in fig. 1, the method specifically comprises the following steps:
s1, performing dimensionality reduction and denoising on high-dimensional data of different service domains through data preprocessing to obtain low-dimensional features suitable for modeling;
In this step, a principal component analysis method is used to establish a linear projection mapping between the high-dimensional space and the low-dimensional space, yielding the projection matrix W;
as shown in fig. 2, the specific process is as follows:
Let Y = [y_1, y_2, ..., y_N] denote the high-dimensional data requiring dimensionality reduction and X = [x_1, x_2, ..., x_N] the low-dimensional data after reduction, where N is the number of samples. Assume the data noise η_n ∈ R^D follows an independent Gaussian distribution η_n ~ N(0, β^{-1}I), where β^{-1} is the noise variance and I is an identity matrix. The relation between the low-dimensional and high-dimensional spaces is expressed as:

y_n = W·x_n + η_n,  (1)

where the mapping is determined by the projection matrix W. The likelihood of the high-dimensional data space is then:

p(y_n|x_n, W, β) = N(y_n|W x_n, β^{-1}I),  (2)

Assuming the data points in the low-dimensional space are independently and identically distributed:

p(x_n) = N(x_n|0, I),  (3)

Integrating out the low-dimensional data points yields the marginal likelihood:

p(y_n|W, β) = ∫ p(y_n|x_n, W, β) p(x_n) dx_n = N(y_n|0, W W^T + β^{-1}I),  (4)

and the joint likelihood of the high-dimensional data:

p(Y|W, β) = ∏_{n=1}^{N} N(y_n|0, W W^T + β^{-1}I).  (5)

The projection matrix W is obtained by the maximum likelihood method;
In the present embodiment, an EM (Expectation-Maximization) algorithm is used to obtain the maximum likelihood estimates of the parameters, as follows:

(i) Compute the expectation of the log-likelihood of the data. The likelihood function is:

p(Y; W) = ∏_{n=1}^{N} p(y_n|x_n, W, β) p(x_n),  (6)

Writing the log-likelihood as ln p(Y; W), it can be expressed as:

ln p(Y; W) = Σ_{n=1}^{N} { ln p(y_n|x_n, W, β) + ln p(x_n) },  (7)

The expected value E{ln p(Y; W)} of ln p(Y; W) is obtained as:

E{ln p(Y; W)} = −Σ_{n=1}^{N} { (D/2) ln(2πβ^{-1}) + (1/2) tr(⟨x_n x_n^T⟩) + (β/2)‖y_n − μ‖² − β⟨x_n⟩^T W^T (y_n − μ) + (β/2) tr(W^T W ⟨x_n x_n^T⟩) },  (8)

where ⟨·⟩ denotes the posterior expectation, μ denotes the mean of the high-dimensional data Y, D denotes the data dimension, and tr(·) denotes the trace, with

⟨x_n⟩ = M^{-1} W^T (y_n − μ),  (9)

⟨x_n x_n^T⟩ = β^{-1} M^{-1} + ⟨x_n⟩⟨x_n⟩^T,  (10)

where M = W^T W + β^{-1}I;

(ii) Maximize the expected value E{ln p(Y; W)} with respect to the projection matrix W, i.e., setting the derivative of E{ln p(Y; W)} with respect to W to zero gives the optimal value of W, noted as:

W_new = [ Σ_{n=1}^{N} (y_n − μ)⟨x_n⟩^T ] [ Σ_{n=1}^{N} ⟨x_n x_n^T⟩ ]^{-1},  (11)

(iii) Alternate (i) and (ii) until convergence, judging convergence by the difference of E{ln p(Y; W)} between any two consecutive iterations:

‖E{ln p(Y; W)}_{t+1} − E{ln p(Y; W)}_t‖ ≤ ε,  (12)

When the above formula is satisfied, E{ln p(Y; W)} is considered to have reached an extreme point, and the projection matrix W is obtained.
From the obtained projection matrix W, a low-dimensional representation X of the high-dimensional data Y is obtained by formula (1).
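A minimal numpy sketch of this EM iteration for probabilistic PCA, under the assumption (as in the steps above) that the noise variance β^{-1} is held fixed and only W is updated; the helper name `ppca_em` and the demo data are illustrative:

```python
import numpy as np

def ppca_em(Y, q, n_iter=200, noise_var=1e-2):
    """EM for probabilistic PCA: alternate the expectations <x_n>,
    <x_n x_n^T> with the maximization over W. noise_var plays the
    role of beta^{-1}; it is kept fixed."""
    N, D = Y.shape
    mu = Y.mean(axis=0)
    Yc = Y - mu
    W = np.random.default_rng(1).normal(size=(D, q))
    for _ in range(n_iter):
        # E-step: M = W^T W + beta^{-1} I, <x_n> = M^{-1} W^T (y_n - mu)
        M = W.T @ W + noise_var * np.eye(q)
        Minv = np.linalg.inv(M)
        Ex = Yc @ W @ Minv.T                 # rows are <x_n>, shape (N, q)
        # sum_n <x_n x_n^T> = N * beta^{-1} * M^{-1} + Ex^T Ex
        Sxx = N * noise_var * Minv + Ex.T @ Ex
        # M-step: W = [sum_n (y_n - mu)<x_n>^T] [sum_n <x_n x_n^T>]^{-1}
        W = (Yc.T @ Ex) @ np.linalg.inv(Sxx)
    # low-dimensional representation: posterior means under the final W
    M = W.T @ W + noise_var * np.eye(q)
    X_low = Yc @ W @ np.linalg.inv(M).T
    return W, X_low

rng = np.random.default_rng(0)
Y_demo = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))  # rank-2 data
W_hat, X_low = ppca_em(Y_demo, q=2)
```

With near-noiseless rank-2 data, the recovered subspace reconstructs the centered data almost exactly.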
S2, converting the low-dimensional data of different business domains into latent variable form;
In this step, the low-dimensional representation X = [x_1, x_2, ..., x_N] of each business domain's data forms the data set of the latent structure model, which consists of two parts: an explanatory variable space denoted X_{n×m} and a response variable space denoted Y_{n×k}, where n represents the number of samples, and m and k represent the numbers of variables;

The latent variables t_j and u_j (j = 1, 2, ..., A) are computed by the formulas t_j = X_j w_j and u_j = Y_j q_j, where A is the number of latent variables, and w_j and q_j are the weight vectors that maximize the covariance of t_j and u_j, i.e., the weight coefficients at which the degree of correlation of t_j and u_j is maximal, satisfying:

(w_j, q_j) = arg max Cov(t_j, u_j) = arg max w_j^T X_j^T Y_j q_j,  (13)

subject to w_j^T w_j = 1, q_j^T q_j = 1.  (14)
s3, establishing a regression equation among different latent variables through partial least squares regression analysis, determining a weight coefficient according to the maximum covariance among the latent variables, namely, the maximum correlation degree of the latent variables, calculating to obtain the latent variables, and determining the number of the latent variables by adopting the sum of squares of predicted residuals, so as to realize the simultaneous regression analysis of a plurality of dependent variables on a plurality of independent variables; as shown in particular in fig. 3;
this step is based onLeast squares regression analysis, aimed at obtaining quantitative relationships between multiple explanatory variables and multiple reaction variables, i.e. in the explanatory variable space Xn×mAnd reaction variable space Yn×kSeparately looking for linear combinations tjAnd uj(j ═ 1, 2.., a), and maximizes the covariance of the two variable spaces;
the specific process is as follows:
(1) Establish a regression equation between the latent variables t_j and u_j:

u_j = b_j t_j + e_j,  (15)

where e_j is an error vector and b_j is an unknown parameter, estimated by:

b_j = (t_j^T t_j)^{-1} t_j^T u_j,  (16)

Let û_j = b_j t_j be the predicted value of u_j; the matrices X and Y are then decomposed into the following outer-product form:

X = Σ_{j=1}^{A} t_j p_j^T + E,   Y = Σ_{j=1}^{A} b_j t_j q_j^T + F,  (17)

where E and F are the residuals of the matrices X and Y after the A latent variables are extracted;

(2) In the partial least squares regression analysis, the latent variable pairs t_j and u_j (j = 1, 2, ..., A) are extracted in turn in an iterative process; the residuals after each extraction are then computed, and the residual analysis of each step continues until the number of latent variable pairs to extract is determined according to some criterion.
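The pairwise extraction-and-deflation loop can be sketched in numpy as follows; computing the covariance-maximizing weights w_j, q_j as the leading singular vectors of X^T Y is one standard realization, and the helper name `pls_extract` and the demo data are illustrative assumptions:

```python
import numpy as np

def pls_extract(X, Y, A):
    """For each j: w_j, q_j maximize Cov(t_j, u_j) = w^T X^T Y q
    (leading singular vectors of X^T Y); then X and Y are deflated
    by the extracted component before the next pair is computed.
    Assumes column-centered X and Y."""
    X, Y = X.copy(), Y.copy()
    T, U, B = [], [], []
    for _ in range(A):
        Uu, s, Vt = np.linalg.svd(X.T @ Y)
        w, q = Uu[:, 0], Vt[0]        # unit-norm weight vectors
        t, u = X @ w, Y @ q           # paired latent variables
        b = (t @ u) / (t @ t)         # inner regression u_j = b_j t_j + e_j
        p = X.T @ t / (t @ t)         # X-loading used for deflation
        X = X - np.outer(t, p)        # remove the extracted component
        Y = Y - b * np.outer(t, q)
        T.append(t); U.append(u); B.append(b)
    return np.array(T).T, np.array(U).T, np.array(B)

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 6))
Y_demo = X_demo @ rng.normal(size=(6, 3)) + 0.1 * rng.normal(size=(40, 3))
T, U, B = pls_extract(X_demo, Y_demo, A=2)
```

Because the weights are leading singular vectors, the first extracted pair has the largest (positive) inner product t_1^T u_1 among unit-norm weight choices.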
In the above, the prediction residual sum of squares, PRESS (Predicted Residual Sum of Squares), is used to determine the number of latent variable pairs to extract: at each step, the sample points are removed one at a time, the predicted estimate ŷ of the response variable for the removed point is computed, and PRESS is the sum of squared residuals between these predictions and the actual observations y:

PRESS(j) = Σ_{h=1}^{l} Σ_{i=1}^{n} ( y_{hi} − ŷ_{hi}(j) )²,  (18)

where l is the number of dependent variables. When PRESS(j) − PRESS(j−1) is less than a preset precision, the iterative process ends; otherwise latent variables continue to be extracted for iterative computation.
And S4, finally, establishing a quadratic polynomial regression equation among the latent variables to obtain the standard regression coefficient β of each independent variable acting on each dependent variable, and thus the single-business predicted value.
Specifically:
According to the partial least squares regression analysis, the A latent variables are used to establish the following quadratic polynomial regression model:

ŷ = β_0 + Σ_{i=1}^{A} β_i x_i + Σ_{i=1}^{A} β_{ii} x_i² + Σ_{i<j} β_{ij} x_i x_j,  (19)

where β_0, β_i, β_{ii}, β_{ij} are all regression coefficients, x ∈ t_j, y ∈ u_j, and t_j and u_j are the latent variables;

According to the obtained latent variables and their number, and with reference to the PRESS statistic, the standard regression coefficient β of each independent variable's effect on each dependent variable is obtained.
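For illustration, such a quadratic polynomial model can be fit by ordinary least squares on an expanded design matrix; the helper `quadratic_features` and the synthetic latent data are assumptions, not the patented procedure:

```python
import numpy as np

def quadratic_features(T):
    """Design matrix with intercept, linear terms, squares, and
    pairwise cross terms of the columns of T (the latent variables)."""
    n, a = T.shape
    cols = [np.ones(n)]
    cols += [T[:, i] for i in range(a)]
    cols += [T[:, i] ** 2 for i in range(a)]
    cols += [T[:, i] * T[:, j] for i in range(a) for j in range(i + 1, a)]
    return np.column_stack(cols)

rng = np.random.default_rng(2)
T = rng.normal(size=(50, 3))  # latent variables t_j (illustrative)
# synthetic target: intercept 1.0, linear coefficient 2.0 on t_1, 0.5 on t_2^2
y = 1.0 + 2.0 * T[:, 0] + 0.5 * T[:, 1] ** 2 + rng.normal(scale=0.01, size=50)
beta, *_ = np.linalg.lstsq(quadratic_features(T), y, rcond=None)
```

The recovered coefficient vector lines up with the column order of the design matrix: intercept, the three linear terms, the three squares, then the three cross terms.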
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.

Claims (7)

1. A regression modeling method for big data of the manufacturing industry, characterized in that influence relations among the data of different business domains are mined by establishing a latent structure model among the business domains, and different types of data from multiple business domains are linked; the method specifically comprises the following steps:
S1, performing dimensionality reduction and denoising on the high-dimensional data of different business domains through data preprocessing to obtain low-dimensional features suitable for modeling;
S2, converting the low-dimensional data of different business domains into latent variable form;
S3, establishing regression equations among the latent variables through partial least squares regression analysis, determining the weight coefficients by maximizing the covariance among the latent variables, i.e., maximizing their degree of correlation, computing the latent variables, and determining the number of latent variables using the prediction residual sum of squares, thereby realizing simultaneous regression analysis of multiple dependent variables on multiple independent variables;
and S4, establishing a quadratic polynomial regression equation between the latent variables to obtain the standard regression coefficient β of each independent variable acting on each dependent variable, and thus the single-business predicted value.
2. The regression modeling method for big data of the manufacturing industry as claimed in claim 1, wherein in step S1 a principal component analysis method is used to establish a linear projection mapping between the high-dimensional space and the low-dimensional space, yielding the projection matrix W, the specific process being as follows:
let Y = [y_1, y_2, ..., y_N] denote the high-dimensional data requiring dimensionality reduction and X = [x_1, x_2, ..., x_N] the low-dimensional data after reduction, where N is the number of samples; assume the data noise η_n ∈ R^D follows an independent Gaussian distribution η_n ~ N(0, β^{-1}I), where β^{-1} is the noise variance and I is an identity matrix; the relation between the low-dimensional and high-dimensional spaces is expressed as:

y_n = W·x_n + η_n,  (1)

where the mapping is determined by the projection matrix W; the likelihood of the high-dimensional data space is then:

p(y_n|x_n, W, β) = N(y_n|W x_n, β^{-1}I),  (2)

assuming the data points in the low-dimensional space are independently and identically distributed:

p(x_n) = N(x_n|0, I),  (3)

integrating out the low-dimensional data points yields the marginal likelihood:

p(y_n|W, β) = ∫ p(y_n|x_n, W, β) p(x_n) dx_n = N(y_n|0, W W^T + β^{-1}I),  (4)

and the joint likelihood of the high-dimensional data:

p(Y|W, β) = ∏_{n=1}^{N} N(y_n|0, W W^T + β^{-1}I);  (5)

the projection matrix W is obtained by the maximum likelihood method;
from the obtained projection matrix W, the low-dimensional representation X of the high-dimensional data Y is obtained via formula (1).
3. The regression modeling method for big data of the manufacturing industry as claimed in claim 2, wherein in step S1, when obtaining the projection matrix W, the maximum likelihood estimates of the parameters are obtained with the EM algorithm, specifically comprising the following steps:
(i) computing the expectation of the log-likelihood of the data, the likelihood function being:

p(Y; W) = ∏_{n=1}^{N} p(y_n|x_n, W, β) p(x_n),  (6)

writing the log-likelihood as ln p(Y; W), it can be expressed as:

ln p(Y; W) = Σ_{n=1}^{N} { ln p(y_n|x_n, W, β) + ln p(x_n) },  (7)

the expected value E{ln p(Y; W)} of ln p(Y; W) being obtained as:

E{ln p(Y; W)} = −Σ_{n=1}^{N} { (D/2) ln(2πβ^{-1}) + (1/2) tr(⟨x_n x_n^T⟩) + (β/2)‖y_n − μ‖² − β⟨x_n⟩^T W^T (y_n − μ) + (β/2) tr(W^T W ⟨x_n x_n^T⟩) },  (8)

where ⟨·⟩ denotes the posterior expectation, μ denotes the mean of the high-dimensional data Y, D denotes the data dimension, and tr(·) denotes the trace, with

⟨x_n⟩ = M^{-1} W^T (y_n − μ),  (9)

⟨x_n x_n^T⟩ = β^{-1} M^{-1} + ⟨x_n⟩⟨x_n⟩^T,  (10)

where M = W^T W + β^{-1}I;
(ii) maximizing the expected value E{ln p(Y; W)} with respect to the projection matrix W, i.e., setting the derivative of E{ln p(Y; W)} with respect to W to zero gives the optimal value of W, noted as:

W_new = [ Σ_{n=1}^{N} (y_n − μ)⟨x_n⟩^T ] [ Σ_{n=1}^{N} ⟨x_n x_n^T⟩ ]^{-1},  (11)

(iii) alternating (i) and (ii) until convergence, judging convergence by the difference of E{ln p(Y; W)} between any two consecutive iterations:

‖E{ln p(Y; W)}_{t+1} − E{ln p(Y; W)}_t‖ ≤ ε,  (12)

when the above formula is satisfied, E{ln p(Y; W)} is considered to have reached an extreme point, and the projection matrix W is obtained.
4. The regression modeling method for big data of the manufacturing industry as claimed in claim 1, wherein in step S2 the low-dimensional representation X = [x_1, x_2, ..., x_N] of each business domain's data forms the data set of the latent structure model, which consists of two parts: an explanatory variable space denoted X_{n×m} and a response variable space denoted Y_{n×k}, where n represents the number of samples, and m and k represent the numbers of variables;
the latent variables t_j and u_j (j = 1, 2, ..., A) are computed by the formulas t_j = X_j w_j and u_j = Y_j q_j, where A is the number of latent variables, and w_j and q_j are the weight vectors that maximize the covariance of t_j and u_j, i.e., the weight coefficients at which the degree of correlation of t_j and u_j is maximal, satisfying:

(w_j, q_j) = arg max Cov(t_j, u_j) = arg max w_j^T X_j^T Y_j q_j,  (13)

subject to w_j^T w_j = 1, q_j^T q_j = 1.  (14)
5. the regression modeling method for manufacturing industry big data as claimed in claim 4, wherein in said step S3, the objective is to obtain the quantitative relationship between multiple explanatory variables and multiple reaction variables by partial least squares regression analysis, that is, in the explanatory variable space Xn×mAnd reaction variable space Yn×kSeparately looking for linear combinations tjAnd uj(j ═ 1, 2.., a), and maximizes the covariance of the two variable spaces;
the specific process is as follows:
(1) establish a regression equation between the latent variables t_j and u_j:
u_j = b_j t_j + e_j, (15)
wherein e_j is an error vector and b_j is an unknown parameter, which can be estimated by
b_j = (t_j^T u_j) / (t_j^T t_j); (16)
let û_j = b_j t_j denote the estimate of u_j, and let p_j = X^T t_j / (t_j^T t_j) denote the loading vector of X; the matrices X and Y are then decomposed into the following outer-product form:
X = Σ_{j=1}^{A} t_j p_j^T + E,  Y = Σ_{j=1}^{A} û_j q_j^T + F; (17)
in the formula, E and F are the residuals of the matrices X and Y, respectively, after the A latent variables have been extracted;
(2) in the partial least squares regression analysis, each pair of latent variables t_j and u_j (j = 1, 2, ..., A) is extracted in turn in an iterative process; the residuals after each extraction are then computed and analysed in the next step, until the number of latent-variable pairs to extract is determined according to some criterion.
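The extract-then-deflate iteration of (1)–(2) can be sketched as a NIPALS-style loop. This is one plausible reading of the claim: the X-loading p_j and the rank-one deflations follow standard PLS practice and are not taken from any code in the patent:

```python
import numpy as np

def pls_extract(X, Y, n_comp):
    """Sequentially extract latent-variable pairs (t_j, u_j) and
    deflate the residual matrices E and F (NIPALS-style sketch)."""
    E, F = X.copy(), Y.copy()
    T, U = [], []
    for _ in range(n_comp):
        # weights from the current residuals' cross-covariance
        Usvd, s, Vt = np.linalg.svd(E.T @ F)
        w, q = Usvd[:, 0], Vt[0, :]
        t = E @ w                        # latent score of the X-block
        u = F @ q                        # latent score of the Y-block
        p = E.T @ t / (t @ t)            # X loading vector
        b = (t @ u) / (t @ t)            # inner regression u = b t + e
        E = E - np.outer(t, p)           # deflate the X residual
        F = F - b * np.outer(t, q)       # deflate the Y residual
        T.append(t); U.append(u)
    return np.array(T).T, np.array(U).T, E, F
```

After rank(X) components, the X-residual E is deflated to (numerically) zero.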
6. The method of claim 5, wherein in said step S3, the number of latent-variable pairs to be extracted is determined using the prediction residual sum of squares, PRESS; that is, at each step the predicted value ŷ_{h(i)} of the response variable is computed with the i-th sample point removed, and the sum of squared residuals with respect to the actual observations y is formed:
PRESS(j) = Σ_{i=1}^{n} Σ_{h=1}^{l} ( y_{hi} − ŷ_{h(i)}(j) )²; (18)
in the above formula, l is the number of dependent variables; the iteration ends when PRESS(j) − PRESS(j−1) falls below the preset precision, otherwise latent variables continue to be extracted for iterative computation.
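A sketch of the leave-one-out PRESS statistic of (18); the `fit(X, Y, n_comp)` and `predict(model, X)` callables are hypothetical placeholders for whatever PLS routine is in use:

```python
import numpy as np

def press(X, Y, n_comp, fit, predict):
    """Leave-one-out prediction residual sum of squares for a given
    number of latent variables. `fit` and `predict` are user-supplied
    placeholders (hypothetical interface, not from the patent)."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i              # remove sample point i
        model = fit(X[mask], Y[mask], n_comp)
        yhat = predict(model, X[i:i+1])       # predict the held-out point
        total += float(np.sum((Y[i:i+1] - yhat) ** 2))
    return total
```

One would compute `press(X, Y, j, ...)` for increasing j and stop once PRESS(j) − PRESS(j−1) drops below the preset precision.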
7. The regression modeling method for manufacturing industry big data according to claim 6, wherein in step S4, the following quadratic polynomial regression model is established using the A latent variables obtained from the partial least squares regression analysis:
ŷ = β_0 + Σ_{i=1}^{A} β_i x_i + Σ_{i=1}^{A} β_ii x_i² + Σ_{i<j} β_ij x_i x_j, (19)
wherein β_0, β_i, β_ii and β_ij are all regression coefficients, x ∈ t_j and y ∈ u_j, with t_j and u_j being the latent variables;
and according to the obtained latent variables and their number, with reference to the PRESS statistic, the standardized regression coefficients β measuring the effect of each independent variable on each dependent variable are obtained.
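The quadratic model (19) is linear in its coefficients, so fitting it reduces to least squares on an expanded design matrix. A sketch assuming the A latent scores are collected column-wise in a matrix T (our notation, not the patent's):

```python
import numpy as np

def quadratic_design(T):
    """Expand latent scores T (n, A) into the quadratic polynomial
    basis [1, t_i, t_i^2, t_i*t_j] of the regression model (19)."""
    n, A = T.shape
    cols = [np.ones(n)]
    cols += [T[:, i] for i in range(A)]                  # linear terms
    cols += [T[:, i] ** 2 for i in range(A)]             # squared terms
    cols += [T[:, i] * T[:, j]                           # cross terms
             for i in range(A) for j in range(i + 1, A)]
    return np.column_stack(cols)

def quadratic_fit(T, y):
    """Regression coefficients beta by least squares on the design."""
    Phi = quadratic_design(T)
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return beta
```

With noiseless quadratic data the fit is exact up to floating-point error.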
CN202110295478.1A 2021-03-19 2021-03-19 Regression modeling method for big data of manufacturing industry Active CN113190956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110295478.1A CN113190956B (en) 2021-03-19 2021-03-19 Regression modeling method for big data of manufacturing industry


Publications (2)

Publication Number Publication Date
CN113190956A true CN113190956A (en) 2021-07-30
CN113190956B CN113190956B (en) 2022-11-22

Family

ID=76973537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110295478.1A Active CN113190956B (en) 2021-03-19 2021-03-19 Regression modeling method for big data of manufacturing industry

Country Status (1)

Country Link
CN (1) CN113190956B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123451A (en) * 2014-07-16 2014-10-29 河海大学常州校区 Dredging operation yield prediction model building method based on partial least squares regression
CN104949936A (en) * 2015-07-13 2015-09-30 东北大学 Sample component determination method based on optimizing partial least squares regression model
CN108197380A (en) * 2017-12-29 2018-06-22 南京林业大学 Gauss based on offset minimum binary returns soft-measuring modeling method
CN109492265A (en) * 2018-10-18 2019-03-19 南京林业大学 The kinematic nonlinearity PLS soft-measuring modeling method returned based on Gaussian process
US20200364386A1 (en) * 2019-05-14 2020-11-19 Beijing University Of Technology Soft sensing method and system for difficult-to-measure parameters in complex industrial processes


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FU LINGHUI et al.: "A Comparative Study of Modeling Methods for Polynomial Regression", Application of Statistics and Management (《数理统计与管理》) *
GUO JIANXIAO: "Research on Improved High-Dimensional Nonlinear PLS Regression Methods and Their Application", China Doctoral Dissertations Full-text Database (Economics and Management Sciences) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116137630A (en) * 2023-04-19 2023-05-19 井芯微电子技术(天津)有限公司 Method and device for quantitatively processing network service demands
CN116137630B (en) * 2023-04-19 2023-08-18 井芯微电子技术(天津)有限公司 Method and device for quantitatively processing network service demands

Also Published As

Publication number Publication date
CN113190956B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
Soofi et al. Information distinguishability with application to analysis of failure data
CN113190956B (en) Regression modeling method for big data of manufacturing industry
Atamanyuk et al. Forecasting economic indices of agricultural enterprises based on vector polynomial canonical expansion of random sequences
CN117060401A (en) New energy power prediction method, device, equipment and computer readable storage medium
CN111898653A Supervised dimension reduction method based on robust l1,2-norm constraints
CN111144650A (en) Power load prediction method, device, computer readable storage medium and equipment
Koukoumis et al. On entropy-type measures and divergences with applications in engineering, management and applied sciences
CN116187563A (en) Sea surface temperature space-time intelligent prediction method based on fusion improvement variation modal decomposition
CN113657045B (en) Complex aircraft model reduced order characterization method based on multilayer collaborative Gaussian process
CN113139247B (en) Mechanical structure uncertainty parameter quantification and correlation analysis method
CN115102868A (en) Web service QoS prediction method based on SOM clustering and depth self-encoder
Beyaztas et al. A robust partial least squares approach for function-on-function regression
Sledge et al. An information-theoretic approach for automatically determining the number of state groups when aggregating markov chains
Wang et al. Autonf: Automated architecture optimization of normalizing flows with unconstrained continuous relaxation admitting optimal discrete solution
Akgül et al. Estimation of the location and the scale parameters of Burr Type XII distribution
CN112231933B (en) Feature selection method for radar electromagnetic interference effect analysis
CN113822342B (en) Document classification method and system for security graph convolution network
Anavangot et al. A novel approximate Lloyd-Max quantizer and its analysis
Meng et al. Penalized quasi-likelihood estimation of generalized Pareto regression–consistent identification of risk factors for extreme losses
CN113242425B (en) Optimal distribution method of sampling set for small disturbance band-limited map signal
CN115936136A (en) Data recovery method and system based on low-rank structure
CN115174421B (en) Network fault prediction method and device based on self-supervision unwrapping hypergraph attention
Shim et al. Prediction intervals for LS-SVM regression using the bootstrap
De Vito et al. Unsupervised parameter selection for denoising with the elastic net
CN116432759A (en) Judicial causal Bayesian network construction method based on hierarchical additive noise model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant