CN113190956A - Regression modeling method for big data of manufacturing industry - Google Patents
- Publication number
- CN113190956A (application CN202110295478.1A)
- Authority
- CN
- China
- Prior art keywords
- latent
- data
- variables
- variable
- regression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F30/20: Computer-aided design [CAD]; design optimisation, verification or simulation
- G06F17/16: Complex mathematical operations; matrix or vector computation, e.g. matrix factorization
- G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. regression analysis
- G06Q50/04: ICT specially adapted for business processes of specific sectors; manufacturing
- G06F2111/10: Details relating to CAD techniques; numerical modelling
- Y02P90/30: Climate change mitigation technologies in production; computing systems specially adapted for manufacturing
Abstract
The invention discloses a regression modeling method for manufacturing big data, comprising the following steps: S1, obtaining low-dimensional features suitable for modeling through data preprocessing; S2, converting the low-dimensional data of different business domains into latent-variable form; S3, establishing regression equations among the latent variables through partial least squares regression analysis, computing the latent variables according to the maximum covariance between them, determining the number of latent variables using the predicted residual sum of squares, and thereby realizing simultaneous regression analysis of multiple dependent variables on multiple independent variables; and S4, establishing a quadratic polynomial regression equation between the latent variables to obtain the standard regression coefficient β of each independent variable's effect on each dependent variable, and hence the predicted value for a single business. By building a latent structure model across business domains, the invention mines the influence relations among data from different business domains and links the heterogeneous data of multiple domains, so that single-business modeling performs better and the quality and efficiency of the business are improved.
Description
Technical Field
The invention relates to the technical field of big data analysis and modeling, and in particular to a regression modeling method for manufacturing big data.
Background
Manufacturing is one of the pillar industries of the national economy and an embodiment of a country's modernization and comprehensive national strength. With the continued development of the economy and of science and technology, the volume of data generated by modern manufacturing grows exponentially, so the potential and value of big data have gradually been recognized and accepted by society; the combination of big data and manufacturing is driving a comprehensive transformation of design, management, manufacturing and service modes in the industry. However, such manufacturing data are usually multi-source, heterogeneous and complex, which is one of the main problems a manufacturing enterprise faces when modeling its big data.
Existing big data models target only a single business of a manufacturing enterprise. They do not consider the correlations among businesses, neglect the influence of businesses such as design, management and service on the manufacturing process, and establish no association between the manufacturing business and the other businesses. As a result, the data across the enterprise's businesses are not fully utilized, and the whole manufacturing flow cannot be strictly controlled and reasonably planned.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and to provide a regression modeling method for manufacturing big data.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
A regression modeling method for manufacturing big data, in which the influence relations among the data of different business domains are mined by establishing a latent structure model across the business domains, and the heterogeneous data of multiple business domains are linked; the method specifically comprises the following steps:
S1, performing dimensionality reduction and denoising on the high-dimensional data of different business domains through data preprocessing to obtain low-dimensional features suitable for modeling;
S2, converting the low-dimensional data of the different business domains into latent-variable form;
S3, establishing regression equations among the latent variables through partial least squares regression analysis: the weight coefficients are determined by maximizing the covariance between the latent variables (i.e., maximizing their degree of correlation), the latent variables are computed accordingly, and their number is determined using the predicted residual sum of squares, thereby realizing simultaneous regression analysis of multiple dependent variables on multiple independent variables; and
S4, establishing a quadratic polynomial regression equation between the latent variables to obtain the standard regression coefficient β of each independent variable's effect on each dependent variable, and hence the predicted value for a single business.
Further, in step S1, principal component analysis is used to establish a linear mapping projecting from the high-dimensional space to the low-dimensional space, so as to obtain the projection matrix W; the specific process is as follows:
let Y = [y_1, y_2, ..., y_N] denote the high-dimensional data to be reduced and X = [x_1, x_2, ..., x_N] the low-dimensional data after dimensionality reduction, where N is the number of samples; assume the data noise η_n ∈ R^D follows an independent Gaussian distribution η_n ~ N(0, β⁻¹I), where β⁻¹ is the noise variance and I an identity matrix; the generative mapping between the low-dimensional and the high-dimensional space is expressed as
y_n = W·x_n + η_n,  (1)
where the mapping is determined by the projection matrix W; the likelihood of the high-dimensional data space is then
p(y_n | x_n, W, β) = N(y_n | W x_n, β⁻¹I);  (2)
assuming the data points in the low-dimensional space are independent and identically distributed,
p(x_n) = N(x_n | 0, I);  (3)
integrating out the low-dimensional data points yields the marginal likelihood
p(y_n | W, β) = ∫ p(y_n | x_n, W, β) p(x_n) dx_n = N(y_n | 0, WW^T + β⁻¹I)  (4)
(the data being assumed mean-centered), and the joint likelihood of the high-dimensional data is
p(Y | W, β) = ∏_{n=1}^{N} p(y_n | W, β);  (5)
the projection matrix W is obtained by the maximum likelihood method;
from the obtained projection matrix W, a low-dimensional representation X of the high-dimensional data Y is recovered via formula (1).
Further, in step S1, when obtaining the projection matrix W, the maximum likelihood estimates of the parameters are obtained with the EM algorithm, which specifically comprises the following steps:
(i) Compute the expectation of the complete-data log-likelihood of the data. The complete-data likelihood function is
p(Y, X; W, β) = ∏_{n=1}^{N} p(y_n | x_n, W, β) p(x_n);  (6)
denoting its logarithm by ln p(Y; W), the log-likelihood can be expressed as
ln p(Y; W) = Σ_{n=1}^{N} [ln p(y_n | x_n, W, β) + ln p(x_n)].  (7)
The expected value E{ln p(Y; W)} of ln p(Y; W) is obtained from
E{ln p(Y; W)} = −Σ_{n=1}^{N} [ (D/2) ln(2πβ⁻¹) + (1/2) tr(⟨x_n x_n^T⟩) + (β/2)‖y_n − μ‖² − β⟨x_n⟩^T W^T (y_n − μ) + (β/2) tr(W^T W ⟨x_n x_n^T⟩) ],  (8)
where ⟨·⟩ denotes the posterior expectation over the latent variables, μ denotes the mean of the high-dimensional data Y, D denotes the data dimension, tr(·) denotes the trace, and
⟨x_n⟩ = M⁻¹ W^T (y_n − μ),  (9)
⟨x_n x_n^T⟩ = β⁻¹ M⁻¹ + ⟨x_n⟩⟨x_n⟩^T,
where M = W^T W + β⁻¹ I;
(ii) Maximize the expected value E{ln p(Y; W)} with respect to the projection matrix W, i.e., set the derivative of E{ln p(Y; W)} with respect to W to zero, yielding the updated optimum
W̃ = [Σ_{n=1}^{N} (y_n − μ)⟨x_n⟩^T][Σ_{n=1}^{N} ⟨x_n x_n^T⟩]⁻¹  (10)
and the updated noise variance
β̃⁻¹ = (1/(ND)) Σ_{n=1}^{N} [‖y_n − μ‖² − 2⟨x_n⟩^T W̃^T (y_n − μ) + tr(⟨x_n x_n^T⟩ W̃^T W̃)];  (11)
(iii) Alternate (i) and (ii) until convergence, judged by the difference of E{ln p(Y; W)} between any two consecutive iterations:
‖E{ln p(Y; W)}_{t+1} − E{ln p(Y; W)}_t‖ ≤ ε;  (12)
when this inequality holds, E{ln p(Y; W)} is considered to have reached an extreme point, and the projection matrix W is obtained.
Further, in step S2, the low-dimensional representation X = [x_1, x_2, ..., x_N] of each business domain's data constitutes the data set of the latent structure model, which consists of two parts: an explanatory-variable space denoted X_{n×m} and a response-variable space denoted Y_{n×k}, where n denotes the number of samples and m and k denote the numbers of variables;
the latent variables t_j and u_j (j = 1, 2, ..., A) are computed from t_j = X_j w_j and u_j = Y_j q_j, where A is the number of latent variables and w_j and q_j are the weight vectors that maximize the covariance of t_j and u_j, i.e., that maximize the degree of correlation between t_j and u_j; they satisfy
(w_j, q_j) = arg max Cov(t_j, u_j) = arg max w^T X_j^T Y_j q,  (13)
subject to ‖w_j‖ = ‖q_j‖ = 1.  (14)
further, in the step S3, the objective is to obtain the quantitative relationship between the multiple explanatory variables and the multiple reaction variables by the partial least squares regression analysis, that is, in the explanatory variable space Xn×mAnd reaction variable space Yn×kSeparately looking for linear combinations tjAnd uj(j ═ 1, 2.., a), and maximizes the covariance of the two variable spaces;
the specific process is as follows:
(1) at a latent variable tjAnd ujEstablishing a regression equation:
uj=bjtj+ej (15)
wherein e isjIs an error vector, bjIs an unknown parameter, and bjCan be calculated by the following formula:
carrying out estimation; is provided with
For the predicted values of uj, the matrices X and Y are decomposed into the following outer product form:
in the formula, E and F are residual errors of matrixes X and Y after the latent variables A are extracted respectively;
(2) in partial least squaresIn the analysis process, each pair of latent variables tjAnd uj(j 1, 2.., a) are extracted in turn in an iterative process, then the extracted residual is calculated, and the analysis of the residual of each step is continued until the logarithm of the extraction latent variable is determined according to some criterion.
Further, in step S3, the prediction residual square sum PRESS is used to determine the number of latent variable logarithms to be extracted, i.e. the predicted estimated values of the response variables after 1 sample point is removed are calculated separately in each stepAnd the sum of the squared residuals of the actual observations y:
in the above formula, l is the number of dependent variables until PRESS (j) -PRESS (j-1) is less than the preset precision, the iteration process is ended, otherwise, latent variables are continuously extracted for iterative computation.
Further, in step S4, using the A latent variables obtained from the partial least squares regression analysis, the following quadratic polynomial regression model is established:
y = β₀ + Σ_{i=1}^{A} β_i x_i + Σ_{i=1}^{A} β_ii x_i² + Σ_{i<j} β_ij x_i x_j,  (20)
where β₀, β_i, β_ii and β_ij are all regression coefficients, the inputs x_i are taken from the latent variables t_j and the response y from the latent variables u_j;
from the obtained latent variables and their number, and with reference to the PRESS statistic, the standard regression coefficient β of each independent variable's effect on each dependent variable is obtained.
Compared with the prior art, the principles and advantages of this scheme are as follows:
1. The proposed latent structure model simplifies the data structure while modeling. By analyzing the mutual influence among different businesses in the partial least squares regression analysis, a regression model of multiple dependent variables on multiple independent variables is obtained, which is more effective than regressing the dependent variables one by one, yields more reliable conclusions, and has stronger integrity.
2. The scheme finally obtains the regression coefficients between a single business and the other businesses, mines the influence relations among originally independent business-domain data, and produces a predicted value for the single business. This breaks through the limitation of modeling on a single business domain's data, fully exploits the data value of every business domain, improves the single-business modeling effect, and helps improve the quality and efficiency of the business.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for the embodiments or the prior-art descriptions are briefly introduced below. The drawings in the following description show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a regression modeling method for manufacturing big data according to the present invention;
FIG. 2 is a schematic flow chart of data preprocessing in the regression modeling method for big data in manufacturing industry according to the present invention;
FIG. 3 is a schematic flow chart of latent structure modeling in the regression modeling method for big data in manufacturing industry according to the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples:
In this regression modeling method for manufacturing big data, the influence relations among the data of different business domains are mined by establishing a latent structure model across the business domains, and the heterogeneous data of multiple business domains are linked.
as shown in fig. 1, the method specifically comprises the following steps:
s1, performing dimensionality reduction and denoising on high-dimensional data of different service domains through data preprocessing to obtain low-dimensional features suitable for modeling;
In this step, principal component analysis is used to establish a linear mapping projecting from the high-dimensional space to the low-dimensional space, yielding the projection matrix W;
as shown in fig. 2, the specific process is as follows:
Let Y = [y_1, y_2, ..., y_N] denote the high-dimensional data to be reduced and X = [x_1, x_2, ..., x_N] the low-dimensional data after dimensionality reduction, where N is the number of samples; assume the data noise η_n ∈ R^D follows an independent Gaussian distribution η_n ~ N(0, β⁻¹I), where β⁻¹ is the noise variance and I an identity matrix; the generative mapping between the low-dimensional and the high-dimensional space is expressed as
y_n = W·x_n + η_n,  (1)
where the mapping is determined by the projection matrix W; the likelihood of the high-dimensional data space is then
p(y_n | x_n, W, β) = N(y_n | W x_n, β⁻¹I);  (2)
assuming the data points in the low-dimensional space are independent and identically distributed,
p(x_n) = N(x_n | 0, I);  (3)
integrating out the low-dimensional data points yields the marginal likelihood
p(y_n | W, β) = ∫ p(y_n | x_n, W, β) p(x_n) dx_n = N(y_n | 0, WW^T + β⁻¹I)  (4)
(the data being assumed mean-centered), and the joint likelihood of the high-dimensional data is
p(Y | W, β) = ∏_{n=1}^{N} p(y_n | W, β);  (5)
the projection matrix W is obtained by the maximum likelihood method.
In the present embodiment, the EM (Expectation-Maximization) algorithm is used to obtain the maximum likelihood estimates of the parameters, comprising the following steps:
(i) Compute the expectation of the complete-data log-likelihood of the data. The complete-data likelihood function is
p(Y, X; W, β) = ∏_{n=1}^{N} p(y_n | x_n, W, β) p(x_n);  (6)
denoting its logarithm by ln p(Y; W), the log-likelihood can be expressed as
ln p(Y; W) = Σ_{n=1}^{N} [ln p(y_n | x_n, W, β) + ln p(x_n)].  (7)
The expected value E{ln p(Y; W)} of ln p(Y; W) is obtained from
E{ln p(Y; W)} = −Σ_{n=1}^{N} [ (D/2) ln(2πβ⁻¹) + (1/2) tr(⟨x_n x_n^T⟩) + (β/2)‖y_n − μ‖² − β⟨x_n⟩^T W^T (y_n − μ) + (β/2) tr(W^T W ⟨x_n x_n^T⟩) ],  (8)
where ⟨·⟩ denotes the posterior expectation over the latent variables, μ denotes the mean of the high-dimensional data Y, D denotes the data dimension, tr(·) denotes the trace, and
⟨x_n⟩ = M⁻¹ W^T (y_n − μ),  (9)
⟨x_n x_n^T⟩ = β⁻¹ M⁻¹ + ⟨x_n⟩⟨x_n⟩^T,
where M = W^T W + β⁻¹ I;
(ii) Maximize the expected value E{ln p(Y; W)} with respect to the projection matrix W, i.e., set the derivative of E{ln p(Y; W)} with respect to W to zero, yielding the updated optimum
W̃ = [Σ_{n=1}^{N} (y_n − μ)⟨x_n⟩^T][Σ_{n=1}^{N} ⟨x_n x_n^T⟩]⁻¹  (10)
and the updated noise variance
β̃⁻¹ = (1/(ND)) Σ_{n=1}^{N} [‖y_n − μ‖² − 2⟨x_n⟩^T W̃^T (y_n − μ) + tr(⟨x_n x_n^T⟩ W̃^T W̃)];  (11)
(iii) Alternate (i) and (ii) until convergence, judged by the difference of E{ln p(Y; W)} between any two consecutive iterations:
‖E{ln p(Y; W)}_{t+1} − E{ln p(Y; W)}_t‖ ≤ ε;  (12)
when this inequality holds, E{ln p(Y; W)} is considered to have reached an extreme point, and the projection matrix W is obtained.
From the obtained projection matrix W, a low-dimensional representation X of the high-dimensional data Y is obtained by formula (1).
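As a concrete illustration of steps (i)-(iii), the EM iteration for this probabilistic PCA model can be sketched in a few lines of numpy. This is a minimal sketch under the stated Gaussian assumptions, not the patented implementation; the function name `ppca_em`, the random initialization, and the convergence test on the noise variance are choices made for the example (`sigma2` plays the role of the noise variance β⁻¹):

```python
import numpy as np

def ppca_em(Y, d, n_iter=200, tol=1e-8):
    """EM for probabilistic PCA: Y (N, D) -> projection W (D, d), mean mu,
    noise variance sigma2 (= beta^-1), and latent representation X (N, d)."""
    N, D = Y.shape
    mu = Y.mean(axis=0)
    Yc = Y - mu
    rng = np.random.default_rng(0)
    W = rng.normal(size=(D, d))        # illustrative random start
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments <x_n> and sum of <x_n x_n^T>, cf. eq. (9)
        M = W.T @ W + sigma2 * np.eye(d)
        Minv = np.linalg.inv(M)
        X = Yc @ W @ Minv                   # rows are <x_n>
        Sxx = N * sigma2 * Minv + X.T @ X   # sum_n <x_n x_n^T>
        # M-step: updates corresponding to eqs. (10)-(11)
        W_new = Yc.T @ X @ np.linalg.inv(Sxx)
        sigma2_new = (np.sum(Yc**2)
                      - 2.0 * np.sum((Yc @ W_new) * X)
                      + np.trace(Sxx @ W_new.T @ W_new)) / (N * D)
        converged = abs(sigma2_new - sigma2) < tol
        W, sigma2 = W_new, sigma2_new
        if converged:
            break
    # final low-dimensional representation via the posterior mean
    X = Yc @ W @ np.linalg.inv(W.T @ W + sigma2 * np.eye(d))
    return W, mu, sigma2, X
```

Reconstructing `mu + X @ W.T` then recovers the data up to roughly the noise level, which is how the low-dimensional representation of step S1 would feed step S2.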
S2, converting low-dimensional data of different service domains into latent variable forms;
In this step, the low-dimensional representation X = [x_1, x_2, ..., x_N] of each business domain's data constitutes the data set of the latent structure model, which consists of two parts: an explanatory-variable space denoted X_{n×m} and a response-variable space denoted Y_{n×k}, where n denotes the number of samples and m and k denote the numbers of variables;
the latent variables t_j and u_j (j = 1, 2, ..., A) are computed from t_j = X_j w_j and u_j = Y_j q_j, where A is the number of latent variables and w_j and q_j are the weight vectors that maximize the covariance of t_j and u_j, i.e., that maximize the degree of correlation between t_j and u_j; they satisfy
(w_j, q_j) = arg max Cov(t_j, u_j) = arg max w^T X_j^T Y_j q,  (13)
subject to ‖w_j‖ = ‖q_j‖ = 1.  (14)
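The maximal-covariance condition on the weight vectors above has a closed-form solution: the optimal unit-norm w and q are the leading left and right singular vectors of XᵀY. A small numpy sketch of this standard identity (the function name is illustrative):

```python
import numpy as np

def first_latent_pair(X, Y):
    """Return scores t = Xw, u = Yq and unit-norm weights (w, q)
    maximizing w^T X^T Y q, i.e. the covariance of the latent pair."""
    U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    w = U[:, 0]    # leading left singular vector
    q = Vt[0, :]   # leading right singular vector
    return X @ w, Y @ q, w, q
```

By construction, tᵀu equals the largest singular value of XᵀY, so no other unit-norm weight pair can achieve a larger covariance.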
S3, establishing regression equations among the latent variables through partial least squares regression analysis: the weight coefficients are determined by maximizing the covariance between the latent variables (i.e., maximizing their degree of correlation), the latent variables are computed accordingly, and their number is determined using the predicted residual sum of squares, thereby realizing simultaneous regression analysis of multiple dependent variables on multiple independent variables, as shown in fig. 3.
This step is based on partial least squares regression analysis, whose objective is to obtain the quantitative relationship between multiple explanatory variables and multiple response variables, i.e., to find linear combinations t_j and u_j (j = 1, 2, ..., A) in the explanatory-variable space X_{n×m} and the response-variable space Y_{n×k} respectively, such that the covariance of the two variable spaces is maximized;
the specific process is as follows:
(1) establish a regression equation between the latent variables t_j and u_j:
u_j = b_j t_j + e_j,  (15)
where e_j is an error vector and b_j is an unknown parameter estimated by least squares as
b_j = (t_j^T t_j)⁻¹ t_j^T u_j;  (16)
let û_j = b_j t_j denote the predicted value of u_j; the matrices X and Y are decomposed into the following outer-product form:
X = Σ_{j=1}^{A} t_j p_j^T + E,  (17)
Y = Σ_{j=1}^{A} u_j q_j^T + F,  (18)
where p_j and q_j are loading vectors and E and F are the residuals of the matrices X and Y after the A latent variables have been extracted;
(2) in the partial least squares regression analysis, each pair of latent variables t_j and u_j (j = 1, 2, ..., A) is extracted in turn in an iterative process, the residuals after extraction are computed, and the residual analysis of each step continues until the number of latent-variable pairs to extract is determined according to some criterion.
As mentioned above, the predicted residual sum of squares PRESS (Predicted Residual Sum of Squares) is used to determine the number of latent-variable pairs to extract: at each step, removing one sample point at a time, the predicted estimate ŷ_{ih(−i)} of the response variables is computed separately, and its residual sum of squares against the actual observations y is
PRESS(j) = Σ_{i=1}^{n} Σ_{h=1}^{l} (y_{ih} − ŷ_{ih(−i)})²,  (19)
where l is the number of dependent variables; the iteration ends when PRESS(j) − PRESS(j−1) is smaller than a preset precision, otherwise latent variables continue to be extracted for iterative computation.
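The PRESS criterion can be checked with a compact leave-one-out loop around any PLS fit. The sketch below uses SVD-based maximal-covariance weights instead of the NIPALS iteration (an equivalent standard formulation); `fit_pls`, `press`, and the pre-centering assumption are all choices made for this example:

```python
import numpy as np

def fit_pls(X, Y, a):
    """Fit an a-component PLS model on centered X, Y; return a predictor."""
    Xc, Yc = X.copy(), Y.copy()
    Ws, Ps, Qs = [], [], []
    for _ in range(a):
        # weight = dominant left singular vector of Xc^T Yc (max covariance)
        u_svd, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
        w = u_svd[:, [0]]
        t = Xc @ w
        p = Xc.T @ t / float(t.T @ t)   # X loadings
        q = Yc.T @ t / float(t.T @ t)   # Y loadings
        Xc = Xc - t @ p.T               # deflate both blocks
        Yc = Yc - t @ q.T
        Ws.append(w); Ps.append(p); Qs.append(q)
    W, P, Q = (np.hstack(M) for M in (Ws, Ps, Qs))
    Bcoef = W @ np.linalg.inv(P.T @ W) @ Q.T   # regression coefficients
    return lambda Xnew: Xnew @ Bcoef

def press(X, Y, a):
    """Leave-one-out predicted residual sum of squares, cf. eq. (19)."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        predict = fit_pls(X[keep], Y[keep], a)
        total += float(np.sum((Y[i] - predict(X[i:i + 1])) ** 2))
    return total
```

Components would then be added one at a time until PRESS stops improving by more than the preset precision.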
And S4, finally, establishing a quadratic polynomial regression equation among the latent variables to obtain the standard regression coefficient β of each independent variable's effect on each dependent variable, and hence the predicted value for a single business.
The method specifically comprises the following steps:
According to the partial least squares regression analysis, the A latent variables are used to establish the following quadratic polynomial regression model:
y = β₀ + Σ_{i=1}^{A} β_i x_i + Σ_{i=1}^{A} β_ii x_i² + Σ_{i<j} β_ij x_i x_j,  (20)
where β₀, β_i, β_ii and β_ij are all regression coefficients, the inputs x_i are taken from the latent variables t_j and the response y from the latent variables u_j;
from the obtained latent variables and their number, and with reference to the PRESS statistic, the standard regression coefficient β of each independent variable's effect on each dependent variable is obtained.
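The quadratic polynomial model of step S4 is an ordinary least squares problem once a design matrix of constant, linear, squared and cross terms is built from the latent scores. A hedged sketch (`quadratic_design` and `fit_quadratic` are illustrative names, not from the patent):

```python
import numpy as np

def quadratic_design(T):
    """Design matrix with columns 1, t_i, t_i^2, t_i*t_j (i < j)
    built from a latent-score matrix T of shape (n, A)."""
    n, A = T.shape
    cols = [np.ones(n)]
    cols += [T[:, i] for i in range(A)]           # linear terms
    cols += [T[:, i] ** 2 for i in range(A)]      # squared terms
    cols += [T[:, i] * T[:, j]                    # cross terms
             for i in range(A) for j in range(i + 1, A)]
    return np.column_stack(cols)

def fit_quadratic(T, u):
    """Least-squares estimate of the coefficients beta in eq. (20)."""
    Z = quadratic_design(T)
    beta, *_ = np.linalg.lstsq(Z, u, rcond=None)
    return beta
```

The entries of `beta` are, in order, β₀, the β_i, the β_ii, and the β_ij of the quadratic model.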
The embodiments described above are merely preferred embodiments of the invention, and the scope of the invention is not limited to them; variations based on the form and principle of the invention shall fall within its scope of protection.
Claims (7)
1. A regression modeling method for manufacturing big data, characterized in that the influence relations among the data of different business domains are mined by establishing a latent structure model across the business domains, and the heterogeneous data of multiple business domains are linked; the method specifically comprises the following steps:
S1, performing dimensionality reduction and denoising on the high-dimensional data of different business domains through data preprocessing to obtain low-dimensional features suitable for modeling;
S2, converting the low-dimensional data of the different business domains into latent-variable form;
S3, establishing regression equations among the latent variables through partial least squares regression analysis: the weight coefficients are determined by maximizing the covariance between the latent variables (i.e., maximizing their degree of correlation), the latent variables are computed accordingly, and their number is determined using the predicted residual sum of squares, thereby realizing simultaneous regression analysis of multiple dependent variables on multiple independent variables; and
S4, establishing a quadratic polynomial regression equation between the latent variables to obtain the standard regression coefficient β of each independent variable's effect on each dependent variable, and hence the predicted value for a single business.
2. The regression modeling method for manufacturing big data according to claim 1, wherein in step S1 principal component analysis is used to establish a linear mapping projecting from the high-dimensional space to the low-dimensional space, so as to obtain the projection matrix W; the specific process is as follows:
let Y = [y_1, y_2, ..., y_N] denote the high-dimensional data to be reduced and X = [x_1, x_2, ..., x_N] the low-dimensional data after dimensionality reduction, where N is the number of samples; assume the data noise η_n ∈ R^D follows an independent Gaussian distribution η_n ~ N(0, β⁻¹I), where β⁻¹ is the noise variance and I an identity matrix; the generative mapping between the low-dimensional and the high-dimensional space is expressed as
y_n = W·x_n + η_n,  (1)
where the mapping is determined by the projection matrix W; the likelihood of the high-dimensional data space is then
p(y_n | x_n, W, β) = N(y_n | W x_n, β⁻¹I);  (2)
assuming the data points in the low-dimensional space are independent and identically distributed,
p(x_n) = N(x_n | 0, I);  (3)
integrating out the low-dimensional data points yields the marginal likelihood
p(y_n | W, β) = ∫ p(y_n | x_n, W, β) p(x_n) dx_n = N(y_n | 0, WW^T + β⁻¹I)  (4)
(the data being assumed mean-centered), and the joint likelihood of the high-dimensional data is
p(Y | W, β) = ∏_{n=1}^{N} p(y_n | W, β);  (5)
the projection matrix W is obtained by the maximum likelihood method;
from the obtained projection matrix W, a low-dimensional representation X of the high-dimensional data Y is recovered via formula (1).
3. The regression modeling method for manufacturing big data according to claim 2, wherein in step S1, when obtaining the projection matrix W, the maximum likelihood estimates of the parameters are obtained with the EM algorithm, specifically comprising the following steps:
(i) Compute the expectation of the complete-data log-likelihood of the data. The complete-data likelihood function is
p(Y, X; W, β) = ∏_{n=1}^{N} p(y_n | x_n, W, β) p(x_n);  (6)
denoting its logarithm by ln p(Y; W), the log-likelihood can be expressed as
ln p(Y; W) = Σ_{n=1}^{N} [ln p(y_n | x_n, W, β) + ln p(x_n)].  (7)
The expected value E{ln p(Y; W)} of ln p(Y; W) is obtained from
E{ln p(Y; W)} = −Σ_{n=1}^{N} [ (D/2) ln(2πβ⁻¹) + (1/2) tr(⟨x_n x_n^T⟩) + (β/2)‖y_n − μ‖² − β⟨x_n⟩^T W^T (y_n − μ) + (β/2) tr(W^T W ⟨x_n x_n^T⟩) ],  (8)
where ⟨·⟩ denotes the posterior expectation over the latent variables, μ denotes the mean of the high-dimensional data Y, D denotes the data dimension, tr(·) denotes the trace, and
⟨x_n⟩ = M⁻¹ W^T (y_n − μ),  (9)
⟨x_n x_n^T⟩ = β⁻¹ M⁻¹ + ⟨x_n⟩⟨x_n⟩^T,
where M = W^T W + β⁻¹ I;
(ii) Maximize the expected value E{ln p(Y; W)} with respect to the projection matrix W, i.e., set the derivative of E{ln p(Y; W)} with respect to W to zero, yielding the updated optimum
W̃ = [Σ_{n=1}^{N} (y_n − μ)⟨x_n⟩^T][Σ_{n=1}^{N} ⟨x_n x_n^T⟩]⁻¹  (10)
and the updated noise variance
β̃⁻¹ = (1/(ND)) Σ_{n=1}^{N} [‖y_n − μ‖² − 2⟨x_n⟩^T W̃^T (y_n − μ) + tr(⟨x_n x_n^T⟩ W̃^T W̃)];  (11)
(iii) Alternate (i) and (ii) until convergence, judged by the difference of E{ln p(Y; W)} between any two consecutive iterations:
‖E{ln p(Y; W)}_{t+1} − E{ln p(Y; W)}_t‖ ≤ ε;  (12)
when this inequality holds, E{ln p(Y; W)} is considered to have reached an extreme point, and the projection matrix W is obtained.
4. The regression modeling method for manufacturing big data according to claim 1, wherein in step S2 the low-dimensional representation X = [x_1, x_2, ..., x_N] of each business domain's data constitutes the data set of the latent structure model, which consists of two parts: an explanatory-variable space denoted X_{n×m} and a response-variable space denoted Y_{n×k}, where n denotes the number of samples and m and k denote the numbers of variables;
the latent variables t_j and u_j (j = 1, 2, ..., A) are computed from t_j = X_j w_j and u_j = Y_j q_j, where A is the number of latent variables and w_j and q_j are the weight vectors that maximize the covariance of t_j and u_j, i.e., that maximize the degree of correlation between t_j and u_j; they satisfy
(w_j, q_j) = arg max Cov(t_j, u_j) = arg max w^T X_j^T Y_j q,  (13)
subject to ‖w_j‖ = ‖q_j‖ = 1.  (14)
5. the regression modeling method for manufacturing industry big data as claimed in claim 4, wherein in said step S3, the objective is to obtain the quantitative relationship between multiple explanatory variables and multiple reaction variables by partial least squares regression analysis, that is, in the explanatory variable space Xn×mAnd reaction variable space Yn×kSeparately looking for linear combinations tjAnd uj(j ═ 1, 2.., a), and maximizes the covariance of the two variable spaces;
the specific process is as follows:
(1) at a latent variable tjAnd ujEstablishing a regression equation:
uj=bjtj+ej (15)
wherein e isjIs an error vector, bjIs an unknown parameter, and bjCan be calculated by the following formula:
carrying out estimation; is provided with
Is ujThe matrices X and Y are decomposed into the following outer product form:
where E and F are the residuals of the matrices X and Y, respectively, after the A latent variables have been extracted;
(2) In the partial least squares regression analysis, the pairs of latent variables t_j and u_j (j = 1, 2, ..., A) are extracted in turn in an iterative process; the residual after each extraction is then calculated, and the residual analysis of each step continues until the number of latent-variable pairs to extract is determined according to some criterion.
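The extract-then-deflate iteration described above can be sketched as follows. This is a standard SVD/NIPALS-style implementation assumed for illustration, not code from the patent; it realizes the outer product decomposition X = TPᵀ + E, and deflates Y through the inner relation u_j = b_j t_j + e_j of formula (15):

```python
import numpy as np

def pls_extract(X, Y, A):
    """Extract A pairs of latent variables (t_j, u_j) in turn, deflating
    the residual matrices E and F after each step."""
    E, F = X.copy(), Y.copy()
    T, U = [], []
    for _ in range(A):
        Uw, s, Vt = np.linalg.svd(E.T @ F, full_matrices=False)
        w, q = Uw[:, 0], Vt[0, :]        # weight vectors for this pair
        t, u = E @ w, F @ q              # latent variables t_j, u_j
        p = E.T @ t / (t @ t)            # X-loading vector
        b = (t @ u) / (t @ t)            # inner regression u_j = b_j t_j + e_j
        E = E - np.outer(t, p)           # deflate X residual
        F = F - b * np.outer(t, q)       # deflate Y residual
        T.append(t); U.append(u)
    return np.column_stack(T), np.column_stack(U), E, F

# Simulated, mean-centered data (illustrative assumption)
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4)); X -= X.mean(axis=0)
Y = rng.standard_normal((30, 2)); Y -= Y.mean(axis=0)
T, U, E, F = pls_extract(X, Y, 2)
```

Deflating E with the loading p makes successive score vectors t_j mutually orthogonal, and each Y-deflation can only shrink the Frobenius norm of the residual F.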
6. The method of claim 5, wherein in step S3, the number of pairs of latent variables to be extracted is determined using the prediction residual sum of squares (PRESS), i.e., at each step the predicted estimate of the response variable after removing 1 sample point and the sum of squared residuals against the actual observations y are calculated separately:
In the above formula, l is the number of dependent variables. When PRESS(j) − PRESS(j−1) is less than the preset precision, the iterative process ends; otherwise, latent variables continue to be extracted for iterative computation.
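The leave-one-out PRESS computation can be sketched as below. The ordinary least-squares model used here is an illustrative stand-in for the patent's latent-variable regression; in practice one would evaluate PRESS(j) for successive numbers of latent-variable pairs and stop once PRESS(j) − PRESS(j−1) falls below the preset precision:

```python
import numpy as np

def press(fit, predict, X, y):
    """Leave-one-out PRESS: remove each sample point in turn, refit the
    model, predict the removed point, and accumulate the squared
    residual against the actual observation."""
    n = len(y)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        params = fit(X[mask], y[mask])
        total += float((y[i] - predict(params, X[i])) ** 2)
    return total

# Illustrative model: ordinary least squares on a single predictor.
ols_fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
ols_predict = lambda beta, x: x @ beta

# Exactly linear toy data, so the LOO predictions are exact and PRESS ~ 0.
X = np.arange(1.0, 6.0).reshape(-1, 1)
y = 2.0 * X[:, 0]
press_val = press(ols_fit, ols_predict, X, y)
```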
7. The regression modeling method for manufacturing industry big data according to claim 6, wherein in step S4, the following quadratic polynomial regression model is established using the A latent variables obtained from the partial least squares regression analysis:
where β_0, β_i, β_ii and β_ij are all regression coefficients, x ∈ t_j, y ∈ u_j, and t_j and u_j are latent variables;
According to the obtained latent variables and their number, and with reference to the PRESS statistic, the standardized regression coefficient β of the effect of each independent variable on each dependent variable is obtained.
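A quadratic polynomial model of this form (intercept, linear terms β_i, squared terms β_ii, and pairwise interactions β_ij in the latent variables) can be fitted by least squares as sketched below; the construction is an illustrative assumption, not the patent's implementation:

```python
import numpy as np
from itertools import combinations

def quadratic_design(T):
    """Design matrix of the quadratic polynomial model in the latent
    scores T: intercept, linear, squared, and interaction columns."""
    n, A = T.shape
    cols = [np.ones(n)]                                   # beta_0
    cols += [T[:, i] for i in range(A)]                   # beta_i terms
    cols += [T[:, i] ** 2 for i in range(A)]              # beta_ii terms
    cols += [T[:, i] * T[:, j]
             for i, j in combinations(range(A), 2)]       # beta_ij terms
    return np.column_stack(cols)

def fit_quadratic(T, y):
    """Least-squares estimates of the regression coefficients."""
    Z = quadratic_design(T)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

# Toy latent scores and a response generated by a known quadratic model
# (illustrative assumption), so the fit should reproduce y exactly.
rng = np.random.default_rng(2)
T = rng.standard_normal((20, 2))
y = 1 + 2 * T[:, 0] + 3 * T[:, 0] ** 2 + 4 * T[:, 0] * T[:, 1]
beta = fit_quadratic(T, y)
```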
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110295478.1A CN113190956B (en) | 2021-03-19 | 2021-03-19 | Regression modeling method for big data of manufacturing industry |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113190956A true CN113190956A (en) | 2021-07-30 |
CN113190956B CN113190956B (en) | 2022-11-22 |
Family
ID=76973537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110295478.1A Active CN113190956B (en) | 2021-03-19 | 2021-03-19 | Regression modeling method for big data of manufacturing industry |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113190956B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104123451A (en) * | 2014-07-16 | 2014-10-29 | 河海大学常州校区 | Dredging operation yield prediction model building method based on partial least squares regression |
CN104949936A (en) * | 2015-07-13 | 2015-09-30 | 东北大学 | Sample component determination method based on optimizing partial least squares regression model |
CN108197380A (en) * | 2017-12-29 | 2018-06-22 | 南京林业大学 | Gauss based on offset minimum binary returns soft-measuring modeling method |
CN109492265A (en) * | 2018-10-18 | 2019-03-19 | 南京林业大学 | The kinematic nonlinearity PLS soft-measuring modeling method returned based on Gaussian process |
US20200364386A1 (en) * | 2019-05-14 | 2020-11-19 | Beijing University Of Technology | Soft sensing method and system for difficult-to-measure parameters in complex industrial processes |
Non-Patent Citations (2)
Title |
---|
FU LINGHUI et al., "A Comparative Study of Modeling Methods for Polynomial Regression", 《数理统计与管理》 (Journal of Applied Statistics and Management) * |
GUO JIANXIAO, "Research on Improved High-Dimensional Nonlinear PLS Regression Methods and Their Applications", 《中国博士学位论文全文数据库 (经济与管理科学辑)》 (China Doctoral Dissertations Full-text Database, Economics and Management Sciences) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116137630A (en) * | 2023-04-19 | 2023-05-19 | 井芯微电子技术(天津)有限公司 | Method and device for quantitatively processing network service demands |
CN116137630B (en) * | 2023-04-19 | 2023-08-18 | 井芯微电子技术(天津)有限公司 | Method and device for quantitatively processing network service demands |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Soofi et al. | Information distinguishability with application to analysis of failure data | |
CN113190956B (en) | Regression modeling method for big data of manufacturing industry | |
Atamanyuk et al. | Forecasting economic indices of agricultural enterprises based on vector polynomial canonical expansion of random sequences | |
CN117060401A (en) | New energy power prediction method, device, equipment and computer readable storage medium | |
CN111898653A (en) | Based on robustness l1,2Norm constrained supervised dimension reduction method | |
CN111144650A (en) | Power load prediction method, device, computer readable storage medium and equipment | |
Koukoumis et al. | On entropy-type measures and divergences with applications in engineering, management and applied sciences | |
CN116187563A (en) | Sea surface temperature space-time intelligent prediction method based on fusion improvement variation modal decomposition | |
CN113657045B (en) | Complex aircraft model reduced order characterization method based on multilayer collaborative Gaussian process | |
CN113139247B (en) | Mechanical structure uncertainty parameter quantification and correlation analysis method | |
CN115102868A (en) | Web service QoS prediction method based on SOM clustering and depth self-encoder | |
Beyaztas et al. | A robust partial least squares approach for function-on-function regression | |
Sledge et al. | An information-theoretic approach for automatically determining the number of state groups when aggregating markov chains | |
Wang et al. | Autonf: Automated architecture optimization of normalizing flows with unconstrained continuous relaxation admitting optimal discrete solution | |
Akgül et al. | Estimation of the location and the scale parameters of Burr Type XII distribution | |
CN112231933B (en) | Feature selection method for radar electromagnetic interference effect analysis | |
CN113822342B (en) | Document classification method and system for security graph convolution network | |
Anavangot et al. | A novel approximate Lloyd-Max quantizer and its analysis | |
Meng et al. | Penalized quasi-likelihood estimation of generalized Pareto regression–consistent identification of risk factors for extreme losses | |
CN113242425B (en) | Optimal distribution method of sampling set for small disturbance band-limited map signal | |
CN115936136A (en) | Data recovery method and system based on low-rank structure | |
CN115174421B (en) | Network fault prediction method and device based on self-supervision unwrapping hypergraph attention | |
Shim et al. | Prediction intervals for LS-SVM regression using the bootstrap | |
De Vito et al. | Unsupervised parameter selection for denoising with the elastic net | |
CN116432759A (en) | Judicial causal Bayesian network construction method based on hierarchical additive noise model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||