CN110728024B - Vine copula-based soft measurement method and system - Google Patents


Info

Publication number
CN110728024B
Authority
CN
China
Prior art keywords
copula
variable
predicted
value
variables
Legal status
Active
Application number
CN201910869240.8A
Other languages
Chinese (zh)
Other versions
CN110728024A (en)
Inventor
李绍军
蔡俊
周洋
倪佳能
Current Assignee
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Application filed by East China University of Science and Technology
Priority to CN201910869240.8A
Publication of CN110728024A
Application granted
Publication of CN110728024B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention provides a vine copula-based soft measurement method and system. The method comprises the following steps: selecting suitable auxiliary variables for the soft measurement model according to actual industrial production conditions and expert knowledge; standardizing and monotonically transforming the training data to obtain transformed data suitable for copula modeling; performing correlation modeling with a D-vine copula to obtain the joint probability density function of the auxiliary variables and the target variable of the training samples; collecting the auxiliary variables of a sample to be predicted online, then standardizing and monotonically transforming them; calculating the copula function values between the processed auxiliary variables of the sample to be predicted and the target variables of all training samples, and from these calculating the weight of each training sample; and linearly weighting the target variables of the training samples with the calculated weights to obtain the standardized predicted value of the target variable of the sample to be predicted, which is then inverse-transformed to obtain the final predicted value.

Description

Vine copula-based soft measurement method and system
Technical Field
The invention belongs to the technical field of soft measurement, and in particular relates to a soft measurement method based on a vine copula correlation description. The invention also relates to a soft measurement system based on the vine copula correlation description.
Background
With the advent of Industry 4.0, competition among domestic and foreign manufacturing industries is becoming more intense. Industrial production now faces increasingly strict requirements on product quality, manufacturing cost, and energy consumption. To reduce product costs, enterprises are moving toward more complex, larger-scale, and more intelligent production. Obtaining key information on the quality indices of process objects in a timely manner therefore plays an important role in industrial development. However, online measurement of some important process indices is hampered by factors such as harsh operating environments and outdated detection technology. These indices must therefore be supplemented by manual offline analysis, which introduces serious time lags as well as unpredictable mistakes and errors. Soft measurement techniques have been developed to solve these problems.
At present, most multivariate statistical soft measurement methods rely mainly on dimension reduction and decoupling (for example PCA, PLS, and ICA). However, when the process is highly nonlinear and non-Gaussian, these methods often lose significant information, which directly degrades the final soft measurement performance. The invention therefore introduces copula theory directly and models high-dimensional data by describing their complex correlation structure. A more accurate statistical model can substantially improve soft measurement performance for complex chemical processes.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a soft measurement method based on a vine copula correlation description. The method avoids the information loss caused by the traditional dimension-reduction approach and enables prediction of key variables in multi-modal, nonlinear, non-Gaussian complex chemical processes.
In addition, the invention provides a soft measurement system based on the vine copula correlation description. The system likewise overcomes the information loss caused by the traditional dimension-reduction approach and enables prediction of key variables in nonlinear, non-Gaussian complex chemical processes.
To solve the above technical problems, the invention adopts the following technical scheme:
a soft measurement method based on a vine copula correlation description, comprising the following steps:
Step S1: selecting suitable auxiliary variables for the soft measurement model according to actual industrial production conditions and expert knowledge;
Step S2: standardizing and monotonically transforming the training data to obtain transformed data suitable for copula modeling;
Step S3: performing correlation modeling with the D-vine copula to obtain the joint probability density function of the auxiliary variables and the target variable of the training samples;
Step S4: collecting the auxiliary variables of a sample to be predicted online, then standardizing and monotonically transforming them;
Step S5: calculating the copula function values between the processed auxiliary variables of the sample to be predicted and the target variables of all training samples, and from these calculating the weight of each training sample;
Step S6: according to the weights calculated in step S5, linearly weighting the target variables of the training samples to obtain the standardized predicted value of the target variable of the sample to be predicted, and then applying the inverse transformation to obtain the final predicted value.
Further, step S2 obtains the monotonically transformed data through the following sub-steps:
Step 2.1: zero-mean standardization of the original data, see equation (1):
$$X_i' = \frac{X_i - u(X_i)}{\sqrt{\operatorname{var}(X_i)}} \tag{1}$$
where
$X_i$ is the variable to be transformed,
$X_i'$ is the zero-mean standardized variable,
$u(X_i)$ is the mean of $X_i$, and
$\operatorname{var}(X_i)$ is the variance of $X_i$;
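Equation (1) can be sketched in a few lines of Python. The sketch assumes that the mean and variance are computed from the training samples only and then reused for new samples. It also assumes the population variance is used, since the source does not state which variance estimator is applied. The helper names `fit_standardizer` and `standardize` are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def fit_standardizer(column):
    """Return the training mean u(X) and standard deviation sqrt(var(X)) of one variable."""
    mu = fmean(column)
    sigma = sqrt(pvariance(column, mu))  # population variance (an assumption)
    return mu, sigma


def standardize(column, mu, sigma):
    """Equation (1): X' = (X - u(X)) / sqrt(var(X))."""
    return [(x - mu) / sigma for x in column]


# Toy values for illustration only (not data from the embodiments).
train_x = [840.0, 845.0, 850.0, 855.0, 860.0]
mu, sigma = fit_standardizer(train_x)
print(mu, round(sigma, 4))
print([round(v, 4) for v in standardize(train_x, mu, sigma)])
```

Keeping `fit_standardizer` separate from `standardize` mirrors step S4, where the training statistics are applied unchanged to the sample to be predicted.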
Step 2.2: define the form of the monotonic transformation, see equation (2):
$$Z_i = (1-\alpha_i)X_i' + \alpha_i X_r', \quad i = 1, 2, \ldots, d \tag{2}$$
where
$Z_i$ is the variable after the monotonic transformation,
$X_r'$ is the reference variable, and
$\alpha_i$ is the corresponding monotonic transformation coefficient;
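Step 2.3: determine the monotonic transformation coefficients, see equation (3). Because $X_i'$ and $X_r'$ both have unit variance, the correlation between $X_r'$ and $Z_i$ is fixed by $\alpha_i$, and $\alpha_i$ is chosen so that it equals $\rho_m$:
$$\rho(X_r', Z_i) = \frac{(1-\alpha_i)\rho_{i,0} + \alpha_i}{\sqrt{(1-\alpha_i)^2 + \alpha_i^2 + 2\alpha_i(1-\alpha_i)\rho_{i,0}}} = \rho_m \tag{3}$$
where
$\rho_{i,0} = \operatorname{Cov}(X_r', X_i') = \rho(X_r', X_i')$, with $\rho(X_r', X_i')$ denoting the Pearson correlation coefficient between $X_r'$ and $X_i'$, and
$\rho_m$ is a hyperparameter giving the desired value of $\rho(X_r', Z_i)$, which ensures that $X_r'$ and $Z_i$ satisfy a monotonically increasing relationship.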
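The following is a minimal sketch of steps 2.2 and 2.3. It assumes that each $\alpha_i$ is chosen so that the Pearson correlation between the standardized reference variable $X_r'$ and $Z_i$ reaches the hyperparameter $\rho_m$, and that both inputs have unit variance. The bisection search, the rule of keeping $\alpha_i = 0$ when $\rho_{i,0}$ already reaches $\rho_m$, and the function names are illustrative assumptions, not the patent's stated procedure.

```python
from math import sqrt


def corr_after_mix(alpha, rho0):
    """Pearson correlation between X_r' and Z = (1 - alpha) X_i' + alpha X_r'.

    Assumes X_i' and X_r' are standardized (unit variance) with corr(X_r', X_i') = rho0.
    """
    cov = (1.0 - alpha) * rho0 + alpha  # Cov(X_r', Z)
    var_z = (1.0 - alpha) ** 2 + alpha ** 2 + 2.0 * alpha * (1.0 - alpha) * rho0
    return cov / sqrt(var_z)


def solve_alpha(rho0, rho_m, tol=1e-12):
    """Smallest alpha in [0, 1] with corr_after_mix(alpha, rho0) >= rho_m, by bisection.

    corr_after_mix rises monotonically from rho0 (alpha = 0) to 1 (alpha = 1),
    so bisection is valid for -1 < rho0 < 1 and rho_m <= 1.
    """
    if not -1.0 < rho0 < 1.0:
        raise ValueError("rho0 must lie strictly between -1 and 1")
    if rho0 >= rho_m:
        return 0.0  # assumption: leave the variable unchanged if it already meets rho_m
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if corr_after_mix(mid, rho0) < rho_m:
            lo = mid
        else:
            hi = mid
    return hi


def monotone_transform(xi_std, xr_std, alpha):
    """Equation (2): Z_i = (1 - alpha_i) X_i' + alpha_i X_r', element-wise."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(xi_std, xr_std)]


alpha = solve_alpha(rho0=0.2, rho_m=0.9)  # rho_m = 0.9 is an illustrative choice
print(round(alpha, 4), round(corr_after_mix(alpha, 0.2), 6))
```

Bisection is used here because the correlation increases monotonically in $\alpha$ on $[0, 1]$. A closed-form root of the equation would serve equally well.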
Further, step S3 obtains the joint probability density function of each modality through the following four sub-steps:
Step 3.1: construct the analytical model of the copula pairs, see equation (4):
$$f(\mathbf{x}) = \prod_{t=1}^{d} f_t(x_t) \prod_{i=1}^{d-1}\prod_{j=1}^{d-i} c_{j,j+i|j+1:j+i-1}\bigl(F(x_j \mid x_{j+1},\ldots,x_{j+i-1}),\, F(x_{j+i} \mid x_{j+1},\ldots,x_{j+i-1});\, \theta_{j,j+i|(j+1:j+i-1)}\bigr) \tag{4}$$
where the variable of each dimension has been zero-mean standardized, i.e. $x_j$ denotes a standardized variable,
$d$ is the dimension of the vector $\mathbf{x}$,
$f(\mathbf{x})$ is the joint probability density function of $\mathbf{x}$,
$f_t(x_t)$ is the marginal probability density function of $x_t$,
$F(x_j \mid x_{j+1},\ldots,x_{j+i-1})$ is the cumulative conditional distribution function of $x_j$,
$c_{j,j+i|j+1:j+i-1}$ is the density function of a bivariate copula, and
$\theta_{j,j+i|(j+1:j+i-1)}$ are the parameters to be optimized in the bivariate copula density function;
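The sketch below makes the structure of equation (4) concrete by evaluating the copula part of a three-dimensional D-vine with the order x1 - x2 - x3. Tree 1 contains $c_{1,2}$ and $c_{2,3}$. Tree 2 contains $c_{1,3|2}$, evaluated at the conditional distribution values. For illustration only, every pair copula is assumed to be Gaussian (the method itself selects families by BIC), and the inputs are assumed to be marginal CDF values in (0, 1). The full log-density of equation (4) would add the sum of $\ln f_t(x_t)$.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD = NormalDist()


def gauss_pair_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    a, b = _STD.inv_cdf(u), _STD.inv_cdf(v)
    s = 1.0 - rho * rho
    return exp(-(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * s)) / sqrt(s)


def gauss_h(u, v, rho):
    """Conditional distribution F(u | v) = dC(u, v)/dv of a Gaussian pair copula."""
    a, b = _STD.inv_cdf(u), _STD.inv_cdf(v)
    return _STD.cdf((a - rho * b) / sqrt(1.0 - rho * rho))


def dvine3_log_copula_density(u1, u2, u3, rho12, rho23, rho13_2):
    """Copula part of equation (4) for d = 3 and the order x1 - x2 - x3."""
    log_c = log(gauss_pair_density(u1, u2, rho12))      # tree 1: c_{1,2}
    log_c += log(gauss_pair_density(u2, u3, rho23))     # tree 1: c_{2,3}
    f1_given_2 = gauss_h(u1, u2, rho12)                 # F(x1 | x2)
    f3_given_2 = gauss_h(u3, u2, rho23)                 # F(x3 | x2)
    log_c += log(gauss_pair_density(f1_given_2, f3_given_2, rho13_2))  # tree 2: c_{1,3|2}
    return log_c


print(round(dvine3_log_copula_density(0.3, 0.6, 0.8, 0.5, 0.4, 0.2), 6))
```

With all correlations set to zero every pair density equals 1, so the log copula density is 0. This gives a quick sanity check of the product structure.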
Step 3.2: select a D-vine copula model with a suitable structure using equation (5).
[Equation (5) is shown only as an image in the source; it is an objective function of the Kendall rank correlation coefficients $\tau_{i,j}$.]
where
$\tau_{i,j}$ is the Kendall rank correlation coefficient of $x_i$ and $x_j$, and
the optimum of equation (5) gives the optimized root node of the D-vine copula;
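The exact objective of equation (5) is not reproduced in the source. The following sketch therefore assumes a common heuristic: the root node is the variable with the largest sum of absolute Kendall's τ to all other variables. This is an assumption and not necessarily the patent's objective. Kendall's τ is computed here as tau-a, without tie correction, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / total pairs, O(n^2)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def choose_root(columns):
    """Return (root index, tau matrix); root maximizes sum_j |tau_ij| (assumed criterion)."""
    d = len(columns)
    tau = [[1.0 if i == j else kendall_tau(columns[i], columns[j]) for j in range(d)]
           for i in range(d)]
    scores = [sum(abs(tau[i][j]) for j in range(d) if j != i) for i in range(d)]
    return max(range(d), key=scores.__getitem__), tau


# Toy columns for illustration only.
columns = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]]
root, tau = choose_root(columns)
print(root, tau[0][2])
```

Kendall's τ is rank based, so it is unchanged by the monotonic transformations in step S2. This makes it a natural choice for structure selection.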
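Step 3.3: calculate the cumulative conditional distribution functions in equation (4) with an iterative strategy, see equation (6):
$$F(x_i \mid \mathbf{v}) = \frac{\partial C_{x_i, v_j \mid \mathbf{v}_{-j}}\bigl(F(x_i \mid \mathbf{v}_{-j}),\, F(v_j \mid \mathbf{v}_{-j})\bigr)}{\partial F(v_j \mid \mathbf{v}_{-j})} \tag{6}$$
where
$\mathbf{v} = \mathbf{x}_{-i}$ is the $(d-1)$-dimensional vector obtained by removing $x_i$ from $\mathbf{x}$,
$v_j$ is the $j$-th element of $\mathbf{v}$,
$\mathbf{v}_{-j}$ is the vector obtained by removing the $j$-th element from $\mathbf{v}$, and
$C_{x_i, v_j \mid \mathbf{v}_{-j}}$ is the bivariate copula distribution function of $x_i$ and $v_j$ conditional on $\mathbf{v}_{-j}$;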
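Equation (6) says that a conditional distribution is the partial derivative of a pair-copula CDF with respect to its conditioning argument, often called an h-function. The sketch below illustrates this with a Clayton pair copula, chosen only as an example family. It checks the closed-form h-function against a finite-difference derivative. It then applies equation (6) twice to get F(x1 | x2, x3) in a three-variable D-vine x1 - x2 - x3. The function names and parameter values are hypothetical.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_h(u, v, theta):
    """h-function F(u | v) = dC(u, v)/dv of the Clayton copula."""
    return v ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)


def numeric_h(cdf, u, v, theta, eps=1e-6):
    """Central finite-difference approximation of dC(u, v)/dv."""
    return (cdf(u, v + eps, theta) - cdf(u, v - eps, theta)) / (2.0 * eps)


def cond_on_two(u1, u2, u3, th12, th23, th13_2):
    """Equation (6) applied twice: F(x1 | x2, x3) = h_{13|2}(F(x1 | x2) | F(x3 | x2))."""
    f1_2 = clayton_h(u1, u2, th12)        # tree 1: F(x1 | x2)
    f3_2 = clayton_h(u3, u2, th23)        # tree 1: F(x3 | x2); Clayton is exchangeable
    return clayton_h(f1_2, f3_2, th13_2)  # tree 2: condition additionally on x3


print(round(clayton_h(0.3, 0.7, 2.0), 6), round(numeric_h(clayton_cdf, 0.3, 0.7, 2.0), 6))
print(round(cond_on_two(0.3, 0.5, 0.8, 2.0, 1.5, 0.7), 6))
```

Each deeper tree reuses the h-function outputs of the tree above it. This is the iterative strategy that step 3.3 refers to.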
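Step 3.4: optimize the structures of the different bivariate copulas in equation (4) with the BIC criterion, where BIC is defined by equation (7):
$$\mathrm{BIC} = -2\sum_{k=1}^{n} \ln c_{j,j+i|j+1:j+i-1}\bigl(F(x_j^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)}),\, F(x_{j+i}^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)});\, \hat{\theta}_{j,j+i|(j+1:j+i-1)}\bigr) + q \ln n \tag{7}$$
where
$n$ is the number of samples, $x^{(k)}$ denotes the $k$-th sample,
$\hat{\theta}_{j,j+i|(j+1:j+i-1)}$ is the maximum likelihood estimate of the copula parameters, and
$q$ is the number of unknown parameters $\theta_{j,j+i|(j+1:j+i-1)}$;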
The parameters of each bivariate copula pair are optimized by maximum likelihood estimation, see equation (8):
$$\hat{\theta}_{j,j+i|(j+1:j+i-1)} = \arg\max_{\theta \in \Theta} \sum_{k=1}^{n} \ln c_{j,j+i|j+1:j+i-1}\bigl(F(x_j^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)}),\, F(x_{j+i}^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)});\, \theta\bigr) \tag{8}$$
where $\Theta$ denotes the domain of definition of $\theta_{j,j+i|(j+1:j+i-1)}$.
Different bivariate copula structures are selected from the candidate bivariate copula families, and the corresponding copula parameters are optimized by maximum likelihood estimation. Finally, the copula pairs with the minimum BIC value are selected under the BIC criterion.
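The sketch below shows the family selection of equations (7) and (8) for a single pair. Several simplifications are assumptions for illustration: the candidate set is limited to independence, Gaussian, and Clayton copulas; maximum likelihood is approximated by a grid search instead of a numerical optimizer; and the data are simulated. The function names are hypothetical. Each family is fitted by maximum likelihood, scored with BIC = -2 ln L + q ln n, and the minimum-BIC fit is kept.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def gaussian_loglik(qa, qb, rho):
    """Log-likelihood of a Gaussian pair copula; qa, qb are normal scores of the data."""
    s = 1.0 - rho * rho
    return sum(-0.5 * math.log(s) - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * s)
               for a, b in zip(qa, qb))


def clayton_loglik(u, v, theta):
    """Log-likelihood of a Clayton pair copula (theta > 0)."""
    return sum(math.log1p(theta) - (theta + 1.0) * (math.log(a) + math.log(b))
               - (2.0 + 1.0 / theta) * math.log(a ** -theta + b ** -theta - 1.0)
               for a, b in zip(u, v))


def select_pair_copula(u, v):
    """Grid-search MLE per family, then keep the minimum BIC = -2 lnL + q ln n.

    Returns (family, parameter, log-likelihood, BIC).
    """
    n = len(u)
    qa = [N01.inv_cdf(a) for a in u]
    qb = [N01.inv_cdf(b) for b in v]
    rho = max((r / 100.0 for r in range(-98, 99)), key=lambda r: gaussian_loglik(qa, qb, r))
    theta = max((0.05 * k for k in range(1, 301)), key=lambda t: clayton_loglik(u, v, t))
    fits = [
        ("independence", None, 0.0, 0),  # c(u, v) = 1, no parameters
        ("gaussian", rho, gaussian_loglik(qa, qb, rho), 1),
        ("clayton", theta, clayton_loglik(u, v, theta), 1),
    ]
    scored = [(fam, par, ll, -2.0 * ll + q * math.log(n)) for fam, par, ll, q in fits]
    return min(scored, key=lambda item: item[3])


# Simulated pair from a Gaussian copula with rho = 0.8 (illustrative, not patent data).
random.seed(0)
u, v = [], []
for _ in range(500):
    z1 = random.gauss(0.0, 1.0)
    z2 = 0.8 * z1 + math.sqrt(1.0 - 0.64) * random.gauss(0.0, 1.0)
    u.append(N01.cdf(z1))
    v.append(N01.cdf(z2))
best = select_pair_copula(u, v)
print(best[0], best[1], round(best[3], 2))
```

The `q ln n` term penalizes each extra parameter. A one-parameter family must therefore improve the log-likelihood by more than (ln n)/2 over independence to be selected.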
Further, step S4 standardizes and monotonically transforms the test data as follows:
Step 4.1: apply zero-mean standardization to the auxiliary variables of the sample to be predicted based on equation (1);
Step 4.2: monotonically transform the sample to be predicted using the coefficients determined in step 2.3.
As a preferred aspect of the invention, step S5 determines the weights of the training samples as follows:
using the copula function obtained in step S3, calculate the copula function values $\hat{c}(\mathbf{z}, y_i)$ between the target variable values of all training samples and the auxiliary variables of the sample to be predicted, then calculate the weights of all training samples from these values according to equation (9):
$$w(y_i) = \frac{\hat{c}(\mathbf{z}, y_i)}{\sum_{k=1}^{n} \hat{c}(\mathbf{z}, y_k)} \tag{9}$$
where
$y_i$ is the target variable of the $i$-th training sample,
$w(y_i)$ is the weight of the $i$-th training sample, and
$\hat{c}(\mathbf{z}, y_i)$ is the copula estimate obtained in step S3, with $\mathbf{z}$ the transformed auxiliary variables of the sample to be predicted.
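One way to read the weighting step: the conditional density of the target given the auxiliary variables equals the copula density times the marginal density of the target, divided by a factor that does not depend on the target. If the target's marginal is taken as the empirical distribution of the training samples, each training target receives a weight proportional to its copula density, and normalizing these densities gives the weights. The sketch assumes a single transformed auxiliary variable, already expressed as a CDF value `u_query`, and a bivariate Gaussian copula standing in for the fitted D-vine. Both assumptions are for illustration only.

```python
from math import exp, sqrt
from statistics import NormalDist

N01 = NormalDist()


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for u, v in (0, 1)."""
    a, b = N01.inv_cdf(u), N01.inv_cdf(v)
    s = 1.0 - rho * rho
    return exp(-(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * s)) / sqrt(s)


def training_weights(u_query, v_train, rho):
    """Normalized copula densities: w_i = c(u_query, v_i) / sum_k c(u_query, v_k)."""
    dens = [gaussian_copula_density(u_query, v, rho) for v in v_train]
    total = sum(dens)
    return [d / total for d in dens]


w = training_weights(0.9, [0.1, 0.5, 0.9], rho=0.8)
print([round(x, 3) for x in w])
```

With positive dependence, training samples whose target rank is close to the query's auxiliary rank receive most of the weight.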
As a preferred aspect of the invention, step S6 determines the predicted value of the target variable of the sample to be predicted as follows:
equation (10) gives the standardized predicted value of the sample to be predicted, and the final predicted value is obtained by the inverse transformation of equation (11):
$$\hat{y}' = \sum_{i=1}^{n} w(y_i)\, y_i' \tag{10}$$
$$\hat{y} = \hat{y}' \sqrt{\operatorname{var}(y)} + u(y) \tag{11}$$
where
$y_i'$ is the zero-mean standardized value of the $i$-th training sample,
$w(y_i)$ is the weight of the $i$-th training sample,
$\operatorname{var}(y)$ is the variance of the target variable computed from the training samples, and
$u(y)$ is the mean of the target variable computed from the training samples.
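Equations (10) and (11) amount to a weighted average in standardized units followed by de-standardization. The following minimal sketch assumes that the weights sum to 1 and that the population variance of the training targets is used. The function name is hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def predict_from_weights(weights, y_train):
    """Eq. (10): weighted sum of standardized targets; eq. (11): inverse transformation."""
    mu = fmean(y_train)
    sd = sqrt(pvariance(y_train, mu))
    y_std = [(y - mu) / sd for y in y_train]
    y_hat_std = sum(w * y for w, y in zip(weights, y_std))  # eq. (10)
    return y_hat_std * sd + mu                              # eq. (11)


print(predict_from_weights([0.2, 0.3, 0.5], [1.0, 2.0, 3.0]))
```

Because standardization is affine and the weights sum to 1, the result equals the weighted average of the raw training targets, here 0.2·1 + 0.3·2 + 0.5·3 = 2.3. Standardizing mainly keeps the computation in the same units as the copula model.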
The invention also provides a soft measurement system based on the vine copula correlation description, comprising:
a training sample set acquisition module, used to determine the auxiliary variables required for modeling; a data transformation module, used to standardize and monotonically transform the variable of each dimension to obtain data suitable for copula modeling; a joint probability density function acquisition module, used to perform correlation modeling to obtain the joint probability density function and the copula function of the auxiliary variables and the target variable; an online collection and transformation module for the auxiliary variables of the sample to be predicted; a training sample weight calculation module, used to calculate the weights of the target variables of all training samples from the auxiliary variables of the test data; and a linear weighted prediction module, which probability-weights the zero-mean standardized target variables of all training samples to obtain the predicted value of the target variable of the sample to be predicted and then applies the inverse transformation to obtain the final predicted value.
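The sketch below shows one possible way to wire the modules into a single object. All names are hypothetical. The fitted D-vine copula is replaced by an injected `density` callable, and the toy kernel used in the example is a placeholder, not a copula fitted by the method. The coefficients `alphas` are assumed to come from step 2.3.

```python
from dataclasses import dataclass, field
from math import exp, sqrt
from statistics import fmean, pvariance
from typing import Callable, List, Sequence

# Hypothetical signature: density(z_query, standardized training target) -> score >= 0.
Density = Callable[[Sequence[float], float], float]


@dataclass
class SoftSensor:
    """Illustrative module wiring; `density` stands in for the joint PDF / copula module."""
    ref_index: int        # position of the reference variable X_r'
    alphas: List[float]   # monotonic transformation coefficients from training
    density: Density
    means: List[float] = field(default_factory=list)
    stds: List[float] = field(default_factory=list)
    y_train: List[float] = field(default_factory=list)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "SoftSensor":
        # Training sample set and data transformation modules: store statistics for eq. (1).
        cols = list(zip(*X))
        self.means = [fmean(c) for c in cols]
        self.stds = [sqrt(pvariance(c)) for c in cols]
        self.y_train = list(y)
        return self

    def transform(self, x: Sequence[float]) -> List[float]:
        # Online collection and transformation module: eq. (1) followed by eq. (2).
        xs = [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]
        xr = xs[self.ref_index]
        return [(1.0 - a) * v + a * xr for v, a in zip(xs, self.alphas)]

    def predict(self, x_new: Sequence[float]) -> float:
        z = self.transform(x_new)
        mu = fmean(self.y_train)
        sd = sqrt(pvariance(self.y_train, mu))
        y_std = [(yi - mu) / sd for yi in self.y_train]
        scores = [self.density(z, yi) for yi in y_std]  # weight calculation module
        total = sum(scores)
        y_hat_std = sum(s / total * yi for s, yi in zip(scores, y_std))  # eq. (9)-(10)
        return y_hat_std * sd + mu  # linear weighted prediction module, eq. (11)


def toy_density(z: Sequence[float], y_std: float) -> float:
    """Hypothetical placeholder kernel, not a fitted copula."""
    return exp(-(z[0] - y_std) ** 2)


sensor = SoftSensor(ref_index=0, alphas=[0.0], density=toy_density)
sensor.fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
print(round(sensor.predict([2.0]), 3))
```

Injecting the density keeps offline modeling (fitting the vine) separate from the online modules, which only need the stored statistics and a callable.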
To address the nonlinearity, non-Gaussianity, variable coupling, and complex non-monotonic characteristics of industrial data, the method introduces the copula correlation model into soft measurement. Combined with a monotonic transformation method, this yields a soft measurement regression model based on the D-vine copula correlation description.
The beneficial effects of the invention are as follows. The soft measurement method and system based on the D-vine copula correlation description introduce the copula correlation model into soft measurement to address the nonlinearity, non-Gaussianity, variable coupling, and complex non-monotonic characteristics of industrial data. Combined with a monotonic transformation method, they realize the prediction of key variables.
The invention introduces a vine copula to realize soft measurement of complex chemical processes. The vine copula is a class of copula developed in recent years and is widely applied in fields such as finance, economics, and environmental science. A vine copula converts the correlation problem of high-dimensional data into the optimization of a limited number of bivariate copulas arranged in a sparse matrix, which significantly reduces the complexity of solving for the model parameters. At the same time, owing to its highly flexible structure, a vine copula can accurately describe complex chemical processes that are highly nonlinear and non-Gaussian, and it has a marked advantage for data with tail dependence. The method keeps the computational complexity of offline modeling low and enables real-time online prediction of key variables of complex chemical processes.
Drawings
Fig. 1 is a flowchart of the vine copula-based soft measurement method according to the present invention.
Fig. 2 is a schematic view of the vine copula fitted during soft measurement of the ethylene cracking data according to the present invention.
Fig. 3 is a diagram of the prediction performance of the soft measurement of the ethylene cracking data according to the present invention.
Fig. 4 is a graph of the predicted butane concentration at the bottom of the debutanizer column according to the present invention.
Fig. 5 is a diagram of the prediction performance for the 1000 groups of samples to be predicted in Example three.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Example one
The invention discloses a soft measurement method for complex chemical processes based on vine copula correlation modeling, comprising the following specific steps:
Step S1: select suitable auxiliary variables for the soft measurement model according to actual industrial production conditions and expert knowledge.
Step S2: obtain transformed data suitable for copula modeling with the monotonic transformation method.
The original data are zero-mean standardized, see equation (1):
$$X_i' = \frac{X_i - u(X_i)}{\sqrt{\operatorname{var}(X_i)}} \tag{1}$$
where $X_i$ is the variable before transformation, $X_i'$ is the zero-mean standardized variable, $u(X_i)$ is the mean of $X_i$, and $\operatorname{var}(X_i)$ is the variance of $X_i$. The monotonic transformation is defined by equation (2):
$$Z_i = (1-\alpha_i)X_i' + \alpha_i X_r', \quad i = 1, 2, \ldots, d \tag{2}$$
where $Z_i$ is the variable after the monotonic transformation, $X_r'$ is the reference variable, and $\alpha_i$ is the corresponding monotonic transformation coefficient. The reference variable is taken directly as the last dimension of the auxiliary variables. The monotonic transformation coefficients are determined so that the correlation between $X_r'$ and $Z_i$ equals $\rho_m$, see equation (3):
$$\rho(X_r', Z_i) = \frac{(1-\alpha_i)\rho_{i,0} + \alpha_i}{\sqrt{(1-\alpha_i)^2 + \alpha_i^2 + 2\alpha_i(1-\alpha_i)\rho_{i,0}}} = \rho_m \tag{3}$$
where
$\rho_{i,0} = \operatorname{Cov}(X_r', X_i') = \rho(X_r', X_i')$, with $\rho(X_r', X_i')$ denoting the Pearson correlation coefficient between $X_r'$ and $X_i'$, and
$\rho_m$ is a hyperparameter giving the desired value of $\rho(X_r', Z_i)$, which ensures that $X_r'$ and $Z_i$ satisfy a monotonically increasing relationship.
Step S3: perform correlation modeling with the D-vine copula to obtain the joint probability density function of the auxiliary variables and the target variable.
For a $d$-dimensional random vector $\mathbf{x} = [x_1, x_2, \ldots, x_d]$, the D-vine model (the joint probability density function of $\mathbf{x}$) is:
$$f(\mathbf{x}) = \prod_{t=1}^{d} f_t(x_t) \prod_{i=1}^{d-1}\prod_{j=1}^{d-i} c_{j,j+i|j+1:j+i-1}\bigl(F(x_j \mid x_{j+1},\ldots,x_{j+i-1}),\, F(x_{j+i} \mid x_{j+1},\ldots,x_{j+i-1});\, \theta_{j,j+i|(j+1:j+i-1)}\bigr) \tag{4}$$
where $d$ is the dimension of the random vector $\mathbf{x}$ and the variable of each dimension has been standardized, $f_t(x_t)$ is the probability density function of the random variable $x_t$, $F(x_j \mid x_{j+1},\ldots,x_{j+i-1})$ is the cumulative conditional distribution function of the random variable $x_j$, $c_{j,j+i|j+1:j+i-1}$ is the density function of a bivariate copula, and $\theta_{j,j+i|(j+1:j+i-1)}$ is the parameter to be optimized in the bivariate copula density function.
To obtain the most suitable D-vine structure in equation (4), the root node of the D-vine copula tree is determined from the magnitudes of the Kendall rank correlation coefficients between the variables, i.e. by optimizing the following objective function:
[Equation (5) is shown only as an image in the source; it is an objective function of the Kendall rank correlation coefficients $\tau_{i,j}$.]
where $\tau_{i,j}$ is the Kendall rank correlation coefficient of the random variables $x_i$ and $x_j$.
Given the initial values $F_i(x_i)$ $(i = 1, 2, \ldots, n)$ of the marginal cumulative distribution functions of the random variables $x_i$, all cumulative conditional distribution function values in equation (4) are calculated iteratively according to equation (6):
$$F(x_i \mid x_j, \mathbf{x}_{-(i,j)}) = \frac{\partial C_{i,j \mid -(i,j)}\bigl(F(x_i \mid \mathbf{x}_{-(i,j)}),\, F(x_j \mid \mathbf{x}_{-(i,j)})\bigr)}{\partial F(x_j \mid \mathbf{x}_{-(i,j)})} \tag{6}$$
where
$\mathbf{x}_{-(i,j)}$ denotes the set of all elements of the random vector $\mathbf{x}$ other than $x_i$ and $x_j$, and
$C_{i,j \mid -(i,j)}$ is a bivariate copula distribution function.
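The source does not say how the initial marginal CDF values $F_i(x_i)$ are obtained. A common choice, assumed here for illustration, is the rescaled empirical CDF rank/(n + 1), with average ranks for ties. This keeps every value strictly inside (0, 1), so the quantile functions used by the pair copulas stay finite. The function name is hypothetical.

```python
def pseudo_observations(column):
    """Empirical marginal CDF values rank / (n + 1), using average ranks for ties."""
    n = len(column)
    order = sorted(range(n), key=column.__getitem__)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1                      # extend the block of tied values
        avg_rank = (i + j) / 2 + 1      # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]


print(pseudo_observations([3.0, 1.0, 2.0]))
print(pseudo_observations([1.0, 1.0, 2.0]))
```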
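Using the conditional distribution function values from equation (6) and the initial values of the marginal cumulative distribution functions, the structures and parameters of the n(n-1)/2 bivariate copulas in the D-vine copula model are optimized, with the BIC criterion as the optimization criterion.
The structures of the different bivariate copulas in equation (4) are selected with the BIC criterion, where BIC is defined by equation (7):
$$\mathrm{BIC} = -2\sum_{k=1}^{N} \ln c_{j,j+i|j+1:j+i-1}\bigl(F(x_j^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)}),\, F(x_{j+i}^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)});\, \hat{\theta}_{j,j+i|(j+1:j+i-1)}\bigr) + q \ln N \tag{7}$$
where $N$ is the number of samples, $x^{(k)}$ denotes the $k$-th sample, $\hat{\theta}_{j,j+i|(j+1:j+i-1)}$ is the maximum likelihood estimate of the copula parameters, and $q$ is the number of unknown parameters $\theta_{j,j+i|(j+1:j+i-1)}$.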
The parameters of each bivariate copula pair are optimized by maximum likelihood estimation, see equation (8):
$$\hat{\theta}_{j,j+i|(j+1:j+i-1)} = \arg\max_{\theta \in \Theta} \sum_{k=1}^{N} \ln c_{j,j+i|j+1:j+i-1}\bigl(F(x_j^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)}),\, F(x_{j+i}^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)});\, \theta\bigr) \tag{8}$$
where $\Theta$ denotes the domain of definition of $\theta_{j,j+i|(j+1:j+i-1)}$.
Different bivariate copula structures are selected from the candidate bivariate copula families, and the corresponding copula parameters are optimized by maximum likelihood estimation. Finally, the copula pairs with the minimum BIC value are selected under the BIC criterion. Each bivariate copula parameter $\theta_{i,i+j|1:i-1}$ has its own admissible range. The L-BFGS-B algorithm is therefore used to solve the resulting optimization problem, which takes equation (4) as the objective function and the admissible range of $\theta_{i,i+j|1:i-1}$ as the constraint (generally a 1- or 2-dimensional optimization problem).
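The bounded parameter search with L-BFGS-B can be sketched with SciPy's `scipy.optimize.minimize` using `method="L-BFGS-B"` and `bounds`. The sketch fits a single Clayton pair copula, which is an illustrative choice of family. The bounds (1e-4, 30) and the simulated data are assumptions, not values from the patent. The data are drawn by inverting the Clayton h-function, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def clayton_negloglik(params, u, v):
    """Negative log-likelihood of a Clayton pair copula; params[0] is theta > 0."""
    theta = params[0]
    s = u ** -theta + v ** -theta - 1.0
    ll = (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
          - (2.0 + 1.0 / theta) * np.log(s))
    return -np.sum(ll)


def fit_clayton_lbfgsb(u, v, theta0=1.0, bounds=(1e-4, 30.0)):
    """Bounded MLE: the box constraint encodes the admissible range of theta."""
    res = minimize(clayton_negloglik, x0=np.array([theta0]), args=(u, v),
                   method="L-BFGS-B", bounds=[bounds])
    return float(res.x[0]), res


# Simulated Clayton data with theta = 2 via the conditional-inverse method (illustrative).
rng = np.random.default_rng(0)
theta_true = 2.0
u = rng.uniform(size=2000)
w = rng.uniform(size=2000)
v = (u ** -theta_true * (w ** (-theta_true / (1.0 + theta_true)) - 1.0) + 1.0) ** (-1.0 / theta_true)

theta_hat, result = fit_clayton_lbfgsb(u, v)
print(round(theta_hat, 3), result.success)
```

Without an explicit gradient, SciPy approximates it by finite differences. For a one- or two-parameter pair copula this is usually inexpensive.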
Step S4: standardization and monotonic transformation of the test data.
Step 4.1: apply zero-mean standardization to the auxiliary variables of the sample to be predicted based on equation (1);
Step 4.2: monotonically transform the sample to be predicted using the coefficients determined in step S2.
In this way $\mathbf{X} = [x_1, x_2, \ldots, x_d]$ is monotonically transformed into $\mathbf{Z} = [z_1, z_2, \ldots, z_d]$.
Step S5: determine the weights of the training samples.
Compute the multivariate copula densities $\hat{c}(\mathbf{z}, y_i)$ of the auxiliary variables of the sample to be predicted with respect to the target variables of all known samples, and determine the weight $w$ of each training sample, see equation (9):
$$w(y_i) = \frac{\hat{c}(\mathbf{z}, y_i)}{\sum_{k=1}^{n} \hat{c}(\mathbf{z}, y_k)} \tag{9}$$
where
$y_i$ is the target variable of the $i$-th training sample,
$w(y_i)$ is the weight of the $i$-th training sample, and
$\hat{c}(\mathbf{z}, y_i)$ is the copula estimate obtained in step S3.
Step S6: according to the weights calculated in step S5, linearly weight the target variables of the training samples to obtain the standardized predicted value of the target variable of the sample to be predicted, then apply the inverse transformation to obtain the final predicted value.
Weighting the target variables of the training samples by their probability weights gives the standardized predicted value of the target variable of the sample to be predicted:
$$\hat{y}' = \sum_{i=1}^{n} w(y_i)\, y_i' \tag{10}$$
The final predicted value is obtained with the inverse transformation:
$$\hat{y} = \hat{y}' \sqrt{\operatorname{var}(y)} + u(y) \tag{11}$$
where
$y_i'$ is the zero-mean standardized value of the $i$-th training sample,
$w(y_i)$ is the weight of the $i$-th training sample,
$\operatorname{var}(y)$ is the variance of the target variable computed from the training samples, and
$u(y)$ is the mean of the target variable computed from the training samples.
Example two
The following examples are provided to aid understanding of the invention and are not intended to limit its scope. Referring to Fig. 2, this embodiment predicts the cracking depth in an ethylene cracking process. The data come from an SRT-III ethylene cracking furnace, and the prediction target is the cracking depth, represented by PER (the propylene/ethylene ratio). 500 groups of data under normal operating conditions were selected, of which 400 were used to train the copula model and 100 for testing.
(1) Based on prior information, four auxiliary variables are selected: the average outlet temperature of the cracking furnace $x_1$, the density of the cracking feedstock $x_2$, the total feed $x_3$, and the steam-to-hydrocarbon ratio $x_4$. The target variable $y$ is the cracking depth indicator PER.
(2) Data preprocessing: the training samples are zero-mean standardized, the last auxiliary variable $x_4$ is selected as the reference variable, and the monotonic transformation $Z_i = (1-\alpha_i)X_i' + \alpha_i X_r'$ is performed with the Pearson correlation coefficient method. The monotonic transformation coefficients are $\alpha_1 = 0.876$, $\alpha_2 = 0.874$, $\alpha_3 = 0.643$ and $\alpha_y = 0.722$, giving the transformed data $[z_1, z_2, z_3, z_4, z_y]$.
(3) The training samples $[z_1, z_2, z_3, z_4, z_y]$ are used to establish the joint probability density function of the auxiliary variables and the target variable. The results of the bivariate copula optimization among the 5-dimensional variables are shown in Fig. 2, where the values inside the black brackets are the serial numbers of the fitted bivariate copulas.
(4) The same monotonic transformation is applied to the auxiliary variables of the data to be predicted, and probability weighting gives the predicted values of the target variable.
(5) The prediction performance for the 100 groups of samples to be predicted is shown in Fig. 3.
The results show that the soft measurement method based on the vine copula correlation description achieves effective and timely prediction of the cracking depth in the ethylene cracking process.
Example three
Referring to Fig. 4, this embodiment predicts the butane concentration at the bottom of a debutanizer column. The data come from a debutanizer process, and the prediction target is the butane concentration at the column bottom. 2000 groups of data under normal conditions were selected, of which 1000 were used to train the copula model and 1000 for testing.
(1) Based on prior information, 7 auxiliary variables are selected: the top temperature $x_1$, the top pressure $x_2$, the top reflux flow $x_3$, the overhead product flow $x_4$, the 6th tray temperature $x_5$, bottom temperature 1 $x_6$, and bottom temperature 2 $x_7$. $x_6$ and $x_7$ are merged into $x_6 = (x_6 + x_7)/2$. The primary (target) variable is the bottom butane concentration $y$.
(2) Data preprocessing: the training samples are zero-mean standardized, the last auxiliary variable $x_6$ is selected as the reference variable, and the monotonic transformation $Z_i = (1-\alpha_i)X_i' + \alpha_i X_r'$ is performed with the Pearson correlation coefficient method. The monotonic transformation coefficients are $\alpha_1 = 0.676$, $\alpha_2 = 0.654$, $\alpha_3 = 0.693$, $\alpha_4 = 0.701$, $\alpha_5 = 0.603$ and $\alpha_y = 0.743$, giving the transformed data $[z_1, \ldots, z_6, z_y]$.
(3) The training samples $[z_1, \ldots, z_6, z_y]$ are used to establish the joint probability density function of the auxiliary variables and the target variable. The results of the bivariate copula optimization among the 7-dimensional variables are shown in Fig. 4, where the values inside the black brackets are the serial numbers of the fitted bivariate copulas.
(4) The same monotonic transformation is applied to the auxiliary variables of the data to be predicted, and probability weighting gives the predicted values of the target variable.
(5) The prediction performance for the 1000 groups of samples to be predicted is shown in Fig. 5.
The results show that the soft measurement method based on the vine copula correlation description achieves effective and timely prediction of the butane concentration at the bottom of the debutanizer column.
Example four
A soft measurement system based on a vine copula correlation description, comprising:
a training sample set acquisition module, used to determine the auxiliary variables required for modeling; a data transformation module, used to standardize and monotonically transform the variable of each dimension to obtain data suitable for copula modeling; a joint probability density function acquisition module, used to perform correlation modeling to obtain the joint probability density function and the copula function of the auxiliary variables and the target variable; an online collection and transformation module for the auxiliary variables of the sample to be predicted; a training sample weight calculation module, used to calculate the weights of the target variables of all training samples from the auxiliary variables of the data to be predicted; and a linear weighted prediction module, which probability-weights the zero-mean standardized target variables of all training samples to obtain the predicted value of the target variable of the sample to be predicted and then applies the inverse transformation to obtain the final predicted value. Each module can be implemented as described for the corresponding step in Example one.
In summary, aiming at the nonlinearity, the non-gaussian, the coupling relation of the variables and the complex non-monotonic characteristics of the industrial data, a correlation model copula is introduced into soft measurement, and a monotonic transformation method is combined, so that a soft measurement regression model described based on the correlation of the D-vine copula is provided, the model does not need to perform dimensionality reduction processing on the original data, information loss is avoided, the monotonic transformation is performed on the original data at first, and the regression model based on the D-vine copula is established in a transformation space, so that the nonlinear, non-gaussian and non-monotonic problems of the industrial data are effectively processed, and good regression prediction capability is obtained.
The beneficial effects of the invention are as follows: in the soft measurement method and system based on the vine copula correlation description, the correlation model (copula) is introduced into soft measurement in view of the nonlinear, non-Gaussian, variable-coupled and complex non-monotonic characteristics of industrial data, and it is combined with a monotonic transformation to provide a soft measurement regression model based on the D-vine copula correlation description, thereby realizing the prediction of key variables.
The invention introduces the vine copula to realize soft measurement of complex chemical processes. The vine copula is a class of copula developed in recent years and is widely applied in finance, economics, environmental science and other fields. A vine copula converts the dependence modeling of high-dimensional data into the optimization of a finite number of bivariate (pair) copulas organized in a sparse matrix structure, which significantly reduces the complexity of parameter estimation. At the same time, owing to its highly flexible structure, the vine copula can accurately describe complex chemical processes that are strongly nonlinear and non-Gaussian, and it is particularly advantageous for data with tail dependence. The method keeps the computational complexity of offline modeling low and enables real-time online prediction of the key variables of complex chemical processes.
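The reduction to a finite set of bivariate copulas can be made concrete by listing the pair copulas of the D-vine factorization used in the claims (equation (4)). The short Python sketch below only enumerates the labels $c_{j,j+i|j+1:j+i-1}$ tree by tree; it fits nothing, and the function name is hypothetical.

```python
def dvine_pair_copulas(d):
    """Pair copulas c_{j,j+i | j+1:j+i-1} of a D-vine on d variables (1-based labels).

    Tree i (i = 1, ..., d-1) joins x_j and x_{j+i}, conditioned on the
    variables strictly between them, for j = 1, ..., d-i.
    """
    pairs = []
    for i in range(1, d):              # tree index
        for j in range(1, d - i + 1):  # edge index within tree i
            conditioning = tuple(range(j + 1, j + i))
            pairs.append((i, j, j + i, conditioning))
    return pairs


for tree, a, b, cond in dvine_pair_copulas(4):
    print(f"tree {tree}: ({a}, {b}) given {cond}")
```

For $d = 4$ this yields six pair copulas: three in the first tree, two in the second and one in the third. In general a D-vine on $d$ variables has $d(d-1)/2$ of them, each with its own family and parameters.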
The description and applications of the present invention are illustrative and are not intended to limit the scope of the invention to the embodiments described above. Variations and modifications of the disclosed embodiments are possible, and alternatives and equivalents of the various components of the embodiments will be apparent to those skilled in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements and proportions, and with other components, materials and parts, without departing from its spirit or essential characteristics. Other variations and modifications of the disclosed embodiments may be made without departing from the scope and spirit of the invention.

Claims (5)

1. A soft measurement method based on vine copula correlation description, characterized by comprising the following steps:
step S1: selecting suitable auxiliary variables for the soft measurement model according to actual industrial production conditions and expert knowledge;
step S2: standardizing and monotonically transforming the training data to obtain transformed data suitable for copula modeling;
step S3: performing correlation modeling with the D-vine copula to obtain the joint probability density function of the auxiliary variables and the target variable of the training samples;
step S4: collecting the auxiliary variables of the sample to be predicted online, then standardizing and monotonically transforming them;
step S5: calculating the copula function values between the processed auxiliary variables of the sample to be predicted and the target variables of all training samples, and from them the weight of each training sample;
step S6: linearly weighting the target variables of the training samples with the weights calculated in step S5 to obtain the standardized predicted value of the target variable of the sample to be predicted, and then applying the inverse transformation to obtain the final predicted value;
wherein step S2 obtains the monotonically transformed data as follows:
step 2.1: zero-mean standardization of the original data, see equation (1):
$$X_i' = \frac{X_i - u(X_i)}{\sqrt{\operatorname{var}(X_i)}} \qquad (1)$$
where
$X_i$ is the variable to be transformed,
$X_i'$ is the zero-mean standardized variable,
$u(X_i)$ is the mean of $X_i$,
$\operatorname{var}(X_i)$ is the variance of $X_i$;
step 2.2: define the monotonic transformation, see equation (2):
$$Z_i = (1-\alpha_i)X_i' + \alpha_i X_r', \qquad i = 1, 2, \ldots, d \qquad (2)$$
where
$Z_i$ is the monotonically transformed variable,
$X_r'$ is the reference variable,
$\alpha_i$ is the corresponding monotonic transformation coefficient;
step 2.3: determine the monotonic transformation coefficients, see equation (3):
[Equation (3): formula image not reproduced in this text]
where
$\rho_{i,0} = \operatorname{Cov}(X_r', X_i') = \rho(X_r', X_i')$, and $\rho(X_r', X_i')$ is the Pearson correlation coefficient between $X_r'$ and $X_i'$ (covariance and correlation coincide because both variables are standardized);
$\rho_m$ is a hyperparameter giving the desired value of $\rho(X_r', Z_i)$, which ensures that $X_r'$ and $Z_i$ satisfy a monotonically increasing relationship;
step S3 obtains the joint probability density function through the following four substeps:
step 3.1: construct the pair-copula (D-vine) decomposition of the joint density, see equation (4):
$$f(\mathbf{x}) = \prod_{t=1}^{d} f_t(x_t) \prod_{i=1}^{d-1} \prod_{j=1}^{d-i} c_{j,j+i|j+1:j+i-1}\Big(F(x_j \mid x_{j+1},\ldots,x_{j+i-1}),\, F(x_{j+i} \mid x_{j+1},\ldots,x_{j+i-1});\, \theta_{j,j+i|(j+1:j+i-1)}\Big) \qquad (4)$$
where every variable has been zero-mean standardized, i.e. $x_j$ denotes a standardized variable,
$d$ is the dimension of the vector $\mathbf{x}$,
$f(\mathbf{x})$ is the joint probability density function of $\mathbf{x}$,
$f_t(x_t)$ is the marginal probability density function of $x_t$,
$F(x_j \mid x_{j+1},\ldots,x_{j+i-1})$ is the cumulative conditional distribution function of $x_j$,
$c_{j,j+i|j+1:j+i-1}$ is the density function of a bivariate copula,
$\theta_{j,j+i|(j+1:j+i-1)}$ are the parameters to be optimized in that bivariate copula density;
step 3.2: select a D-vine copula model with a suitable structure using equation (5):
[Equation (5): formula image not reproduced in this text]
where
$\tau_{i,j}$ is the Kendall rank correlation coefficient between $x_i$ and $x_j$,
and the solution of equation (5) (symbol image not reproduced) gives the optimized root node of the D-vine copula;
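step 3.3: compute the cumulative conditional distribution functions in equation (4) iteratively, see equation (6):
$$F(x_i \mid \mathbf{v}) = \frac{\partial\, C_{x_i, v_j \mid \mathbf{v}_{-j}}\big(F(x_i \mid \mathbf{v}_{-j}),\, F(v_j \mid \mathbf{v}_{-j})\big)}{\partial F(v_j \mid \mathbf{v}_{-j})} \qquad (6)$$
where
$\mathbf{v} = \mathbf{x}_{-i}$ is the $(d-1)$-dimensional vector obtained by removing $x_i$ from $\mathbf{x}$,
$v_j$ is the $j$-th element of $\mathbf{v}$,
$\mathbf{v}_{-j}$ is the vector $\mathbf{v}$ with its $j$-th element removed,
$C_{x_i, v_j \mid \mathbf{v}_{-j}}$ is the distribution function of the bivariate copula of $x_i$ and $v_j$ conditional on $\mathbf{v}_{-j}$;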
step 3.4: select the structure (family) of each bivariate copula in equation (4) with the BIC criterion, defined in equation (7):
$$\mathrm{BIC} = -2\ln L\big(\hat{\theta}_{j,j+i|(j+1:j+i-1)}\big) + q\ln n \qquad (7)$$
where
$n$ is the number of samples,
$\ln L(\hat{\theta}_{j,j+i|(j+1:j+i-1)})$ is the maximized log-likelihood of the bivariate copula over the $n$ samples,
$q$ is the number of unknown parameters $\theta_{j,j+i|(j+1:j+i-1)}$;
the parameters of each bivariate copula are estimated by maximum likelihood, see equation (8):
$$\hat{\theta}_{j,j+i|(j+1:j+i-1)} = \arg\max_{\theta \in \Theta_{j,j+i|(j+1:j+i-1)}} \sum_{k=1}^{n} \ln c_{j,j+i|j+1:j+i-1}\Big(F\big(x_j^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)}\big),\, F\big(x_{j+i}^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)}\big);\, \theta\Big) \qquad (8)$$
where
$\Theta_{j,j+i|(j+1:j+i-1)}$ denotes the domain of $\theta_{j,j+i|(j+1:j+i-1)}$, and $x_j^{(k)}$ is the $k$-th sample of $x_j$;
for each bivariate copula family in the set of alternative families, the corresponding copula parameter $\hat{\theta}_{j,j+i|(j+1:j+i-1)}$ is optimized by maximum likelihood, and finally the BIC criterion is used to select, for every pair, the copula with the minimum BIC value.
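Step 3.4 with equations (7) and (8) amounts to fitting each candidate family by maximum likelihood and keeping the one with the smallest BIC. The Python sketch below does this for a single pair of pseudo-observations. The candidate set (independence, Gaussian, Clayton), the parameter ranges and the use of SciPy's bounded scalar search are assumptions for illustration; the claim does not list its family set here.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, rankdata


def loglik_gaussian(rho, u, v):
    """Sum of log Gaussian pair-copula densities."""
    a, b = norm.ppf(u), norm.ppf(v)
    r2 = 1.0 - rho ** 2
    return np.sum(-0.5 * np.log(r2) + (2 * rho * a * b - rho ** 2 * (a ** 2 + b ** 2)) / (2 * r2))


def loglik_clayton(theta, u, v):
    """Sum of log Clayton pair-copula densities (theta > 0)."""
    s = u ** -theta + v ** -theta - 1.0
    return np.sum(np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
                  - (1.0 / theta + 2.0) * np.log(s))


# Hypothetical candidate set: name -> (log-likelihood, parameter range, q)
CANDIDATES = {
    "gaussian": (loglik_gaussian, (-0.99, 0.99), 1),
    "clayton": (loglik_clayton, (1e-3, 20.0), 1),
}


def select_pair_copula(u, v):
    """Fit each candidate by maximum likelihood (eq. 8) and keep the lowest BIC (eq. 7)."""
    n = len(u)
    fits = {"independence": (None, 0.0)}  # c = 1 everywhere: log-likelihood 0, q = 0
    for name, (loglik, bounds, q) in CANDIDATES.items():
        res = minimize_scalar(lambda t: -loglik(t, u, v), bounds=bounds, method="bounded")
        fits[name] = (res.x, 2.0 * res.fun + q * np.log(n))  # -2 lnL + q ln n
    best = min(fits, key=lambda name: fits[name][1])
    return best, fits


rng = np.random.default_rng(1)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=2000)
u = rankdata(z[:, 0]) / (len(z) + 1)  # pseudo-observations in (0, 1)
v = rankdata(z[:, 1]) / (len(z) + 1)
best, fits = select_pair_copula(u, v)
```

In a full D-vine the same selection runs once per pair copula; beyond the first tree, $(u, v)$ are replaced by the conditional distribution values from equation (6).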
2. The soft measurement method based on vine copula correlation description according to claim 1, wherein step S4 standardizes and monotonically transforms the test data by the following steps:
step 4.1: zero-mean standardization of the auxiliary variables of the sample to be predicted based on equation (1);
step 4.2: monotonic transformation of the sample to be predicted based on step 2.3.
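Steps 4.1 and 4.2 reuse quantities fixed during training. The Python sketch below assumes that the online sample is scaled with the training mean and variance and transformed with the training coefficients $\alpha_i$, rather than re-estimating them from the single test sample; the function and argument names are hypothetical.

```python
import numpy as np


def fit_transform_stats(X_train):
    """u(X_i) and var(X_i) of the training auxiliaries, as used in equation (1)."""
    X = np.asarray(X_train, dtype=float)
    return X.mean(axis=0), X.var(axis=0)  # population variance (ddof=0): an assumption


def transform_online(x_new, mean, var, alpha, ref=-1):
    """Steps 4.1-4.2 for one sample: equation (1), then equation (2).

    `alpha` holds the coefficients alpha_i fixed during training (step 2.3);
    `ref` indexes the reference variable X_r' (the last auxiliary by default,
    following claim 5). Both argument names are hypothetical.
    """
    xs = (np.asarray(x_new, dtype=float) - mean) / np.sqrt(var)  # equation (1)
    alpha = np.asarray(alpha, dtype=float)
    return (1.0 - alpha) * xs + alpha * xs[ref]                  # equation (2)


X_train = np.array([[1.0, 10.0, 0.0], [2.0, 20.0, 1.0], [3.0, 30.0, 0.0], [4.0, 40.0, 1.0]])
mean, var = fit_transform_stats(X_train)
z = transform_online([2.5, 25.0, 1.0], mean, var, alpha=[0.5, 0.2, 0.0])
```

Reusing the training statistics matters: a single online sample has no meaningful mean or variance of its own, and the copula model of step S3 was fitted on the training scale.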
3. The soft measurement method based on vine copula correlation description according to claim 1, wherein step S5 determines the weights of the training samples as follows: using the copula function obtained in step S3, calculate the copula function values between the target variable values of all training samples and the auxiliary variables of the sample to be predicted
[formula image not reproduced in this text]
and then calculate the weights of all training samples from these function values according to equation (9):
[Equation (9): formula image not reproduced in this text]
where
$y_i$ is the target value of the $i$-th training sample,
$w(y_i)$ is the weight of the $i$-th training sample,
[symbol image not reproduced] is the copula estimate obtained in step S3.
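Equation (9) is not reproduced in this text. One reading that fits the linear weighting of claim 4, stated here as an assumption and not as the patent's formula, normalizes the copula density values so that the weights sum to one. With $\mathbf{u}_z = (F_1(z_1), \ldots, F_d(z_d))$ the marginal distribution values of the processed auxiliary variables and $c$ the copula density of $(y, z_1, \ldots, z_d)$, the conditional density of the target and its mean are

$$
\begin{aligned}
f(y \mid \mathbf{z}) &= \frac{f_y(y)\, c\big(F_y(y), \mathbf{u}_z\big)}{\int f_y(s)\, c\big(F_y(s), \mathbf{u}_z\big)\, ds},\\
E[y \mid \mathbf{z}] &= \frac{\int y\, f_y(y)\, c\big(F_y(y), \mathbf{u}_z\big)\, dy}{\int f_y(y)\, c\big(F_y(y), \mathbf{u}_z\big)\, dy}
\approx \sum_{i=1}^{n} w(y_i)\, y_i,
\qquad
w(y_i) = \frac{c\big(F_y(y_i), \mathbf{u}_z\big)}{\sum_{k=1}^{n} c\big(F_y(y_k), \mathbf{u}_z\big)}.
\end{aligned}
$$

The approximation replaces both integrals by averages over the training targets $y_i$, which are draws from $f_y$. The denominator $\int f_y(s)\, c(F_y(s), \mathbf{u}_z)\, ds$ equals the copula density of the auxiliary variables alone, which is why only the ratio of copula values matters.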
4. The soft measurement method according to claim 1, wherein step S6 calculates the standardized predicted value of the sample to be predicted according to equation (10), and then obtains the final predicted value by the inverse transformation of equation (11):
$$\hat{y}' = \sum_{i=1}^{n} w(y_i)\, y_i' \qquad (10)$$
$$\hat{y} = \hat{y}'\sqrt{\operatorname{var}(y)} + u(y) \qquad (11)$$
where
$y_i'$ is the zero-mean standardized target value of the $i$-th training sample,
$w(y_i)$ is the weight of the $i$-th training sample,
$\operatorname{var}(y)$ is the variance of the target variable computed from the training targets,
$u(y)$ is the mean of the target variable computed from the training targets.
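Equations (10) and (11) can be checked with a few lines of Python. The sketch assumes the weights are already available from step S5; the function name is hypothetical.

```python
import numpy as np


def weighted_prediction(y_train, weights):
    """Equations (10)-(11): weight standardized training targets, then undo the scaling."""
    y = np.asarray(y_train, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean, var = y.mean(), y.var()           # u(y) and var(y) from the training targets
    y_std = (y - mean) / np.sqrt(var)       # y_i', as in equation (1)
    y_hat_std = np.dot(w, y_std)            # equation (10): sum_i w(y_i) * y_i'
    return y_hat_std * np.sqrt(var) + mean  # equation (11): inverse transformation


y_hat = weighted_prediction([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.5, 0.5])
```

If the weights sum to one, standardization and inverse transformation cancel algebraically and $\hat{y}$ equals the weighted mean $\sum_i w(y_i)\,y_i$ of the raw training targets; if they do not, equation (11) adds $(1 - \sum_i w(y_i))\,u(y)$.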
5. A soft measurement method for complex chemical processes based on vine copula correlation modeling, comprising the following specific steps:
step S1: selecting suitable auxiliary variables for the soft measurement model according to actual industrial production conditions and expert knowledge;
step S2: obtaining transformed data suitable for copula modeling by a monotonic transformation method;
zero-mean standardization of the original data, see equation (1):
$$X_i' = \frac{X_i - u(X_i)}{\sqrt{\operatorname{var}(X_i)}} \qquad (1)$$
where
$X_i$ is the variable before transformation, $X_i'$ is the zero-mean standardized variable, $u(X_i)$ is the mean of $X_i$, and $\operatorname{var}(X_i)$ is the variance of $X_i$; the monotonic transformation is defined by equation (2):
$$Z_i = (1-\alpha_i)X_i' + \alpha_i X_r', \qquad i = 1, 2, \ldots, d \qquad (2)$$
where
$Z_i$ is the monotonically transformed variable, $X_r'$ is the reference variable, and $\alpha_i$ is the corresponding monotonic transformation coefficient;
the last dimension of the auxiliary variables is directly selected as the reference variable, and the monotonic transformation coefficient is determined by equation (3):
[Equation (3): formula image not reproduced in this text]
where
$\rho_{i,0} = \operatorname{Cov}(X_r', X_i') = \rho(X_r', X_i')$, and $\rho(X_r', X_i')$ is the Pearson correlation coefficient between $X_r'$ and $X_i'$,
$\rho_m$ is a hyperparameter giving the desired value of $\rho(X_r', Z_i)$, which ensures that $X_r'$ and $Z_i$ satisfy a monotonically increasing relationship;
step S3: performing correlation modeling with the D-vine copula to obtain the joint probability density function of the auxiliary variables and the target variable;
for a $d$-dimensional random vector $\mathbf{x} = [x_1, x_2, \ldots, x_d]$, the D-vine model (the joint probability density function of $\mathbf{x}$) is:
$$f(\mathbf{x}) = \prod_{t=1}^{d} f_t(x_t) \prod_{i=1}^{d-1} \prod_{j=1}^{d-i} c_{j,j+i|j+1:j+i-1}\Big(F(x_j \mid x_{j+1},\ldots,x_{j+i-1}),\, F(x_{j+i} \mid x_{j+1},\ldots,x_{j+i-1});\, \theta_{j,j+i|(j+1:j+i-1)}\Big) \qquad (4)$$
where $d$ is the dimension of the random vector $\mathbf{x}$ and every variable has been standardized, $f_t(x_t)$ is the probability density function of the random variable $x_t$, $F(x_j \mid x_{j+1},\ldots,x_{j+i-1})$ is the cumulative conditional distribution function of $x_j$, $c_{j,j+i|j+1:j+i-1}$ is the density function of a bivariate copula, and $\theta_{j,j+i|(j+1:j+i-1)}$ are the parameters to be optimized in that bivariate copula density;
to obtain the most suitable D-vine structure for equation (4), the root-node variables of the D-vine copula tree are determined from the magnitudes of the Kendall rank correlation coefficients between the variables, i.e. by optimizing the following objective function:
[Equation (5): formula image not reproduced in this text]
where $\tau_{i,j}$ is the Kendall rank correlation coefficient between the random variables $x_i$ and $x_j$;
given the initial values $F_i(x_i)$ $(i = 1, 2, \ldots, d)$ of the marginal cumulative distribution functions of the random variables $x_i$, all cumulative conditional distribution function values involved in equation (4) are computed iteratively with equation (6):
$$F(x_i \mid x_j, \mathbf{x}_{-ij}) = \frac{\partial\, C_{i,j \mid -ij}\big(F(x_i \mid \mathbf{x}_{-ij}),\, F(x_j \mid \mathbf{x}_{-ij})\big)}{\partial F(x_j \mid \mathbf{x}_{-ij})} \qquad (6)$$
where
$\mathbf{x}_{-ij}$ denotes the set of all elements of the random vector $\mathbf{x}$ other than $x_i$ and $x_j$,
$C_{i,j \mid -ij}$ is the distribution function of a bivariate copula;
the structures (families) and parameters of the $d(d-1)/2$ bivariate copulas in the D-vine copula model are optimized one by one from the conditional distribution function values of equation (6) and the initial marginal distribution values, using the BIC criterion, defined in equation (7):
$$\mathrm{BIC} = -2\ln L\big(\hat{\theta}_{j,j+i|(j+1:j+i-1)}\big) + q\ln N \qquad (7)$$
where $N$ is the number of samples, $\ln L(\hat{\theta}_{j,j+i|(j+1:j+i-1)})$ is the maximized log-likelihood of the bivariate copula over the $N$ samples, and $q$ is the number of unknown parameters $\theta_{j,j+i|(j+1:j+i-1)}$;
the parameters of each bivariate copula are estimated by maximum likelihood, see equation (8):
$$\hat{\theta}_{j,j+i|(j+1:j+i-1)} = \arg\max_{\theta \in \Theta_{j,j+i|(j+1:j+i-1)}} \sum_{k=1}^{N} \ln c_{j,j+i|j+1:j+i-1}\Big(F\big(x_j^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)}\big),\, F\big(x_{j+i}^{(k)} \mid x_{j+1}^{(k)},\ldots,x_{j+i-1}^{(k)}\big);\, \theta\Big) \qquad (8)$$
where
$\Theta_{j,j+i|(j+1:j+i-1)}$ denotes the domain of $\theta_{j,j+i|(j+1:j+i-1)}$, and $x_j^{(k)}$ is the $k$-th sample of $x_j$;
for each bivariate copula family in the set of alternative families, the corresponding copula parameter $\hat{\theta}_{j,j+i|(j+1:j+i-1)}$ is optimized by maximum likelihood, and finally the BIC criterion is used to select, for every pair, the copula with the minimum BIC value. Because the parameters $\theta_{j,j+i|(j+1:j+i-1)}$ of the different bivariate copulas have different admissible ranges, the L-BFGS-B algorithm is used to solve the resulting bound-constrained optimization problem, with equation (4) as the objective function and the admissible range of each $\theta_{j,j+i|(j+1:j+i-1)}$ as the constraint;
step S4: standardization and monotonic transformation of the test data:
step 4.1: zero-mean standardization of the auxiliary variables of the sample to be predicted based on equation (1);
step 4.2: monotonic transformation of the sample to be predicted based on step 2.3, which transforms $\mathbf{x} = [x_1, x_2, \ldots, x_d]$ into $\mathbf{z} = [z_1, z_2, \ldots, z_d]$;
step S5: determining the weights of the training samples:
compute the multivariate copula densities of the auxiliary variables with respect to the target variables of all known samples
[formula image not reproduced in this text]
and determine the weight $w$ of each training sample, see equation (9):
[Equation (9): formula image not reproduced in this text]
where
$y_i$ is the target value of the $i$-th training sample,
$w(y_i)$ is the weight of the $i$-th training sample,
[symbol image not reproduced] is the copula estimate obtained in step S3;
step S6: linearly weighting the target variables of the training samples with the weights calculated in step S5 to obtain the standardized predicted value of the target variable of the sample to be predicted, and then applying the inverse transformation to obtain the final predicted value:
the standardized predicted value of the target variable of the sample to be predicted is obtained by weighting the standardized target values of the training samples:
$$\hat{y}' = \sum_{i=1}^{N} w(y_i)\, y_i' \qquad (10)$$
and the final predicted value is obtained by the inverse transformation:
$$\hat{y} = \hat{y}'\sqrt{\operatorname{var}(y)} + u(y) \qquad (11)$$
where
$y_i'$ is the zero-mean standardized target value of the $i$-th training sample,
$w(y_i)$ is the weight of the $i$-th training sample,
$\operatorname{var}(y)$ is the variance of the target variable computed from the training targets,
$u(y)$ is the mean of the target variable computed from the training targets.
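Claim 5 fixes the D-vine ordering from Kendall rank correlations, computes conditional distributions with equation (6), and fits the pair-copula parameters with L-BFGS-B, treating the admissible range of each parameter as a bound. The Python sketch below walks through these steps for three variables. It assumes Gaussian pair copulas throughout (the claim selects families by BIC instead), assumes that equation (5) maximizes the sum of absolute Kendall coefficients of neighbouring variables, and maximizes only the copula part of the log of equation (4), since the marginal densities do not depend on $\theta$. For a Gaussian pair copula one step of equation (6) has the closed form $F(u \mid v) = \Phi\big((\Phi^{-1}(u) - \rho\,\Phi^{-1}(v))/\sqrt{1-\rho^2}\big)$.

```python
from itertools import permutations

import numpy as np
from scipy.optimize import minimize
from scipy.stats import kendalltau, norm, rankdata

EPS = 1e-10  # keeps h-function outputs strictly inside (0, 1)


def best_order(U):
    """Path order maximizing the sum of |Kendall tau| of neighbours (assumed criterion)."""
    d = U.shape[1]
    tau = np.zeros((d, d))
    for a in range(d):
        for b in range(a + 1, d):
            t, _ = kendalltau(U[:, a], U[:, b])
            tau[a, b] = tau[b, a] = abs(t)
    candidates = [p for p in permutations(range(d)) if p[0] < p[-1]]  # drop mirrored paths
    return max(candidates, key=lambda p: sum(tau[p[k], p[k + 1]] for k in range(d - 1)))


def gauss_logc(u, v, rho):
    """Log-density of a bivariate Gaussian copula."""
    a, b = norm.ppf(u), norm.ppf(v)
    r2 = 1.0 - rho ** 2
    return -0.5 * np.log(r2) + (2 * rho * a * b - rho ** 2 * (a ** 2 + b ** 2)) / (2 * r2)


def gauss_h(u, v, rho):
    """h-function F(u | v) = dC(u, v)/dv of a Gaussian pair copula (one step of eq. (6))."""
    h = norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1.0 - rho ** 2))
    return np.clip(h, EPS, 1.0 - EPS)


def neg_loglik(params, U):
    """Negative copula part of the log of eq. (4) for a 3-variable D-vine (column order)."""
    r12, r23, r13_2 = params
    u1, u2, u3 = U[:, 0], U[:, 1], U[:, 2]
    ll = gauss_logc(u1, u2, r12) + gauss_logc(u2, u3, r23)  # tree 1
    ll = ll + gauss_logc(gauss_h(u1, u2, r12), gauss_h(u3, u2, r23), r13_2)  # tree 2
    return -np.sum(ll)


rng = np.random.default_rng(2)
R = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), R, size=3000)
U = np.column_stack([rankdata(col) / (len(col) + 1) for col in Z.T])  # pseudo-observations

order = best_order(U)
res = minimize(neg_loglik, x0=np.zeros(3), args=(U[:, list(order)],),
               method="L-BFGS-B", bounds=[(-0.99, 0.99)] * 3)
r12, r23, r13_2 = res.x
```

With the simulated correlations 0.6 (variables 1 and 2), 0.5 (2 and 3) and 0.3 (1 and 3), the true conditional correlation of variables 1 and 3 given 2 is zero, so the third fitted parameter should come out close to 0.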
CN201910869240.8A 2019-09-16 2019-09-16 Vine copula-based soft measurement method and system Active CN110728024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910869240.8A CN110728024B (en) 2019-09-16 2019-09-16 Vine copula-based soft measurement method and system

Publications (2)

Publication Number Publication Date
CN110728024A CN110728024A (en) 2020-01-24
CN110728024B true CN110728024B (en) 2021-09-03

Family

ID=69219002

Country Status (1)

Country Link
CN (1) CN110728024B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111781824B (en) * 2020-05-26 2022-08-09 华东理工大学 Self-adaptive soft measurement method and system based on vine copula quantile regression

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104914775A (en) * 2015-06-12 2015-09-16 华东理工大学 Multi-modal process fault detection method and system based on vine copula correlation description
CN108345961A (en) * 2018-01-30 2018-07-31 上海电力学院 The prediction of wind farm group output and analysis method
CN108985574A (en) * 2018-06-23 2018-12-11 浙江工业大学 A kind of polypropylene melt index flexible measurement method based on selective ensemble extreme learning machine

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10468136B2 (en) * 2016-08-29 2019-11-05 Conduent Business Services, Llc Method and system for data processing to predict health condition of a human subject
CN108804784A (en) * 2018-05-25 2018-11-13 江南大学 A kind of instant learning soft-measuring modeling method based on Bayes's gauss hybrid models

Non-Patent Citations (4)

Title
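An efficient copula-based method of identifying regression models of non-monotonic relationships in processing plants; Taha Mohseni Ahooyi et al.; Chemical Engineering Science; 2015-04-04; pp. 106-114 *
Probabilistic density-based regression model for soft sensing of nonlinear industrial processes; Xiaofeng Yuan et al.; Journal of Process Control; 2017-06-20; pp. 15-24 *
Dependence analysis based on a variable-structure copula model (in Chinese); Wang Qin; Journal of Applied Statistics and Management; 2012-03-31; Vol. 31, No. 2, pp. 341-347 *
Prediction of the number of abnormal events in chemical processes based on Bayesian theory and vine copula (in Chinese); Lü Cheng et al.; Journal of East China University of Science and Technology (Natural Science Edition); 2015-04-30; Vol. 41, No. 2, pp. 144-150 *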

Also Published As

Publication number Publication date
CN110728024A (en) 2020-01-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant