CN110728024B

CN110728024B - Vine copula-based soft measurement method and system

Info

Publication number: CN110728024B
Application number: CN201910869240.8A
Authority: CN
Inventors: 李绍军; 蔡俊; 周洋; 倪佳能
Original assignee: East China University of Science and Technology
Current assignee: East China University of Science and Technology
Priority date: 2019-09-16
Filing date: 2019-09-16
Publication date: 2021-09-03
Anticipated expiration: 2039-09-16
Also published as: CN110728024A

Abstract

The invention provides a vine copula-based soft measurement method and system. The method comprises the following steps: selecting suitable auxiliary variables for the soft measurement model according to actual industrial production conditions and expert knowledge; standardizing and monotonically transforming the training data to obtain transformed data suitable for copula modeling; performing correlation modeling with a D-vine copula to obtain the joint probability density function of the auxiliary variables and the target variable of the training samples; collecting the auxiliary variables of the sample to be predicted online and applying the same standardization and monotonic transformation; calculating the copula function values between the processed auxiliary variables of the sample to be predicted and the target variables of all training samples, and from these calculating the weight of each training sample; and linearly weighting the target variables of the training samples by the calculated weights to obtain the standardized predicted value of the target variable of the sample to be predicted, then applying the inverse transformation to obtain the final predicted value.

Description

Vine copula-based soft measurement method and system

Technical Field

The invention belongs to the technical field of soft measurement, and particularly relates to a soft measurement method based on vine copula correlation description; meanwhile, the invention also relates to a soft measurement system based on the vine copula correlation description.

Background

With the introduction of Industry 4.0, competition among industrial and manufacturing enterprises at home and abroad is intensifying, and the requirements on product quality, manufacturing cost, energy consumption and the like in industrial production keep rising. To reduce product cost, enterprises are becoming more complex, larger in scale and more intelligent. Obtaining key information about the quality indexes of the process in time therefore plays an important role in industrial development. However, the online measurement of some important process indexes is limited by factors such as harsh operating environments and outdated detection technology, and has to be supplemented by manual offline analysis, which introduces a serious time lag as well as unpredictable mistakes and errors. Soft measurement techniques have been developed to solve these problems.

At present, most multivariate statistical soft measurement methods rely on dimension reduction and decoupling (for example PCA, PLS and ICA). However, when the process is highly nonlinear and non-Gaussian, dimension reduction often causes a significant loss of information that directly degrades the final soft measurement result. The invention therefore introduces copula theory to model the correlation of high-dimensional data directly, by describing its complex dependence structure. A more accurate statistical model in turn supports a marked improvement in the soft measurement of complex chemical processes.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a soft measurement method based on vine copula correlation description that overcomes the information loss caused by the traditional dimension reduction approach and predicts the key variables of nonlinear, non-Gaussian, multi-modal complex chemical processes.

In addition, the invention provides a soft measurement system based on vine copula correlation description that likewise overcomes the information loss caused by the traditional dimension reduction approach and predicts the key variables of nonlinear, non-Gaussian complex chemical processes.

In order to solve the technical problems, the invention adopts the following technical scheme:

A soft measurement method based on vine copula correlation description comprises the following steps:

Step S1: select suitable auxiliary variables for the soft measurement model according to actual industrial production conditions and expert knowledge;

Step S2: standardize and monotonically transform the training data to obtain transformed data suitable for copula modeling;

Step S3: perform correlation modeling with the D-vine copula to obtain the joint probability density function of the auxiliary variables and the target variable of the training samples;

Step S4: collect the auxiliary variables of the sample to be predicted online, then standardize and monotonically transform them;

Step S5: calculate the copula function values between the processed auxiliary variables of the sample to be predicted and the target variables of all training samples, and from these calculate the weight of each training sample;

Step S6: according to the weights calculated in step S5, linearly weight the target variables of the training samples to obtain the standardized predicted value of the target variable of the sample to be predicted, then apply the inverse transformation to obtain the final predicted value.

Further, step S2 obtains the monotonically transformed data through the following sub-steps:

Step 2.1: apply zero-mean standardization to the original data, see equation (1):

$$X_i' = \frac{X_i - u(X_i)}{\sqrt{\mathrm{var}(X_i)}} \tag{1}$$

where

$X_i$ is the variable to be transformed,

$X_i'$ is the zero-mean standardized variable,

$u(X_i)$ is the mean of the variable $X_i$,

$\mathrm{var}(X_i)$ is the variance of the variable $X_i$;

Step 2.2: define the form of the monotonic transformation, see equation (2):

$$Z_i = (1-\alpha_i)X_i' + \alpha_i X_r', \quad i = 1, 2, \ldots, d \tag{2}$$

where

$Z_i$ is the variable after the monotonic transformation,

$X_r'$ is the reference variable,

$\alpha_i$ is the corresponding monotonic transformation coefficient;

Step 2.3: determine the monotonic transformation coefficients, see equation (3),

where

$\rho_{i,0} = \mathrm{Cov}(X_r', X_i') = \rho(X_r', X_i')$, and $\rho(X_r', X_i')$ denotes the Pearson correlation coefficient between $X_r'$ and $X_i'$,

$\rho_m$ is a hyperparameter giving the desired value of $\rho(X_r', Z_i)$, chosen so that $X_r'$ and $Z_i$ satisfy a monotonically increasing relationship.
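Sub-steps 2.1 to 2.3 can be sketched in a few lines of Python. Equation (3) is not reproduced in this text, so the sketch makes an assumption: the coefficient α_i is taken as the smallest value in [0, 1) for which ρ(X_r′, Z_i) reaches ρ_m, found by bisection on the closed-form correlation of Z_i with X_r′. The function names and the use of the population variance are illustrative choices, not part of the patent.

```python
import math

def standardize(xs):
    """Zero-mean standardization, equation (1): (x - mean) / sqrt(variance)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # population variance (assumption)
    std = math.sqrt(var)
    return [(x - mean) / std for x in xs]

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def corr_after_transform(alpha, rho0):
    """Correlation of X_r' with Z_i = (1-alpha) X_i' + alpha X_r' (unit-variance inputs)."""
    num = (1 - alpha) * rho0 + alpha
    den = math.sqrt((1 - alpha) ** 2 + alpha ** 2 + 2 * alpha * (1 - alpha) * rho0)
    if den == 0.0:
        return 0.0  # degenerate case: Z_i is constant
    return num / den

def solve_alpha(rho0, rho_m, tol=1e-10):
    """Assumed rule: smallest alpha with corr(X_r', Z_i) >= rho_m, by bisection."""
    if rho0 >= rho_m:
        return 0.0  # already correlated strongly enough, no mixing needed
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if corr_after_transform(mid, rho0) < rho_m:
            lo = mid
        else:
            hi = mid
    return hi

def monotonic_transform(x_i, x_r, rho_m):
    """Standardize both variables, pick alpha, and apply equation (2)."""
    xi_s, xr_s = standardize(x_i), standardize(x_r)
    alpha = solve_alpha(pearson(xr_s, xi_s), rho_m)
    z = [(1 - alpha) * a + alpha * b for a, b in zip(xi_s, xr_s)]
    return z, alpha
```

The bisection works because, for unit-variance inputs, the correlation between X_r′ and Z_i increases with α from ρ_{i,0} at α = 0 to 1 at α = 1.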

Further, step S3 obtains the joint probability density function of each modality through the following four sub-steps:

Step 3.1: construct the analytical model of the copula pairs, see equation (4):

$$f(x) = \prod_{t=1}^{d} f_t(x_t) \prod_{i=1}^{d-1} \prod_{j=1}^{d-i} c_{j,j+i|j+1:j+i-1}\big(F(x_j|x_{j+1},\ldots,x_{j+i-1}),\, F(x_{j+i}|x_{j+1},\ldots,x_{j+i-1});\, \theta_{j,j+i|(j+1:j+i-1)}\big) \tag{4}$$

where the variables of each dimension have been zero-mean standardized, i.e. $x_j$ denotes a standardized variable,

$d$ is the dimension of the vector $x$,

$f(x)$ is the joint probability density function of the vector $x$,

$f_t(x_t)$ is the marginal probability density function of the variable $x_t$,

$F(x_j|x_{j+1},\ldots,x_{j+i-1})$ is the conditional cumulative distribution function of the variable $x_j$,

$c_{j,j+i|j+1:j+i-1}$ is a bivariate copula density function,

$\theta_{j,j+i|(j+1:j+i-1)}$ is the parameter to be optimized in the bivariate copula density function;
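To make the index pattern c_{j,j+i|j+1:j+i-1} concrete, the following sketch lists the pair copulas of a D-vine tree by tree. It assumes the usual D-vine index ranges (tree level i = 1, …, d−1 and edge j = 1, …, d−i), which give d(d−1)/2 pair copulas in total; the function name `dvine_pairs` is illustrative.

```python
def dvine_pairs(d):
    """Return the pair copulas of a D-vine on variables 1..d, tree by tree.

    Each entry is (j, j+i, conditioning_set), matching the index
    c_{j, j+i | j+1 : j+i-1} used in equation (4).
    """
    trees = []
    for i in range(1, d):              # tree level i = 1 .. d-1
        level = []
        for j in range(1, d - i + 1):  # edge j = 1 .. d-i
            cond = tuple(range(j + 1, j + i))  # variables between j and j+i
            level.append((j, j + i, cond))
        trees.append(level)
    return trees

# Print the structure for five variables
for level, edges in enumerate(dvine_pairs(5), start=1):
    labels = [
        f"c_{a},{b}|{','.join(map(str, cond))}" if cond else f"c_{a},{b}"
        for a, b, cond in edges
    ]
    print(f"tree {level}: {labels}")
```

For five variables this yields four unconditional pairs in the first tree and one pair conditioned on three variables in the last tree.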

Step 3.2: select a D-vine copula model with a suitable structure using equation (5),

where

$\tau_{i,j}$ is the Kendall rank correlation coefficient of the variables $x_i$ and $x_j$;

the result of equation (5) is the optimized root node of the D-vine copula;
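Equation (5) is not reproduced in this text, so the selection rule below is an assumption rather than the patent's objective: a commonly used heuristic that orders the variables of the first D-vine tree so that the sum of absolute Kendall coefficients between neighbouring variables is as large as possible. The sketch uses a simple O(n²) tau without tie correction and a brute-force search, so it is only practical for a small number of variables.

```python
from itertools import combinations, permutations

def kendall_tau(a, b):
    """Kendall rank correlation (tau-a), O(n^2), assuming no ties."""
    n = len(a)
    s = 0
    for p, q in combinations(range(n), 2):
        prod = (a[p] - a[q]) * (b[p] - b[q])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)

def best_dvine_order(columns):
    """Variable order maximizing the sum of |tau| over adjacent pairs.

    `columns` is a list of equally long sequences, one per variable.
    """
    d = len(columns)
    tau = [[abs(kendall_tau(columns[i], columns[j])) for j in range(d)]
           for i in range(d)]
    best_order, best_score = None, -1.0
    for order in permutations(range(d)):
        if order[0] > order[-1]:
            continue  # a path and its reverse describe the same D-vine
        score = sum(tau[order[k]][order[k + 1]] for k in range(d - 1))
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score
```

In a soft measurement setting one might additionally fix the position of the target variable in the order; that choice is left open here.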

Step 3.3: calculate the conditional cumulative distribution functions in equation (4) with an iterative strategy, see equation (6):

$$F(x_i \mid v) = \frac{\partial C_{x_i, v_j \mid v_{-j}}\big(F(x_i \mid v_{-j}),\, F(v_j \mid v_{-j})\big)}{\partial F(v_j \mid v_{-j})} \tag{6}$$

where

$v = x_{-i}$ is the $(d-1)$-dimensional vector obtained by removing the variable $x_i$,

$v_j$ is the $j$-th element of the vector $v$,

$v_{-j}$ is the vector $v$ with its $j$-th variable removed,

$C_{x_i, v_j \mid v_{-j}}$ is a bivariate copula distribution function;
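The iterative strategy of step 3.3 can be illustrated with the conditional distribution function (often called the h-function) of a single copula family. The sketch below assumes Gaussian bivariate copulas throughout, which is an illustrative choice since the candidate families are not listed in this text; the parameter names `rho12`, `rho23` and `rho13_2` are hypothetical. It builds F(x₁ | x₂, x₃) for a three-variable D-vine from two first-tree h-functions.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def h_gaussian(u, v, rho):
    """Conditional CDF F(u | v) for a Gaussian bivariate copula.

    This equals the partial derivative of C(u, v; rho) with respect to v.
    Inputs u and v must lie strictly between 0 and 1.
    """
    a, b = _N.inv_cdf(u), _N.inv_cdf(v)
    return _N.cdf((a - rho * b) / math.sqrt(1.0 - rho * rho))

def conditional_cdf_3d(u1, u2, u3, rho12, rho23, rho13_2):
    """F(x1 | x2, x3) in a three-variable D-vine with order 1-2-3."""
    f1_given_2 = h_gaussian(u1, u2, rho12)  # F(x1 | x2), first tree
    f3_given_2 = h_gaussian(u3, u2, rho23)  # F(x3 | x2), first tree
    # Second tree: condition additionally on x3 through the copula c_{1,3|2}
    return h_gaussian(f1_given_2, f3_given_2, rho13_2)
```

Each level of the vine reuses the conditional distribution values of the level below, which is why only bivariate functions ever need to be evaluated.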

Step 3.4: optimize the structure of each bivariate copula in equation (4) using the BIC criterion, where BIC is defined in equation (7),

where

$n$ is the number of samples,

$q$ is the number of parameters $\theta_{j,j+i|(j+1:j+i-1)}$ to be determined;

the parameters of each bivariate copula pair are optimized by maximum likelihood estimation, as determined by equation (8),

where the maximization is carried out over the domain of definition of $\theta_{j,j+i|(j+1:j+i-1)}$.

Different bivariate copula structures are selected from the candidate bivariate copula families, the corresponding copula parameters are optimized by maximum likelihood estimation, and finally the structure with the minimum BIC value is selected for every copula pair.
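The family selection in step 3.4 can be sketched as follows. Several points are assumptions, because equations (7) and (8) and the list of candidate families are not reproduced in this text: the candidates are limited to the Gaussian and Clayton families, BIC is taken in its standard form −2·log-likelihood + q·ln(n), and a golden-section search on a bounded interval stands in for the L-BFGS-B optimizer named in the detailed description.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def loglik_gaussian(rho, data):
    """Log-likelihood of a Gaussian bivariate copula on pairs (u, v) in (0, 1)."""
    total = 0.0
    for u, v in data:
        a, b = _N.inv_cdf(u), _N.inv_cdf(v)
        r2 = rho * rho
        total += (-0.5 * math.log(1 - r2)
                  - (r2 * (a * a + b * b) - 2 * rho * a * b) / (2 * (1 - r2)))
    return total

def loglik_clayton(theta, data):
    """Log-likelihood of a Clayton bivariate copula (theta > 0)."""
    total = 0.0
    for u, v in data:
        s = u ** (-theta) + v ** (-theta) - 1.0
        total += (math.log(1 + theta)
                  - (1 + theta) * (math.log(u) + math.log(v))
                  - (2 + 1 / theta) * math.log(s))
    return total

# family name -> (log-likelihood, lower bound, upper bound, parameter count q)
FAMILIES = {
    "gaussian": (loglik_gaussian, -0.99, 0.99, 1),
    "clayton": (loglik_clayton, 0.01, 20.0, 1),
}

def maximize(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    x = (lo + hi) / 2
    return x, f(x)

def select_pair_copula(data):
    """Fit every candidate by maximum likelihood; return the minimum-BIC family."""
    n = len(data)
    results = {}
    for name, (loglik, lo, hi, q) in FAMILIES.items():
        theta, ll = maximize(lambda t: loglik(t, data), lo, hi)
        results[name] = (-2.0 * ll + q * math.log(n), theta)  # (BIC, parameter)
    best = min(results, key=lambda k: results[k][0])
    return best, results[best][1], results
```

The bounds in `FAMILIES` play the role of the parameter domain: each family restricts its parameter to its own admissible range.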

Further, step S4 standardizes and monotonically transforms the test data through the following sub-steps:

Step 4.1: apply zero-mean standardization to the auxiliary variables of the sample to be predicted according to equation (1);

Step 4.2: apply the monotonic transformation to the sample to be predicted, using the coefficients determined in step 2.3.

As a preferred aspect of the present invention, step S5 determines the weights of the training samples as follows:

the copula function values between the target variable values of all training samples and the auxiliary variables of the sample to be predicted are calculated with the copula function obtained in step S3,

and the weights of all training samples are then calculated from these function values according to equation (9),

where

$y_i$ is the $i$-th training sample,

$w(y_i)$ is the weight of the $i$-th training sample,

and the copula function is the estimate obtained in step S3.
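A minimal sketch of the weighting in step S5 follows. Equation (9) is not reproduced in this text, so the sketch assumes the weight of a training sample is its copula function value divided by the sum over all training samples. To stay short it uses a single auxiliary variable and a Gaussian bivariate copula density; in the method described here the value would come from the fitted D-vine copula instead. All function and parameter names are illustrative.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def gaussian_copula_density(u, v, rho):
    """Density of a Gaussian bivariate copula at (u, v), both in (0, 1)."""
    a, b = _N.inv_cdf(u), _N.inv_cdf(v)
    r2 = rho * rho
    expo = -(r2 * (a * a + b * b) - 2 * rho * a * b) / (2 * (1 - r2))
    return math.exp(expo) / math.sqrt(1 - r2)

def training_weights(u_query, u_targets, rho):
    """Assumed form of the weights: copula values normalized to sum to 1.

    u_query   -- marginal CDF value of the auxiliary variable of the new sample
    u_targets -- marginal CDF values of the training target variables
    """
    values = [gaussian_copula_density(u_query, u_y, rho) for u_y in u_targets]
    total = sum(values)
    return [v / total for v in values]

# A query in the upper tail favours training targets in the upper tail
print(training_weights(0.9, [0.1, 0.5, 0.9], 0.8))
```

With positive dependence, training samples whose target lies in the same region of the distribution as the query receive most of the weight.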

As a preferred aspect of the present invention, step S6 determines the predicted value of the target variable of the sample to be predicted as follows:

equation (10) gives the standardized predicted value of the sample to be predicted, and the inverse transformation in equation (11) then gives the final predicted value:

$$\hat{y}' = \sum_{i=1}^{n} w(y_i)\, y_i' \tag{10}$$

$$\hat{y} = \hat{y}'\sqrt{\mathrm{var}(y)} + u(y) \tag{11}$$

where

$\hat{y}'$ is the standardized predicted value and $\hat{y}$ is the final predicted value,

$y_i'$ is the zero-mean standardized value of the $i$-th training sample,

$w(y_i)$ is the weight of the $i$-th training sample,

$\mathrm{var}(y)$ is the variance of the target variable, computed from the target variable of the training samples,

$u(y)$ is the mean of the target variable, computed from the target variable of the training samples.
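Step S6 amounts to a weighted sum in standardized space followed by undoing the standardization. A small sketch, assuming the weights already sum to 1 and using the population variance of the training targets (both are assumptions; the function name is illustrative):

```python
import math

def predict(weights, y_train):
    """Weighted prediction in standardized space, then inverse standardization."""
    n = len(y_train)
    mean = sum(y_train) / n                                   # u(y)
    std = math.sqrt(sum((y - mean) ** 2 for y in y_train) / n)  # sqrt(var(y))
    y_std = [(y - mean) / std for y in y_train]               # standardized targets
    y_hat_std = sum(w * ys for w, ys in zip(weights, y_std))  # linear weighting
    return y_hat_std * std + mean                             # inverse transformation
```

When all weight sits on one training sample the prediction reproduces that sample's target value, and equal weights reproduce the training mean.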

The invention also provides a soft measurement system based on the vine copula correlation description, which comprises:

a training sample set acquisition module, used to determine the auxiliary variables required for modeling; a data transformation module, used to standardize and monotonically transform the variables of each dimension to obtain data suitable for copula modeling; a joint probability density function acquisition module, used to perform correlation modeling to obtain the joint probability density function and the copula function of the auxiliary variables and the target variable; a module for online collection and transformation of the auxiliary variables of the sample to be predicted; a training sample weight calculation module, used to calculate the weights of the target variables of all training samples according to the auxiliary variables of the test data; and a linear weighted prediction module, which weights the zero-mean standardized target variables of all training samples by their probability weights to obtain the predicted value of the target variable of the sample to be predicted, and then applies the inverse transformation to obtain the final predicted value.

To address the nonlinearity, non-Gaussianity, variable coupling and complex non-monotonic characteristics of industrial data, the method introduces the copula correlation model into soft measurement and combines it with a monotonic transformation, yielding a soft measurement regression model based on D-vine copula correlation description.

The beneficial effects of the invention are as follows: the soft measurement method and system based on D-vine copula correlation description introduce the copula correlation model into soft measurement to address the nonlinearity, non-Gaussianity, variable coupling and complex non-monotonic characteristics of industrial data, and predict the key variables by combining it with a monotonic transformation.

The invention introduces the vine copula to realize soft measurement of complex chemical processes. The vine copula is a class of copula developed in recent years and widely applied in finance, economics, environmental science and other fields. It converts the correlation problem of high-dimensional data into the optimization of a limited number of bivariate copulas arranged in a sparse matrix, which markedly reduces the complexity of solving for the model parameters. Thanks to its highly flexible structure, the vine copula can also accurately describe complex chemical processes that are highly nonlinear and non-Gaussian, and it is particularly advantageous for data with skewed tails. The method keeps the computational complexity of offline modeling low and supports real-time online prediction of the key variables of complex chemical processes.

Drawings

FIG. 1 is a flowchart of the vine copula-based soft measurement method of the present invention.

FIG. 2 is a schematic diagram of the vine copula fitted during soft measurement of the ethylene cracking data.

FIG. 3 shows the prediction results of the soft measurement of the ethylene cracking data.

FIG. 4 shows the prediction results for the butane concentration at the bottom of the debutanizer column.

FIG. 5 shows the prediction results for the 1000 groups of samples to be predicted in the third embodiment.

Detailed Description

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Example one

The invention discloses a soft measurement method for complex chemical processes based on vine copula correlation modeling, with the following specific steps:

Step S1: select suitable auxiliary variables for the soft measurement model according to actual industrial production conditions and expert knowledge.

Step S2: obtain transformed data suitable for copula modeling using the monotonic transformation method.

Apply zero-mean standardization to the original data, see equation (1):

$$X_i' = \frac{X_i - u(X_i)}{\sqrt{\mathrm{var}(X_i)}} \tag{1}$$

where $X_i$ is the variable before transformation, $X_i'$ is the zero-mean standardized variable, $u(X_i)$ is the mean of $X_i$, and $\mathrm{var}(X_i)$ is the variance of $X_i$. Define the form of the monotonic transformation, see equation (2):

$$Z_i = (1-\alpha_i)X_i' + \alpha_i X_r', \quad i = 1, 2, \ldots, d \tag{2}$$

where $Z_i$ is the variable after the monotonic transformation, $X_r'$ is the reference variable, and $\alpha_i$ is the corresponding monotonic transformation coefficient. The last dimension of the auxiliary variables is taken directly as the reference variable, and the monotonic transformation coefficients are determined by equation (3),

where $\rho_{i,0} = \mathrm{Cov}(X_r', X_i') = \rho(X_r', X_i')$, and $\rho(X_r', X_i')$ denotes the Pearson correlation coefficient between $X_r'$ and $X_i'$.

Step S3: perform correlation modeling with the D-vine copula to obtain the joint probability density function of the auxiliary variables and the target variable.

For a $d$-dimensional random vector $x = [x_1, x_2, \ldots, x_d]$, the D-vine model (the joint probability density function of $x$) is:

$$f(x) = \prod_{t=1}^{d} f_t(x_t) \prod_{i=1}^{d-1} \prod_{j=1}^{d-i} c_{j,j+i|j+1:j+i-1}\big(F(x_j|x_{j+1},\ldots,x_{j+i-1}),\, F(x_{j+i}|x_{j+1},\ldots,x_{j+i-1});\, \theta_{j,j+i|(j+1:j+i-1)}\big) \tag{4}$$

where $d$ is the dimension of the random vector $x$ and the variables of each dimension have been standardized, $f_t(x_t)$ is the probability density function of the random variable $x_t$, $F(x_j|x_{j+1},\ldots,x_{j+i-1})$ is the conditional cumulative distribution function of the random variable $x_j$, $c_{j,j+i|j+1:j+i-1}$ is a bivariate copula density function, and $\theta_{j,j+i|(j+1:j+i-1)}$ is the parameter to be optimized in the bivariate copula density function.

To obtain the most suitable D-vine structure in equation (4), the root nodes of the variables in the D-vine copula tree are determined according to the Kendall rank correlation coefficients between the variables, that is, by optimizing the objective function in equation (5),

where $\tau_{i,j}$ is the Kendall rank correlation coefficient of the random variables $x_i$ and $x_j$.

The marginal cumulative distribution functions $F_i(x_i)$ of the random variables $x_i$ ($i = 1, 2, \ldots, n$) are taken as initial values, and all conditional cumulative distribution function values in equation (4) are calculated according to equation (6) with an iterative strategy.

where the index set in the conditioning denotes all elements of the random vector $x$ other than $x_i$ and $x_j$, and $C$ is a bivariate copula distribution function.
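The initial values F_i(x_i) of the marginal cumulative distribution functions have to be estimated from the training data. This text does not say how; one common choice, shown here purely as an assumption, is the rank-based empirical distribution function scaled by n + 1 so that all values stay strictly inside (0, 1). Ties are broken by position.

```python
def pseudo_observations(xs):
    """Rank-based empirical CDF values in (0, 1): rank / (n + 1)."""
    n = len(xs)
    order = sorted(range(n), key=lambda k: xs[k])  # indices in ascending order
    u = [0.0] * n
    for rank, k in enumerate(order, start=1):
        u[k] = rank / (n + 1)
    return u

print(pseudo_observations([3.0, 1.0, 2.0]))  # [0.75, 0.25, 0.5]
```

Keeping the values away from 0 and 1 matters because copula densities and inverse normal transforms are typically unbounded at the edges of the unit interval.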

The structures and parameters of the $n(n-1)/2$ bivariate copulas in the D-vine copula model are optimized one by one, using the conditional distribution function values from equation (6) and the initial values of the marginal cumulative distribution functions. The structure of each bivariate copula in equation (4) is optimized using the BIC criterion, where BIC is defined in equation (7),

where $N$ is the number of samples and $q$ is the number of parameters $\theta_{j,j+i|(j+1:j+i-1)}$ to be determined, each parameter being restricted to its domain of definition.

Finally, the copula pairs with the minimum BIC value are selected using the BIC criterion. Because each bivariate copula parameter $\theta_{i,i+j|1:i-1}$ has its own admissible range, the L-BFGS-B algorithm is used to solve the optimization problem that takes equation (4) as the objective function and the admissible range of $\theta_{i,i+j|1:i-1}$ as the constraint (in general a 1- to 2-dimensional optimization problem).

Step S4: standardize and monotonically transform the test data.

The sample $X = [x_1, x_2, \ldots, x_d]$ is monotonically transformed into $Z = [z_1, z_2, \ldots, z_d]$.

Step S5: determine the weights of the training samples.

The multivariate copula density of the auxiliary variables with respect to the target variables of all known samples is calculated, and the weight $w$ of each training sample is determined, see equation (9),

where

$y_i$ is the $i$-th training sample,

$w(y_i)$ is the weight of the $i$-th training sample,

and the copula density is the estimate obtained in step S3.

Step S6: according to the weights of the training samples calculated in step S5, linearly weight the target variables of the training samples to obtain the standardized predicted value of the target variable of the sample to be predicted, then apply the inverse transformation to obtain the final predicted value.

The target variables of the training samples are weighted by their probability weights to obtain the standardized predicted value of the target variable of the sample to be predicted:

$$\hat{y}' = \sum_{i=1}^{N} w(y_i)\, y_i' \tag{10}$$

This value is then inverse-transformed to obtain the final predicted value:

$$\hat{y} = \hat{y}'\sqrt{\mathrm{var}(y)} + u(y) \tag{11}$$

where

$\hat{y}'$ is the standardized predicted value and $\hat{y}$ is the final predicted value,

$y_i'$ is the zero-mean standardized value of the $i$-th training sample,

$w(y_i)$ is the weight of the $i$-th training sample,

$\mathrm{var}(y)$ and $u(y)$ are the variance and the mean of the target variable of the training samples.

Example two

The following examples are provided to aid understanding of the invention and are not intended to limit its scope. Referring to FIG. 2, this embodiment predicts the cracking depth in the ethylene cracking process. The data come from an SRT-III type ethylene cracking furnace, and the prediction target is the ethylene cracking depth, represented by PER (the propylene/ethylene ratio). 500 groups of data under normal operating conditions were selected, of which 400 groups were used to train the copula model and 100 groups for testing.

(1) According to prior information, four auxiliary variables are selected: the average outlet temperature of the cracking furnace $x_1$, the density of the cracking feedstock $x_2$, the total feed $x_3$ and the steam-to-hydrocarbon ratio $x_4$. The target variable $y$ is the cracking depth indicator PER.

(2) Data preprocessing: the training samples are zero-mean standardized, the last auxiliary variable $x_4$ is selected as the reference variable, and the monotonic transformation is performed using the Pearson correlation coefficient method, i.e. $Z_i = (1-\alpha_i)X_i' + \alpha_i X_r'$. The monotonic transformation coefficients are $\alpha_1 = 0.876$, $\alpha_2 = 0.874$, $\alpha_3 = 0.643$ and $\alpha_y = 0.722$, giving the transformed data $[z_1, z_2, z_3, z_4, z_y]$.
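Written out with the coefficients reported above and the standardized reference variable $X_4'$, the monotonic transformation of equation (2) takes the following form. The symbol $Y'$ for the standardized target variable is introduced here only for illustration.

```latex
\begin{aligned}
Z_1 &= (1 - 0.876)\,X_1' + 0.876\,X_4' = 0.124\,X_1' + 0.876\,X_4' \\
Z_2 &= (1 - 0.874)\,X_2' + 0.874\,X_4' = 0.126\,X_2' + 0.874\,X_4' \\
Z_3 &= (1 - 0.643)\,X_3' + 0.643\,X_4' = 0.357\,X_3' + 0.643\,X_4' \\
Z_y &= (1 - 0.722)\,Y' + 0.722\,X_4' = 0.278\,Y' + 0.722\,X_4'
\end{aligned}
```

For the reference variable itself the transformation leaves the data unchanged, $Z_4 = X_4'$, whatever coefficient is used, because both terms of equation (2) then contain the same variable.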

(3) The training samples are used to model $[z_1, z_2, z_3, z_4, z_y]$ and to establish the joint probability density function of the auxiliary variables and the target variable. The bivariate copula optimization results for the 5-dimensional variables are shown in FIG. 2, where the values in the black brackets are the serial numbers of the fitted bivariate copulas.

(4) The same monotonic transformation is applied to the auxiliary variables of the data to be predicted, and probability weighting gives the predicted values of the target variable.

(5) The prediction results for the 100 groups of samples to be predicted are shown in FIG. 3.

The results show that the soft measurement method based on vine copula correlation description predicts the cracking depth in the ethylene cracking process effectively and in time.

Example three

Referring to FIG. 4, this embodiment predicts the butane concentration at the bottom of a debutanizer column. The data come from a debutanizer process, and the prediction target is the bottom butane concentration. 2000 groups of data under normal operating conditions were selected, of which 1000 groups were used to train the copula model and 1000 groups for testing.

(1) According to prior information, 7 auxiliary variables are selected: the column top temperature $x_1$, the column top pressure $x_2$, the column top reflux flow $x_3$, the overhead product flow $x_4$, the temperature of the 6th tray $x_5$, bottom temperature 1 $x_6$ and bottom temperature 2 $x_7$, where $x_6$ and $x_7$ are merged into $x_6 = (x_6 + x_7)/2$. The target variable is the bottom butane concentration $y$.

(2) Data preprocessing: the training samples are zero-mean standardized, the last auxiliary variable $x_6$ is selected as the reference variable, and the monotonic transformation is performed using the Pearson correlation coefficient method, i.e. $Z_i = (1-\alpha_i)X_i' + \alpha_i X_r'$. The monotonic transformation coefficients are $\alpha_1 = 0.676$, $\alpha_2 = 0.654$, $\alpha_3 = 0.693$, $\alpha_4 = 0.701$, $\alpha_5 = 0.603$ and $\alpha_y = 0.743$, giving the transformed data $[z_1, \ldots, z_6, z_y]$.

(3) The training samples are used to model $[z_1, \ldots, z_6, z_y]$ and to establish the joint probability density function of the auxiliary variables and the target variable. The bivariate copula optimization results for the 7-dimensional variables are shown in FIG. 4, where the values in the black brackets are the serial numbers of the fitted bivariate copulas.

(5) The prediction effect of 1000 groups of samples to be predicted is shown in fig. 5.

The result shows that the effective and timely prediction of the concentration of the butane at the bottom of the debutanizer tower can be realized by adopting the soft measurement method described by the vine copula correlation.

Example four

A soft measurement system based on a vine copula correlation description, the system comprising:

a training sample set acquisition module, used to determine the auxiliary variables required for modeling; a data transformation module, used to standardize and monotonically transform the variables of each dimension to obtain data suitable for copula modeling; a joint probability density function acquisition module, used to perform correlation modeling to obtain the joint probability density function and the copula function of the auxiliary variables and the target variable; a module for online collection and transformation of the auxiliary variables of the sample to be predicted; a training sample weight calculation module, used to calculate the weights of the target variables of all training samples according to the auxiliary variables of the data to be predicted; and a linear weighted prediction module, which weights the zero-mean standardized target variables of all training samples by their probability weights to obtain the predicted value of the target variable of the sample to be predicted, and then applies the inverse transformation to obtain the final predicted value. For the specific implementation of each module, refer to the implementation of the corresponding step in the first embodiment.

In summary, to address the nonlinearity, non-Gaussianity, variable coupling and complex non-monotonic characteristics of industrial data, the copula correlation model is introduced into soft measurement and combined with a monotonic transformation, yielding a soft measurement regression model based on D-vine copula correlation description. The model does not reduce the dimensionality of the original data, so information loss is avoided. The original data are first monotonically transformed, and the D-vine copula regression model is then built in the transformed space, so that the nonlinear, non-Gaussian and non-monotonic characteristics of industrial data are handled effectively and good regression prediction capability is obtained.

The beneficial effects of the invention are as follows: the soft measurement method and system based on vine copula correlation description introduce the copula correlation model into soft measurement to address the nonlinearity, non-Gaussianity, variable coupling and complex non-monotonic characteristics of industrial data, and combine it with a monotonic transformation to provide a soft measurement regression model based on D-vine copula correlation description that predicts the key variables.

The description and application of the present invention are illustrative, and are not intended to limit the scope of the invention to the embodiments described above. Variations and modifications of the embodiments disclosed herein are possible, and alternative and equivalent various components of the embodiments will be apparent to those skilled in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other components, materials, and parts, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims

1. A soft measurement method based on vine copula correlation description is characterized by comprising the following steps:

step S6: according to the weight of the training sample calculated in the step S5, carrying out linear weighting on the target variable of the training sample to obtain a predicted value of the standardization of the target variable of the sample to be predicted, and then carrying out inverse transformation to obtain a final predicted value;

step S2 obtains the monotonically transformed data through the following sub-steps:

step 2.1: apply zero-mean standardization to the original data, see equation (1):

$$X_i' = \frac{X_i - u(X_i)}{\sqrt{\mathrm{var}(X_i)}} \tag{1}$$

where

$X_i$ is the variable to be transformed,

$X_i'$ is the zero-mean standardized variable,

$u(X_i)$ is the mean of the variable $X_i$,

$\mathrm{var}(X_i)$ is the variance of the variable $X_i$;

step 2.2: define the form of the monotonic transformation, see equation (2):

$$Z_i = (1-\alpha_i)X_i' + \alpha_i X_r', \quad i = 1, 2, \ldots, d \tag{2}$$

where

$Z_i$ is the variable after the monotonic transformation,

$X_r'$ is the reference variable,

$\alpha_i$ is the corresponding monotonic transformation coefficient;

step 2.3: determine the monotonic transformation coefficients, see equation (3),

where

$\rho_{i,0} = \mathrm{Cov}(X_r', X_i') = \rho(X_r', X_i')$, and $\rho(X_r', X_i')$ denotes the Pearson correlation coefficient between $X_r'$ and $X_i'$; $\rho_m$ is a hyperparameter giving the desired value of $\rho(X_r', Z_i)$, chosen so that $X_r'$ and $Z_i$ satisfy a monotonically increasing relationship; step S3 obtains the joint probability density function of each modality through the following four sub-steps:

$d$ is the dimension of the vector $x$,

$f(x)$ is the joint probability density function of the vector $x$,

$f_t(x_t)$ is the marginal probability density function of the variable $x_t$,

$F(x_j|x_{j+1},\ldots,x_{j+i-1})$ is the conditional cumulative distribution function of the variable $x_j$,

$c_{j,j+i|j+1:j+i-1}$ is a bivariate copula density function,

$\theta_{j,j+i|(j+1:j+i-1)}$ is the parameter to be optimized in the bivariate copula density function;

where

$\tau_{i,j}$ is the Kendall rank correlation coefficient of the variables $x_i$ and $x_j$;

the result is the optimized root node of the D-vine copula;

where

$v = x_{-i}$ is the $(d-1)$-dimensional vector obtained by removing the variable $x_i$,

$v_j$ is the $j$-th element of the vector $v$,

$v_{-j}$ is the vector $v$ with its $j$-th variable removed;

where

$n$ is the number of samples,

$q$ is the number of parameters $\theta_{j,j+i|(j+1:j+i-1)}$ to be determined;

where the remaining symbol denotes the domain of definition of the parameter $\theta_{j,j+i|(j+1:j+i-1)}$,

2. The soft measurement method based on vine copula correlation description according to claim 1, wherein step S4 standardizes and monotonically transforms the test data through the following steps:

3. The soft measurement method based on vine copula correlation description according to claim 1, wherein step S5 determines the weights of the training samples as follows: the copula function values between the target variable values of all training samples and the auxiliary variables of the sample to be predicted are calculated with the copula function obtained in step S3,

where

$y_i$ is the $i$-th training sample,

$w(y_i)$ is the weight of the $i$-th training sample,

and the copula function is the estimate obtained in step S3.

4. The method of soft measurement as claimed in claim 1, wherein the step S6 calculates the prediction value normalized by the prediction samples according to formula (10), and further obtains the final prediction value by inverse transformation according to formula (11):

wherein

y_i' is the zero-mean standardized value of the i-th training sample,

w(y_i) is the weight of the i-th training sample,

5. A soft measurement method for complex chemical processes based on vine copula correlation modeling, comprising the following specific steps:

step S2: obtaining transformed data suitable for copula modeling by means of a monotonic transformation;

performing zero-mean standardization on the original data, see formula (1):

wherein

X_i is the variable before transformation, X_i' is the zero-mean standardized variable, u(X_i) is the mean of the variable X_i, and var(X_i) is the variance of the variable X_i; the monotonic transformation is defined by formula (2):

Z_i = (1 - α_i) X_i′ + α_i X_r′,   i = 1, 2, ..., d   (2)

wherein

Z_i is the variable after the monotonic transformation, X_r′ is the reference variable, and α_i is the corresponding monotonic transformation coefficient;

the last dimension of the auxiliary variables is directly selected as the reference variable, and the monotonic transformation coefficient is determined by formula (3):

wherein

ρ_{i,0} = Cov(X_r′, X_i′) = ρ(X_r′, X_i′), where ρ(X_r′, X_i′) denotes the Pearson correlation coefficient between X_r′ and X_i′,

ρ_m is a hyperparameter representing a suitable value of ρ(X_r′, Z_i), chosen so that X_r′ and Z_i satisfy a monotonically increasing relationship;
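The standardization and monotonic transformation of step S2 can be illustrated with a minimal sketch. It assumes that formula (1) is the usual z-score, (X_i - u(X_i)) / sqrt(var(X_i)), with the population variance; the coefficient `alpha` is supplied by hand here rather than computed from formula (3), and the helper names and toy data are hypothetical.

```python
from statistics import fmean, pvariance

def standardize(column):
    """Zero-mean, unit-variance standardization (assumed form of formula (1))."""
    mean = fmean(column)
    std = pvariance(column, mu=mean) ** 0.5
    return [(value - mean) / std for value in column]

def monotone_transform(x_std, x_ref_std, alpha):
    """Formula (2): Z_i = (1 - alpha_i) * X_i' + alpha_i * X_r'."""
    return [(1.0 - alpha) * xi + alpha * xr for xi, xr in zip(x_std, x_ref_std)]

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    norm_a = sum((p - mean_a) ** 2 for p in a) ** 0.5
    norm_b = sum((q - mean_b) ** 2 for q in b) ** 0.5
    return cov / (norm_a * norm_b)

# Hypothetical toy data: x_i decreases as the reference variable x_r increases.
x_i = [5.0, 3.0, 4.0, 1.0, 2.0]
x_r = [1.0, 2.0, 3.0, 4.0, 5.0]
x_i_std, x_r_std = standardize(x_i), standardize(x_r)

rho_before = pearson(x_r_std, x_i_std)                 # negative correlation
z_i = monotone_transform(x_i_std, x_r_std, alpha=0.8)  # alpha picked by hand
rho_after = pearson(x_r_std, z_i)                      # positive after mixing in X_r'
```

Mixing in the reference variable turns the negative correlation into a positive one, which is the effect the hyperparameter ρ_m is meant to control.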

step S3: performing correlation modeling by using the D-vine copula to obtain a joint probability density function of the auxiliary variable and the target variable;

for a d-dimensional random vector x = [x₁, x₂, ..., x_d], the D-vine model (the joint probability density function of x) is:

where d is the dimension of the random vector x and the variables of each dimension have been standardized; f_t(x_t) is the marginal probability density function of the random variable x_t; F(x_j | x_{j+1}, ..., x_{j+i-1}) is the conditional cumulative distribution function of the random variable x_j; c_{j,j+i|j+1:j+i-1} is a bivariate copula density function; and θ_{j,j+i|(j+1:j+i-1)} is the parameter to be optimized in the bivariate copula density function;
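For orientation, the standard D-vine pair-copula decomposition from the vine copula literature, written with the symbols defined above, takes the following form. It is given here as a reference sketch and is presumably what formula (4) expresses; the exact layout of the original formula is not recoverable from this text.

```latex
f(x_1,\ldots,x_d) \;=\; \prod_{t=1}^{d} f_t(x_t)\;
\prod_{i=1}^{d-1}\prod_{j=1}^{d-i}
c_{j,j+i\mid j+1:j+i-1}\!\Bigl(
  F(x_j \mid x_{j+1},\ldots,x_{j+i-1}),\;
  F(x_{j+i} \mid x_{j+1},\ldots,x_{j+i-1});\;
  \theta_{j,j+i\mid (j+1:j+i-1)}
\Bigr)
```

The outer index i runs over the trees of the vine and the inner index j over the pair copulas within each tree.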

wherein τ_{i,j} is the Kendall rank correlation coefficient between the random variables x_i and x_j;
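The Kendall rank correlation coefficient can be computed directly from its definition. The sketch below uses the tau-a variant (no tie correction), which is an assumption, since the text does not say which variant is used; the function name and data are hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical samples of two variables; two of the ten pairs are discordant.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
tau_12 = kendall_tau(x1, x2)  # (8 - 2) / 10 = 0.6
```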

setting the initial values F_i(x_i) of the marginal cumulative distribution functions of the random variables x_i (i = 1, 2, ..., n), and calculating all conditional cumulative distribution function values involved in formula (4) iteratively according to formula (6);

wherein

is the bivariate copula distribution function;
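In the pair-copula literature, a conditional distribution value is obtained as the partial derivative of the bivariate copula distribution function with respect to its conditioning argument (often called the h-function), and the higher-tree values are built up from the lower-tree ones. The sketch below assumes a Gaussian pair copula, whose h-function has a closed form; the copula family and the function names are assumptions, not taken from the claims.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def gaussian_h(u, v, rho):
    """Conditional CDF F(u | v) of a bivariate Gaussian copula with correlation rho."""
    z_u = _STD_NORMAL.inv_cdf(u)
    z_v = _STD_NORMAL.inv_cdf(v)
    return _STD_NORMAL.cdf((z_u - rho * z_v) / (1.0 - rho * rho) ** 0.5)

def first_tree_conditionals(u1, u2, u3, rho_12, rho_23):
    """One iteration step for a three-variable D-vine 1-2-3.

    Returns F(x1 | x2) and F(x3 | x2), the two arguments that the
    second-tree pair copula c_{1,3|2} is evaluated at.
    """
    return gaussian_h(u1, u2, rho_12), gaussian_h(u3, u2, rho_23)

# Marginal CDF values of one hypothetical sample.
f_1_given_2, f_3_given_2 = first_tree_conditionals(0.3, 0.5, 0.8, rho_12=0.6, rho_23=0.4)
```

With `rho = 0` the function returns `u` unchanged, which matches the independence case.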

where N is the number of samples and q is the number of parameters θ_{j,j+i|(j+1:j+i-1)} to be determined;
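With N and q defined as above, the Bayesian information criterion in its usual form is -2 ln L + q ln N, where L is the maximized likelihood. The sketch below assumes that this usual form is the one intended; the candidate families and their log-likelihood values are hypothetical numbers chosen only to show how the penalty term acts.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: -2 * log-likelihood + q * ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

# Hypothetical fitted candidates for one variable pair: (family, log-likelihood, q).
candidates = [("gaussian", 120.4, 1), ("student_t", 123.1, 2), ("clayton", 95.0, 1)]
n_samples = 500

scores = {name: bic(ll, q, n_samples) for name, ll, q in candidates}
best_family = min(scores, key=scores.get)
```

In this toy case the two-parameter candidate has the higher likelihood, but the extra parameter costs more than it gains, so the one-parameter Gaussian candidate has the smaller BIC.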

wherein

denotes

the domain of definition,

finally, the pair copulas with the minimum BIC value are selected using the BIC criterion; since each bivariate copula parameter θ_{i,i+j|1:i-1} has a different value range, the L-BFGS-B algorithm is used to solve the constrained optimization problem in which formula (4) is the objective function and the actual value range of θ_{i,i+j|1:i-1} is the constraint;
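A bound-constrained fit of a single pair-copula parameter with L-BFGS-B might look like the following sketch. It assumes NumPy and SciPy are available, uses a Gaussian pair copula whose correlation is restricted to (-1, 1), and minimizes the negative log-likelihood of that single pair rather than the full D-vine objective; the data are simulated and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def gaussian_copula_neg_loglik(params, u, v):
    """Negative log-likelihood of a bivariate Gaussian copula with correlation params[0]."""
    rho = params[0]
    a, b = norm.ppf(u), norm.ppf(v)
    log_density = (-0.5 * np.log(1.0 - rho**2)
                   - (rho**2 * (a**2 + b**2) - 2.0 * rho * a * b)
                   / (2.0 * (1.0 - rho**2)))
    return -np.sum(log_density)

# Simulated pseudo-observations from a Gaussian copula with rho = 0.6.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
u, v = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])

# The bounds keep the parameter inside its admissible range during the search.
result = minimize(gaussian_copula_neg_loglik, x0=[0.0], args=(u, v),
                  method="L-BFGS-B", bounds=[(-0.99, 0.99)])
rho_hat = result.x[0]
```

Other copula families would only change the likelihood function and the `bounds` entry, for example a lower bound just above zero for a family whose parameter must be positive.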

step S4: standardization and monotonic transformation of the test data:

step 4.2: based on step 2.3, the sample to be predicted is monotonically transformed,

transforming X = [x₁, x₂, ..., x_d] into Z = [z₁, z₂, ..., z_d];

step S5: determining the weights of the training samples:

and determining the weight w of each training sample, see formula (9),

wherein

y_i is the i-th training sample,

w(y_i) is the weight of the i-th training sample,

is estimated from the copula obtained in step S3;

wherein

y_i' is the zero-mean standardized value of the i-th training sample,

w(y_i) is the weight of the i-th training sample,
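The weighting and prediction steps can be sketched as follows. Because formulas (9) to (11) are not reproduced in this text, the sketch makes several assumptions: each weight is the copula density between the query sample and a training target, normalized so that the weights sum to one; there is a single auxiliary variable; the pair copula is Gaussian; the marginal CDF of the target is approximated by ranks; and the inverse transformation undoes a z-score. All names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev

_STD_NORMAL = NormalDist()

def gaussian_copula_density(u, v, rho):
    """Density of a bivariate Gaussian copula with correlation rho."""
    a, b = _STD_NORMAL.inv_cdf(u), _STD_NORMAL.inv_cdf(v)
    exponent = -(rho**2 * (a**2 + b**2) - 2.0 * rho * a * b) / (2.0 * (1.0 - rho**2))
    return math.exp(exponent) / math.sqrt(1.0 - rho**2)

def predict(u_query, y_train, rho):
    """Weighted prediction from the training targets (assumed reading of steps S5 and S6)."""
    n = len(y_train)
    mean_y, std_y = fmean(y_train), pstdev(y_train)
    y_std = [(y - mean_y) / std_y for y in y_train]

    # Rank-based CDF values of the training targets, strictly inside (0, 1).
    order = sorted(range(n), key=y_train.__getitem__)
    cdf_y = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        cdf_y[idx] = rank / (n + 1)

    # Normalized copula density values serve as the weights w(y_i).
    densities = [gaussian_copula_density(u_query, v, rho) for v in cdf_y]
    total = sum(densities)
    weights = [d / total for d in densities]

    # Linear weighting in the standardized scale, then inverse transformation.
    y_pred_std = sum(w * y for w, y in zip(weights, y_std))
    return weights, y_pred_std * std_y + mean_y

y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
weights_high, pred_high = predict(0.9, y_train, rho=0.8)
weights_low, pred_low = predict(0.1, y_train, rho=0.8)
```

With a positive dependence parameter, a query whose auxiliary variable lies in the upper tail places more weight on the larger training targets, so `pred_high` lies above the training mean and `pred_low` below it.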