CN110490496B

CN110490496B - Method for screening sensitive variables influencing product quality in complex industrial process based on stepwise reduction

Info

Publication number: CN110490496B
Application number: CN201910880769.XA
Authority: CN
Inventors: 王雅琳; 李灵; 袁小锋; 孙备; 阳春华; 陈志文; 吴东哲; 王思哲; 郭静宇; 李繁飙
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2019-06-19
Filing date: 2019-09-18
Publication date: 2022-03-11
Anticipated expiration: 2039-09-18
Also published as: CN110490496A

Abstract

The invention discloses a method, based on stepwise reduction, for screening sensitive variables that influence product quality in a complex industrial process. It belongs to the technical field of soft measurement and comprises the following steps: selecting auxiliary variables that influence product quality through expert knowledge and collecting data samples; calculating an auxiliary-variable sensitivity index that jointly considers variable correlation and the sensitivity of each variable to working-condition changes, and using it to preliminarily screen the sensitive variables that influence the dominant variable; and constructing a weighted cosine Mahalanobis-Taguchi system to precisely screen the key sensitive variables that influence product quality. The method accurately reflects both the correlation of the variables and the working-condition information while substantially reducing variable redundancy. It improves product-quality prediction accuracy, effectively reduces the complexity of the prediction model, and is of significance for the maintenance of soft sensor models.

Description

Method for screening sensitive variables influencing product quality in complex industrial process based on stepwise reduction

Technical Field

The invention relates to the technical field of soft measurement, and in particular to a method, based on stepwise reduction, for screening sensitive variables that influence product quality.

Background

With the development of advanced manufacturing technology, the manufacturing industry has shifted its focus from expanding production volume and scale toward improving quality, efficiency and environmental performance. To monitor and evaluate process operating status in a timely manner, diagnose system faults accurately, and track product quality quickly, the key product qualities of a process need to be measured in real time. However, online measurement of these key qualities is currently difficult because of harsh measurement environments, the high cost of analytical instruments, and the time lag of laboratory assays. Data-driven soft measurement modeling techniques based on process characteristics and process data are therefore widely used in industrial production.

Data-driven soft measurement combines knowledge of the production process with computer technology: for a product quality that is difficult or currently impossible to measure, it selects other, easily measured variables and infers or estimates the quality through a mathematical relationship built from them. Because a process has many measurable variables, using all of them as auxiliary variables for soft measurement modeling would increase model complexity, slow down computation, cause the curse of dimensionality, reduce model stability and prediction accuracy, and greatly increase the economic cost of data acquisition and storage. It is therefore important to select, quickly and efficiently, the subset of auxiliary variables that most accurately describes or explains the dominant process variables.

According to how variables are searched and evaluated, current variable selection methods fall into three types: filter, wrapper and embedded methods. Filter methods are widely applied because they are fast to compute and not prone to overfitting. They use variable ranking as the main selection criterion, usually based on characteristics or statistical properties of the data; common criteria include the correlation coefficient, mutual information, Euclidean distance and Bayesian inference. Filter methods do not depend on a learning algorithm; they act on the data rather than on the learning algorithm. However, they tend to ignore the correlations among variables, so the selected subset may not be optimal.

To address the variable redundancy of filter methods, many studies have been carried out in academia, in China and abroad, and in industrial practice. These studies can effectively solve the problem that filter methods tend to overlook the correlation and redundancy among variables, but they cannot describe information about the process working conditions. In actual industrial production, however, working conditions fluctuate because of variations in feedstock quality, adjustments to the processing scheme, changes in product specifications and similar factors, and product quality differs to some extent between working conditions. If the screened auxiliary variables cannot adequately describe working-condition changes, the accuracy of the prediction model degrades accordingly. Research on a sensitive-variable selection method that reflects both working-condition information and the correlation between the dominant and auxiliary variables is therefore of practical significance.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method, based on stepwise reduction, for screening sensitive variables that influence product quality in a complex industrial process, which reflects both working-condition information and the correlation between the dominant variable and the auxiliary variables.

A method for screening sensitive variables influencing product quality in a complex industrial process based on stepwise reduction comprises the following steps:

S1, based on the production process, preliminarily selecting a plurality of auxiliary variables that influence product quality through mechanism analysis and expert knowledge, and collecting multiple groups of auxiliary variable values, together with the product quality values at the corresponding moments, as samples;

S2, calculating an auxiliary-variable sensitivity index that jointly considers the correlation between each auxiliary variable and product quality and the sensitivity of the auxiliary variable to working-condition changes, and preliminarily screening the sensitive variables that influence product quality according to this index:

S21, performing outlier rejection, wavelet denoising and standardization on the collected auxiliary-variable samples;

S22, calculating the correlation coefficient matrix of the auxiliary variables and product quality by Pearson correlation analysis, and calculating the partial correlation coefficient between each auxiliary variable and product quality from this matrix;

S23, calculating the mean, standard deviation and variance of each auxiliary variable, and from these its coefficient of variation;

S24, taking the product of the partial correlation coefficient between the auxiliary variable and product quality and the coefficient of variation of the auxiliary variable as the sensitivity index of the auxiliary variable, computed from the results of steps S22 and S23;

S25, setting thresholds for the sensitivity index based on expert knowledge, according to the production process and the product quality concerned, and selecting the auxiliary variables within the threshold range as sensitive variables;

S3, constructing a weighted cosine Mahalanobis-Taguchi system, performing attribute reduction on the preliminarily selected sensitive variables from the two perspectives of distance and direction, and precisely screening out the sensitive variables that influence product quality as key sensitive variables:

S31, dividing the collected sensitive-variable samples into normal samples and abnormal samples through mechanism analysis and expert knowledge, and standardizing both types of samples, where the abnormal samples are standardized with the mean and standard deviation of the normal samples;

S32, calculating the Mahalanobis distance of every normal sample;

S33, calculating the cosine of the included angle for every normal sample, and from it the cosine similarity of every normal sample;

S34, calculating the coefficients of variation of the Mahalanobis distances and of the cosine similarities of the normal samples, and determining the weights of the cosine Mahalanobis distance as the ratio of each coefficient of variation to their sum;

S35, constructing a weighted cosine Mahalanobis reference space from the cosine Mahalanobis distances of the normal samples;

S36, designing an orthogonal table in which each row corresponds to a weighted cosine Mahalanobis reference space, and calculating the cosine Mahalanobis distances of the abnormal samples in each reference space;

S37, calculating the signal-to-noise ratio of the abnormal samples in each reference space using the larger-the-better signal-to-noise ratio;

S38, calculating the mean signal-to-noise ratio with and without each sensitive variable and then the signal-to-noise ratio increment, setting a threshold for the increment according to expert knowledge, and selecting all sensitive variables within the threshold range as key sensitive variables.

Further, the following step is included after step S35 and before step S36:

calculating the cosine Mahalanobis distances of the abnormal samples in the constructed weighted cosine Mahalanobis reference space to verify its effectiveness: if the reference space clearly distinguishes the cosine Mahalanobis distances of the normal samples from those of the abnormal samples, it is effective; otherwise, the process returns to step S3 and the weighted cosine Mahalanobis-Taguchi system is reconstructed.

Further, step S3 is followed by step S4: establishing a product quality prediction model by the locally weighted partial least squares method, and verifying the effectiveness and accuracy of the selected key sensitive variables.

Further, the standardization in steps S21 and S31 takes the following form:

$$ z_{ij} = \frac{x_{ij} - \mu_i}{s_i} $$

wherein z_ij represents the jth sample value of the ith auxiliary or sensitive variable after standardization, x_ij the jth sample value of the ith auxiliary or sensitive variable, μ_i the mean of the ith auxiliary or sensitive variable, and s_i its standard deviation.
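As a minimal sketch (Python with NumPy, which is an assumption; the patent specifies no implementation), the standardization of steps S21 and S31 can be written as below. Samples are stored as rows and variables as columns, i.e. the transpose of the x_ij indexing above, and the sample standard deviation (ddof=1) is assumed. The optional `mean`/`std` arguments let abnormal samples be standardized with the normal-sample statistics, as step S31 requires; the function name `standardize` is hypothetical.

```python
import numpy as np


def standardize(x, mean=None, std=None):
    """Column-wise z-score z = (x - mu) / s (rows = samples, columns = variables).

    If `mean`/`std` are supplied (e.g. statistics of the normal samples in
    step S31), they are reused instead of being estimated from `x`.
    """
    x = np.asarray(x, dtype=float)
    if mean is None:
        mean = x.mean(axis=0)
    if std is None:
        std = x.std(axis=0, ddof=1)  # sample standard deviation (assumed)
    return (x - mean) / std, mean, std


# Usage: standardize normal samples, then abnormal ones with the same statistics.
normal = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
abnormal = np.array([[5.0, 20.0]])
z_normal, mu, s = standardize(normal)
z_abnormal, _, _ = standardize(abnormal, mu, s)
```

Reusing the normal-sample statistics is what lets abnormal samples appear far from the origin of the reference space instead of being re-centred onto it.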

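Further, the correlation coefficient matrix M_cc of step S22 is calculated by Pearson correlation analysis, wherein each element of M_cc is the Pearson correlation coefficient between a pair of variables (the auxiliary variables and the product quality);

the partial correlation coefficient is calculated as follows:

$$ r_{ik} = -\frac{c_{ik}}{\sqrt{c_{ii}\,c_{kk}}} $$

wherein c_ik is the element in row i, column k of the inverse matrix $M_{cc}^{-1}$ of M_cc.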
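A minimal NumPy sketch of step S22, assuming the product quality is appended as the last column: the Pearson correlation matrix M_cc is obtained with `np.corrcoef`, inverted, and the partial correlation of each auxiliary variable with the quality is read off with r_ik = -c_ik / sqrt(c_ii c_kk). The function name is hypothetical, and the inversion assumes M_cc is non-singular (more samples than variables and no perfectly collinear columns).

```python
import numpy as np


def partial_corr_with_quality(X, y):
    """Partial correlation of each column of X with y, controlling for the other columns.

    X: (n_samples, n_aux) auxiliary variables; y: (n_samples,) product quality.
    """
    data = np.column_stack([X, y])           # last column is the quality variable
    M_cc = np.corrcoef(data, rowvar=False)   # Pearson correlation matrix
    C = np.linalg.inv(M_cc)                  # elements c_ik of M_cc^{-1}
    k = data.shape[1] - 1                    # index of the quality variable
    return -C[:k, k] / np.sqrt(np.diag(C)[:k] * C[k, k])
```

A useful check is that the result equals the ordinary correlation between the residuals of x_i and y after both are regressed (with intercept) on the remaining auxiliary variables.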

the coefficient of variation of the auxiliary variable in step S23 is calculated as follows:

wherein μ_i represents the mean, s_i the standard deviation, and σ_i the variance of the ith auxiliary or sensitive variable;

the auxiliary variable sensitivity index of step S24 is calculated as follows:

wherein η_ik denotes the sensitivity index of the ith auxiliary variable with respect to the kth dominant variable, r_ik the partial correlation coefficient between the ith auxiliary variable and the kth dominant variable, μ_i the mean, s_i the standard deviation, and σ_i the variance of the ith auxiliary or sensitive variable.

Further, the Mahalanobis distance of the normal samples in step S32 is calculated as follows:

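S321, selecting n normal samples containing q initial sensitive variables, so that the sample space can be expressed as:

$$ O = \begin{bmatrix} o_{11} & o_{12} & \cdots & o_{1q} \\ o_{21} & o_{22} & \cdots & o_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ o_{n1} & o_{n2} & \cdots & o_{nq} \end{bmatrix} $$

wherein o_li represents the data of the ith auxiliary or sensitive variable of the lth normal sample, l = 1, 2, ..., n; i = 1, 2, ..., q;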

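S322, standardizing the normal sample data:

$$ z_{li} = \frac{o_{li} - \bar{o}_i}{s_i} $$

wherein z_li represents the standardized data of the ith auxiliary or sensitive variable of the lth normal sample, l = 1, 2, ..., n; i = 1, 2, ..., q, and ō_i and s_i are the mean and standard deviation of the ith variable over the n normal samples;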

S323, the Mahalanobis distance is:

$$ MD_l = \frac{1}{q}\, \mathbf{z}_l\, S^{-1}\, \mathbf{z}_l^{T}, \qquad \mathbf{z}_l = [z_{l1}, z_{l2}, \ldots, z_{lq}] $$

wherein MD_l is the Mahalanobis distance of the lth normal sample and S is the correlation coefficient matrix of the normal samples,

$$ S = \frac{1}{n-1} Z^{T} Z $$

wherein Z^T represents the transpose of the standardized normal sample data matrix Z, S^{-1} the inverse of the normal sample correlation coefficient matrix, and q the number of initial sensitive variables.
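The scaled Mahalanobis distance of step S323 can be sketched as follows (NumPy; the function name is hypothetical). It assumes the reference samples have already been standardized, so that S = Z^T Z / (n - 1) is their correlation matrix, and it scores any other standardized samples against the same S.

```python
import numpy as np


def mts_mahalanobis(Z_ref, Z=None):
    """Scaled Mahalanobis distance MD = z S^{-1} z^T / q, as used in MTS.

    Z_ref: (n, q) standardized normal samples, which define S.
    Z:     samples to score (standardized with the normal-sample statistics);
           defaults to Z_ref itself.
    """
    Z_ref = np.asarray(Z_ref, dtype=float)
    n, q = Z_ref.shape
    S = Z_ref.T @ Z_ref / (n - 1)        # correlation matrix of the normal samples
    S_inv = np.linalg.inv(S)
    Z = Z_ref if Z is None else np.asarray(Z, dtype=float)
    # Row-wise quadratic form z_l S^{-1} z_l^T for every sample l.
    return np.einsum("lj,jk,lk->l", Z, S_inv, Z) / q
```

Because the row-wise quadratic forms of the reference samples sum to trace(S^{-1} Z^T Z) = (n - 1) q, their mean distance is exactly (n - 1)/n. This is why normal-sample distances in a Mahalanobis-Taguchi reference space cluster around 1.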

Further, the cosine similarity in step S33 is:

wherein

is the normalized data of the ith auxiliary or sensitive variable of the lth normal sample, and

is the mean value of the ith auxiliary variable or sensitive variable data.

Further, the weights of the cosine Mahalanobis distance in step S34 are calculated as:

$$ \xi_1 = \frac{s_{MD_l}}{\mu_{MD_l}}, \quad \xi_2 = \frac{s_{CS_l}}{\mu_{CS_l}}, \quad \alpha = \frac{\xi_1}{\xi_1 + \xi_2}, \quad \beta = \frac{\xi_2}{\xi_1 + \xi_2} $$

wherein ξ₁ is the coefficient of variation of the Mahalanobis distances of the normal samples, with s_MDl and μ_MDl their standard deviation and mean; ξ₂ is the coefficient of variation of the cosine similarities of the normal samples, with s_CSl and μ_CSl their standard deviation and mean;

the cosine Mahalanobis distance in step S35 is calculated as follows:

$$ CMD_l = \alpha\, MD_l + \beta\, CS_l $$

wherein MD_l represents the Mahalanobis distance of the lth normal sample, describing the similarity of sample distances, and CS_l represents the cosine similarity of the lth normal sample, describing the similarity of sample directions.
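Steps S34 and S35 reduce to a few lines once the Mahalanobis distances and cosine similarities of the normal samples are available. The sketch below (NumPy, hypothetical function names) takes the cosine similarities as input, because the cosine-similarity formula of step S33 is not reproduced in this text, and it assumes the sample standard deviation in the coefficients of variation.

```python
import numpy as np


def coefficient_of_variation(v):
    """Sample standard deviation divided by the mean."""
    v = np.asarray(v, dtype=float)
    return v.std(ddof=1) / v.mean()


def cmd_weights(md_normal, cs_normal):
    """alpha, beta: each coefficient of variation divided by their sum (step S34)."""
    xi1 = coefficient_of_variation(md_normal)
    xi2 = coefficient_of_variation(cs_normal)
    return xi1 / (xi1 + xi2), xi2 / (xi1 + xi2)


def cosine_mahalanobis(md, cs, alpha, beta):
    """CMD = alpha * MD + beta * CS (step S35)."""
    return alpha * np.asarray(md, dtype=float) + beta * np.asarray(cs, dtype=float)


alpha, beta = cmd_weights([1.0, 2.0, 3.0], [1.0, 1.1, 0.9])
```

For these toy inputs the Mahalanobis distances have a coefficient of variation of 0.5 and the cosine similarities 0.1, so alpha = 5/6 and beta = 1/6. The weights always sum to 1, and the quantity that varies more across the normal samples receives the larger weight.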

Further, the signal-to-noise ratio in step S37 is calculated as follows:

wherein CMD_u represents the cosine Mahalanobis distance of the uth abnormal sample, u = 1, 2, ..., v, and v represents the number of abnormal samples;
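The larger-the-better signal-to-noise ratio of step S37 is sketched below, assuming the textbook Taguchi form SN = -10 log10[(1/v) Σ 1/CMD_u²]. Some Mahalanobis-Taguchi formulations use 1/MD_u instead of its square because MD is already a squared distance; the exact variant is not recoverable from this text, so treat the choice as an assumption. The function name is hypothetical.

```python
import numpy as np


def snr_larger_the_better(cmd_abnormal):
    """Taguchi larger-the-better SN ratio (dB) of the abnormal-sample distances."""
    y = np.asarray(cmd_abnormal, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / y**2))


sn = snr_larger_the_better([10.0, 10.0])   # -10 * log10(0.01) = 20 dB
```

A reference space that pushes the abnormal samples further away yields a higher signal-to-noise ratio, which is the property step S38 exploits.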

the signal-to-noise ratio increment of the jth sensitive variable in step S38 is denoted ΔSN_j:

$$ \Delta SN_j = \overline{SN}_j^{+} - \overline{SN}_j^{-} $$

wherein $\overline{SN}_j^{+}$ represents the mean signal-to-noise ratio when the sensitive variable is used, and

$\overline{SN}_j^{-}$ represents the mean signal-to-noise ratio when the sensitive variable is not used.
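Given an orthogonal table (level 1 = variable used, level 2 = not used, one row per reference space) and the signal-to-noise ratio obtained for each row, the increment ΔSN_j of step S38 is a difference of two column-wise means. A NumPy sketch with a hypothetical function name and a toy four-run table:

```python
import numpy as np


def snr_increment(oa, snr):
    """Delta SN_j = mean SN over rows using variable j - mean SN over rows not using it.

    oa:  (runs, variables) orthogonal table with levels 1 (used) / 2 (not used).
    snr: (runs,) signal-to-noise ratio of each run.
    """
    oa = np.asarray(oa)
    snr = np.asarray(snr, dtype=float)
    used = oa == 1
    # Every column of a two-level orthogonal table contains both levels,
    # so neither denominator is zero.
    mean_used = (used * snr[:, None]).sum(axis=0) / used.sum(axis=0)
    mean_unused = ((~used) * snr[:, None]).sum(axis=0) / (~used).sum(axis=0)
    return mean_used - mean_unused


L4 = [[1, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1]]
delta = snr_increment(L4, [10.0, 8.0, 6.0, 4.0])
```

Here the first variable raises the mean signal-to-noise ratio by 4 dB, the second by 2 dB and the third not at all, so under a threshold such as the 0.3 used in the embodiment the third variable would be dropped.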

Further, the complex industrial process is a hydrocracking production process, and the product quality index is the 10% distillation temperature of aviation kerosene.

Compared with the prior art, the invention has the following beneficial effects. The sensitive variables and the key sensitive variables are determined in two stages: first, a sensitivity index is calculated from the sensitivity of each variable to working-condition changes and the net (partial) correlation between the auxiliary variable and product quality, achieving a preliminary selection of sensitive variables; then the constructed weighted cosine Mahalanobis-Taguchi system removes the high redundancy among the variables, achieving the final selection of key sensitive variables. The method overcomes the tendency of traditional filter variable selection methods to ignore variable correlation and their inability to reflect working-condition information accurately, and it is simple to compute, not prone to overfitting, and yields low redundancy.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

FIG. 2 is a histogram of the signal-to-noise ratio increments of the sensitive variables of the hydrocracking process in an embodiment of the present invention.

FIG. 3 shows the prediction results of the locally weighted partial least squares method using the key sensitive variable set in an embodiment of the present invention.

FIG. 4 shows the prediction results of the locally weighted partial least squares method using the sensitive variable set in an embodiment of the present invention.

FIG. 5 shows the prediction results of the locally weighted partial least squares method using the mechanism-screened auxiliary variable set in an embodiment of the present invention.

FIG. 6 shows the relative prediction errors of the locally weighted partial least squares method using the key sensitive variable set in an embodiment of the present invention.

FIG. 7 shows the relative prediction errors of the locally weighted partial least squares method using the sensitive variable set in an embodiment of the present invention.

FIG. 8 shows the relative prediction errors of the locally weighted partial least squares method using the mechanism-screened auxiliary variable set in an embodiment of the present invention.

Detailed Description

To further disclose the invention, its technical solutions are described fully and specifically below with reference to the accompanying drawings.

As shown in FIG. 1, the method provided by the invention for screening sensitive variables that influence product quality in a complex industrial process based on stepwise reduction comprises the following steps:

S1, based on the production process, preliminarily selecting a plurality of auxiliary variables that may influence product quality through mechanism analysis and expert knowledge, and collecting multiple groups of auxiliary variable values, together with the product quality values at the corresponding moments, as samples;

S2, calculating an auxiliary-variable sensitivity index that jointly considers the correlation between each auxiliary variable and product quality and the sensitivity of the auxiliary variable to working-condition changes, and preliminarily screening the sensitive variables that influence product quality according to this index;

S3, constructing a weighted cosine Mahalanobis-Taguchi system, performing attribute reduction on the preliminarily selected sensitive variables from the two perspectives of distance and direction, and precisely screening out the sensitive variables that influence product quality as key sensitive variables.

As an improvement, the foregoing technical solution may further include step S4: establishing a product quality prediction model by the locally weighted partial least squares method, and verifying the effectiveness and accuracy of the selected key sensitive variables.

Step S2 is specifically implemented as follows:

S25, setting thresholds for the sensitivity index based on expert knowledge, according to the production process and the product quality concerned, and selecting the auxiliary variables within the threshold range as sensitive variables.

The standardization in the foregoing steps S21 and S31 is performed as follows:

$$ z_{ij} = \frac{x_{ij} - \mu_i}{s_i} $$

wherein z_ij represents the jth sample value of the ith auxiliary or sensitive variable after standardization, x_ij the jth sample value of the ith auxiliary or sensitive variable, μ_i its mean, and s_i its standard deviation.

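The correlation coefficient matrix M_cc described in step S22 is calculated by Pearson correlation analysis, wherein each element of M_cc is the Pearson correlation coefficient between a pair of variables (the auxiliary variables and the product quality);

the partial correlation coefficient is calculated as follows:

$$ r_{ik} = -\frac{c_{ik}}{\sqrt{c_{ii}\,c_{kk}}} $$

wherein c_ik is the element in row i, column k of the inverse matrix $M_{cc}^{-1}$ of M_cc.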

the auxiliary variable sensitivity index of step S24 is calculated as follows:

wherein η_ik denotes the sensitivity index of the ith auxiliary variable with respect to the kth dominant variable, r_ik the partial correlation coefficient between the ith auxiliary variable and the kth dominant variable, μ_i the mean, s_i the standard deviation, and σ_i the variance of the ith auxiliary or sensitive variable.

Step S3 is specifically implemented as follows:

S32, calculating the Mahalanobis distance of every normal sample;

As an improvement of the foregoing technical solution, the method further includes the following step after step S35 and before step S36:

calculating the cosine Mahalanobis distances of the abnormal samples in the constructed weighted cosine Mahalanobis reference space to verify its effectiveness: if the reference space clearly distinguishes the cosine Mahalanobis distances of the normal samples from those of the abnormal samples, it is effective; otherwise, the process returns to step S3 and the weighted cosine Mahalanobis-Taguchi system is reconstructed.

The Mahalanobis distance of the normal samples in step S32 is calculated as follows:

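S322, standardizing the normal sample data:

$$ z_{li} = \frac{o_{li} - \bar{o}_i}{s_i} $$

wherein z_li represents the standardized data of the ith auxiliary or sensitive variable of the lth normal sample, and ō_i and s_i are the mean and standard deviation of the ith variable over the normal samples;

S323, the Mahalanobis distance is:

$$ MD_l = \frac{1}{q}\, \mathbf{z}_l\, S^{-1}\, \mathbf{z}_l^{T} $$

wherein MD_l is the Mahalanobis distance of the lth normal sample and S is the correlation coefficient matrix of the normal samples,

$$ S = \frac{1}{n-1} Z^{T} Z $$

wherein Z is the standardized normal sample data matrix.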

The cosine similarity in step S33 is:

wherein

is the mean value of the ith auxiliary variable or sensitive variable data.

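The weights of the cosine Mahalanobis distance in step S34 are calculated as:

$$ \xi_1 = \frac{s_{MD_l}}{\mu_{MD_l}}, \quad \xi_2 = \frac{s_{CS_l}}{\mu_{CS_l}}, \quad \alpha = \frac{\xi_1}{\xi_1 + \xi_2}, \quad \beta = \frac{\xi_2}{\xi_1 + \xi_2} $$

The cosine Mahalanobis distance in step S35 is calculated as follows:

$$ CMD_l = \alpha\, MD_l + \beta\, CS_l $$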

The signal-to-noise ratio in step S37 is calculated as follows:

wherein

represents the mean signal-to-noise ratio when the sensitive variable is used;

The specific embodiment is as follows:

The technical solution disclosed by the invention is applied to product quality prediction in a hydrocracking process as follows:

a. Based on the hydrocracking production process, 39 variables that influence the 10% distillation temperature quality index of aviation kerosene are preliminarily selected according to mechanism analysis and expert experience as input variables for quality prediction of the hydrocracking process, and are denoted x₁, x₂, ..., x₃₉. Data of the 39 variables over 966 days of continuous production are extracted, together with the 10% distillation temperature of aviation kerosene measured by offline laboratory assay at 8:00 and 20:00 each day over the same 966 days, giving 1932 groups in total. The data are divided into two parts for LWPLS-based prediction modeling of the aviation kerosene 10% distillation temperature quality index: 1288 groups as the training set and 644 groups as the test set. 180 groups of the training set are used as input data for sensitive variable selection, and the input matrix is:

$$ \mathbf{x}_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,39}]^{T}, \quad i = 1, 2, \ldots, 180 $$

$$ X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{180}] $$

b. Jointly considering the correlation of the variables and their sensitivity to working-condition changes, the auxiliary-variable sensitivity index is calculated, and the sensitive variables influencing the 10% distillation temperature of aviation kerosene are preliminarily screened according to it.

Standardize the selected auxiliary variable samples:

$$ z_{ij} = \frac{x_{ij} - \mu_i}{s_i} $$

wherein z_ij represents the standardized data value, x_ij the jth sample value of the ith variable, μ_i the mean of the ith variable, and s_i the standard deviation of the ith variable.

Calculate the partial correlation coefficients of the auxiliary variables. First, the correlation coefficient matrix M_cc is calculated by Pearson correlation analysis, each element being the Pearson correlation coefficient between a pair of variables (the auxiliary variables and A10%).

The partial correlation coefficient r_iA10% between the standardized auxiliary variable z_i and the 10% distillation temperature A10% of aviation kerosene is:

$$ r_{i,A10\%} = -\frac{c_{i,A10\%}}{\sqrt{c_{ii}\; c_{A10\%,A10\%}}} $$

wherein c_iA10% is the corresponding element of the inverse matrix $M_{cc}^{-1}$ of M_cc.

Calculating the coefficient of variation of the selected auxiliary variables:

wherein μ_i denotes the mean, s_i the standard deviation, and σ_i the variance of the ith variable.

Calculate the sensitivity index of each auxiliary variable, i.e., the product of its partial correlation coefficient with the 10% distillation temperature of aviation kerosene and its coefficient of variation:

wherein η_iA10% is the sensitivity index of the ith auxiliary variable with respect to the 10% distillation temperature A10% of aviation kerosene, r_iA10% the partial correlation coefficient between the ith auxiliary variable and A10%, μ_i the mean, s_i the standard deviation, and σ_i the variance of the ith variable.

The larger the sensitivity index, the greater the influence of the auxiliary variable on the 10% distillation temperature A10% of aviation kerosene and the more sensitive the variable is to working-condition changes. The dispersion of the 39 auxiliary variables, their partial correlation coefficients with A10%, and their sensitivity indices are calculated; the results are shown in Table 1.

TABLE 1 Sensitivity indices of the mechanism-screened auxiliary variables of the hydrocracking process

For the 10% distillation temperature of aviation kerosene, the threshold of the auxiliary-variable sensitivity index is set to 0.01 according to expert knowledge. Analysis of the sensitivity indices of all variables shows that seven variables have low sensitivity indices: the refining reactor bottom temperature indicator (12), the refining reactor pressure difference (13), the water injection amount of the water injection tank (21), the top reflux amount of the hydrogen sulfide stripping tower (24), the mid-section return temperature of the main fractionator (32), the top temperature of the diesel stripping tower (38), and the bottom temperature of the diesel stripping tower (39). Excluding these 7 variables, the remaining 32 auxiliary variables are taken as sensitive variables.
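The preliminary screening of step b can be sketched as below. Because the sensitivity-index and coefficient-of-variation formulas are not reproduced in this text, the sketch assumes η_i = |r_i| · |s_i / μ_i|, i.e. the magnitude of the partial correlation times the magnitude of the coefficient of variation, with the coefficient of variation taken on the raw (unstandardized) data, since standardized data have zero mean. Function names and the synthetic data are illustrative only; the default threshold of 0.01 is the value used in this embodiment.

```python
import numpy as np


def sensitivity_index(X_raw, y):
    """Assumed form eta_i = |partial corr(x_i, y)| * |s_i / mu_i|."""
    X_raw = np.asarray(X_raw, dtype=float)
    k = X_raw.shape[1]
    C = np.linalg.inv(np.corrcoef(np.column_stack([X_raw, y]), rowvar=False))
    r = -C[:k, k] / np.sqrt(np.diag(C)[:k] * C[k, k])       # partial correlations
    cv = X_raw.std(axis=0, ddof=1) / X_raw.mean(axis=0)     # coefficients of variation
    return np.abs(r) * np.abs(cv)


def screen_sensitive(X_raw, y, threshold=0.01):
    """Indices of variables whose sensitivity index reaches the threshold."""
    eta = sensitivity_index(X_raw, y)
    return np.flatnonzero(eta >= threshold), eta


rng = np.random.default_rng(0)
noise = rng.normal(size=(300, 3))
X_raw = np.column_stack([10 + noise[:, 0],            # CV about 0.1
                         5 + 2 * noise[:, 1],          # CV about 0.4
                         100 + 1e-4 * noise[:, 2]])    # almost constant
y = X_raw[:, 0] + 0.5 * X_raw[:, 1] + 0.1 * rng.normal(size=300)
kept, eta = screen_sensitive(X_raw, y)
```

The nearly constant third variable is screened out even though it could, in principle, correlate with the quality: a variable that barely moves cannot carry information about working-condition changes, which is the intuition behind multiplying by the coefficient of variation.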

c. A weighted cosine Mahalanobis-Taguchi system is constructed, attribute reduction is performed on the preliminarily selected sensitive variables from the two perspectives of distance and direction, and the key sensitive variables influencing the 10% distillation temperature A10% of aviation kerosene are selected. The specific steps are:

(1) Suppose 32 normal samples are selected, each containing the 32 initial sensitive variables; the sample space can then be expressed as:

$$ O = \begin{bmatrix} o_{1,1} & \cdots & o_{1,32} \\ \vdots & \ddots & \vdots \\ o_{32,1} & \cdots & o_{32,32} \end{bmatrix} $$

wherein o_ij (i = 1, 2, ..., 32; j = 1, 2, ..., 32) represents the data of the jth sensitive variable of the ith normal sample.

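Standardize the normal sample data:

$$ z_{ij} = \frac{o_{ij} - \bar{o}_j}{s_j} $$

wherein z_ij represents the standardized data of the jth auxiliary variable of the ith normal sample, and ō_j and s_j are the mean and standard deviation of the jth variable over the normal samples.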

Calculate the cosine Mahalanobis distances of all normal samples:

$$ CMD_i = \alpha\, MD_i + \beta\, CS_i $$

wherein MD_i represents the Mahalanobis distance of the sample, describing the similarity of sample distances; CS_i represents the cosine similarity of the sample, describing the similarity of sample directions; and α and β are weight coefficients.

Calculating the Mahalanobis distance MD of the sample_i：

Where S is the matrix of correlation coefficients for normal samples,

Calculate the cosine similarity of the samples:

wherein

is the jth auxiliary variable data of the ith sample, and

is the mean value of the jth auxiliary variable data.

The weights of the cosine Mahalanobis distance are determined from the degree of variation of the Mahalanobis distances and of the cosine similarities of the normal samples, as follows:

$$ \xi_1 = \frac{s_{MD_i}}{\mu_{MD_i}}, \quad \xi_2 = \frac{s_{CS_i}}{\mu_{CS_i}}, \quad \alpha = \frac{\xi_1}{\xi_1 + \xi_2}, \quad \beta = \frac{\xi_2}{\xi_1 + \xi_2} $$

wherein ξ₁ is the coefficient of variation of the Mahalanobis distances of the normal samples, with s_MDi and μ_MDi their standard deviation and mean; ξ₂ is the coefficient of variation of the cosine similarities of the normal samples, with s_CSi and μ_CSi their standard deviation and mean. Table 2 shows part of the weighted cosine Mahalanobis reference space.

TABLE 2 Weighted cosine Mahalanobis reference space (partial)

As Table 2 shows, the cosine Mahalanobis distances of the normal samples fluctuate around 1, with a mean of 0.9020.

(2) The abnormal samples are standardized, and their Mahalanobis distances, their cosine similarities with the mean of the normal samples, and their cosine Mahalanobis distances are calculated; the results are shown in Table 3.

TABLE 3 Cosine Mahalanobis distances of the abnormal samples

As Table 3 shows, the cosine Mahalanobis distances of the abnormal samples are all much greater than 1, with a mean of 205.5255, so the constructed weighted cosine Mahalanobis reference space distinguishes normal samples from abnormal samples well. Abnormal sample 3 is a deliberately chosen outlier whose Mahalanobis distance is 1.6571; a traditional Mahalanobis-Taguchi system, which judges samples by Mahalanobis distance alone, would classify sample 3 as normal, contrary to the actual situation. The cosine similarity of sample 3 is 5.5180 and its cosine Mahalanobis distance is 2.2362, so the weighted cosine Mahalanobis-Taguchi system identifies sample 3 as abnormal. Compared with the traditional Mahalanobis-Taguchi system, the weighted cosine Mahalanobis-Taguchi system therefore separates normal and abnormal samples better.
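As a consistency check on the figures quoted for abnormal sample 3, assume, as the coefficient-of-variation ratios of step S34 imply, that α + β = 1. Substituting MD = 1.6571, CS = 5.5180 and CMD = 2.2362 into CMD = αMD + βCS gives the weights implied by the reported values (an inference from the quoted numbers, not a value stated in the text):

$$ 1.6571\,\alpha + 5.5180\,(1-\alpha) = 2.2362 \;\Rightarrow\; \alpha = \frac{5.5180 - 2.2362}{5.5180 - 1.6571} = \frac{3.2818}{3.8609} \approx 0.850, \qquad \beta = 1 - \alpha \approx 0.150 $$

Under this assumption the Mahalanobis term contributes about 0.850 × 1.6571 ≈ 1.41 and the cosine term about 0.150 × 5.5180 ≈ 0.83 of the total 2.2362, so the direction information accounts for roughly a third of sample 3's distance.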

(3) Optimizing a reference space: the orthogonal table shown in table 4 was designed, level 1 indicated the use of the auxiliary variable, level 2 indicated the non-use of the auxiliary variable, and the signal-to-noise ratio was calculated. Each line in the orthogonal table corresponds to a reference space, and the cosine Mahalanobis distance and the signal-to-noise ratio of the abnormal sample in each reference space are calculated according to the following calculation formula:

where CMD_p is the cosine Mahalanobis distance of the p-th abnormal sample. For each auxiliary variable, the mean signal-to-noise ratio is computed separately over the runs in which the sensitive variable is used and over the runs in which it is not used; the signal-to-noise ratio increment ΔSN_j is the former mean minus the latter.

For the 10% distillation temperature of aviation kerosene in the hydrocracking process, the signal-to-noise ratio increment threshold was set to 0.3 based on expert knowledge, and all sensitive variables satisfying the threshold were selected as key sensitive variables (the numbers in parentheses are the variable numbers in the sensitivity index table of the mechanism-screened auxiliary variables).
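The optimization in step (3) can be sketched as follows. The S/N formula itself is not reproduced in this text, so the sketch assumes the standard Taguchi larger-the-better form SN = -10·log10((1/v)·Σ 1/CMD_u²); some MTS variants use 1/CMD_u without the square. It also reads "satisfying the threshold" as ΔSN_j ≥ 0.3. All function names are hypothetical.

```python
import numpy as np

def sn_larger_the_better(cmd_abnormal):
    """Assumed Taguchi larger-the-better S/N of one reference space:
    -10*log10(mean(1/CMD_u^2)) over the v abnormal samples."""
    y = np.asarray(cmd_abnormal, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / y**2))

def sn_increments(levels, sn):
    """Delta SN_j = mean S/N of runs where variable j is used (level 1)
    minus mean S/N of runs where it is not used (level 2).

    levels: (runs, variables) orthogonal array with entries 1 or 2.
    sn: (runs,) S/N ratio of each run, i.e. of each reference space.
    """
    levels = np.asarray(levels)
    sn = np.asarray(sn, dtype=float)
    return np.array([sn[levels[:, j] == 1].mean() - sn[levels[:, j] == 2].mean()
                     for j in range(levels.shape[1])])

def select_key_variables(delta_sn, threshold=0.3):
    """Indices of variables whose S/N increment reaches the threshold (assumed >=)."""
    return np.flatnonzero(np.asarray(delta_sn) >= threshold)
```

Variables with negative or near-zero increments, such as variables 21, 26, 28 and 32 in the example, fall below the threshold and are dropped.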

TABLE 4 Two-level orthogonal array and signal-to-noise ratios

Fig. 2 shows a bar chart of the signal-to-noise ratio increments of the 32 sensitive variables. The increments of variables 21 (mechanism-screened auxiliary variable 25), 28 (mechanism-screened auxiliary variable 33) and 32 (mechanism-screened auxiliary variable 37) are negative, indicating that these variables do not help modeling; the increment of variable 26 (mechanism-screened auxiliary variable 30) is small, indicating that its contribution to modeling is negligible. This leaves 28 key sensitive variables for predictive modeling.

d. A prediction model is built with locally weighted partial least squares (LWPLS). The modeling data comprise 1610 samples, of which 966 form the training set and 644 the test set. Models are built separately on the mechanism-screened auxiliary variable set, the sensitive variable set and the key sensitive variable set, with identical model parameters. The prediction results are shown in Figs. 3-5, the prediction errors in Figs. 6-8, and the root mean square errors (RMSE) in Table 5. As Figs. 3 and 6 show, the model built on the key sensitive variables tracks the actual 10% distillation temperature of aviation kerosene better than the models built on the other two variable sets and has smaller prediction errors. Its RMSE is 3.0390, an improvement of 5.81% and 3.94% respectively over the other two variable sets, which verifies the effectiveness of the proposed method.
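The text does not give its LWPLS formulation or parameters. The sketch below follows the commonly cited LWPLS algorithm, with Euclidean distances and an exponential similarity weight ω_n = exp(-d_n/(σ_d·φ)); the number of latent variables `n_components` and the localization parameter `phi` are hypothetical placeholders, and `rmse` is the metric reported in Table 5.

```python
import numpy as np

def lwpls_predict(X, y, Xq, n_components=2, phi=1.0):
    """Locally weighted PLS: fit a weighted PLS model around each query row of Xq."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = []
    for xq in np.atleast_2d(np.asarray(Xq, dtype=float)):
        d = np.linalg.norm(X - xq, axis=1)          # distance of each training sample to the query
        omega = np.exp(-d / (d.std() * phi))        # similarity weights
        x_mean = omega @ X / omega.sum()            # weighted means used for centering
        y_mean = omega @ y / omega.sum()
        Xr, yr, xr = X - x_mean, y - y_mean, xq - x_mean
        y_hat = y_mean
        for _ in range(n_components):
            w = Xr.T @ (omega * yr)                 # weighted weight vector
            w /= np.linalg.norm(w)
            t = Xr @ w                              # training scores
            tq = xr @ w                             # query score
            denom = t @ (omega * t)
            p = Xr.T @ (omega * t) / denom          # weighted X loadings
            q = yr @ (omega * t) / denom            # weighted y loading
            y_hat += tq * q
            Xr = Xr - np.outer(t, p)                # deflate training data
            yr = yr - t * q
            xr = xr - tq * p                        # deflate query
        preds.append(y_hat)
    return np.array(preds)

def rmse(y_true, y_pred):
    """Root mean square error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))
```

For the comparison described above, the same `n_components` and `phi` would be used for all three variable sets, with only the input columns of `X` changing.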

TABLE 5 Root mean square error (RMSE) of predictive models built on the 3 variable sets

To further verify the effectiveness of the proposed method, predictive models were also built on the variable sets with three other methods: partial least squares (PLS), support vector machines (SVM) and locally weighted kernel principal component regression (LWKPCR). Their root mean square errors are shown in Table 6.

TABLE 6 RMSE of different predictive modeling methods on the 3 variable sets

Finally, the above examples are merely for illustrative clarity and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for screening sensitive variables influencing product quality in a complex industrial process based on stepwise reduction is characterized by comprising the following steps:

S21: performing outlier rejection, wavelet denoising and standardization on the collected auxiliary variable samples, the standardization being performed as follows:

$z_{ij} = (x_{ij} - \mu_i) / s_i$

where z_ij is the j-th standardized sample value of the i-th auxiliary or sensitive variable, x_ij is the j-th sample value of the i-th auxiliary or sensitive variable, μ_i is the mean of the i-th auxiliary or sensitive variable, and s_i is its standard deviation;
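The standardization in step S21 is column-wise z-scoring. In this sketch samples are rows and variables are columns (the transpose of the z_ij indexing), and the use of the sample standard deviation (ddof=1) is an assumption. Passing the normal-sample mean and standard deviation back in gives the treatment step S31 prescribes for abnormal samples.

```python
import numpy as np

def standardize(x, mu=None, s=None):
    """z = (x - mu) / s per column (variable).

    x: (n_samples, n_variables). If mu or s is omitted it is estimated from x;
    reuse the values returned for the normal samples to standardize abnormal ones.
    """
    x = np.asarray(x, dtype=float)
    if mu is None:
        mu = x.mean(axis=0)
    if s is None:
        s = x.std(axis=0, ddof=1)  # sample standard deviation (assumed)
    return (x - mu) / s, mu, s
```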

S22: calculating the correlation coefficient matrix of the auxiliary variables and product quality by Pearson correlation analysis, and calculating the partial correlation coefficients between the auxiliary variables and product quality from that matrix, the correlation coefficient matrix being calculated as follows:

where

the partial correlation coefficient is calculated as follows:

where c_ik is an element of the inverse matrix M_cc^{-1} of M_cc;
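The partial-correlation formula of step S22 is not reproduced in this text. A common way to obtain partial correlations from the inverse of the correlation matrix, and the reading assumed here, is r_ik = -c_ik/√(c_ii·c_kk). This gives the correlation between auxiliary variable i and quality variable k with all other auxiliary variables controlled for. The function name is hypothetical.

```python
import numpy as np

def partial_correlations(aux, y):
    """Partial correlation of each auxiliary variable with quality variable y.

    aux: (n_samples, m) auxiliary variables; y: (n_samples,) product quality.
    """
    data = np.column_stack([aux, y])            # last column is the quality variable
    m_cc = np.corrcoef(data, rowvar=False)      # Pearson correlation matrix M_cc
    c = np.linalg.inv(m_cc)                     # elements c_ik of M_cc^{-1}
    k = data.shape[1] - 1                       # column index of the quality variable
    return -c[:-1, k] / np.sqrt(np.diag(c)[:-1] * c[k, k])
```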

S23: calculating the mean, standard deviation and variance of each auxiliary variable, and from them its coefficient of variation, calculated as follows:

S24: taking the product of the partial correlation coefficient between the auxiliary variable and product quality and the coefficient of variation of the auxiliary variable as the auxiliary variable sensitivity index, and calculating it from the partial correlation coefficients and coefficients of variation obtained in steps S22 and S23, as follows:

where η_ik is the sensitivity index of the i-th auxiliary variable to the k-th dominant variable, r_ik is the partial correlation coefficient between the i-th auxiliary variable and the k-th dominant variable, and μ_i is the mean of the i-th auxiliary or sensitive variable;
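Steps S23 and S24 reduce to a coefficient of variation s_i/μ_i per auxiliary variable multiplied by its partial correlation with the dominant variable. The sketch assumes the coefficient of variation is computed on the raw (unstandardized) data, since standardized data have zero mean, and that ddof=1 is used; `r` would come from step S22. The function name is hypothetical.

```python
import numpy as np

def sensitivity_index(aux_raw, r):
    """eta_i = r_i * (s_i / mu_i) for one dominant variable.

    aux_raw: (n_samples, m) auxiliary variables in engineering units.
    r: (m,) partial correlation coefficients with the dominant variable.
    """
    aux_raw = np.asarray(aux_raw, dtype=float)
    mu = aux_raw.mean(axis=0)
    s = aux_raw.std(axis=0, ddof=1)   # sample standard deviation (assumed)
    cv = s / mu                       # coefficient of variation (step S23)
    return np.asarray(r, dtype=float) * cv  # sensitivity index (step S24)
```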

S31: dividing the collected sensitive variable samples into normal samples and abnormal samples through mechanism analysis and expert knowledge, and standardizing both types of samples, the abnormal sample data being standardized with the same mean and standard deviation as the normal sample data;

S32: calculating the Mahalanobis distance of each normal sample;

where o_li denotes the value of the i-th auxiliary or sensitive variable in the l-th normal sample, l = 1, 2, …, n; i = 1, 2, …, q;

S322: standardizing the normal sample data:

wherein

S323: calculating the Mahalanobis distance as follows:

where

denotes the transpose of the standardized normal sample data matrix, S^{-1} denotes the inverse of the correlation coefficient matrix of the normal samples, and q denotes the number of initial sensitive variables;
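Step S323 is consistent with the scaled Mahalanobis distance used in the Mahalanobis-Taguchi system, MD_l = (1/q)·z_lᵀS⁻¹z_l. The sketch below assumes that form, with S the correlation matrix of the standardized normal samples. Abnormal samples are scored against the same S after being standardized with the normal-sample statistics (step S31). The function name is hypothetical.

```python
import numpy as np

def mahalanobis_mts(z_normal, z_query=None):
    """Scaled Mahalanobis distance MD_l = (1/q) z_l^T S^{-1} z_l.

    z_normal: standardized normal samples, shape (n, q); they define S.
    z_query: samples to score (standardized with normal-sample statistics);
    defaults to the normal samples themselves.
    """
    z_normal = np.asarray(z_normal, dtype=float)
    q = z_normal.shape[1]
    s_inv = np.linalg.inv(np.corrcoef(z_normal, rowvar=False))  # S^{-1}
    z = z_normal if z_query is None else np.asarray(z_query, dtype=float)
    # Row-wise quadratic form z_l^T S^{-1} z_l, scaled by 1/q
    return np.einsum("li,ij,lj->l", z, s_inv, z) / q
```

With this scaling the normal-sample distances average exactly (n - 1)/n, close to 1, which is why values far above 1 flag abnormal samples.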

S33: calculating the cosine of the included angle for each normal sample, and from it the cosine similarity of each normal sample, as follows:

wherein

is the mean of the i-th auxiliary or sensitive variable data;
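Step S33 starts from the cosine of the angle between each sample o_l and the vector of normal-sample means. The transformation from this cosine to the cosine similarity CS_l is not reproduced in this text (the value 5.5180 reported for sample 3 shows it is not the raw cosine), so the sketch stops at the cosine. It uses raw data because standardized normal samples have a zero mean vector. The function name is hypothetical.

```python
import numpy as np

def cosine_to_mean(o_normal, o_query=None):
    """cos(theta_l) between each sample and the normal-sample mean vector.

    o_normal: raw (unstandardized) normal samples, shape (n, q); their column
    means form the reference direction.
    o_query: samples to score; defaults to the normal samples.
    """
    o_normal = np.asarray(o_normal, dtype=float)
    o_bar = o_normal.mean(axis=0)               # mean of each variable
    o = o_normal if o_query is None else np.asarray(o_query, dtype=float)
    return (o @ o_bar) / (np.linalg.norm(o, axis=1) * np.linalg.norm(o_bar))
```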

S34: calculating the coefficients of variation of the Mahalanobis distance and of the cosine similarity of the normal samples, and determining the weights of the cosine Mahalanobis distance as the ratio of each coefficient of variation to the total coefficient of variation, the weights being calculated as follows:

S35: constructing a weighted cosine Mahalanobis reference space from the cosine Mahalanobis distances of the normal samples, the cosine Mahalanobis distance being calculated as follows:

$CMD_l = \alpha MD_l + \beta CS_l$

where MD_l is the Mahalanobis distance of the l-th normal sample, describing similarity in distance, and CS_l is the cosine similarity of the l-th normal sample, describing similarity in direction;
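Steps S34 and S35 can be sketched as follows, assuming the weights are α = ξ₁/(ξ₁ + ξ₂) and β = ξ₂/(ξ₁ + ξ₂), where ξ₁ and ξ₂ are the coefficients of variation (standard deviation over mean, ddof=1 assumed) of the normal-sample Mahalanobis distances and cosine similarities. Function names are hypothetical.

```python
import numpy as np

def cmd_weights(md_normal, cs_normal):
    """Weights alpha, beta as each coefficient of variation over their sum."""
    md = np.asarray(md_normal, dtype=float)
    cs = np.asarray(cs_normal, dtype=float)
    xi1 = md.std(ddof=1) / md.mean()   # coefficient of variation of MD
    xi2 = cs.std(ddof=1) / cs.mean()   # coefficient of variation of CS
    total = xi1 + xi2
    return xi1 / total, xi2 / total

def cosine_mahalanobis(md, cs, alpha, beta):
    """CMD_l = alpha * MD_l + beta * CS_l (step S35)."""
    return alpha * np.asarray(md, dtype=float) + beta * np.asarray(cs, dtype=float)
```

Weighting by relative dispersion gives more influence to whichever measure varies more across the normal samples.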

S36: designing an orthogonal array in which each row corresponds to a weighted cosine Mahalanobis reference space, and calculating the cosine Mahalanobis distances of the abnormal samples in each reference space; calculating the cosine Mahalanobis distances of the abnormal samples in the constructed weighted cosine Mahalanobis reference space to verify its validity: if the reference space separates the cosine Mahalanobis distances of the normal samples from those of the abnormal samples, it is valid; otherwise, returning to step S3 and reconstructing the weighted cosine Mahalanobis-Taguchi system;
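The orthogonal array itself (Table 4) is not reproduced here. One standard way to build a two-level orthogonal array for step S36 is from a Sylvester-type Hadamard matrix: dropping its all-ones first column leaves mutually orthogonal, balanced ±1 columns. The sketch below uses that construction, with level 1 meaning "variable used" and level 2 "not used"; its column order need not match Table 4, and the function name is hypothetical.

```python
import numpy as np

def two_level_orthogonal_array(n_factors):
    """Two-level orthogonal array (levels 1 and 2) for n_factors variables."""
    h = np.array([[1]])
    while h.shape[0] <= n_factors:         # need more than n_factors columns
        h = np.block([[h, h], [h, -h]])    # Sylvester doubling
    cols = h[:, 1:n_factors + 1]           # skip the all-ones column
    return np.where(cols == 1, 1, 2)       # +1 -> level 1, -1 -> level 2
```

For the 32 sensitive variables of the example this yields a 64-run array. The variables at level 1 in each row define one reference space whose S/N ratio is then computed.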

S37: calculating the signal-to-noise ratio of the abnormal samples in each reference space using the larger-the-better signal-to-noise ratio, as follows:

where CMD_u is the cosine Mahalanobis distance of the u-th abnormal sample and v is the number of abnormal samples;

S38: calculating the mean signal-to-noise ratio with and without each sensitive variable, then its signal-to-noise ratio increment; setting a threshold on the increment according to expert knowledge, and selecting all sensitive variables within the threshold range as key sensitive variables, the signal-to-noise ratio increment being denoted ΔSN_i:

where

denotes the mean signal-to-noise ratio when the sensitive variable is used;

denotes the mean signal-to-noise ratio when the sensitive variable is not used;

and establishing a product quality prediction model by locally weighted partial least squares, and verifying the effectiveness and accuracy of the selected key sensitive variables.

2. The method for screening sensitive variables influencing product quality in a complex industrial process based on stepwise reduction according to claim 1, characterized in that: the complex industrial process is a cracking production process, and the product quality index is the 10% distillation temperature of aviation kerosene.