CN110490496B - Method for screening sensitive variables influencing product quality in complex industrial process based on stepwise reduction - Google Patents
- Publication number
- CN110490496B (application CN201910880769.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Educational Administration (AREA)
- Marketing (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Game Theory and Decision Science (AREA)
- Manufacturing & Machinery (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for screening sensitive variables influencing product quality in a complex industrial process based on stepwise reduction, belonging to the technical field of soft measurement. The method comprises the following steps: selecting auxiliary variables influencing product quality through expert knowledge and collecting data samples; calculating an auxiliary variable sensitivity index that jointly accounts for variable correlation and the sensitivity of each variable to working condition changes, and using it to preliminarily screen the sensitive variables influencing the dominant variable; and constructing a weighted cosine Mahalanobis-Taguchi system to accurately screen the key sensitive variables influencing product quality. The method reflects both the correlation of the variables and the working condition information while reducing variable redundancy. It can improve product quality prediction accuracy and effectively reduce the complexity of the prediction model, and it is of practical significance for maintaining soft sensor models.
Description
Technical Field
The invention relates to the technical field of soft measurement, in particular to a method for screening sensitive variables influencing product quality based on stepwise reduction.
Background
With the development of advanced manufacturing technology, the manufacturing industry has shifted its focus from expanding production quantity and scale toward improving quality, efficiency and environmental protection. Key product quality indices of a process must be measured in real time. Only then can the operating status be monitored and evaluated promptly, system faults diagnosed accurately and product quality tracked quickly. However, online measurement of these key quality indices is currently difficult for three reasons: the measurement environment is harsh, analytical instruments are expensive and laboratory assays introduce time lags. Data-driven soft measurement modeling techniques based on process characteristics and process data are therefore widely used in industrial production.
Data-driven soft measurement technology combines knowledge of the production process with computer technology. For a product quality that is difficult or currently impossible to measure, it selects other, easily measured variables and infers or estimates the quality through a mathematical relationship built on them. A process has a large number of measurable variables, and using all of them as auxiliary variables for soft measurement modeling has several costs. It increases model complexity, slows computation and causes the curse of dimensionality. It also reduces model stability and prediction accuracy and greatly increases the economic cost of data acquisition and storage. It is therefore important to select, quickly and efficiently, the subset of auxiliary variables that most accurately describes or explains the dominant process variables.
Current variable selection methods can be divided into three types according to how variables are searched and evaluated: filter, wrapper and embedded methods. Filter methods are widely used because they are fast to compute and not prone to overfitting. They use variable ranking as the main selection criterion, usually based on properties or statistical rules of the data. Common criteria include the correlation coefficient, mutual information, Euclidean distance and Bayesian inference. Filter methods do not rely on a learning algorithm and select variables from the data alone. However, they tend to ignore correlations among variables, so the selected subset may not be optimal.
To address variable redundancy in filter methods, many studies have been carried out in academia and in industrial practice, both in China and abroad. These studies can effectively address the tendency of filter methods to overlook correlation and redundancy among variables, but they still cannot describe process working condition information. In actual industrial production, however, working conditions fluctuate because of variations in feedstock quality, adjustments of the processing scheme, changes in product specifications and so on. As a result, product quality differs to some extent across working conditions. If the selected auxiliary variables cannot adequately describe these working condition changes, the accuracy of the prediction model suffers. Research on sensitive variable selection methods that reflect both working condition information and the correlation between the dominant and auxiliary variables therefore has practical significance.
Disclosure of Invention
The invention aims to solve the technical problem of providing a method for screening sensitive variables influencing product quality in a complex industrial process based on step-by-step reduction, which can reflect not only working condition information but also the correlation between a main variable and an auxiliary variable.
A method for screening sensitive variables influencing product quality in a complex industrial process based on stepwise reduction comprises the following steps:
S1, based on the production process, preliminarily select a number of auxiliary variables influencing product quality through mechanism analysis and expert knowledge, and collect multiple groups of auxiliary variable values together with the product quality values at the corresponding times as samples;
S2, taking into account both the correlation between the auxiliary variables and product quality and the sensitivity of the auxiliary variables to working condition changes, calculate an auxiliary variable sensitivity index, and preliminarily screen the sensitive variables influencing product quality according to this index:
S21, perform outlier removal, wavelet denoising and standardization on the collected auxiliary variable samples;
S22, calculate the correlation coefficient matrix of the auxiliary variables and product quality by Pearson correlation analysis, and from this matrix calculate the partial correlation coefficient between each auxiliary variable and product quality;
S23, calculate the mean, standard deviation and variance of each auxiliary variable, and from them its coefficient of variation;
S24, take the product of the partial correlation coefficient between the auxiliary variable and product quality (step S22) and the coefficient of variation of the auxiliary variable (step S23) as the sensitivity index of that auxiliary variable;
S25, based on expert knowledge, set thresholds on the sensitivity index according to the production process and the product quality concerned, and select the auxiliary variables within the threshold range as sensitive variables;
S3, construct a weighted cosine Mahalanobis-Taguchi system, perform attribute reduction on the preliminarily selected sensitive variables from the two perspectives of distance and direction, and accurately screen out the sensitive variables influencing product quality as key sensitive variables:
S31, divide the collected sensitive variable samples into normal samples and abnormal samples through mechanism analysis and expert knowledge, and standardize both sets, with the abnormal samples standardized using the mean and standard deviation of the normal samples;
S32, calculate the Mahalanobis distance of every normal sample;
S33, calculate the cosine of the included angle for every normal sample, and from it the cosine similarity of every normal sample;
S34, calculate the coefficients of variation of the Mahalanobis distance and of the cosine similarity of the normal samples, and determine the weights of the cosine Mahalanobis distance as the ratio of each coefficient of variation to their sum;
S35, construct a weighted cosine Mahalanobis reference space from the cosine Mahalanobis distances of the normal samples;
S36, design an orthogonal array in which each row corresponds to a weighted cosine Mahalanobis reference space, and calculate the cosine Mahalanobis distances of the abnormal samples in each reference space;
S37, calculate the signal-to-noise ratio of the abnormal samples in each reference space using the larger-the-better signal-to-noise ratio;
S38, calculate the mean signal-to-noise ratio with and without each sensitive variable and the resulting signal-to-noise ratio increment, set a threshold on the increment according to expert knowledge, and select all sensitive variables within the threshold range as key sensitive variables.
Further, the following step is included after step S35 and before step S36:
calculate the cosine Mahalanobis distances of the abnormal samples in the constructed weighted cosine Mahalanobis reference space to verify its validity. If the reference space clearly separates the cosine Mahalanobis distances of the normal samples from those of the abnormal samples, it is valid; otherwise, return to step S3 and reconstruct the weighted cosine Mahalanobis-Taguchi system.
Further, step S3 is followed by step S4: establish a product quality prediction model by the locally weighted partial least squares (LWPLS) method, and verify the effectiveness and accuracy of the selected key sensitive variables.
Further, the standardization described in steps S21 and S31 takes the following form:
$z_{ij} = (x_{ij} - \mu_i)/s_i$
where $z_{ij}$ is the j-th standardized sample value of the i-th auxiliary or sensitive variable, $x_{ij}$ is the j-th sample value of the i-th auxiliary or sensitive variable, $\mu_i$ is the mean of the i-th auxiliary or sensitive variable, and $s_i$ is its standard deviation.
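The standardization above can be sketched in Python with NumPy. The sketch also covers the step S31 rule that abnormal samples reuse the normal samples' mean and standard deviation. It is a minimal sketch only: the function name `standardize` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the document does not state which estimator is used.

```python
import numpy as np

def standardize(x, ref=None):
    """Column-wise z-score z_ij = (x_ij - mu_i) / s_i.

    x   : samples, shape (n_samples, n_variables)
    ref : optional reference samples whose mean and std are used
          (e.g. the normal samples in step S31); defaults to x itself.
    """
    x = np.asarray(x, dtype=float)
    ref = x if ref is None else np.asarray(ref, dtype=float)
    mu = ref.mean(axis=0)
    s = ref.std(axis=0, ddof=1)  # assumption: sample standard deviation
    return (x - mu) / s

# Abnormal samples are scaled with the statistics of the normal samples
normal = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
abnormal = np.array([[5.0, 50.0]])
z_normal = standardize(normal)                 # columns become [-1, 0, 1]
z_abnormal = standardize(abnormal, ref=normal) # [[3.0, 3.0]]
```

Scaling the abnormal samples with the normal statistics keeps both sets in one coordinate system. Distances measured later in the normal reference space are therefore comparable.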
Further, the correlation coefficient matrix of step S22 is calculated as follows:
the partial correlation coefficient is calculated as follows:
the coefficient of variation of the auxiliary variable in step S23 is calculated as follows:
where $\mu_i$, $s_i$ and $\sigma_i$ denote the mean, standard deviation and variance of the i-th auxiliary or sensitive variable, respectively;
the auxiliary variable sensitivity index of step S24 is calculated as follows:
where $\eta_{ik}$ denotes the sensitivity index of the i-th auxiliary variable with respect to the k-th dominant variable, $r_{ik}$ denotes the partial correlation coefficient between the i-th auxiliary variable and the k-th dominant variable, and $\mu_i$, $s_i$ and $\sigma_i$ denote the mean, standard deviation and variance of the i-th auxiliary or sensitive variable.
Further, the Mahalanobis distance of the normal samples described in step S32 is calculated as follows:
S321, select n normal samples; assuming the samples contain q initial sensitive variables, the sample space can be expressed as:
$O = (o_{li})_{n \times q}$, where $o_{li}$ is the value of the i-th auxiliary or sensitive variable in the l-th normal sample, l = 1, 2, ..., n; i = 1, 2, ..., q;
S322, standardize the normal sample data:
where $z_{li}$ is the standardized value of the i-th auxiliary or sensitive variable in the l-th normal sample, l = 1, 2, ..., n; i = 1, 2, ..., q;
S323, the Mahalanobis distance is:
where $MD_l$ is the Mahalanobis distance of the l-th normal sample. S is the correlation coefficient matrix of the normal samples, computed from the standardized normal sample data matrix and its transpose $Z^T$. $S^{-1}$ is the inverse of this matrix, and q is the number of initial sensitive variables.
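The Mahalanobis distance formula of step S323 is not reproduced in this text. The sketch below assumes the conventional Mahalanobis-Taguchi form $MD_l = \frac{1}{q} z_l S^{-1} z_l^T$, with $S = Z^T Z/(n-1)$ built from the standardized normal samples. The function name is hypothetical. The same $S^{-1}$ is reused to score abnormal samples.

```python
import numpy as np

def mahalanobis_distances(z_normal, z_query=None):
    """Scaled Mahalanobis distance MD = (1/q) * z S^{-1} z^T for each row.

    z_normal : standardized normal samples, shape (n, q); defines the reference space
    z_query  : standardized samples to score (defaults to the normal samples themselves)
    """
    z_normal = np.asarray(z_normal, dtype=float)
    n, q = z_normal.shape
    S = z_normal.T @ z_normal / (n - 1)  # correlation matrix of standardized normal data
    S_inv = np.linalg.inv(S)
    z = z_normal if z_query is None else np.asarray(z_query, dtype=float)
    # Row-wise quadratic form z_l S^{-1} z_l^T, scaled by the number of variables q
    return np.einsum("ij,jk,ik->i", z, S_inv, z) / q

# Synthetic example: 50 normal samples, 4 sensitive variables
rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 4))
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
md = mahalanobis_distances(z)
print(md.mean())  # equals (n - 1) / n = 0.98 up to rounding
```

With this scaling, the distances of the normal samples sum to $\frac{1}{q}\operatorname{tr}(S^{-1}(n-1)S) = n-1$. Their average is therefore $(n-1)/n$, which is why a reference space built this way places normal samples near 1.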
Further, the cosine similarity in step S33 is:
where $z_{li}$ is the standardized value of the i-th auxiliary or sensitive variable in the l-th normal sample; the formula also uses the mean of the i-th auxiliary or sensitive variable data.
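The cosine similarity formula of step S33 is also missing from the text. As an assumption, the sketch computes $CS_l$ as the cosine of the angle between each standardized sample vector and a reference vector supplied by the caller. The column means of standardized data are zero, so the mean vector the original formula uses cannot be recovered here. The `ref` argument is therefore a hypothetical stand-in, and `cosine_similarity` is a hypothetical name.

```python
import numpy as np

def cosine_similarity(z, ref):
    """Cosine of the angle between each row of z and the reference vector ref."""
    z = np.asarray(z, dtype=float)
    ref = np.asarray(ref, dtype=float)
    num = z @ ref                                          # dot product per sample
    den = np.linalg.norm(z, axis=1) * np.linalg.norm(ref)  # product of vector lengths
    return num / den

cs = cosine_similarity([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 0.0])
# cs -> [1.0, 0.0, 0.7071...]
```

Unlike the Mahalanobis distance, this measure ignores vector length and captures only direction, which is the "direction" half of the distance-plus-direction screening in step S3.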
Further, the weights of the cosine Mahalanobis distance in step S34 are calculated as follows:
$\alpha = \xi_1/(\xi_1 + \xi_2)$, $\beta = \xi_2/(\xi_1 + \xi_2)$, with $\xi_1 = s_{MD}/\mu_{MD}$ and $\xi_2 = s_{CS}/\mu_{CS}$. Here $\xi_1$ is the coefficient of variation of the Mahalanobis distances of the normal samples, whose standard deviation and mean are $s_{MD}$ and $\mu_{MD}$. Likewise, $\xi_2$ is the coefficient of variation of the cosine similarities of the normal samples, whose standard deviation and mean are $s_{CS}$ and $\mu_{CS}$;
the cosine Mahalanobis distance described in step S35 is calculated as follows:
$CMD_l = \alpha MD_l + \beta CS_l$
where $MD_l$ is the Mahalanobis distance of the l-th normal sample, describing similarity in distance, and $CS_l$ is the cosine similarity of the l-th normal sample, describing similarity in direction.
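Step S34 can be read as "each weight is that measure's coefficient of variation divided by the sum of both". Under that reading, the weights and the cosine Mahalanobis distance of step S35 can be sketched as follows. The function names are hypothetical. The coefficient of variation is taken as the sample standard deviation over the mean, which is only meaningful when that mean is positive.

```python
import numpy as np

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

def cmd_weights(md, cs):
    """alpha, beta = each coefficient of variation over their sum (assumed reading of S34)."""
    xi1 = coefficient_of_variation(md)  # variability of the Mahalanobis distances
    xi2 = coefficient_of_variation(cs)  # variability of the cosine similarities
    return xi1 / (xi1 + xi2), xi2 / (xi1 + xi2)

def cosine_mahalanobis(md, cs, alpha, beta):
    """CMD_l = alpha * MD_l + beta * CS_l."""
    return alpha * np.asarray(md, dtype=float) + beta * np.asarray(cs, dtype=float)

md = np.array([1.0, 2.0, 3.0])
cs = np.array([0.9, 1.0, 1.1])
alpha, beta = cmd_weights(md, cs)  # alpha = 0.5 / 0.6 ≈ 0.833, beta ≈ 0.167
cmd = cosine_mahalanobis(md, cs, alpha, beta)
```

Weighting by coefficient of variation gives more influence to whichever measure varies more, relative to its mean, across the normal samples. The two weights always sum to 1.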
Further, the signal-to-noise ratio calculating method in step S37 is as follows:
where $CMD_u$ is the cosine Mahalanobis distance of the u-th abnormal sample, u = 1, 2, ..., v, and v is the number of abnormal samples;
the signal-to-noise ratio increment described in step S38 is denoted $\Delta SN_j$:
where the first term is the mean signal-to-noise ratio over the reference spaces that use sensitive variable j, and the second term is the mean signal-to-noise ratio over those that do not use it.
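The formulas for steps S37 and S38 are missing here. The sketch assumes the generic Taguchi larger-the-better form $SN = -10\log_{10}\left(\frac{1}{v}\sum_{u=1}^{v} 1/CMD_u^2\right)$; some Mahalanobis-Taguchi texts use $1/CMD_u$ instead of its square. It computes $\Delta SN_j$ as the mean SN of the orthogonal-array runs that use variable j (level 1) minus the mean SN of the runs that do not (level 2). The function names, the L4 array and the SN values in the example are illustrative, not data from the embodiment.

```python
import numpy as np

def snr_larger_the_better(cmd_abnormal):
    """Taguchi larger-the-better SN ratio of the abnormal samples' CMD values (assumed form)."""
    y = np.asarray(cmd_abnormal, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / y**2))

def snr_increment(oa, snr):
    """Delta SN_j = mean SN with variable j used (level 1) - mean SN without it (level 2).

    oa  : two-level orthogonal array, shape (runs, q), entries 1 (use) or 2 (do not use)
    snr : SN ratio of each run, shape (runs,)
    """
    oa = np.asarray(oa)
    snr = np.asarray(snr, dtype=float)
    return np.array([
        snr[oa[:, j] == 1].mean() - snr[oa[:, j] == 2].mean()
        for j in range(oa.shape[1])
    ])

print(snr_larger_the_better([10.0, 10.0]))  # 20.0
l4 = np.array([[1, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1]])
gains = snr_increment(l4, [10.0, 8.0, 6.0, 4.0])  # [4.0, 2.0, 0.0]
```

A positive increment means that including the variable makes the abnormal samples stand out more from the reference space. Variables with small or negative increments are candidates for removal under the threshold of step S38.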
Further, the complex industrial process is a cracking production process, and the product quality index is the 10% distillation temperature of aviation kerosene.
Compared with the prior art, the invention has the following beneficial effects. Sensitive variables and key sensitive variables are determined in two stages. First, a sensitivity index is calculated from the sensitivity of each variable to working condition changes and the net (partial) correlation between the auxiliary variable and product quality, giving a preliminary selection of sensitive variables. Then the constructed weighted cosine Mahalanobis-Taguchi system resolves the high redundancy among these variables and selects the key sensitive variables. Traditional filter variable selection methods tend to ignore variable correlation and cannot accurately reflect working condition information; the method overcomes both problems. It is also simple to compute, not prone to overfitting and yields low redundancy.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a histogram of the signal-to-noise ratio increments of the sensitive variables of the cracking process in an embodiment of the present invention.
FIG. 3 shows the prediction results of the locally weighted partial least squares method using the key sensitive variable set in an embodiment of the present invention.
FIG. 4 shows the prediction results of the locally weighted partial least squares method using the sensitive variable set in an embodiment of the present invention.
FIG. 5 shows the prediction results of the locally weighted partial least squares method using the auxiliary variable set screened by mechanism analysis in an embodiment of the present invention.
FIG. 6 shows the relative errors of the predictions of the locally weighted partial least squares method using the key sensitive variable set in an embodiment of the present invention.
FIG. 7 shows the relative errors of the predictions of the locally weighted partial least squares method using the sensitive variable set in an embodiment of the present invention.
FIG. 8 shows the relative errors of the predictions of the locally weighted partial least squares method using the auxiliary variable set screened by mechanism analysis in an embodiment of the present invention.
Detailed Description
In order to further disclose the present invention, the technical solutions disclosed in the present invention will be fully and specifically described below with reference to the accompanying drawings of the specification:
As shown in FIG. 1, the method provided by the invention for screening sensitive variables influencing product quality in a complex industrial process based on stepwise reduction comprises the following steps:
S1, based on the production process, preliminarily select a number of auxiliary variables that may influence product quality through mechanism analysis and expert knowledge, and collect multiple groups of auxiliary variable values together with the product quality values at the corresponding times as samples;
S2, taking into account both the correlation between the auxiliary variables and product quality and the sensitivity of the auxiliary variables to working condition changes, calculate an auxiliary variable sensitivity index, and preliminarily screen the sensitive variables influencing product quality according to this index;
S3, construct a weighted cosine Mahalanobis-Taguchi system, perform attribute reduction on the preliminarily selected sensitive variables from the two perspectives of distance and direction, and accurately screen out the sensitive variables influencing product quality as key sensitive variables.
As an improvement, the foregoing technical solution may further include step S4: establish a product quality prediction model by the locally weighted partial least squares (LWPLS) method, and verify the effectiveness and accuracy of the selected key sensitive variables.
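Step S4 names locally weighted partial least squares (LWPLS) without giving its equations. The sketch below follows a commonly used LWPLS formulation, which is an assumption here. Training samples are weighted by $\omega_n = \exp(-d_n/(\varphi\,\sigma_d))$, where $d_n$ is the Euclidean distance to the query and $\sigma_d$ is the standard deviation of those distances. A weighted PLS model is then fitted around each query. The function name is hypothetical. The bandwidth `phi` and the number of components are tuning parameters whose values in the embodiment are not stated.

```python
import numpy as np

def lwpls_predict(X, y, x_query, n_components, phi=1.0):
    """Predict the quality value at one query point with locally weighted PLS (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xq = np.asarray(x_query, dtype=float)

    # Similarity weights: closer training samples get larger weights (assumed kernel)
    d = np.linalg.norm(X - xq, axis=1)
    w = np.exp(-d / (phi * d.std()))

    # Weighted centring of training data and query
    x_mean = w @ X / w.sum()
    y_mean = w @ y / w.sum()
    Xr, yr, xr = X - x_mean, y - y_mean, xq - x_mean

    y_hat = y_mean
    for _ in range(n_components):
        v = Xr.T @ (w * yr)          # weighted covariance direction
        v = v / np.linalg.norm(v)
        t = Xr @ v                   # training scores
        tw = t @ (w * t)             # t^T Omega t
        p = Xr.T @ (w * t) / tw      # X loadings
        q = yr @ (w * t) / tw        # y loading (scalar)
        t_q = xr @ v                 # query score
        y_hat = y_hat + t_q * q
        # Deflate training data and query before extracting the next latent variable
        Xr = Xr - np.outer(t, p)
        yr = yr - t * q
        xr = xr - t_q * p
    return float(y_hat)

# Synthetic, noiseless linear data: y = 2 + X b
rng = np.random.default_rng(1)
X_train = rng.normal(size=(40, 3))
b = np.array([1.0, -0.5, 0.25])
y_train = 2.0 + X_train @ b
pred = lwpls_predict(X_train, y_train, np.array([0.1, 0.2, -0.3]), n_components=3)
```

With as many latent variables as input variables, the local model reduces to weighted least squares. On noiseless linear data it therefore reproduces the true value at the query (here 1.925). In practice fewer components are chosen, for example by cross-validation.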
The step S2 is specifically implemented as follows:
S21, perform outlier removal, wavelet denoising and standardization on the collected auxiliary variable samples;
S22, calculate the correlation coefficient matrix of the auxiliary variables and product quality by Pearson correlation analysis, and from this matrix calculate the partial correlation coefficient between each auxiliary variable and product quality;
S23, calculate the mean, standard deviation and variance of each auxiliary variable, and from them its coefficient of variation;
S24, take the product of the partial correlation coefficient between the auxiliary variable and product quality (step S22) and the coefficient of variation of the auxiliary variable (step S23) as the sensitivity index of that auxiliary variable;
S25, based on expert knowledge, set thresholds on the sensitivity index according to the production process and the product quality concerned, and select the auxiliary variables within the threshold range as sensitive variables.
The standardization in the foregoing steps S21 and S31 is performed as follows:
$z_{ij} = (x_{ij} - \mu_i)/s_i$
where $z_{ij}$ is the j-th standardized sample value of the i-th auxiliary or sensitive variable, $x_{ij}$ is the j-th sample value of the i-th auxiliary or sensitive variable, $\mu_i$ is the mean of the i-th auxiliary or sensitive variable, and $s_i$ is its standard deviation.
The correlation coefficient matrix described in step S22 is calculated as follows:
the partial correlation coefficient is calculated as follows:
the coefficient of variation of the auxiliary variable in step S23 is calculated as follows:
where $\mu_i$, $s_i$ and $\sigma_i$ denote the mean, standard deviation and variance of the i-th auxiliary or sensitive variable, respectively;
the auxiliary variable sensitivity index of step S24 is calculated as follows:
where $\eta_{ik}$ denotes the sensitivity index of the i-th auxiliary variable with respect to the k-th dominant variable, $r_{ik}$ denotes the partial correlation coefficient between the i-th auxiliary variable and the k-th dominant variable, and $\mu_i$, $s_i$ and $\sigma_i$ denote the mean, standard deviation and variance of the i-th auxiliary or sensitive variable.
The step S3 is specifically implemented as follows:
S31, divide the collected sensitive variable samples into normal samples and abnormal samples through mechanism analysis and expert knowledge, and standardize both sets, with the abnormal samples standardized using the mean and standard deviation of the normal samples;
S32, calculate the Mahalanobis distance of every normal sample;
S33, calculate the cosine of the included angle for every normal sample, and from it the cosine similarity of every normal sample;
S34, calculate the coefficients of variation of the Mahalanobis distance and of the cosine similarity of the normal samples, and determine the weights of the cosine Mahalanobis distance as the ratio of each coefficient of variation to their sum;
S35, construct a weighted cosine Mahalanobis reference space from the cosine Mahalanobis distances of the normal samples;
S36, design an orthogonal array in which each row corresponds to a weighted cosine Mahalanobis reference space, and calculate the cosine Mahalanobis distances of the abnormal samples in each reference space;
S37, calculate the signal-to-noise ratio of the abnormal samples in each reference space using the larger-the-better signal-to-noise ratio;
S38, calculate the mean signal-to-noise ratio with and without each sensitive variable and the resulting signal-to-noise ratio increment, set a threshold on the increment according to expert knowledge, and select all sensitive variables within the threshold range as key sensitive variables.
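Step S36 does not say which orthogonal array is used. One simple way to obtain a two-level array for q sensitive variables is assumed here. Take a Sylvester-type Hadamard matrix of order $N = 2^k \ge q+1$ and drop its all-ones column. Then map +1 to level 1 (variable used) and -1 to level 2 (variable not used). The result is a two-level orthogonal array, equivalent to L4, L8, L16 and so on up to row and column order. Taguchi's L12 is a common alternative that this sketch does not generate, and the function name is hypothetical.

```python
import numpy as np

def two_level_orthogonal_array(q):
    """Two-level orthogonal array with q columns, built from a Sylvester Hadamard matrix.

    Level 1 = variable used in the reference space, level 2 = variable not used.
    """
    n_runs = 1
    while n_runs - 1 < q:          # need at least q non-constant columns
        n_runs *= 2
    H = np.array([[1]])
    while H.shape[0] < n_runs:     # Sylvester doubling: [[H, H], [H, -H]]
        H = np.block([[H, H], [H, -H]])
    cols = H[:, 1:q + 1]           # drop the all-ones first column
    return np.where(cols == 1, 1, 2)

oa = two_level_orthogonal_array(3)
# rows: [1 1 1], [2 1 2], [1 2 2], [2 2 1]
```

The first row always uses every variable, so it corresponds to the full reference space. In every pair of columns, each combination of levels appears equally often, which lets step S38 average the SN ratio over "used" and "not used" runs without bias from the other variables.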
As an improvement of the foregoing technical solution, after step S35 and before step S36, the method further includes the following steps:
calculate the cosine Mahalanobis distances of the abnormal samples in the constructed weighted cosine Mahalanobis reference space to verify its validity. If the reference space clearly separates the cosine Mahalanobis distances of the normal samples from those of the abnormal samples, it is valid; otherwise, return to step S3 and reconstruct the weighted cosine Mahalanobis-Taguchi system.
The Mahalanobis distance of the normal samples described in step S32 is calculated as follows:
S321, select n normal samples; assuming the samples contain q initial sensitive variables, the sample space can be expressed as:
$O = (o_{li})_{n \times q}$, where $o_{li}$ is the value of the i-th auxiliary or sensitive variable in the l-th normal sample, l = 1, 2, ..., n; i = 1, 2, ..., q;
S322, standardize the normal sample data:
where $z_{li}$ is the standardized value of the i-th auxiliary or sensitive variable in the l-th normal sample, l = 1, 2, ..., n; i = 1, 2, ..., q;
S323, the Mahalanobis distance is:
where $MD_l$ is the Mahalanobis distance of the l-th normal sample. S is the correlation coefficient matrix of the normal samples, computed from the standardized normal sample data matrix and its transpose $Z^T$. $S^{-1}$ is the inverse of this matrix, and q is the number of initial sensitive variables.
The cosine similarity in step S33 is:
where $z_{li}$ is the standardized value of the i-th auxiliary or sensitive variable in the l-th normal sample; the formula also uses the mean of the i-th auxiliary or sensitive variable data.
The weights of the cosine Mahalanobis distance described in step S34 are calculated as follows:
$\alpha = \xi_1/(\xi_1 + \xi_2)$, $\beta = \xi_2/(\xi_1 + \xi_2)$, with $\xi_1 = s_{MD}/\mu_{MD}$ and $\xi_2 = s_{CS}/\mu_{CS}$. Here $\xi_1$ is the coefficient of variation of the Mahalanobis distances of the normal samples, whose standard deviation and mean are $s_{MD}$ and $\mu_{MD}$. Likewise, $\xi_2$ is the coefficient of variation of the cosine similarities of the normal samples, whose standard deviation and mean are $s_{CS}$ and $\mu_{CS}$;
the cosine Mahalanobis distance described in step S35 is calculated as follows:
$CMD_l = \alpha MD_l + \beta CS_l$
where $MD_l$ is the Mahalanobis distance of the l-th normal sample, describing similarity in distance, and $CS_l$ is the cosine similarity of the l-th normal sample, describing similarity in direction.
The signal-to-noise ratio calculation method described in step S37 is as follows:
where $CMD_u$ is the cosine Mahalanobis distance of the u-th abnormal sample, u = 1, 2, ..., v, and v is the number of abnormal samples;
the signal-to-noise ratio increment described in step S38 is denoted $\Delta SN_j$:
where the first term is the mean signal-to-noise ratio over the reference spaces that use sensitive variable j, and the second term is the mean signal-to-noise ratio over those that do not use it.
The specific embodiment is as follows:
The technical solution disclosed by the invention is applied to product quality prediction in the hydrocracking process as follows:
a. Based on the cracking production process, 39 variables that influence the 10% distillation temperature of aviation kerosene are preliminarily selected through mechanism analysis and expert experience. They serve as the input variables for quality prediction in the hydrocracking process and are denoted $x_1, x_2, \dots, x_{39}$. Data of these 39 variables from 966 days of continuous production are extracted. The 10% distillation temperature of aviation kerosene, measured by offline laboratory analysis at 8:00 and 20:00 on each of the 966 days, is extracted as well, giving 1932 groups in total. The data are divided into two parts for LWPLS-based prediction modeling of the aviation kerosene 10% distillation temperature: 1288 groups form the training set and 644 groups the test set. 180 groups from the training set are used as input data for sensitive variable selection, giving the input matrix:
$$x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,39}]^T, \quad i = 1, 2, \ldots, 180$$
$$X = [x_1, x_2, \ldots, x_{180}]$$
b. The auxiliary variable sensitivity index is calculated from two factors: the correlation of each variable with the quality index and the variable's sensitivity to changes in operating conditions. Sensitive variables influencing the 10% distillation temperature of aviation kerosene are then preliminarily screened according to this index.
The sampled values of the selected auxiliary variables are normalized:
$$z_{ij} = \frac{x_{ij} - \mu_i}{s_i}$$
where $z_{ij}$ is the normalized value, $x_{ij}$ is the j-th sample value of the i-th variable, $\mu_i$ is the mean of the i-th variable, and $s_i$ is the standard deviation of the i-th variable.
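A minimal sketch of this normalization, assuming the layout of X above: rows are variables and columns are samples. The sample standard deviation (ddof=1) is an assumption, since the text does not say which estimator is used. The function name is hypothetical.

```python
import numpy as np


def normalize_rows(X):
    """z_ij = (x_ij - mu_i) / s_i for a variables-by-samples matrix X.

    Rows are variables (i), columns are samples (j), matching X = [x_1, ..., x_180].
    Using the sample standard deviation (ddof=1) is an assumption.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    s = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / s


Z = normalize_rows([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(Z)  # each row becomes [-1, 0, 1]
```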
To obtain the partial correlation coefficients of the auxiliary variables, the correlation coefficient matrix is first calculated by Pearson correlation analysis:
The partial correlation coefficient $r_{iA10\%}$ between the normalized auxiliary variable $z_i$ and the 10% distillation temperature A10% of aviation kerosene is:
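The formula images for both matrices are missing. The sketch below uses the standard identity that obtains partial correlations from the inverse of the Pearson correlation matrix: r_iy = -P_iy / sqrt(P_ii·P_yy) with P = R^-1, where each pair is controlled for all remaining auxiliary variables. Whether the patent conditions on exactly this set is an assumption. The function name is hypothetical.

```python
import numpy as np


def partial_corr_with_target(Z, y):
    """Partial correlation of each auxiliary variable (row of Z) with the quality y,
    controlling for all other auxiliary variables.

    Z: variables-by-samples matrix; y: one quality value per sample.
    Uses r_iy = -P_iy / sqrt(P_ii * P_yy), with P = R^-1 and R the Pearson
    correlation matrix of the stacked rows [Z; y].
    """
    data = np.vstack([np.asarray(Z, dtype=float), np.asarray(y, dtype=float)])
    R = np.corrcoef(data)           # rows are treated as variables
    P = np.linalg.inv(R)
    k = P.shape[0] - 1              # index of the quality variable (last row)
    return -P[:k, k] / np.sqrt(np.diag(P)[:k] * P[k, k])


# With a single auxiliary variable the partial correlation equals Pearson's r
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(partial_corr_with_target([x], y), np.corrcoef(x, y)[0, 1])
```

The same value can be obtained by regressing both the variable and the quality on the other auxiliary variables and correlating the residuals. The matrix form is simply cheaper when all 39 coefficients are needed at once.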
The coefficient of variation of each selected auxiliary variable is calculated:
where $\mu_i$ is the mean, $s_i$ the standard deviation, and $\sigma_i$ the variance of the i-th variable.
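The coefficient-of-variation formula image is missing, and its "where" clause also mentions the variance σ_i, so the patent's exact form is not recoverable. The sketch assumes the conventional definition CV_i = s_i/|μ_i| with the sample standard deviation. Both choices and the function name are assumptions.

```python
import numpy as np


def coefficient_of_variation(X):
    """CV_i = s_i / |mu_i| for each row (variable) of a variables-by-samples matrix.

    Dividing by |mu_i| and using ddof=1 are assumptions, not the patent's stated form.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1)
    s = X.std(axis=1, ddof=1)
    return s / np.abs(mu)


print(coefficient_of_variation([[2.0, 4.0, 6.0]]))  # [0.5]
```

Because CV divides by the mean, it is dimensionless and can be computed on the raw (un-normalized) samples. On z-scores the mean is zero and CV would be undefined.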
The sensitivity index of each auxiliary variable is calculated as the product of its partial correlation coefficient with the 10% distillation temperature of aviation kerosene and its coefficient of variation:
where $\eta_{iA10\%}$ is the sensitivity index of the i-th auxiliary variable to the 10% distillation temperature A10% of aviation kerosene, $r_{iA10\%}$ is the partial correlation coefficient between the i-th auxiliary variable and A10%, $\mu_i$ is the mean of the i-th variable, $s_i$ is its standard deviation, and $\sigma_i$ is its variance.
The larger the sensitivity index, the greater the influence of the auxiliary variable on the 10% distillation temperature A10% of aviation kerosene and the more sensitive the variable is to changes in operating conditions. For each of the 39 auxiliary variables, the degree of dispersion, the partial correlation coefficient with A10%, and the sensitivity index were calculated; the results are shown in Table 1.
TABLE 1 Sensitivity indices of the auxiliary variables from mechanism screening of the hydrocracking process
For the 10% distillation temperature of aviation kerosene, the threshold of the auxiliary variable sensitivity index is set to 0.01 based on expert knowledge. Analysis of the sensitivity indices shows that the following 7 variables have low indices:
- refining reactor bottom temperature indicator (12)
- refining reactor pressure difference (13)
- water injection amount of the water injection tank (21)
- top reflux amount of the hydrogen sulfide stripping tower (24)
- middle-section return temperature of the main fractionator (32)
- diesel stripping tower top temperature (38)
- diesel stripping tower bottom temperature (39)

Excluding these 7 variables, the remaining 32 auxiliary variables are taken as sensitive variables.
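The index and the 0.01 screening rule can be sketched as follows. The sketch assumes the index is compared by absolute value, because partial correlations can be negative and the text does not say how signs are handled. The function names and example inputs are hypothetical, not values from Table 1.

```python
import numpy as np


def sensitivity_index(partial_r, cv):
    """eta_i = r_i * CV_i: partial correlation times coefficient of variation."""
    return np.asarray(partial_r, dtype=float) * np.asarray(cv, dtype=float)


def screen_sensitive(eta, threshold=0.01):
    """1-based numbers of variables whose |eta_i| reaches the threshold.

    Comparing |eta_i| rather than eta_i is an assumption.
    """
    eta = np.asarray(eta, dtype=float)
    return [int(i) + 1 for i in np.flatnonzero(np.abs(eta) >= threshold)]


# Hypothetical partial correlations and coefficients of variation
eta = sensitivity_index([0.5, -0.2, 0.01], [0.1, 0.3, 0.5])
print(eta)                    # [ 0.05  -0.06   0.005]
print(screen_sensitive(eta))  # [1, 2]
```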
c. A weighted cosine Mahalanobis-Taguchi system is constructed, and attribute reduction is performed on the preliminarily selected sensitive variables in terms of both distance and direction. This selects the key sensitive variables influencing the 10% distillation temperature A10% of aviation kerosene. The specific steps are:
(1) Suppose 32 normal samples are selected, each containing the 32 initial sensitive variables; the sample space can then be expressed as:
$$O = (o_{ij})_{32 \times 32}$$
where $o_{ij}$ (i = 1, 2, …, 32; j = 1, 2, …, 32) is the value of the j-th sensitive variable of the i-th normal sample.
The normal sample data are normalized:
The cosine Mahalanobis distances of all normal samples are calculated:
$$CMD_i = \alpha MD_i + \beta CS_i$$
where $MD_i$ is the Mahalanobis distance of the sample, describing the similarity of sample distances; $CS_i$ is the cosine similarity of the sample, describing the similarity of sample directions; and α and β are weight coefficients.
The Mahalanobis distance $MD_i$ of each sample is calculated:
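The MD formula image is missing. The claims (step S323) state that it uses the inverse of the normal samples' correlation coefficient matrix and the number q of initial sensitive variables, which matches the scaled distance of the Mahalanobis-Taguchi system, MD = z·S⁻¹·zᵀ/q. The sketch below assumes that form, standardizes every scored sample with the normal samples' mean and standard deviation (step S31), and uses ddof=1. The function name and demo data are hypothetical.

```python
import numpy as np


def mts_mahalanobis(normal, samples=None):
    """Scaled Mahalanobis distance MD = z S^-1 z^T / q (Mahalanobis-Taguchi system).

    normal:  n-by-q matrix of normal samples (rows = samples, columns = variables).
    samples: rows to score; defaults to the normal samples. All rows are standardized
             with the normal samples' mean and standard deviation (ddof=1 assumed).
    """
    normal = np.asarray(normal, dtype=float)
    mu = normal.mean(axis=0)
    sd = normal.std(axis=0, ddof=1)
    # Inverse correlation matrix of the normal samples (the reference space)
    S_inv = np.linalg.inv(np.corrcoef(normal, rowvar=False))
    target = normal if samples is None else np.asarray(samples, dtype=float)
    Z = (target - mu) / sd
    q = normal.shape[1]
    # Row-wise quadratic form z_l S^-1 z_l^T, scaled by the number of variables
    return np.einsum("ij,jk,ik->i", Z, S_inv, Z) / q


rng = np.random.default_rng(0)
normal = rng.normal(size=(40, 4))
md = mts_mahalanobis(normal)
print(md.mean())  # about (n - 1) / n = 0.975 with ddof=1 standardization
print(mts_mahalanobis(normal, np.full((1, 4), 10.0)))  # far outside the reference space
```

The scaling by q is what makes normal samples average close to 1. This is the same behavior the text reports for the cosine Mahalanobis distances of the normal samples in Table 2.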
The cosine similarity of each sample is calculated:
where the two terms are the j-th auxiliary variable value of the i-th sample and the mean of the j-th auxiliary variable data, respectively.
The weights of the cosine Mahalanobis distance are determined from the degree of variation of the Mahalanobis distances and of the cosine similarities of the normal samples, as follows:
$$\xi_1 = \frac{s_{MD_i}}{\mu_{MD_i}}, \quad \xi_2 = \frac{s_{CS_i}}{\mu_{CS_i}}, \quad \alpha = \frac{\xi_1}{\xi_1 + \xi_2}, \quad \beta = \frac{\xi_2}{\xi_1 + \xi_2}$$
where $\xi_1$ is the coefficient of variation of the Mahalanobis distances of the normal samples, with standard deviation $s_{MD_i}$ and mean $\mu_{MD_i}$; $\xi_2$ is the coefficient of variation of the cosine similarities of the normal samples, with standard deviation $s_{CS_i}$ and mean $\mu_{CS_i}$. Part of the resulting weighted cosine Mahalanobis reference space is shown in Table 2.
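A minimal sketch of the weighting, assuming the ratio-to-total reading of step S34: each weight is that measure's coefficient of variation divided by the sum of both. The sample standard deviation (ddof=1), the function names, and the example values are assumptions.

```python
import numpy as np


def cmd_weights(md, cs):
    """alpha, beta from the coefficients of variation of MD and CS over normal samples."""
    md = np.asarray(md, dtype=float)
    cs = np.asarray(cs, dtype=float)
    xi1 = md.std(ddof=1) / md.mean()  # CV of the Mahalanobis distances
    xi2 = cs.std(ddof=1) / cs.mean()  # CV of the cosine similarities
    total = xi1 + xi2
    return xi1 / total, xi2 / total


def cosine_mahalanobis(md, cs, alpha, beta):
    """CMD = alpha * MD + beta * CS, element-wise."""
    return alpha * np.asarray(md, dtype=float) + beta * np.asarray(cs, dtype=float)


# Hypothetical MD and CS values for three normal samples
md, cs = [1.0, 2.0, 3.0], [1.0, 3.0, 5.0]
alpha, beta = cmd_weights(md, cs)
print(alpha, beta)                              # 3/7 and 4/7
print(cosine_mahalanobis(md, cs, alpha, beta))  # [1, 18/7, 29/7]
```

The measure that varies more across normal samples receives the larger weight, and the two weights always sum to 1.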
TABLE 2 Weighted cosine Mahalanobis reference space
Table 2 shows that the cosine Mahalanobis distances of the normal samples fluctuate around 1, with a mean of 0.9020.
(2) The abnormal samples are normalized. For each abnormal sample, the Mahalanobis distance, the cosine similarity with respect to the mean of the normal samples, and the cosine Mahalanobis distance are then calculated; the results are shown in Table 3.
TABLE 3 Cosine Mahalanobis distances of the abnormal samples
As shown in Table 3, the cosine Mahalanobis distances of the abnormal samples are all much greater than 1, with a mean of 205.5255, so the constructed weighted cosine Mahalanobis reference space separates normal samples from abnormal samples well. Abnormal sample 3 is a deliberately selected outlier whose Mahalanobis distance is 1.6571. A traditional Mahalanobis-Taguchi system that judges by the Mahalanobis distance alone would classify sample 3 as normal, which contradicts the actual situation. The cosine similarity of sample 3 is 5.5180 and its cosine Mahalanobis distance is 2.2362, so the weighted cosine Mahalanobis-Taguchi system classifies it as abnormal. Compared with the traditional Mahalanobis-Taguchi system, the weighted cosine Mahalanobis-Taguchi system therefore distinguishes normal samples from abnormal samples better.
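As a consistency check on the values reported for sample 3, we can assume the weights sum to 1, as the ratio-to-total weighting of step S34 implies. Under that assumption, CMD = α·MD + (1 − α)·CS can be solved for α. The result is an inference from the reported numbers, not a value stated in the text, and the function name is hypothetical.

```python
def implied_alpha(md, cs, cmd):
    """Solve CMD = alpha*MD + (1 - alpha)*CS for alpha (assumes alpha + beta = 1)."""
    return (cs - cmd) / (cs - md)


# Reported values for abnormal sample 3
alpha = implied_alpha(md=1.6571, cs=5.5180, cmd=2.2362)
print(round(alpha, 4), round(1 - alpha, 4))  # 0.85 0.15
```

So in this reference space the Mahalanobis term would carry roughly 85% of the weight. The directional term still lifts sample 3 from 1.6571 to 2.2362.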
(3) Optimizing the reference space: the orthogonal table shown in Table 4 is designed. In this table, level 1 indicates that an auxiliary variable is used and level 2 that it is not used, and the signal-to-noise ratios are calculated. Each row of the orthogonal table corresponds to a reference space. The cosine Mahalanobis distances and the signal-to-noise ratio of the abnormal samples in each reference space are calculated as follows:
where $CMD_p$ is the cosine Mahalanobis distance of the p-th abnormal sample. For each auxiliary variable, the signal-to-noise ratio increment $\Delta SN_j$ is the mean signal-to-noise ratio when the sensitive variable is used minus the mean signal-to-noise ratio when it is not used.
For the 10% distillation temperature of aviation kerosene in the hydrocracking process, the signal-to-noise ratio increment threshold is set to 0.3 based on expert knowledge. All sensitive variables within the threshold range are selected as key sensitive variables; numbers in parentheses are the variable numbers in the mechanism-screening auxiliary variable sensitivity index table (Table 1).
TABLE 4 Two-level orthogonal table and signal-to-noise ratios
The histogram of the signal-to-noise ratio increments of the 32 sensitive variables is shown in Fig. 2. The increments of variables 21 (original mechanism-screening auxiliary variable 25), 28 (original variable 33), and 32 (original variable 37) are negative, indicating that these auxiliary variables are ineffective for modeling. The increment of variable 26 (original variable 30) is small, indicating that this auxiliary variable contributes little to modeling and can be ignored. This finally yields 28 key sensitive variables for predictive modeling.
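The two numbering schemes in this step can be cross-checked. The check assumes that the 32 sensitive variables are renumbered 1 to 32 in their original order after the seven low-index variables of step b are removed. This renumbering rule is inferred, not stated, but it reproduces every parenthesized original number above.

```python
# Original mechanism-screened auxiliary variables are numbered 1..39 (Table 1).
LOW_SENSITIVITY = {12, 13, 21, 24, 32, 38, 39}  # removed in step b
sensitive = [v for v in range(1, 40) if v not in LOW_SENSITIVITY]

# Assumed: step c numbers the sensitive variables 1..32 in original order.
DROPPED = [21, 26, 28, 32]  # negative or negligible S/N increment (Fig. 2)
dropped_original = [sensitive[k - 1] for k in DROPPED]
key = [v for k, v in enumerate(sensitive, start=1) if k not in DROPPED]

print(len(sensitive), dropped_original, len(key))  # 32 [25, 30, 33, 37] 28
```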
d. A prediction model is built using locally weighted partial least squares (LWPLS). The modeling data comprise 1610 groups, of which 966 groups form the training set and 644 groups the test set. Separate models are built with the mechanism-screened auxiliary variable set, the sensitive variable set, and the key sensitive variable set, using identical model parameters. The prediction results are shown in Figs. 3-5, the prediction errors in Figs. 6-8, and the root mean square errors (RMSE) in Table 5. As Figs. 3 and 6 show, predictive modeling with the key sensitive variables tracks the actual 10% distillation temperature of aviation kerosene more closely than the other two auxiliary variable sets do, with smaller prediction error. The RMSE is 3.0390, an improvement of 5.81% and 3.94%, respectively, over the other two variable sets, which verifies the effectiveness of the proposed method.
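The text does not spell out its LWPLS algorithm or parameters. The sketch below follows a formulation commonly used for soft sensors. Training samples are weighted by similarity to the query, ω_n = exp(−d_n/(σ_d·φ)), where d_n is the Euclidean distance and σ_d its standard deviation. A weighted PLS is then fitted around the query. The localization parameter `phi`, the number of latent variables, the function name, and the demo data are all hypothetical.

```python
import numpy as np


def lwpls_predict(X, y, x_query, n_components, phi=1.0):
    """Locally weighted PLS prediction of y for one query sample.

    X: n-by-m training inputs (rows = samples); y: n training outputs.
    phi: localization parameter (smaller = more local); n_components: latent variables.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    xq = np.asarray(x_query, dtype=float).ravel()

    # Similarity weights from the Euclidean distance to the query
    d = np.linalg.norm(X - xq, axis=1)
    sd = d.std()
    omega = np.exp(-d / (sd * phi)) if sd > 0 else np.ones_like(d)

    # Weighted centering of inputs, output and query
    x_mean = omega @ X / omega.sum()
    y_mean = omega @ y / omega.sum()
    Xr, yr, xr = X - x_mean, y - y_mean, xq - x_mean
    y_hat = y_mean

    for _ in range(n_components):
        v = Xr.T @ (omega * yr)
        norm = np.linalg.norm(v)
        if norm < 1e-12:  # output already fully explained
            break
        w = v / norm                    # weight vector
        t = Xr @ w                      # training scores
        tt = t @ (omega * t)
        p = Xr.T @ (omega * t) / tt     # input loadings
        q = yr @ (omega * t) / tt       # output loading
        tq = xr @ w                     # query score
        y_hat += tq * q
        Xr = Xr - np.outer(t, p)        # deflate inputs, output and query
        yr = yr - t * q
        xr = xr - tq * p
    return y_hat


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
b = np.array([1.0, 2.0, -1.0])
y = X @ b + 0.5
xq = np.array([0.3, -0.2, 1.1])
print(lwpls_predict(X, y, xq, n_components=3), xq @ b + 0.5)  # both about -0.7
```

A new local model is fitted for every query, so the cost grows with the test-set size. The benefit is that the model can follow operating conditions that drift over the 966 days of data.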
TABLE 5 Root mean square error (RMSE) of predictive modeling with the 3 variable sets
In addition, the effectiveness of the proposed method is verified with 3 other methods: partial least squares (PLS), support vector machines (SVM), and locally weighted kernel principal component regression (LWKPCR). The root mean square errors of the three methods are shown in Table 6.
TABLE 6 Root mean square error (RMSE) of different prediction methods on the 3 variable sets
Finally, the above examples are merely for illustrative clarity and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (2)
1. A method for screening sensitive variables influencing product quality in a complex industrial process based on stepwise reduction, characterized by comprising the following steps:
S1, based on the production process, preliminarily selecting a plurality of auxiliary variables influencing product quality through mechanism analysis and expert knowledge, and collecting multiple groups of auxiliary variable values and the product quality values at the corresponding moments as samples;
S2, considering both the correlation between the auxiliary variables and product quality and the sensitivity of the auxiliary variables to changes in operating conditions, calculating an auxiliary variable sensitivity index, and preliminarily screening the sensitive variables influencing product quality according to the sensitivity index:
S21, performing outlier rejection, wavelet denoising, and standardization on the collected auxiliary variable samples, wherein the standardization is performed as follows:
$$z_{ij} = \frac{x_{ij} - \mu_i}{s_i}$$
wherein $z_{ij}$ is the normalized j-th sample value of the i-th auxiliary or sensitive variable, $x_{ij}$ is the j-th sample value of the i-th auxiliary or sensitive variable, $\mu_i$ is the mean of the i-th auxiliary or sensitive variable, and $s_i$ is the standard deviation of the i-th auxiliary or sensitive variable;
S22, calculating a correlation coefficient matrix of the auxiliary variables and product quality by Pearson correlation analysis, and calculating the partial correlation coefficients between the auxiliary variables and product quality from the correlation coefficient matrix, wherein the correlation coefficient matrix is calculated as follows:
the partial correlation coefficient is calculated as follows:
S23, calculating the mean, standard deviation, and variance of each auxiliary variable, and from these its coefficient of variation, wherein the coefficient of variation of an auxiliary variable is calculated as follows:
wherein $\mu_i$ is the mean, $s_i$ the standard deviation, and $\sigma_i$ the variance of the i-th auxiliary or sensitive variable;
S24, taking the product of the partial correlation coefficient between an auxiliary variable and product quality and the coefficient of variation of that auxiliary variable as its sensitivity index, and calculating the auxiliary variable sensitivity index from the partial correlation coefficients and coefficients of variation obtained in steps S22 and S23, wherein the sensitivity index is calculated as follows:
wherein $\eta_{ik}$ is the sensitivity index of the i-th auxiliary variable to the k-th main variable, $r_{ik}$ is the partial correlation coefficient between the i-th auxiliary variable and the k-th main variable, and $\mu_i$ is the mean of the i-th auxiliary or sensitive variable;
S25, setting thresholds for the sensitivity index based on expert knowledge, differing according to the production process object and product quality, and selecting the auxiliary variables within the threshold range as sensitive variables;
S3, constructing a weighted cosine Mahalanobis-Taguchi system, performing attribute reduction on the preliminarily selected sensitive variables in terms of both distance and direction, and accurately screening out the sensitive variables influencing product quality as key sensitive variables:
S31, dividing the collected sensitive variable samples into normal samples and abnormal samples through mechanism analysis and expert knowledge, and standardizing both types of samples, wherein the abnormal sample data are standardized with the same mean and standard deviation as the normal sample data;
S32, calculating the Mahalanobis distance of each normal sample;
S321, selecting n normal samples and, assuming there are q initial sensitive variables in the samples, expressing the sample space as:
$$O = (o_{li})_{n \times q}$$
wherein $o_{li}$ is the value of the i-th auxiliary or sensitive variable of the l-th normal sample, l = 1, 2, …, n; i = 1, 2, …, q;
S322, standardizing the normal sample data:
wherein the result is the normalized value of the i-th auxiliary or sensitive variable of the l-th normal sample, l = 1, 2, …, n; i = 1, 2, …, q;
S323, calculating the Mahalanobis distance as follows:
wherein $MD_l$ is the Mahalanobis distance of the l-th normal sample; S is the correlation coefficient matrix of the normal samples, computed from the standardized normal sample data matrix and its transpose; $S^{-1}$ is the inverse of the normal sample correlation coefficient matrix; and q is the number of initial sensitive variables;
S33, calculating the cosine of the included angle for each normal sample, and from it the cosine similarity of each normal sample, wherein the cosine similarity is calculated as follows:
wherein the two terms are the normalized data of the i-th auxiliary or sensitive variable of the l-th normal sample and the mean of the i-th auxiliary or sensitive variable data, respectively;
S34, calculating the coefficients of variation of the Mahalanobis distances and of the cosine similarities of the normal samples, and determining each weight of the cosine Mahalanobis distance as the ratio of the corresponding coefficient of variation to their total, wherein the weights are calculated as follows:
$$\xi_1 = \frac{s_{MD_l}}{\mu_{MD_l}}, \quad \xi_2 = \frac{s_{CS_l}}{\mu_{CS_l}}, \quad \alpha = \frac{\xi_1}{\xi_1 + \xi_2}, \quad \beta = \frac{\xi_2}{\xi_1 + \xi_2}$$
wherein $\xi_1$ is the coefficient of variation of the Mahalanobis distances of the normal samples, with standard deviation $s_{MD_l}$ and mean $\mu_{MD_l}$; $\xi_2$ is the coefficient of variation of the cosine similarities of the normal samples, with standard deviation $s_{CS_l}$ and mean $\mu_{CS_l}$;
S35, constructing a weighted cosine Mahalanobis reference space based on the cosine Mahalanobis distances of the normal samples, wherein the cosine Mahalanobis distance is calculated as follows:
$$CMD_l = \alpha MD_l + \beta CS_l$$
wherein $MD_l$ is the Mahalanobis distance of the l-th normal sample, describing the similarity of sample distances, and $CS_l$ is the cosine similarity of the l-th normal sample, describing the similarity of sample directions;
S36, designing an orthogonal table in which each row corresponds to a weighted cosine Mahalanobis reference space, and calculating the cosine Mahalanobis distances of the abnormal samples in each reference space; and verifying the effectiveness of the constructed weighted cosine Mahalanobis reference space from the cosine Mahalanobis distances of the abnormal samples: if the reference space separates the cosine Mahalanobis distances of the normal samples from those of the abnormal samples, the constructed weighted cosine Mahalanobis reference space is effective; otherwise, returning to step S3 and reconstructing the weighted cosine Mahalanobis-Taguchi system;
S37, calculating the signal-to-noise ratio of the abnormal samples in each reference space using the larger-the-better signal-to-noise ratio, wherein the signal-to-noise ratio is calculated as follows:
wherein $CMD_u$ is the cosine Mahalanobis distance of the u-th abnormal sample and v is the number of abnormal samples;
S38, calculating the mean signal-to-noise ratio with and without each sensitive variable, then calculating the signal-to-noise ratio increment of that sensitive variable, setting a threshold for the increment according to expert knowledge, and selecting all sensitive variables within the threshold range as key sensitive variables, wherein the signal-to-noise ratio increment, denoted $\Delta SN_i$, is:
$$\Delta SN_i = \overline{SN}_i^{+} - \overline{SN}_i^{-}$$
wherein $\overline{SN}_i^{+}$ is the mean signal-to-noise ratio when the sensitive variable is used and $\overline{SN}_i^{-}$ is the mean signal-to-noise ratio when it is not used;
finally, establishing a product quality prediction model by locally weighted partial least squares, and verifying the effectiveness and accuracy of the selected key sensitive variables.
2. The method for screening sensitive variables influencing product quality in a complex industrial process based on stepwise reduction according to claim 1, characterized in that: the complex industrial process is a cracking production process, and the product quality is the 10% distillation temperature of aviation kerosene.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910529924 | 2019-06-19 | ||
CN2019105299243 | 2019-06-19 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110490496A | 2019-11-22 |
CN110490496B | 2022-03-11 |
Family
ID=68558550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910880769.XA Active CN110490496B (en) | 2019-06-19 | 2019-09-18 | Method for screening sensitive variables influencing product quality in complex industrial process based on stepwise reduction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110490496B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||