CN111639304B - CSTR fault positioning method based on Xgboost regression model - Google Patents

Info

Publication number
CN111639304B
CN111639304B CN202010491108.0A CN202010491108A CN111639304B CN 111639304 B CN111639304 B CN 111639304B CN 202010491108 A CN202010491108 A CN 202010491108A CN 111639304 B CN111639304 B CN 111639304B
Authority
CN
China
Prior art keywords
fault
variable
model
cstr
splitting
Legal status
Active
Application number
CN202010491108.0A
Other languages
Chinese (zh)
Other versions
CN111639304A
Inventor
赵忠盖
潘磊
李庆华
刘成林
刘飞
Current Assignee
Jiangnan University
Original Assignee
Jiangnan University
Priority date
2020-06-02
Filing date
2020-06-02
Application filed by Jiangnan University
Priority to CN202010491108.0A
Publication of CN111639304A
Application granted
Publication of CN111639304B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a CSTR fault positioning method based on an Xgboost regression model, comprising the following steps: 1) collecting normal data generated by sensors in the CSTR, together with unknown offline data; 2) establishing a monitoring model from the normal data collected in step 1, where different monitoring models can be chosen freely to suit the application; 3) feeding the unknown offline data collected in step 1 into the monitoring model established in step 2, extracting sample statistics to detect faults, and screening out the fault data. The beneficial effects of the invention are: 1) the variable importance measure of the Xgboost regression model quantifies each variable's influence on output prediction accuracy, and the measure of each variable is computed independently of the others; unlike prior-art contribution measures, it therefore contains no component caused by the action of other variables, which eliminates the influence of the smearing (tailing) effect.

Description

CSTR fault positioning method based on Xgboost regression model
Technical Field
The invention relates to the field of CSTR, in particular to a CSTR fault positioning method based on an Xgboost regression model.
Background
The continuous stirred tank reactor (CSTR) is a very important reaction device in chemical production and is very widely used. In the production of the three major synthetic materials (chemical fibers, plastics and synthetic rubber), CSTRs account for more than 90% of synthesis reactors, and they are also widely used in pharmaceuticals, pesticides, fuels and other fields. Given how widely CSTRs are used in actual production, ensuring their stable and safe operation is very valuable.
As modern chemical production keeps growing in scale and complexity, faults that cannot be accurately identified and promptly recovered from often cause heavy losses. Because industrial processes continuously generate large amounts of data reflecting the process mechanisms, monitoring industrial processes with data-driven multivariate statistical monitoring models has become increasingly popular.
The traditional technology has the following technical problems:
At present, many techniques based on multivariate statistical analysis are applied to fault detection in actual industrial processes, but fault location, an important step that must follow fault detection, remains a technical difficulty. Common fault location methods based on multivariate statistical analysis mainly include the contribution plot method, the reconstruction method and the reconstruction-based contribution (RBC) method, but these methods are susceptible to the smearing effect, so misdiagnosis may occur in practical applications. Moreover, traditional fault location methods differ greatly among systems with different characteristics, such as linear, nonlinear and non-Gaussian systems, and few published works propose a unified method for locating the fault source.
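To make the smearing effect concrete, the sketch below computes reconstruction-based contributions to the SPE statistic using the standard definition $\mathrm{RBC}_i = (\xi_i^{\top}\tilde{C}x)^2 / (\xi_i^{\top}\tilde{C}\xi_i)$, where $\xi_i$ is the i-th unit vector and $\tilde{C}$ is the PCA residual projector. It is a minimal numpy sketch under stated assumptions, not the baseline implementation used in the patent's experiments: the synthetic five-sensor data, the choice of two principal components and the helper names `fit_pca_residual_projector` and `rbc_spe` are all hypothetical.

```python
import numpy as np


def fit_pca_residual_projector(X_normal, n_pc):
    """Fit PCA on normal data; return (mean, std, C_tilde), where
    C_tilde = I - P P^T projects onto the residual subspace."""
    mu = X_normal.mean(axis=0)
    sd = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mu) / sd                     # standardized data
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    P = eigvecs[:, order[:n_pc]]                 # principal loading matrix
    return mu, sd, np.eye(Z.shape[1]) - P @ P.T


def rbc_spe(x_std, C_tilde):
    """RBC of each variable to SPE for one standardized sample.
    Element i of C_tilde @ x is xi_i^T C~ x (C~ is symmetric), and
    diag(C_tilde)[i] is xi_i^T C~ xi_i."""
    return (C_tilde @ x_std) ** 2 / np.diag(C_tilde)


# Hypothetical normal data: 5 sensors driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 5))
X_normal = rng.normal(size=(500, 2)) @ A + 0.3 * rng.normal(size=(500, 5))
mu, sd, C_tilde = fit_pca_residual_projector(X_normal, n_pc=2)

# A sample at the normal operating point with a 6-sigma bias on sensor 0 only.
x_fault = mu.copy()
x_fault[0] += 6 * sd[0]
rbc = rbc_spe((x_fault - mu) / sd, C_tilde)
print(np.round(rbc, 3))
```

Only sensor 0 is faulty, yet every other sensor receives a non-zero contribution, because the residual projector couples the variables; that spill-over is what the text calls the smearing effect.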
Disclosure of Invention
The invention provides a CSTR fault positioning method based on an Xgboost regression model. A multivariate statistical monitoring model is first established from normal data collected in the CSTR; the monitoring model then screens out the fault data segment in the offline-acquired data; an Xgboost regression model is built with the fault data segment as input and the corresponding statistic as output; and the variable importance measure is taken as each variable's contribution rate to the statistic. The larger the measure, the more likely the variable is a fault variable, and the variable with the largest measure is identified as the fault variable. Unlike traditional fault location methods such as the reconstruction-based contribution method and the partial-derivative method, the Xgboost regression model used here can locate faults in both nonlinear and linear processes, requires little computation, suffers little from the smearing effect, and performs better at locating minor faults and random faults in the CSTR.
In order to solve the technical problem, the invention provides a CSTR fault positioning method based on an Xgboost regression model, which comprises the following steps:
1) Collecting normal data generated by sensors in the CSTR, together with unknown offline data;
2) Establishing a monitoring model from the normal data collected in step 1, where different monitoring models can be chosen freely to suit the application;
3) Feeding the unknown offline data collected in step 1 into the monitoring model established in step 2, extracting sample statistics to detect faults, and screening out the fault data;
4) Taking the fault data from step 3 as the input of the training samples and the corresponding statistics as their output;
5) Establishing an Xgboost regression model on the training samples from step 4 to obtain the variable importance measure of each variable; the larger the measure, the more likely the variable is a fault variable, and the variable with the largest measure is identified as the fault variable.
In one embodiment, in step 2, the monitoring model of the normal data collected in step 1 is a PCA monitoring model; the method specifically comprises the following steps:
assume that the sample set under normal operating conditions is $X \in \mathbb{R}^{n \times m}$, where n is the number of samples and m is the number of variables; after standardization, each variable has zero mean and unit standard deviation; the covariance matrix S is computed and decomposed by singular value decomposition:
$$S = \frac{1}{n-1}X^{\top}X = P\Lambda P^{\top} + \tilde{P}\tilde{\Lambda}\tilde{P}^{\top}$$
where $P \in \mathbb{R}^{m \times l}$ and $\tilde{P} \in \mathbb{R}^{m \times (m-l)}$ are the principal component and residual loading matrices, l is the number of principal components, and $\Lambda$ and $\tilde{\Lambda}$ are the diagonal matrices of the principal component and residual eigenvalues, respectively;
any sample x can be decomposed into:
$$x = \hat{x} + \tilde{x} = Cx + \tilde{C}x$$
where $C = PP^{\top}$ and $\tilde{C} = \tilde{P}\tilde{P}^{\top} = I - PP^{\top}$ are the projection matrices onto the principal component and residual subspaces, respectively;
In one embodiment, fault detection is performed by extracting the SPE statistic, which is:
$$\mathrm{SPE} = \|\tilde{x}\|^{2} = \|\tilde{C}x\|^{2} = x^{\top}\tilde{C}x$$
The control limit of the SPE statistic can be obtained from its sampling distribution; if the statistic exceeds its control limit, the process is considered abnormal, which achieves fault detection.
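Steps 2 to 4 with the PCA model can be sketched in numpy as follows. This is an illustrative sketch, not the patent's code: the function names and the synthetic data are hypothetical, and two details the text leaves open are filled with assumptions. The number of principal components l is either given or chosen by cumulative percent variance, and the SPE control limit is taken as the empirical 99% quantile of SPE on the normal data, which is one possible reading of "obtained from its sampling distribution".

```python
import numpy as np


def fit_pca_monitor(X_normal, n_pc=None, cpv=0.85, alpha=0.99):
    """Fit a PCA monitoring model on normal data (step 2).

    Assumptions: if n_pc is None, l is the smallest number of components whose
    cumulative percent variance reaches cpv; the SPE control limit is the
    empirical alpha-quantile of SPE over the normal data."""
    mu = X_normal.mean(axis=0)
    sd = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mu) / sd                       # zero mean, unit std
    S = Z.T @ Z / (Z.shape[0] - 1)                 # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]              # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_pc is None:
        ratio = np.cumsum(eigvals) / eigvals.sum()
        n_pc = int(np.searchsorted(ratio, cpv)) + 1
    n_pc = min(n_pc, S.shape[0] - 1)               # keep a residual subspace
    P = eigvecs[:, :n_pc]                          # principal loadings
    C_tilde = np.eye(S.shape[0]) - P @ P.T         # residual projector
    model = {"mu": mu, "sd": sd, "C_tilde": C_tilde, "l": n_pc}
    model["limit"] = np.quantile(spe(model, X_normal), alpha)
    return model


def spe(model, X):
    """SPE = x^T C~ x for each standardized row x of X."""
    Z = (X - model["mu"]) / model["sd"]
    return np.einsum("ij,jk,ik->i", Z, model["C_tilde"], Z)


def screen_faults(model, X_offline):
    """Steps 3-4: keep samples whose SPE exceeds the control limit and
    return them with their SPE values as regression targets."""
    stats = spe(model, X_offline)
    mask = stats > model["limit"]
    return X_offline[mask], stats[mask]


# Hypothetical data: 5 sensors, 2 latent factors; bias fault on sensor 3.
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.8, 0.6, 0.0, 0.5],
              [0.0, 0.5, 0.8, 1.0, 0.6]])
X_normal = rng.normal(size=(1000, 2)) @ A + 0.2 * rng.normal(size=(1000, 5))
model = fit_pca_monitor(X_normal, n_pc=2)
X_off = rng.normal(size=(200, 2)) @ A + 0.2 * rng.normal(size=(200, 5))
X_off[100:, 3] += 5.0                              # faulty second half
X_train, y_train = screen_faults(model, X_off)     # training set for step 5
print(model["l"], X_train.shape)
```

The pair `(X_train, y_train)` is exactly the training set described in step 4: screened fault samples as inputs and their SPE values as outputs.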
In one embodiment, the step 5 specifically includes the following steps:
5a) For a fault data set of n samples with m variables:
$$D = \{(x_i, y_i)\}\quad (|D| = n,\ x_i \in \mathbb{R}^{m},\ y_i \in \mathbb{R})$$
where $y_i$ is the monitoring statistic of sample $x_i$, the Xgboost regression model predicts y from x in D as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),\quad f_k \in \mathcal{F}$$
where K is the number of decision trees; $f_k$ is a CART regression tree function; $\hat{y}_i$ is the predicted output; and $\mathcal{F}$ is the set of possible decision tree functions;
the loss function L is defined as:
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where l is a differentiable convex function that measures the difference between the predicted value and the true value (here the mean squared error is chosen), and $\Omega(f)$ is:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\|w\|^{2}$$
where T is the number of leaves, w is the vector of leaf weights, and $\lambda$ and $\gamma$ are penalty coefficients;
5b) Build CART regression trees on the training samples from step 4: to prevent overfitting, each tree is trained on a bootstrap resample of the same size drawn with replacement, and the optimal splitting variable and split point are selected by a greedy algorithm so as to maximize the split gain;
5c) Repeat step 5b, each new CART regression tree fitting the prediction residual of the preceding trees, until the loss function is minimized; the loss function at iteration t is:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
expanding the loss function in a second-order Taylor series and removing the constant terms, the loss function at step t becomes:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2}h_i f_t^{2}(x_i)\right] + \Omega(f_t) = \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2}\left(H_j + \lambda\right)w_j^{2}\right] + \gamma T$$
where $g_i$ and $h_i$ are the first and second derivatives of $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$, and $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ are summed over the sample set $I_j$ of leaf j; setting the derivative of the above formula with respect to $w_j$ to 0 gives the optimal leaf weight $w_j^{*} = -G_j/(H_j + \lambda)$, and substituting it back gives the minimum:
$$\tilde{L}^{(t)} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j + \lambda} + \gamma T$$
5d) Combine all CART regression trees into the Xgboost regression model; for each variable, divide the sum of its split gains by the number of times it is used for splitting to obtain its average split gain, then divide each variable's average split gain by the sum of the average split gains of all variables to obtain that variable's importance measure; the larger the measure, the more likely the variable is a fault variable.
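For the squared-error loss, the second-order quantities in step 5c have a closed form. The sketch below assumes $l(y,\hat{y}) = \tfrac{1}{2}(y-\hat{y})^2$, so that $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$; the 1/2 factor is a convention chosen here, not stated in the source, and only rescales g and h. The function names are hypothetical.

```python
import numpy as np


def grad_hess_squared(y, y_pred):
    """g_i and h_i of l(y, y_hat) = (y - y_hat)^2 / 2 with respect to y_hat."""
    return y_pred - y, np.ones_like(y, dtype=float)


def leaf_weight(g, h, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)


def structure_score(leaves, lam, gamma):
    """-1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T for a fixed tree
    structure, given as a list of (g, h) arrays, one pair per leaf."""
    score = sum(-0.5 * g.sum() ** 2 / (h.sum() + lam) for g, h in leaves)
    return score + gamma * len(leaves)


y = np.array([1.0, 2.0, 3.0])
g, h = grad_hess_squared(y, np.zeros(3))              # first tree: y_hat = 0
print(leaf_weight(g, h, lam=1.0))                     # 6 / (3 + 1) = 1.5
print(structure_score([(g, h)], lam=1.0, gamma=0.0))  # -0.5 * 36 / 4 = -4.5
```

With a single leaf, $G = -6$ and $H = 3$, so $w^{*} = 1.5$ minimizes $Gw + \tfrac{1}{2}(H+\lambda)w^2 = -6w + 2w^2$, and plugging it back gives -4.5, the structure score.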
In one embodiment, in step 5c, the smaller the loss function above, the better the model fits; the optimal splitting variable and split point are selected through the loss function, and the split gain obtained when splitting the optimal splitting variable at the optimal split point is computed at the same time.
In one embodiment, in step 5c, let $I_L$ and $I_R$ be the sample sets of the left and right nodes after the split, with $I = I_L \cup I_R$; the split gain is:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^{2}}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^{2}}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^{2}}{\sum_{i \in I} h_i + \lambda}\right] - \gamma$$
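The split-gain formula and the greedy search of step 5b can be sketched as an exact scan over every variable and every threshold of a single node. This is an illustrative numpy sketch with hypothetical names, using the squared-error convention $g_i = \hat{y}_i - y_i$, $h_i = 1$; it is not the xgboost library's own implementation.

```python
import numpy as np


def split_gain(GL, HL, GR, HR, lam, gamma):
    """Gain = 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam)] - gamma."""
    return 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                  - (GL + GR) ** 2 / (HL + HR + lam)) - gamma


def best_split(X, g, h, lam=1.0, gamma=0.0):
    """Exact greedy search over all variables and thresholds of one node.
    Returns (gain, variable index, threshold); samples with x < threshold go left."""
    G, H = g.sum(), h.sum()
    best = (-np.inf, None, None)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs = X[order, j]
        GL, HL = np.cumsum(g[order]), np.cumsum(h[order])  # left = first k+1 samples
        for k in range(len(xs) - 1):
            if xs[k] == xs[k + 1]:
                continue                                  # no threshold between ties
            gain = split_gain(GL[k], HL[k], G - GL[k], H - HL[k], lam, gamma)
            if gain > best[0]:
                best = (gain, j, 0.5 * (xs[k] + xs[k + 1]))
    return best


# Hypothetical node: y jumps when variable 1 exceeds 0.5; first tree, so y_hat = 0.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = np.where(X[:, 1] > 0.5, 2.0, 0.0)
gain, var, thr = best_split(X, g=-y, h=np.ones(50))
print(var, round(thr, 3), round(gain, 3))
```

Sorting each variable once and using cumulative sums lets every candidate threshold be scored in constant time, which is why the exact greedy search stays cheap for the modest sample counts of a screened fault segment.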
In one embodiment, in step 2, the monitoring model is chosen to suit the application: PCA for linear processes and KPCA for nonlinear processes.
Based on the same inventive concept, the present application also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods when executing the program.
Based on the same inventive concept, the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of any of the methods.
Based on the same inventive concept, the present application further provides a processor for executing a program, wherein the program executes to perform any one of the methods.
The invention has the beneficial effects that:
1) The variable importance measure of the Xgboost regression model quantifies each variable's influence on output prediction accuracy, and the measure of each variable is computed independently of the others; compared with the prior art, the measure therefore contains no component caused by the action of other variables, which eliminates the influence of the smearing (tailing) effect.
2) Compared with the existing RBC fault identification technique, the proposed fault identification method for the CSTR runs fast and can be used for fault location in linear, nonlinear, multimode and other settings.
Drawings
FIG. 1 is a flow chart of fault location in the CSTR fault location method based on the Xgboost regression model.
FIG. 2 is a generation flow chart of the CSTR fault location method based on the Xgboost regression model.
FIG. 3 shows the identification results for a random disturbance fault in the feed concentration $C_i$ in the CSTR fault location method based on the Xgboost regression model of the present invention.
FIG. 4 shows the identification results for a zero-drift fault in the cooling water temperature $T_{ci}$ in the CSTR fault location method based on the Xgboost regression model.
FIG. 5 is a schematic diagram of a CSTR device in the CSTR fault location method based on the Xgboost regression model.
Detailed Description
The present invention is further described below in conjunction with the drawings and the embodiments so that those skilled in the art can better understand the present invention and can carry out the present invention, but the embodiments are not to be construed as limiting the present invention.
As shown in fig. 1, a CSTR fault location method based on Xgboost regression includes the following steps:
1) Collect normal data generated by sensors in the CSTR, together with unknown offline data.
2) Establish a monitoring model from the normal data collected in step 1, choosing the monitoring model freely to suit the application.
3) Feed the unknown offline data collected in step 1 into the monitoring model established in step 2, extract sample statistics to detect faults, and screen out the fault data.
4) Take the fault data from step 3 as the input of the training samples and the corresponding statistics as their output.
5) Establish an Xgboost regression model on the training samples from step 4 to obtain the variable importance measure of each variable; the larger the measure, the more likely the variable is a fault variable, and the variable with the largest measure is identified as the fault variable.
The step 2 specifically comprises the following steps:
2a) For the monitoring model, the present invention takes the PCA monitoring model as an example. Assume that the sample set under normal operating conditions is $X \in \mathbb{R}^{n \times m}$, where n is the number of samples and m is the number of variables. After standardization, each variable has zero mean and unit standard deviation. The covariance matrix S is computed and decomposed by singular value decomposition:
$$S = \frac{1}{n-1}X^{\top}X = P\Lambda P^{\top} + \tilde{P}\tilde{\Lambda}\tilde{P}^{\top}$$
where $P \in \mathbb{R}^{m \times l}$ and $\tilde{P} \in \mathbb{R}^{m \times (m-l)}$ are the principal component and residual loading matrices, l is the number of principal components, and $\Lambda$ and $\tilde{\Lambda}$ are the diagonal matrices of the principal component and residual eigenvalues, respectively.
Any sample x can be decomposed into:
$$x = \hat{x} + \tilde{x} = Cx + \tilde{C}x$$
where $C = PP^{\top}$ and $\tilde{C} = \tilde{P}\tilde{P}^{\top} = I - PP^{\top}$ are the projection matrices onto the principal component and residual subspaces, respectively.
Fault detection is performed by extracting the SPE statistic:
$$\mathrm{SPE} = \|\tilde{x}\|^{2} = \|\tilde{C}x\|^{2} = x^{\top}\tilde{C}x$$
The control limit of the SPE statistic can be obtained from its sampling distribution; if the statistic exceeds its control limit, the process is considered abnormal, which achieves fault detection.
The step 5 specifically comprises the following steps:
5a) For a fault data set of n samples with m variables:
$$D = \{(x_i, y_i)\}\quad (|D| = n,\ x_i \in \mathbb{R}^{m},\ y_i \in \mathbb{R})$$
where $y_i$ is the monitoring statistic of sample $x_i$, the Xgboost regression model predicts y from x in D as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),\quad f_k \in \mathcal{F}$$
where K is the number of decision trees, $f_k$ is a CART regression tree function, $\hat{y}_i$ is the predicted output, and $\mathcal{F}$ is the set of possible decision tree functions.
The loss function L is defined as:
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where l is a differentiable convex function that measures the difference between the predicted value and the true value; here the mean squared error is chosen. $\Omega(f)$ is:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\|w\|^{2}$$
where T is the number of leaves, w is the vector of leaf weights, and $\lambda$ and $\gamma$ are penalty coefficients.
5b) Build CART regression trees on the training samples from step 4. To prevent overfitting, each tree is trained on a bootstrap resample of the same size drawn with replacement, and the optimal splitting variable and split point are selected by a greedy algorithm so as to maximize the split gain.
5c) Repeat step 5b, each new CART regression tree fitting the prediction residual of the preceding trees, until the loss function is minimized. The loss function at iteration t is:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
Expanding the loss function in a second-order Taylor series and removing the constant terms, the loss function at step t becomes:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2}h_i f_t^{2}(x_i)\right] + \Omega(f_t) = \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2}\left(H_j + \lambda\right)w_j^{2}\right] + \gamma T$$
where $g_i$ and $h_i$ are the first and second derivatives of $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$, and $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ are summed over the sample set $I_j$ of leaf j. Setting the derivative of the above formula with respect to $w_j$ to 0 gives the optimal leaf weight $w_j^{*} = -G_j/(H_j + \lambda)$, and substituting it back gives the minimum:
$$\tilde{L}^{(t)} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j + \lambda} + \gamma T$$
The smaller the loss function above, the better the model fits. The optimal splitting variable and split point are selected through the loss function, and the split gain obtained when splitting the optimal splitting variable at the optimal split point is computed at the same time.
Let $I_L$ and $I_R$ be the sample sets of the left and right nodes after the split, and let $I = I_L \cup I_R$. The split gain is:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^{2}}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^{2}}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^{2}}{\sum_{i \in I} h_i + \lambda}\right] - \gamma$$
5d) Combine all CART regression trees into the Xgboost regression model. For each variable, divide the sum of its split gains by the number of times it is used for splitting to obtain its average split gain (Average gain); then divide each variable's average split gain by the sum of the average split gains of all variables to obtain its variable importance measure (Variable Importance). The larger the measure, the more likely the variable is a fault variable.
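Putting steps 5b to 5d together, the sketch below boosts depth-limited CART regression trees on bootstrap resamples, records the gain of every split, and converts each variable's average split gain into a normalized importance measure. It is a minimal, hedged illustration rather than the patent's implementation: the learning rate `eta`, the tree depth, the number of trees, the mean initialization, the squared-error convention ($g_i = \hat{y}_i - y_i$, $h_i = 1$) and the synthetic data are all assumptions, and all helper names are hypothetical. In the xgboost library, the analogous quantity appears to be `XGBRegressor.feature_importances_` with `importance_type="gain"`, which normalizes per-feature average split gains to sum to one.

```python
import numpy as np


def best_split(X, g, h, lam, gamma):
    """Exact greedy search; returns (gain, var, threshold), var None if no gain > 0."""
    G, H, best = g.sum(), h.sum(), (0.0, None, None)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, GL, HL = X[order, j], np.cumsum(g[order]), np.cumsum(h[order])
        for k in range(len(xs) - 1):
            if xs[k] == xs[k + 1]:
                continue                           # no threshold between ties
            gain = 0.5 * (GL[k] ** 2 / (HL[k] + lam)
                          + (G - GL[k]) ** 2 / (H - HL[k] + lam)
                          - G ** 2 / (H + lam)) - gamma
            if gain > best[0]:
                best = (gain, j, 0.5 * (xs[k] + xs[k + 1]))
    return best


def grow(X, g, h, depth, lam, gamma, gains):
    """Grow a CART regression tree; append (var, gain) for every split made."""
    gain, j, thr = best_split(X, g, h, lam, gamma) if depth > 0 else (0.0, None, None)
    if j is None:
        return -g.sum() / (h.sum() + lam)          # leaf weight w* = -G/(H+lam)
    gains.append((j, gain))
    left = X[:, j] < thr
    return (j, thr,
            grow(X[left], g[left], h[left], depth - 1, lam, gamma, gains),
            grow(X[~left], g[~left], h[~left], depth - 1, lam, gamma, gains))


def predict(tree, x):
    """Walk one sample down a tree; internal nodes are (var, thr, left, right)."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] < thr else right
    return tree


def gain_importance(X, y, n_trees=30, depth=2, eta=0.3, lam=1.0, gamma=0.0, seed=0):
    """Boost trees on squared loss, each on a bootstrap resample, and return the
    average split gain per variable normalized to sum to 1."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    y_hat = np.full(n, y.mean())                   # assumed initial prediction
    gains = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)           # same size, with replacement
        g, h = y_hat[idx] - y[idx], np.ones(n)
        tree = grow(X[idx], g, h, depth, lam, gamma, gains)
        y_hat = y_hat + eta * np.array([predict(tree, x) for x in X])
    total, count = np.zeros(m), np.zeros(m)
    for j, gain in gains:
        total[j] += gain
        count[j] += 1
    avg = np.divide(total, count, out=np.zeros(m), where=count > 0)
    return avg / avg.sum() if avg.sum() > 0 else avg


# Hypothetical fault segment: the statistic is driven by variable 2 only.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 2] ** 2 + 0.1 * rng.normal(size=200)
imp = gain_importance(X, y)
print(np.round(imp, 3), "fault variable:", int(np.argmax(imp)))
```

Because each variable's measure is built only from the splits made on that variable, a variable that never helps predict the statistic accumulates little or no gain; this is the property the text relies on to avoid smearing.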
A specific application scenario of the present invention is given below:
Take sample data collected from a CSTR unit as an example; the data include normal operating data and fault data. As shown in FIG. 5, the model contains the feed concentration $C_i$, feed temperature $T_i$, outlet concentration $C$, outlet temperature $T$, cooling water inlet temperature $T_{ci}$, cooling water outlet temperature $T_c$ and cooling water flow rate $Q_c$.
The Xgboost regression fault identification method is verified against the existing RBC identification method. FIG. 3 compares the two methods on the random disturbance fault in the feed concentration $C_i$: although the variable with the largest contribution under RBC is also $C_i$, RBC clearly suffers a severe smearing effect, whereas the Xgboost regression method effectively removes it. FIG. 4 compares the two methods on the fault in the cooling water temperature $T_{ci}$, showing that, compared with RBC, the Xgboost regression method effectively removes the smearing effect and identifies the fault variable $T_{ci}$ better.
In summary, compared with the RBC method, the Xgboost regression model-based fault location method provided by the invention can effectively identify fault variables under the PCA model, and is not affected by the smearing effect. The PCA monitoring model is only an example for clearly illustrating the present invention, and is not a limitation on the fault detection method implemented by the present invention, and the Xgboost regression model may be combined with the PCA monitoring model, or may be combined with other multivariate statistical monitoring models such as KPCA to realize the positioning of the fault by extracting statistics.
The CSTR fault location method based on the Xgboost regression model provided by the present invention is described in detail above, and the following points need to be explained:
A CSTR fault positioning method based on an Xgboost regression model, characterized by comprising the following steps in sequence:
a) Collect normal data generated by sensors in the CSTR, together with unknown offline data.
b) Establish a monitoring model from the normal data collected in step a, choosing the monitoring model freely to suit the application, for example PCA for linear processes and KPCA for nonlinear processes.
c) Feed the unknown offline data collected in step a into the monitoring model established in step b and detect whether a fault exists; if so, screen out the fault data for the subsequent fault location.
d) Take the fault data from step c as the input of the training samples and the corresponding statistics as their output.
e) Establish an Xgboost regression model on the training samples from step d to obtain the variable importance measure of each variable; the larger the measure, the more likely the variable is a fault variable, and the variable with the largest measure is identified as the fault variable.
In step b, different multivariate statistical monitoring models, such as a linear PCA model or a nonlinear KPCA model, can be selected for systems with different characteristics, and each of them can be combined with the Xgboost regression method for fault location.
Steps d and e above specifically comprise the following sub-steps:
Step c1: Combine the fault data screened by the monitoring model (as input) with the corresponding statistics (as output) to form the training samples.
Step c2: Build CART regression trees on the training samples. To prevent overfitting, each tree is trained on a bootstrap resample of the same size drawn with replacement, and a randomly drawn subset of the variables forms the candidate splitting variables of each tree; the optimal splitting variable and split point are chosen to maximize the split gain.
Step c3: Repeat step c2, each new CART regression tree fitting the prediction residual of the preceding trees, until the cost function is minimized.
Step c4: Combine all CART regression trees into the Xgboost regression model and obtain the variable importance measure of each variable; the larger the measure, the more likely the variable is a fault variable, and the variable with the largest measure is identified as the fault variable.
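The per-tree resampling described in step c2 can be sketched as below. Because the subset size is not given in recoverable form, `n_vars` is a caller-supplied parameter rather than a guessed default; the helper name `draw_tree_sample` and the example sizes (100 samples, the seven CSTR variables, 3 candidates) are hypothetical.

```python
import numpy as np


def draw_tree_sample(n_samples, n_features, n_vars, rng):
    """Per-tree resampling: bootstrap row indices (same size, drawn with
    replacement) and a random subset of n_vars candidate splitting variables
    (drawn without replacement). n_vars is supplied by the caller."""
    rows = rng.integers(0, n_samples, size=n_samples)
    cols = np.sort(rng.choice(n_features, size=n_vars, replace=False))
    return rows, cols


rng = np.random.default_rng(0)
rows, cols = draw_tree_sample(n_samples=100, n_features=7, n_vars=3, rng=rng)
print(len(rows), len(np.unique(rows)), cols)
```

A tree grown on `X[rows][:, cols]` sees duplicated rows and only the candidate variables, so different trees see different views of the fault segment, which is the overfitting safeguard the step describes.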
The embodiments described above are merely preferred embodiments given to fully illustrate the present invention, and the scope of protection of the present invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the present invention all fall within the scope of protection of the present invention, which is defined by the claims.

Claims (8)

1. A CSTR fault positioning method based on an Xgboost regression model is characterized by comprising the following steps:
1) Collecting normal data generated by a sensor in the CSTR and unknown off-line data;
2) Establishing a monitoring model of the normal data acquired in the step 1, and freely selecting different monitoring models according to the requirements of different occasions;
3) Establishing a monitoring model through the step 2, bringing the offline unknown data acquired in the step 1 into the monitoring model, extracting sample statistics to detect faults, and screening out fault data;
4) Collecting the fault data in the step 3 as the input of the training sample and the corresponding statistic as the output of the training sample;
5) Establishing an Xgboost regression model of the training sample in the step 4 to obtain variable importance measurement of each variable, wherein the variables with larger measurement values are more likely to be fault variables, and the fault variable with the largest value is identified;
the step 5 specifically comprises the following steps:
5a) For a fault data set of n samples with m variables:
$$D = \{(x_i, y_i)\}\quad (|D| = n,\ x_i \in \mathbb{R}^{m},\ y_i \in \mathbb{R})$$
where $y_i$ is the statistic, the Xgboost regression model predicts y from x in D as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),\quad f_k \in \mathcal{F}$$
wherein K is the number of decision trees; $f_k$ is a CART regression tree function; $\hat{y}_i$ is the predicted output; $\mathcal{F}$ is the set of possible decision tree functions;
the loss function L is defined as:
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
wherein l is a differentiable convex function that measures the difference between the predicted value and the true value, and the mean squared error is chosen; $\Omega(f)$ is:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\|w\|^{2}$$
wherein T is the number of leaves, w is the vector of leaf weights, and $\lambda$ and $\gamma$ are penalty coefficients;
5b) Building CART regression trees on the training samples from step 4, wherein, to prevent overfitting, each tree is trained on a bootstrap resample of the same size drawn with replacement, and the optimal splitting variable and split point are selected by a greedy algorithm so as to maximize the split gain;
5c) Repeating step 5b to generate new CART regression trees, each fitting the prediction residual of the preceding trees, until the loss function is minimized, wherein the loss function at iteration t is:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
expanding the loss function in a second-order Taylor series and removing the constant terms, the loss function at step t becomes:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2}h_i f_t^{2}(x_i)\right] + \Omega(f_t) = \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2}\left(H_j + \lambda\right)w_j^{2}\right] + \gamma T$$
wherein $g_i$ and $h_i$ are the first and second derivatives of $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$, and $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ are summed over the sample set $I_j$ of leaf j; setting the derivative of the above formula with respect to $w_j$ to 0 gives the optimal leaf weight $w_j^{*} = -G_j/(H_j + \lambda)$, and substituting it back gives the minimum:
$$\tilde{L}^{(t)} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j + \lambda} + \gamma T$$
wherein the smaller the loss function above, the better the model fits; the optimal splitting variable and split point are selected through the loss function, and the split gain obtained when splitting the optimal splitting variable at the optimal split point is computed at the same time;
5d) Combining all CART regression trees together to obtain an Xgboost regression model, dividing the gain sum of each variable during splitting by the corresponding splitting times to obtain an average splitting gain, and dividing the gain sum of each variable by the average splitting gain sum of all variables to obtain the variable importance measurement of the corresponding variable, wherein the variable with larger measurement value is more likely to be a fault variable.
2. The CSTR fault location method based on the Xgboost regression model as claimed in claim 1, wherein in the step 2, the monitoring model of the normal data collected in the step 1 is a PCA monitoring model; the method specifically comprises the following steps:
assuming that a sample set under normal working conditions is X ∈ R n×m N is the number of samples, m is the number of variables; after standardization, the mean value is 0 and the standard deviation is 1; obtaining a covariance matrix S and carrying out singular value decomposition to obtain:
Figure FDA0003845293020000031
wherein P ∈ R m×l
Figure FDA0003845293020000032
Respectively are principal component and residual load vector, l is the number of principal component, lambda,
Figure FDA0003845293020000033
Diagonal arrays respectively composed of principal elements and residual characteristic values;
any sample $x \in \mathbb{R}^{m}$ can be decomposed as:
$$x = \hat{x} + \tilde{x} = Cx + \tilde{C}x$$
where $C = P P^{\mathsf{T}}$ and $\tilde{C} = \tilde{P} \tilde{P}^{\mathsf{T}} = I - P P^{\mathsf{T}}$ are the projection matrices onto the principal-component space and the residual space, respectively.
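A minimal NumPy sketch of the PCA monitoring model in claim 2, assuming the number of principal components `n_pc` (the $l$ above) has already been chosen; the claim does not say how, and cumulative percent variance is one common choice. Because the covariance matrix is symmetric positive semidefinite, its left singular vectors are eigenvectors, so the first `n_pc` columns give $P$ and the rest give $\tilde{P}$. The function name, dictionary keys, and synthetic data are hypothetical:

```python
import numpy as np


def fit_pca_monitor(X, n_pc):
    """Build a PCA model from normal-operation data X (n samples x m variables)."""
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                      # zero mean, unit standard deviation
    S = Xs.T @ Xs / (Xs.shape[0] - 1)          # covariance matrix S
    U, s, _ = np.linalg.svd(S)                 # S = U diag(s) U^T, s descending
    P, P_res = U[:, :n_pc], U[:, n_pc:]        # principal / residual loadings
    C = P @ P.T                                # projection onto principal space
    C_res = P_res @ P_res.T                    # projection onto residual space
    return {"mean": mean, "std": std, "P": P, "P_res": P_res,
            "lam": s[:n_pc], "lam_res": s[n_pc:], "C": C, "C_res": C_res}


# Synthetic normal data: 5 variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
A = np.array([[1.0, 0.8, 0.5, 0.0, 0.3],
              [0.0, 0.5, 0.8, 1.0, 0.6]])
X = latent @ A + 0.05 * rng.normal(size=(200, 5))
model = fit_pca_monitor(X, n_pc=2)

x = (X[0] - model["mean"]) / model["std"]
x_hat, x_res = model["C"] @ x, model["C_res"] @ x   # x = C x + C_res x
```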
3. The CSTR fault location method based on the Xgboost regression model as claimed in claim 2, characterized in that fault detection is performed using the SPE statistic, defined as:
$$\mathrm{SPE} = \|\tilde{x}\|^{2} = \|\tilde{C}x\|^{2}$$
The control limit of the SPE statistic can be obtained from its sampling distribution; if the statistic exceeds the corresponding control limit, the process is considered abnormal, which achieves fault detection.
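A self-contained sketch of SPE-based detection. Two points are assumptions rather than statements of the claim: the control limit is taken as the empirical 99% quantile of SPE on the normal training data (the claim only says it comes from the sampling distribution; analytical approximations such as that of Jackson and Mudholkar are also common), and the data and the step fault below are synthetic. The helper names are hypothetical:

```python
import numpy as np


def residual_projector(Xs, n_pc):
    """C_res = I - P P^T from the covariance of standardized data Xs."""
    S = Xs.T @ Xs / (Xs.shape[0] - 1)
    U, _, _ = np.linalg.svd(S)
    P = U[:, :n_pc]
    return np.eye(S.shape[0]) - P @ P.T


def spe(Xs, C_res):
    """SPE_i = ||C_res x_i||^2 for every standardized sample (row) x_i."""
    R = Xs @ C_res.T                      # residual part of each sample
    return np.sum(R**2, axis=1)


rng = np.random.default_rng(1)
A = np.array([[1.0, 0.8, 0.5, 0.0, 0.3],
              [0.0, 0.5, 0.8, 1.0, 0.6]])
X_train = rng.normal(size=(500, 2)) @ A + 0.05 * rng.normal(size=(500, 5))
mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
C_res = residual_projector((X_train - mean) / std, n_pc=2)

spe_train = spe((X_train - mean) / std, C_res)
limit = float(np.quantile(spe_train, 0.99))   # empirical control limit

x_new = rng.normal(size=(1, 2)) @ A + 0.05 * rng.normal(size=(1, 5))
x_new[0, 2] += 5.0                            # synthetic step fault on variable 2
spe_new = spe((x_new - mean) / std, C_res)[0]
print(spe_new > limit)                        # True: process flagged abnormal
```

The fault breaks the correlation structure learned from normal data, so most of it lands in the residual space and inflates SPE.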
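4. The CSTR fault location method based on the Xgboost regression model as claimed in claim 1, wherein in step 5c, $I_L$ and $I_R$ denote the sample sets of the left and right child nodes after a split, and $I = I_L \cup I_R$; the split gain is:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma$$
where $G_L = \sum_{i \in I_L} g_i$, $H_L = \sum_{i \in I_L} h_i$, and $G_R$, $H_R$ are defined analogously over $I_R$.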
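A minimal sketch of the split-gain computation together with an exact greedy search over all candidate variables and thresholds, which is one common way to pick the optimal splitting variable and split point (XGBoost implementations also offer approximate, histogram-based search). The names `split_gain` and `best_split`, the midpoint threshold convention, and the toy data are illustrative assumptions:

```python
import numpy as np


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Gain of splitting a node into left (GL, HL) and right (GR, HR) children."""
    return 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                  - (GL + GR)**2 / (HL + HR + lam)) - gamma


def best_split(X, g, h, lam=1.0, gamma=0.0):
    """Exact greedy search: return (variable, threshold, gain) with maximal gain."""
    X = np.asarray(X, dtype=float)
    g = np.asarray(g, dtype=float)
    h = np.asarray(h, dtype=float)
    G, H = g.sum(), h.sum()
    best = (None, None, -np.inf)
    for k in range(X.shape[1]):
        order = np.argsort(X[:, k], kind="stable")
        xs = X[order, k]
        GL = np.cumsum(g[order])          # GL[i]: sum of g over the i+1 smallest values
        HL = np.cumsum(h[order])
        for i in range(len(xs) - 1):
            if xs[i] == xs[i + 1]:
                continue                  # no threshold separates equal values
            gain = split_gain(GL[i], HL[i], G - GL[i], H - HL[i], lam, gamma)
            if gain > best[2]:
                best = (k, 0.5 * (xs[i] + xs[i + 1]), float(gain))
    return best


# Variable 1 separates the targets; variable 0 is noise. Squared loss, y_prev = 0.
X = np.array([[5, 1], [1, 2], [6, 3], [2, 4], [4, 5], [3, 6]], dtype=float)
y = np.array([0, 0, 0, 10, 10, 10], dtype=float)
print(best_split(X, g=-y, h=np.ones_like(y)))
```

The recorded gain of each chosen split is what the importance measure in step 5d accumulates per variable.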
5. The CSTR fault location method based on the Xgboost regression model as claimed in claim 1, wherein in step 2 the monitoring model can be chosen to suit the requirements of the application: PCA is selected for a linear model and KPCA for a nonlinear model.
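For the nonlinear option, a minimal kernel PCA sketch, assuming a Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / c)$ with a hypothetical width parameter `c`; the claim names KPCA but not the kernel. It centers the kernel matrix in feature space, keeps the leading eigenvectors, and returns principal scores for new samples using the training-set centering. Monitoring statistics built on these scores are not shown, and all names are hypothetical:

```python
import numpy as np


def gaussian_kernel(A, B, c):
    """k(a, b) = exp(-||a - b||^2 / c) for all rows a of A and b of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / c)


def fit_kpca(X, n_pc, c):
    n = X.shape[0]
    K = gaussian_kernel(X, X, c)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one        # centering in feature space
    eigval, eigvec = np.linalg.eigh(Kc)               # ascending eigenvalues
    idx = np.argsort(eigval)[::-1][:n_pc]
    lam, alpha = eigval[idx], eigvec[:, idx]
    alpha = alpha / np.sqrt(lam)                      # unit-norm feature-space loadings
    return {"X": X, "K": K, "alpha": alpha, "lam": lam, "c": c}


def kpca_scores(model, X_new):
    X, K, c = model["X"], model["K"], model["c"]
    n, n_new = X.shape[0], X_new.shape[0]
    k = gaussian_kernel(X_new, X, c)
    one_n = np.full((n, n), 1.0 / n)
    one_t = np.full((n_new, n), 1.0 / n)
    kc = k - one_t @ K - k @ one_n + one_t @ K @ one_n   # center with training stats
    return kc @ model["alpha"]


rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
model = fit_kpca(X, n_pc=3, c=6.0)
T = kpca_scores(model, X)                             # training scores
```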
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 5 are implemented when the program is executed by the processor.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
8. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, performs the method of any of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010491108.0A CN111639304B (en) 2020-06-02 2020-06-02 CSTR fault positioning method based on Xgboost regression model

Publications (2)

Publication Number Publication Date
CN111639304A (en) 2020-09-08
CN111639304B (en) 2023-02-21

Family

ID=72330616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010491108.0A Active CN111639304B (en) 2020-06-02 2020-06-02 CSTR fault positioning method based on Xgboost regression model

Country Status (1)

Country Link
CN (1) CN111639304B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112180893B (en) * 2020-09-15 2021-07-13 Zhengzhou University of Light Industry Construction method of fault-related distributed orthogonal neighborhood preserving embedded model in CSTR process and fault monitoring method thereof
CN113156812B (en) * 2021-01-28 2021-11-23 Huaiyin Institute of Technology Fault detection method for secondary chemical reactor based on unknown input observer
CN112749370B (en) * 2021-04-06 2021-07-02 Guangdong Jizhou Technology Co., Ltd. Fault tracking and positioning method and system based on Internet of things

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102541050A (en) * 2012-01-05 2012-07-04 Zhejiang University Chemical process fault diagnosis method based on improved support vector machine
CN110674842A (en) * 2019-08-26 2020-01-10 Mingyang Smart Energy Group Co., Ltd. Wind turbine generator main shaft bearing fault prediction method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20190152011A1 (en) * 2017-11-21 2019-05-23 General Electric Company Predictive cutting tool failure determination

Non-Patent Citations (2)

Title
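Application of Xgboost in rolling bearing fault diagnosis; 张钰 (Zhang Yu) et al.; Noise and Vibration Control (《噪声与振动控制》); 2017-08-31; full text *
Sensor fault location method based on PCA-RFR; 潘磊 (Pan Lei) et al.; Computer Measurement & Control (《计算机测量与控制》); 2020-04-30; full text *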

Also Published As

Publication number Publication date
CN111639304A (en) 2020-09-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant