CN111639304A - CSTR fault positioning method based on Xgboost regression model - Google Patents


Publication number
CN111639304A
Authority
CN
China
Prior art keywords
fault
variable
model
cstr
splitting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010491108.0A
Other languages
Chinese (zh)
Other versions
CN111639304B (en)
Inventor
赵忠盖
潘磊
李庆华
刘成林
刘飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN202010491108.0A
Publication of CN111639304A
Application granted
Publication of CN111639304B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a CSTR fault location method based on an Xgboost regression model, comprising the following steps: 1) collecting the normal data generated by the sensors in the CSTR, together with unknown offline data; 2) establishing a monitoring model from the normal data collected in step 1, where the monitoring model can be chosen freely to suit the application; 3) feeding the unknown offline data collected in step 1 into the monitoring model established in step 2, extracting sample statistics to detect faults, and screening out the fault data. The beneficial effects of the invention are: 1) the variable importance of the Xgboost regression model measures the influence of each variable on the prediction accuracy of the output, and the measure for each variable is computed independently of the others; compared with the prior art, the variable importance measure therefore contains no contribution from other variables, which eliminates the smearing effect.

Description

CSTR fault positioning method based on Xgboost regression model
Technical Field
The invention relates to the field of CSTR, in particular to a CSTR fault positioning method based on an Xgboost regression model.
Background
The continuous stirred tank reactor (CSTR) is a very important and widely used reaction device in chemical production. In the production of the three major synthetic materials (chemical fibers, plastics, and synthetic rubber), CSTRs account for more than 90% of the synthesis reactors, and they are also widely used in pharmaceuticals, pesticides, fuels, and other fields. Given how widely the CSTR is used in actual production, ensuring its stable and safe operation is very valuable.
As modern chemical production grows in scale and complexity, faults that are not accurately identified and promptly corrected often cause heavy losses. Because industrial processes continuously generate large amounts of data that reflect the process mechanism, monitoring industrial processes with data-driven multivariate statistical monitoring models is becoming increasingly common.
The traditional technology has the following technical problems:
Many fault detection techniques based on multivariate statistical analysis have been applied to real industrial processes, but fault location, an important step that follows fault detection, remains a technical difficulty. Common fault location methods based on multivariate statistical analysis include the contribution plot method, the reconstruction method, and the reconstruction-based contribution (RBC) method, but these methods are susceptible to the smearing effect and may therefore misdiagnose faults in practice. Moreover, for systems with different characteristics (linear, nonlinear, non-Gaussian, and so on), the traditional fault location methods differ greatly from one another, and few technical documents propose a unified method for locating the fault source.
Disclosure of Invention
The invention provides a CSTR fault location method based on an Xgboost regression model. A multivariate statistical monitoring model is first established from normal data collected in the CSTR. The monitoring model is then used to screen out the fault data segment from data collected offline. An Xgboost regression model is established with the fault data segment as input and the corresponding statistic as output, and the variable importance measure is taken as the contribution of each variable to the statistic: the larger the value, the more likely the variable is a fault variable, and the variable with the largest value is identified as the fault variable. Unlike traditional fault location methods such as the reconstruction-based contribution method and the partial derivative method, the Xgboost regression model can be used for fault location in both linear and nonlinear processes, requires little computation, exhibits little smearing, and performs better at locating minor faults and random faults in the CSTR.
In order to solve the technical problem, the invention provides a CSTR fault positioning method based on an Xgboost regression model, which comprises the following steps:
1) collecting the normal data generated by the sensors in the CSTR, together with unknown offline data;
2) establishing a monitoring model from the normal data collected in step 1, where the monitoring model can be chosen freely to suit the application;
3) feeding the unknown offline data collected in step 1 into the monitoring model established in step 2, extracting the sample statistic to detect faults, and screening out the fault data;
4) taking the fault data from step 3 as the training sample inputs and the corresponding statistic values as the training sample outputs;
5) establishing an Xgboost regression model on the training samples from step 4 to obtain the variable importance measure of each variable, where the larger the measure, the more likely the variable is a fault variable, and identifying the variable with the largest measure as the fault variable.
In one embodiment, in step 2, the monitoring model of the normal data collected in step 1 is a PCA monitoring model; the method specifically comprises the following steps:
assume that the sample set under normal operating conditions is $X \in \mathbb{R}^{n \times m}$, where n is the number of samples and m is the number of variables; after standardization, each variable has mean 0 and standard deviation 1; the covariance matrix S is computed and decomposed by singular value decomposition:
$$S = P \Lambda P^{T} + \tilde{P} \tilde{\Lambda} \tilde{P}^{T}$$
where $P \in \mathbb{R}^{m \times l}$ and $\tilde{P} \in \mathbb{R}^{m \times (m-l)}$ are the principal and residual loading matrices, respectively, l is the number of principal components, and $\Lambda$ and $\tilde{\Lambda}$ are the diagonal matrices formed by the principal and residual eigenvalues, respectively;
any sample x can be decomposed into:
$$x = \hat{x} + \tilde{x} = C x + \tilde{C} x$$
where $C = P P^{T}$ and $\tilde{C} = \tilde{P} \tilde{P}^{T}$ are the projection matrices onto the principal component space and the residual space, respectively;
in one embodiment, fault detection is performed by extracting the SPE statistic, which is:
$$\mathrm{SPE} = \lVert \tilde{C} x \rVert^{2} = x^{T} \tilde{C} x$$
The control limit of the SPE statistic can be obtained from its sampling distribution. If the statistic exceeds its control limit, the process is considered abnormal, and fault detection is thereby achieved.
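The SPE computation can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the loading vectors in `P` are assumed to be already available and orthonormal, and `empirical_limit` is a hypothetical stand-in that takes an empirical quantile of the SPE values of normal data, since the text only states that the limit comes from the sampling distribution.

```python
import math

def spe(x, P):
    """Squared prediction error of one standardized sample x.

    P is a list of l orthonormal loading vectors (each of length m),
    i.e. the columns of the principal loading matrix.
    """
    # Scores t = P^T x, one per retained principal component.
    scores = [sum(p_j * x_j for p_j, x_j in zip(p, x)) for p in P]
    # Reconstruction x_hat = P t lies in the principal component space.
    x_hat = [sum(t * p[j] for t, p in zip(scores, P)) for j in range(len(x))]
    # SPE is the squared norm of the residual part x - x_hat.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def empirical_limit(values, alpha=0.99):
    """Hypothetical control limit: empirical alpha-quantile of normal-data SPE."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, math.ceil(alpha * len(ordered)) - 1)
    return ordered[max(index, 0)]

# Two correlated variables with one principal direction along (1, 1)/sqrt(2).
P = [[1 / math.sqrt(2), 1 / math.sqrt(2)]]
normal = [[0.5, 0.6], [-1.0, -0.9], [0.2, 0.1], [1.1, 1.0]]
limit = empirical_limit([spe(x, P) for x in normal])

# A sample that breaks the correlation has a large residual and is flagged.
is_fault = spe([1.0, -1.0], P) > limit
```

Working with the residual `x - x_hat` avoids forming the residual projection matrix explicitly, which gives the same SPE value when the loading vectors are orthonormal.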
In one embodiment, the step 5 specifically includes the following steps:
5a) for a fault data set with n samples and m variables:
$$D = \{(x_i, y_i)\} \quad (|D| = n,\ x_i \in \mathbb{R}^{m},\ y_i \in \mathbb{R})$$
where y is the statistic, an Xgboost regression model is defined to predict the output for each x in D:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
where K is the number of decision trees, $f_k$ is a CART regression tree function, $\hat{y}_i$ is the predicted output, and $\mathcal{F}$ is the set of possible decision tree functions;
the loss function L is defined as:
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where l is a differentiable convex function that measures the difference between the predicted value and the true value, for which the mean square error function is selected; Ω(f) is:
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}$$
where T is the number of leaves, w is the vector of leaf weights, and λ and γ are penalty coefficients;
5b) establishing a CART regression tree model on the training samples from step 4, where, to prevent overfitting, each tree is trained on an equally sized data set drawn by resampling with replacement, and the optimal splitting variable and optimal splitting point are selected by a greedy algorithm so that the split gain is maximized;
5c) iteratively generating new CART regression trees as in step 5b, each fitting the prediction residual of the previous CART regression tree, until the loss function is minimized, where the loss function $L^{(t)}$ at iteration t is:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
expanding the loss function as a Taylor series to second order and dropping the constant term, the loss function at iteration t becomes:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$
$$\tilde{L}^{(t)} = \sum_{j=1}^{T} \left[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^{2} \right] + \gamma T$$
where $I_j$ is the set of samples assigned to leaf j, and $g_i$ and $h_i$ are the first and second derivatives of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$; differentiating the above formula with respect to $w_j$ and setting the derivative to 0 gives the optimal leaf weight $w_j^{*} = -\sum_{i \in I_j} g_i / \big(\sum_{i \in I_j} h_i + \lambda\big)$, and substituting it back gives:
$$\tilde{L}^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^{2}}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
5d) combining all CART regression trees to obtain the Xgboost regression model; for each variable, dividing the sum of its split gains by the number of times it is used for splitting to obtain its average split gain, and dividing the average split gain of each variable by the sum of the average split gains of all variables to obtain its variable importance measure, where the larger the measure, the more likely the variable is a fault variable.
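The importance calculation of step 5d can be sketched as follows, assuming each split in the trained trees has been recorded as a (variable, gain) pair. The function name, the record format, and the gain values are hypothetical; the normalization reflects one reading of the text, in which each variable's average split gain is divided by the sum of the average split gains of all variables.

```python
from collections import defaultdict

def variable_importance(splits):
    """splits: iterable of (variable_name, gain) pairs, one per tree split."""
    total_gain = defaultdict(float)
    split_count = defaultdict(int)
    for name, gain in splits:
        total_gain[name] += gain
        split_count[name] += 1
    # Average split gain: summed gain of a variable / number of its splits.
    average = {name: total_gain[name] / split_count[name] for name in total_gain}
    # Normalize so that the importance values of all variables sum to 1.
    norm = sum(average.values())
    return {name: value / norm for name, value in average.items()}

# Hypothetical split records gathered from all trees of the model.
splits = [("Ci", 8.0), ("Ci", 4.0), ("Ti", 1.0), ("Qc", 2.0), ("Qc", 2.0)]
importance = variable_importance(splits)

# The variable with the largest measure is reported as the fault variable.
fault_variable = max(importance, key=importance.get)
```

Variables that never appear in a split simply have no entry, which corresponds to an importance of zero.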
In one embodiment, in step 5c, the smaller the value of the above loss function, the better the model fit; the optimal splitting variable and the optimal splitting point are selected using the loss function, and the split gain of the optimal splitting variable at the optimal splitting point is calculated at the same time.
In one embodiment, in step 5c, assume that $I_L$ and $I_R$ are the sample sets of the left and right nodes after the split, with $I = I_L \cup I_R$; the split gain is then:
$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{\left(\sum_{i \in I_L} g_i\right)^{2}}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^{2}}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^{2}}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$
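To make the split gain concrete, the sketch below evaluates it for every candidate threshold of a single variable and keeps the best one, which is the greedy search the method relies on. The function names, default values of `lam` and `gamma`, and the toy gradients are illustrative assumptions, not part of the patent.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Split gain from the summed gradients and Hessians of the two children."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

def best_split(values, grads, hess, lam=1.0, gamma=0.0):
    """Greedy scan over one variable; returns (gain, threshold) or None."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    g_total, h_total = sum(grads), sum(hess)
    g_left = h_left = 0.0
    best = None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hess[i]
        if values[i] == values[nxt]:
            continue  # no threshold can separate equal values
        gain = split_gain(g_left, h_left, g_total - g_left,
                          h_total - h_left, lam, gamma)
        if best is None or gain > best[0]:
            best = (gain, (values[i] + values[nxt]) / 2)
    return best

# Toy data: low values have negative gradients, high values positive ones.
values = [0.1, 0.2, 0.9, 1.0]
grads = [-2.0, -2.0, 2.0, 2.0]
hess = [2.0, 2.0, 2.0, 2.0]
gain, threshold = best_split(values, grads, hess)
```

In this toy case the best threshold falls midway between 0.2 and 0.9, where the two gradient groups separate cleanly.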
In one embodiment, in step 2, the monitoring model can be chosen freely to suit the application: PCA is selected for a linear process and KPCA for a nonlinear process.
Based on the same inventive concept, the present application also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods when executing the program.
Based on the same inventive concept, the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of any of the methods.
Based on the same inventive concept, the present application further provides a processor for executing a program, wherein the program executes to perform any one of the methods.
The invention has the beneficial effects that:
1) The variable importance of the Xgboost regression model measures the influence of each variable on the prediction accuracy of the output, and the measure for each variable is computed independently of the others. Compared with the prior art, the variable importance measure therefore contains no contribution from other variables, which eliminates the smearing effect.
2) Compared with the existing RBC fault identification technique, the fault identification method of the invention runs fast and can be used for fault location in linear, nonlinear, multi-mode, and other settings.
Drawings
FIG. 1 is a flow chart of fault location in the CSTR fault location method based on the Xgboost regression model.
FIG. 2 is a generation flow chart of the CSTR fault location method based on the Xgboost regression model.
FIG. 3 shows the identification of a random disturbance fault in the feed concentration $C_i$ using the CSTR fault location method based on the Xgboost regression model of the present invention.
FIG. 4 shows the identification of a zero drift fault in the cooling water temperature $T_{ci}$ using the CSTR fault location method based on the Xgboost regression model.
FIG. 5 is a schematic diagram of a CSTR device in the CSTR fault location method based on the Xgboost regression model.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
As shown in fig. 1, a CSTR fault location method based on Xgboost regression includes the following steps:
1) Collect the normal data generated by the sensors in the CSTR, together with unknown offline data.
2) Establish a monitoring model from the normal data collected in step 1; the monitoring model can be chosen freely to suit the application.
3) Feed the unknown offline data collected in step 1 into the monitoring model established in step 2, extract the sample statistic to detect faults, and screen out the fault data.
4) Take the fault data from step 3 as the training sample inputs and the corresponding statistic values as the training sample outputs.
5) Establish an Xgboost regression model on the training samples from step 4 to obtain the variable importance measure of each variable. The larger the measure, the more likely the variable is a fault variable, and the variable with the largest measure is identified as the fault variable.
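Steps 3 and 4 amount to filtering the offline samples by a monitoring statistic and pairing each retained sample with its statistic value. The sketch below illustrates that data flow only; `build_training_set` and `toy_statistic` are hypothetical names, and the toy statistic stands in for whatever statistic the chosen monitoring model provides (for example SPE).

```python
def build_training_set(samples, statistic, limit):
    """Keep samples whose statistic exceeds the limit, paired with that value."""
    inputs, outputs = [], []
    for x in samples:
        value = statistic(x)
        if value > limit:          # step 3: sample is flagged as faulty
            inputs.append(x)       # step 4: regression input
            outputs.append(value)  # step 4: regression output
    return inputs, outputs

def toy_statistic(x):
    """Placeholder statistic: squared distance from the origin."""
    return sum(v * v for v in x)

offline = [[0.1, 0.2], [2.0, 0.1], [0.0, -0.3], [0.2, 3.0]]
X_fault, y_fault = build_training_set(offline, toy_statistic, limit=1.0)
```

Only the second and fourth samples exceed the limit here, so they form the training set for the regression model of step 5.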
The step 2 specifically comprises the following steps:
2a) The PCA monitoring model is taken as an example of establishing the monitoring model. Assume that the sample set under normal operating conditions is $X \in \mathbb{R}^{n \times m}$, where n is the number of samples and m is the number of variables. After standardization, each variable has mean 0 and standard deviation 1. The covariance matrix S is computed and decomposed by singular value decomposition:
$$S = P \Lambda P^{T} + \tilde{P} \tilde{\Lambda} \tilde{P}^{T}$$
where $P \in \mathbb{R}^{m \times l}$ and $\tilde{P} \in \mathbb{R}^{m \times (m-l)}$ are the principal and residual loading matrices, respectively, l is the number of principal components, and $\Lambda$ and $\tilde{\Lambda}$ are the diagonal matrices formed by the principal and residual eigenvalues, respectively.
Any sample x can be decomposed into:
$$x = \hat{x} + \tilde{x} = C x + \tilde{C} x$$
where $C = P P^{T}$ and $\tilde{C} = \tilde{P} \tilde{P}^{T}$ are the projection matrices onto the principal component space and the residual space, respectively.
Fault detection is performed by extracting the SPE statistic, which is:
$$\mathrm{SPE} = \lVert \tilde{C} x \rVert^{2} = x^{T} \tilde{C} x$$
The control limit of the SPE statistic can be obtained from its sampling distribution. If the statistic exceeds its control limit, the process is considered abnormal, and fault detection is thereby achieved.
The step 5 specifically comprises the following steps:
5a) For a fault data set with n samples and m variables:
$$D = \{(x_i, y_i)\} \quad (|D| = n,\ x_i \in \mathbb{R}^{m},\ y_i \in \mathbb{R})$$
where y is the statistic, an Xgboost regression model is defined to predict the output for each x in D:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
where K is the number of decision trees, $f_k$ is a CART regression tree function, $\hat{y}_i$ is the predicted output, and $\mathcal{F}$ is the set of possible decision tree functions.
The loss function L is defined as:
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where l is a differentiable convex function that measures the difference between the predicted value and the true value; the mean square error function is selected here. Ω(f) is:
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}$$
where T is the number of leaves, w is the vector of leaf weights, and λ and γ are penalty coefficients.
5b) Establish a CART regression tree model on the training samples from step 4. To prevent overfitting, each tree is trained on an equally sized data set drawn by resampling with replacement, and the optimal splitting variable and optimal splitting point are selected by a greedy algorithm so that the split gain is maximized.
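The resampling in step 5b can be illustrated by drawing, for each tree, as many sample indices as there are training samples, with replacement. This is a sketch of that sampling step only; the helper name and the fixed seed are assumptions made so the example is repeatable.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
n_samples = 8

# One resampled index list per tree; repeated indices are expected.
per_tree = [bootstrap_indices(n_samples, rng) for _ in range(3)]
```

Each tree then sees a data set of the original size in which some samples appear more than once and others are left out.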
5c) New CART regression trees are generated iteratively as in step 5b, each fitting the prediction residual of the previous CART regression tree, until the loss function is minimized. The loss function $L^{(t)}$ at iteration t is:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
Expanding the loss function as a Taylor series to second order and dropping the constant term, the loss function at iteration t becomes:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$
$$\tilde{L}^{(t)} = \sum_{j=1}^{T} \left[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^{2} \right] + \gamma T$$
where $I_j$ is the set of samples assigned to leaf j, and $g_i$ and $h_i$ are the first and second derivatives of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$. Differentiating the above formula with respect to $w_j$ and setting the derivative to 0 gives the optimal leaf weight $w_j^{*} = -\sum_{i \in I_j} g_i / \big(\sum_{i \in I_j} h_i + \lambda\big)$, and substituting it back gives:
$$\tilde{L}^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^{2}}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
The smaller the value of the above loss function, the better the model fit. The optimal splitting variable and the optimal splitting point are selected using the loss function, and the split gain of the optimal splitting variable at the optimal splitting point is calculated at the same time.
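As a small numerical check of the leaf-weight derivation, the sketch below computes the gradient statistics for the squared error loss, the resulting optimal leaf weight, and the minimized loss for a tree with a single leaf. The function names are illustrative, and the loss is taken as $l = (y - \hat{y})^2$ without a factor of 1/2, which is an assumption about the exact form of the mean square error used.

```python
def grad_hess_squared_error(y_true, y_pred):
    """First and second derivatives of l = (y - y_hat)^2 w.r.t. y_hat."""
    grads = [2.0 * (p - y) for y, p in zip(y_true, y_pred)]
    hess = [2.0 for _ in y_true]
    return grads, hess

def leaf_weight(grads, hess, lam):
    """Optimal weight of one leaf: -sum(g) / (sum(h) + lambda)."""
    return -sum(grads) / (sum(hess) + lam)

def structure_score(leaves, lam, gamma):
    """Minimized loss of a tree; leaves is a list of (grads, hess) per leaf."""
    total = 0.0
    for grads, hess in leaves:
        total += sum(grads) ** 2 / (sum(hess) + lam)
    return -0.5 * total + gamma * len(leaves)

# Two samples in one leaf, starting from a prediction of zero.
y_true = [3.0, 5.0]
y_pred = [0.0, 0.0]
g, h = grad_hess_squared_error(y_true, y_pred)
w = leaf_weight(g, h, lam=0.0)
score = structure_score([(g, h)], lam=0.0, gamma=0.0)
```

With no regularization the leaf weight is the mean of the targets (4.0), and the score of -32 equals the drop in squared error from 34 to 2, which is what the second-order expansion predicts exactly for a quadratic loss.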
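Suppose that $I_L$ and $I_R$ are the sample sets of the left and right nodes after the split, with $I = I_L \cup I_R$. The split gain is then:
$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{\left(\sum_{i \in I_L} g_i\right)^{2}}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^{2}}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^{2}}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$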
5d) All CART regression trees are combined to obtain the Xgboost regression model. For each variable, the sum of its split gains is divided by the number of times it is used for splitting to obtain its average split gain (Average gain), and the average split gain of each variable is divided by the sum of the average split gains of all variables to obtain its variable importance measure (Variable Importance). The larger the measure, the more likely the variable is a fault variable.
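The iteration of steps 5b to 5d can be sketched with depth-one trees (stumps) in place of full CART trees. This is a deliberately simplified illustration under several assumptions: squared error loss, no resampling, γ = 0, and hypothetical function names; it is not the Xgboost library and not the patent's implementation.

```python
def fit_stump(X, grads, hess, lam):
    """Choose the (variable, threshold) with the largest split gain."""
    def score(g, h):
        return g * g / (h + lam)
    best = None
    for j in range(len(X[0])):
        # Every distinct value except the largest is a candidate threshold.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            gl, hl = sum(grads[i] for i in left), sum(hess[i] for i in left)
            gr, hr = sum(grads[i] for i in right), sum(hess[i] for i in right)
            gain = 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr))
            if best is None or gain > best[0]:
                # Store the optimal leaf weights of both children.
                best = (gain, j, threshold, -gl / (hl + lam), -gr / (hr + lam))
    _, j, threshold, w_left, w_right = best
    return lambda row: w_left if row[j] <= threshold else w_right

def boost(X, y, n_trees=5, lam=1.0):
    """Additive model: each stump fits the gradients of the current prediction."""
    trees, pred, losses = [], [0.0] * len(y), []
    for _ in range(n_trees):
        grads = [2.0 * (p - t) for p, t in zip(pred, y)]
        hess = [2.0] * len(y)
        tree = fit_stump(X, grads, hess, lam)
        trees.append(tree)
        pred = [p + tree(row) for p, row in zip(pred, X)]
        losses.append(sum((p - t) ** 2 for p, t in zip(pred, y)))
    return trees, losses

# Toy data in which the output depends on the first variable only.
X = [[0.0, 1.0], [0.1, 0.0], [0.9, 1.0], [1.0, 0.0]]
y = [1.0, 1.2, 5.0, 5.2]
trees, losses = boost(X, y)

# The model output is the sum of the outputs of all trees.
prediction = sum(tree(X[2]) for tree in trees)
```

The recorded training loss shrinks from one iteration to the next because each stump moves every leaf's prediction part of the way toward the mean residual of that leaf.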
A specific application scenario of the present invention is given below:
Sample data collected from a CSTR device is taken as an example; the data comprises normal operating data and fault data. As shown in FIG. 5, the model contains the feed concentration $C_i$ and feed temperature $T_i$; the discharge concentration $C$ and discharge temperature $T$; and the cooling water inlet temperature $T_{ci}$, cooling water outlet temperature $T_c$, and cooling water flow rate $Q_c$.
The Xgboost regression fault identification method is verified by comparison with the existing RBC identification method. FIG. 3 compares the two methods on a random disturbance fault in the feed concentration $C_i$. Although the variable with the largest RBC contribution is also $C_i$, the RBC result shows a severe smearing effect, whereas the Xgboost regression method effectively removes it. FIG. 4 compares the two methods on a random disturbance fault in the cooling water temperature $T_{ci}$; it shows that, compared with RBC, the Xgboost regression method effectively removes the smearing effect and identifies the fault variable $T_{ci}$ better.
In summary, compared with the RBC method, the fault location method based on the Xgboost regression model provided by the invention can effectively identify fault variables under the PCA model and is not affected by the smearing effect. The PCA monitoring model is only an example used to illustrate the invention clearly and does not limit the fault detection method used with it: the Xgboost regression model can be combined with the PCA monitoring model, or with other multivariate statistical monitoring models such as KPCA, to locate the fault from the extracted statistic.
The CSTR fault location method based on the Xgboost regression model provided by the present invention is described in detail above. The following points need to be explained:
A CSTR fault location method based on an Xgboost regression model, characterized by comprising the following steps in sequence:
a) Collect the normal data generated by the sensors in the CSTR, together with unknown offline data.
b) Establish a monitoring model from the normal data collected in step a; the monitoring model can be chosen freely to suit the application, for example PCA for a linear process and KPCA for a nonlinear process.
c) Feed the unknown offline data collected in step a into the monitoring model established in step b and detect whether a fault exists; if so, screen out the fault data and proceed to fault location.
d) Take the fault data from step c as the training sample inputs and the corresponding statistic values as the training sample outputs.
e) Establish an Xgboost regression model on the training samples from step d to obtain the variable importance measure of each variable. The larger the measure, the more likely the variable is a fault variable, and the variable with the largest measure is identified as the fault variable.
In step b, different multivariate statistical monitoring models, such as a linear PCA model or a nonlinear KPCA model, can be selected according to the characteristics of the system, and all of them can be combined with the Xgboost regression method to perform fault location.
3. The Xgboost regression model-based industrial process fault location method of claim 1, characterized in that step c specifically comprises the following steps:
Step c1: The fault data screened out by the monitoring model is taken as input and the corresponding statistic as output, and the two are combined to form the training samples.
Step c2: A CART regression tree model is established on the training samples. To prevent overfitting, each tree is trained on an equally sized data set drawn by resampling with replacement, and a random selection of
Figure BDA0002521128480000111
variables is used as the range of candidate splitting variables for each tree; the optimal splitting variable and optimal splitting point are selected so that the split gain is maximized.
Step c3: New CART regression trees are generated iteratively as in step c2, each fitting the prediction residual of the previous tree, until the cost function is minimized.
Step c4: All CART regression trees are combined to obtain the Xgboost regression model and the variable importance measure of each variable. The larger the measure, the more likely the variable is a fault variable, and the variable with the largest measure is identified as the fault variable.
The embodiments described above are merely preferred embodiments intended to illustrate the present invention fully, and the scope of protection of the invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the invention all fall within its scope of protection, which is defined by the claims.

Claims (10)

1. A CSTR fault positioning method based on an Xgboost regression model is characterized by comprising the following steps:
1) collecting the normal data generated by the sensors in the CSTR, together with unknown offline data;
2) establishing a monitoring model from the normal data collected in step 1, where the monitoring model can be chosen freely to suit the application;
3) feeding the unknown offline data collected in step 1 into the monitoring model established in step 2, extracting the sample statistic to detect faults, and screening out the fault data;
4) taking the fault data from step 3 as the training sample inputs and the corresponding statistic values as the training sample outputs;
5) establishing an Xgboost regression model on the training samples from step 4 to obtain the variable importance measure of each variable, where the larger the measure, the more likely the variable is a fault variable, and identifying the variable with the largest measure as the fault variable.
2. The CSTR fault location method based on the Xgboost regression model as claimed in claim 1, wherein in the step 2, the monitoring model of the normal data collected in the step 1 is a PCA monitoring model; the method specifically comprises the following steps:
assume that the sample set under normal operating conditions is $X \in \mathbb{R}^{n \times m}$, where n is the number of samples and m is the number of variables; after standardization, each variable has mean 0 and standard deviation 1; the covariance matrix S is computed and decomposed by singular value decomposition:
$$S = P \Lambda P^{T} + \tilde{P} \tilde{\Lambda} \tilde{P}^{T}$$
where $P \in \mathbb{R}^{m \times l}$ and $\tilde{P} \in \mathbb{R}^{m \times (m-l)}$ are the principal and residual loading matrices, respectively, l is the number of principal components, and $\Lambda$ and $\tilde{\Lambda}$ are the diagonal matrices formed by the principal and residual eigenvalues, respectively;
any sample x can be decomposed into:
$$x = \hat{x} + \tilde{x} = C x + \tilde{C} x$$
where $C = P P^{T}$ and $\tilde{C} = \tilde{P} \tilde{P}^{T}$ are the projection matrices onto the principal component space and the residual space, respectively.
3. The CSTR fault location method based on the Xgboost regression model as claimed in claim 2, characterized in that fault detection is performed by extracting the SPE statistic, which is:
$$\mathrm{SPE} = \lVert \tilde{C} x \rVert^{2} = x^{T} \tilde{C} x$$
the control limit of the SPE statistic can be obtained from its sampling distribution; if the statistic exceeds its control limit, the process is considered abnormal, and fault detection is thereby achieved.
4. The CSTR fault location method based on the Xgboost regression model as claimed in claim 1, wherein said step 5 comprises the following steps:
5a) for a fault data set with n samples and m variables:
$$D = \{(x_i, y_i)\} \quad (|D| = n,\ x_i \in \mathbb{R}^{m},\ y_i \in \mathbb{R})$$
where y is the statistic, an Xgboost regression model is defined to predict the output for each x in D:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
where K is the number of decision trees, $f_k$ is a CART regression tree function, $\hat{y}_i$ is the predicted output, and $\mathcal{F}$ is the set of possible decision tree functions;
the loss function L is defined as:
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where l is a differentiable convex function that measures the difference between the predicted value and the true value, for which the mean square error function is selected; Ω(f) is:
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}$$
where T is the number of leaves, w is the vector of leaf weights, and λ and γ are penalty coefficients;
5b) establishing a CART regression tree model on the training samples from step 4, where, to prevent overfitting, each tree is trained on an equally sized data set drawn by resampling with replacement, and the optimal splitting variable and optimal splitting point are selected by a greedy algorithm so that the split gain is maximized;
5c) iteratively generating new CART regression trees as in step 5b, each fitting the prediction residual of the previous CART regression tree, until the loss function is minimized, where the loss function $L^{(t)}$ at iteration t is:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
expanding the loss function as a Taylor series to second order and dropping the constant term, the loss function at iteration t becomes:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$
$$\tilde{L}^{(t)} = \sum_{j=1}^{T} \left[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^{2} \right] + \gamma T$$
where $I_j$ is the set of samples assigned to leaf j, and $g_i$ and $h_i$ are the first and second derivatives of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$; differentiating the above formula with respect to $w_j$ and setting the derivative to 0 gives the optimal leaf weight $w_j^{*} = -\sum_{i \in I_j} g_i / \big(\sum_{i \in I_j} h_i + \lambda\big)$, and substituting it back gives:
$$\tilde{L}^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^{2}}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
5d) combining all CART regression trees to obtain the Xgboost regression model; for each variable, dividing the sum of its split gains by the number of times it is used for splitting to obtain its average split gain, and dividing the average split gain of each variable by the sum of the average split gains of all variables to obtain its variable importance measure, where the larger the measure, the more likely the variable is a fault variable.
5. The CSTR fault location method based on the Xgboost regression model as claimed in claim 4, wherein in step 5c, the smaller the value of the above loss function, the better the model fits; the optimal splitting variable and the optimal splitting point are selected according to the loss function, and the splitting gain obtained by splitting the optimal splitting variable at the optimal splitting point is calculated at the same time.
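The statement that a smaller loss means a better fit can be checked numerically. The sketch below assumes the squared error l = (y - ŷ)² as the loss, so that g_i = 2(ŷ_i - y_i) and h_i = 2, and it assumes the standard XGBoost closed forms for the optimal leaf weight, w* = -G/(H + λ), and for the minimised loss, -½ Σ_j G_j²/(H_j + λ) + γT. All function names and data values are hypothetical.

```python
def grad_hess_squared_error(y_true, y_pred):
    """First and second derivatives of l = (y - y_hat)^2 w.r.t. y_hat."""
    g = [2.0 * (p - t) for t, p in zip(y_true, y_pred)]
    h = [2.0 for _ in y_true]
    return g, h


def leaf_weight(g, h, lam):
    """Optimal weight of one leaf: w* = -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


def structure_score(leaves, lam, gamma):
    """Minimised loss of a tree structure.

    leaves is a list of (g_list, h_list) pairs, one pair per leaf.
    """
    score = 0.0
    for g, h in leaves:
        score -= 0.5 * sum(g) ** 2 / (sum(h) + lam)
    return score + gamma * len(leaves)


# Illustrative data: the previous trees predict 2.0 for every sample.
y_true = [1.0, 1.2, 3.0, 3.2]
y_pred = [2.0, 2.0, 2.0, 2.0]
g, h = grad_hess_squared_error(y_true, y_pred)

lam, gamma = 1.0, 0.0
one_leaf = structure_score([(g, h)], lam, gamma)
two_leaves = structure_score([(g[:2], h[:2]), (g[2:], h[2:])], lam, gamma)
w_left = leaf_weight(g[:2], h[:2], lam)
w_right = leaf_weight(g[2:], h[2:], lam)
```

Separating the two low targets from the two high targets gives a lower score than keeping all samples in one leaf, and the resulting leaf weights move the prediction down for the left leaf and up for the right leaf.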
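6. The CSTR fault location method based on the Xgboost regression model as claimed in claim 4, wherein in step 5c, $I_L$ and $I_R$ are assumed to be the sample sets of the left node and the right node after splitting, with $I = I_L \cup I_R$; the splitting gain after splitting is:
$$Gain = \frac{1}{2} \left[ \frac{\big( \sum_{i \in I_L} g_i \big)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\big( \sum_{i \in I_R} g_i \big)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\big( \sum_{i \in I} g_i \big)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$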
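A greedy split search using this gain can be sketched as follows. The code assumes the standard XGBoost form of the gain, ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)] - γ, and uses an exhaustive scan over midpoints between sorted values as candidate thresholds. The function names, the candidate-threshold rule, and the data are illustrative assumptions rather than details taken from the claims.

```python
def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of splitting a node into a left and a right child."""
    def term(G, H):
        return G * G / (H + lam)

    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)
    return 0.5 * (term(GL, HL) + term(GR, HR) - term(GL + GR, HL + HR)) - gamma


def best_split(X, g, h, lam=1.0, gamma=0.0):
    """Scan every variable and candidate threshold, keep the largest gain.

    X is a list of rows; returns (gain, variable_index, threshold),
    or None when no variable has two distinct values.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)  # midpoint between neighbouring values
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            gain = split_gain(
                [g[i] for i in left], [h[i] for i in left],
                [g[i] for i in right], [h[i] for i in right],
                lam, gamma,
            )
            if best is None or gain > best[0]:
                best = (gain, j, thr)
    return best


# Illustrative data: variable 0 separates positive from negative gradients.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
g = [2.0, 1.6, -2.0, -2.4]
h = [2.0, 2.0, 2.0, 2.0]

gain, var_index, threshold = best_split(X, g, h, lam=1.0, gamma=0.0)
```

Here the search selects variable 0 with a threshold near 0.5, because that split places the two positive-gradient samples and the two negative-gradient samples in different children. Accumulating `gain` per selected variable over all trees gives the gain sums used by the importance measure of step 5d.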
7. The CSTR fault location method based on the Xgboost regression model as claimed in claim 1, wherein in step 2, the monitoring model can be freely selected according to the requirements of the application, specifically: PCA is selected as the linear model and KPCA is selected as the nonlinear model.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the program is executed by the processor.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 7.
CN202010491108.0A 2020-06-02 2020-06-02 CSTR fault positioning method based on Xgboost regression model Active CN111639304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010491108.0A CN111639304B (en) 2020-06-02 2020-06-02 CSTR fault positioning method based on Xgboost regression model

Publications (2)

Publication Number Publication Date
CN111639304A true CN111639304A (en) 2020-09-08
CN111639304B CN111639304B (en) 2023-02-21

Family

ID=72330616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010491108.0A Active CN111639304B (en) 2020-06-02 2020-06-02 CSTR fault positioning method based on Xgboost regression model

Country Status (1)

Country Link
CN (1) CN111639304B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112180893A (en) * 2020-09-15 2021-01-05 郑州轻工业大学 Construction and application of fault-related distributed orthogonal neighborhood preserving embedded model in CSTR (continuous stirred tank reactor) process
CN112749370A (en) * 2021-04-06 2021-05-04 广东际洲科技股份有限公司 Fault tracking and positioning method and system based on Internet of things
CN113156812A (en) * 2021-01-28 2021-07-23 淮阴工学院 Fault detection method for secondary chemical reactor based on unknown input observer

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541050A (en) * 2012-01-05 2012-07-04 浙江大学 Chemical process fault diagnosis method based on improved support vector machine
US20190152011A1 (en) * 2017-11-21 2019-05-23 General Electric Company Predictive cutting tool failure determination
CN110674842A (en) * 2019-08-26 2020-01-10 明阳智慧能源集团股份公司 Wind turbine generator main shaft bearing fault prediction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
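Zhang Yu et al.: "Application of Xgboost in rolling bearing fault diagnosis", Noise and Vibration Control *
Pan Lei et al.: "Sensor fault location method based on PCA-RFR", Computer Measurement & Control *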

Also Published As

Publication number Publication date
CN111639304B (en) 2023-02-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant