WO2023138140A1 - Soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression - Google Patents

Soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression

Info

Publication number
WO2023138140A1
WO2023138140A1 PCT/CN2022/127864 CN2022127864W WO2023138140A1 WO 2023138140 A1 WO2023138140 A1 WO 2023138140A1 CN 2022127864 W CN2022127864 W CN 2022127864W WO 2023138140 A1 WO2023138140 A1 WO 2023138140A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
layer
mixed forest
mapping
follows
Prior art date
Application number
PCT/CN2022/127864
Other languages
English (en)
French (fr)
Inventor
汤健
夏恒
崔璨麟
乔俊飞
Original Assignee
北京工业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京工业大学
Publication of WO2023138140A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • The invention relates to the technical field of soft measurement of dioxin emission, and in particular to a soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression.
  • MSWI: Municipal Solid Waste Incineration
  • DXN: Dioxin
  • HRGC/HRMS: high-resolution gas chromatography/high-resolution mass spectrometry
  • Online indirect detection, which builds an association model from DXN-related substances that can be measured online and then infers the DXN concentration, has become a research hotspot; however, it suffers from complex equipment, high cost, many interference factors and unguaranteed prediction accuracy.
  • In essence, it is also a detection method combined with data modeling.
  • Compared with offline analysis and online indirect detection, soft sensor technology driven by easy-to-measure process data collected by the industrial distributed control system is an effective way to address the lack of online DXN detection, and it is stable, accurate and fast in response.
  • Soft sensing technology has been widely used to detect difficult-to-measure parameters in complex industrial processes such as petroleum, chemical and steelmaking processes.
  • The object of the present invention is to provide a soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression: targeting the detection of DXN emission concentration in the MSWI process, it proposes a soft sensor modeling algorithm based on Broad Hybrid Forest Regression (BHFR).
  • To achieve this object, the present invention provides the following scheme:
  • A soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression: within the BLS framework, non-differentiable base learners replace neurons to construct a BHFR soft sensor model for small-sample, high-dimensional data.
  • The BHFR soft sensor model comprises a feature mapping layer, a latent feature extraction layer, a feature enhancement layer and an incremental learning layer, and is built through the following steps:
  • S1: build the feature mapping layer, in which mixed forest groups composed of a random forest (RF) and a completely random forest (CRF) map the high-dimensional features;
  • Step S1, building the feature mapping layer from mixed forest groups composed of RF and CRF to map the high-dimensional features, specifically includes:
  • Let the original data be $\{X, y\}$, where $X \in \mathbb{R}^{N_{Raw} \times M}$ is the original input data, $N_{Raw}$ is the number of raw samples and $M$ is the dimension of the original input data, which comes from six different stages of the MSWI process and is collected and stored in the DCS system at second-level intervals; $y \in \mathbb{R}^{N_{Raw} \times 1}$ is the true output value of the DXN emission concentration, obtained from emission DXN test samples by the offline detection method. The modeling process of the feature mapping layer is described taking the $n$-th mixed forest group as an example:
  • $L$ represents the number of leaf nodes in the decision tree
  • $I(\cdot)$ represents the indicator function
  • $c_l$ is calculated by recursive splitting
  • the splitting loss function of a decision tree in RF is expressed as:
  • its value at $(s, v)$ is the loss obtained when the value $v$ of the $s$-th feature is used as the splitting criterion
  • $y_L$ represents the vector of true DXN emission concentrations in the left leaf node
  • $E[y_L]$ represents the mathematical expectation of $y_L$
  • $y_R$ represents the vector of true DXN emission concentrations in the right leaf node
  • $E[y_R]$ represents the mathematical expectation of $y_R$
  • the $i$-th elements of $y_L$ and $y_R$ are the $i$-th true DXN emission concentrations in the left and right leaf nodes, respectively
  • $c_L$ represents the predicted DXN emission concentration of the left leaf node
  • $c_R$ represents the predicted DXN emission concentration of the right leaf node
  • $N_L$ and $N_R$ represent the numbers of samples in the left and right leaf nodes, respectively;
  • the predicted DXN emission concentrations of the current left and right tree nodes are the expectations of the true values of their samples: $c_L = E[y_L]$ and $c_R = E[y_R]$,
  • where $y_L$ and $y_R$ are the vectors of true DXN emission concentrations in the left and right nodes, and $E[y_L]$ and $E[y_R]$ are their mathematical expectations;
  • the decision tree splitting in CRF adopts completely random selection, expressed as:
  • the $n$-th mixed forest group can be expressed as:
  • the $n$-th mapping feature $Z_n$ can be expressed as:
  • $Z_1$ is the first mapping feature
  • $Z_2$ is the second mapping feature
  • $Z_N$ is the $N$-th mapping feature
  • the mapping feature matrix $Z_N$ contains $N_{Raw}$ samples and $2N$-dimensional features.
  • Step S2 builds the latent feature extraction layer: latent features are extracted from the feature space of the fully connected mixed matrix according to the contribution rate, and an information measurement criterion ensures maximum transfer of potentially valuable information with minimum redundancy, reducing model complexity and computational cost. It specifically includes:
  • $A$ contains $N_{Raw}$ samples and $(M+2N)$-dimensional features
  • PCA is used to minimize the redundant information in $A$; the correlation matrix $R$ of $A$ is calculated as follows:
  • $U_{(M+2N)}$ represents an orthogonal matrix of order $(M+2N)$
  • $\Sigma_{(M+2N)}$ represents a diagonal matrix of order $(M+2N)$
  • $V_{(M+2N)}$ represents an orthogonal matrix of order $(M+2N)$
  • the eigenvalues, indexed $1$ to $(M+2N)$, are arranged from largest to smallest
  • a set of eigenvalues is obtained, and the corresponding eigenvector matrix serves as the projection matrix of $A$; $A$ is then projected onto it to minimize redundant information, and the resulting latent features are denoted $X_{PCA}$, namely:
  • an information maximization selection mechanism is then used to ensure the correlation between the selected latent features and the true value, expressed as:
  • Step S3 builds the feature enhancement layer, which is trained on the extracted latent features to further strengthen feature representation. It specifically includes:
  • $X'$ and $y$ are the input and output of the new training set, to which Bootstrap sampling and RSM sampling are applied for the $k$-th mixed forest group;
  • $c_l$ is calculated by recursive splitting; for the specific process see formulas (3)-(5);
  • the RF model of the $k$-th mixed forest group in the feature enhancement layer is obtained, expressed as:
  • the CRF model of the $k$-th mixed forest group in the feature enhancement layer is obtained, expressed as:
  • the $k$-th mixed forest group is thereby obtained; the $k$-th enhanced feature can then be expressed as follows:
  • $H_1$ is the first enhanced feature
  • $H_2$ is the second enhanced feature
  • $H_K$ is the $K$-th enhanced feature
  • the BHFR model is represented as follows:
  • $W_K$ represents the weights between the feature mapping and feature enhancement layers and the output layer, calculated as follows:
  • $I$ represents the identity matrix
  • $\lambda$ represents the coefficient of the regularization term
  • Step S4 builds the incremental learning layer through an incremental learning strategy and obtains the weight matrix with the Moore-Penrose pseudo-inverse, thereby achieving high-precision modeling of the BHFR soft sensor model. It specifically includes:
  • $X'$ and $y$ are the input and output of the new training set, to which Bootstrap sampling and RSM sampling are applied for the $p$-th mixed forest group in the incremental learning layer;
  • the output $G_{K+1}$ of the feature mapping layer, feature enhancement layer and incremental learning layer is expressed as follows:
  • $G_K = [Z_N \mid H_K]$
  • $G_{K+1}$ contains $N_{Raw}$ samples and $(2N+2K+2J)$-dimensional features
  • $W_K = (\lambda I + G_K^{T} G_K)^{-1} G_K^{T} Y$;
  • a convergence threshold on the error is defined to determine the number $p$ of mixed forest groups added in incremental learning; correspondingly, the incremental learning training error of the BHFR model is expressed as follows:
  • The present invention discloses the following technical effects: the soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression establishes a BHFR-based soft sensor model that combines broad learning modeling, ensemble learning and latent feature extraction: 1) a soft sensor model comprising a feature mapping layer, latent feature extraction layer, feature enhancement layer and incremental learning layer is built with non-differentiable learners;
  • 2) the internal information of the BHFR model is processed through full information connection, latent feature extraction and mutual information measurement, which effectively ensures maximum transfer of internal feature information and minimum redundancy;
  • 3) the mixed forest group serves as the mapping unit for incremental learning during modeling, the output-layer weight matrix is computed quickly through the pseudo-inverse strategy, and incremental learning is adjusted adaptively according to the convergence of the training error, achieving high-precision soft sensor modeling.
  • The effectiveness and rationality of the proposed method are verified on high-dimensional benchmark datasets and an industrial-process DXN dataset.
  • Fig. 1 is the flow chart of the soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression according to an embodiment of the present invention
  • Fig. 2 is a process flow chart of the municipal solid waste incineration process in the embodiment of the present invention
  • Fig. 3 is the training error convergence curve of the embodiment of the present invention.
  • Fig. 4a is the fitting curve of the training set in the DXN dataset of the embodiment of the present invention.
  • Fig. 4b is the fitting curve of the verification set in the DXN data set of the embodiment of the present invention.
  • Fig. 4c is a fitting curve of the test set in the DXN dataset of the embodiment of the present invention.
  • In the embodiment, the soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression builds a BHFR soft sensor model for small-sample, high-dimensional data by replacing neurons with non-differentiable base learners.
  • The model comprises a feature mapping layer, a latent feature extraction layer, a feature enhancement layer and an incremental learning layer; its first step, S1, builds the feature mapping layer from mixed forest groups composed of a random forest (RF) and a completely random forest (CRF) to map the high-dimensional features.
  • The MSWI process includes six stages: solid waste storage and transportation, solid waste incineration, waste heat boiler, steam power generation, flue gas purification and flue gas discharge. Taking a grate-type MSWI process with a daily processing capacity of 800 tons as an example, the process flow is shown in Fig. 2.
  • Solid waste storage and transportation stage: sanitation vehicles transport MSW from collection sites across the city to the MSWI power plant. After weighing and recording, the MSW is dumped from the unloading platform into the unfermented area of the solid waste storage tank, mixed and stirred by the solid waste grab bucket, and then moved to the fermentation area, where it ferments and dehydrates for 3 to 7 days to ensure the lower calorific value needed for MSW incineration.
  • Raw MSW contains trace amounts of DXN (about 0.8 ng TEQ/kg) and a variety of chlorine-containing compounds required for DXN formation reactions.
  • Solid waste incineration stage: the solid waste grab bucket places the fermented MSW into the feed hopper, and the feeder pushes it into the incinerator. After passing over the drying grate, combustion grate 1, combustion grate 2 and the burnout grate, the combustible components of the MSW are completely burned. The required combustion air is injected from below the grate and from the middle of the furnace by the primary and secondary fans; the ash produced by final combustion falls from the end of the burnout grate into the slag extractor and is sent to the slag pool after water cooling.
  • To ensure that the DXN contained in the raw MSW and produced during incineration is completely decomposed under high-temperature combustion in the furnace, the combustion process must keep the flue gas temperature above 850 °C, keep the high-temperature flue gas in the furnace for more than 2 seconds, and ensure sufficient flue gas turbulence.
  • Waste heat boiler stage: the high-temperature flue gas (above 850 °C) generated in the furnace is drawn into the waste heat boiler system by the induced draft fan and passes through the superheater, evaporator and economizer in turn. Heat exchange between the flue gas and the liquid water in the boiler drum generates high-temperature steam and cools the flue gas, so that the flue gas temperature at the waste heat boiler outlet (flue gas G1) is below 200 °C.
  • Steam power generation stage: the high-temperature steam generated by the waste heat boiler drives the turbogenerator, converting mechanical energy into electrical energy. The plant thereby meets its own power demand, feeds surplus power to the grid, recovers resources and obtains economic benefits.
  • Flue gas purification stage: flue gas purification in the MSWI process mainly includes denitrification (NOx), desulfurization (HCl, HF, SO2, etc.), heavy metal removal (Pb, Hg, Cd, etc.), adsorption of dioxins (DXN) and dust removal (particulate matter), so that the incineration flue gas meets the pollutant emission standards.
  • Adsorbing the DXN in the incineration flue gas with an activated carbon injection system is currently the most widely used technique; the adsorbed DXN is enriched in the fly ash.
  • Flue gas discharge stage: after cooling and purification, the incineration flue gas containing a small amount of DXN (flue gas G2) is drawn by the induced draft fan and discharged into the atmosphere through the chimney.
  • The continuous, long-term operation of the MSWI process causes a large amount of DXN to adhere to particles on the inner wall of the chimney (the memory effect); under which working conditions it may be released remains a difficult research problem.
  • Research on DXN soft sensing for the MSWI process mainly focuses on the DXN concentration in the emission stage (flue gas G3).
  • The focus of this application is to build a soft sensor model for flue gas G3.
  • The BHFR modeling strategy proposed in this application includes four main parts: the feature mapping layer, latent feature extraction layer, feature enhancement layer and incremental learning layer.
  • $N_{Raw}$ is the number of raw samples
  • $M$ is the dimension of the original input data, which comes from the six stages of the MSWI process described above and is collected and stored in the DCS system at second-level intervals; $y$ is the true output value of the DXN emission concentration, obtained from DXN test samples by the offline detection method
  • $\{DT_1, \ldots, DT_J\}$ represents the $J$ decision tree models in the mixed forest algorithm
  • $DT_1$ is the first decision tree model
  • $DT_J$ is the $J$-th decision tree model
  • Bootstrap and RSM represent sample sampling and feature sampling of the input data
  • $Z_N$ represents the output of the feature mapping layer
  • $H_K$ represents the output of the feature enhancement layer
  • $[X \mid Z_N]$ represents the fully connected mixed matrix of the original data and $Z_N$;
  • Feature mapping layer: the original input data $X$, derived from six different stages of the MSWI process, is mapped by the $N$ mixed forest groups of the feature mapping layer to obtain the mapping output matrix $Z_N$;
  • Latent feature extraction layer: principal component analysis is applied to the fully connected mixed matrix $[X \mid Z_N]$, composed of the original input data $X$ and the feature mapping layer output $Z_N$, to extract latent features and remove redundant information from the feature space; the latent feature dimension is then determined from the mutual information between the extracted latent features and the true DXN emission concentration $y$, yielding a new training set $\{X', y\}$;
  • Feature enhancement layer: with the new training set as input, the $K$ mixed forest groups of the feature enhancement layer perform feature mapping to obtain the enhancement layer output matrix $H_K$;
  • Incremental learning layer: with the new training set as input, mixed forest groups are added one at a time as the minimum unit and the weight $W_{K+P}$ is updated until the training error converges.
  • BHFR uses a mixed forest group composed of RF and CRF as the basic mapping unit, replacing the neurons of the original BLS; step S1 builds the feature mapping layer from such groups to map the high-dimensional features.
  • The BHFR proposed in this application adopts a full connection strategy to pass information between the feature mapping layer, feature enhancement layer and incremental learning layer.
  • Principal Component Analysis (PCA) is used to extract latent features from the feature space of the fully connected mixed matrix, and mutual information is then used to screen the latent features that carry the most information about the true value, thereby reducing the dimensionality of the high-dimensional data.
  • the BHFR proposed in this application uses the mixed forest group as the basic unit to realize incremental learning according to the convergence degree of the training error.
  • This application uses actual DXN data from an MSWI power plant for industrial verification.
  • The DXN data come from an MSWI power plant in Beijing and cover a total of 141 sets of DXN emission concentration modeling data from 2009 to 2020.
  • The DXN true value is the converted concentration obtained from a 2-hour sampling and assay.
  • After removing missing data and abnormal variables, 116 input variables remain; each value is the average over the sampling period of the corresponding DXN true value.
  • Root Mean Square Error (RMSE)
  • Mean Absolute Error (MAE)
  • Coefficient of Determination ($R^2$)
  • $N$ is the number of data points
  • $y_i$ is the $i$-th true value
  • $\hat{y}_i$ is the $i$-th predicted value and $\bar{y}$ is the mean value
  • The parameters of the BHFR method are set as follows: the minimum number of samples in a decision tree leaf node, $N_{samples}$, is 7; the number of RSM-selected features is [value missing in source]; the number of decision trees $N_{tree}$ is 10; the number of mixed forest groups $N_{Forest}$ in both the feature mapping layer and the feature enhancement layer is 10; the latent feature contribution rate threshold is 0.9; and the regularization parameter is $2^{-10}$.
  • The number of latent features for the feature enhancement layer and incremental learning layer is first determined from the feature space $A$ of the fully connected mixed matrix.
  • The feature dimension of $A$ in the DXN dataset is 316.
  • With the latent feature contribution rate threshold set to 0.9, 35 latent features are selected in the DXN dataset.
  • With the mutual information threshold set to 0.75, 6 latent features are selected in the DXN dataset.
  • The preset number of mixed forest group units in the incremental learning layer is 1000; the resulting relationship between the training error of the BHFR model and the number of mixed forest groups is shown in Fig. 3.
  • The parameters of the comparison methods are set as follows: (1) RF: the minimum number of samples in a decision tree leaf node, $N_{samples}$, is 3, the number of RSM-selected features is [value missing in source], and the number of decision trees $N_{tree}$ is 500; (2) DFR: $N_{samples}$ is 3, the number of RSM-selected features is [value missing in source], $N_{tree}$ is 500, the numbers of RF and CRF models in each layer, $N_{RF}$ and $N_{CRF}$, are both 2, and the total number of layers is set to 50; (3) DFR-clfc: $N_{samples}$ is 3, the number of RSM-selected features is [value missing in source], $N_{tree}$ is 500, $N_{RF}$ and $N_{CRF}$ are both 2, and the total number of layers is set to 50; (4) BLS-NN: the number of feature nodes $N_m$ is 5, the number of enhancement nodes $N_e$ is 41, the
  • The DXN soft sensor modeling experiments show that the BHFR proposed in this application learns the training data better than the classic RF, DFR and its improved version DFR-clfc, and that its modeling accuracy and goodness of fit on the test set exceed those of RF, DFR, DFR-clfc and BLS-NN, reflecting its clear advantages for building DXN soft sensor models.
  • The soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression establishes a BHFR-based soft sensor model that combines broad learning modeling, ensemble learning and latent feature extraction.
  • Within the broad learning system framework, a soft sensor model comprising a feature mapping layer, latent feature extraction layer, feature enhancement layer and incremental learning layer is built with non-differentiable learners;
  • the internal information of the BHFR model is processed through full information connection, latent feature extraction and mutual information measurement, which effectively ensures maximum transfer of internal feature information and minimum redundancy;
  • the mixed forest group serves as the mapping unit for incremental learning during modeling, the output-layer weight matrix is computed quickly through the pseudo-inverse strategy, and incremental learning is adjusted adaptively according to the convergence of the training error, achieving high-precision soft sensor modeling.
  • The effectiveness and rationality of the proposed method are verified on high-dimensional benchmark datasets and an industrial-process DXN dataset.
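The regularised pseudo-inverse for the output weights and the error-driven growth of the model can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the RMSE training error and the stopping rule (stop once the error improvement falls below `eps`) are assumptions, and the weights are recomputed from scratch at each step rather than updated incrementally. Random feature blocks stand in for the outputs of newly added mixed forest groups.

```python
import numpy as np


def output_weights(G, y, lam=2.0 ** -10):
    """Ridge-regularised least squares: W = (lam*I + G^T G)^-1 G^T y."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    gram = lam * np.eye(G.shape[1]) + G.T @ G
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(gram, G.T @ y)


def rmse(y_true, y_pred):
    """Root mean square error between two vectors."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def grow_until_converged(G, new_blocks, y, eps=1e-3, lam=2.0 ** -10):
    """Append feature blocks one at a time; stop when the training RMSE
    improves by less than eps (assumed convergence rule)."""
    G = np.asarray(G, dtype=float)
    W = output_weights(G, y, lam)
    err = rmse(y, G @ W)
    added = 0
    for block in new_blocks:
        G = np.hstack([G, np.asarray(block, dtype=float)])
        W = output_weights(G, y, lam)  # recomputed, not incrementally updated
        new_err = rmse(y, G @ W)
        added += 1
        improvement = err - new_err
        err = new_err
        if improvement < eps:
            break
    return G, W, err, added
```

Here `G` plays the role of the combined layer output and each entry of `new_blocks` the role of the two features contributed by one added mixed forest group; `lam` defaults to the regularization value quoted in the experiments.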

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Geometry (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

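A soft measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression. Within the BLS framework, non-differentiable base learners replace neurons to build a BHFR soft sensor model for small-sample, high-dimensional data. The BHFR soft sensor model comprises a feature mapping layer, a latent feature extraction layer, a feature enhancement layer and an incremental learning layer. First, mixed forest groups composed of random forests and completely random forests are built to map the high-dimensional features. Second, latent features are extracted from the feature space of the fully connected mixed matrix according to the contribution rate, and an information measurement criterion is used to reduce model complexity and computational cost. Then, the feature enhancement layer is trained on the extracted latent information to strengthen feature representation. Finally, the incremental learning layer is built through an incremental learning strategy, and the weight matrix is obtained with the Moore-Penrose pseudo-inverse, achieving high-precision modeling. The effectiveness and rationality of the proposed method are verified on high-dimensional benchmark datasets and an industrial-process DXN dataset.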

Description

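Soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression
Technical Field
The present invention relates to the technical field of soft sensing of dioxin emission, and in particular to a soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression.
Background
Municipal solid waste incineration (MSWI) is currently one of the main ways worldwide of resolving the "garbage siege" faced by cities, with notable advantages in harmless treatment, volume reduction and resource recovery. Dioxin (DXN), a persistent and highly toxic organic pollutant in the organized exhaust gas emitted by the MSWI process, is the main cause of the "not in my back yard" (NIMBY) effect that accompanies the construction of incineration plants, and is one of the key environmental indicators that the MSWI process must minimize. Offline laboratory analysis based on high-resolution gas chromatography with high-resolution mass spectrometry (HRGC/HRMS) is currently the main means of measuring DXN emission concentration. It is technically demanding, has a long time lag and incurs high labor and economic costs, and has become one of the key factors preventing real-time optimal control of the MSWI process. Online measurement of DXN emission concentration has therefore become a primary challenge for the MSWI process.
To address this problem, online indirect measurement, which builds a correlation model from DXN-related substances that can be measured online and thereby obtains the DXN concentration indirectly, has become a research focus. It suffers, however, from complex equipment, high cost, many interfering factors and unguaranteed prediction accuracy, and is in essence also a measurement approach combined with data modeling. Compared with offline analysis and online indirect measurement, soft sensing driven by easily measured process data collected by the industrial distributed control system (DCS) is an effective way of solving the problem that DXN cannot be measured online, and offers stable, accurate and fast-responding measurement. Soft sensing has been widely applied to hard-to-measure parameters in complex industrial processes such as petroleum, chemical and steelmaking processes.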
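Summary of the Invention
The object of the present invention is to provide a soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression. Targeting the measurement of DXN emission concentration in the MSWI process, a soft sensor modeling algorithm based on Broad Hybrid Forest Regression (BHFR) is proposed.
To achieve this object, the invention provides the following solution:
A soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression. Within the BLS framework, non-differentiable base learners replace neurons to build a BHFR soft sensor model for small-sample, high-dimensional data. Building the BHFR soft sensor model comprises constructing a feature mapping layer, a latent feature extraction layer, a feature enhancement layer and an incremental learning layer, and specifically comprises the following steps:
S1, constructing the feature mapping layer: hybrid forest groups composed of a random forest (RF) and a completely random forest (CRF) are built to map the high-dimensional features;
S2, constructing the latent feature extraction layer: latent features are extracted from the feature space of the fully connected hybrid matrix according to contribution rate, and an information measurement criterion ensures maximal transfer of potentially valuable information with minimal redundancy, reducing model complexity and computational cost;
S3, constructing the feature enhancement layer: the feature enhancement layer is trained on the extracted latent features to further strengthen feature representation;
S4, constructing the incremental learning layer: the incremental learning layer is built through an incremental learning strategy and the weight matrix is obtained with the Moore-Penrose pseudo-inverse, achieving high-precision modeling with the BHFR soft sensor model;
S5, validating the soft sensor model on high-dimensional benchmark datasets and an industrial-process DXN dataset;
S6, applying the soft sensor model built in steps S1-S5 to soft sensing of dioxin emission in the MSWI process.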
进一步的,所述步骤S1,构建特征映射层,构建由随机森林RF和完全随机森林CRF组成的混合森林组对高维特征进行映射,具体包括:
设原始数据为{X,y},其中
Figure PCTCN2022127864-appb-000001
是原始输入数据,N Raw是原始数据的数量,M是原始输入数据的维数,其来源于MSWI过程的六个不同阶段,以秒为单位在DCS系统采集与存储,
Figure PCTCN2022127864-appb-000002
是DXN排放浓度的输出真值,其来源于采用离线检测法得到排放物DXN检测样本;以特征映射层的第nth个混合森林组为例描述特征映射层的建模过程:
对{X,y}进行Bootstrap和随机子空间RSM采样,获得混合森林组模型的J个训练子集,如下:
Figure PCTCN2022127864-appb-000003
其中,
Figure PCTCN2022127864-appb-000004
Figure PCTCN2022127864-appb-000005
为第J个训练子集的输入和输出,
Figure PCTCN2022127864-appb-000006
Figure PCTCN2022127864-appb-000007
表示特征映射层中对第nth个混合森林组的Bootstrap和RSM采样,P Bootstrap表示Bootstrap采样概率;
基于
Figure PCTCN2022127864-appb-000008
训练包含J个决策树的混合森林算法,其中特征映射层中的第nth个混合森林组的第jth个决策树表示如下:
Figure PCTCN2022127864-appb-000009
其中,L表示决策树叶节点数量,I(·)表示指示函数,c l采用递归分裂方式计算;
RF中决策树的分裂损失函数Ω i(·)表示为:
Figure PCTCN2022127864-appb-000010
其中,Ω i(s,v)表示第sth个特征的值v作为切分准则的损失函数值,y L表示左叶节点的DXN排放浓度真值向量,E[y L]表示y L的数学期望,y R表示右叶节点的DXN排放浓度真值向量,E[y R]表示y R的数学期望,
Figure PCTCN2022127864-appb-000011
表示左叶节点第i个DXN排放浓度真值,
Figure PCTCN2022127864-appb-000012
表示右叶节点第i个DXN排放浓度真值,c L表示左叶节点DXN排放浓度预测输出,c R表示右叶节点DXN排放浓度预测输出;
通过最小化Ω i(s,v),将训练集
Figure PCTCN2022127864-appb-000013
切分为两个树节点,如下:
Figure PCTCN2022127864-appb-000014
其中,
Figure PCTCN2022127864-appb-000015
Figure PCTCN2022127864-appb-000016
表示切分后左右两个树节点所包含的样本集,N L和N R分别表示
Figure PCTCN2022127864-appb-000017
Figure PCTCN2022127864-appb-000018
中的样本数量;
The predicted DXN emission concentration output values of the current left and right tree nodes,
Figure PCTCN2022127864-appb-000019
Figure PCTCN2022127864-appb-000020
为样本真值的期望,如下:
Figure PCTCN2022127864-appb-000021
其中,y L和y R表示
Figure PCTCN2022127864-appb-000022
Figure PCTCN2022127864-appb-000023
中的DXN排放浓度真值向量,E[y L]和E[y R]表示y L和y R的数学期望;
与RF不同,CRF中决策树分裂采用完全随机选择方式,表示为,
Figure PCTCN2022127864-appb-000024
其中,
Figure PCTCN2022127864-appb-000025
表示完全随机选取第sth个特征的值v作为切分点;
被随机分裂的左右树节点的DXN排放浓度预测输出值
Figure PCTCN2022127864-appb-000026
Figure PCTCN2022127864-appb-000027
为样本真值的期望,如下:
Figure PCTCN2022127864-appb-000028
通过上述过程,第nth个混合森林组
Figure PCTCN2022127864-appb-000029
可表示为,
Figure PCTCN2022127864-appb-000030
其中,
Figure PCTCN2022127864-appb-000031
表示第nth个随机森林,
Figure PCTCN2022127864-appb-000032
表示第nth个完全随机森林;
进而,第nth个映射特征Z n可表示为
Figure PCTCN2022127864-appb-000033
其中,
Figure PCTCN2022127864-appb-000034
表示第nth组混合森林对来源于MSWI过程六个不同阶段的原始输入数据第1个样本的映射特征,
Figure PCTCN2022127864-appb-000035
表示第nth组混合森林对来源于MSWI过程六个 不同阶段的原始输入数据第n Rawth个样本的映射特征,
Figure PCTCN2022127864-appb-000036
表示第nth组混合森林对来源于MSWI过程六个不同阶段的原始输入数据第N Rawth个样本的映射特征;
最终,特征映射层的输出表示为:
Figure PCTCN2022127864-appb-000037
其中,Z 1为第1个映射特征,Z 2为第2个映射特征,Z N为第N个映射特征,映射特征矩阵Z N包含N Raw个样本和2N维特征。
进一步的,所述步骤S2,构建潜在特征提取层,依据贡献率对全联接混合矩阵的特征空间进行潜在特征提取,基于信息度量准则保证潜在有价值信息的最大化传递和最小化冗余,降低模型复杂度和计算消耗,具体包括:
首先,来源于MSWI过程六个不同阶段的原始输入数据X与特征映射矩阵Z N组合得到全联接混合矩阵A,表示为:
Figure PCTCN2022127864-appb-000038
其中,A含N Raw个样本和(M+2N)维特征;
接着,考虑到A的维数远高于原始数据,此处利用PCA最小化A中的冗余信息,计算A的相关矩阵R,如下:
Figure PCTCN2022127864-appb-000039
进一步,对R进行奇异值分解,得到(M+2N)个特征值和相应特征向量,如下:
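$R = U_{(M+2N)}\,\Sigma_{(M+2N)}\,V_{(M+2N)}$    (13)
where $U_{(M+2N)}$ denotes an orthogonal matrix of order (M+2N), $\Sigma_{(M+2N)}$ a diagonal matrix of order (M+2N), and $V_{(M+2N)}$ an orthogonal matrix of order (M+2N);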
Figure PCTCN2022127864-appb-000040
where $\sigma_1 > \sigma_2 > \dots > \sigma_{(M+2N)}$ denote the eigenvalues arranged in descending order;
然后,根据设定潜在特征贡献阈值η,确定最终的主成分数量,
Figure PCTCN2022127864-appb-000041
where the number of latent features $Q_{PCA} \ll (M+2N)$;
基于上述确定的Q PCA个潜在特征,获得特征值集合
Figure PCTCN2022127864-appb-000042
对应的特征向量矩阵
Figure PCTCN2022127864-appb-000043
即A的投影矩阵;然后,对A进行特征投影以实现冗余信息的最小化处理,将获得潜在特征记为X PCA,即
Figure PCTCN2022127864-appb-000044
其中,
Figure PCTCN2022127864-appb-000045
表示前Q PCA个潜在特征的特征向量;
进一步,计算所选潜在特征X PCA与真值
Figure PCTCN2022127864-appb-000046
间的互信息值I MI,如下:
Figure PCTCN2022127864-appb-000047
其中,
Figure PCTCN2022127864-appb-000048
表示第qth个潜在特征
Figure PCTCN2022127864-appb-000049
与DXN排放浓度真值y的联合概率分布,
Figure PCTCN2022127864-appb-000050
表示第qth个潜在特征
Figure PCTCN2022127864-appb-000051
的边缘概率分布,p(y)表示DXN排放浓度真值y的边缘概率分布;
接着,通过信息最大化选择机制以保证所选择潜在特征与真值的相关性,表示为:
Figure PCTCN2022127864-appb-000052
其中,
Figure PCTCN2022127864-appb-000053
表示Q PCA个潜在特征
Figure PCTCN2022127864-appb-000054
与真值y的互信息值,ζ表示最大化信息的阈值,
Figure PCTCN2022127864-appb-000055
表示与DXN排放浓度真值y信息相关度最大的
Figure PCTCN2022127864-appb-000056
个潜在特征;
最终,获得包括
Figure PCTCN2022127864-appb-000057
个潜在特征的新数据集
Figure PCTCN2022127864-appb-000058
并设定提取后维数
Figure PCTCN2022127864-appb-000059
进一步的,所述步骤S3中,构建特征增强层,基于所提取的潜在特征训练特征增强层以进一步增强特征表征能力,具体包括:
首先对新数据集{X′,y}进行基于Bootstrap和RSM的采样,获取混合森林算法的第个J训练子集,如下:
Figure PCTCN2022127864-appb-000060
其中,
Figure PCTCN2022127864-appb-000061
Figure PCTCN2022127864-appb-000062
为第个J训练子集的输入和输出,X′和y为新训练集的输入和输出,
Figure PCTCN2022127864-appb-000063
表示对第kth个混合森林组的Bootstrap采样,
Figure PCTCN2022127864-appb-000064
表示对第kth个混合森林组的RSM采样;
接着,以第kth个混合森林组中第j个RF的构建为例,如下:
Figure PCTCN2022127864-appb-000065
其中,
Figure PCTCN2022127864-appb-000066
表示特征增强层中第kth个混合森林组中RF的第jth个决策树;L表示决策树叶节点的数量;c l采用递归分裂方式计算,具体过程公式(3)-(5);
进而,可得到特征增强层中第kth个混合森林组中的RF模型,其表示为,
Figure PCTCN2022127864-appb-000067
然后,类似地以第kth个混合森林组中的第j个CRF的构建为例,如下:
Figure PCTCN2022127864-appb-000068
其中,
Figure PCTCN2022127864-appb-000069
表示特征增强层中第kth个混合森林组中CRF的第jth个决策树;c l采用递归分裂方式计算,具体过程见公式(6)-(7);
进而,可得到特征增强层中第kth个混合森林组的CRF模型,其表示为,
Figure PCTCN2022127864-appb-000070
通过上述过程,得到第kth个混合森林组
Figure PCTCN2022127864-appb-000071
进而,第kth个增强特征可表示如下:
Figure PCTCN2022127864-appb-000072
其中,
Figure PCTCN2022127864-appb-000073
表示第kth个混合森林组对新数据中第1个样本的增强映射,
Figure PCTCN2022127864-appb-000074
表示第kth个混合森林组对新数据中第n Rawth个样本的增强映射,
Figure PCTCN2022127864-appb-000075
表示第kth个混合森林组对新数据中第N Rawth个样本的增强映射;
最后,特征增强层的输出H K表示如下:
Figure PCTCN2022127864-appb-000076
其中,H 1为第1个增强特征,H 2为第2个增强特征,H K为第K个增强特征;
当不考虑增量学习策略时,BHFR模型的表示如下:
Figure PCTCN2022127864-appb-000077
其中,G K表示特征映射层与特征增强层输出的组合,即G K=[Z N|H K],其包含N Raw个样本和(2N+2K)维特征;W K表示特征映射层和特征增强层与输出层间的权重,其计算如下:
$W^K = (\lambda I + [G^K]^{\mathrm T} G^K)^{-1} [G^K]^{\mathrm T} Y$    (27)
其中,Ι表示单位矩阵,λ表示正则项系数;相应地,G K的伪逆计算可表示为:
Figure PCTCN2022127864-appb-000078
进一步的,所述步骤S4,构建增量学习层,通过增量式学习策略构建增量学习层,采用Moore-Penrose伪逆获得权重矩阵,进而实现BHFR软测量模型的高精度建模,具体包括:
首先,对新数据集{X′,y}进行基于Bootstrap和RSM的采样,获取混合森林算法训练子集,过程如下:
Figure PCTCN2022127864-appb-000079
其中,
Figure PCTCN2022127864-appb-000080
Figure PCTCN2022127864-appb-000081
为混合森林算法第个J训练子集的输入和输出,X′和y为新训练集的输入和输出,
Figure PCTCN2022127864-appb-000082
Figure PCTCN2022127864-appb-000083
表示增量学习层中第pth个混合森林组的Bootstrap采样和RSM采样;
接着,构建第pth个混合森林组中的决策树
Figure PCTCN2022127864-appb-000084
Figure PCTCN2022127864-appb-000085
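are built by the same procedure as in the feature mapping layer and the feature enhancement layer, which is not repeated here;
Further, after one hybrid forest group is added, the output $G^{K+1}$ of the feature mapping layer, feature enhancement layer and incremental learning layer is expressed as:
Figure PCTCN2022127864-appb-000086
where $G^K = [Z^N | H^K]$ contains $N_{Raw}$ samples and (2N+2K)-dimensional features, and $G^{K+1}$ contains $N_{Raw}$ samples and (2N+2K+2J)-dimensional features;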
然后,进行G K+1的Moore-Penrose逆矩阵的递推更新,如下:
Figure PCTCN2022127864-appb-000087
其中,矩阵C和矩阵D的计算如下:
$C = H_{K+1} - G^K D$    (32)
Figure PCTCN2022127864-appb-000088
进而,G K+1的Moore-Penrose逆矩阵的递推公式如下:
Figure PCTCN2022127864-appb-000089
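Further, the updated weight matrix $W^{K+1}$ between the feature mapping, feature enhancement and incremental learning layers and the output layer is computed as:
Figure PCTCN2022127864-appb-000090
where $W^K = (\lambda I + [G^K]^{\mathrm T} G^K)^{-1} [G^K]^{\mathrm T} Y$;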
由于采用上述伪逆更新策略只需要计算增量学习层混合森林组的伪逆矩阵,因此能够实现快速的增量式学习;
进一步,根据训练误差的收敛程度实现自适应增量学习;
定义误差的收敛阈值为θ Con用以确定增量学习中混合森林组的数量p;相应地,BHFR模型的增量学习训练误差表示如下:
Figure PCTCN2022127864-appb-000091
其中,
Figure PCTCN2022127864-appb-000092
表示增量学习第p+1个与第p个混合森林组的训练误差值,
Figure PCTCN2022127864-appb-000093
Figure PCTCN2022127864-appb-000094
表示包含p个和p+1个混合森林组的BHFR模型训练误差;
最终,所提BHFR软测量模型的预测输出
Figure PCTCN2022127864-appb-000095
为,
Figure PCTCN2022127864-appb-000096
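According to the specific embodiments provided by the invention, the invention discloses the following technical effects. The soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression establishes a BHFR-based soft sensor model that combines broad learning modeling, ensemble learning and latent feature extraction: 1) based on the broad learning system framework, non-differentiable learners are used to build a soft sensor model comprising a feature mapping layer, a latent feature extraction layer, a feature enhancement layer and an incremental learning layer; 2) the internal information of the BHFR model is processed using full information connection, latent feature extraction and mutual information measurement, which maximizes the transfer of internal feature information and minimizes redundancy; 3) hybrid forest groups are used as mapping units to realize incremental learning during modeling, the output-layer weight matrix is computed quickly through a pseudo-inverse strategy, and the degree of convergence of the training error is used to adaptively adjust the incremental learning, achieving high-precision soft sensor modeling. The effectiveness and rationality of the proposed method are verified on high-dimensional benchmark datasets and an industrial-process DXN dataset.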
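Brief Description of the Drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flowchart of the soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression according to an embodiment of the invention;
Figure 2 is a process flow diagram of the municipal solid waste incineration process according to an embodiment of the invention;
Figure 3 is the training error convergence curve according to an embodiment of the invention;
Figure 4a is the fitting curve for the training set of the DXN dataset according to an embodiment of the invention;
Figure 4b is the fitting curve for the validation set of the DXN dataset according to an embodiment of the invention;
Figure 4c is the fitting curve for the test set of the DXN dataset according to an embodiment of the invention.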
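Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the invention.
The object of the present invention is to provide a soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression. Targeting the measurement of DXN emission concentration in the MSWI process, a soft sensor modeling algorithm based on Broad Hybrid Forest Regression (BHFR) is proposed.
To make the above objects, features and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Figure 1, the soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression provided by the invention comprises the following steps:
Within the BLS framework, non-differentiable base learners replace neurons to build a BHFR soft sensor model for small-sample, high-dimensional data. Building the BHFR soft sensor model comprises constructing a feature mapping layer, a latent feature extraction layer, a feature enhancement layer and an incremental learning layer, and specifically comprises the following steps:
S1, constructing the feature mapping layer: hybrid forest groups composed of a random forest (RF) and a completely random forest (CRF) are built to map the high-dimensional features;
S2, constructing the latent feature extraction layer: latent features are extracted from the feature space of the fully connected hybrid matrix according to contribution rate, and an information measurement criterion ensures maximal transfer of potentially valuable information with minimal redundancy, reducing model complexity and computational cost;
S3, constructing the feature enhancement layer: the feature enhancement layer is trained on the extracted latent features to further strengthen feature representation;
S4, constructing the incremental learning layer: the incremental learning layer is built through an incremental learning strategy and the weight matrix is obtained with the Moore-Penrose pseudo-inverse, achieving high-precision modeling with the BHFR soft sensor model;
S5, validating the soft sensor model on high-dimensional benchmark datasets and an industrial-process DXN dataset;
S6, applying the soft sensor model built in steps S1-S5 to soft sensing of dioxin emission in the MSWI process.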
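The MSWI process comprises the stages of solid waste storage and transportation, solid waste incineration, waste heat boiler, steam power generation, flue gas cleaning and flue gas emission. Taking a grate-type MSWI process with a daily treatment capacity of 800 tons as an example, the process flow is shown in Figure 2.
The main functions of each stage are described below in terms of the whole course of DXN decomposition, generation, adsorption and emission:
1) Solid waste storage and transportation stage: sanitation vehicles transport MSW from collection stations across the city to the MSWI power plant. After weighing and recording, the MSW is dumped from the unloading platform into the unfermented zone of the solid waste storage pool, mixed and stirred by the solid waste grab, and then moved to the fermentation zone, where 3 to 7 days of fermentation and dewatering ensure the lower heating value required for MSW incineration. Studies show that raw MSW contains trace DXN (about 0.8 ng TEQ/kg) as well as various chlorine-containing compounds needed for DXN formation reactions.
2) Solid waste incineration stage: the solid waste grab drops the fermented MSW into the feed hopper, and the feeder pushes it into the incinerator, where it passes in turn over the drying, combustion 1, combustion 2 and burnout grates so that its combustible components are completely burned. The required combustion air is injected from below the grate and into the middle of the furnace by the primary and secondary air fans. The ash finally produced falls from the end of the burnout grate into the slag extractor and is sent to the slag pool after water cooling. To ensure that the DXN contained in raw MSW and generated during incineration is completely decomposed under the high-temperature combustion conditions in the furnace, the furnace combustion process must strictly keep the flue gas temperature above 850℃, keep the high-temperature flue gas in the furnace for more than 2 seconds, and ensure sufficient flue gas turbulence.
3) Waste heat boiler stage: the high-temperature flue gas (above 850℃) from the furnace is drawn by the induced draft fan into the waste heat boiler system and passes in turn through the superheater, evaporator and economizer. Heat exchange between the high-temperature flue gas and the liquid water in the boiler drum produces high-temperature steam and cools the flue gas, so that the flue gas temperature at the boiler outlet is below 200℃ (flue gas G1). In terms of DXN formation mechanisms, the chemical reactions that generate DXN as the high-temperature flue gas cools in the waste heat boiler include high-temperature gas-phase synthesis (800℃ to 500℃), precursor synthesis (450℃ to 200℃) and de novo synthesis (350℃ to 250℃), but no consensus has yet been reached.
4) Steam power generation stage: the high-temperature steam produced by the waste heat boiler drives the turbine generator, converting mechanical energy into electrical energy. This makes the plant self-sufficient in electricity and supplies the surplus to the grid, realizing resource recovery and economic benefit.
5) Flue gas cleaning stage: flue gas cleaning in the MSWI process mainly comprises denitrification (NOx), desulfurization (HCl, HF, SO2, etc.), heavy metal removal (Pb, Hg, Cd, etc.), dioxin (DXN) adsorption and dust removal (particulate matter), so that the pollutants in the incineration flue gas meet emission standards. Adsorbing DXN from the incineration flue gas with an activated carbon injection system is currently the most widely used technique; the adsorbed DXN accumulates in the fly ash.
6) Flue gas emission stage: after cooling and cleaning, the incineration flue gas containing trace DXN (flue gas G2) is drawn by the induced draft fan and discharged to the atmosphere through the stack. Because the MSWI process runs continuously for long periods, large amounts of DXN adhere to the particulate matter on the inner wall of the stack (the memory effect); under which operating conditions this DXN may be released remains an open research problem.
At present, research on DXN soft sensing for the MSWI process mainly targets the DXN concentration in the emission stage (flue gas G3); this application focuses on building a soft sensor model at the G3 flue gas location.
The BHFR modeling strategy proposed in this application comprises four main parts: the feature mapping layer, the latent feature extraction layer, the feature enhancement layer and the incremental learning layer.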
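In Figure 1,
Figure PCTCN2022127864-appb-000097
denotes the raw data, where
Figure PCTCN2022127864-appb-000098
is the raw input data, $N_{Raw}$ is the number of raw samples and M is the dimension of the raw input data, which originates from the six stages of the MSWI process described above and is collected and stored by the DCS at one-second intervals, and
Figure PCTCN2022127864-appb-000099
is the true output value of the DXN emission concentration, obtained from DXN emission samples measured by the offline analysis method; $\{DT_1,\dots,DT_J\}$ denotes the J decision tree models in the hybrid forest algorithm, $DT_1$ being the 1st and $DT_J$ the J-th decision tree model; Bootstrap and RSM denote sample and feature sampling of the input data; $\{RF_n, CRF_n\}$ denotes the n-th hybrid forest group model, with $RF_n$ and $CRF_n$ the n-th RF and CRF models;
Figure PCTCN2022127864-appb-000100
indicates that the feature mapping layer contains N hybrid forest group models; $Z^N$ denotes the output of the feature mapping layer; $H^K$ denotes the output of the feature enhancement layer; $[X|Z^N]$ denotes the fully connected hybrid matrix of the raw data and $Z^N$;
Figure PCTCN2022127864-appb-000101
denotes the new training data after latent feature extraction;
Figure PCTCN2022127864-appb-000102
denotes the K hybrid forest group models contained in the feature enhancement layer;
Figure PCTCN2022127864-appb-000103
denotes the P hybrid forest group models contained in the incremental learning layer; $W^{K+P}$ denotes the final weight matrix.
The main functions of each part are as follows:
1) Feature mapping layer: the raw input data from the six stages of the MSWI process
Figure PCTCN2022127864-appb-000104
is mapped by the N hybrid forest groups of the feature mapping layer
Figure PCTCN2022127864-appb-000105
to obtain the mapping output matrix $Z^N$.
2) Latent feature extraction layer: principal component analysis is applied to the fully connected hybrid matrix $[X|Z^N]$, formed from the raw input data
Figure PCTCN2022127864-appb-000106
and the feature mapping layer output $Z^N$, to extract latent features and remove redundant information from the feature space; the mutual information between the extracted latent features and the true DXN emission concentration y then determines the latent feature dimension and yields the new training set
Figure PCTCN2022127864-appb-000107
3) Feature enhancement layer: with the new training set
Figure PCTCN2022127864-appb-000108
as input, the K hybrid forest groups of the feature enhancement layer
Figure PCTCN2022127864-appb-000109
perform feature mapping to obtain the enhancement layer output matrix $H^K$.
4) Incremental learning layer: with the new training set
Figure PCTCN2022127864-appb-000110
as input, hybrid forest groups are added one at a time as the smallest unit and the weights $W^{K+P}$ are updated until the training error converges.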
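In essence, BHFR replaces the neurons of the original BLS with hybrid forest groups, built from RF and CRF, as the basic mapping units. Step S1, constructing the feature mapping layer by building hybrid forest groups composed of a random forest (RF) and a completely random forest (CRF) to map the high-dimensional features, specifically comprises:
Let the raw data be {X, y}, where
Figure PCTCN2022127864-appb-000111
is the raw input data, $N_{Raw}$ is the number of raw samples and M is the dimension of the raw input data, which originates from the six stages of the MSWI process and is collected and stored by the DCS at one-second intervals, and
Figure PCTCN2022127864-appb-000112
is the true output value of the DXN emission concentration, obtained from DXN emission samples measured by the offline analysis method. The modeling of the feature mapping layer is described taking its n-th hybrid forest group as an example:
Bootstrap and random subspace method (RSM) sampling are applied to {X, y} to obtain the J training subsets of the hybrid forest group model, as follows:
Figure PCTCN2022127864-appb-000113
where
Figure PCTCN2022127864-appb-000114
Figure PCTCN2022127864-appb-000115
are the input and output of the j-th training subset,
Figure PCTCN2022127864-appb-000116
Figure PCTCN2022127864-appb-000117
denote the Bootstrap and RSM sampling for the n-th hybrid forest group in the feature mapping layer, and $P_{Bootstrap}$ denotes the Bootstrap sampling probability;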
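The sampling step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `bootstrap_rsm_subset`, the uniform row sampling with replacement and the value `n_features=11` are assumptions, since the sampling equation and the RSM feature count appear only as images in the source.

```python
import numpy as np

def bootstrap_rsm_subset(X, y, n_features, rng):
    """Draw one training subset: rows by Bootstrap, columns by RSM."""
    n_samples, n_total = X.shape
    # Bootstrap: sample row indices with replacement (uniform probability assumed)
    rows = rng.integers(0, n_samples, size=n_samples)
    # RSM: sample a subset of feature indices without replacement
    cols = rng.choice(n_total, size=n_features, replace=False)
    return X[np.ix_(rows, cols)], y[rows], cols

rng = np.random.default_rng(0)
X = rng.normal(size=(141, 116))   # same shape as the DXN data; values are synthetic
y = rng.normal(size=141)
J = 10                            # one subset per decision tree
subsets = [bootstrap_rsm_subset(X, y, n_features=11, rng=rng) for _ in range(J)]
```

Each of the J subsets would then train one decision tree of the group.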
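Based on
Figure PCTCN2022127864-appb-000118
a hybrid forest algorithm containing J decision trees is trained, where the j-th decision tree of the n-th hybrid forest group in the feature mapping layer is expressed as:
Figure PCTCN2022127864-appb-000119
where L denotes the number of leaf nodes of the decision tree, I(·) denotes the indicator function, and $c_l$ is computed by recursive splitting;
The splitting loss function $\Omega_i(\cdot)$ of a decision tree in the RF is expressed as:
Figure PCTCN2022127864-appb-000120
where $\Omega_i(s,v)$ denotes the value of the loss function when the value v of the s-th feature is used as the splitting criterion, $y_L$ denotes the vector of true DXN emission concentrations in the left leaf node, $E[y_L]$ denotes the mathematical expectation of $y_L$, $y_R$ denotes the vector of true DXN emission concentrations in the right leaf node, $E[y_R]$ denotes the mathematical expectation of $y_R$,
Figure PCTCN2022127864-appb-000121
denotes the i-th true DXN emission concentration in the left leaf node,
Figure PCTCN2022127864-appb-000122
denotes the i-th true DXN emission concentration in the right leaf node, $c_L$ denotes the predicted DXN emission concentration output by the left leaf node and $c_R$ that output by the right leaf node;
By minimizing $\Omega_i(s,v)$, the training set
Figure PCTCN2022127864-appb-000123
is split into two tree nodes, as follows:
Figure PCTCN2022127864-appb-000124
where
Figure PCTCN2022127864-appb-000125
Figure PCTCN2022127864-appb-000126
denote the sample sets contained in the left and right tree nodes after splitting, and $N_L$ and $N_R$ denote the numbers of samples in
Figure PCTCN2022127864-appb-000127
Figure PCTCN2022127864-appb-000128
respectively;
The predicted DXN emission concentration output values of the current left and right tree nodes,
Figure PCTCN2022127864-appb-000129
Figure PCTCN2022127864-appb-000130
are the expectations of the true sample values, as follows:
Figure PCTCN2022127864-appb-000131
where $y_L$ and $y_R$ denote the vectors of true DXN emission concentrations in
Figure PCTCN2022127864-appb-000132
Figure PCTCN2022127864-appb-000133
respectively, and $E[y_L]$ and $E[y_R]$ denote the mathematical expectations of $y_L$ and $y_R$;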
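The RF split search described above can be illustrated with a small sketch. It assumes the loss in equation (3) is the usual sum of squared deviations of each child node from its mean, which is what the surrounding symbol definitions suggest; the equation itself is image-only in the source. The names `split_loss` and `best_split` are hypothetical.

```python
import numpy as np

def split_loss(y_left, y_right):
    """Sum of squared deviations of each child from its mean (the leaf prediction)."""
    return float(((y_left - y_left.mean()) ** 2).sum()
                 + ((y_right - y_right.mean()) ** 2).sum())

def best_split(X, y):
    """Exhaustively search (feature s, value v) for the split with minimal loss."""
    best = (None, None, np.inf)
    for s in range(X.shape[1]):
        # Skip the largest value so that the right child is never empty
        for v in np.unique(X[:, s])[:-1]:
            mask = X[:, s] <= v
            loss = split_loss(y[mask], y[~mask])
            if loss < best[2]:
                best = (s, float(v), loss)
    return best

X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 1.0], [4.0, 1.0]])
y = np.array([1.0, 1.0, 3.0, 3.0])
s, v, loss = best_split(X, y)   # splits feature 0 at value 2.0 with zero loss
```

Applying `best_split` recursively to each child until a minimum leaf size is reached gives the leaf predictions $c_l$.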
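Unlike the RF, decision tree splitting in the CRF uses completely random selection, expressed as:
Figure PCTCN2022127864-appb-000134
where
Figure PCTCN2022127864-appb-000135
denotes that the value v of the s-th feature is selected completely at random as the split point;
The predicted DXN emission concentration output values of the randomly split left and right tree nodes,
Figure PCTCN2022127864-appb-000136
Figure PCTCN2022127864-appb-000137
are the expectations of the true sample values, as follows:
Figure PCTCN2022127864-appb-000138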
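For contrast with the RF search, a completely random split can be sketched as below. The sketch assumes the cut value is drawn from the observed values of a randomly chosen feature; the exact sampling rule in equation (6) is image-only, and `random_split` is a hypothetical name.

```python
import numpy as np

def random_split(X, y, rng):
    """Pick a random feature and a random observed value; predict each child by its mean."""
    s = int(rng.integers(X.shape[1]))
    values = np.unique(X[:, s])
    if values.size < 2:
        return None                      # constant feature: nothing to split on
    v = float(rng.choice(values[:-1]))   # exclude the maximum so the right child is non-empty
    mask = X[:, s] <= v
    return s, v, float(y[mask].mean()), float(y[~mask].mean())

rng = np.random.default_rng(1)
X = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0], [4.0, 8.0]])
y = np.array([1.0, 1.0, 3.0, 3.0])
result = random_split(X, y, rng)
```

No loss is evaluated here, which is what makes CRF trees cheap to grow and more diverse than RF trees.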
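Through the above process, the n-th hybrid forest group
Figure PCTCN2022127864-appb-000139
can be expressed as:
Figure PCTCN2022127864-appb-000140
where
Figure PCTCN2022127864-appb-000141
denotes the n-th random forest and
Figure PCTCN2022127864-appb-000142
denotes the n-th completely random forest;
The n-th mapping feature $Z_n$ can then be expressed as:
Figure PCTCN2022127864-appb-000143
where
Figure PCTCN2022127864-appb-000144
denotes the mapping feature produced by the n-th hybrid forest group for the 1st sample of the raw input data from the six stages of the MSWI process,
Figure PCTCN2022127864-appb-000145
denotes the mapping feature produced by the n-th hybrid forest group for the $n_{Raw}$-th sample of that data, and
Figure PCTCN2022127864-appb-000146
denotes the mapping feature produced by the n-th hybrid forest group for the $N_{Raw}$-th sample of that data;
Finally, the output of the feature mapping layer is expressed as:
Figure PCTCN2022127864-appb-000147
where $Z_1$ is the 1st mapping feature, $Z_2$ is the 2nd mapping feature and $Z_N$ is the N-th mapping feature; the mapping feature matrix $Z^N$ contains $N_{Raw}$ samples and 2N-dimensional features.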
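To avoid the overfitting caused by information loss during information transfer, the proposed BHFR uses a full-connection strategy to pass information from the feature mapping layer to the feature enhancement layer and the incremental learning layer. Meanwhile, to minimize information redundancy during model training, principal component analysis (PCA) is used to extract latent features from the feature space of the fully connected hybrid matrix, and mutual information is then used to further select the latent features most relevant to the true values, thereby reducing the dimension of the high-dimensional data.
Step S2, constructing the latent feature extraction layer by extracting latent features from the feature space of the fully connected hybrid matrix according to contribution rate, with an information measurement criterion ensuring maximal transfer of potentially valuable information with minimal redundancy and reducing model complexity and computational cost, specifically comprises:
First, the raw input data X from the six stages of the MSWI process is combined with the feature mapping matrix $Z^N$ to obtain the fully connected hybrid matrix A, expressed as:
Figure PCTCN2022127864-appb-000148
where A contains $N_{Raw}$ samples and (M+2N)-dimensional features;
Next, since the dimension of A is far higher than that of the raw data, PCA is used to minimize the redundant information in A. The correlation matrix R of A is computed as follows:
Figure PCTCN2022127864-appb-000149
Singular value decomposition is then applied to R to obtain (M+2N) eigenvalues and the corresponding eigenvectors, as follows:
$R = U_{(M+2N)}\,\Sigma_{(M+2N)}\,V_{(M+2N)}$    (13)
where $U_{(M+2N)}$ denotes an orthogonal matrix of order (M+2N), $\Sigma_{(M+2N)}$ a diagonal matrix of order (M+2N), and $V_{(M+2N)}$ an orthogonal matrix of order (M+2N);
Figure PCTCN2022127864-appb-000150
where $\sigma_1 > \sigma_2 > \dots > \sigma_{(M+2N)}$ denote the eigenvalues arranged in descending order;
Then, the final number of principal components is determined according to the preset latent feature contribution threshold η:
Figure PCTCN2022127864-appb-000151
where the number of latent features $Q_{PCA} \ll (M+2N)$;
Based on the $Q_{PCA}$ latent features determined above, the eigenvector matrix corresponding to the set of eigenvalues
Figure PCTCN2022127864-appb-000152
is obtained as $V_{Q_{PCA}}$, i.e. the projection matrix of A. A is then projected to minimize redundant information, and the resulting latent features are denoted $X_{PCA}$, i.e.
Figure PCTCN2022127864-appb-000153
where
Figure PCTCN2022127864-appb-000154
denotes the eigenvectors of the first $Q_{PCA}$ latent features;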
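The PCA step can be sketched with NumPy as follows. This is an illustrative reading of the procedure, assuming the columns of A are standardized before the correlation matrix is formed and that the number of components is the smallest one whose cumulative eigenvalue share reaches η; the function name `pca_latent_features` and the synthetic data are assumptions.

```python
import numpy as np

def pca_latent_features(A, eta):
    """Project A onto the leading components whose cumulative contribution reaches eta."""
    A_std = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)   # standardize columns
    R = (A_std.T @ A_std) / (A.shape[0] - 1)               # correlation matrix of A
    U, sigma, Vt = np.linalg.svd(R)                        # sigma is sorted in descending order
    contribution = np.cumsum(sigma) / sigma.sum()
    q = int(np.searchsorted(contribution, eta) + 1)        # smallest q with contribution >= eta
    V_q = Vt[:q].T                                         # projection matrix
    return A_std @ V_q, q

rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3))
# Eight columns built from three underlying factors, so most columns are redundant
A = np.hstack([base, base @ rng.normal(size=(3, 5))])
X_pca, q = pca_latent_features(A, eta=0.9)
```

Because the synthetic A has only three independent directions, at most three components are kept.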
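Further, the mutual information value $I_{MI}$ between the selected latent features $X_{PCA}$ and the true value
Figure PCTCN2022127864-appb-000155
is computed as follows:
Figure PCTCN2022127864-appb-000156
where
Figure PCTCN2022127864-appb-000157
denotes the joint probability distribution of the q-th latent feature
Figure PCTCN2022127864-appb-000158
and the true DXN emission concentration y,
Figure PCTCN2022127864-appb-000159
denotes the marginal probability distribution of the q-th latent feature
Figure PCTCN2022127864-appb-000160
and p(y) denotes the marginal probability distribution of the true DXN emission concentration y;
Next, an information-maximization selection mechanism ensures the relevance of the selected latent features to the true value, expressed as:
Figure PCTCN2022127864-appb-000161
where
Figure PCTCN2022127864-appb-000162
denotes the mutual information values between the $Q_{PCA}$ latent features
Figure PCTCN2022127864-appb-000163
and the true value y, ζ denotes the information-maximization threshold, and
Figure PCTCN2022127864-appb-000164
denotes the
Figure PCTCN2022127864-appb-000165
latent features most relevant to the true DXN emission concentration y;
Finally, a new dataset containing
Figure PCTCN2022127864-appb-000166
latent features is obtained, denoted
Figure PCTCN2022127864-appb-000167
and the dimension after extraction is set to
Figure PCTCN2022127864-appb-000168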
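The mutual-information screening can be illustrated as below. The sketch uses a simple histogram estimate of mutual information and keeps the features whose value is at least ζ. Both choices are assumptions: the source gives the estimator and the selection rule only as images, and the threshold scale used here is not necessarily the one behind the reported ζ.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(x; y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal of y
    nz = p_xy > 0                            # skip empty cells to avoid log(0)
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def select_by_mi(X_pca, y, zeta):
    """Keep the latent features whose mutual information with y is at least zeta."""
    mi = np.array([mutual_information(X_pca[:, q], y) for q in range(X_pca.shape[1])])
    keep = np.flatnonzero(mi >= zeta)
    return X_pca[:, keep], keep, mi

rng = np.random.default_rng(0)
y = rng.normal(size=500)
X_pca = np.column_stack([y + 0.1 * rng.normal(size=500),   # informative feature
                         rng.normal(size=500)])            # pure noise
X_new, keep, mi = select_by_mi(X_pca, y, zeta=0.5)
```

Only the informative column survives the threshold in this synthetic example.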
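Step S3, constructing the feature enhancement layer by training it on the extracted latent features to further strengthen feature representation, specifically comprises:
First, Bootstrap- and RSM-based sampling is applied to the new dataset {X′, y} to obtain the J training subsets of the hybrid forest algorithm, as follows:
Figure PCTCN2022127864-appb-000169
where
Figure PCTCN2022127864-appb-000170
Figure PCTCN2022127864-appb-000171
are the input and output of the j-th training subset, X′ and y are the input and output of the new training set,
Figure PCTCN2022127864-appb-000172
denotes the Bootstrap sampling for the k-th hybrid forest group, and
Figure PCTCN2022127864-appb-000173
denotes the RSM sampling for the k-th hybrid forest group;
Next, taking the construction of the j-th decision tree of the RF in the k-th hybrid forest group as an example:
Figure PCTCN2022127864-appb-000174
where
Figure PCTCN2022127864-appb-000175
denotes the j-th decision tree of the RF in the k-th hybrid forest group of the feature enhancement layer; L denotes the number of leaf nodes of the decision tree; $c_l$ is computed by recursive splitting, as detailed in equations (3)-(5);
The RF model in the k-th hybrid forest group of the feature enhancement layer is then obtained, expressed as:
Figure PCTCN2022127864-appb-000176
Similarly, taking the construction of the j-th decision tree of the CRF in the k-th hybrid forest group as an example:
Figure PCTCN2022127864-appb-000177
where
Figure PCTCN2022127864-appb-000178
denotes the j-th decision tree of the CRF in the k-th hybrid forest group of the feature enhancement layer; $c_l$ is computed by recursive splitting, as detailed in equations (6)-(7);
The CRF model in the k-th hybrid forest group of the feature enhancement layer is then obtained, expressed as:
Figure PCTCN2022127864-appb-000179
Through the above process, the k-th hybrid forest group is obtained:
Figure PCTCN2022127864-appb-000180
The k-th enhancement feature can then be expressed as:
Figure PCTCN2022127864-appb-000181
where
Figure PCTCN2022127864-appb-000182
denotes the enhancement mapping of the k-th hybrid forest group for the 1st sample of the new data,
Figure PCTCN2022127864-appb-000183
denotes the enhancement mapping of the k-th hybrid forest group for the $n_{Raw}$-th sample of the new data, and
Figure PCTCN2022127864-appb-000184
denotes the enhancement mapping of the k-th hybrid forest group for the $N_{Raw}$-th sample of the new data;
Finally, the output $H^K$ of the feature enhancement layer is expressed as:
Figure PCTCN2022127864-appb-000185
where $H_1$ is the 1st enhancement feature, $H_2$ is the 2nd enhancement feature and $H_K$ is the K-th enhancement feature;
When the incremental learning strategy is not considered, the BHFR model is expressed as:
Figure PCTCN2022127864-appb-000186
where $G^K$ denotes the combination of the outputs of the feature mapping layer and the feature enhancement layer, i.e. $G^K = [Z^N | H^K]$, which contains $N_{Raw}$ samples and (2N+2K)-dimensional features; $W^K$ denotes the weights between the feature mapping and feature enhancement layers and the output layer, computed as:
$W^K = (\lambda I + [G^K]^{\mathrm T} G^K)^{-1} [G^K]^{\mathrm T} Y$    (27)
where I denotes the identity matrix and λ denotes the regularization coefficient; correspondingly, the pseudo-inverse of $G^K$ can be computed as:
Figure PCTCN2022127864-appb-000187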
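Equation (27) is a ridge-regularized least-squares solution, which can be sketched directly in NumPy. The matrix sizes and random data below are placeholders chosen only for illustration; `ridge_weights` is a hypothetical name.

```python
import numpy as np

def ridge_weights(G, Y, lam):
    """W = (lam*I + G^T G)^{-1} G^T Y, solved without forming the inverse explicitly."""
    d = G.shape[1]
    return np.linalg.solve(lam * np.eye(d) + G.T @ G, G.T @ Y)

rng = np.random.default_rng(0)
G = rng.normal(size=(141, 40))    # stand-in for [Z^N | H^K]
Y = rng.normal(size=(141, 1))     # stand-in for the DXN true values
lam = 2.0 ** -10                  # regularization value reported for the DXN experiment
W = ridge_weights(G, Y, lam)
Y_hat = G @ W                     # model output without incremental learning
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical-stability choice of this sketch, not something the source prescribes.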
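The proposed BHFR realizes incremental learning with the hybrid forest group as the basic unit, according to the degree of convergence of the training error. Step S4, constructing the incremental learning layer through an incremental learning strategy and obtaining the weight matrix with the Moore-Penrose pseudo-inverse, thereby achieving high-precision modeling with the BHFR soft sensor model, specifically comprises:
First, Bootstrap- and RSM-based sampling is applied to the new dataset {X′, y} to obtain the training subsets of the hybrid forest algorithm, as follows:
Figure PCTCN2022127864-appb-000188
where
Figure PCTCN2022127864-appb-000189
Figure PCTCN2022127864-appb-000190
are the input and output of the j-th training subset of the hybrid forest algorithm, X′ and y are the input and output of the new training set, and
Figure PCTCN2022127864-appb-000191
Figure PCTCN2022127864-appb-000192
denote the Bootstrap sampling and RSM sampling for the p-th hybrid forest group in the incremental learning layer;
Next, the decision trees of the p-th hybrid forest group,
Figure PCTCN2022127864-appb-000193
Figure PCTCN2022127864-appb-000194
are built by the same procedure as in the feature mapping layer and the feature enhancement layer, which is not repeated here;
Further, after one hybrid forest group is added, the output $G^{K+1}$ of the feature mapping layer, feature enhancement layer and incremental learning layer is expressed as:
Figure PCTCN2022127864-appb-000195
where $G^K = [Z^N | H^K]$ contains $N_{Raw}$ samples and (2N+2K)-dimensional features, and $G^{K+1}$ contains $N_{Raw}$ samples and (2N+2K+2J)-dimensional features;
Then, the Moore-Penrose inverse of $G^{K+1}$ is updated recursively, as follows:
Figure PCTCN2022127864-appb-000196
where the matrices C and D are computed as:
$C = H_{K+1} - G^K D$    (32)
Figure PCTCN2022127864-appb-000197
The recursive formula for the Moore-Penrose inverse of $G^{K+1}$ is then:
Figure PCTCN2022127864-appb-000198
Further, the updated weight matrix $W^{K+1}$ between the feature mapping, feature enhancement and incremental learning layers and the output layer is computed as:
Figure PCTCN2022127864-appb-000199
where $W^K = (\lambda I + [G^K]^{\mathrm T} G^K)^{-1} [G^K]^{\mathrm T} Y$;
Because this pseudo-inverse update strategy only requires computing the pseudo-inverse of the hybrid forest group added in the incremental learning layer, fast incremental learning is achieved;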
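The recursive update can be sketched with the standard column-append pseudo-inverse update used in broad learning systems, which is consistent with equation (32). The remaining update equations are image-only in the source, so the form of `D`, `B_T` and the weight update below is an assumption based on that standard result, and the sketch uses the exact pseudo-inverse rather than the ridge-regularized one.

```python
import numpy as np

def append_columns(G, G_pinv, W, H_new, Y):
    """Update pinv([G | H_new]) and the output weights without recomputing from scratch."""
    D = G_pinv @ H_new
    C = H_new - G @ D                       # part of H_new outside the column space of G
    if np.allclose(C, 0.0):
        # New columns are linear combinations of existing ones
        B_T = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ G_pinv)
    else:
        B_T = np.linalg.pinv(C)             # only the small new block is pseudo-inverted
    G_next = np.hstack([G, H_new])
    G_next_pinv = np.vstack([G_pinv - D @ B_T, B_T])
    W_next = np.vstack([W - D @ (B_T @ Y), B_T @ Y])
    return G_next, G_next_pinv, W_next

rng = np.random.default_rng(0)
G = rng.normal(size=(30, 6))
Y = rng.normal(size=(30, 1))
G_pinv = np.linalg.pinv(G)
W = G_pinv @ Y
H_new = rng.normal(size=(30, 2))            # features from one added hybrid forest group
G_next, G_next_pinv, W_next = append_columns(G, G_pinv, W, H_new, Y)
```

On this random example the updated matrices match a direct recomputation with `np.linalg.pinv`, while the update itself only pseudo-inverts the 30 × 2 block C.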
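Further, adaptive incremental learning is realized according to the degree of convergence of the training error;
The convergence threshold of the error is defined as $\theta_{Con}$, which determines the number p of hybrid forest groups in incremental learning; correspondingly, the incremental learning training error of the BHFR model is expressed as:
Figure PCTCN2022127864-appb-000200
where
Figure PCTCN2022127864-appb-000201
denotes the training error value between the (p+1)-th and the p-th hybrid forest groups in incremental learning, and
Figure PCTCN2022127864-appb-000202
Figure PCTCN2022127864-appb-000203
denote the training errors of the BHFR model containing p and p+1 hybrid forest groups;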
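The stopping rule can be illustrated with a short sketch. It assumes that growth stops once the absolute change in training error between models with p and p+1 groups falls to $\theta_{Con}$ or below; the exact criterion is image-only in the source. The error list is made-up data standing in for errors that would be measured while groups are added.

```python
def grow_until_converged(train_errors, theta_con):
    """Return how many groups to keep: stop at the first error change within theta_con.

    train_errors[i] is the training error of the model with i + 1 groups.
    """
    for p in range(1, len(train_errors)):
        if abs(train_errors[p] - train_errors[p - 1]) <= theta_con:
            return p + 1
    return len(train_errors)   # never converged within the available budget

# Illustrative errors only, not results from the source
errors = [0.50, 0.30, 0.20, 0.15, 0.149, 0.1489]
n_groups = grow_until_converged(errors, theta_con=0.005)
```

Here the change from the fourth to the fifth model is about 0.001, so five groups are kept.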
Finally, the prediction output
Figure PCTCN2022127864-appb-000204
of the proposed BHFR soft sensor model is:
Figure PCTCN2022127864-appb-000205
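The method is validated industrially using actual DXN data from an MSWI plant. The DXN data come from an MSWI power plant in Beijing and comprise 141 sets of DXN emission concentration modeling data covering 2009-2020. The DXN true values are converted concentrations obtained from laboratory analysis after 2 hours of sampling. After removing missing data and abnormal variables, the input has 116 dimensions, each input taking its mean value over the sampling period of the corresponding DXN true value.
Three evaluation metrics are used to compare the performance of the different methods: root mean square error (RMSE), mean absolute error (MAE) and the coefficient of determination ($R^2$), computed as follows:
Figure PCTCN2022127864-appb-000206
Figure PCTCN2022127864-appb-000207
Figure PCTCN2022127864-appb-000208
where N is the number of data points, $y_i$ is the i-th true value,
Figure PCTCN2022127864-appb-000209
is the i-th predicted value, and
Figure PCTCN2022127864-appb-000210
is the mean value.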
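The three metrics can be written out in a few lines. The sketch uses their standard definitions, which is an assumption in that the source shows the formulas only as images; the sample values are made up for illustration.

```python
import math

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
# scores is approximately (0.5, 0.25, 0.8)
```

Lower RMSE and MAE and higher $R^2$ indicate a better fit.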
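On the DXN dataset, the parameters of the BHFR method are set as follows: the minimum number of samples in a decision tree leaf node $N_{samples}$ is 7, the number of features selected by RSM is
Figure PCTCN2022127864-appb-000211
the number of decision trees $N_{tree}$ is 10, the number of hybrid forest groups $N_{Forest}$ in both the feature mapping layer and the feature enhancement layer is 10, the latent feature contribution threshold η is 0.9, and the regularization parameter λ is $2^{-10}$.
As with the benchmark datasets, the number of latent features used by the feature enhancement layer and the incremental learning layer is first determined from the feature space of the fully connected hybrid matrix A. On the DXN dataset, A has 316 feature dimensions. With the latent feature contribution threshold η set to 0.9, 35 latent features are selected on the DXN dataset. Next, the mutual information between these 35 latent features and the DXN true values is computed. With the mutual information threshold ζ set to 0.75, 6 latent features are selected on the DXN dataset.
Further, the number of hybrid forest group units in the incremental learning layer is preset to 1000; the resulting relationship between the training error of the BHFR model and the number of hybrid forest groups is shown in Figure 3.
The training error curve in Figure 3 shows that the training of BHFR on the DXN dataset converges to a definite lower bound.
RF, DFR, DFR-clfc and BLS-NN are then compared with the proposed BHFR, with the following parameter settings: (1) RF: the minimum number of samples in a decision tree leaf node $N_{samples}$ is 3, the number of features selected by RSM is
Figure PCTCN2022127864-appb-000212
and the number of decision trees $N_{tree}$ is 500; (2) DFR: the minimum number of samples in a decision tree leaf node $N_{samples}$ is 3, the number of features selected by RSM is
Figure PCTCN2022127864-appb-000213
the number of decision trees $N_{tree}$ is 500, the numbers of RF and CRF models per layer, $N_{RF}$ and $N_{CRF}$, are both 2, and the total number of layers is set to 50; (3) DFR-clfc: the minimum number of samples in a decision tree leaf node $N_{samples}$ is 3, the number of features selected by RSM is
Figure PCTCN2022127864-appb-000214
the number of decision trees $N_{tree}$ is 500, the numbers of RF and CRF models per layer, $N_{RF}$ and $N_{CRF}$, are both 2, and the total number of layers is set to 50; (4) BLS-NN: the number of feature nodes $N_m$ is 5, the number of enhancement nodes $N_e$ is 41, the number of neurons $N_n$ is 9 and the regularization parameter λ is $2^{30}$. Each method is run 20 times under the same conditions; the statistical results and prediction curves are shown in Table 1 and Figures 4a-4c.
Table 1 Experimental results on the DXN dataset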
Figure PCTCN2022127864-appb-000215
Figure PCTCN2022127864-appb-000216
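Table 1 and Figures 4a-4c show that: 1) the mean RMSE, MAE and $R^2$ of RF on the training, validation and test sets are all better than those of DFR, but RF is less stable than DFR; 2) DFR and DFR-clfc are close to RF in modeling accuracy and more stable than RF; DFR-clfc is slightly more accurate than DFR on the training, validation and test sets, but DFR is more stable; 3) BLS-NN clearly overfits the training data and shows the worst generalization performance and stability on the validation and test sets, indicating that BLS-NN is ill-suited to the small-sample, high-dimensional data of the real industrial process considered here; 4) BHFR achieves the best mean RMSE, MAE and $R^2$ on the test set, and its stability is second only to that of DFR, indicating that BHFR has good generalization performance and stability.
In summary, the DXN soft sensor modeling experiments show that the proposed BHFR has better training and learning ability than the classical RF, DFR and its improved version DFR-clfc, and that its modeling accuracy and goodness of fit on the test set are also better than those of RF, DFR, DFR-clfc and BLS-NN, demonstrating its clear advantage in building DXN soft sensor models.
The soft sensing method for dioxin emission in the MSWI process based on broad hybrid forest regression provided by the invention establishes a BHFR-based soft sensor model that combines broad learning modeling, ensemble learning and latent feature extraction: 1) based on the broad learning system framework, non-differentiable learners are used to build a soft sensor model comprising a feature mapping layer, a latent feature extraction layer, a feature enhancement layer and an incremental learning layer; 2) the internal information of the BHFR model is processed using full information connection, latent feature extraction and mutual information measurement, which maximizes the transfer of internal feature information and minimizes redundancy; 3) hybrid forest groups are used as mapping units to realize incremental learning during modeling, the output-layer weight matrix is computed quickly through a pseudo-inverse strategy, and the degree of convergence of the training error is used to adaptively adjust the incremental learning, achieving high-precision soft sensor modeling. The effectiveness and rationality of the proposed method are verified on high-dimensional benchmark datasets and an industrial-process DXN dataset.
Specific examples are used herein to explain the principles and embodiments of the invention. The description of the above embodiments is only intended to help in understanding the method of the invention and its core idea; meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.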

Claims (5)

  1. 一种基于宽度混合森林回归的MSWI过程二噁英排放软测量方法,基于BLS框架,以非微分基学习器替换神经元构建面向小样本高维数据的BHFR软测量模型,其特征在于,所述BHFR软测量模型包括特征映射层、潜在特征提取层、特征增强层和增量学习层的构建,具体包括以下步骤:
    S1,构建特征映射层,构建由随机森林RF和完全随机森林CRF组成的混合森林组对高维特征进行映射;
    S2,构建潜在特征提取层,依据贡献率对全联接混合矩阵的特征空间进行潜在特征提取,基于信息度量准则保证潜在有价值信息的最大化传递和最小化冗余,降低模型复杂度和计算消耗;
    S3,构建特征增强层,基于所提取的潜在特征训练特征增强层以进一步增强特征表征能力;
    S4,构建增量学习层,通过增量式学习策略构建增量学习层,采用Moore-Penrose伪逆获得权重矩阵,进而实现BHFR软测量模型的高精度建模;
    S5,采用高维基准数据集和工业过程DXN数据集验证所述软测量模型;
    S6,采用步骤S1-S5建立的软测量模型,对MSWI过程二噁英排放进行软测量。
  2. 根据权利要求1所述的基于宽度混合森林回归的MSWI过程二噁英排放软测量方法,其特征在于,步骤S1,构建特征映射层,构建由随机森林RF和完全随机森林CRF组成的混合森林组对高维特征进行映射,具体包括:
    设原始数据为{X,y},其中
    Figure PCTCN2022127864-appb-100001
    是原始输入数据,N Raw是原始数据的数量,M是原始输入数据的维数,其来源于MSWI过程的六个不同阶段,以秒为单位在DCS系统采集与存储,
    Figure PCTCN2022127864-appb-100002
    是DXN排放浓度的输出真值,其来源于采用离线检测法得到排放物DXN检测样本;以特征映射层的第nth个混合森林组为例描述特征映射层的建模过程:
    对{X,y}进行Bootstrap和随机子空间RSM采样,获得混合森林组模型的J个训练子集,如下:
    Figure PCTCN2022127864-appb-100003
    其中,
    Figure PCTCN2022127864-appb-100004
    Figure PCTCN2022127864-appb-100005
    为第J个训练子集的输入和输出,
    Figure PCTCN2022127864-appb-100006
    Figure PCTCN2022127864-appb-100007
    表示特征映射层中对第nth个混合森林组的Bootstrap和RSM采样,P Bootstrap表示Bootstrap采样概率;
    基于
    Figure PCTCN2022127864-appb-100008
    训练包含J个决策树的混合森林算法,其中特征映射层中的第nth个混合森林组的第jth个决策树表示如下:
    Figure PCTCN2022127864-appb-100009
    其中,L表示决策树叶节点数量,I(·)表示指示函数,c l采用递归分裂方式计算;
    RF中决策树的分裂损失函数Ω i(·)表示为:
    Figure PCTCN2022127864-appb-100010
    其中,Ω i(s,v)表示第sth个特征的值v作为切分准则的损失函数值,y L表示左叶节点的DXN排放浓度真值向量,E[y L]表示y L的数学期望,y R表示右叶节点的DXN排放浓度真值向量,E[y R]表示y R的数学期望,
    Figure PCTCN2022127864-appb-100011
    表示左叶节点第i个DXN排放浓度真值,
    Figure PCTCN2022127864-appb-100012
    表示右叶节点第i个DXN排放浓度真值,c L表示左叶节点DXN排放浓度预测输出,c R表示右叶节点DXN排放浓度预测输出;
    通过最小化Ω i(s,v),将训练集
    Figure PCTCN2022127864-appb-100013
    切分为两个树节点,如下:
    Figure PCTCN2022127864-appb-100014
    其中,
    Figure PCTCN2022127864-appb-100015
    Figure PCTCN2022127864-appb-100016
    表示切分后左右两个树节点所包含的样本集,N L和N R分别表示
    Figure PCTCN2022127864-appb-100017
    Figure PCTCN2022127864-appb-100018
    中的样本数量;
    The predicted DXN emission concentration output values of the current left and right tree nodes,
    Figure PCTCN2022127864-appb-100019
    Figure PCTCN2022127864-appb-100020
    为样本真值的期望,如下:
    Figure PCTCN2022127864-appb-100021
    其中,y L和y R表示
    Figure PCTCN2022127864-appb-100022
    Figure PCTCN2022127864-appb-100023
    中的DXN排放浓度真值向量,E[y L]和E[y R]表示y L和y R的数学期望;
    与RF不同,CRF中决策树分裂采用完全随机选择方式,表示为,
    Figure PCTCN2022127864-appb-100024
    其中,
    Figure PCTCN2022127864-appb-100025
    表示完全随机选取第sth个特征的值v作为切分点;
    被随机分裂的左右树节点的DXN排放浓度预测输出值
    Figure PCTCN2022127864-appb-100026
    Figure PCTCN2022127864-appb-100027
    为样本真 值的期望,如下:
    Figure PCTCN2022127864-appb-100028
    通过上述过程,第nth个混合森林组
    Figure PCTCN2022127864-appb-100029
    可表示为,
    Figure PCTCN2022127864-appb-100030
    其中,
    Figure PCTCN2022127864-appb-100031
    表示第nth个随机森林,
    Figure PCTCN2022127864-appb-100032
    表示第nth个完全随机森林;
    进而,第nth个映射特征Z n可表示为
    Figure PCTCN2022127864-appb-100033
    其中,
    Figure PCTCN2022127864-appb-100034
    表示第nth组混合森林对来源于MSWI过程六个不同阶段的原始输入数据第1个样本的映射特征,
    Figure PCTCN2022127864-appb-100035
    表示第nth组混合森林对来源于MSWI过程六个不同阶段的原始输入数据第n Rawth个样本的映射特征,
    Figure PCTCN2022127864-appb-100036
    表示第nth组混合森林对来源于MSWI过程六个不同阶段的原始输入数据第N Rawth个样本的映射特征;
    最终,特征映射层的输出表示为:
    Figure PCTCN2022127864-appb-100037
    其中,Z 1为第1个映射特征,Z 2为第2个映射特征,Z N为第N个映射特征,映射特征矩阵Z N包含N Raw个样本和2N维特征。
  3. 根据权利要求2所述的基于宽度混合森林回归的MSWI过程二噁英排放软测量方法,其特征在于,所述步骤S2,构建潜在特征提取层,依据贡献率对全联接混合矩阵的特征空间进行潜在特征提取,基于信息度量准则保证潜在有价值信息的最大化传递和最小化冗余,降低模型复杂度和计算消耗,具体包括:
    首先,来源于MSWI过程六个不同阶段的原始输入数据X与特征映射矩阵Z N组合得到全联接混合矩阵A,表示为:
    Figure PCTCN2022127864-appb-100038
    其中,A含N Raw个样本和(M+2N)维特征;
    接着,考虑到A的维数远高于原始数据,此处利用PCA最小化A中的冗余信息,计算A的相关矩阵R,如下:
    Figure PCTCN2022127864-appb-100039
    进一步,对R进行奇异值分解,得到(M+2N)个特征值和相应特征向量,如下:
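    $R = U_{(M+2N)}\,\Sigma_{(M+2N)}\,V_{(M+2N)}$    (13)
    where $U_{(M+2N)}$ denotes an orthogonal matrix of order (M+2N), $\Sigma_{(M+2N)}$ a diagonal matrix of order (M+2N), and $V_{(M+2N)}$ an orthogonal matrix of order (M+2N);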
    Figure PCTCN2022127864-appb-100040
    where $\sigma_1 > \sigma_2 > \dots > \sigma_{(M+2N)}$ denote the eigenvalues arranged in descending order;
    然后,根据设定潜在特征贡献阈值η,确定最终的主成分数量,
    Figure PCTCN2022127864-appb-100041
    where the number of latent features $Q_{PCA} \ll (M+2N)$;
    基于上述确定的Q PCA个潜在特征,获得特征值集合
    Figure PCTCN2022127864-appb-100042
    对应的特征向量矩阵V QPCA,即A的投影矩阵;然后,对A进行特征投影以实现冗余信息的最小化处理,将获得潜在特征记为X PCA,即
    Figure PCTCN2022127864-appb-100043
    其中,
    Figure PCTCN2022127864-appb-100044
    表示前Q PCA个潜在特征的特征向量;
    进一步,计算所选潜在特征X PCA与真值
    Figure PCTCN2022127864-appb-100045
    间的互信息值I MI,如下:
    Figure PCTCN2022127864-appb-100046
    其中,
    Figure PCTCN2022127864-appb-100047
    表示第qth个潜在特征
    Figure PCTCN2022127864-appb-100048
    与DXN排放浓度真值y的联合概率分布,
    Figure PCTCN2022127864-appb-100049
    表示第qth个潜在特征
    Figure PCTCN2022127864-appb-100050
    的边缘概率分布,p(y)表示DXN排放浓度真值y的边缘概率分布;
    接着,通过信息最大化选择机制以保证所选择潜在特征与真值的相关性,表示为:
    Figure PCTCN2022127864-appb-100051
    其中,
    Figure PCTCN2022127864-appb-100052
    表示Q PCA个潜在特征
    Figure PCTCN2022127864-appb-100053
    与真值y的互信息值,ζ表示最大化信息的阈值,
    Figure PCTCN2022127864-appb-100054
    表示与DXN排放浓度真值y信息相关度最大的
    Figure PCTCN2022127864-appb-100055
    个潜在特征;
    最终,获得包括
    Figure PCTCN2022127864-appb-100056
    个潜在特征的新数据集
    Figure PCTCN2022127864-appb-100057
    并设定提取后维数
    Figure PCTCN2022127864-appb-100058
  4. 根据权利要求3所述的基于宽度混合森林回归的MSWI过程二噁英排放软测量方法,其特征在于,所述步骤S3中,构建特征增强层,基于所提取的潜在特征训练特征增强层以进一步增强特征表征能力,具体包括:
    首先对新数据集{X′,y}进行基于Bootstrap和RSM的采样,获取混合森林算法的第个J训练子集,如下:
    Figure PCTCN2022127864-appb-100059
    其中,
    Figure PCTCN2022127864-appb-100060
    Figure PCTCN2022127864-appb-100061
    为第个J训练子集的输入和输出,X′和y为新训练集的输入和输出,
    Figure PCTCN2022127864-appb-100062
    表示对第kth个混合森林组的Bootstrap采样,
    Figure PCTCN2022127864-appb-100063
    表示对第kth个混合森林组的RSM采样;
    接着,以第kth个混合森林组中第j个RF的构建为例,如下:
    Figure PCTCN2022127864-appb-100064
    其中,
    Figure PCTCN2022127864-appb-100065
    表示特征增强层中第kth个混合森林组中RF的第jth个决策树;L表示决策树叶节点的数量;c l采用递归分裂方式计算,具体过程公式(3)-(5);
    进而,可得到特征增强层中第kth个混合森林组中的RF模型,其表示为,
    Figure PCTCN2022127864-appb-100066
    然后,类似地以第kth个混合森林组中的第j个CRF的构建为例,如下:
    Figure PCTCN2022127864-appb-100067
    其中,
    Figure PCTCN2022127864-appb-100068
    表示特征增强层中第kth个混合森林组中CRF的第jth个决策树;c l采用递归分裂方式计算,具体过程见公式(6)-(7);
    进而,可得到特征增强层中第kth个混合森林组的CRF模型,其表示为,
    Figure PCTCN2022127864-appb-100069
    通过上述过程,得到第kth个混合森林组
    Figure PCTCN2022127864-appb-100070
    进而,第kth个增强特征可表示如下:
    Figure PCTCN2022127864-appb-100071
    其中,
    Figure PCTCN2022127864-appb-100072
    表示第kth个混合森林组对新数据中第1个样本的增强映射,
    Figure PCTCN2022127864-appb-100073
    表示第kth个混合森林组对新数据中第n Rawth个样本的增强映射,
    Figure PCTCN2022127864-appb-100074
    表示第kth个混合森林组对新数据中第N Rawth个样本的增强映 射;
    最后,特征增强层的输出H K表示如下:
    Figure PCTCN2022127864-appb-100075
    其中,H 1为第1个增强特征,H 2为第2个增强特征,H K为第K个增强特征;
    当不考虑增量学习策略时,BHFR模型的表示如下:
    Figure PCTCN2022127864-appb-100076
    其中,G K表示特征映射层与特征增强层输出的组合,即G K=[Z N|H K],其包含N Raw个样本和(2N+2K)维特征;W K表示特征映射层和特征增强层与输出层间的权重,其计算如下:
    $W^K = (\lambda I + [G^K]^{\mathrm T} G^K)^{-1} [G^K]^{\mathrm T} Y$    (27)
    其中,Ι表示单位矩阵,λ表示正则项系数;相应地,G K的伪逆计算可表示为:
    Figure PCTCN2022127864-appb-100077
  5. 根据权利要求4所述的基于宽度混合森林回归的MSWI过程二噁英排放软测量方法,其特征在于,所述步骤S4,构建增量学习层,通过增量式学习策略构建增量学习层,采用Moore-Penrose伪逆获得权重矩阵,进而实现
    BHFR软测量模型的高精度建模,具体包括:
    首先,对新数据集{X′,y}进行基于Bootstrap和RSM的采样,获取混合森林算法训练子集,过程如下:
    Figure PCTCN2022127864-appb-100078
    其中,
    Figure PCTCN2022127864-appb-100079
    Figure PCTCN2022127864-appb-100080
    为混合森林算法第个J训练子集的输入和输出,X′和y为新训练集的输入和输出,
    Figure PCTCN2022127864-appb-100081
    Figure PCTCN2022127864-appb-100082
    表示增量学习层中第pth个混合森林组的Bootstrap采样和RSM采样;
    接着,构建第pth个混合森林组中的决策树
    Figure PCTCN2022127864-appb-100083
    Figure PCTCN2022127864-appb-100084
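    are built by the same procedure as in the feature mapping layer and the feature enhancement layer, which is not repeated here;
    Further, after one hybrid forest group is added, the output $G^{K+1}$ of the feature mapping layer, feature enhancement layer and incremental learning layer is expressed as:
    Figure PCTCN2022127864-appb-100085
    where $G^K = [Z^N | H^K]$ contains $N_{Raw}$ samples and (2N+2K)-dimensional features, and $G^{K+1}$ contains $N_{Raw}$ samples and (2N+2K+2J)-dimensional features;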
    然后,进行G K+1的Moore-Penrose逆矩阵的递推更新,如下:
    Figure PCTCN2022127864-appb-100086
    其中,矩阵C和矩阵D的计算如下:
    $C = H_{K+1} - G^K D$    (32)
    Figure PCTCN2022127864-appb-100087
    进而,G K+1的Moore-Penrose逆矩阵的递推公式如下:
    Figure PCTCN2022127864-appb-100088
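    Further, the updated weight matrix $W^{K+1}$ between the feature mapping, feature enhancement and incremental learning layers and the output layer is computed as:
    Figure PCTCN2022127864-appb-100089
    where $W^K = (\lambda I + [G^K]^{\mathrm T} G^K)^{-1} [G^K]^{\mathrm T} Y$;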
    由于采用上述伪逆更新策略只需要计算增量学习层混合森林组的伪逆矩阵,因此能够实现快速的增量式学习;
    进一步,根据训练误差的收敛程度实现自适应增量学习;
    定义误差的收敛阈值为θ Con用以确定增量学习中混合森林组的数量p;相应地,BHFR模型的增量学习训练误差表示如下:
    Figure PCTCN2022127864-appb-100090
    其中,
    Figure PCTCN2022127864-appb-100091
    表示增量学习第p+1个与第p个混合森林组的训练误差值,
    Figure PCTCN2022127864-appb-100092
    Figure PCTCN2022127864-appb-100093
    表示包含p个和p+1个混合森林组的BHFR模型训练误差;
    最终,所提BHFR软测量模型的预测输出
    Figure PCTCN2022127864-appb-100094
    为:
    Figure PCTCN2022127864-appb-100095
PCT/CN2022/127864 2022-01-19 2022-10-27 基于宽度混合森林回归的mswi过程二噁英排放软测量方法 WO2023138140A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210059984.5A CN114398836A (zh) 2022-01-19 2022-01-19 基于宽度混合森林回归的mswi过程二噁英排放软测量方法
CN202210059984.5 2022-01-19

Publications (1)

Publication Number Publication Date
WO2023138140A1 true WO2023138140A1 (zh) 2023-07-27

Family

ID=81231725

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127864 WO2023138140A1 (zh) 2022-01-19 2022-10-27 基于宽度混合森林回归的mswi过程二噁英排放软测量方法

Country Status (2)

Country Link
CN (1) CN114398836A (zh)
WO (1) WO2023138140A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738866A (zh) * 2023-08-11 2023-09-12 中国石油大学(华东) 一种基于时间序列特征提取的即时学习的软测量建模方法
CN117970428A (zh) * 2024-04-02 2024-05-03 山东省地质科学研究院 基于随机森林算法的地震信号识别方法、装置及设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398836A (zh) * 2022-01-19 2022-04-26 北京工业大学 基于宽度混合森林回归的mswi过程二噁英排放软测量方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960873A (zh) * 2019-03-24 2019-07-02 北京工业大学 一种城市固废焚烧过程二噁英排放浓度软测量方法
CN111462835A (zh) * 2020-04-07 2020-07-28 北京工业大学 一种基于深度森林回归算法的二噁英排放浓度软测量方法
WO2021159585A1 (zh) * 2020-02-10 2021-08-19 北京工业大学 一种二噁英排放浓度预测方法
CN114398836A (zh) * 2022-01-19 2022-04-26 北京工业大学 基于宽度混合森林回归的mswi过程二噁英排放软测量方法

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
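CN109960873A (zh) * 2019-03-24 2019-07-02 Beijing University of Technology A soft measurement method for dioxin emission concentration in the municipal solid waste incineration process
WO2021159585A1 (zh) * 2020-02-10 2021-08-19 Beijing University of Technology A method for predicting dioxin emission concentration
CN111462835A (zh) * 2020-04-07 2020-07-28 Beijing University of Technology A soft measurement method for dioxin emission concentration based on the deep forest regression algorithm
CN114398836A (zh) * 2022-01-19 2022-04-26 Beijing University of Technology Soft measurement method for dioxin emissions in the MSWI process based on broad hybrid forest regression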

Non-Patent Citations (2)

Title
XIA HENG; TANG JIAN; QIAO JUNFEI; YAN AIJUN; GUO ZIHAO: "Soft Measuring Method of Dioxin Emission Concentration for MSWI Process Based on RF and GBDT", 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC), IEEE, 22 August 2020 (2020-08-22), pages 2173 - 2178, XP033809116, DOI: 10.1109/CCDC49329.2020.9164125 *
ZHAN CHOUJUN; ZHENG YUFAN; ZHANG HAIJUN; WEN QUANSI: "Random-Forest-Bagging Broad Learning System With Applications for COVID-19 Pandemic", IEEE INTERNET OF THINGS JOURNAL, IEEE, USA, vol. 8, no. 21, 17 March 2021 (2021-03-17), USA, pages 15906 - 15918, XP011884701, DOI: 10.1109/JIOT.2021.3066575 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
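CN116738866A (zh) * 2023-08-11 2023-09-12 China University of Petroleum (East China) A soft measurement modeling method using just-in-time learning based on time-series feature extraction
CN116738866B (zh) * 2023-08-11 2023-10-27 China University of Petroleum (East China) A soft measurement modeling method using just-in-time learning based on time-series feature extraction
CN117970428A (zh) * 2024-04-02 2024-05-03 Shandong Institute of Geological Sciences Seismic signal identification method, apparatus, and device based on the random forest algorithm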

Also Published As

Publication number Publication date
CN114398836A (zh) 2022-04-26

Similar Documents

Publication Publication Date Title
WO2023138140A1 (zh) Soft measurement method for dioxin emissions in the MSWI process based on broad hybrid forest regression
Xia et al. Dioxin emission prediction based on improved deep forest regression for municipal solid waste incineration process
Bodha et al. A player unknown's battlegrounds ranking based optimization technique for power system optimization problem
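CN108549792B (zh) A soft measurement method for dioxin emission concentration in the solid waste incineration process based on a latent structure mapping algorithm
CN111144609A (zh) Method for building a boiler exhaust gas emission prediction model, prediction method, and apparatus
CN111260149B (zh) A method for predicting dioxin emission concentration
Noushabadi et al. Estimation of higher heating values (HHVs) of biomass fuels based on ultimate analysis using machine learning techniques and improved equation
CN110135057B (zh) Soft measurement method for dioxin emission concentration in the solid waste incineration process based on multi-layer feature selection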
Ibikunle et al. Modelling the energy content of municipal solid waste and determination of its physico-chemical correlation using multiple regression analysis
Yildirim et al. Statistical optimization of dilute acid pretreatment of lignocellulosic biomass by response surface methodology to obtain fermentable sugars for bioethanol production
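CN114266461A (zh) Early-warning method for dioxin emission risk in the MSWI process based on a visualized-distribution GAN
CN114330845A (zh) Dioxin emission prediction method for the MSWI process based on multi-window concept drift detection
WO2023231667A1 (zh) Soft measurement method for dioxin emissions in the MSWI process based on ensemble T-S fuzzy regression trees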
Kumar et al. Development of lower heating value prediction models and estimation of energy recovery potential of municipal solid waste and RDF incineration
Olabi et al. Application of artificial intelligence to maximize methane production from waste paper
CN111462835B (zh) A soft measurement method for dioxin emission concentration based on the deep forest regression algorithm
Ma et al. Supercritical water gasification of organic solid waste: H2 yield and cold gas efficiency optimization considering modeling uncertainties
Bojanovský et al. Rotary Kiln, a Unit on the Border of the Process and Energy Industry—Current State and Perspectives
Cui et al. Multi-condition operational optimization with adaptive knowledge transfer for municipal solid waste incineration process
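CN114881355A (zh) A multi-operating-condition prediction method for desulfurization systems based on the extreme learning machine
WO2024146070A1 (zh) A soft measurement method for dioxin emission concentration based on an improved generative adversarial network
Andrade et al. A review on the interplay between bioeconomy and soil organic carbon stocks maintenance.
CN113780384B (zh) Method for predicting key controlled variables in the municipal solid waste incineration process based on an ensemble decision tree algorithm
Zhang et al. CO emission predictions in municipal solid waste incineration based on reduced depth features and long short-term memory optimization
CN113780383B (zh) Dioxin emission concentration prediction method based on an ensemble of semi-supervised random forest and deep forest regression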

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 18276179; Country of ref document: US
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22921564; Country of ref document: EP; Kind code of ref document: A1