WO2021159585A1 - Method for predicting dioxin emission concentration - Google Patents

Method for predicting dioxin emission concentration

Info

Publication number
WO2021159585A1
Authority
WO
WIPO (PCT)
Prior art keywords
dxn
sub
model
gbdt
training
Prior art date
Application number
PCT/CN2020/080528
Other languages
English (en)
French (fr)
Inventor
汤健
夏恒
乔俊飞
郭子豪
Original Assignee
北京工业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京工业大学
Publication of WO2021159585A1
Priority to US17/544,213 (published as US20220092482A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/0004 Gaseous mixtures, e.g. polluted air
    • G01N33/0009 General constructional details of gas analysers, e.g. portable test equipment
    • G01N33/0027 General constructional details of gas analysers, e.g. portable test equipment concerning the detector
    • G01N33/0036 General constructional details of gas analysers, e.g. portable test equipment concerning the detector specially adapted to detect a particular component
    • G01N33/0047 Organic compounds
    • G01N33/0049 Halogenated organic compounds



Abstract

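The invention discloses a method for predicting dioxin (DXN) emission concentration based on a hybrid ensemble of random forest (RF) and gradient boosting decision trees (GBDT). First, the training samples and input features of the DXN modeling data, which are small-sample and high-dimensional, are randomly sampled to generate training subsets. Next, J RF-based DXN sub-models are built on the training subsets. Then, I iterations are performed for each RF-based DXN sub-model, yielding J×I GBDT-based DXN sub-models. Finally, the predicted outputs of the RF-based and GBDT-based DXN sub-models are combined by simple average weighting to obtain the final output. Building the DXN prediction model as an ensemble of RF and GBDT can improve the accuracy of online DXN prediction, support operational optimization of the operating parameters of the municipal solid waste incineration (MSWI) process, and improve the economic benefits of the enterprise.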

Description

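Method for Predicting Dioxin Emission Concentration
Technical Field
The invention belongs to the technical field of municipal solid waste incineration, and in particular relates to a method for predicting dioxin emission concentration based on a hybrid ensemble of random forest and gradient boosting decision trees.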
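Background
Rapid economic development and continuing urbanization have sharply increased the amount of municipal solid waste (MSW) generated in China. In economically developed and densely populated regions in particular, some cities face a "garbage siege" crisis [1]. Municipal solid waste incineration (MSWI) for power generation is a typical treatment approach that achieves waste reduction, resource recovery, and harmless disposal [2]. China now has more than 300 MSWI power plants, and grate-type incinerators account for more than two thirds of them [3]. Because of the particular composition of waste in China, imported incineration equipment is mostly operated under manual control and is often poorly adapted to local conditions, which also leads to problems such as MSWI emissions that fail to meet standards [4]. The most pressing question is therefore how to control pollutant emissions from the MSWI process while remaining economically viable [5]. Dioxins (DXN), highly toxic persistent organic pollutants emitted by MSWI that have very high chemical and thermal stability, are one of the main causes of the "not in my backyard" (NIMBY) effect surrounding the construction of incineration plants [6].
In actual industrial practice, DXN emission concentration is measured periodically by combining online sampling with offline laboratory analysis [3]. This approach is expensive and slow, and its main drawback is that it cannot support real-time optimal control of MSWI operating parameters aimed at minimizing DXN emission concentration [7]. Online prediction of DXN emission concentration is therefore necessary. The MSWI process has complex physical and chemical characteristics, which makes an accurate mechanistic model of DXN emission concentration difficult to establish [8]. Online prediction of DXN emission concentration is an indispensable part of optimal control of the MSWI process [9]. Current research on online DXN detection mostly measures related indicator compounds first and then predicts DXN online through a mapping relationship [10,11,12], but it suffers from expensive equipment, weak adaptability, and prediction accuracy that still needs improvement [3]. Soft-sensing methods can predict hard-to-measure parameters faster and more economically than direct offline analysis or indicator-compound detection, and they are widely used in industry [13]. For the MSWI process, studies exist that build DXN prediction models by combining feature selection with neural networks [14,15,16]. Because DXN modeling data are few in number, high-dimensional, and collinear, these methods are prone to local minima, overfitting, and poor generalization.
Given the limitations of traditional single prediction models, prediction models based on ensemble learning have become a research focus. The random forest (RF) algorithm handles noise well and models nonlinear data well [17,18], but it is rarely used for nonlinear regression [19]. Reference [20] targets electrostatic sensor arrays and uses an RF-based ensemble model to predict the moisture content of biomass in a fluidized bed. Reference [21] proposes a soft-sensing model based on principal component analysis and RF for online prediction of the tensile properties of polylactide during twin-screw extrusion. Reference [22] proposes an RF model with self-monitoring for online estimation of the P80 particle size in a mill. In contrast to RF, which builds a parallel ensemble by sampling the modeling data, the gradient boosting decision tree (GBDT) is another popular machine learning algorithm [23], but its efficiency and scalability still need improvement when the feature dimension is high and the number of samples is large [24]. Reference [25] combines logistic regression (LR), GBDT, and voting feature intervals (VFI) to assess landslide susceptibility. Reference [26] uses GBDT to predict building energy consumption. Reference [27] builds a GBDT-based prediction model that automatically determines the load period of a power system. Reference [28] proposes a GBDT-based photovoltaic power prediction model whose main idea is to combine binary trees through gradient boosting. Reference [29] combines an instance-based transfer learning method with GBDT to build a quantile regression model for wind power. Reference [30] combines GBDT with the Bagging ensemble learning framework to build a prediction model. These studies mostly model with RF or GBDT alone, which makes it difficult to build an effective DXN emission concentration prediction model from small-sample, high-dimensional data.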
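Summary of the Invention
Dioxins (DXN) are highly toxic pollutants emitted by the municipal solid waste incineration (MSWI) process. In current industrial practice, DXN emission concentration is measured by collecting flue gas samples on site and then analyzing them in a laboratory, which is slow and costly. This application uses process variables collected in real time by the process control system to build a DXN emission concentration prediction model based on a hybrid ensemble of random forest (RF) and gradient boosting decision trees (GBDT). First, the training samples and input features of the DXN modeling data, which are small-sample and high-dimensional, are randomly sampled to generate training subsets. Next, J RF-based DXN sub-models are built on the training subsets. Then, I iterations are performed for each RF-based DXN sub-model, yielding J×I GBDT-based DXN sub-models. Finally, the predicted outputs of the RF-based and GBDT-based DXN sub-models are combined by simple average weighting to obtain the final output. Building the DXN prediction model as an ensemble of RF and GBDT can improve the accuracy of online DXN prediction, support operational optimization of the operating parameters of the MSWI process, and improve the economic benefits of the enterprise.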
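Brief Description of the Drawings
Figure 1: Municipal solid waste incineration process flow;
Figure 2: Modeling strategy;
Figure 3: Prediction curves for the training data;
Figure 4: Prediction curves for the test data.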
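Detailed Description
Description of the MSWI process with respect to DXN generation
MSW is transported by vehicle, weighed on a weighbridge, and unloaded into the waste pit. After 3 to 7 days of biological fermentation and dewatering, a grab crane drops the MSW into the feed hopper, and the feeder pushes it onto the incinerator grate, where it passes through three main stages: drying, combustion, and burnout. The combustible components of the dried MSW ignite and burn with the combustion air supplied by the primary air fan. The resulting ash and slag fall from the end of the grate onto the slag conveyor and then into the slag pit, and are finally landfilled at a designated site. The high-temperature flue gas produced during combustion should be kept above 850°C in the first combustion chamber to ensure that harmful gases decompose and burn. As the flue gas passes through the second combustion chamber, air supplied by the secondary air fan creates strong turbulence and ensures a flue gas residence time of more than 2 s, so that harmful gases decompose further. The high-temperature flue gas then enters the waste heat boiler system, where the high-temperature steam generated by heat absorption drives the turbine generator set to produce electricity. The flue gas is then mixed with lime and activated carbon and enters the deacidification reactor, where a neutralization reaction takes place and the DXN and heavy metals in the gas are adsorbed. Flue gas particulates, neutralization products, and activated carbon adsorbates are then removed in the bag filter, and part of the fly ash mixture is mixed with water in the mixer and returned to the deacidification reactor for repeated treatment. The fly ash from the reactor and the bag filter enters the fly ash tank and must be transported to the relevant facility for further treatment. The final exhaust gas is discharged to the atmosphere through the stack by the induced draft fan; it contains dust, CO, NOx, SO2, HCl, HF, Hg, Cd, DXN, and other substances.
As Figure 1 shows, the MSWI process mainly converts MSW into residue, fly ash, flue gas, and heat, of which residue, fly ash, and flue gas are related to DXN emissions [31]. Furnace residue is produced in large quantities but has a low DXN concentration; fly ash is produced in smaller quantities than residue but has a higher DXN concentration; the DXN in flue gas is formed in two ways, by incomplete combustion and by de novo synthesis [32]. At present, DXN is mainly measured by enterprises and environmental protection authorities through offline laboratory analysis on a monthly or quarterly cycle, which is both slow and expensive. As a result, DXN modeling data have few ground-truth samples and high-dimensional process variables. In addition, the DXN content of the MSW is unknown, and the mechanisms of the DXN generation and adsorption stages are complex and poorly understood. Building a DXN emission concentration prediction model with soft-sensing technology therefore meets a practical need.
This document proposes a DXN modeling strategy based on a hybrid ensemble of RF and GBDT (EnRFGBDT). It comprises four modules: random sampling of training samples and input features, RF-based DXN sub-model construction, GBDT-based DXN sub-model construction, and simple-average DXN ensemble prediction, as shown in Figure 2.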
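In Figure 2, $X \in \mathbb{R}^{N \times M}$ denotes the input data, made up of the process variables (input features) that the process control system collects during the same period in which the DXN assay samples are taken, such as furnace temperature, activated carbon injection rate, stack gas concentrations, grate speed, and primary and secondary air flow rates, where $N$ is the number of training samples and $M$ is the number of process variables. $y \in \mathbb{R}^{N \times 1}$ denotes the output data, made up of the DXN emission concentrations sampled online at the end of the MSWI process, that is at the stack outlet, and assayed offline. $\{X, y\}$ denotes the training sample set formed by the input and output data. $\{X_j, y_j\}$ denotes the $j$-th training subset randomly sampled from $\{X, y\}$, and $\{X_j, y_j\}_{j=1}^{J}$ denotes all training subsets; $J$ is the number of training subsets and also the number of RF-based DXN sub-models. $\hat{y}_{RF}^{j}$ denotes the DXN emission concentration predicted by the $j$-th RF-based DXN sub-model $f_{RF}^{j}(\cdot)$, and $\{\hat{y}_{RF}^{j}\}_{j=1}^{J}$ denotes the predicted outputs of all RF-based DXN sub-models. $e_{j,0}$ denotes the error between the prediction $\hat{y}_{RF}^{j}$ of the $j$-th RF-based DXN sub-model and the measured value $y_j$. $e_{j,1}$ denotes the error between the error prediction $\hat{e}_{j,0}$ of the first GBDT-based DXN sub-model built on the $j$-th training subset and its output target $e_{j,0}$. $e_{j,i}$ denotes the error between the error prediction $\hat{e}_{j,i-1}$ of the $i$-th GBDT-based DXN sub-model $f_{GBDT}^{j,i}(\cdot)$ for the $j$-th training subset and its output target $e_{j,i-1}$. $\{\hat{e}_{j,i-1}\}_{i=1}^{I}$ denotes the error predictions of all GBDT-based DXN sub-models for the $j$-th training subset, where $I$ is the number of GBDT-based DXN sub-models per training subset and also the number of iterations per training subset. $\hat{y}$ denotes the DXN emission concentration predicted by the hybrid ensemble model.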
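All sub-models inside the proposed EnRFGBDT model are built with CART regression trees grown to maximum size. The training subsets of the RF-based DXN sub-models and their input features are generated by random sampling, and the number of features in each subset is far smaller than in the original modeling data, which reduces the correlation between the CART regression trees and improves robustness to outliers and noisy data. The multiple serial GBDT-based DXN sub-models further improve the prediction accuracy of the CART regression trees. The result is a DXN ensemble prediction model with a "parallel + serial" structure. The functions of the modules are as follows:
(1) Random sampling module for training samples and input features: the training sample set $\{X \in \mathbb{R}^{N \times M}, y \in \mathbb{R}^{N \times 1}\}$ is sampled randomly $N$ times with replacement, and a fixed number of input features is selected at random, generating the training subsets $\{X_j, y_j\}_{j=1}^{J}$.
(2) RF-based DXN sub-model construction module: the training subsets $\{X_j, y_j\}_{j=1}^{J}$ generated by the previous module are used to build the RF-based DXN sub-models $\{f_{RF}^{j}(\cdot)\}_{j=1}^{J}$; the predicted DXN emission concentrations $\{\hat{y}_{RF}^{j}\}_{j=1}^{J}$ and the measured values $\{y_j\}_{j=1}^{J}$ are subtracted to obtain the prediction errors $\{e_{j,0}\}_{j=1}^{J}$.
(3) GBDT-based DXN sub-model construction module: the errors $\{e_{j,0}\}_{j=1}^{J}$ output by the previous module serve as output targets and, together with the training subset input data $\{X_j\}_{j=1}^{J}$, form the new training subsets $\{X_j, e_{j,0}\}_{j=1}^{J}$. After $I$ iterations for each training subset, $I \times J$ GBDT-based DXN sub-models $\{\{f_{GBDT}^{j,i}(\cdot)\}_{i=1}^{I}\}_{j=1}^{J}$ are built.
(4) Simple-average DXN ensemble prediction module: the RF-based DXN sub-models $\{f_{RF}^{j}(\cdot)\}_{j=1}^{J}$ and the GBDT-based DXN sub-models $\{\{f_{GBDT}^{j,i}(\cdot)\}_{i=1}^{I}\}_{j=1}^{J}$ are combined by simple averaging to build the final DXN emission concentration prediction model.
Combining the functions of the modules above, the modeling steps of the proposed method are as follows. Step 1: sample the MSWI process data randomly with replacement and randomly draw a specified number of features, generating $J$ training subsets. Step 2: build $J$ RF-based DXN sub-models $\{f_{RF}^{j}(\cdot)\}_{j=1}^{J}$. Step 3: with the prediction errors $\{e_{j,0}\}_{j=1}^{J}$ of $\{f_{RF}^{j}(\cdot)\}_{j=1}^{J}$ as output targets, perform $I$ learning iterations to obtain $I \times J$ GBDT-based DXN sub-models $\{\{f_{GBDT}^{j,i}(\cdot)\}_{i=1}^{I}\}_{j=1}^{J}$. Step 4: combine the RF-based and GBDT-based DXN sub-models by simple average weighting to obtain the final DXN emission concentration ensemble prediction model.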
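The random sampling module for training samples and input features works as follows:
The MSWI process data are processed with bootstrap sampling (Bootstrap) and the random subspace method (RSM). Bootstrap is used to draw a training subset with the same number of samples as the training sample set; the RSM mechanism then selects a random subset of the features. This produces $J$ training subsets, each containing $N$ samples and $M_j$ features.
The generation of a training subset can be expressed as:
$$\{X_j, y_j\} = \big\{(x_{j,1}, \dots, x_{j,m}, \dots, x_{j,M_j})_n, \ (y_j)_n\big\}_{n=1}^{N}$$
where $\{X_j, y_j\}$ denotes the $j$-th training subset; $\{(x_{j,1}, \dots, x_{j,M_j})_n, (y_j)_n\}$ denotes the $n$-th input and output sample pair of the $j$-th training subset; $m = 1, \dots, M_j$, and $M_j$ denotes the number of input features in the $j$-th training subset, with $M_j \ll M$ in general.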
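The sampling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function name `make_training_subsets`, the fixed seed, and the toy data are hypothetical, and every subset is given the same number of features for simplicity.

```python
import random

def make_training_subsets(X, y, n_subsets, n_features, seed=0):
    """Build J training subsets with Bootstrap (rows) and RSM (columns).

    Hypothetical helper: returns a list of (rows, feats, X_j, y_j) tuples.
    """
    rng = random.Random(seed)
    n_samples, n_total_features = len(X), len(X[0])
    subsets = []
    for _ in range(n_subsets):
        # Bootstrap: draw N row indices with replacement.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        # RSM: draw M_j distinct feature indices without replacement.
        feats = sorted(rng.sample(range(n_total_features), n_features))
        X_j = [[X[r][f] for f in feats] for r in rows]
        y_j = [y[r] for r in rows]
        subsets.append((rows, feats, X_j, y_j))
    return subsets

# Toy data (made-up values): 6 samples, 5 process variables.
X = [[float(10 * n + m) for m in range(5)] for n in range(6)]
y = [0.01 * n for n in range(6)]
subsets = make_training_subsets(X, y, n_subsets=3, n_features=2)
```

Keeping the drawn row indices alongside each subset makes it straightforward to identify the samples that were never drawn, which the experiments later use as out-of-bag test data.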
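The RF-based DXN sub-model construction module works as follows:
The construction is described using the $j$-th training subset $\{X_j, y_j\}$ as an example.
First, the duplicate samples that random sampling introduced into the training subset $\{X_j, y_j\}$ are removed, and the deduplicated subset is denoted $\{X_j', y_j'\}$. Taking the $m$-th input feature $x_{j,m}$ as the splitting variable and the value $x_{j,m}^{n_{sel}}$ of the $n_{sel}$-th sample as the split point, the input feature space is divided into two regions $R_1$ and $R_2$:
$$R_1 = \{x \mid x_{j,m} \le x_{j,m}^{n_{sel}}\}, \qquad R_2 = \{x \mid x_{j,m} > x_{j,m}^{n_{sel}}\}$$
The best splitting variable (input feature) index and split point value are found by exhaustive search under the following criterion:
$$\min_{m,\ n_{sel}} \Big[ \min_{C_1} \sum_{y_j^{R_1} \in R_1} \big(y_j^{R_1} - C_1\big)^2 + \min_{C_2} \sum_{y_j^{R_2} \in R_2} \big(y_j^{R_2} - C_2\big)^2 \Big]$$
where $y_j^{R_1}$ and $y_j^{R_2}$ denote the measured DXN emission concentrations of the $j$-th training subset in regions $R_1$ and $R_2$ respectively, and $C_1$ and $C_2$ denote the means of the measured DXN emission concentrations in regions $R_1$ and $R_2$ respectively.
Under this criterion, all input features are first traversed to find the optimal splitting variable index and split point value, and the input feature space is divided into two regions. The same procedure is then repeated on each region until the number of training samples in a leaf node is smaller than the preset threshold $\theta_{RF}$. The input feature space is finally divided into $K$ regions (where $K$ is also the number of leaf nodes of the CART regression tree), which are labeled $R_1, \dots, R_k, \dots, R_K$.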
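A single step of this exhaustive split search can be sketched as follows. The function name `best_split` and the toy data are hypothetical, and the sketch covers only one split; growing a full CART regression tree would apply it recursively to each region until the leaf-size threshold is reached.

```python
def best_split(X, y):
    """Return (loss, m, s): the feature index m and split value s that
    minimise the summed squared error of the two resulting regions."""
    best = None
    n_features = len(X[0])
    for m in range(n_features):
        # Candidate split points are the values observed for feature m.
        for s in sorted({row[m] for row in X}):
            left = [t for row, t in zip(X, y) if row[m] <= s]   # region R1
            right = [t for row, t in zip(X, y) if row[m] > s]   # region R2
            if not left or not right:
                continue
            c1 = sum(left) / len(left)    # mean of the targets in R1
            c2 = sum(right) / len(right)  # mean of the targets in R2
            loss = (sum((t - c1) ** 2 for t in left)
                    + sum((t - c2) ** 2 for t in right))
            if best is None or loss < best[0]:
                best = (loss, m, s)
    return best

# Toy data (made-up values): feature 0 separates the targets cleanly.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [0.1, 0.1, 0.9, 0.9]
loss, m, s = best_split(X, y)
```

In this toy case the search selects feature 0 with split value 2.0, because that split places the two low targets in one region and the two high targets in the other.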
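The RF-based DXN sub-model built with a CART regression tree can be expressed as:
$$f_{RF}^{j}(\cdot) = \sum_{k=1}^{K} C_k \, I(x_j \in R_k)$$
where
$$C_k = \frac{1}{N_{R_k}} \sum_{n_{R_k}=1}^{N_{R_k}} \big(y_j^{R_k}\big)_{n_{R_k}}$$
where $N_{R_k}$ denotes the number of training samples in region $R_k$; $(y_j^{R_k})_{n_{R_k}}$ denotes the $n_{R_k}$-th measured DXN emission concentration of the $j$-th training subset in region $R_k$; and $I(\cdot)$ is the indicator function, with $I(\cdot) = 1$ when $x_j \in R_k$ and $I(\cdot) = 0$ otherwise.
The prediction error of the RF-based DXN sub-model built on the $j$-th training subset $\{X_j, y_j\}$ is
$$e_{j,0} = y_j - \hat{y}_{RF}^{j} = \big[(e_{j,0})_1, \dots, (e_{j,0})_n, \dots, (e_{j,0})_N\big]^{T}$$
where $(e_{j,0})_n$ denotes the DXN emission concentration prediction error for the $n$-th training sample.
Repeating this procedure gives the $J$ RF-based DXN sub-models $\{f_{RF}^{j}(\cdot)\}_{j=1}^{J}$ built with CART regression trees. Subtracting the predicted outputs $\{\hat{y}_{RF}^{j}\}_{j=1}^{J}$ of these sub-models from the measured DXN values $\{y_j\}_{j=1}^{J}$ gives the output errors $\{e_{j,0}\}_{j=1}^{J}$.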
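The GBDT-based DXN sub-model construction module works as follows:
The GBDT-based DXN sub-models in this application are realized by building several weak learner models "in series". The input data of the training subsets of these weak learners stay unchanged. The output target of the training subset of the first sub-model is the error between the prediction of the RF-based sub-model and the measured value; every other sub-model takes the prediction error of the GBDT sub-model from the previous iteration as the output target of its training subset.
The construction is illustrated for the GBDT-based DXN sub-models of the $j$-th training subset. Assume that $I$ GBDT-based DXN sub-models are to be built, all with CART regression trees.
First, the first sub-model $f_{GBDT}^{j,1}(\cdot)$ is built. It can be expressed as
$$f_{GBDT}^{j,1}(\cdot) = \arg\min \sum_{n=1}^{N} L\big((e_{j,0})_n, (\hat{e}_{j,0})_n\big)$$
where $\hat{e}_{j,0}$ denotes the predicted output of the first GBDT-based DXN sub-model.
The loss function of this sub-model is defined as
$$L\big((e_{j,0})_n, (\hat{e}_{j,0})_n\big) = \big((e_{j,0})_n - (\hat{e}_{j,0})_n\big)^2$$
where $(\hat{e}_{j,0})_n$ denotes the predicted value for the $n$-th sample of the $j$-th training subset.
Then, the output residual $e_{j,1}$ of the sub-model $f_{GBDT}^{j,1}(\cdot)$ is computed as
$$e_{j,1} = e_{j,0} - \hat{e}_{j,0}$$
Next, $e_{j,1}$ serves as the output target of the training subset of the second GBDT-based DXN sub-model $f_{GBDT}^{j,2}(\cdot)$. Similarly, the second DXN sub-model can be expressed as
$$f_{GBDT}^{j,2}(\cdot) = \arg\min \sum_{n=1}^{N} L\big((e_{j,1})_n, (\hat{e}_{j,1})_n\big)$$
where $(e_{j,1})_n$ denotes the prediction error of the first GBDT-based DXN sub-model for the $n$-th sample.
Repeating this procedure, the $i$-th ($i \le I$) GBDT-based DXN sub-model is denoted $f_{GBDT}^{j,i}(\cdot)$, and its residual is computed as
$$e_{j,i} = e_{j,i-1} - \hat{e}_{j,i-1}$$
After $I-1$ iterations, the output target of the training subset of the $I$-th sub-model is
$$e_{j,I-1} = e_{j,I-2} - \hat{e}_{j,I-2}$$
where $\hat{e}_{j,I-2}$ is the predicted output of the $(I-1)$-th sub-model $f_{GBDT}^{j,I-1}(\cdot)$.
The $I$-th sub-model can then be expressed as
$$f_{GBDT}^{j,I}(\cdot) = \arg\min \sum_{n=1}^{N} L\big((e_{j,I-1})_n, (\hat{e}_{j,I-1})_n\big)$$
where $(e_{j,I-1})_n$ denotes the prediction error of the $(I-1)$-th GBDT-based DXN sub-model for the $n$-th sample.
The $I$ GBDT-based DXN sub-models built on the $j$-th training subset can therefore be expressed as $\{f_{GBDT}^{j,i}(\cdot)\}_{i=1}^{I}$, and their corresponding outputs as $\{\hat{e}_{j,i-1}\}_{i=1}^{I}$.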
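The serial residual-fitting loop can be sketched as below. Two simplifications are assumptions of this sketch rather than part of the described method: each weak learner is a one-split regression stump instead of a CART tree grown to maximum size, and `first_model` is a stand-in for the RF-based sub-model. All function names are hypothetical.

```python
def fit_stump(X, t):
    """Fit a one-split regression tree to targets t by squared error."""
    best = None
    for m in range(len(X[0])):
        # Skip the largest value so the right-hand region is never empty.
        for s in sorted({row[m] for row in X})[:-1]:
            left = [v for row, v in zip(X, t) if row[m] <= s]
            right = [v for row, v in zip(X, t) if row[m] > s]
            c1, c2 = sum(left) / len(left), sum(right) / len(right)
            loss = (sum((v - c1) ** 2 for v in left)
                    + sum((v - c2) ** 2 for v in right))
            if best is None or loss < best[0]:
                best = (loss, m, s, c1, c2)
    if best is None:  # no valid split: fall back to a constant model
        c = sum(t) / len(t)
        return lambda row: c
    _, m, s, c1, c2 = best
    return lambda row: c1 if row[m] <= s else c2

def fit_residual_chain(X, y, first_model, n_iter):
    """Fit n_iter sub-models in series, each on the previous residual."""
    residual = [yn - first_model(row) for row, yn in zip(X, y)]  # e_{j,0}
    models = []
    for _ in range(n_iter):
        g = fit_stump(X, residual)  # inputs unchanged, target = residual
        models.append(g)
        residual = [e - g(row) for row, e in zip(X, residual)]   # e_{j,i}
    return models, residual

def predict_chain(row, first_model, models):
    """Sum the first-stage prediction and all residual corrections."""
    return first_model(row) + sum(g(row) for g in models)

def sse(values):
    return sum(v * v for v in values)

# Toy data (made-up values).
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 3.0, 4.0]
first_model = fit_stump(X, y)  # stand-in for the RF-based sub-model
_, e0 = fit_residual_chain(X, y, first_model, n_iter=0)
models, eI = fit_residual_chain(X, y, first_model, n_iter=3)
```

Because each stump predicts the mean of the residuals in its two regions, the sum of squared residuals on the training subset cannot increase from one iteration to the next.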
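The simple-average DXN ensemble prediction module works as follows:
From the procedure above, the $J$ RF-based DXN sub-models can be expressed as $\{f_{RF}^{j}(\cdot)\}_{j=1}^{J}$; these models are built in parallel. The $J \times I$ GBDT-based DXN sub-models can be expressed as $\{\{f_{GBDT}^{j,i}(\cdot)\}_{i=1}^{I}\}_{j=1}^{J}$; these models are built both serially and in parallel.
For the $j$-th training subset, one RF-based and $I$ GBDT-based DXN sub-models are built. These sub-models are generated serially, and the sum of their predicted outputs is taken as the overall output for the $j$-th training subset, which can be expressed as
$$\hat{y}^{j} = f_{RF}^{j}(\cdot) + \sum_{i=1}^{I} f_{GBDT}^{j,i}(\cdot)$$
Because the $J$ training subsets are parallel to one another, the DXN sub-models above are combined by simple average weighting, and the final DXN emission concentration ensemble prediction model $f_{DXN}(\cdot)$ can be expressed as:
$$f_{DXN}(\cdot) = \frac{1}{J} \sum_{j=1}^{J} \Big[ f_{RF}^{j}(\cdot) + \sum_{i=1}^{I} f_{GBDT}^{j,i}(\cdot) \Big]$$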
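The combination rule itself is short enough to write out directly. In the sketch below the sub-models are plain Python callables with made-up constant outputs, and the function name `ensemble_predict` and the `feature_sets` argument are hypothetical; the only point illustrated is "sum within a training subset, average across subsets".

```python
def ensemble_predict(row, rf_models, gbdt_models, feature_sets):
    """Average, over the J subsets, of (RF output + sum of I GBDT outputs).

    rf_models[j]    : callable for the j-th RF-based sub-model
    gbdt_models[j]  : list of I callables fitted on successive residuals
    feature_sets[j] : indices of the M_j features used by subset j
    """
    total = 0.0
    for f_rf, chain, feats in zip(rf_models, gbdt_models, feature_sets):
        sub_row = [row[m] for m in feats]  # keep only this subset's features
        total += f_rf(sub_row) + sum(g(sub_row) for g in chain)
    return total / len(rf_models)

# Made-up sub-models with J = 2 subsets and I = 2 iterations each.
rf_models = [lambda r: 0.10, lambda r: 0.20]
gbdt_models = [
    [lambda r: 0.010, lambda r: 0.002],
    [lambda r: -0.020, lambda r: 0.004],
]
feature_sets = [[0], [1]]
prediction = ensemble_predict([1.0, 2.0], rf_models, gbdt_models, feature_sets)
```

Here the two per-subset outputs are 0.112 and 0.184, so the ensemble prediction is their mean, 0.148.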
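Experimental Validation
Modeling data
The modeling data are inspection data from furnaces 1# and 2# of an MSWI power plant in Beijing over the past 6 years, comprising process variables as input data and measured DXN emission concentrations as output data. The process variables come from the power generation system (53), the public electrical system (115), the waste heat boiler system (14), the incineration system (79), the flue gas treatment system (20), and the end-of-process detection system (6), for a total of 287 variables. The DXN emission concentration used as output data is obtained by online sampling followed by offline laboratory analysis, in units of ng/Nm³. Of the 67 samples, 2/3 (45) are used as training data and 1/3 (22) as test data.
Modeling experiments
In the experiments, both the RF and GBDT methods use squared error as the loss function. The number of randomly drawn samples is 45, the number of input features ranges over [10,20,30,40,50,60,70,80,90,100], the number of GBDT iterations ranges over [1,2,3,4,5,6,7,8,9], and the minimum number of samples in a leaf node of a CART regression tree is 3. The models are tested on the out-of-bag (OOB) data left over by Bootstrap sampling, with root mean square error (RMSE) as the evaluation metric.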
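The evaluation protocol rests on two small pieces: identifying the out-of-bag samples of a bootstrap draw and computing RMSE. A minimal sketch follows; the helper names, the seed, and the toy numbers are hypothetical, and only the sample count of 45 is taken from the text.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
    )

def bootstrap_with_oob(n_samples, rng):
    """Draw n_samples indices with replacement; return them together with
    the out-of-bag indices that were never drawn."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [n for n in range(n_samples) if n not in drawn]
    return in_bag, oob

rng = random.Random(0)
in_bag, oob = bootstrap_with_oob(45, rng)    # 45 training samples, as in the text
example_rmse = rmse([0.0, 0.0], [3.0, 4.0])  # equals sqrt((9 + 16) / 2)
```

A sub-model trained on the `in_bag` rows can then be scored with `rmse` on the `oob` rows, which gives an error estimate without touching the separate test set.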
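For the RF-based DXN prediction model, Table 1 gives the relationship between the number of input features and the OOB error when the number of CART regression trees is fixed at 5 (results are averages over 50 runs).
Table 1. OOB error for different numbers of features (table supplied as an image in the original publication)
According to Table 1, the OOB error is lowest when the number of features is 15. With the number of input features fixed, the relationship between the number of CART regression trees in the RF model and the OOB error is shown in Table 2 (results are averages over 50 runs).
Table 2. OOB error for different numbers of CART trees (table supplied as an image in the original publication)
According to Table 2, the RF-based DXN model has its smallest OOB error when the number of CART regression trees reaches 40, but this error is only slightly smaller than the minimum in Table 1. The number of regression trees and the number of input features therefore need to be optimized together in RF to obtain better prediction performance.
For the GBDT-based DXN prediction model, the relationship between the squared-error loss function and the number of iterations is shown in Table 3.
Table 3. Relationship between the number of iterations and the loss function in the GBDT prediction model (table supplied as an image in the original publication)
According to Table 3, the loss function value decreases gradually as the number of iterations increases, and the downward trend of the error weakens once the number of iterations reaches 5. Choosing a suitable number of iterations is therefore necessary to reduce computational cost.
Taking the modeling results of the RF and GBDT models above into account, the modeling parameters adopted for the method proposed in this application are: input feature dimension 10, number of CART regression trees 5, and number of GBDT sub-models (number of iterations) 5. The RMSE statistics of the different methods on the training and test sets are shown in Table 4. Figures 3 and 4 show the prediction curves of RF, GBDT, and the proposed method on the training data and the test data respectively.
Table 4. Statistical results of the DXN models built with RF, GBDT, and the proposed method (table supplied as an image in the original publication)
Table 4, Figure 3, and Figure 4 show the following. (1) The GBDT-based DXN model has the largest prediction error on the test set (0.03529), mainly because GBDT uses all process variables as input features of the DXN model, whereas the other two methods both reduce the input features by random selection. Feature selection on the high-dimensional process variables is therefore necessary. (2) The RF-based DXN model, with the number of CART regression trees set to 5 and the number of input features set to 15, has the largest RMSE on the training set (0.34060), while its RMSE on the test set (0.030199) is smaller than that of GBDT (0.035291), which indicates that RF generalizes better than GBDT. (3) The proposed EnRFGBDT method has the best prediction performance on both the training and test data, which indicates that the proposed strategy can reduce the input feature dimension and improve the generalization performance of the prediction model at the same time.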
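To address the difficulty of measuring dioxins (DXN) in real time, this document builds a hybrid ensemble DXN emission concentration prediction model based on random forest (RF) and gradient boosting decision trees (GBDT) using data from an actual municipal solid waste incineration process. Its novelty lies in building the first-layer DXN sub-models with RF and multiple further DXN sub-models with GBDT, which performs dimensionality reduction and reduces the model prediction error at the same time. Simulation experiments on real MSWI process data show that the proposed method predicts better than the single RF and GBDT prediction models.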
References
[1]Li X,Zhang C,Li Y,et al.The Status of Municipal Solid Waste Incineration(MSWI)in China and its Clean Development.Waste Management,2016,104:498-503.
[2]Li X,Zhang C,Li Y,et al.The Status of Municipal Solid Waste Incineration(MSWI)in China and its Clean Development.Waste Management,2016,104:498-503.
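[3]Qiao Junfei,Guo Zihao,Tang Jian.A review of dioxin emission concentration detection methods for the municipal solid waste incineration process[J/OL].Acta Automatica Sinica:1-26[2019-12-24].https://doi.org/10.16383/j.aas.c190005.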
[4]J.W.Lu,S.Zhang,J.Hai,et al.Status and perspectives of municipal solid waste incineration in China:a comparison with developed regions.Waste Manage.Vol.69,170-186,2017.
[5]Yuanan H,Hefa C,Shu T.The growing importance of waste-to-energy(WTE)incineration in China's anthropogenic mercury emissions:Emission inventories and reduction strategies[J].Renewable and Sustainable Energy Reviews,2018,97:119-137.
[6]Li X,Zhang C,Li Y,Zhi Q.The Status of Municipal Solid Waste Incineration(MSWI)in China and its Clean Development.Energy Procedia,2016,104:498-503
[7]Zhang H J,Ni Y W,Chen J P,Zhang Q.Influence of variation in the operating conditions on PCDD/F distribution in a full-scale MSW incinerator[J].Chemosphere,2008,70(4):721-730.
[8]B.R.Stanmore.Modeling the formation of PCDD/F in solid waste incinerators,Chemosphere,Vol.47,565-773,2002.
[9]Lavric E D,Konnov A A,Ruyck J D.Surrogate compounds for dioxins in incineration.A review.Waste Management,2005,25(7):755-765
[10]Li A-Dan,Hong-Wei,Wang Jing.Online detection of dioxin and dioxin-related substances using laser desoption/laser ionization-mass spectrometry.Journal of Yanshan University,2015,39(6):511-515.
[11]Cao Y,Shang Fan-Jie,Pan Deng-Gao.Gas Chromatography-Mass Spectrometry Transmission Line System for On-line Detection of Dioxins.China,CN206378474U,2017-08-04.
[12]Nakui H,Koyama H,Takakura A,Watanabe N.Online measurements of low-volatile organic chlorine for dioxin monitoring at municipal waste incinerators.Chemosphere,2011,85(2):151-155
[13]F.A.A.Souza,R.Araújo,J.Mendes,Review of soft sensor methods for regression applications,Chemometr.Intell.Lab.Syst.152(2016)69–79.
[14]Bunsan S,Chen W Y,Chen H W,Chuang Y H,Grisdanurak N.Modeling the dioxin emission of a municipal solid waste incinerator using neural networks.Chemosphere,2013,92:258-264.
[15]Chang N B,Chen W C.Prediction of PCDDs/PCDFs emissions from municipal incinerators by genetic programming and neural network modeling.Waste Management&Research,2000,18,41-351.
[16]Wang Hai-Rui,Zhang Yong,Wang Hua.A study of GA-BP based prediction model of Dioxin emission from MSW incinerator.Microcomputer Information,2008,24(21):222-224.
[17]F.Stulp,O.Sigaud,Many regression algorithms,one unified model:a review,Neural Network.69(2015)60–79.
[18]Breiman,L.,2001.Random Forests.Machine Learning.45,5-32.
[19]Kneale,C.,Brown,S.D.,2018.Small moving window calibration models for soft sensing processes with limited history.Chemometrics and Intelligent Laboratory Systems 183,36-46.
[20]Zhang,W.B.,Cheng,X.F.,Hu,Y.H.,Yan,Y.,2019.Online prediction of biomass moisture content in a fluidized bed dryer using electrostatic sensor arrays and the Random Forest method.Fuel 239,437-445.
[21]Mulrennan,K.,Donovan,J.,Creedon,L.,Rogers,I.,Lyons,J.G.,McAfee,M.,2018.A soft sensor for prediction of mechanical properties of extruded PLA sheet using an instrumented slit die and machine learning algorithms.Polymer Testing 69,462-469.
[22]Napier,L.F.A.,Aldrich,C.,2017.An IsaMill(TM)Soft Sensor based on Random Forests and Principal Component Analysis.Ifac Papersonline 50,1175-1180.
[23]Friedman J.Greedy function approximation:a gradient boosting machine.Annals of Statistics,2001,29(5)
[24]Ke,G.L.,Meng,Q.,Finley,T.,Wang,T.F.,Chen,W.,Ma,W.D.,Ye,Q.W.,Liu,T.Y.,2017.LightGBM:A Highly Efficient Gradient Boosting Decision Tree.Advances in Neural Information Processing Systems 30 (Nips 2017)30.
[25]Sachdeva,S.,Bhatia,T.,Verma,A.K.,2020.A novel voting ensemble model for spatial prediction of landslides using GIS.International Journal of Remote Sensing 41,929-952.
[26]Wang,R.,Lu,S.L.,Li,Q.P.,2019.Multi-criteria comprehensive study on predictive algorithm of hourly heating energy consumption for residential buildings.Sustainable Cities and Society 49.
[27]Chen,B.B.,Lin,R.H.,Zou,H.,2018.A Short Term Load Periodic Prediction Model Based on GBDT.2018 Ieee 18th International Conference on Communication Technology(Icct),1402-1406.
[28]Wang,J.D.,Li,P.,Ran,R.,Che,Y.B.,Zhou,Y.,2018.A Short-Term Photovoltaic Power Prediction Model Based on the Gradient Boost Decision Tree.Applied Sciences-Basel 8.
[29]Cai,L.,Gu,J.,Ma,J.H.,Jin,Z.J.,2019.Probabilistic Wind Power Forecasting Approach via Instance-Based Transfer Learning Embedded Gradient Boosting Decision Trees.Energies 12.
[30]Liu,X.L.,Tan,W.A.,Tang,S.,2019.A Bagging-GBDT ensemble learning model for city air pollutant concentration prediction.4th International Conference on Advances in Energy Resources and Environment Engineering 237.
[31]Mckay G.Dioxin characterisation,formation and minimisation during municipal solid waste(MSW)incineration:review.Chemical Engineering Journal,2002,86(3):343-368
[32]Li Hai-Ying,Zhang Shu-Ting,Zhao Xin-Hua.Detection methods of dioxins emitted from municipal solid waste incinerator.Journal of Fuel Chemistry and Technology,2005,33(3):379-384.

Claims (5)

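  1. A method for predicting dioxin (DXN) emission concentration, characterized by comprising the following steps:
    Step 1: by means of a training-sample and input-feature random sampling module, N random draws with replacement are performed on the training sample set {X ∈ R^(N×M), y ∈ R^(N×1)} and a fixed number of input features is randomly selected, generating the training subsets
    Figure PCTCN2020080528-appb-100001
    where
    Figure PCTCN2020080528-appb-100002
    denotes the input data, composed of process variables of the municipal solid waste incineration (MSWI) process collected by the process control system over the same period in which the DXN assay samples were taken (furnace temperature, activated carbon injection amount, stack emission gas concentrations, grate speed, and primary/secondary air flow), where N is the number of training samples and M is the number of process variables;
    Figure PCTCN2020080528-appb-100003
    denotes the output data, composed of DXN emission concentrations sampled online and assayed offline at the end of the MSWI process, i.e. at the stack outlet;
    Step 2: by means of a random forest (RF)-based DXN sub-model construction module, the generated training subsets
    Figure PCTCN2020080528-appb-100004
    are used to build the RF-based DXN sub-models
    Figure PCTCN2020080528-appb-100005
    and the difference between the predicted DXN emission concentrations
    Figure PCTCN2020080528-appb-100006
    and the measured values
    Figure PCTCN2020080528-appb-100007
    is taken to obtain the prediction errors
    Figure PCTCN2020080528-appb-100008
    Step 3: by means of a gradient boosting decision tree (GBDT)-based DXN sub-model construction module, the output errors
    Figure PCTCN2020080528-appb-100009
    are taken as the true output values and combined with the training-subset input data
    Figure PCTCN2020080528-appb-100010
    to form new training subsets
    Figure PCTCN2020080528-appb-100011
    and, after I iterations on each training subset, I×J GBDT-based DXN sub-models are built
    Figure PCTCN2020080528-appb-100012
    Step 4: by means of a simple-averaging DXN ensemble prediction module, the RF-based DXN sub-models
    Figure PCTCN2020080528-appb-100013
    and the GBDT-based DXN sub-models
    Figure PCTCN2020080528-appb-100014
    are combined by simple averaging to build the final DXN emission concentration prediction model.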
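  2. The method for predicting dioxin emission concentration according to claim 1, characterized in that the training-sample and input-feature random sampling module operates as follows:
    the MSWI process data are processed with bootstrap sampling (Bootstrap) and the random subspace method (RSM): Bootstrap draws training subsets containing the same number of samples as the training sample set, RSM then randomly selects part of the features, and the result is J training subsets, each containing N samples and M_j features,
    the generation of the training subsets can be expressed as:
    Figure PCTCN2020080528-appb-100015
    where {X_j, y_j} denotes the j-th training subset;
    Figure PCTCN2020080528-appb-100016
    denotes the n-th input/output sample pair of the j-th training subset; m = 1, …, M_j, where M_j is the number of input features contained in the j-th training subset, and usually M_j << M.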
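The sampling step of claim 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_training_subsets`, its parameters, and the use of a single fixed feature count for every subset are hypothetical and do not come from the patent text.

```python
import random

def make_training_subsets(X, y, n_subsets, n_features, seed=0):
    """Draw J bootstrap subsets, each restricted to M_j randomly chosen features.

    X is a list of N rows (each a list of M values); y is a list of N targets.
    Returns a list of (feature_indices, X_j, y_j) tuples.
    """
    rng = random.Random(seed)
    n_samples, n_total = len(X), len(X[0])
    subsets = []
    for _ in range(n_subsets):
        # Bootstrap: N draws with replacement from the N training samples.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        # RSM: pick M_j distinct feature columns (assumed fixed across subsets).
        cols = sorted(rng.sample(range(n_total), n_features))
        X_j = [[X[r][c] for c in cols] for r in rows]
        y_j = [y[r] for r in rows]
        subsets.append((cols, X_j, y_j))
    return subsets

# Toy data: 10 samples, 4 process variables.
X = [[i * (c + 1) for c in range(4)] for i in range(10)]
y = list(range(10))
subsets = make_training_subsets(X, y, n_subsets=3, n_features=2)
```

Each returned subset keeps its feature indices so that a sub-model trained on it can later be fed the same columns at prediction time.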
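  3. The method for predicting dioxin emission concentration according to claim 2, characterized in that the RF-based DXN sub-model construction module operates as follows:
    the construction is described taking the j-th training subset
    Figure PCTCN2020080528-appb-100017
    as an example,
    first, the duplicate samples introduced by random sampling into the training subset
    Figure PCTCN2020080528-appb-100018
    are removed, and the result is denoted
    Figure PCTCN2020080528-appb-100019
    taking the m-th input feature x_{j,m} as the splitting variable and the value corresponding to the n_sel-th sample
    Figure PCTCN2020080528-appb-100020
    as the splitting point, the input feature space is divided into two regions R_1 and R_2
    Figure PCTCN2020080528-appb-100021
    the best splitting-variable index and splitting-point value are found by traversal according to the following criterion,
    Figure PCTCN2020080528-appb-100022
    where
    Figure PCTCN2020080528-appb-100023
    Figure PCTCN2020080528-appb-100024
    denote the measured DXN emission concentrations of the j-th training subset in regions R_1 and R_2, respectively; C_1 and C_2 denote the means of the measured DXN emission concentrations in regions R_1 and R_2, respectively,
    based on the above criterion, the optimal splitting-variable index and splitting-point value are first found by traversing all input features, and the input feature space is divided into two regions; the process is then repeated for each region until the number of training samples contained in a leaf node is smaller than the preset threshold θ_RF; finally, the input feature space is divided into K regions, denoted R_1, …, R_k, …, R_K, where K is also the number of leaf nodes of the CART regression tree,
    the RF-based DXN sub-model built with a CART regression tree can be expressed as:
    Figure PCTCN2020080528-appb-100025
    where
    Figure PCTCN2020080528-appb-100026
    in which
    Figure PCTCN2020080528-appb-100027
    denotes the number of training samples contained in region R_k;
    Figure PCTCN2020080528-appb-100028
    denotes the
    Figure PCTCN2020080528-appb-100029
    -th measured DXN emission concentration of the j-th training subset in region R_k; I(·) is the indicator function, with I(·) = 1 when
    Figure PCTCN2020080528-appb-100030
    and I(·) = 0 otherwise,
    the prediction error of the RF-based DXN sub-model built on the j-th training subset
    Figure PCTCN2020080528-appb-100031
    is
    Figure PCTCN2020080528-appb-100032
    where (e_{j,0})_n denotes the DXN emission concentration prediction error for the n-th training sample,
    the above process is repeated to obtain the J RF-based DXN sub-models built with CART regression trees
    Figure PCTCN2020080528-appb-100033
    and the difference between the predicted outputs of these sub-models
    Figure PCTCN2020080528-appb-100034
    and the measured DXN values
    Figure PCTCN2020080528-appb-100035
    is taken to obtain the output errors
    Figure PCTCN2020080528-appb-100036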
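The tree construction in claim 3 can be illustrated with a small pure-Python sketch. The split criterion appears only as an image in this text, so the code assumes the usual least-squares CART criterion (minimise the summed squared deviation from the region means C_1 and C_2), which matches the symbol definitions given in the claim. The names `drop_duplicates`, `best_split`, `build_tree` and `predict` are hypothetical, and the residual is taken as measured minus predicted, an assumption chosen so that the later summation of sub-model outputs is consistent.

```python
def drop_duplicates(X, y):
    """Remove repeated (input, output) pairs created by sampling with replacement."""
    seen, X_u, y_u = set(), [], []
    for row, t in zip(X, y):
        key = (tuple(row), t)
        if key not in seen:
            seen.add(key)
            X_u.append(row)
            y_u.append(t)
    return X_u, y_u

def best_split(X, y):
    """Return (sse, feature_index, threshold) with the smallest summed squared error."""
    best = None
    for m in range(len(X[0])):
        for s in sorted({row[m] for row in X}):
            left = [t for row, t in zip(X, y) if row[m] <= s]
            right = [t for row, t in zip(X, y) if row[m] > s]
            if not left or not right:
                continue
            c1, c2 = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - c1) ** 2 for t in left) + sum((t - c2) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, m, s)
    return best

def build_tree(X, y, min_samples):
    """Grow a regression tree; a node with fewer than min_samples samples becomes a leaf."""
    split = best_split(X, y) if len(y) >= min_samples else None
    if split is None:
        return sum(y) / len(y)  # leaf value: mean of the targets in the region
    _, m, s = split
    lo = [(r, t) for r, t in zip(X, y) if r[m] <= s]
    hi = [(r, t) for r, t in zip(X, y) if r[m] > s]
    return (m, s,
            build_tree([r for r, _ in lo], [t for _, t in lo], min_samples),
            build_tree([r for r, _ in hi], [t for _, t in hi], min_samples))

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        m, s, lo, hi = tree
        tree = lo if x[m] <= s else hi
    return tree

X, y = drop_duplicates([[1.0], [2.0], [2.0], [3.0], [4.0]], [1.0, 1.0, 1.0, 5.0, 5.0])
tree = build_tree(X, y, min_samples=2)
residuals = [t - predict(tree, row) for row, t in zip(X, y)]  # e_{j,0}, assumed sign
```

The residual list plays the role of the output error that the GBDT stage takes as its first training target.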
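  4. The method for predicting dioxin emission concentration according to claim 3, characterized in that the GBDT-based DXN sub-model construction module operates as follows: it is implemented by building several weak learner models connected "in series", where the input data of the training subsets of these weak learners remain unchanged; the true output values for the training subset of the 1st sub-model are the errors between the predictions of the RF-based sub-model and the measured values, while every other sub-model takes the prediction errors of the GBDT sub-model from the previous iteration as the true output values of its training subset,
    taking the construction of the GBDT-based DXN sub-models for the j-th training subset as an example, assume that a total of I GBDT-based DXN sub-models are to be built, all of them with CART regression trees,
    first, the 1st sub-model is built
    Figure PCTCN2020080528-appb-100037
    which can be expressed as
    Figure PCTCN2020080528-appb-100038
    where
    Figure PCTCN2020080528-appb-100039
    denotes the predicted output of the 1st GBDT-based DXN sub-model,
    the loss function of the above sub-model is defined as follows,
    Figure PCTCN2020080528-appb-100040
    where
    Figure PCTCN2020080528-appb-100041
    denotes the predicted value for the n-th sample in the j-th training subset,
    then, the output residual e_{j,1} of the sub-model
    Figure PCTCN2020080528-appb-100042
    is computed as follows,
    Figure PCTCN2020080528-appb-100043
    next, e_{j,1} serves as the true output values of the training subset of the 2nd GBDT-based DXN sub-model
    Figure PCTCN2020080528-appb-100044
    Similarly, the 2nd DXN sub-model can be expressed as
    Figure PCTCN2020080528-appb-100045
    where (e_{j,1})_n denotes the prediction error of the 1st GBDT-based DXN sub-model for the n-th sample,
    repeating the above process, the i-th (i ≤ I) GBDT-based DXN sub-model can be denoted
    Figure PCTCN2020080528-appb-100046
    and its residual is computed as follows,
    Figure PCTCN2020080528-appb-100047
    after I-1 iterations, the true output values of the training subset of the I-th sub-model are
    Figure PCTCN2020080528-appb-100048
    where
    Figure PCTCN2020080528-appb-100049
    is the predicted output of the (I-1)-th sub-model
    Figure PCTCN2020080528-appb-100050
    .
    The I-th sub-model can then be expressed as
    Figure PCTCN2020080528-appb-100051
    where (e_{j,I-1})_n denotes the prediction error of the (I-1)-th GBDT-based DXN sub-model for the n-th sample,
    therefore, all I GBDT-based DXN sub-models built on the j-th training subset can be expressed as
    Figure PCTCN2020080528-appb-100052
    and their corresponding outputs can be expressed as
    Figure PCTCN2020080528-appb-100053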
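The serial residual fitting of claim 4 can be sketched as follows. To keep the example short, a depth-1 regression tree (a stump) stands in for the CART regression tree named in the claim; that substitution, the function names `fit_stump` and `fit_residual_chain`, the absence of a learning rate, and the residual sign (target minus prediction) are all assumptions made for illustration only.

```python
def fit_stump(X, y):
    """Fit a depth-1 regression tree by least squares (stand-in weak learner)."""
    best = None
    for m in range(len(X[0])):
        for s in sorted({row[m] for row in X}):
            left = [t for row, t in zip(X, y) if row[m] <= s]
            right = [t for row, t in zip(X, y) if row[m] > s]
            if not left or not right:
                continue
            c1, c2 = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - c1) ** 2 for t in left) + sum((t - c2) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, m, s, c1, c2)
    if best is None:  # no valid split: fall back to a constant model
        mean = sum(y) / len(y)
        return lambda x: mean
    _, m, s, c1, c2 = best
    return lambda x: c1 if x[m] <= s else c2

def fit_residual_chain(X, e0, n_iter):
    """Fit I serial sub-models; sub-model i is trained on the residual of sub-model i-1.

    e0 is the error left by the RF-based sub-model; the inputs X never change.
    Returns the list of sub-models and the residual remaining after the last one.
    """
    models, target = [], list(e0)
    for _ in range(n_iter):
        model = fit_stump(X, target)
        models.append(model)
        # The new residual becomes the true output values for the next sub-model.
        target = [t - model(x) for x, t in zip(X, target)]
    return models, target

X = [[1.0], [2.0], [3.0], [4.0]]
e0 = [1.0, 2.0, 3.0, 4.0]
models, final_residual = fit_residual_chain(X, e0, n_iter=5)
```

Because every sub-model is fitted to what the previous ones left unexplained, summing the sub-model outputs and the final residual recovers the original target `e0`.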
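  5. The method for predicting dioxin emission concentration according to claim 4, characterized in that the simple-averaging DXN ensemble prediction module operates as follows:
    from the above process, the J RF-based DXN sub-models can be expressed as
    Figure PCTCN2020080528-appb-100054
    and these models are built in parallel; the J×I GBDT-based DXN sub-models can be expressed as
    Figure PCTCN2020080528-appb-100055
    and these models are built both serially and in parallel,
    for the j-th training subset, 1 RF-based and I GBDT-based DXN sub-models are built; these sub-models are produced serially, and the sum of their predicted outputs is taken as the overall output for the j-th training subset, which can be expressed as
    Figure PCTCN2020080528-appb-100056
    since the J training subsets are parallel to one another, the above DXN sub-models are combined by simple averaging, and the final DXN emission concentration ensemble prediction model f_DXN(·) can be expressed as follows:
    Figure PCTCN2020080528-appb-100057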
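The combination rule described in claim 5 (sum the RF-based and GBDT-based outputs within each training subset, then average across the J subsets) can be sketched as below. The function name `predict_ensemble` and the tuple layout of `groups` are hypothetical; the toy sub-models are plain lambdas rather than trained trees.

```python
def predict_ensemble(x, groups):
    """Average, over the J training subsets, the summed output of each subset's sub-models.

    groups is a list of (feature_indices, rf_model, gbdt_models) tuples, where each
    model is a callable taking the feature vector restricted to feature_indices.
    """
    total = 0.0
    for cols, rf_model, gbdt_models in groups:
        x_j = [x[c] for c in cols]  # same random feature subset used in training
        # Serial part: RF-based output plus the I GBDT-based corrections.
        total += rf_model(x_j) + sum(g(x_j) for g in gbdt_models)
    # Parallel part: simple average over the J subsets.
    return total / len(groups)

# Two toy subsets (J = 2) with hand-written sub-models.
groups = [
    ([0], lambda v: v[0], [lambda v: 1.0, lambda v: 0.5]),
    ([1], lambda v: 2 * v[0], [lambda v: -1.0]),
]
prediction = predict_ensemble([2.0, 3.0], groups)  # (3.5 + 5.0) / 2 = 4.25
```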
PCT/CN2020/080528 2020-02-10 2020-03-21 Method for predicting dioxin emission concentration WO2021159585A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/544,213 US20220092482A1 (en) 2020-02-10 2021-12-07 Method for predicting dioxin emission concentration

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010083784.4A CN111260149B (zh) 2020-02-10 2020-02-10 Method for predicting dioxin emission concentration
CN202010083784.4 2020-02-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/544,213 Continuation US20220092482A1 (en) 2020-02-10 2021-12-07 Method for predicting dioxin emission concentration

Publications (1)

Publication Number Publication Date
WO2021159585A1 (zh) 2021-08-19

Family

ID=70954426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080528 WO2021159585A1 (zh) 2020-02-10 2020-03-21 Method for predicting dioxin emission concentration

Country Status (3)

Country Link
US (1) US20220092482A1 (zh)
CN (1) CN111260149B (zh)
WO (1) WO2021159585A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023138140A1 (zh) * 2022-01-19 2023-07-27 北京工业大学 Soft-measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
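CN111882130B (zh) * 2020-07-30 2022-01-11 浙江大学 Online dioxin emission prediction method based on generation-path clustering and Box-Cox transformation
CN112183709B (zh) * 2020-09-22 2023-11-10 生态环境部华南环境科学研究所 Prediction and early-warning method for dioxin exceedance in waste incineration flue gas
CN112464544B (zh) * 2020-11-17 2024-06-18 北京工业大学 Method for constructing a prediction model of dioxin emission concentration in the municipal solid waste incineration process
CN112420135B (zh) * 2020-11-20 2024-09-13 北京化工大学 Virtual sample generation method based on the sample-plot (quadrat) method and quantile regression
CN113780384B (zh) * 2021-08-28 2024-05-28 北京工业大学 Method for predicting key controlled variables in the municipal solid waste incineration process based on an ensemble decision tree algorithm
CN114943151A (zh) * 2022-05-31 2022-08-26 北京工业大学 Soft-measurement method for dioxin emission in the MSWI process based on an ensemble T-S fuzzy regression tree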

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
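CN105426882A (zh) * 2015-12-24 2016-03-23 上海交通大学 Method for quickly locating human eyes in a face image
CN108549792A (zh) * 2018-06-27 2018-09-18 北京工业大学 Soft-measurement method for dioxin emission concentration in the solid waste incineration process based on a latent structure mapping algorithm
AU2018102040A4 (en) * 2018-12-10 2019-01-17 Chen, Shixuan Mr The method of an efficient and accurate credit rating system through the gradient boost decision tree
CN109408774A (zh) * 2018-11-07 2019-03-01 上海海事大学 Method for predicting wastewater effluent indices based on random forest and gradient boosting trees
CN109976998A (zh) * 2017-12-28 2019-07-05 航天信息股份有限公司 Software defect prediction method, apparatus and electronic device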

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389253B (zh) * 2018-11-09 2022-04-15 国网四川省电力公司电力科学研究院 Method for predicting post-disturbance power system frequency based on credibility ensemble learning

Also Published As

Publication number Publication date
CN111260149A (zh) 2020-06-09
US20220092482A1 (en) 2022-03-24
CN111260149B (zh) 2023-06-23

Similar Documents

Publication Publication Date Title
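WO2021159585A1 (zh) Method for predicting dioxin emission concentration
Xia et al. Dioxin emission prediction based on improved deep forest regression for municipal solid waste incineration process
CN108549792B (zh) Soft-measurement method for dioxin emission concentration in the solid waste incineration process based on a latent structure mapping algorithm
CN111461355B (zh) Transfer-learning prediction method for dioxin emission concentration based on random forest
CN111144609A (zh) Method for establishing a boiler exhaust emission prediction model, prediction method and device
WO2020192166A1 (zh) Soft-measurement method for dioxin emission concentration in the municipal solid waste incineration process
CN112464544B (zh) Method for constructing a prediction model of dioxin emission concentration in the municipal solid waste incineration process
CN107944173B (zh) Dioxin soft-measurement system based on selective ensemble least-squares support vector machines
CN111462835B (zh) Soft-measurement method for dioxin emission concentration based on a deep forest regression algorithm
CN110135057B (zh) Soft-measurement method for dioxin emission concentration in the solid waste incineration process based on multi-layer feature selection
CN110991756B (zh) MSWI furnace temperature prediction method based on a T-S fuzzy neural network
CN107356710A (zh) Method and system for predicting dioxin-like concentration in waste incineration flue gas
Sun et al. Prediction of oxygen content using weighted PCA and improved LSTM network in MSWI process
CN114398836A (zh) Soft-measurement method for dioxin emission in the MSWI process based on broad hybrid forest regression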
Xia et al. Soft measuring method of dioxin emission concentration for MSWI process based on RF and GBDT
Xu et al. A novel online combustion optimization method for boiler combining dynamic modeling, multi-objective optimization and improved case-based reasoning
Cui et al. Multi-condition operational optimization with adaptive knowledge transfer for municipal solid waste incineration process
Yin et al. Enhancing deep learning for the comprehensive forecast model in flue gas desulfurization systems
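CN109978011A (zh) System for predicting dioxin emission concentration in the municipal solid waste incineration process
CN116906910A (zh) Efficient combustion control method and system based on a deep convolutional neural network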
Jin et al. Machine learning-aided optimization of coal decoupling combustion for lowering NO and CO emissions simultaneously
Ilamathi et al. Predictive modelling and optimization of nitrogen oxides emission in coal power plant using Artificial Neural Network and Simulated Annealing
Zhang et al. Heterogeneous ensemble prediction model of CO emission concentration in municipal solid waste incineration process using virtual data and real data hybrid-driven
Xia et al. Dioxin emission concentration forecasting model for MSWI process with random forest-based transfer learning
Wang et al. Key controlled variable model of MSWI process based on ensembled decision tree algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918857

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20918857

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20918857

Country of ref document: EP

Kind code of ref document: A1