CN115293231A - A Random Forest Prediction Method for Regional Ecological Harmony
- Publication number
- CN115293231A
- Authority
- CN
- China
- Prior art keywords
- random forest
- model
- elements
- time
- prediction method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
Description
Technical field
The invention belongs to the technical field of ecological environment prediction, and specifically relates to a random forest method for predicting regional ecological harmony.
Background
Analyzing a city's human-land system and estimating its environmental carrying capacity makes it possible to plan the city as an ecologically harmonious three-dimensional city. On this basis, natural science and social science can be combined across the humanities, society, and management: the disturbance caused by human activities must be fully analyzed, and social-science evaluation methods are used to bring social benefits into a secondary evaluation. Based on local economic development needs, the demands of different stakeholders, the human settlement environment, and other factors, and combined with predictive economic and sociological models, the most complete planning combination is selected and provided to planners, together with year-by-year land-use strategies and specific restoration measures. How to improve the efficiency and accuracy of predicting regional ecological harmony is therefore an urgent problem.
Contents of the invention
To remedy the deficiencies of the prior art, the invention provides a random forest method for predicting regional ecological harmony. Evidence of long-term and rapid environmental change in Earth's history provides a key baseline for comparison with the present. The method uses intuitive, quantitative mathematical models to analyze relationships within natural systems and, with high-precision dating techniques that span geological and human time scales, links human history with geological history. It abstracts the factors by which human activity disturbs the mountain-water-forest-farmland-lake-grassland system and the relationships among them, and applies artificial intelligence algorithms to reconstruct and predict that system with high precision.
To solve the above technical problem, the invention adopts the following technical solution. A random forest prediction method for regional ecological harmony comprises the following steps:
Step 1: for multiple elements and across different time scales, describe the mountain, water, forest, farmland, lake, and grassland elements in detail from the subsurface to the surface;
Step 2: using long-term satellite remote sensing divided into time periods, interpret the natural elements of the past hundred years quantitatively and in detail. The natural elements include the areas of mountains, water, forests, farmland, lakes, and grassland, calibrated in units of one year. Human elements, which include the area of built-up (construction) land, are interpreted together with them;
Step 3: collect characterization data of human elements for different years within the study area. The data include population, GDP, and industrial development intensity, calibrated in units of one year, and are used to interpret the human elements comprehensively;
Step 4: based on historical records and expert judgment, assess the environmental carrying capacity of the study area for each year of the study period, and use these assessments as the standard for subsequent model training. Using multiple regression and machine learning, establish a function of human activity elements. Based on systems thinking, use the same methods to establish a multi-element, multi-scale fitting relationship between humans and nature, where the scales include the time scale. The spontaneous evolution of the natural environment and the influence of human activities on it are examined and fitted into a time-dependent mathematical model, that is, a curve function;
Step 5: set the time variable to a future time, analyze the period and frequency of the curve, and use the mathematical model to predict how each element will change and how human activities will affect the other elements. On this basis, propose a lower limit of environmental carrying capacity, delineate the areas or proportions of mountains, water, forests, farmland, lakes, and grassland in the region, and delineate the baseline for safeguarding ecological functions, the bottom line for environmental quality and safety, and the upper limit for natural resource use, to guide three-dimensional planning of an ecologically harmonious city.
Step 1 specifically comprises: drilling dense shallow boreholes in key dissection areas and, through fine quantitative characterization of subsurface geological bodies, building a three-dimensional spatial model that includes the mountain, water, forest, farmland, lake, and grassland elements; then, using multiple boreholes and high-precision dating, calibrating time in millennial divisions from the late Holocene (>5000 a) and in units of 100 years from 1000 years ago onward, so as to reconstruct the paleogeography and paleoenvironmental pattern in detail and finally build a four-dimensional spatiotemporal geological model of relatively high precision.
The multiple regression and machine learning method in Step 4 is the random forest algorithm. The element indicators of geographic region conditions used by the random forest algorithm are listed in the following table:
Table 1. Element indicators of geographic region conditions
Assume that data are collected over two periods. The first period runs from the middle-to-late Holocene to modern times (7000 B.C. to 1950) with 50 time points; the second runs from 1951 to 2021 with 71 time points. The first period mainly serves to establish the geological evolution background. For 1951-2020, the judgment of whether the region can be planned as an ecologically harmonious three-dimensional city is known (1: it can; 0: it cannot). The judgment for 2021 is unknown and is to be predicted by the trained model. Table 1 contains 23 indicators, so the standardized matrix Z has size 23×120 (23 indicators by 120 time points, that is, the 50 + 71 = 121 points minus the year 2021), and the corresponding judgment matrix Y has size 120×1. The known indicators for the year to be predicted, Z_2021, form a matrix of size 23×1.
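The matrix sizes in this example can be checked with a small sketch. It uses synthetic placeholder values rather than real Table 1 indicators, and it assumes z-score standardization, which the text does not specify; the helper names `fit_scaler` and `apply_scaler` are hypothetical.

```python
import random
import statistics

N_INDICATORS = 23  # number of indicators in Table 1
N_KNOWN = 120      # time points with a known judgment (all 121 except 2021)


def fit_scaler(matrix):
    """Return per-indicator (row) means and population standard deviations."""
    means = [statistics.fmean(row) for row in matrix]
    stds = [statistics.pstdev(row) for row in matrix]
    return means, stds


def apply_scaler(column, means, stds):
    """Z-score one time point (a list holding one value per indicator)."""
    return [(v - mu) / sd if sd > 0 else 0.0
            for v, mu, sd in zip(column, means, stds)]


rng = random.Random(0)
# Synthetic placeholder data; real values would come from the Table 1 indicators.
raw = [[rng.uniform(0.0, 100.0) for _ in range(N_KNOWN)]
       for _ in range(N_INDICATORS)]
raw_2021 = [rng.uniform(0.0, 100.0) for _ in range(N_INDICATORS)]

means, stds = fit_scaler(raw)
# Z: 23 rows (indicators) x 120 columns (time points), standardized row by row.
Z = [[(v - mu) / sd if sd > 0 else 0.0 for v in row]
     for row, mu, sd in zip(raw, means, stds)]
# Y: 120 judgments, 1 = can be planned as an ecologically harmonious city, 0 = cannot.
Y = [rng.randint(0, 1) for _ in range(N_KNOWN)]
# Z_2021: 23 x 1, scaled with the statistics of the known years only.
Z_2021 = apply_scaler(raw_2021, means, stds)

print(len(Z), len(Z[0]), len(Y), len(Z_2021))  # prints: 23 120 120 23
```

Scaling the 2021 column with the means and standard deviations of the known years keeps the year to be predicted out of the fitted statistics.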
The random forest algorithm takes the parameters in Table 1 and preprocesses the data by cleaning (for example handling missing values, smoothing noise, and identifying or deleting outliers) and normalization. It then comprises the following steps:
(1) random number generation, in which growing each tree of the model is the key step;
(2) calculating the prediction metrics MAE and MAPE;
(3) optimizing the random forest parameters;
(4) selecting the optimal model on the principle of highest accuracy;
(5) calculating the weight of each feature (a non-zero real number) directly from the optimal model produced by the random forest, and selecting a certain number of the more important features in descending order of weight.
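The preprocessing named before the step list (missing values, noise smoothing, outliers, normalization) could look like the following sketch for a single indicator series. Every concrete choice here is an assumption, since the text names the operations but not the methods: median imputation, clipping outliers to the 1.5×IQR fences instead of deleting them (so the yearly series keeps its length), a 3-point moving average, and min-max normalization.

```python
import statistics


def fill_missing(series):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in series if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in series]


def clip_outliers(series, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(series, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in series]


def smooth(series, window=3):
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    return [statistics.fmean(series[max(0, i - half): i + half + 1])
            for i in range(len(series))]


def min_max(series):
    """Rescale to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in series]


def preprocess(series):
    """Impute, clip, smooth, then normalize one indicator series."""
    return min_max(smooth(clip_outliers(fill_missing(series))))


# Hypothetical yearly values with one gap and one spike.
cleaned = preprocess([1.0, None, 2.0, 3.0, 250.0, 4.0, 5.0])
print(cleaned)
```

Clipping is used here only to keep one value per year; deleting outliers, as the text also allows, would require dropping the same year from every indicator and from Y.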
Step (1) comprises the following three main sub-steps:
A. Bootstrap sampling: if the training set has size N, then for each tree n training samples are drawn from the training set at random and with replacement to form that tree's training set;
B. Random feature selection: if each sample has M features, a constant m << M is specified and a subset of m features is drawn at random from the M features; each time the tree splits, the best of these m features is chosen;
C. Each tree is grown to its maximum extent and is not pruned.
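Sub-steps A and B can be sketched as two small sampling helpers. The values n = N and m = 5 are assumptions for illustration, and the function names are hypothetical. In Breiman's standard formulation a fresh feature subset is drawn at every split, which corresponds to calling `random_feature_subset` once per node rather than once per tree.

```python
import random


def bootstrap_indices(n_samples, n_draw, rng):
    """Draw n_draw row indices with replacement from range(n_samples)."""
    return [rng.randrange(n_samples) for _ in range(n_draw)]


def random_feature_subset(n_features, m, rng):
    """Pick m distinct feature indices out of n_features."""
    return sorted(rng.sample(range(n_features), m))


rng = random.Random(42)
N, M = 120, 23  # training samples and indicators from the worked example in the text
m = 5           # assumed value with m << M

for tree_id in range(3):
    rows = bootstrap_indices(N, N, rng)     # sub-step A, with n = N
    cols = random_feature_subset(M, m, rng)  # sub-step B
    out_of_bag = set(range(N)) - set(rows)   # samples this tree never saw
    print(tree_id, len(set(rows)), len(out_of_bag), cols)
```

The out-of-bag rows are the ones a random forest can use to estimate its error without a separate validation set.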
Step (2) specifically: let $y_i$ be the true value of the result and $\hat{y}_i$ its estimated value, for $i = 1, \dots, k$, where $k$ is the number of evaluated samples. The prediction metric MAE (Mean Absolute Error) has range [0, +∞); it equals 0 when the predicted values match the true values exactly, that is, for a perfect model, and the larger the error, the larger the MAE:
$$\mathrm{MAE} = \frac{1}{k}\sum_{i=1}^{k}\left|\hat{y}_i - y_i\right|$$
The prediction metric MAPE (Mean Absolute Percentage Error) has range [0, +∞); it equals 0 when the predicted values match the true values exactly, that is, for a perfect model, and the larger the error, the larger the MAPE:
$$\mathrm{MAPE} = \frac{100\%}{k}\sum_{i=1}^{k}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$
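The two metrics named in step (2) can be computed as in the sketch below, which uses their standard definitions. Note one practical caveat that follows from the definition: MAPE divides by the true value, so it is undefined whenever a true value is 0, which matters if the 0/1 judgments are used directly as targets.

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is 0")
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


print(mae([2.0, 4.0], [1.0, 5.0]))   # prints: 1.0
print(mape([2.0, 4.0], [1.0, 5.0]))  # prints: 37.5
```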
Step (3) specifically: using classic hyperparameter tuning methods from machine learning, adjust the number of trees, the method of choosing the maximum number of features, the maximum tree depth, the minimum number of samples required to split a node, and the minimum number of samples at a leaf node, and decide whether to choose the best parameter combination by random search or by Bayesian optimization.
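As one possible reading of steps (3) and (4), the sketch below runs a plain random search over the hyperparameters listed above and keeps the combination with the highest score. The search space values, the `evaluate` callback, and the toy scoring function are all hypothetical; in practice `evaluate` would train a forest with the given parameters and return its validation accuracy, and libraries such as scikit-learn provide a comparable tool in `RandomizedSearchCV`.

```python
import random

# Hypothetical candidate values for the hyperparameters named in step (3).
SEARCH_SPACE = {
    "n_trees": [100, 200, 500],
    "max_features": ["sqrt", "log2", None],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}


def random_search(evaluate, space, n_iter, seed=0):
    """Sample n_iter parameter combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)  # higher is better, e.g. validation accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for model training; NOT a real accuracy measurement."""
    return -abs(params["n_trees"] - 200) - params["min_samples_leaf"]


best_params, best_score = random_search(toy_score, SEARCH_SPACE, n_iter=30)
print(best_params, best_score)
```

Random search is shown because it needs no extra dependencies; Bayesian optimization, the other option the text mentions, would replace the uniform sampling with a model-guided proposal of the next combination.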
With the above technical solution, the invention has the following technical effects:
The basic conditions of the geographic elements that make up the natural resources and human living environment of a region (such as a city), viewed spatially, can be regarded as geographic information processed to three different depths according to need: perception, statistics, and analysis. The core problem this method addresses is establishing the multi-element, multi-scale (time-scale) fitting relationship between humans and nature.
Following the study by Ma Wanzhong and Du Qingyun (Research on the system framework of geographical national conditions monitoring [J]. Land and Resources Science and Technology Management, 2011, 28(06): 104-111), these elements can be grouped into natural environment elements, social and cultural elements, and industrial and economic elements. Following the indicator-system principles proposed by Liu Kai (Research on the evolution of ecologically fragile human-land systems and the selection of sustainable development models [D]. Shandong Normal University, 2017), the goal is set as whether a region can be planned as an ecologically harmonious three-dimensional city. A prediction model is built with the random forest method from indicators of two element groups, natural environment and economy-society, and a weight analysis of each indicator's influence on the prediction is then obtained.
The indicators listed in Table 1 may be supplemented or trimmed as appropriate. Including more features generally raises accuracy, and indicators with large weights should be kept where possible. Reducing the number of features shortens running time, so it is recommended to use the data set restricted to the important features selected at a 95% threshold. As for collection time, several time points may be collected within the same year, for example one data point per month, which greatly enlarges the sample. A larger sample can improve the accuracy of the prediction model.
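One way to apply the 95% threshold is sketched below. It assumes the threshold refers to cumulative normalized feature weight, which the text does not state, and it assumes non-negative weights; the indicator names and weight values are made up for illustration.

```python
def select_by_cumulative_importance(importances, threshold=0.95):
    """Keep the highest-weight features until their share reaches threshold."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, weight in ranked:
        selected.append(name)
        running += weight / total
        if running >= threshold:
            break
    return selected


# Hypothetical weights; real ones would come from the fitted optimal model.
weights = {
    "built_up_area": 0.40,
    "population": 0.30,
    "forest_area": 0.20,
    "gdp": 0.06,
    "lake_area": 0.04,
}
print(select_by_cumulative_importance(weights))
# prints: ['built_up_area', 'population', 'forest_area', 'gdp']
```

Here the first three features cover 90% of the total weight, so a fourth is needed to pass 95%, and the lowest-weight indicator is dropped.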
The classification performance (error rate) of a random forest depends on two factors. The first is the correlation between any two trees in the forest: the higher the correlation, the higher the error rate. The second is the classification strength of each tree: the stronger the individual trees, the lower the error rate of the whole forest. Reducing the number of selected features m lowers both the correlation between trees and the strength of each tree; increasing m raises both. The key question is therefore how to choose the optimal m (or its range), which is the main parameter to which a random forest is sensitive.
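This trade-off between correlation and strength is commonly summarized by the upper bound Breiman gave for the generalization error of a random forest. It is quoted here as background and is not part of the patent text; $PE^{*}$ denotes the generalization error, $\bar{\rho}$ the mean correlation between trees, and $s$ the strength of the individual trees.

```latex
PE^{*} \le \frac{\bar{\rho}\,\left(1 - s^{2}\right)}{s^{2}}
```

The bound falls when $\bar{\rho}$ falls or $s$ rises, which is why m has to balance the two. A common starting point for classification is m ≈ √M, about 5 for the 23 indicators of the example, to be refined by tuning.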
The invention uses the random forest algorithm, which has the following advantages: 1) its accuracy is excellent among current algorithms; 2) it runs efficiently on large data sets; 3) it handles input samples with high-dimensional features without dimensionality reduction; 4) it can assess the importance of each feature for the classification problem; 5) during training it yields an unbiased estimate of the internal generalization error; 6) it also gives good results on data with missing values.
Description of drawings
Figure 1 is a schematic diagram of the random forest model;
Figure 2 is a schematic flow diagram of the random forest algorithm;
Figure 3 is a schematic diagram of the prediction indicators ranked by weight;
Figure 4 is a schematic diagram comparing the prediction model results with the actual results.
Detailed description
As shown in Figures 1-4, the random forest prediction method for regional ecological harmony of this embodiment comprises Steps 1 to 5, the Table 1 indicators with the worked example of two collection periods, and random forest steps (1) to (5) with their sub-steps, exactly as set out earlier in this description.
This embodiment does not limit the shape, material, structure, or other aspects of the invention in any form. Any simple modification, equivalent change, or refinement made to the above embodiment in accordance with the technical essence of the invention falls within the protection scope of the technical solution of the invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210747133.XA CN115293231A (en) | 2022-06-29 | 2022-06-29 | A Random Forest Prediction Method for Regional Ecological Harmony |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210747133.XA CN115293231A (en) | 2022-06-29 | 2022-06-29 | A Random Forest Prediction Method for Regional Ecological Harmony |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115293231A (en) | 2022-11-04 |
Family
ID=83819879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210747133.XA Pending CN115293231A (en) | 2022-06-29 | 2022-06-29 | A Random Forest Prediction Method for Regional Ecological Harmony |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115293231A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116823067A (en) * | 2023-08-29 | 2023-09-29 | 北控水务(中国)投资有限公司 | Method and device for determining water quality cleaning state of pipe network and electronic equipment |
CN116823067B (en) * | 2023-08-29 | 2023-12-19 | 北控水务(中国)投资有限公司 | Method and device for determining water quality cleaning state of pipe network and electronic equipment |
CN118411056A (en) * | 2024-06-28 | 2024-07-30 | 贵州师范大学 | Ecological product information data sharing method for karst rural ecosystem |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114971301B (en) | Ecological interference risk identification and evaluation method based on automatic parameter adjustment optimization model | |
Lan et al. | A clustering preprocessing framework for the subannual calibration of a hydrological model considering climate‐land surface variations | |
CN117236199B (en) | Methods and systems for improving river and lake water quality and ensuring water security in urban water network areas | |
Yan et al. | Many-objective robust decision making for water allocation under climate change | |
CN103177301A (en) | Typhoon disaster risk estimate method | |
CN115293231A (en) | A Random Forest Prediction Method for Regional Ecological Harmony | |
Deng | Modeling the dynamics and consequences of land system change | |
Chen et al. | River ecological flow early warning forecasting using baseflow separation and machine learning in the Jiaojiang River Basin, Southeast China | |
Zeng et al. | A Bayesian belief network approach for mapping water conservation ecosystem service optimization region | |
Xiong et al. | Assessing and decoupling ecosystem services evolution in karst areas: A multi-model approach to support land management decision-making | |
CN114022008A (en) | A method for assessing suitable ecological flow in estuaries based on water ecological zoning theory | |
Wang et al. | Review of evaluation on ecological carrying capacity: The progress and trend of methodology | |
Abuamra et al. | Medium-term forecasts for salinity rates and groundwater levels | |
Donyaii | Evaluation of climate change impacts on the optimal operation of multipurpose reservoir systems using cuckoo search algorithm | |
Siwailam et al. | Integrated DPSIR-ANP-SD framework for sustainability assessment of water resources system in Egypt | |
Noor et al. | Prediction map of rainfall classification using random forest and inverse distance weighted (IDW) | |
Xu et al. | Retracted: Multi-energy system smart tool for ecological water body restoration using an AI-based decision-making framework | |
Chen et al. | An approach of multi-element fusion method for harmful algal blooms prediction | |
CN115293230A (en) | Prediction Method of Regional Ecological Harmony LSTM Algorithm | |
Han et al. | Shift in the migration trajectory of the green biomass loss barycenter in Central Asia | |
Guo et al. | Research on key technologies of spatio-temporal analysis and prediction of marine ecological environment based on association rule mining analysis | |
Ben-Salem et al. | Mapping steady-state groundwater levels in the Mediterranean region: The Iberian Peninsula as a benchmark | |
CN106228277A (en) | A kind of reservoir operation forecast information effective accuracy recognition methods based on data mining | |
Wang et al. | Ecological restoration process of watershed land space with intelligent IoT technology | |
Gouda et al. | Data mining for weather and climate studies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||