CN113240185A - County carbon emission prediction method based on random forest - Google Patents


Info

Publication number
CN113240185A
Authority
CN
China
Prior art keywords
carbon emission
county
data
training
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110570856.2A
Other languages
Chinese (zh)
Inventor
狄筝
黄少远
王晓飞
张恒
罗韬
张赫
王睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202110570856.2A
Publication of CN113240185A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Operations Research (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a county carbon emission prediction method based on random forests. It trains a multi-feature random forest model to predict county carbon emissions, comprehensively extracts multi-dimensional county features, supports parallel training on large volumes of county data, trains quickly, and is simple to implement. In addition, once the random forest model is trained, the importance of each feature's influence on carbon emissions can be obtained, so that carbon emission pollution can be treated effectively.

Description

County carbon emission prediction method based on random forest
Technical Field
The invention relates to the field of artificial intelligence applications, and in particular to a method for predicting county carbon emissions by training a model on county feature data after multi-factor optimization.
Background
Currently, carbon emission prediction research in China is still lacking, so excessive emissions in a given area cannot be effectively prevented and the balance of carbon emissions among areas is lost. With the development of artificial intelligence, carbon emission features can be constructed by analyzing the factors that influence carbon emissions and applying feature engineering, and county carbon emissions can be predicted from these features, which improves prediction accuracy. Yearbook analysis shows that indexes related to economic development, traffic and travel, resident life and ecological greening can serve as direct carbon emission features, while indexes related to scale structure and energy efficiency serve as indirect carbon emission features. Combining the direct and indirect features through a random forest algorithm allows county carbon emissions to be predicted effectively.
Disclosure of Invention
In order to solve the above problems, the present invention provides a county carbon emission prediction method based on a random forest algorithm, as follows:
A county carbon emission prediction method based on random forests, wherein features are extracted for the prediction model from data along three dimensions (county production, resident life and road traffic) and carbon emissions are predicted from those data, the prediction method comprising the following steps:
Step 1: screening county data to form the initial data set required for training the model, forming the initial county town carbon emission index elements, and dividing the carbon emission index elements into three classes: production, life and traffic;
Step 2: performing data cleaning and standardization preprocessing on the data;
Step 3: forming a training data set; generating training subsets and decision trees in each of the production, life and traffic classes by the Bootstrap method; when each node of a decision tree is split, randomly selecting n carbon emission influence indexes from the N attributes as the candidate subset for the current split, where n must be less than N; and combining all the decision trees after splitting to form a random forest;
Step 4: inputting the parameter vector features in the prediction set into the trained model, where each decision tree T_m yields a predicted value
Figure BDA0003082553510000021
Figure BDA0003082553510000022
The predictions of all decision trees are summed and their arithmetic mean is taken, giving the predicted carbon emissions of the life, production and traffic classes for each county; the three predicted class emissions are then added to obtain the final predicted carbon emission.
Further, the production-class, life-class and traffic-class index elements among the carbon emission index elements are each used as the N input variables, the actually measured carbon emission of the current year is used as the output variable, and the input and output variables together form a training data set D.
Further, the data cleaning comprises cleaning the initial data set by a mean substitution method, including: cleaning missing values, cleaning format and content, cleaning logic errors, and removing unneeded data. The standardization preprocessing uses min-max standardization: if the set contains t elements, the set elements x_1, x_2, ..., x_t are transformed into a new dimensionless sequence y_1, y_2, ..., y_t ∈ [0,1], where
Figure BDA0003082553510000023
Further, generating the training subsets and decision trees by the Bootstrap method comprises randomly sampling the training samples with replacement; after the sampling is repeated m times, the m samples obtained form the training data subsets D_m of the training data set D, and a decision tree T_m is trained on each training data subset, which serves as the samples at the root node of that decision tree.
Further, splitting each node of the decision tree comprises using the classification and regression tree (CART) method to select, within the split subset, the one carbon emission influence index X_k that gives the optimal result under the "least squared error criterion" as the splitting attribute of the node, until the decision tree can no longer be split; no pruning is performed during splitting, and the value of n remains unchanged.
Further, the parameter vector features in the prediction set can be defined, from the collected feature indexes affecting county carbon emissions in year t, by production class, life class and traffic class as follows:
Figure BDA0003082553510000031
Figure BDA0003082553510000032
Figure BDA0003082553510000033
where n_1, n_2, n_3 are the numbers of feature categories for production, life and traffic respectively, and X is a feature index of each class; the carbon emission prediction task is then cast as a multiple linear regression problem, namely:
Figure BDA0003082553510000034
Figure BDA0003082553510000035
Figure BDA0003082553510000036
where β is an unknown parameter, ε is a random error, and f is the optimal function that the algorithm model solves for, i.e. β_0, β_1, ..., β_n. Thus, the final predicted carbon emission is
Figure BDA0003082553510000041
Figure BDA0003082553510000042
where
Figure BDA0003082553510000043
is the carbon emission caused by the production-class features of the county,
Figure BDA0003082553510000044
is the carbon emission caused by the life-class features of the county, and
Figure BDA0003082553510000045
is the carbon emission caused by the traffic-class features of the county.
The invention trains a multi-feature random forest model to predict county carbon emissions; it can comprehensively extract multi-dimensional county features without having to select among the many county features. Because the decision trees in the random forest model are independent of one another, they can be trained in parallel on large volumes of county data, so training is fast and the implementation is simple. In addition, once the random forest model is trained, the importance of each feature's influence on carbon emissions can be obtained, which helps enterprises and governments control carbon emissions better and treat carbon emission pollution effectively.
Drawings
Fig. 1 shows the procedure of the Random Forest (RF) algorithm.
Fig. 2 compares the RF algorithm of the present invention with the LR, LASSO and SVR algorithms on county life-class carbon emission prediction.
Fig. 3 compares the RF algorithm of the present invention with the LR, LASSO and SVR algorithms on county production-class carbon emission prediction.
Fig. 4 compares the RF algorithm of the present invention with the LR, LASSO and SVR algorithms on county traffic-class carbon emission prediction.
Detailed Description
The following examples are presented to enable those skilled in the art to more fully understand the present invention and are not intended to limit the invention in any way.
The invention mainly uses the Random Forest (RF) algorithm. A random forest is a classifier that trains on and predicts samples using multiple trees. RF is an ensemble learning method based on the bagging idea: a model composed of multiple decision trees that are not correlated with one another. The algorithm process is shown in Fig. 1. In the RF algorithm, the bootstrap method, i.e. random sampling with replacement, is first used to draw n samples from the data set as a training set; a decision tree is trained on each training set, and the procedure is repeated until m decision trees have been built. The average of the predictions of the decision trees in the random forest is then taken as the final overall prediction. Random forests have high prediction accuracy, run effectively on large data sets, and are not prone to overfitting. In addition, because the model consists of multiple decision trees, it can be trained in parallel, which speeds up training; random forests are insensitive to noise in the training set, and the combined decision of many trees is more stable than a single decision tree.
On this basis, the invention comprehensively extracts features from data along three dimensions (county production, resident life and road traffic) and builds a multi-feature random forest model to predict county carbon emissions, which gives more accurate predictions.
Description of the problem
First, county carbon emission prediction is treated as a regression prediction process, and the collected features affecting a county in year t are divided into three classes: production, life and traffic. Let C_it be the total carbon emission of county i in year t. According to the classification of carbon emissions,
Figure BDA0003082553510000061
where
Figure BDA0003082553510000062
is the carbon emission caused by the production-class features of the county,
Figure BDA0003082553510000063
is the carbon emission caused by the life-class features of the county, and
Figure BDA0003082553510000064
is the carbon emission caused by the traffic-class features of the county.
From the collected feature indexes affecting county carbon emissions in year t, the features can be defined by production, life and traffic as follows:
Figure BDA0003082553510000065
Figure BDA0003082553510000066
Figure BDA0003082553510000067
where n_1, n_2, n_3 are the numbers of feature categories for production, life and traffic respectively, and X is a feature index of each class.
The average influence coefficient of each index is obtained by Pearson correlation analysis, which establishes the linear correlation between the features and carbon emissions, so the carbon emission prediction task can be cast as a multiple linear regression problem, namely:
Figure BDA0003082553510000068
Figure BDA0003082553510000069
Figure BDA00030825535100000610
where β is an unknown parameter and ε is a random error. f is the optimal function that the algorithm model of the invention solves for, i.e. β_0, β_1, ..., β_n.
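The Pearson correlation analysis mentioned above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `pearson_r` and the toy values are assumptions, and the text does not say how the per-index coefficients are averaged.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the common 1/n factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values: one normalized feature and the measured emissions.
gdp = [0.1, 0.4, 0.5, 0.9]
emissions = [1.0, 2.0, 3.5, 5.0]
r = pearson_r(gdp, emissions)  # close to +1 means a strong positive linear relation
```

A coefficient near +1 or -1 for a feature supports treating its relation to emissions as approximately linear, which is the premise of the regression formulation.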
Final predicted carbon emissions
Figure BDA00030825535100000611
Figure BDA0003082553510000071
The loss function is the Mean Squared Error (MSE), defined as:
Figure BDA0003082553510000072
where m is the number of observations.
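The equation images in this section are not reproduced in the text. As a reading aid, the formulation described in words can be written out as below. The notation is an assumption: the class labels prod, life and traf, the hat for predictions and the index j are illustrative choices, and the patent's own symbols may differ.

```latex
% Total emission of county i in year t as the sum of the three class emissions
% (class labels prod, life, traf are assumed notation).
C_{it} = C_{it}^{\mathrm{prod}} + C_{it}^{\mathrm{life}} + C_{it}^{\mathrm{traf}}

% Generic multiple linear regression for one class with n feature indexes
% X_1, ..., X_n, unknown parameters \beta and random error \varepsilon.
C = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n + \varepsilon

% Mean squared error over m observations, with \hat{C}_j the predicted
% and C_j the observed emission of observation j.
\mathrm{MSE} = \frac{1}{m} \sum_{j=1}^{m} \left( C_j - \hat{C}_j \right)^2
```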
Solving method
Based on random forests, the invention predicts county carbon emissions from a carbon emission structure that considers county production, resident life and road traffic. The specific steps of the prediction method are as follows:
Step 1: Data acquisition
Data for N counties are obtained from statistical yearbooks, where N is 1814. First, the element set of the N counties is obtained: the county data are screened to form the initial data set required for training the model and the initial county town carbon emission index elements. The elements comprise the built-up area, land urbanization rate, population scale, county population scale, built-up area population density, GDP, per-capita GDP, primary industry added value, secondary industry added value, total fixed-asset investment of the whole society, gas supply coverage rate, heat supply pipeline density, heat supply volume rate, residential density, road density, number of public transport vehicles per ten thousand people, number of motor vehicles per ten thousand people, medical facility allocation rate, social welfare facility allocation rate, and the share of road area occupied by footpaths.
Secondly, to improve the accuracy and robustness of the model, the set elements are classified into three classes: production, life and traffic. The production class comprises the built-up area, land urbanization rate, population scale, county population scale, built-up area population density, GDP, per-capita GDP, primary industry added value and secondary industry added value. The life class comprises the built-up area, land urbanization rate, population scale, county population scale, GDP, per-capita GDP, primary industry added value, secondary industry added value, gas supply coverage rate, heat supply pipeline density, heat supply volume rate and residential density. The traffic class comprises the built-up area, population scale, county population scale, GDP, per-capita GDP, primary industry added value, secondary industry added value, road density, number of public transport vehicles per ten thousand people, number of motor vehicles per ten thousand people, number of parks per ten thousand people, medical facility allocation rate, social welfare facility allocation rate, and the share of road area occupied by footpaths.
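The three overlapping feature groups can be held in a simple mapping, sketched below. The English identifiers are illustrative translations rather than names from the patent, and the per-ten-thousand unit for motor vehicles follows the element list in Step 1.

```python
# Feature groups per class; identifiers are hypothetical translations.
FEATURE_GROUPS = {
    "production": [
        "built_up_area", "land_urbanization_rate", "population",
        "county_population", "built_up_population_density", "gdp",
        "gdp_per_capita", "primary_industry_added_value",
        "secondary_industry_added_value",
    ],
    "life": [
        "built_up_area", "land_urbanization_rate", "population",
        "county_population", "gdp", "gdp_per_capita",
        "primary_industry_added_value", "secondary_industry_added_value",
        "gas_supply_coverage", "heating_pipeline_density",
        "heat_supply_volume_rate", "residential_density",
    ],
    "traffic": [
        "built_up_area", "population", "county_population", "gdp",
        "gdp_per_capita", "primary_industry_added_value",
        "secondary_industry_added_value", "road_density",
        "public_transport_per_10k", "motor_vehicles_per_10k",
        "parks_per_10k", "medical_facility_rate",
        "welfare_facility_rate", "footpath_share_of_road_area",
    ],
}

# Features that appear in all three classes.
shared = set.intersection(*(set(names) for names in FEATURE_GROUPS.values()))

def select_features(record, category):
    """Return the feature vector of one county record for one class."""
    return [record[name] for name in FEATURE_GROUPS[category]]
```

Because several indexes (for example GDP and population scale) belong to all three groups, the same yearbook field feeds three separate models, one per class.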
Step 2: data pre-processing
The data are grouped into production, life and traffic classes. The data files are in HTML and Excel formats and contain 21 fields and 7612 records in total, covering 1905 counties over the period 2010 to 2018. The raw data set includes the built-up area, land urbanization rate, population scale, county population scale and so on. Because the yearbook data have problems such as missing values and format errors, data cleaning is required. The invention cleans the data set by a mean substitution method, with the following steps: cleaning missing values, cleaning format and content, cleaning logic errors, and removing unneeded data.
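Mean substitution for missing values can be sketched as below. This is an assumption-laden illustration: the function name, the use of `None` to mark a missing entry, and the sample values are all hypothetical, and the patent does not say whether the mean is taken per field, per county or per year.

```python
def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values in the column."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

# Hypothetical field with one missing yearbook entry.
road_density = [2.0, None, 4.0, 6.0]
cleaned = fill_missing_with_mean(road_density)  # the gap becomes 4.0
```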
After cleaning, the data must be standardized, because the magnitudes of the fields span a wide range and depend on their units. The data are scaled proportionally into a small fixed interval and converted into dimensionless pure values, so that indexes with different units can be weighted together. The invention uses min-max standardization: if the set contains t elements, the set elements x_1, x_2, ..., x_t are transformed into a new dimensionless sequence y_1, y_2, ..., y_t ∈ [0,1], where
Figure BDA0003082553510000091
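The equation image is not reproduced here. The standard min-max formula, y_i = (x_i - min x) / (max x - min x), is consistent with the stated range [0,1], and the sketch below assumes that form. The handling of a constant column is an added assumption, since the text does not address it.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical GDP values in arbitrary units.
gdp = [120.0, 300.0, 480.0]
scaled = min_max_normalize(gdp)  # [0.0, 0.5, 1.0]
```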
And step 3: algorithmic prediction
Forming a training data set: the feature indexes of each class serve as the input variables of the model. For the production class these are the built-up area, land urbanization rate, population scale, county population scale, built-up area population density, GDP, per-capita GDP, primary industry added value and secondary industry added value. For the life class they are the built-up area, land urbanization rate, population scale, county population scale, GDP, per-capita GDP, primary industry added value, secondary industry added value, gas supply coverage rate, heat supply pipeline density, heat supply volume rate and residential density. For the traffic class they are the built-up area, population scale, county population scale, GDP, per-capita GDP, primary industry added value, secondary industry added value, road density, number of public transport vehicles per ten thousand people, number of motor vehicles per ten thousand people, number of parks per ten thousand people, medical facility allocation rate, social welfare facility allocation rate, and the share of road area occupied by footpaths. The actually measured carbon emission of the current year is the output variable, and the input and output variables together form the training data set D.
Generating training subsets and decision trees: in each class, the Bootstrap method is used to randomly sample the training samples with replacement. After the sampling is repeated m times, the m samples obtained form the training data subsets D_m of the training data set D, and a decision tree T_m is trained on each training data subset, which serves as the samples at the root node of that decision tree.
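Bootstrap sampling can be sketched as follows. The function name and toy data are hypothetical, and drawing each subset with the same size as D is a common convention assumed here, since the text does not state the subset size.

```python
import random

def bootstrap_subsets(dataset, m, seed=0):
    """Draw m subsets from dataset by random sampling with replacement.

    Assumption: each subset has the same number of rows as the dataset.
    """
    rng = random.Random(seed)
    size = len(dataset)
    return [
        [dataset[rng.randrange(size)] for _ in range(size)]
        for _ in range(m)
    ]

# Hypothetical training set D: (feature vector, measured emission) pairs.
D = [([0.1, 0.3], 5.0), ([0.4, 0.2], 7.5), ([0.9, 0.8], 12.0), ([0.6, 0.5], 9.0)]
subsets = bootstrap_subsets(D, m=3)  # one subset per decision tree
```

Sampling with replacement means a row can appear several times in one subset while other rows are left out, which is what makes the individual trees differ from one another.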
Splitting nodes: when each node of a decision tree is split, n carbon emission influence indexes are randomly selected from the N attributes as the candidate subset for the current split, where n must be less than N. Using the Classification And Regression Tree (CART) method, the one carbon emission influence index X_k that gives the optimal result under the "least squared error criterion" is selected from the split subset as the splitting attribute of the node, until the decision tree can no longer be split. No pruning is performed during splitting, and the value of n remains unchanged.
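A single node split under these rules might look like the sketch below. The names are hypothetical, and scanning every distinct attribute value as a threshold is one simple way to apply the least squared error criterion, not necessarily the search used in the patent.

```python
import random

def squared_error(targets):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def best_split(rows, targets, n, rng):
    """Choose the split with the least squared error among n random attributes.

    Returns (error, attribute index k, threshold), or None if no candidate
    attribute has more than one distinct value.
    """
    n_attributes = len(rows[0])  # N in the text
    if not 0 < n < n_attributes:
        raise ValueError("n must satisfy 0 < n < N")
    candidates = rng.sample(range(n_attributes), n)
    best = None
    for k in candidates:
        # Skip the largest value so the right child is never empty.
        for threshold in sorted({row[k] for row in rows})[:-1]:
            left = [t for row, t in zip(rows, targets) if row[k] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[k] > threshold]
            error = squared_error(left) + squared_error(right)
            if best is None or error < best[0]:
                best = (error, k, threshold)
    return best

# Toy data: attributes 0 and 1 track the target, attribute 2 does not.
rows = [[0.1, 0.1, 0.9], [0.2, 0.2, 0.1], [0.8, 0.8, 0.8], [0.9, 0.9, 0.2]]
targets = [1.0, 1.2, 5.0, 5.2]
error, k, threshold = best_split(rows, targets, n=2, rng=random.Random(0))
```

Growing a full tree would apply `best_split` recursively to the left and right children until no further split is possible, with no pruning, as the text describes.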
Generating a random forest: all the decision trees are combined after splitting to form a random forest.
Predicting carbon emission: the parameter vector features in the prediction set are input into the trained model, and each decision tree T_m yields a predicted value
Figure BDA0003082553510000101
Figure BDA0003082553510000102
The prediction results
Figure BDA0003082553510000103
obtained by all decision trees are summed and their arithmetic mean is taken, giving the predicted life-class, production-class and traffic-class carbon emissions of each county
Figure BDA0003082553510000104
The three predicted class emissions are added to obtain the final predicted carbon emission
Figure BDA0003082553510000105
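The aggregation step can be sketched as below: average the tree outputs within each class, then add the three class predictions. The function names are hypothetical, and the "trees" are stand-in functions returning fixed values purely for illustration.

```python
CATEGORIES = ("production", "life", "traffic")

def forest_predict(trees, features):
    """Arithmetic mean of the predictions of all trees in one class's forest."""
    predictions = [tree(features) for tree in trees]
    return sum(predictions) / len(predictions)

def total_emission(forests, features_by_category):
    """Sum of the three per-class forest predictions for one county."""
    return sum(
        forest_predict(forests[c], features_by_category[c]) for c in CATEGORIES
    )

# Stand-in trees: plain functions in place of trained decision trees.
forests = {
    "production": [lambda x: 10.0, lambda x: 12.0],
    "life": [lambda x: 4.0, lambda x: 6.0],
    "traffic": [lambda x: 3.0, lambda x: 3.0],
}
features = {"production": [0.2], "life": [0.5], "traffic": [0.7]}
total = total_emission(forests, features)  # 11.0 + 5.0 + 3.0
```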
The method addresses county carbon emission prediction. Unlike traditional approaches, it uses a multi-feature random forest prediction algorithm, which is faster, more accurate and generalizes better, so that excessive carbon emissions can be better prevented. With the invention, counties with high carbon emissions can be identified and the aspects responsible for high emissions can be managed effectively. Following the evaluation functions commonly used for regression tasks, the carbon emission prediction capability of the model is evaluated with the Mean Squared Error (MSE) and Mean Absolute Error (MAE). The formulas are as follows:
Figure BDA0003082553510000111
Figure BDA0003082553510000112
where O_i is the predicted carbon emission output by the model, T_i is the observed carbon emission, and n is the number of observations.
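The two formula images are not reproduced here. With the symbols defined above, the standard definitions are MSE = (1/n) Σ (O_i - T_i)^2 and MAE = (1/n) Σ |O_i - T_i|, and the sketch below assumes those forms; the function names and sample values are illustrative.

```python
def mse(predicted, observed):
    """Mean squared error between predictions O_i and observations T_i."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - t) ** 2 for o, t in zip(predicted, observed)) / len(observed)

def mae(predicted, observed):
    """Mean absolute error between predictions O_i and observations T_i."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - t) for o, t in zip(predicted, observed)) / len(observed)

# Hypothetical predictions and observations.
O = [2.0, 4.0, 6.0]
T = [1.0, 4.0, 8.0]
mse_value = mse(O, T)  # (1 + 0 + 4) / 3
mae_value = mae(O, T)  # (1 + 0 + 2) / 3
```

MSE penalizes large errors more heavily than MAE, so reporting both gives a view of typical error and of sensitivity to outlying counties.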
To verify the prediction capability of the model, the invention compares it against the Logistic Regression (LR) algorithm, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm, and the Support Vector Regression (SVR) algorithm. As shown in Table 1, the experimental results show that the RF algorithm achieves the best results on both the MSE and MAE indexes compared with the other methods.
Figure BDA0003082553510000113
Table 1. Comparison of the results
Those skilled in the art will appreciate that the above embodiments are merely exemplary embodiments and that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the application.

Claims (6)

1. A county carbon emission prediction method based on random forests, wherein features are extracted for the prediction model from data along three dimensions (county production, resident life and road traffic) and carbon emissions are predicted from those data, the prediction method comprising the following steps:
Step 1: screening county data to form the initial data set required for training the model, forming the initial county town carbon emission index elements, and dividing the carbon emission index elements into three classes: production, life and traffic;
Step 2: performing data cleaning and standardization preprocessing on the data;
Step 3: forming a training data set; generating training subsets and decision trees in each of the production, life and traffic classes by the Bootstrap method; when each node of a decision tree is split, randomly selecting n carbon emission influence indexes from the N attributes as the candidate subset for the current split, where n must be less than N; and combining all the decision trees after splitting to form a random forest;
Step 4: inputting the parameter vector features in the prediction set into the trained model, where each decision tree T_m yields a predicted value
Figure FDA0003082553500000011
Figure FDA0003082553500000012
The predictions of all decision trees are summed and their arithmetic mean is taken, giving the predicted carbon emissions of the life, production and traffic classes for each county; the three predicted class emissions are then added to obtain the final predicted carbon emission.
2. The county carbon emission prediction method according to claim 1, wherein the production-class, life-class and traffic-class index elements among the carbon emission index elements are each used as the N input variables, the actually measured carbon emission of the current year is used as the output variable, and the input and output variables together form a training data set D.
3. The county carbon emission prediction method according to claim 1, wherein the data cleaning comprises cleaning the initial data set by a mean substitution method, including: cleaning missing values, cleaning format and content, cleaning logic errors, and removing unneeded data; and the standardization preprocessing uses min-max standardization: if the set contains t elements, the set elements x_1, x_2, ..., x_t are transformed into a new dimensionless sequence y_1, y_2, ..., y_t ∈ [0,1], where
Figure FDA0003082553500000021
4. The county carbon emission prediction method according to claim 1, wherein generating the training subsets and decision trees by the Bootstrap method comprises randomly sampling the training samples with replacement; after the sampling is repeated m times, the m samples obtained form the training data subsets D_m of the training data set D, and a decision tree T_m is trained on each training data subset, which serves as the samples at the root node of that decision tree.
5. The county carbon emission prediction method according to claim 1, wherein splitting each node of the decision tree comprises using the classification and regression tree (CART) method to select, within the split subset, the one carbon emission influence index X_k that gives the optimal result under the "least squared error criterion" as the splitting attribute of the node, until the decision tree can no longer be split; no pruning is performed during splitting, and the value of n remains unchanged.
6. The county carbon emission prediction method according to claim 1, wherein the parameter vector features in the prediction set are defined, from the collected feature indexes affecting county carbon emissions in year t, by production class, life class and traffic class as follows:
Figure FDA0003082553500000022
Figure FDA0003082553500000031
Figure FDA0003082553500000032
wherein n is1,n2,n3The characteristic category numbers of production, life and traffic are respectively, X is a characteristic index of each type, and the carbon emission prediction task is further classified into a multiple linear regression problem, namely:
Figure FDA0003082553500000033
Figure FDA0003082553500000034
Figure FDA0003082553500000035
wherein β is an unknown parameter, ε is a random error, and f is the optimal function that the algorithm model solves for, that is, $\beta_0, \beta_1, \ldots, \beta_n$; thus, the final predicted carbon emission is
Figure FDA0003082553500000036
wherein
Figure FDA0003082553500000037
is the carbon emission caused by the production-class features of the county,
Figure FDA0003082553500000038
is the carbon emission caused by the living-class features of the county, and
Figure FDA0003082553500000039
is the carbon emission caused by the transportation-class features of the county.
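As an illustration of the preprocessing and tree-growing steps recited in claims 3 to 5, the following is a minimal Python sketch and not the applicant's implementation. It shows mean substitution for missing values, min-max standardization to [0, 1], Bootstrap sampling with replacement, and selection of a splitting attribute by the least-squares-error criterion. All function names (`mean_substitute`, `min_max`, `bootstrap_subset`, `sse`, `best_split`) are hypothetical, and the handling of a constant column and the use of midpoints as candidate thresholds are assumptions not stated in the claims.

```python
import random
from statistics import mean


def mean_substitute(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max(column):
    """Map x_1..x_t to y_i = (x_i - min) / (max - min), so each y_i lies in [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def bootstrap_subset(rows, m, rng):
    """Draw m rows with replacement to form one training subset D_m."""
    return [rng.choice(rows) for _ in range(m)]


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(rows, feature_ids):
    """Pick the (feature index, threshold, error) with the smallest squared error.

    Each row is a (features, target) pair; feature_ids is the candidate
    split subset of feature indices considered at this node.
    """
    best = None
    for k in feature_ids:
        values = sorted({x[k] for x, _ in rows})
        # Assumption: candidate thresholds are midpoints of adjacent distinct values.
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2
            left = [y for x, y in rows if x[k] <= thr]
            right = [y for x, y in rows if x[k] > thr]
            err = sse(left) + sse(right)
            if best is None or err < best[2]:
                best = (k, thr, err)
    return best


# Small demonstration on made-up numbers (not data from the patent).
cleaned = mean_substitute([2.0, None, 4.0, 6.0])  # [2.0, 4.0, 4.0, 6.0]
scaled = min_max(cleaned)                         # [0.0, 0.5, 0.5, 1.0]
```

In a full random forest, `bootstrap_subset` would be called once per tree and `best_split` would be applied recursively to each child node without pruning, with the per-tree predictions averaged at the end.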
CN202110570856.2A 2021-05-25 2021-05-25 County carbon emission prediction method based on random forest Pending CN113240185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110570856.2A CN113240185A (en) 2021-05-25 2021-05-25 County carbon emission prediction method based on random forest

Publications (1)

Publication Number Publication Date
CN113240185A true CN113240185A (en) 2021-08-10

Family

ID=77138616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110570856.2A Pending CN113240185A (en) 2021-05-25 2021-05-25 County carbon emission prediction method based on random forest

Country Status (1)

Country Link
CN (1) CN113240185A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114548565A (en) * 2022-02-24 2022-05-27 天津大学 Express prediction method based on random forest
CN114819305A (en) * 2022-04-13 2022-07-29 山东高速云南发展有限公司 Path planning method based on carbon emission measurement scale
CN114819305B (en) * 2022-04-13 2023-03-14 山东高速云南发展有限公司 Path planning method based on carbon emission measurement scale
CN115015486A (en) * 2022-06-13 2022-09-06 中南大学 Carbon emission measurement and calculation method based on regression tree model
CN116108998A (en) * 2023-02-22 2023-05-12 葛洲坝集团交通投资有限公司 Expressway construction project carbon emission prediction method and system
CN116108998B (en) * 2023-02-22 2023-12-15 葛洲坝集团交通投资有限公司 Expressway construction project carbon emission prediction method and system

Similar Documents

Publication Publication Date Title
CN113240185A (en) County carbon emission prediction method based on random forest
Chang A comparative study of artificial neural networks, and decision trees for digital game content stocks price prediction
CN112070125A (en) Prediction method of unbalanced data set based on isolated forest learning
CN111785329A (en) Single-cell RNA sequencing clustering method based on confrontation automatic encoder
CN110826785B (en) High-risk road section identification method based on k-medoids clustering and Poisson inverse Gaussian
CN110969304A (en) Method, system and device for predicting production capacity of digital factory
CN115794803B (en) Engineering audit problem monitoring method and system based on big data AI technology
Medeiros et al. Applying the coral reefs optimization algorithm to clustering problems
CN112330052A (en) Distribution transformer load prediction method
CN103699812A (en) Plant variety authenticity authenticating site screening method based on genetic algorithm
CN104966106A (en) Biological age step-by-step predication method based on support vector machine
CN115147155A (en) Railway freight customer loss prediction method based on ensemble learning
CN114936694A (en) Photovoltaic power prediction method based on double integration models
CN113344130B (en) Method and device for generating differentiated river patrol strategy
CN115481841A (en) Material demand prediction method based on feature extraction and improved random forest
CN115660221B (en) Oil and gas reservoir economic recoverable reserve assessment method and system based on hybrid neural network
CN116662860A (en) User portrait and classification method based on energy big data
CN114757433B (en) Method for rapidly identifying relative risk of drinking water source antibiotic resistance
Zhou et al. Data-driven solutions for building environmental impact assessment
Shi et al. Random forest algorithm based on genetic algorithm optimization for property-related crime prediction
Boyapati et al. An Analysis of House Price Prediction Using Ensemble Learning Algorithms
CN113657441A (en) Classification algorithm based on weighted Pearson correlation coefficient and combined with feature screening
MEHR et al. Electrical energy demand prediction: A comparison between genetic programming and decision tree
Can et al. A literature review on the use of genetic algorithms in data mining
Shen et al. Stock trends prediction by hypergraph modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination