CN113240185A - County carbon emission prediction method based on random forest - Google Patents
- Publication number
- CN113240185A (application CN202110570856.2A)
- Authority
- CN
- China
- Prior art keywords
- carbon emission
- county
- data
- training
- prediction
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a random-forest-based method for predicting county carbon emissions. A random forest model built on multiple features is trained to predict the carbon emissions of a county; the method extracts multi-dimensional county features comprehensively, supports parallel training on the large volumes of county data, trains quickly, and is simple to implement. In addition, once the random forest model is trained, the relative importance of each feature's influence on carbon emissions can be obtained, which supports effective control of carbon emission pollution.
Description
Technical Field
The invention relates to the field of applied artificial intelligence, and in particular to a method for predicting county carbon emissions by training a model on optimized multi-factor county feature data.
Background
At present, research on carbon emission prediction in China remains limited, which makes it difficult to prevent excessive emissions in particular areas or imbalances in emissions between areas. With the development of artificial intelligence, carbon emission features can be constructed through feature engineering by analyzing the factors that influence emissions, and county carbon emissions can then be predicted from those features, improving prediction accuracy. Analysis of statistical yearbooks shows that indexes related to economic development, travel and transportation, residents' daily life, and ecological greening can serve as direct carbon emission features, while indexes related to scale structure and energy efficiency can serve as indirect carbon emission features. Combining the direct and indirect features in a random forest algorithm allows county carbon emissions to be predicted effectively.
Disclosure of Invention
To solve the above problems, the present invention provides a county carbon emission prediction method based on a random forest algorithm, in which features are extracted from three dimensions of county data (production, residents' life, and road traffic) and carbon emissions are predicted from these features. The prediction method comprises the following steps:
Step 1: Screen the county data to form the initial data set required for model training and the initial county carbon emission index elements, and divide these index elements into three categories: production, life, and traffic;
Step 2: Clean the data and preprocess it by standardization;
Step 3: Form a training data set. Within each of the production, life, and traffic categories, generate training subsets and decision trees using the Bootstrap method. When each node of a decision tree is split, randomly select n carbon emission influence indexes from the N attributes as the candidate subset for splitting the current node, where n < N. Combine all the grown decision trees to form a random forest;
Step 4: Input the feature parameter vector of the prediction set into the trained model, where each decision tree $T_m$ produces a predicted value. Average the predictions of all decision trees arithmetically to obtain the predicted life, production, and traffic carbon emissions of each county, and add the three category predictions to obtain the final predicted carbon emission value.
Further, the production, life, and traffic index elements among the carbon emission index elements are each used as the N input variables, the measured carbon emission of the current year is used as the output variable, and the input and output variables together form the training data set D.
Further, the data cleaning cleans the initial data set by mean substitution and comprises: handling missing values, correcting format content, correcting logical errors, and removing data not required for the task. The standardization preprocessing uses min-max normalization: for a set of t elements $x_1, x_2, \ldots, x_t$, the transformation yields the dimensionless sequence $y_1, y_2, \ldots, y_t \in [0, 1]$, where
$$y_i = \frac{x_i - \min_{1 \le j \le t} x_j}{\max_{1 \le j \le t} x_j - \min_{1 \le j \le t} x_j}, \quad i = 1, 2, \ldots, t$$
Further, generating the training subsets and decision trees by the Bootstrap method comprises: randomly sampling the training samples with replacement, repeating the sampling m times, letting the drawn samples jointly form a training subset $D_m$ of the training data set D, and training a decision tree $T_m$ on each training subset, with the subset serving as the samples at the root node of the tree.
Further, splitting each node of the decision tree comprises using the Classification and Regression Tree (CART) approach to select, from the candidate subset, the single carbon emission influence index $X_k$ that gives the best result under the least-squares error criterion as the splitting attribute of the node, continuing until the decision tree can no longer be split; no pruning is performed during splitting, and the value of n remains unchanged.
Further, according to the collected feature indexes affecting the carbon emission of a county in year t, the feature parameter vectors of the prediction set are defined by production, life, and traffic category as:
$$X^{\mathrm{production}}_{it} = (X_1, X_2, \ldots, X_{n_1}), \quad X^{\mathrm{life}}_{it} = (X_1, X_2, \ldots, X_{n_2}), \quad X^{\mathrm{traffic}}_{it} = (X_1, X_2, \ldots, X_{n_3})$$
where $n_1$, $n_2$, and $n_3$ are the numbers of production, life, and traffic features, respectively, and X denotes a feature index of each category. The carbon emission prediction task is further formulated as a multiple linear regression problem:
$$C = f(X_1, X_2, \ldots, X_n) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$
where β denotes the unknown parameters, ε is the random error, and f is the optimal function that the algorithm model solves for, i.e., the parameters $\beta_0, \beta_1, \ldots, \beta_n$. The final predicted carbon emission is therefore
$$\hat{C}_{it} = \hat{C}^{\mathrm{production}}_{it} + \hat{C}^{\mathrm{life}}_{it} + \hat{C}^{\mathrm{traffic}}_{it}$$
where $\hat{C}^{\mathrm{production}}_{it}$, $\hat{C}^{\mathrm{life}}_{it}$, and $\hat{C}^{\mathrm{traffic}}_{it}$ are the carbon emissions attributable to the production, life, and traffic element features of the county, respectively.
The invention trains a multi-feature random forest model to predict county carbon emissions. It extracts multi-dimensional county features comprehensively, without requiring a manual selection among the many county features. Because the decision trees in a random forest are independent of one another, they can be trained in parallel on large volumes of county data, so training is fast and the method is simple to implement. In addition, once the random forest model is trained, the relative importance of each feature's influence on carbon emissions can be obtained, helping enterprises and governments control carbon emissions and address carbon emission pollution effectively.
Drawings
Fig. 1 shows the procedure of the Random Forest (RF) algorithm.
Fig. 2 compares the RF algorithm of the present invention with the LR, LASSO, and SVR algorithms for predicting county life-class carbon emissions.
Fig. 3 compares the RF algorithm of the present invention with the LR, LASSO, and SVR algorithms for predicting county production-class carbon emissions.
Fig. 4 compares the RF algorithm of the present invention with the LR, LASSO, and SVR algorithms for predicting county traffic-class carbon emissions.
Detailed Description
The following examples are presented to help those skilled in the art understand the invention more fully and are not intended to limit it in any way.
The invention mainly uses the Random Forest (RF) algorithm. A random forest is an ensemble of many decision trees that are used together to train on and predict samples; RF applies the bagging idea from ensemble learning, and its decision trees are built independently of one another. The procedure is shown in Fig. 1. First, the bootstrap method (random sampling with replacement) draws n samples from the data set as a training set, and a decision tree is trained on that set; this is repeated until m decision trees have been built. The average of the predictions of all trees in the forest is then taken as the overall prediction. Random forests predict with high accuracy, run efficiently on large data sets, and are relatively resistant to overfitting. Because the model consists of many decision trees, it can be trained in parallel, which speeds up training; random forests are also relatively insensitive to noise in the training set, and the combined decision of many trees is more stable than that of a single decision tree.
On this basis, the invention predicts county carbon emissions by comprehensively extracting features from three dimensions of county data (production, residents' life, and road traffic) and building a multi-feature random forest model, which gives more accurate predictions.
Description of the problem
First, county carbon emission prediction is treated as a regression problem, and the collected features affecting a county in year t are divided into three categories: production, life, and traffic. Let $C_{it}$ be the total carbon emission of county i in year t. By emission category,
$$C_{it} = C^{\mathrm{production}}_{it} + C^{\mathrm{life}}_{it} + C^{\mathrm{traffic}}_{it}$$
where $C^{\mathrm{production}}_{it}$, $C^{\mathrm{life}}_{it}$, and $C^{\mathrm{traffic}}_{it}$ are the carbon emissions attributable to the production, life, and traffic element features of the county, respectively.
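According to the collected feature indexes affecting the carbon emission of a county in year t, the features are defined by production, life, and traffic category as:
$$X^{\mathrm{production}}_{it} = (X_1, X_2, \ldots, X_{n_1}), \quad X^{\mathrm{life}}_{it} = (X_1, X_2, \ldots, X_{n_2}), \quad X^{\mathrm{traffic}}_{it} = (X_1, X_2, \ldots, X_{n_3})$$
where $n_1$, $n_2$, and $n_3$ are the numbers of production, life, and traffic features, respectively, and X denotes a feature index of each category.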
The average influence coefficient of each index is obtained by Pearson correlation analysis, which establishes the linear correlation between the features and carbon emission. The carbon emission prediction task can therefore be formulated as a multiple linear regression problem:
$$C = f(X_1, X_2, \ldots, X_n) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$
where β denotes the unknown parameters and ε is the random error. f is the optimal function that the algorithm model of the present invention solves for, i.e., the parameters $\beta_0, \beta_1, \ldots, \beta_n$.
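The Pearson screening step can be sketched in plain Python as below. This is a minimal illustration, not the patent's implementation: the function names and the toy columns are hypothetical, and because the document does not define how the "average influence coefficient" is aggregated, the sketch simply computes Pearson's r between each index column and the emission series and ranks the indexes by |r|.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_indexes(columns: dict[str, Sequence[float]],
                 emissions: Sequence[float]) -> list[tuple[str, float]]:
    """Return (index name, r) pairs sorted by decreasing |r| with the emissions."""
    scores = [(name, pearson(values, emissions)) for name, values in columns.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data for three counties (already cleaned and normalized).
columns = {"road_density": [1.0, 2.0, 3.0], "gas_coverage": [1.0, 1.0, 2.0]}
emissions = [2.0, 4.0, 6.0]
for name, r in rank_indexes(columns, emissions):
    print(f"{name}: r = {r:.3f}")
```

Pearson's r only measures linear association, which matches the document's framing of the task as a multiple linear regression; the random forest itself does not require the relationship to be linear.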
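The loss function is the Mean Squared Error (MSE), defined as:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$
where m is the number of observations, $y_i$ is the observed carbon emission, and $\hat{y}_i$ is the predicted value.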
Solving method
Based on random forests, the invention builds a carbon emission structure that accounts for county production, residents' life, and road traffic when predicting carbon emissions. The specific steps are as follows:
Step 1: Data acquisition
Data for N counties are obtained from the statistical yearbooks, where N = 1814. First, the set of elements for the N counties is obtained; the county data are then screened to form the initial data set required for model training and the initial county carbon emission index elements. These elements comprise: built-up area, land urbanization rate, population size, county population size, built-up area population density, GDP, per-capita GDP, primary industry added value, secondary industry added value, total fixed-asset investment of the whole society, gas supply coverage rate, heating pipeline density, heat supply volume rate, residential density, road density, public transport vehicles per 10,000 people, motor vehicles per 10,000 people, medical facility allocation rate, social welfare facility allocation rate, and the proportion of road surface area occupied by footpaths.
Second, to improve the accuracy and robustness of the model, the elements are classified into three categories: production, life, and traffic. The production category comprises built-up area, land urbanization rate, population size, county population size, built-up area population density, GDP, per-capita GDP, primary industry added value, and secondary industry added value. The life category comprises built-up area, land urbanization rate, population size, county population size, GDP, per-capita GDP, primary industry added value, secondary industry added value, gas supply coverage rate, heating pipeline density, heat supply volume rate, and residential density. The traffic category comprises built-up area, population size, county population size, GDP, per-capita GDP, primary industry added value, secondary industry added value, road density, public transport vehicles per 10,000 people, motor vehicles per 10,000 people, parks per 10,000 people, medical facility allocation rate, social welfare facility allocation rate, and the proportion of road surface area occupied by footpaths.
Step 2: Data preprocessing
The data are grouped into a production class, a life class, and a traffic class. The data files are in HTML and Excel format and contain 21 fields and 7612 records in total, covering data for 1905 counties from 2010 to 2018. The raw data set includes built-up area, land urbanization rate, population size, county population size, and other fields. Because the yearbook data contain missing values, formatting errors, and similar problems, the data must be cleaned. The invention cleans the data set by mean substitution, in the following steps: handle missing values, correct format content, correct logical errors, and remove data not required for the task.
After cleaning, the fields span very different orders of magnitude and are expressed in different units, so the data must be standardized: each field is scaled proportionally into a small fixed interval and converted into a dimensionless numeric value, so that indexes in different units can be weighted against one another. The invention uses min-max normalization: for a set of t elements $x_1, x_2, \ldots, x_t$, the transformation yields the dimensionless sequence $y_1, y_2, \ldots, y_t \in [0, 1]$, where
$$y_i = \frac{x_i - \min_{1 \le j \le t} x_j}{\max_{1 \le j \le t} x_j - \min_{1 \le j \le t} x_j}, \quad i = 1, 2, \ldots, t$$
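The min-max formula can be written as a two-part sketch: one function records the column minimum and maximum, the other applies the transformation. The split into `fit_min_max` and `apply_min_max` and the choice to map a constant column to 0.0 (where the formula would divide by zero) are assumptions, not details from the patent.

```python
def fit_min_max(xs: list[float]) -> tuple[float, float]:
    """Record the (min, max) of a column so the same scaling can be reused."""
    if not xs:
        raise ValueError("cannot scale an empty column")
    return min(xs), max(xs)


def apply_min_max(xs: list[float], bounds: tuple[float, float]) -> list[float]:
    """y_i = (x_i - min) / (max - min); a constant column maps to 0.0 (assumption)."""
    lo, hi = bounds
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


gdp = [2.0, 4.0, 6.0]  # hypothetical column values
bounds = fit_min_max(gdp)
print(apply_min_max(gdp, bounds))
```

Keeping the bounds separate lets the prediction set be scaled with the training-set minimum and maximum; values outside the training range then fall outside [0, 1], which the sketch deliberately does not clip.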
Step 3: Algorithmic prediction
Forming the training data set: the production-class indexes (built-up area, land urbanization rate, population size, county population size, built-up area population density, GDP, per-capita GDP, primary industry added value, and secondary industry added value), the life-class indexes (built-up area, land urbanization rate, population size, county population size, GDP, per-capita GDP, primary industry added value, secondary industry added value, gas supply coverage rate, heating pipeline density, heat supply volume rate, and residential density), and the traffic-class indexes (built-up area, population size, county population size, GDP, per-capita GDP, primary industry added value, secondary industry added value, road density, public transport vehicles per 10,000 people, motor vehicles per 10,000 people, parks per 10,000 people, medical facility allocation rate, social welfare facility allocation rate, and the proportion of road surface area occupied by footpaths) are used as the production, life, and traffic feature indexes respectively, i.e., as the input variables of the model. The measured carbon emission of the current year is used as the output variable, and the input and output variables together form the training data set D.
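Assembling one training set per category can be sketched as follows. The field names are hypothetical and each category lists only a few of the indexes named above. The document says only that the "measured carbon emission of the current year" is the output; since Step 4 adds three category-level predictions, this sketch assumes each record also carries a measured emission per category (hypothetical keys such as `emission_production`).

```python
# Hypothetical field names; each category lists only a few of the patent's indexes.
CATEGORY_FEATURES: dict[str, list[str]] = {
    "production": ["built_up_area", "gdp", "secondary_industry_value"],
    "life": ["built_up_area", "gas_supply_coverage", "heating_pipe_density"],
    "traffic": ["built_up_area", "road_density", "vehicles_per_10k"],
}


def build_training_sets(
    records: list[dict[str, float]],
) -> dict[str, tuple[list[list[float]], list[float]]]:
    """Split cleaned, normalized county-year records into one (X, y) set per category.

    Assumes each record carries a measured target such as 'emission_production'.
    """
    sets = {}
    for category, features in CATEGORY_FEATURES.items():
        X = [[rec[f] for f in features] for rec in records]  # input variables
        y = [rec[f"emission_{category}"] for rec in records]  # output variable
        sets[category] = (X, y)
    return sets


record = {
    "built_up_area": 0.5, "gdp": 0.2, "secondary_industry_value": 0.1,
    "gas_supply_coverage": 0.9, "heating_pipe_density": 0.3,
    "road_density": 0.4, "vehicles_per_10k": 0.7,
    "emission_production": 10.0, "emission_life": 5.0, "emission_traffic": 3.0,
}
print(build_training_sets([record])["traffic"])
```

Shared indexes such as built-up area appear in all three feature lists, mirroring the overlapping category definitions in the text.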
Generating training subsets and decision trees: within each category, the Bootstrap method randomly samples the training samples with replacement. The sampling is repeated m times, the drawn samples form a training subset $D_m$ of the training data set D, and a decision tree $T_m$ is trained on each training subset, with the subset serving as the samples at the root node of the tree.
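The Bootstrap step can be sketched as drawing row indices with replacement. The text leaves the size of each subset open; this sketch assumes the usual bagging convention that every subset has as many rows as D. The function name and the seed parameter are hypothetical.

```python
import random


def bootstrap_subsets(n_samples: int, n_trees: int, seed: int = 0) -> list[list[int]]:
    """Draw one bootstrap subset per tree: n_samples row indices sampled with replacement."""
    if n_samples < 1 or n_trees < 1:
        raise ValueError("n_samples and n_trees must be positive")
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    return [rng.choices(range(n_samples), k=n_samples) for _ in range(n_trees)]


# Example: 5 training rows, 3 trees; a subset may repeat some rows and omit others.
for m, subset in enumerate(bootstrap_subsets(5, 3), start=1):
    print(f"D_{m}: rows {sorted(subset)}")
```

Because each subset repeats some rows and omits others, the trees see different data and make partly independent errors, which is what averaging in the later step relies on.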
Splitting a node: when each node of a decision tree is split, n carbon emission influence indexes are randomly selected from the N attributes as the candidate subset for splitting the current node, where n < N. Using the Classification and Regression Tree (CART) method, the single carbon emission influence index $X_k$ that gives the best result under the least-squares error criterion is selected from this subset as the splitting attribute of the node, and splitting continues until the decision tree can no longer be split. No pruning is performed during splitting, and the value of n remains unchanged.
Generating the random forest: all the grown decision trees are combined to form the random forest.
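The node-splitting and forest-building steps can be combined in one compact, dependency-free sketch. It follows the description as closely as the text allows: bootstrap subsets, n of the N indexes sampled at every node (with n < N), CART splits chosen by the least-squares (sum of squared errors) criterion, no pruning, and arithmetic averaging of the tree outputs. The function names, the midpoint thresholds, and the rule of making a leaf when none of the sampled indexes varies are assumptions rather than details from the patent. For real data, a library implementation such as scikit-learn's `RandomForestRegressor` would be the practical choice.

```python
import random
from statistics import mean
from typing import Union

# A tree is either a leaf value or a tuple (feature index, threshold, left, right).
Tree = Union[float, tuple]


def sse(ys: list[float]) -> float:
    """Sum of squared errors around the mean (the least-squares criterion)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y, features):
    """Return (score, k, threshold) minimising the children's SSE over sampled features."""
    best = None
    for k in features:
        values = sorted({row[k] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [yi for row, yi in zip(X, y) if row[k] <= t]
            right = [yi for row, yi in zip(X, y) if row[k] > t]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, k, t)
    return best


def grow_tree(X, y, n_sub: int, rng: random.Random) -> Tree:
    """Grow an unpruned CART regression tree, sampling n_sub features at every node."""
    if len(set(y)) == 1:
        return mean(y)
    features = rng.sample(range(len(X[0])), n_sub)
    split = best_split(X, y, features)
    if split is None:  # assumption: no sampled feature varies here, so make a leaf
        return mean(y)
    _, k, t = split
    left = [i for i, row in enumerate(X) if row[k] <= t]
    right = [i for i, row in enumerate(X) if row[k] > t]
    return (k, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], n_sub, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], n_sub, rng))


def predict_tree(tree: Tree, x) -> float:
    """Follow the splits from the root down to a leaf value."""
    while isinstance(tree, tuple):
        k, t, left, right = tree
        tree = left if x[k] <= t else right
    return tree


def fit_forest(X, y, n_trees: int, n_sub: int, seed: int = 0) -> list[Tree]:
    """Bootstrap one subset per tree and grow an unpruned tree on each."""
    if not 1 <= n_sub < len(X[0]):
        raise ValueError("n_sub must satisfy 1 <= n_sub < number of features")
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(X)), k=len(X))  # sampling with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest


def predict_forest(forest: list[Tree], x) -> float:
    """Arithmetic mean of the individual tree predictions."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


X = [[0.1, 0.2], [0.4, 0.1], [0.6, 0.9], [0.9, 0.7]]  # hypothetical normalized indexes
y = [1.0, 1.5, 4.0, 4.5]                              # hypothetical emissions
forest = fit_forest(X, y, n_trees=50, n_sub=1)
print(predict_forest(forest, [0.5, 0.5]))
```

Each tree is grown independently from its own bootstrap subset, so the loop in `fit_forest` could be parallelized, which is the training-speed advantage the description attributes to random forests.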
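Predicting carbon emission: the feature parameter vector $\mathbf{x}$ of the prediction set is input into the trained model, and each decision tree $T_j$ ($j = 1, \ldots, m$) yields a predicted value $\hat{y}_j = T_j(\mathbf{x})$. The predictions of all decision trees are averaged arithmetically,
$$\hat{y} = \frac{1}{m}\sum_{j=1}^{m} \hat{y}_j,$$
giving the predicted life, production, and traffic carbon emissions of each county, $\hat{C}^{\mathrm{life}}_{it}$, $\hat{C}^{\mathrm{production}}_{it}$, and $\hat{C}^{\mathrm{traffic}}_{it}$. Adding the three category predictions yields the final predicted carbon emission
$$\hat{C}_{it} = \hat{C}^{\mathrm{production}}_{it} + \hat{C}^{\mathrm{life}}_{it} + \hat{C}^{\mathrm{traffic}}_{it}.$$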
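The averaging and summation in this step can be illustrated independently of how the trees are grown. In this sketch each tree is modeled as any callable from a feature vector to a number; the toy "trees" below are hypothetical stand-ins, not trained models, and the function names are illustrative.

```python
from typing import Callable, Sequence

# Any callable mapping a feature vector to a predicted emission can stand in for T_j.
TreeFn = Callable[[Sequence[float]], float]
CATEGORIES = ("production", "life", "traffic")


def forest_predict(trees: Sequence[TreeFn], x: Sequence[float]) -> float:
    """Arithmetic mean of the m tree predictions for one feature vector."""
    return sum(tree(x) for tree in trees) / len(trees)


def total_emission(forests: dict[str, Sequence[TreeFn]],
                   features: dict[str, Sequence[float]]) -> float:
    """Predict each category with its own forest and add the three results."""
    return sum(forest_predict(forests[c], features[c]) for c in CATEGORIES)


# Hypothetical stand-in trees and feature vectors for one county-year.
forests = {
    "production": [lambda x: 1.0, lambda x: 3.0],
    "life": [lambda x: x[0]],
    "traffic": [lambda x: 0.5, lambda x: 1.5],
}
features = {"production": [0.2], "life": [4.0], "traffic": [0.7]}
print(total_emission(forests, features))  # prints 7.0 (2.0 + 4.0 + 1.0)
```

Each category keeps its own forest and its own feature vector, so the per-category predictions remain available for identifying which aspect of a county drives its emissions.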
The method addresses the problem of county carbon emission prediction. Unlike traditional carbon emission approaches, the multi-feature random forest prediction algorithm is faster, more accurate, and generalizes better, so excessive carbon emissions can be better prevented. The method also identifies counties with high carbon emissions and the aspects that contribute most to them, so that these can be managed effectively. Following the evaluation functions commonly used for regression tasks, the prediction capability of the model is evaluated with the Mean Squared Error (MSE) and Mean Absolute Error (MAE):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(O_i - T_i\right)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - T_i\right|$$
where $O_i$ is the predicted carbon emission output by the model, $T_i$ is the observed carbon emission, and n is the number of observations.
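The two evaluation metrics translate directly into code. This is a minimal sketch with hypothetical function names, using `predicted` for $O_i$ and `observed` for $T_i$.

```python
from typing import Sequence


def _check(predicted: Sequence[float], observed: Sequence[float]) -> None:
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")


def mse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum((O_i - T_i) ** 2)."""
    _check(predicted, observed)
    return sum((o - t) ** 2 for o, t in zip(predicted, observed)) / len(predicted)


def mae(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute error: (1/n) * sum(|O_i - T_i|)."""
    _check(predicted, observed)
    return sum(abs(o - t) for o, t in zip(predicted, observed)) / len(predicted)


print(mse([2.0, 4.0], [1.0, 7.0]), mae([2.0, 4.0], [1.0, 7.0]))  # prints 5.0 2.0
```

Because MSE squares the errors, it weights the one large miss (3.0) far more than MAE does, so reporting both gives a fuller picture of how the compared models err.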
To verify the prediction capability of the model, the invention compares it with the Logistic Regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO), and Support Vector Regression (SVR) algorithms. As shown in Table 1, the experimental results show that the RF algorithm achieves the best MSE and MAE among the compared methods.
Table 1. Comparison of results
Those skilled in the art will appreciate that the above embodiments are merely exemplary and that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the application.
Claims (6)
1. A county carbon emission prediction method based on random forests, in which features are extracted from three dimensions of county data (production, residents' life, and road traffic) and carbon emissions are predicted from these features, the prediction method comprising the following steps:
Step 1: screening the county data to form the initial data set required for model training and the initial county carbon emission index elements, and dividing these index elements into three categories: production, life, and traffic;
Step 2: cleaning the data and preprocessing it by standardization;
Step 3: forming a training data set; within each of the production, life, and traffic categories, generating training subsets and decision trees using the Bootstrap method, and, when each node of a decision tree is split, randomly selecting n carbon emission influence indexes from the N attributes as the candidate subset for splitting the current node, where n < N; and combining all the grown decision trees to form a random forest;
Step 4: inputting the feature parameter vector of the prediction set into the trained model, where each decision tree $T_m$ produces a predicted value; averaging the predictions of all decision trees arithmetically to obtain the predicted life, production, and traffic carbon emissions of each county; and adding the three category predictions to obtain the final predicted carbon emission value.
2. The county carbon emission prediction method according to claim 1, wherein the production, life, and traffic index elements among the carbon emission index elements are each used as the N input variables, the measured carbon emission of the current year is used as the output variable, and the input and output variables together form a training data set D.
3. The county carbon emission prediction method according to claim 1, wherein the data cleaning comprises cleaning the initial data set by mean substitution, including: handling missing values, correcting format content, correcting logical errors, and removing data not required for the task; and the standardization preprocessing comprises min-max normalization, whereby, for a set of t elements $x_1, x_2, \ldots, x_t$, the transformation yields the dimensionless sequence $y_1, y_2, \ldots, y_t \in [0, 1]$, where
$$y_i = \frac{x_i - \min_{1 \le j \le t} x_j}{\max_{1 \le j \le t} x_j - \min_{1 \le j \le t} x_j}, \quad i = 1, 2, \ldots, t$$
4. The county carbon emission prediction method according to claim 1, wherein generating the training subsets and decision trees by the Bootstrap method comprises: randomly sampling the training samples with replacement, repeating the sampling m times, letting the drawn samples jointly form a training subset $D_m$ of the training data set D, and training a decision tree $T_m$ on each training subset, with the subset serving as the samples at the root node of the tree.
5. The county carbon emission prediction method according to claim 1, wherein splitting each node of the decision tree comprises using the classification and regression tree approach to select, from the candidate subset, the single carbon emission influence index $X_k$ that gives the best result under the least-squares error criterion as the splitting attribute of the node, continuing until the decision tree can no longer be split, wherein no pruning is performed during splitting and the value of n remains unchanged.
6. The county carbon emission prediction method according to claim 1, wherein the parameter vector feature in the prediction set is defined as follows according to the collected characteristic indexes affecting the county carbon emission in the t year, according to the production class, the life class and the traffic class:
wherein n is1,n2,n3The characteristic category numbers of production, life and traffic are respectively, X is a characteristic index of each type, and the carbon emission prediction task is further classified into a multiple linear regression problem, namely:
where β denotes the unknown parameters, ε is a random error term, and f is the optimal function obtained by solving the algorithm model, i.e., the parameters $\beta_0, \beta_1, \ldots, \beta_n$. The final predicted carbon emission is thus composed of three terms: the carbon emission attributable to production-class features in the county, the carbon emission attributable to life-class features in the county, and the carbon emission attributable to traffic-class features in the county.
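Claim 3 names two concrete preprocessing steps: mean substitution for missing values and min-max normalization. The Python sketch below illustrates only those two steps on a single indicator column; the format, logic-error and unneeded-data cleaning steps are not modeled, and the function names and sample values are hypothetical. Mapping a constant column to zeros is an assumption of the sketch, since the claim does not say how a zero range is handled.

```python
from statistics import mean
from typing import List, Optional


def mean_substitute(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max(column: List[float]) -> List[float]:
    """Map x_1..x_t to y_i = (x_i - min) / (max - min), so each y_i is in [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant indicator carries no information; map it to 0.
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical indicator column for four counties, one value missing.
raw = [12.0, None, 18.0, 30.0]
cleaned = mean_substitute(raw)  # the gap is filled with mean(12, 18, 30) = 20.0
scaled = min_max(cleaned)
print(cleaned)  # [12.0, 20.0, 18.0, 30.0]
print(scaled)
```

Normalizing after imputation, as here, keeps the filled value inside the observed range, so it also maps into [0, 1].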
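The Bootstrap step of claim 4 amounts to drawing, for each decision tree $T_m$, a training subset $D_m$ from $D$ with replacement. In this hypothetical Python sketch the subset size defaults to the size of the full training set, which is the conventional bootstrap choice but an assumption here; the record layout, function name and seed are illustrative only.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_subsets(data: Sequence[T], num_trees: int,
                      subset_size: Optional[int] = None,
                      seed: int = 0) -> List[List[T]]:
    """Return one training subset per decision tree, drawn with replacement."""
    rng = random.Random(seed)  # seeded so the subsets are reproducible
    k = len(data) if subset_size is None else subset_size
    # choices() samples with replacement: a record may appear several times
    # in one subset while other records are absent from it.
    return [rng.choices(data, k=k) for _ in range(num_trees)]


# Hypothetical records: (normalized indicator values, observed emission).
records = [((0.1, 0.5), 3.2), ((0.4, 0.2), 4.1),
           ((0.9, 0.7), 6.0), ((0.3, 0.8), 3.9)]
subsets = bootstrap_subsets(records, num_trees=3, seed=42)
for m, subset in enumerate(subsets, start=1):
    print(f"D_{m}: {len(subset)} samples, {len(set(subset))} distinct")
```

Each subset would then be passed to its own tree-growing routine, so the trees differ because they see different resamples of the same counties.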
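The node splitting in claim 5 can be illustrated with a small, unpruned CART regression tree in which each node picks the indicator $X_k$ and threshold that minimize the summed squared error of the two child nodes. This Python sketch is an illustration under assumptions, not the patented implementation: it scans every feature at every node (random-forest implementations commonly scan only a random subset of features per node), uses midpoints between consecutive distinct values as candidate thresholds, and stores the mean emission of a node's samples at each leaf. All names and data are hypothetical.

```python
from typing import Optional, Sequence, Tuple, Union

Tree = Union[float, tuple]  # a leaf value, or (k, threshold, left, right)


def sse(ys: Sequence[float]) -> float:
    """Squared error of a node: sum of squared deviations from its mean."""
    if not ys:
        return 0.0
    mu = sum(ys) / len(ys)
    return sum((y - mu) ** 2 for y in ys)


def best_split(X: Sequence[Sequence[float]],
               y: Sequence[float]) -> Optional[Tuple[int, float, float]]:
    """Return (k, threshold, cost) minimizing SSE(left) + SSE(right), or None."""
    best = None
    for k in range(len(X[0])):
        values = sorted({row[k] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[k] <= thr]
            right = [yi for row, yi in zip(X, y) if row[k] > thr]
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (k, thr, cost)
    return best


def grow(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tree:
    """Grow an unpruned tree: keep splitting until no split is possible."""
    split = best_split(X, y) if len(y) > 1 else None
    if split is None:
        return sum(y) / len(y)  # leaf: mean emission of its samples
    k, thr, _ = split
    li = [i for i, row in enumerate(X) if row[k] <= thr]
    ri = [i for i, row in enumerate(X) if row[k] > thr]
    return (k, thr,
            grow([X[i] for i in li], [y[i] for i in li]),
            grow([X[i] for i in ri], [y[i] for i in ri]))


def predict(node: Tree, x: Sequence[float]) -> float:
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        k, thr, left, right = node
        node = left if x[k] <= thr else right
    return node


# Hypothetical data: two normalized indicators and observed emissions.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
y = [1.0, 1.2, 5.0, 5.1]
print(best_split(X, y))  # indicator 0, threshold 2.5, cost close to 0.025
tree = grow(X, y)
print(predict(tree, [1.0, 5.0]), predict(tree, [4.0, 1.0]))  # 1.0 5.1
```

Because splitting only stops when a node holds one sample or identical feature rows, each tree overfits its bootstrap subset; in a random forest that variance is reduced by averaging many such trees.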
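Claim 6 casts the prediction as a multiple linear regression with fitted parameters $\beta_0, \ldots, \beta_n$ and splits the final prediction into production, life and traffic contributions, but the original formulas did not survive extraction. Assuming the model $\hat{y} = \beta_0 + \sum_{j=1}^{n} \beta_j X_j$ with $n = n_1 + n_2 + n_3$ and the features ordered production, life, traffic, the Python sketch below fits β by ordinary least squares (via the normal equations) and groups the weighted terms by class. The least-squares solver, reporting the intercept separately, and all names and data are hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [beta_0, ..., beta_n] from the normal equations (Z^T Z) b = Z^T y."""
    Z = [[1.0] + list(row) for row in X]  # leading 1 carries the intercept beta_0
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


def decompose(beta: Sequence[float], x: Sequence[float],
              n1: int, n2: int, n3: int) -> Dict[str, float]:
    """Group beta_j * X_j into production, life and traffic contributions."""
    if not len(x) == n1 + n2 + n3 == len(beta) - 1:
        raise ValueError("feature count must equal n1 + n2 + n3")
    terms = [bj * xj for bj, xj in zip(beta[1:], x)]
    return {
        "intercept": beta[0],
        "production": sum(terms[:n1]),
        "life": sum(terms[n1:n1 + n2]),
        "traffic": sum(terms[n1 + n2:]),
    }


# Hypothetical data generated from y = 2 + 1.5*X1 + 0.5*X2 + 3*X3 (one index per class).
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
y = [2.0, 3.5, 2.5, 5.0, 7.0]
beta = fit_ols(X, y)
parts = decompose(beta, [0.2, 0.4, 0.6], n1=1, n2=1, n3=1)
print([round(b, 6) for b in beta])  # [2.0, 1.5, 0.5, 3.0]
print({k: round(v, 6) for k, v in parts.items()})
```

Because the assumed model is linear, the intercept and the three class contributions add up to the full prediction (here 2 + 0.3 + 0.2 + 1.8 = 4.3), which is what makes a per-class breakdown possible.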
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110570856.2A (CN113240185A) | 2021-05-25 | 2021-05-25 | County carbon emission prediction method based on random forest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110570856.2A (CN113240185A) | 2021-05-25 | 2021-05-25 | County carbon emission prediction method based on random forest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113240185A | 2021-08-10 |
Family ID: 77138616
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202110570856.2A | Pending | 2021-05-25 | 2021-05-25 | County carbon emission prediction method based on random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113240185A |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114548565A | 2022-02-24 | 2022-05-27 | 天津大学 | Express prediction method based on random forest |
CN114819305A | 2022-04-13 | 2022-07-29 | 山东高速云南发展有限公司 | Path planning method based on carbon emission measurement scale |
CN114819305B | 2022-04-13 | 2023-03-14 | 山东高速云南发展有限公司 | Path planning method based on carbon emission measurement scale |
CN115015486A | 2022-06-13 | 2022-09-06 | 中南大学 | Carbon emission measurement and calculation method based on regression tree model |
CN116108998A | 2023-02-22 | 2023-05-12 | 葛洲坝集团交通投资有限公司 | Expressway construction project carbon emission prediction method and system |
CN116108998B | 2023-02-22 | 2023-12-15 | 葛洲坝集团交通投资有限公司 | Expressway construction project carbon emission prediction method and system |
Similar Documents
Publication or Author | Title |
---|---|
CN113240185A | County carbon emission prediction method based on random forest |
Chang | A comparative study of artificial neural networks, and decision trees for digital game content stocks price prediction |
CN112070125A | Prediction method of unbalanced data set based on isolated forest learning |
CN111785329A | Single-cell RNA sequencing clustering method based on confrontation automatic encoder |
CN110826785B | High-risk road section identification method based on k-medoids clustering and Poisson inverse Gaussian |
CN110969304A | Method, system and device for predicting production capacity of digital factory |
CN115794803B | Engineering audit problem monitoring method and system based on big data AI technology |
Medeiros et al. | Applying the coral reefs optimization algorithm to clustering problems |
CN112330052A | Distribution transformer load prediction method |
CN103699812A | Plant variety authenticity authenticating site screening method based on genetic algorithm |
CN104966106A | Biological age step-by-step predication method based on support vector machine |
CN115147155A | Railway freight customer loss prediction method based on ensemble learning |
CN114936694A | Photovoltaic power prediction method based on double integration models |
CN113344130B | Method and device for generating differentiated river patrol strategy |
CN115481841A | Material demand prediction method based on feature extraction and improved random forest |
CN115660221B | Oil and gas reservoir economic recoverable reserve assessment method and system based on hybrid neural network |
CN116662860A | User portrait and classification method based on energy big data |
CN114757433B | Method for rapidly identifying relative risk of drinking water source antibiotic resistance |
Zhou et al. | Data-driven solutions for building environmental impact assessment |
Shi et al. | Random forest algorithm based on genetic algorithm optimization for property-related crime prediction |
Boyapati et al. | An Analysis of House Price Prediction Using Ensemble Learning Algorithms |
CN113657441A | Classification algorithm based on weighted Pearson correlation coefficient and combined with feature screening |
MEHR et al. | Electrical energy demand prediction: A comparison between genetic programming and decision tree |
Can et al. | A literature review on the use of genetic algorithms in data mining |
Shen et al. | Stock trends prediction by hypergraph modeling |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |