CN114139719A - Multi-source artificial heat space-time quantization method based on machine learning - Google Patents

Multi-source artificial heat space-time quantization method based on machine learning

Info

Publication number
CN114139719A
Authority
CN
China
Prior art keywords
heat
ahf
county
data
monthly
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111354918.2A
Other languages
Chinese (zh)
Inventor
孟庆岩
钱江康
张琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Research Institute Institute Of Aerospace Information Chinese Academy Of Sciences
Aerospace Information Research Institute of CAS
Original Assignee
Hainan Research Institute Institute Of Aerospace Information Chinese Academy Of Sciences
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Research Institute Institute Of Aerospace Information Chinese Academy Of Sciences and Aerospace Information Research Institute of CAS
Priority to CN202111354918.2A
Publication of CN114139719A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Artificial (anthropogenic) heat has a significant impact on urban climate and air quality, but no accurate and efficient method currently exists for estimating multi-source artificial heat. The invention improves the artificial heat modeling workflow and provides a multi-source artificial heat space-time quantization method based on machine learning. The method comprises the following steps: step 1) calculating the county-level annual average Artificial Heat Flux (AHF) from energy consumption and socioeconomic data; step 2) temporally downscaling the artificial heat of each source using proxy data to obtain the county-level monthly AHF; step 3) calculating monthly county-level averages of multi-source data related to artificial heat as explanatory variables, and combining them with the corresponding AHF to form training samples; step 4) training models with two machine learning algorithms, the gradient boosting regression tree and Cubist, performing error analysis, and selecting the optimal algorithm for modeling each heat source; and step 5) inputting the raster data for a given area and time into the optimal model to calculate the multi-source artificial heat flux of that area at that time.

Description

Multi-source artificial heat space-time quantization method based on machine learning
Technical Field
The invention relates to a multisource artificial heat space-time quantification method based on machine learning.
Background
Anthropogenic heat emission has a significant impact on urban climate and air quality, and is also an important input for climate modeling. Accurate artificial heat data can serve as a surface boundary condition for regional or global climate simulations and allow the influence of human activities on the urban environment to be evaluated reasonably; it is an important basis for addressing problems such as climate warming, the heat island effect and air pollution. Artificial Heat Flux (AHF) is the artificial heat emitted per unit time per unit area and is the main target of artificial heat estimation. Traditional AHF estimation methods include the energy balance equation method, building energy modeling and the energy consumption inventory method. Among them, the energy consumption inventory method is a relatively general and reliable way to estimate AHF. Furthermore, to simplify the AHF calculation and reduce repetitive work, variables strongly correlated with heat emission, such as nighttime lights and air pollutants, have been combined with machine learning algorithms to build more practical empirical models.
Although the methods proposed so far can meet general research and application requirements, simple models such as single-variable simple linear regression cannot cope well with the complex spatio-temporal variation of multi-source artificial heat. In addition, in previous studies AHF samples were usually estimated for provinces or cities (Chen et al, 2012; Chen et al, 2020; Sailor et al, 2015). The administrative area of a province or city is generally large and its proportion of built-up area is low, so high artificial heat emissions can correspond to a low AHF, which complicates the analysis of the estimation results. For the estimation model, although the explanatory variables are processed to the same spatial scale, averaging over large spatial units reduces the already scarce medium- and high-value AHF samples; meanwhile, estimation models based on the annual average AHF tend to make samples similar to one another and lose much temporal information. In summary, current studies do not adequately consider the diversity and variability of training samples. On the other hand, machine learning-based AHF modeling is affected by the choice of algorithm: different artificial heat sources may have different optimal modeling algorithms because of differences in their spatio-temporal characteristics, and this factor has not yet been considered or put into practice. To address these problems of existing artificial heat quantification methods, the invention constructs samples with greater spatio-temporal diversity and uses two more complex machine learning algorithms to build an optimal multi-source AHF estimation model.
The gradient boosting regression tree (GBDT) is a classical ensemble learning algorithm that combines weak learners into a strong learner and is widely used for data analysis and prediction in many fields. The Cubist algorithm was developed from the model tree (Quinlan, 1992; Quinlan, 1993; Quinlan, 1996) and is a rule-based algorithm whose leaf nodes are multiple linear regression models rather than single values. Like GBDT, Cubist can also perform ensemble learning, and it performs well in traffic flow prediction, near-surface air temperature estimation, land cover estimation, leaf area index estimation and artificial heat estimation.
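To make the idea of combining weak learners into a strong learner concrete, the following is a minimal sketch of gradient boosting for squared-error regression, using one-split regression stumps on a single feature. It is an illustration only, not the gbm implementation used later in this document; the function names (`fit_stump`, `fit_gbdt`, `predict_gbdt`) and the default settings are assumptions made for the example.

```python
def fit_stump(x, residuals):
    """Find the single split on a 1-D feature that minimizes squared error.

    Returns (threshold, left_mean, right_mean). Requires at least two
    distinct values in x.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbdt(x, y, n_trees=50, learning_rate=0.1):
    """Fit stumps sequentially, each one to the residuals of the current ensemble."""
    base = sum(y) / len(y)                      # initial prediction: the mean
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Shrink each weak learner's contribution by the learning rate.
        pred = [pi + learning_rate * (left_mean if xi <= threshold else right_mean)
                for xi, pi in zip(x, pred)]
    return base, stumps, learning_rate


def predict_gbdt(model, xi):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(
        left_mean if xi <= threshold else right_mean
        for threshold, left_mean, right_mean in stumps)
```

Each stump alone is a poor predictor; the ensemble improves because every new stump is fitted to what the previous ones still get wrong. A Cubist-style model differs in that each rule ends in a linear regression rather than a constant.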
Disclosure of Invention
To address the lack of an accurate and efficient method for estimating multi-source artificial heat, the invention provides a multi-source artificial heat space-time quantization method based on machine learning, which is mainly implemented through the following steps:
step 1) calculating the county-level annual average AHF with a top-down energy consumption inventory method based on energy consumption data and socioeconomic data;
step 2) temporally downscaling the artificial heat of each source using proxy data to obtain the county-level monthly AHF;
step 3) preprocessing a multi-source data set related to artificial heat, calculating monthly county-level averages as explanatory variables, and combining them with the corresponding AHF to form training samples;
step 4) training models with two machine learning algorithms, the gradient boosting regression tree and Cubist, performing error analysis, and selecting the optimal algorithm for modeling each heat source, while using a simple linear regression model based on nighttime lights as a baseline for measuring the accuracy improvement;
step 5) inputting the raster data for a given area and time into the optimal model to calculate the artificial heat flux of that area at that time, and outputting the result as a raster.
Drawings
FIG. 1 is a technical flow diagram;
FIG. 2 shows the error results of the models;
FIG. 3 shows the spatio-temporal distribution of the multi-source AHF output by the model.
Detailed Description
The multi-source artificial heat space-time quantization method based on machine learning of the invention is further explained below with reference to the accompanying drawings.
(I) Estimation of monthly AHF at county level
Artificial heat comes from four sources: industry, buildings, traffic and human metabolism. The annual average AHF is estimated successively at the provincial, city and county levels using the energy inventory method. When the city-level AHF is downscaled to the county level, industrial heat is allocated according to the county's share of the city's industrial POIs (points of interest), traffic heat and building heat are allocated according to the county's share of the population, and metabolic heat is estimated directly from the county population. The monthly AHF is calculated from the temporal variation of proxy data, which is a common temporal downscaling rule in the top-down energy inventory method. As there is no heating in the study area, monthly building heat and industrial heat are estimated from monthly power consumption and monthly traffic heat from monthly freight volume; metabolic heat is held constant and takes part in model training together with building heat as a single source. The specific calculation is as follows:
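The city-to-county allocation rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary keys and the per-person metabolic power passed in by the caller are all hypothetical, and the source does not give a numerical value for metabolic heat per person.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downscale_city_to_county(city_heat_j, counties, metabolic_w_per_person):
    """Split city-level annual heat (J) among counties using proxy shares.

    city_heat_j: {"industry": J, "traffic": J, "building": J} for one city.
    counties: list of {"name", "industrial_poi", "population"} dictionaries.
    metabolic_w_per_person: assumed average metabolic power per person (W);
        a caller-supplied placeholder, not a value taken from the source.
    """
    total_poi = sum(c["industrial_poi"] for c in counties)
    total_pop = sum(c["population"] for c in counties)
    result = {}
    for c in counties:
        poi_share = c["industrial_poi"] / total_poi   # industrial heat proxy
        pop_share = c["population"] / total_pop       # traffic and building proxy
        result[c["name"]] = {
            "industry": city_heat_j["industry"] * poi_share,
            "traffic": city_heat_j["traffic"] * pop_share,
            "building": city_heat_j["building"] * pop_share,
            # Metabolic heat is estimated directly from the county population.
            "metabolism": c["population"] * metabolic_w_per_person * SECONDS_PER_YEAR,
        }
    return result
```

Because the shares of each proxy sum to one across the counties of a city, the industrial, traffic and building heat allocated to the counties add back up to the city totals.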
[Equation image: Figure BDA0003357152590000031]
where [symbol image: Figure BDA0003357152590000032] is the AHF of heat source type S in month m, with i denoting the prefecture-level city and j the county; [symbol image: Figure BDA0003357152590000033] is the emission fraction (%) of the corresponding heat source in month m; [symbol image: Figure BDA0003357152590000034] are the annual heat emissions (J) of industry, traffic and buildings, respectively; A_j is the area of county j (m²); and T_y is the length of the year (s). The calculation can be programmed in Python or R, or carried out directly in Excel.
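Since the equation itself survives only as an image reference, the sketch below shows one dimensionally consistent way to turn the listed quantities into a monthly flux. It assumes that the heat emitted in a month is the annual emission multiplied by that month's fraction, and that each month lasts one twelfth of the year; both points, and the function name, are assumptions for illustration rather than the patent's exact formula.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # T_y for a non-leap year, in seconds


def monthly_ahf(annual_heat_j, monthly_fraction_pct, area_m2,
                seconds_per_year=SECONDS_PER_YEAR):
    """Monthly artificial heat flux (W/m^2) for one heat source in one county.

    annual_heat_j: annual heat emission of the source in the county (J).
    monthly_fraction_pct: share of the annual emission released in the month (%).
    area_m2: county area A_j (m^2).
    Assumption: each month lasts seconds_per_year / 12.
    """
    month_heat_j = annual_heat_j * monthly_fraction_pct / 100.0
    month_seconds = seconds_per_year / 12.0
    return month_heat_j / (area_m2 * month_seconds)
```

Under these assumptions, a source whose emissions are spread evenly over the year (each month holding 100/12 % of the total) has a monthly AHF equal to its annual average AHF, which is a useful sanity check on the units.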
(II) Construction of training samples
For roads, railways and industrial POIs, a density raster is built with a search radius of 1000 meters and a distance raster is calculated at the same time; the Point Density tool and the Euclidean Distance tool in ArcGIS can be used for these two calculations, respectively. Here, roads and railways are variables specific to traffic heat, industrial POIs and railways are variables specific to industrial heat, and the building area ratio is a variable specific to building and metabolic heat.
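For readers without ArcGIS, the two rasters can be approximated with a brute-force calculation over cell centers. The sketch below is a simplified stand-in, not the ArcGIS implementation: the function name, the bottom-left grid origin and the choice of points per square kilometer as the density unit are assumptions, and it expects projected coordinates in meters and at least one input point.

```python
import math


def density_and_distance(points, x0, y0, cell, ncols, nrows, radius=1000.0):
    """Return (density, distance) grids as nested lists indexed [row][col].

    points: list of (x, y) feature coordinates in meters.
    x0, y0: coordinates of the grid's lower-left corner; row 0 is the bottom row.
    cell: cell size in meters.
    density: points within `radius` of the cell center, per km^2 of search circle.
    distance: Euclidean distance (m) from the cell center to the nearest point.
    """
    circle_km2 = math.pi * (radius / 1000.0) ** 2
    density, distance = [], []
    for r in range(nrows):
        density_row, distance_row = [], []
        for c in range(ncols):
            cx = x0 + (c + 0.5) * cell
            cy = y0 + (r + 0.5) * cell
            dists = [math.hypot(px - cx, py - cy) for px, py in points]
            density_row.append(sum(d <= radius for d in dists) / circle_km2)
            distance_row.append(min(dists))
        density.append(density_row)
        distance.append(distance_row)
    return density, distance
```

The loop costs one distance evaluation per cell per point, which is acceptable for a county-scale illustration; a production workflow would use a spatial index or the GIS tools named in the text.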
The common variables are used in estimating all three heat sources and are processed as follows: remote sensing data (land surface temperature, NDVI, etc.), meteorological data (air temperature, humidity, wind speed) and topographic data (DEM and slope) undergo data screening, monthly compositing, clipping, reprojection and resampling in Google Earth Engine (GEE). Finally, zonal statistics are computed on the rasters in ArcGIS and a table of explanatory variables is output; two categorical variables, the region and the month to which each sample belongs, are added, and together with the monthly county-level AHF these form the training samples.
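The zonal statistics step amounts to averaging the raster cells that fall inside each county. A minimal sketch, assuming the county boundaries have already been rasterized onto the same grid as the variable and that missing cells are stored as `None`; the function and argument names are hypothetical.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Mean of raster cell values per zone.

    values: 2-D nested list of numbers, with None for missing cells.
    zones: 2-D nested list of county identifiers on the same grid,
           with None for cells outside every county.
    Returns {zone_id: mean of the non-missing cells in that zone}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None or zone is None:
                continue  # skip missing data and cells outside all counties
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


def build_sample(county, region, month, variable_means, ahf):
    """Assemble one training sample: explanatory variables plus the target AHF."""
    sample = {"county": county, "region": region, "month": month, "ahf": ahf}
    sample.update(variable_means)
    return sample
```

Calling `zonal_mean` once per variable and per month, then `build_sample` once per county and month, yields one row of the training table for each county-month combination.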
(III) Training and evaluation of models
The training samples are imported into R; 80% of the samples are randomly selected for training and testing, and the remaining samples are used for model validation. GBDT and Cubist both have a number of tunable hyper-parameters. The caret, gbm and Cubist packages are called in R, and GBDT and Cubist models are trained separately for each AHF source. Parameters are tuned using 10-fold cross-validation repeated 10 times, and the optimal model parameters are selected by minimizing the training/testing error, yielding the multi-source artificial heat space-time quantification models (GBDT and Cubist). Finally, the algorithm (GBDT or Cubist) used to model each heat source is determined from the validation errors. A simple linear regression (SLR) model based on nighttime lights serves as a baseline for testing the accuracy improvement of the two more complex algorithms. All of the above can be completed in R.
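The evaluation protocol can be sketched independently of the modeling packages. The code below illustrates the 80/20 split, the repeated 10-fold resampling (roughly what caret's `trainControl(method = "repeatedcv", number = 10, repeats = 10)` sets up), the nighttime-light SLR baseline, and the per-source choice of algorithm by validation error. All function names are hypothetical, and RMSE is assumed as the error measure because the source does not name one.

```python
import math
import random


def train_validation_split(n, train_frac=0.8, seed=0):
    """Randomly split sample indices 0..n-1 into a training/testing part and a validation part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_frac))
    return indices[:cut], indices[cut:]


def repeated_kfold(indices, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        for fold in range(k):
            test = shuffled[fold::k]          # every k-th sample forms one fold
            held_out = set(test)
            train = [i for i in shuffled if i not in held_out]
            yield train, test


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_slr(x, y):
    """Least-squares simple linear regression; returns (slope, intercept)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return slope, mean_y - slope * mean_x


def select_algorithm(validation_errors):
    """Pick the lowest-error algorithm per heat source.

    validation_errors: {heat_source: {algorithm_name: validation error}}.
    """
    return {source: min(errors, key=errors.get)
            for source, errors in validation_errors.items()}
```

In use, the hyper-parameters of each algorithm would be tuned on the folds from `repeated_kfold`, the tuned models scored on the held-out validation indices with `rmse`, and the resulting errors passed to `select_algorithm`, with the SLR error reported alongside as the baseline.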
(IV) Output of the results
The raster package is called in R to convert the explanatory variables from raster form into a data frame, and a simple classification and regression tree (CART) model is built to fill in missing values of the variables. The trained model is then called to make predictions, a raster is constructed to receive the predictions, and the spatio-temporal distribution of multi-source artificial heat is finally output in raster form.
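The raster-to-table-to-raster round trip can be sketched as below. Note one deliberate substitution: the text fills missing values with a classification and regression tree, whereas this sketch uses the per-variable median to stay short. The function names, the `None` convention for missing cells and the `model` callable are assumptions for illustration.

```python
from statistics import median


def grid_to_rows(layers):
    """Flatten {variable_name: 2-D grid} into one dictionary per cell, row by row."""
    names = list(layers)
    nrows = len(layers[names[0]])
    ncols = len(layers[names[0]][0])
    rows = [{name: layers[name][r][c] for name in names}
            for r in range(nrows) for c in range(ncols)]
    return rows, nrows, ncols


def fill_missing_with_median(rows):
    """Replace None values in place with the median of the variable.

    Stand-in for the CART-based imputation described in the text.
    """
    for name in rows[0]:
        present = [row[name] for row in rows if row[name] is not None]
        fill_value = median(present)
        for row in rows:
            if row[name] is None:
                row[name] = fill_value
    return rows


def predict_grid(layers, model):
    """Apply `model` (a callable taking one row dictionary) to every cell and rebuild the grid."""
    rows, nrows, ncols = grid_to_rows(layers)
    fill_missing_with_median(rows)
    predictions = [model(row) for row in rows]
    return [predictions[r * ncols:(r + 1) * ncols] for r in range(nrows)]
```

Running `predict_grid` once per heat source and month, with the algorithm selected for that source, gives the stack of rasters that makes up the spatio-temporal AHF distribution.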

Claims (6)

1. A multi-source artificial heat space-time quantification method based on machine learning, mainly implemented through the following technical steps:
step 1) calculating the county-level annual average Artificial Heat Flux (AHF) with a top-down energy consumption inventory method based on energy consumption data and socioeconomic data;
step 2) temporally downscaling the artificial heat of each source using proxy data to obtain the county-level monthly AHF;
step 3) preprocessing a multi-source data set related to artificial heat, calculating monthly county-level averages as explanatory variables, and combining them with the corresponding AHF to form training samples;
step 4) training models with two machine learning algorithms, the gradient boosting regression tree and Cubist, performing error analysis, and selecting the optimal algorithm for modeling each heat source, while using a simple linear regression model based on nighttime lights as a baseline for measuring the accuracy improvement;
step 5) inputting the raster data for a given area and time into the optimal model to calculate the artificial heat flux of that area at that time, and outputting the result as a raster.
2. The method of claim 1, wherein in step 1): artificial heat comes from four sources: industry, buildings, traffic and human metabolism. The annual average AHF is estimated successively at the provincial, city and county levels using the energy inventory method. When the city-level AHF is downscaled to the county level, industrial heat is allocated according to the county's share of the city's industrial POIs (points of interest), traffic heat and building heat are allocated according to the county's share of the population, and metabolic heat is estimated directly from the county population.
3. The method of claim 1, wherein in step 2): the monthly AHF is calculated from the temporal variation of proxy data, which is a common temporal downscaling rule in the top-down energy inventory method. As there is no heating in the study area, monthly building heat and industrial heat are estimated from monthly power consumption and monthly traffic heat from monthly freight volume; metabolic heat is held constant and takes part in model training together with building heat as a single source. The specific calculation is as follows:
[Equation image: Figure FDA0003357152580000011]
where [symbol image: Figure FDA0003357152580000012] is the AHF of heat source type S in month m, with i denoting the prefecture-level city and j the county; [symbol image: Figure FDA0003357152580000013] is the emission fraction (%) of the corresponding heat source in month m; [symbol image: Figure FDA0003357152580000014] are the annual heat emissions (J) of industry, traffic and buildings, respectively; A_j is the area of county j (m²); and T_y is the length of the year (s). The calculation can be programmed in Python or R, or carried out directly in Excel.
4. The method of claim 1, wherein in step 3): for roads, railways and industrial POIs, a density raster is built with a search radius of 1000 meters and a distance raster is calculated at the same time; the Point Density tool and the Euclidean Distance tool in ArcGIS can be used for these two calculations, respectively. Here, roads and railways are variables specific to traffic heat, industrial POIs and railways are variables specific to industrial heat, and the building area ratio is a variable specific to building and metabolic heat. The common variables are used in estimating all three heat sources and are processed as follows: remote sensing data (land surface temperature, NDVI, etc.), meteorological data (air temperature, humidity, wind speed) and topographic data (DEM and slope) undergo data screening, monthly compositing, clipping, reprojection and resampling in Google Earth Engine (GEE). Finally, zonal statistics are computed on the rasters in ArcGIS and a table of explanatory variables is output; two categorical variables, the region and the month to which each sample belongs, are added, and together with the monthly county-level AHF these form the training samples.
5. The method of claim 1, wherein in step 4): the training samples are imported into R; 80% of the samples are randomly selected for training and testing, and the remaining samples are used for model validation. GBDT and Cubist both have a number of tunable hyper-parameters. The caret, gbm and Cubist packages are called in R, and GBDT and Cubist models are trained separately for each AHF source. Parameters are tuned using 10-fold cross-validation repeated 10 times, and the optimal model parameters are selected by minimizing the training/testing error, yielding the multi-source artificial heat space-time quantification models (GBDT and Cubist). Finally, the algorithm (GBDT or Cubist) used to model each heat source is determined from the validation errors. A simple linear regression (SLR) model based on nighttime lights serves as a baseline for testing the accuracy improvement of the two more complex algorithms. All of the above can be completed in R.
6. The method of claim 1, wherein in step 5): the raster package is called in R to convert the explanatory variables from raster form into a data frame, and a simple classification and regression tree (CART) model is built to fill in missing values of the variables. The trained model is then called to make predictions, a raster is constructed to receive the predictions, and the spatio-temporal distribution of multi-source artificial heat is finally output in raster form.
CN202111354918.2A 2021-11-16 2021-11-16 Multi-source artificial heat space-time quantization method based on machine learning Pending CN114139719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111354918.2A CN114139719A (en) 2021-11-16 2021-11-16 Multi-source artificial heat space-time quantization method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111354918.2A CN114139719A (en) 2021-11-16 2021-11-16 Multi-source artificial heat space-time quantization method based on machine learning

Publications (1)

Publication Number Publication Date
CN114139719A (en) 2022-03-04

Family

ID=80393380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111354918.2A Pending CN114139719A (en) 2021-11-16 2021-11-16 Multi-source artificial heat space-time quantization method based on machine learning

Country Status (1)

Country Link
CN (1) CN114139719A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115204691A (en) * 2022-07-13 2022-10-18 中国科学院地理科学与资源研究所 Urban artificial heat emission estimation method based on machine learning and remote sensing technology
CN115204691B (en) * 2022-07-13 2023-02-03 中国科学院地理科学与资源研究所 Urban artificial heat emission estimation method based on machine learning and remote sensing technology
CN117313451A (en) * 2023-09-01 2023-12-29 长安大学 Crop canopy structure parameter inversion method based on E-INFORM model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination