CN112070316A - Short-term load prediction method and system based on catboost algorithm and ensemble learning - Google Patents
- Publication number
- CN112070316A
- Authority
- CN
- China
- Prior art keywords
- feature set
- prediction
- target value
- short
- training target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Entrepreneurship & Innovation (AREA)
- Theoretical Computer Science (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the field of big data, and in particular to a short-term load prediction method and system based on the catboost algorithm and ensemble learning, which perform short-term load prediction from data features and improve both the calculation accuracy and the stability of the output result. The method constructs a feature set according to a set prediction time step and acquires the feature set together with its corresponding training target values; it then constructs a prediction model from the feature set, the training target values and the catboost algorithm; finally, it obtains a prediction result from the set prediction time interval, the prediction time step, the feature set and the prediction model. The invention is suitable for load prediction.
Description
Technical Field
The invention relates to the field of big data, in particular to a short-term load prediction method and system based on a catboost algorithm and ensemble learning.
Background
With the development of the industrial energy internet, the Internet of Things and communication technologies help enterprises collect, process and digitally present their energy-use data. On the one hand, more and more enterprises pay attention to energy efficiency and treat energy conservation and efficiency improvement as an important strategic goal; on the other hand, enterprises need big data technology to support decisions in both internal management and energy market trading. Together these trends are driving a new round of technical innovation and application in the industrial internet.
Data-driven short-term load prediction gives an enterprise a reference point (an anchor) for energy-use management and helps it draw up an energy efficiency management scheme. However, the short-term load prediction methods that enterprises use at present handle only small volumes of data and have low efficiency and accuracy. In recent years, machine learning methods based on the idea of ensemble learning have performed strongly in many data prediction competitions. The catboost algorithm, a gradient boosting algorithm released in 2017 by a leading Russian technology company, adds dedicated handling of categorical features and a feature combination module and also supports GPU execution, which makes it an excellent classifier and regressor.
In the prior art, when this algorithm is applied to regression analysis of large data volumes, the model's random combination of features makes the output result unstable and the calculation accuracy low, and the prior art cannot perform short-term load prediction from data features.
Disclosure of Invention
The invention aims to provide a short-term load prediction method and a short-term load prediction system based on a catboost algorithm and ensemble learning, which can realize short-term load prediction by using data characteristics and can improve the calculation precision and the stability of an output result.
To achieve this aim, the invention adopts the following technical scheme. The short-term load prediction method based on the catboost algorithm and ensemble learning comprises the following steps:
step (1), constructing a feature set according to a set prediction time step, and acquiring the feature set and a training target value corresponding to the feature set;
step (2), constructing a prediction model according to the feature set, the training target value and the catboost algorithm;
step (3), obtaining a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
Further, in step (1), the constructing the feature set according to the set predicted time step includes:
A. determining a data source to be extracted and a time sequence length according to the set prediction time step;
B. extracting data in the time sequence length;
C. processing and computing the data within the time sequence length to construct the feature set.
Further, in step B, the data in the time-series length includes: historical energy usage data, weather data, production data, and economic data.
Further, in step C, the feature set includes: time series features, time window sliding features, and statistical features.
Further, in the step (2), the constructing a prediction model according to the feature set, the training target value and the catboost algorithm includes:
21. taking the obtained feature set and the training target value as an original feature set and a training target value;
22. tuning the catboost parameters on the original feature set and the training target value by cross validation, selecting values within a candidate parameter range, to form a catboost meta-model;
23. splitting the original feature set into K folds according to the number of Stacking layers K, handing the folds to K meta-models for training, predicting the values for the test set of each fold, and splicing the predictions of the K folds into a predicted value for every sample, which serves as the Stacking feature;
24. fusing the original feature set, the training target value and the Stacking feature, and sending the fused feature set to the Bagging layer;
25. generating M meta-models according to the number of Bagging layers M, training each meta-model on all of the fused features, and averaging their output predictions; the number of Stacking layers K and the number of Bagging layers M are selected dynamically by cross validation, and the model is evaluated with RMSE (root-mean-square error) and MAPE (mean absolute percentage error) to form the final prediction model.
Further, the constructing the prediction model according to the feature set, the training target value and the catboost algorithm further comprises: 26. constructing S prediction models according to the prediction time step S, setting a time interval, and dynamically updating the models at that interval.
The short-term load prediction system based on the catboost algorithm and ensemble learning comprises:
the data processing module is used for constructing a feature set according to the set prediction time step, acquiring the feature set and a training target value corresponding to the feature set, sending the feature set to the load prediction module, and sending the feature set and the training target value corresponding to the feature set to the model construction module;
the model building module is used for building a prediction model according to the feature set, the training target value and the catboost algorithm;
and the load prediction module is used for outputting a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
The method extracts the data within the time sequence length according to the set prediction time step, constructs a feature set from those data, and acquires the feature set and its corresponding training target value. It then constructs a prediction model from the feature set, the training target value, the catboost algorithm and ensemble learning, and obtains a prediction result from the set prediction time interval, the prediction time step, the feature set and the prediction model, so that short-term load prediction is realized by using the data features.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flowchart of the ensemble learning algorithm based on catboost.
FIG. 3 is a graph showing the comparison of the effects of the embodiment of the present invention.
Detailed Description
As shown in the flow chart of FIG. 1, the short-term load prediction method based on the catboost algorithm and ensemble learning comprises the following steps:
step 101: constructing a feature set according to the set prediction time step;
step 102: acquiring a feature set and a training target value corresponding to the feature set;
step 103: constructing a prediction model according to the feature set, the training target value and the catboost algorithm;
step 104: obtaining a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
In step 101, the specific implementation steps of constructing the feature set according to the set prediction time step include:
A. determining a data source to be extracted and a time sequence length according to the set prediction time step;
B. extracting data in the time sequence length;
C. processing and computing the data within the time sequence length to construct the feature set.
In step B, the data in the time series length includes: historical energy usage data, weather data, production data and economic data.
In step C, the feature set comprises: time series features, time window sliding features, and statistical features.
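The feature construction of steps A to C can be illustrated with a minimal sketch. It is not the patented implementation: the function name `build_feature_rows`, the lag offsets, the window length and the weekday feature are all assumptions chosen for illustration, and only the historical load series is used (weather, production and economic data would be joined in the same way as extra columns).

```python
from datetime import date, timedelta
from statistics import mean


def build_feature_rows(loads, first_day, horizon, lags=(0, 1, 6), window=7):
    """Return (rows, targets) for predicting the load `horizon` days ahead.

    loads     -- daily load values; loads[i] is the load on first_day + i days
    horizon   -- prediction time step in days
    lags      -- assumed look-back offsets for the time-series features
    window    -- assumed sliding-window length for the statistical features
    """
    rows, targets = [], []
    start = max(max(lags), window - 1)  # first index with a complete history
    for t in range(start, len(loads) - horizon):
        recent = loads[t - window + 1 : t + 1]  # sliding window ending at day t
        target_day = first_day + timedelta(days=t + horizon)
        row = {f"lag_{k}": loads[t - k] for k in lags}  # time-series features
        row.update(
            win_mean=mean(recent),  # statistical features over the window
            win_min=min(recent),
            win_max=max(recent),
            target_weekday=target_day.weekday(),  # calendar feature, 0 = Monday
        )
        rows.append(row)
        targets.append(loads[t + horizon])  # training target value
    return rows, targets


# Ten days of made-up loads starting on 2020-09-01, predicting one day ahead.
loads = [10, 12, 11, 13, 12, 14, 15, 16, 17, 18]
rows, targets = build_feature_rows(
    loads, date(2020, 9, 1), horizon=1, lags=(0, 1), window=3
)
```

Each row pairs the features known on day t with the load observed `horizon` days later, so a larger prediction time step simply shifts the target further from the window.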
In step 103, the specific implementation steps of constructing a prediction model according to the feature set, the training target value and the catboost algorithm include:
21. taking the obtained feature set and the training target value as an original feature set and a training target value;
22. tuning the catboost parameters on the original feature set and the training target value by cross validation, selecting values within a candidate parameter range, to form a catboost meta-model;
23. splitting the original feature set into K folds according to the number of Stacking layers K, handing the folds to K meta-models for training, predicting the values for the test set of each fold, and splicing the predictions of the K folds into a predicted value for every sample, which serves as the Stacking feature;
24. fusing the original feature set, the training target value and the Stacking feature, and then sending the fused feature set to the Bagging layer;
25. generating M meta-models according to the number of Bagging layers M, training each meta-model on all of the fused features, and averaging their output predictions; the number of Stacking layers K and the number of Bagging layers M are selected dynamically by cross validation, and the model is evaluated with RMSE and MAPE to form the final prediction model.
The specific implementation steps for constructing the prediction model according to the feature set, the training target value and the catboost algorithm further comprise: 26. constructing S prediction models according to the prediction time step S, setting a time interval, and dynamically updating the models at that interval.
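Step 26 can be read as a direct multi-step strategy: one model per step ahead, retrained at a fixed interval. The sketch below follows that reading, which is an assumption, as are the names `train_direct_models`, `forecast` and `needs_refresh`. `MeanOffsetModel` is a deliberately trivial stand-in learner, not catboost; any object with `fit(X, y)` and `predict(X)` methods could be supplied through `make_model`.

```python
from statistics import mean


class MeanOffsetModel:
    """Stand-in learner (not catboost): last observed value plus the mean offset."""

    def fit(self, X, y):
        self.offset = mean(target - row[0] for row, target in zip(X, y))

    def predict(self, X):
        return [row[0] + self.offset for row in X]


def train_direct_models(series, S, make_model=MeanOffsetModel):
    """Train one model per prediction step s = 1..S (S models in total)."""
    models = {}
    for s in range(1, S + 1):
        X = [[series[t]] for t in range(len(series) - s)]    # feature: load at t
        y = [series[t + s] for t in range(len(series) - s)]  # target: load at t + s
        model = make_model()
        model.fit(X, y)
        models[s] = model
    return models


def forecast(models, last_value):
    """Predict steps 1..S ahead from the most recent observation."""
    return [models[s].predict([[last_value]])[0] for s in sorted(models)]


def needs_refresh(days_since_fit, update_interval_days):
    """Hypothetical update rule: retrain once the set interval has elapsed."""
    return days_since_fit >= update_interval_days


series = [1, 2, 3, 4, 5, 6]
models = train_direct_models(series, S=3)
predictions = forecast(models, last_value=series[-1])
```

Training a separate model per step avoids feeding predictions back in as inputs, at the cost of fitting S models each time the update interval expires.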
The method comprises the following steps: (1) extracting the features and training target values required by the prediction model; (2) constructing a meta-model with catboost on the constructed feature set, and tuning its parameters by cross validation; (3) performing ensemble learning over the meta-models in a Stacking + Bagging manner.
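Parameter tuning by cross validation, as in step (2), can be sketched as a search over a candidate list. The sketch makes several assumptions: the fold layout (interleaved folds), the scoring by RMSE, and the names `cv_rmse` and `select_params` are illustrative, and `KNNRegressor` is a stand-in learner, not catboost. For load series, time-ordered splits may be preferable to interleaved folds.

```python
import math


class KNNRegressor:
    """Stand-in learner (not catboost): k-nearest neighbours on the first feature."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)

    def predict(self, X):
        preds = []
        for query in X:
            order = sorted(
                range(len(self.X)), key=lambda i: abs(self.X[i][0] - query[0])
            )
            nearest = order[: self.k]
            preds.append(sum(self.y[i] for i in nearest) / len(nearest))
        return preds


def cv_rmse(X, y, params, make_model, n_folds=4):
    """Average RMSE of make_model(**params) over n_folds interleaved folds."""
    fold_scores = []
    for f in range(n_folds):
        test = [i for i in range(len(X)) if i % n_folds == f]
        train = [i for i in range(len(X)) if i % n_folds != f]
        model = make_model(**params)
        model.fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        mse = sum((y[i] - p) ** 2 for i, p in zip(test, preds)) / len(test)
        fold_scores.append(math.sqrt(mse))
    return sum(fold_scores) / n_folds


def select_params(X, y, candidates, make_model):
    """Return the candidate with the lowest cross-validated RMSE, plus all scores."""
    scores = [cv_rmse(X, y, params, make_model) for params in candidates]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


X = [[float(i)] for i in range(20)]
y = [2.0 * i for i in range(20)]
candidates = [{"k": 1}, {"k": 3}, {"k": 9}]
best_params, scores = select_params(X, y, candidates, KNNRegressor)
```

With a gradient boosting library installed, the same loop would apply to its regressor class and a candidate list of its own parameters; the stand-in is used here only to keep the sketch self-contained.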
Ensemble learning of the meta-model in the Stacking + Bagging manner is implemented as follows. The obtained feature set first passes through the Stacking layer, which divides the training data into NS non-overlapping cross-validation sets according to the specified number of layers and trains NS sub-models; the test set of each cross-validation set is used as prediction data to obtain the feature vector of the Stacking layer. The Stacking-layer features are fused with the full feature set and passed to the Bagging layer, which constructs NB models based on the meta-model and takes the average of the NB models' outputs as the final predicted value. The whole model is then trained to obtain the final model; the numbers of Stacking and Bagging layers are determined by cross validation, and the model is tuned with RMSE and MAPE as evaluation indexes.
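The Stacking layer described above amounts to computing out-of-fold predictions and appending them to the feature set. A minimal sketch under stated assumptions: the folds are contiguous, `MeanModel` is a trivial stand-in learner rather than catboost, and the names `kfold_indices`, `stacking_feature` and `fuse` are illustrative.

```python
class MeanModel:
    """Stand-in learner (not catboost): always predicts the training mean."""

    def fit(self, X, y):
        self.value = sum(y) / len(y)

    def predict(self, X):
        return [self.value for _ in X]


def kfold_indices(n, k):
    """Split range(n) into k non-overlapping contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def stacking_feature(X, y, k, make_model=MeanModel):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    oof = [None] * len(X)
    for test_idx in kfold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = make_model()
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            oof[i] = p  # splice the fold's predictions back into sample order
    return oof


def fuse(X, oof):
    """Append the Stacking feature as one extra column of the feature set."""
    return [list(row) + [p] for row, p in zip(X, oof)]


X = [[i] for i in range(6)]
y = [0, 1, 2, 3, 4, 5]
oof = stacking_feature(X, y, k=3)
fused = fuse(X, oof)
```

Because every Stacking value comes from a model that did not train on that sample, the new column does not leak the sample's own target. How the Stacking feature is produced for unseen samples at prediction time is not specified in the text; averaging the K fold models' predictions is one common choice.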
The invention relates to a short-term load prediction system based on a catboost algorithm and ensemble learning, which comprises the following components:
the data processing module is used for constructing a feature set according to the set prediction time step, acquiring the feature set and a training target value corresponding to the feature set, sending the feature set to the load prediction module, and sending the feature set and the training target value corresponding to the feature set to the model construction module;
the model building module is used for building a prediction model according to the feature set, the training target value and the catboost algorithm;
and the load prediction module is used for outputting a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
The catboost algorithm builds a strong learner by training multiple weak learners, and its properties allow it to make full use of multidimensional features during training, so that it serves as a strong meta-learner. A Stacking learning layer is then introduced: several meta-learners are stacked and, following the idea of cross validation, the original data undergo a feature transformation that expands the feature dimension of the data set, improving the performance of the overall model. Finally, a Bagging learning layer is introduced: training several models in parallel spreads out the randomness of any single model's output and reduces the variance of the overall output, improving the generalization ability of the model.
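The averaging performed by the Bagging layer can be sketched as follows. The text does not state whether the Bagging layer resamples rows; this sketch assumes that every model sees all of the fused data and that the models differ only by random seed. `SeededModel` is a stand-in stochastic learner, not catboost, and `bagging_predict` is an illustrative name.

```python
import random
from statistics import mean


class SeededModel:
    """Stand-in stochastic learner (not catboost): mean of a random half of the targets."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def fit(self, X, y):
        sample = self.rng.sample(list(y), k=max(1, len(y) // 2))
        self.value = mean(sample)

    def predict(self, X):
        return [self.value for _ in X]


def bagging_predict(X_train, y_train, X_new, m, make_model=SeededModel):
    """Train m differently seeded models on all the data and average their outputs."""
    models = []
    for seed in range(m):
        model = make_model(seed)
        model.fit(X_train, y_train)
        models.append(model)
    per_model = [model.predict(X_new) for model in models]
    # zip(*per_model) groups the m predictions made for each new sample
    return [mean(column) for column in zip(*per_model)]


X_train = [[i] for i in range(10)]
y_train = list(range(10))
averaged = bagging_predict(X_train, y_train, [[0], [1]], m=16)
```

Each seeded model on its own gives a noisy estimate; averaging M of them leaves the expected output unchanged while shrinking its variance, which is the stabilizing effect the paragraph above attributes to the Bagging layer.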
FIG. 2 shows a flowchart of the ensemble learning algorithm based on catboost, which includes:
step 205: obtaining the final prediction model through the Bagging layer.
In an embodiment of the invention, data from an Internet of Things platform are taken as an example. A training set is used to train models that predict the daily load of the coming week, and unseen test data are predicted with LightGBM, CatBoost, an LSTM (3 layers × 128 cells) and the algorithm of this technical scheme with 8 Stacking layers and 16 Bagging layers (8S16B). The results are shown in FIG. 3, where the abscissa is time and the ordinate is the percentage error. The scheme achieves a lower error thanks to the Stacking layer, and the Bagging layer keeps the output result more stable.
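The evaluation indexes named in the method, RMSE and MAPE, and a per-point percentage error can be computed as in the following sketch. The function names and the sample numbers are illustrative only and are not results from the embodiment; taking the signed error relative to the actual load is an assumption about how the percentage error is defined.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error, in the same unit as the load."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def percentage_errors(actual, predicted):
    """Signed percentage error of each prediction."""
    return [100.0 * (p - a) / a for a, p in zip(actual, predicted)]


# Made-up values for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
error_rmse = rmse(actual, predicted)
error_mape = mape(actual, predicted)
errors = percentage_errors(actual, predicted)
```

RMSE penalizes large absolute misses, while MAPE is scale-free, so the two together give a fuller picture when loads vary widely in magnitude.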
In summary, the present invention can realize short-term load prediction by using data characteristics, and can improve the calculation accuracy and the stability of the output result.
Claims (7)
1. The short-term load prediction method based on the catboost algorithm and ensemble learning is characterized by comprising the following steps:
step (1), constructing a feature set according to a set prediction time step, and acquiring the feature set and a training target value corresponding to the feature set;
step (2), constructing a prediction model according to the feature set, the training target value and the catboost algorithm;
step (3), obtaining a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
2. The short-term load prediction method based on the catboost algorithm and ensemble learning of claim 1, wherein in step (1), the constructing feature set according to the set prediction time step comprises:
A. determining a data source to be extracted and a time sequence length according to the set prediction time step;
B. extracting data in the time sequence length;
C. processing and computing the data within the time sequence length to construct the feature set.
3. The method for short-term load prediction based on the catboost algorithm and ensemble learning of claim 2, wherein in step B, the data in the time series length comprises: historical energy usage data, weather data, production data, and economic data.
4. The short-term load prediction method based on the catboost algorithm and ensemble learning of claim 2, wherein in step C, the feature set comprises: time series features, time window sliding features, and statistical features.
5. The short-term load prediction method based on the catboost algorithm and ensemble learning of claim 2, wherein in the step (2), the constructing the prediction model according to the feature set, the training target value and the catboost algorithm comprises:
21. taking the obtained feature set and the training target value as an original feature set and a training target value;
22. tuning the catboost parameters on the original feature set and the training target value by cross validation, selecting values within a candidate parameter range, to form a catboost meta-model;
23. splitting the original feature set into K folds according to the number of Stacking layers K, handing the folds to K meta-models for training, predicting the values for the test set of each fold, and splicing the predictions of the K folds into a predicted value for every sample, which serves as the Stacking feature;
24. fusing the original feature set, the training target value and the Stacking feature, and then sending the fused feature set to the Bagging layer;
25. generating M meta-models according to the number of Bagging layers M, training each meta-model on all of the fused features, and averaging their output predictions; the number of Stacking layers K and the number of Bagging layers M are selected dynamically by cross validation, and the model is evaluated with RMSE and MAPE to form the final prediction model.
6. The short-term load prediction method based on the catboost algorithm and ensemble learning of claim 5, further comprising: 26. constructing S prediction models according to the prediction time step S, setting a time interval, and dynamically updating the models at that interval.
7. A short-term load prediction system based on the catboost algorithm and ensemble learning, characterized by comprising:
the data processing module is used for constructing a feature set according to the set prediction time step, acquiring the feature set and a training target value corresponding to the feature set, sending the feature set to the load prediction module, and sending the feature set and the training target value corresponding to the feature set to the model construction module;
the model building module is used for building a prediction model according to the feature set, the training target value and the catboost algorithm;
and the load prediction module is used for outputting a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010978760.5A CN112070316A (en) | 2020-09-17 | 2020-09-17 | Short-term load prediction method and system based on catboost algorithm and ensemble learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010978760.5A CN112070316A (en) | 2020-09-17 | 2020-09-17 | Short-term load prediction method and system based on catboost algorithm and ensemble learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112070316A (en) | 2020-12-11 |
Family
ID=73681677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010978760.5A Pending CN112070316A (en) | 2020-09-17 | 2020-09-17 | Short-term load prediction method and system based on catboost algorithm and ensemble learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112070316A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112785056A (en) * | 2021-01-22 | 2021-05-11 | 杭州市电力设计院有限公司 | Short-term load prediction method based on fusion of Catboost and LSTM models |
CN114519920A (en) * | 2022-01-10 | 2022-05-20 | 广西大学 | Intelligent early warning method, system and equipment for hard rock collapse based on microseism multi-precursor characteristics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110366734B (en) | Optimizing neural network architecture | |
Kankal et al. | Modeling and forecasting of Turkey’s energy consumption using socio-economic and demographic variables | |
CN102945507B (en) | Based on distributing wind energy turbine set Optimizing Site Selection method and the device of Fuzzy Level Analytic Approach | |
Pandey et al. | A decision tree algorithm pertaining to the student performance analysis and prediction | |
Nyangarika et al. | Energy stability and decarbonization in developing countries: Random Forest approach for forecasting of crude oil trade flows and macro indicators | |
CN112070316A (en) | Short-term load prediction method and system based on catboost algorithm and ensemble learning | |
CN106062786A (en) | Computing system for training neural networks | |
Xie et al. | A new multi-criteria decision model based on incomplete dual probabilistic linguistic preference relations | |
Cordova et al. | Combined electricity and traffic short-term load forecasting using bundled causality engine | |
CN112488055B (en) | Video question-answering method based on progressive graph attention network | |
Jiang et al. | Day‐ahead renewable scenario forecasts based on generative adversarial networks | |
US20230222325A1 (en) | Binary neural network model training method and system, and image processing method and system | |
CN111400592A (en) | Personalized course recommendation method and system based on eye movement technology and deep learning | |
CN114443899A (en) | Video classification method, device, equipment and medium | |
CN107239850A (en) | A kind of long-medium term power load forecasting method based on system dynamics model | |
Zhao et al. | Spatiotemporal semantic network for ENSO forecasting over long time horizon | |
Lee et al. | Predicting the performance of solar power generation using deep learning methods | |
CN113743083A (en) | Test question difficulty prediction method and system based on deep semantic representation | |
CN117076931A (en) | Time sequence data prediction method and system based on conditional diffusion model | |
Srivastav et al. | Simulation-optimization framework for multi-site multi-season hybrid stochastic streamflow modeling | |
CN106650972A (en) | Recommendation system scoring prediction method based on cloud model facing social network | |
Bowden | Forecasting water resources variables using artificial neural networks | |
CN111369046A (en) | Wind-solar complementary power prediction method based on grey neural network | |
Balraj et al. | A DNN based LSTM model for predicting future energy consumption | |
Li et al. | Prediction of High-Speed Railway Passenger Traffic Volume Based on Integrated Method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20201211 |