CN112070316A - Short-term load prediction method and system based on catboost algorithm and ensemble learning - Google Patents

Publication number
CN112070316A
CN112070316A CN202010978760.5A CN202010978760A CN112070316A CN 112070316 A CN112070316 A CN 112070316A CN 202010978760 A CN202010978760 A CN 202010978760A CN 112070316 A CN112070316 A CN 112070316A
Authority
CN
China
Prior art keywords
feature set
prediction
target value
short
training target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010978760.5A
Other languages
Chinese (zh)
Inventor
王浩磊
宋佶聪
何金辉
李哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN202010978760.5A
Publication of CN112070316A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of big data, and in particular to a short-term load prediction method and system based on the catboost algorithm and ensemble learning, which perform short-term load prediction from data features and improve both the calculation accuracy and the stability of the output. The method constructs a feature set according to a set prediction time step and acquires the feature set together with its corresponding training target value; it then constructs a prediction model from the feature set, the training target value and the catboost algorithm; finally, it obtains a prediction result from the set prediction time interval, the prediction time step, the feature set and the prediction model. The invention is suitable for load prediction.

Description

Short-term load prediction method and system based on catboost algorithm and ensemble learning
Technical Field
The invention relates to the field of big data, in particular to a short-term load prediction method and system based on a catboost algorithm and ensemble learning.
Background
With the development of the industrial energy internet, the Internet of Things and communication technology help enterprises collect, process and digitally present their energy-use data. On the one hand, more and more enterprises are paying attention to energy efficiency and treat energy conservation and efficiency improvement as an important strategic priority; on the other hand, enterprises need big data technology to support decisions in both internal management and energy market trading. Together these trends are driving a new round of technical innovation and application in the industrial internet.
Data-driven short-term load prediction for an enterprise provides an anchor for its energy-use management and helps it draw up an energy efficiency management plan. However, current enterprise short-term load prediction methods suffer from small data throughput, low efficiency and low accuracy. In recent years, machine learning methods built on ensemble learning have performed strongly in a wide range of data prediction competitions. The catboost algorithm, a gradient boosting algorithm released in 2017 by a leading Russian technology company, adds dedicated handling of categorical features and a feature combination module and also supports GPU execution, which makes it an excellent classifier and regressor.
In the prior art, however, when this algorithm is used for regression analysis on large data volumes, the model's random combination of features makes the output unstable and limits the calculation accuracy, and the prior art cannot use data features to perform short-term load prediction.
Disclosure of Invention
The invention aims to provide a short-term load prediction method and a short-term load prediction system based on a catboost algorithm and ensemble learning, which can realize short-term load prediction by using data characteristics and can improve the calculation precision and the stability of an output result.
To achieve this purpose, the invention adopts the following technical scheme. The short-term load prediction method based on the catboost algorithm and ensemble learning comprises the following steps:
step (1), constructing a feature set according to a set prediction time step, and acquiring the feature set and a training target value corresponding to the feature set;
step (2), constructing a prediction model according to the feature set, the training target value and the catboost algorithm;
step (3), obtaining a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
Further, in step (1), the constructing the feature set according to the set predicted time step includes:
A. determining a data source to be extracted and a time sequence length according to the set prediction time step;
B. extracting data in the time sequence length;
C. processing and calculating the data within the time sequence length to construct the feature set.
Further, in step B, the data in the time-series length includes: historical energy usage data, weather data, production data, and economic data.
Further, in step C, the feature set includes: time series features, time window sliding features, and statistical features.
Further, in the step (2), the constructing a prediction model according to the feature set, the training target value and the catboost algorithm includes:
21. taking the obtained feature set and training target value as the original feature set and training target value;
22. tuning the catboost parameters on the original feature set and training target value by cross validation, selecting from a candidate parameter range, to form a catboost meta-model;
23. splitting the original feature set into K folds according to the Stacking layer number K, handing the folds to K meta-models for training, predicting the values for the test set of each fold, and splicing the K folds of predictions into a predicted value for every sample, which serves as the Stacking feature;
24. fusing the original feature set, the training target value and the Stacking feature, and sending the fused feature set to a Bagging layer;
25. generating M meta-models according to the Bagging layer number M, training each meta-model on all of the fused features, and averaging their output predictions; the Stacking layer number K and the Bagging layer number M are selected dynamically through cross validation, and the model is evaluated with RMSE (root-mean-square error) and MAPE (mean absolute percentage error) to form the final prediction model.
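The cross-validated parameter selection of step 22 and the RMSE/MAPE evaluation of step 25 can be illustrated with a minimal sketch. It is not the patented implementation: `ScaledMean` is a hypothetical stand-in learner used so the example runs with the Python standard library alone, the contiguous fold layout is an assumption, and the names `select_params`, `kfold`, `rmse` and `mape` are introduced here for illustration only.

```python
from itertools import product
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (targets must be non-zero)."""
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def kfold(n, k):
    """Yield (train_idx, test_idx) for k contiguous, non-overlapping folds."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    for lo, hi in zip(bounds, bounds[1:]):
        test = list(range(lo, hi))
        train = [i for i in range(n) if i < lo or i >= hi]
        yield train, test


def select_params(make_model, grid, X, y, k=3):
    """Return (best_rmse, best_params) over every combination in `grid`."""
    names = list(grid)
    best = None
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        fold_scores = []
        for train, test in kfold(len(X), k):
            model = make_model(**params)
            model.fit([X[i] for i in train], [y[i] for i in train])
            preds = model.predict([X[i] for i in test])
            fold_scores.append(rmse([y[i] for i in test], preds))
        score = mean(fold_scores)          # average RMSE across the folds
        if best is None or score < best[0]:
            best = (score, params)
    return best


class ScaledMean:
    """Hypothetical stand-in learner: predicts scale * mean(training target)."""

    def __init__(self, scale):
        self.scale = scale

    def fit(self, X, y):
        self.value = self.scale * mean(y)
        return self

    def predict(self, X):
        return [self.value for _ in X]
```

In practice `make_model` would presumably wrap a catboost regressor and `grid` would hold its candidate parameters; any object with `fit(X, y)` and `predict(X)` fits the same loop.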
Further, constructing the prediction model according to the feature set, the training target value and the catboost algorithm further comprises: 26. constructing S prediction models according to the prediction time step S, setting a time interval, and dynamically updating the models at that interval.
The short-term load prediction system based on the catboost algorithm and ensemble learning comprises:
the data processing module is used for constructing a feature set according to the set prediction time step, acquiring the feature set and a training target value corresponding to the feature set, sending the feature set to the load prediction module, and sending the feature set and the training target value corresponding to the feature set to the model construction module;
the model building module is used for building a prediction model according to the feature set, the training target value and the catboost algorithm;
and the load prediction module is used for outputting a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
The method extracts data within the time sequence length according to the set prediction time step, constructs a feature set from that data, and acquires the feature set together with its corresponding training target value. It then constructs a prediction model from the feature set, the training target value, the catboost algorithm and ensemble learning, and obtains a prediction result from the set prediction time interval, the prediction time step, the feature set and the prediction model, so that short-term load prediction is achieved using data features.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flowchart of the catboost-based ensemble learning algorithm.
FIG. 3 is a graph showing the comparison of the effects of the embodiment of the present invention.
Detailed Description
As shown in the flowchart of FIG. 1, the short-term load prediction method based on the catboost algorithm and ensemble learning comprises the following steps:
step 101: constructing a feature set according to the set prediction time step;
step 102: acquiring the feature set and the training target value corresponding to the feature set;
step 103: constructing a prediction model according to the feature set, the training target value and the catboost algorithm;
step 104: obtaining a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
In step 101, the specific implementation steps of constructing the feature set according to the set prediction time step include:
A. determining a data source to be extracted and a time sequence length according to the set prediction time step;
B. extracting data in the time sequence length;
C. processing and calculating the data within the time sequence length to construct the feature set.
In step B, the data within the time sequence length includes: historical energy usage data, weather data, production data and economic data.
In step C, the feature set includes: time series features, time window sliding features, and statistical features.
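As an illustration of steps A to C, the sketch below builds lag values (time series features) and sliding-window statistics from a load history and pairs each row with the training target that lies `step` periods ahead. The specific lags, the window length and the feature names are assumptions made for the example; the source does not list them, and weather, production and economic data are omitted for brevity.

```python
from statistics import mean, pstdev


def build_feature_set(load, step, lags=(1, 2, 7), window=7):
    """Return (features, targets) for forecasting `step` periods ahead.

    load   : historical load values, oldest first
    step   : prediction time step (distance from last known value to target)
    lags   : look-back offsets used as time series features (assumed choice)
    window : sliding-window length for the statistical features (assumed)
    """
    features, targets = [], []
    first = max(max(lags), window) - 1          # earliest index with full history
    for t in range(first, len(load) - step):    # t = index of the last known value
        recent = load[t - window + 1 : t + 1]   # sliding window ending at t
        row = {f"lag_{k}": load[t - k + 1] for k in lags}   # lag_1 is load[t]
        row["win_mean"] = mean(recent)
        row["win_std"] = pstdev(recent)
        row["win_min"] = min(recent)
        row["win_max"] = max(recent)
        features.append(row)
        targets.append(load[t + step])          # training target value
    return features, targets
```

A longer prediction time step shortens the usable history by the same amount, which is one reason the data source and time sequence length are fixed from the time step first (step A).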
In step 103, the specific implementation steps of constructing a prediction model according to the feature set, the training target value and the catboost algorithm include:
21. taking the obtained feature set and training target value as the original feature set and training target value;
22. tuning the catboost parameters on the original feature set and training target value by cross validation, selecting from a candidate parameter range, to form a catboost meta-model;
23. splitting the original feature set into K folds according to the Stacking layer number K, handing the folds to K meta-models for training, predicting the values for the test set of each fold, and splicing the K folds of predictions into a predicted value for every sample, which serves as the Stacking feature;
24. fusing the original feature set, the training target value and the Stacking feature, and then sending the fused feature set to a Bagging layer;
25. generating M meta-models according to the Bagging layer number M, training each meta-model on all of the fused features, and averaging their output predictions; the Stacking layer number K and the Bagging layer number M are selected dynamically through cross validation, and the model is evaluated with RMSE and MAPE to form the final prediction model.
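Steps 23 to 25 can be sketched as follows, under stated assumptions. `MeanModel` is a hypothetical stand-in for the tuned catboost meta-model so that the example needs only the standard library. The interleaved fold assignment is an assumption. Bootstrap resampling in the Bagging layer is also an assumption: the source only says that M meta-models are trained on all fused features and averaged, without saying how they are made to differ.

```python
import random
from statistics import mean


class MeanModel:
    """Hypothetical stand-in for a tuned meta-model: predicts the training mean."""

    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value for _ in X]


def stacking_feature(X, y, k, make_model=MeanModel):
    """Step 23: out-of-fold predictions spliced into one value per sample."""
    n = len(X)
    oof = [None] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]     # held-out fold
        train = [i for i in range(n) if i % k != fold]    # remaining folds
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(test, model.predict([X[i] for i in test])):
            oof[i] = pred                                 # sample i was never seen
    return oof


def fuse(X, stack):
    """Step 24: append the Stacking feature as an extra column."""
    return [list(row) + [s] for row, s in zip(X, stack)]


def bagging_fit(X, y, m, make_model=MeanModel, seed=0):
    """Step 25: train M models; bootstrap resampling is an assumption here."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]        # sample with replacement
        models.append(make_model().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, X):
    """Average the M models' outputs sample by sample."""
    return [mean(col) for col in zip(*(m.predict(X) for m in models))]
```

Because every Stacking value comes from a model that did not train on that sample, the extra column does not simply echo the training target. For time-ordered load data, contiguous or forward-chained folds may be a safer choice than the interleaved folds used here.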
The specific implementation steps for constructing the prediction model according to the feature set, the training target value and the catboost algorithm further comprise: 26. constructing S prediction models according to the prediction time step S, setting a time interval, and dynamically updating the models at that interval.
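One plausible reading of step 26 is direct multi-step forecasting, with one model per horizon step s = 1..S, each trained on targets shifted s periods ahead. The sketch below follows that reading as an assumption. `DriftModel` is a hypothetical stand-in learner and `n_lags` is an illustrative parameter; neither comes from the source.

```python
from statistics import mean


class DriftModel:
    """Hypothetical stand-in learner: last known value plus the mean change."""

    def fit(self, X, y):
        self.drift = mean(target - row[-1] for row, target in zip(X, y))
        return self

    def predict(self, X):
        return [row[-1] + self.drift for row in X]


def make_dataset(load, step, n_lags):
    """Rows of the last `n_lags` values, paired with the value `step` ahead."""
    X, y = [], []
    for t in range(n_lags - 1, len(load) - step):   # t = index of last known value
        X.append(load[t - n_lags + 1 : t + 1])
        y.append(load[t + step])
    return X, y


def fit_horizon_models(load, S, n_lags=3, make_model=DriftModel):
    """Train one model per prediction time step s = 1..S."""
    return {s: make_model().fit(*make_dataset(load, s, n_lags))
            for s in range(1, S + 1)}


def forecast(models, load, n_lags=3):
    """Predict the next S values from the most recent `n_lags` observations."""
    latest = [load[-n_lags:]]
    return [models[s].predict(latest)[0] for s in sorted(models)]
```

Dynamic updating then amounts to calling `fit_horizon_models` again on the extended history each time the set interval elapses.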
In outline, the method (1) extracts the features and training target values required by the prediction model; (2) constructs a meta-model with catboost on the constructed feature set and tunes its parameters by cross validation; and (3) performs ensemble learning over the meta-models in a Stacking + Bagging arrangement.
The Stacking + Bagging ensemble learning of the meta-models is implemented as follows. The obtained feature set first passes through the Stacking layer, which divides the training data into NS non-overlapping cross-validation folds according to the designated number of layers and trains NS sub-models; the predictions made on the held-out test portion of each fold form the Stacking feature vector. The Stacking features are then fused with the full feature set and passed to the Bagging layer, which builds NB models on the basis of the meta-model and takes the average of their outputs as the final predicted value. The whole model is trained in this way to obtain the final model; the numbers of Stacking and Bagging layers are determined through cross validation, and the model is tuned using RMSE and MAPE as evaluation indexes.
The invention relates to a short-term load prediction system based on a catboost algorithm and ensemble learning, which comprises the following components:
the data processing module is used for constructing a feature set according to the set prediction time step, acquiring the feature set and a training target value corresponding to the feature set, sending the feature set to the load prediction module, and sending the feature set and the training target value corresponding to the feature set to the model construction module;
the model building module is used for building a prediction model according to the feature set, the training target value and the catboost algorithm;
and the load prediction module is used for outputting a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
The catboost algorithm is a boosting method whose central idea is to build a strong learner by training multiple weak learners; its properties allow it to make full use of multidimensional features in training and so become a high-performing meta-learner. A Stacking learning layer is then introduced, which stacks several meta-learners and, following the idea of cross validation, transforms the original data into additional features; this expands the feature dimensions of the data set and improves the performance of the overall model. Finally, a Bagging learning layer is introduced, which trains several models in parallel to spread out the randomness of any single model's output; this reduces the variance of the overall output and improves the model's generalization ability.
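The variance argument for the Bagging layer can be checked with a small simulation. It rests on a strong simplifying assumption that is not taken from the source: each model's output is the true value plus independent Gaussian noise. Real bagged models are correlated, so the reduction seen in practice would be smaller than the roughly 1/M factor shown here.

```python
import random
from statistics import mean, pvariance


def output_variance(m, trials=2000, noise=1.0, seed=0):
    """Variance of the averaged output of `m` independently noisy models."""
    rng = random.Random(seed)
    true_load = 100.0                      # arbitrary illustrative value
    averaged = []
    for _ in range(trials):
        preds = [true_load + rng.gauss(0.0, noise) for _ in range(m)]
        averaged.append(mean(preds))       # Bagging: average the m outputs
    return pvariance(averaged)
```

Comparing `output_variance(1)` with `output_variance(16)` shows the spread of the averaged output shrinking as more models are combined, which is the stability effect attributed to the Bagging layer.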
Fig. 2 shows a flowchart of an ensemble learning algorithm based on a catboost, which includes:
step 201, obtaining an original feature set and a training target value;
step 202, obtaining a catboost meta-model according to the original feature set and the training target value;
step 203, obtaining the Stacking feature according to the catboost meta-model;
step 204, fusing the original feature set, the training target value and the Stacking feature, and sending the fused feature set to the Bagging layer;
step 205, obtaining the final prediction model through the Bagging layer.
In an embodiment of the invention, data from an Internet of Things platform is taken as an example: a training set is used for training, and the daily load of the following week is predicted. LightGBM, CatBoost, LSTM (3 layers × 128 cells) and the algorithm of this technical scheme (8 Stacking layers and 16 Bagging layers, denoted 8S16B) are each used to predict unseen test data. The result is shown in FIG. 3, where the abscissa is time and the ordinate is percentage error. The scheme achieves lower error through Stacking, and the Bagging layer makes the output more stable.
In summary, the present invention can realize short-term load prediction by using data characteristics, and can improve the calculation accuracy and the stability of the output result.

Claims (7)

1. The short-term load prediction method based on the catboost algorithm and ensemble learning is characterized by comprising the following steps:
step (1), constructing a feature set according to a set prediction time step, and acquiring the feature set and a training target value corresponding to the feature set;
step (2), constructing a prediction model according to the feature set, the training target value and the catboost algorithm;
step (3), obtaining a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
2. The short-term load prediction method based on the catboost algorithm and ensemble learning of claim 1, wherein in step (1), the constructing feature set according to the set prediction time step comprises:
A. determining a data source to be extracted and a time sequence length according to the set prediction time step;
B. extracting data in the time sequence length;
C. processing and calculating the data within the time sequence length to construct the feature set.
3. The method for short-term load prediction based on the catboost algorithm and ensemble learning of claim 2, wherein in step B, the data in the time series length comprises: historical energy usage data, weather data, production data, and economic data.
4. The short-term load prediction method based on the catboost algorithm and ensemble learning of claim 2, wherein in step C, the feature set comprises: time series features, time window sliding features, and statistical features.
5. The short-term load prediction method based on the catboost algorithm and ensemble learning of claim 2, wherein in the step (2), the constructing the prediction model according to the feature set, the training target value and the catboost algorithm comprises:
21. taking the obtained feature set and training target value as the original feature set and training target value;
22. tuning the catboost parameters on the original feature set and training target value by cross validation, selecting from a candidate parameter range, to form a catboost meta-model;
23. splitting the original feature set into K folds according to the Stacking layer number K, handing the folds to K meta-models for training, predicting the values for the test set of each fold, and splicing the K folds of predictions into a predicted value for every sample, which serves as the Stacking feature;
24. fusing the original feature set, the training target value and the Stacking feature, and then sending the fused feature set to a Bagging layer;
25. generating M meta-models according to the Bagging layer number M, training each meta-model on all of the fused features, and averaging their output predictions; the Stacking layer number K and the Bagging layer number M are selected dynamically through cross validation, and the model is evaluated with RMSE and MAPE to form the final prediction model.
6. The short-term load prediction method based on the catboost algorithm and ensemble learning of claim 5, further comprising: 26. constructing S prediction models according to the prediction time step S, setting a time interval, and dynamically updating the models at that interval.
7. A short-term load prediction system based on the catboost algorithm and ensemble learning, characterized by comprising:
the data processing module is used for constructing a feature set according to the set prediction time step, acquiring the feature set and a training target value corresponding to the feature set, sending the feature set to the load prediction module, and sending the feature set and the training target value corresponding to the feature set to the model construction module;
the model building module is used for building a prediction model according to the feature set, the training target value and the catboost algorithm;
and the load prediction module is used for outputting a prediction result according to the set prediction time interval, the prediction time step, the feature set and the prediction model.
CN202010978760.5A 2020-09-17 2020-09-17 Short-term load prediction method and system based on catboost algorithm and ensemble learning Pending CN112070316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010978760.5A CN112070316A (en) 2020-09-17 2020-09-17 Short-term load prediction method and system based on catboost algorithm and ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010978760.5A CN112070316A (en) 2020-09-17 2020-09-17 Short-term load prediction method and system based on catboost algorithm and ensemble learning

Publications (1)

Publication Number Publication Date
CN112070316A (en) 2020-12-11

Family

ID=73681677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010978760.5A Pending CN112070316A (en) 2020-09-17 2020-09-17 Short-term load prediction method and system based on catboost algorithm and ensemble learning

Country Status (1)

Country Link
CN (1) CN112070316A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112785056A (en) * 2021-01-22 2021-05-11 杭州市电力设计院有限公司 Short-term load prediction method based on fusion of Catboost and LSTM models
CN114519920A (en) * 2022-01-10 2022-05-20 广西大学 Intelligent early warning method, system and equipment for hard rock collapse based on microseism multi-precursor characteristics

Similar Documents

Publication Publication Date Title
CN110366734B (en) Optimizing neural network architecture
Kankal et al. Modeling and forecasting of Turkey’s energy consumption using socio-economic and demographic variables
CN102945507B (en) Based on distributing wind energy turbine set Optimizing Site Selection method and the device of Fuzzy Level Analytic Approach
Pandey et al. A decision tree algorithm pertaining to the student performance analysis and prediction
Nyangarika et al. Energy stability and decarbonization in developing countries: Random Forest approach for forecasting of crude oil trade flows and macro indicators
CN112070316A (en) Short-term load prediction method and system based on catboost algorithm and ensemble learning
CN106062786A (en) Computing system for training neural networks
Xie et al. A new multi-criteria decision model based on incomplete dual probabilistic linguistic preference relations
Cordova et al. Combined electricity and traffic short-term load forecasting using bundled causality engine
CN112488055B (en) Video question-answering method based on progressive graph attention network
Jiang et al. Day‐ahead renewable scenario forecasts based on generative adversarial networks
US20230222325A1 (en) Binary neural network model training method and system, and image processing method and system
CN111400592A (en) Personalized course recommendation method and system based on eye movement technology and deep learning
CN114443899A (en) Video classification method, device, equipment and medium
CN107239850A (en) A kind of long-medium term power load forecasting method based on system dynamics model
Zhao et al. Spatiotemporal semantic network for ENSO forecasting over long time horizon
Lee et al. Predicting the performance of solar power generation using deep learning methods
CN113743083A (en) Test question difficulty prediction method and system based on deep semantic representation
CN117076931A (en) Time sequence data prediction method and system based on conditional diffusion model
Srivastav et al. Simulation-optimization framework for multi-site multi-season hybrid stochastic streamflow modeling
CN106650972A (en) Recommendation system scoring prediction method based on cloud model facing social network
Bowden Forecasting water resources variables using artificial neural networks
CN111369046A (en) Wind-solar complementary power prediction method based on grey neural network
Balraj et al. A DNN based LSTM model for predicting future energy consumption
Li et al. Prediction of High-Speed Railway Passenger Traffic Volume Based on Integrated Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201211