CN115907033A

CN115907033A - Method and system for predicting hourly power consumption based on machine learning algorithm

Info

Publication number: CN115907033A
Application number: CN202211534465.6A
Authority: CN
Inventors: 夏勇军; 罗宾; 郭志刚; 陈莉娟; 施志勇; 徐文; 赵立华
Original assignee: Hubei Central China Technology Development Of Electric Power Co ltd; State Grid Hubei Electric Power Co Ltd
Current assignee: Hubei Central China Technology Development Of Electric Power Co ltd; State Grid Hubei Electric Power Co Ltd
Priority date: 2022-12-02
Filing date: 2022-12-02
Publication date: 2023-04-04

Abstract

The invention provides a method and a system for predicting electricity consumption per hour based on a machine learning algorithm, wherein the method comprises the following steps: acquiring power consumption data and weather data, wherein the power consumption data comprises time, user numbers and power consumption, and the weather data comprises local recording time, temperature, wind power and humidity; cleaning the acquired electric quantity data and weather data; modeling the cleaned data to obtain a VAR model; and predicting the electricity consumption at the future moment according to the established VAR model. The VAR model established by the method is an improved version based on the ARIMA model, the prediction precision is improved, the required data sample size is small based on the characteristics of the algorithm, and the method has obvious advantages in the required sample size compared with a machine learning algorithm with supervised learning. Under the specific scene that the number of samples is insufficient, the prediction effect is more prominent by using the algorithm model compared with other machine learning models.

Description

Method and system for predicting hourly power consumption based on machine learning algorithm

Technical Field

The invention relates to the field of machine learning algorithms, in particular to a method and a system for predicting hourly power consumption based on a machine learning algorithm.

Background

Today, smart grids are considered to have great potential for providing fault-free, continuous power supplies. To achieve this goal, much research has focused on using historical hourly power usage to predict hourly power usage. The technology is beneficial to monitoring the demand of power consumption, reasonably distributing power, maintaining the stability of a power consumption network and ensuring reliable power supply and power consumption safety. Otherwise, overestimating or underestimating the power usage of the grid can present challenges to the grid.

Overestimation of the power load prediction may result in unnecessary power reserve, uneven energy distribution, and increased operating costs. However, underestimation of the power load causes problems of reliability and safety of power utilization. In summary, accurate power prediction is very important for smart grids, which can improve stability, safety and reliability of power systems.

At present, methods for predicting electric power generally fall into two types: statistical methods, such as the autoregressive integrated moving average (ARIMA) model, and machine learning methods, such as the recurrent neural network (RNN). Both types have obvious disadvantages: the former were proposed long ago and their prediction effect is not ideal, while the latter, being supervised learning algorithms, need a large number of training samples.

Disclosure of Invention

The invention provides a method and a system for predicting hourly power consumption based on a machine learning algorithm. The method improves prediction precision and, owing to the characteristics of the algorithm, requires only a small data sample, which is an obvious advantage over supervised machine learning algorithms. In the specific scenario where the number of samples is insufficient, the prediction effect of this model is more prominent than that of other machine learning models.

A method for predicting electricity consumption per hour based on a machine learning algorithm comprises the following steps:

step one, acquiring power consumption data and weather data, wherein the power consumption data comprises time, user numbers and power consumption, and the weather data comprises local recording time, temperature, wind power and humidity;

step two, cleaning the power consumption data and the weather data obtained in step one;

step three, modeling the cleaned data, wherein the modeling process is as follows in sequence:

performing test 1 of the independent and dependent variables, namely a Granger causality test, to judge whether a causal relationship exists between the power consumption as the dependent variable and the air temperature, wind power and humidity as the independent variables, and if a causal relationship exists, executing the following steps:

performing test 2 of the independent and dependent variables, namely a cointegration test, to judge whether the time series is stationary; when the cointegration test fails, differencing the data and repeating the cointegration test on the differenced data, and if the test passes, executing the next step;

determining the lag order of the model according to the Akaike information criterion (AIC);

testing the dependent variable of the model, namely, when the power consumption time series is judged to have no autocorrelation of the random errors, the test is passed and the next step is executed;

building the VAR model: the VAR model comprises a multiple linear regression model and a time series model; a first mapping is formed between the current air temperature, wind power and humidity, taken as X, and the power consumption at the lag order, taken as Y, and a second mapping is formed between the current power consumption, taken as Y1, and the power consumption at the lag order, taken as Y2, wherein the first mapping is the multiple linear regression model and the second mapping is the ARIMA time series model;

and step four, predicting the electricity consumption at the future moment according to the VAR model established in the step three.

Further, in step one, the power consumption data are obtained from a company database by SQL query, and the weather data are obtained by scraping a weather website with a crawler program.

Further, the step of cleaning the obtained data includes:

merging data: the power consumption data queried from the database and the weather data obtained by the crawler are joined on the time dimension, so that each record contains the dimensions time, user number, power consumption, air temperature, wind power and humidity;

deleting abnormal values: deleting samples with negative power consumption;

deleting vacancy values: deleting samples whose power consumption is vacant;

dimension correlation analysis: obtaining the correlation between the independent variables (air temperature, wind power and humidity) and the dependent variable (power consumption) by means of a scatter plot and a heat map.

Further, the residuals are judged by the Durbin-Watson test. When the test value is close to 2, there is no autocorrelation of the random errors and the time series passes the Durbin-Watson test. When the test value is close to 0 or 4, positive or negative serial correlation of the random errors exists, and the time series does not pass the Durbin-Watson test.

Further, step four, predicting the power consumption at a future moment according to the VAR model established in step three, specifically comprises: inputting a prediction set comprising power consumption, air temperature, wind power and humidity into the VAR model, which outputs the power consumption at the 24th future moment.

A system for predicting electricity usage per hour based on machine learning algorithms, comprising:

the data acquisition module is used for acquiring power consumption data and weather data, wherein the power consumption data comprises time, user numbers and power consumption, and the weather data comprises local recording time, temperature, wind power and humidity;

the data cleaning module is used for cleaning the power consumption data and the weather data acquired by the data acquisition module;

the VAR model establishing module is used for modeling the cleaned data, and the modeling process sequentially comprises the following steps:

building the VAR model: the VAR model comprises a multiple linear regression model and a time series model; a first mapping is formed between the current air temperature, wind power and humidity, taken as X, and the power consumption at the lag order, taken as Y, and a second mapping is formed between the current power consumption, taken as Y1, and the power consumption at the lag order, taken as Y2, wherein the first mapping is the multiple linear regression model and the second mapping is the ARIMA time series model;

and the prediction module is used for predicting the power consumption at the future moment according to the VAR model established by the VAR model establishing module.

Furthermore, the data acquisition module acquires the power consumption data by SQL query, and acquires the weather data by scraping a weather website with a crawler program.

Further, the data cleaning module cleans the power consumption data and the weather data acquired by the data acquisition module, which specifically includes:

merging data: the power consumption data queried from the database and the weather data obtained by the crawler are joined on the time dimension, so that each record contains the dimensions time, user number, power consumption, air temperature, wind power and humidity;

deleting abnormal values: deleting samples with negative power consumption;

dimension correlation analysis: obtaining the correlation between the independent variables (air temperature, wind power and humidity) and the dependent variable (power consumption) by means of a scatter plot and a heat map.

The VAR model established by the invention is an improved version based on the ARIMA model, the prediction precision is improved, the required data sample size is small (usually dozens of data are enough) based on the self algorithm characteristics, and the advantage on the required sample size is obvious compared with the machine learning algorithm with supervised learning. Under the specific scene that the number of samples is insufficient, compared with other machine learning models, the prediction effect is more prominent by using the algorithm model.

Drawings

FIG. 1 is a plot of the predicted hourly power consumption against the actual values, where the prediction uses the hourly power consumption, air temperature, wind speed and humidity of the previous day;

FIG. 2 is a plot of the predicted hourly power consumption against the actual values, where the prediction uses the hourly power consumption, air temperature, wind speed and humidity of the previous two days;

FIG. 3 is a schematic diagram of a prediction method of the present invention;

FIG. 4 is a flow chart of a method of the present invention for predicting electricity usage per hour based on a machine learning algorithm.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

The inventor of the present application found through research that: the vector autoregressive model (VAR model for short) is an improved version based on an ARIMA model, the prediction precision is improved, the required data sample quantity is small, and generally dozens of samples are enough, so that the VAR model has certain application in the field of financial quantification.

As shown in FIG. 3, the present invention uses a vector autoregression (VAR) machine learning model to predict the hourly power consumption from the hourly meteorological data (air temperature, wind power and humidity) of the previous day or the previous two days, combined with the hourly power consumption data of the previous day.

FIG. 4 is a detailed flow chart of the method of predicting electricity usage per hour based on a machine learning algorithm of the present invention, the method comprising the steps of:

step one, acquiring data

The data come from two sources: database queries and crawler fetches. The power consumption data are obtained from a company database by SQL (structured query language) query, with dimensions including time, user number, power consumption and the like; meanwhile, the weather data of a weather website, such as local recording time, air temperature, wind power and humidity, are fetched by a crawler program.
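The database half of step one can be sketched as follows. This is only an illustration under stated assumptions: the table name power_usage, its columns and the in-memory SQLite database are hypothetical stand-ins for the company database, and the crawler half is omitted because the source does not name the weather website.

```python
import sqlite3

# Hypothetical schema: the table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE power_usage (record_time TEXT, user_id TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO power_usage VALUES (?, ?, ?)",
    [
        ("2022-06-01 00:00", "U001", 12.5),
        ("2022-06-01 01:00", "U001", 11.8),
        ("2022-06-01 00:00", "U002", 7.1),
    ],
)


def fetch_usage(connection, user_id):
    """Return (record_time, kwh) rows for one user, ordered by time."""
    cursor = connection.execute(
        "SELECT record_time, kwh FROM power_usage "
        "WHERE user_id = ? ORDER BY record_time",
        (user_id,),
    )
    return cursor.fetchall()


rows = fetch_usage(conn, "U001")
```

The parameterized query (the ? placeholder) keeps the user number out of the SQL string itself, which is the usual way to issue such queries safely.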

Step two, data processing

The acquired data are cleaned as follows:

2.1. merging data: the power consumption data queried from the database and the weather data obtained by the crawler are joined on the time dimension, so that each record contains the dimensions time, user number, power consumption, air temperature, wind power and humidity;

2.2. deleting abnormal values: for example, samples with negative power consumption are deleted;

2.3. deleting vacancy values: samples whose power consumption is vacant are deleted; because the prediction granularity is hourly, when the 24 hourly samples of a day contain vacant values, all 24 records of that day are deleted;

2.4. dimension correlation analysis: the correlation between the independent variables (air temperature, wind power and humidity) and the dependent variable (power consumption) is obtained by means of scatter plots, heat maps and the like.
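The cleaning steps 2.1 to 2.4 can be sketched with pandas as below. The column names and sample values are assumptions made for illustration, and for brevity the sketch drops individual vacant rows rather than whole days; the correlation matrix it computes is the table a heat map would display.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions for illustration.
times = pd.to_datetime([f"2022-06-01 {hour:02d}:00" for hour in range(6)])
usage = pd.DataFrame({
    "time": times,
    "user_id": ["U001"] * 6,
    "kwh": [12.5, -3.0, None, 11.2, 10.4, 13.1],
})
weather = pd.DataFrame({
    "time": times,
    "temperature": [21.0, 20.5, 20.1, 19.8, 19.5, 22.0],
    "wind": [2.0, 3.0, 2.5, 4.0, 1.5, 3.5],
    "humidity": [60.0, 62.0, 65.0, 63.0, 66.0, 58.0],
})


def clean(usage_df, weather_df):
    """Merge on time, then drop negative and vacant power consumption."""
    merged = usage_df.merge(weather_df, on="time", how="inner")  # 2.1 merge
    merged = merged[~(merged["kwh"] < 0)]                        # 2.2 negatives
    merged = merged.dropna(subset=["kwh"])                       # 2.3 vacancies
    return merged.reset_index(drop=True)


cleaned = clean(usage, weather)

# 2.4 pairwise Pearson correlations between the dependent and independent variables
corr = cleaned[["kwh", "temperature", "wind", "humidity"]].corr()
```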

And step three, modeling the data processed in the step two, and finally predicting the power consumption at the future moment according to the established model.

In this embodiment, the power consumption and the meteorological data are time series data, so a time series algorithm is selected for modeling and prediction. Modeling starts after data processing, and the whole modeling process comprises: test 1 of the model's independent and dependent variables (Granger causality test), test 2 of the model's independent and dependent variables (cointegration test), determining the parameter of the model (lag order), testing the model's dependent variable (residual judgment), building the VAR model, and finally outputting and analyzing the predicted power consumption at future moments (see FIGS. 1 and 2).

The modeling process comprises the following specific steps:

(1) Model independent and dependent variable test 1 (Granger causality test)

The Granger causality test is used in the modeling process to analyze whether a variable depends on past values of other variables, and is one of the prerequisites for the VAR calculation. Independent variables that pass the test can be used for prediction with the vector autoregression (VAR) algorithm; independent variables that do not pass the Granger causality test need to be deleted.

Since the grangercausalitytests function is already packaged in the statsmodels library of Python, it can be called directly. After the call, a causal relationship is found between the dependent variable (power consumption) and the independent variables (air temperature, wind power and humidity), so the test is passed.

(2) Model independent and dependent variable test 2 (cointegration test)

The cointegration test is used in the modeling process to determine whether the time series are stationary (the mean shows no rising or falling trend and the variance is constant). It is one of the prerequisites for the VAR calculation and is performed after the Granger causality test. Only time series that pass the cointegration test (stationary time series) can be used for VAR prediction. When the cointegration test fails, the data must be differenced and the test repeated on the differenced data.

The coint function is already packaged in the statsmodels library of Python and can be called directly. After the call, a short-term stationary relationship is found between the dependent variable (power consumption) and each of the independent variables (air temperature, wind power, humidity), so the test is passed.

(3) Determining the parameter of the model (lag order)

The most important parameter in the model is the lag order. Determining the lag order is a step in the modeling process and a prerequisite for the VAR calculation; it is typically determined according to the Akaike information criterion (AIC), where a smaller AIC value is better. A lag order that is too large or too small can seriously affect the efficiency of the VAR parameter estimation.

In practice, the AIC values show that the error is smallest when the lag order is 24 (the previous day).

This result is consistent with expectations: since daily power consumption data is periodic, a lag order of 24 means that the model predicts the 25th time point from the 1st time point, the 26th time point from the 2nd time point, and so on in a rolling fashion.

(4) Test of the model's dependent variable (residual judgment)

Judging the correlation of the residuals is a step in the modeling process and a prerequisite for the VAR calculation; it is usually done with the Durbin-Watson test. The Durbin-Watson test was originally used in econometrics to check for autocorrelation of the random errors of a series. When the test value is close to 2, there is no autocorrelation of the random errors and the time series passes the Durbin-Watson test. When the test value is close to 0 or 4, positive or negative serial correlation of the random errors exists, and the time series does not pass the Durbin-Watson test.

If the power consumption time series fails the test, its random errors are serially correlated and the series is not suitable for building a VAR prediction model. That batch of data then cannot be used; another batch must be obtained (returning to step one) and cleaned.

In this embodiment, the Durbin-Watson test value for the dependent variable (power consumption) is very close to 2, indicating that there is no autocorrelation of the random errors, so the Durbin-Watson test is passed.
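For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below computes it in plain Python; the tolerance used to decide "close to 2" is an assumption, since the text gives no numeric threshold.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e[t] - e[t-1])^2) / sum(e[t]^2)."""
    numerator = sum(
        (current - previous) ** 2
        for previous, current in zip(residuals, residuals[1:])
    )
    denominator = sum(value ** 2 for value in residuals)
    return numerator / denominator


def passes_durbin_watson(residuals, tolerance=0.5):
    """Hypothetical pass rule: the statistic lies within `tolerance` of 2."""
    return abs(durbin_watson(residuals) - 2.0) <= tolerance


# Alternating residuals are negatively correlated: the value moves toward 4.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # 12 / 4 = 3.0

# Constant residuals are positively correlated: the value is 0.
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])  # 0 / 4 = 0.0
```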

(5) Building the VAR model

The VAR algorithm is the calculation and prediction stage of the modeling process and is used to predict the power consumption. Based on the Akaike information criterion (AIC) result of step (3), the VAR model is constructed here with a lag order of 24.

The VAR model is composed of two models (multiple linear regression and time series). Mapping 1 is formed between the current X (air temperature, wind power, humidity) and Y (power consumption) at the 24th future time point (i.e., the lag order), and mapping 2 is formed between the current Y1 (power consumption) and Y2 (power consumption) at the 24th future time point. Mapping 1 is a multiple linear regression model and mapping 2 is an ARIMA time series model.

The VAR class is already packaged in the statsmodels library of Python and can be called directly. The model is fitted on multiple groups of data from the training set, and training is complete when the overall error is smallest. The prediction set (power consumption, air temperature, wind power, humidity) is then input, and the power consumption at the 24th future time point can be predicted.

(6) Analysis of results

After the VAR algorithm outputs the predicted values, the error between the predicted values and the true values is calculated using model performance metrics. Two metrics are used here: R-squared (R2) and mean absolute percentage error (MAPE).

R-squared is defined as the ratio of the regression sum of squares (the variation in Y explained by the variable X) to the total sum of squares of Y. Also known as goodness of fit, it reflects what percentage of the variation in the dependent variable Y can be explained by the independent variable X.

MAPE is defined as the mean of the absolute percentage errors; a MAPE of 0% indicates a perfect model, and a MAPE above 100% indicates a poor model.
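Both metrics are simple to compute by hand, as the sketch below shows. One assumption to note: R-squared is computed here as 1 minus the ratio of the residual sum of squares to the total sum of squares, which equals the ratio given in the definition above for a least-squares fit with an intercept. The sample values are made up for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def r_squared(actual, predicted):
    """R-squared computed as 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

error_percent = mape(actual, predicted)  # (10% + 5% + 0%) / 3, about 5%
fit = r_squared(actual, predicted)       # 1 - 200 / 46666.67, about 0.9957
```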

The overall error results were analyzed as follows:

TABLE 1

Predicted value of electricity consumption	Previous day data prediction	Previous two days data prediction
MAPE	1.58%	8.60%
R2	0.867	0.739

As can be seen from Table 1, when the hourly power consumption, air temperature, wind power and humidity of the previous day are used to predict the hourly power consumption of the current day, the model's R-squared (R2) is 0.867 and its mean absolute percentage error (MAPE) is 1.58%. The overall performance of the model is good.

Breaking the error down by day, the predicted values are compared with the true values graphically:

FIGS. 1 and 2 compare the predicted power consumption with the actual values. When the hourly power consumption is predicted using the data of the previous day, the predicted values are close to the actual values; in particular, the rising and falling trends of the fluctuations are consistent. When the true values fluctuate strongly, the difference between the predicted and true values is larger. In addition, when the hourly power consumption is predicted from the power consumption and meteorological values of the previous two days, the gap between the predicted and real curves is larger than when the previous day is used. Thus, the difference is smaller when future power consumption is predicted from power consumption that is closer in time and has a more consistent trend. Therefore, in actual implementation, the predictions from the previous day's data are used as the main monitoring values and the predictions from the previous two days' data as auxiliary monitoring values, to give early warning of high power consumption that may occur in the future.

The embodiment of the invention also provides a system for predicting hourly power consumption based on a machine learning algorithm, comprising:

the VAR model establishing module is used for modeling the cleaned data, and the modeling process is as follows in sequence:

determining the lag order of the model according to the Akaike information criterion (AIC);

testing the dependent variable of the model, namely, when the power consumption time series is judged to have no autocorrelation of the random errors, the test is passed and the next step is executed;

The invention applies a machine learning method based on an econometric algorithm to the power industry for hourly power consumption prediction, which is an innovation in the field of application; in addition, the model performs well in the specific scenario of insufficient sample size. The VAR model is an improved version based on the ARIMA model: the prediction precision is improved and, owing to the characteristics of the algorithm, the required data sample size is small (usually dozens of samples are enough), which is an obvious advantage over supervised machine learning algorithms. In the specific scenario where the number of samples is insufficient, the prediction effect of this model is more prominent than that of other machine learning models.

The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for predicting hourly power consumption based on a machine learning algorithm is characterized in that: the method comprises the following steps:

step two, cleaning the power consumption data and the weather data acquired in step one;

2. The method for predicting hourly power consumption based on a machine learning algorithm according to claim 1, wherein: in step one, the power consumption data are obtained from a company database by SQL query, and the weather data are obtained by scraping a weather website with a crawler program.

3. The machine-learning algorithm-based method of predicting electricity usage per hour of claim 1, wherein: cleaning the acquired data, specifically comprising:

deleting vacancy values: deleting samples whose power consumption is vacant;

4. The method for predicting hourly power consumption based on a machine learning algorithm according to claim 1, wherein: the residuals are judged by the Durbin-Watson test; when the test value is close to 2, there is no autocorrelation of the random errors and the time series passes the Durbin-Watson test; when the test value is close to 0 or 4, positive or negative serial correlation of the random errors exists, and the time series does not pass the Durbin-Watson test.

5. The method for predicting hourly power consumption based on a machine learning algorithm according to claim 1, wherein: step four, predicting the power consumption at a future moment according to the VAR model established in step three, specifically comprises: inputting a prediction set comprising power consumption, air temperature, wind power and humidity into the VAR model, which outputs the power consumption at the 24th future moment.

6. A system for predicting electricity usage per hour based on machine learning algorithms, comprising:

performing test 2 of the independent and dependent variables, namely a cointegration test, to judge whether the time series is stationary; when the cointegration test fails, differencing the data and repeating the cointegration test on the differenced data, and if the test passes, executing the next step;

7. The system for predicting hourly power consumption based on a machine learning algorithm according to claim 6, wherein: the data acquisition module acquires the power consumption data by SQL query, and acquires the weather data by scraping a weather website with a crawler program.

8. The system for predicting hourly power consumption based on a machine learning algorithm according to claim 6, wherein: the data cleaning module cleans the power consumption data and the weather data acquired by the data acquisition module, which specifically comprises: