CN114169434A - Load prediction method - Google Patents

Info

Publication number
CN114169434A
Authority
CN
China
Prior art keywords
data set
new energy
output load
model
load
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111486036.1A
Other languages
Chinese (zh)
Inventor
伍林
肖飞
王治华
陆继翔
张琪培
陈宏福
高峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
State Grid Shanghai Electric Power Co Ltd
State Grid Electric Power Research Institute
Original Assignee
Nari Technology Co Ltd
State Grid Shanghai Electric Power Co Ltd
State Grid Electric Power Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd, State Grid Shanghai Electric Power Co Ltd, State Grid Electric Power Research Institute filed Critical Nari Technology Co Ltd
Priority to CN202111486036.1A priority Critical patent/CN114169434A/en
Publication of CN114169434A publication Critical patent/CN114169434A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003 Load forecast, e.g. methods or systems for forecasting future load demand

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Power Engineering (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a load prediction method, which comprises the following steps: acquiring a power generation system output sample data set; decomposing the output sample data set into a new energy output load data set and a non-new energy output load data set; carrying out exception handling on the new energy output load data set; acquiring a new energy combined characteristic data set according to the new energy output load data set subjected to exception processing; obtaining an output load training data set according to the new energy combination characteristic data set; obtaining a fusion model alternative set according to the output load training data set; optimizing the fusion model in the alternative set of the fusion model according to the output load training data set; predicting the new energy output load by adopting the optimized fusion model; predicting the non-new energy output load according to the non-new energy output data set and a pre-trained non-new energy output prediction model, and adding the prediction result and the new energy output result to obtain the final predicted output load; the invention can accurately predict the system load.

Description

Load prediction method
Technical Field
The invention belongs to the technical field of load prediction, and particularly relates to a load prediction method.
Background
With China's new targets of reaching peak carbon emissions before 2030 and carbon neutrality before 2060, the share of new energy in power generation is rising sharply and new energy is gradually replacing traditional thermal power, so the share of new energy load in regional system load is also rising. The randomness and uncontrollability of new energy generation make system load prediction more difficult. Accurately predicting new energy generation improves the accuracy of regional system load prediction, raises the power supply utilization of grid dispatching, and makes dispatch plans more efficient and reasonable.
Compared with traditional residential load prediction, system load prediction is harder in regions with a high share of new energy load: the system load fluctuates more, and the new energy load depends heavily on the weather and on the external environment of the power station, so its regularity is less evident.
Current system load prediction methods fall mainly into two groups: traditional linear regression-type prediction models and newer prediction models such as artificial neural networks. The traditional models include regression models, random forest models, Kalman filtering, time series models and the like; they are commonly used on small data sets and perform poorly on complex nonlinear relationships. The newer models include neural network models, grey prediction models, wavelet analysis, expert systems and the like.
Current new energy load prediction methods are mainly direct or indirect. The direct prediction method (statistical method) is based on the theory and methods of mathematical statistics and includes probabilistic, time series and artificial intelligence methods. Its advantages are a simple, clear procedure and no requirement for the station location or power conversion parameters; its disadvantage is that, to ensure accurate forecasts, it needs a large amount of historical operating data of the photovoltaic power station and accurate weather forecast data, which makes prediction difficult at present. The indirect prediction method (physical method) is based on the physical generation principles of new energy systems, chiefly photovoltaic and wind power. Its advantage is that no historical operating data is needed, so a photovoltaic or wind power station can be predicted as soon as it is built; its disadvantage is that it needs data such as a detailed topographic map of the station, the station coordinates, the station power curve and other photoelectric conversion parameters. Much of this data is hard to quantify, and detailed physical information about stations is hard to obtain at the system load level, which again makes prediction difficult.
As the share of new energy load in regional system load grows, system load prediction is affected: accuracy drops, and existing methods cannot adapt to strongly fluctuating load.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a load prediction method which can accurately predict the system load.
The technical problem to be solved by the invention is realized by the following technical scheme:
the invention provides a load prediction method, which comprises the following steps:
acquiring a power generation system output sample data set;
decomposing the output sample data set into a new energy output load data set and a non-new energy output load data set;
carrying out exception handling on the new energy output load data set;
acquiring a new energy combined characteristic data set according to the new energy output load data set subjected to exception processing;
obtaining an output load training data set according to the new energy combination characteristic data set;
obtaining a fusion model alternative set according to the output load training data set;
optimizing the fusion model in the alternative set of the fusion model according to the output load training data set;
predicting the new energy output load by adopting the optimized fusion model;
and predicting the non-new energy output load according to the non-new energy output data set and the pre-trained non-new energy output prediction model, and adding the prediction result and the new energy output result to obtain the final predicted power generation system output load.
Further, obtaining the output sample data set of the power generation system includes: acquiring a power generation output load data set and a weather data set of the target regional system; filling points with missing or abnormal values in the regional system's power generation output load data set by the mean value method, spline interpolation or linear interpolation; and deleting abnormal data segments from the regional system's power generation output load data set by comparing the correlation between the historical actual weather in the weather data set and the output load.
Further, decomposing the output sample data set into a new energy output load data set and a non-new energy output load data set includes: decomposing the output load sample data set of the power generation system into a trend component, a periodic component and a new energy component by the STL time series decomposition method, wherein the sum of the trend component and the periodic component is the non-new energy output load data and the new energy component is the new energy output load data.
Further, the exception handling of the new energy output load data set includes:
calculating the Pearson correlation coefficient between each feature and the output load according to the new energy output load data set and the weather data set, and screening out the features whose correlation with the output load is higher than a threshold value to form a feature set;
inputting the feature set and the new energy output load data into an Isolation Forest anomaly detection model to obtain the non-obvious anomalies;
and interpolating isolated abnormal points among the non-obvious anomalies by linear interpolation, mean value interpolation or mode interpolation, and deleting abnormal data segments among them.
Further, the Pearson correlation coefficient between each feature and the output load is calculated by formula (1):
$$\rho_{X,Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{(X_i-\bar{X})(Y_i-\bar{Y})}{\sigma_X\,\sigma_Y} \tag{1}$$
wherein X represents a feature sample variable, Y represents the output load sample variable, $X_i$ is the value of the current feature sample variable at the i-th time point, $Y_i$ is the value of the output load at the i-th time point, $\bar{X}$ is the mean of the current feature sample variable, $\sigma_X$ is the standard deviation of the current feature sample variable, $\bar{Y}$ is the mean of the output load sample variable, $\sigma_Y$ is the standard deviation of the output load sample variable, and n is the number of time points.
Further, the obtaining of the new energy combined feature data set according to the new energy output data set after the exception handling includes:
generating a characteristic column according to the prior knowledge and the characteristic weather;
calculating the Pearson correlation coefficient and the Spearman correlation coefficient between each feature in the feature columns and the output load according to the new energy output load data set after exception handling, and screening out the features whose correlation with the output load is higher than a threshold value;
and applying a one-hot transformation to the time features among the screened features, applying a trigonometric transformation to the wind direction features, and adding combined features on this basis to obtain the new energy combined feature data set.
Further, the obtaining of the fusion model candidate set according to the output training data set includes:
segmenting the output load training data set, inputting the segments into an LSTM model, a KNN model, an XGBoost model and a LightGBM model respectively, evaluating the model performance, and selecting the group with the best evaluation value as the model candidate set.
Further, the model performance is evaluated by MSE and MAE;
wherein MSE is given by formula (2):
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i-\hat{y}_i\right)^2 \tag{2}$$
and MAE is given by formula (3):
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i-\hat{y}_i\right| \tag{3}$$
where m denotes the number of samples, $y_i$ is the actual output load value of the i-th sample, and $\hat{y}_i$ is the predicted output load value of the i-th sample.
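As a minimal sketch of formulas (2) and (3), the two metrics can be computed as below. The function names and the toy load values are illustrative assumptions, not part of the claimed method.

```python
def mse(actual, predicted):
    """Mean squared error over m samples, as in formula (2)."""
    m = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / m


def mae(actual, predicted):
    """Mean absolute error over m samples, as in formula (3)."""
    m = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / m


# Hypothetical output load values (actual vs. predicted)
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]
print(mse(actual, predicted), mae(actual, predicted))
```

MSE penalizes large deviations more heavily than MAE, which is one plausible reason for evaluating with both.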
Further, optimizing the fusion model in the fusion model candidate set according to the new energy combination feature data set includes:
dividing the features in the new energy combined feature data set into basic features and additional features, randomly shuffling the additional features and distributing them evenly over the basic features to form a different feature set for each model, and extracting the corresponding feature column data according to each model's feature set to form the model training sets;
inputting the model training sets into the respective models, combining the models, and forming a neural network model training set from the prediction results on each cross-validation set;
and inputting the resulting neural network model training set into the neural network model and tuning its parameters so that the cross-validation result is optimal.
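One way to read the fusion step is as stacking: each base model predicts only the cross-validation fold it was not trained on, and these out-of-fold predictions become the input columns of the neural network training set. The sketch below illustrates only that construction. The two toy base models (a mean predictor and a 1-nearest-neighbour predictor), the interleaved fold layout and all names are assumptions made for illustration; the neural network itself is omitted.

```python
def out_of_fold_predictions(models, X, y, k=10):
    """Return one row per sample and one column per base model.

    models: list of (fit, predict) pairs. Every entry is predicted by a
    model that was trained without the sample in question.
    """
    n = len(X)
    meta_X = [[None] * len(models) for _ in range(n)]
    for fold in range(k):
        val_idx = list(range(fold, n, k))  # interleaved folds (assumed layout)
        val_set = set(val_idx)
        train_idx = [i for i in range(n) if i not in val_set]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for m, (fit, predict) in enumerate(models):
            fitted = fit(X_tr, y_tr)
            for i in val_idx:
                meta_X[i][m] = predict(fitted, X[i])
    return meta_X


# Toy base models, stand-ins for LSTM / KNN / XGBoost / LightGBM
def fit_mean(X, y):
    return sum(y) / len(y)

def predict_mean(model, x):
    return model

def fit_1nn(X, y):
    return list(zip(X, y))

def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


X = [float(i) for i in range(20)]
y = [3.0 * x + 1.0 for x in X]
meta_X = out_of_fold_predictions(
    [(fit_mean, predict_mean), (fit_1nn, predict_1nn)], X, y, k=5
)
# meta_X (with y as the target) would then train the second-level model.
```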
The invention has the following beneficial effects: the method performs high-precision, targeted modeling and prediction of the new energy load component of the regional system load, which is highly volatile and random, and applies multiple linear regression to the stable non-new energy load component to fit the regular residential, industrial and commercial power loads. Predicting the stable and unstable data separately improves the accuracy of regional system load prediction and reduces the influence of new energy fluctuation on the regional system load.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph of data obtained after decomposition of the original output in the present invention;
FIG. 3 is a schematic structural diagram of the LSTM model of the present invention;
FIG. 4 is a schematic diagram of the significance of some features of the present invention;
FIG. 5 is a schematic diagram illustrating the influence of some features on the output result according to the present invention;
FIG. 6 is a diagram illustrating a predicted result of a new energy load part according to the present invention;
FIG. 7 is a schematic diagram of a non-new energy load data prediction result according to the present invention;
FIG. 8 is a diagram illustrating the sum of decomposition prediction results in the present invention.
Detailed Description
To further describe the technical features and effects of the present invention, the present invention will be further described with reference to the accompanying drawings and detailed description.
In a power grid system, the generation output load and the electricity consumption load are essentially in balance. Because electricity consumption is diverse, the load is difficult to measure on the consumption side, so consumption is estimated from the generation output load; the final system output obtained is therefore treated as equivalent to the regional system load.
As shown in fig. 1 to 8, the load decomposition prediction method based on feature combination and model fusion of the present application includes the following steps:
Step 1: process the power generation output load and weather data sets of the target regional system, find the obvious abnormal points and abnormal data segments, and screen the data to obtain the output load sample data set of the regional power generation system.
In the embodiment of the application, historical power generation output load data of a regional system from 2019-1-1 to 2021-1-1 is obtained together with the related weather data, which include wind speed, wind direction, irradiance, temperature, rainfall, air pressure and the like, sampled once every 15 minutes. Periods in which the output load data or the weather data are continuously missing are preliminarily screened out and deleted; extreme abnormal values in the weather data, such as a temperature of -99 ℃, are deleted; continuous abnormal data segments (for example, periods in which every output load value is abnormal) are deleted; and isolated abnormal points are filled by interpolation.
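The cleaning rule of step 1 (interpolate isolated gaps, delete long ones) might look like the sketch below. Missing or invalid samples are represented as None; the `max_gap` cut-off between "isolated" and "continuous" is a hypothetical parameter, since the embodiment does not state one, and linear interpolation is only one of the filling methods it allows.

```python
def repair_series(values, max_gap=4):
    """Fill short runs of None by linear interpolation.

    Returns (filled, drop) where drop lists (start, end) index ranges,
    end exclusive, that are too long (or touch an edge) to be repaired
    and should be deleted instead.
    """
    filled = list(values)
    drop = []
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < n and values[j] is None:
            j += 1
        gap = j - i
        if i == 0 or j == n or gap > max_gap:
            drop.append((i, j))
        else:
            left, right = values[i - 1], values[j]
            for k in range(i, j):
                frac = (k - i + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)
        i = j
    return filled, drop


# Hypothetical 15-minute samples: one isolated gap and one long outage
raw = [1.0, None, 3.0, None, None, None, None, None, 9.0]
filled, drop = repair_series(raw, max_gap=2)
```

With 15-minute sampling, a deleted range would in practice be widened to whole days so that each remaining day keeps its 96 points.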
Step 2: decomposing the output load sample data set into a new energy output load data set and a non-new energy output load data set;
The output load data of the regional system from 2019-1-1 to 2021-1-1 is decomposed by the time series decomposition algorithm STL into a trend component, a periodic component and a remainder, which is taken as the new energy component. STL consists of two loops, an inner loop nested within an outer loop. Each pass of the inner loop updates the seasonal term and the trend term once, and a full inner loop consists of n(i) such passes. Each pass of the outer loop runs the inner loop and then computes robustness weights, which are used in the next inner loop to reduce the effect of transient abnormal behavior on the trend and seasonal terms. In the first pass of the outer loop the robustness weights are all equal to 1; n(o) passes of the outer loop are then performed.
This yields the trend component, the periodic component and the new energy remainder of the regional system's output load. The remainder is taken out as the new energy component prediction data set, and the sum of the trend component and the periodic component is the non-new energy output load data set.
The results of the time series decomposition are shown in FIG. 2, where Observed is the actual output load value, Trend is the trend component, Season is the periodic component, and Resid is the new energy remainder.
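To make the split into "trend + periodic" and "remainder" concrete, the sketch below uses a classical additive decomposition (centered moving-average trend plus per-phase seasonal means). This is a deliberately simplified stand-in and not STL: it has no LOESS smoothing and no robustness weights. The toy series, the period of 4 and all names are assumptions for illustration.

```python
def moving_average_trend(series, period):
    """Centered moving average spanning one period."""
    n = len(series)
    half = period // 2
    trend = [0.0] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: half weight on the two end points
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    # Simplification: extend the nearest valid trend value to the edges
    for t in range(half):
        trend[t] = trend[half]
    for t in range(n - half, n):
        trend[t] = trend[n - half - 1]
    return trend


def decompose(series, period):
    """Return (trend, seasonal, resid) with series = trend + seasonal + resid."""
    trend = moving_average_trend(series, period)
    detrended = [y - tr for y, tr in zip(series, trend)]
    phase_means = []
    for p in range(period):
        vals = detrended[p::period]
        phase_means.append(sum(vals) / len(vals))
    offset = sum(phase_means) / period  # center the seasonal component
    seasonal = [phase_means[t % period] - offset for t in range(len(series))]
    resid = [y - tr - s for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, resid


# Hypothetical load: a regular cycle plus one irregular burst at t = 9
observed = [10.0 + (t % 4) for t in range(16)]
observed[9] += 5.0
trend, seasonal, resid = decompose(observed, period=4)
non_new_energy = [tr + s for tr, s in zip(trend, seasonal)]
new_energy = resid
```

In practice a library implementation of STL would more likely be used, for example the `STL` class in statsmodels with a daily period of 96 for 15-minute data; that choice is an assumption, as the embodiment does not name a library.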
Step 3: perform exception handling on the new energy output load data set, as follows:
Step 3.1: for the new energy data set obtained in step 2, calculate the Pearson correlation coefficient between each feature column and the output load, screen out the feature columns whose correlation value is greater than or equal to 0.25, and extract the corresponding data to form the new energy component prediction related data set. The Pearson correlation coefficient is calculated as:
$$\rho_{X,Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{(X_i-\bar{X})(Y_i-\bar{Y})}{\sigma_X\,\sigma_Y}$$
where n is the number of samples, $X_i$ is the value of the current feature column (current feature sample variable) at the i-th time point, $Y_i$ is the value of the output load data at the i-th time point, $\bar{X}$ and $\bar{Y}$ are the mean of the current feature column and the mean of the output load column respectively, and $\sigma_X$ and $\sigma_Y$ are the standard deviation of the current feature column and the standard deviation of the output load column.
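The screening in step 3.1 could be sketched as follows. The feature names and values are hypothetical, and comparing the absolute value of the coefficient against the 0.25 threshold is an assumption (the text only says the correlation value must reach the threshold).

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return cov / (sigma_x * sigma_y)


def screen_features(features, load, threshold):
    """Keep the names of features whose |correlation| with the load reaches the threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, load)) >= threshold]


# Hypothetical toy data
load = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "irradiance": [2.0, 4.1, 5.9, 8.2, 10.0],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0],
}
selected = screen_features(features, load, threshold=0.25)
```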
Step 3.2: detect abnormal data points in the screened new energy component prediction related data set with the Isolation Forest anomaly detection algorithm, as follows:
Step 3.2.1: take all feature columns of the new energy component prediction related data set from step 3.1 to form the data set $X = \{x_1, \dots, x_n\}$, where $\forall x_i \in X,\ x_i = (x_{i1}, \dots, x_{id})$ and d is the feature dimension of the data. Randomly draw $\psi$ time points from X to form a subset X' of X, and place X' in the root node.
Randomly choose a feature q from the d feature dimensions and randomly generate a cut point p in the current data set such that $\min(x_{ij},\ j=q,\ x_{ij}\in X') < p < \max(x_{ij},\ j=q,\ x_{ij}\in X')$.
The cut point p defines a hyperplane that divides the current data space into two subspaces: sample points whose value on feature q is smaller than p are placed in the left child node, and the remaining sample points are placed in the right child node.
Repeat the cutting and dividing operations on the left and right child nodes until every leaf node holds only one sample point or the isolation tree (iTree) reaches the specified height; this produces one isolation tree.
Repeat the above steps until t isolation trees have been generated.
Step 3.2.2: calculate the anomaly score of each data point in the new energy component prediction related data set obtained in step 3.1, as follows:
For each data point $x_i$, traverse every isolation tree (iTree) to compute the average height $h(x_i)$ of $x_i$ in the forest, normalize the average height of all points, and calculate the anomaly score by:
$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$
where
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad H(k) \approx \ln(k) + 0.5772$$
The point is then judged as follows:
When $E(h(x)) \to c(n)$, $s \to 0.5$, i.e., the average path length of data point x is close to the average path length of the tree, and it cannot be determined whether x is an anomaly.
When $E(h(x)) \to 0$, $s \to 1$, i.e., the anomaly score of data point x is close to 1, and x is an anomaly.
When $E(h(x)) \to n-1$, $s \to 0$, and x is a normal value.
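The tree construction and scoring described in step 3.2 can be illustrated with a small, self-contained isolation forest. This is a sketch under assumptions: the score is normalized by the number of points used to build each tree, a leaf that still holds several points adds the usual c(size) adjustment, and the tree count, sample size, seed and toy data are all hypothetical. A production system would more likely rely on a library implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length normalizer c(n) = 2H(n-1) - 2(n-1)/n."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature q at a random cut p."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    q = rng.randrange(len(points[0]))
    lo = min(p[q] for p in points)
    hi = max(p[q] for p in points)
    if lo == hi:
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[q] < cut]
    right = [p for p in points if p[q] >= cut]
    return {"q": q, "cut": cut,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Depth at which x lands, plus c(size) for an unresolved leaf."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["cut"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Score every point: values near 1 suggest anomalies, well below 0.5 normal."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    forest = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
              for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, tree) for tree in forest) / n_trees
        scores.append(2.0 ** (-mean_h / c(sample_size)))
    return scores


# Hypothetical data: a 2-D cluster plus one far-away point (index 63)
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(63)] + [(10.0, 10.0)]
scores = anomaly_scores(data)
```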
Step 3.3: according to the calculation result, fill the abnormal points by interpolation (mean, mode, linear filling and the like), ensuring that after filling every day has a complete set of timestamps with 96 data points per day. Where abnormal values persist over a long period, delete them, removing the data of the entire day concerned. This yields the new energy component training set.
Step 4: obtain the new energy combined feature data set from the new energy output load data set after exception handling, specifically as follows:
Step 4.1: generate feature columns on the new energy component training set produced in step 3, including historical output load features formed from the output load data of the preceding period, historical weather features formed from the weather data of the preceding period, and time features such as year, month, day, week and hour.
Step 4.2: for each feature column of the data set obtained in step 4.1, calculate the Pearson correlation coefficient between the feature and the output load by:
$$\rho_{X,Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{(X_i-\bar{X})(Y_i-\bar{Y})}{\sigma_X\,\sigma_Y}$$
where $\bar{X}$ is the mean of the feature column, $\sigma_X$ is the standard deviation of the feature column, $\bar{Y}$ is the mean of the output load column and $\sigma_Y$ is the standard deviation of the output load column; and calculate the Spearman correlation coefficient by:
$$\rho_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$$
where $d_i$ is the difference between the ranks of the i-th sample point of feature X and the i-th sample point of the output load Y, and n is the number of sample points.
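A short sketch of the Spearman coefficient in its rank-difference form is given below. The closed form holds only when there are no tied values, which the sketch assumes; function names and sample values are illustrative.

```python
def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman coefficient: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical feature column and output load column
feature = [10.0, 20.0, 30.0, 40.0, 50.0]
load = [1.0, 3.0, 2.0, 5.0, 4.0]
rho = spearman(feature, load)
```

Because it works on ranks, this coefficient also captures monotonic but nonlinear relationships that the Pearson coefficient may understate.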
Feature columns with a Pearson correlation coefficient less than 0.2 and feature columns with a Spearman correlation coefficient less than 0.1 are removed from the data set. The main features obtained are as follows:
TABLE 1 Output load prediction feature set
Figure BDA0003396592030000071
Figure BDA0003396592030000081
Step 4.3: among the features screened in step 4.2, apply a one-hot transformation to the time features and a trigonometric transformation to the wind direction features, then add combined features on this basis. The combined features include the sum, difference, product and quotient of any two feature columns, plus the difference between each feature and its mean and the difference between each feature and its median. This yields the complete data feature set and the complete data set.
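The three transformations of step 4.3 might be sketched as follows. Encoding the wind direction as a sine/cosine pair is one common reading of "trigonometric transformation" and is an assumption here, as are the zero-division guard, the feature names and the generated column names.

```python
import math
import statistics
from itertools import combinations


def one_hot(value, categories):
    """Encode a categorical time feature (e.g. day of week) as 0/1 flags."""
    return [1 if value == cat else 0 for cat in categories]


def wind_direction_features(degrees):
    """Map an angle to (sin, cos) so that 0 and 360 degrees coincide."""
    rad = math.radians(degrees)
    return math.sin(rad), math.cos(rad)


def combine(features):
    """Add pairwise sum/difference/product/quotient and mean/median offsets."""
    out = dict(features)
    for a, b in combinations(features, 2):
        xa, xb = features[a], features[b]
        out[f"{a}+{b}"] = [x + y for x, y in zip(xa, xb)]
        out[f"{a}-{b}"] = [x - y for x, y in zip(xa, xb)]
        out[f"{a}*{b}"] = [x * y for x, y in zip(xa, xb)]
        # Guard against division by zero (assumed convention: emit 0.0)
        out[f"{a}/{b}"] = [x / y if y != 0 else 0.0 for x, y in zip(xa, xb)]
    for name, col in features.items():
        mean = sum(col) / len(col)
        median = statistics.median(col)
        out[f"{name}-mean"] = [x - mean for x in col]
        out[f"{name}-median"] = [x - median for x in col]
    return out


# Hypothetical feature columns
base = {"wind_speed": [2.0, 4.0, 6.0], "temp": [10.0, 20.0, 40.0]}
full = combine(base)
weekday = one_hot("Tue", ["Mon", "Tue", "Wed"])
north = wind_direction_features(0.0)
```

Note that the number of pairwise columns grows quadratically with the number of features, which is presumably why a feature selection step follows.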
Step 5 specifically comprises the following steps:
Step 5.1: shuffle the data set by day, keeping the data within each day in order.
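Shuffling by day while preserving the order inside each day can be sketched as below. The function name and seed are assumptions; the default of 96 points per day follows the 15-minute sampling interval of the embodiment.

```python
import random


def shuffle_by_day(samples, points_per_day=96, seed=0):
    """Shuffle whole days; the order of samples inside each day is unchanged."""
    days = [samples[i:i + points_per_day]
            for i in range(0, len(samples), points_per_day)]
    random.Random(seed).shuffle(days)
    return [s for day in days for s in day]


# Toy example: 3 "days" of 4 samples each
shuffled = shuffle_by_day(list(range(12)), points_per_day=4)
```

Keeping each day intact preserves the intra-day shape of the load curve, while shuffling days reduces the dependence of later cross-validation folds on calendar order.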
Step 5.2: input the shuffled data set into an XGBoost model to calculate the importance of each feature, select the features whose importance is greater than 0.05, and extract the corresponding feature data.
Step 5.3: add the remaining features, whose importance is less than or equal to 0.05, one by one to the data set extracted in step 5.2; the influence of these features on the model result is shown in FIG. 5. Divide the data set equally into ten parts and, by ten-fold cross-validation, select the model with the best ten-fold average result; its features form the complete model feature set. The feature importance ranking within this feature set is shown in FIG. 4.
Step 6 specifically comprises the following steps:
Step 6.1: extract the corresponding data according to the complete model feature set obtained in step 5, and shuffle the data set by day, keeping the data within each day in order.
Step 6.2: split the data set by ten-fold cross-validation and input it into the training of models such as LSTM, KNN, XGBoost, LightGBM and ET.
Step 6.3: tune the parameters of each model and train it, predict the validation set of each cross-validation fold with the model, calculate the MSE and MAE on the validation set, and take the average of MSE and MAE as the model evaluation value, finally obtaining the 4 better-performing models: LSTM, XGBoost, LightGBM and KNN.
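The evaluation loop of step 6 could be sketched as follows. It interprets "the average of MSE and MAE" as (MSE + MAE) / 2 per fold, averaged over the folds, which is one possible reading. The two toy candidate models stand in for LSTM, KNN, XGBoost and the others; all names and data are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_evaluate(fit, predict, X, y, k=10):
    """Mean over folds of (MSE + MAE) / 2 on the held-out fold; lower is better."""
    scores = []
    for val_idx in k_fold_indices(len(X), k):
        val_set = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in val_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in val_idx]
        actual = [y[i] for i in val_idx]
        m = len(actual)
        mse = sum((a - p) ** 2 for a, p in zip(actual, preds)) / m
        mae = sum(abs(a - p) for a, p in zip(actual, preds)) / m
        scores.append((mse + mae) / 2)
    return sum(scores) / len(scores)


# Toy candidate models
def fit_mean(X, y):
    return sum(y) / len(y)

def predict_mean(model, x):
    return model

def fit_1nn(X, y):
    return list(zip(X, y))

def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


X = [float(i) for i in range(40)]
y = [2.0 * x for x in X]
score_mean = cross_val_evaluate(fit_mean, predict_mean, X, y)
score_1nn = cross_val_evaluate(fit_1nn, predict_1nn, X, y)
```

The candidates with the lowest evaluation values would be kept for the fusion step.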
Step 7 specifically comprises the following steps:
Step 7.1: extract the corresponding data according to the complete model feature set obtained in step 5, and divide the features into basic features and additional features by feature importance: the basic features are the top 15 features in the importance ranking, and the additional features are the rest. To increase model diversity, the additional features are shuffled and divided equally into 4 parts, and each part is concatenated with the basic features to obtain the feature sets of the 4 models.
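Step 7.1 amounts to giving every model the same top-ranked features plus a disjoint random share of the rest. A minimal sketch, assuming the features arrive already sorted by importance (the names `f0`, `f1`, ... and the seed are hypothetical):

```python
import random


def build_model_feature_sets(ranked_features, n_base=15, n_models=4, seed=0):
    """Top n_base features go to every model; the rest are shuffled and dealt out."""
    base = ranked_features[:n_base]
    extra = ranked_features[n_base:]  # slicing copies, so the input is untouched
    random.Random(seed).shuffle(extra)
    parts = [extra[i::n_models] for i in range(n_models)]
    return [base + part for part in parts]


# 27 hypothetical features, already sorted by importance
ranked = [f"f{i}" for i in range(27)]
feature_sets = build_model_feature_sets(ranked)
```

Each model then sees the strongest predictors while differing in its weaker ones, which is what gives the fused models their diversity.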
Step 7.2: according to the feature sets of the 4 models and the complete data feature set obtained in step 4.3, extract the 4 training data sets corresponding to the 4 models. Apply dimension transformation to the 4 model training sets and input them into the 4 models by ten-fold cross-validation.
The training of the models in step 7.2 specifically comprises the following steps:
The specific method for training the XGboost model is as follows:
taking a corresponding training data set of the XGboost model, adopting a ten-fold cross validation mode, and repeating the following steps for training:
step (1): establishing a regression tree model;
step (2): and adding subtrees through feature splitting, adding the current tree into the original model every time one regression tree is added, and fitting the residual error of the last prediction, namely the difference between the actual value and the predicted value. The splitting rule is to calculate the gain after each split and select the gain splitting scheme with the maximum gain.
And (3): continuously splitting the characteristics, finally reaching a leaf node, summing the results of each leaf node to obtain a predicted value of the sample,
and (4): and traversing all the characteristic division points by using a greedy algorithm, and minimizing the objective function to obtain the XGboost model by using the objective function value as an evaluation function.
And (5): XGboost parameter adjustment, parameters include: the number of maximum trees generated, the learning rate, the minimum loss function degradation value required for node splitting, the random sampling proportion, the maximum depth of the trees, the L1 regularization term of the weight, and the L2 regularization term of the weight. And continuously setting a parameter minimization objective function to obtain a final XGboost model as a fusion model.
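The greedy split search of steps (2) and (4) can be illustrated for a single feature. The sketch below uses the standard XGBoost split gain with an L2 regularization term `reg_lambda` and a minimum loss reduction `gamma`, and it assumes a squared-error loss (gradient = prediction minus target, Hessian = 1). The function names are hypothetical and the code is a plain-Python illustration, not the library's implementation.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of splitting a node into left and right children."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def best_split(feature_values, targets, predictions, reg_lambda=1.0, gamma=0.0):
    """Greedily scan all split points of one feature; return (gain, threshold)."""
    # Squared-error loss: gradient = prediction - target, Hessian = 1.
    gradients = [p - t for p, t in zip(predictions, targets)]
    rows = sorted(zip(feature_values, gradients))
    g_total = sum(g for _, g in rows)
    h_total = float(len(rows))
    best = (0.0, None)
    g_left = h_left = 0.0
    for (x, g), (x_next, _) in zip(rows, rows[1:]):
        g_left += g
        h_left += 1.0
        if x == x_next:
            continue  # no threshold can separate equal feature values
        gain = split_gain(g_left, h_left, g_total - g_left,
                          h_total - h_left, reg_lambda, gamma)
        if gain > best[0]:
            best = (gain, (x + x_next) / 2)
    return best
```

A threshold of `None` means that no split improves the objective, in which case the node stays a leaf.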
The LightGBM training steps follow the XGBoost training method, with the following differences:
When searching for a split point, continuous floating-point feature values are discretized into k integers and a histogram with k bins is constructed. As the data is traversed, statistics are accumulated in the histogram using the discretized value as the index. After one pass over the data, the histogram holds the required statistics, and the optimal split point is then found by traversing the discretized values of the histogram and calculating the split gain.
During splitting, a leaf-wise strategy is adopted: each time, the leaf with the largest split gain among all current leaves is found and split. Because leaf-wise growth can produce deep trees, LightGBM adds a maximum depth limit on top of the leaf-wise strategy, preventing overfitting while maintaining high efficiency.
When computing histograms, the histogram of the leaf node with less data is computed first, and the histogram of its sibling leaf node with more data is then obtained by subtracting it from the histogram of the parent node, so that the histogram of the sibling leaf is obtained at very little cost.
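The histogram construction and the histogram subtraction described above can be sketched as follows. Equal-width binning between `lo` and `hi`, the `[gradient sum, count]` layout of each bin and all function names are assumptions made for illustration; LightGBM's own binning is more elaborate.

```python
def bin_index(value, lo, hi, k):
    """Discretize a continuous value into one of k integer bins."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * k)
    return min(idx, k - 1)  # the maximum value falls into the last bin


def build_histogram(values, gradients, lo, hi, k):
    """Accumulate [gradient sum, sample count] per bin in one pass."""
    hist = [[0.0, 0] for _ in range(k)]
    for value, gradient in zip(values, gradients):
        b = bin_index(value, lo, hi, k)
        hist[b][0] += gradient
        hist[b][1] += 1
    return hist


def subtract_histograms(parent, child):
    """Histogram of the sibling leaf = parent histogram - child histogram."""
    return [[pg - cg, pc - cc] for (pg, pc), (cg, cc) in zip(parent, child)]
```

Only the smaller leaf needs a pass over its data; the larger sibling costs one subtraction per bin.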
LSTM training steps:
(1) Take the training data set corresponding to the LSTM model and determine the input and output dimensions of the model (for example, in the present application the input dimension is (96, num_features); since one day of data is predicted, the output dimension is (96, 1), where num_features denotes the number of features). Using ten-fold cross validation, repeat the following steps for training:
(2) Select the mean square error as the loss function and design the LSTM model structure. The mean square error is calculated as:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$
where m denotes the number of samples, $y_i$ represents the actual output load value of the ith sample, and $\hat{y}_i$ represents the predicted output load value of the ith sample.
(3) Feed the data set into the LSTM model for forward propagation.
(4) Perform back propagation through the model.
(5) Minimize the MSE using the Adam optimization algorithm and add dropout as regularization to obtain the LSTM fusion model.
The structure of the LSTM model is shown in fig. 3.
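The dimension transformation that produces LSTM samples of shape (96, num_features) with targets of shape (96, 1) might be sketched as below. It assumes that each sample covers one day of 96 consecutive time points and that the target is the output load at the same time points; this alignment and the function name are assumptions, since the application only states the dimensions.

```python
def make_daily_samples(feature_rows, load_values, steps_per_day=96):
    """Reshape flat time series into per-day LSTM samples.

    feature_rows: one list of num_features values per time point.
    load_values:  one output load value per time point.
    Returns (inputs, targets) with shapes
    (n_days, steps_per_day, num_features) and (n_days, steps_per_day, 1).
    Trailing points that do not fill a whole day are dropped.
    """
    n_days = len(feature_rows) // steps_per_day
    inputs, targets = [], []
    for day in range(n_days):
        start = day * steps_per_day
        stop = start + steps_per_day
        inputs.append([list(row) for row in feature_rows[start:stop]])
        targets.append([[value] for value in load_values[start:stop]])
    return inputs, targets
```

The resulting nested lists can be converted to arrays and passed to whichever deep learning framework is used to build the LSTM.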
KNN model training steps:
Take the training data set corresponding to the KNN model and, using ten-fold cross validation, repeat the following steps for training:
(1) In ten-fold cross validation, nine folds of data are used for training and one fold for validation. Calculate the distance between each validation sample and the samples in the training folds (the Euclidean distance is used in the present application).
(2) Gradually increase the value of k until the model performs best on the validation set, thereby obtaining the KNN fusion model.
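The two KNN steps can be sketched for a single training and validation split. The sketch assumes that the prediction is the unweighted mean of the k nearest training targets and that "performs best" means the lowest validation MSE; both are assumptions, as are the function names.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest training samples."""
    distances = (math.dist(x, query) for x in train_x)  # Euclidean distance
    nearest = sorted(zip(distances, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def choose_k(train_x, train_y, valid_x, valid_y, max_k):
    """Increase k step by step and keep the value with the lowest validation MSE."""
    best_k, best_mse = None, float("inf")
    for k in range(1, min(max_k, len(train_x)) + 1):
        errors = [
            (knn_predict(train_x, train_y, x, k) - y) ** 2
            for x, y in zip(valid_x, valid_y)
        ]
        fold_mse = sum(errors) / len(errors)
        if fold_mse < best_mse:
            best_k, best_mse = k, fold_mse
    return best_k, best_mse
```

In ten-fold cross validation, `choose_k` would be run once per fold, with nine folds passed as training data and one fold as validation data.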
Step 7.3: splice together the ten-fold cross-validation predictions of each model on the validation folds to obtain the predicted values of each model on the whole training set, giving four columns in total: the LightGBM model predicted value, the XGBoost model predicted value, the LSTM model predicted value and the KNN model predicted value.
Step 7.4: input these 4 columns as features into a multiple linear regression neural network model and, using 10-fold cross validation, finally obtain 10 linear regression models.
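The splicing of step 7.3 corresponds to building out-of-fold predictions. In the sketch below, `fit` is a hypothetical callable that trains one base model and returns a predictor function, and the interleaved fold assignment is an illustrative choice; the application does not state how the folds are formed.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Assign sample indices to folds (interleaved, for illustration only)."""
    return [list(range(fold, n_samples, n_folds)) for fold in range(n_folds)]


def out_of_fold_predictions(X, y, fit, n_folds=10):
    """Train one model per fold and splice the validation predictions.

    Returns the spliced predictions (one per training sample) and the
    list of fold models, which are reused later on the test set.
    """
    oof = [None] * len(X)
    models = []
    for valid_idx in k_fold_indices(len(X), n_folds):
        held_out = set(valid_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        models.append(model)
        for i in valid_idx:
            oof[i] = model(X[i])
    return oof, models


def stack_columns(oof_by_model, order):
    """Arrange the spliced predictions as one row per sample, one column per model."""
    return [list(row) for row in zip(*(oof_by_model[name] for name in order))]
```

The rows returned by `stack_columns` are the inputs of the linear regression model of step 7.4, with the actual output load as the target.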
Step 8 specifically comprises the following steps:
Step 8.1: using the test set and the feature columns corresponding to each model in step 5, extract the corresponding features and data sets, and input the data into each cross-validated model obtained in step 7.3 (for example, in the present application, 4 model types with 10-fold cross validation give 40 models in total). This yields the cross-validation model predictions of each model type (4 model types, each with ten models trained on different training folds; every model makes a prediction, giving 40 groups of predicted values in total).
Step 8.2: average the prediction results by model type to obtain the prediction input features of the neural network, input them into the 10 linear regression models obtained in step 7.4 for prediction, and average those predictions to obtain the prediction result of the fusion model on the test set; partial results are shown in FIG. 6.
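The two averaging stages of step 8.2 can be sketched as follows. Each linear regression model is represented here as a hypothetical `(weights, bias)` pair, and `fold_predictions_by_type` maps each model type to the list of predictions made by its fold models; these data structures are assumptions for illustration.

```python
def fused_prediction(fold_predictions_by_type, meta_models, order):
    """Average per model type, apply every linear model, then average again.

    fold_predictions_by_type: {model type: [predictions of fold model 1, ...]},
        where each prediction list has one value per test sample.
    meta_models: list of (weights, bias) pairs, one per linear regression model.
    order: model types in the column order used when training the linear models.
    """
    feature_columns = []
    for name in order:
        fold_predictions = fold_predictions_by_type[name]
        # First stage: average the fold models of one model type per sample.
        feature_columns.append(
            [sum(values) / len(values) for values in zip(*fold_predictions)]
        )
    fused = []
    for row in zip(*feature_columns):
        # Second stage: average the outputs of all linear regression models.
        outputs = [
            bias + sum(w * x for w, x in zip(weights, row))
            for weights, bias in meta_models
        ]
        fused.append(sum(outputs) / len(outputs))
    return fused
```

With 4 model types and ten folds, each list in `fold_predictions_by_type` holds 10 prediction vectors and `meta_models` holds 10 pairs.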
Preferably, step 9 specifically comprises the following steps:
Step 9.1: from the non-new energy output load data set obtained in step 2 and the cleaned weather data set obtained in step 1, obtain a training set and a test set as in step 5, together with the corresponding time data set, and input them into a linear regression neural network model for training. Tune the parameters by cross validation to obtain the non-new energy output load prediction model.
Step 9.2: input the corresponding test set obtained in step 9.1 into the non-new energy output load prediction model for prediction; partial prediction results are shown in FIG. 7. Add this prediction result to the new energy output load prediction result obtained in step 8.2 to obtain the final load prediction result; partial results are shown in FIG. 8. The accuracy obtained without decomposition is 95.3%, and the accuracy obtained with decomposition is 97.1%.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (9)

1. A method of load prediction, comprising:
acquiring a power generation system output sample data set;
decomposing the output sample data set into a new energy output load data set and a non-new energy output load data set;
carrying out exception handling on the new energy output load data set;
acquiring a new energy combined characteristic data set according to the new energy output load data set subjected to exception processing;
obtaining an output load training data set according to the new energy combination characteristic data set;
obtaining a fusion model alternative set according to the output load training data set;
optimizing the fusion model in the alternative set of the fusion model according to the output load training data set;
predicting the new energy output load by adopting the optimized fusion model;
and predicting the non-new energy output load according to the non-new energy output data set and the pre-trained non-new energy output prediction model, and adding the prediction result and the new energy output result to obtain the final predicted power generation system output load.
2. The method of claim 1, wherein the obtaining a power generation system output sample data set comprises: acquiring a power generation output load data set and a weather data set of a target regional system; filling abnormal points with missing data or abnormal values in the power generation output load data set of the regional system by a mean value method, spline interpolation or linear interpolation; and deleting abnormal data segments in the power generation output load data set of the regional system by comparing the correlation between the historical actual weather in the weather data set and the output load.
3. The method of claim 2, wherein decomposing the set of output sample data into the new energy output load data set and the non-new energy output load data set comprises:
and decomposing the output load sample data set of the power generation system into a trend component, a periodic component and a new energy component by adopting an STL time sequence decomposition method, wherein the sum of the trend component and the periodic component is non-new energy output load data, and the new energy component is new energy output load data.
4. The method of claim 2, wherein the exception handling of the new energy contribution load dataset comprises:
calculating the Pearson correlation coefficient between each feature and the output load according to the new energy output load data set and the weather data set, and screening out the features whose correlation with the output load is higher than a threshold value to form a feature set;
inputting the feature set and the new energy output load data into an Isolation Forest anomaly detection model to obtain non-significant abnormal points;
and interpolating the abnormal points among the non-significant abnormal points by linear interpolation, mean interpolation or mode interpolation, and deleting the abnormal data segments among them.
5. A method of load prediction as claimed in claim 4, characterized by calculating the Pearson correlation coefficient between each feature and the output load using equation (1):
$$\rho_{X,Y} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma_X}\right)\left(\frac{Y_i - \bar{Y}}{\sigma_Y}\right) \tag{1}$$
wherein X represents the feature sample variable, Y represents the output load sample variable, $X_i$ is the value of the current feature sample variable at the ith time point, $Y_i$ is the value of the output load at the ith time point, $\bar{X}$ is the mean of the current feature sample variable, $\sigma_X$ is the standard deviation of the current feature sample variable, $\bar{Y}$ is the mean of the output load sample variable, $\sigma_Y$ is the standard deviation of the output load sample variable, and n is the number of time points.
6. The method of claim 1, wherein the deriving the new energy combined feature data set from the new energy contribution load data set after exception handling comprises:
generating feature columns according to prior knowledge and the weather features;
calculating the Pearson correlation coefficient and the Spearman correlation coefficient between the features in the feature columns and the output load according to the new energy output load data set after exception handling, and screening out the features whose correlation with the output load is higher than a threshold value;
and carrying out independent transformation on the time features among the screened features, carrying out trigonometric transformation on the wind direction features, and adding combined features on this basis to obtain the new energy combined feature data set.
7. The method of claim 1, wherein deriving the fusion model alternative set from the output load training data set comprises:
splitting the output load training data set, inputting the split output load training data set into an LSTM model, a KNN model, an XGBoost model and a LightGBM model respectively, evaluating the model performance, and selecting the group with the best evaluation value as the model alternative set.
8. A method of load prediction as claimed in claim 7, wherein the model performance is evaluated using MSE and MAE;
wherein the MSE evaluation is as shown in formula (2):
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 \tag{2}$$
and the MAE evaluation is as shown in formula (3):
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \tag{3}$$
where m denotes the number of samples, $y_i$ represents the actual output load value of the ith sample, and $\hat{y}_i$ represents the predicted output load value of the ith sample.
9. The method of claim 7, wherein optimizing the fusion model in the candidate set of fusion models according to the new energy source combined feature data set comprises:
dividing the features in the new energy combined feature data set into basic features and additional features, randomly shuffling the additional features and distributing them evenly onto the basic features to form a different feature set for each model, and taking out the corresponding feature column data according to the feature set of each model to form a model training set;
inputting the model training set into each model, and splicing the prediction results of each model on each cross-validation set to form a neural network model training set;
and inputting the obtained neural network model training set into the neural network model, and adjusting the parameters of the neural network model so that the cross-validation result is optimal.
CN202111486036.1A 2021-12-07 2021-12-07 Load prediction method Pending CN114169434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111486036.1A CN114169434A (en) 2021-12-07 2021-12-07 Load prediction method


Publications (1)

Publication Number Publication Date
CN114169434A true CN114169434A (en) 2022-03-11

Family

ID=80483964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111486036.1A Pending CN114169434A (en) 2021-12-07 2021-12-07 Load prediction method

Country Status (1)

Country Link
CN (1) CN114169434A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination