CN114169434A - Load prediction method - Google Patents
- Publication number
- CN114169434A (application CN202111486036.1A)
- Authority
- CN
- China
- Prior art keywords
- data set
- new energy
- output load
- model
- load
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
Abstract
The invention discloses a load prediction method comprising the following steps: acquiring an output sample data set of a power generation system; decomposing the output sample data set into a new energy output load data set and a non-new energy output load data set; performing exception handling on the new energy output load data set; obtaining a new energy combined feature data set from the new energy output load data set after exception handling; obtaining an output load training data set from the new energy combined feature data set; obtaining a fusion model candidate set from the output load training data set; optimizing the fusion model in the candidate set using the output load training data set; predicting the new energy output load with the optimized fusion model; and predicting the non-new energy output load from the non-new energy output data set and a pre-trained non-new energy output prediction model, then adding that prediction to the predicted new energy output load to obtain the final predicted output load. The method can accurately predict the system load.
Description
Technical Field
The invention belongs to the technical field of load prediction, and particularly relates to a load prediction method.
Background
With China's new targets of reaching peak carbon emissions before 2030 and carbon neutrality before 2060, the share of new energy generation in the power sector is rising sharply and is gradually replacing traditional thermal generation, so the share of new energy load in regional system load is also growing. The randomness and uncontrollability of new energy generation make system load prediction harder. Accurate prediction of new energy generation improves the accuracy of regional system load prediction, raises the utilization of power in grid dispatching, and makes dispatch plans more efficient and reasonable.
Compared with traditional residential load prediction, system load prediction is harder in regions with a high share of new energy load: the system load fluctuates more, and the new energy load depends heavily on the weather and on the environment around the power station, so its regularity is less apparent.
Current system load prediction methods fall mainly into two groups: traditional linear regression prediction models and newer prediction models such as artificial neural networks. The traditional group includes regression models, random forest models, Kalman filtering, time series models, and the like; these are commonly used on small data sets and perform poorly on complex nonlinear relationships. The newer group includes neural network models, grey prediction models, wavelet analysis, expert systems, and the like.
Current methods for predicting new energy load are mainly of two kinds: direct prediction and indirect prediction. The direct (statistical) method is based on the theory and methods of mathematical statistics and includes probabilistic, time series, and artificial intelligence approaches. Its advantages are a simple, clear procedure and no requirement for the location of the power station or its power conversion parameters. Its drawback is that an accurate forecast needs a large amount of historical operating data from the photovoltaic power station together with accurate weather forecasts, so prediction is currently difficult. The indirect (physical) method is based on the physical generation principles of new energy systems, chiefly photovoltaic and wind power. Its advantage is that it needs no historical operating data, so photovoltaic and wind power stations can be predicted as soon as they are built. Its drawback is that it needs data such as a detailed topographic map of the station, the station coordinates, the station power curve, and related photoelectric conversion parameters; much of this is hard to quantify, and detailed physical information about stations is hard to obtain at the system load level, so prediction is again difficult.
Because the share of new energy load in regional system load keeps growing, the accuracy of system load prediction has declined, and existing methods cannot adapt to strongly fluctuating load.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a load prediction method which can accurately predict the system load.
The technical problem to be solved by the invention is realized by the following technical scheme:
the invention provides a load prediction method, which comprises the following steps:
acquiring a power generation system output sample data set;
decomposing the output sample data set into a new energy output load data set and a non-new energy output load data set;
carrying out exception handling on the new energy output load data set;
obtaining a new energy combined feature data set from the new energy output load data set after exception handling;
obtaining an output load training data set from the new energy combined feature data set;
obtaining a fusion model candidate set from the output load training data set;
optimizing the fusion model in the fusion model candidate set using the output load training data set;
predicting the new energy output load with the optimized fusion model;
and predicting the non-new energy output load from the non-new energy output data set and the pre-trained non-new energy output prediction model, then adding that prediction to the predicted new energy output load to obtain the final predicted output load of the power generation system.
Further, acquiring the output sample data set of the power generation system includes: acquiring the power generation output load data set and the weather data set of the target regional system; filling points with missing or abnormal values in the regional power generation output load data set by the mean method, spline interpolation, or linear interpolation; and deleting abnormal data segments from the regional power generation output data set by checking the correlation between the historical actual weather in the weather data set and the output load.
Further, decomposing the output sample data set into a new energy output load data set and a non-new energy output load data set includes: decomposing the output load sample data set of the power generation system into a trend component, a periodic component, and a new energy component using the STL time series decomposition method, where the sum of the trend component and the periodic component is the non-new energy output load data and the new energy component is the new energy output load data.
Further, the exception handling of the new energy output load data set includes:
calculating the Pearson correlation coefficient between each feature and the output load from the new energy output load data set and the weather data set, and screening out the features whose correlation with the output load is higher than a threshold to form a feature set;
inputting the feature set and the new energy output load data into an Isolation Forest anomaly detection model to obtain the non-obvious abnormal points;
and interpolating isolated abnormal points among the non-obvious abnormal points by linear, mean, or mode interpolation, and deleting abnormal data segments.
Further, the Pearson correlation coefficient between each feature and the output load is calculated using formula (1):

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{1}$$

where $X$ represents the feature sample variable, $Y$ represents the output load sample variable, $X_i$ is the value of the current feature sample variable at the $i$-th time point, $Y_i$ is the value of the output load at the $i$-th time point, $\bar{X}$ is the mean of the current feature sample variable, $\sigma_X$ is the standard deviation of the current feature sample variable, $\bar{Y}$ is the mean of the output load sample variable, $\sigma_Y$ is the standard deviation of the output load sample variable, and $n$ is the number of time points.
Further, the obtaining of the new energy combined feature data set according to the new energy output data set after the exception handling includes:
generating a characteristic column according to the prior knowledge and the characteristic weather;
calculating the Pearson correlation coefficient and the Spearman correlation coefficient between each feature in the feature columns and the output load from the new energy output load data set after exception handling, and screening out the features whose correlation with the output load is higher than a threshold;
and applying a one-hot transformation to the time features and a trigonometric transformation to the wind direction features among the screened features, then adding combined features on that basis to obtain the new energy combined feature data set.
Further, the obtaining of the fusion model candidate set according to the output training data set includes:
segmenting the output load training data set, inputting the segments into an LSTM model, a KNN model, an XGBoost model, and a LightGBM model, evaluating the model performance, and selecting the group with the best evaluation value as the model candidate set.
Further, the model performance is evaluated using MSE and MAE;
wherein the MSE is given by formula (2):

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i-\hat{y}_i\right)^2 \tag{2}$$

and the MAE is given by formula (3):

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i-\hat{y}_i\right| \tag{3}$$

where $m$ denotes the number of samples, $y_i$ represents the actual output load value of the $i$-th sample, and $\hat{y}_i$ represents the predicted output load value of the $i$-th sample.
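As a minimal sketch of the two evaluation measures, the functions below compute MSE and MAE over paired actual and predicted values. The function names and the combined `evaluation_value` helper are illustrative assumptions; averaging the two measures follows the evaluation value described later in the embodiment.

```python
def mse(actual, predicted):
    """Mean squared error over paired samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error over paired samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def evaluation_value(actual, predicted):
    """Average of MSE and MAE (hypothetical helper name); lower is better."""
    return (mse(actual, predicted) + mae(actual, predicted)) / 2
```

A lower evaluation value indicates a better model, so candidate models can be ranked by sorting on this number.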
Further, optimizing the fusion model in the fusion model candidate set according to the new energy combination feature data set includes:
dividing the features in the new energy combined feature data set into basic features and additional features, randomly shuffling the additional features and distributing them evenly on top of the basic features to form a different feature set for each model, and extracting the corresponding feature column data for each model's feature set to form that model's training set;
inputting each model training set into its model, concatenating the outputs of the models, and forming the neural network model training set from the prediction results on each cross validation set;
and inputting the resulting neural network model training data set into the neural network model, and adjusting the parameters of the neural network model so that it achieves the best cross validation result.
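The way the cross validation predictions of the base models become the training set of the neural network can be sketched as follows. This is a simplified illustration, not the patent's implementation: the base learners are toy stand-ins (a mean predictor and a pass-through of the first feature) in place of LSTM, KNN, XGBoost, and LightGBM, and all function names are hypothetical.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous validation folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(fit_fns, X, y, k=10):
    """Return one row per sample holding each base model's held-out prediction.

    Each fit function takes (X_train, y_train) and returns a predict(x) callable.
    The resulting rows are the inputs of the second-stage (fusion) model.
    """
    n = len(X)
    meta = [[None] * len(fit_fns) for _ in range(n)]
    for val_idx in kfold_indices(n, k):
        held_out = set(val_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for m, fit in enumerate(fit_fns):
            predict = fit(X_tr, y_tr)
            for i in val_idx:
                meta[i][m] = predict(X[i])
    return meta


def fit_mean(X_tr, y_tr):
    """Toy stand-in learner: always predicts the training mean."""
    mean = sum(y_tr) / len(y_tr)
    return lambda x: mean


def fit_first_feature(X_tr, y_tr):
    """Toy stand-in learner: returns the first feature unchanged."""
    return lambda x: x[0]
```

Because every prediction in the returned table comes from a model that did not see that sample during training, the second-stage model is trained on honest estimates of base model behavior.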
The beneficial effects of the invention are as follows: the method performs targeted, high-precision modeling of the highly fluctuating, highly random new energy component of regional system load, and applies multiple linear regression to the stable non-new energy component to fit the regular residential, industrial, and commercial loads. Predicting the stable and unstable data separately improves the accuracy of regional system load prediction and reduces the impact of new energy fluctuation on it.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph of data obtained after decomposition of the original output in the present invention;
FIG. 3 is a schematic structural diagram of the LSTM model of the present invention;
FIG. 4 is a schematic diagram of the significance of some features of the present invention;
FIG. 5 is a schematic diagram illustrating the influence of some features on the output result according to the present invention;
FIG. 6 is a diagram illustrating a predicted result of a new energy load part according to the present invention;
FIG. 7 is a schematic diagram of a non-new energy load data prediction result according to the present invention;
FIG. 8 is a diagram illustrating the sum of decomposition prediction results in the present invention.
Detailed Description
To further describe the technical features and effects of the present invention, the present invention will be further described with reference to the accompanying drawings and detailed description.
In a power grid, the generation output load and the consumption load of the system are essentially in balance. Because electricity consumption is highly diverse, measuring the consumption load directly is difficult, so consumption is estimated from the generation output load; the final system output obtained is therefore equivalent to the regional system load.
As shown in fig. 1 to 8, the load decomposition prediction method based on feature combination and model fusion of the present application includes the following steps:
Step 1: process the power generation output load data set and the weather data set of the target regional system, identify obviously abnormal points and abnormal data segments, and screen the data to obtain the output load sample data set of the regional power generation system.
In the embodiment of the application, historical power generation output load data of a regional system from 2019-1-1 to 2021-1-1 is obtained together with the related weather data, which includes wind speed, wind direction, irradiance, temperature, rainfall, air pressure, and so on, with one sample point every 15 minutes. Periods in which the output load data or the weather data are continuously missing are screened out and deleted, extreme abnormal values in the weather data (such as a temperature of -99 °C) are deleted, continuous abnormal data segments (for example, the output load values are all abnormal over a continuous period) are deleted, and isolated abnormal points are interpolated.
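The cleaning rule described above (interpolate isolated bad points, delete continuous bad segments) can be sketched as below. The sentinel value -99 comes from the example in the text; the `max_gap` limit that separates an isolated gap from a continuous segment is an assumption, since the source does not state one, and the function name is hypothetical.

```python
def clean_series(values, sentinel=-99.0, max_gap=4):
    """Fill short gaps by linear interpolation and flag long gaps for deletion.

    Returns (filled, drop): `filled` holds the repaired values (None where a
    point is flagged), `drop` marks the points belonging to gaps longer than
    max_gap or touching either end of the series.
    """
    vals = [None if v is None or v == sentinel else float(v) for v in values]
    n = len(vals)
    drop = [False] * n
    i = 0
    while i < n:
        if vals[i] is not None:
            i += 1
            continue
        j = i
        while j < n and vals[j] is None:
            j += 1  # the gap covers indices i .. j-1
        gap_len = j - i
        if gap_len <= max_gap and i > 0 and j < n:
            left, right = vals[i - 1], vals[j]
            for k in range(i, j):
                frac = (k - i + 1) / (gap_len + 1)
                vals[k] = left + frac * (right - left)
        else:
            for k in range(i, j):
                drop[k] = True
        i = j
    return vals, drop
```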
Step 2: decomposing the output load sample data set into a new energy output load data set and a non-new energy output load data set;
The output load data of the regional system from 2019-1-1 to 2021-1-1 is decomposed into a trend component, a periodic component, and a new energy component (the remainder) using the STL time series decomposition algorithm. STL consists of two loops, an inner loop nested within an outer loop. Each pass of the inner loop updates the seasonal term and the trend term once, and the inner loop runs n(i) passes. Each pass of the outer loop runs the inner loop and then computes robustness weights, which are used in the next inner loop to reduce the influence of transient abnormal behavior on the trend and seasonal terms. The robustness weights for the first outer pass are all set to 1, after which n(o) outer passes are performed.
This yields the trend component, the periodic component, and the new energy component (remainder) of the regional system output load. The remainder is taken as the new energy component prediction data set, and the sum of the trend component and the periodic component forms the non-new energy output data set.
The results of the time-series decomposition part are shown in FIG. 2: wherein, Observed represents the actual output load value, Trend represents the Trend component, Season represents the period component, and Resid represents the new energy remainder.
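To make the split into trend, periodic, and remainder parts concrete, the sketch below uses a classical additive decomposition with a centered moving average. This is deliberately not STL: it has no LOESS smoothing and no robustness weights, and it only illustrates how the three components relate. In practice a library STL implementation (for example `statsmodels.tsa.seasonal.STL`) would be the usual choice; the function names here are hypothetical.

```python
def centered_moving_average(series, period):
    """Centered moving average; uses half weights at the ends for an even period."""
    n, half = len(series), period // 2
    if period % 2:
        weights = [1.0 / period] * period
    else:
        weights = [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend


def decompose(series, period):
    """Return (trend, seasonal, resid) with series = trend + seasonal + resid."""
    trend = centered_moving_average(series, period)
    by_phase = [[] for _ in range(period)]
    for t, (v, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            by_phase[t % period].append(v - tr)
    phase_means = [sum(d) / len(d) for d in by_phase]
    offset = sum(phase_means) / period  # force the periodic part to average zero
    seasonal = [phase_means[t % period] - offset for t in range(len(series))]
    resid = [
        None if tr is None else v - tr - s
        for v, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, resid
```

Under the method described here, `trend + seasonal` would play the role of the non-new energy output load and `resid` the role of the new energy output load.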
Step 3: perform exception handling on the new energy output load data set, as follows:
Step 3.1: for the new energy data set obtained in step 2, calculate the Pearson correlation coefficient between each feature column and the output load, screen out the feature columns whose correlation value is greater than or equal to 0.25, and extract the corresponding data to form the new energy component prediction data set. The Pearson correlation coefficient is calculated as:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

where $n$ is the number of samples, $X_i$ is the $i$-th value ($i$ being the time point ordinal) of the current feature column (the current feature sample variable), $Y_i$ is the value of the output load data at the $i$-th time point, $\bar{X}$ and $\bar{Y}$ are the mean of the current feature column and the mean of the output load column respectively, and $\sigma_X$ and $\sigma_Y$ are the standard deviation of the current feature column and the standard deviation of the output load column.
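The screening in step 3.1 can be sketched with a plain Pearson implementation and a threshold filter. The 0.25 threshold is taken from the text; the column names in the usage example and the function names are hypothetical, and returning 0 for a constant column is an assumption made to avoid division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column carries no correlation
    return sxy / math.sqrt(sxx * syy)


def screen_features(features, load, threshold=0.25):
    """Keep the names of feature columns whose correlation with load >= threshold.

    The comparison follows the text literally; comparing abs(r) instead would
    also retain strongly negatively correlated features.
    """
    return [name for name, col in features.items() if pearson(col, load) >= threshold]
```

For example, `screen_features({"irradiance": [2, 4, 6, 8, 10], "noise": [1, -1, 1, -1, 1]}, [1, 2, 3, 4, 5])` keeps only `"irradiance"`.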
Step 3.2: anomalous data points in the screened new energy component prediction data set are detected with the Isolation Forest anomaly detection algorithm, computed as follows:
Step 3.2.1: take all feature columns of the new energy component prediction data set obtained in step 3.1 to form a data set $X = \{x_1, \ldots, x_n\}$ with $x_i = (x_{i1}, \ldots, x_{id})$, where $d$ is the data feature dimension. Randomly draw the data of $\psi$ time points from $X$ to form a subset $X'$ of $X$, and place it into the root node.
Randomly choose a feature $q$ from the $d$ feature dimensions, and randomly generate a cut point $p$ in the current data such that $\min(x_{ij} : j = q,\ x_{ij} \in X') < p < \max(x_{ij} : j = q,\ x_{ij} \in X')$.
The cut point $p$ defines a hyperplane that divides the current data space into two subspaces: sample points whose value on feature $q$ is smaller than $p$ are placed into the left child node, and the remaining sample points are placed into the right child node.
The cutting and dividing operations are repeated on the left and right child nodes until every leaf node holds only one sample point or the isolation tree (iTree) reaches the specified height, which produces one isolation tree.
The steps above are repeated until $t$ isolation trees have been generated.
Step 3.2.2: the anomaly score of each data point in the new energy component prediction data set obtained in step 3.1 is calculated as follows:
for each data point $x_i$, traverse every isolation tree (iTree) and compute the average height $E(h(x_i))$ of $x_i$ in the forest, normalize the average height, and calculate the anomaly score using the following formula:

$$s(x, \psi) = 2^{-\frac{E(h(x))}{c(\psi)}}$$

where $c(\psi)$ is the average path length of a tree built from $\psi$ points. When $E(h(x)) \to c(\psi)$, $s \to 0.5$; that is, the average path length of data point $x$ is close to the average path length of the tree, and it is not possible to tell whether the point is an anomaly.
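The score computation can be sketched as follows, assuming the standard Isolation Forest formulation: the score is 2 raised to the negative ratio of the mean path length to c(ψ), and c(ψ) is the average path length of an unsuccessful binary search tree lookup over ψ points. The function names and the sample size 256 in the usage note are illustrative assumptions, not values from the source.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, used to approximate H(i)


def average_path_length(psi):
    """c(psi): expected path length for a tree built from psi sample points."""
    if psi > 2:
        harmonic = math.log(psi - 1) + EULER_GAMMA  # H(psi - 1) approximation
        return 2.0 * harmonic - 2.0 * (psi - 1) / psi
    if psi == 2:
        return 1.0
    return 0.0


def anomaly_score(mean_path_length, psi):
    """Score in (0, 1]; values near 1 suggest an anomaly, near 0.5 are inconclusive."""
    return 2.0 ** (-mean_path_length / average_path_length(psi))
```

With a subsample of 256 points, a point isolated after a mean path of 3 splits scores well above 0.5, while a point whose mean path equals `average_path_length(256)` scores exactly 0.5.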
Step 3.3: based on the detection result, abnormal points are interpolated by mean, mode, or linear filling, so that after filling every day has a complete set of timestamps with 96 data points. Abnormal values spanning a long period are deleted, and the deletion covers the whole day's data. This yields the new energy component training set.
Step 4: obtain the new energy combined feature data set from the new energy output data set after exception handling, specifically:
Step 4.1: generate feature columns on the new energy component training set produced in step 3. The feature columns include historical output load features built from the output load data of a preceding period, historical weather features built from the weather data of a preceding period, and time features such as year, month, day, week, and hour.
Step 4.2: for each feature column of the data set obtained in step 4.1, calculate the Pearson correlation coefficient between the feature and the output load using the formula:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

where $\bar{X}$ is the mean of the data feature column, $\sigma_X$ is the standard deviation of the data feature column, $\bar{Y}$ is the mean of the output load column, and $\sigma_Y$ is the standard deviation of the output load column; and calculate the Spearman correlation coefficient using the formula:

$$\rho_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$$

where $d_i$ is the difference between the rank of the $i$-th sample point of feature $X$ and the rank of the $i$-th sample point of the output load $Y$, and $n$ is the number of sample points.
Feature columns with a Pearson correlation coefficient below 0.2 and feature columns with a Spearman correlation coefficient below 0.1 are removed from the data set. The main features obtained are as follows:
TABLE 1 Output load prediction feature set
Step 4.3: apply a one-hot transformation to the time features and a trigonometric transformation to the wind direction features among the features screened in step 4.2, then add combined features on that basis. The combined features include the sum, difference, product, and quotient of any two feature columns, plus the difference between each feature and its mean and the difference between each feature and its median. This yields the complete data feature set and the complete data set.
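The three transformations in step 4.3 can be sketched as below. The sine and cosine encoding of wind direction is one common reading of the trigonometric transformation and is an assumption, as is returning 0 for a quotient with a near-zero denominator; all names are hypothetical.

```python
import math


def one_hot(value, categories):
    """Encode a categorical time feature (e.g. hour of day) as a 0/1 vector."""
    return [1 if value == c else 0 for c in categories]


def wind_direction_features(degrees):
    """Map a wind direction in degrees to (sin, cos) so 0 and 360 coincide."""
    rad = math.radians(degrees)
    return math.sin(rad), math.cos(rad)


def pairwise_combinations(row, eps=1e-9):
    """Sum, difference, product, and quotient for every pair of named features."""
    out = {}
    names = list(row)
    for i, a in enumerate(names):
        for b in names[i + 1 :]:
            out[f"{a}+{b}"] = row[a] + row[b]
            out[f"{a}-{b}"] = row[a] - row[b]
            out[f"{a}*{b}"] = row[a] * row[b]
            # assumption: fall back to 0.0 when the denominator is near zero
            out[f"{a}/{b}"] = row[a] / row[b] if abs(row[b]) > eps else 0.0
    return out
```

Encoding direction on the unit circle avoids the artificial jump between 359 degrees and 0 degrees that a raw angle would introduce.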
Step 5 specifically comprises the following steps:
Step 5.1: shuffle the data set by day, keeping the data within each day in order.
Step 5.2: input the shuffled data set into an XGBoost model to calculate the feature importance of each feature, select the features whose feature importance is greater than 0.05, and extract the corresponding feature data.
Step 5.3: add the remaining features, whose feature importance is less than or equal to 0.05, one by one to the data set extracted in step 5.2; the influence of the added features on the model result is shown in fig. 5. The data set is divided equally into ten parts, and the model with the best average result over ten-fold cross validation is selected; its features form the complete model feature set. The feature importance ranking within this feature set is shown in fig. 4.
Step 6 specifically comprises the following steps:
Step 6.1: extract the corresponding data according to the complete model feature set obtained in step 5, and shuffle the data set by day, keeping the data within each day in order.
Step 6.2: split the data set by ten-fold cross validation and input it into the training of an LSTM model, a KNN model, an XGBoost model, a LightGBM model, an ET model, and so on.
Step 6.3: tune the parameters of each model and train it, use the model to predict the validation set of each cross validation fold, calculate the MSE and MAE on the validation set, and take the average of the MSE and the MAE as the model evaluation value. This finally yields the 4 best-performing models: LSTM, XGBoost, LightGBM, and KNN.
Step 7 specifically includes the following steps:
Step 7.1: Extract the corresponding data according to the model feature complete set obtained in step 5, and divide the features into basic features and additional features according to feature importance, where the basic features are the top 15 features in the feature importance ranking and the additional features are the remaining features. To increase the diversity of the models, the additional features are shuffled and divided equally into 4 parts, and each part is concatenated with the basic features to obtain the feature sets of the 4 models.
Step 7.2: According to the model feature sets obtained in step 7.1 and the data feature complete set obtained in step 4.3, extract the 4 training data sets corresponding to the 4 models. Apply dimension transformation to the 4 model training sets and input them into the 4 models using ten-fold cross validation.
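The construction of the per-model feature sets in step 7.1 can be sketched as follows. The function name, the fixed seed and the round-robin way of dealing out the shuffled additional features are assumptions made for illustration; the description only states that the additional features are shuffled and divided equally.

```python
import random
from typing import List, Sequence


def build_feature_sets(
    ranked_features: Sequence[str],
    n_base: int = 15,
    n_models: int = 4,
    seed: int = 0,
) -> List[List[str]]:
    """Return one feature list per model: shared basic features plus a private part.

    ranked_features must be sorted by feature importance, most important first.
    """
    base = list(ranked_features[:n_base])        # top-ranked features, shared by all models
    extra = list(ranked_features[n_base:])       # remaining features
    random.Random(seed).shuffle(extra)           # shuffle to diversify the models
    parts = [extra[i::n_models] for i in range(n_models)]  # near-equal disjoint parts
    return [base + part for part in parts]
```

Each returned list can then be used to select the columns of the training data for the corresponding model.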
The training of the models in step 7.2 specifically comprises the following steps.
The XGBoost model is trained as follows:
Take the training data set corresponding to the XGBoost model and, using ten-fold cross validation, repeat the following steps for training:
Step (1): Establish a regression tree model.
Step (2): Add subtrees through feature splitting. Each time a regression tree is added, the current tree is added to the existing model and fits the residual of the previous prediction, that is, the difference between the actual value and the predicted value. The splitting rule is to calculate the gain of each candidate split and select the split with the maximum gain.
Step (3): Continue splitting on features until leaf nodes are reached, and sum the leaf-node results of each tree to obtain the predicted value of the sample.
Step (4): Traverse all feature split points with a greedy algorithm, using the objective function value as the evaluation function, and minimize the objective function to obtain the XGBoost model.
Step (5): Tune the XGBoost parameters, which include: the maximum number of trees, the learning rate, the minimum loss reduction required for a node split, the random sampling ratio, the maximum tree depth, the L1 regularization term on the weights, and the L2 regularization term on the weights. The parameters are adjusted repeatedly to minimize the objective function, and the final XGBoost model is obtained as a fusion model.
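The greedy split search in steps (2) and (4) can be illustrated with a small sketch. The description does not give the gain formula; the code below assumes the standard second-order gain used by XGBoost, a squared-error loss (so each sample has gradient `pred - y` and Hessian 1), and a single feature. `lam` stands for the L2 regularization term and `gamma` for the minimum loss reduction; the function names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def split_gain(g_l: float, h_l: float, g_r: float, h_r: float,
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Second-order split gain from left/right gradient and Hessian sums."""
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    return 0.5 * (score(g_l, h_l) + score(g_r, h_r)
                  - score(g_l + g_r, h_l + h_r)) - gamma


def leaf_weight(g: float, h: float, lam: float = 1.0) -> float:
    """Optimal leaf value for the given gradient and Hessian sums."""
    return -g / (h + lam)


def best_split(x: Sequence[float], y: Sequence[float], pred: Sequence[float],
               lam: float = 1.0, gamma: float = 0.0) -> Tuple[Optional[float], float]:
    """Scan all thresholds of one feature and return (threshold, gain)."""
    grads = [p - t for p, t in zip(pred, y)]   # gradient of 0.5 * (y - pred)^2
    order = sorted(range(len(x)), key=lambda i: x[i])
    g_total, h_total = sum(grads), float(len(x))
    g_left = h_left = 0.0
    best_threshold, best_gain = None, 0.0      # only splits with positive gain count
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += 1.0
        if x[i] == x[nxt]:                     # cannot split between equal values
            continue
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left,
                          lam, gamma)
        if gain > best_gain:
            best_threshold, best_gain = (x[i] + x[nxt]) / 2, gain
    return best_threshold, best_gain
```

A full booster would repeat this search over every feature and every leaf, add the resulting tree scaled by the learning rate, and recompute the gradients before building the next tree.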
The LightGBM training steps follow the XGBoost training method, with the following differences:
When searching for a split point, continuous floating-point feature values are discretized into k integers and a histogram with k bins is constructed. While traversing the data, statistics are accumulated in the histogram using the discretized value as the index. After one pass over the data the histogram holds the required statistics, and the optimal split point is then found by traversing the discretized values of the histogram and calculating the split gain.
Splitting uses a leaf-wise strategy: at each step, the leaf with the largest split gain among all current leaves is found and split. Because leaf-wise growth can produce deep trees, LightGBM adds a maximum depth limit on top of the leaf-wise strategy, preventing overfitting while maintaining high efficiency.
When computing histograms, the histogram of the leaf node with less data is computed first, and the histogram of its sibling leaf is obtained by subtracting it from the parent's histogram, so the sibling's histogram is obtained at very little cost.
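The histogram construction and the histogram subtraction trick described above can be sketched as follows. Equal-width binning is an assumption made to keep the example short (the library's own binning strategy differs), and the histogram stores only a gradient sum and a sample count per bin; the function names are illustrative.

```python
from typing import List, Sequence


def discretize(values: Sequence[float], n_bins: int) -> List[int]:
    """Map continuous values to integer bin indices in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # avoid a zero width for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def build_histogram(bin_ids: Sequence[int], grads: Sequence[float],
                    n_bins: int) -> List[List[float]]:
    """Accumulate [gradient_sum, sample_count] per bin in one pass over the data."""
    hist = [[0.0, 0] for _ in range(n_bins)]
    for b, g in zip(bin_ids, grads):
        hist[b][0] += g
        hist[b][1] += 1
    return hist


def subtract_histogram(parent: List[List[float]],
                       child: List[List[float]]) -> List[List[float]]:
    """Sibling histogram = parent histogram - child histogram, bin by bin."""
    return [[pg - cg, pc - cc] for (pg, pc), (cg, cc) in zip(parent, child)]
```

Only the smaller child needs a pass over its samples; the larger sibling is derived in time proportional to the number of bins.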
The LSTM training steps are as follows:
(1) Take the training data set corresponding to the LSTM model and determine the input and output dimensions of the model. For example, in the present application one day of data is predicted, so the input dimension is (96, num_features) and the output dimension is (96, 1), where num_features denotes the number of features. Using ten-fold cross validation, repeat the following steps for training:
(2) Select the mean square error (MSE) as the loss function and design the LSTM model structure. The loss is calculated as
$$MSE = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$
where m denotes the number of samples, $y_i$ denotes the actual output load value of the i-th sample, and $\hat{y}_i$ denotes the predicted output load value of the i-th sample.
(3) Feed the data set into the LSTM model for the forward propagation calculation.
(4) Perform the back propagation calculation of the model.
(5) Minimize the MSE using the Adam optimization algorithm and add dropout as regularization to obtain the LSTM fusion model.
The structure of the LSTM model is shown in fig. 3.
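The dimension transformation for the LSTM input can be sketched as follows, assuming that each sample is one whole day of 96 time points and that a day's feature rows are paired with the same day's load values. The function names are illustrative, and the network itself (fig. 3) is not reproduced here.

```python
from typing import List, Sequence, Tuple


def to_daily_samples(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    steps_per_day: int = 96,
) -> Tuple[List[List[List[float]]], List[List[List[float]]]]:
    """Reshape flat time series into samples of shape (96, num_features) and (96, 1)."""
    n_days = len(features) // steps_per_day      # an incomplete trailing day is dropped
    X, Y = [], []
    for day in range(n_days):
        start = day * steps_per_day
        X.append([list(row) for row in features[start:start + steps_per_day]])
        Y.append([[t] for t in targets[start:start + steps_per_day]])
    return X, Y


def mse_loss(y_true: List[List[List[float]]], y_pred: List[List[List[float]]]) -> float:
    """Mean square error over all days and time points."""
    flat_true = [v for day in y_true for step in day for v in step]
    flat_pred = [v for day in y_pred for step in day for v in step]
    return sum((t - p) ** 2 for t, p in zip(flat_true, flat_pred)) / len(flat_true)
```

The nested lists correspond to arrays of shape (num_days, 96, num_features) and (num_days, 96, 1), which is the layout a recurrent layer typically expects.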
The KNN model training steps are as follows:
Take the training data set corresponding to the KNN model and, using ten-fold cross validation, repeat the following steps for training:
(1) In ten-fold cross validation, nine folds of data are used for training and one fold for validation; the distance between each validation sample and each training sample is calculated (the Euclidean distance is used in the present application).
(2) Gradually increase the value of k and keep the value for which the model performs best on the validation set, thereby obtaining the KNN fusion model.
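The two KNN steps can be sketched as follows for a single training/validation split. The prediction is assumed to be the unweighted mean of the k nearest targets and the validation metric is assumed to be the MSE; the function names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], k: int) -> float:
    """Predict by averaging the targets of the k nearest training samples."""
    by_distance = sorted((math.dist(row, query), target)   # Euclidean distance
                         for row, target in zip(train_X, train_y))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)


def choose_k(train_X: Sequence[Sequence[float]], train_y: Sequence[float],
             val_X: Sequence[Sequence[float]], val_y: Sequence[float],
             k_values: Sequence[int]) -> Tuple[Optional[int], float]:
    """Try increasing k values and keep the one with the lowest validation MSE."""
    best_k, best_mse = None, float("inf")
    for k in k_values:
        preds: List[float] = [knn_predict(train_X, train_y, q, k) for q in val_X]
        err = sum((p - t) ** 2 for p, t in zip(preds, val_y)) / len(val_y)
        if err < best_mse:
            best_k, best_mse = k, err
    return best_k, best_mse
```

In the ten-fold setting, `choose_k` would be called once per fold and the k with the best average validation error kept.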
Step 7.3: Concatenate the ten-fold cross-validation predictions of each model on the validation sets to obtain each model's predicted values over the whole training set. This gives four columns in total: the LightGBM model predicted value, the XGBoost model predicted value, the LSTM model predicted value and the KNN model predicted value.
Step 7.4: Input these 4 columns as features into a multiple linear regression neural network model and, using 10-fold cross validation, finally obtain 10 linear regression models.
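The concatenation of validation-fold predictions in step 7.3 can be sketched as follows. The two base learners (`fit_mean`, `fit_line`) are deliberately trivial, hypothetical stand-ins for the four models of the application, and contiguous folds are assumed. Each `fit` function returns a callable that predicts one sample.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def out_of_fold_predictions(
    X: List[List[float]], y: List[float],
    fit: Callable[[List[List[float]], List[float]], Model], k: int = 10,
) -> Tuple[List[float], List[Model]]:
    """Predict every sample with the fold model that did not see it in training."""
    n = len(X)
    oof, models = [0.0] * n, []
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k   # validation slice of this fold
        model = fit(X[:lo] + X[hi:], y[:lo] + y[hi:])
        models.append(model)
        for i in range(lo, hi):
            oof[i] = model(X[i])
    return oof, models


def fit_mean(train_X: List[List[float]], train_y: List[float]) -> Model:
    mean = sum(train_y) / len(train_y)
    return lambda row: mean


def fit_line(train_X: List[List[float]], train_y: List[float]) -> Model:
    xs = [row[0] for row in train_X]
    mx, my = sum(xs) / len(xs), sum(train_y) / len(train_y)
    slope = (sum((x - mx) * (t - my) for x, t in zip(xs, train_y))
             / sum((x - mx) ** 2 for x in xs))
    return lambda row: my + slope * (row[0] - mx)


def stack_columns(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn one prediction column per model into one meta-feature row per sample."""
    return [list(row) for row in zip(*columns)]
```

The rows returned by `stack_columns` play the role of the training features of the second-level linear regression model in step 7.4, and the list of fold models is kept for prediction on the test set.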
Step 8 specifically includes the following steps:
Step 8.1: Using the test set and the feature columns corresponding to each model in step 5, extract the corresponding features and data sets, and input the data into each cross-validated model obtained in step 7.3 (in the present application, 4 model types with 10-fold cross validation give a total of 40 models). This yields the predictions of all cross-validation models of each type: each of the 4 model types has ten models trained on different training sets, and each of them makes a prediction, giving 40 groups of predicted values in total.
Step 8.2: Average the prediction results by model type to obtain the prediction input features of the neural network, input them into the 10 linear regression models obtained in step 7.4 for prediction, and average their outputs to obtain the prediction of the fusion model on the test set. Partial results are shown in fig. 6.
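The two averaging stages of step 8.2 can be sketched as follows. Each second-level linear model is represented here as a hypothetical `(weights, bias)` pair, and `model_order` fixes which base model feeds which weight; both are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def average_columns(pred_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average several prediction vectors element by element."""
    return [sum(values) / len(values) for values in zip(*pred_lists)]


def fused_prediction(
    fold_preds_by_model: Dict[str, List[List[float]]],
    meta_models: Sequence[Tuple[Sequence[float], float]],
    model_order: Sequence[str],
) -> List[float]:
    """Average fold predictions per model type, apply each linear model, average again."""
    # Stage 1: one averaged column per base model type (4 columns in the application).
    columns = [average_columns(fold_preds_by_model[name]) for name in model_order]
    rows = list(zip(*columns))
    # Stage 2: every second-level linear model predicts from the averaged columns.
    meta_preds = [
        [bias + sum(w * v for w, v in zip(weights, row)) for row in rows]
        for weights, bias in meta_models
    ]
    # Final result: mean over the second-level models (10 in the application).
    return average_columns(meta_preds)
```

With 4 base model types and 10 folds each, `fold_preds_by_model` would hold the 40 groups of predicted values described in step 8.1.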
Preferably, step 9 specifically includes the following steps:
Step 9.1: Using the non-new energy output load data set obtained in step 2 and the cleaned weather data set obtained in step 1, obtain a training set and a test set as in step 5, and input them together with the corresponding time data set into a linear regression neural network model for training. Tune the parameters by cross validation to obtain the non-new energy output load prediction model.
Step 9.2: Input the corresponding test set obtained in step 9.1 into the non-new energy output load prediction model for prediction; partial prediction results are shown in fig. 7. Add this prediction to the new energy load prediction obtained in step 8.2 to obtain the final load prediction; partial results are shown in fig. 8. The accuracy obtained without decomposition is 95.3%, and the accuracy obtained with decomposition is 97.1%.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
Claims (9)
1. A method of load prediction, comprising:
acquiring a power generation system output sample data set;
decomposing the output sample data set into a new energy output load data set and a non-new energy output load data set;
carrying out exception handling on the new energy output load data set;
acquiring a new energy combined characteristic data set according to the new energy output load data set subjected to exception processing;
obtaining an output load training data set according to the new energy combination characteristic data set;
obtaining a fusion model alternative set according to the output load training data set;
optimizing the fusion model in the alternative set of the fusion model according to the output load training data set;
predicting the new energy output load by adopting the optimized fusion model;
and predicting the non-new energy output load according to the non-new energy output data set and the pre-trained non-new energy output prediction model, and adding the prediction result and the new energy output result to obtain the final predicted power generation system output load.
2. The method of claim 1, wherein acquiring the power generation system output sample data set comprises: acquiring a power generation output load data set and a weather data set of a target regional system; filling abnormal points with missing data or abnormal values in the power generation output load data set of the regional system using a mean value method, spline interpolation or linear interpolation; and deleting abnormal data segments in the power generation output load data set of the regional system by comparing the historical actual weather in the weather data set with its correlation to the output load.
3. The method of claim 2, wherein decomposing the set of output sample data into the new energy output load data set and the non-new energy output load data set comprises:
and decomposing the output load sample data set of the power generation system into a trend component, a periodic component and a new energy component by adopting an STL time sequence decomposition method, wherein the sum of the trend component and the periodic component is non-new energy output load data, and the new energy component is new energy output load data.
4. The method of claim 2, wherein carrying out exception handling on the new energy output load data set comprises:
calculating the Pearson correlation coefficient between each feature and the output load according to the new energy output load data set and the weather data set, and screening out the features whose correlation with the output load is higher than a threshold to form a feature set;
inputting the feature set and the new energy output load data into an Isolation Forest anomaly detection model to obtain non-significant abnormal points;
and interpolating the abnormal points among the non-significant abnormal points using linear interpolation, mean interpolation or mode interpolation, and deleting the abnormal data segments among them.
5. The method of claim 4, wherein the Pearson correlation coefficient between each feature and the output load is calculated using equation (1):
$$\rho_{X,Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sigma_X\,\sigma_Y} \quad (1)$$
wherein X represents all the feature sample variables, Y represents all the output load sample variables, $X_i$ is the value of the current feature sample variable at the i-th time point, $Y_i$ is the value of the output load at the i-th time point, $\bar{X}$ is the mean of the current feature sample variable, $\sigma_X$ is the standard deviation of the current feature sample variable, $\bar{Y}$ is the mean of the output load sample variable, $\sigma_Y$ is the standard deviation of the output load sample variable, and n is the number of time points.
6. The method of claim 1, wherein obtaining the new energy combined feature data set from the new energy output load data set after exception handling comprises:
generating feature columns according to prior knowledge and the weather features;
calculating the Pearson correlation coefficient and the Spearman correlation coefficient between each feature in the feature columns and the output load according to the new energy output load data set after exception handling, and screening out the features whose correlation with the output load is higher than a threshold;
and applying an independent transformation to the time features among the screened features, applying a trigonometric transformation to the wind direction features, and adding combined features on this basis to obtain the new energy combined feature data set.
7. The method of claim 1, wherein obtaining the fusion model alternative set according to the output load training data set comprises:
dividing the output load training data set, inputting the divided data into an LSTM model, a KNN model, an XGBoost model and a LightGBM model respectively, evaluating the model performance, and selecting the group with the best evaluation value as the model alternative set.
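8. The method of claim 7, wherein the model performance is evaluated using the MSE and the MAE;
wherein the MSE is given by formula (2):
$$MSE = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 \quad (2)$$
and the MAE is given by formula (3):
$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \quad (3)$$
where m is the number of samples, $y_i$ is the actual output load value of the i-th sample, and $\hat{y}_i$ is the predicted output load value of the i-th sample.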
9. The method of claim 7, wherein optimizing the fusion model in the fusion model alternative set according to the new energy combined feature data set comprises:
dividing the features in the new energy combined feature data set into basic features and additional features, randomly shuffling the additional features and distributing them evenly onto the basic features to form a different feature set for each model, and extracting the corresponding feature column data according to the feature set of each model to form the model training sets;
inputting the model training sets into the respective models, and concatenating the prediction results of each model on the cross-validation sets to form a neural network model training set;
and inputting the obtained neural network model training set into the neural network model, and adjusting the parameters of the neural network model so that the cross-validation result is optimal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111486036.1A CN114169434A (en) | 2021-12-07 | 2021-12-07 | Load prediction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114169434A true CN114169434A (en) | 2022-03-11 |
Family
ID=80483964
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111486036.1A Pending CN114169434A (en) | 2021-12-07 | 2021-12-07 | Load prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114169434A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114819380A (en) * | 2022-05-12 | 2022-07-29 | 福州大学 | Power grid bus load prediction method based on model fusion |
CN115130741A (en) * | 2022-06-20 | 2022-09-30 | 北京工业大学 | Multi-model fusion based multi-factor power demand medium and short term prediction method |
CN116227741A (en) * | 2023-05-05 | 2023-06-06 | 深圳市万物云科技有限公司 | Water chilling unit energy saving method and device based on self-adaptive algorithm and related medium |
CN116436002A (en) * | 2023-06-13 | 2023-07-14 | 成都航空职业技术学院 | Building electricity utilization prediction method |
CN116436002B (en) * | 2023-06-13 | 2023-09-05 | 成都航空职业技术学院 | Building electricity utilization prediction method |
CN117477581A (en) * | 2023-12-26 | 2024-01-30 | 佛山市达衍数据科技有限公司 | Power system load balancing control method and power system |
CN117477581B (en) * | 2023-12-26 | 2024-03-26 | 佛山市达衍数据科技有限公司 | Power system load balancing control method and power system |
CN118364364A (en) * | 2024-06-19 | 2024-07-19 | 南京信息工程大学 | Photovoltaic power generation prediction method and system based on complex neural network |
CN118364364B (en) * | 2024-06-19 | 2024-08-27 | 南京信息工程大学 | Photovoltaic power generation prediction method and system based on complex neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |