CN113256036A - Power supply cost analysis and prediction method based on Prophet-LSTNet combined model - Google Patents

Publication number
CN113256036A
Authority
CN
China
Prior art keywords
data
power supply
supply cost
model
lstnet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110791617.XA
Other languages
Chinese (zh)
Other versions
CN113256036B (en)
Inventor
王海庆
蓝飞
姚日权
孙泉辉
程嵩
金绍君
费英群
方利锋
罗哲珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Zhejiang Electric Power Co Ltd
Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Zhejiang Electric Power Co Ltd
Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Zhejiang Electric Power Co Ltd and Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202110791617.XA (patent CN113256036B)
Publication of CN113256036A
Application granted
Publication of CN113256036B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention discloses a power supply cost analysis and prediction method based on a Prophet-LSTNet combined model, which overcomes defects of the prior art and comprises the following steps: step 1, acquiring historical daily power supply cost data over a period of time and inputting the data into a Prophet model; step 2, decomposing the power supply cost data into nonlinear trend component data, seasonal component data and holiday component data by means of the Prophet model; step 3, performing feature engineering to obtain multidimensional features of the power supply cost data; step 4, inputting the multidimensional features of the power supply cost data into an LSTNet model as parameters, inputting the nonlinear trend component data into the LSTNet model, training the LSTNet model to find the dependency relationship between the multidimensional features and the nonlinear trend component data, and alternately updating the LSTNet model and the weight parameters of the nonlinear trend component data during training; and step 5, predicting the power supply cost to obtain a prediction result.

Description

Power supply cost analysis and prediction method based on Prophet-LSTNet combined model
Technical Field
The invention relates to the technical field of power distribution networks, in particular to a power supply cost analysis and prediction method based on a Prophet-LSTNet combined model.
Background
Power grid power supply cost prediction means estimating the cost that will be incurred over a specific future period by analyzing previous operation and maintenance costs, including labor cost, overhaul operation and maintenance cost, marketing operation and maintenance cost, other operating expenses and the like. Accurate prediction is very important for the national power grid's overall grasp of cost usage, capital deployment, investment and construction. At present there is little related research. The main reasons are that the power grid is large in scale, widely distributed and complex in its operating conditions, so that the data of power supply stations fluctuates greatly, and that the data is short-term and non-continuous, which makes prediction very difficult.
The data mining tools currently available for prediction are mainly traditional statistical analysis methods, such as time series analysis, linear and nonlinear regression models, gray system models and maximum entropy Markov models. Time series analysis emphasizes the role of the time factor in prediction and is therefore widely applied to economic prediction.
Among time series analysis methods, the ARIMA model proposed by Box and Jenkins is the most commonly used, but traditional statistical models cannot predict cost changes well, for two reasons. On the one hand, when the data volume is large, an effective input data structure must be constructed to characterize the corresponding cost amounts; traditional methods, however, select only a small number of data dimensions or ignore the overall correlation among the existing data, so information is lost and further modeling is limited a priori. On the other hand, even given all the data dimensions that can be obtained, traditional methods cannot effectively extract the high-dimensional, interacting features that are useful for prediction. For example, a time series model emphasizes the time sequence alone and does not consider external factors, so predicting cost values with large variations using the ARIMA model produces large deviations. These shortcomings make statistical prediction inaccurate and limit the practical application of traditional methods.
To predict cost well from historical data (big data), an effective input data structure must be constructed to characterize the corresponding cost amounts, which creates a need for big data processing and mining. A representative approach is the neural network. Over the past decade, neural network methods based on deep learning have been widely applied to data processing, and the LSTM (long short-term memory) neural network model proposed by Hochreiter and Schmidhuber in 1997 has been widely applied to stock prediction, handwriting recognition, speech recognition, power prediction and the like.
In practice, however, LSTM cannot capture very long-term sequence relationships, so researchers designed the LSTNet model to address this problem. LSTNet comprises a convolutional component, a recurrent neural network component, a recurrent-skip neural network component and an autoregressive component, and can capture multi-scale periodic regularities in data. LSTNet, however, can only mine sequence features at different time intervals and cannot smooth out sequence noise and the like.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a power supply cost analysis and prediction method based on a Prophet-LSTNet combined model.
The purpose of the invention is realized by the following technical scheme:
a power supply cost analysis and prediction method based on a Prophet-LSTNet combined model comprises the following steps:
step 1, acquiring historical daily power supply cost data over a period of time, cleaning the power supply cost data, and inputting the cleaned power supply cost data into a Prophet model;
step 2, decomposing the power supply cost data into nonlinear trend component data, seasonal component data and holiday component data by means of the Prophet model;
step 3, performing feature engineering, mining and analyzing information related to the power supply cost to obtain multidimensional features of the power supply cost data;
step 4, inputting the multidimensional features of the power supply cost data into an LSTNet model as parameters, inputting the nonlinear trend component data into the LSTNet model, training the LSTNet model to find the dependency relationship between the multidimensional features and the nonlinear trend component data, and alternately updating the LSTNet model and the weight parameters of the nonlinear trend component data during training;
step 5, predicting the power supply cost by combining the LSTNet model, the seasonal component data and the holiday component data to obtain a prediction result.
The Prophet model is an algorithmic model for time series prediction. Its principle is to decompose data into a nonlinear trend component, seasonal (daily) components and a holiday component in order to predict the sequence. Compared with other models, the Prophet model is more robust to missing data, abnormal data and trend changes, and performs better on sequences with multiple seasonalities and pronounced holiday effects. It is flexible enough for a wide range of business time series and can be configured by non-experts who know little about the data generation process or time series models.
The Long- and Short-term Time-series Network (LSTNet) is a deep learning network designed specifically for time series prediction. It uses a convolutional layer (CNN), a recurrent neural network (LSTM), a skip recurrent neural network (Skip-LSTM) and an autoregressive mechanism. LSTNet uses the convolutional layer to discover local dependency patterns among the multidimensional input variables and the recurrent layer to capture complex long-term dependencies. It captures very long-term dependency patterns through a novel recurrent structure (the recurrent skip), which exploits the periodicity of the input time series signal to simplify optimization. Finally, LSTNet adds a traditional autoregressive linear model in parallel with the nonlinear neural network part, which makes the nonlinear deep learning model more robust for time series whose scale changes.
In this scheme, the power supply cost data is first decomposed as a time series by the Prophet model. The seasonal component data and holiday component data are stable, periodically varying data, whereas the nonlinear trend component data needs further analysis. The dependency relationship between the multidimensional features of the power supply cost data and the nonlinear trend component data is therefore analyzed by the LSTNet model. The multidimensional features may span several dimensions, such as power generation distribution, power supply industry distribution, the generated power of each generation type, and date, and each dimension may be divided into several sub-dimensions. For example, the power generation distribution may be subdivided by region and time period, and the date may be subdivided into day-of-week, solar calendar month, lunar calendar month and similar attributes. The LSTNet model finds the best-fitting dependency relationship, so that during prediction the nonlinear trend component data can be predicted from the multidimensional features, and the power supply cost can be predicted accurately by combining this result with the seasonal component data and the holiday component data.
Preferably, the data cleaning includes cleaning of abnormal value data. The sign of each data point is checked first: a negative value is judged abnormal and deleted. For positive values, outliers are detected and handled by quantiles: the first quartile Q1 and the third quartile Q3 are calculated, and outliers are the data points lying outside the quartile-based range.
[Equation images not reproduced in this text: the outlier range and the upper and lower outlier limits, defined from Q1 and Q3.]
Data exceeding the upper outlier limit is replaced with the upper limit, and data below the lower outlier limit is replaced with the lower limit.
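The outlier rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent's own limit formulas are not available in this text, so the sketch assumes the common Tukey convention (limits at Q1 - 1.5·IQR and Q3 + 1.5·IQR, with IQR = Q3 - Q1). The multiplier `k`, the function name `clean_outliers`, the treatment of zero as valid, and the sample values are all hypothetical.

```python
import statistics

def clean_outliers(values, k=1.5):
    """Drop negative values, then clip the rest to [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumed multiplier (Tukey's convention), not taken from the patent.
    """
    # Negative costs are judged abnormal and deleted (zero is kept, by assumption).
    kept = [v for v in values if v >= 0]
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(kept, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Out-of-range points are replaced by the nearest limit rather than removed.
    return [min(max(v, lower), upper) for v in kept]

# Hypothetical daily costs containing one negative value and one spike.
daily_cost = [100.0, 102.0, 98.0, -5.0, 101.0, 99.0, 500.0, 103.0, 97.0]
cleaned = clean_outliers(daily_cost)
print(cleaned)
```

Replacing extreme points with the limit instead of deleting them keeps the daily series gap-free, which matters because the later steps expect one value per day.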
Preferably, the data cleaning in step 1 includes missing value data cleaning. The power supply cost data of several days adjacent to a missing value is found first, and a fitting curve is constructed from it. The fitted value at the date of the missing value is then read off the curve and used as the power supply cost data for that date, filling the missing value in the historical daily power supply cost data.
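As a rough illustration of this missing-value step, the sketch below fits a least-squares straight line to the known neighbours of each gap and evaluates it at the missing day. The patent does not say what kind of fitting curve is used, so the straight line, the `window` of three days on each side, and the names `fill_missing` and `costs` are assumptions made for the example.

```python
def fill_missing(series, window=3):
    """Fill None entries with a least-squares line fitted to nearby known days.

    The straight-line fit and the window size are illustrative assumptions.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Collect known neighbours within `window` days on either side.
        lo, hi = max(0, i - window), min(len(series), i + window + 1)
        pts = [(j, series[j]) for j in range(lo, hi) if series[j] is not None]
        if len(pts) < 2:
            continue  # too few neighbours to fit a line; leave the gap
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pts)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
        slope = sxy / sxx
        # Evaluate the fitted line at the index of the missing day.
        filled[i] = mean_y + slope * (i - mean_x)
    return filled

# Hypothetical series with one missing day.
costs = [10.0, 12.0, 14.0, None, 18.0, 20.0, 22.0]
filled_costs = fill_missing(costs)
print(filled_costs)
```

A higher-order polynomial or a spline could be swapped in for the line; the structure (collect neighbours, fit, evaluate at the gap) stays the same.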
Preferably, the supplemented missing value is also verified. The date information corresponding to the missing value and the data characteristics of that date are determined first. The power supply cost data of all other dates within the period that share the same data characteristics is then found and subjected to cluster analysis to determine a cluster center point, and the Euclidean distance between the supplemented missing value and the center point is calculated. If the Euclidean distance is less than a set threshold, the supplemented missing value is judged credible. If the Euclidean distance is greater than or equal to the set threshold, the supplemented missing value is judged not credible, and the number of adjacent days of power supply cost data used for the fit is re-determined, until the supplemented missing value is judged credible or the number of re-determinations exceeds a set number.
The dimensions of the cluster analysis are the sub-data of the power supply cost, including labor cost, overhaul operation and maintenance cost, marketing operation and maintenance cost and other operating expenses. Supplementing a missing value by a fitting curve alone may introduce errors: the date in question may be a special date, so the actual data may be far from the fitted data. The scheme therefore considers the data characteristics of the date, such as day of week, month and holidays, and performs cluster analysis on the power supply cost of dates with the same data characteristics, so that it can judge whether the value supplied by the fitting curve is similar to the power supply cost data of such dates. This further improves the reliability of the supplemented missing value data.
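The credibility check can be sketched as a distance test against the center of the peer dates. The patent does not name a clustering algorithm, so this example makes the simplest assumption: all dates with the same characteristics form a single cluster whose center is the component-wise mean. The function names, the threshold and the figures are hypothetical.

```python
import math

def centroid(rows):
    """Component-wise mean of the rows (center of a single assumed cluster)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def is_credible(candidate, peers, threshold):
    """True if the candidate lies within `threshold` of the peer center."""
    return math.dist(candidate, centroid(peers)) < threshold

# Columns: labor, overhaul O&M, marketing O&M, other operating expenses.
# Rows: hypothetical dates sharing the same characteristics (e.g. same weekday).
same_weekday_costs = [
    [40.0, 30.0, 20.0, 10.0],
    [42.0, 28.0, 21.0, 9.0],
    [38.0, 32.0, 19.0, 11.0],
]
imputed = [41.0, 30.0, 20.0, 10.0]
credible = is_credible(imputed, same_weekday_costs, threshold=5.0)
print(credible)
```

When the check fails, the text above calls for refitting with a different number of neighbouring days and testing again, up to a set number of attempts.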
Preferably, the Prophet model decomposes the power supply cost data into nonlinear trend component data by modeling with a logistic growth model of the following form:
$$g(t) = \frac{C}{1 + \exp(-k(t - m))}$$
wherein C represents the carrying capacity, k represents the growth rate, and m represents the midpoint of the curve;
for the seasonal component data and holiday component data, the Prophet model models the periodic effects with a Fourier series.
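To make the two building blocks concrete, the following sketch evaluates a logistic growth trend and generates Fourier-series terms for a periodic effect. It assumes the standard logistic form C / (1 + exp(-k(t - m))), which matches the stated roles of C, k and m; the seasonal period, the Fourier order and the function names are illustrative and not taken from the patent.

```python
import math

def logistic_trend(t, C, k, m):
    """Logistic growth: saturates at C, with growth rate k and midpoint m."""
    return C / (1.0 + math.exp(-k * (t - m)))

def fourier_features(t, period, order):
    """Return [sin, cos] pairs up to `order` for a seasonal term of the given period."""
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats

# At the midpoint t = m the trend is exactly half the carrying capacity.
half = logistic_trend(t=10.0, C=200.0, k=0.5, m=10.0)
# Weekly seasonality (assumed period of 7 days) with two harmonics.
week_terms = fourier_features(t=0.0, period=7.0, order=2)
print(half, week_terms)
```

A seasonal component is then a weighted sum of these sine and cosine terms, with the weights fitted to the data.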
Preferably, the LSTNet model specifically includes:
a one-dimensional convolution component: implemented with the Conv1D function; the convolutional layer extracts short-term features of the time series and captures short-term patterns among the multidimensional variables, that is, local dependencies in the time dimension;
a recurrent component: implemented with the LSTM function to capture the time dependence of the data;
a recurrent-skip component: a Lambda layer rearranges the input data dimensions so that data points separated by skipped short periods are linked; the result is fed into an LSTM layer to implement the Skip-LSTM process, capturing longer-term information and making full use of the periodicity of the sequence;
an autoregressive component: a Lambda layer links the data across skipped short periods to eliminate periodicity, and a Dense layer models the autoregressive process;
an attention mechanism component: implemented with a Dense layer with a Softmax activation function; the attention mechanism decides which dimensions play a key role in the nonlinear trend component, so that the dimensions are weighted according to their importance.
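Two of the ideas in this list can be shown without a deep learning framework. The sketch below is not an LSTNet implementation: it only illustrates, in plain Python, (a) which time steps a skip path would link when it looks back in whole periods, and (b) how a softmax turns raw scores into importance weights over feature dimensions. The window length, the period of 7 and the scores are assumed values.

```python
import math

def skip_indices(window_len, period):
    """Indices, oldest first, that lie a whole number of periods before the last step."""
    last = window_len - 1
    return list(range(last % period, window_len, period))

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# A 28-day window with an assumed weekly period links days 6, 13, 20 and 27.
linked = skip_indices(window_len=28, period=7)
# Hypothetical raw attention scores for three feature dimensions.
weights = softmax([2.0, 1.0, 0.1])
print(linked, weights)
```

In a Keras model the same index selection would typically be done inside a Lambda layer by reshaping the tensor, and the softmax would be the activation of a Dense layer, as the component list describes.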
Preferably, step 4 specifically comprises: inputting the multidimensional features of the power supply cost data into the LSTNet model as parameters, taking the nonlinear trend component data as the output of the LSTNet model, and training the LSTNet model. The nonlinear trend component data is divided into training data, verification data and test data in a 3:1:1 ratio along the time dimension. The LSTNet model uses the training data to find the dependency relationship between the multidimensional features and the nonlinear trend component data, uses the verification data to verify the accuracy of that relationship, and uses the test data to measure the error between the model output and the real data. If the error is smaller than a set threshold, the LSTNet model is judged to be trained successfully. If the error is larger than the set threshold, training is judged unsuccessful, and the method returns to step 3 to perform feature engineering again, mining and analyzing information related to the power supply cost to obtain multidimensional features of the power supply cost data.
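The 3:1:1 division along the time dimension can be sketched as a chronological split with no shuffling, so that verification and test data always come after the training data. The function name and the rounding rule (integer division) are assumptions for the example.

```python
def chronological_split(series, ratios=(3, 1, 1)):
    """Split a time-ordered sequence into train/verification/test parts without shuffling."""
    total = sum(ratios)
    n = len(series)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return series[:train_end], series[train_end:val_end], series[val_end:]

# Hypothetical trend component covering 100 days.
trend = list(range(100))
train, val, test = chronological_split(trend)
print(len(train), len(val), len(test))
```

The feature matrix would be cut at the same indices so that each trend value stays aligned with its features.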
Preferably, the method further comprises step 6, verifying the prediction result: the prediction result is compared with the actual power supply cost data. If the daily difference between the prediction result and the actual power supply cost data is smaller than a set error value on every day within a set date range, or the sum of all variances between the prediction result and the actual power supply cost data within the set date range is smaller than a set value, the prediction result is judged correct; otherwise it is judged wrong.
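The acceptance rule in step 6 is an either-or test, which the sketch below spells out. It interprets "sum of all variances" as the sum of squared daily differences; that reading, the function name and the thresholds are assumptions.

```python
def prediction_ok(predicted, actual, max_daily_error, max_squared_sum):
    """Accept if every daily gap is small OR the total squared error is small."""
    gaps = [abs(p - a) for p, a in zip(predicted, actual)]
    daily_ok = all(g < max_daily_error for g in gaps)
    squared_ok = sum(g * g for g in gaps) < max_squared_sum
    return daily_ok or squared_ok

# Hypothetical three-day comparison: every daily gap is 1.0.
good = prediction_ok([100.0, 102.0, 98.0], [101.0, 101.0, 99.0],
                     max_daily_error=2.0, max_squared_sum=1.0)
# One day is off by 10.0, so both criteria fail.
bad = prediction_ok([110.0, 101.0, 99.0], [100.0, 101.0, 99.0],
                    max_daily_error=2.0, max_squared_sum=50.0)
print(good, bad)
```

Because the two criteria are joined by "or", a forecast with one large miss can still pass if the aggregate threshold is set loosely, so the two thresholds need to be chosen together.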
Preferably, if the prediction result is judged wrong, an error correction sub-step is executed: one item of sub-data is removed from the power supply cost data and steps 1 to 5 are executed again. If the prediction result is then correct, that sub-data is judged abnormal and reported to the relevant staff. Each item of sub-data is removed in turn in this way until all abnormal sub-data has been found. The sub-data of the power supply cost includes labor cost, overhaul operation and maintenance cost, marketing operation and maintenance cost and other operating expenses. When a prediction result is wrong, this analysis identifies which sub-data is abnormal and caused the error. Sub-data may change because of special circumstances that neither this prediction method nor other prediction methods can foresee, but the scheme can find the abnormal sub-data, which can subsequently be re-entered for supplementary training or discarded, making the prediction method more accurate.
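The error correction sub-step is a leave-one-out search over the cost sub-data, sketched below. The real steps 1 to 5 are replaced by a toy stand-in (`toy_pipeline`) so the example runs on its own; that stand-in, the spike threshold of 1000 and all figures are hypothetical.

```python
def find_abnormal_components(components, run_pipeline):
    """Return the names of sub-costs whose removal makes the prediction pass."""
    abnormal = []
    for name in components:
        # Rebuild the input without this one sub-cost and rerun the pipeline.
        reduced = {k: v for k, v in components.items() if k != name}
        if run_pipeline(reduced):
            abnormal.append(name)
    return abnormal

def toy_pipeline(parts):
    """Stand-in for steps 1 to 5 plus verification: passes only if no series spikes."""
    return all(max(series) < 1000 for series in parts.values())

sub_costs = {
    "labor": [40, 41, 39],
    "overhaul": [30, 29, 31],
    "marketing": [20, 5000, 21],  # injected anomaly
    "other": [10, 11, 9],
}
suspects = find_abnormal_components(sub_costs, toy_pipeline)
print(suspects)
```

Each candidate requires a full rerun of the pipeline, so the cost of this search grows linearly with the number of sub-data items.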
The invention has the following beneficial effects: compared with simple prediction of the trend of the power supply station operation and maintenance cost curve, the power supply cost analysis and prediction method based on the Prophet-LSTNet combined model is more reasonable and more accurate; compared with traditional statistical models, the deep learning model requires less manual intervention, gives more robust results, is better suited to big data, and can automatically learn and extract valuable feature combinations, which helps to discover the regularities of the sales market.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a comparison of the predicted results of the present invention versus the daily predicted results of the comparison model;
FIG. 3 is a schematic diagram of a cyclic analysis method of the Prophet model;
FIG. 4 is a schematic diagram of the LSTNet network structure.
Detailed Description
The invention is further described below with reference to the figures and examples.
Example:
a power supply cost analysis and prediction method based on a Prophet-LSTNet combined model is shown in figure 1 and comprises the following steps:
step 1, acquiring historical daily power supply cost data over a period of time, cleaning the power supply cost data, and inputting the cleaned power supply cost data into a Prophet model;
step 2, decomposing the power supply cost data into nonlinear trend component data, seasonal component data and holiday component data by means of the Prophet model;
step 3, performing feature engineering, mining and analyzing information related to the power supply cost to obtain multidimensional features of the power supply cost data;
step 4, inputting the multidimensional features of the power supply cost data into an LSTNet model as parameters, inputting the nonlinear trend component data into the LSTNet model, training the LSTNet model to find the dependency relationship between the multidimensional features and the nonlinear trend component data, and alternately updating the LSTNet model and the weight parameters of the nonlinear trend component data during training;
step 5, predicting the power supply cost by combining the LSTNet model, the seasonal component data and the holiday component data to obtain a prediction result;
step 6, verifying the prediction result: the prediction result is compared with the actual power supply cost data; if the daily difference between the prediction result and the actual power supply cost data is smaller than a set error value on every day within a set date range, or the sum of all variances between the prediction result and the actual power supply cost data within the set date range is smaller than a set value, the prediction result is judged correct; otherwise it is judged wrong.
The Prophet model is an algorithmic model for time series prediction. Its principle is to decompose data into a nonlinear trend component, weekly (daily) seasonal components and a holiday component in order to predict the sequence. Compared with other models, the Prophet model is more robust to missing data, abnormal data and trend changes, and performs better on sequences with multiple seasonalities and pronounced holiday effects. It is flexible enough for a wide range of business time series and can be configured by non-experts who know little about the data generation process or time series models. FIG. 3 summarizes the "cycle analysis" method of business forecasting with this model. The time series is modeled first, and each parameter of the model has an intuitive human interpretation. Forecasts are then generated from the model, together with a set of reasonable baselines across a variety of simulated historical forecast dates, and the forecast performance is evaluated. When performance is poor or other aspects of the forecast call for manual intervention, the potential problems are flagged to an analyst in order. The analyst can then inspect the forecasting process and adjust the model according to this feedback.
The Long- and Short-term Time-series Network (LSTNet) is a deep learning network designed specifically for time series prediction. It uses a convolutional layer (CNN), a recurrent neural network (LSTM), a skip recurrent neural network (Skip-LSTM) and an autoregressive mechanism, and has the structure shown in FIG. 4. LSTNet uses the convolutional layer to discover local dependency patterns among the multidimensional input variables and the recurrent layer to capture complex long-term dependencies. It captures very long-term dependency patterns through a novel recurrent structure (the recurrent skip), which exploits the periodicity of the input time series signal to simplify optimization. Finally, LSTNet adds a traditional autoregressive linear model in parallel with the nonlinear neural network part, which makes the nonlinear deep learning model more robust for time series whose scale changes.
In this scheme, the power supply cost data is first decomposed as a time series by the Prophet model. The seasonal component data and holiday component data are stable, periodically varying data, whereas the nonlinear trend component data needs further analysis. The dependency relationship between the multidimensional features of the power supply cost data and the nonlinear trend component data is therefore analyzed by the LSTNet model. The multidimensional features may span several dimensions, such as power generation distribution, power supply industry distribution, the generated power of each generation type, and date, and each dimension may be divided into several sub-dimensions. For example, the power generation distribution may be subdivided by region and time period, and the date may be subdivided into day-of-week, solar calendar month, lunar calendar month and similar attributes. The LSTNet model finds the best-fitting dependency relationship, so that during prediction the nonlinear trend component data can be predicted from the multidimensional features. Because the seasonal component data and holiday component data obtained from the Prophet decomposition are stable and periodic, they can be obtained quickly from their periodic pattern. The power supply cost can therefore be predicted accurately by adding the nonlinear trend component predicted by the LSTNet model to the seasonal component data and the holiday component data.
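The final accumulation described above amounts to a day-by-day sum of three series. The sketch below assumes a purely additive combination, which is how the text describes it; the function name and the numbers are hypothetical.

```python
def combine_forecast(trend, seasonal, holiday):
    """Add the predicted trend to the seasonal and holiday components, day by day."""
    if not (len(trend) == len(seasonal) == len(holiday)):
        raise ValueError("components must cover the same days")
    return [t + s + h for t, s, h in zip(trend, seasonal, holiday)]

# Hypothetical three-day horizon: LSTNet trend, Prophet seasonal and holiday terms.
forecast = combine_forecast(
    trend=[100.0, 101.0, 102.0],
    seasonal=[5.0, -3.0, 0.0],
    holiday=[0.0, 0.0, 20.0],
)
print(forecast)
```

Only the trend needs a learned model at prediction time; the seasonal and holiday terms are looked up from the periodic pattern found during decomposition.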
The data cleaning comprises cleaning of abnormal value data, the method for cleaning the abnormal value data comprises the following steps of firstly judging the positive and negative conditions of the data, if the data is a negative value, judging that the data is abnormal and deleting, and if the data is a positive value, detecting and processing the abnormal value in the data through quantiles: the first quartile Q1 and the third quartile Q3 are calculated, and the outliers are data points that lie outside the range of the quartile, as follows:
Figure 769644DEST_PATH_IMAGE006
wherein
Figure 302256DEST_PATH_IMAGE007
Figure 907681DEST_PATH_IMAGE008
Data exceeding the upper limit of the abnormal value range is replaced with the upper limit, and data below the lower limit of the abnormal value range is replaced with the lower limit.
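The abnormal value rule can be sketched as follows. This is a minimal illustration rather than the patent's reference implementation: the function name `clean_outliers` is hypothetical, and the whisker multiplier `k = 1.5` is the conventional box-plot value, assumed here because the formula images are not reproduced in this text.

```python
import statistics


def clean_outliers(values, k=1.5):
    """Delete negative values, then clip the rest to [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is an assumed, conventional multiplier, not a value taken
    from the source.
    """
    # Negative cost values are judged abnormal and deleted.
    kept = [v for v in values if v >= 0]
    q1, _, q3 = statistics.quantiles(kept, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values beyond a limit are replaced with that limit.
    return [min(max(v, lower), upper) for v in kept]
```

Clipping to the limit, instead of deleting, keeps the daily series contiguous, which matters for the later time-series decomposition.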
The data cleaning in step 1 also includes cleaning of missing value data. The power supply cost data of several days adjacent to the missing value is first retrieved, and a fitting curve is constructed from that data. The fitted value on the curve at the date corresponding to the missing value is then taken as the power supply cost data for that date and is used to fill the missing value in the historical daily power supply cost data.
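A minimal sketch of the curve-fitting fill is given below. The source does not specify the curve family, so a least-squares straight line is assumed purely for illustration; `fill_missing` and its parameters are hypothetical names.

```python
def fill_missing(days, costs, missing_day):
    """Fit a least-squares line through neighbouring (day, cost) pairs and
    evaluate it at missing_day.

    A straight line is an assumed curve family; the source only says
    "fitting curve".
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, costs))
    slope = sxy / sxx
    # The fitted line passes through (mean_x, mean_y).
    return mean_y + slope * (missing_day - mean_x)
```

For example, with costs known on days 1, 2, 4 and 5, the call `fill_missing([1, 2, 4, 5], [10, 20, 40, 50], 3)` evaluates the fitted line at day 3.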
The supplemented missing value is then verified as follows. The date information corresponding to the missing value is determined, together with the data characteristics of that date. Power supply cost data of other dates within a period of time that share the same data characteristics is then retrieved, cluster analysis is performed on that data, and the cluster center point is determined. The Euclidean distance between the supplemented missing value and the center point is calculated. If the Euclidean distance is smaller than a set threshold, the supplemented missing value is judged credible. If the Euclidean distance is greater than or equal to the set threshold, the supplemented missing value is judged not credible, and the number of adjacent days of power supply cost data used for the fit is re-determined; this is repeated until the supplemented missing value is judged credible or the number of re-determinations exceeds a set number of times. The dimensions of the cluster analysis are the sub-data of the power supply cost, including labor cost, overhaul operation and maintenance cost, marketing operation and maintenance cost, and other operating expenses. Supplementing the missing value by a fitting curve alone can introduce errors, because the date in question may be a special date on which the actual data lies far from the fitted data. The scheme therefore takes the data characteristics of the date into account, such as day of week, month, and holidays, performs cluster analysis on the power supply cost of dates with the same data characteristics, and judges whether the value supplemented through the fitting curve is similar to the power supply cost data of those dates. This further improves the reliability of the supplemented missing value data.
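The credibility check can be sketched as below. As a simplifying assumption, the mean of the reference days stands in for the cluster center point (a full cluster analysis is not reproduced), and `is_credible` is a hypothetical name. Each vector holds the cost sub-data of one day, for example labor, overhaul, marketing, and other expenses.

```python
import math


def is_credible(filled, reference_days, threshold):
    """Return True if the filled value lies close to the centre of the
    reference days that share the same date characteristics.

    Assumption: a single centroid replaces the cluster centre point.
    """
    dims = len(filled)
    center = [
        sum(day[d] for day in reference_days) / len(reference_days)
        for d in range(dims)
    ]
    distance = math.dist(filled, center)
    # A distance greater than or equal to the threshold is not credible.
    return distance < threshold
```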
The Prophet model decomposes the power supply cost data into nonlinear trend component data by modeling with a logistic growth model of the following form:
$$g(t) = \frac{C}{1 + \exp\left(-k\,(t - m)\right)}$$
where C represents the carrying capacity, k represents the growth rate, and m represents the midpoint of the curve;
for the seasonal component data and the holiday component data, the Prophet model builds a periodic-effect model from a Fourier series. With P denoting the period of the component (P = 365.25 for yearly data, P = 7 for weekly data), the following Fourier series is used to approximate an arbitrary smooth seasonal effect and holiday effect:
$$s(t) = \sum_{n=1}^{N}\left(a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P}\right)$$
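The two model forms can be evaluated with a few lines of code. The sketch assumes the standard Prophet formulations, g(t) = C / (1 + exp(-k(t - m))) for the logistic trend and a truncated Fourier series with period P for the periodic effect; the function names and the coefficient lists `a` and `b` are hypothetical, and fitting the coefficients is outside the scope of the sketch.

```python
import math


def logistic_trend(t, C, k, m):
    """Logistic growth: carrying capacity C, growth rate k, midpoint m."""
    return C / (1.0 + math.exp(-k * (t - m)))


def fourier_seasonality(t, period, a, b):
    """Truncated Fourier series with assumed coefficients a_1..a_N, b_1..b_N."""
    return sum(
        a_n * math.cos(2 * math.pi * n * t / period)
        + b_n * math.sin(2 * math.pi * n * t / period)
        for n, (a_n, b_n) in enumerate(zip(a, b), start=1)
    )
```

At t = m the trend equals C / 2, and the seasonal term repeats after one period, so `fourier_seasonality(t, 7, a, b)` and `fourier_seasonality(t + 7, 7, a, b)` agree up to rounding.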
the LSTNet model specifically comprises the following components:
a one-dimensional convolution component: implemented with a Conv1D function; the convolutional layer extracts the short-term features of the time series and captures short-term patterns among the multi-dimensional variables, that is, local dependencies along the time dimension;
a recurrent component: implemented with an LSTM function to capture the temporal dependence of the data;
a recurrent-skip component: a Lambda layer rearranges the dimensions of the input data so that time steps one short period apart are linked, and the result is input into an LSTM layer to realize the Skip-LSTM process, which captures longer-term information and makes full use of the periodicity of the sequence;
an autoregressive component: a Lambda layer links the data across short periods to remove the periodicity, and a Dense layer simulates the autoregressive process;
an attention mechanism component: implemented with a Dense layer using a Softmax activation function; the attention mechanism decides which dimensions play a key role in the nonlinear trend component, so that the dimensions are weighted according to their importance.
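The regrouping performed ahead of the Skip-LSTM can be illustrated without a deep-learning framework. The sketch below is an assumption about what the Lambda-layer rearrangement achieves, not the patent's code: it splits a sequence into interleaved sub-sequences whose elements are exactly one period apart, so that a recurrent layer reading one sub-sequence sees, for example, only the same weekday. `skip_groups` is a hypothetical name.

```python
def skip_groups(sequence, period):
    """Split a sequence into `period` interleaved sub-sequences so that each
    sub-sequence links time steps exactly one period apart."""
    # Keep the most recent whole number of periods; drop the oldest remainder.
    usable = len(sequence) - len(sequence) % period
    recent = sequence[len(sequence) - usable:]
    return [recent[offset::period] for offset in range(period)]
```

With 14 daily values and `period=7`, each of the seven groups holds the two values that fall on the same weekday.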
Step 4 is specifically as follows. The multidimensional features of the power supply cost data are input into the LSTNet model as parameters, the nonlinear trend component data is taken as the output of the LSTNet model, and the LSTNet model is trained. The nonlinear trend component data is divided along the time dimension into training data, verification data and test data in a ratio of 3:1:1 by data volume. The LSTNet model learns the dependency relationship between the multidimensional features of the power supply cost data and the nonlinear trend component data from the training data, the verification data is then used to check the accuracy of the dependency relationship, and the test data is used to measure the error between the output of the LSTNet model and the real data. If the error is smaller than a set threshold, training of the LSTNet model is judged successful. If the error is larger than the set threshold, training is judged unsuccessful, and the method returns to step 3 to repeat the feature engineering construction, mining and analyzing the information related to the power supply cost to obtain the multidimensional features of the power supply cost data.
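The 3:1:1 division along the time dimension can be sketched as a chronological split with no shuffling. The function name `chronological_split` and the integer rounding at the boundaries are assumptions made for illustration.

```python
def chronological_split(series, ratios=(3, 1, 1)):
    """Split a time-ordered series into training, verification and test parts
    in the given ratio, preserving the time order (no shuffling)."""
    total = sum(ratios)
    n = len(series)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return series[:train_end], series[train_end:val_end], series[val_end:]
```

Keeping the parts in time order ensures the test data always lies after the data the model was trained and verified on.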
In step 6, if the prediction result is judged to be wrong, an error correction sub-step is executed: one item of sub-data is removed from the power supply cost data and steps 1 to 5 are executed again. If the prediction result is then correct, that sub-data is judged abnormal and is fed back to the relevant staff. Each item of sub-data is removed in turn, one at a time, until all abnormal sub-data has been found.
The power supply cost comprises several items of sub-data, including labor cost, overhaul operation and maintenance cost, marketing operation and maintenance cost, and other operating expenses. If the prediction result is wrong, the scheme can identify which sub-data of the power supply cost is abnormal and caused the error. Sub-data may change because of special circumstances, and such changes cannot be predicted by this prediction method or by other prediction methods. The scheme can nevertheless find the abnormal sub-data, which can subsequently be re-entered for supplementary training or discarded, making the prediction method of the scheme more accurate.
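The error correction sub-step amounts to a leave-one-out loop over the sub-data. In the sketch below, `prediction_is_correct` is a hypothetical callable that reruns steps 1 to 5 on the remaining sub-data and applies the step 6 check; it is an assumption introduced for illustration and is not defined in the source.

```python
def find_abnormal_subdata(subdata, prediction_is_correct):
    """Remove each item of sub-data in turn and report those whose removal
    makes the prediction correct again.

    subdata: dict mapping a sub-data name to its daily series.
    prediction_is_correct: hypothetical callable taking the remaining
    sub-data and returning True or False.
    """
    abnormal = []
    for name in subdata:
        remaining = {k: v for k, v in subdata.items() if k != name}
        if prediction_is_correct(remaining):
            abnormal.append(name)
    return abnormal
```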
The scheme is verified with actual data. Because the available volume of power supply cost data is insufficient, the daily cash flow and daily electricity flow data of high-voltage users in Zhejiang province are used as experimental data: the daily cash flow stands in for the power supply cost data, and the daily electricity flow data stands in for the multidimensional features of the power supply cost data. The data structure input to the LSTNet model is shown in Table 1. It has 35 dimensions in total, formed by concatenating, in order, a 1-dimensional daily cash flow sum estimate (that is, the nonlinear trend component data), 29-dimensional user attribute features, and 5-dimensional date-time features. The user features comprise four classes of categorical variables:
evaluation distribution: if the payment time falls within the closed interval from the minimum to the maximum account arrival date, it is recorded as 0; if it is earlier than the minimum account arrival date, it is recorded as 1; if it is later than the maximum account arrival date, it is recorded as 2; the feature is used to evaluate the account entry capacity of the power system;
industry distribution: the high-voltage users fall into nine major industries; following a preset order, the feature of a user is defined as the serial number of the user's industry in that order, taking the values 0, 1, ..., 8;
electricity consumption distribution: to examine the distribution of electricity consumption of the user group corresponding to the current account day, the month preceding the current account day is taken as the "year-month" date label, the electricity consumption of the corresponding user is looked up, and the consumption is divided into nine intervals by magnitude, so that the feature takes the values 0, 1, ..., 8;
payment amount distribution: the payment amount of the user is divided into eight intervals, so that the feature takes the values 0, 1, ..., 7;
the user feature estimation scheme combines the multi-modal information of monthly electricity consumption and daily cash flow, preserving the validity and regularity of the sample information as far as possible; since seasons, weekdays and holidays, as well as traditional festivals, can significantly affect electricity usage, the time features must be incorporated into the data structure, and each dimension of the time features must be normalized.
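The categorical encodings above can be sketched with two small helpers. The interval boundaries are not given in the source, so `edges` is a hypothetical list of ascending interior boundaries, and both function names are illustrative.

```python
import bisect


def payment_evaluation(pay_date, earliest, latest):
    """0: within [earliest, latest]; 1: earlier than earliest; 2: later than latest."""
    if pay_date < earliest:
        return 1
    if pay_date > latest:
        return 2
    return 0


def interval_index(value, edges):
    """Map a value to its interval number given ascending interior boundaries.

    len(edges) + 1 intervals result, so eight boundaries give the nine
    classes 0..8. The boundaries themselves are assumed, not from the source.
    """
    return bisect.bisect_right(edges, value)
```

Any comparable date representation works for `payment_evaluation`, for example `datetime.date` objects or day counts.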
TABLE 1 data Structure
Figure 855411DEST_PATH_IMAGE010
Training the LSTNet network model and alternately updating its weight parameters comprises the following steps:
dividing the historical standard cost data into a training set Tr and a test set Tx in a ratio of 3:1 by data volume;
traversing the hyper-parameters of the LSTNet network model by grid search; the hyper-parameters comprise the current days, the number of network layers, the learning rate, the iterative algorithm and the number of iterations;
after the traversal, recording the optimal hyper-parameters and establishing the optimal model.
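The grid traversal can be sketched as an exhaustive search over the Cartesian product of candidate values. The `evaluate` callable, which would train the model with the given hyper-parameters and return an error, is hypothetical, as are the parameter names used in the usage line.

```python
import itertools


def grid_search(grid, evaluate):
    """Try every combination in the grid and keep the one with the lowest error.

    grid: dict mapping a hyper-parameter name to its candidate values.
    evaluate: hypothetical callable taking a dict of hyper-parameters and
    returning an error to minimise.
    """
    names = list(grid)
    best_params, best_error = None, float("inf")
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error
```

A call such as `grid_search({"layers": [1, 2, 3], "learning_rate": [0.1, 0.01]}, evaluate)` visits all six combinations.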
The daily cash flow and daily electricity flow data of the high-voltage users in Zhejiang province from January 2013 through April 2019 are used as the training data set, and the data from May 1, 2019 to June 30, 2019 are used as the test data set. The loss function of the neural network is the mean absolute error (MAE), that is,
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
where $\hat{y}_i$ and $y_i$ (i = 1, ..., n) represent the predicted and true values for day i, respectively, and n is the number of days in the test set.
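The mean absolute error is straightforward to compute; the sketch below uses the usual definition, the average of the absolute daily differences between predicted and true values, and the function name is illustrative.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true daily values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```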
The experimental environment is a GTX 1080 GPU, an Intel Core i5 CPU at 2.6 GHz, 8 GB of memory, and TensorFlow as the software platform. To illustrate the effectiveness of the model, it is compared experimentally with the traditional STL, LSTM and LSTNet models.
FIG. 2 compares the daily prediction results of the present scheme with those of the comparison models. The true values show 4 to 5 peaks and valleys each month, but the cycle is not exactly one week, which makes the true values difficult to predict. The predictions of the comparison models deviate at the peaks and valleys, indicating larger daily prediction errors. By comparison, the predictions of the present model are closer to the true values, with higher precision and more accurate trend prediction, particularly at the peaks.
To evaluate the error between the predicted and true values more intuitively, the prediction performance of the model is evaluated with the weighted mean absolute percentage error (WMAPE) and the correlation coefficient (CORR):
Figure 546250DEST_PATH_IMAGE016
Figure 962319DEST_PATH_IMAGE017
where $\hat{y}_i$ and $y_i$ denote the predicted and true values on day i, respectively.
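The two evaluation indexes can be sketched as follows. Because the formula images are not reproduced in this text, the definitions are assumptions: WMAPE is taken as the sum of absolute errors divided by the sum of absolute true values, and CORR as the Pearson correlation coefficient between predicted and true values. The exact forms used in the patent may differ.

```python
import math


def wmape(predicted, actual):
    """Assumed form: sum of absolute errors over sum of absolute true values."""
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total_error / sum(abs(a) for a in actual)


def corr(predicted, actual):
    """Assumed form: Pearson correlation between predictions and true values."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (std_p * std_a)
```

A lower WMAPE and a CORR closer to 1 both indicate predictions that track the true values more closely.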
The following table compares the prediction result indexes of the models:
Figure DEST_PATH_IMAGE020
Measured between the prediction results and the true values, the MAE and WMAPE indexes of the present scheme are the smallest and the correlation index CORR is the largest among the compared models, a marked improvement over the other models, which demonstrates the superiority of the scheme for predicting the daily cash flow of the power grid.
The above-described embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.

Claims (9)

1. The power supply cost analysis and prediction method based on the Prophet-LSTNet combined model is characterized by comprising the following steps of:
step 1, acquiring historical daily power supply cost data within a period of time, performing data cleaning on the power supply cost data, and inputting the power supply cost data after the data cleaning is completed into a Prophet model;
step 2, decomposing the power supply cost data into nonlinear trend component data, seasonal component data and holiday component data through a Prophet model;
step 3, carrying out characteristic engineering construction, and mining and analyzing the power supply cost related information to obtain multidimensional characteristics of the power supply cost data;
step 4, inputting the multidimensional characteristics of the power supply cost data into an LSTNet model as parameters, inputting nonlinear trend component data into the LSTNet model, training the LSTNet model, searching the dependency relationship between the multidimensional characteristics of the power supply cost data and the nonlinear trend component data, and training and alternately updating the LSTNet model and the nonlinear trend component data weight parameters;
step 5, predicting the power supply cost by combining the LSTNet model, the seasonal component data and the holiday component data to obtain a prediction result.
2. The Prophet-LSTNet combined model-based power supply cost analysis and prediction method of claim 1, wherein the data cleansing includes an abnormal value data cleansing, the abnormal value data cleansing is a method of first determining whether the data is positive or negative, if the data is negative, determining that the data is abnormal and deleting, and if the data is positive, detecting the abnormal value in the processed data by quantiles: calculating a first quartile Q1 and a third quartile Q3, the outliers being data points that lie outside the range of quartiles
Figure 459439DEST_PATH_IMAGE001
The calculation method is as follows:
Figure 111000DEST_PATH_IMAGE002
wherein
Figure 625158DEST_PATH_IMAGE003
Figure 613842DEST_PATH_IMAGE004
Data exceeding the upper limit of the abnormal value range is replaced with the upper limit, and data below the lower limit of the abnormal value range is replaced with the lower limit.
3. The Prophet-LSTNet combined model-based power supply cost analysis and prediction method according to claim 1, wherein the data cleaning in step 1 includes missing value data cleaning, and the missing value data cleaning is performed by first searching power supply cost data of several days adjacent to a missing value, constructing a fitting curve according to the data of several days, then determining a fitting value of a date corresponding to the missing value on the fitting curve, wherein the fitting value is power supply cost data matched with the missing value, and supplementing the missing value in historical daily power supply cost data.
4. The Prophet-LSTNet combined model-based power supply cost analysis and prediction method according to claim 3, wherein the supplemented missing value is verified by first determining date information corresponding to the missing value, determining data characteristics of the date information, searching for power supply cost data of other dates having the same data characteristics as the date information within a period of time, performing cluster analysis on the data having the same data characteristics, determining a cluster center point, calculating a Euclidean distance between the supplemented missing value and the center point, determining that the supplemented missing value is credible if the Euclidean distance is less than a set threshold, determining that the supplemented missing value is not credible if the Euclidean distance is greater than or equal to the set threshold, and re-determining the number of power supply cost data of several days adjacent to the missing value until the supplemented missing value is determined to be credible or the number of times of re-determining the number of power supply cost data of several days adjacent to the missing value exceeds a set number of times.
5. The method for analyzing and predicting the power supply cost based on the Prophet-LSTNet combined model as claimed in claim 1, wherein the specific method for decomposing the power supply cost data into the nonlinear trend component data by the Prophet model is to use a logistic growth model for modeling, and the modeling form is represented by the following formula:
$$g(t) = \frac{C}{1 + \exp\left(-k\,(t - m)\right)}$$
wherein C represents the carrying capacity, k represents the growth rate, and m represents the midpoint of the curve;
the Prophet model simulates a periodic effect model by a fourier series on seasonal and holiday component data.
6. The method for analyzing and predicting power supply cost based on the Prophet-LSTNet combined model of claim 1, wherein the LSTNet model specifically comprises:
a one-dimensional convolution component: the method is realized by using a Conv1D function, the short-term characteristics of a time sequence are extracted by using a convolutional layer, and meanwhile, a short-term mode between multi-dimensional variables, namely the local dependency relationship of a time dimension, is captured;
a recurrent component: implemented with an LSTM function to capture the temporal dependence of the data;
a recurrent-skip component: a Lambda layer rearranges the dimensions of the input data so that time steps one short period apart are linked, and the result is input into an LSTM layer to realize the Skip-LSTM process, which captures longer-term information and makes full use of the periodicity of the sequence;
an autoregressive component: a Lambda layer links the data across short periods to remove the periodicity, and a Dense layer simulates the autoregressive process;
an attention mechanism component: implemented with a Dense layer using a Softmax activation function; the attention mechanism decides which dimensions play a key role in the nonlinear trend component, so that the dimensions are weighted according to their importance.
7. The Prophet-LSTNet combination model-based power supply cost analysis and prediction method according to claim 1, wherein the step 4 specifically comprises: inputting the multidimensional characteristics of the power supply cost data as parameters into an LSTNet model, taking nonlinear trend component data as an output result of the LSTNet model, training the LSTNet model, dividing the nonlinear trend component data into training data, verification data and test data, wherein the data volume ratio of each part in the time dimension is 3:1:1, the LSTNet model searches for the dependency relationship between the multidimensional characteristics of the power supply cost data and the nonlinear trend component data through the training data, then verifies the accuracy of the dependency relationship by using the verification data, then tests the error between the output result of the LSTNet model and the real data through the test data, if the error is smaller than a set threshold value, the LSTNet model is judged to be trained successfully, if the error is larger than the set threshold value, the LSTNet model is judged to be trained unsuccessfully, the step 3 is returned to carry out feature engineering construction again, mining and analyzing the related information of the power supply cost, and obtaining multi-dimensional characteristics of the power supply cost data.
8. The Prophet-LSTNet combination model-based power supply cost analysis and prediction method of claim 1, further comprising a step 6 of verifying the prediction result, comparing the prediction result with the actual power supply cost data, and verifying that the prediction result is correct if the daily difference between the prediction result and the actual power supply cost data in the set date is always smaller than the set error value or the sum of all variances of the prediction result and the actual power supply cost data in the set date is smaller than the set value, and otherwise, the prediction result is incorrect.
9. The Prophet-LSTNet combination model-based power supply cost analysis and prediction method of claim 8, wherein if the prediction result is judged to be wrong, the error correction substep is executed, one subdata in the power supply cost data is removed, the steps 1 to 5 are executed again, and if the prediction result is correct again, the subdata is judged to be abnormal and fed back to related workers; and performing independent traversal and elimination on all the subdata until all the subdata with abnormity is found out.
CN202110791617.XA 2021-07-13 2021-07-13 Power supply cost analysis and prediction method based on Prophet-LSTNet combined model Active CN113256036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110791617.XA CN113256036B (en) 2021-07-13 2021-07-13 Power supply cost analysis and prediction method based on Prophet-LSTNet combined model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110791617.XA CN113256036B (en) 2021-07-13 2021-07-13 Power supply cost analysis and prediction method based on Prophet-LSTNet combined model

Publications (2)

Publication Number Publication Date
CN113256036A true CN113256036A (en) 2021-08-13
CN113256036B CN113256036B (en) 2021-10-12

Family

ID=77191168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110791617.XA Active CN113256036B (en) 2021-07-13 2021-07-13 Power supply cost analysis and prediction method based on Prophet-LSTNet combined model

Country Status (1)

Country Link
CN (1) CN113256036B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897941A (en) * 2017-01-03 2017-06-27 北京国能日新系统控制技术有限公司 A kind of blower fan method for processing abnormal data and device based on quartile box traction substation
CN109993370A (en) * 2019-04-10 2019-07-09 国网浙江省电力有限公司 A kind of electric power sale day cash flow projections method based on nonstationary time series
CN111008757A (en) * 2019-11-09 2020-04-14 苏州浪潮智能科技有限公司 Parameter setting method and device for predicting SSD life based on Prophet model
US20210004035A1 (en) * 2019-07-02 2021-01-07 Microsoft Technology Licensing, Llc Datacenter stabilization of regional power grids
CN113052214A (en) * 2021-03-14 2021-06-29 北京工业大学 Heat exchange station ultra-short term heat load prediction method based on long and short term time series network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUOKUN LAI ET AL: "Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks", 《ARXIV:1703.07015V3》 *
赵英等: "基于LSTM-Prophet非线性组合的时间序列预测模型", 《计算机与现代化》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205330A (en) * 2022-12-13 2023-06-02 国网浙江省电力有限公司营销服务中心 Enterprise power consumption prediction method based on enterprise power consumption data
CN116545954A (en) * 2023-07-06 2023-08-04 浙江赫斯电气有限公司 Communication gateway data transmission method and system based on Internet of things
CN116545954B (en) * 2023-07-06 2023-08-29 浙江赫斯电气有限公司 Communication gateway data transmission method and system based on internet of things

Also Published As

Publication number Publication date
CN113256036B (en) 2021-10-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant