CN114626640A - Natural gas load prediction method and system based on characteristic engineering and LSTM neural network - Google Patents

Natural gas load prediction method and system based on characteristic engineering and LSTM neural network

Info

Publication number
CN114626640A
Authority
CN
China
Prior art keywords
natural gas
gas load
characteristic
load prediction
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210452587.4A
Other languages
Chinese (zh)
Inventor
边根庆
周妮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Architecture and Technology
Original Assignee
Xian University of Architecture and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Architecture and Technology
Priority to CN202210452587.4A
Publication of CN114626640A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a natural gas load prediction method and system based on feature engineering and an LSTM neural network. The method comprises: preprocessing the acquired natural gas data set by removing missing values and abnormal values, and processing the data set according to time characteristics to obtain a natural gas load data set; screening the candidate feature factors that may influence natural gas load prediction by means of feature engineering to obtain the feature factors that do influence it; filtering the natural gas load data set by these feature factors, keeping the data corresponding to them, to obtain a feature data set; and processing the feature data set with a pre-established multi-feature LSTM model to obtain a predicted natural gas load value, thereby realizing natural gas load prediction. The method can predict the natural gas load more accurately.

Description

Natural gas load prediction method and system based on feature engineering and LSTM neural network
Technical Field
The invention belongs to the technical field of prediction based on a machine learning model, and relates to a natural gas load prediction method and a natural gas load prediction system based on feature engineering and an LSTM neural network.
Background
As a clean and efficient energy source, natural gas not only produces lower greenhouse gas, sulfur dioxide and particulate emissions than coal and petroleum, but can also compensate for the fact that wind and solar energy are difficult to store and unstable in supply. Countries around the world are developing low-carbon economies, using clean energy and reducing atmospheric emissions. Against the background of the global transformation of the energy consumption structure, natural gas is favored and valued worldwide, the trend of replacing high-carbon, high-pollution coal with natural gas is irreversible, and natural gas plays an increasingly important role in global economic development and in the energy consumption structure.
Traditional natural gas load prediction methods mainly include the time series method, regression analysis and grey prediction. These methods rest on mathematical theory such as calculus and mathematical statistics: they derive a mathematical relation among existing data and then use that relation to calculate and predict future data. Although they are simple to model, economical and practical, they are generally suited to cases with small sample sizes, few influencing factors and relatively simple relationships, and their accuracy in natural gas load prediction is poor.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a natural gas load prediction method and system based on characteristic engineering and an LSTM neural network.
The technical scheme adopted by the invention is as follows:
the natural gas load prediction method based on the characteristic engineering and the LSTM neural network comprises the following processes:
preprocessing the acquired natural gas data set by removing missing values and abnormal values, and processing the natural gas data set according to time characteristics to obtain a natural gas load data set;
screening characteristic factors which possibly influence the natural gas load prediction by utilizing characteristic engineering to obtain the characteristic factors which influence the natural gas load prediction;
screening the data of the natural gas load data set by using the characteristic factors, and selecting the natural gas load data set corresponding to the characteristic factors influencing natural gas load prediction to obtain a characteristic data set;
and processing the characteristic data set through a pre-established multi-characteristic LSTM model to obtain a natural gas load predicted value, so as to realize natural gas load prediction.
Preferably, the natural gas data set is processed into ultra-short-term data, short-term data and medium-term data according to the time characteristics, so as to obtain the natural gas load data set.
Preferably, the ultra-short term data is hourly data, the short term data is daily data and the medium term data is monthly data.
Preferably, when feature engineering is used to screen the feature factors that may influence natural gas load prediction, the candidate factors are screened with a variance selection method, a correlation coefficient method, a mutual information method, a decision-tree-based CART selection method and an SVR-RFECV-based feature selection method, and the feature factors that influence natural gas load prediction are chosen according to the screening results.
Preferably, after the candidate feature factors have been processed separately by the variance selection method, the correlation coefficient method, the mutual information method, the decision-tree-based CART selection method and the SVR-RFECV-based feature selection method, the feature factors selected in common by all five methods are taken as the final feature factors that influence natural gas load prediction.
Preferably, the characteristic factors influencing the natural gas load prediction comprise the highest weather temperature, the lowest weather temperature, the weather condition, the wind direction and the air index.
Preferably, the weather conditions include light rain-cloudy, light rain-rainfall, cloudy-cloudy, cloudy-light rain, light rain-heavy rain, heavy rain-light rain, light rain-sunny, cloudy-light rain, cloudy-rainfall, cloudy-snow-mixed rain, light rain-cloudy, cloudy-cloudy, light rain-medium rain, cloudy-sunny, cloudy-medium rain, sunny-cloudy, sunny-light rain, sleet-snow-cloudy, snow-medium-cloudy, cloudy-sunny, snow-medium-cloudy, snow-medium-snow-cloudy, rain-medium-heavy snow-light, heavy rain-cloudy, rain-light rain, sunny-cloudy, rain-medium-cloudy, sunny, light rain, cloudy, and medium rain; the wind directions include southeast wind, southwest wind, northeast wind, northwest wind, and air index, east wind, south wind, and west wind.
The invention also provides a natural gas load prediction system based on feature engineering and an LSTM neural network, which comprises:
a data preprocessing module, used for preprocessing the acquired natural gas data set, removing missing values and abnormal values, and then processing the natural gas data set according to time characteristics to obtain a natural gas load data set;
a first screening module, used for screening the feature factors that may influence natural gas load prediction by means of feature engineering to obtain the feature factors that influence natural gas load prediction;
a second screening module, used for filtering the natural gas load data set by these feature factors and selecting the data corresponding to the feature factors that influence natural gas load prediction to obtain a feature data set;
a calculation module, used for processing the feature data set with a pre-established multi-feature LSTM model to obtain a predicted natural gas load value, thereby realizing natural gas load prediction.
The present invention also provides an electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for natural gas load prediction based on feature engineering and LSTM neural networks of the present invention as described above.
The invention also provides a storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for natural gas load prediction based on feature engineering and LSTM neural networks of the present invention as described above.
Compared with the prior art, the invention has the following beneficial effects:
The method introduces feature engineering: the influencing factors are screened by feature engineering rather than manually, which improves prediction efficiency. The screened feature set is then fed into an LSTM model for prediction. Compared with a traditional time series model, the multi-feature LSTM neural network controls the flow of input data through its input gate, forget gate and output gate, and keeps the memory cell state separate from the output, so that important information is preserved as the sequence is propagated and longer-term memory of the sequence is retained. The multi-feature LSTM neural network can therefore improve accuracy when predicting a load time series.
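The gate mechanism described above can be illustrated with a single LSTM time step written in NumPy. This is a minimal sketch of the standard LSTM cell equations, not the patent's trained model: the weight shapes, the gate ordering inside the stacked matrices, the hidden size and the random inputs are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x_t: (n_features,) input at time t
    h_prev, c_prev: (n_hidden,) previous hidden state and cell state
    W: (4*n_hidden, n_features), U: (4*n_hidden, n_hidden), b: (4*n_hidden,)
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    n_hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n_hidden])                  # input gate
    f = sigmoid(z[n_hidden:2 * n_hidden])       # forget gate
    g = np.tanh(z[2 * n_hidden:3 * n_hidden])   # candidate cell values
    o = sigmoid(z[3 * n_hidden:4 * n_hidden])   # output gate
    c_t = f * c_prev + i * g                    # cell state (long-term memory)
    h_t = o * np.tanh(c_t)                      # hidden state (output)
    return h_t, c_t


# Hypothetical sizes: 5 input features per day, 8 hidden units, 7-day sequence.
rng = np.random.default_rng(0)
n_features, n_hidden = 5, 8
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_features))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
sequence = rng.normal(size=(7, n_features))
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state `c` is updated only through the forget and input gates, while the output gate decides how much of it is exposed as `h`; this separation is what allows information to be carried over long sequences.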
Drawings
FIG. 1 is an overall framework diagram of the method for natural gas load prediction based on feature engineering and an LSTM neural network according to the present invention;
FIG. 2 is a technical route diagram of the natural gas load prediction method based on feature engineering and an LSTM neural network according to the present invention;
FIG. 3 is a diagram illustrating the results of a variance selection method in an embodiment of the present invention;
FIG. 4 is a graph of correlation coefficient method in an embodiment of the present invention;
FIG. 5 is a bar chart of the dependent variable and independent variable correlation coefficients in an embodiment of the present invention;
FIG. 6 is a CART selection method result diagram based on decision tree in the embodiment of the present invention;
FIG. 7 is a flow chart of a natural gas load prediction multi-feature LSTM in an embodiment of the present invention;
FIG. 8 is a single feature LSTM neural network training diagram;
FIG. 9 is a multi-feature LSTM model prediction graph of the present invention in a validation test;
FIG. 10 is a system requirements analysis diagram of the present invention;
FIG. 11 is a general architecture diagram of the intelligent analysis and control system of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
The natural gas load prediction method based on feature engineering and an LSTM neural network selects the main influencing factors of load prediction by screening the feature variables, and then feeds the selected feature variables into a multi-feature LSTM model, which improves the accuracy of natural gas load prediction. Higher prediction accuracy in turn allows a natural gas company to schedule and distribute natural gas better. The invention constructs a natural gas load prediction model by studying feature engineering and the multi-feature LSTM algorithm, and uses the model to predict the future natural gas load of a certain area, thereby demonstrating the mechanism, framework and effect of the machine learning algorithm.
The natural gas load prediction method based on the characteristic engineering and the LSTM neural network comprises the following steps:
1) preprocessing the natural gas load data: missing values and abnormal values are removed, and the natural gas data set is then processed into ultra-short-term (hourly), short-term (daily) and medium-term (monthly) data according to its time characteristics, to facilitate subsequent prediction;
2) obtaining the feature factors that influence natural gas load prediction: feature selection eliminates irrelevant or redundant features, which reduces the number of features, shortens model training time, helps avoid overfitting, improves the generalization ability of the model and thereby improves its prediction accuracy;
3) step 1) yields the processed natural gas load data set, and step 2) yields the feature data set that influences load prediction after feature engineering screening. The feature data set and the natural gas load data set are used to construct a multi-feature LSTM model, which is trained and used for prediction, and the advantages of the multi-feature LSTM model in natural gas load prediction are verified;
4) based on these results, a prototype system for short-term natural gas load prediction is constructed in cooperation with natural gas storage and transportation enterprises, and the technology and parameters of the prototype system are further optimized for accuracy, generalization and efficiency in light of on-site verification results.
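Step 3) feeds multi-feature sequences into the LSTM model. One common way to prepare such input, shown here as a sketch under assumptions, is to cut the daily records into sliding windows. The lookback length of 7 days, the choice to include the past load as an extra input channel, and the random data are all hypothetical; the source does not state how its model input is shaped.

```python
import numpy as np


def make_windows(features, load, lookback):
    """Build supervised samples for a multi-feature sequence model.

    features: (n_samples, n_features) daily feature matrix
    load:     (n_samples,) daily load values
    Returns X with shape (n_windows, lookback, n_features + 1) and
    y with shape (n_windows,), where y[k] is the load on the day that
    immediately follows window k.
    """
    data = np.column_stack([features, load])  # past load is the last channel
    X, y = [], []
    for end in range(lookback, len(data)):
        X.append(data[end - lookback:end])
        y.append(load[end])
    return np.asarray(X), np.asarray(y)


# Placeholder data: 30 days, 5 features per day.
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 5))
load = rng.normal(size=30)
X, y = make_windows(features, load, lookback=7)
```

Each sample in `X` has the (time steps, features) layout that recurrent layers in common deep learning libraries expect.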
For screening, the invention uses five methods: the variance selection method, the mutual information method, the correlation coefficient method, the decision-tree-based CART selection method and the SVR-RFECV-based feature selection method. These five methods together screen out the most influential feature factors.
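Taking the features that all five methods agree on amounts to a set intersection. The sketch below shows that step only; the per-method sets are illustrative placeholders with English feature names chosen here, not the results reported in the embodiment.

```python
# Hypothetical per-method selections (placeholders for illustration).
selected_by_method = {
    "variance": {"max_temp", "min_temp", "weather", "wind_direction", "air_index"},
    "correlation": {"max_temp", "min_temp", "weather", "wind_direction", "air_index"},
    "mutual_information": {"max_temp", "min_temp", "weather", "wind_direction", "air_index"},
    "cart": {"max_temp", "min_temp", "weather", "wind_direction", "air_index"},
    "svr_rfecv": {"max_temp", "min_temp", "weather", "wind_direction", "air_index", "air_quality"},
}

# Keep only the features that every method selected.
common_features = set.intersection(*selected_by_method.values())
print(sorted(common_features))
```

A feature kept by only some of the methods, such as `air_quality` in this placeholder data, drops out of the final set.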
Examples
As shown in FIG. 1 and FIG. 2, the natural gas load prediction method based on feature engineering and an LSTM neural network of this embodiment includes the following steps:
1) preprocessing the obtained natural gas data set: missing values and abnormal values are removed to facilitate subsequent prediction;
The natural gas load data of this embodiment come from a provincial natural gas company. The data set covers January 31, 2010 to December 31, 2020 at an interval of one day, that is, one data point per day; because some data are missing, the data set contains 3987 samples.
Because the data provided by the gas company are messy, they must be processed before the experiment: data cleaning re-examines and verifies the data, deletes duplicate information, corrects existing errors and ensures data consistency.
After missing values and abnormal values have been handled, the natural gas load data are processed and divided according to time characteristics for use in subsequent prediction.
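The preprocessing step can be sketched with pandas as follows. The source does not say how abnormal values are detected or how daily values are aggregated, so the interquartile-range rule and the monthly sum used here are assumptions, and the synthetic series stands in for the company's data.

```python
import numpy as np
import pandas as pd

# Synthetic daily load series standing in for the real data set.
dates = pd.date_range("2010-01-31", periods=60, freq="D")
load = pd.Series(100.0 + np.arange(60) % 7, index=dates, name="load")
load.iloc[5] = np.nan        # a missing value
load.iloc[20] = 10_000.0     # an obviously abnormal value

# 1. Remove missing values.
clean = load.dropna()

# 2. Remove abnormal values with the 1.5 * IQR rule (an assumed criterion).
q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
iqr = q3 - q1
in_range = (clean >= q1 - 1.5 * iqr) & (clean <= q3 + 1.5 * iqr)
clean = clean[in_range]

# 3. Aggregate the daily (short-term) data into monthly (medium-term) data.
monthly = clean.groupby(clean.index.to_period("M")).sum()
```

Hourly (ultra-short-term) data cannot be derived from daily records and would have to come from a finer-grained source.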
2) Acquiring characteristic factors influencing natural gas load prediction:
In addition to the historical load values, the data set also contains features that may influence load fluctuations, such as temperature, weather, wind direction, wind power and air quality. The feature data were crawled from a weather website and cover January 31, 2010 to December 31, 2020; seven features were collected: highest temperature, lowest temperature, weather, wind power, wind direction, air quality and air index.
For subsequent data analysis, the date-type variable "date", which is stored as a calendar date, must be converted into a form that machine learning can process. It is therefore converted into a number of days, with January 31, 2010 as the common reference point.
Because the natural gas load prediction models constructed subsequently operate on float-type values rather than on vector-space distance measures, the discrete variables in the data set only receive simple numeric coding. The coded values serve purely as category labels and carry no magnitude. Some variables in the data set have many categories, and numeric coding avoids the large feature space that one-hot coding would produce.
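The date conversion can be sketched as a day offset from the reference date. The column values below are sample dates picked for illustration; only the reference point of January 31, 2010 comes from the text.

```python
import pandas as pd

reference = pd.Timestamp("2010-01-31")
dates = pd.to_datetime(pd.Series(["2010-01-31", "2010-02-01", "2020-12-31"]))

# Number of whole days elapsed since the reference date.
day_index = (dates - reference).dt.days
print(day_index.tolist())
```

The result is a plain integer column that a model can consume directly.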
Therefore, the crawled data are first standardized, and the three categorical features weather, wind direction and air quality are numerically coded. The coding results are as follows:
the variable weather, i.e., the weather condition of the current day, includes 31 categories, and the weather code table is shown in table 1.
TABLE 1
Figure BDA0003619349710000071
The variable wind direction, i.e. the wind direction condition of the current day, includes 7 categories, and the wind direction code table is shown in table 2.
TABLE 2
Figure BDA0003619349710000072
The variable air quality, i.e. the air quality status of the day, comprises 5 categories, and the air quality code table is shown in table 3.
TABLE 3
Figure BDA0003619349710000073
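The difference between simple numeric coding and one-hot coding of a categorical feature can be sketched as follows. The code tables themselves are only available as images, so the integer codes assigned below are hypothetical; the seven wind direction names are taken from the list given earlier in the description.

```python
import pandas as pd

# Hypothetical code table: the real assignments are in Table 2.
wind_codes = {
    "east": 0, "south": 1, "west": 2,
    "northeast": 3, "northwest": 4, "southeast": 5, "southwest": 6,
}

df = pd.DataFrame({"wind_direction": ["east", "southwest", "east", "northwest"]})

# Numeric coding: one integer column, whatever the number of categories.
df["wind_direction_code"] = df["wind_direction"].map(wind_codes)

# One-hot coding, for comparison: one column per category that occurs.
one_hot = pd.get_dummies(df["wind_direction"])
```

With 31 weather categories, one-hot coding would add up to 31 columns for that feature alone, which is the growth in feature space that the numeric coding avoids.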
The feature data after the coding is selected, the feature engineering method used by the invention mainly comprises 5 types: a variance selection method, a correlation coefficient method, a mutual information method, a CART selection method based on a decision tree, and a feature selection method based on SVR-RFECV.
① Variance selection method
The feature data are first screened with the variance selection method, which computes the variance of each feature variable in the feature set and then selects feature variables according to their variance. The selection result is shown in FIG. 3.
As can be seen from FIG. 3, the variance of the feature variable wind power is the smallest and close to 0, which indicates that wind power barely varies across samples and therefore has little bearing on changes in the natural gas load data: with the other feature variables held constant, the load hardly changes with wind power. Wind power can therefore be eliminated. For the same reason, the feature variable air quality is also eliminated. The variances of weather, lowest temperature and highest temperature are large, and these features should be retained. Although the variances of air index and wind direction are not large, they are above 1, which indicates that the sample values still fluctuate; these two features are analyzed further by the subsequent feature selection methods.
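A variance filter of this kind can be sketched with scikit-learn's `VarianceThreshold`. The synthetic columns, the English feature names and the threshold of 1.0 are assumptions for illustration; the source does not state which tool or cut-off it used.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(10, 8, n),            # "max_temp": varies strongly
    rng.normal(0, 8, n),             # "min_temp": varies strongly
    2.0 + rng.normal(0, 0.01, n),    # "wind_power": almost constant
])
names = np.array(["max_temp", "min_temp", "wind_power"])

# Drop every feature whose variance does not exceed the threshold.
selector = VarianceThreshold(threshold=1.0)
selector.fit(X)
kept = names[selector.get_support()].tolist()
print(dict(zip(names, selector.variances_.round(4))), kept)
```

Variance depends on the scale of each feature, so the comparison is only meaningful when the features are on comparable scales or when the threshold is chosen per feature.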
② Correlation coefficient method
The variance selection method considers only the data characteristics of each feature variable and does not consider the correlation between feature variables. To screen the natural gas feature variables more objectively by their degree of correlation, the correlation coefficient method is used next.
The correlation coefficient method calculates the degree of correlation between variables and judges whether they are related. Three correlation coefficients are in common use: the Pearson product-moment correlation coefficient, the Spearman rank correlation coefficient and the Kendall rank correlation coefficient. The Pearson coefficient has the strictest conditions of use and is suited to linearly correlated data. The Spearman coefficient applies more widely but has lower statistical efficiency than the Pearson coefficient. The Kendall coefficient is also a rank correlation coefficient and is suited to cases in which both variables are ordered categories.
Since the embodiment data contain many discrete variables, the Spearman correlation coefficient is used to measure the degree of correlation between variables. According to the usual criterion, two variables are considered highly correlated when the absolute value of the correlation coefficient is 0.8 or more, and very weakly correlated or uncorrelated when it is 0.1 or less. The correlation among the independent variables is examined first, in case multicollinearity between them adversely affects the model results; the result is shown in FIG. 4.
FIG. 4 shows the degree of correlation between the feature variables. The most strongly correlated pair is highest temperature and lowest temperature, with a correlation coefficient of 0.91. The correlation between highest temperature and weather is very weak, with an absolute correlation coefficient below 0.1; however, the variance selection method showed that both have large variances, so their values fluctuate and will have some influence on subsequent prediction, and both are retained. Overall, the correlation values of wind power, wind direction and weather are very low, and their elimination is considered.
FIG. 5 shows the correlation between each feature variable and the natural gas load data. The correlation coefficient between air index and the natural gas load data is the largest, at 0.3, and is positive, which indicates that fluctuations of the load are influenced by the air index. Highest temperature and lowest temperature are negatively correlated with the natural gas load data, so changes in either affect the load. Wind direction and weather are positively correlated with the load data, so changes in them also affect the load. The correlation coefficient between wind power and the load data is the lowest, which shows that wind power has very little influence on load fluctuations.
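Both views described above, correlation among the features and correlation of each feature with the load, can be read from one Spearman correlation matrix. The sketch below uses pandas on synthetic data; the column names and the generated relationships are assumptions and do not reproduce the values in FIG. 4 or FIG. 5.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
max_temp = rng.normal(15, 10, n)
df = pd.DataFrame({
    "max_temp": max_temp,
    "min_temp": max_temp - 8 + rng.normal(0, 2, n),       # tracks max_temp
    "wind_power": rng.integers(1, 4, n).astype(float),    # unrelated to load
})
# Synthetic load that falls as temperature rises.
df["load"] = 500 - 10 * df["max_temp"] + rng.normal(0, 20, n)

corr = df.corr(method="spearman")

# Feature-to-feature correlation (multicollinearity check).
feature_corr = corr.loc["max_temp", "min_temp"]

# Correlation of every feature with the load.
load_corr = corr["load"].drop("load")
print(round(feature_corr, 2))
print(load_corr.round(2))
```

A pair of features with an absolute coefficient of 0.8 or more, such as the two temperatures here, is a candidate for multicollinearity, while a feature whose coefficient with the load is near zero contributes little monotonic signal.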
③ Mutual information method
The correlation coefficient method judges the correlation between feature variables across the whole feature data set, whereas the mutual information method evaluates the dependency between each feature variable and the natural gas load data. That is, given a feature variable, it measures how strongly the load data depend on it. Put simply, knowing a feature variable increases the certainty about the load, and this increase in certainty is the amount of information. The mutual information value used here lies in [0, 1]: the smaller the value, the weaker the dependency between the two variables, and the larger the value, the stronger the dependency.
In this embodiment the mutual information method is implemented in Python. The result of applying it to the feature variables of the feature data set is shown in Table 4:
TABLE 4
Figure BDA0003619349710000101
As can be seen from Table 4, the mutual information value of air index is 0.91, which indicates a very strong dependency between the natural gas load and the air index: differences in air index strongly influence fluctuations of the load data. The mutual information values of highest temperature, lowest temperature, weather and wind direction are all above 0.4, which shows that these 4 feature variables influence the natural gas load to different degrees. The mutual information value of wind power, however, is 0.31; compared with the other feature variables, wind power has little relation to load fluctuations and is removed.
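A mutual information score between each feature and a continuous target can be estimated with scikit-learn's `mutual_info_regression`, sketched below on synthetic data. This is one possible implementation, not necessarily the one used in the embodiment: scikit-learn returns non-negative estimates in nats that are not confined to [0, 1], and the source does not say how its values were normalized.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500
informative = rng.normal(size=n)      # the target depends on this feature
noise_feature = rng.normal(size=n)    # independent of the target
X = np.column_stack([informative, noise_feature])

# Nonlinear dependence that a linear correlation coefficient would miss.
y = informative ** 2 + 0.1 * rng.normal(size=n)

mi = mutual_info_regression(X, y, random_state=0)
print(mi.round(3))
```

Because the score captures any kind of dependency, not only monotonic ones, it complements the Spearman coefficient from the previous step.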
④ CART selection method based on decision tree
The first three feature selection methods in this embodiment are all filter methods: they select features from statistics computed on the data and do not involve a machine learning algorithm. An embedded method, by contrast, performs feature selection inside a machine learning algorithm, and its selection results are generally better than those of a filter method. Decision trees are often used for both regression and classification problems. Decision tree algorithms fall into three main categories: (1) the ID3 algorithm; (2) the C4.5 algorithm; (3) the CART algorithm. Because the requirements and application ranges of ID3 and C4.5 are too restrictive for selecting these feature variables, this embodiment uses CART to select the natural gas feature variables.
In this embodiment, decision-tree-based feature selection is implemented in Python. Each feature variable receives an importance score that represents its importance to the model, and the scores of all feature variables sum to 1. The selection result is shown in FIG. 6.
As can be seen from FIG. 6, highest temperature and lowest temperature are the most important to the model, with scores above 0.3, and should be retained. Weather, wind direction and air index come next, with scores above 0.13, which indicates a dependency between these feature variables and the model, so they should also be retained. The scores of wind power and air quality are below 0.05 and close to 0, which shows that they have little influence on model training and should be removed.
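Importance scores that sum to 1 are what scikit-learn's tree models expose as `feature_importances_`; its `DecisionTreeRegressor` uses an optimized version of CART. The sketch below is an assumption about tooling, and the synthetic data, feature names and `max_depth` value are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 400
names = ["max_temp", "wind_power"]
X = np.column_stack([
    rng.normal(15, 10, n),   # "max_temp": drives the synthetic load
    rng.normal(size=n),      # "wind_power": unrelated to the load
])
y = 500 - 10 * X[:, 0] + rng.normal(0, 5, n)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# Impurity-based importances, normalized so that they sum to 1.
importance = dict(zip(names, tree.feature_importances_))
print({k: round(v, 3) for k, v in importance.items()})
```

Features whose score stays near 0, as `wind_power` does here, are the ones a cut-off such as 0.05 would remove.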
Feature selection method based on SVR-RFECV
Unlike the filter and embedded methods, which complete selection with a single training run, the wrapper method trains repeatedly on feature subsets. In this process an objective function is often used to help select features, and the most typical approach is Recursive Feature Elimination (RFE). RFE is a greedy optimization algorithm: a machine learning model is trained repeatedly, the least important feature is removed after each round of training, and the next round is carried out on the reduced feature set. However, setting the RFE parameter n_features_to_select involves some guesswork. If the value is too small, strongly relevant features may be removed, resulting in information loss; if the value is too large, irrelevant features are retained, resulting in information redundancy. Therefore, in variable selection, RFE is usually combined with K-fold cross-validation (RFECV) to find the best feature set, which addresses this drawback. The invention selects features with the classical SVR-RFE algorithm combined with cross-validation.
In this embodiment, recursive elimination is implemented in Python by calling the sklearn machine learning package. The parameter n_features_to_select sets the number of features retained from the natural gas feature data set, and the number of cross-validation folds defaults to 5. The experimental results of the SVR-RFECV selection are shown in Table 5:
TABLE 5
Figure BDA0003619349710000111
The metric "feature rank" in Table 5 gives the rank of each feature; a lower rank means the feature is more important to the model. A rank of 1 indicates that the variable is a retained feature and can be used in the subsequent construction of the model. As can be seen from the table, the SVR-RFECV method selects 6 variables: highest temperature, lowest temperature, weather, wind direction, air index and air quality.
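The following is a minimal sketch of SVR-based recursive feature elimination with 5-fold cross-validation, using scikit-learn's `RFECV` and `SVR`. The linear kernel is an assumption (RFE needs coefficients to rank features, and the description does not name the kernel), and the data and feature names are synthetic placeholders rather than the natural gas data set.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR


def svr_rfecv(X, y, names, cv=5):
    """Run RFECV with a linear SVR; return (selected names, rank per name)."""
    # A linear kernel exposes coef_, which RFE uses to rank the features.
    selector = RFECV(estimator=SVR(kernel="linear"), step=1, cv=cv)
    selector.fit(X, y)
    ranks = dict(zip(names, selector.ranking_.tolist()))
    selected = [n for n, keep in zip(names, selector.support_) if keep]
    return selected, ranks


rng = np.random.default_rng(0)
names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names
X = rng.normal(size=(120, len(names)))
# Synthetic target that depends only on the first two columns.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.05 * rng.normal(size=120)

selected, ranks = svr_rfecv(X, y, names)
```

As in Table 5, every retained feature has rank 1 in `ranks`; eliminated features receive larger ranks in the order they were dropped.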
Based on the above, the present invention selects the natural gas feature data using the variance selection method, the correlation coefficient method, the mutual information method, the embedded method based on the CART decision tree, and the wrapper method based on RFECV. The summary of the selected characteristic variables is shown in Table 6. The characteristic variables finally retained are highest temperature, lowest temperature, weather, wind direction and air index.
TABLE 6
Figure BDA0003619349710000121
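Keeping only the features that every method selects can be sketched as a set intersection. The example below uses only the two results stated in the text (CART and SVR-RFECV); the outputs of the other three methods appear only in Table 6 and would be added as further dictionary entries. The function and key names are illustrative assumptions.

```python
from functools import reduce


def common_features(selections):
    """Return features chosen by every method, in the order of the first method."""
    common = reduce(set.intersection, (set(s) for s in selections.values()))
    first = next(iter(selections.values()))
    return [f for f in first if f in common]


selections = {
    # Retained features as described for the CART and SVR-RFECV methods.
    "cart": ["max_temp", "min_temp", "weather", "wind_direction", "air_index"],
    "svr_rfecv": ["max_temp", "min_temp", "weather", "wind_direction",
                  "air_index", "air_quality"],
}
final_features = common_features(selections)
```

With these two inputs the intersection drops air quality, which is consistent with the five characteristic variables retained in the description.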
3) The processed natural gas load data set and the feature data set after feature selection are obtained through step 1) and step 2), respectively. These two data sets are used to construct a multi-feature LSTM model, which is then trained and used for prediction. The prediction results are observed, and the advantages of the multi-feature LSTM model in natural gas load prediction are analyzed.
The modeling flow of the multi-feature LSTM model is shown in fig. 7. Experiments were performed according to this flow.
① Normalizing input data
The input data of the invention consists of the feature data set and the natural gas load data. Before the LSTM model is built, the relationship between the natural gas load data and the characteristic variables can be inspected.
After this inspection, the input data is normalized. This embodiment uses Min-Max normalization, which scales the values of the feature data and the natural gas load data into the range 0 to 1.
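Min-Max normalization maps each value x to (x - min) / (max - min), computed per column. The sketch below is one possible NumPy implementation. The function names are illustrative, and the handling of constant columns and the inverse transform (useful for mapping predictions back to load units) are additions not described in the source.

```python
import numpy as np


def min_max_fit(data):
    """Return the per-column minimum and range of a 2-D array."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    # Constant columns get a range of 1 to avoid division by zero.
    span = np.where(span == 0, 1.0, span)
    return lo, span


def min_max_transform(data, lo, span):
    """Scale each column into the range 0 to 1."""
    return (data - lo) / span


def min_max_inverse(scaled, lo, span):
    """Map scaled values back to the original units."""
    return scaled * span + lo


data = np.array([[10.0, 200.0],
                 [20.0, 400.0],
                 [30.0, 300.0]])
lo, span = min_max_fit(data)
scaled = min_max_transform(data, lo, span)
restored = min_max_inverse(scaled, lo, span)
```

In practice the minimum and range would usually be fitted on the training portion only and reused for the test portion, so that test data does not influence the scaling.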
② Splitting data sets
The data set must be split before the model is trained. Based on past experience, this embodiment splits the data into a training set and a test set at a ratio of 8:2. The aim is to let the model learn better and thereby improve prediction accuracy.
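An 8:2 split can be sketched as below. Splitting in time order, with no shuffling, is an assumption: the description gives only the ratio, but keeping the later observations for testing is the usual choice for time series.

```python
def chronological_split(rows, train_ratio=0.8):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


# Ten placeholder observations, already sorted by time.
train, test = chronological_split(list(range(10)))
```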
③ Constructing feature data set and label data set
After the input data is split into a training set and a test set, a feature data set and a label data set are constructed for each. To reduce model run time, this embodiment processes the data in batches. The batch size is 12, that is, every 12 data points form one batch for training.
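One common way to build the feature and label data sets for an LSTM is a sliding window: each sample holds a fixed number of consecutive time steps, and its label is the load at the next step. The sketch below interprets the 12 data points as a look-back window of 12 steps, which is an assumption; the description could equally mean a mini-batch size of 12.

```python
import numpy as np


def make_windows(features, target, window=12):
    """Build X[i] = features[i:i+window] and y[i] = target[i+window]."""
    n = len(target) - window
    if n <= 0:
        raise ValueError("series is shorter than the window")
    X = np.stack([features[i:i + window] for i in range(n)])
    y = target[window:window + n]
    return X, y


rng = np.random.default_rng(0)
features = rng.random((20, 5))  # 20 time steps, 5 selected features (synthetic)
target = rng.random(20)         # synthetic normalized load
X, y = make_windows(features, target)
```

The resulting `X` has shape (samples, time steps, features), which is the layout recurrent layers in common deep learning libraries expect.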
④ Model construction, compilation and training
A multi-feature LSTM model is first constructed. In the experiment, the hidden layer has 256 neurons and the Dense output layer has 1 unit. The model is then compiled: Adam is selected as the optimizer and MAE as the evaluation index. Finally, the model is trained. In the experiment, the number of iterations is set to 10; fig. 8 shows the training curve of the multi-feature LSTM of this embodiment.
As can be seen from fig. 8, as the number of iterations increases, the loss curves approach each other and tend to converge. The model is therefore considered well trained and can be used for subsequent prediction.
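The configuration described above (256 LSTM units, a Dense layer with 1 unit, the Adam optimizer, MAE, 10 iterations) can be sketched with Keras as follows. The source does not name the framework, so Keras is an assumption, as are the input shape of 12 time steps by 5 features, the use of MAE as the training loss, the batch size and the 20% validation split. The data is random and serves only to show the shapes.

```python
import numpy as np
from tensorflow import keras

WINDOW, N_FEATURES = 12, 5  # assumed input shape, not stated in the source


def build_model(window=WINDOW, n_features=N_FEATURES):
    """Single-layer LSTM regressor following the stated hyperparameters."""
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        keras.layers.LSTM(256),  # hidden layer with 256 units
        keras.layers.Dense(1),   # one output: the predicted load
    ])
    # MAE is named in the text as the evaluation index; using it as the
    # loss function is an assumption of this sketch.
    model.compile(optimizer="adam", loss="mae")
    return model


rng = np.random.default_rng(0)
X = rng.random((64, WINDOW, N_FEATURES)).astype("float32")
y = rng.random((64, 1)).astype("float32")

model = build_model()
history = model.fit(X, y, epochs=10, batch_size=12,
                    validation_split=0.2, verbose=0)
pred = model.predict(X[:4], verbose=0)
```

Plotting `history.history["loss"]` against `history.history["val_loss"]` gives a training diagram of the kind the description refers to.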
Once the model has been constructed and trained, the natural gas load is predicted. The test data is fed into the model for prediction, and the prediction result is shown in fig. 9.
As can be seen from fig. 9, the model fits the natural gas load well and the prediction result is accurate. The calculated RMSE of the model is 0.058. Together with the preceding analysis, this shows that the multi-feature LSTM neural network achieves high prediction accuracy. Therefore, this model is adopted for prediction in the subsequent system implementation, providing managers with a scientific basis for scheduling and distributing natural gas.
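RMSE is the square root of the mean squared difference between actual and predicted values. A minimal sketch follows, using made-up numbers rather than the patent's data. The source does not state whether the reported 0.058 was computed on normalized or original load values.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Illustrative values only.
error = rmse([0.2, 0.4, 0.6], [0.2, 0.5, 0.5])
```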
Experiments show that the invention provides a better natural gas load prediction method, improves prediction accuracy, and facilitates the storage and scheduling of natural gas resources.
4) Constructing the natural gas load prediction prototype system
Through the above steps, the optimal natural gas load prediction model is obtained. The model is trained on historical natural gas load data provided by a provincial natural gas company, predicts the gas load for a certain period in the future, and is then applied in a natural gas load prediction prototype system. Step 4) mainly describes the implementation of this prototype system: the specific requirements of the system are analyzed, and the system is designed and implemented on that basis.
① Requirement analysis
In order to improve the storage safety of natural gas stations and to schedule and distribute natural gas effectively, a natural gas load prediction prototype system is constructed in light of the actual production situation of a provincial branch company, so that users can manage natural gas more easily.
The natural gas load prediction prototype system accurately characterizes the gas load of downstream customers and reasonably predicts their ultra-short-term, short-term and medium-term gas load. Through the system, the historical gas load data of the provincial branch company can be managed and future gas load can be predicted. The system gives the provincial branch company a decision basis for ensuring a sustainable natural gas supply, easing the imbalance between supply and demand, optimizing natural gas storage and transportation management, and achieving fine-grained management, control and pipe network planning of the natural gas storage and transportation system.
To meet the practical requirements of safe and efficient production at natural gas storage and transportation stations, and in light of the company's actual production situation, the station intelligent analysis and control system addresses technical problems that urgently need solving in the company's daily production, such as big data analysis and optimized control of the natural gas storage and transportation process. The system requirements analysis is shown in fig. 10.
The system is used by staff of the natural gas company, so users must log in before operating it. The natural gas load prediction system mainly comprises load prediction for each station, which consists of three parts: ultra-short-term (hourly) prediction, short-term (daily) prediction and medium-term (monthly) prediction.
Ultra-short-term load prediction: gives the gas load variation of the downstream gate stations in units of hours and displays the result as a graph in the terminal display layer.
Short-term load prediction: gives the gas load variation of the downstream gate stations in units of days and displays the result as a graph in the terminal display layer.
Medium-term load prediction: gives the gas load variation of the downstream gate stations in units of months and displays the result as a graph in the terminal display layer.
Ultra-short-term and short-term load prediction use an online prediction mode based on the LSTM neural network, and medium-term prediction uses an offline prediction mode.
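The three prediction granularities imply that raw readings are aggregated by hour, by day and by month before modeling. The sketch below shows one way to do this with the Python standard library. Summing the readings within each period, the horizon names and the sample readings are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical horizon names mapped to the time key used for grouping.
KEY_FORMATS = {
    "ultra_short": "%Y-%m-%d %H:00",  # hourly
    "short": "%Y-%m-%d",              # daily
    "medium": "%Y-%m",                # monthly
}


def aggregate_load(readings, horizon):
    """Sum (timestamp, load) readings per period for the given horizon."""
    fmt = KEY_FORMATS[horizon]
    totals = defaultdict(float)
    for timestamp, load in readings:
        totals[timestamp.strftime(fmt)] += load
    return dict(sorted(totals.items()))


readings = [  # made-up gate station readings
    (datetime(2022, 1, 1, 8, 15), 1.0),
    (datetime(2022, 1, 1, 8, 45), 2.0),
    (datetime(2022, 1, 1, 9, 5), 3.0),
    (datetime(2022, 1, 2, 0, 0), 4.0),
    (datetime(2022, 2, 1, 0, 0), 5.0),
]
hourly = aggregate_load(readings, "ultra_short")
daily = aggregate_load(readings, "short")
monthly = aggregate_load(readings, "medium")
```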
② System architecture design
The general architecture of the station intelligent analysis and control system is shown in fig. 11.
a) The equipment layer is responsible for collecting various detection data. The data is multi-source and heterogeneous; it includes signal data from valve chambers and flow control equipment, and acquisition interfaces are reserved for other types of data.
b) The data gateway layer caches and forwards the data.
c) The data interface layer and the storage layer perform data management and cleaning, and provide a convenient and efficient data access interface for the intelligent data analysis service.
d) The intelligent data analysis service layer is the business core of station intelligent analysis and control. It comprises functions such as data fusion, deep learning, statistical learning and process detection, and calls the relevant analysis models to analyze station operation data according to the data type and user requirements.
e) The visual interaction layer is based on a Web GIS and a lightweight BIM model engine. It displays the state and detection results of the pipe network stations in real time and provides decision support for managers at all levels of the enterprise.

Claims (10)

1. The natural gas load prediction method based on the characteristic engineering and the LSTM neural network is characterized by comprising the following processes:
preprocessing the acquired natural gas data set, removing default values and abnormal values, and processing the natural gas data set according to time characteristics to obtain a natural gas load data set;
screening characteristic factors which possibly influence the natural gas load prediction by utilizing characteristic engineering to obtain the characteristic factors which influence the natural gas load prediction;
screening the data of the natural gas load data set by using the characteristic factors, and selecting the natural gas load data set corresponding to the characteristic factors influencing natural gas load prediction to obtain a characteristic data set;
and processing the characteristic data set through a pre-established multi-characteristic LSTM model to obtain a natural gas load prediction value, so as to realize natural gas load prediction.
2. The natural gas load prediction method based on the characteristic engineering and the LSTM neural network as claimed in claim 1, wherein the natural gas load data set is obtained by processing the natural gas data set into ultra-short-term data, short-term data and medium-term data according to time characteristics.
3. The natural gas load prediction method based on the characteristic engineering and the LSTM neural network as claimed in claim 2, wherein the ultra-short-term data is hourly data, the short-term data is daily data and the medium-term data is monthly data.
4. The natural gas load prediction method based on the feature engineering and the LSTM neural network as claimed in claim 1, wherein when the feature engineering is used for screening the feature factors which may affect the natural gas load prediction, the feature factors which may affect the natural gas load prediction are respectively screened by using a variance selection method, a correlation coefficient method, a mutual information method, a CART selection method based on a decision tree and a feature selection method based on SVR-RFECV, and the feature factors which may affect the natural gas load prediction are screened according to the screening result.
5. The natural gas load prediction method based on the feature engineering and the LSTM neural network as claimed in claim 4, characterized in that after the feature factors possibly influencing the natural gas load prediction are respectively processed by using a variance selection method, a correlation coefficient method, a mutual information method, a decision tree CART selection method and an SVR-RFECV-based feature selection method, the common feature factors screened by the variance selection method, the correlation coefficient method, the mutual information method, the decision tree CART selection method and the SVR-RFECV-based feature selection method are used as the feature factors finally influencing the natural gas load prediction.
6. The natural gas load prediction method based on feature engineering and an LSTM neural network according to claim 2, wherein the feature factors influencing the natural gas load prediction comprise highest weather temperature, lowest weather temperature, weather conditions, wind direction and air index.
7. The method of claim 6 for natural gas load prediction based on feature engineering and LSTM neural networks, characterized in that the weather conditions comprise light rain-rain, light rain-rain, cloudy-light rain, light rain-heavy rain, heavy rain-light rain, light rain-fine, cloudy-light rain, cloudy-snow-entrained rain, light rain-cloudy, cloudy-cloudy, light rain-medium rain, cloudy-fine, cloudy-medium rain, fine-cloudy, fine-light rain, sleet-snow-cloudy, medium snow-cloudy, cloudy-fine, sleet-cloudy, small rain-light snow, heavy rain-rain, medium rain-light rain, fine-cloudy, medium rain-cloudy, fine, light rain, cloudy, and medium rain; the wind directions include southeast wind, southwest wind, northeast wind, northwest wind, and air index, east wind, south wind, and west wind.
8. Natural gas load prediction system based on characteristic engineering and LSTM neural network, characterized by comprising:
a data preprocessing module: used for preprocessing the acquired natural gas data set, removing default values and abnormal values, and then processing the natural gas data set according to time characteristics to obtain a natural gas load data set;
a first screening module: used for screening characteristic factors which possibly influence the natural gas load prediction by utilizing characteristic engineering to obtain the characteristic factors which influence the natural gas load prediction;
a second screening module: used for screening the data of the natural gas load data set by using the characteristic factors, and selecting the natural gas load data set corresponding to the characteristic factors influencing natural gas load prediction to obtain a characteristic data set;
a calculation module: used for processing the characteristic data set through a pre-established multi-characteristic LSTM model to obtain a natural gas load prediction value, so as to realize natural gas load prediction.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for natural gas load prediction based on feature engineering and LSTM neural networks of any of claims 1-7.
10. A storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method for natural gas load prediction based on feature engineering and LSTM neural networks according to any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210452587.4A CN114626640A (en) 2022-04-27 2022-04-27 Natural gas load prediction method and system based on characteristic engineering and LSTM neural network

Publications (1)

Publication Number Publication Date
CN114626640A true CN114626640A (en) 2022-06-14

Family

ID=81905981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210452587.4A Pending CN114626640A (en) 2022-04-27 2022-04-27 Natural gas load prediction method and system based on characteristic engineering and LSTM neural network

Country Status (1)

Country Link
CN (1) CN114626640A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704966A (en) * 2017-10-17 2018-02-16 华南理工大学 A kind of Energy Load forecasting system and method based on weather big data
CN110852496A (en) * 2019-10-29 2020-02-28 同济大学 Natural gas load prediction method based on LSTM recurrent neural network
US20200076196A1 (en) * 2018-08-28 2020-03-05 Johnson Controls Technology Company Building energy optimization system with a dynamically trained load prediction model
CN113326654A (en) * 2021-05-20 2021-08-31 北京市燃气集团有限责任公司 Method and device for constructing gas load prediction model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
由育阳: "《数据挖掘技术与应用》", 30 June 2021, 北京理工大学出版社, pages: 49 - 50 *
薛薇等: "《Python机器学习 数据建模与分析》", 31 March 2021, 机械工业出版社, pages: 280 - 281 *
谭洪卫: "《图说公共建筑能耗的数据挖掘与模型方法》", 31 July 2021, 同济大学出版社, pages: 78 *
谷宇: "《人工智能基础》", 31 January 2022, 机械工业出版社, pages: 41 - 42 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880959A (en) * 2022-07-11 2022-08-09 广东电网有限责任公司佛山供电局 Input variable acquisition method and system for building energy consumption hybrid model
CN114880959B (en) * 2022-07-11 2022-12-30 广东电网有限责任公司佛山供电局 Input variable acquisition method and system for building energy consumption hybrid model
CN116958482A (en) * 2023-06-16 2023-10-27 大拓(山东)物联网科技有限公司 BIM model light weight method and related equipment thereof
CN117610720A (en) * 2023-11-20 2024-02-27 武汉城市数字科技有限公司 Big data platform and neural network-based gas load prediction system

Similar Documents

Publication Publication Date Title
CN113962364B (en) Multi-factor power load prediction method based on deep learning
CN110135630B (en) Short-term load demand prediction method based on random forest regression and multi-step optimization
CN114626640A (en) Natural gas load prediction method and system based on characteristic engineering and LSTM neural network
CN111210093B (en) Daily water consumption prediction method based on big data
CN113205207A (en) XGboost algorithm-based short-term power consumption load fluctuation prediction method and system
CN110222882A (en) A kind of prediction technique and device of electric system Mid-long Term Load
CN112085285B (en) Bus load prediction method, device, computer equipment and storage medium
CN115860797B (en) Electric quantity demand prediction method suitable for new electricity price reform situation
CN110852496A (en) Natural gas load prediction method based on LSTM recurrent neural network
CN114330934A (en) Model parameter self-adaptive GRU new energy short-term power generation power prediction method
CN114444660A (en) Short-term power load prediction method based on attention mechanism and LSTM
CN115965110A (en) Accurate measurement and calculation method for enterprise energy consumption image and carbon emission facing industrial park
CN117194957A (en) Ultra-short-term prediction method based on satellite inversion radiation data technology
CN117494906B (en) Natural gas daily load prediction method based on multivariate time series
CN114662795A (en) Natural gas load prediction method and system based on EMD-ARIMA-LSTM model
CN114819395A (en) Industry medium and long term load prediction method based on long and short term memory neural network and support vector regression combination model
CN114239292A (en) Comprehensive evaluation method and system for low-carbon economic operation-oriented multifunctional demand park
CN113869633A (en) Power distribution network multi-source data quality control method
CN113537336A (en) XGboost-based short-term thunderstorm and strong wind forecasting method
Şişman A comparison of ARIMA and grey models for electricity consumption demand forecasting: The case of Turkey
CN109829115B (en) Search engine keyword optimization method
CN113780655A (en) Steel multi-variety demand prediction method based on intelligent supply chain
CN113723670A (en) Photovoltaic power generation power short-term prediction method with variable time window
Xiao et al. Research on fault-environment association rules of distribution network based on improved Apriori algorithm
Shendryk et al. Short-term Solar Power Generation Forecasting for Microgrid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination