CN114626640A

CN114626640A - Natural gas load prediction method and system based on feature engineering and LSTM neural network

Info

Publication number: CN114626640A
Application number: CN202210452587.4A
Authority: CN
Inventors: 边根庆; 周妮
Original assignee: Xian University of Architecture and Technology
Current assignee: Xian University of Architecture and Technology
Priority date: 2022-04-27
Filing date: 2022-04-27
Publication date: 2022-06-14

Abstract

The invention discloses a natural gas load prediction method and system based on feature engineering and an LSTM neural network. The method comprises the following processes: preprocessing the acquired natural gas data set by removing missing values and abnormal values, and then processing the natural gas data set according to time characteristics to obtain a natural gas load data set; screening the candidate feature factors that may influence natural gas load prediction by means of feature engineering to obtain the feature factors that influence natural gas load prediction; filtering the natural gas load data set by those feature factors, selecting the data corresponding to the feature factors that influence natural gas load prediction, to obtain a feature data set; and processing the feature data set with a pre-established multi-feature LSTM model to obtain a predicted natural gas load value, thereby realizing natural gas load prediction. The method can predict the natural gas load more accurately.

Description

Natural gas load prediction method and system based on feature engineering and LSTM neural network

Technical Field

The invention belongs to the technical field of prediction based on a machine learning model, and relates to a natural gas load prediction method and a natural gas load prediction system based on feature engineering and an LSTM neural network.

Background

As a clean and efficient energy source, natural gas not only produces lower greenhouse gas, sulfur dioxide and particulate matter emissions than coal and petroleum, but can also effectively compensate for the drawbacks of wind and solar energy, which are difficult to store and unstable in supply. Countries around the world are developing low-carbon economies, using clean energy and reducing atmospheric emissions. Against the broad background of the transformation of the global energy consumption structure, natural gas is favored and valued by all countries; the trend of replacing high-carbon, high-pollution coal with natural gas is irreversible, and natural gas plays an increasingly important role in global economic development and in the energy consumption structure.

Traditional natural gas load prediction methods mainly include the time series method, the regression analysis method, the grey prediction method and the like. These methods are mainly based on mathematical theories such as calculus and mathematical statistics: a mathematical relation among existing data is established through mathematical derivation, and future data are then calculated, analyzed and predicted according to that relation. Although these methods are simple to model, economical and practical, they are generally suited to cases with small sample sizes, few influencing factors and relatively simple relationships, and their accuracy in natural gas load prediction is poor.

Disclosure of Invention

The invention aims to solve the problems in the prior art and provides a natural gas load prediction method and system based on characteristic engineering and an LSTM neural network.

The technical scheme adopted by the invention is as follows:

The natural gas load prediction method based on feature engineering and an LSTM neural network comprises the following processes:

preprocessing the acquired natural gas data set by removing missing values and abnormal values, and then processing the natural gas data set according to time characteristics to obtain a natural gas load data set;

screening the candidate feature factors that may influence natural gas load prediction by means of feature engineering to obtain the feature factors that influence natural gas load prediction;

filtering the natural gas load data set by the feature factors, selecting the data corresponding to the feature factors that influence natural gas load prediction, to obtain a feature data set;

and processing the feature data set with a pre-established multi-feature LSTM model to obtain a predicted natural gas load value, thereby realizing natural gas load prediction.

Preferably, the natural gas data set is processed into ultra-short-term data, short-term data and medium-term data according to the time characteristics, so as to obtain the natural gas load data set.

Preferably, the ultra-short term data is hourly data, the short term data is daily data and the medium term data is monthly data.

Preferably, when the candidate feature factors that may influence natural gas load prediction are screened by means of feature engineering, they are screened using a variance selection method, a correlation coefficient method, a mutual information method, a decision-tree-based CART selection method and an SVR-RFECV-based feature selection method, and the feature factors that influence natural gas load prediction are determined from the screening results.

Preferably, after the candidate feature factors have been processed separately by the variance selection method, the correlation coefficient method, the mutual information method, the decision-tree-based CART selection method and the SVR-RFECV-based feature selection method, the feature factors selected in common by all five methods are taken as the final feature factors that influence natural gas load prediction.

Preferably, the feature factors influencing natural gas load prediction comprise the highest temperature, the lowest temperature, the weather condition, the wind direction and the air index.

Preferably, the weather conditions include light rain-cloudy, light rain-rainfall, cloudy-cloudy, cloudy-light rain, light rain-heavy rain, heavy rain-light rain, light rain-sunny, cloudy-light rain, cloudy-rainfall, cloudy-snow-mixed rain, light rain-cloudy, cloudy-cloudy, light rain-medium rain, cloudy-sunny, cloudy-medium rain, sunny-cloudy, sunny-light rain, sleet-snow-cloudy, snow-medium-cloudy, cloudy-sunny, snow-medium-cloudy, snow-medium-snow-cloudy, rain-medium-heavy snow-light, heavy rain-cloudy, rain-light rain, sunny-cloudy, rain-medium-cloudy, sunny, light rain, cloudy, and medium rain; the wind directions include southeast wind, southwest wind, northeast wind, northwest wind, east wind, south wind and west wind.

The invention also provides a natural gas load prediction system based on feature engineering and an LSTM neural network, which comprises the following modules:

a data preprocessing module, used for preprocessing the acquired natural gas data set by removing missing values and abnormal values, and then processing the natural gas data set according to time characteristics to obtain a natural gas load data set;

a first screening module, used for screening the candidate feature factors that may influence natural gas load prediction by means of feature engineering to obtain the feature factors that influence natural gas load prediction;

a second screening module, used for filtering the natural gas load data set by the feature factors, selecting the data corresponding to the feature factors that influence natural gas load prediction, to obtain a feature data set;

a calculation module, used for processing the feature data set with a pre-established multi-feature LSTM model to obtain a predicted natural gas load value, thereby realizing natural gas load prediction.

The present invention also provides an electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for natural gas load prediction based on feature engineering and LSTM neural networks of the present invention as described above.

The invention also provides a storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for natural gas load prediction based on feature engineering and LSTM neural networks of the present invention as described above.

Compared with the prior art, the invention has the following beneficial effects:

The method introduces feature engineering: the influencing factors are screened by feature engineering rather than manually, which improves prediction efficiency. The screened feature set is then fed into an LSTM model for prediction. Compared with a traditional time series model, the multi-feature LSTM neural network controls the flow of input data through its input gate, forget gate and output gate, and keeps the memory cell state separate from the output, so that important information is retained as the sequence is propagated and a longer-term memory of the sequence is maintained. The multi-feature LSTM neural network can therefore improve accuracy when predicting a load time series.
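The gate mechanism described above can be illustrated with a minimal NumPy sketch of a single LSTM time step. This is the standard textbook LSTM cell, not code taken from the invention; the function name `lstm_step`, the weight layout and the sizes (5 input features, 8 hidden units, 12 time steps) are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper).

    W: (4*H, D), U: (4*H, H), b: (4*H,), stacked in the order
    input gate, forget gate, candidate state, output gate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information enters
    f = sigmoid(z[H:2 * H])      # forget gate: how much old memory is kept
    g = np.tanh(z[2 * H:3 * H])  # candidate memory content
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much memory is exposed
    c_t = f * c_prev + i * g     # memory cell state, kept separate from the output
    h_t = o * np.tanh(c_t)       # hidden state passed to the next step
    return h_t, c_t

# Illustrative sizes only: 5 input features, 8 hidden units, 12 time steps.
rng = np.random.default_rng(0)
D, H = 5, 8
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(12, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state `c` is updated additively, which is what lets information persist over many steps, while the hidden state `h` is the gated view of it that is output.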

Drawings

FIG. 1 is an overall framework diagram of the method for natural gas load prediction based on feature engineering and an LSTM neural network according to the present invention;

FIG. 2 is a technical route diagram of the natural gas load prediction method based on feature engineering and an LSTM neural network according to the present invention;

FIG. 3 is a diagram illustrating the results of a variance selection method in an embodiment of the present invention;

FIG. 4 is a diagram of the correlation coefficient method results in an embodiment of the present invention;

FIG. 5 is a bar chart of the dependent variable and independent variable correlation coefficients in an embodiment of the present invention;

FIG. 6 is a diagram of the decision-tree-based CART selection method results in an embodiment of the present invention;

FIG. 7 is a flow chart of a natural gas load prediction multi-feature LSTM in an embodiment of the present invention;

FIG. 8 is a single feature LSTM neural network training diagram;

FIG. 9 is a multi-feature LSTM model prediction graph of the present invention in a validation test;

FIG. 10 is a system requirements analysis diagram of the present invention;

FIG. 11 is a general architecture of the intelligent analysis and control system of the present invention;

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings:

The natural gas load prediction method based on feature engineering and an LSTM neural network selects the main influencing factors of load prediction by screening feature variables, and then applies the selected feature variables to a multi-feature LSTM model, thereby improving the accuracy of natural gas load prediction. Improved prediction accuracy in turn allows a natural gas company to schedule and distribute natural gas better. The invention constructs a natural gas load prediction model by studying feature engineering and a multi-feature LSTM algorithm, and uses the model to predict the future natural gas load of a certain area, thereby demonstrating the mechanism, framework and effect of the machine learning algorithm.

The natural gas load prediction method based on the characteristic engineering and the LSTM neural network comprises the following steps:

1) preprocessing the natural gas load data: missing values and abnormal values are removed, and the natural gas data set is then processed according to its time characteristics into ultra-short-term (hourly) data, short-term (daily) data and medium-term (monthly) data, to facilitate subsequent prediction;

2) obtaining the feature factors that influence natural gas load prediction: feature selection eliminates irrelevant or redundant features and reduces the number of features used, which shortens model training time, helps avoid overfitting, improves the generalization capability of the model and thus improves its prediction accuracy;

3) the processed natural gas load data set is obtained in step 1), and the feature data set that influences load prediction after feature engineering screening is obtained in step 2); the two data sets are used to construct a multi-feature LSTM model, which is trained and used for prediction, and the advantages of the multi-feature LSTM model in natural gas load prediction are verified;

4) using the above research results, a prototype system for short-term natural gas load prediction is constructed in cooperation with natural gas storage and transportation enterprises, and its related technology and parameters are further optimized in terms of accuracy, generalization, efficiency and the like on the basis of on-site verification results.

In the screening process, the invention uses five methods: variance selection method, mutual information method, correlation coefficient method, decision tree CART selection method and feature selection method based on SVR-RFECV. Through the five methods, the most influential characteristic factors are screened out.

Examples

As shown in FIG. 1 and FIG. 2, the natural gas load prediction method based on feature engineering and an LSTM neural network of this embodiment includes the following steps:

1) preprocessing the obtained natural gas data set: missing values and abnormal values are removed to facilitate subsequent prediction;

The natural gas load data of this embodiment come from a provincial natural gas company. The data set covers the period from January 31, 2010 to December 31, 2020 at an interval of one day, i.e. one data point per day; because some data are missing, the data set contains 3987 samples.

Because the data provided by the gas company are disordered, they must be processed before the experiment: through data cleaning, the data are rechecked and verified, duplicate information is deleted, existing errors are corrected and data consistency is ensured.

After missing values and abnormal values have been handled, the natural gas load data are processed and divided according to time characteristics for use in subsequent prediction.
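A minimal pandas sketch of this preprocessing step is given below. The text does not state how abnormal values are detected, so the three-sigma rule used here is an assumption, as are the column name `load`, the helper name `preprocess_load` and the synthetic sample data.

```python
import numpy as np
import pandas as pd

def preprocess_load(df, col="load", k=3.0):
    """Drop missing values and outliers, then aggregate by day and by month.

    The k-sigma outlier rule is an assumption; the source does not name one.
    """
    clean = df.dropna(subset=[col])
    mean, std = clean[col].mean(), clean[col].std()
    clean = clean[(clean[col] - mean).abs() <= k * std]
    daily = clean[col].resample("D").mean().dropna()     # short-term (daily) data
    monthly = clean[col].resample("MS").mean().dropna()  # medium-term (monthly) data
    return daily, monthly

# Synthetic daily series with one missing value and one obvious outlier.
idx = pd.date_range("2010-01-31", periods=60, freq="D")
values = np.full(60, 100.0)
values[10] = np.nan      # missing record
values[20] = 10_000.0    # abnormal record
raw = pd.DataFrame({"load": values}, index=idx)

daily, monthly = preprocess_load(raw)
```

Hourly (ultra-short-term) data would be produced the same way with an hourly resampling rule, provided the raw records have sub-daily timestamps.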

2) Acquiring characteristic factors influencing natural gas load prediction:

In addition to the historical load values, the data set also contains features that may influence load fluctuations, such as temperature, weather, wind direction, wind power and air quality. The feature data were crawled from a weather website and cover January 31, 2010 to December 31, 2020; seven features were crawled: highest temperature, lowest temperature, weather, wind power, wind direction, air quality and air index.

For subsequent data analysis, the date-type variable "date", whose format is year-month-day, must be converted into a form that a machine learning model can process. It is therefore converted into a number of days, with January 31, 2010 taken as the common reference point.
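The date conversion can be sketched with the Python standard library as follows. The helper name `days_since_reference` is hypothetical; only the reference date of January 31, 2010 comes from the text.

```python
from datetime import date

# Common reference point stated in the embodiment.
REFERENCE = date(2010, 1, 31)

def days_since_reference(d):
    """Convert a calendar date into the number of days elapsed since REFERENCE."""
    return (d - REFERENCE).days

first_day = days_since_reference(date(2010, 1, 31))
next_day = days_since_reference(date(2010, 2, 1))
last_day = days_since_reference(date(2020, 12, 31))
```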

Because the natural gas load prediction models constructed subsequently operate on floating-point values rather than on vector-space distance measures, the discrete variables in the data set are given only a simple numeric coding. The coded values merely label the categories and carry no magnitude. Some variables in the data set have a large number of categories, and numeric coding avoids the large feature space that one-hot coding would produce.

Therefore, the crawled data are first standardized, and numeric coding is applied to the three features weather, wind direction and air quality. The coding results are as follows:

The variable weather, i.e. the weather condition of the day, includes 31 categories, and the weather code table is shown in Table 1.

TABLE 1

The variable wind direction, i.e. the wind direction condition of the current day, includes 7 categories, and the wind direction code table is shown in table 2.

TABLE 2

The variable air quality, i.e. the air quality status of the day, comprises 5 categories, and the air quality code table is shown in table 3.

TABLE 3
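A simple numeric coding of a discrete variable can be sketched as follows. The actual codes of Tables 1 to 3 are not reproduced in this text, so the code table built here (integers assigned in order of first appearance) and the helper name `build_code_table` are purely illustrative assumptions.

```python
def build_code_table(values):
    """Map each distinct category to an integer code in order of first appearance."""
    table = {}
    for v in values:
        if v not in table:
            table[v] = len(table)
    return table

# Illustrative wind direction records; the real code tables are not available here.
wind = ["east wind", "south wind", "east wind", "northwest wind"]
table = build_code_table(wind)
codes = [table[v] for v in wind]
```

Each category occupies a single integer column, whereas one-hot coding of the 31 weather categories would add 31 columns.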

Features are then selected from the coded feature data. The feature engineering methods used by the invention are mainly of five types: a variance selection method, a correlation coefficient method, a mutual information method, a decision-tree-based CART selection method, and an SVR-RFECV-based feature selection method.

① Variance selection method

Feature data are first selected using the variance selection method, which calculates the variance of each feature variable in the feature set and then selects feature variables according to their variance. The selection result is shown in FIG. 3.

As can be seen from FIG. 3, the variance of the feature variable wind power is the smallest and is close to 0. The wind power values are therefore almost constant across samples, so that, with the other feature variables held fixed, the natural gas load data hardly change with wind power, and the feature variable wind power can be eliminated. The feature variable air quality is eliminated for the same reason. The variances of the feature variables weather, lowest temperature and highest temperature are large, and these variables should be retained. Although the variances of the feature variables air index and wind direction are not large, they are above 1, which indicates that the sample values of these variables still fluctuate; they are analyzed further by the subsequent feature selection methods.
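The variance selection step can be sketched in NumPy as below. The sample values are invented, and the threshold of 1.0 is an assumption loosely based on the remark that variances above 1 still indicate fluctuation; the helper name `variance_select` is hypothetical.

```python
import numpy as np

def variance_select(X, names, threshold):
    """Return per-feature variances and the names whose variance exceeds threshold."""
    variances = X.var(axis=0)
    kept = [name for name, v in zip(names, variances) if v > threshold]
    return dict(zip(names, variances)), kept

# Invented sample: a strongly varying temperature and an almost constant wind power.
names = ["max_temp", "wind_power"]
X = np.array([
    [30.0, 1.0],
    [10.0, 1.0],
    [20.0, 1.0],
    [0.0, 2.0],
])
variances, kept = variance_select(X, names, threshold=1.0)
```

scikit-learn's `VarianceThreshold` transformer provides the same filter for use inside a pipeline.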

② Correlation coefficient method

When the variance selection method is used for feature selection, only the data characteristics of each feature variable are considered, and the correlation between feature variables is not. To screen the degree of correlation among the natural gas feature variables more objectively, a correlation coefficient method is adopted to select the feature data.

The correlation coefficient method calculates the degree of correlation between feature variables and judges whether one feature variable is related to another. There are three common correlation coefficients: the Pearson product-moment correlation coefficient, the Spearman rank correlation coefficient and the Kendall rank correlation coefficient. Compared with the other two, the Pearson product-moment correlation coefficient has stricter conditions of use and is suited to linearly correlated data; the Spearman rank correlation coefficient has a wider range of application but lower statistical efficiency than the Pearson coefficient; the Kendall rank correlation coefficient is a rank-based measure of the correlation between feature variables and is suited to the case in which both feature variables are ordered categorical variables.

Since the embodiment data contain many discrete variables, the Spearman correlation coefficient is used to measure the degree of correlation between variables. According to the usual criterion, two variables are considered highly correlated when the absolute value of the correlation coefficient is 0.8 or more, and very weakly correlated or uncorrelated when it is 0.1 or less. To prevent multicollinearity among the variables from adversely affecting the model results, the correlation between the independent variables is first examined with Spearman correlation coefficients, as shown in FIG. 4.

FIG. 4 shows the degree of correlation between the feature variables. The most strongly correlated pair is the highest temperature and the lowest temperature, with a correlation coefficient of 0.91. The correlation between the highest temperature and the weather is extremely weak, with a correlation coefficient whose absolute value is smaller than 0.1; however, the variance selection method showed that both the highest temperature and the weather have large variances, so their values fluctuate and will have some influence on subsequent prediction, and they are therefore considered for retention. Overall, the correlation values of wind power, wind direction and weather are very low, and their elimination is considered.

FIG. 5 shows the correlation between each feature variable and the natural gas load data. As can be seen from the figure: the correlation coefficient between the feature variable air index and the natural gas load data is the largest, at 0.3, and is positive, indicating that fluctuations in the natural gas load data can be influenced by the air index; the highest temperature and the lowest temperature are negatively correlated with the natural gas load data, so changes in either temperature influence the natural gas load data; the feature variables wind direction and weather are positively correlated with the natural gas load data, so changes in wind direction and weather also influence it; the correlation coefficient between the feature variable wind power and the natural gas load data is the lowest, indicating that wind power has very little influence on fluctuations in the natural gas load.
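The two checks described above, correlation among the feature variables and correlation of each feature variable with the load, can be sketched with pandas. The five-row data frame is invented for illustration and the column names are assumptions; only the Spearman coefficient and the 0.8 threshold come from the text.

```python
import itertools
import pandas as pd

# Invented sample: both temperatures fall while the load rises.
df = pd.DataFrame({
    "max_temp": [30, 25, 18, 10, 2],
    "min_temp": [20, 17, 9, 1, -6],
    "wind_power": [2, 3, 1, 3, 2],
    "load": [100, 120, 180, 260, 400],
})

# Spearman rank correlation matrix over all columns.
corr = df.corr(method="spearman")

# Feature pairs regarded as highly correlated (|rho| >= 0.8), a multicollinearity check.
features = ["max_temp", "min_temp", "wind_power"]
high_pairs = [
    (a, b)
    for a, b in itertools.combinations(features, 2)
    if abs(corr.loc[a, b]) >= 0.8
]

# Correlation of every feature with the load, as plotted in a bar chart.
load_corr = corr["load"].drop("load")
```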

③ Mutual information method

The correlation coefficient method judges the correlation between feature variables across the whole feature data set, whereas the mutual information method evaluates the dependency between each feature variable in the feature data set and the natural gas load data. That is, given a feature variable, it determines how strongly the natural gas load data depend on that variable. In short, knowing a feature variable increases the certainty about the gas load, and this increase in certainty is the amount of information. The mutual information value used here lies in the range [0, 1]: the smaller the value, the weaker the dependency between the two variables, and the larger the value, the stronger the dependency.

In this embodiment, the mutual information method is implemented in Python. The result of applying mutual information selection to the feature variables of the feature data set is shown in Table 4:

TABLE 4

As can be seen from Table 4, the mutual information coefficient of the feature variable air index is 0.91, which indicates that the dependency between the natural gas load and the air index is very strong and that differences in the air index strongly influence fluctuations in the natural gas load data. The mutual information values of the feature variables highest temperature, lowest temperature, weather and wind direction are all above 0.4, which shows that these four feature variables influence the natural gas load to different degrees. The mutual information value of the feature variable wind power, however, is 0.31; compared with the other feature variables, wind power has little relation to fluctuations in the natural gas load, and this feature variable is removed.
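One possible way to compute such scores in Python is scikit-learn's `mutual_info_regression`, sketched below on synthetic data. The text does not say which implementation was used. The scikit-learn estimate is not bounded by 1, so dividing by the largest score to obtain values in [0, 1] is an assumption, not necessarily how the values of Table 4 were produced.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Synthetic data: the load depends on air_index but not on wind_power.
rng = np.random.default_rng(0)
n = 500
air_index = rng.normal(size=n)
wind_power = rng.normal(size=n)
load = 2.0 * air_index + 0.1 * rng.normal(size=n)

X = np.column_stack([air_index, wind_power])
mi = mutual_info_regression(X, load, random_state=0)

# Assumed rescaling so that the strongest feature scores 1.
scaled = mi / mi.max()
```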

④ CART selection method based on decision tree

The first three feature selection methods in this embodiment are all filter-type feature selection methods, which select features according to statistics calculated from the data and do not involve a machine learning algorithm. The embedded method, by contrast, selects features on the basis of a machine learning algorithm, and its selection result is superior to that of the filter method. Decision trees are often used to solve regression and classification problems. Decision tree algorithms fall mainly into three categories: (1) the ID3 algorithm; (2) the C4.5 algorithm; (3) the CART algorithm. Because the requirements and ranges of application of the ID3 and C4.5 algorithms are too strict for selecting these feature variables, this embodiment selects the natural gas feature variables using CART.

In this embodiment, decision-tree-based feature variable selection is implemented in Python. Each feature variable is assigned a value that represents its importance to the model, and the values of all feature variables sum to 1. The selection result is shown in FIG. 6.

As can be seen from FIG. 6, the feature variables highest temperature and lowest temperature are the most important to the model, with values above 0.3, and should be retained. The feature variables weather, wind direction and air index are the next most important, with values above 0.13, which indicates that the model depends on them, and they should also be retained. The values of the feature variables wind power and air quality are below 0.05 and close to 0, which shows that they have little influence in model training, and they should be removed.
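A sketch of CART-based importance scoring with scikit-learn's `DecisionTreeRegressor`, which implements an optimized CART, is shown below. The synthetic data, the tree depth and the use of 0.05 as a cut-off are illustrative assumptions; the text only reports that features scoring below 0.05 were removed.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the load depends on max_temp and not on wind_power.
rng = np.random.default_rng(0)
names = ["max_temp", "wind_power"]
X = rng.normal(size=(300, 2))
y = -3.0 * X[:, 0] + 0.01 * rng.normal(size=300)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# Impurity-based importances; scikit-learn normalizes them to sum to 1.
importance = dict(zip(names, tree.feature_importances_))
kept = [name for name, value in importance.items() if value >= 0.05]
```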

⑤ Feature selection method based on SVR-RFECV

Unlike the filter method and the embedded method, which solve the problem with a single training run, the wrapper method trains repeatedly on feature subsets. In this process an objective function is often used to help select features, and the most typical is recursive feature elimination (RFE). RFE is a greedy optimization algorithm: a machine learning model is trained repeatedly, the least important feature is removed after each training round, and the next round is carried out on the new feature set. However, setting the RFE parameter n_features_to_select involves some guesswork: if the value is too small, strongly relevant features may be removed, causing information loss; if it is too large, irrelevant features are retained, causing information redundancy. In variable selection, RFE is therefore usually combined with K-fold cross-validation to find the best feature set, which addresses this drawback. The invention selects features using the classical SVR-RFE algorithm combined with cross-validation (SVR-RFECV).

In this embodiment, recursive elimination is implemented in Python by calling the sklearn machine learning package; n_features_to_select specifies the number of features of the natural gas feature data set to retain, and the number of cross-validation folds is left at the default of 5. The experimental results of SVR-RFECV selection are shown in Table 5:

TABLE 5

The metric "feature rank" in Table 5 is the rank assigned to each feature: the lower the rank, the more important the feature is to the model, and a rank of 1 indicates that the variable is a retained feature that can be used in subsequent model construction. As can be seen from the table, the SVR-RFECV method selects six variables: highest temperature, lowest temperature, weather, wind direction, air index and air quality.
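A minimal scikit-learn sketch of SVR-based recursive feature elimination with 5-fold cross-validation follows. It runs on synthetic data and is not the embodiment's script. Note that in scikit-learn `n_features_to_select` is a parameter of `RFE`, while `RFECV` takes `min_features_to_select` and chooses the final number of features by cross-validation; the linear kernel is an assumption, chosen because RFE needs the coefficients that only a linear SVR exposes.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

# Synthetic data: only the first two of four features drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Linear-kernel SVR exposes coef_, which RFECV uses to rank features.
selector = RFECV(
    estimator=SVR(kernel="linear"),
    step=1,                    # remove one feature per round
    cv=5,                      # 5-fold cross-validation
    min_features_to_select=1,
)
selector.fit(X, y)

ranking = selector.ranking_    # rank 1 marks a retained feature
support = selector.support_    # boolean mask of retained features
```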

Based on the above research, the invention selects the natural gas feature data using the variance selection method, the correlation coefficient method, the mutual information method, the decision-tree-based CART embedded method and the RFECV-based wrapper method; the summary of the feature variable selection results is shown in Table 6. The feature variables retained are the highest temperature, the lowest temperature, the weather, the wind direction and the air index.

TABLE 6
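Taking the features common to all methods amounts to a set intersection, sketched below. Table 6 is not reproduced in this text, so the sketch includes only the three methods whose retained features are stated explicitly in the description (variance selection, CART and SVR-RFECV); the identifier names are assumptions.

```python
# Retained features per method, as stated in the description above.
selected = {
    "variance": {"max_temp", "min_temp", "weather", "wind_direction", "air_index"},
    "cart": {"max_temp", "min_temp", "weather", "wind_direction", "air_index"},
    "svr_rfecv": {
        "max_temp", "min_temp", "weather", "wind_direction", "air_index", "air_quality",
    },
}

# Features kept by every method listed here.
common = set.intersection(*selected.values())
```

With the remaining two methods added as further entries, the same intersection would give the final feature set.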

3) The processed natural gas load data set and the feature data set after feature selection are obtained in step 1) and step 2) respectively. They are used to construct a multi-feature LSTM model, which is trained and used for prediction; the prediction results are observed, and the advantages of the multi-feature LSTM model in natural gas load prediction are analyzed;

The flow for modeling the multi-feature LSTM model is shown in FIG. 7. Experiments were performed according to this modeling flow.

① Normalizing input data

The input data of the invention consist of the feature data set and the natural gas load data. Before the LSTM model is built, the relationship between the natural gas load data and the feature variables may be observed.

After this observation, the input data are normalized. In this embodiment, Min-Max normalization is used. Its purpose is to scale the values of the feature data and the natural gas load data so that they lie between 0 and 1.
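Min-Max normalization maps each column x to (x - min) / (max - min). A minimal NumPy sketch follows; the helper name `min_max_scale`, the handling of constant columns and the sample values are assumptions.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1]; also return the column minima and maxima."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span, lo, hi

# Invented sample: a temperature column and a load column.
X = np.array([
    [10.0, 200.0],
    [20.0, 400.0],
    [30.0, 300.0],
])
scaled, lo, hi = min_max_scale(X)
```

Keeping `lo` and `hi` allows predictions to be mapped back to the original units with `scaled * (hi - lo) + lo`.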

② Splitting data sets

The data set needs to be split before the model is trained. Based on past experience, the ratio of the training set to the test set in this embodiment is 8:2. The split allows the model to learn from most of the data while the remainder is held out for evaluating the prediction accuracy.
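An 8:2 split can be sketched as follows. The sketch assumes that the records are kept in time order and are not shuffled, which is common for time series but is not stated in the embodiment; the function name is illustrative.

```python
def split_train_test(rows, train_ratio=0.8):
    """Split time-ordered rows into a leading training part and a trailing test part."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


rows = list(range(10))  # stand-in for ten time-ordered records
train_rows, test_rows = split_train_test(rows)
```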

③ Constructing feature data set and label data set

After the split, the input data is divided into a training set and a test set. A feature data set and a label data set are then constructed for each of the two sets. To reduce the running time of the model, the present embodiment processes the data in batches. The batch size is 12, that is, every 12 data points constitute one batch for training.
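One possible way to build the feature and label data sets and group them into batches of 12 is sketched below. It rests on assumptions: each sample is a sliding window of past rows whose label is the load of the next time step, the window length `lookback` is a hypothetical parameter not given in the embodiment, and the load is assumed to sit in column 0 of each row.

```python
def make_supervised(rows, lookback, target_index=0):
    """Build (window, next-step load) pairs from time-ordered rows."""
    features, labels = [], []
    for start in range(len(rows) - lookback):
        features.append(rows[start:start + lookback])
        labels.append(rows[start + lookback][target_index])
    return features, labels


def make_batches(features, labels, batch_size=12):
    """Group consecutive samples into batches; the last batch may be smaller."""
    return [
        (features[i:i + batch_size], labels[i:i + batch_size])
        for i in range(0, len(features), batch_size)
    ]


# Illustrative rows: [load, one feature]; not data from the embodiment.
rows = [[float(i), float(i) * 2.0] for i in range(30)]
features, labels = make_supervised(rows, lookback=3)
batches = make_batches(features, labels, batch_size=12)
```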

④ Model construction, compilation and training

A multi-feature LSTM model is constructed first. In the experiment, the hidden layer is set to 256 neurons and the Dense (fully connected output) layer to 1 unit. The model is then configured in the compilation step: Adam is selected as the optimizer and MAE as the evaluation index of the model. Finally, the model is trained. In the experiment, the number of iterations is set to 10; fig. 8 is the training diagram of the multi-feature LSTM of the present embodiment.
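To give a sense of the size of such a model, the number of trainable parameters of one LSTM layer followed by one Dense layer can be estimated as below. The sketch assumes the common LSTM formulation with four gates and a single bias vector per gate, and an input of 6 values per time step (the five retained features plus the load); the input width is an assumption, since the embodiment does not state it.

```python
def lstm_param_count(input_dim, units):
    """Parameters of one LSTM layer: 4 gates, each with input weights,
    recurrent weights and one bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(input_dim, outputs):
    """Parameters of a fully connected layer: weights plus biases."""
    return input_dim * outputs + outputs


HIDDEN_UNITS = 256  # from the embodiment
DENSE_OUTPUTS = 1   # from the embodiment
N_INPUTS = 6        # ASSUMPTION: five retained features plus the load value

lstm_params = lstm_param_count(N_INPUTS, HIDDEN_UNITS)
total_params = lstm_params + dense_param_count(HIDDEN_UNITS, DENSE_OUTPUTS)
print(lstm_params, total_params)
```

Under these assumptions the LSTM layer accounts for almost all of the parameters, so the hidden size of 256 is the main driver of training cost.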

As can be seen from fig. 8, as the number of iterations increases, the loss curves in the training diagram approach each other and tend to converge. The model is therefore considered satisfactory and can be used for subsequent prediction.

After the model has been constructed and trained, the natural gas consumption load is predicted. The test data is fed into the model for prediction, and the prediction result is shown in fig. 9.

As can be seen from fig. 9, when the model is used for prediction, the predicted values fit the natural gas consumption load well. The RMSE of the model is calculated to be 0.058. Together with the preceding analysis, this shows that the multi-feature LSTM neural network achieves high prediction accuracy. Therefore, the model is adopted for prediction in the subsequent system implementation, providing a scientific basis for managers in scheduling and distributing natural gas.
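The two error measures mentioned in this embodiment, MAE and RMSE, can be computed as sketched below. The function names and the sample values are illustrative only; the values are not data from the experiment.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [0.0, 0.0]     # illustrative values
predicted = [3.0, 4.0]  # illustrative values
example_mae = mae(actual, predicted)
example_rmse = rmse(actual, predicted)
```

RMSE penalizes large individual errors more strongly than MAE, so the two values differ whenever the errors are not all of the same magnitude.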

Through experiments, the invention can obtain a better natural gas load prediction method, improve the prediction accuracy and facilitate the storage and scheduling of natural gas resources.

4) Constructing a natural gas load prediction prototype system

Through the above steps, the optimal natural gas consumption load prediction model is obtained. The model is trained on the historical natural gas consumption load data provided by a natural gas company of a certain province, predicts the gas consumption load for a certain period in the future, and is then applied to the natural gas load prediction prototype system. Step 4) mainly describes the implementation of the natural gas consumption load prediction prototype system: the specific requirements of the system are analyzed, and the system is designed and implemented on that basis.

① Requirement analysis

In order to improve the storage safety of a natural gas station and realize effective scheduling and distribution of natural gas, a natural gas load forecasting prototype system is constructed by combining the actual production status of a certain provincial branch company so as to facilitate the management of natural gas by users.

The natural gas load forecasting prototype system accurately depicts the gas load of downstream customers and reasonably forecasts the ultra-short-term, short-term and medium-term gas load of the downstream customers. Through the system, the historical gas consumption load data of a certain provincial branch company can be managed, and the future gas consumption load can be predicted. The system provides decision basis for effectively ensuring sustainable natural gas supply, relieving supply and demand contradiction, optimizing natural gas storage and transportation management, and realizing fine management control and pipe network planning of a natural gas storage and transportation system for a certain provincial branch company.

To meet the practical requirements of safe and efficient production at natural gas storage and transportation stations, and in view of the actual production situation of a certain company, the intelligent analysis and control system of the stations addresses technical problems urgently needing solution in the daily production of that company, such as big data analysis and optimized control of the natural gas storage and transportation process. The system requirement analysis is shown in fig. 10.

The system is used by workers of a natural gas company, so that a user needs to log in and operate the system. The natural gas load prediction system mainly comprises load prediction of each station. The load prediction of each station consists of three parts: ultra-short term (by hour) forecast, short term (by day) forecast, medium term (by month) forecast.

Ultra-short-term load prediction: gives the gas load change of the downstream gate station in units of hours and displays the result as a graph on the terminal display layer.

Short-term load prediction: gives the gas load change of the downstream gate station in units of days and displays the result as a graph on the terminal display layer.

Medium-term load prediction: gives the gas load change of the downstream gate station in units of months and displays the result as a graph on the terminal display layer.

Ultra-short-term and short-term load prediction adopts an online prediction mode based on the LSTM neural network, and medium-term load prediction adopts an offline prediction mode.
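The three prediction horizons operate on hourly, daily and monthly data. One way to derive daily and monthly series from hourly readings is sketched below. It assumes that loads are aggregated by summation, which the description does not specify; the function name and sample readings are illustrative.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_load(readings, key_format):
    """Sum (timestamp, load) readings into periods named by a strftime format."""
    totals = defaultdict(float)
    for timestamp, load in readings:
        totals[timestamp.strftime(key_format)] += load
    return dict(totals)


# Illustrative hourly readings; not data from the embodiment.
hourly = [
    (datetime(2022, 1, 1, 0), 1.0),
    (datetime(2022, 1, 1, 1), 2.0),
    (datetime(2022, 1, 2, 0), 4.0),
    (datetime(2022, 2, 1, 0), 8.0),
]

daily = aggregate_load(hourly, "%Y-%m-%d")  # short-term (by day) series
monthly = aggregate_load(hourly, "%Y-%m")   # medium-term (by month) series
```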

② System architecture design

The general architecture of the station intelligent analysis and control system is shown in fig. 11.

a) The equipment layer is responsible for collecting the various detection data. The data are multi-source and heterogeneous and include signal data from valve rooms and flow control equipment; acquisition interfaces for other types of data are reserved.

b) The data gateway layer caches and forwards the data.

c) The data interface layer and the storage layer complete data management and cleaning functions, and meanwhile, a convenient and efficient data access interface is provided for data intelligent analysis service.

d) The intelligent data analysis service layer is a business core for intelligent analysis and control of the station, is composed of functions of data fusion, deep learning, statistical learning, process detection and the like, and calls related analysis models to analyze station operation data according to different data types and user requirements.

e) The visual interaction layer is based on a Web GIS and a BIM model lightweight engine, so that the real-time display of the state and the detection result of the pipe network station is realized, and auxiliary decision support is provided for managers at all levels of an enterprise.

Claims

1. The natural gas load prediction method based on the characteristic engineering and the LSTM neural network is characterized by comprising the following processes:

and processing the characteristic data set through a pre-established multi-characteristic LSTM model to obtain a natural gas load prediction value, so as to realize natural gas load prediction.

2. The natural gas load forecasting method based on the characteristic engineering and the LSTM neural network as claimed in claim 1, wherein the natural gas consumption load data set is obtained by processing a natural gas data set into ultra-short term data, short term data and medium term data according to time characteristics.

3. The natural gas load prediction method based on feature engineering and an LSTM neural network as claimed in claim 2, wherein the ultra-short term data is hourly data, the short term data is daily data and the medium term data is monthly data.

4. The natural gas load prediction method based on the feature engineering and the LSTM neural network as claimed in claim 1, wherein when the feature engineering is used for screening the feature factors which may affect the natural gas load prediction, the feature factors which may affect the natural gas load prediction are respectively screened by using a variance selection method, a correlation coefficient method, a mutual information method, a CART selection method based on a decision tree and a feature selection method based on SVR-RFECV, and the feature factors which may affect the natural gas load prediction are screened according to the screening result.

5. The natural gas load prediction method based on the feature engineering and the LSTM neural network as claimed in claim 4, characterized in that after the feature factors possibly influencing the natural gas load prediction are respectively processed by using a variance selection method, a correlation coefficient method, a mutual information method, a decision tree CART selection method and an SVR-RFECV-based feature selection method, the common feature factors screened by the variance selection method, the correlation coefficient method, the mutual information method, the decision tree CART selection method and the SVR-RFECV-based feature selection method are used as the feature factors finally influencing the natural gas load prediction.

6. The natural gas load prediction method based on feature engineering and an LSTM neural network according to claim 2, wherein the feature factors influencing the natural gas load prediction comprise highest weather temperature, lowest weather temperature, weather conditions, wind direction and air index.

7. The method of claim 6 for natural gas load prediction based on feature engineering and LSTM neural networks, characterized in that the weather conditions comprise light rain-rain, light rain-rain, cloudy-light rain, light rain-heavy rain, heavy rain-light rain, light rain-fine, cloudy-light rain, cloudy-snow-entrained rain, light rain-cloudy, cloudy-cloudy, light rain-medium rain, cloudy-fine, cloudy-medium rain, fine-cloudy, fine-light rain, sleet-snow-cloudy, medium snow-cloudy, cloudy-fine, sleet-cloudy, small rain-light snow, heavy rain-rain, medium rain-light rain, fine-cloudy, medium rain-cloudy, fine, light rain, cloudy, and medium rain; the wind directions include southeast wind, southwest wind, northeast wind, northwest wind, and air index, east wind, south wind, and west wind.

8. Natural gas load prediction system based on characteristic engineering and LSTM neural network, characterized by comprising:

a data preprocessing module: used for preprocessing the acquired natural gas data set, eliminating default values and abnormal values, and then processing the natural gas data set according to time characteristics to obtain a natural gas load data set;

a second screening module: used for screening the data of the natural gas load data set by using the characteristic factors, and selecting the natural gas load data set corresponding to the characteristic factors influencing natural gas load prediction to obtain a characteristic data set;

9. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for natural gas load prediction based on feature engineering and LSTM neural networks of any of claims 1-7.

10. A storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method for natural gas load prediction based on feature engineering and LSTM neural networks according to any one of claims 1 to 7.