CN114298411A

CN114298411A - GDP prediction method based on N-BEATS

Info

Publication number: CN114298411A
Application number: CN202111628933.1A
Authority: CN
Inventors: 刘嘉; 王昊文; 彭奕琦; 赵涛; 张磊; 张永平; 孙博伟; 孟冲; 赵云飞; 梁鄯文
Original assignee: Renmu Consulting Beijing Co ltd
Current assignee: Renmu Consulting Beijing Co ltd
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2022-04-08

Abstract

The invention discloses a GDP prediction method based on N-BEATS, comprising the following steps: downloading an original GDP data set from a data platform; performing data preprocessing on the original GDP data set to obtain a preprocessed GDP data set; performing feature selection on the preprocessed GDP data set and extracting its data features; establishing an N-BEATS model; training the N-BEATS model on the preprocessed GDP data set to obtain a trained N-BEATS model; and, based on the original GDP data set, predicting the target GDP with the trained N-BEATS model. The model is easier to train and improves prediction accuracy while providing interpretable output. It uses as little prior knowledge as possible, which greatly reduces the data volume of the training set and shortens training time, and its prediction efficiency is superior to that of LSTM and SVM models.

Description

GDP prediction method based on N-BEATS

Technical Field

The invention relates to the technical field of computer prediction models, in particular to a GDP prediction method based on N-BEATS.

Background

GDP (Gross Domestic Product) refers to the final result of the production activities of all resident units of a country or region during a given period. GDP is not only a core indicator in national economic accounting but also one of the indicators used to measure the economic condition and development level of a country or region in a given period. Governments and enterprises can also refer to the GDP growth rate when carrying out macroeconomic regulation and adjusting investment and financing decisions. GDP prediction is therefore an important research topic in many areas of macroeconomics. At present, the domestic and international economic situation is increasingly complex and changeable. Under China's supply-side structural reform, economic growth is shaped by structural characteristics of the economy, while the friction and competition that keep emerging in international trade make China's economic growth highly uncertain. Together, these factors greatly increase the difficulty of GDP prediction.

To keep GDP prediction accurate under complex conditions, a suitable prediction method is needed to build a reasonable prediction model. Existing GDP prediction models fall roughly into two classes. The first class consists of linear models, such as the Bayesian vector autoregressive model (BVAR) and the autoregressive integrated moving average model (ARIMA). The BVAR model builds on the vector autoregressive (VAR) model and uses Bayesian techniques to combine sample information with prior information, yielding an improved prediction model. The ARIMA model consists of three components: an autoregressive model (AR), a differencing component (I) and a moving average model (MA). ARIMA is an important model for describing the random process underlying time series data: the I component removes trends so that the series becomes stationary, which makes it easier for the AR component to fit the series, while the MA component is a regression equation that models the system noise.
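The trend-removing role of the differencing (I) component can be illustrated with a minimal sketch. The function name `difference` and the sample values are hypothetical and are not taken from the invention or from real GDP figures.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(order):
        # Each pass shortens the series by one element.
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Hypothetical series with a purely linear trend (placeholder values).
trend_series = [100.0, 110.0, 120.0, 130.0, 140.0]

print(difference(trend_series))           # constant increments: the trend is gone
print(difference(trend_series, order=2))  # second difference of a linear trend
```

A series whose first difference is roughly constant is a candidate for an ARIMA model with one order of differencing. Real GDP series rarely behave this cleanly, which is the limitation discussed for ARIMA in this background section.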

The second class consists of nonlinear models, including artificial neural network models (ANN), long short-term memory neural network models (LSTM) and support vector machine models (SVM). These models apply nonlinear methods from the field of artificial intelligence and predict GDP by processing high-frequency, multidimensional and nonlinear data. The prediction error obtained when predicting GDP with a neural network model is about 25% lower than that of a linear model. GDP prediction must take into account the temporal relationships present in economic data, and the conventional ANN model cannot reflect the temporal relationship between samples in its computation. The LSTM model, by contrast, can reflect this relationship. It consists of a memory gate, a forget gate and an output gate: the memory gate determines whether information is retained; the forget gate reduces redundant information during data processing, which improves processing speed; and the output gate outputs the current state value and the hidden state. While processing data, the LSTM model processes the information of the previous state and passes it on to the next state. Thanks to this mechanism, the LSTM model is widely used for time series data, including in economics and finance.

The support vector machine (SVM) algorithm is a small-sample learning method for highly nonlinear classification and regression problems. It does not rely on the law of large numbers or on probability measures, and it avoids several problems that artificial neural networks encounter in data processing, such as under-fitting, over-fitting and local minima. It maps the sample space to a high-dimensional feature space through a nonlinear mapping and then performs linear separation or linear regression with a linear hyperplane. The SVM method assigns weights to the few samples in the sample set that act as support vectors, and these few support vectors determine the final result. In recent years, SVM methods have provided new tools for numerical prediction.

Compared with a nonlinear model, the prediction curve of a linear model such as the Bayesian vector autoregressive model (BVAR) is unstable when predicting GDP. Overall, the predictions of the BVAR model fit the actual GDP values poorly, and the prediction curve computed by the BVAR model is rough around local extreme points, which amplifies short-term trends and leads to large prediction errors. The linear ARIMA model, in turn, requires the time series data, or the differenced time series data, to be sufficiently stationary. In practice GDP is affected by many complex factors and the data cannot remain stationary at all times, so the predictions of the ARIMA model are not accurate enough. In essence, the model can capture only linear relationships and not nonlinear ones, which limits its use for GDP prediction.

Among the nonlinear models, the conventional ANN model cannot reflect the temporal relationship between samples during data processing. Because taking this relationship into account contributes greatly to the analysis and prediction of GDP, the ANN model is inferior to the LSTM model for GDP prediction. However, the LSTM model relies on step-by-step prediction when forecasting a time series, so its long-term GDP predictions may be invalid. The SVM is a small-sample learning method that differs from conventional statistical methods: it enables efficient transductive inference from training samples to prediction samples and can greatly simplify the problem. In essence, however, the SVM solves for the support vectors by quadratic programming, which involves computing an m-order matrix. When the number of samples m is very large, this consumes a large amount of memory and computation time. The SVM model also depends on extensive feature engineering, so predicting GDP with it requires a large amount of running time.

Disclosure of Invention

In view of the above-mentioned defects of the prior art, the technical problem to be solved by the present invention is to provide a GDP prediction method based on N-BEATS. The present invention applies the N-BEATS model and, based on several features screened by a decision tree model, designs two N-BEATS models with different stack structures. One is a stack formed by stacking two generic blocks, which constitutes a non-interpretable generic model; the other is a stack formed by stacking one trend block and one seasonal block, which constitutes an interpretable model. After training on the data set, the two models are used for univariate prediction of quarterly GDP, and the interpretable N-BEATS model formed by the second stack is used for multivariate prediction of annual GDP.

In order to achieve the above object, the present invention provides a GDP prediction method based on N-BEATS, comprising the following steps:

S101: downloading an original GDP data set from a data platform;

S102: performing data preprocessing on the original GDP data set to obtain a preprocessed GDP data set;

S103: performing feature selection on the preprocessed GDP data set, and extracting data features of the preprocessed GDP data set;

S104: establishing an N-BEATS model;

S105: training the N-BEATS model on the preprocessed GDP data set to obtain a trained N-BEATS model;

S106: predicting the target GDP with the trained N-BEATS model, based on the original GDP data set.

Preferably, in step S101, the data platform specifically includes, but is not limited to, the National Bureau of Statistics, and the original GDP data set specifically includes, but is not limited to, China's annual GDP and quarterly GDP for 1980 to 2020.

Preferably, in step S102, the method for performing data preprocessing on the original GDP data set specifically includes, but is not limited to, reconstructing univariate data, and the data processing software specifically includes, but is not limited to, Python and MATLAB.

Preferably, the step S103 specifically includes:

S301: normalizing the preprocessed GDP data set by using MinMaxScaler to obtain a normalized GDP data set;

S302: performing feature selection on the normalized GDP data set by using a feature selection method, and extracting the data features of the preprocessed GDP data set.

Preferably, the feature selection method specifically includes, but is not limited to, a Filter variance selection method and a Wrapper univariate feature selection method, and the evaluation index of the feature selection includes, but is not limited to, explained_variance_score.

Preferably, in step S104, the N-BEATS model includes, but is not limited to, a stack formed by stacking two generic blocks, which constitutes a non-interpretable generic model, and another stack formed by stacking one trend block and one seasonal block, which constitutes an interpretable model.

The invention has the beneficial effects that:

The invention applies the N-BEATS model and, based on several features screened by a decision tree model, designs two N-BEATS models with different stack structures. One is a stack formed by stacking two generic blocks, which constitutes a non-interpretable generic model; the other is a stack formed by stacking one trend block and one seasonal block, which constitutes an interpretable model. After training on the data set, the two models are used for univariate prediction of quarterly GDP, and the interpretable N-BEATS model formed by the second stack is used for multivariate prediction of annual GDP. Compared with conventional neural networks and statistical time series models, the N-BEATS model is easier to train and improves prediction accuracy while providing interpretable output. It uses as little prior knowledge as possible, which greatly reduces the data volume of the training set and shortens training time, and its prediction efficiency is superior to that of LSTM and SVM models.

The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.

Drawings

FIG. 1 is an overall flow chart of the present invention.

FIG. 2 is a flow chart of the present invention for feature selection of the preprocessed GDP data set.

FIG. 3 shows the block stack design as invoked in code (left) and the structure of the seasonal and periodic blocks in the source code (right).

FIG. 4 is a complete block diagram of the N-BEATS model.

FIG. 5 shows the performance of N-BEATS with different stack designs on a univariate time series problem (quarterly GDP data set) using the generic model framework.

FIG. 6 shows the performance of N-BEATS with different stack designs on a univariate time series problem (quarterly GDP data set) using the interpretable model framework (trend and seasonality).

Detailed Description

N-BEATS is a deep learning algorithm that builds on trend and seasonality statistical models. Its architecture uses backward and forward residual links and a very deep stack of fully connected layers. The key design principles of the framework are as follows:

First, the base architecture should be simple and generic, yet deep enough to mine latent information.

Second, the architecture should not rely on time-series-specific feature engineering or on scaling of the input data.

Third, the architecture should be extensible and interpretable. These principles are all embodied in the architecture of N-BEATS.

The smallest unit of a stack in the N-BEATS model is a block, and each block contains a stack of 4 fully connected layers. As shown in FIG. 4, N-BEATS is characterized by a doubly residual stacking design, in which the residual design is applied to both the backward (backcast) and forward (forecast) tasks. After the data is input, there are two computation paths, backward and forward, and adjacent fully connected layer stacks are joined by residual connections. The residual design means that each output is subtracted from the input to that stage and the resulting residual is passed as input to the deeper part of the model. The learning target of the next block can thus concentrate on the part that has not yet been explained, which allows the neural network to remain deep.
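The doubly residual flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation of the invention: `run_stack`, `mean_block` and `copy_block` are hypothetical names, and the two toy blocks are hand-written stand-ins for trained blocks of fully connected layers.

```python
def run_stack(blocks, window, horizon):
    """Doubly residual stacking over a list of blocks.

    Each block maps its input to (backcast, partial_forecast). The backcast is
    subtracted from the input, so the next block only sees what is still
    unexplained; the partial forecasts are summed into the final forecast.
    """
    residual = list(window)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, partial = block(residual)
        residual = [r - b for r, b in zip(residual, backcast)]
        forecast = [f + p for f, p in zip(forecast, partial)]
    return residual, forecast


def mean_block(x):
    """Toy block (assumption): explains the mean level of its input."""
    level = sum(x) / len(x)
    return [level] * len(x), [level]


def copy_block(x):
    """Toy block (assumption): explains all that remains, forecasts the last value."""
    return list(x), [x[-1]]


residual, forecast = run_stack([mean_block, copy_block], [1.0, 2.0, 3.0], horizon=1)
print(residual)  # what no block could explain
print(forecast)  # sum of the partial forecasts
```

In this toy run the first block removes the level 2.0, the second block receives [-1.0, 0.0, 1.0], and the final forecast is the sum 2.0 + 1.0. In a trained model each block would be a small neural network rather than a fixed rule.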

As shown in FIG. 4, N-BEATS provides two versions of the internal structure: a generic model and an interpretable model. Blocks are stacked to form a stack and are joined by residual connections, and weights are shared within the same stack. Comparing the different stack designs in FIG. 4 shows that the generic framework (left) has a higher explained variance than the interpretable framework (right) and fits better. In terms of prediction performance, introducing an interpretable statistical model into N-BEATS trades off some accuracy: the explained variance falls by 1.6%.

Furthermore, the generic framework and the interpretable framework perform similarly when verified on the annual GDP data set, which is smoother and has a larger time interval. In other words, when the data set itself does not fluctuate strongly, using the interpretable N-BEATS model does not affect accuracy.

Comparing the performance of other algorithms on the same data set: unlike the SVM, the deep learning model N-BEATS does not depend on feature engineering; compared with LSTM, N-BEATS adapts well to both multivariate and univariate time series data sets, which ensures model robustness.

In general, the accuracy of N-BEATS benefits from its residual neural network framework: every transmission link and stacking link is joined by residual connections, so that sufficiently deep latent information can be mined. The advantage of N-BEATS interpretability is that, once training and prediction are complete, the coefficients defining the statistical model can be consulted to bring meaningful insight into the output. Given that configuring N-BEATS as an interpretable model costs only 1.6% in accuracy, the N-BEATS algorithm has sufficient data mining potential and application value in practical economic forecasting.

Compared with conventional neural networks and statistical time series models, the N-BEATS model is easier to train and improves prediction accuracy while providing interpretable output. The generic N-BEATS model uses as little prior knowledge as possible, which greatly reduces the data volume of the training set and shortens training time, and its prediction efficiency is better than that of LSTM and SVM models.

The N-BEATS model can be divided into a generic model and an interpretable model. For the generic N-BEATS model, the stack output is unconstrained: it may be trend-like, seasonal, or a mixture of both. For the interpretable N-BEATS model, the output is divided into two independent parts, a trend output and a seasonal output. The trend output is monotonic and slowly varying, while the seasonal output is regular and periodic and exhibits cyclical fluctuations. The interpretability of the N-BEATS model does not materially reduce predictive performance, and decomposing the output into separate human-interpretable components is feasible. By separating the trend and the seasonality of the data through interpretability, the method simplifies the prediction process to a certain extent and favors long-term GDP prediction, which gives it a clear advantage over LSTM and SVM models.
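The split into a slowly varying trend output and a periodic seasonal output can be sketched as follows. The description does not state which basis functions the trend and seasonal blocks use, so this sketch assumes a low-degree polynomial for the trend and Fourier terms for the seasonality, as is commonly done for interpretable N-BEATS models. All names and coefficient values are hypothetical.

```python
import math


def trend_output(thetas, horizon):
    """Polynomial trend: sum_i thetas[i] * t**i, with t = step / horizon."""
    times = [step / horizon for step in range(horizon)]
    return [sum(theta * t ** i for i, theta in enumerate(thetas)) for t in times]


def seasonal_output(cos_thetas, sin_thetas, horizon):
    """Fourier seasonality: harmonic k contributes a*cos(2*pi*k*t) + b*sin(2*pi*k*t)."""
    values = []
    for step in range(horizon):
        t = step / horizon
        total = 0.0
        for k, (a, b) in enumerate(zip(cos_thetas, sin_thetas), start=1):
            angle = 2.0 * math.pi * k * t
            total += a * math.cos(angle) + b * math.sin(angle)
        values.append(total)
    return values


# Hypothetical coefficients; in a trained model the blocks would produce them.
trend = trend_output([1.0, 2.0], horizon=4)         # level 1.0, slope 2.0
season = seasonal_output([1.0], [0.0], horizon=4)   # one cosine cycle per horizon
forecast = [t + s for t, s in zip(trend, season)]   # interpretable forecast

print(trend)
print(season)
print(forecast)
```

Because the forecast is the sum of the two parts, each part can be inspected on its own, which is what makes the coefficients of the statistical model readable after training.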

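As shown in FIG. 1 and FIG. 2, a GDP prediction method based on N-BEATS includes the following steps:

S101: downloading an original GDP data set from a data platform;

S102: performing data preprocessing on the original GDP data set to obtain a preprocessed GDP data set;

S103: performing feature selection on the preprocessed GDP data set, and extracting data features of the preprocessed GDP data set;

S104: establishing an N-BEATS model;

S105: training the N-BEATS model on the preprocessed GDP data set to obtain a trained N-BEATS model;

S106: predicting the target GDP with the trained N-BEATS model, based on the original GDP data set.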

In this embodiment, in step S101, the data platform specifically includes, but is not limited to, the National Bureau of Statistics, and the original GDP data set specifically includes, but is not limited to, the annual GDP data set and the quarterly GDP data set for 1980 to 2020.

In this embodiment, in step S102, the method for performing data preprocessing on the original GDP data set specifically includes, but is not limited to, reconstructing univariate data, and the data processing software specifically includes, but is not limited to, Python and MATLAB.
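The description does not detail how the univariate data is reconstructed. One plausible reading, shown here purely as an assumption, is the usual sliding-window framing in which a single GDP series is cut into pairs of an input window and a target window. The function name `make_windows`, the window lengths and the values are hypothetical.

```python
def make_windows(series, backcast_len, forecast_len):
    """Split a 1-D series into (input window, target window) pairs."""
    pairs = []
    last_start = len(series) - backcast_len - forecast_len
    for start in range(last_start + 1):
        split = start + backcast_len
        pairs.append((series[start:split], series[split:split + forecast_len]))
    return pairs


# Placeholder values standing in for a quarterly series (not real GDP data).
quarterly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

for inputs, target in make_windows(quarterly, backcast_len=4, forecast_len=1):
    print(inputs, "->", target)
```

With four quarters as input and one quarter as target, each pair asks the model to predict the next quarter from the preceding year. A series shorter than one full window yields no pairs.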

In this embodiment, step S103 specifically includes:

S301: normalizing the preprocessed GDP data set by using MinMaxScaler to obtain a normalized GDP data set;

S302: performing feature selection on the normalized GDP data set by using a feature selection method, and extracting the data features of the preprocessed GDP data set.

In this embodiment, the feature selection method specifically includes, but is not limited to, a Filter variance selection method and a Wrapper univariate feature selection method, and the evaluation index of the feature selection includes, but is not limited to, explained_variance_score.
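Steps S301 and S302 can be sketched in plain Python. The names MinMaxScaler and explained_variance_score appear to correspond to scikit-learn utilities, but that is an assumption. To avoid relying on any library, the sketch reimplements the underlying formulas, and every function name, column name and value in it is hypothetical.

```python
def min_max_scale(column):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values) / len(values)


def variance_filter(columns, threshold=0.0):
    """Filter-style selection: keep columns whose variance exceeds the threshold."""
    return [name for name, col in columns.items() if variance(col) > threshold]


def explained_variance(y_true, y_pred):
    """1 - Var(y_true - y_pred) / Var(y_true); 1.0 is the best possible score."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(y_true)


# Placeholder feature table (not real data).
raw = {"gdp": [10.0, 20.0, 30.0, 40.0], "constant": [5.0, 5.0, 5.0, 5.0]}
scaled = {name: min_max_scale(col) for name, col in raw.items()}

print(variance_filter(scaled))  # the constant column carries no information
print(explained_variance([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))
```

Note that explained variance ignores a constant offset between predictions and targets, so it is usually read together with an error measure when comparing models.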

In this embodiment, in step S104, the N-BEATS model includes, but is not limited to, a stack formed by stacking two generic blocks, which constitutes a non-interpretable generic model, and another stack formed by stacking one trend block and one seasonal block, which constitutes an interpretable model.

In summary, the present invention applies the N-BEATS model and, based on several features screened by a decision tree model, designs two N-BEATS models with different stack structures. One is a stack formed by stacking two generic blocks, which constitutes a non-interpretable generic model; the other is a stack formed by stacking one trend block and one seasonal block, which constitutes an interpretable model. After training on the data set, the two models are used for univariate prediction of quarterly GDP, and the interpretable N-BEATS model formed by the second stack is used for multivariate prediction of annual GDP.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

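1. A GDP prediction method based on N-BEATS, characterized by comprising the following steps:

S101: downloading an original GDP data set from a data platform;

S102: performing data preprocessing on the original GDP data set to obtain a preprocessed GDP data set;

S103: performing feature selection on the preprocessed GDP data set, and extracting data features of the preprocessed GDP data set;

S104: establishing an N-BEATS model;

S105: training the N-BEATS model on the preprocessed GDP data set to obtain a trained N-BEATS model;

S106: predicting the target GDP with the trained N-BEATS model, based on the original GDP data set.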

2. The N-BEATS-based GDP prediction method of claim 1, wherein in step S101, the data platform specifically includes, but is not limited to, the National Bureau of Statistics, and the original GDP data set specifically includes, but is not limited to, the annual GDP data set and the quarterly GDP data set for 1980 to 2020.

3. The N-BEATS-based GDP prediction method of claim 1, wherein in step S102, the method for performing data preprocessing on the original GDP data set specifically includes, but is not limited to, reconstructing univariate data, and the data processing software specifically includes, but is not limited to, Python and MATLAB.

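4. The N-BEATS-based GDP prediction method of claim 1, wherein step S103 specifically includes:

S301: normalizing the preprocessed GDP data set by using MinMaxScaler to obtain a normalized GDP data set;

S302: performing feature selection on the normalized GDP data set by using a feature selection method, and extracting the data features of the preprocessed GDP data set.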

5. The N-BEATS-based GDP prediction method of claim 4, wherein the feature selection method specifically includes, but is not limited to, a Filter variance selection method and a Wrapper univariate feature selection method, and the evaluation index of the feature selection includes, but is not limited to, explained_variance_score.

6. The N-BEATS-based GDP prediction method of claim 1, wherein in step S104, the N-BEATS model includes, but is not limited to, a stack formed by stacking two generic blocks, which constitutes a non-interpretable generic model, and another stack formed by stacking one trend block and one seasonal block, which constitutes an interpretable model.