WO2021082811A1

WO2021082811A1 - Foreign exchange time series prediction method

Info

Publication number: WO2021082811A1
Application number: PCT/CN2020/116955
Authority: WO
Inventors: 倪丽娜; 李玉洁; 张金泉; 张泽坤; 亓亮
Original assignee: 山东科技大学
Priority date: 2019-10-29
Filing date: 2020-09-23
Publication date: 2021-05-06
Also published as: CN110782096A

Abstract

Disclosed is a foreign exchange time series prediction method, relating to the field of foreign exchange time series data. According to the prediction method, foreign exchange time series data is analyzed and predicted on the basis of a deep learning algorithm C-LSTM, which combines a convolutional neural network with a long short-term memory network, and a short-term prediction method for a foreign exchange time series is proposed. Three kinds of main factors affecting the prediction precision are systematically studied. The optimal input feature, network structure and training method are selected. As for the problem of big data noise, a feature optimization algorithm is constructed on the basis of PCA to perform dimension reduction and denoising on input features, and Dropout and L2 regularization methods are then used to avoid the problem of over-fitting, thereby further improving the prediction precision of the prediction method. At the same time, in order to meet the requirement of a foreign exchange market for high time effectiveness, a parallel optimization algorithm is constructed on the basis of high-performance GPU computing technology, thereby increasing the training speed of a network model and improving the availability of the prediction method in actual application scenarios.

Description

A foreign exchange time series forecasting method

Technical field

The invention relates to the field of foreign exchange time series data, in particular to a foreign exchange time series forecasting method.

Background art

The foreign exchange market plays a key role in the healthy development of the world economy. Foreign exchange time series data fluctuates sharply, and many factors affect its fluctuations. It is one of the most difficult financial derivatives to analyze and predict in the financial market, and traditional analysis and prediction methods have long been unable to meet this need. In the era of big data, with the continuous growth of data volume and the rapid improvement of computing power, deep learning technology has made major breakthroughs in image recognition, natural language processing, speech recognition and other fields. Many scholars have begun to apply deep learning technology to foreign exchange time series analysis, and certain research results have been obtained. However, because foreign exchange time series data is noisy and highly random, and many factors affect its fluctuations, the application of deep learning technology to foreign exchange time series analysis still needs to be continuously explored and improved.

At present, there are mainly two types of analysis methods for foreign exchange time series:

(1) Traditional statistical methods

Traditional statistical methods establish mathematical models, fit historical foreign exchange time series data, and then use the fitted models to predict future foreign exchange time series. Common methods include the MA (Moving Average) model, the ARIMA (AutoRegressive Integrated Moving Average) model and the GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) model, etc. ^[1] . Traditional statistical methods rely less on data and only need the trend curve of the historical foreign exchange time series to build a model, so they have strong versatility. However, they suffer from lag: the predicted value trails the true value. For systems with high complexity, traditional statistical methods also cannot effectively mine the internal laws of the system, so their analysis and forecasting performance on financial time series is not ideal.

(2) Neural network method

Neural networks can better fit complex nonlinear systems, so they have great potential for the analysis and prediction of foreign exchange time series. Many scholars have therefore used neural network methods to analyze and forecast foreign exchange time series and have obtained many research results. Common methods include the BP neural network, radial basis neural network, wavelet neural network, etc. ^[2] . However, the learning ability of shallow neural network methods is limited and they cannot fit foreign exchange time series data well. Although their analysis and forecasting performance is better than that of traditional statistical methods, there is still considerable room for improvement.

Deep learning technology makes up for the limited learning ability of shallow neural networks, so it has better application prospects in foreign exchange time series analysis. However, the structure of deep learning algorithms is complex, and many factors affect their prediction accuracy: subjective factors such as sample characteristics, algorithm structure and training optimization method all have an important impact on the model's prediction accuracy. Studying these factors is of great significance for improving the prediction accuracy of deep learning algorithms on foreign exchange time series. In addition, current applications of deep learning algorithms to foreign exchange time series analysis are based on a single structure. How to effectively combine different deep learning algorithms with complementary advantages to further improve prediction accuracy requires continuous exploration and improvement.

As early as the end of the 20th century, Mark Staley and Peter Kim successfully applied simple artificial neural networks to foreign exchange time series analysis. By analyzing and forecasting Canadian spot exchange rates, they demonstrated the effectiveness of neural network methods in foreign exchange time series analysis. Later, Hui Xiaofeng, Hu Yunquan and others used artificial neural networks to predict the exchange rate of RMB against the US dollar and compared the results with traditional statistical analysis methods; the experimental data showed that neural network methods are superior to traditional statistical analysis methods. Jingtao Yao and Chew Lim Tan used neural network methods to analyze and predict the exchange rate time series between the U.S. dollar and five other major currencies, which demonstrated the good applicability of neural network algorithms in foreign exchange time series analysis, but they also pointed out that it is difficult to obtain high returns in the foreign exchange market by relying on neural network algorithms alone. Therefore, many researchers have begun to adopt combination models to improve the forecasting of foreign exchange time series. For example, Ouyang Liang merged the wavelet analysis method into the neural network algorithm, constructed a wavelet neural network prediction method, and improved the generalization ability of the neural network. He Ni and Yin Hujun combined multiple regression neural networks to construct a hybrid forecasting model and used genetic algorithms to optimize it; experiments show that the hybrid forecasting model yields a higher profit return. Georgios Sermpinis and Konstantinos Theofilatos built a foreign exchange time series analysis model based on an adaptive radial basis function neural network and optimized it with a particle swarm optimization algorithm; experimental data show that the model improves significantly in accuracy and speed. Lukas F et al. built a foreign exchange time series forecasting model based on a combination of a radial basis neural network, genetic algorithms and moving averages, and conducted an experimental analysis on high-frequency time series data of the US dollar against the Canadian dollar; the experimental data showed that the model has higher prediction accuracy than the autoregressive model and the BP neural network model. Kristjanpoller W and Minutolo M C used a hybrid model of a neural network and GARCH, incorporating multiple financial variables, to predict the volatility of oil prices; experiments show that the hybrid model has a 30% higher prediction accuracy than the previous model. Petropoulos A et al. combined various machine learning models to develop an automatic foreign exchange portfolio trading system. The system uses support vector machines, random forests, Bayesian regression trees, fully connected neural networks and naive Bayes classifiers to model the dependence patterns between major currency pairs, generates implicit signals of exchange rate fluctuations from the outputs of these models, and finally combines these implicit signals into an aggregate forecast waveform through majority voting, genetic algorithm optimization and regression weighting. The system was tested in actual transactions, and the test results show that it can significantly improve transaction performance. Dash Rajashree proposed an evolutionary framework that uses an improved shuffled frog leaping algorithm and an artificial neural network to predict foreign exchange time series data. In a comparison with the shuffled frog leaping algorithm and the particle swarm optimization algorithm, the experiments show that the proposed model is more suitable for foreign exchange time series analysis.

The above studies are mostly based on shallow neural network algorithms, but foreign exchange time series are volatile and random, and shallow neural network algorithms struggle to fully explore their internal laws. With its rapid development, deep learning technology has made major breakthroughs in image recognition, speech recognition, natural language processing and other fields. Therefore, the application of deep learning technology to financial time series analysis has also attracted the attention of many scholars.

Jing Chao et al. used an improved deep belief network (DBN) to analyze and predict foreign exchange time series data. In that paper, a continuous restricted Boltzmann machine was used to construct the DBN, the classic DBN model was improved so that it could predict continuous data, and the conjugate gradient descent method was used to accelerate DBN training. In the experiment, six evaluation criteria were used to test three kinds of foreign exchange sequence data, and the experimental data showed that the prediction method is superior to methods such as the feedforward neural network. Korczak Jerzy and Hernes Marcin built a model that supports foreign exchange market trading decisions based on the CNN deep learning algorithm; experiments show that the prediction error of the deep convolutional neural network on foreign exchange time series data is significantly reduced. Galeshchuk S and Mukherjee S predicted foreign exchange time series data in emerging markets based on deep learning algorithms and proposed novel input features based on currency clusters; experiments show that this input feature helps improve the prediction accuracy of deep learning algorithms. Dadabada Pradeepkumar and Vadlamani Ravi proposed a novel particle swarm optimization quantile recurrent neural network algorithm for analyzing and predicting financial data such as foreign exchange time series. The paper uses eight types of financial time series data for experimental analysis, and the experimental data show that the algorithm is better than generalized autoregressive conditional heteroscedasticity (GARCH), multi-layer perceptron (MLP), generalized regression neural network (GRNN), random forest (RF) and other models. Fischer T and Krauss C predicted financial time series data based on long short-term memory (LSTM) deep neural networks; experiments show that the prediction performance of long short-term memory networks is better than that of logistic regression, random forest and traditional RNN algorithms. Troiano L and Villa E M built trading robots based on LSTM to identify the logic between the market sentiment given by technical indicators and investment decisions; the experimental results demonstrate the feasibility of the solution.

In summary, due to the limitations of shallow neural networks and the development of deep learning technology, scholars have begun to study financial time series based on deep neural networks. The convolutional neural network considers the spatial characteristics of the data and can realistically simulate the learning process of neural tissue; it processes spatially correlated sequence data well. The long short-term memory network considers the temporal order of the data and can more realistically simulate the cognitive process of neural tissue; it processes time-correlated sequence data relatively well. However, both structures are complex, and many factors affect their performance. At present, the effective combination of the two algorithms, and the impact of input feature selection, network structure and training methods on prediction accuracy during training and learning, have not been systematically studied.

Summary of the invention

The purpose of the present invention is to address the above shortcomings. Using data for a variety of foreign exchange currencies as research samples, a short-term prediction method for foreign exchange time series is constructed based on C-LSTM, which combines a convolutional neural network with a long short-term memory network. The two deep learning algorithms are effectively combined, the factors that affect the prediction accuracy are systematically studied, and principal component analysis, Dropout and L2 regularization are used for optimization, so that a C-LSTM foreign exchange time series short-term forecasting method with high prediction accuracy is finally constructed.

The present invention specifically adopts the following technical solutions:

A foreign exchange time series forecasting method, including the following steps:

Step 1. Construct a C-LSTM prediction method based on the combination of convolutional neural network and long short-term memory network, which specifically includes:

1-1. Construct a C-LSTM network model based on the combination of convolutional neural network and long short-term memory network, including:

1-1-1, build five functional modules including input layer, hidden layer, output layer, network training and network prediction;

1-1-2, construct training and prediction algorithms for the C-LSTM short-term prediction method of foreign exchange time series based on the combination of convolutional neural network and long short-term memory network;

1-2. Choose the activation function of the C-LSTM combining the convolutional neural network and long short-term memory network;

1-3. Define the loss function of the C-LSTM combining the convolutional neural network and long short-term memory network;

1-4. Select transaction indicators and fundamental data as the input features of the C-LSTM combining the convolutional neural network and long short-term memory network;

Step 2. Train and optimize the method constructed in Step 1 from the three aspects of input features, network structure and training methods. The training optimization items include feature optimization based on principal component analysis, C-LSTM lag period optimization, C-LSTM structure optimization, C-LSTM training method optimization, and GPU-based parallel optimization;

In terms of input features, 18 indicators are selected as input features and divided into four categories: basic transaction data, technical indicator data, the dollar index and national economic indicators. These four categories are combined, and the input features are optimized based on the principal component analysis method; the impact of different indicators on the prediction accuracy is studied and the best input features are selected. The impact of the number of lag periods on the prediction accuracy is then studied experimentally, so as to select the best number of lag periods;

In terms of network structure, the best hidden layer size is studied with a grid search algorithm, and the impact of different algorithm combinations on the prediction accuracy is studied by changing how the convolutional neural network and the long short-term memory network are combined, so as to choose the best hidden layer size and algorithm combination method;

In terms of training methods, the Adam, SGD and RMSProp methods are used to train the network. By comparing the prediction accuracy obtained with each training algorithm, the change of the loss function with the number of iterations, and the convergence speed during training, the impact of the different training methods on training and on prediction accuracy is studied, and the appropriate training method is finally chosen.
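The grid search over hidden-layer structure mentioned in Step 2 can be illustrated with a minimal sketch. The function `evaluate` is a hypothetical stand-in for "train the network with this configuration and return its validation RMSE"; the candidate sizes and the toy scoring function are assumptions chosen only so the example runs, and are not values taken from the invention.

```python
from itertools import product

def grid_search(evaluate, lstm_sizes, conv_filters):
    """Return the (lstm_size, n_filters) pair with the lowest score."""
    best_cfg, best_score = None, float("inf")
    for cfg in product(lstm_sizes, conv_filters):
        score = evaluate(*cfg)  # e.g. validation RMSE after training
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical stand-in for a real train-and-validate run.
def toy_evaluate(lstm_size, n_filters):
    return abs(lstm_size - 64) / 64 + abs(n_filters - 16) / 16

best_cfg, best_score = grid_search(toy_evaluate, [32, 64, 128], [8, 16, 32])
```

In practice each call to `evaluate` would be a full training run, so the grid is usually kept small.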

Preferably, in the step 1, the ReLU function is selected as the activation function of the C-LSTM that combines the convolutional neural network and the long short-term memory network. After the activation function is added to the network structure, the neural network gains the ability to fit nonlinear systems.

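Preferably, in the step 1, the mean square error is selected as the loss function, and the loss function is shown in formula (1),

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{1}$$

Among them, $n$ is the number of data in the batch, $y_i$ is the correct answer corresponding to the i-th data in the data sequence batch, and $\hat{y}_i$ is the predicted value of the neural network corresponding to the i-th data.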
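As a small illustration of the mean square error loss described here, the following sketch computes it over one batch in plain Python. The function name `mse_loss` and the sample values are illustrative assumptions.

```python
def mse_loss(y_true, y_pred):
    """Mean square error over one batch of targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n

batch_targets = [1.0, 2.0, 3.0]
batch_predictions = [1.0, 2.5, 2.0]
loss = mse_loss(batch_targets, batch_predictions)  # (0 + 0.25 + 1) / 3
```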

Preferably, in the step 1, the technical indicators are calculated from the trading indicators. Commonly used technical indicators include the moving average (MA) and the moving average convergence divergence (MACD), which are used to reflect the current trend of exchange rate changes. Trend turning points are judged by counter-trend indicators, which include the stochastic indicator (KDJ), the deviation rate (BIAS), the relative strength index (RSI) and the price rate of change (ROC).

Preferably, the moving average index calculates the average value of the closing price of the exchange rate over a certain period of time, and this average is used as the basis for judging trend changes. The specific calculation is shown in formula (2),

$$\mathrm{MA} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{close}_i \tag{2}$$

Among them, N represents the time period, and close _i represents the closing price of the i-th day;

Select a fast moving average and a slow moving average, compute their difference DIF, then compute DEA, the smoothed moving average of DIF, and finally obtain the moving average convergence divergence (MACD). The specific calculation is shown in formulas (3)-(7),

BAR=2×(DIF-DEA) (7)

In formulas (3)-(7), EMA _-1 is the exponential moving average of the previous day, Close is today's closing price, and BAR is the height of the MACD histogram.
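The moving average and MACD calculation can be sketched as follows. Only the averaging of closing prices and BAR = 2×(DIF-DEA) come from the text; the periods 12/26/9, the smoothing factor 2/(N+1), and seeding each EMA with the first value are conventional choices assumed here, since formulas (3)-(6) are not reproduced in this text.

```python
def moving_average(closes, n):
    """Simple moving average of the last n closing prices."""
    if n <= 0 or len(closes) < n:
        raise ValueError("need at least n closing prices")
    return sum(closes[-n:]) / n

def ema_series(values, n):
    """Exponential moving average; smoothing 2/(n+1) is an assumed convention."""
    alpha = 2.0 / (n + 1)
    out = [values[0]]  # seed with the first value (assumption)
    for v in values[1:]:
        out.append(out[-1] * (1 - alpha) + v * alpha)
    return out

def macd(closes, fast=12, slow=26, signal=9):
    """Return (DIF, DEA, BAR) series; periods 12/26/9 are assumed defaults."""
    ema_fast = ema_series(closes, fast)
    ema_slow = ema_series(closes, slow)
    dif = [f - s for f, s in zip(ema_fast, ema_slow)]
    dea = ema_series(dif, signal)
    bar = [2 * (d - e) for d, e in zip(dif, dea)]  # BAR = 2 x (DIF - DEA)
    return dif, dea, bar

closes = [6.90, 6.92, 6.91, 6.95, 6.97, 6.96, 7.00]
ma5 = moving_average(closes, 5)
dif, dea, bar = macd(closes)
```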

Preferably, the specific calculation formula of the stochastic index is shown in formulas (8)-(11),

RSV _N = (Close _(N) -Low _(N) )÷(High _(N) -Low _(N) )×100% (8)

J=3×K-2×D (11)

Among them, Close _(N) is the average closing price in N days, Low _(N) is the lowest price in N days, High _(N) is the highest price in N days, K _-1 is the K value of the previous day, and D _-1 is D value of the previous day;

The specific calculation formula of the deviation rate is as formula (12),

Among them, Close is the closing price of the day, N is the time period, and the value is 12;

The calculation formula of the relative strength index is as formula (13),

Among them, Rise _i is the increase in the closing price on the i-th day, and Fall _i is the decrease in the closing price on the i-th day;

The formula for calculating the rate of price change is equation (14),

ROC=Close÷Close _-N (14)

Among them, Close is the closing price of the day, and Close _-N is the closing price N days earlier.
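The counter-trend indicators can be sketched in the same way. The RSV expression, J = 3×K - 2×D and ROC = Close÷Close_-N follow the formulas given in the text. The K/D smoothing weights (2/3 and 1/3), the seed value 50, and the forms used for the deviation rate and the relative strength index are conventional definitions assumed here, because formulas (9), (10), (12) and (13) are not reproduced in this text.

```python
def rsv(close, lows, highs):
    """Raw stochastic value over a window, in percent (formula (8))."""
    low_n, high_n = min(lows), max(highs)
    if high_n == low_n:
        return 0.0  # flat window: avoid division by zero (assumption)
    return (close - low_n) / (high_n - low_n) * 100.0

def kdj_step(rsv_value, k_prev=50.0, d_prev=50.0):
    """One K/D/J update; the 2/3-1/3 smoothing and the 50 seed are assumed."""
    k = 2.0 / 3.0 * k_prev + 1.0 / 3.0 * rsv_value
    d = 2.0 / 3.0 * d_prev + 1.0 / 3.0 * k
    j = 3.0 * k - 2.0 * d  # formula (11)
    return k, d, j

def bias(closes, n=12):
    """Deviation of the latest close from its n-period mean (assumed form)."""
    if len(closes) < n:
        raise ValueError("need at least n closing prices")
    ma = sum(closes[-n:]) / n
    return (closes[-1] - ma) / ma * 100.0

def rsi(closes):
    """Relative strength from summed rises and falls (assumed form)."""
    diffs = [b - a for a, b in zip(closes, closes[1:])]
    rise = sum(d for d in diffs if d > 0)
    fall = sum(-d for d in diffs if d < 0)
    if rise + fall == 0:
        return 50.0  # no movement: neutral value (assumption)
    return rise / (rise + fall) * 100.0

def roc(closes, n):
    """Rate of price change as the ratio in formula (14)."""
    return closes[-1] / closes[-1 - n]
```

For example, `kdj_step(rsv(7.0, [6.8, 6.9], [7.1, 7.2]))` produces the first K, D and J values for a window whose close sits midway between the window low and high.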

Preferably, in step 2, a feature optimization algorithm is constructed based on PCA, and the input features are reduced in dimensionality.

Preferably, the steps of constructing a feature optimization algorithm based on PCA are specifically as follows:

Center the input n-dimensional feature matrix D, that is, subtract the column mean μ from each column of data;

Calculate the covariance matrix S of the centered input feature matrix;

Calculate the eigenvalues λ of the covariance matrix and their corresponding eigenvectors ω, and sort the eigenvalues from large to small: λ ₁ , λ ₂ ,..., λ _n ;

Take the eigenvectors ω ₁ , ω ₂ ,..., ω _k corresponding to the k largest eigenvalues λ ₁ , λ ₂ ,..., λ _k , and map the n-dimensional features to k dimensions through equation (15),

$$x'_i = \left(\omega_1^{T} x_i,\ \omega_2^{T} x_i,\ \ldots,\ \omega_k^{T} x_i\right)^{T} \tag{15}$$

The k-th dimension of the new x′ _i is the projection of x _i in the direction of the k-th principal component ω _k . By selecting the eigenvectors corresponding to the k largest eigenvalues, the features with smaller variance are discarded, so that each n-dimensional column vector x _i is mapped to a k-dimensional column vector x′ _i , and a k-dimensional feature matrix D′ is obtained.
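The PCA steps above (centering, covariance, eigen-decomposition, projection onto the top k eigenvectors) can be sketched with NumPy. The function name `pca_reduce`, the synthetic 18-column matrix and the choice k = 5 are illustrative assumptions; the text does not state the value of k that was used.

```python
import numpy as np

def pca_reduce(D, k):
    """Project a (samples x n) feature matrix onto its top-k principal components."""
    D = np.asarray(D, dtype=float)
    mu = D.mean(axis=0)
    centered = D - mu                        # centering: subtract column means
    S = np.cov(centered, rowvar=False)       # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(S)     # S is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues from large to small
    W = eigvecs[:, order[:k]]                # first k eigenvectors as columns
    return centered @ W, eigvals[order]

rng = np.random.default_rng(0)
D = rng.normal(size=(200, 18))   # synthetic stand-in for 18 indicator columns
D_reduced, eigvals = pca_reduce(D, k=5)
```

Each column of `D_reduced` has sample variance equal to the corresponding eigenvalue, which is why discarding the trailing eigenvectors removes the lowest-variance directions.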

Preferably, the optimization of the C-LSTM network structure combining the convolutional neural network and the long short-term memory network in step 2 includes the following parts:

Hyperparameter optimization of the recurrent layer of the long short-term memory network;

Convolutional layer hyperparameter optimization;

Optimization of the algorithm combination method. The combination methods of the convolutional neural network and the long short-term memory network include:

Convolutional neural network first, then long short-term memory network: the output of the convolutional neural network layer is used as the input of the long short-term memory network layer;

Long short-term memory network first, then convolutional neural network: the output of the long short-term memory network layer is used as the input of the convolutional neural network layer;

The convolutional neural network and the long short-term memory network are run separately, and the outputs of the two algorithms are combined to make the final prediction.

Preferably, in the step 2, the Adam, SGD, and RMSProp methods are used for network training, and the RMSProp training optimization method is selected. The RMSProp training optimization method has a fast convergence speed, the most stable training process, and the best training optimization effect.

The present invention has the following beneficial effects:

The prediction method analyzes and predicts foreign exchange time series data with the C-LSTM algorithm, which combines two deep neural networks, CNN and LSTM, and on this basis a C-LSTM foreign exchange time series short-term forecasting method is proposed and constructed. The three main factors affecting its prediction accuracy (training samples, network structure, training methods) are systematically studied. A feature optimization algorithm is constructed based on the principal component analysis method to reduce the dimensionality and noise of the input features, and then the Dropout and L2 regularization methods are used to avoid over-fitting, which further improves the prediction accuracy of the method on foreign exchange data. To meet the high timeliness requirements of the foreign exchange market, a parallel optimization algorithm is constructed based on GPU high-performance computing technology to increase the training speed of the network, and finally a C-LSTM foreign exchange time series short-term prediction method with high prediction accuracy is obtained. The method is then compared with different neural network methods on data for 9 different foreign exchange currency pairs. The experimental results show that the constructed C-LSTM method outperforms the comparison methods, which demonstrates its effectiveness and applicability in foreign exchange market analysis and forecasting.

In terms of training samples, the factors affecting exchange rate fluctuations are considered comprehensively and four types of features are extracted: basic transaction data, technical indicators, the dollar index, and national economic indicators. These four types of features and their different combinations are taken as input features, and an optimization algorithm for the input features is constructed based on PCA. The research shows that the combination of basic transaction data, technical indicators and the dollar index as input variables gives the highest prediction accuracy. For the number of lag periods, different values are selected for training. The research finds that if the number of lag periods is too small, the deep learning algorithm cannot fully learn the essential laws of the time series, which reduces accuracy; if it is too large, the noise contained in the sequence increases, which also hinders the algorithm from mining the essential laws of the time series, and the training time increases greatly. Therefore, for different problems, choosing an appropriate number of lag periods achieves better prediction results.

In terms of network structure, the prediction accuracy for different numbers of hidden layers and different numbers of neurons in each layer is compared and analyzed. The study finds that both too few and too many hidden layers or neurons per layer reduce the prediction accuracy. If there are too few, the network cannot fully learn the essential laws in the time series and under-fitting occurs; if there are too many, over-fitting occurs and the prediction accuracy declines. The way different deep learning algorithms are combined also affects the prediction accuracy. The three combination methods are compared through experiments, and the serial combination of CNN first and then LSTM is finally selected.

In terms of training methods, the Adam, SGD and RMSProp optimization methods are used to train the network. The study finds that the prediction accuracy of Adam and RMSProp differs little, but the training process of RMSProp is more stable than that of Adam and its convergence is faster, so RMSProp gives the better training optimization effect. Compared with the other two methods, SGD converges more slowly, so its training optimization effect is poor. Therefore, the RMSProp optimization method is selected. Because the foreign exchange market has high requirements for timeliness and trading opportunities are fleeting, the use of GPU high-performance computing technology can effectively accelerate network training and helps improve the availability of the prediction method in actual application scenarios.

Finally, comparison experiments are carried out against different neural network algorithms, namely BP, CNN, RNN and LSTM, on 9 different foreign exchange currency pairs. The experimental data show that the prediction accuracy of the C-LSTM foreign exchange time series short-term prediction method, constructed by combining two deep neural network algorithms, is higher than that of the comparison methods on the 9 currency pairs, which demonstrates the effectiveness and applicability of the constructed method in the analysis and forecasting of the foreign exchange market.

Description of the drawings

Figure 1 is a schematic diagram of the network structure of the C-LSTM prediction method;

Figure 2 is a schematic diagram of the structure of the training set of the C-LSTM prediction method;

Figure 3 is a schematic diagram of tanh, sigmoid and relu activation functions;

Figure 4 is a schematic diagram of the change of the loss function value during the training process of the C-LSTM prediction method;

Figure 5 is a schematic diagram of the influence of different input features on the root mean square error;

Figure 6 shows the relationship between the number of lag periods n and the root mean square error;

Figure 7 is a diagram of the relationship between the size of the LSTM hidden layer and the RMSE;

Figure 8 shows the relationship between RMSE and the size of the convolutional layer;

Figure 9 shows the change of the loss value with the number of iterations when the Adam training optimization method is used;

Figure 10 shows the change of loss value with the number of iterations when the RMSProp training optimization method is used;

Figure 11 shows how the loss value changes with the number of iterations when the SGD training optimization method is used;

Figure 12 is a flow chart of the parallel optimization algorithm in asynchronous mode;

Figure 13 is a flow chart of the parallel optimization algorithm in synchronous mode;

Figure 14 is a graph showing the change trend of training speed with the increase in the number of GPUs;

Figure 15 is a data flow diagram of the C-LSTM prediction method;

Figure 16 shows the RMSE value when different forecasting methods predict different currency pairs;

Figure 17 is a fitting diagram of the prediction effect of the C-LSTM prediction method;

Figure 18 is a fitting diagram of the prediction effect of the RNN prediction method;

Figure 19 is a fitting diagram of the prediction effect of the LSTM prediction method;

Figure 20 is a fitting diagram of the prediction effect of the CNN prediction method;

Figure 21 is a fitting diagram of the prediction effect of the prediction method.

Detailed description of the embodiments

The specific implementation of the present invention will be further described below in conjunction with the accompanying drawings and specific embodiments:

1-1-1, construct five functional modules including input layer, hidden layer, output layer, network training and network prediction. The structure diagram is shown in Figure 1.

Input layer. First, the 250,000 preprocessed and normalized exchange rate records are divided into a training set and a test set at a ratio of 4:1. The training input is a lagged window of historical exchange rate data and related feature data, and the training output is the predicted closing price a certain lag time after that window. The structure of the training set is shown in Figure 2, where n is the number of lag periods. At time t, the historical exchange rate data and related features of the previous n moments are input, and the corresponding training output is the one-step-ahead prediction value y _t+1 ; that is, the closing price of foreign exchange at the next moment is predicted from the historical exchange rate data and related features of the previous n moments.
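The construction of lagged training windows and the 4:1 split can be sketched as follows. The helper names are illustrative, and the chronological (unshuffled) split is an assumption that is common for time series; the text only states the 4:1 ratio.

```python
def make_windows(features, closes, n_lags):
    """Build (X, y) pairs: the previous n_lags feature rows -> next closing price."""
    X, y = [], []
    for t in range(n_lags, len(features)):
        X.append(features[t - n_lags:t])  # the previous n moments
        y.append(closes[t])               # one-step-ahead closing price
    return X, y

def split_train_test(X, y, train_ratio=0.8):
    """Chronological 4:1 split without shuffling (assumption)."""
    cut = int(len(X) * train_ratio)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# Toy series: one feature per time step, equal to the closing price.
closes = [float(i) for i in range(10)]
features = [[c] for c in closes]
X, y = make_windows(features, closes, n_lags=3)
(train_X, train_y), (test_X, test_y) = split_train_test(X, y)
```

With 10 time steps and 3 lag periods this yields 7 windows, of which the first 5 go to training and the last 2 to testing.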

Hidden layer. The size of the hidden layer, that is, the number of hidden layers and the number of neurons in each layer, has an important influence on the learning ability of the algorithm. Too few neurons lead to insufficient learning, and too many lead to overfitting. Therefore, when determining the number of hidden layers and the number of neurons in each layer, the network must be able to learn the underlying regularities of the training data sequence while avoiding the overfitting caused by an overly complex network.

Output layer. The number of output neurons is determined by the number of output variables. It is generally held that the output is best when the algorithm has only one output neuron. Therefore, the number of output neurons is set to 1.

Network training. Following the batch gradient descent method, the training data set D_train is divided into batches of size m. Each batch is then divided into data windows according to the number of lag periods n and fed into the hidden layer. The input of the hidden layer after division is X={X_1, X_2, ..., X_n}, the output of X after the hidden layer is H={H_1, H_2, ..., H_n}, the corresponding theoretical output is Y, and the predicted output is Ŷ.

Here H_i = C-LSTM_forward(X_i, S_{i-1}, P_{i-1}), where S_{i-1} and P_{i-1} are the state and the output of the previous LSTM loop body, respectively, and C-LSTM_forward is the forward calculation of the CNN and the LSTM recurrent neural network.

In the output layer that produces Ŷ from H, W is the weight matrix and b is the bias.

The mean square error between the theoretical output Y and the predicted output is selected as the loss function.

Taking the minimum of the loss function as the optimization goal, the following are specified: an initial learning rate η of 0.01, a learning rate decay coefficient α of 0.99, the number of training steps, the random seed for network initialization, the size of each hidden layer, and the number of hidden layers. The RMSProp optimization algorithm is used to update the network weights iteratively. Training stops when the number of training steps is reached or the loss function falls to a predetermined threshold, and the trained network is saved to the hard disk for network prediction.

Network prediction. The trained network is used to make predictions iteratively, producing the predicted value at each moment. The prediction process involves only the forward calculation of the network, which is similar to the forward calculation during training. The test set is input into the trained network to obtain the predicted values, and the root mean square error (RMSE) between the predicted and true values is calculated as the criterion for evaluating the prediction performance of the network. The smaller the RMSE, the higher the prediction accuracy.
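The training loop above can be illustrated on a one-parameter toy problem. The sketch below uses the textbook RMSProp update with η = 0.01 and α = 0.99 taken from the text; the decay interval `decay_steps`, the averaging factor `rho`, the threshold value and the toy objective are assumptions for illustration, since the source does not state them.

```python
import math


def rmsprop_minimize(loss_fn, grad_fn, w, lr0=0.01, lr_decay=0.99,
                     decay_steps=100, rho=0.9, eps=1e-8,
                     max_steps=1000, loss_threshold=1e-4):
    """Minimize a scalar function with RMSProp and exponential learning-rate decay."""
    avg_sq = 0.0  # running average of squared gradients
    for step in range(max_steps):
        if loss_fn(w) <= loss_threshold:
            break  # stop early once the loss reaches the threshold
        # Assumed schedule: the rate is multiplied by lr_decay every decay_steps steps.
        lr = lr0 * lr_decay ** (step / decay_steps)
        g = grad_fn(w)
        avg_sq = rho * avg_sq + (1.0 - rho) * g * g
        w -= lr * g / (math.sqrt(avg_sq) + eps)
    return w


# Toy objective (illustrative only): loss(w) = (w - 3)^2, minimum at w = 3.
w_final = rmsprop_minimize(lambda w: (w - 3.0) ** 2,
                           lambda w: 2.0 * (w - 3.0), w=0.0)
```

The two stopping conditions mirror the text: a fixed number of training steps, or a loss at or below a predetermined threshold.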

1-2. Select the activation function of the convolutional neural network and the long short-term memory network. The relu function is selected as the activation function of both networks. Adding an activation function to the network structure gives the neural network the ability to fit nonlinear systems.

The activation functions tanh, sigmoid and relu are plotted in Figure 3. When the independent variable is greater than 0, the gradient of the relu function is constant, so the training of the algorithm is more stable and effective.
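The gradient behavior behind this choice can be checked numerically. The following minimal sketch defines the three activation functions and their derivatives using their standard textbook forms; the function names are chosen for this example only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # The gradient is 1 for positive inputs and 0 otherwise.
    return 1.0 if x > 0 else 0.0


# For a large positive input, sigmoid and tanh saturate while relu does not.
grads_at_5 = (sigmoid_grad(5.0), tanh_grad(5.0), relu_grad(5.0))
```

At x = 5 the sigmoid and tanh gradients are already close to zero, whereas the relu gradient stays at 1.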

1-3. Define the loss function of the convolutional neural network and the long short-term memory network. The mean square error is chosen as the loss function, as shown in equation (1):

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 \tag{1}$$

where m is the number of data in the batch, y_i is the correct answer corresponding to the i-th data in the batch, and ŷ_i is the value predicted by the neural network for the i-th data. The mean square error amplifies the error, so it measures subtle differences in the prediction error well and is an important signal of the difference between the predicted and actual data. The change of the loss function value with the number of training iterations is shown in Figure 4. Figure 4 shows that during training, the loss function value decreases rapidly and steadily as the number of iterations increases, indicating that the prediction method trains well.

1-4. Select trading indicators and fundamental data as the input features of the convolutional neural network and the long short-term memory network. The opening, highest, lowest and closing prices of the exchange rate for the day are the most direct reflection of current market conditions.

Technical indicators are calculated from basic transaction data and are mainly used to help judge the trend of exchange rate changes. Commonly used technical indicators fall into two types. The first type is trend indicators, such as MA (moving average) and MACD (moving average convergence/divergence), which reflect whether the exchange rate is currently in an upward trend, a downward trend, or an oscillating trend. The second type is counter-trend indicators, also called overbought/oversold indicators, which are mainly used to determine the turning point of a trend; commonly used ones are KDJ (stochastic indicator), BIAS (deviation rate), RSI (relative strength index) and ROC (rate of price change). For the overall situation of the foreign exchange market, the US dollar index is used, because it reflects the volatility of the mainstream currency pairs in the market.

In other words, the technical indicators are calculated from the trading indicators. The moving average and the moving average convergence/divergence are used to reflect the current trend of exchange rate changes, and the counter-trend indicators, namely the stochastic indicator, deviation rate, relative strength index and rate of price change, are used to determine trend turning points.

The moving average is the mean of the closing prices of the exchange rate over a certain period and is used as the basis for judging trend changes. It is calculated as shown in formula (2):

$$MA_N = \frac{1}{N}\sum_{i=1}^{N} close_i \tag{2}$$

where N is the time period and close_i is the closing price of the i-th period in the window.

Select a fast moving average and a slow moving average, compute their difference DIF, then compute DEA, the smoothed moving average of DIF, and finally obtain the moving average convergence/divergence. The calculation is shown in formulas (3)-(7).
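As a small worked example of the moving average, the sketch below computes the mean of the last N closing prices at each position. The function name and the use of `None` for positions with fewer than N closes are choices made for this illustration.

```python
def moving_average(closes, period):
    """Simple moving average; None until `period` closes are available."""
    out = []
    for i in range(len(closes)):
        if i + 1 < period:
            out.append(None)
        else:
            window = closes[i + 1 - period:i + 1]
            out.append(sum(window) / period)
    return out


# Illustrative closing prices only.
ma3 = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], period=3)
```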

BAR=2×(DIF-DEA) (7)
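A hedged sketch of the MACD calculation follows. Only BAR = 2 × (DIF − DEA) is taken directly from formula (7); the exponential moving average form and the 12/26/9 periods are the conventional choices and are assumptions here, since formulas (3)-(6) are not reproduced in the text.

```python
def ema(values, period):
    """Exponential moving average, seeded with the first value (assumed convention)."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (DIF, DEA, BAR); the 12/26/9 periods are assumed defaults."""
    ema_fast = ema(closes, fast)
    ema_slow = ema(closes, slow)
    dif = [f - s for f, s in zip(ema_fast, ema_slow)]
    dea = ema(dif, signal)
    bar = [2.0 * (d - e) for d, e in zip(dif, dea)]  # formula (7)
    return dif, dea, bar


# Illustrative data: a flat series and a steadily rising series.
flat_dif, flat_dea, flat_bar = macd([1.5] * 40)
rise_dif, rise_dea, rise_bar = macd([1.0 + 0.01 * i for i in range(40)])
```

On a flat series all three outputs are zero, and on a rising series DIF is positive because the fast average tracks the price more closely than the slow one.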

The specific calculation formula of the stochastic index is shown in formulas (8)-(11),

RSV_N = (Close_N − Low_N) ÷ (High_N − Low_N) × 100% (8)

J=3×K-2×D (11)

Here Close_N is the closing price on the N-th day, Low_N is the lowest price within N days, High_N is the highest price within N days, K_{-1} is the K value of the previous day, and D_{-1} is the D value of the previous day. According to the KDJ value, the market can be divided into overbought, oversold and oscillation zones. The general classification standard is: when the KDJ value is above 80, the market is considered overbought; when it is below 20, the market is considered oversold and buying can be considered; when it is between 20 and 80, the market is in the oscillation zone, and it is advisable to wait and see rather than trade.
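The stochastic indicator can be sketched as below. RSV and J = 3K − 2D follow formulas (8) and (11); the smoothing K = (2/3)·K_prev + (1/3)·RSV and D = (2/3)·D_prev + (1/3)·K, the initial values of 50, and the 9-period window are the conventional definitions and are assumptions here, because formulas (9) and (10) are not reproduced in the text.

```python
def kdj(highs, lows, closes, period=9, k0=50.0, d0=50.0):
    """Return lists (K, D, J); smoothing weights and seeds are assumed conventions."""
    ks, ds, js = [], [], []
    k_prev, d_prev = k0, d0
    for i in range(len(closes)):
        start = max(0, i + 1 - period)
        low_n = min(lows[start:i + 1])
        high_n = max(highs[start:i + 1])
        if high_n == low_n:
            rsv = 50.0  # flat window: use a neutral value to avoid division by zero
        else:
            rsv = (closes[i] - low_n) / (high_n - low_n) * 100.0  # formula (8)
        k = 2.0 / 3.0 * k_prev + 1.0 / 3.0 * rsv
        d = 2.0 / 3.0 * d_prev + 1.0 / 3.0 * k
        ks.append(k)
        ds.append(d)
        js.append(3.0 * k - 2.0 * d)  # formula (11)
        k_prev, d_prev = k, d
    return ks, ds, js


# Illustrative prices: each close equals the period high.
ks, ds, js = kdj(highs=[2.0, 3.0, 4.0], lows=[1.0, 2.0, 3.0], closes=[2.0, 3.0, 4.0])
```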

The deviation rate is calculated as in formula (12),

$$BIAS_N = \frac{Close - MA_N}{MA_N}\times 100\% \tag{12}$$

The relative strength index is calculated as in formula (13),

$$RSI_N = \frac{\sum_{i=1}^{N} Rise_i}{\sum_{i=1}^{N} Rise_i + \sum_{i=1}^{N} Fall_i}\times 100 \tag{13}$$

where Rise_i is the increase in the closing price on day i, and Fall_i is the decrease in the closing price on day i. The rate of price change is calculated as in equation (14),

ROC = Close ÷ Close_{−N} (14)
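The three counter-trend indicators can be sketched together. ROC follows formula (14) as a plain ratio; the BIAS and RSI forms below are the conventional definitions and should be read as assumptions, and the function names are chosen for this example.

```python
def bias(closes, period):
    """Deviation of the latest close from its N-period moving average, in percent."""
    ma = sum(closes[-period:]) / period
    return (closes[-1] - ma) / ma * 100.0


def rsi(closes, period):
    """Share of upward moves among the last `period` close-to-close changes."""
    changes = [b - a for a, b in zip(closes[-period - 1:-1], closes[-period:])]
    rise = sum(c for c in changes if c > 0)
    fall = sum(-c for c in changes if c < 0)
    if rise + fall == 0:
        return 50.0  # no movement: neutral value (assumed convention)
    return rise / (rise + fall) * 100.0


def roc(closes, period):
    """Latest close divided by the close `period` steps earlier, as in formula (14)."""
    return closes[-1] / closes[-1 - period]


# Illustrative closing prices only.
closes = [1.0, 2.0, 3.0, 2.0, 4.0]
bias4, rsi4, roc4 = bias(closes, 4), rsi(closes, 4), roc(closes, 4)
```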

In summary, the transaction indicators are summarized as shown in Table 1.

Table 1

Economic indicators

Interest rate

The interest rate is the ratio of the amount of interest to the amount of borrowed funds (the principal) over a certain period of time. The capital cost of an enterprise is mainly affected by interest rates, which also determine its financing and investment. The current level of interest rates and their trend must therefore be considered in any study of financial markets.

The total interest on loaned or borrowed funds is determined by the total principal, the interest rate, the compounding frequency, and the length of the loan or deposit. Interest is the remuneration that the borrower pays for using borrowed principal, that is, the price of consuming in advance; it is also the remuneration that the lender receives for deferring consumption and lending funds to the borrower. The interest rate generally refers to the interest earned over one year as a percentage of the principal.

GDP

GDP (Gross Domestic Product) is the total market value of all final goods and services produced in a country (or region) within a certain period of time. It is a measure of the comprehensive strength of a country (or region), an important indicator of its economic development, and the core indicator of national economic accounting. GDP is less suitable for measuring the economic status of an individual city, because the amount levied by the state or a higher-level unit differs from city to city each year, so the wealth retained in each city also differs.

Step 2. Train and optimize the method constructed in Step 1 with respect to three aspects: input features, network structure and training method. The optimization items are: feature optimization by principal component analysis; optimization of the number of lag periods of the convolutional neural network and long short-term memory network; optimization of their network structure; optimization of their training method; and GPU-based parallel optimization.

In terms of training methods, the Adam, SGD and RMSProp methods are each used to train the network. By comparing the prediction accuracy, the change of the loss function with the number of iterations, and the convergence speed during training, the effect of the different training methods on training and on prediction accuracy is studied, and the RMSProp method is finally chosen. RMSProp converges quickly, has the most stable training process, and gives the best training optimization effect.

Construct a feature optimization algorithm based on PCA to reduce the dimensionality of the input features. PCA (Principal Component Analysis) is a classic dimensionality reduction method. It is linear, unsupervised and global. Its aim is to find the principal components in the data and use them to represent the original data, thereby reducing dimensionality.

The main idea of PCA is to map the n-dimensional input feature vector to k dimensions. The k-dimensional features are new, mutually orthogonal features (the principal components) reconstructed from the original n-dimensional features. PCA sequentially finds a set of mutually orthogonal coordinate axes in the feature space of the original input data. The first new axis is the direction of largest variance in the original data; the second is the direction of largest variance within the plane orthogonal to the first axis; the third is the direction of largest variance within the subspace orthogonal to the first two axes. By analogy, n such coordinate axes can be obtained.

Most of the variance is contained in the first k coordinate axes, and the remaining axes carry very little variance. Therefore only the first k axes need to be kept, which is equivalent to keeping only the feature dimensions that carry most of the variance. This reduces the dimensionality of the original input data while preserving most of its characteristics.

When studying a problem, multiple independent variables are often introduced and combined into relatively high-dimensional feature vectors. The high-dimensional space in which these vectors lie often contains substantial redundancy and noise, and an excessively high input dimensionality increases the complexity of the problem. It is therefore desirable to reduce the dimensionality of the input variables as far as possible while preserving the main information, so as to improve the expressive power of the features and reduce the complexity of training.

This method selects four types of indicator data: basic transaction data directly related to foreign exchange transactions; technical indicator data calculated from the transaction data; the US dollar index, which reflects the overall situation of the foreign exchange market; and national economic indicators, which reflect the state of a country's economy. These four types of data may be internally correlated, and too many inputs also impair the convergence speed and generalization ability of the deep learning algorithm. The input indicators are therefore reduced in dimension by principal component analysis, which also removes part of the noise in the data.

The specific steps of constructing feature optimization algorithm based on PCA are as follows:

Center the input feature matrix and calculate its covariance matrix S;

Take the eigenvectors ω_1, ω_2, ..., ω_k corresponding to the k largest eigenvalues λ_1, λ_2, ..., λ_k of S, and map the n-dimensional features to k dimensions through equation (15):

$$x'_i = \left[\omega_1^{T} x_i,\ \omega_2^{T} x_i,\ \ldots,\ \omega_k^{T} x_i\right]^{T} \tag{15}$$

The j-th dimension of the new x′_i is the projection of x_i onto the direction of the j-th principal component ω_j (j = 1, ..., k). By selecting the eigenvectors corresponding to the k largest eigenvalues, the features with smaller variance are discarded, so that each n-dimensional column vector x_i is mapped to a k-dimensional column vector x′_i, and the k-dimensional feature matrix D′ is obtained.
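The PCA steps above can be sketched with the standard library only. This is an illustrative sketch, not the patent's code: the leading eigenvectors are found by power iteration with deflation (a technique swapped in here to avoid external libraries), the covariance uses a 1/n normalization, and all function names and the toy data are assumptions.

```python
import math


def covariance_matrix(rows):
    """Center the data and return (centered rows, covariance matrix S)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    return centered, cov


def top_eigenvectors(cov, k, iters=200):
    """Leading k eigenvectors of a symmetric matrix via power iteration and deflation."""
    d = len(cov)
    mat = [row[:] for row in cov]
    vectors = []
    for _ in range(k):
        v = [1.0 + 0.1 * j for j in range(d)]  # fixed, slightly asymmetric start
        for _ in range(iters):
            w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(mat[a][b] * v[b] for b in range(d)) for a in range(d))
        vectors.append(v)
        # Deflate: remove the component that was just found.
        mat = [[mat[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return vectors


def pca_project(rows, k):
    """Map each n-dimensional sample to k dimensions, as in equation (15)."""
    centered, cov = covariance_matrix(rows)
    comps = top_eigenvectors(cov, k)
    projected = [[sum(w[j] * x[j] for j in range(len(x))) for w in comps]
                 for x in centered]
    return projected, comps


# Toy data lying on the line y = 2x, so one component captures all the variance.
projected, components = pca_project([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]], k=1)
```

In practice a library eigendecomposition would normally replace the power iteration; it is used here only to keep the example self-contained.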

Many factors affect exchange rate changes. This method divides them into four categories: basic transaction data, technical indicator data, the US dollar index, and national economic indicators. Based on the PCA dimensionality reduction algorithm, comparison schemes are built from combinations of these four categories, and the best input feature combination after dimensionality reduction and denoising is selected experimentally. Table 2 analyzes the impact of input features on prediction accuracy.

Table 2

Figure 5 shows that the fourth input feature combination gives the smallest root mean square error and the highest prediction accuracy. The third combination comes next, indicating that the US dollar index has a large impact on the exchange rate. The root mean square error of the first combination is only slightly higher than that of the third, indicating that the indicator data calculated from the basic transaction data contain considerable redundant information and add little value for analyzing and predicting the exchange rate. The sixth combination has the highest root mean square error, indicating that the US dollar index and national economic indicators alone cannot predict the exchange rate well. The root mean square error of the seventh combination is higher than that of the first, third and fourth, showing that using all four types of input features increases the noise and redundant information in the training samples, which is not conducive to analyzing and predicting the exchange rate.

In summary, the input feature with the greatest impact on foreign exchange prediction is the basic transaction data. Information is lost when technical indicators are calculated from the basic transaction data. When the input dimension is small, adding effective information improves prediction accuracy, but added redundancy and noise degrade it. Appropriate input features must therefore be selected for the specific problem: too few input features easily lead to underfitting, while too many increase the noise in the data and reduce the learning effect and training speed. The fourth input feature combination (basic transaction data, technical indicators and the US dollar index) is therefore selected as the best input feature combination.

The number of lag periods n is the length of the time series used for analysis and prediction, that is, the data of the previous n periods are used to predict period n+1. The number of lag periods can have an important impact on prediction accuracy. On the basis of the input feature optimization above, lag periods of 5, 10, 20, 30, 40, 50 and 60 are compared to study the influence of n on prediction accuracy and to select the best value. The detailed experimental data are shown in Table 3, and Figure 6 visualizes the data in Table 3.

Table 3

Figure 6 clearly shows that as the number of lag periods increases, the root mean square error first decreases and then increases, and the prediction accuracy is highest when the number of lag periods is 30. When the number of lag periods is less than 30, the sequence is too short to fully reflect the changes in the series, and the algorithm cannot learn the underlying regularities of the training samples. When it is greater than 30, the sequence is too long and contains more noise, which interferes with learning and training; this indicates that data far from the predicted moment have less influence on it. The number of lag periods is therefore set to 30.

The network structure of the deep learning algorithm has an important influence on prediction accuracy. Since the numbers of input and output neurons are determined by the problem itself, choosing the network structure means choosing the size of the hidden layer. The input feature optimization above shows that the best input feature has 12 dimensions, so the number of input neurons is 12; because the prediction is a single numerical value, the number of output neurons is 1. The size of the hidden layer comprises the number of convolutional layers, the size of the convolution kernels, the number of convolution kernels, the number of recurrent layers, and the size of the recurrent layers.

First, with the number of convolutional layers and the size and number of convolution kernels all set to 1, the influence of the size of the LSTM recurrent layers on prediction accuracy is studied experimentally, and the optimal recurrent layer size is selected. The number of hidden layers is set to 1, 2, 3, 4 and 5, and the number of neurons in each layer to 8, 16, 32, 64, 128 and 256. The detailed experimental data are shown in Table 4.

Table 4

Figure 7 visualizes Table 4. It shows that the RMSE decreases as the number of neurons and the number of hidden layers increase, but beyond a certain point the RMSE rises instead of falling: the network is then too large, with too many neurons, and overfitting occurs. When the number of neurons is too small, the RMSE is larger because the network is too small to fit the training data effectively, and underfitting occurs. An appropriate network scale must therefore be set for each problem. Here, the number of LSTM hidden layers is set to 3, and the number of neurons in each layer to 128.

With the recurrent layer size chosen, the effect of the convolutional layer size on prediction accuracy is studied next. The number of convolutional layers is set to 1, 2, 3, 4 and 5, and the convolution kernel size to 1×1, 3×3, 5×5 and 7×7. The number of convolution kernels and the convolution stride are both set to 1 so that the output dimension of the convolutional layer matches its input dimension. The experimental results are shown in Table 5 and visualized in Figure 8.

Table 5

Figure 8 shows that the RMSE is smallest, and the prediction accuracy highest, when the number of convolutional layers is 2 and the convolution kernel size is 3×3. With 1 convolutional layer, the data are not abstracted enough and still contain considerable noise. With more than 2 convolutional layers, the data are over-abstracted and lose their original features. Both insufficient and excessive abstraction therefore reduce the prediction accuracy of the algorithm. A 1×1 convolution kernel only applies a nonlinear transformation to the data and does not abstract the spatial features around each data point, whereas a kernel larger than 3×3 gathers too many surrounding spatial features and degrades prediction accuracy, which shows that positions far from a data point are less correlated with it. In summary, 2 convolutional layers with a 3×3 kernel abstract the spatial characteristics of the data best, so the number of convolutional layers is set to 2 and the convolution kernel size to 3×3.
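The requirement that a stride-1 convolution keep the output dimension equal to the input dimension can be illustrated with a small sketch. Zero padding is assumed here as the way to preserve the shape, which the text does not specify; the function name and the sample kernels are likewise illustrative.

```python
def conv2d_same(image, kernel):
    """Stride-1 2-D convolution (cross-correlation form) with zero padding,
    so that the output has the same shape as the input."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:  # positions outside count as zero
                        total += image[y][x] * kernel[a][b]
            out[i][j] = total
    return out


# A 30 x 12 window (30 lag periods, 12 features) filtered by a 3 x 3 averaging kernel.
window = [[float(r + c) for c in range(12)] for r in range(30)]
smooth = conv2d_same(window, [[1.0 / 9.0] * 3 for _ in range(3)])
```

A 1×1 kernel only rescales each value, while a 3×3 kernel mixes each value with its immediate neighbors, which is the distinction the paragraph above draws.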

The network structure optimization of convolutional neural network and long short-term memory network includes the following parts:

Hyperparameter optimization of the recurrent layers of the long short-term memory network;

Convolutional layer hyperparameter optimization;

On the basis of the hyperparameters selected above, three different ways of combining the CNN and the LSTM were studied experimentally. The results show that combination method (1) has the highest prediction accuracy, so the serial combination of CNN first and then LSTM is used.

The iterative training method of a deep learning algorithm has always been a key research issue, because the training method directly affects prediction accuracy. The most commonly used methods for the training optimization problem are based on gradient descent, and focus on reaching the best training effect in the fewest training iterations while preventing overfitting. The Adam, SGD and RMSProp training optimization methods were compared, and the best one was selected according to the experimental results, which are shown in Table 6. Table 6 shows that the prediction accuracy with SGD is lower than with the other two methods, while RMSProp and Adam give the same prediction accuracy.

Table 6

Figures 9-11 show that the training process of the Adam optimization method is unstable, with the loss value fluctuating significantly as the number of iterations increases. The RMSProp optimization method converges faster than Adam, and its training process is stable: the loss value decreases steadily as the number of iterations increases, and the training optimization effect is better. The SGD optimization method converges slowly, its loss value fluctuates during training, and its training effect is not as good as that of RMSProp.

In summary, the RMSProp training optimization method converges quickly, has the most stable training process, and gives the best training optimization effect, so it is chosen.

The financial market places high demands on timeliness, while training a deep learning algorithm is computationally expensive and time-consuming, which makes those demands difficult to meet. GPU high-performance computing technology is therefore used to parallelize the training process. Using multiple GPUs to accelerate training effectively improves the training speed and further enhances the usability of the prediction method in the foreign exchange market.

Training a deep learning model is an iterative process. To train the model faster and ensure its timeliness in practical applications, a common parallel training scheme is adopted in which multiple GPUs accelerate training in parallel.

In each iteration, the forward propagation algorithm computes predicted values on a portion of the training data set using the current parameter values. The back-propagation algorithm then computes the parameter gradients from the loss function, which measures the difference between the predicted and true values, and the parameters are updated. There are two ways to parallelize the training of deep learning models: synchronous parallel mode and asynchronous parallel mode.

The flowchart of the asynchronous mode is shown in Figure 12. In each iteration of the asynchronous parallel mode, each device reads the latest parameter values, obtains a small portion of the training data, runs back-propagation independently, and updates the parameters independently.

The synchronous parallel mode differs from the asynchronous one in that all devices obtain the same parameter values, as shown in Figure 13. In each iteration of the synchronous parallel mode, all devices read the same parameter values; after back-propagation, the gradients computed on the devices are averaged, and the parameters are updated once using this average. Figures 12 and 13 thus show the training process of the algorithm in the two modes.

The GPU-based synchronous-mode parallel training optimization algorithm is as follows. Let n be the number of GPUs, D_train the training data set, and batch-size the size of each batch of training data. Different batches are distributed to different GPUs and trained simultaneously, and a gradient value is computed on each GPU. The gradients on all GPUs are then averaged, and this average is used as the gradient update for the current training step:

$$\bar{g} = \frac{1}{n}\sum_{j=1}^{n} g_j$$

where g_j denotes the gradient computed on the j-th GPU.
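The synchronous scheme can be simulated on one machine to show the order of operations. In this sketch the "devices" are simply separate batches, the model is a one-parameter linear fit, and a plain gradient step replaces RMSProp for brevity; all of these, along with the function names, are assumptions made for illustration.

```python
def grad_mse_linear(w, batch):
    """Gradient of the mean squared error of y ≈ w * x on one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def sync_parallel_step(w, batches, lr):
    """One synchronous step: every simulated device reads the same w,
    computes its own gradient, and the averaged gradient updates w once."""
    grads = [grad_mse_linear(w, b) for b in batches]  # one gradient per device
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad


# Toy data generated from y = 3x, split into 4 batches for 4 simulated GPUs.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
batches = [data[0:2], data[2:4], data[4:6], data[6:8]]
w = 0.0
for _ in range(200):
    w = sync_parallel_step(w, batches, lr=0.01)
```

On real hardware the per-device gradients would be computed concurrently and the averaging would involve communication between devices, which is the overhead mentioned in the text.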

Up to 4 GPU devices are used in the experimental analysis, with different numbers of GPUs accelerating the training process. The acceleration effect is shown in Figure 14. Figure 14 clearly shows that as the number of GPUs increases, the training speed rises steadily and almost linearly, although overhead such as data communication also increases. Because training requires a large amount of computation, the training speed is several times that of a single GPU. To better meet the high timeliness requirements of the foreign exchange market, an appropriate increase in the number of GPUs effectively improves the training speed of the prediction method and its usability in actual application scenarios.

The experimental environment of this prediction method is shown in Table 7 below.

Table 7

Python 3 is used as the main programming language and Google's TensorFlow as the deep learning framework. TensorFlow is a widely used deep learning framework that allows various deep learning algorithms to be implemented quickly and offers strong portability, convenience, flexibility and good performance. The data flow graph of the prediction method, produced with the TensorBoard visualization tool, is shown in Figure 15. In Figure 15, the input data flow from the input layer into the first convolutional layer layer1-conv1 and then into the second convolutional layer layer2-conv2. After the two convolutional layers, the data flow into the LSTM recurrent layers layers-lstm and finally into the fully connected layer, which computes the forward-propagation prediction. The parameters of each layer are then trained and updated by back-propagation with the RMSProp optimization algorithm, and the trained network is saved to the hard disk for predictive analysis of new data.

The prediction method uses 15-minute data from January 3, 2008 to January 3, 2018 for 9 active currency pairs: EURUSD (euro to US dollar), AUDUSD (Australian dollar to US dollar), XAUUSD (gold to US dollar), GBPJPY (British pound to Japanese yen), EURJPY (euro to Japanese yen), GBPUSD (British pound to US dollar), USDCHF (US dollar to Swiss franc), USDJPY (US dollar to Japanese yen), and USDCAD (US dollar to Canadian dollar). The raw transaction data were downloaded from the MT4 trading platform, and the US dollar index and US economic indicators were collected from the Golden Ten Data website [62]. An example of the original EURUSD data is shown in Table 8. The data format of the other currency pairs is similar and, for reasons of space, is not shown.

Table 8

date	time	open	high	low	close	usdx_open	usdx_high	usdx_low	usdx_close	rate	gdp
2008.01.03	0:00	1.4723	1.4724	1.472	1.4721	76.05	76.06	76.03	76.06	1.92	14.72
2008.01.03	0:15	1.4722	1.4725	1.472	1.4724	76.06	76.08	76.03	76.04	1.92	14.72
2008.01.03	0:30	1.4723	1.4725	1.4712	1.4712	76.04	76.04	76.02	76.02	1.92	14.72
2008.01.03	0:45	1.4713	1.4722	1.4711	1.4715	76.02	76.05	75.96	75.99	1.92	14.72

Since the original data only has basic transaction data and economic index data, technical index data needs to be calculated based on basic transaction data. Through statistical analysis of the original data, it is found that the original data has the problem of missing, and there is also strong noise, and the direct unit of the feature dimension is inconsistent. In response to the above problems, the missing data was compensated, the data at the previous moment compensated for the missing data, and so on. After making up for the missing data, normalize the feature with zero mean, and map the original data to a distribution with a mean of 0 and a standard deviation of 1. The normalized calculation formula is:

In formula (16), μ is the mean value of the original features, and σ is the standard deviation. Filling the missing values and normalizing the data ensures the integrity and standardization of the original data. The first 80% of the data is selected as the training set, and the remaining 20% as the test set. To construct the comparison methods, the necessary hyperparameter values are first initialized: the number of lag periods is set to 30, the number of hidden layers to three, the number of hidden layer nodes to 128, the loss function to the mean square error, the optimization method to RMSProp, batch_size to 300, and the number of training iterations to 1000.
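The preprocessing described above (filling each gap with the previous value, zero-mean normalization per formula (16), and an 80%/20% split in time order) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the population standard deviation is assumed for σ, and the gap in the sample series is invented for the example.

```python
import math

def forward_fill(values):
    """Replace each None with the most recent earlier observation."""
    filled, last = [], None
    for value in values:
        if value is None:
            value = last  # remains None only if the series starts with a gap
        filled.append(value)
        last = value
    return filled

def zscore(values):
    """Formula (16): x' = (x - mu) / sigma, with the population standard deviation (assumed)."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

def chronological_split(rows, train_fraction=0.8):
    """Keep time order: the first part is for training, the remainder for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Close prices from Table 8, with one hypothetical gap inserted for illustration.
close = [1.4721, 1.4724, None, 1.4712, 1.4715]
filled = forward_fill(close)
scaled = zscore(filled)
train, test = chronological_split(scaled)
```

The split is taken in time order rather than by random sampling, so the test set always lies after the training set, which matches the "first 80%" wording above.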

Since the prediction task is a regression task, RMSE (Root Mean Square Error) is selected as the evaluation standard of the prediction effect. RMSE is very sensitive to errors in the predicted sequence and reflects the prediction accuracy of the algorithm well: the smaller the RMSE value, the higher the prediction accuracy. The calculation formula of RMSE is as follows:

RMSE = √( (1/n) × Σ_{i=1..n} (y_i - ŷ_i)² ) (17)

In formula (17), y_i is the i-th true value, ŷ_i is the i-th predicted value, and n is the length of the predicted sequence and the true sequence. With the hyperparameter values set, comparative prediction methods based on BP, CNN, RNN, and LSTM neural networks are constructed and compared experimentally with the constructed C-LSTM prediction method. The prediction accuracy of each method is judged by its root mean square error. If the constructed prediction method has the smallest root mean square error, so that its accuracy is better than that of the comparative methods, this demonstrates the validity and applicability, in foreign exchange time series analysis, of a prediction method constructed from the two deep learning algorithms.
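As a small worked illustration of the RMSE criterion of formula (17), the following Python sketch computes the RMSE of two sets of predictions against the same true values. The function name and both prediction lists are hypothetical; they only show how the lower value identifies the better fit.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# True close prices from Table 8 and two hypothetical sets of predictions.
y_true = [1.4721, 1.4724, 1.4712, 1.4715]
pred_close_fit = [1.4720, 1.4725, 1.4713, 1.4714]
pred_loose_fit = [1.4731, 1.4714, 1.4722, 1.4705]

rmse_close = rmse(y_true, pred_close_fit)
rmse_loose = rmse(y_true, pred_loose_fit)
```

Here `rmse_close` is smaller than `rmse_loose`, so the first set of predictions would be judged the more accurate of the two.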

Based on the two deep learning algorithms CNN and LSTM, the C-LSTM foreign exchange time series short-term prediction method was constructed. The relatively optimal combination of input features, the optimal number of lag periods, the optimal hidden layer size and algorithm combination method, and the most effective training method were selected, which further improves the prediction accuracy of the C-LSTM prediction method.

To verify the effectiveness and applicability of the constructed C-LSTM foreign exchange time series short-term forecasting method in foreign exchange market analysis and forecasting, multiple comparative forecasting methods were constructed using different neural network algorithms: BP, CNN, RNN, and LSTM. The forecasting effects of these methods are compared and analyzed; the lower the RMSE value a forecasting method obtains on the test set data, the better its forecasting effect. The specific experimental results are shown in Table 9 below, and Table 9 is visualized in Figure 16.

Table 9

The fitting diagrams of the prediction effect on part of the test data for the EURUSD (Euro to U.S. Dollar) currency pair are shown below. For brevity, the fitting diagrams for the test data of all currency pairs are not shown here; the test data of the other currency pairs lead to a consistent conclusion.

Figures 16-21 show intuitively that the constructed C-LSTM prediction method has the lowest RMSE value on all 9 currency pairs and that its prediction fitting graph fits best. According to the experimental data, the forecasting effect of the constructed method is therefore better than that of all comparative forecasting methods, which demonstrates the effectiveness and applicability of the constructed C-LSTM foreign exchange time series short-term forecasting method in foreign exchange time series analysis.

On further analysis, the RNN prediction method has the worst prediction effect. As the number of iterations increases, the RNN prediction effect does not improve, and the vanishing gradient problem occurs. The prediction effect of the corresponding LSTM prediction method is greatly improved, which indicates that the LSTM network structure can effectively solve the vanishing gradient problem. The BP neural network algorithm also achieves relatively good prediction results, but for more complex problems its prediction effect is lower than that of deep neural networks such as CNN and LSTM. Although the prediction effect of the CNN is better than that of the BP and RNN neural networks, it is lower than that of LSTM because a CNN has difficulty mining the temporal characteristics of the data effectively. The constructed C-LSTM prediction method combines the advantages of LSTM and CNN, mines both the temporal and spatial characteristics of foreign exchange time series data, and improves the prediction accuracy.

In summary, the C-LSTM foreign exchange time series short-term prediction method is constructed by combining the two deep learning algorithms CNN and LSTM. Its prediction effect is better than that of either algorithm alone and better than that of the BP and RNN neural networks. This demonstrates the effectiveness of combining the two deep learning algorithms and the applicability of the forecasting method to the analysis of foreign exchange time series. It can provide a reference for improving the prediction accuracy of deep learning algorithms, and it has theoretical and practical value for the application of deep learning technology in foreign exchange time series analysis.

Of course, the above description does not limit the present invention, and the present invention is not limited to the above examples. Changes, modifications, additions or substitutions made by those skilled in the art within the essential scope of the present invention shall also fall within the scope of protection of the present invention.

Claims

A foreign exchange time series forecasting method, characterized in that it comprises the following steps:

Step 1. Construct a C-LSTM prediction method based on the combination of convolutional neural network and long short-term memory network, which specifically includes:

1-1. Construct a C-LSTM network model based on the combination of convolutional neural network and long short-term memory network, including:

1-1-1, build five functional modules including input layer, hidden layer, output layer, network training and network prediction;

1-1-2, construct training and prediction algorithms for the C-LSTM short-term prediction method of foreign exchange time series based on the combination of convolutional neural network and long short-term memory network;

1-2. Choose the activation function of the C-LSTM that combines the convolutional neural network and the long short-term memory network;

1-3. Define the loss function of the C-LSTM that combines the convolutional neural network and the long short-term memory network;

1-4. Select transaction indicators and fundamental data as the input features of the C-LSTM that combines the convolutional neural network and the long short-term memory network;

Step 2. Train and optimize the method constructed in Step 1 from the three aspects of input features, network structure and training methods. The training optimization items include feature optimization based on principal component analysis, lag period optimization of the C-LSTM combining the convolutional neural network and the long short-term memory network, structure optimization of the C-LSTM, training method optimization of the C-LSTM, and GPU-based parallel optimization;

In terms of input features, 18 indicators are selected as input features. The 18 indicators are divided into four categories: basic transaction data, technical indicator data, the dollar index and national economic indicators. These four categories of indicators are combined, and the input features are optimized based on principal component analysis; the impact of different indicators on the prediction accuracy is studied and the best input features are selected; the impact of the number of lag periods on the prediction accuracy is then studied experimentally, so as to select the best number of lag periods;

In terms of network structure, the best hidden layer size is studied with a grid search algorithm, and the impact of different algorithm combinations on the prediction accuracy is studied by changing how the convolutional neural network and the long short-term memory network are combined, so as to choose the best hidden layer size and algorithm combination method;

In terms of training methods, the Adam, SGD, and RMSProp methods are used to train the network. By comparing the prediction accuracy obtained with each training algorithm, as well as the change of the loss function with the number of iterations and the convergence speed during training, the impact of different training methods on training and on prediction accuracy is studied, and the appropriate training method is finally chosen.
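The role of the number of lag periods can be illustrated with a short Python sketch that turns a feature sequence into supervised samples: each sample holds `lag` consecutive feature rows, and its label is the target value at the next time step. This is a hedged sketch only. The function name is hypothetical, and using the next-step value as the label is an assumption about how a short-term prediction target would be formed.

```python
def make_lag_windows(rows, targets, lag):
    """Pair each block of `lag` consecutive feature rows with the target that follows it."""
    samples, labels = [], []
    for end in range(lag, len(rows)):
        samples.append(rows[end - lag:end])  # rows end-lag .. end-1
        labels.append(targets[end])          # value to predict at time step `end`
    return samples, labels

# Ten hypothetical time steps with two features each.
rows = [[float(t), float(t) * 2.0] for t in range(10)]
targets = [float(t) for t in range(10)]
X, y = make_lag_windows(rows, targets, lag=3)
```

With 10 time steps and a lag of 3, this yields 7 samples of shape 3 × 2. A larger lag gives each sample more history but fewer samples overall, which is one reason the lag is tuned experimentally.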
A foreign exchange time series prediction method according to claim 1, wherein in said step 1, the relu function is selected as the activation function of the C-LSTM that combines the convolutional neural network and the long short-term memory network; after the activation function is added to the network structure, the neural network has the ability to fit nonlinear systems.
A foreign exchange time series prediction method according to claim 1, wherein in said step 1, the mean square error is selected as the loss function, and the loss function is shown in formula (1),

MSE = (1/n) × Σ_{i=1..n} (y_i - ŷ_i)² (1)

Among them, n is the number of data in the data sequence batch, y_i is the correct answer corresponding to the i-th data in the batch, and ŷ_i is the predicted value of the neural network corresponding to the i-th data.
A foreign exchange time series forecasting method according to claim 1, characterized in that, in said step 1, technical indicators are calculated from the trading indicators. Commonly used trend indicators include the moving average and the smoothed moving average of similarities and differences (MACD), which are used to reflect the current trend of exchange rate changes. Anti-trend indicators are used to determine trend turning points and include the stochastic indicator, the deviation rate, the relative strength index, and the rate of price change.
A foreign exchange time series forecasting method according to claim 4, wherein the moving average indicator calculates the average closing price of the exchange rate over a certain period of time, and this average is used as the basis for judging trend changes. The calculation formula is shown in formula (2),

MA = (1/N) × Σ_{i=1..N} close_i (2)

Among them, N represents the time period, and close_i represents the closing price of the i-th day;

Select the fast moving average and the slow moving average, then calculate DEA, the smoothed moving average of DIF, and finally obtain the smoothed moving average of similarities and differences (MACD). The specific calculation is shown in formulas (3)-(7).

BAR=2×(DIF-DEA) (7)

In formulas (3)-(7), EMA_{-1} is the exponential moving average of the previous day, Close is today's closing price, and BAR is the height of the MACD histogram.
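The moving average and MACD calculation can be sketched in Python as follows. Only formula (7), BAR = 2 × (DIF - DEA), and the use of the previous day's EMA are taken from the text. The periods 12, 26 and 9, the smoothing factor 2 / (period + 1), seeding the EMA with the first value, and taking DIF as the fast EMA minus the slow EMA are conventional choices assumed here, since formulas (3)-(6) are not reproduced above. All function names are hypothetical.

```python
def moving_average(close, n):
    """Mean of the last n closing prices; None until n prices are available."""
    return [None if i + 1 < n else sum(close[i + 1 - n:i + 1]) / n
            for i in range(len(close))]

def ema(values, period):
    """Recursive EMA: today's value blended with the previous day's EMA (assumed factor)."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]  # seed with the first observation (assumption)
    for value in values[1:]:
        out.append(alpha * value + (1.0 - alpha) * out[-1])
    return out

def macd(close, fast=12, slow=26, signal=9):
    """Return DIF, DEA and BAR series; the default periods are assumptions."""
    dif = [f - s for f, s in zip(ema(close, fast), ema(close, slow))]
    dea = ema(dif, signal)                          # smoothed moving average of DIF
    bar = [2.0 * (d - e) for d, e in zip(dif, dea)]  # formula (7)
    return dif, dea, bar

# Hypothetical steadily rising closing prices.
close = [1.0 + 0.01 * i for i in range(60)]
ma5 = moving_average(close, 5)
dif, dea, bar = macd(close)
```

On a steadily rising series the fast EMA stays above the slow EMA, so DIF is positive, which is how the indicator reflects an upward trend.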
A foreign exchange time series forecasting method according to claim 4, characterized in that the specific calculation formula of the stochastic indicator is as shown in formulas (8)-(11),

RSV_N = (Close_(N) - Low_(N)) ÷ (High_(N) - Low_(N)) × 100% (8)

J = 3×K - 2×D (11)

Among them, Close_(N) is the average closing price in N days, Low_(N) is the lowest price in N days, High_(N) is the highest price in N days, K_{-1} is the K value of the previous day, and D_{-1} is the D value of the previous day;

The specific calculation formula of the deviation rate is shown in formula (12),

Among them, Close is the closing price of the day, and N is the time period, with a value of 12;

The calculation formula of the relative strength index is shown in formula (13),

Among them, Rise_i is the increase in the closing price on the i-th day, and Fall_i is the decrease in the closing price on the i-th day;

The calculation formula of the rate of price change is shown in formula (14),

ROC = Close ÷ Close_{-N} (14)

Among them, Close is the closing price of the day, and Close_{-N} is the closing price N days earlier.
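A minimal Python sketch of the stochastic indicator and the rate of price change follows. Formulas (8), (11) and (14) are taken from the text. The recursions K = (2 × K_{-1} + RSV) ÷ 3 and D = (2 × D_{-1} + K) ÷ 3, the starting values of 50, the default period of 9, and the use of the most recent close in the RSV numerator are conventional choices assumed here, because formulas (9) and (10) are not reproduced above. The function names are hypothetical.

```python
def kdj(high, low, close, n=9):
    """Return a list of (K, D, J) tuples; the smoothing weights are assumed."""
    k_prev, d_prev = 50.0, 50.0  # conventional starting values (assumption)
    out = []
    for i in range(len(close)):
        start = max(0, i + 1 - n)
        lowest = min(low[start:i + 1])
        highest = max(high[start:i + 1])
        span = highest - lowest
        # Formula (8), in percent; fall back to 50 when the price range is zero.
        rsv = 100.0 * (close[i] - lowest) / span if span else 50.0
        k = (2.0 * k_prev + rsv) / 3.0   # assumed form of formula (9)
        d = (2.0 * d_prev + k) / 3.0     # assumed form of formula (10)
        out.append((k, d, 3.0 * k - 2.0 * d))  # formula (11)
        k_prev, d_prev = k, d
    return out

def roc(close, n):
    """Formula (14): today's close divided by the close n periods earlier."""
    return [None if i < n else close[i] / close[i - n] for i in range(len(close))]

# Hypothetical bars in which every close equals the period high.
high = [2.0, 2.0, 2.0]
low = [1.0, 1.0, 1.0]
close = [2.0, 2.0, 2.0]
kdj_values = kdj(high, low, close)
roc_values = roc([1.0, 2.0, 4.0], 1)
```

Because every close sits at the top of its range, RSV is 100 on each bar and K rises from 50 towards 100, with D following more slowly.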
A foreign exchange time series prediction method according to claim 1, characterized in that, in step 2, a feature optimization algorithm is constructed based on PCA, and the input features are reduced in dimensionality.
A foreign exchange time series forecasting method according to claim 1, wherein the step of constructing a feature optimization algorithm based on PCA is specifically:

Center the input n-dimensional feature matrix D, that is, subtract the column mean μ from each column of data;

Calculate the covariance matrix S of the centered input feature matrix;

Calculate the eigenvalues λ of the covariance matrix and the corresponding eigenvectors ω, and sort the eigenvalues from large to small, λ_1, λ_2, ..., λ_n;

Take the eigenvectors ω_1, ω_2, ..., ω_k corresponding to the k largest eigenvalues λ_1, λ_2, ..., λ_k, and map the n-dimensional features to k dimensions through equation (15),

x′_i = (ω_1ᵀ x_i, ω_2ᵀ x_i, ..., ω_kᵀ x_i)ᵀ (15)

The k-th dimension of the new x′_i is the projection of x_i in the direction of the k-th principal component ω_k. By selecting the eigenvectors corresponding to the k largest eigenvalues, the features with smaller variance are discarded, so that each n-dimensional column vector x_i is mapped to a k-dimensional column vector x′_i, and a k-dimensional feature matrix D′ is obtained.
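The PCA steps above (centering, covariance matrix, largest eigenpairs, projection) can be sketched with the Python standard library alone. This is an illustrative sketch, not the claimed implementation: the eigenpairs are obtained by power iteration with deflation, a technique chosen here only to avoid external libraries, the divisor n - 1 in the covariance is an assumption, and all function names and the sample data are hypothetical.

```python
import math

def center(rows):
    """Subtract each column mean from its column."""
    n, dims = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - mu[j] for j in range(dims)] for r in rows]

def covariance(centered):
    """Sample covariance matrix S of centered data (divisor n - 1 assumed)."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dims)]
            for a in range(dims)]

def top_eigenpairs(matrix, k, iterations=500):
    """k largest eigenpairs of a symmetric PSD matrix via power iteration and deflation."""
    dims = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for index in range(k):
        vec = [1.0 / (j + 1 + index) for j in range(dims)]  # fixed start vector
        for _ in range(iterations):
            nxt = [sum(work[a][b] * vec[b] for b in range(dims)) for a in range(dims)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        # Rayleigh quotient gives the eigenvalue for the unit vector found.
        lam = sum(vec[a] * matrix[a][b] * vec[b]
                  for a in range(dims) for b in range(dims))
        pairs.append((lam, vec))
        # Deflation: remove the component just found so the next pass finds the next one.
        for a in range(dims):
            for b in range(dims):
                work[a][b] -= lam * vec[a] * vec[b]
    return pairs

def pca_project(rows, k):
    """Map each row to its projections on the top-k eigenvectors, as in equation (15)."""
    centered = center(rows)
    pairs = top_eigenpairs(covariance(centered), k)
    projected = [[sum(w[j] * x[j] for j in range(len(x))) for _, w in pairs]
                 for x in centered]
    return projected, pairs

# Hypothetical two-feature data lying close to the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
projected, pairs = pca_project(rows, k=1)
```

For this sample the first eigenvector points almost exactly along the direction (1, 2), and the variance of the single projected feature equals the largest eigenvalue, so nearly all of the variance survives the reduction from two dimensions to one.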
A foreign exchange time series prediction method according to claim 1, wherein the optimization of the structure of the C-LSTM network combining the convolutional neural network and the long short-term memory network in step 2 includes the following parts:

Hyperparameter optimization of the recurrent layer of the long short-term memory network;

Hyperparameter optimization of the convolutional layer;

Optimization of the algorithm combination method, where the combination methods of the convolutional neural network and the long short-term memory network include:

The convolutional neural network comes first and the long short-term memory network second, with the output of the convolutional neural network layer used as the input of the long short-term memory network layer;

The long short-term memory network comes first and the convolutional neural network second, with the output of the long short-term memory network layer used as the input of the convolutional neural network layer;

The convolutional neural network and the long short-term memory network are run separately, and the outputs of the two algorithms are combined to make the final prediction.
A foreign exchange time series prediction method according to claim 1, characterized in that, in said step 2, the Adam, SGD and RMSProp methods are used for network training, and the RMSProp training optimization method is selected, because it has a fast convergence speed, the most stable training process, and the best training optimization effect.
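For orientation, the standard RMSProp update rule can be sketched in Python for a single scalar parameter. The text does not give the rule or its settings, so the decay rate, learning rate, epsilon and the toy quadratic loss below are all assumptions made for illustration, and the function name is hypothetical.

```python
import math

def rmsprop_step(param, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update: scale the step by a running average of squared gradients."""
    cache = decay * cache + (1.0 - decay) * grad * grad
    param = param - lr * grad / (math.sqrt(cache) + eps)
    return param, cache

# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w, cache = 5.0, 0.0
for _ in range(2000):
    grad = 2.0 * (w - 3.0)
    w, cache = rmsprop_step(w, grad, cache, lr=0.01)
```

Dividing by the root of the running average keeps the step size roughly independent of the gradient's magnitude, a property commonly associated with stable training.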