CN111445341A - Futures model training and transaction implementation method based on multi-scale self-attention - Google Patents
Futures model training and transaction implementation method based on multi-scale self-attention
- Publication number
- CN111445341A (application CN202010419707.1A)
- Authority
- CN
- China
- Prior art keywords
- frequency data
- training
- transaction
- model
- attention
- Prior art date
- Legal status (assumed, not a legal conclusion): Pending
Classifications
- G06Q40/06 — Asset management; Financial planning or analysis
- G06N3/045 — Combinations of networks
- G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06Q40/04 — Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Abstract
The invention discloses a futures model training and transaction implementation method based on multi-scale self-attention. In the construction stage of the futures high-frequency data set, five-level (level-5 order book) high-frequency data of the dominant futures contract are collected and preprocessed, and labels are constructed from future price changes. In the training stage of the depth feature extraction layer, a deep neural network based on multi-scale self-attention is constructed, trained with the constructed labels, and its model parameters are saved. In the training stage of the transaction model, a transaction model is constructed and trained, using the features output by the depth feature extraction layer, by maximizing the Sharpe ratio. In the stage of outputting transaction decisions with the depth feature extraction layer and the transaction model, transaction actions are output using the features extracted by the depth feature extraction layer and the trained transaction model. The invention accounts, at the model level, for the multi-scale characteristics of financial time series and the correlations among different time series, and improves the accuracy of futures trading data prediction.
Description
Technical Field
The invention relates to a futures model training and transaction implementation method based on multi-scale self-attention, and belongs to the technical field of quantitative financial modeling.
Background
Quantitative investment, as an investment methodology, has many advantages, such as discipline, systematization, timeliness, and quantifiability. With the development of computer technology and the arrival of the big-data era, more and more people choose computer-based quantitative investment methods over conventional subjective investment methods, and quantitative investment funds have been established one after another. In recent years, as machine learning and artificial intelligence have developed, more and more people have tried to use these technologies to construct intelligent quantitative investment strategies.
At present there is much quantitative investment research based on machine learning, which falls mainly into two categories: methods not based on deep learning and methods based on deep learning. Methods not based on deep learning mainly construct features manually and then model them with non-deep models, for example linear regression, support vector machines, and tree-based models such as random forests and gradient boosting trees. Deep-model methods mainly build deep neural networks, such as convolutional neural networks and recurrent neural networks, to learn the rules after appropriate data preprocessing.
However, none of these works models the multi-scale nature of financial time-series data or the correlation information between different time series.
Disclosure of Invention
The purpose of the invention is as follows: existing work on deep learning for quantitative finance does not consider, at the model level, the multi-scale characteristics of financial time-series data or the correlation information between different time series. To address this problem, the invention provides a futures model training and transaction implementation method based on multi-scale self-attention. The model training method comprises the following steps: constructing a futures high-frequency data set; constructing a depth feature extraction layer network based on multi-scale self-attention; training the depth feature extraction layer network to learn the multi-scale temporal information of high-frequency financial data and the correlation information among different time series, and extracting depth features; then constructing an automatic transaction model and training it, with the extracted depth features as input, by maximizing the Sharpe ratio. The transaction implementation method comprises: in live trading, outputting transaction decisions in real time through the depth feature extraction layer and the transaction model. The invention does not depend on hand-crafted features; it compensates for the inability of conventional models to capture the multi-scale temporal characteristics of high-frequency data and the correlation characteristics among different time series, automatically extracts temporal features, and automatically formulates a trading strategy from the extracted depth features, achieving higher accuracy.
The technical scheme is as follows: a futures model training method based on multi-scale self-attention comprises constructing a high-frequency data set of dominant futures contracts, constructing and training a depth feature extraction layer based on multi-scale self-attention, and training a transaction model based on the features obtained by the depth feature extraction layer.
The steps for constructing the high-frequency data set of dominant futures contracts are as follows:
step 101, removing non-dominant-contract data from the obtained limit order book data; the dominant contract is the contract with the largest trading volume in the current period.
Step 102, calculating the percentage change of the mean weighted-average price over the next K ticks as the label;
and step 103, preprocessing the price data of each price level at each tick of the dominant-contract high-frequency data, and standardizing all price data.
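Steps 102–103 can be sketched in NumPy. This is a minimal illustration and not the patent's exact procedure: `make_labels` labels each tick with the percentage change of the mean mid-price over the next K ticks, and `zscore` is one common way to standardize price data; the function names and the use of a simple mid-price are assumptions of this sketch.

```python
import numpy as np

def make_labels(mid_prices: np.ndarray, k: int) -> np.ndarray:
    """Label each time t with the percentage change of the mean
    mid-price over the next k ticks (hypothetical reading of step 102)."""
    n = len(mid_prices) - k
    future_mean = np.array([mid_prices[t + 1 : t + 1 + k].mean() for t in range(n)])
    return (future_mean - mid_prices[:n]) / mid_prices[:n]

def zscore(prices: np.ndarray) -> np.ndarray:
    """Step 103: standardize price data to zero mean, unit variance."""
    return (prices - prices.mean()) / prices.std()
```

With a flat price series the labels are all zero, since the future mean equals the current price at every tick.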
The depth feature extraction layer based on multi-scale self-attention is formed by stacking convolutional layers that alternate between transverse and longitudinal convolutions. The transverse convolutional layers mainly extract features from the bid and ask prices at different price levels of the high-frequency data at the same tick. The longitudinal convolutional layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension; the multi-scale extraction module passes the time-series data through one-dimensional convolution kernels of different sizes in parallel and concatenates the outputs to obtain the multi-scale features. The transverse and longitudinal convolutional layers alternate for three layers; the convolutional layers are followed by two multi-head self-attention layers that learn the correlation features of different time series; finally, a Long Short-Term Memory (LSTM) network layer and a fully connected layer learn the temporal features and output the predicted change of future prices.
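The multi-scale extraction module described above can be illustrated with a small NumPy sketch: the same series is filtered by one-dimensional kernels of several sizes in parallel and the same-length outputs are concatenated channel-wise. The moving-average kernels here are placeholders; in the patent's network the kernels are learned by training.

```python
import numpy as np

def multiscale_features(x: np.ndarray, kernel_sizes=(3, 5, 7)) -> np.ndarray:
    """Pass a 1-D series through convolutions of several kernel sizes in
    parallel and stack the (same-length) outputs channel-wise.
    Moving-average kernels stand in for learned convolution weights."""
    outs = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                      # placeholder learned kernel
        outs.append(np.convolve(x, kernel, mode="same"))
    return np.stack(outs, axis=0)                    # shape: (n_scales, len(x))
```

Each row of the result views the series at a different temporal scale; a trained module would concatenate these channels as input to the next layer.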
The training steps of the depth feature extraction layer based on the multi-scale self-attention are as follows:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; each day's high-frequency data are divided into contiguous high-frequency data blocks of the same size; and the training set, validation set and test set are divided chronologically;
step 203, training a depth feature extraction layer according to the input preprocessed high-frequency data block and the corresponding label thereof;
step 204, randomly sampling data in batches of size b from the training set during training, using the Mean Absolute Error (MAE) as the loss function, and updating the network parameters by gradient descent after computing their gradients through back propagation;
step 205, stopping training when the number of training rounds reaches the maximum, taking the model with the highest R² on the validation set, removing the last fully connected layer of the network, and saving the remaining parameters as the depth feature extraction layer for use in subsequent steps;
step 206, calculating the depth features e ∈ ℝ^M obtained after all high-frequency data blocks in the training set pass through the depth feature extraction layer, where M represents the dimension of the depth features; and calculating the mean e_μ and standard deviation e_σ of the depth features for subsequent use.
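Step 206's statistics and the later feature standardization can be sketched as follows (the helper names are hypothetical; the patent only states that the mean e_μ and standard deviation e_σ are saved for subsequent use):

```python
import numpy as np

def feature_stats(train_feats: np.ndarray):
    """Step 206: per-dimension mean and std over all training-set depth features."""
    return train_feats.mean(axis=0), train_feats.std(axis=0)

def standardize_features(e: np.ndarray, e_mu: np.ndarray, e_sigma: np.ndarray) -> np.ndarray:
    """Normalize depth features with the saved training statistics."""
    return (e - e_mu) / (e_sigma + 1e-8)  # small epsilon guards against zero std
```

At inference time the same saved e_μ and e_σ are reused, so features seen in live trading are scaled consistently with training.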
The training steps of the transaction model based on the features obtained by the depth feature extraction layer are as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, dividing each day's high-frequency data into contiguous high-frequency data blocks of the same size, and recording the mid-price at the last time of each high-frequency data block as the price at that time, obtaining a price sequence P = [p_0, p_1, …, p_T] and a price-difference sequence r_t = p_{t+1} − p_t;
Step 303, randomly sampling L contiguous high-frequency data blocks within one day during training, then calculating the depth features of these blocks with the depth feature extraction layer and standardizing them;
Step 304, for any time t in the sequence of contiguous high-frequency data blocks, computing the LSTM hidden-layer representation h_t = LSTM(ê_t, h_{t−1}; θ), where ê_t is the standardized depth feature and θ denotes the LSTM parameters;
step 305, concatenating the decision signal F_{t−1} of the previous time with h_t and calculating F_t by the formula:

F_t = tanh(w^T h_t + b + u·F_{t−1})
and calculating the reward value R_t = r_t·F_{t−1} − c·|F_t − F_{t−1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until L high-frequency data blocks are traversed;
step 307, establishing the objective function U_T and performing gradient updates by gradient ascent so as to maximize U_T, i.e. to increase the return of the trading strategy;
and step 308, if the number of training rounds has reached the maximum, saving the model with the highest Sharpe ratio on the validation set as the transaction model to be used finally; otherwise returning to step 302.
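The per-step computation in steps 304–305 can be sketched directly from the formulas; a minimal NumPy illustration in which w, b, u are the trainable parameters of the fully connected layer and h_t stands for the LSTM hidden state (the scalar decision signal and these shapes are assumptions of this sketch):

```python
import numpy as np

def trade_step(h_t: np.ndarray, F_prev: float, w: np.ndarray, b: float, u: float) -> float:
    """One step of the decision recurrence: F_t = tanh(w^T h_t + b + u * F_{t-1})."""
    return float(np.tanh(w @ h_t + b + u * F_prev))

def reward(r_t: float, F_t: float, F_prev: float, c: float) -> float:
    """R_t = r_t * F_{t-1} - c * |F_t - F_{t-1}|, where c is the transaction cost."""
    return r_t * F_prev - c * abs(F_t - F_prev)
```

The reward charges the transaction cost c whenever the decision signal changes, so the model is penalized for flipping positions too often.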
The transaction model consists of an LSTM layer and a fully connected layer. At each time step the LSTM layer takes the standardized depth feature e_t and the hidden-layer feature h_{t−1} of the previous time as input, performs temporal modeling, and outputs h_t; the fully connected layer concatenates the decision signal F_{t−1} of the previous time with h_t and outputs F_t. F_t is passed through the sign function to produce the transaction decision action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position.
A futures trading implementation method based on multi-scale self-attention outputs trading decisions through the trained transaction model, with the following specific steps:
step 401, loading a deep feature extraction network and a transaction model;
step 402, reading high-frequency data in real time and arranging the high-frequency data into a format of a high-frequency data block;
step 403, preprocessing the high-frequency data block as in step 103, and reading the transaction signal F_{t−1} of the previous time;
Step 404, calculating the depth features through the depth feature extraction network and standardizing them to obtain ê_t;
Step 405, inputting ê_t and F_{t−1} into the transaction model to obtain the transaction signal F_t, then obtaining the transaction action a_t through the sign function and executing it.
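The final mapping in step 405 from the continuous signal F_t to a discrete action is the sign function; a minimal sketch:

```python
import numpy as np

def trade_action(F_t: float) -> int:
    """Map the continuous signal F_t to a discrete action via the sign function:
    -1 = hold a short position, 0 = hold no position, 1 = hold a long position."""
    return int(np.sign(F_t))
```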
Beneficial effects: compared with the prior art, the futures model training and transaction implementation method based on multi-scale self-attention provided by the invention can better capture the high-frequency temporal characteristics of futures by constructing a neural network based on multi-scale self-attention, and the transaction model built on the extracted depth features achieves higher accuracy.
Drawings
FIG. 1 is a block diagram of a multi-scale self-attention based depth feature extraction layer in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of the training of the multi-scale self-attention-based depth feature extraction layer implemented by the present invention;
FIG. 3 is a flow chart of the training of the transaction model based on the features from the depth feature extraction layer, as implemented by the present invention;
FIG. 4 is a flow chart of a transaction model decision making process implemented by the present invention.
Detailed Description
The present invention is further illustrated by the following examples. It should be understood that these examples are intended only to illustrate the invention and not to limit its scope; after reading the present disclosure, various equivalent modifications of the invention made by those skilled in the art fall within the scope of the appended claims.
A futures model training method based on multi-scale self attention comprises the following steps:
1) constructing a futures initiative contract high-frequency data set;
2) constructing a depth feature extraction layer based on multi-scale self-attention;
3) training a depth feature extraction layer based on multi-scale self-attention;
4) and training a transaction model based on the features obtained by the depth feature extraction layer.
In step 1), the steps for constructing the high-frequency data set of dominant futures contracts from the collected futures high-frequency data are as follows:
step 101, removing non-dominant-contract data from the obtained limit order book data; the dominant contract is the contract with the largest trading volume in the current period.
Step 102, calculating the percentage change of the mean weighted-average price over the next K ticks as the label;
and step 103, preprocessing the price data of each price level at each tick of the high-frequency data, and standardizing all price data.
FIG. 1 is a structural diagram of the depth feature extraction layer based on multi-scale self-attention. The layer is mainly formed by stacking convolutional layers that alternate between transverse and longitudinal convolutions. The transverse convolutional layers mainly extract features from the bid and ask prices at different price depths of the high-frequency data at the same tick; the longitudinal convolutional layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension, in which the time-series data pass through one-dimensional convolution kernels of different sizes in parallel and the outputs are concatenated to obtain the multi-scale features. The transverse and longitudinal convolutional layers alternate for three layers, followed by two multi-head self-attention layers that learn the correlation features of different time series; finally, an LSTM layer and a fully connected layer learn the temporal features and output the predicted change of future prices.
Table 1 shows the structure of the multi-scale self-attention based depth feature extraction layer and the output size of each layer.
TABLE 1
In step 3), the training step of the depth feature extraction layer based on multi-scale self-attention comprises the following steps:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; each day's high-frequency data are divided into contiguous high-frequency data blocks of the same size; and the training set, validation set and test set are divided chronologically.
Step 203, training a depth feature extraction layer according to the input preprocessed high-frequency data block and the corresponding label thereof;
step 204, randomly sampling data in batches of size b from the training set during training, using the Mean Absolute Error (MAE) as the loss function, and updating the network parameters by gradient descent after computing their gradients through back propagation;
step 205, stopping training when the number of training rounds reaches the maximum, taking the model with the highest R² on the validation set, removing the last fully connected layer of the network, and saving the remaining parameters as the depth feature extraction layer for use in subsequent steps;
step 206, calculating the depth features e ∈ ℝ^M obtained after all high-frequency data blocks in the training set pass through the depth feature extraction layer, where M represents the dimension of the depth features; and calculating the mean e_μ and standard deviation e_σ of the depth features for subsequent use.
The transaction model in step 4) mainly comprises an LSTM layer and a fully connected layer. At each time step the LSTM layer takes the standardized depth feature e_t and the hidden-layer feature h_{t−1} of the previous time as input, performs temporal modeling, and outputs h_t; the fully connected layer concatenates the decision signal F_{t−1} of the previous time with h_t and outputs F_t. F_t is passed through the sign function to produce the transaction decision action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position.
The training steps of the transaction model based on the features obtained by the depth feature extraction layer are as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, dividing each day's high-frequency data into contiguous high-frequency data blocks of the same size, recording the mid-price at the last time of each high-frequency data block as the price at that time, and obtaining a price sequence P = [p_0, p_1, …, p_T] and a price-difference sequence r_t = p_{t+1} − p_t;
Step 303, randomly sampling L continuous high-frequency data blocks in a certain day during training, and then calculating the depth features of the high-frequency data blocks according to the depth feature extraction layer
Step 304, computing L STM hidden layer representation for any time t in continuous high frequency data block Wherein θ is a parameter of L STM;
step 305, concatenating the decision signal F_{t−1} of the previous time with h_t and calculating F_t by the formula:

F_t = tanh(w^T h_t + b + u·F_{t−1})
and calculating the reward value R_t = r_t·F_{t−1} − c·|F_t − F_{t−1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until L high-frequency data blocks are traversed;
step 307, establishing the objective function U_T and performing gradient updates by gradient ascent so as to maximize U_T, i.e. to increase the return of the trading strategy;
and step 308, if the number of training rounds has reached the maximum, saving the model with the highest Sharpe ratio on the validation set as the transaction model to be used finally; otherwise returning to step 302.
The optimization objective of the transaction model is the Sharpe ratio, defined over the rewards as

U_T = mean(R_1, …, R_T) / std(R_1, …, R_T)

The Sharpe ratio measures the stability with which returns are obtained; compared with other objective functions it takes more factors into account and has better properties. Gradients are computed with Back-Propagation Through Time (BPTT).
A futures trading implementation method based on multi-scale self-attention outputs trading decisions using the features obtained by the depth feature extraction layer and the trained transaction model; the specific steps are as follows:
step 401, loading a deep feature extraction network and a transaction model;
step 402, reading high-frequency data in real time and arranging the high-frequency data into a format of a high-frequency data block;
step 403, preprocessing the high-frequency data block as in step 103, and reading the transaction signal F_{t−1} of the previous time;
Step 404, calculating the depth features through the depth feature extraction network and standardizing them to obtain ê_t;
Step 405, inputting ê_t and F_{t−1} into the transaction model to obtain the transaction signal F_t, then obtaining the transaction action a_t through the sign function and executing it.
FIG. 2 is a flow chart of the training of the depth feature extraction layer based on multi-scale self-attention. The main process is as follows: initialize the network parameters of the depth feature extraction layer; input the preprocessed high-frequency data blocks and their corresponding labels; train the depth feature extraction layer on them, randomly sampling fixed-size batches from the training set, using the L1 (mean absolute error) loss function, and updating by gradient descent after computing the network parameter gradients through back propagation; stop training when the number of training rounds reaches the maximum; take the model with the highest R² on the validation set, remove its last fully connected layer, and save the remaining parameters as the depth feature extraction layer for use in subsequent steps; finally, calculate the depth features of all high-frequency data blocks in the training set through the depth feature extraction layer, together with their mean and standard deviation.
FIG. 3 is a flow chart of the training of the transaction model; the main process is as follows: construct the transaction model and initialize the network parameters; divide each day's high-frequency data into contiguous high-frequency data blocks of the same size, and record the mid-price at the last time of each block as the price at that time to obtain the price sequence and price-difference sequence; during training, randomly sample L contiguous high-frequency data blocks within one day, and calculate and standardize their depth features with the depth feature extraction layer; traverse the L contiguous blocks, at each time inputting the current depth feature and the LSTM hidden-layer representation of the previous time to compute the current LSTM hidden-layer representation, concatenating it with the decision signal of the previous time, and computing the reward value; establish the Sharpe-ratio objective function and perform gradient updates by gradient ascent to maximize the Sharpe ratio; if the number of training rounds has reached the maximum, save the model with the highest Sharpe ratio on the validation set; otherwise repeat the above steps.
FIG. 4 is a flow chart of the transaction model decision making implemented by the present invention. The main process is as follows: load the depth feature extraction network and the transaction model; read high-frequency data in real time and arrange them into the high-frequency data block format; preprocess the high-frequency data block and read the transaction signal of the previous time; calculate the depth features through the depth feature extraction network and standardize them; input the depth features, the hidden-layer feature of the transaction model at the previous time, and the transaction signal of the previous time into the transaction model to calculate and execute the transaction signal; repeat the above process until trading ends.
Claims (9)
1. A futures model training method based on multi-scale self-attention, characterized by comprising: constructing a high-frequency data set of dominant futures contracts, constructing and training a depth feature extraction layer based on multi-scale self-attention, and training a transaction model based on the features obtained by the depth feature extraction layer.
2. The method for multi-scale self-attention based futures model training according to claim 1, wherein the steps for constructing the high-frequency data set of dominant futures contracts are as follows:
step 101, removing non-dominant-contract data from the obtained limit order book data, the dominant contract being the contract with the largest trading volume in the current period;
step 102, calculating the percentage change of the mean weighted-average price over the next K ticks as the label;
and step 103, preprocessing the price data of each price level at each tick of the high-frequency data, and standardizing all price data.
3. The method for multi-scale self-attention based futures model training according to claim 1, wherein: the depth feature extraction layer based on multi-scale self-attention is formed by stacking convolutional layers that alternate between transverse and longitudinal convolutions; the transverse convolutional layers mainly extract features from the bid and ask prices at different price levels of the high-frequency data at the same tick, and the longitudinal convolutional layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension; the transverse and longitudinal convolutional layers alternate for three layers; two multi-head self-attention layers follow the convolutional layers for learning the correlation features of different time series; and finally a long short-term memory (LSTM) network layer and a fully connected layer learn the temporal features and output the predicted change of future prices.
4. The multi-scale self-attention based futures model training method according to claim 1, wherein, in the depth feature extraction layer based on multi-scale self-attention, the multi-scale extraction module passes the time-series data through one-dimensional convolution kernels of different sizes in parallel and concatenates the outputs, thereby obtaining multi-scale features.
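The multi-scale extraction module of this claim can be sketched as parallel one-dimensional convolutions whose outputs are concatenated. The kernel sizes and 'same'-length edge padding below are illustrative assumptions:

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution of x with an odd-length kernel, same output length."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([xp[i : i + len(kernel)] @ kernel
                     for i in range(len(x))])

def multi_scale(x, kernels):
    """Apply each kernel in parallel and stack the results: (T, n_scales)."""
    return np.stack([conv1d_same(x, k) for k in kernels], axis=-1)
```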
5. The method for multi-scale self-attention based futures model training according to claim 1, wherein: the training steps of the depth feature extraction layer based on the multi-scale self-attention are as follows:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; the high-frequency data of each day are segmented into continuous high-frequency data blocks of the same size; and the training, validation and test sets are split by time;
step 203, training the depth feature extraction layer with the input preprocessed high-frequency data blocks and their corresponding labels;
step 204, during training, randomly sampling batches of size b from the training set; the loss function is the mean absolute error (MAE); the gradients of the network parameters are computed by backpropagation and the parameters are updated by gradient descent;
step 205, stopping training when the number of training epochs reaches the maximum; the model with the highest R² on the validation set is taken, its last fully connected layer is removed, and the remaining parameters are saved as the depth feature extraction layer for use in the subsequent steps;
step 206, computing the depth features, of dimension M, obtained after all high-frequency data blocks of the training set pass through the depth feature extraction layer; and computing the mean eμ and standard deviation eσ of the depth features for subsequent use.
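Steps 205-206 reduce to collecting feature statistics once training has stopped. A minimal sketch; the function names and the zero-variance guard are illustrative:

```python
import numpy as np

def fit_feature_stats(features):
    """features: (N, M) depth features of all training blocks.
    Returns the per-dimension mean and standard deviation (step 206)."""
    e_mu = features.mean(axis=0)
    e_sigma = features.std(axis=0) + 1e-8   # guard against zero variance
    return e_mu, e_sigma

def standardize(e, e_mu, e_sigma):
    """Standardize depth features with the saved statistics."""
    return (e - e_mu) / e_sigma
```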
6. The multi-scale self-attention based futures model training method according to claim 1, wherein the transaction model based on the features obtained by the depth feature extraction layer is trained as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, the high-frequency data of each day are segmented into continuous high-frequency data blocks of the same size, and the mid-price at the last moment of each high-frequency data block is recorded as the price of that moment, yielding the price sequence P = [p_0, p_1, ..., p_T] and the price-difference sequence r_t = p_{t+1} − p_t;
step 303, during training, randomly sampling L continuous high-frequency data blocks within one day, and then computing their depth features e_t with the depth feature extraction layer;
step 304, for any time t in the continuous high-frequency data blocks, computing the LSTM hidden-layer representation h_t = LSTM(e_t, h_{t-1}; θ), where θ denotes the parameters of the LSTM;
step 305, splicing the decision signal F_{t-1} of the previous time with h_t and computing F_t by the formula:
F_t = tanh(w^T h_t + b + u F_{t-1})
and calculating the reward value R_t = r_t F_{t-1} − c |F_t − F_{t-1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until all L high-frequency data blocks have been traversed;
step 307, establishing the objective function U_T and updating the parameters by gradient ascent to maximize U_T, i.e., to increase the return of the trading strategy;
step 308, if the number of training epochs reaches the maximum, saving the model with the highest Sharpe ratio on the validation set as the final transaction model; otherwise returning to step 302.
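The loop of steps 304-306 can be sketched as a single episode following the decision-signal recurrence F_t = tanh(w^T h_t + b + u F_{t-1}) and the reward R_t = r_t F_{t-1} − c |F_t − F_{t-1}| (reading the recurrence with h_t, as claim 7's description of the splicing suggests). The LSTM hidden states h_t are taken as given here, and w, b, u, c are illustrative parameters:

```python
import numpy as np

def run_episode(h_seq, r_seq, w, b, u, c):
    """h_seq: (L, H) hidden states; r_seq: (L,) price differences.
    Returns the decision signals F_t and per-step rewards R_t."""
    F_prev, signals, rewards = 0.0, [], []
    for h, r in zip(h_seq, r_seq):
        F = np.tanh(w @ h + b + u * F_prev)              # decision signal in (-1, 1)
        rewards.append(r * F_prev - c * abs(F - F_prev))  # return minus turnover cost
        signals.append(F)
        F_prev = F
    return np.array(signals), np.array(rewards)
```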
7. The multi-scale self-attention based futures model training method according to claim 6, wherein the transaction model consists of an LSTM layer and a fully connected layer; the LSTM layer takes as input the normalized depth feature e_t at each time and the hidden-layer feature h_{t-1} of the previous time, performs temporal modeling, and outputs h_t; the fully connected layer splices the decision signal F_{t-1} of the previous time with h_t and outputs F_t; F_t is passed through a sign function to output the trading action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position.
8. The multi-scale self-attention based futures model training method according to claim 6, wherein the optimization objective of the transaction model is the Sharpe ratio, i.e., the ratio of the mean of the rewards R_t to their standard deviation. The Sharpe ratio measures the stability with which returns are obtained and, compared with other objective functions, takes more factors into account and has better properties. The gradients are computed with the backpropagation-through-time (BPTT) algorithm.
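Claim 8 names the Sharpe ratio as the objective but the text does not reproduce the formula; the sketch below uses the standard definition, the mean of the per-step rewards R_t divided by their standard deviation, with a small epsilon added as an illustrative guard:

```python
import numpy as np

def sharpe_ratio(rewards, eps=1e-8):
    """Standard Sharpe-style objective: mean reward over reward volatility."""
    return rewards.mean() / (rewards.std() + eps)
```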
9. A futures trading implementation method based on multi-scale self-attention, characterized in that trading decisions are output by a trained transaction model, with the following specific steps:
step 401, loading the depth feature extraction network and the transaction model;
step 402, reading high-frequency data in real time and arranging it into the high-frequency data block format;
step 403, preprocessing the high-frequency data block and reading the trading signal F_{t-1} of the previous time;
step 404, computing the depth features with the depth feature extraction network and standardizing them to obtain the normalized depth features.
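Steps 401-404, combined with the sign-function action of claim 7, can be sketched as one real-time decision step. Here `extract_features` and `trade_model` are hypothetical stand-ins for the two loaded networks, and e_mu, e_sigma are the feature statistics saved in step 206:

```python
import numpy as np

def decide(block, F_prev, extract_features, trade_model, e_mu, e_sigma):
    """One live decision step: standardize depth features (step 404),
    update the decision signal, and map it to a trading action."""
    e = (extract_features(block) - e_mu) / e_sigma   # normalized depth features
    F = trade_model(e, F_prev)                       # new decision signal
    action = int(np.sign(F))                         # -1 short, 0 flat, 1 long
    return F, action
```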
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010419707.1A CN111445341A (en) | 2020-05-18 | 2020-05-18 | Futures model training and transaction implementation method based on multi-scale self-attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111445341A true CN111445341A (en) | 2020-07-24 |
Family
ID=71656908
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114612223A (*) | 2022-03-18 | 2022-06-10 | 上海爱富爱克斯网络科技发展有限责任公司 | Financial data information processing method
CN116306254A (*) | 2023-02-18 | 2023-06-23 | 交通运输部规划研究院 | Truck load estimation method and model training method and device thereof
CN116306254B (*) | 2023-02-18 | 2023-11-10 | 交通运输部规划研究院 | Truck load estimation method and model training method and device thereof
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||