CN111445341A - Futures model training and transaction implementation method based on multi-scale self-attention - Google Patents
Futures model training and transaction implementation method based on multi-scale self-attention
- Publication number
- CN111445341A (application CN202010419707.1A)
- Authority
- CN
- China
- Prior art keywords
- frequency data
- training
- transaction
- model
- attention
- Prior art date
- Legal status (assumed, not a legal conclusion): Pending
Classifications
- G06Q40/06 — Asset management; Financial planning or analysis
- G06N3/045 — Combinations of networks
- G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06Q40/04 — Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Abstract
The invention discloses a futures model training and transaction implementation method based on multi-scale self-attention. In the construction stage of the futures high-frequency data set, five-level (level-5 order book) high-frequency data of the dominant futures contract are collected and preprocessed, and labels are constructed from future price changes. In the training stage of the depth feature extraction layer, a deep neural network based on multi-scale self-attention is constructed, trained with the constructed labels, and its model parameters are saved. In the training stage of the transaction model, a transaction model is constructed and trained, using the features output by the depth feature extraction layer, by maximizing the Sharpe ratio. In the stage of outputting transaction decisions with the depth feature extraction layer and the transaction model, transaction actions are output using the features extracted by the depth feature extraction layer and the trained transaction model. The invention accounts, at the model level, for the multi-scale characteristics of financial time series and the correlations among different time series, and improves the accuracy of futures trading data prediction.
Description
Technical Field
The invention relates to a futures model training and transaction implementation method based on multi-scale self-attention, and belongs to the technical field of quantitative financial modeling.
Background
Quantitative investment, as an investment methodology, has many advantages, such as discipline, systematization, timeliness, and quantifiability. With the development of computer technology and the arrival of the big-data era, more and more people choose computer-based quantitative investment methods over conventional subjective investment methods, and quantitative investment funds have been established one after another. In recent years, as machine learning and artificial intelligence have developed, more and more people have tried to use these technologies to construct intelligent quantitative investment strategies.
At present there is much quantitative investment research based on machine learning, which falls mainly into two categories: methods not based on deep learning and methods based on deep learning. Methods not based on deep learning mainly construct features manually and then model them with non-deep models, for example linear regression, support vector machines, and tree-based models such as random forests and gradient boosting trees. Deep-model methods mainly build deep neural networks, such as convolutional neural networks and recurrent neural networks, to learn the rules after appropriate data preprocessing.
However, none of these works models the multi-scale nature of financial time-series data or the correlation information between different time series.
Disclosure of Invention
The purpose of the invention is as follows: existing work on deep learning for quantitative finance does not consider, at the model level, the multi-scale characteristics of financial time-series data or the correlation information between different time series. To address this problem, the invention provides a futures model training and transaction implementation method based on multi-scale self-attention. The model training method comprises the following steps: constructing a futures high-frequency data set; constructing a depth feature extraction layer network based on multi-scale self-attention; training the depth feature extraction layer network to learn the multi-scale temporal information of high-frequency financial data and the correlation information among different time series, and extracting depth features; then constructing an automatic transaction model and training it, with the extracted depth features as input, by maximizing the Sharpe ratio. The transaction implementation method comprises: in live trading, outputting transaction decisions in real time through the depth feature extraction layer and the transaction model. The invention does not depend on hand-crafted features; it compensates for the inability of conventional models to capture the multi-scale temporal characteristics of high-frequency data and the correlation characteristics among different time series, automatically extracts temporal features, and automatically formulates a trading strategy from the extracted depth features, achieving higher accuracy.
The technical scheme is as follows: a futures model training method based on multi-scale self-attention comprises constructing a high-frequency data set of dominant futures contracts, constructing and training a depth feature extraction layer based on multi-scale self-attention, and training a transaction model based on the features obtained by the depth feature extraction layer.
The steps for constructing the high-frequency data set of dominant futures contracts are as follows:
step 101, removing non-dominant-contract data from the obtained limit order book data; the dominant contract is the contract with the largest trading volume in the current period.
Step 102, calculating the percentage change of the mean weighted-average price over the next K ticks as the label;
and step 103, preprocessing the price data of each price level at each tick of the dominant-contract high-frequency data, and standardizing all price data.
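Steps 102–103 can be sketched in NumPy. This is a minimal illustration and not the patent's exact procedure: `make_labels` labels each tick with the percentage change of the mean mid-price over the next K ticks, and `zscore` is one common way to standardize price data; the function names and the use of a simple mid-price are assumptions of this sketch.

```python
import numpy as np

def make_labels(mid_prices: np.ndarray, k: int) -> np.ndarray:
    """Label each time t with the percentage change of the mean
    mid-price over the next k ticks (hypothetical reading of step 102)."""
    n = len(mid_prices) - k
    future_mean = np.array([mid_prices[t + 1 : t + 1 + k].mean() for t in range(n)])
    return (future_mean - mid_prices[:n]) / mid_prices[:n]

def zscore(prices: np.ndarray) -> np.ndarray:
    """Step 103: standardize price data to zero mean, unit variance."""
    return (prices - prices.mean()) / prices.std()
```

With a flat price series the labels are all zero, since the future mean equals the current price at every tick.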
The depth feature extraction layer based on multi-scale self-attention is formed by stacking convolutional layers that alternate between transverse and longitudinal convolutions. The transverse convolutional layers mainly extract features from the bid and ask prices at different price levels of the high-frequency data at the same tick. The longitudinal convolutional layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension; the multi-scale extraction module passes the time-series data through one-dimensional convolution kernels of different sizes in parallel and concatenates the outputs to obtain the multi-scale features. The transverse and longitudinal convolutional layers alternate for three layers; the convolutional layers are followed by two multi-head self-attention layers that learn the correlation features of different time series; finally, a Long Short-Term Memory (LSTM) network layer and a fully connected layer learn the temporal features and output the predicted change of future prices.
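The multi-scale extraction module described above can be illustrated with a small NumPy sketch: the same series is filtered by one-dimensional kernels of several sizes in parallel and the same-length outputs are concatenated channel-wise. The moving-average kernels here are placeholders; in the patent's network the kernels are learned by training.

```python
import numpy as np

def multiscale_features(x: np.ndarray, kernel_sizes=(3, 5, 7)) -> np.ndarray:
    """Pass a 1-D series through convolutions of several kernel sizes in
    parallel and stack the (same-length) outputs channel-wise.
    Moving-average kernels stand in for learned convolution weights."""
    outs = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                      # placeholder learned kernel
        outs.append(np.convolve(x, kernel, mode="same"))
    return np.stack(outs, axis=0)                    # shape: (n_scales, len(x))
```

Each row of the result views the series at a different temporal scale; a trained module would concatenate these channels as input to the next layer.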
The training steps of the depth feature extraction layer based on the multi-scale self-attention are as follows:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; each day's high-frequency data are divided into contiguous high-frequency data blocks of the same size; and the training set, validation set and test set are divided chronologically;
step 203, training a depth feature extraction layer according to the input preprocessed high-frequency data block and the corresponding label thereof;
step 204, randomly sampling data in batches of size b from the training set during training, using the Mean Absolute Error (MAE) as the loss function, and updating the network parameters by gradient descent after computing their gradients through back propagation;
step 205, stopping training when the number of training rounds reaches the maximum, taking the model with the highest R² on the validation set, removing the last fully connected layer of the network, and saving the remaining parameters as the depth feature extraction layer for use in subsequent steps;
step 206, calculating the depth features e ∈ ℝ^M obtained after all high-frequency data blocks in the training set pass through the depth feature extraction layer, where M represents the dimension of the depth features; and calculating the mean e_μ and standard deviation e_σ of the depth features for subsequent use.
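Step 206's statistics and the later feature standardization can be sketched as follows (the helper names are hypothetical; the patent only states that the mean e_μ and standard deviation e_σ are saved for subsequent use):

```python
import numpy as np

def feature_stats(train_feats: np.ndarray):
    """Step 206: per-dimension mean and std over all training-set depth features."""
    return train_feats.mean(axis=0), train_feats.std(axis=0)

def standardize_features(e: np.ndarray, e_mu: np.ndarray, e_sigma: np.ndarray) -> np.ndarray:
    """Normalize depth features with the saved training statistics."""
    return (e - e_mu) / (e_sigma + 1e-8)  # small epsilon guards against zero std
```

At inference time the same saved e_μ and e_σ are reused, so features seen in live trading are scaled consistently with training.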
The training steps of the transaction model based on the features obtained by the depth feature extraction layer are as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, dividing each day's high-frequency data into contiguous high-frequency data blocks of the same size, and recording the mid-price at the last time of each high-frequency data block as the price at that time, obtaining a price sequence P = [p_0, p_1, …, p_T] and a price-difference sequence r_t = p_{t+1} − p_t;
Step 303, randomly sampling L contiguous high-frequency data blocks within one day during training, then calculating the depth features of these blocks with the depth feature extraction layer and standardizing them;
Step 304, for any time t in the sequence of contiguous high-frequency data blocks, computing the LSTM hidden-layer representation h_t = LSTM(ê_t, h_{t−1}; θ), where ê_t is the standardized depth feature and θ denotes the LSTM parameters;
step 305, concatenating the decision signal F_{t−1} of the previous time with h_t and calculating F_t by the formula:

F_t = tanh(w^T h_t + b + u·F_{t−1})
and calculating the reward value R_t = r_t·F_{t−1} − c·|F_t − F_{t−1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until L high-frequency data blocks are traversed;
step 307, establishing the objective function U_T and performing gradient updates by gradient ascent so as to maximize U_T, i.e. to increase the return of the trading strategy;
and step 308, if the number of training rounds has reached the maximum, saving the model with the highest Sharpe ratio on the validation set as the transaction model to be used finally; otherwise returning to step 302.
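The per-step computation in steps 304–305 can be sketched directly from the formulas; a minimal NumPy illustration in which w, b, u are the trainable parameters of the fully connected layer and h_t stands for the LSTM hidden state (the scalar decision signal and these shapes are assumptions of this sketch):

```python
import numpy as np

def trade_step(h_t: np.ndarray, F_prev: float, w: np.ndarray, b: float, u: float) -> float:
    """One step of the decision recurrence: F_t = tanh(w^T h_t + b + u * F_{t-1})."""
    return float(np.tanh(w @ h_t + b + u * F_prev))

def reward(r_t: float, F_t: float, F_prev: float, c: float) -> float:
    """R_t = r_t * F_{t-1} - c * |F_t - F_{t-1}|, where c is the transaction cost."""
    return r_t * F_prev - c * abs(F_t - F_prev)
```

The reward charges the transaction cost c whenever the decision signal changes, so the model is penalized for flipping positions too often.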
The transaction model consists of an LSTM layer and a fully connected layer. At each time step the LSTM layer takes the standardized depth feature e_t and the hidden-layer feature h_{t−1} of the previous time as input, performs temporal modeling, and outputs h_t; the fully connected layer concatenates the decision signal F_{t−1} of the previous time with h_t and outputs F_t. F_t is passed through the sign function to produce the transaction decision action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position.
A futures trading implementation method based on multi-scale self-attention outputs trading decisions through the trained transaction model, with the following specific steps:
step 401, loading a deep feature extraction network and a transaction model;
step 402, reading high-frequency data in real time and arranging the high-frequency data into a format of a high-frequency data block;
step 403, preprocessing the high-frequency data block as in step 103, and reading the transaction signal F_{t−1} of the previous time;
Step 404, calculating the depth features through the depth feature extraction network and standardizing them to obtain ê_t;
Step 405, inputting ê_t and F_{t−1} into the transaction model to obtain the transaction signal F_t, then obtaining the transaction action a_t through the sign function and executing it.
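The final mapping in step 405 from the continuous signal F_t to a discrete action is the sign function; a minimal sketch:

```python
import numpy as np

def trade_action(F_t: float) -> int:
    """Map the continuous signal F_t to a discrete action via the sign function:
    -1 = hold a short position, 0 = hold no position, 1 = hold a long position."""
    return int(np.sign(F_t))
```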
Beneficial effects: compared with the prior art, the futures model training and transaction implementation method based on multi-scale self-attention provided by the invention can better capture the high-frequency temporal characteristics of futures by constructing a neural network based on multi-scale self-attention, and the transaction model built on the extracted depth features achieves higher accuracy.
Drawings
FIG. 1 is a block diagram of a multi-scale self-attention based depth feature extraction layer in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of the training of the multi-scale self-attention-based depth feature extraction layer implemented by the present invention;
FIG. 3 is a flow chart of the training of the transaction model based on the features from the depth feature extraction layer, as implemented by the present invention;
FIG. 4 is a flow chart of a transaction model decision making process implemented by the present invention.
Detailed Description
The present invention is further illustrated by the following examples. It should be understood that these examples are intended only to illustrate the invention and not to limit its scope; after reading the present disclosure, various equivalent modifications of the invention made by those skilled in the art fall within the scope of the appended claims.
A futures model training method based on multi-scale self attention comprises the following steps:
1) constructing a futures initiative contract high-frequency data set;
2) constructing a depth feature extraction layer based on multi-scale self-attention;
3) training a depth feature extraction layer based on multi-scale self-attention;
4) and training a transaction model based on the features obtained by the depth feature extraction layer.
In step 1), the steps for constructing the high-frequency data set of dominant futures contracts from the collected futures high-frequency data are as follows:
step 101, removing non-dominant-contract data from the obtained limit order book data; the dominant contract is the contract with the largest trading volume in the current period.
Step 102, calculating the percentage change of the mean weighted-average price over the next K ticks as the label;
and step 103, preprocessing the price data of each price level at each tick of the high-frequency data, and standardizing all price data.
FIG. 1 is a structural diagram of the depth feature extraction layer based on multi-scale self-attention. The layer is mainly formed by stacking convolutional layers that alternate between transverse and longitudinal convolutions. The transverse convolutional layers mainly extract features from the bid and ask prices at different price depths of the high-frequency data at the same tick; the longitudinal convolutional layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension, in which the time-series data pass through one-dimensional convolution kernels of different sizes in parallel and the outputs are concatenated to obtain the multi-scale features. The transverse and longitudinal convolutional layers alternate for three layers, followed by two multi-head self-attention layers that learn the correlation features of different time series; finally, an LSTM layer and a fully connected layer learn the temporal features and output the predicted change of future prices.
Table 1 shows the structure of the multi-scale self-attention based depth feature extraction layer and the output size of each layer.
TABLE 1
In step 3), the training step of the depth feature extraction layer based on multi-scale self-attention comprises the following steps:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; each day's high-frequency data are divided into contiguous high-frequency data blocks of the same size; and the training set, validation set and test set are divided chronologically.
Step 203, training a depth feature extraction layer according to the input preprocessed high-frequency data block and the corresponding label thereof;
step 204, randomly sampling data in batches of size b from the training set during training, using the Mean Absolute Error (MAE) as the loss function, and updating the network parameters by gradient descent after computing their gradients through back propagation;
step 205, stopping training when the number of training rounds reaches the maximum, taking the model with the highest R² on the validation set, removing the last fully connected layer of the network, and saving the remaining parameters as the depth feature extraction layer for use in subsequent steps;
step 206, calculating the depth features e ∈ ℝ^M obtained after all high-frequency data blocks in the training set pass through the depth feature extraction layer, where M represents the dimension of the depth features; and calculating the mean e_μ and standard deviation e_σ of the depth features for subsequent use.
The transaction model in step 4) mainly comprises an LSTM layer and a fully connected layer. At each time step the LSTM layer takes the standardized depth feature e_t and the hidden-layer feature h_{t−1} of the previous time as input, performs temporal modeling, and outputs h_t; the fully connected layer concatenates the decision signal F_{t−1} of the previous time with h_t and outputs F_t. F_t is passed through the sign function to produce the transaction decision action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position.
The training steps of the transaction model based on the features obtained by the depth feature extraction layer are as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, dividing each day's high-frequency data into contiguous high-frequency data blocks of the same size, recording the mid-price at the last time of each high-frequency data block as the price at that time, and obtaining a price sequence P = [p_0, p_1, …, p_T] and a price-difference sequence r_t = p_{t+1} − p_t;
Step 303, randomly sampling L continuous high-frequency data blocks in a certain day during training, and then calculating the depth features of the high-frequency data blocks according to the depth feature extraction layer
Step 304, computing L STM hidden layer representation for any time t in continuous high frequency data block Wherein θ is a parameter of L STM;
step 305, concatenating the decision signal F_{t−1} of the previous time with h_t and calculating F_t by the formula:

F_t = tanh(w^T h_t + b + u·F_{t−1})
and calculating the reward value R_t = r_t·F_{t−1} − c·|F_t − F_{t−1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until L high-frequency data blocks are traversed;
step 307, establishing the objective function U_T and performing gradient updates by gradient ascent so as to maximize U_T, i.e. to increase the return of the trading strategy;
and step 308, if the number of training rounds has reached the maximum, saving the model with the highest Sharpe ratio on the validation set as the transaction model to be used finally; otherwise returning to step 302.
The optimization objective of the transaction model is the Sharpe ratio, defined over the rewards as

U_T = mean(R_1, …, R_T) / std(R_1, …, R_T)

The Sharpe ratio measures the stability with which returns are obtained; compared with other objective functions it takes more factors into account and has better properties. Gradients are computed with Back-Propagation Through Time (BPTT).
A futures trading implementation method based on multi-scale self-attention outputs trading decisions using the features obtained by the depth feature extraction layer and the trained transaction model; the specific steps are as follows:
step 401, loading a deep feature extraction network and a transaction model;
step 402, reading high-frequency data in real time and arranging the high-frequency data into a format of a high-frequency data block;
step 403, preprocessing the high-frequency data block as in step 103, and reading the transaction signal F_{t−1} of the previous time;
Step 404, calculating the depth features through the depth feature extraction network and standardizing them to obtain ê_t;
Step 405, inputting ê_t and F_{t−1} into the transaction model to obtain the transaction signal F_t, then obtaining the transaction action a_t through the sign function and executing it.
FIG. 2 is a flow chart of the training of the depth feature extraction layer based on multi-scale self-attention. The main process is as follows: initialize the network parameters of the depth feature extraction layer; input the preprocessed high-frequency data blocks and their corresponding labels; train the depth feature extraction layer on them, randomly sampling fixed-size batches from the training set, using the L1 (mean absolute error) loss function, and updating by gradient descent after computing the network parameter gradients through back propagation; stop training when the number of training rounds reaches the maximum; take the model with the highest R² on the validation set, remove its last fully connected layer, and save the remaining parameters as the depth feature extraction layer for use in subsequent steps; finally, calculate the depth features of all high-frequency data blocks in the training set through the depth feature extraction layer, together with their mean and standard deviation.
FIG. 3 is a flow chart of the training of the transaction model; the main process is as follows: construct the transaction model and initialize the network parameters; divide each day's high-frequency data into contiguous high-frequency data blocks of the same size, and record the mid-price at the last time of each block as the price at that time to obtain the price sequence and price-difference sequence; during training, randomly sample L contiguous high-frequency data blocks within one day, and calculate and standardize their depth features with the depth feature extraction layer; traverse the L contiguous blocks, at each time inputting the current depth feature and the LSTM hidden-layer representation of the previous time to compute the current LSTM hidden-layer representation, concatenating it with the decision signal of the previous time, and computing the reward value; establish the Sharpe-ratio objective function and perform gradient updates by gradient ascent to maximize the Sharpe ratio; if the number of training rounds has reached the maximum, save the model with the highest Sharpe ratio on the validation set; otherwise repeat the above steps.
FIG. 4 is a flow chart of the transaction model decision making implemented by the present invention. The main process is as follows: load the depth feature extraction network and the transaction model; read high-frequency data in real time and arrange them into the high-frequency data block format; preprocess the high-frequency data block and read the transaction signal of the previous time; calculate the depth features through the depth feature extraction network and standardize them; input the depth features, the hidden-layer feature of the transaction model at the previous time, and the transaction signal of the previous time into the transaction model to calculate and execute the transaction signal; repeat the above process until trading ends.
Claims (9)
1. A futures model training method based on multi-scale self-attention, characterized by comprising: constructing a high-frequency data set of dominant futures contracts, constructing and training a depth feature extraction layer based on multi-scale self-attention, and training a transaction model based on the features obtained by the depth feature extraction layer.
2. The method for multi-scale self-attention based futures model training according to claim 1, wherein the steps for constructing the high-frequency data set of dominant futures contracts are as follows:
step 101, removing non-dominant-contract data from the obtained limit order book data, the dominant contract being the contract with the largest trading volume in the current period;
step 102, calculating the percentage change of the mean weighted-average price over the next K ticks as the label;
and step 103, preprocessing the price data of each price level at each tick of the high-frequency data, and standardizing all price data.
3. The method for multi-scale self-attention based futures model training according to claim 1, wherein: the depth feature extraction layer based on multi-scale self-attention is formed by stacking convolutional layers that alternate between transverse and longitudinal convolutions; the transverse convolutional layers mainly extract features from the bid and ask prices at different price levels of the high-frequency data at the same tick, and the longitudinal convolutional layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension; the transverse and longitudinal convolutional layers alternate for three layers; two multi-head self-attention layers follow the convolutional layers for learning the correlation features of different time series; and finally a long short-term memory (LSTM) network layer and a fully connected layer learn the temporal features and output the predicted change of future prices.
4. The multi-scale self-attention based futures model training method according to claim 1, wherein, in the depth feature extraction layer based on multi-scale self-attention, the multi-scale extraction module passes the time-series data through one-dimensional convolution kernels of different sizes in parallel and concatenates the outputs, thereby obtaining multi-scale features.
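The multi-scale extraction module of this claim can be sketched as parallel one-dimensional convolutions whose outputs are concatenated. The kernel sizes and 'same'-length edge padding below are illustrative assumptions:

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution of x with an odd-length kernel, same output length."""
    pad = len(kernel) // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([xp[i : i + len(kernel)] @ kernel
                     for i in range(len(x))])

def multi_scale(x, kernels):
    """Apply each kernel in parallel and stack the results: (T, n_scales)."""
    return np.stack([conv1d_same(x, k) for k in kernels], axis=-1)
```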
5. The method for multi-scale self-attention based futures model training according to claim 1, wherein: the training steps of the depth feature extraction layer based on the multi-scale self-attention are as follows:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; the high-frequency data of each day are segmented into continuous high-frequency data blocks of the same size; and the training, validation and test sets are split by time;
step 203, training the depth feature extraction layer with the input preprocessed high-frequency data blocks and their corresponding labels;
step 204, during training, randomly sampling batches of size b from the training set; the loss function is the mean absolute error (MAE); the gradients of the network parameters are computed by backpropagation and the parameters are updated by gradient descent;
step 205, stopping training when the number of training epochs reaches the maximum; the model with the highest R² on the validation set is taken, its last fully connected layer is removed, and the remaining parameters are saved as the depth feature extraction layer for use in the subsequent steps;
step 206, computing the depth features, of dimension M, obtained after all high-frequency data blocks of the training set pass through the depth feature extraction layer; and computing the mean eμ and standard deviation eσ of the depth features for subsequent use.
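Steps 205-206 reduce to collecting feature statistics once training has stopped. A minimal sketch; the function names and the zero-variance guard are illustrative:

```python
import numpy as np

def fit_feature_stats(features):
    """features: (N, M) depth features of all training blocks.
    Returns the per-dimension mean and standard deviation (step 206)."""
    e_mu = features.mean(axis=0)
    e_sigma = features.std(axis=0) + 1e-8   # guard against zero variance
    return e_mu, e_sigma

def standardize(e, e_mu, e_sigma):
    """Standardize depth features with the saved statistics."""
    return (e - e_mu) / e_sigma
```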
6. The multi-scale self-attention based futures model training method according to claim 1, wherein the transaction model based on the features obtained by the depth feature extraction layer is trained as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, the high-frequency data of each day are segmented into continuous high-frequency data blocks of the same size, and the mid-price at the last moment of each high-frequency data block is recorded as the price of that moment, yielding the price sequence P = [p_0, p_1, ..., p_T] and the price-difference sequence r_t = p_{t+1} − p_t;
step 303, during training, randomly sampling L continuous high-frequency data blocks within one day, and then computing their depth features e_t with the depth feature extraction layer;
step 304, for any time t in the continuous high-frequency data blocks, computing the LSTM hidden-layer representation h_t = LSTM(e_t, h_{t-1}; θ), where θ denotes the parameters of the LSTM;
step 305, splicing the decision signal F_{t-1} of the previous time with h_t and computing F_t by the formula:
F_t = tanh(w^T h_t + b + u F_{t-1})
and calculating the reward value R_t = r_t F_{t-1} − c |F_t − F_{t-1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until all L high-frequency data blocks have been traversed;
step 307, establishing the objective function U_T and updating the parameters by gradient ascent to maximize U_T, i.e., to increase the return of the trading strategy;
step 308, if the number of training epochs reaches the maximum, saving the model with the highest Sharpe ratio on the validation set as the final transaction model; otherwise returning to step 302.
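The loop of steps 304-306 can be sketched as a single episode following the decision-signal recurrence F_t = tanh(w^T h_t + b + u F_{t-1}) and the reward R_t = r_t F_{t-1} − c |F_t − F_{t-1}| (reading the recurrence with h_t, as claim 7's description of the splicing suggests). The LSTM hidden states h_t are taken as given here, and w, b, u, c are illustrative parameters:

```python
import numpy as np

def run_episode(h_seq, r_seq, w, b, u, c):
    """h_seq: (L, H) hidden states; r_seq: (L,) price differences.
    Returns the decision signals F_t and per-step rewards R_t."""
    F_prev, signals, rewards = 0.0, [], []
    for h, r in zip(h_seq, r_seq):
        F = np.tanh(w @ h + b + u * F_prev)              # decision signal in (-1, 1)
        rewards.append(r * F_prev - c * abs(F - F_prev))  # return minus turnover cost
        signals.append(F)
        F_prev = F
    return np.array(signals), np.array(rewards)
```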
7. The multi-scale self-attention based futures model training method according to claim 6, wherein the transaction model consists of an LSTM layer and a fully connected layer; the LSTM layer takes as input the normalized depth feature e_t at each time and the hidden-layer feature h_{t-1} of the previous time, performs temporal modeling, and outputs h_t; the fully connected layer splices the decision signal F_{t-1} of the previous time with h_t and outputs F_t; F_t is passed through a sign function to output the trading action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position.
8. The multi-scale self-attention based futures model training method according to claim 6, wherein the optimization objective of the transaction model is the Sharpe ratio, i.e., the ratio of the mean of the rewards R_t to their standard deviation. The Sharpe ratio measures the stability with which returns are obtained and, compared with other objective functions, takes more factors into account and has better properties. The gradients are computed with the backpropagation-through-time (BPTT) algorithm.
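Claim 8 names the Sharpe ratio as the objective but the text does not reproduce the formula; the sketch below uses the standard definition, the mean of the per-step rewards R_t divided by their standard deviation, with a small epsilon added as an illustrative guard:

```python
import numpy as np

def sharpe_ratio(rewards, eps=1e-8):
    """Standard Sharpe-style objective: mean reward over reward volatility."""
    return rewards.mean() / (rewards.std() + eps)
```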
9. A futures trading implementation method based on multi-scale self-attention, characterized in that trading decisions are output by a trained transaction model, with the following specific steps:
step 401, loading the depth feature extraction network and the transaction model;
step 402, reading high-frequency data in real time and arranging it into the high-frequency data block format;
step 403, preprocessing the high-frequency data block and reading the trading signal F_{t-1} of the previous time;
step 404, computing the depth features with the depth feature extraction network and standardizing them to obtain the normalized depth features.
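Steps 401-404, combined with the sign-function action of claim 7, can be sketched as one real-time decision step. Here `extract_features` and `trade_model` are hypothetical stand-ins for the two loaded networks, and e_mu, e_sigma are the feature statistics saved in step 206:

```python
import numpy as np

def decide(block, F_prev, extract_features, trade_model, e_mu, e_sigma):
    """One live decision step: standardize depth features (step 404),
    update the decision signal, and map it to a trading action."""
    e = (extract_features(block) - e_mu) / e_sigma   # normalized depth features
    F = trade_model(e, F_prev)                       # new decision signal
    action = int(np.sign(F))                         # -1 short, 0 flat, 1 long
    return F, action
```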
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010419707.1A CN111445341A (en) | 2020-05-18 | 2020-05-18 | Futures model training and transaction implementation method based on multi-scale self-attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111445341A true CN111445341A (en) | 2020-07-24 |
Family
ID=71656908
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114612223A (*) | 2022-03-18 | 2022-06-10 | 上海爱富爱克斯网络科技发展有限责任公司 | Financial data information processing method
CN116306254A (*) | 2023-02-18 | 2023-06-23 | 交通运输部规划研究院 | Truck load estimation method and model training method and device thereof
CN116306254B (*) | 2023-02-18 | 2023-11-10 | 交通运输部规划研究院 | Truck load estimation method and model training method and device thereof
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||