CN111445341A - Futures model training and transaction implementation method based on multi-scale self-attention - Google Patents

Futures model training and transaction implementation method based on multi-scale self-attention

Info

Publication number
CN111445341A
CN111445341A
Authority
CN
China
Prior art keywords
frequency data
training
transaction
model
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010419707.1A
Other languages
Chinese (zh)
Inventor
江晨舟 (Jiang Chenzhou)
李武军 (Li Wujun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202010419707.1A
Publication of CN111445341A
Pending legal-status (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06Asset management; Financial planning or analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a futures model training and transaction implementation method based on multi-scale self-attention. In the futures high-frequency data set construction stage, five-level (level-5 order book) high-frequency data of the dominant futures contract are collected and preprocessed, and labels are constructed from future price changes. In the depth feature extraction layer training stage, a deep neural network based on multi-scale self-attention is constructed, trained with the constructed labels, and its model parameters are saved. In the transaction model training stage, a transaction model is constructed and trained by maximizing the Sharpe ratio, using the features output by the depth feature extraction layer. In the decision stage, transaction actions are output using the features extracted by the depth feature extraction layer and the trained transaction model. The invention models, at the model level, the multi-scale characteristics of financial time series and the correlation among different time series, and improves the accuracy of futures trading data prediction.

Description

Futures model training and transaction implementation method based on multi-scale self-attention
Technical Field
The invention relates to a futures model training and transaction implementation method based on multi-scale self-attention, and belongs to the technical field of quantitative investment.
Background
Quantitative investment, as an investment methodology, has many advantages, such as being disciplined, systematic, timely and quantitative. With the development of computer technology and the advent of the big data era, more and more people choose quantitative investment methods built on computer technology over conventional subjective investment methods, and various quantitative investment funds have been established. In recent years, with the development of machine learning and artificial intelligence, more and more people try to use these technologies to construct intelligent quantitative investment strategies.
At present, there are many quantitative investment research works based on machine learning, which fall mainly into two categories: methods not based on deep learning and methods based on deep learning. Methods not based on deep learning mainly construct features manually and then model them with non-deep models, for example linear regression, support vector machines, and tree-based models such as random forests and gradient boosting trees. Deep methods mainly build deep neural networks, such as convolutional neural networks and recurrent neural networks, to learn patterns after appropriate data preprocessing.
However, none of these works models the multi-scale nature of financial time-series data or the correlation information between different time series.
Disclosure of Invention
The purpose of the invention is as follows: existing work on deep learning for quantitative finance does not consider, at the model level, the multi-scale characteristics of financial time-series data or the correlation information between different time series. To address this problem, the invention provides a futures model training and transaction implementation method based on multi-scale self-attention. The model training method comprises: constructing a futures high-frequency data set; constructing a depth feature extraction network based on multi-scale self-attention; training the depth feature extraction network to learn the multi-scale temporal information of high-frequency financial data and the correlation information among different time series, and extracting depth features; and then constructing an automatic transaction model and training it by maximizing the Sharpe ratio, with the extracted depth features as input. The transaction implementation method comprises: in live trading, outputting transaction decisions in real time through the depth feature extraction layer and the transaction model. The invention does not depend on hand-crafted features; it makes up for the inability of conventional models to capture the multi-scale temporal characteristics of high-frequency data and the correlation characteristics among different time series, automatically extracts temporal features, and automatically makes trading decisions from the extracted depth features, achieving higher accuracy.
The technical scheme is as follows: a futures model training method based on multi-scale self-attention comprises: constructing a high-frequency data set of dominant futures contracts, constructing and training a depth feature extraction layer based on multi-scale self-attention, and training a transaction model on the features obtained by the depth feature extraction layer.
The construction steps of the high-frequency data set of dominant futures contracts are as follows (an illustrative sketch of steps 102-103 follows the list):
step 101, removing non-dominant-contract data from the obtained limit order book data; the dominant contract is the contract with the largest trading volume in the current period;
step 102, calculating the percentage change of the average weighted average price over the next K time steps, and using it as the label;
step 103, preprocessing the price data of each level of the dominant-contract high-frequency data at each moment, and normalizing all price data.
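By way of illustration only, the following NumPy sketch shows one way steps 102-103 could be realized; the choice of the weighted average price series, the averaging over the next K ticks, and the z-score scheme are assumptions for illustration, not the patent's prescribed preprocessing.

```python
import numpy as np

def make_labels(avg_price: np.ndarray, K: int) -> np.ndarray:
    """Percentage change of the mean weighted average price over the next K ticks."""
    future_mean = np.array([avg_price[t + 1 : t + 1 + K].mean()
                            for t in range(len(avg_price) - K)])
    return (future_mean - avg_price[:-K]) / avg_price[:-K]

def normalize_prices(prices: np.ndarray) -> np.ndarray:
    """Z-score normalization of every price column (one assumed scheme)."""
    return (prices - prices.mean(axis=0)) / (prices.std(axis=0) + 1e-8)
```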
The depth feature extraction layer based on multi-scale self-attention is formed by stacking convolutional layers that alternate between horizontal and vertical convolutions. The horizontal convolution layers mainly extract features across the bid and ask prices of different levels in the high-frequency data at the same moment; the vertical convolution layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension. The multi-scale extraction module passes the time-series data through parallel one-dimensional convolution kernels of different sizes and concatenates the outputs to obtain multi-scale features. The horizontal and vertical convolution layers alternate for three layers; the convolution layers are followed by two multi-head self-attention layers that learn the correlation features of different time series; finally, a Long Short-Term Memory (LSTM) layer and a fully connected layer learn the temporal features and output a predicted change of the future price.
The training steps of the depth feature extraction layer based on the multi-scale self-attention are as follows:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; the high-frequency data of each day are divided into continuous high-frequency data blocks of the same size; and the training, validation and test sets are divided by time;
step 203, training a depth feature extraction layer according to the input preprocessed high-frequency data block and the corresponding label thereof;
step 204, during training, randomly sampling batches of size b from the training set, using the mean absolute error (MAE) as the loss function, computing the network parameter gradients by backpropagation, and updating them by gradient descent;
step 205, stopping training when the number of training rounds reaches the maximum; taking the model with the highest R² on the validation set, removing its last fully connected layer, and saving the remaining parameters as the depth feature extraction layer for use in subsequent steps;
step 206, computing the depth features E ∈ R^(N×M) obtained by passing all high-frequency data blocks of the training set through the depth feature extraction layer, where M is the dimension of the depth features; and computing the mean e_μ ∈ R^M and standard deviation e_σ of the depth features for subsequent use. A minimal training-loop sketch follows.
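The following PyTorch-style sketch illustrates the training loop of steps 201-206 under stated assumptions: the network object `net`, the tensors `train_x`/`train_y`, the batch size and the optimizer choice are illustrative placeholders, and validation-based selection by R² is indicated only in a comment.

```python
import torch

def train_feature_net(net, train_x, train_y, b=64, max_rounds=100, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    mae = torch.nn.L1Loss()                      # MAE loss (step 204)
    for _ in range(max_rounds):                  # stop at the maximum round count (step 205)
        idx = torch.randint(0, train_x.shape[0], (b,))   # random batch of size b
        loss = mae(net(train_x[idx]).squeeze(-1), train_y[idx])
        opt.zero_grad(); loss.backward(); opt.step()
    # step 205: keep the checkpoint with the highest validation R^2, drop its
    # last fully connected layer, and reuse the rest as the feature extractor
    return net

def feature_stats(features: torch.Tensor):
    """Step 206: mean e_mu and standard deviation e_sigma of the depth features."""
    return features.mean(dim=0), features.std(dim=0)
```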
The training steps of the transaction model based on the features obtained by the depth feature extraction layer are as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, dividing the high-frequency data of each day into continuous high-frequency data blocks of the same size, and recording the mid-price at the last moment of each block as the price of that moment, obtaining a price sequence P = [p_0, p_1, …, p_T] and a price difference sequence r_t = p_{t+1} − p_t;
step 303, during training, randomly sampling L continuous high-frequency data blocks within a given day, computing their depth features with the depth feature extraction layer, and standardizing them as ê_t = (e_t − e_μ) / e_σ;
step 304, for any time t in the continuous high-frequency data blocks, computing the LSTM hidden-layer representation h_t = LSTM(ê_t, h_{t−1}; θ), where θ denotes the LSTM parameters;
step 305, concatenating the decision signal F_{t−1} of the previous moment with h_t and computing F_t = tanh(w^T h_t + b + u·F_{t−1}); and computing the reward R_t = r_t·F_{t−1} − c·|F_t − F_{t−1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until L high-frequency data blocks are traversed;
step 307, establishing the objective function U_T and updating the parameters by gradient ascent so as to maximize U_T, i.e., to increase the return of the trading strategy;
step 308, if the number of training rounds has reached the maximum, saving the model with the highest Sharpe ratio on the validation set as the final transaction model; otherwise, returning to step 302.
The transaction model consists of an LSTM layer and a fully connected layer. At each moment, the LSTM layer takes the standardized depth feature ê_t and the hidden feature h_{t−1} of the previous moment as input, performs temporal modeling, and outputs h_t; the fully connected layer concatenates the decision signal F_{t−1} of the previous moment with h_t and outputs F_t; F_t is passed through a sign function to output the transaction decision action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position. A minimal sketch of one decision step follows.
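The following is a minimal sketch of one decision step of the transaction model described above, assuming a PyTorch LSTMCell; the hidden size and the parameter names (w, u) follow the formula of step 305 but are otherwise illustrative, and F_t stays continuous during training, being discretized by sign() only when acting.

```python
import torch

class TradingModel(torch.nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.lstm = torch.nn.LSTMCell(feat_dim, hidden)
        self.w = torch.nn.Linear(hidden, 1)            # implements w^T h_t + b
        self.u = torch.nn.Parameter(torch.zeros(1))    # weight on F_{t-1}

    def step(self, e_t, state, F_prev):
        h, c = self.lstm(e_t, state)                   # h_t from e_t and h_{t-1}
        F_t = torch.tanh(self.w(h) + self.u * F_prev)  # decision signal F_t
        return F_t, (h, c)

def reward(r_t, F_t, F_prev, cost):
    """R_t = r_t * F_{t-1} - c * |F_t - F_{t-1}| (step 305)."""
    return r_t * F_prev - cost * (F_t - F_prev).abs()
```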
A futures transaction implementation method based on multi-scale self-attention outputs transaction decisions through the trained transaction model; the specific steps are as follows (a sketch of the loop follows the list):
step 401, loading the depth feature extraction network and the transaction model;
step 402, reading high-frequency data in real time and arranging them into the format of a high-frequency data block;
step 403, preprocessing the high-frequency data block in the same way as step 103, and reading the transaction signal F_{t−1} of the previous moment;
step 404, computing the depth features through the depth feature extraction network and standardizing them to obtain ê_t;
step 405, inputting ê_t and F_{t−1} into the transaction model to obtain the transaction signal F_t, then obtaining the transaction action a_t through the sign function and executing it.
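A hedged sketch of this real-time loop is given below; the feed `next_block()`, the execution hook `execute()` and all tensor shapes are hypothetical placeholders, and `normalize_prices` and `TradingModel` come from the sketches above.

```python
import torch

def trade_loop(feature_net, trading_model, e_mu, e_sigma, next_block, execute):
    F_prev, state = torch.zeros(1, 1), None
    while True:
        block = next_block()                       # step 402: latest data block
        if block is None:
            break                                  # feed exhausted / market closed
        x = torch.as_tensor(normalize_prices(block), dtype=torch.float32)
        with torch.no_grad():
            e = feature_net(x.unsqueeze(0))        # step 404: depth features
            e_hat = ((e - e_mu) / e_sigma).reshape(1, -1)
            F_t, state = trading_model.step(e_hat, state, F_prev)  # step 405
        execute(int(torch.sign(F_t)))              # a_t in {-1, 0, 1}
        F_prev = F_t
```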
Beneficial effects: compared with the prior art, the futures model training and transaction implementation method based on multi-scale self-attention provided by the invention can better capture the high-frequency temporal characteristics of futures by constructing a neural network based on multi-scale self-attention, and the transaction model built on the extracted depth features achieves higher accuracy.
Drawings
FIG. 1 is a block diagram of a multi-scale self-attention based depth feature extraction layer in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of the training of the multi-scale self-attention-based depth feature extraction layer implemented by the present invention;
FIG. 3 is a flow chart of the training of the transaction model based on the features obtained by the depth feature extraction layer, as implemented by the present invention;
FIG. 4 is a flow chart of a transaction model decision making process implemented by the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are purely exemplary and are not intended to limit the scope of the invention; various equivalent modifications of the invention that occur to those skilled in the art upon reading the present disclosure likewise fall within the scope of the appended claims.
A futures model training method based on multi-scale self-attention comprises the following steps:
1) constructing a high-frequency data set of dominant futures contracts;
2) constructing a depth feature extraction layer based on multi-scale self-attention;
3) training the depth feature extraction layer based on multi-scale self-attention;
4) training a transaction model based on the features obtained by the depth feature extraction layer.
In step 1), the steps for constructing the high-frequency data set of dominant futures contracts from the collected futures high-frequency data are as follows:
step 101, removing non-dominant-contract data from the obtained limit order book data; the dominant contract is the contract with the largest trading volume in the current period;
step 102, calculating the percentage change of the average weighted average price over the next K time steps, and using it as the label;
step 103, preprocessing the price data of each level at each moment of the high-frequency data, and normalizing all price data.
FIG. 1 is a structural diagram of the depth feature extraction layer based on multi-scale self-attention. The layer is mainly formed by stacking convolutional layers that alternate between horizontal and vertical convolutions. The horizontal convolution layers mainly extract features across the bid and ask prices of different depths in the high-frequency data at the same moment, while the vertical convolution layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension; the multi-scale extraction module passes the time-series data through parallel one-dimensional convolution kernels of different sizes and concatenates the outputs to obtain multi-scale features. The horizontal and vertical convolution layers alternate for three layers; two multi-head self-attention layers follow the convolution layers to learn the correlation features of different time series; finally, one LSTM layer and one fully connected layer learn the temporal features and output the prediction.
Table 1 shows the structure of the multi-scale self-attention based depth feature extraction layer and the output size of each layer.
TABLE 1
[Table 1 is provided as images in the original publication; it lists each layer of the depth feature extraction layer and its output size. A PyTorch-style sketch of the described architecture follows.]
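Because Table 1 survives only as images, the following sketch illustrates the described stacking: a multi-scale module of parallel 1-D convolutions, a horizontal/vertical convolution pair, multi-head self-attention, then an LSTM. All kernel sizes and channel counts are assumptions rather than the values of Table 1; for brevity the horizontal/vertical pair is shown once rather than alternated three times, and a single attention layer stands in for the two described.

```python
import torch
import torch.nn as nn

class MultiScaleModule(nn.Module):
    """Parallel 1-D convolutions of different kernel sizes, concatenated."""
    def __init__(self, ch_in, ch_out, sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(ch_in, ch_out, k, padding=k // 2) for k in sizes)

    def forward(self, x):                      # x: (batch, ch_in, time)
        return torch.cat([b(x) for b in self.branches], dim=1)

class FeatureNet(nn.Module):
    def __init__(self, levels=10, ch=16, heads=4):
        super().__init__()
        # horizontal convolution over the price levels at a single instant
        self.horizontal = nn.Conv2d(1, ch, kernel_size=(1, levels))
        # vertical (temporal) multi-scale convolution
        self.vertical = MultiScaleModule(ch, ch)
        self.attn = nn.MultiheadAttention(3 * ch, heads, batch_first=True)
        self.lstm = nn.LSTM(3 * ch, 64, batch_first=True)
        self.fc = nn.Linear(64, 1)             # removed after pre-training

    def forward(self, x):                      # x: (batch, time, levels)
        z = self.horizontal(x.unsqueeze(1)).squeeze(-1)  # (batch, ch, time)
        z = self.vertical(z)                             # (batch, 3*ch, time)
        z = z.transpose(1, 2)                            # (batch, time, 3*ch)
        z, _ = self.attn(z, z, z)              # correlation across time steps
        out, _ = self.lstm(z)
        return self.fc(out[:, -1])             # predicted future price change
```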
In step 3), the training step of the depth feature extraction layer based on multi-scale self-attention comprises the following steps:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; the high-frequency data of each day are divided into continuous high-frequency data blocks of the same size; and the training, validation and test sets are divided chronologically.
Step 203, training a depth feature extraction layer according to the input preprocessed high-frequency data block and the corresponding label thereof;
step 204, during training, randomly sampling batches of size b from the training set, using the mean absolute error (MAE) as the loss function, computing the network parameter gradients by backpropagation, and updating them by gradient descent;
step 205, stopping training when the number of training rounds reaches the maximum; taking the model with the highest R² on the validation set, removing its last fully connected layer, and saving the remaining parameters as the depth feature extraction layer for use in subsequent steps;
step 206, computing the depth features E ∈ R^(N×M) obtained by passing all high-frequency data blocks of the training set through the depth feature extraction layer, where M is the dimension of the depth features; and computing the mean e_μ ∈ R^M and standard deviation e_σ of the depth features for subsequent use.
The transaction model in step 4) mainly comprises an LSTM layer and a fully connected layer. At each moment, the LSTM layer takes the standardized depth feature ê_t and the hidden feature h_{t−1} of the previous moment as input, performs temporal modeling, and outputs h_t; the fully connected layer concatenates the decision signal F_{t−1} of the previous moment with h_t and outputs F_t; F_t is passed through a sign function to output the transaction decision action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position.
The training steps of the transaction model based on the features obtained by the depth feature extraction layer are as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, dividing the high-frequency data of each day into continuous high-frequency data blocks of the same size, and recording the mid-price at the last moment of each block as the price of that moment, obtaining a price sequence P = [p_0, p_1, …, p_T] and a price difference sequence r_t = p_{t+1} − p_t;
step 303, during training, randomly sampling L continuous high-frequency data blocks within a given day, computing their depth features with the depth feature extraction layer, and standardizing them as ê_t = (e_t − e_μ) / e_σ;
step 304, for any time t in the continuous high-frequency data blocks, computing the LSTM hidden-layer representation h_t = LSTM(ê_t, h_{t−1}; θ), where θ denotes the LSTM parameters;
step 305, concatenating the decision signal F_{t−1} of the previous moment with h_t and computing F_t = tanh(w^T h_t + b + u·F_{t−1}); and computing the reward R_t = r_t·F_{t−1} − c·|F_t − F_{t−1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until L high-frequency data blocks are traversed;
step 307, establishing the objective function U_T and updating the parameters by gradient ascent so as to maximize U_T, i.e., to increase the return of the trading strategy;
step 308, if the number of training rounds has reached the maximum, saving the model with the highest Sharpe ratio on the validation set as the final transaction model; otherwise, returning to step 302.
The optimization objective of the transaction model is the Sharpe ratio, whose specific formula is U_T = mean(R_1, …, R_T) / std(R_1, …, R_T). The Sharpe ratio measures the stability of return acquisition; compared with other objective functions, it takes more factors into account and has better properties. The gradients are computed by backpropagation through time (BPTT). A sketch of this objective and one gradient-ascent update follows.
A futures transaction implementation method based on multi-scale self-attention uses the features obtained by the depth feature extraction layer and the trained transaction model to output transaction decisions; the specific steps are as follows:
step 401, loading the depth feature extraction network and the transaction model;
step 402, reading high-frequency data in real time and arranging them into the format of a high-frequency data block;
step 403, preprocessing the high-frequency data block in the same way as step 103, and reading the transaction signal F_{t−1} of the previous moment;
step 404, computing the depth features through the depth feature extraction network and standardizing them to obtain ê_t;
step 405, inputting ê_t and F_{t−1} into the transaction model to obtain the transaction signal F_t, then obtaining the transaction action a_t through the sign function and executing it.
FIG. 2 is a flow chart of the training process of the depth feature extraction layer based on multi-scale self-attention. The main process is as follows: initialize the network parameters of the depth feature extraction layer; input the preprocessed high-frequency data blocks and their corresponding labels; train the depth feature extraction layer on them, randomly sampling fixed-size batches from the training set, using the L1 (mean absolute error) loss function, computing the network parameter gradients by backpropagation, and updating by gradient descent; stop training when the number of training rounds reaches the maximum; take the model with the highest R² on the validation set, remove its last fully connected layer, and save the remaining parameters as the depth feature extraction layer for use in subsequent steps; and compute the depth features of all high-frequency data blocks of the training set through the depth feature extraction layer, together with their mean and standard deviation.
FIG. 3 is a flow chart of the training process of the transaction model. The main process is as follows: construct the transaction model and initialize the network parameters; divide the high-frequency data of each day into continuous high-frequency data blocks of the same size, recording the mid-price at the last moment of each block as the price of that moment, to obtain the price sequence and the price difference sequence; during training, randomly sample L continuous high-frequency data blocks within a given day, compute their depth features with the depth feature extraction layer, and standardize them; traverse the L continuous high-frequency data blocks, at each moment feeding the current depth feature and the previous LSTM hidden representation into the LSTM to compute the current hidden representation, concatenating it with the previous decision signal, and computing the reward; establish the Sharpe ratio as the objective function and update by gradient ascent to maximize it; if the number of training rounds reaches the maximum, save the model with the highest Sharpe ratio on the validation set as the final transaction model; otherwise repeat the above steps.
FIG. 4 is a flow chart of transaction model decision making as implemented by the present invention. The main process is as follows: load the depth feature extraction network and the transaction model; read high-frequency data in real time and arrange them into the format of a high-frequency data block; preprocess the high-frequency data block and read the transaction signal of the previous moment; compute and standardize the depth features through the depth feature extraction network; feed the depth features, the transaction model's hidden features of the previous moment, and the transaction signal of the previous moment into the transaction model to compute the transaction signal, and execute it; repeat the above process until trading ends.

Claims (9)

1. A futures model training method based on multi-scale self-attention, characterized by comprising: constructing a high-frequency data set of dominant futures contracts, constructing and training a depth feature extraction layer based on multi-scale self-attention, and training a transaction model based on the features obtained by the depth feature extraction layer.
2. The method for multi-scale self-attention based futures model training according to claim 1, wherein: the construction steps of the high-frequency data set of dominant futures contracts are as follows:
step 101, removing non-dominant-contract data from the obtained limit order book data, the dominant contract being the contract with the largest trading volume in the current period;
step 102, calculating the percentage change of the average weighted average price over the next K time steps as the label;
step 103, preprocessing the price data of each level at each moment of the high-frequency data, and normalizing all price data.
3. The method for multi-scale self-attention based futures model training according to claim 1, wherein: the depth feature extraction layer based on multi-scale self-attention is formed by stacking convolutional layers that alternate between horizontal and vertical convolutions; the horizontal convolution layers mainly extract features across the bid and ask prices of different levels in the high-frequency data at the same moment, and the vertical convolution layers mainly use a multi-scale extraction module to learn and extract multi-scale temporal features along the time dimension; the horizontal and vertical convolution layers alternate for three layers; two multi-head self-attention layers follow the convolution layers to learn the correlation features of different time series; and finally one long short-term memory network layer and one fully connected layer learn the temporal features and output a predicted change of the future price.
4. The method for multi-scale self-attention based futures model training according to claim 1, wherein: in the depth feature extraction layer based on multi-scale self-attention, the multi-scale extraction module passes the time-series data through parallel one-dimensional convolution kernels of different sizes and concatenates the outputs to obtain multi-scale features.
5. The method for multi-scale self-attention based futures model training according to claim 1, wherein: the training steps of the depth feature extraction layer based on the multi-scale self-attention are as follows:
step 201, initializing network parameters of a depth feature extraction layer;
step 202, inputting the preprocessed high-frequency data blocks and their corresponding labels; the high-frequency data of each day are divided into continuous high-frequency data blocks of the same size; and dividing the training, validation and test sets by time;
step 203, training a depth feature extraction layer according to the input preprocessed high-frequency data block and the corresponding label thereof;
step 204, during training, randomly sampling batches of size b from the training set, using the mean absolute error as the loss function, computing the network parameter gradients by backpropagation, and updating them by gradient descent;
step 205, stopping training when the number of training rounds reaches the maximum; taking the model with the highest R² on the validation set, removing its last fully connected layer, and saving the remaining parameters as the depth feature extraction layer for use in subsequent steps;
step 206, computing the depth features E ∈ R^(N×M) obtained by passing all high-frequency data blocks of the training set through the depth feature extraction layer, where M is the dimension of the depth features; and computing the mean e_μ ∈ R^M and standard deviation e_σ of the depth features for subsequent use.
6. The method for multi-scale self-attention based futures model training according to claim 1, wherein: the training steps of the transaction model based on the features obtained by the deep extraction layer are as follows:
step 301, constructing a transaction model and initializing network parameters;
step 302, dividing the high-frequency data of each day into continuous high-frequency data blocks of the same size, and recording the mid-price at the last moment of each block as the price of that moment, obtaining a price sequence P = [p_0, p_1, …, p_T] and a price difference sequence r_t = p_{t+1} − p_t;
step 303, during training, randomly sampling L continuous high-frequency data blocks within a given day, computing their depth features with the depth feature extraction layer, and standardizing them as ê_t = (e_t − e_μ) / e_σ;
step 304, for any time t in the continuous high-frequency data blocks, computing the LSTM hidden-layer representation h_t = LSTM(ê_t, h_{t−1}; θ), where θ denotes the LSTM parameters;
step 305, concatenating the decision signal F_{t−1} of the previous moment with h_t and computing F_t = tanh(w^T h_t + b + u·F_{t−1}); and computing the reward R_t = r_t·F_{t−1} − c·|F_t − F_{t−1}|, where c is the transaction cost;
step 306, repeating steps 304 and 305 until L high-frequency data blocks are traversed;
step 307, establishing the objective function U_T and updating the parameters by gradient ascent so as to maximize U_T, i.e., to increase the return of the trading strategy;
step 308, if the number of training rounds has reached the maximum, saving the model with the highest Sharpe ratio on the validation set as the final transaction model; otherwise, returning to step 302.
7. The method for multi-scale self-attention based futures model training according to claim 6, wherein: the transaction model consists of an LSTM layer and a fully connected layer; at each moment, the LSTM layer takes the standardized depth feature ê_t and the hidden feature h_{t−1} of the previous moment as input, performs temporal modeling, and outputs h_t; the fully connected layer concatenates the decision signal F_{t−1} of the previous moment with h_t and outputs F_t; and F_t is passed through a sign function to output the transaction decision action a_t ∈ {−1, 0, 1}, where −1 denotes holding a short position, 0 denotes holding no position, and 1 denotes holding a long position.
8. The method for multi-scale self-attention based futures model training according to claim 6, wherein: the optimization objective of the transaction model is the Sharpe ratio, whose specific formula is U_T = mean(R_1, …, R_T) / std(R_1, …, R_T); the Sharpe ratio measures the stability of return acquisition and, compared with other objective functions, takes more factors into account and has better properties; the gradients are computed by backpropagation through time (BPTT).
9. A futures transaction implementation method based on multi-scale self-attention, characterized in that transaction decisions are output through the trained transaction model, with the following specific steps:
step 401, loading the depth feature extraction network and the transaction model;
step 402, reading high-frequency data in real time and arranging them into the format of a high-frequency data block;
step 403, preprocessing the high-frequency data block and reading the transaction signal F_{t−1} of the previous moment;
step 404, computing the depth features through the depth feature extraction network and standardizing them to obtain ê_t;
step 405, inputting ê_t and F_{t−1} into the transaction model to obtain the transaction signal F_t, then obtaining the transaction action a_t through the sign function and executing it.
CN202010419707.1A 2020-05-18 2020-05-18 Futures model training and transaction implementation method based on multi-scale self-attention Pending CN111445341A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010419707.1A CN111445341A (en) 2020-05-18 2020-05-18 Futures model training and transaction implementation method based on multi-scale self-attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010419707.1A CN111445341A (en) 2020-05-18 2020-05-18 Futures model training and transaction implementation method based on multi-scale self-attention

Publications (1)

Publication Number Publication Date
CN111445341A true CN111445341A (en) 2020-07-24

Family

ID=71656908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010419707.1A Pending CN111445341A (en) 2020-05-18 2020-05-18 Futures model training and transaction implementation method based on multi-scale self-attention

Country Status (1)

Country Link
CN (1) CN111445341A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612223A 2022-03-18 2022-06-10 上海爱富爱克斯网络科技发展有限责任公司 Financial data information processing method
CN116306254A 2023-02-18 2023-06-23 交通运输部规划研究院 Truck load estimation method and model training method and device thereof
CN116306254B 2023-02-18 2023-11-10 交通运输部规划研究院 Truck load estimation method and model training method and device thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination