CN110335162A - Stock market quantitative trading system and algorithm based on deep reinforcement learning - Google Patents
Stock market quantitative trading system and algorithm based on deep reinforcement learning
- Publication number
- CN110335162A (Application No. CN201910650290.7A)
- Authority
- CN
- China
- Prior art keywords
- agent
- decision
- stock
- strategy
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Technology Law (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention discloses a stock market quantitative trading system and algorithm based on deep reinforcement learning. The trading system includes an Agent constructed from an LSTM network. Its input comprises the current state s_t of the stock market at time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time step t-1; its output is the decision a_t for the current time step. The invention constructs the trading system's Agent with an LSTM recurrent neural network. A large LSTM model has a strong capacity for feature representation and can learn rich temporal feature representations; an Agent based on an LSTM network can mine the time-series patterns in stock data and remember its historical behavior. The invention thereby addresses the two problems that arise in the prior art when quantitative trading is treated as a decision problem: characterizing financial signals and executing optimal actions.
Description
Technical field
The present invention relates to the field of stock trading technology, and in particular to a stock market quantitative trading system and algorithm based on deep reinforcement learning.
Background art
As one of the world's major capital markets, China's securities market has gone through more than two decades of development since the 1980s. In recent years, quantitative trading has gradually entered the public eye. Quantitative trading does not rely on ties to financial theory; instead, it analyzes financial data with mathematical and artificial-intelligence models, and based on these models it pursues maximum profit through trading instructions issued by computer programs. Compared with traditional trading strategies, quantitative trading has several advantages: a quantitative trading system can track market changes in real time and react to them rapidly, and its trading strategies are designed from mathematical models learned from long histories of data rather than from rules predefined by humans, so it can overcome the emotional factors that affect human traders.
Quantitative trading has many algorithms, most of which focus on dynamically modeling the financial market and making decisions based on those models. However, because market participants exhibit complex and constantly changing behavior, a simple model can hardly capture all the important attributes that characterize market conditions, so model-based trading decision processes fail more easily. Another line of research instead treats quantitative trading as a decision problem and trains an intelligent Agent of the trading system to make decisions directly. Today this approach faces two defects: characterizing financial signals and executing optimal actions.
The first defect stems from the difficulty of representing stock market features. Stock data, which contain much noise, fluctuation, and movement, are typically represented as highly non-stationary time series. To mitigate data noise and uncertainty, stock features such as moving averages or stochastic technical indicators are usually extracted by hand to summarize the market situation. There is already much work on technical-analysis indicators in quantitative trading; however, a well-known drawback of technical analysis is its poor generalization ability. For example, a moving-average feature may be sufficient to describe a stock trend, yet suffer heavy losses in a mean-reverting market.
The second defect arises because executing stock trades is a dynamic behavior; it is a systematic task that must take many practical factors into account. Frequently changing the trading position not only contributes nothing to profit but also easily causes heavy losses through transaction costs and slippage.
Summary of the invention
The purpose of the present invention is to provide a stock market quantitative trading system and algorithm based on deep reinforcement learning. The algorithm addresses the above problems of characterizing financial signals and executing optimal actions that arise in the prior art when quantitative trading is treated as a decision problem.
The present invention is achieved through the following technical solutions:
A stock market quantitative trading system based on deep reinforcement learning, the trading system including an Agent constructed from an LSTM network. The input of the Agent comprises the current state s_t of the stock market at time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time step t-1; the output is the decision a_t for the current time step. The invention constructs the trading system's Agent with an LSTM recurrent neural network. A large LSTM model has a strong capacity for feature representation and can learn rich temporal feature representations; an Agent based on an LSTM network can mine the time-series patterns in stock data and remember its historical behavior. Therefore, when characterizing stock data, the invention learns more powerful feature representations directly from the stock data rather than relying on predefined hand-crafted features. Moreover, because not only the current market condition but also historical behavior and the corresponding positions are explicitly modeled in the policy-learning part, the Agent can learn a better policy from the trading process to a greater extent and thus select the optimal action.
As a further improvement of the present invention, the Agent obtains historical stock data for training. During training, according to the current state s_t at time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time step t-1, the Agent outputs the decision a_t for the current time step; it computes the cumulative reward and expected reward of the current decision a_t under the policy π_θ; it then optimizes the policy π_θ with a policy-gradient algorithm so that the cumulative reward obtained is maximized. The policy π_θ is updated after every training iteration, and once the training count is finally reached, the Agent trades stocks according to the latest policy.
Further, the Agent includes a memory module for predicting future states from historical information and a decision module for determining which action to take. The memory module receives the input hidden state h_{t-1}; the decision module outputs the decision a_t for the current time step from the input current state s_t, the decision a_{t-1} of the previous time step t-1, and the hidden state h_{t-1}.
Further, the above stock market quantitative trading system based on deep reinforcement learning also includes a training-count control module and a real-trading module. The training-count control module stores a standard training count for the Agent, or receives a standard training count entered by the user, and records the Agent's training count; when the training count reaches the standard count, the Agent ends training, and the real-trading module trades stocks using the policy of the latest decision.
The present invention also provides a stock market quantitative trading algorithm based on deep reinforcement learning. The algorithm trains an intelligent agent of the quantitative trading system to make decisions directly. The Agent is constructed with an LSTM recurrent neural network; its input comprises the current state s_t of the stock market at time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time step t-1; its output is the decision a_t for the current time step. The algorithm includes the following steps:
S1, obtain stock data;
S2, the Agent obtains the current state s_t of the stock market at time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time step t-1, and outputs the decision a_t for the current time step;
S3, the Agent computes the cumulative reward and expected reward of the current decision a_t under the policy π_θ;
S4, the Agent optimizes the policy π_θ with a policy-gradient algorithm so that the cumulative reward obtained in step S3 is maximized;
S5, judge whether the training count has been reached; if so, go to step S6, otherwise go to step S2;
S6, the Agent trades stocks according to the latest policy.
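Steps S1-S6 can be sketched as a generic training loop. The sketch below is illustrative only: `StubAgent`, its random decisions, and the zero rewards are placeholders standing in for the patent's LSTM Agent and its reward computation; only the S2-S6 control flow is being shown.

```python
import random

class StubAgent:
    """Placeholder for the LSTM Agent mapping (s_t, h_{t-1}, a_{t-1}) to a_t.
    A real implementation would be a recurrent network; this stub only makes
    the S1-S6 control flow concrete."""
    def __init__(self):
        self.policy_version = 0

    def decide(self, s_t, h_prev, a_prev):
        return random.choice([1, 0, -1])   # long / neutral / short

    def update_policy(self, rewards):
        self.policy_version += 1           # stands in for theta <- theta + alpha*g

def train(agent, stock_data, num_iterations):
    """Roll out decisions over the data, accumulate rewards, update the
    policy, and repeat until the training count is reached."""
    for _ in range(num_iterations):                  # S5: training-count check
        h_prev, a_prev, rewards = None, 0, []
        for s_t in stock_data:                       # S2: one decision per step
            a_t = agent.decide(s_t, h_prev, a_prev)
            rewards.append(0.0)                      # S3: reward placeholder
            a_prev = a_t
        agent.update_policy(rewards)                 # S4: policy-gradient update
    return agent                                     # S6: trade with latest policy
```

After `train` returns, the agent would trade with its latest policy, as in step S6.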
In the prior art, quantitative trading systems for stocks treat quantitative trading as a decision problem and train an intelligent agent of the trading system to make decisions directly, which faces the two defects of characterizing financial signals and executing optimal actions. When characterizing stock data, the algorithm in this scheme learns more powerful feature representations directly from the stock data rather than relying on predefined hand-crafted features, and the Agent can learn a better policy from the trading process to a greater extent and thus select the optimal action. An LSTM network is a special kind of recurrent neural network that maintains information over time through three gates: an input gate, a forget gate, and an output gate. Its architecture of directed cycles creates an internal state of the network, which allows it to process time-based sequence data and remember temporal relations, so it can solve the long-term dependency problem. A large LSTM model has a strong capacity for feature representation and can learn rich temporal feature representations; an Agent based on an LSTM network can mine the time-series patterns in stock data and remember historical behavior.
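The gate mechanism described above can be illustrated with a single LSTM step in NumPy. This is a generic textbook LSTM cell, not the patent's actual five-layer network; the weight layout and the toy dimensions are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step with input, forget, and output gates.
    W has shape (4*H, D+H); b has shape (4*H,). Gate order: i, f, o, g."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    o = sigmoid(z[2*H:3*H])      # output gate
    g = np.tanh(z[3*H:4*H])      # candidate cell state
    c = f * c_prev + i * g       # cell state maintained over time
    h = o * np.tanh(c)           # hidden state h_t passed to the next step
    return h, c

# Toy dimensions: 4 input features, 3 hidden units (illustrative only)
rng = np.random.default_rng(0)
D, H = 4, 3
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):   # run 5 time steps
    h, c = lstm_step(x, h, c, W, b)
```

The forget gate `f` is what lets the cell state carry information across many steps, which is the basis of the long-term dependency claim above.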
Preferably, the stock data include the daily opening price, closing price, highest price, lowest price, and historical volume data.
Further, the policy includes three options: long position (long), neutral position (neutral), and short position (short); let
a_t ∈ {long, neutral, short} = {1, 0, -1}   (5)
Step S3 specifically includes the following steps:
S31, the Agent obtains the reward value of the current decision a_t under the policy π_θ. The reward value r_t at time t is given by formula (6), where the two price terms respectively denote the opening price and closing price at time t; Δn is the change in trading volume; n is the quantity of stock the agent currently holds; m is the current assets and I the initial wealth; c_t denotes the transaction cost incurred by trading at time t;
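Formula (6) itself is not reproduced in this text. Under the assumption that the reward is the mark-to-market profit of the held position over interval t minus the transaction cost, a hypothetical sketch (the function name and this exact form are illustrative assumptions, not the patent's formula):

```python
def step_reward(n, open_price, close_price, cost):
    """Hypothetical per-step reward: profit of holding n shares from the
    opening price to the closing price of interval t, minus the transaction
    cost c_t. This is an assumption; the patent's formula (6) may differ."""
    return n * (close_price - open_price) - cost

# 100 shares, price moves 10.0 -> 10.3, cost 5.0
r = step_reward(n=100, open_price=10.0, close_price=10.3, cost=5.0)
```

Note how a frequent position change with zero price movement yields a strictly negative reward through `cost`, matching the slippage argument in the background section.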
S32, compute the cumulative reward. Let τ be the trajectory sequence of states, actions, and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...). Then the cumulative reward R(τ) of trajectory τ is given by formula (7), in which T denotes the number of time steps in each episode;
S33, compute the expected total reward J(θ), given by formula (8).
In steps S32 and S33, the policy π_θ is a rule the Agent consults to decide which action to take and execute. θ denotes the parameters of the policy π_θ, J(θ) is the expected total reward, and the trajectory τ ~ π_θ is the sequence of states, actions, and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...);
S34, our goal is to solve for a reasonable parameter θ that maximizes the expected reward J(θ), i.e., formula (9):
max_θ J(θ)   (9)
According to the optimization algorithm, by solving the gradient ∇_θ J(θ) and then updating the parameter θ, a local optimum of the function J(θ) can finally be obtained.
The specific method by which step S34 solves the gradient ∇_θ J(θ) and updates the parameter θ is as follows:
(a) solve the policy gradient g:
g = Σ_{t} ∇_θ log π_θ(a_t | s_t) R(τ)   (1)
(b) use the policy gradient g to adjust the policy parameter θ and obtain a local optimum:
θ ← θ + αg   (3)
where α is the learning rate.
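The update in formulas (1) and (3) can be exercised on a toy problem. The sketch below uses a stateless softmax policy over the three actions (a bandit-style simplification under assumed rewards, not the patent's LSTM policy) and applies θ ← θ + αg with g = Σ ∇_θ log π_θ(a) R:

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    """Gradient of log softmax probability w.r.t. theta: one-hot minus probs."""
    p = softmax(theta)
    g = -p
    g[a] += 1.0
    return g

# Toy setup: 3 actions (long / neutral / short); only action 0 is rewarded.
rng = np.random.default_rng(1)
theta, alpha = np.zeros(3), 0.1
for _ in range(500):
    a = rng.choice(3, p=softmax(theta))
    R = 1.0 if a == 0 else 0.0                    # trajectory return R(tau)
    theta += alpha * grad_log_pi(theta, a) * R    # theta <- theta + alpha * g
```

Under this assumed reward, the probability mass of the policy is gradually "pulled" toward the rewarded action, which is exactly the intuition given for the gradient term later in the text.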
In this scheme, when solving the LSTM network, we first parameterize the policy with a deep neural network whose parameters are θ and optimize the policy π_θ with a policy-gradient method, because it can directly optimize the policy's expected total reward and search for the optimal policy directly in the policy space in an end-to-end manner, eliminating cumbersome intermediate steps. In reinforcement learning, policy-optimization algorithms are simpler than the Q-Learning method: they only need a differentiable objective function of hidden parameters, and they do not need to describe different market conditions with a series of discrete states, but can learn the policy directly from continuous perception data.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The invention constructs the trading system's Agent with an LSTM recurrent neural network. A large LSTM model has a strong capacity for feature representation and can learn rich temporal feature representations; an Agent based on an LSTM network can mine the time-series patterns in stock data and remember historical behavior. Therefore, when characterizing stock data, the invention learns more powerful feature representations directly from the stock data rather than relying on predefined hand-crafted features. Moreover, because not only the current market condition but also historical behavior and the corresponding positions are explicitly modeled in the policy-learning part, the Agent can learn a better policy from the trading process to a greater extent and thus select the optimal action.
2. When solving the LSTM network, the algorithm of the invention first parameterizes the policy with a deep neural network whose parameters are θ and optimizes the policy π_θ with a policy-gradient method, because it can directly optimize the policy's expected total reward and search for the optimal policy directly in the policy space in an end-to-end manner, eliminating cumbersome intermediate steps. In reinforcement learning, policy-optimization algorithms are simpler than the Q-Learning method: they only need a differentiable objective function of hidden parameters, and they do not need to describe different market conditions with a series of discrete states, but can learn the policy directly from continuous perception data.
Detailed description of the invention
The drawings described here provide a further understanding of the embodiments of the present invention and constitute a part of the application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is the internal model structure of the stock market quantitative trading system Agent in the embodiment of the present invention;
Fig. 2(a)-Fig. 2(f) show in turn the raw financial data of the 6 stocks chosen in the embodiment (stock codes 600547, 600028, 600999, 601988, 002415, 600016);
Fig. 3(a)-Fig. 3(f) show the predicted cumulative returns on the 6 chosen stocks (stock codes 600547, 600028, 600999, 601988, 002415, 600016) of the Agent based on the LSTM network and of an Agent based on a fully connected neural network;
Fig. 4(a) is the backtest result for stock 002415;
Fig. 4(b) is the backtest result for stock 601988.
Detailed description of the embodiments
In the prior art, quantitative trading systems for stocks treat quantitative trading as a decision problem and train an intelligent agent of the trading system to make decisions directly; today this approach faces two defects, characterizing financial signals and executing optimal actions, which arise when quantitative trading is treated as a decision problem. The purpose of the present invention is to propose an algorithm that resolves the defects of financial characterization and optimal-action execution: when characterizing stock data, it learns more powerful feature representations directly from the stock data rather than relying on predefined hand-crafted features; and because not only the current market condition but also historical behavior and the corresponding positions are explicitly modeled in the policy-learning part, the Agent can learn a better policy from the trading process to a greater extent and thus select the optimal action.
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings. The exemplary embodiments of the invention and their description are used only to explain the invention and do not limit it.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, it will be evident to those of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known structures, circuits, materials, or methods are not described in detail to avoid obscuring the invention.
As shown in Fig. 1, a stock market quantitative trading system based on deep reinforcement learning includes an Agent constructed from an LSTM network. The input comprises the current state s_t of the stock market at time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time step t-1; the output is the decision a_t for the current time step, and the reward value the Agent obtains at this point is r_t. The stock market quantitative trading algorithm of the invention constructs the trading system's Agent with an LSTM recurrent neural network. An LSTM network is a special kind of recurrent neural network that maintains information over time through three gates: an input gate, a forget gate, and an output gate. Its architecture of directed cycles creates an internal state of the network, which allows it to process time-based sequence data and remember temporal relations, and therefore solves the long-term dependency problem. A large LSTM model has a strong capacity for feature representation and can learn rich temporal feature representations; an Agent based on an LSTM network can mine the time-series patterns in stock data and remember historical behavior. The LSTM-based Agent consists of an input layer, 5 LSTM layers (each with 31 hidden units), a hidden layer, and a soft-max layer.
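For a sense of scale, the parameter count of the stated architecture can be computed from the standard LSTM layer formula (four gates, each with a weight matrix over the concatenated input and hidden vectors plus a bias). The input width of 5 daily features is an assumption for illustration; the patent does not state the input dimension.

```python
def lstm_layer_params(input_dim, hidden_dim):
    """Parameter count of one standard LSTM layer: four gates, each with a
    weight matrix over [input; hidden] and a bias vector."""
    return 4 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)

# The patent specifies 31 hidden units per layer and 5 stacked LSTM layers.
# An input width of 5 (e.g. open/close/high/low/volume) is assumed here.
H, n_layers, input_dim = 31, 5, 5
total = lstm_layer_params(input_dim, H) + \
        (n_layers - 1) * lstm_layer_params(H, H)
```

Under these assumptions the recurrent stack has on the order of a few tens of thousands of parameters, small enough to train on daily stock series.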
Meanwhile a kind of stock market quantization trading algorithms based on deeply study use the policy optimization of intensified learning
Method come train transaction Agent.Policy-Gradient is a kind of common policy optimization method, it is total by the expectation of continuous calculative strategy
The gradient about policing parameter is awarded to update policing parameter, finally converges on optimal policy.Therefore we are solving LSTM net
When network, first choice uses parameter to indicate strategy for the deep neural network of θ to carry out parametrization, and Utilization strategies gradient method is next excellent
Change strategy πθ, the reason is that the expectation that it is capable of directly optimisation strategy is always awarded, and in end-to-end mode directly in policy space
Middle search optimal policy, eliminates cumbersome intermediate link.Strategy optimization algorithm ratio Q-Learning method in intensified learning
It is simpler, there is the differentiable objective function for hiding parameter since it only needs one, also do not need to utilize series of discrete shape
State describes different market conditions (in Q-Learning), but can directly from continuous perception data (market characteristics)
Middle learning strategy.
Policy-Gradient method be it is a kind of directly come approximate representation and optimisation strategy using approaching device, finally obtain optimal policy
Method, this method optimization is strategy expectation always award maxθE[R|πθ],
Wherein R indicates award summation obtained in the three unities (episode);
T indicates that the number of parameters for needing to optimize in algorithm model in each plot, plot (episode) are intensified learning neck
Domain generic term and means no longer carry out repeating its meaning in the present embodiment.
The most common idea behind the policy gradient is to increase the probability that episodes with higher total reward occur. The specific process of the policy-gradient method is as follows. Suppose the trajectory of states, actions, and rewards of a complete episode is τ = (s_0, a_0, r_0, s_1, a_1, r_1, ..., s_{T-1}, a_{T-1}, r_{T-1}, s_T); then the policy gradient is expressed in the form
g = E_τ [ Σ_{t=0}^{T-1} ∇_θ log π_θ(a_t | s_t) R(τ) ]   (2)
and the policy parameter θ is adjusted with this gradient:
θ ← θ + αg   (3)
where α is the learning rate, which controls the rate of the policy-parameter update. In formula (2), the gradient term ∇_θ log π_θ(a_t | s_t) indicates the direction that increases the probability of the trajectory; multiplied by the scoring function R, it "pulls" the probability density harder toward the trajectories τ with higher total reward within a single episode. If trajectories with different total rewards are collected, this training process moves the probability density toward the directions of trajectories with higher total reward, maximizing the probability that high-reward trajectories τ occur. Above, s_i, a_i, and r_i (i = 0, 1, ..., T-1) denote the state, action, and reward at time i, respectively.
In some cases, however, the total reward R of every episode is non-negative, so the value of every gradient g is also greater than or equal to 0. In that case every trajectory τ encountered during training "pulls" the probability density in the positive direction, which greatly slows the learning pace and makes the variance of the gradient g very large. A normalization operation on R can therefore be used to reduce the variance of the gradient g. This trick lets the algorithm raise the occurrence probability of trajectories τ with larger total reward R while lowering the occurrence probability of trajectories τ with smaller total reward R. Following this idea, the policy gradient is unified into the form
g = E_τ [ Σ_{t=0}^{T-1} ∇_θ log π_θ(a_t | s_t) (R(τ) - b) ]   (4)
where b is a baseline related to the current trajectory τ, usually set to an expectation estimate of R, with the aim of reducing the variance of R. It can be seen that the more R exceeds the baseline b, the more likely the corresponding trajectory τ is to be selected. Therefore, in DRL tasks with large-scale state spaces, the policy can be parameterized by a deep neural network and the optimal policy solved with the traditional policy-gradient method. In brief, the policy-optimization method has two advantages: a flexible optimization objective and continuously described market conditions; the present invention therefore selects a policy-optimization algorithm as the trading framework.
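The variance-reduction claim behind formula (4) can be checked numerically: for a fixed policy with all-positive returns, subtracting the expected return as the baseline b leaves the gradient estimate's mean unchanged while shrinking its variance. A small Monte-Carlo sketch with a toy three-action policy and made-up returns (all values are illustrative assumptions):

```python
import numpy as np

# Monte-Carlo check of the baseline idea in formula (4): for a fixed policy,
# the per-sample estimate grad_log_pi(a) * (R - b) has lower variance when
# b is the expected return, while its expectation is unchanged.
rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])            # fixed softmax policy over 3 actions
returns = np.array([3.0, 2.0, 1.0])      # all-positive returns R (illustrative)

samples = rng.choice(3, size=20000, p=p)

def grad_component(a):                   # d log pi(a) / d theta_0 for softmax
    return (1.0 if a == 0 else 0.0) - p[0]

b = float((p * returns).sum())           # baseline: expected return E[R]
g_plain = np.array([grad_component(a) * returns[a] for a in samples])
g_base  = np.array([grad_component(a) * (returns[a] - b) for a in samples])

var_plain, var_base = g_plain.var(), g_base.var()
```

Because E[∇_θ log π_θ(a)] = 0 under the policy's own distribution, the baseline term contributes nothing to the mean, only to the variance, which is why b can be chosen freely.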
According to the trading process, the Agent has two functional parts: one is a memory part that predicts future states from historical information; the other is a decision part that determines which action to take. Thus the Agent includes a memory module for predicting future states from historical information and a decision module for determining which action to take: the memory module receives the input hidden state h_{t-1}, and the decision module outputs the decision a_t for the current time step from the input current state s_t, the decision a_{t-1} of the previous time step t-1, and the hidden state h_{t-1}. In this embodiment, a single LSTM network is enough for the Agent to realize both functions, for the following reasons:
(1) The input of the Agent includes not only the market observation of each period but also comprehensive information that changes over time. An LSTM deep neural network can memorize the interconnections within time-series data.
(2) The search space is greatly reduced. If the search space is too large, a multi-layer dense network may be untrainable; the structure and parameters of an LSTM network are shared across time, which greatly reduces the difficulty of training.
In this embodiment, deep reinforcement learning (DRL) is used to design the stock market trading system and solve the quantitative trading problem. DRL supports end-to-end learning from perception to action; since deep neural networks have high expressive power, a DRL-based Agent can learn high-dimensional feature representations from high-dimensional data and solve more complex decision problems. In Fig. 1, s_t represents the market situation; h_t represents the hidden state of the LSTM network; a_t is the action taken by the Agent at time t; s_t, together with h_t and a_t, is fed into the LSTM network. The input of the Agent includes the current market state s_t and the hidden state h_{t-1} of the LSTM network, which can remember the temporal associations among historical data. In addition, the decision of the previous time step is also fed into the LSTM as extra input. The Agent outputs the decision for the current time step. The working principle of the Agent is as follows: at each time step the Agent obtains the current market state and integrates historical data to predict the future state. Synthesizing all the information, the Agent makes a trading decision in real time, which is used to update the Agent's internal parameters, and obtains a direct return based on the decision. After many iterations, the Agent learns how to improve its decisions. When the training count is finally reached, the Agent trades stocks according to the latest policy.
A stock market quantitative trading algorithm based on deep reinforcement learning trains an intelligent agent of the quantitative trading system to make decisions directly. The Agent is constructed with an LSTM recurrent neural network, and the algorithm includes the following steps:
S1, obtain stock data;
S2, the Agent obtains the current state s_t of the stock market at time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time step t-1, and outputs the decision a_t for the current time step. The hidden state h_{t-1} passed to the next time step carries an anticipation of the next state based on how the historical stock data have changed, so the stock data become time-series-based data with temporal connections established, which solves the long-term dependency problem;
S3, at this point the states, actions, and rewards known to the Agent form a sequence, so from this sequence the agent computes the cumulative reward and expected reward of the current decision a_t under the policy π_θ;
S4, the Agent optimizes the policy π_θ with a policy-gradient algorithm so that the cumulative reward obtained in step S3 is maximized;
S5, judge whether the training count has been reached; if so, go to step S6, otherwise go to step S2;
S6, the Agent trades stocks according to the latest policy.
In this embodiment the state of the Agent includes the traded price series, but because of the randomness of the market and the heavy noise in raw price series, we additionally add financial technical indicators that summarize market trend or dynamics. The stock data therefore include the daily opening price, closing price, highest price, lowest price, dynamic financial technical indicators, and historical volume data.
The policy includes three options: long position (long), neutral position (neutral), and short position (short); let
a_t ∈ {long, neutral, short} = {1, 0, -1}   (5)
In this policy it is assumed that the outcome of the neutral position also affects the trader's income. The goal of the Agent is to maximize the final income.
Step S3 specifically includes the following steps:
S31. The Agent obtains the reward value corresponding to the current decision a_t under policy π_θ; the reward value r_t at time t is given by formula (6).
In formula (6), the two price quantities denote respectively the opening price and closing price at time t; Δn is the change in trading volume; n is the current number of shares the Agent holds; m is the existing assets and I is the initial wealth; c_t denotes the transaction cost incurred by trading at time t. In the stock trading system, the interval at which the Agent deals is also t, i.e. the trader trades at the end of each time interval t, and the reward value produced accordingly is used to evaluate the income.
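Since the image carrying formula (6) is not reproduced here, the following is only a plausible sketch of a per-step reward of this shape (position times intra-period price change, minus a transaction cost), under an assumed proportional cost model and hypothetical names:

```python
def step_reward(open_price, close_price, position, cost_rate=0.001):
    """Illustrative per-step reward: position * intra-period price change
    minus a proportional transaction cost c_t. This is a plausible reading
    of the description around formula (6), not the patent's exact formula."""
    price_change = close_price - open_price
    cost = cost_rate * open_price * abs(position)   # c_t, charged only when a position is held
    return position * price_change - cost
```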
S32. Calculate the cumulative reward. Let τ be the trajectory sequence of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...); once the reward value r_t of step S31 has been obtained, this trajectory is known, and the cumulative reward R(τ) of trajectory τ is given by formula (7), where T denotes the length (number of time steps) of each episode.
S33. Calculate the expected total reward J(θ).
In steps S32 and S33, the policy π_θ is a rule: the Agent consults the policy to decide which action to take and execute. θ denotes the parameters of policy π_θ, and J(θ) is the expected total reward. A trajectory τ ~ π_θ is the sequence of states, actions and rewards obtained under policy π_θ, τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...), and its return value is the cumulative reward of the trajectory.
Our goal is to find a reasonable parameter θ that maximizes the expected reward J(θ), i.e. max_θ J(θ). Therefore, according to the optimization algorithm, the policy gradient ∇_θ J(θ) of J(θ) is solved and the parameter θ is updated accordingly; a locally optimal solution of the function J(θ) can finally be obtained. Specifically:
Policy gradient
The policy parameter θ is adjusted with the policy gradient g:
θ ← θ + αg (3).
The formula means that the learning rate α is multiplied by the policy gradient g and added to the current policy parameter; the result is assigned back to the policy parameter, and the resulting policy serves as the new policy. In this way the policy parameter θ is adjusted.
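The update θ ← θ + αg of formula (3), with the REINFORCE-style estimator g = Σ_t ∇_θ log π_θ(a_t|s_t) · R(τ) implied by steps S32-S34, can be sketched as follows. The linear softmax policy used here is an illustrative assumption, not the patent's LSTM network:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                                   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def policy_gradient_update(theta, trajectory, alpha=0.01):
    """One REINFORCE-style step theta <- theta + alpha * g (formula (3)).
    theta: (n_features, 3) weights of a softmax policy over {long, neutral, short}.
    trajectory: list of (features, action_index, reward) tuples."""
    ret = sum(r for _, _, r in trajectory)            # R(tau): cumulative reward of the trajectory
    g = np.zeros_like(theta)
    for x, a, _ in trajectory:
        p = softmax(theta.T @ x)                      # pi_theta(. | s)
        grad_log = np.outer(x, -p)                    # d log pi(a|s) / d theta = x (1{k=a} - p_k)
        grad_log[:, a] += x
        g += grad_log * ret                           # grad log pi * R(tau)
    return theta + alpha * g                          # theta <- theta + alpha g
```

Repeating this update raises the probability of actions on high-return trajectories, which is exactly the direction of ascent on J(θ).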
In this embodiment, deep learning is combined with reinforcement learning for the representation of stock signal features and for online trading. In this framework, the LSTM-based Agent can automatically perceive the dynamics of the stock market, and the LSTM alleviates to some extent the difficulty of manually extracting stock features from massive data. In addition, we add financial technical indicators to the LSTM network to reduce the influence of market noise. A reinforcement learning algorithm is used to train the Agent to learn trading strategies.
In order to verify the effectiveness of the transaction system and trading algorithm of this embodiment, the inventors carried out data verification and comparative analysis. The data were acquired as follows: the stock data used were obtained through the Tushare interface. Tushare is a free, open-source Python financial-data interface package that mainly implements the collection, cleaning and storage of financial data such as stock prices; it provides financial analysts with fast, tidy and varied data that are easy to analyze, greatly reducing their workload in data acquisition and allowing them to focus more on the research and implementation of strategies and models. The data sets were selected as follows: the transaction system captures daily-bar data of stocks on the Chinese stock market, including the daily closing price, opening price, highest price and lowest price. In this embodiment, data for six stocks (stock codes 600547, 600028, 600999, 601988, 002415, 600016) from 2015-10-25 to 2017-10-25 were chosen; these data exhibit completely different market trends, as shown in turn in Fig. 2(a)-Fig. 2(f). Fig. 2(a)-Fig. 2(f) mainly display raw financial data covering three trends; d1, d2, d3 and d4 in Fig. 2(a)-Fig. 2(f) represent four different customized dynamic financial technical indicator combinations, whose contents are listed in Table 1:
Table 1: feature combinations of the financial technical indicators
The brief meanings of the technical indicators listed in Table 1 are explained in Table 2:
Table 2: meanings of the financial technical indicators
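Since Tables 1 and 2 are reproduced as images, the exact d1-d4 combinations are not restated here; the sketch below merely shows how generic technical indicators of this kind (moving averages, momentum) could be appended to daily data with pandas. The column names and window choices are assumptions for illustration:

```python
import pandas as pd

def add_indicators(df, windows=(5, 10)):
    """Append a few common technical indicators to daily close-price data.
    The concrete d1-d4 combinations are defined in Tables 1-2 of the patent;
    the indicators below are generic examples, not those exact combinations."""
    out = df.copy()
    for w in windows:
        out[f"ma_{w}"] = out["close"].rolling(w).mean()       # simple moving average
        out[f"ema_{w}"] = out["close"].ewm(span=w).mean()     # exponential moving average
    out["momentum_5"] = out["close"] - out["close"].shift(5)  # 5-day momentum
    return out
```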
In the training process of the Agent, the turnover rate of the baseline function b is 0.8. The weights of the Agent are uniformly initialized within the range [-0.2, 0.2]. We use the Adam optimizer to train the Agent's neural network, with a learning rate of 0.001.
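The initialization and optimizer settings just described can be sketched as follows; the Adam update is the standard formulation, shown here in NumPy for illustration (the patent does not disclose its own implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(shape, low=-0.2, high=0.2):
    """Uniform initialization in [-0.2, 0.2], as described for the Agent's weights."""
    return rng.uniform(low, high, size=shape)

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update with lr=0.001 as in the embodiment.
    m and v are the running first and second moments; t is the step count (>= 1)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                 # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```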
The inventors first compared the performance of the LSTM-based Agent with that of an Agent based on a fully connected neural network:
The fully connected network consists of four layers: an input layer; a fully connected layer with 32 nodes; a dropout layer with 64 nodes; and a softmax layer with 3 output units. The learning rate is set to 0.007 and the decay rate to 0.99, and the Adam optimization method is used to optimize the loss function.
Financial data were fed to both Agents; the associated financial data include the daily opening price, closing price and historical volume data. The initial capital is 500,000 yuan (CNY), and the position size of each trade is fixed at 10,000 yuan.
After 10,000 training iterations, the final results were used to assess the performance of the models. Fig. 3(a)-(f) show the different performances of the two Agents in the training stage, comparing the accumulated earnings on different stocks of the Agent based on the LSTM network and the Agent based on the fully connected neural network. Table 3 summarizes the final profit each Agent can obtain after training.
Table 3: performance comparison of the fully connected neural network and the LSTM network
In Table 3 and Fig. 3(a)-(f), FC denotes the fully connected neural network. In Fig. 2(a)-Fig. 2(f) and Fig. 3(a)-(f), the horizontal axis (episode) represents the number of training iterations, and the vertical axis (portfolio) represents the asset holdings in units of 1,000 RMB.
As can be seen from Fig. 3(a)-(f) and Table 3, thanks to the high expressive power of deep neural networks, both Agents can show advantages on the stock market. When the raw data fluctuate sharply up and down, the fully connected network performs poorly, while the LSTM-based Agent performs better, because the LSTM-based Agent can remember the sequential correlation of the data. The experimental results show that the LSTM neural network is more suitable for building a trading Agent.
The inventors also selected two stocks for backtesting of the transaction system in this embodiment, collecting 13 years of daily-bar data in total, from October 25, 2005 to October 25, 2017, and performed backtests of the Agent's online performance. The first four years of data were used to set the parameters of the Agent; the test period runs from October 25, 2009 to October 25, 2017.
For each year from 2009 to 2017, the sliding window of training data is two years long and advances through the whole training set one year at a time. Within the sliding window, the previously learned parameters guide the trading of the following year. Through this sliding-window learning process, the Agent keeps seeing the continually changing stock market. The backtest results for 2009 to 2017 are shown in Fig. 4(a) and Fig. 4(b): Fig. 4(a) is the backtest result for stock 002415, and Fig. 4(b) is the backtest result for stock 601988. From the results it can be seen that the Agent obtains additional income in most cases.
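The walk-forward scheme described above (a two-year training window advancing one year at a time, with the following year used for trading) can be sketched as follows, assuming the windows are aligned to calendar years:

```python
def sliding_windows(start_year, end_year, train_span=2):
    """Yield (train_years, test_year) pairs for walk-forward evaluation:
    a train_span-year training window advances one year at a time, and the
    parameters learned on it guide trading in the following year."""
    windows = []
    year = start_year
    while year + train_span <= end_year:
        train_years = list(range(year, year + train_span))
        test_year = year + train_span          # the year traded with the learned parameters
        windows.append((train_years, test_year))
        year += 1                              # advance the window by one year
    return windows
```

For instance, `sliding_windows(2007, 2017)` (the start year 2007 being an assumption) yields ([2007, 2008], 2009) first and ([2015, 2016], 2017) last, covering the 2009-2017 test period.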
In addition, in technical analysis a technical indicator is a mathematical calculation based on historical price, volume or (in the case of futures contracts) open interest that is intended to predict the direction of the financial market. Many technical indicators are in use, and traders continually develop new indicators to obtain better results. Since stock prices from the market contain excessive noise, the technical indicators used in this embodiment include the opening price, closing price, etc.; in other embodiments, further technical indicators can be added to capture the main trend. For example, the input of the Agent based on the LSTM neural network not only includes the daily opening price, closing price, highest price and lowest price, but also includes the trading volume and other technical indicator combinations.
The specific embodiments described above further explain in detail the objects, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. A stock market quantitative trading system based on deep reinforcement learning, the quantitative trading system comprising an Agent, characterized in that the Agent is constructed with an LSTM network; its input includes the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network in which the temporal association between historical data is recorded, and the decision a_{t-1} of the previous time t-1; and it outputs the decision a_t of the current time.
2. The stock market quantitative trading system based on deep reinforcement learning according to claim 1, characterized in that the Agent obtains historical stock data for training; during training, according to the current state s_t of the current time t, the hidden state h_{t-1} in the LSTM network in which the temporal association of historical data is recorded, and the decision a_{t-1} of the previous time t-1, it outputs the decision a_t of the current time; it calculates the cumulative reward and expected reward corresponding to the current decision a_t under the policy π_θ; it then optimizes the policy π_θ according to the policy-gradient algorithm so that the obtained cumulative reward is maximized; the policy π_θ is updated after every training iteration, and finally, when the number of training iterations is reached, the Agent trades stocks according to the latest policy.
3. The stock market quantitative trading system based on deep reinforcement learning according to claim 1 or 2, characterized in that the Agent comprises a memory module for predicting future states based on historical information and a decision module for deciding which action to take; the memory module receives the input hidden state h_{t-1}, and the decision module outputs the decision a_t of the current time according to the input current state s_t, the decision a_{t-1} of the previous time t-1 and the hidden state h_{t-1}.
4. The stock market quantitative trading system based on deep reinforcement learning according to claim 1 or 2, characterized by further comprising a training-count control module and a real-trading module; the training-count control module stores the standard number of training iterations of the Agent or receives a standard number input by the user, and records the training count of the Agent; when the training count of the Agent reaches the standard number, training ends and the real-trading module carries out stock trading using the policy of the latest decision.
5. A stock market quantitative trading algorithm based on deep reinforcement learning, in which an intelligent Agent trained by a quantitative trading system makes decisions directly; characterized in that the Agent is constructed with an LSTM recurrent neural network, and the algorithm comprises the following steps:
S1. obtaining stock data;
S2. the Agent obtains the current state s_t of the stock market at the current time t, the hidden state h_{t-1} in the LSTM network in which the temporal association between historical data is recorded, and the decision a_{t-1} of the previous time t-1, and outputs the decision a_t of the current time;
S3. the Agent calculates the cumulative reward and expected reward corresponding to the current decision a_t under the policy π_θ;
S4. the Agent optimizes the policy π_θ according to the policy-gradient algorithm, maximizing the cumulative reward obtained in step S3;
S5. judging whether the number of training iterations has been reached; if so, jumping to step S6, otherwise jumping to step S2;
S6. the Agent carries out stock trading according to the latest policy.
6. The stock market quantitative trading algorithm based on deep reinforcement learning according to claim 5, characterized in that the stock data include the daily opening price, closing price, highest price, lowest price and historical volume data.
7. The stock market quantitative trading algorithm based on deep reinforcement learning according to claim 5, characterized in that
the policy includes three options, a long position (long), a neutral position (neutral) and a short position (short); let
a_t ∈ {long, neutral, short} = {1, 0, -1} (5);
step S3 specifically includes the following steps:
S31. The Agent obtains the reward value corresponding to the current decision a_t under policy π_θ; the reward value r_t at time t is given by formula (2);
in formula (2), the two price quantities denote respectively the opening price and closing price at time t; Δn is the change in trading volume; n is the current number of shares held by the Agent; m is the existing assets and I is the initial wealth; c_t denotes the transaction cost incurred by trading at time t;
S32. Calculate the cumulative reward: let τ be the trajectory sequence of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...);
then the cumulative reward R(τ) of trajectory τ is given by formula (7), where T denotes the length (number of time steps) of each episode;
S33. Calculate the expected total reward J(θ):
in steps S32 and S33, the policy π_θ is a rule; the Agent consults the policy to decide which action to take and execute; θ denotes the parameters of policy π_θ, J(θ) is the expected total reward, and a trajectory τ ~ π_θ is the sequence of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...);
S34. Our goal is to solve for a reasonable parameter θ that maximizes the expected reward J(θ); J(θ) therefore needs to be maximized, as in formula (9):
according to the optimization algorithm, the gradient ∇_θ J(θ) of J(θ) is solved and the parameter θ is updated accordingly, finally obtaining a locally optimal solution of the function J(θ).
8. The stock market quantitative trading algorithm based on deep reinforcement learning according to claim 7, characterized in that in step S34 the specific method of solving the gradient ∇_θ J(θ) of J(θ) and updating the parameter θ is as follows:
(a) solving the policy gradient g;
(b) adjusting the policy parameter θ using the policy gradient g to obtain the locally optimal solution:
θ ← θ + αg (3), where α is the learning rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650290.7A CN110335162A (en) | 2019-07-18 | 2019-07-18 | A kind of stock market quantization transaction system and algorithm based on deeply study |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110335162A true CN110335162A (en) | 2019-10-15 |
Family
ID=68145668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910650290.7A Pending CN110335162A (en) | 2019-07-18 | 2019-07-18 | A kind of stock market quantization transaction system and algorithm based on deeply study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110335162A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061564A (en) * | 2019-12-11 | 2020-04-24 | 中国建设银行股份有限公司 | Server capacity adjusting method and device and electronic equipment |
CN111539495A (en) * | 2020-07-10 | 2020-08-14 | 北京海天瑞声科技股份有限公司 | Recognition method based on recognition model, model training method and device |
CN112068420A (en) * | 2020-07-30 | 2020-12-11 | 同济大学 | Real-time control method and device for drainage system |
CN112101556A (en) * | 2020-08-25 | 2020-12-18 | 清华大学 | Method and device for identifying and removing redundant information in environment observation quantity |
CN112884576A (en) * | 2021-02-02 | 2021-06-01 | 上海卡方信息科技有限公司 | Stock trading method based on reinforcement learning |
TWI732650B (en) * | 2020-08-12 | 2021-07-01 | 中國信託商業銀行股份有限公司 | Stock prediction method and server end for stock prediction |
CN113283986A (en) * | 2021-04-28 | 2021-08-20 | 南京大学 | Algorithm transaction system and training method of algorithm transaction model based on system |
CN113807965A (en) * | 2021-09-17 | 2021-12-17 | 中国银行股份有限公司 | Transaction data analysis and prediction method and device |
CN114092254A (en) * | 2021-11-26 | 2022-02-25 | 桂林电子科技大学 | Consumption financial transaction method based on artificial intelligence and transaction system thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191015 |