CN110335162A - Stock market quantitative trading system and algorithm based on deep reinforcement learning - Google Patents

A stock market quantitative trading system and algorithm based on deep reinforcement learning

Info

Publication number
CN110335162A
CN110335162A (application CN201910650290.7A)
Authority
CN
China
Prior art keywords
agent
decision
stock
strategy
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910650290.7A
Other languages
Chinese (zh)
Inventor
吴佳
王晨
孙洪永
熊礼东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201910650290.7A
Publication of CN110335162A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Technology Law (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a stock market quantitative trading system and algorithm based on deep reinforcement learning. The trading system includes an Agent constructed from an LSTM network, whose input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1; its output is the decision a_t at the current time. The invention builds the trading system's Agent from an LSTM recurrent neural network: large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations, and an LSTM-based Agent can mine the time-series patterns in stock data and remember its own historical behavior. The invention thereby addresses the two problems that arise in the prior art when quantitative trading is treated as a decision problem: characterizing financial signals and executing optimal actions.

Description

A stock market quantitative trading system and algorithm based on deep reinforcement learning
Technical field
The present invention relates to the field of stock trading technology, and in particular to a stock market quantitative trading system and algorithm based on deep reinforcement learning.
Background art
As one of the world's main capital markets, China's securities market has gone through more than twenty years of development since the 1980s. In recent years, quantitative trading has gradually entered the public eye. Quantitative trading does not need to be grounded in financial theory; instead, it analyzes financial data with mathematical and artificial-intelligence models, and on the basis of these models it pursues maximum profit through trading instructions issued by computer programs. Compared with traditional trading strategies, quantitative trading has several advantages: a quantitative trading system can track market changes in real time and react to them rapidly, and its strategies are based on mathematical models learned from long histories of data rather than on rules predefined by humans, so it can overcome the emotional factors that influence human traders.
There are many algorithms for quantitative trading, and most of them focus on modeling the dynamics of the financial market and making decisions based on those models. However, because market participants exhibit complex and constantly changing behavior, it is difficult for a simple model to capture all of the important attributes that characterize market conditions, so model-based trading decision processes fail relatively easily. Another line of research instead treats quantitative trading as a decision problem and uses the trading system to train an intelligent Agent that makes decisions directly; at present, this approach faces two defects, in characterizing financial signals and in executing optimal actions.
The first defect stems from the difficulty of representing stock market features. Stock data, which contains much noise, fluctuation and movement, is typically represented as a highly unstable time series. To mitigate data noise and uncertainty, stock features such as moving averages or stochastic indicators are usually extracted by hand to summarize the stock market situation. There is already much work on the generalization of technical-analysis indicators in quantitative trading; however, a well-known disadvantage of technical analysis is precisely its poor generalization ability. For example, a moving-average feature may be sufficient to describe a stock's trend, yet suffer heavy losses in a mean-reverting market.
The second defect arises because executing stock trades is a dynamic behavior; it is systematic work that must take many practical factors into account. Frequently changing one's trading position not only contributes nothing to profit, but also easily causes large losses through transaction costs and slippage.
Summary of the invention
The purpose of the present invention is to provide a stock market quantitative trading system and algorithm based on deep reinforcement learning, which can solve the above problems of characterizing financial signals and executing optimal actions that arise in the prior art when quantitative trading is treated as a decision problem.
The present invention is achieved through the following technical solutions:
A stock market quantitative trading system based on deep reinforcement learning, the trading system including an Agent constructed from an LSTM network, whose input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and whose output is the decision a_t at the current time. The invention builds the trading system's Agent from an LSTM recurrent neural network: large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations, and an LSTM-based Agent can mine the time-series patterns in stock data and remember its historical behavior. Therefore, when characterizing stock data, the invention can learn more powerful feature representations directly from the stock data instead of relying on predefined hand-crafted features. Moreover, beyond the current market conditions, the historical behavior and the corresponding positions are modeled explicitly and simultaneously in the policy-learning part, so the Agent can, to a greater extent, learn a better policy during stock trading and thereby select optimal actions.
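The decision interface described above — current state s_t, previous hidden state h_{t-1} and previous decision a_{t-1} in, decision a_t out — can be sketched as follows. This is a minimal illustration only: a single tanh recurrence stands in for the patent's five-layer LSTM, and all dimensions, weights and names are assumptions, not taken from the patent.

```python
import math
import random

class RecurrentTradingAgent:
    """Minimal sketch of the Agent interface: each step consumes the
    market state s_t, the previous hidden state h_{t-1} and the previous
    decision a_{t-1}, and emits a decision a_t.  A single tanh recurrence
    stands in for the patent's 5-layer LSTM; all dimensions and weights
    here are illustrative assumptions."""

    ACTIONS = (1, 0, -1)  # long, neutral, short

    def __init__(self, state_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        in_dim = state_dim + hidden_dim + 1      # [s_t, h_{t-1}, a_{t-1}]
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(hidden_dim)]    # recurrent "memory" weights
        self.V = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                  for _ in range(len(self.ACTIONS))]  # decision head

    def step(self, s_t, h_prev, a_prev):
        x = list(s_t) + list(h_prev) + [float(a_prev)]
        h_t = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
               for row in self.W]
        scores = [sum(v * hi for v, hi in zip(row, h_t)) for row in self.V]
        a_t = self.ACTIONS[scores.index(max(scores))]  # greedy decision
        return a_t, h_t

agent = RecurrentTradingAgent(state_dim=4, hidden_dim=8)
h, a = [0.0] * 8, 0
for s in ([0.1, 0.2, -0.1, 0.05], [0.0, -0.3, 0.2, 0.1]):
    a, h = agent.step(s, h, a)
```

In this sketch the hidden state carries the "memory module" role and the output head the "decision module" role; a real LSTM cell would additionally maintain a cell state and three gates.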
As a further improvement of the present invention, the Agent obtains historical stock data for training. During training it takes the current state s_t of the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and outputs the decision a_t at the current time; it computes the cumulative reward and expected reward of the current decision a_t under the policy π_θ; it then optimizes the policy π_θ with a policy-gradient algorithm so as to maximize the cumulative reward obtained. The policy π_θ is updated after every training iteration, and once the prescribed number of training iterations is reached, the Agent trades stocks according to the latest policy.
Further, the Agent includes a memory module for predicting future states from historical information and a decision module for deciding which action to take. The memory module receives the input hidden state h_{t-1}, and the decision module outputs the decision a_t at the current time according to the input current state s_t, the decision a_{t-1} of the previous time t-1, and the hidden state h_{t-1}.
Further, the above stock market quantitative trading system based on deep reinforcement learning also includes a training-count control module and a live-trading module. The training-count control module stores a standard number of training iterations for the Agent, or receives a standard number entered by the user, and records the Agent's training count; when the training count reaches the standard number, the Agent ends training, and the live-trading module trades stocks using the policy behind the latest decision.
The present invention also provides a stock market quantitative trading algorithm based on deep reinforcement learning, in which the quantitative trading system trains an intelligent Agent that makes decisions directly. The Agent is constructed from an LSTM recurrent neural network; its input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and its output is the decision a_t at the current time. The algorithm includes the following steps:
S1. Obtain stock data;
S2. The Agent obtains the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and outputs the decision a_t at the current time;
S3. The Agent computes the cumulative reward and expected reward of the current decision a_t under the policy π_θ;
S4. The Agent optimizes the policy π_θ with a policy-gradient algorithm so that the cumulative reward obtained in step S3 is maximized;
S5. Judge whether the prescribed number of training iterations has been reached; if so, jump to step S6, otherwise jump to step S2;
S6. The Agent trades stocks according to the latest policy.
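Steps S1–S6 above can be sketched as a training-then-trading loop. The helper callables and the toy per-step reward below are stand-ins chosen for illustration; they are not the patent's implementation.

```python
def train_and_trade(get_stock_data, agent_step, update_policy, n_train_iters=5):
    """Skeleton of steps S1-S6: repeatedly decide, score, and apply a
    policy-gradient update until the training count is reached, then
    trade with the latest policy.  All callables are stand-ins."""
    data = get_stock_data()                          # S1: obtain stock data
    for _ in range(n_train_iters):                   # S5: training-count check
        h, a_prev, trajectory = None, 0, []
        for s_t in data:
            a_t, h = agent_step(s_t, h, a_prev)      # S2: output decision a_t
            r_t = s_t * a_t                          # toy per-step reward stand-in
            trajectory.append((s_t, a_t, r_t))
            a_prev = a_t
        R = sum(r for _, _, r in trajectory)         # S3: cumulative reward R(tau)
        update_policy(trajectory, R)                 # S4: policy-gradient update
    return [agent_step(s, None, 0)[0] for s in data] # S6: trade with latest policy

import random
rng = random.Random(1)
prices = [rng.uniform(-1, 1) for _ in range(10)]
updates = []
actions = train_and_trade(
    get_stock_data=lambda: prices,
    agent_step=lambda s, h, a: (1 if s > 0 else -1, h),  # trivial stub policy
    update_policy=lambda traj, R: updates.append(R),
)
```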
In the prior art, stock quantitative trading systems that treat quantitative trading as a decision problem and use the trading system to train an intelligent Agent that decides directly suffer from the two defects of characterizing financial signals and executing optimal actions. The algorithm in this scheme, when characterizing stock data, learns more powerful feature representations directly from the stock data instead of relying on predefined hand-crafted features, and the Agent can, to a greater extent, learn a better policy during stock trading and thereby select optimal actions. An LSTM network is a special kind of recurrent neural network that maintains information over time through three gates: an input gate, a forget gate and an output gate. Its architecture of directed cycles creates an internal state of the network, allowing it to process time-based sequence data and remember temporal connections, and therefore to solve the long-term dependency problem. Large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations, and an LSTM-based Agent can mine the time-series patterns in stock data and remember historical behavior.
Preferably, the stock data includes the daily opening price, closing price, highest price, lowest price, and historical volume.
Further, the policy offers three options: a long position, a neutral position and a short position, letting
a_t ∈ {long, neutral, short} = {1, 0, -1}   (5);
Step S3 specifically includes the following steps:
S31. The Agent obtains the reward value of the current decision a_t under the policy π_θ; the reward value r_t at time t is given by formula (2),
in which p_t^open and p_t^close respectively represent the opening price and closing price at time t; Δn is the change in trading volume; n is the number of shares the Agent currently holds; m is the current assets and I is the initial wealth; and c_t denotes the transaction cost generated by trading at time t;
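The body of formula (2) does not survive in this text. As a hedged illustration only, a per-step reward consistent with the variables it lists (position, opening/closing prices, share count, transaction cost c_t, initial wealth I) might look like the following sketch; the exact functional form is an assumption, not the patent's formula.

```python
def step_reward(position, p_open, p_close, n_shares, cost_rate, initial_wealth):
    """Illustrative per-step reward consistent with the variables named
    around formula (2): profit from holding n_shares at `position`
    (1 long, 0 neutral, -1 short) over the interval, minus a transaction
    cost c_t, normalized by the initial wealth I.  The patent's exact
    formula is not reproduced in the source; this form is an assumption."""
    pnl = position * n_shares * (p_close - p_open)
    c_t = cost_rate * n_shares * p_open if position != 0 else 0.0
    return (pnl - c_t) / initial_wealth

r_long = step_reward(1, 10.0, 10.5, 100, 0.001, 10000.0)    # profits on a rise
r_short = step_reward(-1, 10.0, 10.5, 100, 0.001, 10000.0)  # loses on a rise
r_flat = step_reward(0, 10.0, 10.5, 100, 0.001, 10000.0)    # no exposure
```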
S32. Compute the cumulative reward. Let τ be the trajectory of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...); then the cumulative reward R(τ) of trajectory τ is
R(τ) = Σ_{t=0}^{T-1} r_t   (7)
where T denotes the number of time steps in each episode;
S33. Compute the expected total reward J(θ):
J(θ) = E_{τ~π_θ}[R(τ)]   (8)
In steps S32 and S33, the policy π_θ is a rule: the Agent consults the policy to determine which action to take and execute. θ denotes the parameters of the policy π_θ, J(θ) is defined as the expected total reward, and the trajectory τ ~ π_θ is the sequence of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...);
S34. Our goal is to solve for a reasonable parameter θ that maximizes the expected reward J(θ), i.e. as in formula (9):
max_θ J(θ)   (9)
According to the optimization algorithm, by solving the gradient ∇_θ J(θ) and then updating the parameter θ, a local optimum of the function J(θ) can finally be obtained.
Specifically, step S34 solves the gradient ∇_θ J(θ) and updates the parameter θ as follows:
(a) Solve the policy gradient g:
g = E_τ[ R(τ) Σ_{t=0}^{T-1} ∇_θ log π_θ(a_t | s_t) ]   (2)
(b) Use the policy gradient g to adjust the policy parameter θ and obtain a local optimum:
θ ← θ + αg   (3)
where α is the learning rate.
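The update θ ← θ + αg of formulas (2) and (3) can be made concrete with a minimal REINFORCE sketch: a softmax policy over the three trading actions on a toy one-step episode, updated with the log-derivative gradient. The toy rewards and all names are illustrative assumptions, not the patent's environment.

```python
import math
import random

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce(reward_of_action, alpha=0.1, episodes=2000, seed=0):
    """Minimal REINFORCE sketch of formulas (2)-(3): sample an action
    from a softmax policy pi_theta, score it with the episode reward R,
    and update theta <- theta + alpha * R * grad log pi_theta(a).
    A toy one-step episode, not the patent's trading environment."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]          # one parameter per action
    for _ in range(episodes):
        p = softmax(theta)
        a = rng.choices([0, 1, 2], weights=p)[0]
        R = reward_of_action(a)
        for k in range(3):
            # d log pi(a) / d theta_k = 1{k == a} - p_k
            grad_k = ((1.0 if k == a else 0.0) - p[k]) * R
            theta[k] += alpha * grad_k   # formula (3)
    return softmax(theta)

# in this toy market, action 0 (long) pays best and action 2 (short) loses
probs = reinforce(lambda a: {0: 1.0, 1: 0.2, 2: -0.5}[a])
```

After training, the policy concentrates its probability mass on the best-paying action, which is exactly the "increase the probability of high-reward episodes" idea described later in the embodiment.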
In this scheme, when training the LSTM network, the policy is first represented parametrically by a deep neural network with parameters θ, and the policy-gradient method is used to optimize the policy π_θ. The reason is that this method directly optimizes the policy's expected total reward and searches for the optimal policy directly in policy space in an end-to-end manner, eliminating cumbersome intermediate links. Policy-optimization algorithms in reinforcement learning are simpler than Q-learning: they only need a differentiable objective function of the hidden parameters, and they do not need a series of discrete states to describe different market conditions; instead, the policy can be learned directly from continuous perception data.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The invention builds the trading system's Agent from an LSTM recurrent neural network: large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations, and an LSTM-based Agent can mine the time-series patterns in stock data and remember historical behavior. Therefore, when characterizing stock data, the invention can learn more powerful feature representations directly from the stock data instead of relying on predefined hand-crafted features. Moreover, beyond the current market conditions, the historical behavior and the corresponding positions are modeled explicitly and simultaneously in the policy-learning part, so the Agent can, to a greater extent, learn a better policy during stock trading and thereby select optimal actions.
2. When training the LSTM network, the algorithm of the invention first represents the policy parametrically with a deep neural network with parameters θ and optimizes the policy π_θ with the policy-gradient method, because this directly optimizes the policy's expected total reward and searches for the optimal policy directly in policy space in an end-to-end manner, eliminating cumbersome intermediate links. Policy-optimization algorithms in reinforcement learning are simpler than Q-learning: they only need a differentiable objective function of the hidden parameters, and they do not need a series of discrete states to describe different market conditions; instead, the policy can be learned directly from continuous perception data.
Description of the drawings
The drawings described herein are provided to give a further understanding of the embodiments of the invention and form part of the application; they do not constitute a limitation of the embodiments of the invention. In the drawings:
Fig. 1 shows the internal model structure of the stock market quantitative trading Agent in the embodiment of the invention;
Fig. 2(a)-2(f) show, in order, the raw financial data of the six stocks chosen in the embodiment (stock codes 600547, 600028, 600999, 601988, 002415, 600016);
Fig. 3(a)-3(f) show, for the six chosen stocks (stock codes 600547, 600028, 600999, 601988, 002415, 600016), the cumulative returns predicted by the LSTM-based Agent and by an Agent based on a fully connected neural network;
Fig. 4(a) shows the backtest result for stock 002415;
Fig. 4(b) shows the backtest result for stock 601988.
Specific embodiment
In the prior art, stock quantitative trading systems treat quantitative trading as a decision problem and use the trading system to train an intelligent Agent that makes decisions directly; this approach currently faces the two defects of characterizing financial signals and executing optimal actions, which arise when existing quantitative trading is handled as a decision problem. The purpose of the present invention is to propose an algorithm that can resolve these defects of financial characterization and optimal-action execution: when characterizing stock data, it learns more powerful feature representations directly from the stock data instead of relying on predefined hand-crafted features, and, with the historical behavior and corresponding positions modeled explicitly and simultaneously in the policy-learning part alongside the current market conditions, the Agent can, to a greater extent, learn a better policy during stock trading and thereby select optimal actions.
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the drawings. The exemplary embodiments of the invention and their descriptions are intended only to explain the invention and do not limit it.
In the following description, many specific details are set forth in order to provide a thorough understanding of the invention. However, it will be evident to those of ordinary skill in the art that the invention can be practiced without these specific details. In other instances, well-known structures, circuits, materials or methods are not described in detail so as not to obscure the invention.
As shown in Fig. 1, a stock market quantitative trading system based on deep reinforcement learning includes an Agent constructed from an LSTM network. Its input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1; its output is the decision a_t at the current time, and the reward value the Agent obtains at this point is r_t. The stock market quantitative trading algorithm of the invention, based on deep reinforcement learning, builds the trading system's Agent from an LSTM recurrent neural network. An LSTM network is a special kind of recurrent neural network that maintains information over time through three gates: an input gate, a forget gate and an output gate. Its architecture of directed cycles creates an internal state of the network, allowing it to process time-based sequence data and remember temporal connections, and therefore to solve the long-term dependency problem. Large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations; an LSTM-based Agent can mine the time-series patterns in stock data and remember historical behavior. The LSTM-based Agent consists of an input layer, five LSTM layers (each with 31 hidden units), a hidden layer and a softmax layer.
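Under the standard LSTM parameterization (four gates, each with input weights, recurrent weights and a bias), the size of a stack of five 31-unit LSTM layers feeding a softmax over the three trading actions can be estimated as follows. The input dimension and the treatment of the extra hidden layer are assumptions, since the patent does not state them.

```python
def lstm_layer_params(input_dim, hidden_dim):
    """Parameter count of one standard LSTM layer: 4 gates, each with an
    input weight matrix, a recurrent weight matrix, and a bias vector."""
    return 4 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)

def agent_param_count(state_dim, hidden=31, lstm_layers=5, actions=3):
    """Rough size of the embodiment's Agent: five stacked LSTM layers of
    31 units feeding a softmax over the 3 trading actions.  The state
    dimension and the extra hidden layer are assumptions; the patent
    does not specify them."""
    total = lstm_layer_params(state_dim, hidden)               # first layer
    total += (lstm_layers - 1) * lstm_layer_params(hidden, hidden)
    total += hidden * actions + actions                        # softmax head
    return total

n_params = agent_param_count(state_dim=6)  # 6 daily input features assumed
```

At roughly tens of thousands of parameters, the model is small by deep-learning standards, which is consistent with the embodiment's point that weight sharing across time keeps training tractable.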
Meanwhile a kind of stock market quantization trading algorithms based on deeply study use the policy optimization of intensified learning Method come train transaction Agent.Policy-Gradient is a kind of common policy optimization method, it is total by the expectation of continuous calculative strategy The gradient about policing parameter is awarded to update policing parameter, finally converges on optimal policy.Therefore we are solving LSTM net When network, first choice uses parameter to indicate strategy for the deep neural network of θ to carry out parametrization, and Utilization strategies gradient method is next excellent Change strategy πθ, the reason is that the expectation that it is capable of directly optimisation strategy is always awarded, and in end-to-end mode directly in policy space Middle search optimal policy, eliminates cumbersome intermediate link.Strategy optimization algorithm ratio Q-Learning method in intensified learning It is simpler, there is the differentiable objective function for hiding parameter since it only needs one, also do not need to utilize series of discrete shape State describes different market conditions (in Q-Learning), but can directly from continuous perception data (market characteristics) Middle learning strategy.
The policy-gradient method uses a function approximator to represent and optimize the policy directly, finally obtaining the optimal policy. What it optimizes is the policy's expected total reward max_θ E[R | π_θ],
where R denotes the total reward obtained in one episode,
and T denotes the number of time steps in each episode; "episode" is a generic term in the reinforcement-learning field, and its meaning is not repeated in this embodiment.
The most common idea behind policy gradients is to increase the probability of occurrence of episodes with higher total reward. The specific procedure of the policy-gradient method is as follows. Suppose the trajectory of states, actions and rewards of a complete episode is τ = (s_0, a_0, r_0, s_1, a_1, r_1, ..., s_{T-1}, a_{T-1}, r_{T-1}, s_T); then the policy gradient is expressed in the form:
g = E_τ[ R(τ) Σ_{t=0}^{T-1} ∇_θ log π_θ(a_t | s_t) ]   (2)
The policy parameter θ is adjusted using this gradient:
θ ← θ + αg   (3)
where α is the learning rate, which controls the rate at which the policy parameters are updated. In formula (2), the gradient term ∇_θ log π_θ(a_t | s_t) indicates the direction that increases the probability of the trajectory occurring; after multiplying by the scoring function R, it "pulls" the probability density harder toward the trajectories τ with higher total reward within an episode. If many trajectories with different total rewards have been collected, the above training process moves the probability density toward trajectory directions with higher total reward, maximizing the probability that high-reward trajectories τ occur.
In the above, s_i, a_i and r_i (i = 0, 1, ..., T-1) denote the state, action and reward at time i, respectively.
In some cases, however, the total reward R of every episode is non-negative, so every gradient g is also greater than or equal to 0. Every trajectory τ encountered during training then "pulls" the probability density in the positive direction, which greatly slows the pace of learning and makes the variance of the gradient g very large. A normalization of R can therefore be used to reduce the variance of g: this trick lets the algorithm raise the occurrence probability of trajectories τ with larger total reward R while lowering the occurrence probability of trajectories τ with smaller total reward. Following this idea, the policy gradient is unified into the form:
g = E_τ[ (R(τ) - b) Σ_{t=0}^{T-1} ∇_θ log π_θ(a_t | s_t) ]   (4)
where b is a baseline related to the current trajectory τ, usually set to an estimate of the expectation of R, with the aim of reducing the variance of R. As can be seen, the more R exceeds the baseline b, the larger the probability that the corresponding trajectory τ is selected. Therefore, in DRL tasks with large-scale states, the policy can be represented parametrically by a deep neural network, and the optimal policy solved with the traditional policy-gradient method. In short, the policy-optimization method has two advantages, a flexible optimization objective and continuously described market conditions; the present invention therefore selects a policy-optimization algorithm as the trading framework.
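The baseline subtraction of formula (4) can be shown in miniature: with the batch-mean return as the estimate b of E[R], trajectories above the baseline get a positive learning signal and those below a negative one, even when every raw R is positive. A minimal sketch:

```python
def baseline_advantages(returns):
    """Formula (4) in miniature: subtract a baseline b (here the mean
    return, a common estimate of E[R]) from each trajectory's total
    reward R, so trajectories above the baseline are reinforced and
    those below are suppressed, even when every raw R is positive."""
    b = sum(returns) / len(returns)
    return [R - b for R in returns]

# three all-positive episode returns: plain REINFORCE would push every
# trajectory's probability up; the baseline restores a signed signal
advantages = baseline_advantages([3.0, 1.0, 2.0])
```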
According to the trading process, the Agent has two functional parts: one is a memory part that predicts future states from historical information, and the other is a decision part that determines which action to take. Accordingly, the Agent includes a memory module for predicting future states from historical information and a decision module for deciding which action to take; the memory module receives the input hidden state h_{t-1}, and the decision module outputs the decision a_t at the current time according to the input current state s_t, the decision a_{t-1} of the previous time t-1, and the hidden state h_{t-1}. In this embodiment, a single LSTM network is enough for the Agent to realize both functions, for the following reasons:
(1) The Agent's input includes not only the market observation of each period but also comprehensive information that changes over time. An LSTM deep neural network can memorize the interconnections within time-series data.
(2) The search space is greatly reduced. If the search space is too large, a multi-layer dense network may be untrainable because of it; the structure and parameters of an LSTM network can be shared across time, which greatly reduces the training difficulty.
In this embodiment, deep reinforcement learning (DRL) is used to design the stock market trading system that solves the quantitative trading problem. DRL supports end-to-end learning from perception to action; since deep neural networks have a strong capacity for expression, a DRL-based Agent can learn high-dimensional feature representations from high-dimensional data and solve more complex decision problems. In Fig. 1, s_t represents the market situation; h_t represents the hidden state of the LSTM network; a_t is the action taken by the Agent at time t; s_t, together with the associated h_t and a_t, is fed into the LSTM network. The Agent's input includes the current market state s_t and the hidden state h_{t-1} of the LSTM network, which can memorize the temporal associations among historical data; in addition, the previous decision is also fed into the LSTM as an extra input. The Agent outputs the decision at the current time. The Agent works as follows: at each moment it obtains the current market state and integrates the historical data to predict the future state; synthesizing all this information, it makes a trading decision in real time, the decision is used to update the Agent's internal parameters, and a direct return is obtained from the decision. After many iterations, the Agent learns how to improve its decisions; once the prescribed number of training iterations is finally reached, the Agent trades stocks according to the latest policy.
A stock market quantitative trading algorithm based on deep reinforcement learning trains an intelligent Agent through the quantitative trading system to make decisions directly. The Agent is constructed from an LSTM recurrent neural network, and the algorithm includes the following steps:
S1. Obtain stock data;
S2. The Agent obtains the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and outputs the decision a_t at the current time. The hidden state h_{t-1} passed to the next time step carries an anticipation of the next state derived from the pattern of change in the historical stock data, so the stock data becomes time-series data with established temporal connections, which solves the long-term dependency problem;
S3. At this point, the states, actions and rewards known to the Agent form a sequence, so the Agent computes, from this sequence, the cumulative reward and expected reward of the current decision a_t under the policy π_θ;
S4. The Agent optimizes the policy π_θ with a policy-gradient algorithm so that the cumulative reward obtained in step S3 is maximized;
S5. Judge whether the prescribed number of training iterations has been reached; if so, jump to step S6, otherwise jump to step S2;
S6. The Agent trades stocks according to the latest policy.
In the present embodiment the Agent's state includes the traded price series, but because of the randomness of the market and the heavy noise in raw price series, we additionally include dynamic financial technical indicators that summarise market trend or momentum. The stock data therefore comprise the daily opening price, closing price, highest price, lowest price, dynamic financial technical indicators and historical trading volume.
The strategy comprises three options: a long position (long), a neutral position (neutral) and a short position (short), encoded as
a_t ∈ {long, neutral, short} = {1, 0, −1}  (5);
The strategy assumes that the outcome of even a neutral position affects the trader's income. The Agent's goal is to maximise the final income.
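The three options and their numeric coding of formula (5) map directly to a signed per-interval profit; the helper below is an illustrative assumption about how that coding is applied, not the patented reward:

```python
ACTIONS = {"long": 1, "neutral": 0, "short": -1}

def position_pnl(action, open_price, close_price):
    """Signed profit over one interval: a long position earns the rise,
    a short position earns the fall, and a neutral position contributes
    zero (it still matters because it forgoes, or avoids, the move)."""
    return ACTIONS[action] * (close_price - open_price)

assert position_pnl("long", 10.0, 10.5) == 0.5
assert position_pnl("short", 10.0, 10.5) == -0.5
assert position_pnl("neutral", 10.0, 10.5) == 0.0
```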
Step S3 specifically comprises the following steps:
S31. The Agent obtains the reward corresponding to the current decision a_t under policy π_θ; the reward r_t at time t is given by formula (6). In formula (6), the two price symbols denote the opening price and closing price at time t respectively; Δn is the change in trading volume; n is the number of shares the Agent currently holds; m is the existing assets and I the initial wealth; c_t denotes the transaction cost incurred by trading at time t. In the stock-trading system the Agent trades on the dealer's behalf at intervals of length t, i.e. a trade is placed at the end of each interval, and the reward generated accordingly is used to assess the income.
S32. Compute the cumulative reward. Let τ be the trajectory of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...). Once the reward r_t of step S31 has been evaluated, the trajectory is fully known; the cumulative reward R(τ) of trajectory τ is then

R(τ) = Σ_{t=0}^{T} r_t  (7)

where T in formula (7) denotes the number of time steps in each episode;
S33. Compute the expected total reward J(θ):

J(θ) = E_{τ∼π_θ}[R(τ)]

In steps S32 and S33 the policy π is a rule: the Agent consults the policy to decide which action to take and execute. θ denotes the parameters of the policy π_θ, and J(θ) is the expected total reward so defined. A trajectory τ ∼ π_θ is the sequence of states, actions and rewards obtained under policy π_θ, τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...), and its return is the cumulative reward R(τ). Our goal is to find a suitable parameter θ that maximises the expected reward J(θ), i.e. to obtain max_θ J(θ).
According to the optimisation algorithm, by solving the policy gradient of J(θ) and then updating the parameter θ, a local optimum of the function J(θ) can finally be obtained. Specifically:
The policy gradient of J(θ) can be written, via the standard likelihood-ratio (REINFORCE) form, as

g = ∇_θ J(θ) = E_{τ∼π_θ}[ Σ_t ∇_θ log π_θ(a_t | s_t) · R(τ) ]

and the policy parameter θ is adjusted with this policy gradient g:

θ ← θ + αg  (3).
Formula (3) multiplies the policy gradient g by the learning rate α, adds the product to the current policy parameter, and assigns the result back to the policy parameter; the strategy so obtained becomes the new strategy. This is how the policy parameter θ is adjusted.
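The update θ ← θ + αg can be demonstrated end to end on a toy three-action bandit standing in for {short, neutral, long}; the reward means, the running-average baseline b and all hyperparameters below are illustrative assumptions:

```python
import math, random
random.seed(1)

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [x / s for x in e]

def sample_reward(a):
    # toy bandit: action 1 has the highest expected reward
    return [0.0, 1.0, 0.2][a] + random.gauss(0, 0.1)

theta, alpha, baseline = [0.0, 0.0, 0.0], 0.1, 0.0
for _ in range(2000):
    p = softmax(theta)
    a = random.choices(range(3), weights=p)[0]
    r = sample_reward(a)
    baseline += 0.05 * (r - baseline)          # running-average baseline b
    for k in range(3):
        # REINFORCE: grad of log pi(a) w.r.t. theta_k = one_hot(a)_k - p_k
        g = ((1.0 if k == a else 0.0) - p[k]) * (r - baseline)
        theta[k] += alpha * g                  # theta <- theta + alpha * g, formula (3)

best = max(range(3), key=lambda k: theta[k])
print(best)
```

After enough iterations the parameter of the highest-reward action dominates, which is exactly the behaviour formula (3) is meant to produce.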
The present embodiment combines deep learning with reinforcement learning, for the representation of stock signal features and for online trading. Within this framework an LSTM-based Agent can automatically perceive the dynamics of the stock market, and the LSTM alleviates, to some extent, the difficulty of manually extracting stock features from massive data. In addition, we feed financial technical indicators into the LSTM network to reduce the influence of market noise. A reinforcement-learning algorithm is used to train the Agent to learn trading strategies.
To verify the effectiveness of the trading system and trading algorithm of the present embodiment, the inventors carried out data validation and comparative analysis. Data acquisition: the stock data used were obtained through the Tushare interface. Tushare is a free, open-source Python financial-data interface package that mainly implements the collection of financial data such as stock prices, their cleaning and their storage; it provides financial analysts with fast, tidy and varied data that are convenient to analyse, greatly reducing their data-acquisition workload and letting them focus on the research and realisation of strategies and models. Data-set selection: the trading system captures daily bar data of Chinese stock-market equities, including the daily closing and opening prices and the highest and lowest prices. The present embodiment selected six stocks (codes 600547, 600028, 600999, 601988, 002415 and 600016) over the period 2015-10-25 to 2017-10-25; these series exhibit completely different market trends, shown in turn in Fig. 2(a)-Fig. 2(f). Fig. 2(a)-Fig. 2(f) mainly display raw financial data covering three kinds of trend; d1, d2, d3 and d4 in Fig. 2(a)-Fig. 2(f) denote four different custom combinations of dynamic financial technical indicators, whose contents are listed in Table 1:
Table 1. Feature combinations of financial technical indicators
The brief meanings of the technical indicators listed in Table 1 are explained in Table 2:
Table 2. Meanings of the financial technical indicators
During training of the Agent, the decay rate of the baseline function b is 0.8. The Agent's weights are initialised uniformly in the range [−0.2, 0.2]. The Agent's neural network is trained with the Adam optimiser at a learning rate of 0.001.
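The uniform initialisation described above can be sketched as follows (the matrix shape is an illustrative assumption):

```python
import random
random.seed(0)

def init_weights(shape, low=-0.2, high=0.2):
    """Initialise a weight matrix uniformly in [-0.2, 0.2], as described
    for the Agent's weights."""
    rows, cols = shape
    return [[random.uniform(low, high) for _ in range(cols)] for _ in range(rows)]

W = init_weights((4, 3))
print(all(-0.2 <= w <= 0.2 for row in W for w in row))  # -> True
```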
The inventors first compared the performance of the LSTM-based Agent against an Agent based on a fully connected neural network:
The fully connected network consists of four layers: an input layer; a fully connected layer with 32 nodes; a dropout layer with 64 nodes; and a soft-max layer with 3 output units. The learning rate is set to 0.007 with a decay rate of 0.99, and the Adam optimisation method is used to optimise the loss function.
The same financial data are fed to both Agents; the data concerned include the daily opening price, closing price and historical trading volume. The initial capital is 500,000 yuan (CNY), and the position size of each trade is fixed at 10,000 yuan. After 10,000 training iterations, the final results are used to assess the performance of the models. Fig. 3(a)-(f) shows the different performances of the two Agents during the training stage, comparing the cumulative returns of the LSTM-based Agent and the fully-connected Agent on the different stocks. Table 3 summarises the final profit each Agent can obtain after training.
Table 3. Performance comparison of the fully connected network and the LSTM network
In Table 3 and Fig. 3(a)-(f), FC denotes the fully connected neural network. In Fig. 2(a)-Fig. 2(f) and Fig. 3(a)-(f), the horizontal axis (episode) is the number of training iterations and the vertical axis (portfolio) is the asset holding, in units of 1,000 RMB.
As can be seen from Fig. 3(a)-(f) and Table 3, thanks to the strong representational capacity of deep neural networks, both Agents can show advantages on the stock market. When the raw data fluctuate sharply up and down, the fully connected network performs poorly while the LSTM-based Agent performs better, because the LSTM-based Agent can remember the sequential correlations in the data. The experimental results show that an LSTM neural network is better suited to building a trading Agent.
The inventors also selected two stocks for a backtest of the trading system of the present embodiment, collecting daily data from 25 October 2005 to 25 October 2017, twelve years in total, to evaluate the Agent's online performance retrospectively. The first four years of data were used to set the Agent's parameters; the test period runs from 25 October 2009 to 25 October 2017. For each year from 2009 to 2017, training uses a two-year sliding window that advances through the entire training set one year at a time; within each window, the previously learned parameters guide trading in the following year. Through this sliding-window learning the Agent sees the continuously changing stock market. The backtest results for 2009 to 2017 are shown in Fig. 4(a) and Fig. 4(b): Fig. 4(a) is the backtest result for stock 002415 and Fig. 4(b) that for stock 601988. The results show that the Agent can obtain excess returns in most cases.
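Under one plausible reading of the sliding-window scheme above (two calendar years of training data, advanced one year at a time, each window's parameters trading the following year), the train/test splits can be enumerated as follows; the exact window boundaries are an assumption:

```python
def sliding_windows(first_test_year=2009, last_test_year=2017, train_years=2):
    """Yield (train_start, train_end, test_year) triples: parameters
    learned on the two-year window guide trading in the following year."""
    out = []
    for test_year in range(first_test_year, last_test_year + 1):
        out.append((test_year - train_years, test_year - 1, test_year))
    return out

for w in sliding_windows()[:3]:
    print(w)
# -> (2007, 2008, 2009)
#    (2008, 2009, 2010)
#    (2009, 2010, 2011)
```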
In addition, in technical analysis a technical indicator is a mathematical calculation based on historical price, volume or (in the case of futures contracts) open interest that is intended to predict the direction of a financial market. Many technical indicators are in use, and traders continually develop new ones in pursuit of better results. Because stock prices taken from the market contain heavy noise, the technical indicators used in this embodiment include the opening price, closing price and so on; other embodiments may add further indicators to capture the main trend. For example, the input of the LSTM-based Agent may include not only the daily opening, closing, highest and lowest prices but also trading volume and other combinations of technical indicators.
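Two of the most common trend-capturing indicators, simple and exponential moving averages, can be computed as follows; the patent does not specify which indicators make up d1-d4, so these are generic examples:

```python
def sma(prices, window):
    """Simple moving average: the mean of the last `window` prices."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def ema(prices, window):
    """Exponential moving average with the conventional alpha = 2/(window+1);
    reacts faster to recent prices than the SMA does."""
    alpha = 2.0 / (window + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

prices = [10, 11, 12, 13, 14]
print(sma(prices, 3))  # -> [11.0, 12.0, 13.0]
```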
The specific embodiments described above further elaborate the purpose, technical solution and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the invention and is not intended to limit the scope of protection of the present invention; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A stock-market quantitative trading system based on deep reinforcement learning, the quantitative trading system comprising an Agent, characterised in that the Agent is constructed with an LSTM network; its input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network recording the temporal associations between historical data, and the decision a_{t-1} of the previous time t-1; and it outputs the decision a_t of the current time.
2. The stock-market quantitative trading system based on deep reinforcement learning of claim 1, characterised in that the Agent obtains historical stock data for training; during training it takes the current state s_t at time t, the hidden state h_{t-1} of the LSTM network recording the temporal associations of the historical data and the previous decision a_{t-1}, and outputs the current decision a_t; it computes the cumulative reward and expected return corresponding to the current decision a_t under policy π_θ; it then optimises the policy π_θ with a policy-gradient algorithm so as to maximise the cumulative reward obtained; after each training iteration the policy π_θ is updated, and finally, when the training count is reached, the Agent trades stocks according to the latest strategy.
3. The stock-market quantitative trading system based on deep reinforcement learning of claim 1 or 2, characterised in that the Agent comprises a memory module for predicting the future state from historical information and a decision module for determining which action to take; the memory module receives the input hidden state h_{t-1}, and the decision module outputs the current decision a_t from the input current state s_t, the previous decision a_{t-1} and the hidden state h_{t-1}.
4. The stock-market quantitative trading system based on deep reinforcement learning of claim 1 or 2, characterised by further comprising a training-count control module and a real-trading module; the training-count control module stores a standard training count for the Agent or receives a standard training count input by the user, and records the Agent's training count; when the Agent's training count reaches the standard training count, training ends and the real-trading module trades stocks using the strategy of the latest decision.
5. A stock-market quantitative trading algorithm based on deep reinforcement learning, in which an intelligent Agent trained by a quantitative trading system makes decisions directly, characterised in that the Agent is built from an LSTM recurrent neural network and the algorithm comprises the following steps:
S1. Obtain stock data;
S2. At the current time t, the Agent takes the current state s_t of the stock market, the hidden state h_{t-1} of the LSTM network recording the temporal associations between the historical data and the previous decision a_{t-1}, and outputs the current decision a_t;
S3. The Agent computes the cumulative reward and expected return corresponding to the current decision a_t under policy π_θ;
S4. The Agent optimises the policy π_θ with a policy-gradient algorithm so as to maximise the cumulative reward obtained in step S3;
S5. Check whether the training count has been reached: if so, go to step S6, otherwise return to step S2;
S6. The Agent trades stocks according to the latest strategy.
6. The stock-market quantitative trading algorithm based on deep reinforcement learning of claim 5, characterised in that the stock data include the daily opening price, closing price, highest price, lowest price and historical trading volume.
7. The stock-market quantitative trading algorithm based on deep reinforcement learning of claim 5, characterised in that
the strategy comprises three options: a long position (long), a neutral position (neutral) and a short position (short), encoded as
a_t ∈ {long, neutral, short} = {1, 0, −1}  (5);
Step S3 specifically comprises the following steps:
S31. The Agent obtains the reward corresponding to the current decision a_t under policy π_θ; the reward r_t at time t is given by formula (2), in which the two price symbols denote the opening price and closing price at time t respectively, Δn is the change in trading volume, n is the number of shares the Agent currently holds, m is the existing assets, I is the initial wealth, and c_t denotes the transaction cost incurred by trading at time t;
S32. Compute the cumulative reward. Let τ be the trajectory of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...); the cumulative reward R(τ) of trajectory τ is then

R(τ) = Σ_{t=0}^{T} r_t  (7)

where T in formula (7) denotes the number of time steps in each episode;
S33. Compute the expected total reward J(θ) = E_{τ∼π_θ}[R(τ)]. In steps S32 and S33 the policy π_θ is a rule which the Agent consults to decide which action to take and execute; θ denotes the parameters of the policy π_θ, J(θ) is the expected total reward so defined, and the trajectory τ ∼ π_θ is the sequence of states, actions and rewards τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...);
S34. Solve for a suitable parameter θ that maximises the expected reward J(θ), i.e. maximise J(θ) as in formula (9):
According to the optimisation algorithm, by solving the gradient of J(θ) and then updating the parameter θ, a local optimum of the function J(θ) can finally be obtained.
8. The stock-market quantitative trading algorithm based on deep reinforcement learning of claim 7, characterised in that the specific method by which step S34 solves the gradient of J(θ) and updates the parameter θ is as follows:
(a) solve for the policy gradient g:
(b) adjust the policy parameter θ using the policy gradient g to obtain the local optimum:
θ ← θ + αg  (3); where α is the learning rate.
CN201910650290.7A 2019-07-18 2019-07-18 A stock-market quantitative trading system and algorithm based on deep reinforcement learning Pending CN110335162A (en)


Publications (1)

Publication number: CN110335162A (en); publication date: 2019-10-15

Family

ID=68145668


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061564A (en) * 2019-12-11 2020-04-24 中国建设银行股份有限公司 Server capacity adjusting method and device and electronic equipment
CN111539495A (en) * 2020-07-10 2020-08-14 北京海天瑞声科技股份有限公司 Recognition method based on recognition model, model training method and device
CN112068420A (en) * 2020-07-30 2020-12-11 同济大学 Real-time control method and device for drainage system
CN112101556A (en) * 2020-08-25 2020-12-18 清华大学 Method and device for identifying and removing redundant information in environment observation quantity
CN112884576A (en) * 2021-02-02 2021-06-01 上海卡方信息科技有限公司 Stock trading method based on reinforcement learning
TWI732650B (en) * 2020-08-12 2021-07-01 中國信託商業銀行股份有限公司 Stock prediction method and server end for stock prediction
CN113283986A (en) * 2021-04-28 2021-08-20 南京大学 Algorithm transaction system and training method of algorithm transaction model based on system
CN113807965A (en) * 2021-09-17 2021-12-17 中国银行股份有限公司 Transaction data analysis and prediction method and device
CN114092254A (en) * 2021-11-26 2022-02-25 桂林电子科技大学 Consumption financial transaction method based on artificial intelligence and transaction system thereof



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191015