CN110335162A - Stock market quantitative trading system and algorithm based on deep reinforcement learning - Google Patents

A stock market quantitative trading system and algorithm based on deep reinforcement learning

Info

Publication number
CN110335162A
CN110335162A (application CN201910650290.7A)
Authority
CN
China
Prior art keywords
agent
decision
stock
strategy
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910650290.7A
Other languages
Chinese (zh)
Inventor
吴佳
王晨
孙洪永
熊礼东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201910650290.7A
Publication of CN110335162A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Technology Law (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a stock market quantitative trading system and algorithm based on deep reinforcement learning. The trading system includes an Agent constructed from an LSTM network, whose input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1; its output is the decision a_t at the current time. The invention builds the trading system's Agent from an LSTM recurrent neural network: large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations, and an LSTM-based Agent can mine the time-series patterns in stock data and remember its own historical behavior. The invention thereby addresses the two problems that arise in the prior art when quantitative trading is treated as a decision problem: characterizing financial signals and executing optimal actions.

Description

A stock market quantitative trading system and algorithm based on deep reinforcement learning
Technical field
The present invention relates to the field of stock trading technology, and in particular to a stock market quantitative trading system and algorithm based on deep reinforcement learning.
Background art
As one of the world's main capital markets, China's securities market has gone through more than twenty years of development since the 1980s. In recent years, quantitative trading has gradually entered the public eye. Quantitative trading does not need to be grounded in financial theory; instead, it analyzes financial data with mathematical and artificial-intelligence models, and on the basis of these models it pursues maximum profit through trading instructions issued by computer programs. Compared with traditional trading strategies, quantitative trading has several advantages: a quantitative trading system can track market changes in real time and react to them rapidly, and its strategies are based on mathematical models learned from long histories of data rather than on rules predefined by humans, so it can overcome the emotional factors that influence human traders.
There are many algorithms for quantitative trading, and most of them focus on modeling the dynamics of the financial market and making decisions based on those models. However, because market participants exhibit complex and constantly changing behavior, it is difficult for a simple model to capture all of the important attributes that characterize market conditions, so model-based trading decision processes fail relatively easily. Another line of research instead treats quantitative trading as a decision problem and uses the trading system to train an intelligent Agent that makes decisions directly; at present, this approach faces two defects, in characterizing financial signals and in executing optimal actions.
The first defect stems from the difficulty of representing stock market features. Stock data, which contains much noise, fluctuation and movement, is typically represented as a highly unstable time series. To mitigate data noise and uncertainty, stock features such as moving averages or stochastic indicators are usually extracted by hand to summarize the stock market situation. There is already much work on the generalization of technical-analysis indicators in quantitative trading; however, a well-known disadvantage of technical analysis is precisely its poor generalization ability. For example, a moving-average feature may be sufficient to describe a stock's trend, yet suffer heavy losses in a mean-reverting market.
The second defect arises because executing stock trades is a dynamic behavior; it is systematic work that must take many practical factors into account. Frequently changing one's trading position not only contributes nothing to profit, but also easily causes large losses through transaction costs and slippage.
Summary of the invention
The purpose of the present invention is to provide a stock market quantitative trading system and algorithm based on deep reinforcement learning, which can solve the above problems of characterizing financial signals and executing optimal actions that arise in the prior art when quantitative trading is treated as a decision problem.
The present invention is achieved through the following technical solutions:
A stock market quantitative trading system based on deep reinforcement learning, the trading system including an Agent constructed from an LSTM network, whose input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and whose output is the decision a_t at the current time. The invention builds the trading system's Agent from an LSTM recurrent neural network: large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations, and an LSTM-based Agent can mine the time-series patterns in stock data and remember its historical behavior. Therefore, when characterizing stock data, the invention can learn more powerful feature representations directly from the stock data instead of relying on predefined hand-crafted features. Moreover, beyond the current market conditions, the historical behavior and the corresponding positions are modeled explicitly and simultaneously in the policy-learning part, so the Agent can, to a greater extent, learn a better policy during stock trading and thereby select optimal actions.
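The decision interface described above — current state s_t, previous hidden state h_{t-1} and previous decision a_{t-1} in, decision a_t out — can be sketched as follows. This is a minimal illustration only: a single tanh recurrence stands in for the patent's five-layer LSTM, and all dimensions, weights and names are assumptions, not taken from the patent.

```python
import math
import random

class RecurrentTradingAgent:
    """Minimal sketch of the Agent interface: each step consumes the
    market state s_t, the previous hidden state h_{t-1} and the previous
    decision a_{t-1}, and emits a decision a_t.  A single tanh recurrence
    stands in for the patent's 5-layer LSTM; all dimensions and weights
    here are illustrative assumptions."""

    ACTIONS = (1, 0, -1)  # long, neutral, short

    def __init__(self, state_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        in_dim = state_dim + hidden_dim + 1      # [s_t, h_{t-1}, a_{t-1}]
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(hidden_dim)]    # recurrent "memory" weights
        self.V = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                  for _ in range(len(self.ACTIONS))]  # decision head

    def step(self, s_t, h_prev, a_prev):
        x = list(s_t) + list(h_prev) + [float(a_prev)]
        h_t = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
               for row in self.W]
        scores = [sum(v * hi for v, hi in zip(row, h_t)) for row in self.V]
        a_t = self.ACTIONS[scores.index(max(scores))]  # greedy decision
        return a_t, h_t

agent = RecurrentTradingAgent(state_dim=4, hidden_dim=8)
h, a = [0.0] * 8, 0
for s in ([0.1, 0.2, -0.1, 0.05], [0.0, -0.3, 0.2, 0.1]):
    a, h = agent.step(s, h, a)
```

In this sketch the hidden state carries the "memory module" role and the output head the "decision module" role; a real LSTM cell would additionally maintain a cell state and three gates.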
As a further improvement of the present invention, the Agent obtains historical stock data for training. During training it takes the current state s_t of the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and outputs the decision a_t at the current time; it computes the cumulative reward and expected reward of the current decision a_t under the policy π_θ; it then optimizes the policy π_θ with a policy-gradient algorithm so as to maximize the cumulative reward obtained. The policy π_θ is updated after every training iteration, and once the prescribed number of training iterations is reached, the Agent trades stocks according to the latest policy.
Further, the Agent includes a memory module for predicting future states from historical information and a decision module for deciding which action to take. The memory module receives the input hidden state h_{t-1}, and the decision module outputs the decision a_t at the current time according to the input current state s_t, the decision a_{t-1} of the previous time t-1, and the hidden state h_{t-1}.
Further, the above stock market quantitative trading system based on deep reinforcement learning also includes a training-count control module and a live-trading module. The training-count control module stores a standard number of training iterations for the Agent, or receives a standard number entered by the user, and records the Agent's training count; when the training count reaches the standard number, the Agent ends training, and the live-trading module trades stocks using the policy behind the latest decision.
The present invention also provides a stock market quantitative trading algorithm based on deep reinforcement learning, in which the quantitative trading system trains an intelligent Agent that makes decisions directly. The Agent is constructed from an LSTM recurrent neural network; its input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and its output is the decision a_t at the current time. The algorithm includes the following steps:
S1. Obtain stock data;
S2. The Agent obtains the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and outputs the decision a_t at the current time;
S3. The Agent computes the cumulative reward and expected reward of the current decision a_t under the policy π_θ;
S4. The Agent optimizes the policy π_θ with a policy-gradient algorithm so that the cumulative reward obtained in step S3 is maximized;
S5. Judge whether the prescribed number of training iterations has been reached; if so, jump to step S6, otherwise jump to step S2;
S6. The Agent trades stocks according to the latest policy.
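Steps S1–S6 above can be sketched as a training-then-trading loop. The helper callables and the toy per-step reward below are stand-ins chosen for illustration; they are not the patent's implementation.

```python
def train_and_trade(get_stock_data, agent_step, update_policy, n_train_iters=5):
    """Skeleton of steps S1-S6: repeatedly decide, score, and apply a
    policy-gradient update until the training count is reached, then
    trade with the latest policy.  All callables are stand-ins."""
    data = get_stock_data()                          # S1: obtain stock data
    for _ in range(n_train_iters):                   # S5: training-count check
        h, a_prev, trajectory = None, 0, []
        for s_t in data:
            a_t, h = agent_step(s_t, h, a_prev)      # S2: output decision a_t
            r_t = s_t * a_t                          # toy per-step reward stand-in
            trajectory.append((s_t, a_t, r_t))
            a_prev = a_t
        R = sum(r for _, _, r in trajectory)         # S3: cumulative reward R(tau)
        update_policy(trajectory, R)                 # S4: policy-gradient update
    return [agent_step(s, None, 0)[0] for s in data] # S6: trade with latest policy

import random
rng = random.Random(1)
prices = [rng.uniform(-1, 1) for _ in range(10)]
updates = []
actions = train_and_trade(
    get_stock_data=lambda: prices,
    agent_step=lambda s, h, a: (1 if s > 0 else -1, h),  # trivial stub policy
    update_policy=lambda traj, R: updates.append(R),
)
```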
In the prior art, stock quantitative trading systems that treat quantitative trading as a decision problem and use the trading system to train an intelligent Agent that decides directly suffer from the two defects of characterizing financial signals and executing optimal actions. The algorithm in this scheme, when characterizing stock data, learns more powerful feature representations directly from the stock data instead of relying on predefined hand-crafted features, and the Agent can, to a greater extent, learn a better policy during stock trading and thereby select optimal actions. An LSTM network is a special kind of recurrent neural network that maintains information over time through three gates: an input gate, a forget gate and an output gate. Its architecture of directed cycles creates an internal state of the network, allowing it to process time-based sequence data and remember temporal connections, and therefore to solve the long-term dependency problem. Large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations, and an LSTM-based Agent can mine the time-series patterns in stock data and remember historical behavior.
Preferably, the stock data includes the daily opening price, closing price, highest price, lowest price, and historical volume.
Further, the policy offers three options: a long position, a neutral position and a short position, letting
a_t ∈ {long, neutral, short} = {1, 0, -1}   (5);
Step S3 specifically includes the following steps:
S31. The Agent obtains the reward value of the current decision a_t under the policy π_θ; the reward value r_t at time t is given by formula (2),
in which p_t^open and p_t^close respectively represent the opening price and closing price at time t; Δn is the change in trading volume; n is the number of shares the Agent currently holds; m is the current assets and I is the initial wealth; and c_t denotes the transaction cost generated by trading at time t;
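The body of formula (2) does not survive in this text. As a hedged illustration only, a per-step reward consistent with the variables it lists (position, opening/closing prices, share count, transaction cost c_t, initial wealth I) might look like the following sketch; the exact functional form is an assumption, not the patent's formula.

```python
def step_reward(position, p_open, p_close, n_shares, cost_rate, initial_wealth):
    """Illustrative per-step reward consistent with the variables named
    around formula (2): profit from holding n_shares at `position`
    (1 long, 0 neutral, -1 short) over the interval, minus a transaction
    cost c_t, normalized by the initial wealth I.  The patent's exact
    formula is not reproduced in the source; this form is an assumption."""
    pnl = position * n_shares * (p_close - p_open)
    c_t = cost_rate * n_shares * p_open if position != 0 else 0.0
    return (pnl - c_t) / initial_wealth

r_long = step_reward(1, 10.0, 10.5, 100, 0.001, 10000.0)    # profits on a rise
r_short = step_reward(-1, 10.0, 10.5, 100, 0.001, 10000.0)  # loses on a rise
r_flat = step_reward(0, 10.0, 10.5, 100, 0.001, 10000.0)    # no exposure
```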
S32. Compute the cumulative reward. Let τ be the trajectory of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...); then the cumulative reward R(τ) of trajectory τ is
R(τ) = Σ_{t=0}^{T-1} r_t   (7)
where T denotes the number of time steps in each episode;
S33. Compute the expected total reward J(θ):
J(θ) = E_{τ~π_θ}[R(τ)]   (8)
In steps S32 and S33, the policy π_θ is a rule: the Agent consults the policy to determine which action to take and execute. θ denotes the parameters of the policy π_θ, J(θ) is defined as the expected total reward, and the trajectory τ ~ π_θ is the sequence of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...);
S34. Our goal is to solve for a reasonable parameter θ that maximizes the expected reward J(θ), i.e. as in formula (9):
max_θ J(θ)   (9)
According to the optimization algorithm, by solving the gradient ∇_θ J(θ) and then updating the parameter θ, a local optimum of the function J(θ) can finally be obtained.
Specifically, step S34 solves the gradient ∇_θ J(θ) and updates the parameter θ as follows:
(a) Solve the policy gradient g:
g = E_τ[ R(τ) Σ_{t=0}^{T-1} ∇_θ log π_θ(a_t | s_t) ]   (2)
(b) Use the policy gradient g to adjust the policy parameter θ and obtain a local optimum:
θ ← θ + αg   (3)
where α is the learning rate.
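The update θ ← θ + αg of formulas (2) and (3) can be made concrete with a minimal REINFORCE sketch: a softmax policy over the three trading actions on a toy one-step episode, updated with the log-derivative gradient. The toy rewards and all names are illustrative assumptions, not the patent's environment.

```python
import math
import random

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce(reward_of_action, alpha=0.1, episodes=2000, seed=0):
    """Minimal REINFORCE sketch of formulas (2)-(3): sample an action
    from a softmax policy pi_theta, score it with the episode reward R,
    and update theta <- theta + alpha * R * grad log pi_theta(a).
    A toy one-step episode, not the patent's trading environment."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]          # one parameter per action
    for _ in range(episodes):
        p = softmax(theta)
        a = rng.choices([0, 1, 2], weights=p)[0]
        R = reward_of_action(a)
        for k in range(3):
            # d log pi(a) / d theta_k = 1{k == a} - p_k
            grad_k = ((1.0 if k == a else 0.0) - p[k]) * R
            theta[k] += alpha * grad_k   # formula (3)
    return softmax(theta)

# in this toy market, action 0 (long) pays best and action 2 (short) loses
probs = reinforce(lambda a: {0: 1.0, 1: 0.2, 2: -0.5}[a])
```

After training, the policy concentrates its probability mass on the best-paying action, which is exactly the "increase the probability of high-reward episodes" idea described later in the embodiment.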
In this scheme, when training the LSTM network, the policy is first represented parametrically by a deep neural network with parameters θ, and the policy-gradient method is used to optimize the policy π_θ. The reason is that this method directly optimizes the policy's expected total reward and searches for the optimal policy directly in policy space in an end-to-end manner, eliminating cumbersome intermediate links. Policy-optimization algorithms in reinforcement learning are simpler than Q-learning: they only need a differentiable objective function of the hidden parameters, and they do not need a series of discrete states to describe different market conditions; instead, the policy can be learned directly from continuous perception data.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The invention builds the trading system's Agent from an LSTM recurrent neural network: large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations, and an LSTM-based Agent can mine the time-series patterns in stock data and remember historical behavior. Therefore, when characterizing stock data, the invention can learn more powerful feature representations directly from the stock data instead of relying on predefined hand-crafted features. Moreover, beyond the current market conditions, the historical behavior and the corresponding positions are modeled explicitly and simultaneously in the policy-learning part, so the Agent can, to a greater extent, learn a better policy during stock trading and thereby select optimal actions.
2. When training the LSTM network, the algorithm of the invention first represents the policy parametrically with a deep neural network with parameters θ and optimizes the policy π_θ with the policy-gradient method, because this directly optimizes the policy's expected total reward and searches for the optimal policy directly in policy space in an end-to-end manner, eliminating cumbersome intermediate links. Policy-optimization algorithms in reinforcement learning are simpler than Q-learning: they only need a differentiable objective function of the hidden parameters, and they do not need a series of discrete states to describe different market conditions; instead, the policy can be learned directly from continuous perception data.
Description of the drawings
The drawings described herein are provided to give a further understanding of the embodiments of the invention and form part of the application; they do not constitute a limitation of the embodiments of the invention. In the drawings:
Fig. 1 shows the internal model structure of the stock market quantitative trading Agent in the embodiment of the invention;
Fig. 2(a)-2(f) show, in order, the raw financial data of the six stocks chosen in the embodiment (stock codes 600547, 600028, 600999, 601988, 002415, 600016);
Fig. 3(a)-3(f) show, for the six chosen stocks (stock codes 600547, 600028, 600999, 601988, 002415, 600016), the cumulative returns predicted by the LSTM-based Agent and by an Agent based on a fully connected neural network;
Fig. 4(a) shows the backtest result for stock 002415;
Fig. 4(b) shows the backtest result for stock 601988.
Specific embodiment
In the prior art, stock quantitative trading systems treat quantitative trading as a decision problem and use the trading system to train an intelligent Agent that makes decisions directly; this approach currently faces the two defects of characterizing financial signals and executing optimal actions, which arise when existing quantitative trading is handled as a decision problem. The purpose of the present invention is to propose an algorithm that can resolve these defects of financial characterization and optimal-action execution: when characterizing stock data, it learns more powerful feature representations directly from the stock data instead of relying on predefined hand-crafted features, and, with the historical behavior and corresponding positions modeled explicitly and simultaneously in the policy-learning part alongside the current market conditions, the Agent can, to a greater extent, learn a better policy during stock trading and thereby select optimal actions.
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the drawings. The exemplary embodiments of the invention and their descriptions are intended only to explain the invention and do not limit it.
In the following description, many specific details are set forth in order to provide a thorough understanding of the invention. However, it will be evident to those of ordinary skill in the art that the invention can be practiced without these specific details. In other instances, well-known structures, circuits, materials or methods are not described in detail so as not to obscure the invention.
As shown in Fig. 1, a stock market quantitative trading system based on deep reinforcement learning includes an Agent constructed from an LSTM network. Its input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1; its output is the decision a_t at the current time, and the reward value the Agent obtains at this point is r_t. The stock market quantitative trading algorithm of the invention, based on deep reinforcement learning, builds the trading system's Agent from an LSTM recurrent neural network. An LSTM network is a special kind of recurrent neural network that maintains information over time through three gates: an input gate, a forget gate and an output gate. Its architecture of directed cycles creates an internal state of the network, allowing it to process time-based sequence data and remember temporal connections, and therefore to solve the long-term dependency problem. Large LSTM models have a strong capacity to represent features and can learn rich temporal feature representations; an LSTM-based Agent can mine the time-series patterns in stock data and remember historical behavior. The LSTM-based Agent consists of an input layer, five LSTM layers (each with 31 hidden units), a hidden layer and a softmax layer.
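Under the standard LSTM parameterization (four gates, each with input weights, recurrent weights and a bias), the size of a stack of five 31-unit LSTM layers feeding a softmax over the three trading actions can be estimated as follows. The input dimension and the treatment of the extra hidden layer are assumptions, since the patent does not state them.

```python
def lstm_layer_params(input_dim, hidden_dim):
    """Parameter count of one standard LSTM layer: 4 gates, each with an
    input weight matrix, a recurrent weight matrix, and a bias vector."""
    return 4 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)

def agent_param_count(state_dim, hidden=31, lstm_layers=5, actions=3):
    """Rough size of the embodiment's Agent: five stacked LSTM layers of
    31 units feeding a softmax over the 3 trading actions.  The state
    dimension and the extra hidden layer are assumptions; the patent
    does not specify them."""
    total = lstm_layer_params(state_dim, hidden)               # first layer
    total += (lstm_layers - 1) * lstm_layer_params(hidden, hidden)
    total += hidden * actions + actions                        # softmax head
    return total

n_params = agent_param_count(state_dim=6)  # 6 daily input features assumed
```

At roughly tens of thousands of parameters, the model is small by deep-learning standards, which is consistent with the embodiment's point that weight sharing across time keeps training tractable.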
Meanwhile a kind of stock market quantization trading algorithms based on deeply study use the policy optimization of intensified learning Method come train transaction Agent.Policy-Gradient is a kind of common policy optimization method, it is total by the expectation of continuous calculative strategy The gradient about policing parameter is awarded to update policing parameter, finally converges on optimal policy.Therefore we are solving LSTM net When network, first choice uses parameter to indicate strategy for the deep neural network of θ to carry out parametrization, and Utilization strategies gradient method is next excellent Change strategy πθ, the reason is that the expectation that it is capable of directly optimisation strategy is always awarded, and in end-to-end mode directly in policy space Middle search optimal policy, eliminates cumbersome intermediate link.Strategy optimization algorithm ratio Q-Learning method in intensified learning It is simpler, there is the differentiable objective function for hiding parameter since it only needs one, also do not need to utilize series of discrete shape State describes different market conditions (in Q-Learning), but can directly from continuous perception data (market characteristics) Middle learning strategy.
The policy-gradient method uses a function approximator to represent and optimize the policy directly, finally obtaining the optimal policy. What it optimizes is the policy's expected total reward max_θ E[R | π_θ],
where R denotes the total reward obtained in one episode,
and T denotes the number of time steps in each episode; "episode" is a generic term in the reinforcement-learning field, and its meaning is not repeated in this embodiment.
The most common idea behind policy gradients is to increase the probability of occurrence of episodes with higher total reward. The specific procedure of the policy-gradient method is as follows. Suppose the trajectory of states, actions and rewards of a complete episode is τ = (s_0, a_0, r_0, s_1, a_1, r_1, ..., s_{T-1}, a_{T-1}, r_{T-1}, s_T); then the policy gradient is expressed in the form:
g = E_τ[ R(τ) Σ_{t=0}^{T-1} ∇_θ log π_θ(a_t | s_t) ]   (2)
The policy parameter θ is adjusted using this gradient:
θ ← θ + αg   (3)
where α is the learning rate, which controls the rate at which the policy parameters are updated. In formula (2), the gradient term ∇_θ log π_θ(a_t | s_t) indicates the direction that increases the probability of the trajectory occurring; after multiplying by the scoring function R, it "pulls" the probability density harder toward the trajectories τ with higher total reward within an episode. If many trajectories with different total rewards have been collected, the above training process moves the probability density toward trajectory directions with higher total reward, maximizing the probability that high-reward trajectories τ occur.
In the above, s_i, a_i and r_i (i = 0, 1, ..., T-1) denote the state, action and reward at time i, respectively.
In some cases, however, the total reward R of every episode is non-negative, so every gradient g is also greater than or equal to 0. Every trajectory τ encountered during training then "pulls" the probability density in the positive direction, which greatly slows the pace of learning and makes the variance of the gradient g very large. A normalization of R can therefore be used to reduce the variance of g: this trick lets the algorithm raise the occurrence probability of trajectories τ with larger total reward R while lowering the occurrence probability of trajectories τ with smaller total reward. Following this idea, the policy gradient is unified into the form:
g = E_τ[ (R(τ) - b) Σ_{t=0}^{T-1} ∇_θ log π_θ(a_t | s_t) ]   (4)
where b is a baseline related to the current trajectory τ, usually set to an estimate of the expectation of R, with the aim of reducing the variance of R. As can be seen, the more R exceeds the baseline b, the larger the probability that the corresponding trajectory τ is selected. Therefore, in DRL tasks with large-scale states, the policy can be represented parametrically by a deep neural network, and the optimal policy solved with the traditional policy-gradient method. In short, the policy-optimization method has two advantages, a flexible optimization objective and continuously described market conditions; the present invention therefore selects a policy-optimization algorithm as the trading framework.
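The baseline subtraction of formula (4) can be shown in miniature: with the batch-mean return as the estimate b of E[R], trajectories above the baseline get a positive learning signal and those below a negative one, even when every raw R is positive. A minimal sketch:

```python
def baseline_advantages(returns):
    """Formula (4) in miniature: subtract a baseline b (here the mean
    return, a common estimate of E[R]) from each trajectory's total
    reward R, so trajectories above the baseline are reinforced and
    those below are suppressed, even when every raw R is positive."""
    b = sum(returns) / len(returns)
    return [R - b for R in returns]

# three all-positive episode returns: plain REINFORCE would push every
# trajectory's probability up; the baseline restores a signed signal
advantages = baseline_advantages([3.0, 1.0, 2.0])
```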
According to the trading process, the Agent has two functional parts: one is a memory part that predicts future states from historical information, and the other is a decision part that determines which action to take. Accordingly, the Agent includes a memory module for predicting future states from historical information and a decision module for deciding which action to take; the memory module receives the input hidden state h_{t-1}, and the decision module outputs the decision a_t at the current time according to the input current state s_t, the decision a_{t-1} of the previous time t-1, and the hidden state h_{t-1}. In this embodiment, a single LSTM network is enough for the Agent to realize both functions, for the following reasons:
(1) The Agent's input includes not only the market observation of each period but also comprehensive information that changes over time. An LSTM deep neural network can memorize the interconnections within time-series data.
(2) The search space is greatly reduced. If the search space is too large, a multi-layer dense network may be untrainable because of it; the structure and parameters of an LSTM network can be shared across time, which greatly reduces the training difficulty.
In this embodiment, deep reinforcement learning (DRL) is used to design the stock market trading system that solves the quantitative trading problem. DRL supports end-to-end learning from perception to action; since deep neural networks have a strong capacity for expression, a DRL-based Agent can learn high-dimensional feature representations from high-dimensional data and solve more complex decision problems. In Fig. 1, s_t represents the market situation; h_t represents the hidden state of the LSTM network; a_t is the action taken by the Agent at time t; s_t, together with the associated h_t and a_t, is fed into the LSTM network. The Agent's input includes the current market state s_t and the hidden state h_{t-1} of the LSTM network, which can memorize the temporal associations among historical data; in addition, the previous decision is also fed into the LSTM as an extra input. The Agent outputs the decision at the current time. The Agent works as follows: at each moment it obtains the current market state and integrates the historical data to predict the future state; synthesizing all this information, it makes a trading decision in real time, the decision is used to update the Agent's internal parameters, and a direct return is obtained from the decision. After many iterations, the Agent learns how to improve its decisions; once the prescribed number of training iterations is finally reached, the Agent trades stocks according to the latest policy.
A stock market quantitative trading algorithm based on deep reinforcement learning trains an intelligent Agent through the quantitative trading system to make decisions directly. The Agent is constructed from an LSTM recurrent neural network, and the algorithm includes the following steps:
S1. Obtain stock data;
S2. The Agent obtains the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network, which records the temporal associations among historical data, and the decision a_{t-1} of the previous time t-1, and outputs the decision a_t at the current time. The hidden state h_{t-1} passed to the next time step carries an anticipation of the next state derived from the pattern of change in the historical stock data, so the stock data becomes time-series data with established temporal connections, which solves the long-term dependency problem;
S3. At this point, the states, actions and rewards known to the Agent form a sequence, so the Agent computes, from this sequence, the cumulative reward and expected reward of the current decision a_t under the policy π_θ;
S4. The Agent optimizes the policy π_θ with a policy-gradient algorithm so that the cumulative reward obtained in step S3 is maximized;
S5. Judge whether the prescribed number of training iterations has been reached; if so, jump to step S6, otherwise jump to step S2;
S6. The Agent trades stocks according to the latest policy.
In the present embodiment the Agent's state includes the traded price series, but because of the randomness of the market and the heavy noise in raw price series, we additionally include dynamic financial technical indicators that summarise market trend or momentum. The stock data therefore comprise the daily opening price, closing price, highest price, lowest price, dynamic financial technical indicators and historical trading volume.
The strategy comprises three options: a long position (long), a neutral position (neutral) and a short position (short), encoded as
a_t ∈ {long, neutral, short} = {1, 0, −1}  (5);
The strategy assumes that the outcome of even a neutral position affects the trader's income. The Agent's goal is to maximise the final income.
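The three options and their numeric coding of formula (5) map directly to a signed per-interval profit; the helper below is an illustrative assumption about how that coding is applied, not the patented reward:

```python
ACTIONS = {"long": 1, "neutral": 0, "short": -1}

def position_pnl(action, open_price, close_price):
    """Signed profit over one interval: a long position earns the rise,
    a short position earns the fall, and a neutral position contributes
    zero (it still matters because it forgoes, or avoids, the move)."""
    return ACTIONS[action] * (close_price - open_price)

assert position_pnl("long", 10.0, 10.5) == 0.5
assert position_pnl("short", 10.0, 10.5) == -0.5
assert position_pnl("neutral", 10.0, 10.5) == 0.0
```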
Step S3 specifically comprises the following steps:
S31. The Agent obtains the reward corresponding to the current decision a_t under policy π_θ; the reward r_t at time t is given by formula (6). In formula (6), the two price symbols denote the opening price and closing price at time t respectively; Δn is the change in trading volume; n is the number of shares the Agent currently holds; m is the existing assets and I the initial wealth; c_t denotes the transaction cost incurred by trading at time t. In the stock-trading system the Agent trades on the dealer's behalf at intervals of length t, i.e. a trade is placed at the end of each interval, and the reward generated accordingly is used to assess the income.
S32. Compute the cumulative reward. Let τ be the trajectory of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...). Once the reward r_t of step S31 has been evaluated, the trajectory is fully known; the cumulative reward R(τ) of trajectory τ is then

R(τ) = Σ_{t=0}^{T} r_t  (7)

where T in formula (7) denotes the number of time steps in each episode;
S33. Compute the expected total reward J(θ):

J(θ) = E_{τ∼π_θ}[R(τ)]

In steps S32 and S33 the policy π is a rule: the Agent consults the policy to decide which action to take and execute. θ denotes the parameters of the policy π_θ, and J(θ) is the expected total reward so defined. A trajectory τ ∼ π_θ is the sequence of states, actions and rewards obtained under policy π_θ, τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...), and its return is the cumulative reward R(τ). Our goal is to find a suitable parameter θ that maximises the expected reward J(θ), i.e. to obtain max_θ J(θ).
According to the optimisation algorithm, by solving the policy gradient of J(θ) and then updating the parameter θ, a local optimum of the function J(θ) can finally be obtained. Specifically:
The policy gradient of J(θ) can be written, via the standard likelihood-ratio (REINFORCE) form, as

g = ∇_θ J(θ) = E_{τ∼π_θ}[ Σ_t ∇_θ log π_θ(a_t | s_t) · R(τ) ]

and the policy parameter θ is adjusted with this policy gradient g:

θ ← θ + αg  (3).
Formula (3) multiplies the policy gradient g by the learning rate α, adds the product to the current policy parameter, and assigns the result back to the policy parameter; the strategy so obtained becomes the new strategy. This is how the policy parameter θ is adjusted.
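The update θ ← θ + αg can be demonstrated end to end on a toy three-action bandit standing in for {short, neutral, long}; the reward means, the running-average baseline b and all hyperparameters below are illustrative assumptions:

```python
import math, random
random.seed(1)

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [x / s for x in e]

def sample_reward(a):
    # toy bandit: action 1 has the highest expected reward
    return [0.0, 1.0, 0.2][a] + random.gauss(0, 0.1)

theta, alpha, baseline = [0.0, 0.0, 0.0], 0.1, 0.0
for _ in range(2000):
    p = softmax(theta)
    a = random.choices(range(3), weights=p)[0]
    r = sample_reward(a)
    baseline += 0.05 * (r - baseline)          # running-average baseline b
    for k in range(3):
        # REINFORCE: grad of log pi(a) w.r.t. theta_k = one_hot(a)_k - p_k
        g = ((1.0 if k == a else 0.0) - p[k]) * (r - baseline)
        theta[k] += alpha * g                  # theta <- theta + alpha * g, formula (3)

best = max(range(3), key=lambda k: theta[k])
print(best)
```

After enough iterations the parameter of the highest-reward action dominates, which is exactly the behaviour formula (3) is meant to produce.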
The present embodiment combines deep learning with reinforcement learning, for the representation of stock signal features and for online trading. Within this framework an LSTM-based Agent can automatically perceive the dynamics of the stock market, and the LSTM alleviates, to some extent, the difficulty of manually extracting stock features from massive data. In addition, we feed financial technical indicators into the LSTM network to reduce the influence of market noise. A reinforcement-learning algorithm is used to train the Agent to learn trading strategies.
To verify the effectiveness of the trading system and trading algorithm of the present embodiment, the inventors carried out data validation and comparative analysis. Data acquisition: the stock data used were obtained through the Tushare interface. Tushare is a free, open-source Python financial-data interface package that mainly implements the collection of financial data such as stock prices, their cleaning and their storage; it provides financial analysts with fast, tidy and varied data that are convenient to analyse, greatly reducing their data-acquisition workload and letting them focus on the research and realisation of strategies and models. Data-set selection: the trading system captures daily bar data of Chinese stock-market equities, including the daily closing and opening prices and the highest and lowest prices. The present embodiment selected six stocks (codes 600547, 600028, 600999, 601988, 002415 and 600016) over the period 2015-10-25 to 2017-10-25; these series exhibit completely different market trends, shown in turn in Fig. 2(a)-Fig. 2(f). Fig. 2(a)-Fig. 2(f) mainly display raw financial data covering three kinds of trend; d1, d2, d3 and d4 in Fig. 2(a)-Fig. 2(f) denote four different custom combinations of dynamic financial technical indicators, whose contents are listed in Table 1:
Table 1. Feature combinations of financial technical indicators
The brief meanings of the technical indicators listed in Table 1 are explained in Table 2:
Table 2. Meanings of the financial technical indicators
During training of the Agent, the decay rate of the baseline function b is 0.8. The Agent's weights are initialised uniformly in the range [−0.2, 0.2]. The Agent's neural network is trained with the Adam optimiser at a learning rate of 0.001.
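The uniform initialisation described above can be sketched as follows (the matrix shape is an illustrative assumption):

```python
import random
random.seed(0)

def init_weights(shape, low=-0.2, high=0.2):
    """Initialise a weight matrix uniformly in [-0.2, 0.2], as described
    for the Agent's weights."""
    rows, cols = shape
    return [[random.uniform(low, high) for _ in range(cols)] for _ in range(rows)]

W = init_weights((4, 3))
print(all(-0.2 <= w <= 0.2 for row in W for w in row))  # -> True
```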
The inventors first compared the performance of the LSTM-based Agent against an Agent based on a fully connected neural network:
The fully connected network consists of four layers: an input layer; a fully connected layer with 32 nodes; a dropout layer with 64 nodes; and a soft-max layer with 3 output units. The learning rate is set to 0.007 with a decay rate of 0.99, and the Adam optimisation method is used to optimise the loss function.
The same financial data are fed to both Agents; the data concerned include the daily opening price, closing price and historical trading volume. The initial capital is 500,000 yuan (CNY), and the position size of each trade is fixed at 10,000 yuan. After 10,000 training iterations, the final results are used to assess the performance of the models. Fig. 3(a)-(f) shows the different performances of the two Agents during the training stage, comparing the cumulative returns of the LSTM-based Agent and the fully-connected Agent on the different stocks. Table 3 summarises the final profit each Agent can obtain after training.
Table 3. Performance comparison of the fully connected network and the LSTM network
In Table 3 and Fig. 3(a)-(f), FC denotes the fully connected neural network. In Fig. 2(a)-Fig. 2(f) and Fig. 3(a)-(f), the horizontal axis (episode) is the number of training iterations and the vertical axis (portfolio) is the asset holding, in units of 1,000 RMB.
As can be seen from Fig. 3(a)-(f) and Table 3, thanks to the strong representational capacity of deep neural networks, both Agents can show advantages on the stock market. When the raw data fluctuate sharply up and down, the fully connected network performs poorly while the LSTM-based Agent performs better, because the LSTM-based Agent can remember the sequential correlations in the data. The experimental results show that an LSTM neural network is better suited to building a trading Agent.
The inventors also selected two stocks for a backtest of the trading system of the present embodiment, collecting daily data from 25 October 2005 to 25 October 2017, twelve years in total, to evaluate the Agent's online performance retrospectively. The first four years of data were used to set the Agent's parameters; the test period runs from 25 October 2009 to 25 October 2017. For each year from 2009 to 2017, training uses a two-year sliding window that advances through the entire training set one year at a time; within each window, the previously learned parameters guide trading in the following year. Through this sliding-window learning the Agent sees the continuously changing stock market. The backtest results for 2009 to 2017 are shown in Fig. 4(a) and Fig. 4(b): Fig. 4(a) is the backtest result for stock 002415 and Fig. 4(b) that for stock 601988. The results show that the Agent can obtain excess returns in most cases.
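Under one plausible reading of the sliding-window scheme above (two calendar years of training data, advanced one year at a time, each window's parameters trading the following year), the train/test splits can be enumerated as follows; the exact window boundaries are an assumption:

```python
def sliding_windows(first_test_year=2009, last_test_year=2017, train_years=2):
    """Yield (train_start, train_end, test_year) triples: parameters
    learned on the two-year window guide trading in the following year."""
    out = []
    for test_year in range(first_test_year, last_test_year + 1):
        out.append((test_year - train_years, test_year - 1, test_year))
    return out

for w in sliding_windows()[:3]:
    print(w)
# -> (2007, 2008, 2009)
#    (2008, 2009, 2010)
#    (2009, 2010, 2011)
```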
In addition, in technical analysis a technical indicator is a mathematical calculation based on historical price, volume or (in the case of futures contracts) open interest that is intended to predict the direction of a financial market. Many technical indicators are in use, and traders continually develop new ones in pursuit of better results. Because stock prices taken from the market contain heavy noise, the technical indicators used in this embodiment include the opening price, closing price and so on; other embodiments may add further indicators to capture the main trend. For example, the input of the LSTM-based Agent may include not only the daily opening, closing, highest and lowest prices but also trading volume and other combinations of technical indicators.
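Two of the most common trend-capturing indicators, simple and exponential moving averages, can be computed as follows; the patent does not specify which indicators make up d1-d4, so these are generic examples:

```python
def sma(prices, window):
    """Simple moving average: the mean of the last `window` prices."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def ema(prices, window):
    """Exponential moving average with the conventional alpha = 2/(window+1);
    reacts faster to recent prices than the SMA does."""
    alpha = 2.0 / (window + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

prices = [10, 11, 12, 13, 14]
print(sma(prices, 3))  # -> [11.0, 12.0, 13.0]
```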
The specific embodiments described above further elaborate the purpose, technical solution and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the invention and is not intended to limit the scope of protection of the present invention; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A stock-market quantitative trading system based on deep reinforcement learning, the quantitative trading system comprising an Agent, characterised in that the Agent is constructed with an LSTM network; its input comprises the current state s_t of the stock market at the current time t, the hidden state h_{t-1} of the LSTM network recording the temporal associations between historical data, and the decision a_{t-1} of the previous time t-1; and it outputs the decision a_t of the current time.
2. The stock-market quantitative trading system based on deep reinforcement learning of claim 1, characterised in that the Agent obtains historical stock data for training; during training it takes the current state s_t at time t, the hidden state h_{t-1} of the LSTM network recording the temporal associations of the historical data and the previous decision a_{t-1}, and outputs the current decision a_t; it computes the cumulative reward and expected return corresponding to the current decision a_t under policy π_θ; it then optimises the policy π_θ with a policy-gradient algorithm so as to maximise the cumulative reward obtained; after each training iteration the policy π_θ is updated, and finally, when the training count is reached, the Agent trades stocks according to the latest strategy.
3. The stock-market quantitative trading system based on deep reinforcement learning of claim 1 or 2, characterised in that the Agent comprises a memory module for predicting the future state from historical information and a decision module for determining which action to take; the memory module receives the input hidden state h_{t-1}, and the decision module outputs the current decision a_t from the input current state s_t, the previous decision a_{t-1} and the hidden state h_{t-1}.
4. The stock-market quantitative trading system based on deep reinforcement learning of claim 1 or 2, characterised by further comprising a training-count control module and a real-trading module; the training-count control module stores a standard training count for the Agent or receives a standard training count input by the user, and records the Agent's training count; when the Agent's training count reaches the standard training count, training ends and the real-trading module trades stocks using the strategy of the latest decision.
5. A stock-market quantitative trading algorithm based on deep reinforcement learning, in which an intelligent Agent trained by a quantitative trading system makes decisions directly, characterised in that the Agent is built from an LSTM recurrent neural network and the algorithm comprises the following steps:
S1. Obtain stock data;
S2. At the current time t, the Agent takes the current state s_t of the stock market, the hidden state h_{t-1} of the LSTM network recording the temporal associations between the historical data and the previous decision a_{t-1}, and outputs the current decision a_t;
S3. The Agent computes the cumulative reward and expected return corresponding to the current decision a_t under policy π_θ;
S4. The Agent optimises the policy π_θ with a policy-gradient algorithm so as to maximise the cumulative reward obtained in step S3;
S5. Check whether the training count has been reached: if so, go to step S6, otherwise return to step S2;
S6. The Agent trades stocks according to the latest strategy.
6. The stock-market quantitative trading algorithm based on deep reinforcement learning of claim 5, characterised in that the stock data include the daily opening price, closing price, highest price, lowest price and historical trading volume.
7. The stock-market quantitative trading algorithm based on deep reinforcement learning of claim 5, characterised in that
the strategy comprises three options: a long position (long), a neutral position (neutral) and a short position (short), encoded as
a_t ∈ {long, neutral, short} = {1, 0, −1}  (5);
Step S3 specifically comprises the following steps:
S31. The Agent obtains the reward corresponding to the current decision a_t under policy π_θ; the reward r_t at time t is given by formula (2), in which the two price symbols denote the opening price and closing price at time t respectively, Δn is the change in trading volume, n is the number of shares the Agent currently holds, m is the existing assets, I is the initial wealth, and c_t denotes the transaction cost incurred by trading at time t;
S32. Compute the cumulative reward. Let τ be the trajectory of states, actions and rewards: τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...); the cumulative reward R(τ) of trajectory τ is then

R(τ) = Σ_{t=0}^{T} r_t  (7)

where T in formula (7) denotes the number of time steps in each episode;
S33. Compute the expected total reward J(θ) = E_{τ∼π_θ}[R(τ)]. In steps S32 and S33 the policy π_θ is a rule which the Agent consults to decide which action to take and execute; θ denotes the parameters of the policy π_θ, J(θ) is the expected total reward so defined, and the trajectory τ ∼ π_θ is the sequence of states, actions and rewards τ = (s_0, a_0, r_0, s_1, a_1, r_1, ...);
S34. Solve for a suitable parameter θ that maximises the expected reward J(θ), i.e. maximise J(θ) as in formula (9):
According to the optimisation algorithm, by solving the gradient of J(θ) and then updating the parameter θ, a local optimum of the function J(θ) can finally be obtained.
8. The stock-market quantitative trading algorithm based on deep reinforcement learning of claim 7, characterised in that the specific method by which step S34 solves the gradient of J(θ) and updates the parameter θ is as follows:
(a) solve for the policy gradient g:
(b) adjust the policy parameter θ using the policy gradient g to obtain the local optimum:
θ ← θ + αg  (3); where α is the learning rate.
CN201910650290.7A 2019-07-18 2019-07-18 A stock-market quantitative trading system and algorithm based on deep reinforcement learning Pending CN110335162A (en)


Publications (1)

Publication number: CN110335162A (en); publication date: 2019-10-15

Family

ID=68145668


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061564A (en) * 2019-12-11 2020-04-24 中国建设银行股份有限公司 Server capacity adjusting method and device and electronic equipment
CN111539495A (en) * 2020-07-10 2020-08-14 北京海天瑞声科技股份有限公司 Recognition method based on recognition model, model training method and device
CN112068420A (en) * 2020-07-30 2020-12-11 同济大学 Real-time control method and device for drainage system
CN112101556A (en) * 2020-08-25 2020-12-18 清华大学 Method and device for identifying and removing redundant information in environment observation quantity
CN112884576A (en) * 2021-02-02 2021-06-01 上海卡方信息科技有限公司 Stock trading method based on reinforcement learning
TWI732650B (en) * 2020-08-12 2021-07-01 中國信託商業銀行股份有限公司 Stock prediction method and server end for stock prediction
CN113283986A (en) * 2021-04-28 2021-08-20 南京大学 Algorithm transaction system and training method of algorithm transaction model based on system
CN113807965A (en) * 2021-09-17 2021-12-17 中国银行股份有限公司 Transaction data analysis and prediction method and device
CN114092254A (en) * 2021-11-26 2022-02-25 桂林电子科技大学 Consumption financial transaction method based on artificial intelligence and transaction system thereof



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191015