CN110223180A

CN110223180A - Portfolio selection method based on a deep attention network and reinforcement learning

Info

Publication number: CN110223180A
Application number: CN201910390018.XA
Authority: CN
Inventors: 王静远; 张阳; 吴俊杰
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2019-09-10

Abstract

The invention discloses a portfolio selection method based on a deep attention network and reinforcement learning. A neural network model incorporating an attention mechanism is introduced into the financial field; with the Sharpe ratio as the reward function, the model is trained in a reinforcement learning framework so that return and risk are balanced when the portfolio is generated. A novel cross-asset attention mechanism is also proposed to model the correlations between different assets, and the interpretability of the model is explored in depth. The method achieves superior performance, generalization and robustness on historical Chinese and US stock market data and in live-trading simulation.

Description

Portfolio selection method based on a deep attention network and reinforcement learning

Technical field

The present invention relates to the technical fields of financial data mining and reinforcement learning, and more particularly to a portfolio selection method based on a deep attention network and reinforcement learning.

Background art

In recent years, the combination of financial innovation and artificial intelligence has been applied successfully in many areas of finance, including quantitative trading. Because they can handle large-scale transactions and provide rational decision support, quantitative trading strategies have long been used by financial institutions and hedge funds, with remarkable success.

Traditional quantitative trading strategies are usually based on specific financial logic. One view holds that assets that performed well historically will continue to perform well in the next period, a phenomenon known as the momentum effect; the opposing mean-reversion view holds that asset prices tend toward their mean over time. Although these traditional strategies have a solid financial foundation, each of them focuses on only part of the properties of financial markets, so they struggle to perform well in complex markets.

With the recent rise of deep learning and artificial intelligence, many methods that predict stock prices with deep neural networks, as well as methods that generate investment strategies with reinforcement learning, have been proposed. However, research applying deep learning to real financial markets still faces the following challenges: first, balancing return and risk; second, modeling the correlations between assets; third, making the investment strategy interpretable.

Therefore, how to provide a portfolio selection method that generalizes well and suits different complex market scenarios is a problem that those skilled in the art urgently need to solve.

Summary of the invention

In view of this, the present invention provides a portfolio selection method based on a deep attention network and reinforcement learning. A neural network model incorporating an attention mechanism is introduced into the financial field; with the Sharpe ratio as the reward function, the model is trained in a reinforcement learning framework so that return and risk are balanced when the portfolio is generated. A novel cross-asset attention mechanism is also proposed to model the correlations between different assets, and the interpretability of the model is explored in depth. The method achieves superior performance, generalization and robustness on historical Chinese and US stock market data and in live-trading simulation.

To achieve the above object, the present invention adopts the following technical solution:

A portfolio selection method based on a deep attention network and reinforcement learning, comprising:

S1: based on the classical buy-winners-and-sell-losers (BWSL) strategy of the financial field and the stock feature vectors of the most recent K historical moments, performing preliminary representation extraction with a long short-term memory (LSTM) network equipped with a history state attention mechanism, to obtain a stock representation r;

S2: modeling the interrelationships between stocks based on the stock representation r and a cross-asset attention network, thereby selecting winners and losers and obtaining a winner score for each stock;

S3: generating a portfolio according to the winner scores of all current stocks;

S4: based on the portfolio, optimizing the model parameters with the Sharpe ratio as the reward function of reinforcement learning.

Preferably, step S1 specifically comprises:

modeling sequential dependence with the LSTM: $h_k = \mathrm{LSTM}(h_{k-1}, x_k)$, $k \in [1, K]$, where $h_k$ is the hidden state encoded by the LSTM at step $k$; the hidden state $h_K$ obtained at the last step serves as the representation of the stock and contains all the sequential dependence in $X$; $X$ is the feature sequence formed by the most recent K historical moments,

$X = \{x_1, \dots, x_k, \dots, x_K\}$, where $x_k$ is the stock feature vector at the $k$-th of these moments;

introducing an attention mechanism over the history states to model global and long-term dependence, which yields a new stock representation $r$, calculated as:

$$r = \sum_{k=1}^{K} \mathrm{ATT}(h_K, h_k)\, h_k$$

where $\mathrm{ATT}(\cdot)$ is the attention function, defined as:

$$\mathrm{ATT}(h_K, h_k) = \frac{\exp(\alpha_k)}{\sum_{k'=1}^{K} \exp(\alpha_{k'})}, \qquad \alpha_k = w^{T} \tanh\left(W^{(1)} h_k + W^{(2)} h_K\right)$$

$w$, $W^{(1)}$ and $W^{(2)}$ are the corresponding coefficient matrices; they are randomly initialized and continuously optimized during subsequent training, with the Sharpe ratio as the reward function, until they reach their final values.

Preferably, step S2 specifically comprises:

given the representation $r^{(i)}$ of any stock $i$, mapping it through parameter matrices to a query vector $q^{(i)}$, a key vector $k^{(i)}$ and a value vector $v^{(i)}$:

$$q^{(i)} = W^{(Q)} r^{(i)}, \quad k^{(i)} = W^{(K)} r^{(i)}, \quad v^{(i)} = W^{(V)} r^{(i)}$$

modeling the interrelationship between stock $j$ and stock $i$ by using the query vector of stock $i$ to query the key vector of stock $j$, i.e.

$$\beta_{ij} = \frac{q^{(i)T} k^{(j)}}{\sqrt{D_k}}$$

where $D_k$ is the scaling coefficient;

using the normalized interrelationships as weights, computing a weighted sum of the value vectors of all $I$ current stocks to obtain a new stock representation $a^{(i)}$;

finally, mapping the representation vector $a^{(i)}$ to a winner score with a fully connected layer: $s^{(i)} = \mathrm{sigmoid}\left(w^{(s)T} a^{(i)} + e^{(s)}\right)$

where $w^{(s)}$ is the parameter matrix of the fully connected layer and $e^{(s)}$ is its bias.

Preferably, step S3 specifically comprises:

sorting the winner scores of all stocks in descending order and obtaining the rank $o^{(i)}$ of each stock $i$ after sorting, where a preset parameter $G$ determines the number of stocks to buy long and to sell short, respectively;

for any stock $i$, if $o^{(i)} \in [1, G]$, buying stock $i$ long, with the long investment proportion calculated as:

$$b^{+(i)} = \frac{\exp\left(s^{(i)}\right)}{\sum_{o^{(i')} \in [1, G]} \exp\left(s^{(i')}\right)}$$

similarly, if $o^{(i)} \in (I-G, I]$, selling stock $i$ short, with the short investment proportion calculated as:

$$b^{-(i)} = \frac{\exp\left(1 - s^{(i)}\right)}{\sum_{o^{(i')} \in (I-G, I]} \exp\left(1 - s^{(i')}\right)}$$

the remaining stocks, selected as neither winners nor losers, receive no investment because they lack a clear long or short signal.

Preferably, step S4 specifically comprises:

each time, sampling from the training data set a sequential investment with $T$ holding periods, feeding in the history feature vectors at each moment, and obtaining from the model the portfolio sequences $B^{+} = \{b^{+}_1, \dots, b^{+}_T\}$ and $B^{-} = \{b^{-}_1, \dots, b^{-}_T\}$ over all moments;

then computing the Sharpe ratio $H_T$ of the whole investment sequence, using it as the reward function of reinforcement learning, and optimizing the model parameters by gradient ascent, i.e. $\arg\max H_T(B^{+}, B^{-})$;

where the Sharpe ratio $H_T$ is computed from $A_T$ and $V_T$; $A_T$ is the average return of the investment sequence per holding period, calculated as $A_T = \frac{1}{T}\sum_{t=1}^{T}(R_t - TC_t)$, with $TC_t$ the actual transaction fee; $V_T$ is the volatility of the investment sequence, measured by the standard deviation of the returns and calculated as $V_T = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(R_t - \bar{R})^2}$, with $\bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t$.

Preferably, after step S4, the method further comprises: interpreting the model by a significance analysis method.

Preferably, the specific step of interpreting the model by the significance analysis method comprises:

taking the partial derivatives of the winner score with respect to each input feature, and analyzing the influence of each feature on the model's decision according to the magnitude and sign of the partial derivatives.

As can be seen from the above technical solution, compared with the prior art, the present disclosure provides a portfolio selection method based on a deep attention network and reinforcement learning. A neural network model incorporating an attention mechanism is introduced into the financial field; with the Sharpe ratio as the reward function, the model is trained in a reinforcement learning framework so that return and risk are balanced when the portfolio is generated. A portfolio generation model based on a cross-asset attention mechanism is also proposed, in which reinforcement learning is used to optimize the model parameters. The cross-asset attention mechanism solves the problem of modeling the interrelationships between different assets and generalizes well. The reinforcement learning framework lets the model balance return and risk under the guidance of the Sharpe ratio. In addition, significance analysis makes it possible to explore the interpretability of the model and open the black box inside deep learning.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.

Fig. 1 is a schematic diagram of the portfolio selection method based on a deep attention network and reinforcement learning provided by the invention;

Fig. 2 is a schematic diagram of the LSTM stock representation extraction module based on the history state attention mechanism provided by the invention;

Fig. 3 is a schematic diagram of the structure of the cross-asset attention network provided by the invention;

Fig. 4 shows the cumulative return curve obtained by the model provided by the invention on the US stock market over 27 years (1990-2016), compared with existing methods;

Fig. 5 shows the interpretability analysis provided by the invention of which factors the model relies on to generate the portfolio.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The portfolio selection method based on deep reinforcement learning with an attention mechanism provided by the invention is described in detail below.

1. Data requirements

The input stock feature vectors are in fact time series comprising stock trading data and the financial report data of the corresponding companies. Therefore, in addition to the attributes that stock trading data itself should contain, such as closing price, trading volume and number of shares issued, the data must also include the financial report data of the corresponding listed company at the same moments, comprising at least attributes such as the price-to-earnings ratio, the price-to-book ratio and dividends.

2. Technical points

The present invention concerns an interdisciplinary application of artificial intelligence and finance: a reinforcement learning framework is used to implement the classical buy-winners-and-sell-losers (BWSL) strategy of the financial field. The model consists of three parts. The first uses an LSTM with an attention mechanism to extract stock representations. The second uses a cross-asset attention network (CAAN) to model the interrelationships between stocks and obtain a winner score for every stock. The third decides, according to the winner scores, whether each stock is bought long or sold short, allocates the investment proportions and generates the final portfolio; see Fig. 1.

The entire model is optimized end to end with reinforcement learning, using the Sharpe ratio as the reward function, so that the model attends to the balance between long-term return and risk.

2.1 Financial logic and problem definition

For ease of presentation, the basic financial concepts involved in the invention are introduced first, followed by the mathematical definitions of the portfolio selection problem and the BWSL strategy.

Holding period: the holding period is the minimum unit of time for investing in an asset. The invention divides the time axis into consecutive holding periods of fixed length, for example daily or monthly. The beginning of the $t$-th holding period is referred to as moment $t$.

Sequential investment: a sequential investment comprises a series of consecutive holding periods. For the $t$-th holding period, an investment strategy invests capital in assets at moment $t$ and obtains a return (possibly negative) at moment $t+1$.

Asset price: the price of an asset is defined as a time series:

$$p^{(i)} = \{p^{(i)}_1, p^{(i)}_2, \dots, p^{(i)}_t, \dots\}$$

where $p^{(i)}_t$ is the price of the $i$-th asset at moment $t$. The invention is described with stocks as the assets, but it can be extended to other types of assets.

Long position: a long trade buys an asset at moment $t_1$ and sells it at moment $t_2$. In a long trade, the trader expects the price of the asset to rise in the future.

Short sale: a short sale borrows a certain quantity of an asset from a broker and sells it at moment $t_1$, then buys back the same quantity at moment $t_2$ and returns it to the broker. In a short sale, the trader expects the price of the asset to fall in the future.

Portfolio: given a fixed asset pool consisting of $I$ assets, a portfolio is defined as a vector $b = (b^{(1)}, \dots, b^{(i)}, \dots, b^{(I)})^{T}$, where $b^{(i)}$ is the proportion of the investment allocated to the $i$-th asset and $\sum_{i=1}^{I} b^{(i)} = 1$.

Zero-investment portfolio: a zero-investment portfolio is a collection of portfolios $\{b^{1}, \dots, b^{j}, \dots, b^{J}\}$ whose total investment, when they are combined, is 0, i.e. $\sum_{j=1}^{J} M^{j} = 0$, where $M^{j}$ is the investment allocated to the $j$-th portfolio.

BWSL investment strategy: the invention builds on the classical buy-winners-and-sell-losers (BWSL) strategy of the financial field. The core idea of BWSL is to buy the assets with high future price rises (winners) and sell the assets with low future price rises (losers). The invention executes the BWSL strategy with a zero-investment portfolio comprising two portfolios: a long portfolio $b^{+}$ for buying winners and a short portfolio $b^{-}$ for selling losers short.

At any moment $t$, given a budget constraint $\tilde{M}_t$ and the investment proportions in $b^{-}_t$, the quantity of any stock $i$ to borrow in the short sale can be computed as $u^{-(i)}_t = \tilde{M}_t \cdot b^{-(i)}_t / p^{(i)}_t$. Selling these loser stocks short raises the capital $\tilde{M}_t$, which is then used to buy winner stocks long; the quantity of any stock bought is likewise $u^{+(i)}_t = \tilde{M}_t \cdot b^{+(i)}_t / p^{(i)}_t$.

At the end of the holding period, the stocks bought long are sold and the stocks sold short are bought back and returned to the broker, so the return rate of the strategy at moment $t$ is:

$$R_t = \sum_{i=1}^{I} b^{+(i)}_t \frac{p^{(i)}_{t+1}}{p^{(i)}_t} - \sum_{i=1}^{I} b^{-(i)}_t \frac{p^{(i)}_{t+1}}{p^{(i)}_t}$$

This formula shows that a positive return $R_t > 0$ means the average price rise of the winner stocks exceeds that of the loser stocks. In other words, even if every stock is falling at the current moment, the BWSL strategy still earns a positive return as long as the winner stocks fall less on average than the loser stocks. The core of the BWSL strategy is therefore the relative performance between stocks: winners and losers should be selected as accurately as possible from the history features.
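The return calculation described above can be sketched in Python. This is a minimal sketch, assuming the period return is the proportion-weighted price relative (end price divided by start price) of the long portfolio minus that of the short portfolio. The function name `bwsl_return` and the toy prices are illustrative assumptions, not part of the invention.

```python
def bwsl_return(b_long, b_short, p_now, p_next):
    """Return of a zero-investment BWSL portfolio over one holding period.

    b_long, b_short: investment proportions over the same I stocks (each sums to 1).
    p_now, p_next:   prices at the start (t) and at the end (t+1) of the period.
    """
    # Price relative p_{t+1} / p_t of every stock.
    rel = [nxt / now for now, nxt in zip(p_now, p_next)]
    long_leg = sum(b * r for b, r in zip(b_long, rel))
    short_leg = sum(b * r for b, r in zip(b_short, rel))
    return long_leg - short_leg


# Hypothetical toy market: every stock falls, but the winner (stock 0)
# falls less than the loser (stock 2).
r = bwsl_return(
    b_long=[1.0, 0.0, 0.0],
    b_short=[0.0, 0.0, 1.0],
    p_now=[10.0, 10.0, 10.0],
    p_next=[9.5, 9.0, 8.0],
)
print(r)
```

In this toy case the winner falls 5% and the loser falls 20%, so the long-short return is about 0.15 even though the whole market declines.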

2.2 Stock representation extraction

The raw stock features fall into two classes. The first is the raw trading features of the stock: price rising rate, volatility and trading volume. The second is the financial report data of the listed company corresponding to the stock: market capitalization, price-to-earnings ratio, price-to-book ratio and dividends. The raw features are assembled into a feature vector; the feature vector $x_t$ at any moment $t$ contains the above seven dimensions.

To obtain the portfolio at any moment $t$, the model input $X$ is designed as the feature sequence formed by the most recent K historical moments, i.e. $X = \{x_1, \dots, x_k, \dots, x_K\}$, where $x_k$ is the stock feature vector at the $k$-th of these moments.

Preliminary representation extraction is first performed with a long short-term memory (LSTM) network equipped with a history state attention mechanism, whose structure is shown in Fig. 2.

The LSTM models sequential dependence: $h_k = \mathrm{LSTM}(h_{k-1}, x_k)$, $k \in [1, K]$, where $h_k$ is the hidden state encoded by the LSTM at step $k$; the hidden state $h_K$ obtained at the last step serves as the representation of the stock and contains all the sequential dependence in $X$.

History state attention mechanism: an attention mechanism over the history states is introduced to model global and long-term dependence, yielding a new stock representation $r$, calculated as:

$$r = \sum_{k=1}^{K} \mathrm{ATT}(h_K, h_k)\, h_k$$

where $\mathrm{ATT}(\cdot)$ is the attention function, defined as:

$$\mathrm{ATT}(h_K, h_k) = \frac{\exp(\alpha_k)}{\sum_{k'=1}^{K} \exp(\alpha_{k'})}, \qquad \alpha_k = w^{T} \tanh\left(W^{(1)} h_k + W^{(2)} h_K\right)$$

That is, the attention value $\alpha_k$ of each moment is first computed from $h_K$ and $h_k$, and the softmax function then yields the attention weight $\mathrm{ATT}(h_K, h_k)$ of that moment. Here $w$, $W^{(1)}$ and $W^{(2)}$ are the corresponding coefficient matrices, randomly initialized and continuously optimized during subsequent training until they reach their final values.

For the $i$-th stock at moment $t$, the stock representation obtained by combining the LSTM with the history attention mechanism can be written as $r^{(i)}_t$; it contains the sequential and global dependence from moment $t-K+1$ to moment $t$. In the proposed model, all stocks share the same stock representation extraction network, so the stock representations extracted by the LSTM-HA (Long Short-Term Memory with History Attention) network also generalize well.
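The history state attention step can be sketched in pure Python. This is a minimal sketch, assuming the hidden states $h_1, \dots, h_K$ have already been produced by an LSTM (the LSTM itself is not implemented) and that the attention weights are a softmax over the values $\alpha_k$. The helper names `matvec` and `history_attention`, the toy sizes and the random parameters are illustrative assumptions; in the described model the parameters are learned.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def history_attention(hidden, w, w1, w2):
    """Combine hidden states h_1..h_K into one stock representation r.

    hidden: list of K hidden-state vectors; hidden[-1] is h_K.
    w, w1, w2: attention parameters (random here, learned in the real model).
    """
    h_last = hidden[-1]
    proj_last = matvec(w2, h_last)
    # alpha_k = w^T tanh(W1 h_k + W2 h_K)
    alphas = []
    for h_k in hidden:
        z = [math.tanh(a + b) for a, b in zip(matvec(w1, h_k), proj_last)]
        alphas.append(sum(wi * zi for wi, zi in zip(w, z)))
    # Softmax over the K steps gives the attention weights ATT(h_K, h_k).
    top = max(alphas)
    exps = [math.exp(a - top) for a in alphas]
    total = sum(exps)
    att = [e / total for e in exps]
    # r is the attention-weighted sum of the hidden states.
    r = [sum(att[k] * hidden[k][d] for k in range(len(hidden)))
         for d in range(len(h_last))]
    return r, att


random.seed(0)
K, dim = 12, 4  # hypothetical toy sizes: 12 history steps, 4-dim hidden states
hidden = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(K)]
w = [random.uniform(-1, 1) for _ in range(dim)]
w1 = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
w2 = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
r, att = history_attention(hidden, w, w1, w2)
print(len(r), round(sum(att), 6))
```

Because the weights are positive and sum to 1, each component of `r` is a convex combination of the corresponding components of the hidden states, so every history step can contribute to the final representation rather than only the last one.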

2.3 Winner and loser selection

The BWSL strategy introduced in Section 2.1 is driven by relative performance, so the interrelationships between stocks must be modeled thoroughly in order to select winners and losers. The cross-asset attention module (shown in Fig. 3) generates a winner score for every stock from the stock representations extracted in Section 2.2.

Given the representation $r^{(i)}$ of any stock (the time index $t$ is omitted without loss of generality), it is first mapped through parameter matrices to a query vector $q^{(i)}$, a key vector $k^{(i)}$ and a value vector $v^{(i)}$:

$$q^{(i)} = W^{(Q)} r^{(i)}, \quad k^{(i)} = W^{(K)} r^{(i)}, \quad v^{(i)} = W^{(V)} r^{(i)}$$

where $W^{(Q)}$, $W^{(K)}$ and $W^{(V)}$ are the coefficient matrices for computing the query, key and value vectors, randomly initialized and continuously optimized during subsequent training until they reach their final values.

The interrelationship between stocks $j$ and $i$ is modeled by using the query vector of stock $i$ to query the key vector of stock $j$, i.e.

$$\beta_{ij} = \frac{q^{(i)T} k^{(j)}}{\sqrt{D_k}}$$

where $D_k$ is the scaling coefficient. Next, with the normalized interrelationships as weights, the value vectors of all $I$ current stocks are summed with these weights to obtain a new stock representation $a^{(i)}$:

$$a^{(i)} = \sum_{j=1}^{I} \mathrm{SATT}\left(q^{(i)}, k^{(j)}\right) v^{(j)}$$

where the attention coefficient $\mathrm{SATT}(q^{(i)}, k^{(j)})$ is likewise computed with the softmax function:

$$\mathrm{SATT}\left(q^{(i)}, k^{(j)}\right) = \frac{\exp(\beta_{ij})}{\sum_{j'=1}^{I} \exp(\beta_{ij'})}$$

Finally, a fully connected layer maps the representation vector to a winner score; the higher the winner score, the more likely the stock is to be a winner at the current moment: $s^{(i)} = \mathrm{sigmoid}\left(w^{(s)T} a^{(i)} + e^{(s)}\right)$, where $w^{(s)}$ is the parameter matrix of the fully connected layer and $e^{(s)}$ is its bias.
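The cross-asset attention step can be sketched as follows. This is a minimal sketch under two stated assumptions: the query-key product is divided by the square root of the key dimension (the usual scaled dot-product convention, standing in for the coefficient $D_k$), and the normalization is a softmax over all stocks. The function names `caan_scores`, `softmax`, `dot` and `matvec`, the toy sizes and the random parameters are illustrative assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [dot(row, v) for row in m]


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def caan_scores(reps, wq, wk, wv, w_s, e_s):
    """Map the representations of I stocks to I winner scores in (0, 1)."""
    queries = [matvec(wq, r) for r in reps]
    keys = [matvec(wk, r) for r in reps]
    values = [matvec(wv, r) for r in reps]
    scale = math.sqrt(len(keys[0]))  # assumption: D_k is the key dimension
    scores = []
    for q in queries:
        # Relation of this stock to every stock j, then softmax normalization.
        beta = [dot(q, k) / scale for k in keys]
        satt = softmax(beta)
        # New representation: attention-weighted sum of all value vectors.
        a = [sum(s * v[d] for s, v in zip(satt, values))
             for d in range(len(values[0]))]
        # Fully connected layer with sigmoid gives the winner score.
        scores.append(1.0 / (1.0 + math.exp(-(dot(w_s, a) + e_s))))
    return scores


random.seed(1)
n_stocks, dim = 5, 4  # hypothetical toy sizes


def rand_matrix():
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


reps = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_stocks)]
wq, wk, wv = rand_matrix(), rand_matrix(), rand_matrix()
w_s = [random.uniform(-1, 1) for _ in range(dim)]
scores = caan_scores(reps, wq, wk, wv, w_s, e_s=0.0)
print([round(s, 3) for s in scores])
```

Each score depends on the representations of all stocks in the pool, not only on the stock's own representation, which is what lets the score express relative rather than absolute performance.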

2.4 Portfolio generation

The CAAN module of Section 2.3 yields the winner scores $\{s^{(1)}, \dots, s^{(i)}, \dots, s^{(I)}\}$ of all stocks. The invention buys long the stocks with higher winner scores and sells short the stocks with lower winner scores. First, the winner scores of all stocks are sorted in descending order to obtain the rank $o^{(i)}$ of each stock $i$. A preset parameter $G$ determines the number of stocks bought long and sold short (to simplify simulated trading, both numbers are set to $G$). For any stock $i$ with $o^{(i)} \in [1, G]$, the long investment proportion is

$$b^{+(i)} = \frac{\exp\left(s^{(i)}\right)}{\sum_{o^{(i')} \in [1, G]} \exp\left(s^{(i')}\right)}$$

Similarly, for any stock $i$ with $o^{(i)} \in (I-G, I]$, the short investment proportion is

$$b^{-(i)} = \frac{\exp\left(1 - s^{(i)}\right)}{\sum_{o^{(i')} \in (I-G, I]} \exp\left(1 - s^{(i')}\right)}$$

Each of these is a softmax function: the exponential normalization maps the scores $s^{(i)}$ of the selected stocks to proportions between 0 and 1. The index $i'$ is used only to distinguish the summation index from $i$; it could equally be written $j$.
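The portfolio generation step can be sketched in Python. This is a minimal sketch, assuming the long proportions are a softmax of the scores of the top $G$ stocks and the short proportions are a softmax of one minus the scores of the bottom $G$ stocks, so that lower-scored losers receive more short weight. The function name `generate_portfolios` and the toy scores are illustrative assumptions.

```python
import math


def generate_portfolios(scores, g):
    """Turn winner scores into a long portfolio and a short portfolio.

    The top-g stocks by score are bought long, the bottom-g are sold short,
    and all other stocks receive zero investment.
    """
    n = len(scores)
    if g < 1 or 2 * g > n:
        raise ValueError("need 1 <= g and 2 * g <= number of stocks")
    # Stock indices sorted by descending winner score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    winners, losers = order[:g], order[n - g:]

    b_long = [0.0] * n
    z_long = sum(math.exp(scores[i]) for i in winners)
    for i in winners:
        b_long[i] = math.exp(scores[i]) / z_long

    b_short = [0.0] * n
    # Assumption: short weights use exp(1 - s), so lower scores weigh more.
    z_short = sum(math.exp(1.0 - scores[i]) for i in losers)
    for i in losers:
        b_short[i] = math.exp(1.0 - scores[i]) / z_short
    return b_long, b_short


# Hypothetical winner scores for five stocks, with G = 2.
b_long, b_short = generate_portfolios([0.9, 0.2, 0.6, 0.4, 0.1], g=2)
print([round(b, 3) for b in b_long])
print([round(b, 3) for b in b_short])
```

With these scores, stocks 0 and 2 form the long portfolio, stocks 1 and 4 form the short portfolio, and stock 3 receives no investment in either.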

2.5 Reinforcement learning optimization

To let the model consider the return and the risk of the investment strategy at the same time, the invention uses the Sharpe ratio (a return-to-risk ratio) as the reward function of reinforcement learning.

Sharpe ratio: the Sharpe ratio reflects how far the return of a risky investment exceeds the risk-free return per unit of risk. Given a transaction sequence comprising $T$ holding periods, the corresponding Sharpe ratio $H_T$ is computed from the average return $A_T$ and the volatility $V_T$ of the sequence.

$A_T$ is the average return of the investment sequence per holding period, calculated as $A_T = \frac{1}{T}\sum_{t=1}^{T}(R_t - TC_t)$, where $TC_t$ is the actual transaction fee. $V_T$ is the volatility of the investment sequence, measured by the standard deviation of the returns and calculated as $V_T = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(R_t - \bar{R})^2}$, where $\bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t$.

During training, a sequential investment with $T$ holding periods is sampled from the training data set each time; the history feature vectors at each moment are fed in, and the model outputs the portfolio sequences $B^{+} = \{b^{+}_1, \dots, b^{+}_T\}$ and $B^{-} = \{b^{-}_1, \dots, b^{-}_T\}$ over all moments.

The Sharpe ratio of the whole investment sequence is then computed and used as the reward function of reinforcement learning, and the model parameters are optimized by gradient ascent, i.e. $\arg\max H_T(B^{+}, B^{-})$.
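The reward computation can be sketched in Python. This is a minimal sketch of the reward only (the gradient ascent step is not shown), assuming the Sharpe ratio is the average per-period return net of transaction fees, minus a risk-free rate, divided by the population standard deviation of the returns. The function name `sharpe_ratio`, the `risk_free` parameter (defaulting to 0) and the toy return sequences are illustrative assumptions.

```python
import math


def sharpe_ratio(returns, costs, risk_free=0.0):
    """Sharpe-ratio reward of an investment sequence with T holding periods.

    returns:   per-period returns R_t of the strategy.
    costs:     per-period transaction fees TC_t.
    risk_free: per-period risk-free rate (an assumption; 0 by default).
    """
    t = len(returns)
    # Average return per holding period, net of fees.
    a_t = sum(r - c for r, c in zip(returns, costs)) / t
    # Volatility: population standard deviation of the returns.
    mean_r = sum(returns) / t
    v_t = math.sqrt(sum((r - mean_r) ** 2 for r in returns) / t)
    if v_t == 0.0:
        raise ValueError("volatility is zero; the Sharpe ratio is undefined")
    return (a_t - risk_free) / v_t


# Two hypothetical sequences with the same average return but different risk.
steady = sharpe_ratio([0.012, 0.008, 0.012, 0.008], [0.0] * 4)
volatile = sharpe_ratio([0.03, -0.01, 0.03, -0.01], [0.0] * 4)
print(steady > volatile)

h = sharpe_ratio([0.02, -0.01, 0.03, 0.00], [0.001] * 4)
print(round(h, 4))
```

The two toy sequences have the same mean return, yet the steadier one earns the larger reward, which is how a Sharpe-ratio reward pushes a model to trade return against risk rather than maximize return alone.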

The technical solution of the invention is further illustrated below with a specific example.

The embodiment covers the historical simulated-trading performance and the interpretability analysis on the US stock market.

1. Data

The data comprise up to 47 years (1970.1-2016.12) of stock trading data and corresponding listed-company financial report data from the US stock market. The period covers many major financial events and different market states, such as the Internet bubble of 1995-2000 and the subprime mortgage crisis of 2007-2009. On average more than 1000 stocks are valid each year, which allows the actual performance of the invention to be measured thoroughly under different market states.

2. Evaluation metric

The most standard metric for evaluating portfolio performance is the cumulative return obtained over the investment horizon.

3. US market performance

Based on the method provided by the invention, the model was trained on the data of 1970.1-1989.12; the results obtained on the US stock market over the 27 years 1990.1-2016.12 are as follows.

As shown in Fig. 4, AS, AS-NP and AS-NC denote, respectively, the proposed model AlphaStock, AlphaStock without the history attention mechanism, and AlphaStock without the cross-asset attention mechanism. The other curves represent the market index and state-of-the-art methods from the finance and computer science fields. The proposed AlphaStock model and its reduced versions far outperform the other benchmark methods in historical simulated trading on the US market.

4. Interpretability analysis

The invention uses a sensitivity analysis method to try to open the black box of the deep learning model and explain how the model selects the portfolio from the input history features.

As shown in Fig. 5, the horizontal axis represents the K input historical moments (set to the past 12 months in this experiment), and the vertical axis represents the correlation between an input feature and the final winner score.

From left to right, the four panels show the influence on the final result of the price rising rate feature, the trading volume feature, the volatility feature and the market value feature. It can be seen that the proposed model tends to select as winners stocks with higher long-term returns, lower volatility, higher intrinsic value and recent undervaluation.
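The idea of reading feature influence from partial derivatives of the winner score can be sketched in Python. This is a minimal sketch, assuming the derivatives are estimated by central finite differences rather than by backpropagation. The function `sensitivity` and the stand-in model `toy_score` are illustrative assumptions; `toy_score` is not the trained network.

```python
import math


def sensitivity(score_fn, x, eps=1e-6):
    """Central finite-difference estimate of d score / d x_j for each feature."""
    grads = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += eps
        down[j] -= eps
        grads.append((score_fn(up) - score_fn(down)) / (2.0 * eps))
    return grads


def toy_score(x):
    """Hypothetical stand-in for the model: rewards x[0], penalizes x[1]."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[1])))


# x[0] plays the role of a price rising rate, x[1] of a volatility feature.
grads = sensitivity(toy_score, [0.0, 0.0])
print([round(g, 4) for g in grads])
```

A positive derivative means that raising the feature raises the winner score and a negative derivative means the opposite; the magnitude indicates how strongly the feature influences the decision.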

The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for the relevant details, see the description of the method.

The above description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A portfolio selection method based on a deep attention network and reinforcement learning, characterized by comprising:

S1: based on the classical buy-winners-and-sell-losers strategy of the financial field and the stock feature vectors of the most recent K historical moments, performing preliminary representation extraction with a long short-term memory (LSTM) network equipped with a history state attention mechanism, to obtain a stock representation r;

S2: modeling the interrelationships between stocks based on the stock representation r and a cross-asset attention network, thereby selecting winners and losers and obtaining a winner score for each stock;

2. The portfolio selection method based on a deep attention network and reinforcement learning according to claim 1, characterized in that step S1 specifically comprises:

modeling sequential dependence with the LSTM: $h_k = \mathrm{LSTM}(h_{k-1}, x_k)$, $k \in [1, K]$, where $h_k$ is the hidden state encoded by the LSTM at step $k$; the hidden state $h_K$ obtained at the last step serves as the representation of the stock and contains all the sequential dependence in $X$; $X$ is the feature sequence formed by the most recent K historical moments, $X = \{x_1, \dots, x_k, \dots, x_K\}$, where $x_k$ is the stock feature vector at the $k$-th of these moments;

introducing an attention mechanism over the history states to model global and long-term dependence, which yields a new stock representation $r$, calculated as:

$$r = \sum_{k=1}^{K} \mathrm{ATT}(h_K, h_k)\, h_k$$

where $\mathrm{ATT}(\cdot)$ is the attention function, defined as:

$$\mathrm{ATT}(h_K, h_k) = \frac{\exp(\alpha_k)}{\sum_{k'=1}^{K} \exp(\alpha_{k'})}, \qquad \alpha_k = w^{T} \tanh\left(W^{(1)} h_k + W^{(2)} h_K\right)$$

$w$, $W^{(1)}$ and $W^{(2)}$ are the corresponding coefficient matrices, randomly initialized and continuously optimized during subsequent training until they reach their final values.

3. The portfolio selection method based on a deep attention network and reinforcement learning according to claim 2, characterized in that step S2 specifically comprises:

where $D_k$ is the scaling coefficient;

using the normalized interrelationships as weights, computing a weighted sum of the value vectors of all $I$ current stocks to obtain a new stock representation $a^{(i)}$;

4. The portfolio selection method based on a deep attention network and reinforcement learning according to claim 3, characterized in that step S3 specifically comprises:

sorting the winner scores of all stocks in descending order and obtaining the rank $o^{(i)}$ of each stock $i$ after sorting, where a preset parameter $G$ determines the number of stocks to buy long and to sell short, respectively;

for any stock $i$, if $o^{(i)} \in [1, G]$, buying stock $i$ long, with the long investment proportion calculated as:

$$b^{+(i)} = \frac{\exp\left(s^{(i)}\right)}{\sum_{o^{(i')} \in [1, G]} \exp\left(s^{(i')}\right)}$$

5. The portfolio selection method based on a deep attention network and reinforcement learning according to claim 4, characterized in that step S4 specifically comprises:

each time, sampling from the training data set a sequential investment with $T$ holding periods, feeding in the history feature vectors at each moment, and obtaining from the model the portfolio sequences $B^{+} = \{b^{+}_1, \dots, b^{+}_T\}$ and $B^{-} = \{b^{-}_1, \dots, b^{-}_T\}$ over all moments;

then computing the Sharpe ratio $H_T$ of the whole investment sequence, using it as the reward function of reinforcement learning, and optimizing the model parameters by gradient ascent, i.e. $\arg\max H_T(B^{+}, B^{-})$;

6. The portfolio selection method based on a deep attention network and reinforcement learning according to claim 5, characterized in that, after step S4, the method further comprises: interpreting the model by a significance analysis method.

7. The portfolio selection method based on a deep attention network and reinforcement learning according to claim 6, characterized in that the specific step of interpreting the model by the significance analysis method comprises:

taking the partial derivatives of the winner score with respect to each input feature, and analyzing the influence of each feature on the model's decision according to the magnitude and sign of the partial derivatives.