KR102173243B1

KR102173243B1 - Method for Performance Improvement of Portfolio Asset Allocation Using Recurrent Reinforcement Learning

Info

Publication number: KR102173243B1
Application number: KR1020180067908A
Authority: KR
Inventors: 이주홍; 강문주
Original assignee: (주)밸류파인더스
Priority date: 2018-06-14
Filing date: 2018-06-14
Publication date: 2020-11-03
Also published as: KR20190143543A

Abstract

The present invention relates to a method for improving portfolio asset allocation performance using recurrent reinforcement learning. To improve the performance of a recurrent reinforcement learning model for portfolio asset allocation, it presents a concrete implementation model for generating and using asset prediction data and artificially generated data in addition to historical asset data, and demonstrates through experiments that this model is effective. The 'recurrent reinforcement learning model that generates and uses asset prediction data and artificially generated data' devised by the present invention is implemented with Long Short-Term Memory (LSTM). The asset prediction data consists of hypothetical predictions of rises and falls in asset prices over the management period, generated according to a given prediction accuracy, and the artificially generated data is produced with a Gaussian process.

Description

{Method for Performance Improvement of Portfolio Asset Allocation Using Recurrent Reinforcement Learning}

The present invention relates to a method for improving portfolio asset allocation performance using recurrent reinforcement learning, and more specifically to a method that uses predicted values of assets and artificially generated data to improve the performance of existing portfolio management models based on recurrent reinforcement learning.

In recent years, artificial intelligence technology has advanced rapidly and has been applied with outstanding results in a wide range of fields. In finance as well, industries applying artificial intelligence are developing quickly, and algorithms learned by artificial intelligence can now be used for investment advice, investment decisions, and asset management. Specific areas of finance where artificial intelligence is applied include portfolio optimization, credit rating evaluation, stock investment, and asset forecasting. Among these, portfolio optimization requires important decisions in pursuit of two goals: securing the stability of investments and generating returns.

Existing portfolio algorithms include Markowitz's Mean-Variance model, linear programming, and nonlinear programming, while methods based on artificial intelligence include artificial neural networks and reinforcement learning. Among these, recurrent reinforcement learning has recently attracted considerable attention and has been actively studied. However, existing studies on recurrent reinforcement learning use only the historical data of assets, and so make little use of other factors that could help improve portfolio performance.

In this regard, Markowitz systematized portfolio theory by introducing the Mean-Variance model for portfolio optimization. The Markowitz model is a theory for selecting, among all investment opportunities, the one with the optimal combination of return and risk; it diversifies investment using only the historical data, mean returns, and variances of the individual stocks. It is a nonlinear programming model with three conditions: minimizing the variance among stocks, which is the measure of risk; achieving a minimum expected return; and investing all available funds. Moody presented a method for optimizing portfolio asset allocation and trading systems using recurrent reinforcement learning. Moody and Saffell also compared recurrent reinforcement learning with Q-Learning on real data and reported that recurrent reinforcement learning gave better results than Q-Learning.

The model proposed by Yue Deng was inspired by two learning concepts, deep learning and reinforcement learning. In this model, the deep learning part automatically senses the dynamic market state in order to learn informative features. The reinforcement learning part then interacts with the information extracted by deep learning and makes trading decisions so as to accumulate the final reward in an unknown environment. The learning system was implemented as a complex neural network that has both deep and recurrent structures. Saud Almahdi proposed a recurrent reinforcement learning model that uses the Calmar ratio to obtain trading signals and asset allocation weights. In experiments on a portfolio of frequently traded listed funds, the Calmar ratio, an objective function based on Expected Maximum Drawdown, was reported to outperform previously proposed objective functions for recurrent reinforcement learning. David W. Lu, in turn, presented a model that implements recurrent reinforcement learning with LSTM (Long Short-Term Memory), showing that LSTM generally performs better than previously proposed RNNs and that recurrent reinforcement learning can be trained satisfactorily with the BPTT learning method.

Asset forecasting, meanwhile, is likely to help improve portfolio performance. Jain argued that "asset forecasting plays an important role in making a profit," and Mohapatra argued that "even if forecasts are not 100% accurate because of market volatility, they can still help investment." Building a portfolio from stable, high-return stocks and trading with an asset forecasting model of good predictive power can bring the portfolio closer to its goals. Artificially generated data can also help. Training on actually observed data alone may be insufficient to achieve the portfolio's goals: although stock data contains many data points at the level of seconds or days, the set of data points represents only a single trend. A shortage of training data can be a problem when building a robust model. Generating artificial data that shares the trend of the real data but exhibits varied volatility, and using it together with the real data in training, should therefore help to train a robust portfolio model.

Until now, however, there has been no research or invention concerning a concrete method or model for improving the performance of a portfolio based on recurrent reinforcement learning by using asset predictions and artificially generated data in addition to historical asset data. Portfolio algorithms therefore had to be implemented with historical asset data alone, and the performance of portfolio asset allocation was inevitably low.

Markowitz, H. (1952). Portfolio selection. Journal of Finance 7, 77-91.
SeungKyu Hwang, HyungJoon Lim, ShiYong Yoo, “Simulation on the Optimal Asset Allocation with Expected Returns Estimates.”, KAPP 11.1 (2009): 27-57.
Moody, J., et al. (1997). Performance function and reinforcement learning for trading systems and portfolios. Journal of Forecasting.
Moody, J., & Saffell, M. (2001). Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks.
Deng, Y., Bao, F., Kong, Y., Ren, Z., and Dai, Q. (2016). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems.
Almahdi, S., & Yang, S. Y. (2017). “An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown”, Expert Systems with Applications, 87, 267-279.
Lu, David W. “Agent Inspired Trading Using Recurrent Reinforcement Learning and LSTM Neural Networks.” arXiv preprint arXiv:1707.07338 (2017).
Jain, Vikalp Ravi, Manisha Gupta, and Raj Mohan Singh. “Analysis and Prediction of Individual Stock Prices of Financial Sector Companies in NIFTY50.” International Journal of Information Engineering and Electronic Business 10.2 (2018): 33.
Mohapatra, Avilasa, et al. “Applications of neural network based methods on stock market prediction: survey.” International Journal of Engineering and Technology (UAE) 7.26 (2018): 71-76.
Kanghee Park, Hyunjung Shin, “Stock Trading Model using Portfolio Optimization and Forecasting Stock Price Movement.”, KIIE 39.6 (2013): 535-545.
Guresen, Erkam, Gulgun Kayakutlu, and Tugrul U. Daim. “Using artificial neural network models in stock market index prediction.” Expert Systems with Applications 38.8 (2011): 10389-10397.
Moody, J., Wu, L., Liao, Y., & Saffell, M. (1998). Performance functions and reinforcement learning for trading systems and portfolios. Science, 17 (February 1997), 441-470.

To improve the performance of a recurrent reinforcement learning model for portfolio asset allocation, the present invention presents a concrete implementation model for generating and using asset prediction data and artificially generated data in addition to historical asset data, and demonstrates through experiments that this model is effective. The 'recurrent reinforcement learning model that generates and uses asset prediction data and artificially generated data' devised by the present invention is implemented with Long Short-Term Memory (LSTM). The asset prediction data consists of hypothetical predictions of rises and falls in asset prices over the management period, generated according to a given prediction accuracy, and the artificially generated data is produced with a Gaussian process.

One embodiment of the method for improving portfolio asset allocation performance using recurrent reinforcement learning according to the present invention, devised to achieve the above object, is performed by an information system and improves the performance of portfolio asset allocation by applying recurrent reinforcement learning to historical asset data, asset prediction data, and artificially generated data. The method preferably comprises: generating the asset prediction data according to a prediction accuracy within a certain range; generating the artificially generated data by applying a Gaussian process to the historical asset data; receiving the historical asset data, the asset prediction data, and the artificially generated data as portfolio management information, passed on through the hidden state and cell state of a Long Short-Term Memory (LSTM); with each unfolded LSTM corresponding to one time step, obtaining the asset allocation weights from the LSTM at time t; obtaining the portfolio return at time t from the asset allocation weights; computing the objective function up to time T from the portfolio returns; and adjusting the internal weights so that the objective function is maximized. (The symbols for these quantities are not reproduced in this text.)

In another embodiment of the method for improving portfolio asset allocation performance using recurrent reinforcement learning according to the present invention, in addition to the features described above, the asset prediction data may represent a predicted rise as 1 and a predicted fall as -1 over the portfolio management period, relative to the start of management.

Furthermore, in addition to these features, the Gaussian process may define a single probability distribution using training data consisting of the asset prices observed at each time and a covariance function kernel, where the covariance function kernel is a squared-exponential kernel with a noise model applied. The kernel equation and its symbols are not reproduced in this text; its terms include a parameter related to the uncertainty of the training data set and a period in days.

In yet another embodiment of the method for improving portfolio asset allocation performance using recurrent reinforcement learning according to the present invention, in addition to the features described above, the objective function is preferably maximized by means of a derivative formula (not reproduced in this text).

As described above, the method for improving portfolio asset allocation performance using recurrent reinforcement learning according to the present invention presents a concrete model that can generate and use asset prediction data and artificially generated data in addition to historical asset data, and can therefore improve portfolio performance. As the experimental results described later show, generating and using asset prediction data and artificially generated data can improve performance by up to about 34%.

FIG. 1 illustrates the generation of asset prediction data with 48% prediction accuracy in the method for improving portfolio asset allocation performance using recurrent reinforcement learning according to the present invention.
FIG. 2 shows the process of generating artificially generated data in the method according to the present invention.
FIG. 3 shows the structure of the recurrent reinforcement learning model implemented with LSTM in the method according to the present invention.
FIG. 4 shows the five portfolio sets used in the experiments on the method according to the present invention.
FIG. 5 shows the Sharpe ratio of each asset used in the experiments on the method according to the present invention.
FIG. 6 shows the average Sharpe ratio of the five portfolio sets used in the experiments on the method according to the present invention.
FIG. 7 shows performance as a function of the number of unfolds in the experiments on the method according to the present invention.
FIG. 8 shows performance as a function of state length in the experiments on the method according to the present invention.
FIG. 9 shows performance by algorithm in the experiments on the method according to the present invention.
FIG. 10 shows the change in the Sharpe ratio when predicted values are applied in the experiments on the method according to the present invention.
FIG. 11 shows the change in the Sharpe ratio when artificially generated data is applied in the experiments on the method according to the present invention.
FIG. 12 shows the change in performance for different combinations of artificially generated data in the experiments on the method according to the present invention.
FIG. 13 shows the ratios of artificially generated data used in the experiments on the method according to the present invention.
FIG. 14 shows the change in the Sharpe ratio as a function of the ratio of artificially generated data in the experiments on the method according to the present invention.

The present invention is described in detail below so that the above objects and features become clear, enabling a person of ordinary skill in the art to which the invention pertains to readily practice its technical idea. In describing the present invention, detailed descriptions of related known techniques that are already well known in the art are omitted where they could unnecessarily obscure the gist of the invention.

In addition, the terms used in the present invention are, as far as possible, general terms in wide current use. In certain cases, however, terms have been chosen arbitrarily by the applicant; their meanings are set out in detail in the relevant part of the description, so the invention should be understood through the meaning each term carries rather than through its name alone. The terms used in describing the embodiments serve only to describe specific embodiments and are not intended to limit them. Singular expressions include plural expressions unless the context clearly indicates otherwise.

The embodiments may be modified in various ways and may have various additional embodiments; specific embodiments are shown in the drawings and described in the related detailed description. This is not intended to limit the embodiments to a particular form, and they should be understood to include all changes, equivalents, and substitutes falling within their spirit and technical scope.

As described above, the present invention concerns a method of generating and using asset prediction data and artificially generated data, in addition to historical asset data, to improve the performance of a recurrent reinforcement learning model for portfolio asset allocation. The 'recurrent reinforcement learning model that generates and uses asset prediction data and artificially generated data' devised by the present invention is implemented with LSTM (Long Short-Term Memory). The asset prediction data consists of hypothetical predictions of rises and falls in asset prices over the management period, generated according to a given prediction accuracy, and the artificially generated data is produced with a Gaussian process.

The present invention is described below with reference to the accompanying drawings. First, the generation of asset prediction data is described with reference to FIG. 1, which illustrates the generation of asset prediction data according to the present invention. To apply asset prediction information of a given prediction accuracy to recurrent reinforcement learning, the asset prediction data is preferably generated artificially according to a fixed prediction accuracy. More preferably, the asset prediction data represents a predicted rise as 1 and a predicted fall as -1 over the management period, relative to the start of management. The prediction accuracy is preferably varied from 38% to 64% in steps of 2%, and the generated asset prediction data is added to the input of the recurrent reinforcement learning model. FIG. 1 shows an example of asset prediction data generated with 48% prediction accuracy.
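The generation of asset prediction data at a fixed accuracy can be sketched as follows. This is a minimal illustration rather than the patented procedure: the function names, the `horizon` parameter, and the choice to make exactly `round(accuracy * N)` predictions correct (instead of drawing each one independently) are assumptions introduced here.

```python
import random

def true_directions(prices, horizon):
    """+1 if the price `horizon` steps after t is above the price at t, else -1."""
    return [1 if prices[t + horizon] > prices[t] else -1
            for t in range(len(prices) - horizon)]

def synthetic_predictions(directions, accuracy, rng):
    """Return +1/-1 predictions that match `directions` on round(accuracy * N) entries.

    The remaining entries are flipped, so the realized accuracy equals the
    requested one up to rounding. (Hypothetical helper, not from the patent.)
    """
    n = len(directions)
    n_correct = round(accuracy * n)
    correct_idx = set(rng.sample(range(n), n_correct))
    return [d if i in correct_idx else -d for i, d in enumerate(directions)]

# Accuracy grid described in the text: 38% to 64% in steps of 2%.
ACCURACY_GRID = [a / 100 for a in range(38, 65, 2)]
```

For example, `synthetic_predictions(true_directions(prices, 1), 0.48, random.Random(0))` would yield a +1/-1 sequence that agrees with the realized one-step direction on about 48% of the entries.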

Next, a method of artificially generating data with a Gaussian process is described with reference to FIG. 2. The Gaussian process preferably defines a single probability distribution over functions using training data consisting of the asset prices observed at each time and a covariance function kernel. The present invention uses a squared-exponential kernel and, more preferably, a noise model to introduce variability. The specific equation and its symbols are not reproduced in this text; its terms are a parameter related to the uncertainty of the training data set, the Kronecker delta, and a period in days. By adjusting the value of the above parameter, it is preferable to generate artificial data that follows the trend of the original data but differs at each time step. FIG. 2 shows an example of the process of generating artificially generated data.
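Because the kernel equation is not reproduced here, the following sketch assumes the textbook form of a squared-exponential kernel with additive noise, k(t_i, t_j) = sigma_f^2 exp(-(t_i - t_j)^2 / (2 l^2)) + sigma_n^2 delta_ij. The symbols `sigma_f`, `length`, and `sigma_n`, the helper names, and the choice to sample from the GP predictive distribution at the observed days are all assumptions made for illustration, not details confirmed by the patent.

```python
import math
import random

def se_noise_kernel(t_i, t_j, sigma_f, length, sigma_n):
    """Squared-exponential kernel plus a Kronecker-delta noise term (assumed form)."""
    se = sigma_f ** 2 * math.exp(-((t_i - t_j) ** 2) / (2.0 * length ** 2))
    delta = 1.0 if t_i == t_j else 0.0
    return se + sigma_n ** 2 * delta

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = a[r][c] - sum(low[r][k] * low[c][k] for k in range(c))
            low[r][c] = math.sqrt(s) if r == c else s / low[c][c]
    return low

def cho_solve(low, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for r in range(n):
        y[r] = (b[r] - sum(low[r][k] * y[k] for k in range(r))) / low[r][r]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (y[r] - sum(low[k][r] * x[k] for k in range(r + 1, n))) / low[r][r]
    return x

def artificial_series(prices, sigma_n, length=5.0, rng=None):
    """Draw one artificial price path from the GP predictive distribution.

    Requires sigma_n > 0. A larger sigma_n gives paths that deviate more
    from the observed prices while keeping the overall trend.
    """
    rng = rng or random.Random()
    n = len(prices)
    mean = sum(prices) / n
    y = [p - mean for p in prices]                      # centre the targets
    sigma_f = math.sqrt(sum(v * v for v in y) / n) or 1.0
    days = range(n)
    k_noisy = [[se_noise_kernel(a, b, sigma_f, length, sigma_n) for b in days] for a in days]
    k_clean = [[se_noise_kernel(a, b, sigma_f, length, 0.0) for b in days] for a in days]
    low = cholesky(k_noisy)
    alpha = cho_solve(low, y)
    post_mean = [sum(k_clean[r][c] * alpha[c] for c in range(n)) for r in range(n)]
    # Predictive covariance of a new noisy observation on the same days:
    # K_noisy - K_clean K_noisy^{-1} K_clean
    solved = [cho_solve(low, [k_clean[r][c] for r in range(n)]) for c in range(n)]
    cov = [[k_noisy[r][c] - sum(k_clean[r][k] * solved[c][k] for k in range(n))
            for c in range(n)] for r in range(n)]
    low_post = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mean + post_mean[r] + sum(low_post[r][k] * z[k] for k in range(r + 1))
            for r in range(n)]
```

The pure-Python linear algebra keeps the sketch dependency-free; for series of realistic length a numerical library would be the more practical choice.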

The purpose of recurrent reinforcement learning for portfolios is to train a model that, through interaction with the environment (the stock market), optimizes the action (the portfolio's asset allocation weights) so that the objective function, the Sharpe ratio, is maximized. A key feature of recurrent reinforcement learning is that it receives information about previous asset allocation weights and outputs asset allocation weights through interaction with the current input. In addition, the state the model receives as input and the objective function to be maximized can be set freely and can therefore be defined in many ways.

In the method for improving portfolio asset allocation performance using recurrent reinforcement learning according to the present invention, the recurrent reinforcement learning model is preferably implemented with an unfolded LSTM (Long Short-Term Memory). As in recurrent reinforcement learning, the LSTM receives information about previous portfolio management through its hidden state and cell state and determines its action through interaction with the state received as input at the current time step.

In the method for improving portfolio asset allocation performance using recurrent reinforcement learning according to the present invention, the recurrent reinforcement learning model is preferably trained in the following order. First, each step of the unfolded LSTM corresponds to one time point, and the asset allocation weights at time t are obtained from the LSTM at time t. The portfolio return at time t is then obtained from these asset allocation weights. Next, the objective function is computed from the portfolio returns up to time T. Finally, the internal weights of the LSTM are adjusted so as to maximize the objective function. The equation below is the derivative used to maximize the objective function, and it is more preferable to use the LSTM's BPTT (Backpropagation Through Time) learning method to compute and optimize it accurately.

[Equation: derivative of the objective function; not reproduced in this text]
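For orientation, recurrent reinforcement learning is commonly trained with a chain-rule gradient of the following general form, as in Moody and Saffell's formulation. The symbols are assumptions introduced here for illustration and are not taken from this document: $F_t$ for the asset allocation weights at time t, $R_t$ for the portfolio return at time t, $U_T$ for the objective function over the period up to T, and $\theta$ for the internal weights of the LSTM. This is a sketch of the structure such a derivative usually has, not necessarily the exact equation of the invention.

```latex
\frac{dU_T(\theta)}{d\theta}
  = \sum_{t=1}^{T} \frac{dU_T}{dR_t}
    \left\{
      \frac{dR_t}{dF_t}\,\frac{dF_t}{d\theta}
      + \frac{dR_t}{dF_{t-1}}\,\frac{dF_{t-1}}{d\theta}
    \right\}
```

Because $F_t$ depends on $F_{t-1}$ through the recurrent state, $dF_t/d\theta$ is itself a total derivative that unrolls over earlier time steps, which is the quantity BPTT accumulates.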

The objective function preferably uses the Sharpe ratio, a representative measure of portfolio performance. FIG. 3 shows the structure of the recurrent reinforcement learning model implemented with an LSTM.
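As a minimal sketch of this objective, the Sharpe ratio of a series of portfolio returns can be computed as the mean return divided by the standard deviation of the returns. The function name `sharpe_ratio`, the default risk-free rate of zero, and the use of the population standard deviation are assumptions made for illustration; the source does not specify them.

```python
import math


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its population standard deviation.

    `risk_free` is a hypothetical per-period risk-free rate (assumed 0 here).
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / len(excess)
    if variance == 0.0:
        raise ValueError("returns have zero variance; Sharpe ratio undefined")
    return mean / math.sqrt(variance)
```

For example, the returns `[0.01, 0.03]` have a mean of 0.02 and a standard deviation of 0.01, so the ratio is about 2.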

Meanwhile, the portfolio management model, or agent, that uses recurrent reinforcement learning in the present invention requires definitions of state, action, and reward, which are preferably defined as follows.

Agent's action

Action vector: the asset allocation weight vector, at time t, over the individual assets in the portfolio.

Action component: the asset allocation weight, at time t, of one individual asset in the portfolio.

Agent's state

State input vector: the state input to the portfolio management agent at time t. It takes one of two forms depending on whether predicted values are used.

With predicted values: for each individual asset at time t, the vector of daily asset-price returns over a given number of past days, together with the vector of predicted values over a given number of future days.

Without predicted values: for each individual asset at time t, the vector of daily asset-price returns over the given number of past days only.

Past return vector: the vector of daily returns over the given number of past days, taken at time t.

Daily return: the daily rate of return of one individual asset at time t.

Asset price: the price of one individual asset at time t.

Predicted value vector: for one individual asset, the vector of predicted values over the future interval starting at time t, generated according to the prediction accuracy. Each predicted value is set to 1 for a predicted rise and -1 for a predicted fall.
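The two forms of the state described above can be sketched as follows for a single asset. This is an illustration under stated assumptions: the names `daily_returns` and `build_state`, the use of simple returns (price ratio minus one), and the plain concatenation of past returns with predicted values are hypothetical choices, since the source does not reproduce the exact formulas.

```python
def daily_returns(prices):
    """Simple daily returns p[k] / p[k-1] - 1 (assumed return definition)."""
    return [prices[k] / prices[k - 1] - 1.0 for k in range(1, len(prices))]


def build_state(prices, t, past_days, predictions=None):
    """State of one asset at index t.

    Uses the last `past_days` daily returns ending at t. If `predictions`
    (a list of +1 / -1 values for future days) is given, it is appended,
    which corresponds to the "with predicted values" case.
    """
    if t < past_days or t >= len(prices):
        raise ValueError("not enough price history for the requested window")
    window = prices[t - past_days : t + 1]  # past_days + 1 prices
    state = daily_returns(window)           # past_days returns
    if predictions is not None:
        state = state + list(predictions)
    return state
```

With a state length of 20 days, `build_state(prices, t, 20)` would return 20 past returns, and passing a prediction vector extends the state by the number of predicted days.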

Agent's reward (objective function)

Objective function: the Sharpe ratio of the portfolio returns produced by the agent's actions over the total management period T.

Portfolio return: the return of the portfolio at time t.

Adjusted allocation weights: the asset allocation weights at time t after they have changed as a result of asset price movements.
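The portfolio return and the allocation weights adjusted by price movements can be illustrated with a short sketch. The function names and the formulas used here (a weighted sum of asset returns, and weights rescaled by each asset's growth factor) are standard textbook definitions assumed for illustration; the source does not reproduce its own equations, and transaction costs are ignored.

```python
def portfolio_return(weights, asset_returns):
    """Weighted sum of individual asset returns for one period."""
    if len(weights) != len(asset_returns):
        raise ValueError("weights and returns must have the same length")
    return sum(w * r for w, r in zip(weights, asset_returns))


def drifted_weights(weights, asset_returns):
    """Weights after prices move, before any rebalancing.

    Each holding grows by (1 + r); the result is renormalized to sum to 1.
    """
    grown = [w * (1.0 + r) for w, r in zip(weights, asset_returns)]
    total = sum(grown)
    if total <= 0.0:
        raise ValueError("portfolio value must stay positive")
    return [g / total for g in grown]
```

For equal weights `[0.5, 0.5]` and returns `[0.1, -0.1]`, the portfolio return is zero while the weights drift to roughly `[0.55, 0.45]`, which is why the adjusted weights are tracked separately from the weights chosen by the agent.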

<Experiments to verify effectiveness>

To verify the effectiveness of the method for improving portfolio asset allocation performance using recurrent reinforcement learning according to the present invention, the following experiments were carried out on real data.

(1) Experimental data

As shown in FIG. 4, a total of 25 data series were used: 7 index series, 8 domestic stock series, and 10 overseas stock series. Sets of five assets were formed to construct five portfolios, and all experiments were compared using the average performance over the five portfolios. All data were daily data. The training period ran from October 17, 2012 to January 3, 2014, and the test period ran from January 6, 2014 to March 26, 2015. The management period was fixed at 20 days. The Sharpe ratio of each asset over the test period is shown in FIG. 5, and the average Sharpe ratio of the five portfolios over the test period is shown in FIG. 6.
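The train/test split by date described above can be expressed in a few lines. The dates come from the text; the function name `split_by_date` and the representation of the data as `(date, value)` pairs are assumptions made for this sketch.

```python
from datetime import date

# Periods stated in the experimental setup.
TRAIN_START, TRAIN_END = date(2012, 10, 17), date(2014, 1, 3)
TEST_START, TEST_END = date(2014, 1, 6), date(2015, 3, 26)


def split_by_date(rows):
    """Split (date, value) pairs into training and test lists.

    Rows outside both periods are dropped.
    """
    train = [r for r in rows if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in rows if TEST_START <= r[0] <= TEST_END]
    return train, test
```

Both boundaries are inclusive, so a row dated January 3, 2014 falls in the training set and a row dated January 6, 2014 falls in the test set.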

(2) Experimental environment

The experiments were run on an Intel Xeon 3.50 GHz CPU with 128 GB of DRAM and an NVIDIA GTX 1080. The experimental program was written in Python with TensorFlow.

(3) Experiment on the optimal number of unfolds

An experiment was conducted to find the optimal number of unfolds for the recurrent reinforcement learning model implemented with an LSTM. Only the number of unfolds was varied, and the state length was fixed at the past 60 days from the current time. Unfold numbers of 3, 5, 8, 10, and 12 were tested. FIG. 7 shows the performance as a function of the number of unfolds in this experiment. As shown in FIG. 7, the highest performance was obtained with an unfold number of 3.

(4) Experiment on the optimal state length

Following the result of the unfold experiment, the number of unfolds was fixed at 3, and state lengths of the past 20, 40, 60, 80, 100, and 120 days from the current time were tested to find the optimal state length. FIG. 8 shows the performance as a function of state length in this experiment. As shown in FIG. 8, a state length of the past 20 days gave the best performance.

(5) Performance comparison between recurrent reinforcement learning and other algorithms

To analyze the baseline performance of recurrent reinforcement learning, the optimal number of unfolds and the optimal state length found in the experiments above were applied, and the model was compared with the Markowitz model and the 1/N portfolio. FIG. 9 shows the performance of each algorithm in this experiment. As shown in FIG. 9, the recurrent reinforcement learning algorithm gave the best performance.

(6) Experiment applying asset predicted values

The performance of the recurrent reinforcement learning model was tested by applying asset predicted value data on top of the settings found above. The number of unfolds was 3, the state length was 20 days, and the prediction accuracy was varied from 38% to 64% in 2% steps. FIG. 10 plots the rate of change relative to the Sharpe ratio of recurrent reinforcement learning using only past asset data. As shown in FIG. 10, a meaningful improvement was observed not only at high prediction accuracy but also at low prediction accuracy.
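One way to generate virtual predictions at a chosen accuracy is sketched below. The source states only that predictions are 1 for a rise and -1 for a fall, measured relative to the start of the management period, and that they are generated according to a prediction accuracy between 38% and 64%. Everything else here is an assumption for illustration: the function names, the rule that exactly `round(accuracy * n)` entries are made correct, the seeding, and the treatment of an unchanged price as a fall.

```python
import random

# Accuracy levels from the experiment: 38% to 64% in 2% steps.
ACCURACY_LEVELS = [a / 100 for a in range(38, 65, 2)]


def true_directions(future_prices, start_price):
    """+1 if a future price is above the price at the start of management, else -1."""
    return [1 if p > start_price else -1 for p in future_prices]


def virtual_predictions(directions, accuracy, seed=0):
    """Return +1 / -1 predictions that match `directions` in a fixed share of entries.

    Exactly round(accuracy * n) randomly chosen entries are correct;
    the remaining entries are flipped.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    n = len(directions)
    n_correct = round(accuracy * n)
    rng = random.Random(seed)
    correct_idx = set(rng.sample(range(n), n_correct))
    return [d if i in correct_idx else -d for i, d in enumerate(directions)]
```

Fixing the number of correct entries, rather than flipping each entry independently, keeps the realized accuracy equal to the target on every run, which makes comparisons across accuracy levels less noisy.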

(7) Experiment applying artificially generated data

The Gaussian process parameter was varied over several levels, and the data generated at each level were used for training together with the original data. FIG. 11 shows the change in Sharpe ratio obtained by applying artificially generated data, relative to the Sharpe ratio of recurrent reinforcement learning using only past asset data. As shown in FIG. 11, with the ratio of original data to artificially generated data set to 1:1, performance improved in every case in which artificially generated data were applied.
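The claims describe the Gaussian process covariance as a squared exponential kernel with a noise model, involving a Kronecker delta and a period in days, but the equation itself is not reproduced in this text. The sketch below therefore uses the textbook form of that kernel as an assumption: a squared exponential term plus a noise variance added on the diagonal. The function name and the parameter names `length_scale`, `noise_std`, and `signal_std` are hypothetical, and no claim is made about which of them corresponds to the parameter varied in the experiment.

```python
import math


def se_kernel_with_noise(times, length_scale, noise_std, signal_std=1.0):
    """Covariance matrix K[i][j] = s^2 * exp(-(t_i - t_j)^2 / (2 l^2)) + n^2 * delta_ij.

    `times` are observation times (e.g. day indices). The Kronecker delta
    term adds the noise variance only on the diagonal.
    """
    if length_scale <= 0.0:
        raise ValueError("length_scale must be positive")
    n = len(times)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = times[i] - times[j]
            K[i][j] = signal_std ** 2 * math.exp(-d * d / (2.0 * length_scale ** 2))
            if i == j:
                K[i][j] += noise_std ** 2
    return K
```

A larger noise term loosens how closely sampled series must follow the observed prices, so varying it is one plausible way to obtain artificial series that differ in how far they depart from the original data.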

(8) Performance experiment on combinations of artificially generated data

In experiment (7) the ratio of original data to artificially generated data was 1:1, whereas this experiment combined the original data with artificially generated data produced with two different parameter values, mixing the three sets at a ratio of 1:1:1. FIG. 12 shows how performance changes with the combination of artificially generated data, expressed as the rate of change relative to the Sharpe ratio of recurrent reinforcement learning using only past asset data.

(9) Performance experiment on increasing the ratio of artificially generated data

For the case that showed the best performance in experiments (7) and (8), namely the original data combined with a first parameter value of 0.002 and a second parameter value of 0.008, an experiment was conducted in which the ratio of original data to artificially generated data was adjusted, in order to examine how increasing the proportion of artificially generated data affects performance. FIG. 13 lists the ratios of artificially generated data that were tested. The results are shown in FIG. 14, which plots, for each ratio, the rate of change relative to the Sharpe ratio of recurrent reinforcement learning using only past asset data. As shown in FIG. 14, the ratios 1:1:1 and 1:3:3 give the highest results, and the 1:3:3 ratio shows a performance improvement of about 34%.
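A mixing ratio such as 1:3:3 can be read as the number of series contributed by the original data and by each of the two artificial-data settings. The sketch below follows that reading, which is an assumption: the source does not say how the ratio is realized. The function name `mix_training_series` and the generator callbacks `make_a` and `make_b` are hypothetical stand-ins for the two Gaussian process settings.

```python
def mix_training_series(original, make_a, make_b, ratio=(1, 3, 3)):
    """Build a training set from original series and two artificial generators.

    For each original series, the result contains ratio[0] copies of it,
    ratio[1] series from make_a(series), and ratio[2] series from make_b(series).
    """
    n_orig, n_a, n_b = ratio
    if min(n_orig, n_a, n_b) < 0:
        raise ValueError("ratio entries must be non-negative")
    mixed = []
    for series in original:
        mixed.extend(list(series) for _ in range(n_orig))
        mixed.extend(make_a(series) for _ in range(n_a))
        mixed.extend(make_b(series) for _ in range(n_b))
    return mixed
```

Under this reading, `ratio=(1, 1, 1)` triples the amount of training data and `ratio=(1, 3, 3)` yields seven series for every original one.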

(10) Experimental results

The experimental results show that using asset predicted value data and artificially generated data for portfolio asset allocation with recurrent reinforcement learning contributed substantially to performance, with an improvement of up to about 34%.

Although the present invention has been described with reference to the examples above, it is not necessarily limited to these examples and may be modified in various ways without departing from its technical idea. Accordingly, the examples disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by them. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present invention.

Claims

A method, performed by an information system, for improving portfolio asset allocation performance through recurrent reinforcement learning on past asset data, asset predicted value data, and artificially generated data, the method comprising:
generating, by an asset predicted value data generation module of the information system, the asset predicted value data according to a prediction accuracy within a predetermined range;
generating, by a Gaussian process module of the information system, the artificially generated data by applying a Gaussian process to the past asset data;
receiving, by an LSTM (Long Short-Term Memory) of the information system, the past asset data, the asset predicted value data, and the artificially generated data as portfolio management information passed through the hidden state and cell state of the LSTM;
obtaining, by a recurrent reinforcement learning model training module of the information system, with each step of the unfolded LSTM corresponding to one time point, the asset allocation weights at time t from the LSTM at time t;
obtaining, by the recurrent reinforcement learning model training module of the information system, the portfolio return at time t from the asset allocation weights, and the objective function from the portfolio returns up to time T; and
adjusting, by the recurrent reinforcement learning model training module of the information system, the internal weights so as to maximize the objective function,
wherein the asset predicted value data are expressed as 1 for a predicted rise and -1 for a predicted fall relative to the start of management within the portfolio management period, the asset prediction accuracy is applied in 2% increments from 38% to 64%, and the asset predicted value data are added to the input of the recurrent reinforcement learning,
wherein the Gaussian process defines a single probability distribution using training data consisting of the asset prices observed at each time and a covariance function kernel,
wherein the covariance function kernel is a squared exponential kernel to which a noise model is applied, and is calculated by the following equation,

[Equation: squared exponential covariance kernel with noise model; not reproduced in this text]

in which one parameter relates to the uncertainty of the training data set, one term is the Kronecker delta, and one parameter is the period (days), and
wherein artificially generated data that differ for each period are generated by adjusting the values of these parameters.

(Deleted)

The method of claim 1,
wherein the objective function is maximized by means of the following derivative equation.