KR20210125773A

KR20210125773A - A real-time stock price prediction system using LSTM neural network and text miner

Info

Publication number: KR20210125773A
Application number: KR1020200043398A
Authority: KR
Inventors: 홍성혁; 고경일
Original assignee: 백석대학교산학협력단
Priority date: 2020-04-09
Filing date: 2020-04-09
Publication date: 2021-10-19
Also published as: KR102355255B1

Abstract

The present invention relates to a real-time stock price prediction system using a long short-term memory (LSTM) neural network and a text miner. The system trains the LSTM neural network on past stock price data, predicts the stock price with the trained network, and analyzes real-time stock market news through a text miner to apply a weight to the predicted stock price. The system comprises: a neural network module having an LSTM neural network; a neural network training unit that trains the LSTM neural network using past stock price data; a text miner that text-mines news data to extract mood data representing numerically whether the news is a favorable or an unfavorable factor; and a stock price prediction unit that inputs stock price data up to the current day into the LSTM neural network to obtain predicted stock price data, and weights the obtained predicted stock price data with the mood data to calculate final predicted stock price data. Because the system predicts the stock price through the LSTM neural network and weights it with the results of real-time news analysis from the text miner, it can predict the stock price more accurately by reflecting both the time-series behavior of the stock price and the constantly changing news environment.

Description

A real-time stock price prediction system using LSTM neural network and text miner

The present invention relates to a real-time stock price prediction system using an LSTM neural network and a text miner, which trains a long short-term memory (LSTM) neural network on past stock price data, predicts the stock price with the trained network, and analyzes real-time stock market news through a text miner to apply a weight to the predicted stock price.

In general, current stock price prediction systems either analyze past data and make inferences through time series [Patent Document 1], or use artificial intelligence to learn stock price fluctuation patterns and predict the future [Patent Document 2].

However, many factors affect stock prices, and it is virtually impossible to predict the future from past data alone. In addition, the stock market does not repeat past trading patterns; it is affected at every moment by new situations such as international relations, specific events, and remarks by famous people. Therefore, even if an ordinary deep learning algorithm is applied, it is almost impossible to predict stock prices that change from moment to moment.

That is, an ordinary artificial neural network performs its calculations on the assumption that all inputs and outputs are independent. In stock prices, however, past data matters: past stock prices influence future stock prices. It is therefore necessary to apply a learning method that reflects this characteristic well.

In addition, it is necessary to predict stock prices by reflecting not only the time series analysis of stock prices but also market conditions as conveyed by news.

[Patent Document 1] Korean Registered Patent Publication No. 10-1508361 (published 2015.04.08)
[Patent Document 2] Korean Registered Patent Publication No. 10-1458004 (published 2014.11.04)

[Non-Patent Document 1] Sepp Hochreiter, Jürgen Schmidhuber, "Long Short-Term Memory", Neural Computation, Vol. 9, Issue 8, November 1997, pp. 1735-1780

An object of the present invention is to solve the problems described above by providing a real-time stock price prediction system using an LSTM neural network and a text miner, which trains a long short-term memory (LSTM) neural network on past stock price data, predicts the stock price with the trained network, and analyzes real-time stock market news through a text miner to apply a weight to the predicted stock price.

More specifically, an object of the present invention is to provide a real-time stock price prediction system using an LSTM neural network and a text miner, which generates prediction data from the time-series nature of stock prices using an LSTM neural network, a type of recurrent neural network (RNN), analyzes real-time stock market news using a text miner (YTextMiner or the like) to judge the news as good or bad, and adjusts a weight when generating the prediction data so as to predict the optimal stock price in comparison with past time series data.

To achieve the above object, the present invention relates to a real-time stock price prediction system using an LSTM neural network and a text miner, comprising: a neural network module including an LSTM neural network; a neural network learning unit that trains the LSTM neural network using past stock price data; a text miner that text-mines news data to extract mood data indicating numerically whether the news is good news or bad news; and a stock price prediction unit that inputs stock price data up to the current day into the LSTM neural network to obtain the predicted stock price data that is output, and weights the obtained predicted stock price data with the mood data to calculate final predicted stock price data.

In addition, in the real-time stock price prediction system using an LSTM neural network and a text miner, the neural network learning unit performs a method comprising: (a) collecting time series data of past stock prices; (b) calculating the median value of each stock price; (c) calculating the increment of the median value of each stock price; (d) normalizing the calculated increments; (e) generating a sequential model from the time series data of the normalized increments; and (f) training the LSTM neural network with the sequential model.

In addition, in the real-time stock price prediction system using an LSTM neural network and a text miner, in step (c), the increment is normalized by applying a sigmoid function to the increment of the median value.

In addition, in the real-time stock price prediction system using an LSTM neural network and a text miner, the normalized increment is calculated by Formula 1 below.

[Formula 1]

$$ S(t) = \frac{1}{1 + e^{-\Delta M(t)}} $$

where

$$ \Delta M(t) = M(t) - M(t-1), \qquad M(t) = \frac{H(t) + L(t)}{2}, $$

t is the date, and H(t) and L(t) are the high and low prices of the stock price data on day t, respectively.

In addition, in the real-time stock price prediction system using an LSTM neural network and a text miner, in step (e), a plurality of batch sets are generated sequentially from the entire time series data of normalized increments S(-N+1), S(-N+2), ..., S(0), wherein one set is formed with a batch size of n+1 so as to constitute a series of N-n batch sets, and each batch set consists of n past stock price data values and the correct answer of the predicted stock price for that set, thereby constructing the sequential model.

As described above, according to the real-time stock price prediction system using an LSTM neural network and a text miner of the present invention, the stock price is predicted through the LSTM neural network and weighted with the results of real-time news analysis from the text miner. The prediction thus reflects both the time-series behavior of the stock price and the constantly changing news environment, so that the stock price can be predicted more accurately.

FIG. 1 is an exemplary configuration diagram of the overall system for implementing the present invention.
FIG. 2 is a block diagram of the configuration of a real-time stock price prediction system using an LSTM neural network and a text miner according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the structure of an LSTM neural network according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating the learning method of the neural network learning unit according to an embodiment of the present invention.
FIG. 5 is a table illustrating a stock price data set according to an embodiment of the present invention.
FIG. 6 is programming code for a stock price learning method according to an embodiment of the present invention.
FIG. 7 is programming code for a normalization method according to an embodiment of the present invention.
FIG. 8 is programming code for a training method according to an embodiment of the present invention.

Hereinafter, specific details for carrying out the present invention are described with reference to the drawings.

In describing the present invention, the same parts are denoted by the same reference numerals, and repeated description thereof is omitted.

First, the configuration of the overall system for implementing the present invention is described with reference to FIG. 1.

As shown in FIG. 1, the overall system for implementing the present invention consists of a user terminal 10, a stock price prediction server 30 that predicts stock prices, a stock price information server 50 that provides stock price information, and a news server 60 that provides news and the like. It may additionally include a database 40 that stores necessary data.

First, the user terminal 10 is a terminal used by a user, and is either an ordinary computer terminal with computing capability, such as a smartphone, tablet PC, notebook, or personal computer (PC), or a dedicated terminal.

The user terminal 10 can connect to the stock price prediction server 30 and receive the stock price prediction service that the server provides. That is, the user terminal 10 enters a stock item on the stock price prediction server 30, requests the expected stock price for that item, and receives it.

Preferably, the expected stock price consists of a highest target price, a lowest target price, and the like.

Next, the stock price information server 50 is an ordinary server that provides stock price information and is operated by a securities company, a stock price information provider, or the like. There may be a plurality of stock price information servers 50, and each may provide a different kind of stock price information.

Preferably, the stock price information consists of the high price, low price, closing price, and the like. Each piece of stock price information is organized as per-date data. That is, the stock price information includes the prices traded on the current or past trading days.

Next, the news server 60 is a server that provides news and is operated by a newspaper company, a broadcaster, a disclosure agency, or the like. The news server 60 provides news online.

The news information is news data written as text.

The news provided by the news server 60 may be classified by provider. That is, the news providers are the Financial Supervisory Service, newspaper companies, broadcasters, and the like.

Next, the stock price prediction server 30 is an ordinary server that predicts a stock price at the request of the user or the user terminal 10 and provides the predicted stock price information to the user. Preferably, it receives the name of a specific stock item, predicts the stock price of that item, and provides the predicted stock price data for that item.

In particular, the stock price prediction server 30 includes an LSTM neural network and trains it using past stock price information. It then inputs the current stock price information into the trained LSTM neural network and outputs predicted stock price data.

In addition, the stock price prediction server 30 may provide the predicted stock price data as a highest target price and a lowest target price. In this case, preferably, an error range is set around the predicted stock price to derive the highest target price and the lowest target price.
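
As an illustration of deriving the two target prices from an error range, the following minimal Python sketch assumes the error range is a symmetric relative margin around the predicted price. The function name `target_band` and the parameter `error_rate` are hypothetical; the specification does not state how the error range is defined.

```python
from typing import Tuple


def target_band(predicted_price: float, error_rate: float) -> Tuple[float, float]:
    """Return (lowest_target, highest_target) around a predicted price.

    Assumption: the error range is a symmetric relative margin, e.g.
    error_rate=0.02 means plus or minus 2% of the predicted price.
    """
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    lowest = predicted_price * (1.0 - error_rate)
    highest = predicted_price * (1.0 + error_rate)
    return lowest, highest


# Example: a predicted price of 60,000 with a 2% error range.
lowest_target, highest_target = target_band(60000.0, 0.02)
```

An absolute margin (a fixed price amount) would work the same way with addition and subtraction in place of the multiplications.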

In addition, the stock price prediction server 30 receives stock price information from the stock price information server 50 and uses it.

In addition, the stock price prediction server 30 includes a text miner and uses it to perform mining analysis on news. Through the text miner, it extracts a numerical value indicating whether the current news is positive or negative (a value for good news or bad news). This value is referred to as the news rating.

In particular, the name of the stock item is input to the text miner to extract the news rating for that item. The text miner extracts the news rating by analyzing news data from the news server 60.

In addition, the stock price prediction server 30 converts the news rating into a weight and applies that weight to the previously obtained predicted stock price data to calculate the final predicted stock price data.

Next, the configuration of the real-time stock price prediction system 30 according to an embodiment of the present invention is described with reference to FIG. 2.

The real-time stock price prediction system 30 according to the present invention may be implemented in the form of the stock price prediction server 30 described above. Alternatively, the real-time stock price prediction system 30 may be implemented in a server-client form, with a client module installed on the user terminal 10.

As shown in FIG. 2, the real-time stock price prediction system 30 according to an embodiment of the present invention consists of a neural network module 31 composed of an LSTM neural network, a neural network learning unit 32 that trains the neural network, a text miner 33 that analyzes news, and a stock price prediction unit 34 that predicts the stock price.

First, the neural network module 31 includes a long short-term memory (LSTM) neural network.

The LSTM neural network is a type of recurrent neural network (RNN) and has a structure in which past data can influence the future. The LSTM neural network is therefore the most suitable model for stock price prediction.

In a conventional RNN, the long-term dependency problem arises because, under the chain rule, values in [-1, 1] are multiplied together repeatedly during backpropagation, so the gradient becomes smaller toward the earlier time steps and eventually vanishes. A conventional RNN therefore suffers from parameters that are not updated. LSTM was proposed to solve this problem [Non-Patent Document 1].
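
The shrinking product can be made concrete with a small Python sketch. It assumes a single-unit RNN with a tanh activation, for which each backpropagated step contributes a factor w * tanh'(a); the function name `backprop_factor` and the numeric values are illustrative only.

```python
import math
from typing import List


def backprop_factor(weight: float, pre_activations: List[float]) -> float:
    """Product of the per-step factors w * tanh'(a) along a chain of time steps.

    For a single-unit RNN s_t = tanh(u * x_t + w * s_{t-1}), the derivative of
    s_t with respect to s_{t-1} is w * (1 - tanh(a_t)**2).
    """
    factor = 1.0
    for a in pre_activations:
        factor *= weight * (1.0 - math.tanh(a) ** 2)
    return factor


# The same per-step factor applied over 5 steps and over 50 steps.
short_chain = backprop_factor(0.9, [0.5] * 5)
long_chain = backprop_factor(0.9, [0.5] * 50)
```

With these illustrative values each step multiplies the gradient by roughly 0.7, so the contribution from 50 steps back is many orders of magnitude smaller than the contribution from 5 steps back.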

A conventional RNN computes the hidden state S_t in its hidden layer simply as S_t = tanh(U x_t + W S_{t-1}), whereas the LSTM involves a total of four calculation steps.

As shown in FIG. 3, there are four neural network layers in the hidden layer of the LSTM; they can be seen inside the central rectangle of FIG. 3. The key to the LSTM is the horizontal line running along the top of the cell (A): it passes through the entire chain with only minor linear operations, so information is carried to the next step largely unchanged.
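
The four calculations can be sketched for a single-unit cell in plain Python. This follows the commonly used LSTM formulation (forget gate, input gate, candidate value, output gate) and is not taken from FIG. 3; the parameter names in the dictionary `p` are hypothetical.

```python
import math
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def lstm_step(x: float, h_prev: float, c_prev: float,
              p: Dict[str, float]) -> Tuple[float, float]:
    """One time step of a single-unit LSTM cell; returns (h, c)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g   # cell state: the "horizontal line" through the chain
    h = o * math.tanh(c)     # hidden state passed to the next step
    return h, c


# With the forget gate open and the input gate closed, the cell state
# is carried forward almost unchanged.
params = {k: 0.0 for k in ("wf", "uf", "wi", "ui", "wg", "ug", "wo", "uo", "bg", "bo")}
params.update({"bf": 20.0, "bi": -20.0})
h, c = lstm_step(1.0, 0.0, 0.7, params)
```

The cell state update is additive (`f * c_prev + i * g`), which is why information can pass along the chain with little change when the forget gate is close to 1.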

In addition, the LSTM neural network is trained using the high and low prices of the stock as base values. In particular, for the high and low prices, the network is trained on past data over a specific period (for example, 30 days; hereinafter, the reflection period), predicts one day, and compares the prediction with the actual data. With each comparison, the training set moves forward in time in units of the reflection period (30 days as an example) as learning proceeds. In addition, real-time news information is analyzed and compared with similar past news to calculate the proportion by which such news affected the stock price at the time. The weight adjustment is then repeated to set the final target price, on which the decision to sell or buy the stock is based.

In addition, the LSTM neural network is trained for a single stock item. Therefore, to predict the price of a new (or different) item, the LSTM neural network is initialized and retrained with the past stock price data of that item.

Alternatively, in another embodiment, the neural network module 31 has a separate LSTM neural network for each stock item to be predicted. That is, it holds a plurality of LSTM neural networks, one per item, and each is trained with the past stock price data of its own item.

Next, the neural network learning unit 32 trains the LSTM neural network. That is, it collects past stock price data and trains the LSTM neural network with the collected data. In particular, preferably, the time series data of stock price information up to the day on which the prediction is made is fed into the LSTM neural network to train it.

The specific method by which the neural network learning unit 32 trains the LSTM neural network is shown in FIG. 4.

First, past stock price data is collected (S21). Stock price data is collected up to the day before the date to be predicted. For example, to predict the next day's stock price, past data up to the current day is collected.

Next, as shown in FIG. 5, the stock price prediction is based on the median value of past stock prices. The median value is obtained as the average of the high and low prices, as in the following equation (S22).

[Equation 1]

$$ M(t) = \frac{H(t) + L(t)}{2} $$

Here, t is the date, M(t) is the median value on day t, and H(t) and L(t) are the high and low prices on day t, respectively.

The table in FIG. 5 is a stock price data set for Samsung Electronics from February 2019 to February 2020.

Next, the increment for the following day is calculated, taking the stock price (or increment) on the reference day as 0 (S23).

In the example of FIG. 5, when No. 1 is the reference, the increment of the reference day is 0. In addition, the stock price is shifted to a value less than 1 and greater than -1, which represents the normalization process.

That is, first, the increment ΔM(t) of day t is obtained by the following equation.

[Equation 2]

$$ \Delta M(t) = M(t) - M(t-1) $$

That is, the average (median value) of the high price and the low price is calculated first, and then the increment is calculated.

Next, the increment ΔM(t) is normalized with a sigmoid function (S24). The normalization is as follows.

[Equation 3]

$$ S(t) = \frac{1}{1 + e^{-\Delta M(t)}} $$

Here, S(t) is the normalized increment on day t.
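
Steps S22 to S24 can be sketched in Python as follows. This is a minimal sketch under two assumptions that the text does not spell out: the increment is taken as the day-over-day difference of the median value, and the normalization is the standard logistic sigmoid. The function names and the sample prices are illustrative.

```python
import math
from typing import List


def midpoints(highs: List[float], lows: List[float]) -> List[float]:
    """S22: median value M(t) as the average of the daily high and low."""
    return [(h + l) / 2.0 for h, l in zip(highs, lows)]


def increments(mids: List[float]) -> List[float]:
    """S23: increment of M(t); the reference (first) day is set to 0."""
    return [0.0] + [mids[t] - mids[t - 1] for t in range(1, len(mids))]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def normalize(deltas: List[float]) -> List[float]:
    """S24: normalized increment S(t) obtained by applying the sigmoid."""
    return [sigmoid(d) for d in deltas]


# Three illustrative trading days (not taken from FIG. 5).
highs = [10.0, 11.0, 10.5]
lows = [9.0, 10.0, 9.5]
series = normalize(increments(midpoints(highs, lows)))
```

A rising median value maps to a value above 0.5 and a falling one to a value below 0.5; the reference day maps to exactly 0.5.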

Next, a sequential model is generated from the time series data of the normalized increments (S25).

Let the size of the entire time series data be N and the batch size be n. Here N > n, and preferably N and n are set to satisfy N > 2n, so that the number of batches exceeds the batch size.

The entire time series of normalized increments is data over a sequence of dates: S(-N+1), S(-N+2), ..., S(0). S(-k) is the normalized increment from k days ago, and S(0) is the normalized increment of the current day.

Batches are then generated with batch size n. For a total data size of N, the following N-n batches are generated.

[ S(-N+1), S(-N+2), S(-N+3), ..., S(-N+n) ], S(-N+n+1)

[ S(-N+2), S(-N+3), S(-N+4), ..., S(-N+n+1) ], S(-N+n+2)

......

[ S(-n-1), S(-n), S(-n+1), ..., S(-2) ], S(-1)

[ S(-n), S(-n+1), S(-n+2), ..., S(-1) ], S(0)

As above, the sequential model forms each set from the N consecutive time series data (normalized increments) with a set size of n+1, constructing the sets sequentially into a series of N-n batch sets. The last value of each batch set is used as the result value (true value) of that set. That is, each batch set consists of n past data values and the next day's value for that set (the true value of the price to be predicted, i.e., the correct answer).
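
The construction of the N-n batch sets is a sliding window over the series. The following Python sketch shows one way to build them; the function name `make_batch_sets` is hypothetical and integers stand in for the normalized increments.

```python
from typing import List, Sequence, Tuple


def make_batch_sets(series: Sequence[float], n: int) -> List[Tuple[List[float], float]]:
    """Slide a window of n inputs plus 1 target over the series.

    Returns N - n pairs (inputs, target), where N = len(series), inputs holds
    n consecutive values, and target is the value that immediately follows.
    """
    if n < 1 or len(series) <= n:
        raise ValueError("need 1 <= n < len(series)")
    return [(list(series[i:i + n]), series[i + n]) for i in range(len(series) - n)]


# N = 10 values and n = 3 give N - n = 7 batch sets.
batch_sets = make_batch_sets(list(range(10)), 3)
# First set: ([0, 1, 2], 3); last set: ([6, 7, 8], 9)
```

The last batch set always ends with the most recent value, which corresponds to S(0) in the notation above.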

Next, the LSTM neural network is trained using each batch set as one piece of training data (S26).

FIG. 6 is part of the stock price learning code written in Python. Seq_len denotes the training (learning) set, so it starts with No. 1 to No. 30 of FIG. 6. Thirty training sets are used, each being a data set of 30 days of stock trading. That is, the batch size is 30 (or 30 days), and the number of batches is 30.

After training, the price for the next day is predicted so that a stock investor can decide whether to sell or buy the stock.

Preferably, as shown in FIG. 7, the data set (or batch set) is normalized so that the stock price can be predicted well. The data is divided into training data and a test data set that validates the training in a 9:1 ratio, as in the code "row = int(round(result.shape[0] * 0.9))".

The 0.9 in FIG. 7 means that the training set is 90% of the data set, and 0.1 means that the test set is 10% of the data set. The ratio may vary depending on the stock to be predicted.
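
The 9:1 division can be sketched in plain Python as below. The row computation mirrors the quoted expression, with `len(...)` standing in for `result.shape[0]`; the function name `split_train_test` and the chronological (unshuffled) split are assumptions for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(batch_sets: Sequence[T],
                     train_ratio: float = 0.9) -> Tuple[List[T], List[T]]:
    """Split batch sets into a leading training part and a trailing test part."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be in (0, 1)")
    row = int(round(len(batch_sets) * train_ratio))
    return list(batch_sets[:row]), list(batch_sets[row:])


# 100 batch sets split 9:1 give 90 training sets and 10 test sets.
train_sets, test_sets = split_train_test(list(range(100)))
```

Because the split is made at a single index, no batch set appears in both parts.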

After the sequential model is built, training is started as shown in FIG. 8 to verify that the model fits. In FIG. 8, the batch size (Batch_size) is the number of data items that can be trained consecutively, and the epochs (Epochs) value is the number of repetitions over the data. For validation, the 90% training set and the 10% test set identified above are used. The data is divided into a training set and a test set used for validation, and the two sets must never overlap.
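
To make the two settings concrete, the following Python sketch enumerates the update steps of a training run. It assumes the common convention that one epoch is one full pass over the training data and that one batch is the group of samples processed in a single update step; it does not reproduce the code of FIG. 8, and the function name `iter_minibatches` is hypothetical.

```python
from typing import Iterator, Tuple


def iter_minibatches(num_samples: int, batch_size: int,
                     epochs: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (epoch, start, stop) index ranges, one per update step."""
    if num_samples < 1 or batch_size < 1 or epochs < 1:
        raise ValueError("all arguments must be positive")
    for epoch in range(epochs):
        for start in range(0, num_samples, batch_size):
            # The final batch of an epoch may be shorter than batch_size.
            yield epoch, start, min(start + batch_size, num_samples)


# 25 training samples, batch size 10, 2 epochs: 3 steps per epoch, 6 in total.
steps = list(iter_minibatches(25, 10, 2))
```

In an actual training loop, each yielded range would select the batch sets passed to one parameter update of the network.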

Next, the text miner collects news data for the stock item, performs text mining, and calculates mood data for the item.

Preferably, the text miner uses a commercially available tool such as YTextMiner. A text miner (YTextMiner or the like) is a tool for analyzing text; it analyzes real-time stock market news and determines, by keyword, whether the news relating to a stock item is good news or bad news.

Mood data is an indicator that expresses numerically whether news is good or bad for the stock item. For good news the mood value becomes larger, and for bad news it becomes smaller.

Preferably, the mood data (or mood value) takes a value between (1-α) and (1+α), where α has a value of 1% to 30%.
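
One way to obtain a mood value in this range is sketched below in Python. It assumes the text miner yields a sentiment score in [-1, 1] (negative for bad news, positive for good news) and maps it linearly; neither the score scale nor the linear mapping is given in the specification, and the name `mood_weight` is hypothetical.

```python
def mood_weight(sentiment: float, alpha: float = 0.1) -> float:
    """Map a sentiment score in [-1, 1] to a mood value in [1 - alpha, 1 + alpha].

    Assumption: sentiment is -1 for the worst news, 0 for neutral news and
    +1 for the best news. Scores outside the range are clamped.
    """
    if not 0.01 <= alpha <= 0.30:
        raise ValueError("alpha is expected to lie between 1% and 30%")
    clamped = max(-1.0, min(1.0, sentiment))
    return 1.0 + alpha * clamped


# Neutral news leaves the weight at 1; strongly good news with alpha = 10% gives 1.1.
neutral = mood_weight(0.0)
good = mood_weight(1.0, alpha=0.1)
```

A mood value of 1 leaves the predicted price unchanged, so α bounds how far news can move the prediction in either direction.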

Next, the stock price prediction unit 34 inputs the current stock price information into the neural network module 31, that is, the LSTM neural network, obtains its output value, and applies the mood data obtained from the text miner 33 to the output value as a weight to calculate the final predicted stock price information.

First, through the neural network module 31, time series data of stock price information (high and low prices) is input to the LSTM neural network to predict the stock price (or the next day's stock price).

Here, the stock price information is converted into normalized increment values before being input, and conversely the output value is inverse-transformed into a predicted stock price.
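
The inverse transformation can be sketched in Python as follows, assuming the forward transformation is a sigmoid applied to the day-over-day difference of the median value. Under that assumption the inverse is the logit of the network output added to the last known median value. The function name `inverse_transform` is hypothetical.

```python
import math


def inverse_transform(s_pred: float, last_mid: float) -> float:
    """Convert a predicted normalized increment back into a price level.

    Assumptions: S = sigmoid(delta) and delta = M(t) - M(t-1), so that
    delta = ln(S / (1 - S)) and the predicted M(t) = last_mid + delta.
    """
    if not 0.0 < s_pred < 1.0:
        raise ValueError("s_pred must lie strictly between 0 and 1")
    delta = math.log(s_pred / (1.0 - s_pred))
    return last_mid + delta


# An output of exactly 0.5 means no change from the last median value.
unchanged = inverse_transform(0.5, 100.0)
```

If a different normalization is used for the input, the inverse must be changed to match it.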

In addition, a predetermined error range is applied to obtain first predicted stock price information consisting of the highest and lowest target prices of the predicted stock price.

Then, through the text miner 33, mood data is calculated from the news data, and a weight is obtained from the calculated mood data.

The predicted stock price information is then weighted by this weight to obtain the final predicted stock price information.
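The last two steps, forming the target range and applying the weight, might look like the sketch below. The description states neither how the error range is expressed nor how the weight is combined with the prediction; a relative error rate and a simple multiplication by the mood value are assumed here, and `Forecast`, `first_forecast`, `final_forecast`, and `error_rate` are hypothetical names.

```python
from typing import NamedTuple


class Forecast(NamedTuple):
    low: float   # lowest target price
    high: float  # highest target price


def first_forecast(predicted: float, error_rate: float = 0.03) -> Forecast:
    """Spread the LSTM prediction by a relative error range (assumed form)."""
    return Forecast(predicted * (1.0 - error_rate), predicted * (1.0 + error_rate))


def final_forecast(first: Forecast, mood: float) -> Forecast:
    """Weight both target prices by the mood value (assumed multiplicative)."""
    return Forecast(first.low * mood, first.high * mood)
```

For example, a predicted price of 10,000 with a 5% error range gives targets of 9,500 and 10,500; a mood value of 1.1 then shifts them to about 10,450 and 11,550.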

That is, the real-time stock price prediction system 30 of the present invention trains the LSTM on time series data and adjusts its prediction with a weight. Because a stock prediction system based on deep learning alone cannot perceive a market that changes in real time, the news collected through real-time news mining is analyzed and taken into account in the final stock price prediction to improve accuracy.

As described above, the real-time prediction system of the present invention repeatedly trains an LSTM model on past data (for example, 30 days of data) and predicts the stock price one day ahead, while real-time stock market news is analyzed with YTextMiner so that a weight can be applied when the final stock price prediction is made.
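One way to picture "train on n past days, predict the next day" is a sliding window over the time series, sketched below. The function name `make_batch_sets` and the use of plain Python lists are assumptions for illustration; each window pairs n consecutive values with the value that follows them, which serves as the correct answer during training.

```python
def make_batch_sets(series, n):
    """Split a time series into (n past values, next value) training pairs."""
    if n < 1 or len(series) <= n:
        raise ValueError("need at least n + 1 values and n >= 1")
    # Window i covers series[i : i + n]; its label is series[i + n].
    return [(series[i:i + n], series[i + n]) for i in range(len(series) - n)]
```

A series of N values yields N - n such pairs, so 31 days of data with n = 30 gives exactly one training pair, and longer histories give one more pair for each additional day.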

That is, the real-time prediction system of the present invention uses a recurrent neural network (RNN) to perform time-series inference on past stock prices, and uses a text miner (YTextMiner or similar) to search real-time stock market news, analyze its effect on the stock price, and adjust the weight accordingly, thereby predicting the short-term stock price.

Prediction systems using artificial intelligence can be applied to many fields, including weather, climate change, earthquakes, typhoon paths, the spread of viruses, and vehicle movement, so research on predicting the future from past data is being actively pursued. Among these fields, stock price prediction is known as the one in which technology is being developed most actively, because of the size of the market. Predicting stock prices normally takes considerable effort, because the timing of trades has to be decided through continuous monitoring. However, by applying deep learning to analyze past data and by text-mining the news that affects stock prices, buying and selling can be carried out by an automated trading bot. As a result, buying and selling can proceed at the optimal time without unnecessary monitoring. In addition, stocks can be traded more efficiently if technical analysis is used to minimize the losses caused by emotional investing. There is therefore a need to develop an automated trading bot based on artificial intelligence.

In the United States, most trading is system trading carried out through automated stock trading programs. An automated stock trading program is not a machine that simply buys and sells stocks automatically without reason; it trades through an algorithm that has clear criteria based on technical analysis. This makes it possible to trade at a level approaching that of professional investors, to choose among the market price, bid price, and ask price according to the approach, to choose between split selling and lump-sum selling according to the trading period, and to carry out mechanical, AI-driven trades rather than emotional stop-loss or profit-taking. This approach also removes emotion, which is the most harmful factor in the stock market, and recovers the opportunity cost otherwise lost to sharp rises and falls at times when the investor cannot watch the market. In short, it is effective in terms of both expertise and time.

The invention made by the present inventors has been described above in detail with reference to embodiments; however, the present invention is not limited to those embodiments and may of course be modified in various ways without departing from its gist.

10: user terminal 30: stock price prediction server
31: neural network module 32: neural network training unit
33: text miner 34: stock price prediction unit
40: database 50: stock price information server
60: news server

Claims

1. A real-time stock price prediction system using an LSTM neural network and a text miner, comprising:
a neural network module having an LSTM neural network;
a neural network training unit that trains the LSTM neural network using past stock price data;
a text miner that extracts, by text-mining news data, mood data that numerically indicates whether the news is good news or bad news; and
a stock price prediction unit that inputs stock price data up to the current day into the LSTM neural network to obtain predicted stock price data as output, and calculates final predicted stock price data by weighting the obtained predicted stock price data with the mood data.

2. The system of claim 1,
wherein the neural network training unit performs a method comprising the steps of:
(a) collecting time series data of past stock prices;
(b) calculating a median value of each stock price;
(c) calculating an increment of the median value of each stock price;
(d) normalizing the calculated increments;
(e) generating a sequential model from the time series data of the normalized increments; and
(f) training the LSTM neural network with the sequential model.

3. The system of claim 2,
wherein, in step (c), the increment is normalized by applying a sigmoid function to the increment of the median value.

4. The system of claim 3,
wherein the normalized increment is calculated by Equation 1 below.
[Equation 1]
(equation not reproduced in this text)
where t is the date, and H(t) and L(t) are the high and low prices of the stock price data on day t, respectively.

5. The system of claim 2,
wherein, in step (e), the sequential model is constructed by sequentially generating a plurality of batch sets from the full time series of normalized increments S(-N+1), S(-N+2), ..., S(0), each set having a batch size of n+1, so that a series of N-n batch sets is formed, and each batch set consists of n past stock price data values and the correct answer for the predicted stock price corresponding to those values.