KR20200064198A

KR20200064198A - Stock prediction method and apparatus by analyzing news article by artificial neural network model

Info

Publication number: KR20200064198A
Application number: KR1020180146548A
Authority: KR
Inventors: 서찬웅; 김정일
Original assignee: 디비디스커버코리아 주식회사
Priority date: 2018-11-23
Filing date: 2018-11-23
Publication date: 2020-06-08
Also published as: KR102194200B1

Abstract

The present invention relates to a method and apparatus for predicting a stock index by analyzing news articles with an artificial neural network model, and more specifically to a method of predicting a daily stock index by inputting a plurality of news articles into a machine-learned artificial neural network model, and to an apparatus that performs the method. According to the present invention, instead of extracting from a news article a relation tuple (O1; P; O2) whose structure resembles subject (S), verb (V), and object (O), as Open IE does, word-level information, which is a lower-level unit than the relation tuple, is extracted, so that input with reduced information loss is provided to the artificial neural network. Because news article titles are relatively short and vary little in word count, using word-level information as input instead of the relation tuple form significantly reduces information loss and yields useful results. Moreover, because a recurrent neural network (RNN) or a long short-term memory (LSTM) model, which is a kind of RNN, is used as the artificial neural network, sequentially appearing data such as the words of a news article can be processed more accurately, and a more reliable result for the stock index can be derived.

Description

Stock prediction method and apparatus by analyzing news article by artificial neural network model

The present invention relates to a method and apparatus for predicting a stock index by analyzing news articles with an artificial neural network model, and more specifically to a method of inputting a plurality of news articles into a machine-learned artificial neural network model and predicting the daily stock index from that model, and to an apparatus that performs the method.

Many attempts have been made to predict stock indices using news articles. Early work analyzed news articles through word-frequency-based analysis or analysis based only on specific morphemes, but because these studies relied mainly on word frequency, they could not analyze the semantics of the actual text.

Recently, advances in natural language processing and deep learning algorithms have made it possible to extract specific entities from a sentence and to analyze words semantically. The neural-network-based Word2vec algorithm proposed by Mikolov in 2013 represents each word as values in a multi-dimensional space, which enables relative comparison of words. Methods have also been disclosed that quantify an article using word embedding or sentence embedding, that is, techniques that convert a word or sentence of a text into numerical values, and that use the result as input to a CNN (convolutional neural network) or DNN (deep neural network) model, both kinds of artificial neural network, to predict a stock index or the like.

In this case, to predict the daily stock index, the news articles generated on that day must be input. Because the number of articles generated per day changes from day to day, generating a fixed-size CNN input vector required computing the average of all article vectors generated in a day and using that average as the article vector for the day. Such an average is meaningful as a representative value that reflects all of the data, but the more diverse the characteristics of the data, the more likely the average is to lose its meaning as a representative value.
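The weakness of the daily average described above can be illustrated with a small example. The sketch below uses made-up two-dimensional article vectors (they are illustrative assumptions, not data from this document) and shows that a day with two strongly opposed articles and a day with three neutral articles collapse to the same averaged vector.

```python
def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]

# Hypothetical article vectors (illustrative values only).
positive_article = [1.0, -1.0]
negative_article = [-1.0, 1.0]
neutral_article = [0.0, 0.0]

# Day A: two strongly opposed articles. Day B: three neutral articles.
day_a = mean_vector([positive_article, negative_article])
day_b = mean_vector([neutral_article, neutral_article, neutral_article])

print(day_a)  # [0.0, 0.0]
print(day_b)  # [0.0, 0.0]
```

Both days produce the same fixed-size input even though their articles differ completely, which is the loss of representativeness the paragraph above refers to.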

Meanwhile, a model has been proposed that uses an NTN (neural tensor network), another kind of artificial neural network, to predict the rise or fall of stock prices through article-level scoring without generating a per-date average vector. Open IE (open information extraction) is applied to generate the NTN input; Open IE extracts relation tuples (O1; P; O2) from general natural language text. NTN has also been used to generate input data in the existing DNN-based and CNN-based approaches. However, data loss occurs in the process of extracting the O1, P, O2 triplets, whose structure resembles subject (S), verb (V), and object (O). That is, news article titles contain many abbreviations and words with implied meanings, which makes preprocessing difficult, and because the model must process real-time data rather than fixed data, such data loss is hard to avoid.

US 7467108 B2

The present invention was devised to solve these problems. Instead of extracting from a news article a relation tuple (O1; P; O2) whose structure resembles subject (S), verb (V), and object (O), as Open IE does, it extracts word-level information, which is a lower-level unit, and thereby provides the artificial neural network with input that has reduced information loss. Because news article titles are relatively short and vary little in word count, using word-level information as input rather than the relation tuple form greatly reduces information loss and yields useful results. In addition, by using an RNN (recurrent neural network) or an LSTM (long short-term memory) model, which is a kind of RNN, as the artificial neural network, sequentially appearing data such as the words of a news article is processed more accurately. The object of the invention is thus to derive more reliable results for the stock index.

To achieve this object, a method for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention includes: (a) receiving, at a first artificial neural network model (hereinafter, the '1-1 model'), news article title sentences on which data preprocessing has been performed; (b) outputting, from the 1-1 model, an embedding vector for each news article title sentence (hereinafter, a 'sentence embedding vector'); (c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter, the '1-2 model'); and (d) outputting a stock index prediction result from the 1-2 model.

According to another aspect of the present invention, a method for predicting a stock index by analyzing news articles using an artificial neural network model includes: (a) receiving, at a first artificial neural network model (hereinafter, the '2-1 model'), news article sentences on which data preprocessing has been performed; (b) outputting, from the 2-1 model, an embedding vector for each news article sentence (hereinafter, a 'sentence embedding vector'); (c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter, the '2-2 model'); (d) computing, in the 2-2 model, an embedding vector for each article composed of a plurality of sentence embedding vectors (hereinafter, an 'article embedding vector'); (e) receiving each article embedding vector at a third artificial neural network model (hereinafter, the '2-3 model'); and (f) outputting a stock index prediction result from the 2-3 model.

The news article title sentence received by the 1-1 model, or the news article sentence received by the 2-1 model, may each consist of a specific number of word embedding vectors.

Each of the artificial neural network models may be configured as any one of a uni-directional RNN (recurrent neural network), a uni-directional LSTM (long short-term memory), a uni-directional GRU (gated recurrent unit), a bi-directional RNN, a bi-directional LSTM, and a bi-directional GRU.

The output stock index prediction result may be whether the stock index rises or falls, the probability that the stock index rises or falls, or the value of the stock index itself.

The 1-1 model and the 1-2 model may be formed by training with layers arranged in the same way as the 1-1 model and the 1-2 model.

The 2-1 model, the 2-2 model, and the 2-3 model may be formed by training with layers arranged in the same way as the 2-1 model, the 2-2 model, and the 2-3 model.

According to yet another aspect of the present invention, an apparatus for predicting a stock index by analyzing news articles using an artificial neural network model includes at least one processor and at least one memory storing computer-executable instructions, wherein the computer-executable instructions stored in the at least one memory cause the at least one processor to execute a step of receiving news articles and outputting a stock index prediction result using an artificial neural network model.

The step of receiving news articles and outputting a stock index prediction result using an artificial neural network model may include: (a) receiving, at a first artificial neural network model (hereinafter, the '1-1 model'), news article title sentences on which data preprocessing has been performed; (b) outputting, from the 1-1 model, an embedding vector for each news article title sentence (hereinafter, a 'sentence embedding vector'); (c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter, the '1-2 model'); and (d) outputting a stock index prediction result from the 1-2 model.

Alternatively, the step of receiving news articles and outputting a stock index prediction result using an artificial neural network model may include: (a) receiving, at a first artificial neural network model (hereinafter, the '2-1 model'), news article sentences on which data preprocessing has been performed; (b) outputting, from the 2-1 model, an embedding vector for each news article sentence (hereinafter, a 'sentence embedding vector'); (c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter, the '2-2 model'); (d) computing, in the 2-2 model, an embedding vector for each article composed of a plurality of sentence embedding vectors (hereinafter, an 'article embedding vector'); (e) receiving each article embedding vector at a third artificial neural network model (hereinafter, the '2-3 model'); and (f) outputting a stock index prediction result from the 2-3 model.

According to yet another aspect of the present invention, a computer program for predicting a stock index by analyzing news articles using an artificial neural network model is stored in a non-transitory storage medium and includes instructions that cause a processor to execute: (a) receiving, at a first artificial neural network model (hereinafter, the '1-1 model'), news article title sentences on which data preprocessing has been performed; (b) outputting, from the 1-1 model, an embedding vector for each news article title sentence (hereinafter, a 'sentence embedding vector'); (c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter, the '1-2 model'); and (d) outputting a stock index prediction result from the 1-2 model.

A computer program for predicting a stock index by analyzing news articles using an artificial neural network model is stored in a non-transitory storage medium and includes instructions that cause a processor to execute: (a) receiving, at a first artificial neural network model (hereinafter, the '2-1 model'), news article sentences on which data preprocessing has been performed; (b) outputting, from the 2-1 model, an embedding vector for each news article sentence (hereinafter, a 'sentence embedding vector'); (c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter, the '2-2 model'); (d) computing, in the 2-2 model, an embedding vector for each article composed of a plurality of sentence embedding vectors (hereinafter, an 'article embedding vector'); (e) receiving each article embedding vector at a third artificial neural network model (hereinafter, the '2-3 model'); and (f) outputting a stock index prediction result from the 2-3 model.

According to the present invention, instead of extracting from a news article a relation tuple (O1; P; O2) whose structure resembles subject (S), verb (V), and object (O), as Open IE does, word-level information, which is a lower-level unit, is extracted, so that input with reduced information loss is provided to the artificial neural network. Because news article titles are relatively short and vary little in word count, using word-level information as input rather than the relation tuple form greatly reduces information loss and yields useful results. In addition, using an RNN (recurrent neural network) or an LSTM (long short-term memory) model, which is a kind of RNN, as the artificial neural network allows sequentially appearing data such as the words of a news article to be processed more accurately, with the effect that more reliable results for the stock index can be derived.

FIG. 1 is a diagram showing the structures of an RNN (recurrent neural network) and a bi-directional RNN as artificial neural networks.
FIG. 2 is a schematic diagram showing the process according to a first embodiment of the method for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention.
FIG. 3 is a flowchart of the first embodiment of the method for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention.
FIG. 4 is a schematic diagram showing the process according to a second embodiment of the method for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention.
FIG. 5 is a flowchart of the second embodiment of the method for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention.
FIG. 6 is a diagram showing the configuration of the apparatus for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Before that, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concept of a term in order to describe his or her own invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

FIG. 1 is a diagram showing the structures of an RNN (recurrent neural network) and a bi-directional RNN as artificial neural networks.

As described above, the present invention constructs a deep learning network that lifts units at a level lower than S, V, and O (word units) into an article-level layer, and lifts the layer representing article units into a date-level layer. In addition, because the number of inputs varies widely at both the word level and the article level, the model is built with a dynamic RNN (or LSTM or GRU) and is configured to process sequence data without zero-padding. In particular, although a uni-directional RNN as in FIG. 1(a) may be used, the model can perform its analysis after all of the information has been received, so it is preferable to use a bi-directional RNN structure as in FIG. 1(b).
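The variable-length, bi-directional processing described above can be sketched as follows. This is a minimal illustration that assumes a vanilla tanh RNN cell with random, untrained weights in place of the GRU or LSTM cells the document prefers; the names `run_rnn`, `encode_bidirectional`, and `params` are hypothetical. It shows that sequences of different lengths are consumed as they are, without zero-padding, and that each yields a fixed-size code formed by concatenating a forward pass and a backward pass.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Random weight matrix; a trained model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def run_rnn(w_in, w_rec, sequence):
    """Run a vanilla tanh RNN over a sequence of any length; return the last state."""
    state = [0.0] * len(w_rec)
    for x in sequence:
        state = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[i], x))
                + sum(w * si for w, si in zip(w_rec[i], state))
            )
            for i in range(len(state))
        ]
    return state

def encode_bidirectional(params, sequence):
    """Concatenate the final states of a forward pass and a backward pass."""
    forward = run_rnn(params["fwd_in"], params["fwd_rec"], sequence)
    backward = run_rnn(params["bwd_in"], params["bwd_rec"], sequence[::-1])
    return forward + backward

rng = random.Random(0)
INPUT_SIZE, HIDDEN_SIZE = 3, 4  # illustrative sizes, not taken from the document
params = {
    "fwd_in": make_matrix(HIDDEN_SIZE, INPUT_SIZE, rng),
    "fwd_rec": make_matrix(HIDDEN_SIZE, HIDDEN_SIZE, rng),
    "bwd_in": make_matrix(HIDDEN_SIZE, INPUT_SIZE, rng),
    "bwd_rec": make_matrix(HIDDEN_SIZE, HIDDEN_SIZE, rng),
}

# Two inputs of different lengths (2 and 5 vectors), with no padding applied.
short_title = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
long_title = short_title + [[0.7, 0.8, 0.9], [0.3, 0.2, 0.1], [0.0, 0.5, 0.2]]

short_code = encode_bidirectional(params, short_title)
long_code = encode_bidirectional(params, long_title)
print(len(short_code), len(long_code))  # 8 8
```

Because the backward pass needs the whole sequence before it can start, this structure only suits settings where all inputs are available before analysis, which matches the reasoning given for preferring the bi-directional form.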

As described below with reference to FIG. 2, in the first artificial neural network model 310, which encodes word units, the sequence follows the order of the words that make up a sentence. In the second artificial neural network model 320, which encodes article units, the sequence is ordered by article creation time.

FIG. 2 is a schematic diagram showing the process according to the first embodiment of the method for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention, and FIG. 3 is a flowchart of the first embodiment of that method.

News articles are input to the artificial neural network model 310, which has the structure of an RNN (recurrent neural network) or an LSTM (long short-term memory), a kind of RNN. The input news articles are data that has undergone data preprocessing. In the stock index prediction method and apparatus (200, see FIG. 6) of the present invention, the data preprocessing application (500, see FIG. 6) automatically performs this preprocessing on the collected news articles (S301).

In the data preprocessing step, articles with duplicate timestamps, titles, and the like are first removed from the collected news articles; special characters, digits, and the like may then be removed and the text converted to lowercase. The converted data then undergoes text normalization (lemmatization, stemming), and the number of words in each sentence is made uniform. For example, each sentence may be processed so that it consists of three words. This is only one example, and a suitable number of words can be chosen. Finally, the maximum sequence length of the sentences to be input is computed and padding is performed. The news articles preprocessed in this way are used as input to the artificial neural network model 310 (S302).
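The preprocessing steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions: duplicates are detected by exact title match only, lemmatization and stemming are omitted, sentences are made uniform by keeping the first few words, and the function name `preprocess_titles` and the `<pad>` token are hypothetical.

```python
import re

def preprocess_titles(titles, words_per_sentence=3, pad_token="<pad>"):
    """Deduplicate, clean, lowercase, and fix the length of news titles."""
    # Drop exact duplicate titles while keeping the original order.
    unique_titles = list(dict.fromkeys(titles))
    sentences = []
    for title in unique_titles:
        # Lowercase, then replace digits and special characters with spaces.
        cleaned = re.sub(r"[^a-z\s]", " ", title.lower())
        # Keep the first N words (an assumed rule for making lengths uniform).
        tokens = cleaned.split()[:words_per_sentence]
        # Pad short sentences up to the fixed length.
        tokens += [pad_token] * (words_per_sentence - len(tokens))
        sentences.append(tokens)
    return sentences

titles = [
    "Fed Raises Rates!",
    "Fed Raises Rates!",
    "Oil prices fall 3% on supply news",
    "Stocks rally",
]
for sentence in preprocess_titles(titles):
    print(sentence)
# ['fed', 'raises', 'rates']
# ['oil', 'prices', 'fall']
# ['stocks', 'rally', '<pad>']
```

A production pipeline would also compare timestamps when removing duplicates and would apply a real lemmatizer or stemmer before fixing the sentence length.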

The first embodiment of FIGS. 2 and 3 shows the case in which each news article input to the artificial neural network model consists of a single sentence. That is, for example, the one-sentence title 10 of a news article constitutes that news article. In the example of FIGS. 2 and 3, each sentence 10, composed of three words 11, 12, and 13, is the title of a different article, and each such sentence (article title) constitutes one news article.

Each word 11, 12, 13 of the model input is a word embedding vector generated by matching the words that remain after the final filtering of the data preprocessing step against an embedding table, and the sequence of these word embedding vectors is the input to the artificial neural network model. The three circles shown for each word indicate the dimension of the embedding vector. The drawing shows three dimensions for convenience, but the dimension is not limited to this and may be an n-dimensional embedding vector depending on the configuration. Words can be embedded into embedding vectors in two ways. The first is to match words to embedding vectors using word embeddings that have already been pre-trained. The second is to not use pre-trained word embeddings, but instead to choose a dimension arbitrarily and train the word embedding dictionary together with the model. The present invention uses the second method by default, but this does not exclude the first method, which may also be adopted.
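The embedding-table lookup described above can be sketched as follows. The sketch corresponds to the starting point of the second method: each word receives a randomly initialized vector of a chosen dimension, which training would then adjust together with the rest of the model (the training itself is not shown). The function names and the three-dimensional size are illustrative assumptions.

```python
import random

def build_embedding_table(sentences, dim=3, seed=0):
    """Assign each distinct word a randomly initialized vector of size dim."""
    rng = random.Random(seed)
    table = {}
    for sentence in sentences:
        for word in sentence:
            if word not in table:
                table[word] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return table

def embed_sentence(sentence, table):
    """Turn a list of words into a sequence of word embedding vectors."""
    return [table[word] for word in sentence]

sentences = [
    ["fed", "raises", "rates"],
    ["oil", "prices", "fall"],
    ["fed", "cuts", "rates"],
]
table = build_embedding_table(sentences)
embedded = embed_sentence(sentences[2], table)
print(len(table), len(embedded), len(embedded[0]))  # 7 3 3
```

Under the first method, `build_embedding_table` would instead load vectors from an existing pre-trained embedding, and the lookup step would stay the same.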

As shown in FIGS. 2 and 4, the artificial neural network models 310, 320, 410, 420, and 430 of the present invention are illustrated as using a bi-directional GRU (gated recurrent unit), but a bi-directional RNN (bi-directional dynamic RNN), a bi-directional LSTM, or the like may also be used. In some cases a uni-directional RNN, uni-directional LSTM, or uni-directional GRU may be used, but a bi-directional RNN, bi-directional LSTM, or bi-directional GRU is preferable. Among these, it is more preferable to use an LSTM or a GRU, both of which evolved from the RNN, and the models 310, 320, 410, 420, and 430 in FIGS. 2 and 4 are each configured as a bi-directional GRU, which has an advantage over the LSTM in processing speed. Accordingly, the boxes in each of the models 310, 320, 410, 420, and 430 each denote a GRU cell.

From this input, the data output by the first artificial neural network model of the first embodiment (hereinafter, the '1-1 model') 310 is an embedding vector 20 for each sentence, that is, a sentence embedding vector 20 (S303). FIG. 2 shows the sequence of sentence embedding vectors 20 as the output. In the first embodiment of FIG. 2, as described above, each sentence embedding vector therefore serves as the embedding vector representing its article.

The sequence of these sentence (that is, article) embedding vectors 20 becomes the input to the second artificial neural network model (hereinafter, the '1-2 model') 320, which has an RNN or LSTM structure (S304). The output 30 of the 1-2 model 320 is then the rise-or-fall result 30 for the stock index, analyzed from all of the input articles (S305).

As an example, the output 30 of the 1-2 model 320 may be whether the stock index rises or falls, or the probability that it rises or falls. That is, probabilities that sum to 100% may be computed for each outcome, for example a 63% probability of a rise and a 37% probability of a fall.
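One common way to obtain rise and fall probabilities that sum to 100% is to apply a softmax to two raw scores. The document does not state which output function the model uses, so the sketch below is an assumption, and the two score values are made up for illustration.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]  # shift for stability
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical raw scores from the final layer, in the order [rise, fall].
scores = [0.53, 0.0]
p_rise, p_fall = softmax(scores)

# The rise-or-fall decision is the outcome with the larger probability.
label = "rise" if p_rise >= p_fall else "fall"
print(f"rise {p_rise:.0%}, fall {p_fall:.0%}, prediction: {label}")
```

The same two probabilities cover both output forms described above: the label alone gives the rise-or-fall decision, and the pair gives the probability of each outcome.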

Alternatively, the output 30 of the 1-2 model 320 may be the value of the stock index itself. That is, for example, when the previous day's news articles are input to the model, it can predict that today's S&P 500 index value will be 2790.

The calculation of the stock index prediction result by the 1-1 model (310) and the 1-2 model (320) is performed by the stock index prediction application (300, see FIG. 6).

FIG. 4 is a schematic diagram showing the process of a second embodiment of the method for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention, and FIG. 5 is a flowchart of the second embodiment of the method.

As in FIGS. 2 and 3, news articles are input to an artificial neural network model (410) having the structure of a recurrent neural network (RNN), or of a long short-term memory (LSTM) or gated recurrent unit (GRU), which are types of RNN. The other artificial neural network models (420, 430) are likewise each composed of one of an RNN, an LSTM, and a GRU. The input news articles are data that have undergone data preprocessing. In the stock index prediction method and apparatus (200) of the present invention, the data preprocessing application (500, see FIG. 6) automatically performs the data preprocessing on the collected news articles (S501).
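For reference, a single GRU time step can be written in a few lines. The sketch below follows one standard formulation of the GRU, in which the update gate `z` weights the candidate state and `1 - z` weights the previous state (some libraries use the opposite convention). The parameter names and the all-zero weights are placeholders chosen so the result is easy to check by hand; they are not taken from the patent.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(x, h, p):
    """One GRU step: update gate z, reset gate r, candidate state, new state."""
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to previous state
    cand = [math.tanh(a + b + c)
            for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Placeholder parameters: input size 3, hidden size 2, all weights zero.
params = {
    "Wz": zeros(2, 3), "Uz": zeros(2, 2), "bz": [0.0, 0.0],
    "Wr": zeros(2, 3), "Ur": zeros(2, 2), "br": [0.0, 0.0],
    "Wh": zeros(2, 3), "Uh": zeros(2, 2), "bh": [0.0, 0.0],
}
h_next = gru_step([1.0, 2.0, 3.0], [1.0, -2.0], params)
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so the new state is half the previous state.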

The difference from the first embodiment of FIGS. 2 and 3 is that each news article input to the artificial neural network model (S502) consists of a plurality of sentences (40). That is, in the first embodiment the input is a single sentence (10) made up of a specific number of words, for example the title of a news article, whereas in the second embodiment each sentence is likewise made up of a specific number of words, but each article consists of a specific number of such sentences (40). Accordingly, articles each composed of a sequence of a specific number of sentences (40) are input to the first artificial neural network model of the second embodiment (hereinafter the '2-1 model') (410), and the data output from the 2-1 model (410) for each such input is an embedding vector for each sentence (hereinafter a 'sentence embedding vector') (50) (S503). FIG. 4 shows the sequence of sentence embedding vectors (50) as the output. In the second embodiment of FIG. 4, as described above, a sequence of the specific number of sentence embedding vectors (50) constitutes one article.

The sequence of sentence embedding vectors (50) for the sentences that make up an article is input to the second artificial neural network model (hereinafter the '2-2 model') (420), which has an RNN or LSTM structure (S504). The output of the 2-2 model (420) is a single embedding vector (hereinafter an 'article embedding vector') (60) assigned to one article composed of a sequence of sentence embedding vectors (50) (S505). In FIG. 4, the output of the 2-2 model (420) is thus the set of article embedding vectors, one per article.

The second embodiment shown in FIG. 4 further includes a third artificial neural network model (hereinafter the '2-3 model') (430). The input to the 2-3 model (430) is the article embedding vectors (60) described above (S506), and the output of the 2-3 model (430) is the predicted rise or fall (70) of the stock index, analyzed from all of the input article embedding vectors (60) (S507).
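The three-level hierarchy of the second embodiment (words to sentences, sentences to articles, articles to a prediction) can be sketched as nested encoding passes. This is an illustrative sketch only: the `TanhRNN` class, its untrained random weights, and all dimensions are assumptions, and a plain tanh RNN stands in for whichever of RNN, LSTM, or GRU is actually used.

```python
import math
import random

class TanhRNN:
    """Plain tanh RNN that returns its final hidden state (hypothetical helper)."""

    def __init__(self, in_dim, out_dim, rng):
        self.out_dim = out_dim
        # One weight row per hidden unit, covering the input and the previous state.
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + out_dim)]
                  for _ in range(out_dim)]

    def encode(self, seq):
        h = [0.0] * self.out_dim
        for x in seq:
            xh = list(x) + h
            h = [math.tanh(sum(w * v for w, v in zip(row, xh))) for row in self.W]
        return h

rng = random.Random(1)
model_2_1 = TanhRNN(4, 3, rng)  # word vectors -> sentence embedding vector (50)
model_2_2 = TanhRNN(3, 3, rng)  # sentence embeddings -> article embedding vector (60)
model_2_3 = TanhRNN(3, 2, rng)  # article embeddings -> prediction scores (70)

# Two articles, each with three sentences of five 4-dimensional word vectors.
articles = [
    [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)] for _ in range(3)]
    for _ in range(2)
]

article_embeddings = []
for article in articles:
    sentence_embeddings = [model_2_1.encode(s) for s in article]      # S503
    article_embeddings.append(model_2_2.encode(sentence_embeddings))  # S505
scores = model_2_3.encode(article_embeddings)                         # S506, S507
```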

In one embodiment, the output (70) of the 2-3 model (430) may be whether the stock index rises or falls, or the probability that it rises or falls. For example, probabilities that sum to 100%, such as a 63% probability of a rise and a 37% probability of a fall, may each be calculated.

Alternatively, the output (70) of the 2-3 model (430) may be the value of the stock index itself. For example, when the previous day's news articles are input to the model, it may predict that today's S&P 500 index value is 2790.

The calculation of the stock index prediction result by the 2-1 model, the 2-2 model, and the 2-3 model is performed by the stock index prediction application (400, see FIG. 6).

FIG. 6 is a diagram showing the configuration of the apparatus (200) for predicting a stock index by analyzing news articles using an artificial neural network model according to the present invention.

The apparatus (200) for predicting a stock index by analyzing news articles using an artificial neural network model comprises a processor (210), a non-volatile storage unit (220) that stores programs and data, a volatile memory (230) that stores running programs, a communication unit (240) for communicating with other devices, and a bus that serves as the internal communication path between these components. The running programs may include device drivers, an operating system, and various applications. As described above with reference to FIGS. 2 to 5, these applications include the stock index prediction application (300, 400), which predicts the stock index by analyzing news articles using the artificial neural network models (310, 320, 410, 420, 430), and the data preprocessing application (500), which preprocesses news article data into input for the artificial neural network models (310, 410) of the stock index prediction application (300, 400).
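As a rough illustration of what the data preprocessing application (500) might hand to the first model, the sketch below turns a raw title into a fixed number of word embedding vectors. The whitespace tokenization, the padding token, the zero vector for unknown words, and the tiny embedding table are all assumptions made for illustration; the actual preprocessing steps are not specified in this passage.

```python
def to_fixed_length(title, num_words, pad_token="<pad>"):
    """Lowercase, split on whitespace, then truncate or pad to num_words tokens."""
    tokens = title.lower().split()[:num_words]
    return tokens + [pad_token] * (num_words - len(tokens))

def embed(tokens, table, dim):
    """Look up each token's vector; unknown and padding tokens map to a zero vector."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]

# Hypothetical 2-dimensional embedding table.
table = {"stocks": [0.1, 0.2], "rally": [0.3, -0.1]}

tokens = to_fixed_length("Stocks rally", num_words=4)
vectors = embed(tokens, table, dim=2)
```

Fixing the word count per sentence keeps every input sequence the same length, which matches the description of sentences made up of a specific number of words.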

10: News article title sentence
11, 12, 13: Word embedding vectors constituting each news article title sentence
20: Sentence embedding vector output by the 1-1 model
30: Stock index prediction result output by the 1-2 model
40: News article sentence
41, 42, 43: Word embedding vectors constituting each news article sentence
50: Sentence embedding vector output by the 2-1 model
60: Article embedding vector output by the 2-2 model
70: Stock index prediction result output by the 2-3 model
100: NTN model for predicting the rise or fall of the stock index
200: Stock index prediction apparatus using an artificial neural network model
300: First embodiment of the stock index prediction application
310: 1-1 model
311: GRU-cell line in one direction of the bi-directional RNN (or bi-directional LSTM) artificial neural network model
312: GRU-cell line in the opposite direction of the bi-directional RNN (or bi-directional LSTM) artificial neural network model
320: 1-2 model
400: Second embodiment of the stock index prediction application
410: 2-1 model
411: GRU-cell line in one direction of the bi-directional RNN (or bi-directional LSTM) artificial neural network model
412: GRU-cell line in the opposite direction of the bi-directional RNN (or bi-directional LSTM) artificial neural network model
420: 2-2 model
430: 2-3 model

Claims

A method of predicting a stock index by analyzing news articles using an artificial neural network model, the method comprising:
(a) receiving, at a first artificial neural network model (hereinafter referred to as a '1-1 model'), a title sentence of a news article on which data preprocessing has been performed;
(b) outputting an embedding vector (hereinafter referred to as a 'sentence embedding vector') for each news article title sentence from the 1-1 model;
(c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter referred to as a '1-2 model'); and
(d) outputting a stock index prediction result from the 1-2 model.

A method of predicting a stock index by analyzing news articles using an artificial neural network model, the method comprising:
(a) receiving, at a first artificial neural network model (hereinafter referred to as a '2-1 model'), a news article sentence on which data preprocessing has been performed;
(b) outputting an embedding vector (hereinafter referred to as a 'sentence embedding vector') for each news article sentence from the 2-1 model;
(c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter referred to as a '2-2 model');
(d) calculating, in the 2-2 model, an embedding vector (hereinafter referred to as an 'article embedding vector') for each article composed of a plurality of sentence embedding vectors;
(e) receiving each article embedding vector at a third artificial neural network model (hereinafter referred to as a '2-3 model'); and
(f) outputting a stock index prediction result from the 2-3 model.

The method according to claim 1 or claim 2,
wherein the title sentence of the news article received by the 1-1 model, or the sentence of the news article received by the 2-1 model,
consists of a specific number of word embedding vectors.

The method according to claim 1 or claim 2,
wherein each artificial neural network model
consists of any one of a uni-directional recurrent neural network (RNN), a uni-directional long short-term memory (LSTM), a uni-directional gated recurrent unit (GRU), a bi-directional RNN, a bi-directional LSTM, and a bi-directional GRU.

The method according to claim 1 or claim 2,
wherein the output stock index prediction result
is whether the stock index rises or falls, the probability that the stock index rises or falls, or the value of the stock index itself.

The method according to claim 1,
wherein the 1-1 model and the 1-2 model
are formed by training with layers arranged in the same manner as the 1-1 model and the 1-2 model.

The method according to claim 2,
wherein the 2-1 model, the 2-2 model, and the 2-3 model
are formed by training with layers arranged in the same manner as the 2-1 model, the 2-2 model, and the 2-3 model.

An apparatus for predicting a stock index by analyzing news articles using an artificial neural network model, the apparatus comprising:
at least one processor; and
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions stored in the at least one memory cause the at least one processor to execute
a step of receiving news articles and outputting a stock index prediction result using the artificial neural network model.

The apparatus according to claim 8,
wherein the step of receiving the news articles and outputting a stock index prediction result using the artificial neural network model comprises:
(a) receiving, at a first artificial neural network model (hereinafter referred to as a '1-1 model'), a title sentence of a news article on which data preprocessing has been performed;
(b) outputting an embedding vector (hereinafter referred to as a 'sentence embedding vector') for each news article title sentence from the 1-1 model;
(c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter referred to as a '1-2 model'); and
(d) outputting a stock index prediction result from the 1-2 model.

The apparatus according to claim 8,
wherein the step of receiving the news articles and outputting a stock index prediction result using the artificial neural network model comprises:
(a) receiving, at a first artificial neural network model (hereinafter referred to as a '2-1 model'), a news article sentence on which data preprocessing has been performed;
(b) outputting an embedding vector (hereinafter referred to as a 'sentence embedding vector') for each news article sentence from the 2-1 model;
(c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter referred to as a '2-2 model');
(d) calculating, in the 2-2 model, an embedding vector (hereinafter referred to as an 'article embedding vector') for each article composed of a plurality of sentence embedding vectors;
(e) receiving each article embedding vector at a third artificial neural network model (hereinafter referred to as a '2-3 model'); and
(f) outputting a stock index prediction result from the 2-3 model.

A computer program for predicting a stock index by analyzing news articles using an artificial neural network model,
the computer program being stored in a non-transitory storage medium and comprising instructions that cause a processor to execute:
(a) receiving, at a first artificial neural network model (hereinafter referred to as a '1-1 model'), a title sentence of a news article on which data preprocessing has been performed;
(b) outputting an embedding vector (hereinafter referred to as a 'sentence embedding vector') for each news article title sentence from the 1-1 model;
(c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter referred to as a '1-2 model'); and
(d) outputting a stock index prediction result from the 1-2 model.

A computer program for predicting a stock index by analyzing news articles using an artificial neural network model,
the computer program being stored in a non-transitory storage medium and comprising instructions that cause a processor to execute:
(a) receiving, at a first artificial neural network model (hereinafter referred to as a '2-1 model'), a news article sentence on which data preprocessing has been performed;
(b) outputting an embedding vector (hereinafter referred to as a 'sentence embedding vector') for each news article sentence from the 2-1 model;
(c) receiving each sentence embedding vector at a second artificial neural network model (hereinafter referred to as a '2-2 model');
(d) calculating, in the 2-2 model, an embedding vector (hereinafter referred to as an 'article embedding vector') for each article composed of a plurality of sentence embedding vectors;
(e) receiving each article embedding vector at a third artificial neural network model (hereinafter referred to as a '2-3 model'); and
(f) outputting a stock index prediction result from the 2-3 model.